Re: [PATCH] Provide diagnostic hints for missing inttypes.h string constants.

2020-05-22 Thread Mark Wielaard
Hi,

On Fri, May 22, 2020 at 09:42:28AM -0400, David Malcolm wrote:
> On Fri, 2020-05-22 at 01:30 +0200, Mark Wielaard wrote:
> > This is on top of the stdbool.h and stdint.h patches.
> 
> Sorry, I didn't see those patches; I've replied to them now.

No worries, there was no hurry and I didn't CC you on those.
I pushed them both now. Thanks.

> > This adds a flag to c_parser so we know when we were trying to
> > constract a string literal. If there is a parse error and we were
> > constructing a string literal, and the next token is an unknown
> > identifier name, and we know there is a standard header that defines
> > that name as a string literal, then add a missing header hint to
> > the error messsage.
> 
> By comparison, what's the existing behavior for the example?
> Is it just what you posted below, but without the "note" diagnostics?

Yes.

> > I haven't checked yet if we can do something similar for the C++
> > parser. And the testcase needs to be extended a bit. But I hope the
> > direction is OK.
> 
> I think it's a worthy goal; as noted below I'd want a C frontend
> maintainer to approve those parts of it.

OK.

> > $ /opt/local/gcc/bin/gcc -c t.c
> > t.c:4:26: error: ‘PRIx64’ undeclared here (not in a function)
> > 4 | const char *hex64_fmt =  PRIx64;
> >   |  ^~
> > t.c:3:1: note: ‘PRIx64’ is defined in header ‘’; did you forget 
> > to ‘#include ’?
> > 2 | #include 
> >   +++ |+#include 
> > 3 |
> > t.c:5:35: error: expected ‘,’ or ‘;’ before ‘PRIx64’
> > 5 | const char *msg_fmt = "Provide %" PRIx64 "\n";
> >   |   ^~
> > t.c:5:35: note: ‘PRIx64’ is defined in header ‘’; did you 
> > forget to ‘#include ’?
> 
> It's a big improvement, but it's still a little clunky.  The
>   expected ‘,’ or ‘;’ before ‘PRIx64’
> is the sort of thing we've gotten used to with years of GCC messages,
> but might make little sense to a neophyte.
> I wonder if it's possible to improve this further, but I fear that that
> might overcomplicate things (if I understand things correctly, the
> string concatenation fails in the tokenizer, and then the PRIx64 fails
> in the parser, so we have a bad interaction involving two different
> levels of abstraction within the frontend - I think).

Yes, part of the string concatenation is done by the tokenizer, the
other part by the parser. With this patch we now track when we were
trying to concatenate a string. Then if any error occurs and we notice
an unknown token is next that could be a string literal then we add
the hint to the error message with an alternative suggestion/hint for
a fix. I didn't want to override the existing error message, because
it might be right or better, and we don't actually know whether with
the suggestion the rest will actully parse correctly.

> > +/* These can be used as string macros or as identifiers. Must all be
> > +   string-literals.  Used in get_stdlib_header_for_name and
> > +   get_c_stdlib_header_for_string_macro_name.  */
> > +static const stdlib_hint c99_cxx11_macro_hints[] = {
> > +/*  and .  */
> > +{"PRId8", {"", ""} },
> > [...]
> > +{"SCNxPTR", {"", ""} }
> > +};
> 
> I found myself squinting at the array trying to decide if every entry
> had
>{"", ""}
> as its second element.  I *think* that's the case, right? 
> 
> I know there's a lot of pre-existing duplication in this file, but
> maybe this would be better as simply an array of const char *?
> It could probably be reformatted to take up far fewer lines by grouping
> them logically.
> 
> Maybe have a
> 
>   static const char *
>   get_c99_cxx11_macro_hint (const char *, enum stdlib lib);
> 
> to do the predicate, and return one of "", "", or NULL?

Yes, that is actually better. And much easier to read. And the code
can still be shared between get_c_stdlib_header_for_string_macro_name
and get_stdlib_header_for_name. Changed in the attached patch.

I also extended the testcase so it covers both identifiers and string
concatenation cases. The identifier hints also work for C++, but I
haven't yet added the string token concatenation detection to the
cp_parser. It looks like it can be done similar as done for the
c_parser, but the parsers are different enough that it seems
better/simpler to do that in a followup patch once we have this patch
in the the c_parser.

Thanks,

Mark
>From 98be2e83b6d0e69d769e8eff0f4b815240e51e1c Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Fri, 22 May 2020 01:10:50 +0200
Subject: [PATCH] Provide diagnostic hints for missing inttypes.h string
 constants.

This adds a flag to c_parser so we know when we were trying to
construct a string literal. If there is a parse error and we were
constructing a string literal, and the next token is an unknown
identifier name, and we know there is a standard header that defines
that name as a string literal, then add a missing header hint to
the error messsage.

gcc/c-family/ChangeLog:

	* known-headers.cc 

Re: [PATCH] c++: template instantiation during fold_for_warn [PR94038]

2020-05-22 Thread Jason Merrill via Gcc-patches

On 5/22/20 9:18 PM, Patrick Palka wrote:

On Fri, 22 May 2020, Jason Merrill wrote:

On 5/20/20 10:08 PM, Patrick Palka wrote:

On Wed, 20 May 2020, Patrick Palka wrote:

On Tue, 19 May 2020, Jason Merrill wrote:


On 5/8/20 11:42 AM, Patrick Palka wrote:

On Wed, 6 May 2020, Patrick Palka wrote:


On Wed, 6 May 2020, Patrick Palka wrote:


On Tue, 5 May 2020, Patrick Palka wrote:


On Tue, 5 May 2020, Patrick Palka wrote:


Unfortunately, the previous fix to PR94038 is fragile.  When
the
argument to fold_for_warn is a bare CALL_EXPR, then all is
well: the
result of maybe_constant_value from fold_for_warn (with
uid_sensitive=true) is reused via the cv_cache in the
subsequent
call to
maybe_constant_value from cp_fold (with uid_sensitive=false),
so we
avoid instantiating bar.

But when the argument to fold_for_warn is more complex, e.g.
an
INDIRECT_REF of a CALL_EXPR, as in the testcase below (due to
bar()
returning const int& which we need to decay to int) then from
fold_for_warn we call maybe_constant_value on the
INDIRECT_REF, and
from
cp_fold we call it on the CALL_EXPR, so there is no reuse via
the
cv_cache and we therefore end up instantiating bar.

So for a more robust solution to this general issue of warning
flags
affecting code generation, it seems that we need a way to
globally
avoid
template instantiation during constexpr evaluation whenever
we're
performing warning-dependent folding.

To that end, this patch replaces the flag
constexpr_ctx::uid_sensitive
with a global flag uid_sensitive_constexpr_evaluation_p, and
enables
it
during fold_for_warn using an RAII helper.

The patch also adds a counter that keeps track of the number
of
times
uid_sensitive_constexpr_evaluation_p is called, and we use
this to
determine whether the result of constexpr evaluation depended
on the
flag's value.  This lets us safely update the cv_cache and
fold_cache
from fold_for_warn in the most common case where the flag's
value
was
irrelevant during constexpr evaluation.


Here's some statistics about about the fold cache and cv cache that
were
collected when building the testsuite of range-v3 (with the default
build flags which include -O -Wall -Wextra, so warning-dependent
folding
applies).

The "naive solution" below refers to the one in which we would refrain
from updating the caches at the end of cp_fold/maybe_constant_value
whenever we're in uid-sensitive constexpr evaluation mode (i.e.
whenever
we're called from fold_for_warn), regardless of whether constexpr
evaluation was actually restricted.


FOLD CACHE  | cache hit | cache miss| cache miss|
   |   | cache updated | cache not updated |
+---+---+---+
naive sol'n |   5060779 |  11374667 |  12887790 |
+
this patch  |   6665242 |  19688975 |199137 |
+---+---+---+


CV CACHE| cache hit | cache miss| cache miss|
   |   | cache updated | cache not updated |
+---+---+---+
naive sol'n | 76214 |   3637218 |   6889213 |
+
this patch  |215980 |   9882310 |405513 |
+---+---+---+

-- >8 --

gcc/cp/ChangeLog:

PR c++/94038
* constexpr.c (constexpr_ctx::uid_sensitive): Remove field.
(uid_sensitive_constexpr_evaluation_value): Define.
(uid_sensitive_constexpr_evaluation_true_counter): Define.
(uid_sensitive_constexpr_evaluation_p): Define.
(uid_sensitive_constexpr_evaluation_sentinel): Define its
constructor.
(uid_sensitive_constexpr_evaluation_checker): Define its
constructor and its evaluation_restricted_p method.
(get_fundef_copy): Remove 'ctx' parameter.  Use u_s_c_e_p
instead of constexpr_ctx::uid_sensitive.
(cxx_eval_call_expression): Use u_s_c_e_p instead, and test it
last.  Adjust call to get_fundef_copy.
(instantiate_cx_fn_r): Test u_s_c_e_p so that we increment the
counter if necessary.
(cxx_eval_outermost_constant_expr): Remove 'uid_sensitive'
parameter.  Adjust function body accordingly.
(maybe_constant_value): Remove 'uid_sensitive' parameter and
adjust function body accordingly.  Set up a
uid_sensitive_constexpr_evaluation_checker, and use it to
conditionally update the cv_cache.
* cp-gimplify.h (cp_fold): Set up a
uid_sensitive_constexpr_evaluation_checker, and use it to
conditionally update the fold_cache.
* cp-tree.h (maybe_constant_value): Update declaration.
(struct uid_sensitive_constexpr_evaluation_sentinel): Define.
(struct sensitive_constexpr_evaluation_checker): Define.
* expr.c (fold_for_warn): 

Re: [PATCH] More c++ math reject macros

2020-05-22 Thread Alexandre Oliva
Hi, Doug,

Sorry about the delay, I'd somehow missed your email.

On May 20, 2020, Douglas B Rupp  wrote:

> 2020-05-20  Douglas B Rupp  

> libstdc++v3/
>   * crossconfig.m4 (GLIBCXX_CHECK_MATH_DECL): More reject macros.
>   * configure: Rebuild.

The change is ok, but the ChangeLog entry isn't.

You're not modifying the m4 macro associated with the name between
parentheses, but rather a use thereof.  The entry of mine that you based
it on actually introduced the m4 macro (and its use), so the
parenthesized ChangeLog construct made sense there.  Without a change to
that macro, as in your patch, it doesn't seem correct.

I also realized my own ChangeLog entry was misleading and unclear:
macros could refer to both the preprocessor macros defined by the
checks, that are then used to reject replacement definitions, or to the
standard math functions themselves, that might be implemented by macros
in C, and we can't allow such macro implementations to pollute all C++
namespaces.


Patch is ok with the following ChangeLog entry:

* crossconfig.m4 <*-vxworks>: Check for more math decls.
* configure: Rebuild.


Thanks!

-- 
Alexandre Oliva, freedom fighterhe/himhttps://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


Re: [PATCH] c++: template instantiation during fold_for_warn [PR94038]

2020-05-22 Thread Patrick Palka via Gcc-patches
On Fri, 22 May 2020, Jason Merrill wrote:
> On 5/20/20 10:08 PM, Patrick Palka wrote:
> > On Wed, 20 May 2020, Patrick Palka wrote:
> > > On Tue, 19 May 2020, Jason Merrill wrote:
> > > 
> > > > On 5/8/20 11:42 AM, Patrick Palka wrote:
> > > > > On Wed, 6 May 2020, Patrick Palka wrote:
> > > > > 
> > > > > > On Wed, 6 May 2020, Patrick Palka wrote:
> > > > > > 
> > > > > > > On Tue, 5 May 2020, Patrick Palka wrote:
> > > > > > > 
> > > > > > > > On Tue, 5 May 2020, Patrick Palka wrote:
> > > > > > > > 
> > > > > > > > > Unfortunately, the previous fix to PR94038 is fragile.  When
> > > > > > > > > the
> > > > > > > > > argument to fold_for_warn is a bare CALL_EXPR, then all is
> > > > > > > > > well: the
> > > > > > > > > result of maybe_constant_value from fold_for_warn (with
> > > > > > > > > uid_sensitive=true) is reused via the cv_cache in the
> > > > > > > > > subsequent
> > > > > > > > > call to
> > > > > > > > > maybe_constant_value from cp_fold (with uid_sensitive=false),
> > > > > > > > > so we
> > > > > > > > > avoid instantiating bar.
> > > > > > > > > 
> > > > > > > > > But when the argument to fold_for_warn is more complex, e.g.
> > > > > > > > > an
> > > > > > > > > INDIRECT_REF of a CALL_EXPR, as in the testcase below (due to
> > > > > > > > > bar()
> > > > > > > > > returning const int& which we need to decay to int) then from
> > > > > > > > > fold_for_warn we call maybe_constant_value on the
> > > > > > > > > INDIRECT_REF, and
> > > > > > > > > from
> > > > > > > > > cp_fold we call it on the CALL_EXPR, so there is no reuse via
> > > > > > > > > the
> > > > > > > > > cv_cache and we therefore end up instantiating bar.
> > > > > > > > > 
> > > > > > > > > So for a more robust solution to this general issue of warning
> > > > > > > > > flags
> > > > > > > > > affecting code generation, it seems that we need a way to
> > > > > > > > > globally
> > > > > > > > > avoid
> > > > > > > > > template instantiation during constexpr evaluation whenever
> > > > > > > > > we're
> > > > > > > > > performing warning-dependent folding.
> > > > > > > > > 
> > > > > > > > > To that end, this patch replaces the flag
> > > > > > > > > constexpr_ctx::uid_sensitive
> > > > > > > > > with a global flag uid_sensitive_constexpr_evaluation_p, and
> > > > > > > > > enables
> > > > > > > > > it
> > > > > > > > > during fold_for_warn using an RAII helper.
> > > > > > > > > 
> > > > > > > > > The patch also adds a counter that keeps track of the number
> > > > > > > > > of
> > > > > > > > > times
> > > > > > > > > uid_sensitive_constexpr_evaluation_p is called, and we use
> > > > > > > > > this to
> > > > > > > > > determine whether the result of constexpr evaluation depended
> > > > > > > > > on the
> > > > > > > > > flag's value.  This lets us safely update the cv_cache and
> > > > > > > > > fold_cache
> > > > > > > > > from fold_for_warn in the most common case where the flag's
> > > > > > > > > value
> > > > > > > > > was
> > > > > > > > > irrelevant during constexpr evaluation.
> > > > > 
> > > > > Here's some statistics about about the fold cache and cv cache that
> > > > > were
> > > > > collected when building the testsuite of range-v3 (with the default
> > > > > build flags which include -O -Wall -Wextra, so warning-dependent
> > > > > folding
> > > > > applies).
> > > > > 
> > > > > The "naive solution" below refers to the one in which we would refrain
> > > > > from updating the caches at the end of cp_fold/maybe_constant_value
> > > > > whenever we're in uid-sensitive constexpr evaluation mode (i.e.
> > > > > whenever
> > > > > we're called from fold_for_warn), regardless of whether constexpr
> > > > > evaluation was actually restricted.
> > > > > 
> > > > > 
> > > > > FOLD CACHE  | cache hit | cache miss| cache miss|
> > > > >   |   | cache updated | cache not updated |
> > > > > +---+---+---+
> > > > > naive sol'n |   5060779 |  11374667 |  12887790 |
> > > > > +
> > > > > this patch  |   6665242 |  19688975 |199137 |
> > > > > +---+---+---+
> > > > > 
> > > > > 
> > > > > CV CACHE| cache hit | cache miss| cache miss|
> > > > >   |   | cache updated | cache not updated |
> > > > > +---+---+---+
> > > > > naive sol'n | 76214 |   3637218 |   6889213 |
> > > > > +
> > > > > this patch  |215980 |   9882310 |405513 |
> > > > > +---+---+---+
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   PR c++/94038
> > > > >   * constexpr.c (constexpr_ctx::uid_sensitive): Remove field.
> > > > >   

Re: [PATCH v2] c++: Change the default dialect to C++17.

2020-05-22 Thread Jason Merrill via Gcc-patches

On 5/19/20 7:34 PM, Marek Polacek wrote:

On Mon, May 18, 2020 at 03:51:36PM -0400, Jason Merrill wrote:

On 5/16/20 6:34 PM, Marek Polacek wrote:

Since GCC 9, C++17 support is no longer experimental.  It was too late
to change the default C++ dialect to C++17 in GCC 10, but I think now
it's time to pull the trigger (C++14 was made the default in GCC 6.1).
We're still missing two C++17 library features, but that shouldn't stop
us.  See

and

for the C++17 status.

I won't list all C++17 features here, but just a few heads-up:

- trigraphs were removed (hardly anyone cares, unless your keyboard is
missing the # key),
- operator++(bool) was removed (so some tests now run in C++14 and down
only),
- the keyword register was removed (some legacy code might trip on
this),
- noexcept specification is now part of the type system and C++17 does
not allow dynamic exception specifications anymore (the empty throw
specification is still available, but it is deprecated),
- the evaluation order rules are different in C++17,
- static constexpr data members are now implicitly inline (which makes
them definitions),
- C++17 requires guaranteed copy elision, meaning that a copy/move
constructor call might be elided completely.  That means that if
something relied on a constructor being instantiated via e.g. copying
a function parameter, it might now fail.

Jakub, the last point is actually what I encountered in for-27.C and it
looks like an OpenMP issue, in that it wrongly depends on an implicit
instantiation.  Am I mistaken?

Due to the 'register' removal, this patch depends on the regenerated
cfns.h version I posted earlier today.

I'll post an update for cxx-status.html and add a new caveat to changes.html
once this is in.

Bootstrapped/regtested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
and powerpc64le-unknown-linux-gnu.  Ok for trunk?

* doc/invoke.texi (C Dialect Options): Adjust -std default for C++.
* doc/standards.texi (C Language): Correct the default dialect.
(C++ Language): Update the default for C++ to gnu++17.

* c-opts.c (c_common_init_options): Default to gnu++17.

* c-c++-common/torture/vector-subscript-3.c: Remove the register
keywords.
* g++.dg/cpp1z/attributes-enum-1a.C: Update for C++17.
* g++.dg/cpp1z/fold7a.C: Likewise.
* g++.dg/cpp1z/nontype3a.C: Likewise.
* g++.dg/cpp1z/utf8-2a.C: Likewise.
* g++.dg/parse/error11.C: Use c++14_down.
* g++.dg/torture/pr34850.C: Add -Wno-attribute-warning.
* g++.dg/torture/pr49394.C: Add dg-warning.
* g++.dg/torture/pr82154.C: Use -std=c++14.
* g++.dg/ubsan/pr85029.C: Use -Wno-register.
* g++.dg/ubsan/vptr-12.C: Use -fstrong-eval-order=some.
* lib/target-supports.exp: Set to C++17.
* obj-c++.dg/try-catch-9.mm: Use -Wno-register.

* testsuite/libgomp.c++/atomic-3.C: Use -std=gnu++14.
* testsuite/libgomp.c++/for-27.C: Explicitly instantiate the
copy constructor for I.
---
   gcc/c-family/c-opts.c   | 4 ++--
   gcc/doc/invoke.texi | 2 +-
   gcc/doc/standards.texi  | 4 ++--
   gcc/testsuite/c-c++-common/torture/vector-subscript-3.c | 6 +++---
   gcc/testsuite/g++.dg/cpp1z/attributes-enum-1a.C | 4 ++--
   gcc/testsuite/g++.dg/cpp1z/fold7a.C | 4 ++--
   gcc/testsuite/g++.dg/cpp1z/nontype3a.C  | 4 ++--
   gcc/testsuite/g++.dg/cpp1z/utf8-2a.C| 4 ++--
   gcc/testsuite/g++.dg/parse/error11.C| 5 +++--
   gcc/testsuite/g++.dg/torture/pr34850.C  | 2 +-
   gcc/testsuite/g++.dg/torture/pr49394.C  | 2 +-
   gcc/testsuite/g++.dg/torture/pr82154.C  | 3 ++-
   gcc/testsuite/g++.dg/ubsan/pr85029.C| 2 +-
   gcc/testsuite/g++.dg/ubsan/vptr-12.C| 2 +-
   gcc/testsuite/lib/target-supports.exp   | 2 +-
   gcc/testsuite/obj-c++.dg/try-catch-9.mm | 2 +-
   libgomp/testsuite/libgomp.c++/atomic-3.C| 3 ++-
   libgomp/testsuite/libgomp.c++/for-27.C  | 6 ++
   18 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 7695e88c130..5b266c06f3b 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -256,9 +256,9 @@ c_common_init_options (unsigned int decoded_options_count,
  }
   }
-  /* Set C++ standard to C++14 if not specified on the command line.  */
+  /* Set C++ standard to C++17 if not specified on the command line.  */
 if (c_dialect_cxx ())
-set_std_cxx14 (/*ISO*/false);
+set_std_cxx17 (/*ISO*/false);
 

[pushed] c++: Fix C++17 eval order for virtual op=.

2020-05-22 Thread Jason Merrill via Gcc-patches
In a function call expression in C++17 evaluation of the function pointer is
sequenced before evaluation of the function arguments, but that doesn't
apply to function calls that were written using operator syntax.  In
particular, for operators with right-to-left ordering like assignment, we
must not evaluate the LHS to find a virtual function before we evaluate the
RHS.

gcc/cp/ChangeLog:

* cp-gimplify.c (cp_gimplify_expr) [CALL_EXPR]: Don't preevaluate
the function address if the call used operator syntax.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/eval-order9.C: New test.
---
 gcc/cp/cp-gimplify.c |  1 +
 gcc/testsuite/g++.dg/cpp1z/eval-order9.C | 18 ++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/eval-order9.C

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 2804958c246..b60a319283f 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -852,6 +852,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
   ret = GS_OK;
   if (flag_strong_eval_order == 2
  && CALL_EXPR_FN (*expr_p)
+ && !CALL_EXPR_OPERATOR_SYNTAX (*expr_p)
  && cp_get_callee_fndecl_nofold (*expr_p) == NULL_TREE)
{
  tree fnptrtype = TREE_TYPE (CALL_EXPR_FN (*expr_p));
diff --git a/gcc/testsuite/g++.dg/cpp1z/eval-order9.C 
b/gcc/testsuite/g++.dg/cpp1z/eval-order9.C
new file mode 100644
index 000..7001f5a7b52
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/eval-order9.C
@@ -0,0 +1,18 @@
+// { dg-do run }
+// { dg-additional-options -fstrong-eval-order=all }
+
+struct A
+{
+  virtual A& operator=(const A&) { return *this; }
+};
+
+int i;
+
+A& f() { if (i != 1) __builtin_abort (); i = 2; static A a; return a; }
+A& g() { if (i != 0) __builtin_abort (); i = 1; static A a; return a; }
+
+int main()
+{
+  f() = g();
+  if (i != 2) __builtin_abort ();
+}

base-commit: 72af65b91cc2a2eb726afe56af6b44d6c57bb10f
-- 
2.18.1



[PATCH] Check and substitute AR in libcpp and libdecnumber

2020-05-22 Thread David Edelsohn via Gcc-patches
TL;DR: This patch updates configure.ac and Makefile.in in libcpp and
libdecnumber to substitute AR archiver.

AIX supports "FAT" libraries containing 32 bit and 64 bit objects
(similar to Darwin), but commands for manipulating libraries do not
default to accept both 32 bit and 64 bit object files.  While updating
the AIX configuration to support building and running GCC as a 64 bit
application, I have encountered some build libraries that hard code
AR=ar instead of testing the environment.

This patch adds AR_CHECK_TOOL(AR, ar) to configure.ac for the two
libraries and updates Makefile.in to accept the substitution.

Bootstrapped on powerpc64le-ibm-linux-gnu.

Okay?

Thanks, David

libcpp/
* Makefile.in (AR): Substitute @AR@.
* configure.ac (CHECK_PROG AR): New.
* configure: Regenerate.

libdecnumber/
* Makefile.in (AR): Substitute @AR@.
* configure.ac (CHECK_PROG AR): New.
* configure: Regenerate.

diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
index ebbca07..5fbba9b 100644
--- a/libcpp/Makefile.in
+++ b/libcpp/Makefile.in
@@ -25,7 +25,7 @@ srcdir = @srcdir@
 top_builddir = .
 VPATH = @srcdir@
 INSTALL = @INSTALL@
-AR = ar
+AR = @AR@
 ARFLAGS = cru
 ACLOCAL = @ACLOCAL@
 AUTOCONF = @AUTOCONF@
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index 540efeb..1efa96f 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -12,6 +12,7 @@ AC_PROG_INSTALL
 AC_PROG_CC
 AC_PROG_CXX
 AC_PROG_RANLIB
+AC_CHECK_TOOL(AR, ar)

 AC_USE_SYSTEM_EXTENSIONS
 AC_SYS_LARGEFILE
diff --git a/libdecnumber/Makefile.in b/libdecnumber/Makefile.in
index 9260b48..9da028d 100644
--- a/libdecnumber/Makefile.in
+++ b/libdecnumber/Makefile.in
@@ -25,7 +25,7 @@ srcdir = @srcdir@
 top_builddir = .
 VPATH = @srcdir@
 INSTALL = @INSTALL@
-AR = ar
+AR = @AR@
 ARFLAGS = cru
 ACLOCAL = @ACLOCAL@
 AUTOCONF = @AUTOCONF@
diff --git a/libdecnumber/configure.ac b/libdecnumber/configure.ac
index de7e008..ae475a0 100644
--- a/libdecnumber/configure.ac
+++ b/libdecnumber/configure.ac
@@ -28,6 +28,7 @@ AC_CONFIG_AUX_DIR(..)
 AC_PROG_MAKE_SET
 AC_PROG_CC
 AC_PROG_RANLIB
+AC_CHECK_TOOL(AR, ar)

 MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
 AC_CHECK_PROGS([ACLOCAL], [aclocal], [$MISSING aclocal])


[PATCH 5/7] [OpenACC] Distinguish structural/dynamic mappings in libgomp

2020-05-22 Thread Julian Brown
This patch provides support for distinguishing target_mem_descs introduced
via structured data lifetimes from those arising from dynamic data
lifetimes.  This is a prerequisite for the following reference-count
self-checking patch.

This patch (and those following it) are not vital for this patch series,
but are "nice-to-have" additions.

OK?

Julian

libgomp/
* libgomp.h (struct target_mem_desc): Update comment on prev field.
* oacc-int.h (goacc_mark_dynamic, goacc_marked_dynamic_p): Add
prototypes.
* oacc-mem.c (dyn_tgt_sentinel): New.
(goacc_mark_dynamic, goacc_marked_dynamic_p): New functions.
(goacc_enter_datum): Call goacc_mark_dynamic.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_unmap_vars_internal): Convert a target_mem_desc from
a structural mapping to dynamic when appropriate.
---
 libgomp/libgomp.h  |  3 ++-
 libgomp/oacc-int.h |  3 +++
 libgomp/oacc-mem.c | 28 
 libgomp/target.c   |  8 +++-
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 7b52ce7d5c2..0d1978ffb13 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -971,7 +971,8 @@ struct target_mem_desc {
   uintptr_t tgt_end;
   /* Handle to free.  */
   void *to_free;
-  /* Previous target_mem_desc.  */
+  /* Previous target_mem_desc.  Also used in OpenACC to indicate that this
+ target_mem_desc is used only for an "enter data" mapping.  */
   struct target_mem_desc *prev;
   /* Number of items in following list.  */
   size_t list_count;
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 3c2c9b84b2f..2d8d3eb5a4b 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -165,6 +165,9 @@ bool _goacc_profiling_setup_p (struct goacc_thread *,
 void goacc_profiling_dispatch (acc_prof_info *, acc_event_info *,
   acc_api_info *);
 
+extern void goacc_mark_dynamic (struct target_mem_desc *);
+extern bool goacc_marked_dynamic_p (struct target_mem_desc *tgt);
+
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility pop
 #endif
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index c2b4a131a5f..038ab68e8a2 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -497,6 +497,30 @@ acc_unmap_data (void *h)
 }
 }
 
+/* Indicate (via storing its address in the "prev" field) a target_mem_desc
+   that is used for an "enter data" mapping.  */
+const static struct target_mem_desc dyn_tgt_sentinel;
+
+attribute_hidden void
+goacc_mark_dynamic (struct target_mem_desc *tgt)
+{
+  tgt->prev = (struct target_mem_desc *) _tgt_sentinel;
+}
+
+attribute_hidden bool
+goacc_marked_dynamic_p (struct target_mem_desc *tgt)
+{
+  return tgt->prev == (struct target_mem_desc *) _tgt_sentinel;
+}
 
 /* Enter dynamic mapping for a single datum.  Return the device pointer.  */
 
@@ -563,6 +587,8 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, 
unsigned short kind,
= gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,
   , true, GOMP_MAP_VARS_ENTER_DATA);
   assert (tgt);
+  goacc_mark_dynamic (tgt);
+
   assert (tgt->list_count == 1);
   n = tgt->list[0].key;
   assert (n);
@@ -1122,6 +1148,8 @@ goacc_enter_data_internal (struct gomp_device_descr 
*acc_dev, size_t mapnum,
   [i], [i], true,
   GOMP_MAP_VARS_ENTER_DATA);
  assert (tgt);
+ goacc_mark_dynamic (tgt);
+
  for (size_t j = 0; j < tgt->list_count; j++)
{
  n = tgt->list[j].key;
diff --git a/libgomp/target.c b/libgomp/target.c
index 3f2becdae0e..1d60d0cb573 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1447,7 +1447,13 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, 
bool do_copyfrom,
 
   bool do_unmap = false;
   if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
-   k->refcount--;
+   {
+ k->refcount--;
+ /* If we only have dynamic references left, mark the tgt_mem_desc
+appropriately.  */
+ if (k->refcount == k->dynamic_refcount)
+   goacc_mark_dynamic (k->tgt);
+   }
   else if (k->refcount == 1)
{
  k->refcount--;
-- 
2.23.0



[PATCH 6/7] [OpenACC] Reference count self-checking (dynamic_refcount version)

2020-05-22 Thread Julian Brown
This is a new version of the reference count self-checking code, adjusted
to work with the new (old) dynamic_refcount counting scheme.  The key
observation is that a target_mem_desc that was created from a dynamic
data lifetime should not contribute to the structured refcount for splay
tree keys in its variable list.  We can figure out which target_mem_descs
that applies to using the information recorded in the previous patch.

In a sense, this takes the "awkward corner cases" from the
virtual_refcount ("overhaul") patch, and moves them to the optional
self-test code, where they can potentially do less harm.  With this, we
still have a formal-ish model of what refcounts mean and some confidence
that they remain consistent (at least throughout execution of a test run),
which I think is a good thing.

OK? (We probably want a way of configuring-in this testing automatically,
as mentioned previously.)

Julian

ChangeLog

libgomp/
* libgomp.h (RC_CHECKING): New macro, disabled by default, guarding all
hunks in this patch.
(target_mem_desc): Add refcount_chk, mark fields.
(splay_tree_key_s): Add refcount_chk field.
(dump_tgt, gomp_rc_check): Add prototypes.
* oacc-mem.c (GOACC_enter_exit_data): Add refcount self-check code.
* oacc-parallel.c (GOACC_parallel_keyed_internal): Add refcount
self-check code.
(GOACC_data_start, GOACC_data_end, GOACC_enter_exit_data): Likewise.
* target.c (stdio.h): Include.
(dump_tgt, rc_check_clear, rc_check_count, rc_check_verify,
gomp_rc_check): New functions to consistency-check reference counts.
---
 libgomp/libgomp.h   |  18 
 libgomp/oacc-mem.c  |   6 ++
 libgomp/oacc-parallel.c |  27 ++
 libgomp/target.c| 185 
 4 files changed, 236 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 0d1978ffb13..eaa7c6ebb4c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -960,9 +960,17 @@ struct target_var_desc {
   uintptr_t length;
 };
 
+/* Uncomment to enable reference-count consistency checking (for development
+   use only).  */
+//#define RC_CHECKING 1
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
+#ifdef RC_CHECKING
+  uintptr_t refcount_chk;
+  bool mark;
+#endif
   /* All the splay nodes allocated together.  */
   splay_tree_node array;
   /* Start of the target region.  */
@@ -1019,6 +1027,10 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Dynamic reference count.  */
   uintptr_t dynamic_refcount;
+#ifdef RC_CHECKING
+  /* The recalculated reference count, for verification.  */
+  uintptr_t refcount_chk;
+#endif
   struct splay_tree_aux *aux;
 };
 
@@ -1174,6 +1186,12 @@ extern void gomp_detach_pointer (struct 
gomp_device_descr *,
 struct goacc_asyncqueue *, splay_tree_key,
 uintptr_t, bool, struct gomp_coalesce_buf *);
 
+#ifdef RC_CHECKING
+extern void dump_tgt (const char *, struct target_mem_desc *);
+extern void gomp_rc_check (struct gomp_device_descr *,
+  struct target_mem_desc *);
+#endif
+
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
  size_t, void **, void **,
  size_t *, void *, bool,
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 038ab68e8a2..c8ec3c9a7dd 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1450,4 +1450,10 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum, void 
**hostaddrs,
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (_dev->lock);
+  gomp_rc_check (acc_dev, thr->mapped_data);
+  gomp_mutex_unlock (_dev->lock);
+#endif
 }
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index c7e46e35bd6..0774cdc7e4f 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -301,6 +301,15 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
_info);
 }
   
+#ifdef RC_CHECKING
+  gomp_mutex_lock (_dev->lock);
+  assert (tgt);
+  dump_tgt (__FUNCTION__, tgt);
+  tgt->prev = thr->mapped_data;
+  gomp_rc_check (acc_dev, tgt);
+  gomp_mutex_unlock (_dev->lock);
+#endif
+
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
 devaddrs[i] = (void *) gomp_map_val (tgt, hostaddrs, i);
@@ -347,6 +356,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (_dev->lock);
+  gomp_rc_check (acc_dev, thr->mapped_data);
+  gomp_mutex_unlock (_dev->lock);
+#endif
 }
 
 /* Legacy entry point (GCC 5).  Only provide host fallback execution.  */
@@ -481,6 +496,12 @@ GOACC_data_start (int flags_m, size_t mapnum,
   thr->prof_info = NULL;
   

[PATCH 7/7] [OpenACC] Stricter dynamic data unmapping testing (WIP)

2020-05-22 Thread Julian Brown
Using the ability to distinguish structural from dynamic mappings'
target_mem_descs, we can adjust how the assertions in goacc_exit_datum
and goacc_exit_data_internal work.  This is possibly a slightly stronger
test than the one introduced earlier in this patch series -- though
actually I haven't quite convinced myself of that.

Anyway, this passes a regression run, with the refcount self-checking
code enabled also.

OK, or any comments?

Julian

ChangeLog

libgomp/
* oacc-mem.c (goacc_exit_datum): Adjust self-test code.
(goacc_exit_data_internal): Likewise.
* target.c (gomp_unmap_vars_internal): Clear target_mem_desc variable
list keys on unmapping.
---
 libgomp/oacc-mem.c | 43 ---
 libgomp/target.c   |  8 +++-
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index c8ec3c9a7dd..d7a1d87c9ef 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -755,16 +755,19 @@ goacc_exit_datum (void *h, size_t s, unsigned short kind, 
int async)
gomp_remove_var_async (acc_dev, n, aq);
   else
{
- int num_mappings = 0;
- /* If the target_mem_desc represents a single data mapping, we can
-check that it is freed when this splay tree key's refcount
-reaches zero.  Otherwise (e.g. for a struct mapping with multiple
-members), fall back to skipping the test.  */
- for (int i = 0; i < n->tgt->list_count; i++)
-   if (n->tgt->list[i].key)
- num_mappings++;
+ int remaining_mappings = 0;
+ bool dynamic = goacc_marked_dynamic_p (n->tgt);
+ if (dynamic)
+   {
+ /* For dynamic mappings, we may have more than one live splay
+tree in the target_mem_desc's variable list.  That's not an
+error.  */
+ for (int i = 0; i < n->tgt->list_count; i++)
+   if (n->tgt->list[i].key)
+ remaining_mappings++;
+   }
  bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
- assert (num_mappings > 1 || is_tgt_unmapped);
+ assert ((dynamic && remaining_mappings > 0) || is_tgt_unmapped);
}
 }
 
@@ -1283,17 +1286,19 @@ goacc_exit_data_internal (struct gomp_device_descr 
*acc_dev, size_t mapnum,
gomp_fatal ("synchronize failed");
  }
  }
-   int num_mappings = 0;
-   /* If the target_mem_desc represents a single data mapping, we
-  can check that it is freed when this splay tree key's
-  refcount reaches zero.  Otherwise (e.g. for a struct
-  mapping with multiple members), fall back to skipping the
-  test.  */
-   for (int j = 0; j < n->tgt->list_count; j++)
- if (n->tgt->list[j].key)
-   num_mappings++;
+   int remaining_mappings = 0;
+   bool dynamic = goacc_marked_dynamic_p (n->tgt);
+   if (dynamic)
+ {
+  /* For dynamic mappings, we may have more than one live
+ splay tree in the target_mem_desc's variable list.
+ That's not an error.  */
+   for (int j = 0; j < n->tgt->list_count; j++)
+ if (n->tgt->list[j].key)
+   remaining_mappings++;
+ }
bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
-   assert (num_mappings > 1 || is_tgt_unmapped);
+   assert ((dynamic && remaining_mappings > 0) || is_tgt_unmapped);
  }
  }
  break;
diff --git a/libgomp/target.c b/libgomp/target.c
index 9a51e1c70f6..f072e050cc1 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1630,6 +1630,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, 
bool do_copyfrom,
   if (k == NULL)
continue;
 
+  bool clear_mapping = true;
   bool do_unmap = false;
   if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
{
@@ -1637,7 +1638,10 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, 
bool do_copyfrom,
  /* If we only have dynamic references left, mark the tgt_mem_desc
 appropriately.  */
  if (k->refcount == k->dynamic_refcount)
-   goacc_mark_dynamic (k->tgt);
+   {
+ goacc_mark_dynamic (k->tgt);
+ clear_mapping = false;
+   }
}
   else if (k->refcount == 1)
{
@@ -1662,6 +1666,8 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, 
bool do_copyfrom,
  assert (!is_tgt_unmapped
  || k_tgt != tgt);
}
+  if (clear_mapping)
+   tgt->list[i].key = NULL;
 }
 
   if (aq)
-- 
2.23.0



[PATCH 2/7] [OpenACC] Adjust dynamic reference count semantics

2020-05-22 Thread Julian Brown
This patch adjusts the semantics of dynamic reference counts, as described
in the parent email. There are also two new test cases derived from
Thomas's test in the email:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546166.html

that work now.

OK?

Julian

ChangeLog

libgomp/
* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
dynamic_refcount.
(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
* oacc-mem.c (acc_map_data): Substitute virtual_refcount for
dynamic_refcount.
(acc_unmap_data): Replace open-coded refcount handling with call to
gomp_remove_var.
(goacc_enter_datum): Adjust for dynamic_refcount semantics.  Use tgt
returned from gomp_map_vars_async.  Update assertions.
(goacc_exit_datum): Re-add some error checking.  Adjust for
dynamic_refcount semantics.  Fix is_tgt_unmapped test for struct
mappings.
(goacc_enter_data_internal): Implement "present" case of dynamic
memory-map handling here.  Update "non-present" case for
dynamic_refcount semantics.
(goacc_exit_data_internal): Update for dynamic_refcount semantics.
Re-introduce error checking for tgt unmapping when appropriate.
* target.c (gomp_map_vars_internal): Remove
GOMP_MAP_VARS_OPENACC_ENTER_DATA handling.  Update for dynamic_refcount
handling.
(gomp_unmap_vars_internal): Remove virtual_refcount handling.
(gomp_load_image_to_device): Substitute dynamic_refcount for
virtual_refcount.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
---
 libgomp/libgomp.h |   8 +-
 libgomp/oacc-mem.c| 241 --
 libgomp/target.c  |  38 +--
 .../libgomp.oacc-c-c++-common/refcounting-1.c |  31 +++
 .../libgomp.oacc-c-c++-common/refcounting-2.c |  31 +++
 5 files changed, 243 insertions(+), 106 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-2.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index ca42e0de640..7b52ce7d5c2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1016,11 +1016,8 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Reference counts beyond those that represent genuine references in the
- linked splay tree key/target memory structures, e.g. for multiple OpenACC
- "present increment" operations (via "acc enter data") referring to the 
same
- host-memory block.  */
-  uintptr_t virtual_refcount;
+  /* Dynamic reference count.  */
+  uintptr_t dynamic_refcount;
   struct splay_tree_aux *aux;
 };
 
@@ -1153,7 +1150,6 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
-  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index c06b7341cbb..fff0d573f59 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -407,7 +407,7 @@ acc_map_data (void *h, void *d, size_t s)
   assert (tgt);
   splay_tree_key n = tgt->list[0].key;
   assert (n->refcount == 1);
-  assert (n->virtual_refcount == 0);
+  assert (n->dynamic_refcount == 0);
   /* Special reference counting behavior.  */
   n->refcount = REFCOUNT_INFINITY;
 
@@ -454,7 +454,7 @@ acc_unmap_data (void *h)
  (void *) n->host_start, (int) host_size, (void *) h);
 }
   /* TODO This currently doesn't catch 'REFCOUNT_INFINITY' usage different from
- 'acc_map_data'.  Maybe 'virtual_refcount' can be used for disambiguating
+ 'acc_map_data'.  Maybe 'dynamic_refcount' can be used for disambiguating
  the different 'REFCOUNT_INFINITY' cases, or simply separate
  'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
  etc.)?  */
@@ -475,14 +475,19 @@ acc_unmap_data (void *h)
   gomp_mutex_unlock (_dev->lock);
   gomp_fatal ("cannot unmap target block");
 }
-  else if (tgt->refcount > 1)
-tgt->refcount--;
-  else
+
+  if (tgt->refcount == 1)
 {
-  free (tgt->array);
-  free (tgt);
+  /* This is the last reference.  Nullifying these fields prevents
+'gomp_unmap_tgt' via 'gomp_remove_var' from freeing the target
+memory.  */
+  tgt->tgt_end = 0;
+  tgt->to_free = NULL;
 }
 
+  bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
+  assert (is_tgt_unmapped);
+
   gomp_mutex_unlock (_dev->lock);
 
   if (profiling_p)
@@ -540,10 +545,8 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
 
   assert (n->refcount != REFCOUNT_LINK);
   if (n->refcount != 

[PATCH 4/7] [OpenACC] Fix incompatible copyout for acc_map_data (PR92843)

2020-05-22 Thread Julian Brown
This patch provides a bug fix (on top of previous patches in this
series) that allows the PR92843 test case to pass. Data mapped in with
"acc_map_data" is not copied out by an "exit data" directive.

OK?

Julian

ChangeLog

PR libgomp/92843

libgomp/
* oacc-mem.c (goacc_exit_data_internal): Don't copyout data mapped
with acc_map_data in exit data directive.
---
 libgomp/oacc-mem.c  | 1 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 20d241382a8..c2b4a131a5f 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1230,6 +1230,7 @@ goacc_exit_data_internal (struct gomp_device_descr 
*acc_dev, size_t mapnum,
  }
 
if (copyfrom
+   && n->refcount != REFCOUNT_INFINITY
&& (kind != GOMP_MAP_FROM || n->refcount == 0))
  gomp_copy_dev2host (acc_dev, aq, (void *) cur_node.host_start,
  (void *) (n->tgt->tgt_start + n->tgt_offset
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
index f16c46a37bf..db5b35b08d9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
@@ -1,7 +1,6 @@
 /* Verify that 'acc_copyout' etc. is a no-op if there's still a structured
reference count.  */
 
-/* { dg-xfail-run-if "TODO PR92843" { *-*-* } } */
 /* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
 
 #include 
-- 
2.23.0



[PATCH 3/7] [OpenACC] Don't pass kind array via pointer to goacc_enter_datum

2020-05-22 Thread Julian Brown
Since goacc_enter_datum only maps a single data item now, there is no
need to pass "kinds" as an array.  Passing as a scalar allows for some
simplification in the function's callers.

OK?

Julian

ChangeLog

libgomp/
* oacc-mem.c (goacc_enter_datum): Use scalar kind argument instead of
kinds array.
(acc_create, acc_create_async, acc_copyin, acc_copyin_async): Update
calls to goacc_enter_datum.
---
 libgomp/oacc-mem.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fff0d573f59..20d241382a8 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -501,7 +501,8 @@ acc_unmap_data (void *h)
 /* Enter dynamic mapping for a single datum.  Return the device pointer.  */
 
 static void *
-goacc_enter_datum (void **hostaddrs, size_t *sizes, void *kinds, int async)
+goacc_enter_datum (void **hostaddrs, size_t *sizes, unsigned short kind,
+  int async)
 {
   void *d;
   splay_tree_key n;
@@ -560,7 +561,7 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
 
   struct target_mem_desc *tgt
= gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,
-  kinds, true, GOMP_MAP_VARS_ENTER_DATA);
+  , true, GOMP_MAP_VARS_ENTER_DATA);
   assert (tgt);
   assert (tgt->list_count == 1);
   n = tgt->list[0].key;
@@ -584,15 +585,13 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
 void *
 acc_create (void *h, size_t s)
 {
-  unsigned short kinds[1] = { GOMP_MAP_ALLOC };
-  return goacc_enter_datum (, , , acc_async_sync);
+  return goacc_enter_datum (, , GOMP_MAP_ALLOC, acc_async_sync);
 }
 
 void
 acc_create_async (void *h, size_t s, int async)
 {
-  unsigned short kinds[1] = { GOMP_MAP_ALLOC };
-  goacc_enter_datum (, , , async);
+  goacc_enter_datum (, , GOMP_MAP_ALLOC, async);
 }
 
 /* acc_present_or_create used to be what acc_create is now.  */
@@ -617,15 +616,13 @@ acc_pcreate (void *h, size_t s)
 void *
 acc_copyin (void *h, size_t s)
 {
-  unsigned short kinds[1] = { GOMP_MAP_TO };
-  return goacc_enter_datum (, , , acc_async_sync);
+  return goacc_enter_datum (, , GOMP_MAP_TO, acc_async_sync);
 }
 
 void
 acc_copyin_async (void *h, size_t s, int async)
 {
-  unsigned short kinds[1] = { GOMP_MAP_TO };
-  goacc_enter_datum (, , , async);
+  goacc_enter_datum (, , GOMP_MAP_TO, async);
 }
 
 /* acc_present_or_copyin used to be what acc_copyin is now.  */
-- 
2.23.0



[PATCH 1/7] [OpenACC] Missing unlocking on error paths in attach/detach code

2020-05-22 Thread Julian Brown
This patch adds some missing unlocking from error paths in the OpenACC
attach/detach code, noticed during development of other patches in
this series.

OK?

Julian

ChangeLog

libgomp/
* oacc-mem.c (acc_attach_async): Add missing gomp_mutex_unlock on
error path.
(goacc_detach_internal): Likewise.
---
 libgomp/oacc-mem.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 2d4bba78efd..c06b7341cbb 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -887,7 +887,10 @@ acc_attach_async (void **hostaddr, int async)
   n = splay_tree_lookup (_dev->mem_map, _node);
 
   if (n == NULL)
-gomp_fatal ("struct not mapped for acc_attach");
+{
+  gomp_mutex_unlock (_dev->lock);
+  gomp_fatal ("struct not mapped for acc_attach");
+}
 
   gomp_attach_pointer (acc_dev, aq, _dev->mem_map, n, (uintptr_t) hostaddr,
   0, NULL);
@@ -920,7 +923,10 @@ goacc_detach_internal (void **hostaddr, int async, bool 
finalize)
   n = splay_tree_lookup (_dev->mem_map, _node);
 
   if (n == NULL)
-gomp_fatal ("struct not mapped for acc_detach");
+{
+  gomp_mutex_unlock (_dev->lock);
+  gomp_fatal ("struct not mapped for acc_detach");
+}
 
   gomp_detach_pointer (acc_dev, aq, n, (uintptr_t) hostaddr, finalize, NULL);
 
-- 
2.23.0



[PATCH 0/7] [OpenACC] Dynamic reference counts for mapped data

2020-05-22 Thread Julian Brown
After questions from Thomas:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545399.html

here is a patch series that adjusts how reference counting works
(again) for dynamic data lifetimes in OpenACC.  Since the "overhaul"
patch was applied,

https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01249.html

dynamic data lifetimes have been represented by counting "excess"
references beyond those that are explicitly part of the linked splay
tree/target memory descriptor data structure.  This allowed self-test
code to be written to ensure that reference counts remained consistent
throughout execution.

However, there were some awkward corner-cases, which -- though fixable
-- made some of the code more complex than it could have been.  So,
this patch series reverts the dynamic reference counting implementation
to the previous semantics, which are that the dynamic reference count
recorded in each mapping's splay tree key corresponds more directly to
the source-level semantics (i.e. "enter data" operations increment the
reference count, and "exit data" operations decrement it).

This is not a plain revert of the "overhaul" patch above.  I have tried
to keep various refactoring introduced in that patch in place, though
I have re-introduced some error checking that the aforementioned patch
removed.

I have also managed to adjust the (still optional, development use only)
self-checking code to be able to work with the "dynamic_refcount" scheme.
A couple of patches containing minor cleanups are included too.

Tested with offloading to NVPTX (as a series). OK?

Julian Brown (7):
  [OpenACC] Missing unlocking on error paths in attach/detach code
  [OpenACC] Adjust dynamic reference count semantics
  [OpenACC] Don't pass kind array via pointer to goacc_enter_datum
  [OpenACC] Fix incompatible copyout for acc_map_data (PR92843)
  [OpenACC] Distinguish structural/dynamic mappings in libgomp
  [OpenACC] Reference count self-checking (dynamic_refcount version)
  [OpenACC] Stricter dynamic data unmapping testing (WIP)

 libgomp/libgomp.h |  29 +-
 libgomp/oacc-int.h|   3 +
 libgomp/oacc-mem.c| 306 +-
 libgomp/oacc-parallel.c   |  27 ++
 libgomp/target.c  | 231 +++--
 .../libgomp.oacc-c-c++-common/pr92843-1.c |   1 -
 .../libgomp.oacc-c-c++-common/refcounting-1.c |  31 ++
 .../libgomp.oacc-c-c++-common/refcounting-2.c |  31 ++
 8 files changed, 542 insertions(+), 117 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-2.c

-- 
2.23.0



Re: ChangeLog files - server and client scripts

2020-05-22 Thread Ian Lance Taylor via Gcc-patches
On Fri, May 22, 2020 at 12:48 PM Jakub Jelinek  wrote:
>
> On Fri, May 22, 2020 at 12:37:29PM -0700, Ian Lance Taylor wrote:
> > Thanks for looking into this.
> >
> > Unfortunately, my push is still failing.  I'm not sure why.
> >
> > remote: *** ChangeLog format failed:
> > remote: ERR: cannot find a ChangeLog location in message
> > remote:
> > remote: Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
> > remote:
> > remote: error: hook declined to update refs/heads/master
> > To git+ssh://gcc.gnu.org/git/gcc
> >  ! [remote rejected] master -> master (hook declined)
> > error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc'
> >
> >
> > I've attached the output of "git format-patch -k 1 --stdout", in case
> > that helps.
>
> Bet the script first looks for the ChangeLog entry and only considers the
> ignored prefixes if it finds files in the patch that are not mentioned in
> the ChangeLog entry.  So, if you say modified gcc/go/whatever.cc and had
> ChangeLog entry for that and not for the files you've changed, it would be
> ok.
>
> So, I think before emitting the above message, it should look through the
> patch and if it finds all files in ignored prefixes, it should just not to
> do anything.
>
> We'll need it e.g. for the DATESTAMP bump job too which only modifies those
> files too and doesn't write ChangeLog entry for that.
>
> I'm sorry but I don't know the script well enough to fix it quickly, will
> defer to Martin as the author.
>
> Can you wait with the commit until Monday?  If not, I could just temporarily
> disable this for your commit.

Sure, I can wait.  Thanks for looking at it.

Ian


[pushed] c++: -fsanitize=vptr and -fstrong-eval-order. [PR95221]

2020-05-22 Thread Jason Merrill via Gcc-patches
With -fstrong-eval-order=all we evaluate the function address before the
arguments.  But this caused trouble with virtual functions and
 -fsanitize=vptr; we would do vptr sanitization as part of calculating the
'this' argument, and separately look at the vptr in order to find the
function address.  Without -fstrong-eval-order=all 'this' is evaluated
first, but with that flag the function address is evaluated first, so we
would access the null vptr before sanitizing it.

Fixed by instrumenting the OBJ_TYPE_REF of a virtual function call instead
of the 'this' argument.

This issue suggests that we should be running the ubsan tests in multiple
standard modes like the rest of the G++ testsuite, so I've made that change
as well.  The testcase changes are all to accommodate that.

gcc/cp/ChangeLog:

* cp-ubsan.c (cp_ubsan_maybe_instrument_member_call): For a virtual
call, instrument the OBJ_TYPE_REF.

gcc/testsuite/ChangeLog:

* g++.dg/ubsan/ubsan.exp: Use g++-dg-runtest.
* c-c++-common/ubsan/bounds-13.c: Adjust.
* c-c++-common/ubsan/bounds-2.c: Adjust.
* c-c++-common/ubsan/div-by-zero-1.c: Adjust.
* c-c++-common/ubsan/div-by-zero-6.c: Adjust.
* c-c++-common/ubsan/div-by-zero-7.c: Adjust.
* c-c++-common/ubsan/overflow-add-1.c: Adjust.
* c-c++-common/ubsan/overflow-add-2.c: Adjust.
* c-c++-common/ubsan/overflow-int128.c: Adjust.
* c-c++-common/ubsan/overflow-sub-1.c: Adjust.
* c-c++-common/ubsan/overflow-sub-2.c: Adjust.
* g++.dg/ubsan/pr85029.C: Adjust.
* g++.dg/ubsan/vptr-14.C: Adjust.
---
 gcc/cp/cp-ubsan.c | 33 +++
 gcc/testsuite/c-c++-common/ubsan/bounds-13.c  |  1 +
 gcc/testsuite/c-c++-common/ubsan/bounds-2.c   |  1 +
 .../c-c++-common/ubsan/div-by-zero-1.c|  1 +
 .../c-c++-common/ubsan/div-by-zero-6.c|  1 +
 .../c-c++-common/ubsan/div-by-zero-7.c|  1 +
 .../c-c++-common/ubsan/overflow-add-1.c   |  1 +
 .../c-c++-common/ubsan/overflow-add-2.c   |  1 +
 .../c-c++-common/ubsan/overflow-int128.c  |  1 +
 .../c-c++-common/ubsan/overflow-sub-1.c   |  1 +
 .../c-c++-common/ubsan/overflow-sub-2.c   |  1 +
 gcc/testsuite/g++.dg/ubsan/pr85029.C  |  2 +-
 gcc/testsuite/g++.dg/ubsan/vptr-14.C  |  2 +-
 gcc/testsuite/g++.dg/ubsan/ubsan.exp  |  2 +-
 14 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/cp-ubsan.c b/gcc/cp/cp-ubsan.c
index b064a98202a..c40dac72b42 100644
--- a/gcc/cp/cp-ubsan.c
+++ b/gcc/cp/cp-ubsan.c
@@ -118,14 +118,33 @@ cp_ubsan_maybe_instrument_member_call (tree stmt)
 {
   if (call_expr_nargs (stmt) == 0)
 return;
-  tree *opp = _EXPR_ARG (stmt, 0);
-  tree op = *opp;
-  if (op == error_mark_node
-  || !INDIRECT_TYPE_P (TREE_TYPE (op)))
-return;
-  while (TREE_CODE (op) == COMPOUND_EXPR)
+  tree op, *opp;
+
+  tree fn = CALL_EXPR_FN (stmt);
+  if (fn && TREE_CODE (fn) == OBJ_TYPE_REF)
 {
-  opp = _OPERAND (op, 1);
+  /* Virtual function call: Sanitize the use of the object pointer in the
+OBJ_TYPE_REF, since the vtable reference will SEGV otherwise (95221).
+OBJ_TYPE_REF_EXPR is ptr->vptr[N] and OBJ_TYPE_REF_OBJECT is ptr.  */
+  opp = _TYPE_REF_EXPR (fn);
+  op = OBJ_TYPE_REF_OBJECT (fn);
+  while (*opp != op)
+   {
+ if (TREE_CODE (*opp) == COMPOUND_EXPR)
+   opp = _OPERAND (*opp, 1);
+ else
+   opp = _OPERAND (*opp, 0);
+   }
+}
+  else
+{
+  /* Non-virtual call: Sanitize the 'this' argument.  */
+  opp = _EXPR_ARG (stmt, 0);
+  if (*opp == error_mark_node
+ || !INDIRECT_TYPE_P (TREE_TYPE (*opp)))
+   return;
+  while (TREE_CODE (*opp) == COMPOUND_EXPR)
+   opp = _OPERAND (*opp, 1);
   op = *opp;
 }
   op = cp_ubsan_maybe_instrument_vptr (EXPR_LOCATION (stmt), op,
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-13.c 
b/gcc/testsuite/c-c++-common/ubsan/bounds-13.c
index 25b0467ec67..a8bce370f8f 100644
--- a/gcc/testsuite/c-c++-common/ubsan/bounds-13.c
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-13.c
@@ -1,6 +1,7 @@
 /* PR sanitizer/71498 */
 /* { dg-do run } */
 /* { dg-options "-fsanitize=bounds -Wno-array-bounds" } */
+/* { dg-options "-fsanitize=bounds -Wno-array-bounds -Wno-volatile" { target 
c++ } } */
 
 struct S { int a[100]; int b, c; } s;
 
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-2.c 
b/gcc/testsuite/c-c++-common/ubsan/bounds-2.c
index 56654486f65..ec4ef7eed20 100644
--- a/gcc/testsuite/c-c++-common/ubsan/bounds-2.c
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-2.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fsanitize=bounds -Wall -Wextra -Wno-unused -Wno-array-bounds 
-Wno-uninitialized" } */
+/* { dg-options "-fsanitize=bounds -Wall -Wextra -Wno-unused -Wno-array-bounds 
-Wno-uninitialized -Wno-volatile" { target c++ } } */
 
 /* Test runtime errors.  */
 
diff --git 

Re: New mklog script

2020-05-22 Thread Jason Merrill via Gcc-patches
On Thu, May 21, 2020 at 6:03 PM Jason Merrill  wrote:
>
> On Fri, May 15, 2020 at 11:39 AM Martin Liška  wrote:
> >
> > On 5/15/20 3:22 PM, Marek Polacek wrote:
> > > On Fri, May 15, 2020 at 03:12:27PM +0200, Martin Liška wrote:
> > >> On 5/15/20 2:42 PM, Marek Polacek wrote:
> > >>> I actually use mklog -i all the time.  But I can work around it if it
> > >>> disappears.
> > >>
> > >> Ah, I can see a consumer.
> > >> There's an updated version that supports that.
> > >>
> > >> For the future, will you still use the option? Wouldn't be better
> > >> to put the ChangeLog content directly to commit message? Note
> > >> that you won't have to copy the entries to a particular ChangeLog file.
> > >
> > > The way I do it is to generate a patch using format-patch, use mklog -i
> > > on it, then add the ChangeLog entry to the commit message via commit 
> > > --amend.
> >
> > Hmm, you can do much better with:
> >
> > $ git diff | ./contrib/mklog > changelog && git commit -a -t changelog
> >
> > Or for an already created commit you can do:
> >
> > $ git diff HEAD~ | ./contrib/mklog > changelog && git commit -a --amend -e 
> > -F changelog
>
> With these git aliases:
>
> mklog-editor = "!f() { git show | git gcc-mklog >> $1; }; f"
> addlog = "!f() { GIT_EDITOR='git mklog-editor' git commit --amend; }; 
> f"
>
> I can 'git addlog' to append the output of mklog to the current
> commit.  Probably better would be to do something with
> prepare-commit-msg.

This is pretty rudimentary, but good enough as a start:

#!/bin/sh

#COMMIT_MSG_FILE=$1
#COMMIT_SOURCE=$2
#SHA1=$3

if ! [ -f "$1" ]; then exit 0; fi

#echo "# $0 $1 $2 $3" >> $1

if fgrep 'ChangeLog:' $1 > /dev/null 2>&1; then exit 0; fi

if [ -z "$2" ]; then
cmd="diff --cached"
elif [ $2 == commit ]; then
cmd="show $3"
else
exit 0
fi

git $cmd | git gcc-mklog >> $1



Re: [PATCH] Extend std::copy/std::copy_n char* overload to deque iterator

2020-05-22 Thread François Dumont via Gcc-patches

On 21/05/20 2:17 pm, Jonathan Wakely wrote:


Why is the optimization not done for C++03 mode?

I did it this way because the new std::copy overload rely on std::copy_n 
implementation details which is a C++11 algo.




It looks like the uses of 'auto' can be reaplced easily, and
__enable_if_t<> can be replaced with __gnu_cxx::__enable_if<>::__type.


But yes, we can indeed provide those implementation details in pre-C++11.

This is what I've done in this new version.

Tested under Linux x86_64 in default c++ mode.

I tried to use CXXFLAGS=-std=c++03 but it doesn't seem to work even if I 
do see the option in build logs. I remember you adivised a different 
approach, can you tell me again ?


François

diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index e773f32b256..66ecd9da2f3 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -1065,6 +1065,57 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   return __result;
 }
 
+  template
+typename __gnu_cxx::__enable_if<
+  __is_char<_CharT>::__value,
+  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> >::__type
+__copy_move_a2(
+	istreambuf_iterator<_CharT, char_traits<_CharT> > __first,
+	istreambuf_iterator<_CharT, char_traits<_CharT> > __last,
+	_GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> __result)
+{
+  if (__first == __last)
+	return __result;
+
+  for (;;)
+	{
+	  const std::ptrdiff_t __len = __result._M_last - __result._M_cur;
+	  const std::ptrdiff_t __nb
+	= std::__copy_n_a(__first, __len, __result._M_cur, false)
+	- __result._M_cur;
+	  __result += __nb;
+
+	  if (__nb != __len)
+	break;
+	}
+
+  return __result;
+}
+
+  template
+typename __gnu_cxx::__enable_if<
+  __is_char<_CharT>::__value,
+  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> >::__type
+__copy_n_a(
+  istreambuf_iterator<_CharT, char_traits<_CharT>> __it, _Size __size,
+  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> __result,
+  bool __strict)
+{
+  if (__size == 0)
+	return __result;
+
+  do
+	{
+	  const _Size __len
+	= std::min<_Size>(__result._M_last - __result._M_cur, __size);
+	  std::__copy_n_a(__it, __len, __result._M_cur, __strict);
+	  __result += __len;
+	  __size -= __len;
+	}
+  while (__size != 0);
+  return __result;
+}
+
   template
 _OI
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 932ece55529..70d8232aece 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -705,31 +705,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __result;
 }
 
-  template
-_GLIBCXX20_CONSTEXPR
-_OutputIterator
-__copy_n_a(_InputIterator __first, _Size __n, _OutputIterator __result)
-{
-  if (__n > 0)
-	{
-	  while (true)
-	{
-	  *__result = *__first;
-	  ++__result;
-	  if (--__n > 0)
-		++__first;
-	  else
-		break;
-	}
-	}
-  return __result;
-}
- 
-  template
-__enable_if_t<__is_char<_CharT>::__value, _CharT*>
-__copy_n_a(istreambuf_iterator<_CharT, char_traits<_CharT>>,
-	   _Size, _CharT*);
-
   template
 _GLIBCXX20_CONSTEXPR
 _OutputIterator
@@ -738,7 +713,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   return std::__niter_wrap(__result,
 			   __copy_n_a(__first, __n,
-	  std::__niter_base(__result)));
+	  std::__niter_base(__result), true));
 }
 
   template::value)
 { return __it; }
 
+  template
+_Ite
+__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
+		 std::random_access_iterator_tag>&);
+
   // Reverse the __niter_base transformation to get a
   // __normal_iterator back again (this assumes that __normal_iterator
   // is only used to wrap random access iterators, like pointers).
@@ -466,6 +471,15 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 __copy_move_a2(istreambuf_iterator<_CharT, char_traits<_CharT> >,
 		   istreambuf_iterator<_CharT, char_traits<_CharT> >, _CharT*);
 
+  template
+typename __gnu_cxx::__enable_if<
+  __is_char<_CharT>::__value,
+  _GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*> >::__type
+__copy_move_a2(
+	istreambuf_iterator<_CharT, char_traits<_CharT> >,
+	istreambuf_iterator<_CharT, char_traits<_CharT> >,
+	_GLIBCXX_STD_C::_Deque_iterator<_CharT, _CharT&, _CharT*>);
+
   template
 _GLIBCXX20_CONSTEXPR
 inline _OI
@@ -539,6 +553,41 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 		  const ::__gnu_debug::_Safe_iterator<_IIte, _ISeq, _ICat>&,
 		  const ::__gnu_debug::_Safe_iterator<_OIte, _OSeq, _OCat>&);
 
+  template
+_GLIBCXX20_CONSTEXPR
+_OutputIterator
+__copy_n_a(_InputIterator __first, _Size __n, _OutputIterator __result,
+	   bool)
+{
+  if (__n > 0)
+	{
+	  while (true)
+	{
+	  *__result = *__first;
+	  ++__result;
+	  if (--__n > 0)
+		++__first;
+	

Teach free_lang_data to simplify types of TYPE_VALUES in enumeral types

2020-05-22 Thread Jan Hubicka
Hi,
streaming code assumes that INTEGER_CST never appears in non-trivial component.
This is not true and we sometimes stream such components which sort of silently
works but breaks our IL invariant about tree sharing.  This patch fixes one
instance of this problem where ENUMERAL_TYPE lists all its valids in TYPE_VALUES
that with some FEs (like Ada and C++) are having the enumeral type as a type
while in other FEs (like C) are simple integer types.

I convert them all to integers which also increases chance that they will be
shared with other integer constants at stream time.

lto-bootstrapped/regtested x86_64-linux.  I plan to commit this shortly.

* tree.c (free_lang_data_in_type): Simpify types of TYPE_VALUES in
enumeral types.
diff --git a/gcc/tree.c b/gcc/tree.c
index 54e471acc89..ea6692476e7 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5556,6 +5556,7 @@ free_lang_data_in_type (tree type, class free_lang_data_d 
*fld)
 {
   if (TREE_CODE (type) == ENUMERAL_TYPE)
{
+ tree it = NULL_TREE;
  ENUM_IS_OPAQUE (type) = 0;
  ENUM_IS_SCOPED (type) = 0;
  /* Type values are used only for C++ ODR checking.  Drop them
@@ -5568,8 +5569,27 @@ free_lang_data_in_type (tree type, class 
free_lang_data_d *fld)
  /* Simplify representation by recording only values rather
 than const decls.  */
for (tree e = TYPE_VALUES (type); e; e = TREE_CHAIN (e))
- if (TREE_CODE (TREE_VALUE (e)) == CONST_DECL)
-   TREE_VALUE (e) = DECL_INITIAL (TREE_VALUE (e));
+ {
+   if (TREE_CODE (TREE_VALUE (e)) == CONST_DECL)
+ {
+   TREE_VALUE (e) = DECL_INITIAL (TREE_VALUE (e));
+   /* We can not stream values whose TREE_TYPE is type itself
+  because that would create non-trivial CSS.  Canonicalize
+  them to integer types.  */
+ }
+   /* Some frontends use ENUMERAL_TYPE to represent the constants.
+  This leads to nontrivial SCC components containing
+  INTEGER_CST which is not good for streaming.  Convert them
+  all to corresponding integer type.  */
+   if (TREE_CODE (TREE_TYPE (TREE_VALUE (e))) != INTEGER_TYPE)
+ {
+   if (!it)
+ it = lang_hooks.types.type_for_size
+  (TYPE_PRECISION (TREE_TYPE (TREE_VALUE (e))),
+   TYPE_UNSIGNED (TREE_TYPE (TREE_VALUE (e;
+   TREE_VALUE (e) = fold_convert (it, TREE_VALUE (e));
+ }
+  }
}
   free_lang_data_in_one_sizepos (_MIN_VALUE (type));
   free_lang_data_in_one_sizepos (_MAX_VALUE (type));


[PATCH] RS6000, add VSX mask manipulation support

2020-05-22 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch adds support for builtins vec_genbm(),  vec_genhm(),
vec_genwm(), vec_gendm(), vec_genqm(), vec_cntm(), vec_expandm(),
vec_extractm().  Support for instructions mtvsrbm, mtvsrhm, mtvsrwm,
mtvsrdm, mtvsrqm, cntm, vexpandm, vextractm.

The test has been tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and mambo with no regression errors.

Please let me know if this patch is acceptable for mainline.

Thanks.

   Carl Love
---

RS6000 RFC 2629, add VSX mask manipulation support

gcc/ChangeLog

2020-05-22  Carl Love  

* config/rs6000/vsx.md  (VSX_MM): New define_mode_iterator.
(VSX_MM4): New define_mode_iterator.
(VSX_MM_SUFFIX4): New define_mode_attr.
(vec_mtvsrbm): New define_expand.
(vec_mtvsrbmi): New define_insn.
(vec_mtvsr_): New define_insn.
(vec_cntmb_): New define_insn.
(vec_extract_): New define_insn.
(vec_expand_): New define_insn.
(define_c_enum unspec): Add entries UNSPEC_MTVSBM, UNSPEC_VCNTMB,
UNSPEC_VEXTRACT, UNSPEC_VEXPAND.
* config/rs6000/altivec.h: Add defines vec_genbm, vec_genhm, vec_genwm,
vec_gendm, vec_genqm, vec_cntm, vec_expandm, vec_extractm.
* config/rs6000/rs6000-builtin.c: Add defines BU_FUTURE_2, BU_FUTURE_1.
(BU_FUTURE_1): Add definitions for mtvsrbm, mtvsrhm, mtvsrwm,
mtvsrdm, mtvsrqm, vexpandmb, vexpandmh, vexpandmw, vexpandmd, vexpandmq,
vextractmb, vextractmh, vextractmw, vextractmd, vextractmq.
(BU_FUTURE_2): Add definitions for cntmbb, cntmbh, cntmbw, cntmbd.
(BU_FUTURE_OVERLOAD_1): Add definitions for mtvsrbm, mtvsrhm,
mtvsrwm, mtvsrdm, mtvsrqm, vexpandm, vextractm.
(BU_FUTURE_OVERLOAD_2): Add defition for cntm.
* config/rs6000/rs6000-call.c (rs6000_expand_binop_builtin): Add
checks for CODE_FOR_vec_cntmbb_v16qi, CODE_FOR_vec_cntmb_v8hi,
CODE_FOR_vec_cntmb_v4si, CODE_FOR_vec_cntmb_v2di.
(altivec_overloaded_builtins): Add overloaded argument entries for
FUTURE_BUILTIN_VEC_MTVSRBM, FUTURE_BUILTIN_VEC_MTVSRHM, 
FUTURE_BUILTIN_VEC_MTVSRWM,
FUTURE_BUILTIN_VEC_MTVSRDM, FUTURE_BUILTIN_VEC_MTVSRQM, 
FUTURE_BUILTIN_VEC_VCNTMBB,
FUTURE_BUILTIN_VCNTMBB, FUTURE_BUILTIN_VCNTMBH, FUTURE_BUILTIN_VCNTMBW,
FUTURE_BUILTIN_VCNTMBD, FUTURE_BUILTIN_VEXPANDMB, 
FUTURE_BUILTIN_VEXPANDMH,
FUTURE_BUILTIN_VEXPANDMW, FUTURE_BUILTIN_VEXPANDMD, 
FUTURE_BUILTIN_VEXPANDMQ,
FUTURE_BUILTIN_VEXTRACTMB, FUTURE_BUILTIN_VEXTRACTMH, 
FUTURE_BUILTIN_VEXTRACTMW,
FUTURE_BUILTIN_VEXTRACTMD, FUTURE_BUILTIN_VEXTRACTMQ.
(builtin_function_type): Add case entries for FUTURE_BUILTIN_MTVSRBM,
FUTURE_BUILTIN_MTVSRHM, FUTURE_BUILTIN_MTVSRWM, FUTURE_BUILTIN_MTVSRDM,
FUTURE_BUILTIN_MTVSRQM, FUTURE_BUILTIN_VCNTMBB, FUTURE_BUILTIN_VCNTMBH,
FUTURE_BUILTIN_VCNTMBW, FUTURE_BUILTIN_VCNTMBD, 
FUTURE_BUILTIN_VEXPANDMB,
FUTURE_BUILTIN_VEXPANDMH, FUTURE_BUILTIN_VEXPANDMW, 
FUTURE_BUILTIN_VEXPANDMD,
FUTURE_BUILTIN_VEXPANDMQ.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add entries
for MTVSRBM, MTVSRHM, MTVSRWM, MTVSRDM, MTVSRQM, VCNTM, VEXPANDM, 
VEXTRACTM.
* testsuite/gcc.target/powerpc/vsx_mask-runnable.c:  Add runnable test 
case.
---
 gcc/config/rs6000/altivec.h   |  10 +
 gcc/config/rs6000/rs6000-builtin.def  |  45 ++
 gcc/config/rs6000/rs6000-call.c   |  66 +-
 gcc/config/rs6000/vsx.md  |  67 ++
 .../gcc.target/powerpc/vsx_mask-runnable.c| 614 ++
 5 files changed, 801 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx_mask-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 0a7e8ab3647..5917d3a2b76 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -710,6 +710,16 @@ __altivec_scalar_pred(vec_any_nle,
 
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
+
+/* VSX Mask Manipulation builtin. */
+#define vec_genbm __builtin_vec_mtvsrbm
+#define vec_genhm __builtin_vec_mtvsrhm
+#define vec_genwm __builtin_vec_mtvsrwm
+#define vec_gendm __builtin_vec_mtvsrdm
+#define vec_genqm __builtin_vec_mtvsrqm
+#define vec_cntm __builtin_vec_cntm
+#define vec_expandm __builtin_vec_vexpandm
+#define vec_extractm __builtin_vec_vextractm
 #endif
 
 #endif /* _ALTIVEC_H */
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 8b1ddb00045..7cab5097aeb 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1049,6 +1049,22 @@
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_TERNARY), \

Re: [PATCH] c++: template instantiation during fold_for_warn [PR94038]

2020-05-22 Thread Jason Merrill via Gcc-patches

On 5/20/20 10:08 PM, Patrick Palka wrote:

On Wed, 20 May 2020, Patrick Palka wrote:

On Tue, 19 May 2020, Jason Merrill wrote:


On 5/8/20 11:42 AM, Patrick Palka wrote:

On Wed, 6 May 2020, Patrick Palka wrote:


On Wed, 6 May 2020, Patrick Palka wrote:


On Tue, 5 May 2020, Patrick Palka wrote:


On Tue, 5 May 2020, Patrick Palka wrote:


Unfortunately, the previous fix to PR94038 is fragile.  When the
argument to fold_for_warn is a bare CALL_EXPR, then all is well: the
result of maybe_constant_value from fold_for_warn (with
uid_sensitive=true) is reused via the cv_cache in the subsequent
call to
maybe_constant_value from cp_fold (with uid_sensitive=false), so we
avoid instantiating bar.

But when the argument to fold_for_warn is more complex, e.g. an
INDIRECT_REF of a CALL_EXPR, as in the testcase below (due to
bar()
returning const int& which we need to decay to int) then from
fold_for_warn we call maybe_constant_value on the INDIRECT_REF, and
from
cp_fold we call it on the CALL_EXPR, so there is no reuse via the
cv_cache and we therefore end up instantiating bar.

So for a more robust solution to this general issue of warning flags
affecting code generation, it seems that we need a way to globally
avoid
template instantiation during constexpr evaluation whenever we're
performing warning-dependent folding.

To that end, this patch replaces the flag
constexpr_ctx::uid_sensitive
with a global flag uid_sensitive_constexpr_evaluation_p, and enables
it
during fold_for_warn using an RAII helper.

The patch also adds a counter that keeps track of the number of
times
uid_sensitive_constexpr_evaluation_p is called, and we use this to
determine whether the result of constexpr evaluation depended on the
flag's value.  This lets us safely update the cv_cache and
fold_cache
from fold_for_warn in the most common case where the flag's value
was
irrelevant during constexpr evaluation.


Here's some statistics about about the fold cache and cv cache that were
collected when building the testsuite of range-v3 (with the default
build flags which include -O -Wall -Wextra, so warning-dependent folding
applies).

The "naive solution" below refers to the one in which we would refrain
from updating the caches at the end of cp_fold/maybe_constant_value
whenever we're in uid-sensitive constexpr evaluation mode (i.e. whenever
we're called from fold_for_warn), regardless of whether constexpr
evaluation was actually restricted.


FOLD CACHE  | cache hit | cache miss| cache miss|
  |   | cache updated | cache not updated |
+---+---+---+
naive sol'n |   5060779 |  11374667 |  12887790 |
+
this patch  |   6665242 |  19688975 |199137 |
+---+---+---+


CV CACHE| cache hit | cache miss| cache miss|
  |   | cache updated | cache not updated |
+---+---+---+
naive sol'n | 76214 |   3637218 |   6889213 |
+
this patch  |215980 |   9882310 |405513 |
+---+---+---+

-- >8 --

gcc/cp/ChangeLog:

PR c++/94038
* constexpr.c (constexpr_ctx::uid_sensitive): Remove field.
(uid_sensitive_constexpr_evaluation_value): Define.
(uid_sensitive_constexpr_evaluation_true_counter): Define.
(uid_sensitive_constexpr_evaluation_p): Define.
(uid_sensitive_constexpr_evaluation_sentinel): Define its
constructor.
(uid_sensitive_constexpr_evaluation_checker): Define its
constructor and its evaluation_restricted_p method.
(get_fundef_copy): Remove 'ctx' parameter.  Use u_s_c_e_p
instead of constexpr_ctx::uid_sensitive.
(cxx_eval_call_expression): Use u_s_c_e_p instead, and test it
last.  Adjust call to get_fundef_copy.
(instantiate_cx_fn_r): Test u_s_c_e_p so that we increment the
counter if necessary.
(cxx_eval_outermost_constant_expr): Remove 'uid_sensitive'
parameter.  Adjust function body accordingly.
(maybe_constant_value): Remove 'uid_sensitive' parameter and
adjust function body accordingly.  Set up a
uid_sensitive_constexpr_evaluation_checker, and use it to
conditionally update the cv_cache.
* cp-gimplify.h (cp_fold): Set up a
uid_sensitive_constexpr_evaluation_checker, and use it to
conditionally update the fold_cache.
* cp-tree.h (maybe_constant_value): Update declaration.
(struct uid_sensitive_constexpr_evaluation_sentinel): Define.
(struct sensitive_constexpr_evaluation_checker): Define.
* expr.c (fold_for_warn): Set up a
uid_sensitive_constexpr_evaluation_sentinel before calling

Re: ChangeLog files - server and client scripts

2020-05-22 Thread Jakub Jelinek via Gcc-patches
On Fri, May 22, 2020 at 12:37:29PM -0700, Ian Lance Taylor wrote:
> Thanks for looking into this.
> 
> Unfortunately, my push is still failing.  I'm not sure why.
> 
> remote: *** ChangeLog format failed:
> remote: ERR: cannot find a ChangeLog location in message
> remote:
> remote: Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
> remote:
> remote: error: hook declined to update refs/heads/master
> To git+ssh://gcc.gnu.org/git/gcc
>  ! [remote rejected] master -> master (hook declined)
> error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc'
> 
> 
> I've attached the output of "git format-patch -k 1 --stdout", in case
> that helps.

Bet the script first looks for the ChangeLog entry and only considers the
ignored prefixes if it finds files in the patch that are not mentioned in
the ChangeLog entry.  So, if you say modified gcc/go/whatever.cc and had
ChangeLog entry for that and not for the files you've changed, it would be
ok.

So, I think before emitting the above message, it should look through the
patch and if it finds all files in ignored prefixes, it should just not to
do anything.

We'll need it e.g. for the DATESTAMP bump job too which only modifies those
files too and doesn't write ChangeLog entry for that.

I'm sorry but I don't know the script well enough to fix it quickly, will
defer to Martin as the author.

Can you wait with the commit until Monday?  If not, I could just temporarily
disable this for your commit.

Jakub



Re: ChangeLog files - server and client scripts

2020-05-22 Thread Ian Lance Taylor via Gcc-patches
On Fri, May 22, 2020 at 4:11 AM Jakub Jelinek  wrote:
>
> On Fri, May 22, 2020 at 12:04:10PM +0100, Richard Earnshaw wrote:
> > >> The directories in question are
> > >>
> > >> gcc/go/gofrontend
> > >> libgo
> > >> gcc/testsuite/go.test/test
> > >
> > > The script has:
> > > ignored_prefixes = [
> > > 'gcc/d/dmd/',
> > > 'gcc/go/frontend/',
> >
> > The directory is gcc/go/gofrontend
> >
> > so it's missing 'go' from frontend.
>
> Thanks for spotting.  I believe Martin said he will be afk
> today, so I've fixed it for him and committed as obvious and
> am going to install into git-hooks now too.
>
> diff --git a/contrib/ChangeLog b/contrib/ChangeLog
> index 7b61bb8915b..64a0db18e58 100644
> --- a/contrib/ChangeLog
> +++ b/contrib/ChangeLog
> @@ -1,3 +1,9 @@
> +2020-05-22  Jakub Jelinek  
> +
> +   * gcc-changelog/git_commit.py: Add trailing / to
> +   gcc/testsuite/go.test/test and replace gcc/go/frontend/
> +   with gcc/go/gofrontend/ in ignored locations.
> +
>  2020-05-22  Martin Liska  
>
> * gcc-changelog/git_commit.py: Add gcc/testsuite/go.test/test
> diff --git a/contrib/gcc-changelog/git_commit.py 
> b/contrib/gcc-changelog/git_commit.py
> index ba9f5ce9650..8c5fa2c0fc9 100755
> --- a/contrib/gcc-changelog/git_commit.py
> +++ b/contrib/gcc-changelog/git_commit.py
> @@ -127,8 +127,8 @@ bug_components = set([
>
>  ignored_prefixes = [
>  'gcc/d/dmd/',
> -'gcc/go/frontend/',
> -'gcc/testsuite/go.test/test',
> +'gcc/go/gofrontend/',
> +'gcc/testsuite/go.test/test/',
>  'libgo/',
>  'libphobos/libdruntime',
>  'libphobos/src/',

Thanks for looking into this.

Unfortunately, my push is still failing.  I'm not sure why.

remote: *** ChangeLog format failed:
remote: ERR: cannot find a ChangeLog location in message
remote:
remote: Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
remote:
remote: error: hook declined to update refs/heads/master
To git+ssh://gcc.gnu.org/git/gcc
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc'


I've attached the output of "git format-patch -k 1 --stdout", in case
that helps.

Ian
From 81994eab700da7fea6644541c163aa0f0f3b8cf1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= 
Date: Tue, 19 May 2020 16:03:54 +0200
Subject: libgo: update x/sys/cpu after gccgo support added

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/234597
---
 gcc/go/gofrontend/MERGE   |  2 +-
 .../sys/cpu/{cpu_aix_ppc64.go => cpu_aix.go}  |  2 +-
 .../golang.org/x/sys/cpu/syscall_aix_gccgo.go | 27 +++
 3 files changed, 29 insertions(+), 2 deletions(-)
 rename libgo/go/golang.org/x/sys/cpu/{cpu_aix_ppc64.go => cpu_aix.go} (96%)
 create mode 100644 libgo/go/golang.org/x/sys/cpu/syscall_aix_gccgo.go

diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index bc9c1f07eda..284374820b0 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-bc27341f245a5cc54ac7530d037a609db72b677c
+ea58b8491064fbed18a220571a3043c38dccf7c7
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/golang.org/x/sys/cpu/cpu_aix_ppc64.go 
b/libgo/go/golang.org/x/sys/cpu/cpu_aix.go
similarity index 96%
rename from libgo/go/golang.org/x/sys/cpu/cpu_aix_ppc64.go
rename to libgo/go/golang.org/x/sys/cpu/cpu_aix.go
index b0ede112d4e..02d03129e50 100644
--- a/libgo/go/golang.org/x/sys/cpu/cpu_aix_ppc64.go
+++ b/libgo/go/golang.org/x/sys/cpu/cpu_aix.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix,ppc64
+// +build aix
 
 package cpu
 
diff --git a/libgo/go/golang.org/x/sys/cpu/syscall_aix_gccgo.go 
b/libgo/go/golang.org/x/sys/cpu/syscall_aix_gccgo.go
new file mode 100644
index 000..2609cc49ae7
--- /dev/null
+++ b/libgo/go/golang.org/x/sys/cpu/syscall_aix_gccgo.go
@@ -0,0 +1,27 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Recreate a getsystemcfg syscall handler instead of
+// using the one provided by x/sys/unix to avoid having
+// the dependency between them. (See golang.org/issue/32102)
+// Morover, this file will be used during the building of
+// gccgo's libgo and thus must not use a CGo method.
+
+// +build aix
+// +build gccgo
+
+package cpu
+
+import (
+   "syscall"
+)
+
+//extern getsystemcfg
+func gccgoGetsystemcfg(label uint32) (r uint64)
+
+func callgetsystemcfg(label int) (r1 uintptr, e1 syscall.Errno) {
+   r1 = uintptr(gccgoGetsystemcfg(uint32(label)))
+   e1 = syscall.GetErrno()
+   return
+}
-- 
2.27.0.rc0.183.gde8f92d652-goog



Re: [RFC] analyzer: Add exit, and _exit replacement, to sm-signal.

2020-05-22 Thread Mark Wielaard
On Thu, May 21, 2020 at 09:05:09AM -0400, David Malcolm wrote:
> So the above should read something like:
> 
> +  exit(1); /* { dg-warning "call to 'exit' from within signal handler" 
> "warning" } */
> +  /* { dg-message "note: '_exit' is a possible signal-safe alternative for 
> 'exit'" "replacement note" { target *-*-* } .-1 } */
> 
> OK with that change (but please double-check I got the syntax correct!)

You got the syntax correct and now the test PASS message is different
in gcc.sum. Pushed.

Thanks,

Mark


Re: New mklog script

2020-05-22 Thread Jonathan Wakely via Gcc-patches
On Fri, 22 May 2020 at 19:15, Thomas Koenig via Gcc  wrote:
>
> Hi,
>
> what's currently in trunk (as of a few hours ago) fails for me with
>
>File "contrib/mklog.py", line 36, in 
>  from unidiff import PatchSet
> ModuleNotFoundError: No module named 'unidiff'
>
> I think this is an error which would have to be taken into account
> one way or another - maybe include it in download_prerequisites ?

No, most people do not want download_prerequisites to start installing
python packages, they should use their system's package manager.


>
> Regards
>
> Thomas


[Patch] PR fortran/95106 - truncation of long symbol names with EQUIVALENCE

2020-05-22 Thread Harald Anlauf
The PR is originally about a bogus warning, which turned out to originate
from truncation of mangled names.  Fix: simply extend temporaries to hold
all possible allowed values (which would enable 2^31 equivalences).

Regtested on x86_64-pc-linux-gnu.

The testcase checks the presence of properly generated mangled names
in the assembler.  If this should be restricted to x86-type systems,
I can adjust that.

OK for master?

Thanks,
Harald


PR fortran/95106 - truncation of long symbol names with EQUIVALENCE

For long module names, the generated name-mangled symbol was
truncated, leading to bogus warnings about COMMON block
mismatches.  Provide sufficiently large temporaries.

gcc/fortran/

2020-05-22  Harald Anlauf  

PR fortran/95106
* trans-common.c (gfc_sym_mangled_common_id): Enlarge temporaries
for name-mangling.

gcc/testsuite/

2020-05-22  Harald Anlauf  

PR fortran/95106
* gfortran.dg/equiv_11.f90: New test.
diff --git a/gcc/fortran/trans-common.c b/gcc/fortran/trans-common.c
index bf163bc4f52..3775a8bea74 100644
--- a/gcc/fortran/trans-common.c
+++ b/gcc/fortran/trans-common.c
@@ -242,8 +242,9 @@ static tree
 gfc_sym_mangled_common_id (gfc_common_head *com)
 {
   int has_underscore;
-  char mangled_name[GFC_MAX_MANGLED_SYMBOL_LEN + 1];
-  char name[GFC_MAX_SYMBOL_LEN + 1];
+  /* Provide sufficient space to hold "symbol.eq.1234567890__".  */
+  char mangled_name[GFC_MAX_MANGLED_SYMBOL_LEN + 1 + 16];
+  char name[GFC_MAX_SYMBOL_LEN + 1 + 16];

   /* Get the name out of the common block pointer.  */
   strcpy (name, com->name);
diff --git a/gcc/testsuite/gfortran.dg/equiv_11.f90 b/gcc/testsuite/gfortran.dg/equiv_11.f90
new file mode 100644
index 000..0f4a1ab5c32
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/equiv_11.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fsecond-underscore" }
+! PR fortran/95106
+
+module m2345678901234567890123456789012345678901234567890123456789_123
+  implicit none
+  real :: a(4), u(3,2)
+  real :: b(4), v(4,2)
+  equivalence (a(1),u(1,1)), (b(1),v(1,1))
+end
+! { dg-final { scan-assembler {m2345678901234567890123456789012345678901234567890123456789_123.eq.0__} } }
+! { dg-final { scan-assembler {m2345678901234567890123456789012345678901234567890123456789_123.eq.1__} } }


Re: New mklog script

2020-05-22 Thread Thomas Koenig via Gcc-patches

Hi,

what's currently in trunk (as of a few hours ago) fails for me with

  File "contrib/mklog.py", line 36, in 
from unidiff import PatchSet
ModuleNotFoundError: No module named 'unidiff'

I think this is an error which would have to be taken into account
one way or another - maybe include it in download_prerequisites ?

Regards

Thomas


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-22 Thread Segher Boessenkool
On Fri, May 22, 2020 at 01:22:10PM +0200, Richard Biener wrote:
> On Wed, May 20, 2020 at 10:37 PM Segher Boessenkool
>  wrote:
> >
> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> > > I think this is the wrong way to approach this.  You're doing too many
> > > things at once.  Try to fix the powerpc regression with the extra
> > > flag_rtl_unroll_loops, that could be backported.  Then you can
> > > independently see whether enabling more unrolling at -O2 makes
> > > sense.  Because currently we _do_ unroll at -O2 when it does
> > > not increase size.  Its just your patches make this as aggressive
> > > as -O3.
> >
> > Just do a separate flag (and option) for cunroll, instead?
> >
> > The RTL unroller is *the* unroller, and has been since forever.
> 
> Sorry but that ship has sailed - I don't think we should make
> -O2 -funroll-loops no longer affect GIMPLE.

Of course, and it would certainly be good if we could do some non-
trivial unrolling earlier.  My point is, why rename all RTL stuff
now?  That is just churn.

(And we will need to keep unrolling in RTL for a long time still, if
not forever).

> But sure, what I am proposing is have
> 
> -frtl-unroll-loops
> -fgimple-unroll-loops
> 
> with obvious semantics and -funroll-loops enabling both.
> Users should not really care about what is RTL or GIMPLE.

Yeah -- so this shouldn't be part of the name at all.  What you care
about are higher level characteristics of the unrolling, like
"complete" or "early" or whatever.

> The split above allows the "bug" to be fixed (even on the branch)
> without introducing even more target specialities.

So does any split.  Or I don't see what you mean?


Segher


Re: New mklog script

2020-05-22 Thread Martin Sebor via Gcc-patches

On 5/21/20 2:16 AM, Martin Liška wrote:

Hello Martin.

Can you please compare the current mklog.py. Is there anything
you miss compared to your current script?


Nope, it matches the format I get with my script and even works
better and runs faster.  Very nice!  I'll be happy to switch to
using it instead.

Thanks!
Martin

PS A couple of ideas for future enhancements are to have the script
print "New function." or "New type." for newly added functions and
types, and to print "Adjust comments." for changes to comments alone.


Re: [PATCH] Let numeric_limits::is_iec559 reflect -ffast-math

2020-05-22 Thread Jonathan Wakely via Gcc-patches

On 22/05/20 09:49 +0200, Matthias Kretz wrote:

On Donnerstag, 21. Mai 2020 17:46:01 CEST Marc Glisse wrote:

On Thu, 21 May 2020, Jonathan Wakely wrote:
> On 27/04/20 17:09 +0200, Matthias Kretz wrote:
>> From: Matthias Kretz 
>>
>>PR libstdc++/84949
>>* include/std/limits: Let is_iec559 reflect whether
>>__GCC_IEC_559 says float and double support IEEE 754-2008.
>>* testsuite/18_support/numeric_limits/is_iec559.cc: Test IEC559
>>mandated behavior if is_iec559 is true.
>>* testsuite/18_support/numeric_limits/infinity.cc: Only test inf
>>behavior if is_iec559 is true, otherwise there is no guarantee
>>how arithmetic on inf behaves.
>>* testsuite/18_support/numeric_limits/quiet_NaN.cc: ditto for
>>NaN.
>>* testsuite/18_support/numeric_limits/denorm_min-1.cc: Compile
>>with -ffast-math.
>>* testsuite/18_support/numeric_limits/epsilon-1.cc: ditto.
>>* testsuite/18_support/numeric_limits/infinity-1.cc: ditto.
>>* testsuite/18_support/numeric_limits/is_iec559-1.cc: ditto.
>>* testsuite/18_support/numeric_limits/quiet_NaN-1.cc: ditto.
>
> I'm inclined to go ahead and commit this (to master only, obviously).
> It certainly seems more correct to me, and we'll probably never find
> out if it's "safe" to do unless we actually change it and see what
> happens.
>
> Marc, do you have an opinion?

I don't have a strong opinion on this. I thought we were refraining from
changing numeric_limits based on flags (like -fwrapv for modulo) because
that would lead to ODR violations when people link objects compiled with
different flags. There is a value in libstdc++.so, which may have been
compiled with different flags than the application.


But these ODR violations happen in any case: The floating-point types are
different types with or without -ffast-math (and related) flags. They behave
differently. Compiling a function in multiple TUs with different flags
produces observably different results. Choosing a single one of them is
obviously fragile and broken. That's the spirit of an ODR violation...

It would sometimes be useful to have different types:
float, float_no_nan, float_no_nan_no_signed_zero, ...


Sure. There are ODR violations like that, and then there are ones
like:

  template
  struct X
  {
conditional_t::is_iec559, T, BigNum> val;
  };

I'm generally not concerned about ODR violations where one TU behaves
as requested by the flags used to compile that TU and another behaves
as requested by the flats used to compile that second TU. That happens
all the time with -fno-exceptions and -fno-rtti and such like. That
causes ODR violations too, but of the kind where each definition does
what was requested.

Constants defined by the library changing value is a bit more
concerning. But I don't know if it's really a problem in this case.



Re: Improve lto streaming dumps

2020-05-22 Thread Jan Hubicka
> >
> 
> I got
> 
> ../../src-master/gcc/lto-streamer-out.c: In constructor
> ??DFS::DFS(output_block*, tree, bool, bool, bool)??:
> ../../src-master/gcc/lto-streamer-out.c:807:10: error: too many
> arguments for format [-Werror=format-extra-args]
>   807 |  " Streaming single tree\n", size);
>   |  ^
> 
> Can you fix it?
Sorry,
I fixed it by this patch

* lto-streamer-out.c (DFS::DFS): Silence warning.

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 5ff7f33bfb1..887a51d3eae 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -803,8 +803,7 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, 
bool this_ref_p,
}
  else if (streamer_dump_file)
{
- fprintf (streamer_dump_file,
-  " Streaming single tree\n", size);
+ fprintf (streamer_dump_file, " Streaming single tree\n");
}
 
  /* Write size-1 SCCs without wrapping them inside SCC bundles.


Re: Improve lto streaming dumps

2020-05-22 Thread H.J. Lu via Gcc-patches
On Fri, May 22, 2020 at 7:15 AM Jan Hubicka  wrote:
>
> Hi,
> this patch cleans up dumping of streaming so it is clear how dump is organized
> and how much space individual components needs.
>
> Compiling:
>
> int a=1;
> main()
> {
>   return a;
> }
>
> The output is now:
>
> Creating output block for function_body
> Streaming tree  
>  Start of LTO_trees of size 1
>   Encoding indexableas 0
>  10 bytes
>
> ^^^ I do not think we should need 10 bytes to stream single indexable 
> reference
> to 0 :)
>
>  Start of LTO_trees of size 1
>   Encoding indexableas 1
>  10 bytes
>   Streaming header ofto 
> function_body
>   Streaming body ofto function_body
>   Encoding indexableas 2
>   Encoding indexableas 0
>   Streaming ref to  
>   Streaming ref to  
> 52 bytes
>
> ^^^ Instead of having multiple LTO_trees sections followed by the final tree
> it would make a lot of sense to have only one LTO_trees where the first tree
> is one lto_input_tree should return.  This is easy to arrange in DFS walk -
> one does not need to pop after every SCC component but pop once at the end of
> walk.  However this breaks handling of integer_csts because they may now
> become of LTO_trees block and streamed as header + body.
> This bypasses the separate code for shared integer_cst streaming.  I think
> I want to stream everything into header and materialize the tree since it is 
> not
> part of SCC anyway.
>
> Streaming tree  
>   Streaming header ofto function_body
>   Streaming body ofto function_body
> 8 bytes
>   Streaming gimple stmt _2 = a;
> Streaming ref to  
> 4 bytes
> Streaming tree  
>  Start of LTO_trees of size 1
>   Encoding indexableas 3
>  10 bytes
>  Start of LTO_trees of size 1
>   Streaming header ofto function_body
>   Streaming body ofto function_body
>   Encoding indexableas 0
>  15 bytes
>   Streaming header ofto function_body
>   Streaming body ofto function_body
>   Streaming ref to  
>   Streaming ref to  
> 42 bytes
>   Streaming gimple stmt return _2;
>
> Outputting global stream
>  0:  
> Streaming tree  
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>  576 bytes
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>  68 bytes
>  Streaming single tree
>   Streaming header ofto decls
>   Streaming body ofto decls
>  3 bytes
>  Streaming single tree
>   Streaming header ofto decls
>   Streaming body ofto decls
>  3 bytes
>  Streaming single tree
>   Streaming header ofto 
> decls
>   Streaming body ofto decls
>   Streaming ref to  
>  22 bytes
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>  38 bytes
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>  38 bytes
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>  58 bytes
> 806 bytes
>  0:  
> Streaming tree  
>  Streaming single tree
>   Streaming header ofto decls
>   Streaming body ofto decls
>  3 bytes
>  Streaming single tree
>   Streaming ref to  
>  7 bytes
>  Streaming single tree
>   Streaming ref to  
>  7 bytes
>  Start of LTO_tree_scc of size 1
>   Streaming header ofto decls
>   Streaming body ofto decls
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>   Streaming ref to  
>  49 bytes
> 66 bytes
>
>
> Bootstrapped/regtested x86_64-linux. I plan to commit it today if there
> are no complains :)
>
> gcc/ChangeLog:
>
> 2020-05-22  Jan Hubicka  
>
> * lto-section-out.c (lto_output_decl_index): Adjust dump indentation.
> * lto-streamer-out.c (create_output_block): Fix whitespace
> (lto_write_tree_1): Add (debug) dump.
> (DFS::DFS): Add dump.
> (DFS::DFS_write_tree_body): Do not dump here.
> (lto_output_tree): Improve dumping; do not stream ref when not needed.
> (produce_asm_for_decls): Fix whitespace.
> * tree-streamer-out.c (streamer_write_tree_header): Add dump.
>
> diff --git 

[committed] i386: Fix 2 expander [PR95255]

2020-05-22 Thread Uros Bizjak via Gcc-patches
2020-05-22  Uroš Bizjak  

gcc/ChangeLog:
PR target/95255
* config/i386/i386.md (2): Do not try to
expand non-sse4 ROUND_ROUNDEVEN rounding via SSE support routines.

gcc/testsuite/ChangeLog:
PR target/95255
* gcc.target/i386/pr95255.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e81c737b7bc..459cf62b0de 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17115,16 +17115,18 @@
 && (flag_fp_int_builtin_inexact || !flag_trapping_math))
|| (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
&& (TARGET_SSE4_1
- || (ROUND_ != ROUND_ROUNDEVEN
- && (flag_fp_int_builtin_inexact || !flag_trapping_math"
+  || (ROUND_ != ROUND_ROUNDEVEN
+  && (flag_fp_int_builtin_inexact || !flag_trapping_math"
 {
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
-  && (TARGET_SSE4_1 || flag_fp_int_builtin_inexact || !flag_trapping_math))
+  && (TARGET_SSE4_1
+ || (ROUND_ != ROUND_ROUNDEVEN
+ && (flag_fp_int_builtin_inexact || !flag_trapping_math
 {
   if (TARGET_SSE4_1)
emit_insn (gen_sse4_1_round2
-  (operands[0], operands[1], GEN_INT (ROUND_
-  | ROUND_NO_EXC)));
+  (operands[0], operands[1],
+   GEN_INT (ROUND_ | ROUND_NO_EXC)));
   else if (TARGET_64BIT || (mode != DFmode))
{
  if (ROUND_ == ROUND_FLOOR)
diff --git a/gcc/testsuite/gcc.target/i386/pr95255.c 
b/gcc/testsuite/gcc.target/i386/pr95255.c
new file mode 100644
index 000..5b731941f72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95255.c
@@ -0,0 +1,8 @@
+/* PR target/95255 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse4.1 -mfpmath=both" } */
+
+double foo (double x)
+{
+  return __builtin_roundeven (x);
+}


Re: [PATCH][PPC64] [PR88877]

2020-05-22 Thread kamlesh kumar via Gcc-patches
ping?

On Tue, May 19, 2020 at 5:32 PM kamlesh kumar 
wrote:

> can someone look at the patch, please?
>
>
> On Wed, Apr 8, 2020 at 9:29 PM Jeff Law  wrote:
>
>> On Mon, 2020-04-06 at 14:58 +0530, kamlesh kumar via Gcc-patches wrote:
>> > Hi Richard,
>> > Here is a discussion we did some time ago
>> > https://gcc.gnu.org/pipermail/gcc/2019-January/227834.html
>> > please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88877 for more
>> > info regarding the bug.
>> >
>> > We incorporated below Jakub's suggestion in this patch.
>> >
>> > Jakub wrote:
>> > ""
>> > Yeah, all the callers of emit_library_call* would need to be changed to
>> pass
>> > triplets rtx, machine_mode, int/bool /*unsignedp*/, instead of just
>> > rtx_mode_t pair.
>> > Jakub
>> > ""
>> I think you're generally on the right track here, but I'm deferring this
>> to gcc11
>> (other maintainers are, of course, welcome to push it for gcc10 as it's a
>> code
>> correctness issue).
>>
>> Jeff
>> >
>>
>>


Re: [PATCH] c++: P0848R3 and member function templates [PR95181]

2020-05-22 Thread Patrick Palka via Gcc-patches
On Fri, 22 May 2020, Patrick Palka wrote:

> When comparing two special member function templates to see if one hides
> the other (as per P0848R3), we need to check satisfaction which we can't
> do on templates.  So this patch makes add_method skip the eligibility
> test on member function templates and just lets them coexist.

It just occurred to me that this problem isn't limited to member function
templates.  Consider this valid testcase which we currently reject:

template struct g {
  g() requires B && false;
  g() requires B;
};

g b; // error

During add_method, we check satisfaction of both default constructors,
and since their constraints are dependent, constraints_satisfied_p
returns true for both sets of constraints.  We then see that
'B && false' is more constrained than 'B' and therefore discard the second
constructor.  Since we discarded the second default constructor at
definition time, the instantiation g has no eligible default
constructor.

I am not sure what to do from here...

> 
> Passes 'make check-c++', does this look OK to commit to master and later
> to the 10 branch?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/95181
>   * class.c (add_method): Let special member function templates
>   coexist if they are not equivalently constrained.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/95181
>   * g++.dg/concepts/pr95181.C: New test.
> ---
>  gcc/cp/class.c  | 10 ++
>  gcc/testsuite/g++.dg/concepts/pr95181.C |  9 +
>  2 files changed, 15 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/concepts/pr95181.C
> 
> diff --git a/gcc/cp/class.c b/gcc/cp/class.c
> index bab15524a60..cb268d191d6 100644
> --- a/gcc/cp/class.c
> +++ b/gcc/cp/class.c
> @@ -1079,10 +1079,12 @@ add_method (tree type, tree method, bool via_using)
>   {
> special_function_kind sfk = special_memfn_p (method);
>  
> -   if (sfk == sfk_none || DECL_INHERITED_CTOR (fn))
> - /* Non-special member functions coexist if they are not
> -equivalently constrained.  A member function is not hidden
> -by an inherited constructor.  */
> +   if (sfk == sfk_none
> +   || DECL_INHERITED_CTOR (fn)
> +   || TREE_CODE (fn) == TEMPLATE_DECL)
> + /* Member function templates and non-special member functions
> +coexist if they are not equivalently constrained.  A member
> +function is not hidden by an inherited constructor.  */
>   continue;
>  
> /* P0848: For special member functions, deleted, unsatisfied, or
> diff --git a/gcc/testsuite/g++.dg/concepts/pr95181.C 
> b/gcc/testsuite/g++.dg/concepts/pr95181.C
> new file mode 100644
> index 000..0185c86b438
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/concepts/pr95181.C
> @@ -0,0 +1,9 @@
> +// PR c++/95181
> +// { dg-do compile { target concepts } }
> +
> +template  struct f {
> +  template  f();
> +  template  requires false f();
> +};
> +
> +f a;
> -- 
> 2.27.0.rc1.5.gae92ac8ae3
> 
> 



V3 [PATCH] x86: Move cpuinfo.h from libgcc to common/config/i386

2020-05-22 Thread H.J. Lu via Gcc-patches
On Thu, May 21, 2020 at 10:37 AM H.J. Lu  wrote:
>
> On Wed, May 20, 2020 at 4:21 AM H.J. Lu  wrote:
> >
> > On Tue, May 19, 2020 at 11:10 PM Uros Bizjak  wrote:
> > >
> > > On Tue, May 19, 2020 at 11:40 PM H.J. Lu  wrote:
> > >
> > > > > > > > > > > I will take a look to see if we share the same CPU 
> > > > > > > > > > > detection code between
> > > > > > > > > > > libgcc and config/i386/driver-i386.c.
> > > > > > > > > >
> > > > > > > > > > I don't think it will bring any benefit, this is mainly one 
> > > > > > > > > > huge
> > > > > > > > > > switch statement that maps to different stuff in libgcc and
> > > > > > > > > > driver-i386.
> > > > > > > > >
> > > > > > > > > libgcc and config/i386/driver-i386.c differ even before my 
> > > > > > > > > patch.
> > > > > > > > > I think we can do better.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Move cpuinfo.h from libgcc to common/config/i386 so that 
> > > > > > > > get_intel_cpu
> > > > > > > > can be shared by libgcc, GCC driver, 
> > > > > > > > gcc.target/i386/builtin_target.c
> > > > > > > > and libgfortran to detect the specific type of Intel CPU.  
> > > > > > > > Update
> > > > > > > > libgfortran to use has_cpu_feature to detect x86 CPU features.
> > > > > > > >
> > > > > > > > Tested on Linux/x86 and Linux/x86-64.  OK for master?
> > > > > > >
> > > > > > > Handling only Intel targets and not others is a sure way for 
> > > > > > > patch to
> > > > > > > be ignored.
> > > > > > >
> > > > > >
> > > > > > Here is the updated patch to cover AMD CPU.  It also fixes:
> > > > > >
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95212
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95220
> > > > > >
> > > > > > OK for master?
> > > > >
> > > > > Huh... I didn't think the solution will be this messy... I have to
> > > > > rethink the approach a bit.
> > > >
> > > > That is what will happen when we have the same info in more than one 
> > > > place
> > > > There should only one place for CPU info.
> > >
> > > Looking at the patch, it is clear that cpuinfo.c and driver-i386.c
> > > should stay apart. They are two different things, and they are
> > > orthogonal to each other; one can be updated without affecting the
> > > other one. driver-i386.c handles way more targets than cpuinfo.c and
> > > your patch only handles a subset of them. The unification does not
> > > bring any benefit, it even complicates things more.
> >
> > There should one place to check a CPU feature, not 2.  It can
> > be done with a proper cpuinfo.h.
> >
>
> This updated patch does it.   It removes duplicated codes and
> simplifies the implementation.  With this patch, we just to update
> a single place to add the new ISA support.
>

Here is the updated patch to add common/config/i386/cpuinfo-builtins.h.
SIZE_OF_CPU_FEATURES is now defined with CPU_FEATURE_MAX
so that we can simply just add a new feature to enum processor_features
and __cpu_features2 will have the correct size to cover all features.
fold_builtin_cpu is changed to work with any SIZE_OF_CPU_FEATURES
values.

Tested on Linux/x86-64 and Linux/x86.   OK for master?

Thanks.

-- 
H.J.
From 74bf612b8f33937cc33dcea46baabf2bf56eb9ee Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 18 May 2020 05:58:41 -0700
Subject: [PATCH] x86: Move cpuinfo.h from libgcc to common/config/i386

Move cpuinfo.h from libgcc to common/config/i386 so that get_intel_cpu
can be shared by libgcc, GCC driver and gcc.target/i386/builtin_target.c
to detect the specific type of Intel and AMD CPUs:

1. Use the same enum processor_features in libgcc and x86 backend.
2. Add more processor features to enum processor_features.
3. Add M_VENDOR, M_CPU_TYPE and M_CPU_SUBTYPE in i386-builtins.c to
avoid duplication in.
4. Use cpu_indicator_init, has_cpu_feature, get_amd_cpu and get_intel_cpu
in driver-i386.c and builtin_target.c.

gcc/

	PR target/95259
	* common/config/i386/cpuinfo-builtins.h: Moved from
	libgcc/config/i386/cpuinfo.h.
	(processor_vendor): Add VENDOR_CENTAUR, VENDOR_CYRIX, VENDOR_NSC
	and BUILTIN_VENDOR_MAX.
	(processor_types): Add BUILTIN_CPU_TYPE_MAX.
	(processor_features): Add FEATURE_3DNOW, FEATURE_3DNOWP,
	FEATURE_ADX, FEATURE_ABM, FEATURE_CLDEMOTE, FEATURE_CLFLUSHOPT,
	FEATURE_CLWB, FEATURE_CLZERO, FEATURE_CMPXCHG16B,
	FEATURE_CMPXCHG8B, FEATURE_ENQCMD, FEATURE_F16C,
	FEATURE_FSGSBASE, FEATURE_FXSAVE, FEATURE_HLE, FEATURE_IBT,
	FEATURE_LAHF_LM, FEATURE_LM, FEATURE_LWP, FEATURE_LZCNT,
	FEATURE_MOVBE, FEATURE_MOVDIR64B, FEATURE_MOVDIRI,
	FEATURE_MWAITX, FEATURE_OSPKE, FEATURE_OSXSAVE, FEATURE_PCONFIG,
	FEATURE_PKU, FEATURE_PREFETCHWT1, FEATURE_PRFCHW,
	FEATURE_PTWRITE, FEATURE_RDPID, FEATURE_RDRND, FEATURE_RDSEED,
	FEATURE_RTM, FEATURE_SERIALIZE, FEATURE_SGX, FEATURE_SHA,
	FEATURE_SHSTK, FEATURE_TBM, FEATURE_TSXLDTRK, FEATURE_VAES,
	FEATURE_WAITPKG, FEATURE_WBNOINVD, FEATURE_XSAVE, FEATURE_XSAVEC,
	FEATURE_XSAVEOPT, FEATURE_XSAVES and CPU_FEATURE_MAX.
	(__processor_model): Moved to cpuinfo.h.
	(__cpu_model): 

Re: [PATCH] c++: P0848R3 and member function templates [PR95181]

2020-05-22 Thread Patrick Palka via Gcc-patches
On Fri, 22 May 2020, Patrick Palka wrote:

> When comparing two special member function templates to see if one hides
> the other (as per P0848R3), we need to check satisfaction which we can't
> do on templates.  So this patch makes add_method skip the eligibility
> test on member function templates and just lets them coexist.
> 
> Passes 'make check-c++', does this look OK to commit to master and later
> to the 10 branch?

... after a full bootstrap and regtest on x86_64-pc-linux-gnu, of
course.



[PATCH] c++: P0848R3 and member function templates [PR95181]

2020-05-22 Thread Patrick Palka via Gcc-patches
When comparing two special member function templates to see if one hides
the other (as per P0848R3), we need to check satisfaction which we can't
do on templates.  So this patch makes add_method skip the eligibility
test on member function templates and just lets them coexist.

Passes 'make check-c++', does this look OK to commit to master and later
to the 10 branch?

gcc/cp/ChangeLog:

PR c++/95181
* class.c (add_method): Let special member function templates
coexist if they are not equivalently constrained.

gcc/testsuite/ChangeLog:

PR c++/95181
* g++.dg/concepts/pr95181.C: New test.
---
 gcc/cp/class.c  | 10 ++
 gcc/testsuite/g++.dg/concepts/pr95181.C |  9 +
 2 files changed, 15 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/pr95181.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index bab15524a60..cb268d191d6 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1079,10 +1079,12 @@ add_method (tree type, tree method, bool via_using)
{
  special_function_kind sfk = special_memfn_p (method);
 
- if (sfk == sfk_none || DECL_INHERITED_CTOR (fn))
-   /* Non-special member functions coexist if they are not
-  equivalently constrained.  A member function is not hidden
-  by an inherited constructor.  */
+ if (sfk == sfk_none
+ || DECL_INHERITED_CTOR (fn)
+ || TREE_CODE (fn) == TEMPLATE_DECL)
+   /* Member function templates and non-special member functions
+  coexist if they are not equivalently constrained.  A member
+  function is not hidden by an inherited constructor.  */
continue;
 
  /* P0848: For special member functions, deleted, unsatisfied, or
diff --git a/gcc/testsuite/g++.dg/concepts/pr95181.C 
b/gcc/testsuite/g++.dg/concepts/pr95181.C
new file mode 100644
index 000..0185c86b438
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr95181.C
@@ -0,0 +1,9 @@
+// PR c++/95181
+// { dg-do compile { target concepts } }
+
+template  struct f {
+  template  f();
+  template  requires false f();
+};
+
+f a;
-- 
2.27.0.rc1.5.gae92ac8ae3



Avoid streaming of stray referneces

2020-05-22 Thread Jan Hubicka
Hi,
this patch avoids stremaing completely useless stray references to gobal decl
stream.  I am re-testing the patch (rebased to current tree) on x86_64-linux
and intend to commit once testing finishes.

Honza

gcc/ChangeLog:

2020-05-22  Jan Hubicka  

* lto-streamer-out.c (lto_output_tree): Do not stream final ref if
it is not needed.

gcc/lto/ChangeLog:

2020-05-22  Jan Hubicka  

* lto-common.c (lto_read_decls): Do not skip stray refs.

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index f5daadc657b..5ff7f33bfb1 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -1780,7 +1780,7 @@ lto_output_tree (struct output_block *ob, tree expr,
 it.  */
   if (!existed_p)
lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
-  else
+  else if (this_ref_p)
{
  if (streamer_dump_file)
{
diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
index d04b1c9ca7b..3ea1894ce96 100644
--- a/gcc/lto/lto-common.c
+++ b/gcc/lto/lto-common.c
@@ -1955,25 +1955,19 @@ lto_read_decls (struct lto_file_decl_data *decl_data, 
const void *data,
   else
{
  t = lto_input_tree_1 (_main, data_in, tag, 0);
- /* We streamed in new tree.  Add it to cache and process dref.  */
- if (data_in->reader_cache->nodes.length () == from + 1)
+ gcc_assert (data_in->reader_cache->nodes.length () == from + 1);
+ num_unshared_trees_read++;
+ data_in->location_cache.accept_location_cache ();
+ process_dref (data_in, t, from);
+ if (TREE_CODE (t) == IDENTIFIER_NODE
+ || (TREE_CODE (t) == INTEGER_CST
+ && !TREE_OVERFLOW (t)))
+   ;
+ else
{
- num_unshared_trees_read++;
- data_in->location_cache.accept_location_cache ();
- process_dref (data_in, t, from);
- if (TREE_CODE (t) == IDENTIFIER_NODE
- || (TREE_CODE (t) == INTEGER_CST
- && !TREE_OVERFLOW (t)))
-   ;
- else
-   {
- lto_maybe_register_decl (data_in, t, from);
- process_new_tree (t, , from, , data_in);
-   }
+ lto_maybe_register_decl (data_in, t, from);
+ process_new_tree (t, , from, , data_in);
}
- else
-   /* FIXME: It seems useless to pickle stray references.  */
-   gcc_assert (data_in->reader_cache->nodes.length () == from);
}
 }
 


Re: RFC: Provide diagnostic hints for missing inttypes.h string constants.

2020-05-22 Thread David Malcolm via Gcc-patches
On Fri, 2020-05-22 at 01:30 +0200, Mark Wielaard wrote:
Hi Mark

> This is on top of the stdbool.h and stdint.h patches.

Sorry, I didn't see those patches; I've replied to them now.

> This adds a flag to c_parser so we know when we were trying to
> constract a string literal. If there is a parse error and we were
> constructing a string literal, and the next token is an unknown
> identifier name, and we know there is a standard header that defines
> that name as a string literal, then add a missing header hint to
> the error messsage.

By comparison, what's the existing behavior for the example?
Is it just what you posted below, but without the "note" diagnostics?

> I haven't checked yet if we can do something similar for the C++
> parser. And the testcase needs to be extended a bit. But I hope the
> direction is OK.

I think it's a worthy goal; as noted below I'd want a C frontend
maintainer to approve those parts of it.

> For the following source:
> 
> const char *hex64_fmt =  PRIx64;
> const char *msg_fmt = "Provide %" PRIx64 "\n";
> 
> void foo (uint32_t v)
> {
>   printf ("%" PRIu32, v);
> }
> 
> We will get the following:
> 
> $ /opt/local/gcc/bin/gcc -c t.c
> t.c:4:26: error: ‘PRIx64’ undeclared here (not in a function)
> 4 | const char *hex64_fmt =  PRIx64;
>   |  ^~
> t.c:3:1: note: ‘PRIx64’ is defined in header ‘’; did you forget 
> to ‘#include ’?
> 2 | #include 
>   +++ |+#include 
> 3 |
> t.c:5:35: error: expected ‘,’ or ‘;’ before ‘PRIx64’
> 5 | const char *msg_fmt = "Provide %" PRIx64 "\n";
>   |   ^~
> t.c:5:35: note: ‘PRIx64’ is defined in header ‘’; did you forget 
> to ‘#include ’?

It's a big improvement, but it's still a little clunky.  The
  expected ‘,’ or ‘;’ before ‘PRIx64’
is the sort of thing we've gotten used to with years of GCC messages,
but might make little sense to a neophyte.
I wonder if it's possible to improve this further, but I fear that that
might overcomplicate things (if I understand things correctly, the
string concatenation fails in the tokenizer, and then the PRIx64 fails
in the parser, so we have a bad interaction involving two different
levels of abstraction within the frontend - I think).

> t.c: In function ‘foo’:
> t.c:9:14: error: expected ‘)’ before ‘PRIu32’
> 9 |   printf ("%" PRIu32, v);
>   |  ^~~
>   |  )
> t.c:9:15: note: ‘PRIu32’ is defined in header ‘’; did you forget 
> to ‘#include ’?
> 9 |   printf ("%" PRIu32, v);
>   |   ^~
> ---

[...snip...]
  
> +/* These can be used as string macros or as identifiers. Must all be
> +   string-literals.  Used in get_stdlib_header_for_name and
> +   get_c_stdlib_header_for_string_macro_name.  */
> +static const stdlib_hint c99_cxx11_macro_hints[] = {
> +/*  and .  */
> +{"PRId8", {"", ""} },
> +{"PRId16", {"", ""} },
> +{"PRId32", {"", ""} },
> +{"PRId64", {"", ""} },
> +{"PRIi8", {"", ""} },
> +{"PRIi16", {"", ""} },
> +{"PRIi32", {"", ""} },
> +{"PRIi64", {"", ""} },
> +{"PRIo8", {"", ""} },
> +{"PRIo16", {"", ""} },
> +{"PRIo32", {"", ""} },
> +{"PRIo64", {"", ""} },
> +{"PRIu8", {"", ""} },
> +{"PRIu16", {"", ""} },
> +{"PRIu32", {"", ""} },
> +{"PRIu64", {"", ""} },
> +{"PRIx8", {"", ""} },
> +{"PRIx16", {"", ""} },
> +{"PRIx32", {"", ""} },
> +{"PRIx64", {"", ""} },
> +{"PRIX8", {"", ""} },
> +{"PRIX16", {"", ""} },
> +{"PRIX32", {"", ""} },
> +{"PRIX64", {"", ""} },
> +
> +{"PRIdPTR", {"", ""} },
> +{"PRIiPTR", {"", ""} },
> +{"PRIoPTR", {"", ""} },
> +{"PRIuPTR", {"", ""} },
> +{"PRIxPTR", {"", ""} },
> +{"PRIXPTR", {"", ""} },
> +
> +{"SCNd8", {"", ""} },
> +{"SCNd16", {"", ""} },
> +{"SCNd32", {"", ""} },
> +{"SCNd64", {"", ""} },
> +{"SCNi8", {"", ""} },
> +{"SCNi16", {"", ""} },
> +{"SCNi32", {"", ""} },
> +{"SCNi64", {"", ""} },
> +{"SCNo8", {"", ""} },
> +{"SCNo16", {"", ""} },
> +{"SCNo32", {"", ""} },
> +{"SCNo64", {"", ""} },
> +{"SCNu8", {"", ""} },
> +{"SCNu16", {"", ""} },
> +{"SCNu32", {"", ""} },
> +{"SCNu64", {"", ""} },
> +{"SCNx8", {"", ""} },
> +{"SCNx16", {"", ""} },
> +{"SCNx32", {"", ""} },
> +{"SCNx64", {"", ""} },
> +
> +{"SCNdPTR", {"", ""} },
> +{"SCNiPTR", {"", ""} },
> +{"SCNoPTR", {"", ""} },
> +{"SCNuPTR", {"", ""} },
> +{"SCNxPTR", {"", ""} }
> +};

I found myself squinting at the array trying to decide if every entry
had
   {"", ""}
as its second element.  I *think* that's the case, right? 

I know there's a lot of pre-existing duplication in this file, but
maybe this would be better as simply an array of const char *?
It could probably be reformatted to take up far fewer lines by grouping
them logically.

Maybe have a

  static const char *
  get_c99_cxx11_macro_hint (const char *, enum stdlib 

Improve lto streaming dumps

2020-05-22 Thread Jan Hubicka
Hi,
this patch cleans up dumping of streaming so it is clear how dump is organized
and how much space individual components needs.

Compiling:

int a=1;
main()
{
  return a;
}

The output is now:

Creating output block for function_body
Streaming tree  
 Start of LTO_trees of size 1
  Encoding indexableas 0 
 10 bytes

^^^ I do not think we should need 10 bytes to stream single indexable reference
to 0 :)

 Start of LTO_trees of size 1
  Encoding indexableas 1 
 10 bytes
  Streaming header ofto function_body
  Streaming body ofto function_body
  Encoding indexableas 2 
  Encoding indexableas 0 
  Streaming ref to  
  Streaming ref to  
52 bytes

^^^ Instead of having multiple LTO_trees sections followed by the final tree
it would make a lot of sense to have only one LTO_trees where the first tree
is one lto_input_tree should return.  This is easy to arrange in DFS walk -
one does not need to pop after every SCC component but pop once at the end of
walk.  However this breaks handling of integer_csts because they may now
become of LTO_trees block and streamed as header + body.
This bypasses the separate code for shared integer_cst streaming.  I think
I want to stream everything into header and materialize the tree since it is not
part of SCC anyway.

Streaming tree  
  Streaming header ofto function_body
  Streaming body ofto function_body
8 bytes
  Streaming gimple stmt _2 = a;
Streaming ref to  
4 bytes
Streaming tree  
 Start of LTO_trees of size 1
  Encoding indexableas 3 
 10 bytes
 Start of LTO_trees of size 1
  Streaming header ofto function_body
  Streaming body ofto function_body
  Encoding indexableas 0 
 15 bytes
  Streaming header ofto function_body
  Streaming body ofto function_body
  Streaming ref to  
  Streaming ref to  
42 bytes
  Streaming gimple stmt return _2;

Outputting global stream
 0:  
Streaming tree  
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
 576 bytes
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
 68 bytes
 Streaming single tree
  Streaming header ofto decls
  Streaming body ofto decls
 3 bytes
 Streaming single tree
  Streaming header ofto decls
  Streaming body ofto decls
 3 bytes
 Streaming single tree
  Streaming header ofto decls
  Streaming body ofto decls
  Streaming ref to  
 22 bytes
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
 38 bytes
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
 38 bytes
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
 58 bytes
806 bytes
 0:  
Streaming tree  
 Streaming single tree
  Streaming header ofto decls
  Streaming body ofto decls
 3 bytes
 Streaming single tree
  Streaming ref to  
 7 bytes
 Streaming single tree
  Streaming ref to  
 7 bytes
 Start of LTO_tree_scc of size 1
  Streaming header ofto decls
  Streaming body ofto decls
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
  Streaming ref to  
 49 bytes
66 bytes


Bootstrapped/regtested x86_64-linux. I plan to commit it today if there
are no complains :)

gcc/ChangeLog:

2020-05-22  Jan Hubicka  

* lto-section-out.c (lto_output_decl_index): Adjust dump indentation.
* lto-streamer-out.c (create_output_block): Fix whitespace
(lto_write_tree_1): Add (debug) dump.
(DFS::DFS): Add dump.
(DFS::DFS_write_tree_body): Do not dump here.
(lto_output_tree): Improve dumping; do not stream ref when not needed.
(produce_asm_for_decls): Fix whitespace.
* tree-streamer-out.c (streamer_write_tree_header): Add dump.

diff --git a/gcc/lto-section-out.c b/gcc/lto-section-out.c
index 8eda3b5fde1..0182cd6059e 100644
--- a/gcc/lto-section-out.c
+++ b/gcc/lto-section-out.c
@@ -170,7 +170,7 @@ lto_output_decl_index (struct lto_output_stream *obs,
   index = encoder->trees.length ();
   if (streamer_dump_file)
{
- print_node_brief (streamer_dump_file, "Encoding indexable ",
+ 

Re: [PATCH] Suggest including or for [u]int[8|16|32|64]_t

2020-05-22 Thread David Malcolm via Gcc-patches
On Thu, 2020-05-21 at 00:27 +0200, Mark Wielaard wrote:
> Plus [u]intptr_t and associated constants.
> 
> Refactor the bool, true, false,  code so it fits into the
> new table based design.
> 
> gcc/c-family/ChangeLog:
> 
>   * known-headers.cc (get_stdlib_header_for_name): Add a new
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/spellcheck-stdint.c: New test.
>   * g++.dg/spellcheck-stdint.C: Likewise.

Looks good to me

Thanks
Dave



Re: [PATCH] Suggest including for bool, true and false

2020-05-22 Thread David Malcolm via Gcc-patches
On Tue, 2020-05-19 at 23:36 +0200, Mark Wielaard wrote:
> Currently gcc suggests to use _Bool instead of bool and doesn't give
> any suggestions when true or false are used, but undefined. This
> patch
> makes it so that (for C99 or higher) a fixit hint is emitted to
> include
> .

[...snip...]

Looks good to me.

Thanks
Dave




[PATCH] coroutines: Handle lambda closure pointers like 'this'.

2020-05-22 Thread Iain Sandoe
Hi,

We didn’t quite have time to push this through before the 10.1
deadline, but (since it’s a correctness issue) it should be applied
to the branch too.

tested on x86_64-darwin, linux
OK for master?
10.2 after some bake time?
thanks
Iain

===

It was agreed amongst the implementors that the correct
interpretation of the standard is that lambda closure pointers
should be treated in the same manner as class object pointers.

gcc/cp/ChangeLog:

* coroutines.cc (instantiate_coro_traits): Pass a reference
to lambda closure objects to traits instantiation.
(morph_fn_to_coro): Likewise for promise parameter
preview and allocator lookup.
---
 gcc/cp/coroutines.cc | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index b79e2c66b70..6e3f03bdb0d 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -304,8 +304,8 @@ instantiate_coro_traits (tree fndecl, location_t kw)
 
   while (arg_node != NULL_TREE && !VOID_TYPE_P (TREE_VALUE (arg_node)))
 {
-  /* See PR94807, as to why we must exclude lambda here.  */
-  if (is_this_parameter (arg) && !lambda_p)
+  if (is_this_parameter (arg)
+ || (lambda_p && DECL_NAME (arg) == closure_identifier))
{
  /* We pass a reference to *this to the param preview.  */
  tree ct = TREE_TYPE (TREE_TYPE (arg));
@@ -3793,13 +3793,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
parm.frame_type = actual_type;
 
  parm.this_ptr = is_this_parameter (arg);
- /* See PR94807.  When a lambda is in a template instantiation, the
-closure object is named 'this' instead of '__closure'.  */
  if (lambda_p)
-   {
- parm.lambda_cobj = DECL_NAME (arg) == closure_identifier;
- gcc_checking_assert (!parm.this_ptr);
-   }
+   parm.lambda_cobj = DECL_NAME (arg) == closure_identifier;
  else
parm.lambda_cobj = false;
 
@@ -3953,9 +3948,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
{
  param_info *parm_i = param_uses->get (arg);
  gcc_checking_assert (parm_i);
- if (parm_i->lambda_cobj)
-   vec_safe_push (args, arg);
- else if (parm_i->this_ptr)
+ if (parm_i->this_ptr || parm_i->lambda_cobj)
{
  /* We pass a reference to *this to the allocator lookup.  */
  tree tt = TREE_TYPE (TREE_TYPE (arg));
@@ -4159,9 +4152,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
 
  /* Add this to the promise CTOR arguments list, accounting for
 refs and special handling for method this ptr.  */
- if (parm.lambda_cobj)
-   vec_safe_push (promise_args, arg);
- else if (parm.this_ptr)
+ if (parm.this_ptr || parm.lambda_cobj)
{
  /* We pass a reference to *this to the param preview.  */
  tree tt = TREE_TYPE (arg);
-- 
2.24.1



PING: [PATCH] x86: Add UNSPECV_PATCHABLE_AREA

2020-05-22 Thread H.J. Lu via Gcc-patches
On Sat, May 2, 2020 at 4:55 AM H.J. Lu  wrote:
>
> Currently patchable area is at the wrong place.  It is placed immediately
> after function label, before both .cfi_startproc and ENDBR.  This patch
> adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
> changes ENDBR insertion pass to also insert patchable area instruction.
> TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY is defined to avoid placing
> patchable area before .cfi_startproc and ENDBR.
>
> OK for master?
>
> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/93492
> * config/i386/i386-features.c (rest_of_insert_endbranch):
> Renamed to ...
> (rest_of_insert_endbr_and_patchable_area): Change return type
> to void. Add need_endbr and patchable_area_size arguments.
> Don't call timevar_push nor timevar_pop.  Replace
> endbr_queued_at_entrance with insn_queued_at_entrance.  Insert
> UNSPECV_PATCHABLE_AREA for patchable area.
> (pass_data_insert_endbranch): Renamed to ...
> (pass_data_insert_endbr_and_patchable_area): This.  Change
> pass name to endbr_and_patchable_area.
> (pass_insert_endbranch): Renamed to ...
> (pass_insert_endbr_and_patchable_area): This.  Add need_endbr
> and patchable_area_size;.
> (pass_insert_endbr_and_patchable_area::gate): Set and check
> need_endbr and patchable_area_size.
> (pass_insert_endbr_and_patchable_area::execute): Call
> timevar_push and timevar_pop.  Pass need_endbr and
> patchable_area_size to rest_of_insert_endbr_and_patchable_area.
> (make_pass_insert_endbranch): Renamed to ...
> (make_pass_insert_endbr_and_patchable_area): This.
> * config/i386/i386-passes.def: Replace pass_insert_endbranch
> with pass_insert_endbr_and_patchable_area.
> * config/i386/i386-protos.h (ix86_output_patchable_area): New.
> (make_pass_insert_endbranch): Renamed to ...
> (make_pass_insert_endbr_and_patchable_area): This.
> * config/i386/i386.c (ix86_asm_output_function_label): Set
> function_label_emitted to true.
> (ix86_print_patchable_function_entry): New function.
> (ix86_output_patchable_area): Likewise.
> (x86_function_profiler): Replace endbr_queued_at_entrance with
> insn_queued_at_entrance.  Generate ENDBR only for TYPE_ENDBR.
> Call ix86_output_patchable_area to generate patchable area if
> needed.
> (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): New.
> * i386.h (queued_insn_type): New.
> (machine_function): Add function_label_emitted.  Replace
> endbr_queued_at_entrance with insn_queued_at_entrance.
> * config/i386/i386.md (UNSPECV_PATCHABLE_AREA): New.
> (patchable_area): New.
>
> gcc/testsuite/
>
> PR target/93492
> * gcc.target/i386/pr93492-1.c: New test.
> * gcc.target/i386/pr93492-2.c: Likewise.
> * gcc.target/i386/pr93492-3.c: Likewise.
> * gcc.target/i386/pr93492-4.c: Likewise.
> * gcc.target/i386/pr93492-5.c: Likewise.

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545021.html

-- 
H.J.


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-22 Thread Richard Biener via Gcc-patches
On Wed, May 20, 2020 at 10:37 PM Segher Boessenkool
 wrote:
>
> On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> > I think this is the wrong way to approach this.  You're doing too many
> > things at once.  Try to fix the powerpc regression with the extra
> > flag_rtl_unroll_loops, that could be backported.  Then you can
> > independently see whether enabling more unrolling at -O2 makes
> > sense.  Because currently we _do_ unroll at -O2 when it does
> > not increase size.  Its just your patches make this as aggressive
> > as -O3.
>
> Just do a separate flag (and option) for cunroll, instead?
>
> The RTL unroller is *the* unroller, and has been since forever.

Sorry but that ship has sailed - I don't think we should make
-O2 -funroll-loops no longer affect GIMPLE.  But sure, what I
am proposing is have

-frtl-unroll-loops
-fgimple-unroll-loops

with obvious semantics and -funroll-loops enabling both.
Users should not really care about what is RTL or GIMPLE.
The split above allows the "bug" to be fixed (even on the branch)
without introducing even more target specialities.

Richard.

>
> Segher


[PATCH] coroutines: Ensure distinct DTOR trees [PR95137].

2020-05-22 Thread Iain Sandoe

Hi

This is almost obvious - except perhaps I’m missing some more
efficient way of doing it; it seems less than ideal to have to build the
ctor call twice with exactly the same inputs.

If there’s no better way ….

tested on x86_64-darwin, linux
OK for master, 10.2?
thanks
iain

Part of the PR notes that there are UBSAN fails for the coroutines
test suite.  These are primarily related to the use of the same DTOR
tree in the two edges from the await block.  Fixed by building a new
tree for each.

gcc/cp/ChangeLog:

2020-05-19  Iain Sandoe  

PR sanitizer/95137
* coroutines.cc (expand_one_await_expression): Build separate
DTOR trees for the awaitable object on the destroy and resume
paths.
---
 gcc/cp/coroutines.cc | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index b79e2c66b70..18f74d034b7 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1468,19 +1468,13 @@ expand_one_await_expression (tree *stmt, tree  
*await_expr, void *d)

   tree resume_label = create_named_label_with_ctx (loc, buf, actor);
   tree empty_list = build_empty_stmt (loc);

-  tree dtor = NULL_TREE;
   tree await_type = TREE_TYPE (var);
-  if (needs_dtor)
-dtor = build_special_member_call (var, complete_dtor_identifier, NULL,
- await_type, LOOKUP_NORMAL,
- tf_warning_or_error);
-
   tree stmt_list = NULL;
   tree t_expr = STRIP_NOPS (expr);
   tree r;
   tree *await_init = NULL;
   if (t_expr == var)
-dtor = NULL_TREE;
+needs_dtor = false;
   else
 {
   /* Initialize the var from the provided 'o' expression.  */
@@ -1595,7 +1589,12 @@ expand_one_await_expression (tree *stmt, tree  
*await_expr, void *d)

   destroy_label = build_stmt (loc, LABEL_EXPR, destroy_label);
   append_to_statement_list (destroy_label, _list);
   if (needs_dtor)
-append_to_statement_list (dtor, _list);
+{
+  tree dtor = build_special_member_call (var, complete_dtor_identifier,
+NULL, await_type, LOOKUP_NORMAL,
+tf_warning_or_error);
+  append_to_statement_list (dtor, _list);
+}
   r = build1_loc (loc, GOTO_EXPR, void_type_node, data->cleanup);
   append_to_statement_list (r, _list);

@@ -1630,7 +1629,12 @@ expand_one_await_expression (tree *stmt, tree  
*await_expr, void *d)

   /* Get a pointer to the revised statment.  */
   tree *revised = tsi_stmt_ptr (tsi_last (stmt_list));
   if (needs_dtor)
-append_to_statement_list (dtor, _list);
+{
+  tree dtor = build_special_member_call (var, complete_dtor_identifier,
+NULL, await_type, LOOKUP_NORMAL,
+tf_warning_or_error);
+  append_to_statement_list (dtor, _list);
+}
   data->index += 2;

   /* Replace the original statement with the expansion.  */
--
2.24.1



Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

2020-05-22 Thread Richard Biener via Gcc-patches
On Thu, May 21, 2020 at 10:17 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Thu, May 21, 2020 at 03:29:49PM +0200, Martin Liška wrote:
> > Adding Segher to CC, he can help us.
>
> Oh dear.  Are you sure?
>
> > On 5/21/20 2:51 PM, Martin Liška wrote:
> > >Back to this I noticed that ppc64le target build is broken due to:
>
> > >insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
> > >/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error:
> > >vcondv4sfv4sf cannot FAIL
> > >   357 | FAIL;
>
> Is it new that vcond cannot FAIL?  Because we have done that for years.
>
> Since this breaks bootstrap on a primary target, please revert the patch
> until it is sorted.
>
> > >which is caused by the 4 added optabs:
> > >
> > >+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> > >+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> > >+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> > >+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>
> > >looking at the generator:
>
> > >I get there due to:
> > >
> > >B- │516   if (find_optab (, XSTR (expand,
> > >0)))│
> > >│517
> > > {   │
> > >│518   gcc_assert (p.op <
> > > NUM_OPTABS);   │
> > >│519   if
> > > (nofail_optabs[p.op])  │
> > >│520 can_fail_p =
> > > false; │
> > >│521
> > > }   │

OK, so this is an "artifact" of direct internal functions.  We do check that
expansion does not actually FAIL before emitting calls to those IFNs.

I guess this simply makes direct internal functions not a 100% match for
our use and the way out is to add regular internal functions mapping to
the optabs.  That is, I guess, for direct-internal functions it should be
enough to check direct_internal_function_supported_p which it is not
for the case of vcond*.

Richard, do you agree?

Thanks,
Richard.

> > >
> > >#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> > >   nofail_optabs[OPTAB##_optab] = true;
>
> So yes it is new.  Please fix :-(
>
> > >Any hint what's bad? Note that x86_64-linux-gnu is fine.
> > >Do I miss a target hook?
>
> There is a new IFN that requires the existing optabs to never fail.  But
> they *do* sometimes fail.  That is what I understand from this anyway,
> please correct if needed :-)
>
> We can make the rs6000 patterns never FAIL if that is a good idea (I am
> not convinced however), but this should be documented, and all existing
> targets need to be checked.
>
> In general it is not pleasant at all to have patterns that cannot FAIL,
> it makes writing a (new) port much harder, and there can be cases where
> there is no sane code at all that can be generated for some cases, etc.
>
>
> Segher


Re: ChangeLog files - server and client scripts

2020-05-22 Thread Jakub Jelinek via Gcc-patches
On Fri, May 22, 2020 at 12:04:10PM +0100, Richard Earnshaw wrote:
> >> The directories in question are
> >>
> >> gcc/go/gofrontend
> >> libgo
> >> gcc/testsuite/go.test/test
> > 
> > The script has:
> > ignored_prefixes = [
> > 'gcc/d/dmd/',
> > 'gcc/go/frontend/',
> 
> The directory is gcc/go/gofrontend
> 
> so it's missing 'go' from frontend.

Thanks for spotting.  I believe Martin said he will be afk
today, so I've fixed it for him and committed as obvious and
am going to install into git-hooks now too.

diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index 7b61bb8915b..64a0db18e58 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,9 @@
+2020-05-22  Jakub Jelinek  
+
+   * gcc-changelog/git_commit.py: Add trailing / to
+   gcc/testsuite/go.test/test and replace gcc/go/frontend/
+   with gcc/go/gofrontend/ in ignored locations.
+
 2020-05-22  Martin Liska  
 
* gcc-changelog/git_commit.py: Add gcc/testsuite/go.test/test
diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index ba9f5ce9650..8c5fa2c0fc9 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -127,8 +127,8 @@ bug_components = set([
 
 ignored_prefixes = [
 'gcc/d/dmd/',
-'gcc/go/frontend/',
-'gcc/testsuite/go.test/test',
+'gcc/go/gofrontend/',
+'gcc/testsuite/go.test/test/',
 'libgo/',
 'libphobos/libdruntime',
 'libphobos/src/',

Jakub



Re: [PATCH][PR92658] Add missing vector truncmn2 expanders for avx512f

2020-05-22 Thread Uros Bizjak via Gcc-patches
On Fri, May 22, 2020 at 11:52 AM Hongtao Liu  wrote:
> > On a related note, it looks that pmov stores are modelled in a wrong
> > way. For example, this pattern;
> >
> > (define_insn "*avx512f_v8div16qi2_store"
> >   [(set (match_operand:V16QI 0 "memory_operand" "=m")
> > (vec_concat:V16QI
> >   (any_truncate:V8QI
> > (match_operand:V8DI 1 "register_operand" "v"))
> >   (vec_select:V8QI
> > (match_dup 0)
> > (parallel [(const_int 8) (const_int 9)
> >(const_int 10) (const_int 11)
> >(const_int 12) (const_int 13)
> >(const_int 14) (const_int 15)]]
> >
> > models the store in 128bit mode, but according to ISA, it stores in 16bit 
> > mode.
> >
> according to ISA, it stores in 64bit mode
> vpmovqb xmm1/m64 {k1}{z}, zmm2.
>
> memory_operand is 128bit but upper 64bit is not changed which means it
> store only lower 64bits, just same meaning to ISA.

Sorry, I somehow mixed insn patterns. This is the right example:

(define_insn "*avx512vl_v2div2qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
(vec_concat:V16QI
  (any_truncate:V2QI
  (match_operand:V2DI 1 "register_operand" "v"))
  (vec_select:V14QI
(match_dup 0)
(parallel [(const_int 2) (const_int 3)
   (const_int 4) (const_int 5)
   (const_int 6) (const_int 7)
   (const_int 8) (const_int 9)
   (const_int 10) (const_int 11)
   (const_int 12) (const_int 13)
   (const_int 14) (const_int 15)]]
  "TARGET_AVX512VL"
  "vpmovqb\t{%1, %0|%w0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "memory" "store")
   (set_attr "prefix" "evex")
   (set_attr "mode" "TI")])

The isa says:

EVEX.128.F3.0F38.W0 32 /r VPMOVQB xmm1/m16 {k1}{z}, xmm2

However, the pattern says that V16QImode is stored to a memory. Due to
this, insn template needs %w modifier for intel dialect, which is the
sign that something is wrong with the pattern.

These conversions should be reimplemented as having
nonimmedate_operand output operand and memory operand should be split
to a separate insn using a pre-reload splitter. Please see how sse4_1
conversions handle their input operands.

Uros.


[PATCH] tree-optimization/95268 - fix commoning of clobbers

2020-05-22 Thread Richard Biener


This fixes handling of clobbers when commoning stores.

Bootstrapped / tested on x86-64_unknown-linux-gnu.

2020-05-22  Richard Biener  

PR tree-optimization/95268
* tree-ssa-sink.c (sink_common_stores_to_bb): Handle clobbers
properly.

* g++.dg/torture/pr95268.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr95268.C | 46 ++
 gcc/tree-ssa-sink.c| 25 ++
 2 files changed, 60 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr95268.C

diff --git a/gcc/testsuite/g++.dg/torture/pr95268.C 
b/gcc/testsuite/g++.dg/torture/pr95268.C
new file mode 100644
index 000..8385b86e179
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr95268.C
@@ -0,0 +1,46 @@
+// { dg-do compile }
+// { dg-require-effective-target lp64 }
+// { dg-additional-options "-Wno-overflow" }
+
+#include 
+
+extern short var_0, var_2, var_3, var_9, var_11, var_13, var_14, var_19, 
var_22,
+var_32, var_37, var_44, var_57, var_59, var_63, var_70;
+extern unsigned var_5;
+extern char var_6, var_12, var_18, var_38, var_39, var_43, var_55, var_64,
+arr_35;
+extern long var_7, var_8, var_10, var_15, var_25, var_56;
+extern int var_21, var_36, var_51, var_65, var_68, arr_7;
+extern bool var_46, var_58, var_67;
+
+void test() {
+  var_12 = 0 >= 0;
+  var_13 = arr_7;
+  var_14 = (unsigned long)var_7 >> -564810131 + 564810189;
+  var_15 = var_5;
+  var_18 = -602739307623583391;
+  var_19 = -~0;
+  var_21 = var_10 >> var_8 - 17101301574577641170ULL;
+  var_22 = var_5;
+  var_25 = var_9;
+  var_32 = std::max((unsigned)var_2, var_5);
+  var_36 = 9557;
+  var_37 = 394545925;
+  var_38 = 0 >= 0;
+  var_39 = var_5;
+  var_43 = 0;
+  var_44 = arr_35;
+  var_46 = var_7;
+  for (short a = 0; a < 9; a = 021)
+for (short b = 0; b < 024; b += 4)
+  var_51 = std::min((long long)(var_2 ?: var_9), (long long)var_9);
+  var_55 = var_0;
+  var_56 = 3896150587;
+  var_57 = var_11;
+  var_58 = var_59 = var_11;
+  var_63 = 73;
+  var_64 = 10393232284806619711ULL;
+  var_65 = var_3;
+  var_67 = var_6;
+  var_68 = var_70 = 0;
+}
diff --git a/gcc/tree-ssa-sink.c b/gcc/tree-ssa-sink.c
index c5b535bed4d..b61ecf12d1f 100644
--- a/gcc/tree-ssa-sink.c
+++ b/gcc/tree-ssa-sink.c
@@ -534,7 +534,9 @@ sink_common_stores_to_bb (basic_block bb)
  /* ??? We could handle differing SSA uses in the LHS by inserting
 PHIs for them.  */
  else if (! operand_equal_p (gimple_assign_lhs (first_store),
- gimple_assign_lhs (def), 0))
+ gimple_assign_lhs (def), 0)
+  || (gimple_clobber_p (first_store)
+  && !gimple_clobber_p (def)))
{
  first_store = NULL;
  break;
@@ -546,16 +548,17 @@ sink_common_stores_to_bb (basic_block bb)
 
  /* Check if we need a PHI node to merge the stored values.  */
  bool allsame = true;
- for (unsigned i = 1; i < vdefs.length (); ++i)
-   {
- gimple *def = SSA_NAME_DEF_STMT (vdefs[i]);
- if (! operand_equal_p (gimple_assign_rhs1 (first_store),
-gimple_assign_rhs1 (def), 0))
-   {
- allsame = false;
- break;
-   }
-   }
+ if (!gimple_clobber_p (first_store))
+   for (unsigned i = 1; i < vdefs.length (); ++i)
+ {
+   gimple *def = SSA_NAME_DEF_STMT (vdefs[i]);
+   if (! operand_equal_p (gimple_assign_rhs1 (first_store),
+  gimple_assign_rhs1 (def), 0))
+ {
+   allsame = false;
+   break;
+ }
+ }
 
  /* We cannot handle aggregate values if we need to merge them.  */
  tree type = TREE_TYPE (gimple_assign_lhs (first_store));
-- 
2.12.3


Re: ChangeLog files - server and client scripts

2020-05-22 Thread Richard Earnshaw
On 22/05/2020 05:57, Jakub Jelinek wrote:
> On Thu, May 21, 2020 at 03:12:21PM -0700, Ian Lance Taylor via Gcc wrote:
>> Hi, this unfortunately breaks gccgo development.  Significant parts of
>> the gccgo sources are simply copied from other repositories.  Those
>> other repositories do not use ChangeLog files.  The git commit hook
>> should not require ChangeLog files for those changes.  And, when the
>> time comes, no ChangeLog files should be created for changes in those
>> directories.
>>
>> The directories in question are
>>
>> gcc/go/gofrontend
>> libgo
>> gcc/testsuite/go.test/test
> 
> The script has:
> ignored_prefixes = [
> 'gcc/d/dmd/',
> 'gcc/go/frontend/',

The directory is gcc/go/gofrontend

so it's missing 'go' from frontend.

> 'libgo/',
> 'libphobos/libdruntime',
> 'libphobos/src/',
> 'libsanitizer/',
> ]
> so perhaps it just misses gcc/testsuite/go.test/test ?
> Or what exact files you've changed in your script?
> 
>   Jakub
> 

R.


Re: [PATCH] Implement no_stack_protect attribute.

2020-05-22 Thread Richard Biener via Gcc-patches
On Thu, May 21, 2020 at 12:09 PM Martin Liška  wrote:
>
> On 5/18/20 1:49 PM, Richard Biener wrote:
> > On Mon, May 18, 2020 at 1:10 PM Jakub Jelinek via Gcc-patches
> >  wrote:
> >>
> >> On Mon, May 18, 2020 at 01:03:40PM +0200, Jakub Jelinek wrote:
>  The optimize attribute is used to specify that a function is to be 
>  compiled
>  with different optimization options than specified on the command line.
>  ```
> 
>  It's pretty clear about the command line arguments (that are ignored).
> >>>
> >>> That is not clear at all.  The difference is primarily in what the option
> >>> string says in there.
> >>
> >> And if we decide we want to keep existing behavior (haven't checked if we
> >> actually behave that way yet), we should add some syntax that will allow
> >> ammending command line options rather than replacing it.
>
> Hello.
>
> Back to this, I must say that option handling is a complex thing in the GCC.
>
> >
> > We do behave this way - while we're running against the current
> > gcc_options we call decode_options which first does
> > default_options_optimization.  So it behaves inconsistently with
> > respect to options not explicitly listed in default_options_table.
>
> Yes, we basically build on top of the currently selection flags.
> We use default_options_table because any optimization level selection
> in an optimization attribute should efficiently enforce call of 
> default_options_table.
>
> What about using them only in case one changes optimization level (patch)?

I'm not sure if that works, no -On means -O0, so one might interpret
optimize("omit-frame-pointer") as -O0 -fomit-frame-pointer.   Also -On
are not treated in position dependent way, thus -fno-tree-pre -O2 is
the same as -O2 -fno-tree-pre rather than re-enabling PRE over -fno-tree-pre.
So your patch would turn a -O2 -fno-tree-pre invocation, when
optimize("O3") is encountered, to one with PRE enabled, not matching
behavior of -O2 -fno-tree-pre -O3.

I think the only sensible way is to save the original decoded options
from the gcc invocation (and have #pragma optimize push/pop append
accordingly) and append the current attribute/pragmas optimization
string to them and run that whole thing on a default option state.

That makes semantics equivalent to appending more options to the
command-line.  Well, hopefully.  Interaction with the target attribute
might be interesting (that likely also needs to append to that
"current decoded options" set).

As you say, option handling is difficult.

Richard.

> Martin
>
> >
> > But I also think doing the whole processing as in decode_options
> > is the only thing that's really tested.  And a fix would be to not
> > start from the current set of options but from a clean slate...
> >
> > The code clearly thinks we're doing incremental changes:
> >
> >/* Save current options.  */
> >cl_optimization_save (_opts, _options);
> >
> >/* If we previously had some optimization options, use them as the
> >   default.  */
> >if (old_opts)
> >  cl_optimization_restore (_options,
> >   TREE_OPTIMIZATION (old_opts));
> >
> >/* Parse options, and update the vector.  */
> >parse_optimize_options (args, true);
> >DECL_FUNCTION_SPECIFIC_OPTIMIZATION (*node)
> >  = build_optimization_node (_options);
> >
> >/* Restore current options.  */
> >cl_optimization_restore (_options, _opts);
> >
> > so for example you should see -fipa-pta preserved from the
> > command-line while -fno-tree-pta would be overridden even
> > if not mentioned explicitely in the optimize attribute.
> >
> > IMHO the current situation is far from useful...
> >
> > Richard.
> >
> >> Could be say start the optimize attribute string with + or something
> >> similar.
> >> Does target attribute behave that way too?
> >>
> >>  Jakub
> >>
>


[PATCH] enfoce SLP_TREE_VECTYPE for invariants

2020-05-22 Thread Richard Biener


This is the patch I pushed.  There's two places that are a bit ugly,
COND_EXPR handling and reduction handling.  Both uglinesses should go
away once we go SLP-only so I did not spend more than 10 minutes looking
for a nicer solution (eh).  I'm going to push through with moderately
clean solutions to be able to see issues on other targets rather than
waiting till SLP-only is fully ready.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.


This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors
and provides some infrastructure for setting it in the vectorizable_*
functions, amending those.

2020-05-22  Richard Biener  

* tree-vectorizer.h (vect_is_simple_use): New overload.
(vect_maybe_update_slp_op_vectype): New.
* tree-vect-stmts.c (vect_is_simple_use): New overload
accessing operands of SLP vs. non-SLP operation transparently.
(vect_maybe_update_slp_op_vectype): New function updating
the possibly shared SLP operands vector type.
(vectorizable_operation): Be a bit more SLP vs non-SLP agnostic
using the new vect_is_simple_use overload;  update SLP invariant
operand nodes vector type.
(vectorizable_comparison): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_store): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_assignment): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Enforce
present SLP_TREE_VECTYPE and check it matches previous
behavior.
---
 gcc/tree-vect-loop.c  |  33 +-
 gcc/tree-vect-slp.c   |  16 ++-
 gcc/tree-vect-stmts.c | 270 +++---
 gcc/tree-vectorizer.h |   5 +
 4 files changed, 256 insertions(+), 68 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ecce348b39c..4f94b4baad9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6185,17 +6185,29 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  The last use is the reduction variable.  In case of nested cycle this
  assumption is not true: we use reduc_index to record the index of the
  reduction variable.  */
-  reduc_def = PHI_RESULT (reduc_def_phi);
+  /* ???  To get at invariant/constant uses on the SLP node we have to
+ get to it here, slp_node is still the reduction PHI.  */
+  slp_tree slp_for_stmt_info = NULL;
+  if (slp_node)
+{
+  slp_for_stmt_info = slp_node_instance->root;
+  /* And then there's reduction chain with a conversion ...  */
+  if (SLP_TREE_SCALAR_STMTS (slp_for_stmt_info)[0] != stmt_info)
+   slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
+  gcc_assert (SLP_TREE_SCALAR_STMTS (slp_for_stmt_info)[0] == stmt_info);
+}
+  slp_tree *slp_op = XALLOCAVEC (slp_tree, op_type);
   for (i = 0; i < op_type; i++)
 {
-  tree op = gimple_op (stmt, i + 1);
   /* The condition of COND_EXPR is checked in vectorizable_condition().  */
   if (i == 0 && code == COND_EXPR)
 continue;
 
   stmt_vec_info def_stmt_info;
   enum vect_def_type dt;
-  if (!vect_is_simple_use (op, loop_vinfo, , ,
+  tree op;
+  if (!vect_is_simple_use (loop_vinfo, stmt_info, slp_for_stmt_info,
+  i, , _op[i], , ,
   _stmt_info))
{
  if (dump_enabled_p ())
@@ -6729,6 +6741,21 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   return false;
 }
 
+  if (slp_node
+  && !(!single_defuse_cycle
+  && code != DOT_PROD_EXPR
+  && code != WIDEN_SUM_EXPR
+  && code != SAD_EXPR
+  && reduction_type != FOLD_LEFT_REDUCTION))
+for (i = 0; i < op_type; i++)
+  if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_in))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"incompatible vector types for invariants\n");
+ return false;
+   }
+
   if (slp_node)
 vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
   else
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 31ccaf58fc4..ec3675e7070 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3621,16 +3621,22 @@ vect_get_constant_vectors (vec_info *vinfo,
   gimple_seq ctor_seq = NULL;
   auto_vec permute_results;
 
-  /* ???  SLP analysis should compute the vector type for the
- constant / invariant and store it in the SLP node.  */
+  /* We always want SLP_TREE_VECTYPE (op_node) here correctly set.  */
+  vector_type = SLP_TREE_VECTYPE (op_node);
+{
   tree op = op_node->ops[0];
-  /* Check if vector type is a boolean vector.  */
   tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
   && 

Re: [PATCH][PR92658] Add missing vector truncmn2 expanders for avx512f

2020-05-22 Thread Hongtao Liu via Gcc-patches
On Fri, May 22, 2020 at 2:41 PM Uros Bizjak  wrote:
>
> On Fri, May 22, 2020 at 6:55 AM Hongtao Liu  wrote:
> >
> > On Thu, May 21, 2020 at 7:18 PM Uros Bizjak  wrote:
> > >
> > > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu  wrote:
> > > >
> > > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > Hi:
> > > > > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > > PR target/92658
> > > > > > * config/i386/sse.md
> > > > > > (trunc2, truncv32hiv32qi2,
> > > > > > trunc2): New expander.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > * gcc.target/i386/pr92658-avx512f.c: New test.
> > > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > > > >
> > > > > There are more conversions to be added. There are:
> > > > >
> > > > > V2DImode to V2QImode, V2HImode, V2SImode
> > > > > V4DImode to V4QImode, V4HImode, V4SImode
> > > > > V8DImode to V8QImode, V8HImode, V8SImode
> > > > >
> > > > > V4SImode to V4QImode, V4HImode
> > > > > V8SImode to V8QImode, V8HImode
> > > > > V16SImode to V16QImode, V16HImode
> > > > >
> > > > > V8HImode to V8QImode
> > > > > V16HImode to V16QImode
> > > > > V32HImode to V32QImode
> > > > >
> > > > All of them are added
> > > >
> > > > Vectorization failure: (Add xfail in testcase for them since they need
> > > > generic part)
> > > > V2DImode to V2QImode, V2HImode
> > > > V4DImode to V4QImode, V4HImode
> > > > V8DImode to V8QImode
> > > >
> > > > V4SImode to V4QImode, V4HImode
> > > > V8SImode to V8QImode
> > > >
> > > > V8HImode to V8QImode
> > > >
> > > > Vectorization success:
> > > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> > > > V4DImode to V4SImode
> > > > V8DImode to V8HImode, V8SImode
> > > >
> > > > V8SImode to V8HImode
> > > > V16SImode to V16QImode, V16HImode
> > > >
> > > > V32HImode to V32QImode
> > > > V16HImode to V16HImode.
> > > >
> > > >
> > > > > Uros.
> > > >
> > > > Update patch.
> > > > Regression test on i386/x86-64 backend is ok, bootstrap is ok.
> > > >
> > > > gcc/ChangeLog:
> > > > PR target/92658
> > > > * config/i386/sse.md
> > > > (trunc2): New expander
> > > > (truncv32hiv32qi2): Ditto.
> > > > (trunc2): Ditto.
> > > > (trunc2): Ditto.
> > > > (trunc2): Ditto.
> > > > (truncv2div2si2): Ditto.
> > > > (truncv8div8qi2): Ditto.
> > > > (avx512f_v8div16qi2): Renaming
> > > > from *avx512f_v8div16qi2.
> > > > (avx512vl_v2div2si): Renaming
> > > > from *avx512vl_v2div2si2.
> > > > (avx512vl_v2qi2): Renaming
> > > > from *avx512vl_vqi2.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > > * gcc.target/i386/pr92658-avx512f.c: New test.
> > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > >
> > >
> > > +  rtx op = simplify_subreg (V16QImode, operands[0], mode, 0);
> > > +  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);
> > >
> > > You should use simplify_gen_subreg, without null op fixup:
> > >
> > > operands[0] = simplify_gen_subreg (V16QImode, operands[0], 
> > > mode, 0);
> > >
> > Changed.
> > > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"
> > >
> > > Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
> > > without this flag.
> > >
> > Changed.
>
> +(define_expand "truncv8div8qi2"
> +  [(set (match_operand:V8QI 0 "register_operand")
> +(truncate:V8QI
> +(match_operand:V8DI 1 "register_operand")))]
> +  "TARGET_AVX512F && TARGET_MMX_WITH_SSE"
> +{
> +  operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0);
> +  emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1]));
> +  DONE;
> +})
>
> You left one here.
>
> +/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */
>
> Target selector shouldn't be needed here.
>
> The patch is OK with the above changes.
>
Changed.
> On a related note, it looks that pmov stores are modelled in a wrong
> way. For example, this pattern;
>
> (define_insn "*avx512f_v8div16qi2_store"
>   [(set (match_operand:V16QI 0 "memory_operand" "=m")
> (vec_concat:V16QI
>   (any_truncate:V8QI
> (match_operand:V8DI 1 "register_operand" "v"))
>   (vec_select:V8QI
> (match_dup 0)
> (parallel [(const_int 8) (const_int 9)
>(const_int 10) (const_int 11)
>(const_int 12) (const_int 13)
>(const_int 14) (const_int 15)]]
>
> models the store in 128bit mode, but according to ISA, it stores in 16bit 
> mode.
>
according to ISA, it stores in 64bit mode
vpmovqb xmm1/m64 {k1}{z}, zmm2.

memory_operand is 128bit but upper 64bit is not changed which means it

[PATCH] tree-optimization/95248 - fix oversight in SM rewrite

2020-05-22 Thread Richard Biener
This fixes a leftover early out in determining the sequence of stores
to materialize.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2020-05-22  Richard Biener  

PR tree-optimization/95248
* tree-ssa-loop-im.c (sm_seq_valid_bb): Remove bogus early out.

* gcc.dg/torture/pr95248.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr95248.c | 28 ++
 gcc/tree-ssa-loop-im.c |  2 --
 2 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr95248.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr95248.c 
b/gcc/testsuite/gcc.dg/torture/pr95248.c
new file mode 100644
index 000..f0efcc12b51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr95248.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
+
+int var_2 = -2013646301;
+int var_3 = -1126567434;
+unsigned int var_12 = 1;
+unsigned int var_19;
+unsigned int arr_25 [24] [21] [15] [17] [15] ;
+
+void __attribute__((noipa)) test()
+{
+  for (int a = 0; a < 3; a = 42)
+for (int b = 0; b < 20; b++)
+  for (int c = 0; c < 4; c = 4)
+for (int d = 0; d < 6; d += 4)
+  for (int e = 0; e < 4; e += 2) {
+arr_25[a][b][c][d][e] = var_2 || var_3;
+var_19 = var_12;
+  }
+}
+
+int main()
+{
+test();
+if (var_19 != 1)
+  __builtin_abort ();
+return 0;
+}
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 63f4ef8883c..fcca099355a 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -2436,8 +2436,6 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
  /* Use the sequence from the first edge and push SMs down.  */
  for (unsigned i = 0; i < first_edge_seq.length (); ++i)
{
- if (first_edge_seq[i].second == sm_other)
-   break;
  unsigned id = first_edge_seq[i].first;
  seq.safe_push (first_edge_seq[i]);
  unsigned new_idx;
-- 
2.25.1


Re: [PATCH] Let numeric_limits::is_iec559 reflect -ffast-math

2020-05-22 Thread Matthias Kretz
On Donnerstag, 21. Mai 2020 17:46:01 CEST Marc Glisse wrote:
> On Thu, 21 May 2020, Jonathan Wakely wrote:
> > On 27/04/20 17:09 +0200, Matthias Kretz wrote:
> >> From: Matthias Kretz 
> >> 
> >>PR libstdc++/84949
> >>* include/std/limits: Let is_iec559 reflect whether
> >>__GCC_IEC_559 says float and double support IEEE 754-2008.
> >>* testsuite/18_support/numeric_limits/is_iec559.cc: Test IEC559
> >>mandated behavior if is_iec559 is true.
> >>* testsuite/18_support/numeric_limits/infinity.cc: Only test inf
> >>behavior if is_iec559 is true, otherwise there is no guarantee
> >>how arithmetic on inf behaves.
> >>* testsuite/18_support/numeric_limits/quiet_NaN.cc: ditto for
> >>NaN.
> >>* testsuite/18_support/numeric_limits/denorm_min-1.cc: Compile
> >>with -ffast-math.
> >>* testsuite/18_support/numeric_limits/epsilon-1.cc: ditto.
> >>* testsuite/18_support/numeric_limits/infinity-1.cc: ditto.
> >>* testsuite/18_support/numeric_limits/is_iec559-1.cc: ditto.
> >>* testsuite/18_support/numeric_limits/quiet_NaN-1.cc: ditto.
> > 
> > I'm inclined to go ahead and commit this (to master only, obviously).
> > It certainly seems more correct to me, and we'll probably never find
> > out if it's "safe" to do unless we actually change it and see what
> > happens.
> > 
> > Marc, do you have an opinion?
> 
> I don't have a strong opinion on this. I thought we were refraining from
> changing numeric_limits based on flags (like -fwrapv for modulo) because
> that would lead to ODR violations when people link objects compiled with
> different flags. There is a value in libstdc++.so, which may have been
> compiled with different flags than the application.

But these ODR violations happen in any case: The floating-point types are 
different types with or without -ffast-math (and related) flags. They behave 
differently. Compiling a function in multiple TUs with different flags 
produces observably different results. Choosing a single one of them is 
obviously fragile and broken. That's the spirit of an ODR violation...

It would sometimes be useful to have different types:
float, float_no_nan, float_no_nan_no_signed_zero, ... 

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


Re: [PATCH] x86: Handle -mavx512vpopcntdq for -march=native

2020-05-22 Thread Uros Bizjak via Gcc-patches
On Thu, May 21, 2020 at 2:54 PM H.J. Lu  wrote:
>
> Add -mavx512vpopcntdq for -march=native if AVX512VPOPCNTDQ is available.
>
> PR target/95258
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> AVX512VPOPCNTDQ.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/driver-i386.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
> index 7612ddfb846..3a816400729 100644
> --- a/gcc/config/i386/driver-i386.c
> +++ b/gcc/config/i386/driver-i386.c
> @@ -420,6 +420,7 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>unsigned int has_avx5124fmaps = 0, has_avx5124vnniw = 0;
>unsigned int has_gfni = 0, has_avx512vbmi2 = 0;
>unsigned int has_avx512bitalg = 0;
> +  unsigned int has_avx512vpopcntdq = 0;
>unsigned int has_shstk = 0;
>unsigned int has_avx512vnni = 0, has_vaes = 0;
>unsigned int has_vpclmulqdq = 0;
> @@ -528,6 +529,7 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>has_vaes = ecx & bit_VAES;
>has_vpclmulqdq = ecx & bit_VPCLMULQDQ;
>has_avx512bitalg = ecx & bit_AVX512BITALG;
> +  has_avx512vpopcntdq = ecx & bit_AVX512VPOPCNTDQ;
>has_movdiri = ecx & bit_MOVDIRI;
>has_movdir64b = ecx & bit_MOVDIR64B;
>has_enqcmd = ecx & bit_ENQCMD;
> @@ -1189,6 +1191,7 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>const char *avx512vp2intersect = has_avx512vp2intersect ? " 
> -mavx512vp2intersect" : " -mno-avx512vp2intersect";
>const char *tsxldtrk = has_tsxldtrk ? " -mtsxldtrk " : " 
> -mno-tsxldtrk";
>const char *avx512bitalg = has_avx512bitalg ? " -mavx512bitalg" : " 
> -mno-avx512bitalg";
> +  const char *avx512vpopcntdq = has_avx512vpopcntdq ? " 
> -mavx512vpopcntdq" : " -mno-avx512vpopcntdq";
>const char *movdiri = has_movdiri ? " -mmovdiri" : " -mno-movdiri";
>const char *movdir64b = has_movdir64b ? " -mmovdir64b" : " 
> -mno-movdir64b";
>const char *enqcmd = has_enqcmd ? " -menqcmd" : " -mno-enqcmd";
> @@ -1210,9 +1213,9 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
> avx512ifma, avx512vbmi, avx5124fmaps, avx5124vnniw,
> clwb, mwaitx, clzero, pku, rdpid, gfni, shstk,
> avx512vbmi2, avx512vnni, vaes, vpclmulqdq,
> -   avx512bitalg, movdiri, movdir64b, waitpkg, cldemote,
> -   ptwrite, avx512bf16, enqcmd, avx512vp2intersect,
> -   serialize, tsxldtrk, NULL);
> +   avx512bitalg, avx512vpopcntdq, movdiri, movdir64b,
> +   waitpkg, cldemote, ptwrite, avx512bf16, enqcmd,
> +   avx512vp2intersect, serialize, tsxldtrk, NULL);
>  }
>
>  done:
> --
> 2.26.2
>


RE: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-22 Thread Yangfei (Felix)
Hi Richard,

Thanks for the suggestions.

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, May 21, 2020 5:22 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with
> fixed sve vector length
> 
> "Yangfei (Felix)"  writes:
> > Hi,
> >
> >   Notice a tiny SVE-related performance issue:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254
> >
> >   For the given test case, SLP succeeds with VNx8HImode with or without
> option -msve-vector-bits=256.
> >   The root cause for the difference is that we choose a different mode in
> aarch64_vectorize_related_mode under -msve-vector-bits=256:
> VNx2HImode instead of V4HImode.
> >   Then in the final tree ssa forwprop pass, we need to do a VIEW_CONVERT
> from V4HImode to VNx2HImode.
> >
> >   PATCH catch and simplify the pattern in
> aarch64_expand_sve_mem_move, emitting a mov pattern of V4HImode
> instead.
> >   I am assuming endianness does not make a difference here considering
> this simplification.
> >   Bootstrap and tested on aarch64-linux-gnu.  Comments?
> 
> I think we should try to catch this at the gimple level, possibly during SLP 
> itself.

Agreed.  It's better if this can be handled in SLP. 
For the given test case, the difference reflect itself after the final ssa 
forwprop. 
So I guess it might be hard for SLP to catch and evaluate in the vect cost 
model.

> Although the patch handles the case in which the V4HI is stored directly to
> memory, I assume it won't help if the code instead does:
> 
> for (i = 0; i < 4; i++)
>   b[i] = u.a[i] + 1;

SLP failed if you modify like that.  

> targetm.can_change_mode_class (..., ALL_REGS) would be a good indicator
> of whether the needed VIEW_CONVERT_EXPR is cheap or expensive.
> 
> I agree it might still be worth handling this in the move patterns too.
> It feels like a target-independent optimisation though, and for example
> should also apply to V4HI moves involving subregs of VNx2HIs.
> 
> So I think it would be worth trying to do this in emit_move_insn.
> In principle it would be useful for:
> 
>   // M1 and M2 equal size, !targetm.can_change_mode_class (M1, M2,
> ALL_REGS)
> 
>   (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
>   (set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
>   (set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
>   (set (subreg:M1 (reg:M2 ...)) (constant C))
> 
> It would be nice if we could do this even without the
> can_change_mode_class condition, provided that it doesn't turn valid M1
> constants or MEMs into invalid M2 ones (or at least, M2 ones that need to be
> validated).
> Unfortunately, going that far is likely to interfere with target-specific 
> choices,
> so it's probably too dangerous.
> 
> With the can_change_mode_class condition it should be fine though, since
> it's avoiding an implicit round trip through memory.  The change should be a
> win even if the MEMs or constants need legitimising for M2.

Good suggestion.  I worked out a v2 patch with the logic moved to in 
emit_move_insn.
For (set (subreg:M1 (reg:M2 ...)) (constant C)) case, I am simply requiring C 
should also be a legitimate constant for M2.
It works for the test case.  I will do a full test for the v2 patch if you like 
it.

> > [...]
> > +  if (inner != NULL_RTX
> > +  && aarch64_classify_vector_mode (inner_mode) == VEC_ADVSIMD
> > +  && GET_MODE_INNER (mode) == GET_MODE_INNER (inner_mode)
> > +  && known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE
> (inner_mode))
> > +  && GET_MODE_BITSIZE (inner_mode).to_constant () <= alignment)
> > +{
> > +  rtx addr, mem;
> > +  if (MEM_P (src))
> > +   {
> > + addr = XEXP (src, 0);
> > + mem = gen_rtx_MEM (inner_mode, addr);
> > + emit_move_insn (inner, mem);
> > +   }
> > +  else
> > +   {
> > + addr = XEXP (dest, 0);
> > + mem = gen_rtx_MEM (inner_mode, addr);
> > + emit_move_insn (mem, inner);
> 
> gen_rtx_MEM shouldn't be used to create new versions of existing MEMs,
> since it drops all the attributes.  It's better to use something like
> adjust_address instead.  That will also take care of making sure that the
> address is valid for the new mode.

It was my fault.  I should have realized that difference.  :-)

Felix


pr95254-v2.diff
Description: pr95254-v2.diff


[PATCH] add ctor/dtor to slp_tree

2020-05-22 Thread Richard Biener
This adds constructor and destructor to slp_tree factoring common
code.  I've not changed the wrappers to overloaded CTORs since
I hope to use object_allocator<> and am not sure whether that can
be done in any fancy way yet.

2020-05-20  Richard Biener  

* tree-vectorizer.h (_slp_tree::_slp_tree): New.
(_slp_tree::~_slp_tree): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Factor out code
from allocators.
(_slp_tree::~_slp_tree): Implement.
(vect_free_slp_tree): Simplify.
(vect_create_new_slp_node): Likewise.  Add nops parameter.
(vect_build_slp_tree_2): Adjust.
(vect_analyze_slp_instance): Likewise.
---
 gcc/tree-vect-slp.c   | 90 ++-
 gcc/tree-vectorizer.h |  3 ++
 2 files changed, 41 insertions(+), 52 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 69a2002717f..58aec173b05 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -46,6 +46,34 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 
 
+/* Initialize a SLP node.  */
+
+_slp_tree::_slp_tree ()
+{
+  SLP_TREE_SCALAR_STMTS (this) = vNULL;
+  SLP_TREE_SCALAR_OPS (this) = vNULL;
+  SLP_TREE_VEC_STMTS (this).create (0);
+  SLP_TREE_NUMBER_OF_VEC_STMTS (this) = 0;
+  SLP_TREE_CHILDREN (this) = vNULL;
+  SLP_TREE_LOAD_PERMUTATION (this) = vNULL;
+  SLP_TREE_TWO_OPERATORS (this) = false;
+  SLP_TREE_DEF_TYPE (this) = vect_uninitialized_def;
+  SLP_TREE_VECTYPE (this) = NULL_TREE;
+  this->refcnt = 1;
+  this->max_nunits = 1;
+}
+
+/* Tear down a SLP node.  */
+
+_slp_tree::~_slp_tree ()
+{
+  SLP_TREE_CHILDREN (this).release ();
+  SLP_TREE_SCALAR_STMTS (this).release ();
+  SLP_TREE_SCALAR_OPS (this).release ();
+  SLP_TREE_VEC_STMTS (this).release ();
+  SLP_TREE_LOAD_PERMUTATION (this).release ();
+}
+
 /* Recursively free the memory allocated for the SLP tree rooted at NODE.
FINAL_P is true if we have vectorized the instance or if we have
made a final decision not to vectorize the statements in any way.  */
@@ -76,13 +104,7 @@ vect_free_slp_tree (slp_tree node, bool final_p)
}
 }
 
-  SLP_TREE_CHILDREN (node).release ();
-  SLP_TREE_SCALAR_STMTS (node).release ();
-  SLP_TREE_SCALAR_OPS (node).release ();
-  SLP_TREE_VEC_STMTS (node).release ();
-  SLP_TREE_LOAD_PERMUTATION (node).release ();
-
-  free (node);
+  delete node;
 }
 
 /* Free the memory allocated for the SLP instance.  FINAL_P is true if we
@@ -101,39 +123,15 @@ vect_free_slp_instance (slp_instance instance, bool 
final_p)
 /* Create an SLP node for SCALAR_STMTS.  */
 
 static slp_tree
-vect_create_new_slp_node (vec scalar_stmts)
+vect_create_new_slp_node (vec scalar_stmts, unsigned nops)
 {
-  slp_tree node;
-  stmt_vec_info stmt_info = scalar_stmts[0];
-  unsigned int nops;
-
-  if (gcall *stmt = dyn_cast  (stmt_info->stmt))
-nops = gimple_call_num_args (stmt);
-  else if (gassign *stmt = dyn_cast  (stmt_info->stmt))
-{
-  nops = gimple_num_ops (stmt) - 1;
-  if (gimple_assign_rhs_code (stmt) == COND_EXPR)
-   nops++;
-}
-  else if (is_a  (stmt_info->stmt))
-nops = 0;
-  else
-return NULL;
-
-  node = XNEW (struct _slp_tree);
+  slp_tree node = new _slp_tree;
   SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
-  SLP_TREE_SCALAR_OPS (node) = vNULL;
-  SLP_TREE_VEC_STMTS (node).create (0);
-  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
   SLP_TREE_CHILDREN (node).create (nops);
-  SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
-  SLP_TREE_TWO_OPERATORS (node) = false;
   SLP_TREE_DEF_TYPE (node) = vect_internal_def;
-  SLP_TREE_VECTYPE (node) = NULL_TREE;
-  node->refcnt = 1;
-  node->max_nunits = 1;
 
   unsigned i;
+  stmt_vec_info stmt_info;
   FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
 STMT_VINFO_NUM_SLP_USES (stmt_info)++;
 
@@ -145,21 +143,9 @@ vect_create_new_slp_node (vec scalar_stmts)
 static slp_tree
 vect_create_new_slp_node (vec ops)
 {
-  slp_tree node;
-
-  node = XNEW (struct _slp_tree);
-  SLP_TREE_SCALAR_STMTS (node) = vNULL;
+  slp_tree node = new _slp_tree;
   SLP_TREE_SCALAR_OPS (node) = ops;
-  SLP_TREE_VEC_STMTS (node).create (0);
-  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
-  SLP_TREE_CHILDREN (node) = vNULL;
-  SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
-  SLP_TREE_TWO_OPERATORS (node) = false;
   SLP_TREE_DEF_TYPE (node) = vect_external_def;
-  SLP_TREE_VECTYPE (node) = NULL_TREE;
-  node->refcnt = 1;
-  node->max_nunits = 1;
-
   return node;
 }
 
@@ -1284,7 +1270,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
   else
return NULL;
   (*tree_size)++;
-  node = vect_create_new_slp_node (stmts);
+  node = vect_create_new_slp_node (stmts, 0);
   return node;
 }
 
@@ -1309,7 +1295,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
{
  *max_nunits = this_max_nunits;
  (*tree_size)++;
- node = vect_create_new_slp_node (stmts);
+ node = vect_create_new_slp_node (stmts, 0);

Re: [PATCH][PR92658] Add missing vector truncmn2 expanders for avx512f

2020-05-22 Thread Uros Bizjak via Gcc-patches
On Fri, May 22, 2020 at 6:55 AM Hongtao Liu  wrote:
>
> On Thu, May 21, 2020 at 7:18 PM Uros Bizjak  wrote:
> >
> > On Thu, May 21, 2020 at 7:35 AM Hongtao Liu  wrote:
> > >
> > > On Wed, May 20, 2020 at 11:43 PM Uros Bizjak  wrote:
> > > >
> > > > On Wed, May 20, 2020 at 10:35 AM Hongtao Liu  wrote:
> > > > >
> > > > > Hi:
> > > > >   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
> > > > >
> > > > > gcc/ChangeLog:
> > > > > PR target/92658
> > > > > * config/i386/sse.md
> > > > > (trunc2, truncv32hiv32qi2,
> > > > > trunc2): New expander.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > > * gcc.target/i386/pr92658-avx512f.c: New test.
> > > > > * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> > > >
> > > > There are more conversions to be added. There are:
> > > >
> > > > V2DImode to V2QImode, V2HImode, V2SImode
> > > > V4DImode to V4QImode, V4HImode, V4SImode
> > > > V8DImode to V8QImode, V8HImode, V8SImode
> > > >
> > > > V4SImode to V4QImode, V4HImode
> > > > V8SImode to V8QImode, V8HImode
> > > > V16SImode to V16QImode, V16HImode
> > > >
> > > > V8HImode to V8QImode
> > > > V16HImode to V16QImode
> > > > V32HImode to V32QImode
> > > >
> > > All of them are added
> > >
> > > Vectorization failure: (Add xfail in testcase for them since they need
> > > generic part)
> > > V2DImode to V2QImode, V2HImode
> > > V4DImode to V4QImode, V4HImode
> > > V8DImode to V8QImode
> > >
> > > V4SImode to V4QImode, V4HImode
> > > V8SImode to V8QImode
> > >
> > > V8HImode to V8QImode
> > >
> > > Vectorization success:
> > > V2DImode to V2SImode (under TARGET_MMX_WITH_SSE)
> > > V4DImode to V4SImode
> > > V8DImode to V8HImode, V8SImode
> > >
> > > V8SImode to V8HImode
> > > V16SImode to V16QImode, V16HImode
> > >
> > > V32HImode to V32QImode
> > > V16HImode to V16HImode.
> > >
> > >
> > > > Uros.
> > >
> > > Update patch.
> > > Regression test on i386/x86-64 backend is ok, bootstrap is ok.
> > >
> > > gcc/ChangeLog:
> > > PR target/92658
> > > * config/i386/sse.md
> > > (trunc2): New expander
> > > (truncv32hiv32qi2): Ditto.
> > > (trunc2): Ditto.
> > > (trunc2): Ditto.
> > > (trunc2): Ditto.
> > > (truncv2div2si2): Ditto.
> > > (truncv8div8qi2): Ditto.
> > > (avx512f_v8div16qi2): Renaming
> > > from *avx512f_v8div16qi2.
> > > (avx512vl_v2div2si): Renaming
> > > from *avx512vl_v2div2si2.
> > > (avx512vl_v2qi2): Renaming
> > > from *avx512vl_vqi2.
> > >
> > > gcc/testsuite/ChangeLog:
> > > * gcc.target/i386/pr92658-avx512f.c: New test.
> > > * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> > > * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> >
> >
> > +  rtx op = simplify_subreg (V16QImode, operands[0], mode, 0);
> > +  operands[0] = op ? op : gen_rtx_SUBREG (V16QImode, operands[0], 0);
> >
> > You should use simplify_gen_subreg, without null op fixup:
> >
> > operands[0] = simplify_gen_subreg (V16QImode, operands[0], 
> > mode, 0);
> >
> Changed.
> > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512VL"
> >
> > Do you really need TARGET_MMX_WITH_SSE?  Narrow modes are active even
> > without this flag.
> >
> Changed.

+(define_expand "truncv8div8qi2"
+  [(set (match_operand:V8QI 0 "register_operand")
+(truncate:V8QI
+(match_operand:V8DI 1 "register_operand")))]
+  "TARGET_AVX512F && TARGET_MMX_WITH_SSE"
+{
+  operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0);
+  emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1]));
+  DONE;
+})

You left one here.

+/* { dg-final { scan-assembler-times "vpmovqd" 2 { target { ! ia32 } } } } */

Target selector shouldn't be needed here.

The patch is OK with the above changes.

On a related note, it looks that pmov stores are modelled in a wrong
way. For example, this pattern;

(define_insn "*avx512f_v8div16qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
(vec_concat:V16QI
  (any_truncate:V8QI
(match_operand:V8DI 1 "register_operand" "v"))
  (vec_select:V8QI
(match_dup 0)
(parallel [(const_int 8) (const_int 9)
   (const_int 10) (const_int 11)
   (const_int 12) (const_int 13)
   (const_int 14) (const_int 15)]]

models the store in 128bit mode, but according to ISA, it stores in 16bit mode.

Uros.


Re: ChangeLog files - server and client scripts

2020-05-22 Thread Martin Liška

On 5/22/20 6:57 AM, Jakub Jelinek wrote:

so perhaps it just misses gcc/testsuite/go.test/test ?


Hello.

I've just added the location to ignored locations.


Or what exact files you've changed in your script?


@Ian: Please send us patch with git format-patch.
@Jakub: Can you please sync up the script to the server hooks? I'll
be AFK till Monday.

Martin



Re: drop -aux{dir,base}, revamp -dump{dir,base}

2020-05-22 Thread Richard Biener
On Thu, 21 May 2020, Alexandre Oliva wrote:

> On May 19, 2020, Richard Biener  wrote:
> 
> > On Tue, 19 May 2020, Alexandre Oliva wrote:
> >> I've refreshed the patch, approved back on Jan 22 for gcc-11, in
> >> refs/users/aoliva/heads/aux-dump-revamp, and committed 3 other related
> >> patches on top of it, that I hope to get approved for folding and joint
> >> installation:
> 
> > Thanks again for doing this.  May I also suggest to prepare a short
> > entry for gcc-11/changes.html with these things (like "Output of
> > auxiliary files changed.  See https://gcc.gnu.org/ml/gcc-patches/...
> > for details")?
> 
> How about, under Caveats:
> 
> * Naming and location of auxiliary and dump output files changed.  If
> you compile multiple input files in a single command, if you enable Link
> Time Optimization, or if you use -dumpbase, -dumpdir, -save-temps=*, and
> you expect any file other than the primary output file(s) to be created
> as a side effect, watch out for improvements and a few surprises.  See
> https://gcc.gnu.org/ml/...

Looks good to me.

> 
> Before I can check it all in, I'm still missing a review of:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546015.html
> 
> Did you by any chance miss it, or choose not to review it?

I looked at it shortly but decided somebody else may have more knowledge
there - I'd simply trust you'd done things correctly...

So unless anybody comes up to review and ack it consider it approved
by me after the weekend ;)

Thanks,
Richard.

> Any takers?
> 
> I've just realized I failed to mention I'd regstrapped the patchset on
> x86_64-linux-gnu, and got successful cross builds and regression test
> results for arm-elf, aarch64-elf, aarch64-vx7r2, and x86_64-windows.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)