Re: [Darwin, testsuite, committed] Fix Wnonnull on Darwin.

2019-10-21 Thread Iain Sandoe
Iain Sandoe  wrote:

> Martin Sebor  wrote:
> 
>> On 10/20/2019 07:27 AM, Iain Sandoe wrote:
>>> Martin Sebor  wrote:
 On 10/19/19 2:56 AM, Iain Sandoe wrote:
> Andreas Schwab  wrote:
>> On Okt 19 2019, Iain Sandoe  wrote:
>> 
>>> This test has failed always on Darwin, because Darwin does not mark
>>> entries in string.h with nonnull attributes.  Since the purpose of the 
>>> test
>>> is to check that the warnings are issued for an inlined function, not 
>>> that
>>> the target headers are marked up, we can provide locally marked up
>>> function declarations for Darwin.
>> 
>> If the test depends on the non-std declarations, then it should use them
>> everywhere.
> from my perspective, agreed, Martin?
 
 I don't see a problem with it.  I prefer tests that don't depend
 on system headers to avoid these kinds of issues
>>> We can do that anyway then - I can just adjust the current code to remove
>>> the special-casing, and to use __SIZE_TYPE__ instead of size_t everywhere, OK?
>> 
>> Sure.
> 
> Will try to get to it later today.

This is what I committed (will sort out backports in due course).
Tested on x86_64-linux-gnu and x86_64-darwin16.
thanks
Iain

===

[testsuite] Make the Wnonnull test independent of system headers.

To avoid the result of this test depending on the implementation of
the system 'string.h', provide prototypes for the two functions used
in the test.

gcc/testsuite/ChangeLog:

2019-10-22  Iain Sandoe  

* gcc.dg/Wnonnull.c: Provide prototypes for strlen and memcpy.
Use __SIZE_TYPE__ instead of size_t.


diff --git a/gcc/testsuite/gcc.dg/Wnonnull.c b/gcc/testsuite/gcc.dg/Wnonnull.c
index a165baa99f..0ed06aabe6 100644
--- a/gcc/testsuite/gcc.dg/Wnonnull.c
+++ b/gcc/testsuite/gcc.dg/Wnonnull.c
@@ -2,16 +2,10 @@
{ dg-do compile }
{ dg-options "-O2 -Wall" } */
 
-#ifndef __APPLE__
-#include <string.h>
-#else
-/* OSX headers do not mark up the nonnull elements yet.  */
-# include <stddef.h>
-extern size_t strlen (const char *__s)
- __attribute ((pure)) __attribute ((nonnull (1)));
+extern __SIZE_TYPE__ strlen (const char *__s)
+__attribute ((pure)) __attribute ((nonnull (1)));
 extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
-size_t __n) __attribute ((nonnull (1, 2)));
-#endif
+__SIZE_TYPE__ __n) __attribute ((nonnull (1, 2)));
 
 char buf[100];
 
@@ -23,9 +17,9 @@ struct Test
 
 __attribute ((nonnull (1, 2)))
 inline char*
-my_strcpy (char *restrict dst, const char *restrict src, size_t size)
+my_strcpy (char *restrict dst, const char *restrict src, __SIZE_TYPE__ size)
 {
-  size_t len = strlen (src);/* { dg-warning "argument 1 null where non-null expected" } */
+  __SIZE_TYPE__ len = strlen (src); /* { dg-warning "argument 1 null where non-null expected" } */
   if (len < size)
memcpy (dst, src, len + 1); /* { dg-warning "argument 2 null where non-null expected" } */
   else



Re: RFA [PATCH] * lock-and-run.sh: Check for process existence rather than timeout.

2019-10-21 Thread Jason Merrill

On 10/21/19 7:12 PM, Alexandre Oliva wrote:

On Oct 21, 2019, Jason Merrill  wrote:


On Sat, Oct 19, 2019 at 12:08 AM Alexandre Oliva  wrote:



We might also wish to use different lock-breaking logic for that case,
too, e.g. checking that the timestamp of the dir didn't change by
comparing `ls -ld $lockdir` with what we got 30 seconds before.  If it
changed or the output is now empty, we just lost the race again.



It's unlikely that the dir would remain unchanged for several seconds
without the pid file, so if we get the same timestamp after 30 seconds,
it's likely that something went wrong with the lock holder,



This patch uses stamps in the lock directory so that they are automatically
cleared when the lock changes hands.


Nice!


 though it's
not impossible to imagine a scenario in which the lock program that just
succeeded in creating the dir got stopped (^Z) or killed-9 just before
creating the PID file.


One kind of problem is a left-over lock after a kill -9.  That
doesn't provide a chance for the lock cleanup code to run in the
terminated process.  But the current code will eventually steal it
(maybe too soon) and proceed.


Another kind of problem could arise if a lock-needing program got
stopped at an unfortunate time that enabled the lock to be taken from it
(before creating the pid file), and then woke back up believing it still
has the lock.  If it were to carry out anything that really required
mutual exclusion, Bad Things (TM) could happen.

Some systems adopt a STOPITH (that's short for Shoot The Other Process
In The Head) to avoid that scenario, but we don't have that choice:
either we're stealing from a PID that's gone (safe-ish, assuming all
users on a single host, without a shared FS across separate PID
namespaces, but still potentially racy, see below), or we don't know who
we're stealing from (very likely racy, except after a full-system
crash).



Even then, maybe breaking the lock is not such a great idea in
general...



It seems to me that the downside of stealing the lock (greater resource
contention) is less than the downside of erroring out (build fails, user
has to intervene manually, possibly many hours later when they notice).


Oh, if that's indeed the only consequence, I tend to agree.  I was
worried about scenarios that actually required mutual exclusion.  For
that, stealing locks opens another can of worms since multiple processes
might decide to steal the lock at the same time, and then each one might
end up stealing the lock from others who'd just stole the lock, thinking
they're stealing from the dead or stopped process, with very
unpredictable results.


Sure, that could (very rarely) happen if one process managed to break 
the lock and recreate it in between these two lines in another process:


if test -f $lockdir/lock-$1.$$; then
rm -rf $lockdir


We should probably have comments as to the intended use, to express a
preference for serialization, so that nobody ends up using it in cases
that actually depend on mutual exclusion (e.g. to avoid corrupting some
file by concurrently changing it) without disabling the racy lock
stealing machinery:

# Shell-based mutex using mkdir.  This script is used to prefer
# serialized execution to avoid consuming too much RAM.  If reusing it,
# bear in mind that the lock-breaking logic is not race-free, so disable
# it in err() if concurrent execution could cause more serious problems.

Ok with that change or somesuch.


I mostly adopted that comment, thanks.


(Shell functions used to be non-portable, but I think even super
portable autoconf-generated configure scripts use them now)

Thanks!


Here's what I've committed:


commit f71e3aa2a5d614934ecda16987e12202a6b8007d
Author: jason 
Date:   Tue Oct 22 03:09:41 2019 +

* lock-and-run.sh: Check for process existence rather than timeout.

Matthias Klose noted that on less powerful targets, a link might take more
than 5 minutes; he mentions a figure of 3 hours for an LTO link.  So this
patch changes the timeout to a check for whether the locking process still
exists.  If the lock exists in an erroneous state (no pid file or can't
signal the pid) for 30 sec, steal it.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277277 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..22bc436565c 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -1,33 +1,49 @@
 #! /bin/sh
-# Shell-based mutex using mkdir.
+# Shell-based mutex using mkdir.  This script is used in make to prefer
+# serialized execution to avoid consuming too much RAM.  If reusing it,
+# bear in mind that the lock-breaking logic is not race-free, so disable
+# it in err() if concurrent execution could cause more serious problems.
 
+self=`basename $0`
 lockdir="$1" prog="$2"; shift 2 || exit 1
 
 # Remember when we started trying 

Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Joseph Myers
On Mon, 21 Oct 2019, Mike Stump wrote:

> This isn't helpful.  The advanced person already knows this and the 
> limitations of it, and the non-advanced people are confused by the lack 
> of certainty.  So, the web page should not say it, and saying it here, 
> doesn't help much as well.  Trying to have a git tutorial that covers 
> all the bases in two lines, isn't useful either.  So, to that end, the 
> documentation should only list git commit, git push, alone.  All other 

I've seen some versions of plain "git push" in the past warn about changed 
defaults for what it pushes, so being explicit avoids confusing people 
with that warning.  Though it seems that warning was removed from git in 
2016, so maybe avoiding it is no longer relevant.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFA [PATCH] * lock-and-run.sh: Check for process existence rather than timeout.

2019-10-21 Thread Alexandre Oliva
On Oct 21, 2019, Jason Merrill  wrote:

> On Sat, Oct 19, 2019 at 12:08 AM Alexandre Oliva  wrote:

>> We might also wish to use different lock-breaking logic for that case,
>> too, e.g. checking that the timestamp of the dir didn't change by
>> comparing `ls -ld $lockdir` with what we got 30 seconds before.  If it
>> changed or the output is now empty, we just lost the race again.

>> It's unlikely that the dir would remain unchanged for several seconds
>> without the pid file, so if we get the same timestamp after 30 seconds,
>> it's likely that something went wrong with the lock holder,

> This patch uses stamps in the lock directory so that they are automatically
> cleared when the lock changes hands.

Nice!

>> though it's
>> not impossible to imagine a scenario in which the lock program that just
>> succeeded in creating the dir got stopped (^Z) or killed-9 just before
>> creating the PID file.

One kind of problem is a left-over lock after a kill -9.  That
doesn't provide a chance for the lock cleanup code to run in the
terminated process.  But the current code will eventually steal it
(maybe too soon) and proceed.


Another kind of problem could arise if a lock-needing program got
stopped at an unfortunate time that enabled the lock to be taken from it
(before creating the pid file), and then woke back up believing it still
has the lock.  If it were to carry out anything that really required
mutual exclusion, Bad Things (TM) could happen.

Some systems adopt a STOPITH (that's short for Shoot The Other Process
In The Head) to avoid that scenario, but we don't have that choice:
either we're stealing from a PID that's gone (safe-ish, assuming all
users on a single host, without a shared FS across separate PID
namespaces, but still potentially racy, see below), or we don't know who
we're stealing from (very likely racy, except after a full-system
crash).


>> Even then, maybe breaking the lock is not such a great idea in
>> general...

> It seems to me that the downside of stealing the lock (greater resource
> contention) is less than the downside of erroring out (build fails, user
> has to intervene manually, possibly many hours later when they notice).

Oh, if that's indeed the only consequence, I tend to agree.  I was
worried about scenarios that actually required mutual exclusion.  For
that, stealing locks opens another can of worms since multiple processes
might decide to steal the lock at the same time, and then each one might
end up stealing the lock from others who'd just stole the lock, thinking
they're stealing from the dead or stopped process, with very
unpredictable results.  In short, it's not hard to implement lock
breaking; it's just very hard to make it race-free.


We should probably have comments as to the intended use, to express a
preference for serialization, so that nobody ends up using it in cases
that actually depend on mutual exclusion (e.g. to avoid corrupting some
file by concurrently changing it) without disabling the racy lock
stealing machinery:

# Shell-based mutex using mkdir.  This script is used to prefer
# serialized execution to avoid consuming too much RAM.  If reusing it,
# bear in mind that the lock-breaking logic is not race-free, so disable
# it in err() if concurrent execution could cause more serious problems.

Ok with that change or somesuch.

(Shell functions used to be non-portable, but I think even super
portable autoconf-generated configure scripts use them now)

Thanks!

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[PATCH] Fix incorrect merge of conflictant names in `dump_graphviz`

2019-10-21 Thread Giuliano Belinassi
Hi,

When using lto-dump -callgraph with two or more .o files containing distinct
functions with the same name, dump_graphviz incorrectly merged those functions
into a single node. This patch fixes this issue by calling `dump_name` instead
of `name`, thereby concatenating the function name with the node's id.

To understand the issue, let's say you have two files:

a.c:
static void foo (void) { do_something (); }

b.c:
static void foo (void) { do_something_else (); }

These are distinct functions and should be represented as distinct nodes in
the callgraph dump.



Is it OK for me to commit this to trunk as trivial? And is there anything
special that I should know before committing to trunk?



gcc/ChangeLog:
2019-07-22  Giuliano Belinassi  

* cgraph.c (dump_graphviz): Change name to dump_name.


Giuliano.
diff --git gcc/cgraph.c gcc/cgraph.c
index 8b752d83809..671306db5ca 100644
--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -2155,7 +2155,7 @@ cgraph_node::dump_graphviz (FILE *f)
 {
   cgraph_node *callee = edge->callee;
 
-  fprintf (f, "\t\"%s\" -> \"%s\"\n", name (), callee->name ());
+  fprintf (f, "\t\"%s\" -> \"%s\"\n", dump_name (), callee->dump_name ());
 }
 }
 


Re: [PATCH] Use narrow mode of constant when expanding widening multiplication

2019-10-21 Thread Jozef Lawrynowicz
On Mon, 21 Oct 2019 08:40:11 +0100
Richard Sandiford  wrote:

> Jozef Lawrynowicz  writes:
> > I experienced the following ICE when working on a downstream patch for 
> > msp430:
> >
> > void
> > foo (unsigned int r, unsigned int y)
> > {
> >   __builtin_umul_overflow ((unsigned int) (-1), y, &r);
> > }
> >  
> >> msp430-elf-gcc -S tester.c -O0  
> >
> > tester.c: In function 'foo':
> > tester.c:4:1: error: unrecognizable insn:
> > 4 | }
> >   | ^
> > (insn 16 15 17 2 (set (reg:HI 32)
> > (const_int 65535 [0xffff])) "tester.c":3:3 -1
> >  (nil))
> > during RTL pass: vregs
> > dump file: tester.c.234r.vregs
> > tester.c:4:1: internal compiler error: in extract_insn, at recog.c:2311
> >
> > Following discussions on ml/gcc
> > (https://gcc.gnu.org/ml/gcc/2019-10/msg00083.html), I narrowed this down to 
> > a
> > call to expand_mult_highpart_adjust in expand_expr_real_2.
> >
> > If one of the operands is a constant, its mode had been converted to the 
> > wide
> > mode of our multiplication to generate some RTL, but not converted back to 
> > the
> > narrow mode before expanding what will be the high part of the result of the
> > multiplication.
> >
> > If we look at the other two uses of expand_mult_highpart_adjust in the 
> > sources,
> > (both in expmed.c (expmed_mult_highpart_optab)) we can see that the narrow
> > version of the constant is always used:
> >   if (tem)
> > /* We used the wrong signedness.  Adjust the result.  */
> > return expand_mult_highpart_adjust (mode, tem, op0, narrow_op1,
> > tem, unsignedp);
> >
> > So the attached patch updates the use in expand_expr_real_2 to also use the
> > narrow version of the constant operand.
> > This fixes the aforementioned ICE.  
> 
> TBH, I don't understand why we have:
> 
> if (TREE_CODE (treeop1) == INTEGER_CST)
>   op1 = convert_modes (mode, word_mode, op1,
>TYPE_UNSIGNED (TREE_TYPE (treeop1)));
> 
> All the following code treats op1 as having the original mode (word_mode),
> and having the same mode as op0.  That's what both the optab and (like you
> say) expand_mult_highpart_adjust expect.
> 
> So unless I'm missing something, it looks like all the code does is
> introduce exactly this bug.
> 
> AFAICT from the history, having separate code for INTEGER_CST dates back
> to when this was an expand peephole for normal MULT_EXPRs.  In that case
> we had to handle two cases: when op1 was a conversion from a narrower type,
> and when it was an INTEGER_CST with a suitable range.  The separate
> INTEGER_CST case then got carried along in further updates but I think
> became redundant when the code was moved to WIDEN_MULT_EXPR.
> 
> Removing the above is pre-approved if it works.

Thanks for providing that background info. I'm happy to just remove that code as
you suggest, since it also fixes the ICE.

Applied the attached patch.
Bootstrapped and regtested on trunk for x86_64-pc-linux-gnu.
Regtested on trunk for msp430-elf.

Jozef

> 
> Thanks,
> Richard

>From ad651a1e1fcd0c508c43088a9480d7122dafebde Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Mon, 21 Oct 2019 20:44:16 +
Subject: [PATCH] 2019-10-21  Jozef Lawrynowicz  

	* expr.c (expand_expr_real_2): Don't widen constant op1 when expanding
	widening multiplication.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277271 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog | 5 +
 gcc/expr.c| 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 50121777940..2aa5536cf32 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-10-21  Jozef Lawrynowicz  
+
+	* expr.c (expand_expr_real_2): Don't widen constant op1 when expanding
+	widening multiplication.
+
 2019-10-21  Richard Earnshaw  
 
 	* config/arm/iterators.md (t2_binop0): Fix typo in comment.
diff --git a/gcc/expr.c b/gcc/expr.c
index b54bf1d3dc5..476c6865f20 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8954,9 +8954,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		 != INTEGER_CST check.  Handle it.  */
 		  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
 		goto widen_mult_const;
-		  if (TREE_CODE (treeop1) == INTEGER_CST)
-		op1 = convert_modes (mode, word_mode, op1,
-	 TYPE_UNSIGNED (TREE_TYPE (treeop1)));
 		  temp = expand_binop (mode, other_optab, op0, op1, target,
    unsignedp, OPTAB_LIB_WIDEN);
 		  hipart = gen_highpart (word_mode, temp);
-- 
2.17.1



Re: [PATCH] fix constrained auto parsing issue

2019-10-21 Thread Jason Merrill

On 10/21/19 4:40 PM, Andrew Sutton wrote:

In cp_parser_simple_type_specifier:

if (!type && flag_concepts && decl_specs)
{
   /* Try for a type-constraint with template arguments.  We check
  decl_specs here to avoid trying this for a functional cast.  */
  ...

It's subtle.


Aha.  Please reference cp_parser_simple_type_specifier in the newly 
extended comment, then; OK with that change.



On Mon, Oct 21, 2019 at 2:22 PM Jason Merrill  wrote:


On 10/17/19 10:36 AM, Andrew Sutton wrote:

This fixes a parsing bug with constrained placeholders used as the
first parameter of a constructor.



+  Parse with an empty set of declaration specifiers since we're
+  trying to match a type-specifier of the first parameter.  */


Why does that make a difference?

Jason





Re: [C++ PATCH] Implement P1073R3: Immediate functions (PR c++/88335)

2019-10-21 Thread Jason Merrill

On 10/21/19 2:49 PM, Jakub Jelinek wrote:

On Mon, Oct 21, 2019 at 01:59:38PM -0400, Jason Merrill wrote:

I agree that cp_fold isn't the right place, since it does lowering for GCC
internals, and consteval expansion is part of the language semantics.  So it
should happen before constexpr body saving, as in your patch.

It seems like we want a finish_full_expression, and do consteval folding
there (and not from finish_function).


The finish_function was just an attempt to limit the number of places where
we need to call it to a minimum.


Fair enough.  Maybe just a finish_unevaluated_operand then.


+  if (init && TREE_STATIC (decl))
+init = cxx_eval_consteval (init);


Do we need this in addition to the cxx_constant_init in store_init_value?

Jason


Re: [PATCH] fix constrained auto parsing issue

2019-10-21 Thread Andrew Sutton
In cp_parser_simple_type_specifier:

if (!type && flag_concepts && decl_specs)
{
  /* Try for a type-constraint with template arguments.  We check
 decl_specs here to avoid trying this for a functional cast.  */
 ...

It's subtle.

Andrew Sutton

On Mon, Oct 21, 2019 at 2:22 PM Jason Merrill  wrote:
>
> On 10/17/19 10:36 AM, Andrew Sutton wrote:
> > This fixes a parsing bug with constrained placeholders used as the
> > first parameter of a constructor.
>
> > +  Parse with an empty set of declaration specifiers since we're
> > +  trying to match a type-specifier of the first parameter.  */
>
> Why does that make a difference?
>
> Jason
>


Re: [PATCH][PR83534] C++17: typeinfo for noexcept function lacks noexcept information

2019-10-21 Thread Jason Merrill

On 10/4/19 8:17 AM, kamlesh kumar wrote:

bootstrapped and regtested on x86_64.

ChangeLog:
2019-10-04  Kamlesh Kumar  
 * rtti.c (get_tinfo_decl_dynamic): Do not call
 TYPE_MAIN_VARIANT for function.
 (get_typeid): Likewise.
 * g++.dg/rtti/pr83534.C: New Test.


Thanks!  Word wrap from gmail mangled your patch a bit, so next time 
please attach it instead, or see 
https://git-scm.com/docs/git-format-patch#_mua_specific_hints


It looks like you don't have a copyright assignment on file; this patch 
is small enough not to need one, but larger patches will.  The form for 
starting this process can be found at 
http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future



diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index eb1b062a49b..8467e77f7ac 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -276,7 +276,9 @@ get_tinfo_decl_dynamic (tree exp, tsubst_flags_t complain)
type = non_reference (unlowered_expr_type (exp));

/* Peel off cv qualifiers.  */
-  type = TYPE_MAIN_VARIANT (type);
+  if (TREE_CODE (type) != FUNCTION_TYPE
+|| cxx_dialect < cxx17)
+type = TYPE_MAIN_VARIANT (type);


Since what we're trying to do is strip cv-qualifiers, we might as well 
use the cv_unqualified function.



/* For UNKNOWN_TYPEs call complete_type_or_else to get diagnostics.  */
if (CLASS_TYPE_P (type) || type == unknown_type_node
@@ -298,6 +300,8 @@ get_tinfo_decl_dynamic (tree exp, tsubst_flags_t complain)
t = build_vtbl_ref (exp, index);
t = convert (type_info_ptr_type, t);
  }
+  else if(TREE_CODE (type) == FUNCTION_TYPE)
+   t = get_tinfo_ptr (type);
else
  /* Otherwise return the type_info for the static type of the expr.  */
  t = get_tinfo_ptr (TYPE_MAIN_VARIANT (type));


And the TYPE_MAIN_VARIANT here was redundant, so we can just drop it.


diff --git a/gcc/testsuite/g++.dg/rtti/pr83534.C b/gcc/testsuite/g++.dg/rtti/pr83534.C
new file mode 100644
index 000..405ea5c2af4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/pr83534.C
@@ -0,0 +1,14 @@
+// { dg-options "-std=c++17" }
+// { dg-do run }
+
+#include <typeinfo>
+void f1();
+void f2() noexcept;
+int main() {
+
+if((typeid(void()) == typeid(void ()noexcept))
+|| (typeid() == typeid())
+|| (typeid(f1) == typeid(f2)))
+  abort();


This failed for me due to using abort without #include <cstdlib>

Here's what I'm committing:
commit dd14a7b9d1397cf365153ebbe48802873480464e
Author: kamlesh kumar 
Date:   Fri Oct 4 17:47:28 2019 +0530

PR c++/83534 - typeinfo for noexcept function lacks noexcept information

2019-10-21  Kamlesh Kumar  

* rtti.c (get_tinfo_decl_dynamic): Do not call
TYPE_MAIN_VARIANT for function.
(get_typeid): Likewise.

* g++.dg/rtti/pr83534.C: New Test.

Reviewed-by: Jason Merrill 

diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index eb1b062a49b..c905799481e 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -272,11 +272,11 @@ get_tinfo_decl_dynamic (tree exp, tsubst_flags_t complain)
 
   exp = resolve_nondeduced_context (exp, complain);
 
-  /* peel back references, so they match.  */
+  /* Peel back references, so they match.  */
   type = non_reference (unlowered_expr_type (exp));
 
   /* Peel off cv qualifiers.  */
-  type = TYPE_MAIN_VARIANT (type);
+  type = cv_unqualified (type);
 
   /* For UNKNOWN_TYPEs call complete_type_or_else to get diagnostics.  */
   if (CLASS_TYPE_P (type) || type == unknown_type_node
@@ -300,7 +300,7 @@ get_tinfo_decl_dynamic (tree exp, tsubst_flags_t complain)
 }
   else
 /* Otherwise return the type_info for the static type of the expr.  */
-t = get_tinfo_ptr (TYPE_MAIN_VARIANT (type));
+t = get_tinfo_ptr (type);
 
   return cp_build_fold_indirect_ref (t);
 }
@@ -518,7 +518,7 @@ get_typeid (tree type, tsubst_flags_t complain)
 
   /* The top-level cv-qualifiers of the lvalue expression or the type-id
  that is the operand of typeid are always ignored.  */
-  type = TYPE_MAIN_VARIANT (type);
+  type = cv_unqualified (type);
 
   /* For UNKNOWN_TYPEs call complete_type_or_else to get diagnostics.  */
   if (CLASS_TYPE_P (type) || type == unknown_type_node
diff --git a/gcc/testsuite/g++.dg/rtti/pr83534.C b/gcc/testsuite/g++.dg/rtti/pr83534.C
new file mode 100644
index 000..af5f02ebb92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/pr83534.C
@@ -0,0 +1,13 @@
+// { dg-options "-std=c++17" }
+// { dg-do run }
+
+#include <typeinfo>
+
+void f1();
+void f2() noexcept;
+int main() {
+  if((typeid(void()) == typeid(void ()noexcept))
+ || (typeid() == typeid())
+ || (typeid(f1) == typeid(f2)))
+__builtin_abort();
+}


Re: [SVE] PR91272

2019-10-21 Thread Prathamesh Kulkarni
On Fri, 18 Oct 2019 at 14:36, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The attached patch tries to fix PR91272.
> > Does it look OK ?
> >
> > With patch, I see following failures for aarch64-sve.exp:
> > FAIL: gcc.target/aarch64/sve/clastb_1.c -march=armv8.2-a+sve
> > scan-assembler \\tclastb\\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\\.s
> > FAIL: gcc.target/aarch64/sve/clastb_2.c -march=armv8.2-a+sve
> > scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.s
> > FAIL: gcc.target/aarch64/sve/clastb_3.c -march=armv8.2-a+sve
> > scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.b
> > FAIL: gcc.target/aarch64/sve/clastb_5.c -march=armv8.2-a+sve
> > scan-assembler \\tclastb\\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\\.d
> >
> > For instance, in clastb_1.c, it now emits:
> > clastb  s1, p1, s1, z0.s
> > while using a fully predicated loop.
> > Should I adjust the tests ?
>
> Yeah, that's an improvement really.
>
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index acdd90784dc..2cad2cb94c8 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -10016,7 +10016,8 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >/* See whether another part of the vectorized code applies a loop
> >mask to the condition, or to its inverse.  */
> >
> > -  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +   && reduction_type != EXTRACT_LAST_REDUCTION)
> >   {
> > scalar_cond_masked_key cond (cond_expr, ncopies);
> > if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>
> The context here is:
>
>   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> {
>   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, 
> j);
> }
>   else
> {
>   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>   cond.code = invert_tree_comparison (cond.code, honor_nans);
>   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> {
>   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>   vectype, j);
>   cond_code = cond.code;
>   swap_cond_operands = true;
> }
> }
>
> Rather than have another instance of vect_get_loop_mask, I think
> it would cleaner to use a nonnull "masks" to decide whether to apply
> the loop mask:
>
>   vec_loop_masks *masks = NULL;
>   if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> {
>   if (reduction_type == EXTRACT_LAST_REDUCTION
>   || loop_vinfo->scalar_cond_masked_set.contains (cond))
> masks = &LOOP_VINFO_MASKS (loop_vinfo);
>   else
> {
>   ...
> }
>
> Then:
>
> > @@ -10116,6 +10117,15 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >vec_then_clause = vec_oprnds2[i];
> >vec_else_clause = vec_oprnds3[i];
> >
> > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +   && reduction_type == EXTRACT_LAST_REDUCTION)
> > + {
> > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
> > + }
> > +
>
> ...do this vect_get_loop_mask under the condition of "if (masks)".
>
> > if (swap_cond_operands)
> >   std::swap (vec_then_clause, vec_else_clause);
> >
> > @@ -10180,7 +10190,7 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >vec != { 0, ... } (masked in the MASK_LOAD,
> >unmasked in the VEC_COND_EXPR).  */
> >
> > -   if (loop_mask)
> > +   if (loop_mask && reduction_type != EXTRACT_LAST_REDUCTION)
> >   {
> > if (COMPARISON_CLASS_P (vec_compare))
> >   {
> > @@ -10220,6 +10230,16 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> > vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > vec_compare = vec_compare_name;
> >   }
>
> The code above here:
>
>   if (!is_gimple_val (vec_compare))
> {
>   tree vec_compare_name = make_ssa_name (vec_cmp_type);
>   gassign *new_stmt = gimple_build_assign (vec_compare_name,
>vec_compare);
>   

Re: [PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 04:46:47PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:17, Segher Boessenkool wrote:
> >Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
> >combining more than two insns, always?  I'll experiment.

> For the 2-insn case we don't try a split and simply give up; but when we 
> have a three-insn sequence we then combine tries harder and the generic 
> splitter does rewrite that in 2 insns.

When doing a 2-2 combination, we don't try to split currently.  We could,
if that ever is useful, but that needs more work (to make sure we won't
loop, for one thing).

> Perhaps combine should never try a 3, or 4 insn combination where i0 or 
> i1 is a simple reg-reg move and only feeds one subsequent insn in the 
> sequence on the basis that if that was really going to help then it 
> would have been caught by the two-insn sequence.  If it feeds both i2 
> and i3 then that's a different matter, of course.

Yeah...  Maybe a simple move as producer where the reg dies in the consumer
should not be tried with 3 or more insns?  (Never doing combinations with
trivial copies results in a few in a thousand extra insns, pretty bad).


Segher


Re: Ping: [C][C++] Avoid exposing internal details in aka types

2019-10-21 Thread Jason Merrill

On 10/11/19 10:17 AM, Richard Sandiford wrote:

Richard Sandiford  writes:

Marek Polacek  writes:

On Thu, Oct 10, 2019 at 08:00:53PM +0100, Richard Sandiford wrote:

Ping

Richard Sandiford  writes:

The current aka diagnostics can sometimes leak internal details that
seem more likely to be distracting than useful.  E.g. on aarch64:

   void f (va_list *va) { *va = 1; }

gives:

   incompatible types when assigning to type ‘va_list’ {aka ‘__va_list’} from 
type ‘int’

where __va_list isn't something the user is expected to know about.
A similar thing happens for C++ on the arm_neon.h-based:

   float x;
   int8x8_t y = x;

which gives:

   cannot convert ‘float’ to ‘int8x8_t’ {aka ‘__Int8x8_t’} in initialization

This is accurate -- and __Int8x8_t is defined by the AArch64 PCS --
but it's not going to be meaningful to most users.


Agreed.


+/* Return true if it is worth exposing the DECL_ORIGINAL_TYPE of TYPE to
+   the user in diagnostics, false if it would be better to use TYPE itself.
+   TYPE is known to satisfy typedef_variant_p.  */
+
+bool
+user_facing_original_type_p (const_tree type)
+{
+  gcc_assert (typedef_variant_p (type));
+  tree decl = TYPE_NAME (type);
+
+  /* Look through any typedef in "user" code.  */
+  if (!DECL_IN_SYSTEM_HEADER (decl) && !DECL_IS_BUILTIN (decl))
+    return true;
+
+  /* If the original type is also named and is in the user namespace,
+     assume it too is a user-facing type.  */
+  tree orig_type = DECL_ORIGINAL_TYPE (decl);
+  if (tree orig_id = TYPE_IDENTIFIER (orig_type))
+    {
+      const char *name = IDENTIFIER_POINTER (orig_id);
+      if (name[0] != '_' || (name[1] != '_' && !ISUPPER (name[1])))
+	return true;


This looks like name_reserved_for_implementation_p.
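For reference, the condition quoted above can be sketched as a standalone predicate. This is an illustrative reimplementation only, not GCC's actual name_reserved_for_implementation_p helper:

```cpp
#include <cassert>
#include <cctype>

// Illustrative sketch of the reserved-name heuristic discussed above:
// a name is treated as reserved for the implementation if it begins
// with two underscores, or with an underscore followed by an uppercase
// letter.  This standalone version mirrors the quoted condition; the
// real helper lives in the C-family front end.
static bool reserved_for_implementation_p (const char *name)
{
  if (name[0] != '_')
    return false;
  return name[1] == '_' || isupper ((unsigned char) name[1]) != 0;
}
```

Under this heuristic `__Int8x8_t` is classified as internal while `int8x8_t` is user-facing, matching the "aka" behaviour described earlier in the thread.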

The rest looks fine to me!


Ah, nice!  I'd looked for a helper but missed that one.

Here's just the C parts, with that change.


And here are the C++ parts, on top of that.  I should probably have
split them from the outset.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


OK.

Jason


Richard

[Comment from the original covering note, in case it helps:

strip_typedefs had:

  /* Explicitly get the underlying type, as TYPE_MAIN_VARIANT doesn't
 strip typedefs with attributes.  */
  result = TYPE_MAIN_VARIANT (DECL_ORIGINAL_TYPE (TYPE_NAME (t)));
  result = strip_typedefs (result);

Applying TYPE_MAIN_VARIANT predates the strip_typedefs call, with the
comment originally contrasting with plain:

  result = TYPE_MAIN_VARIANT (t);

But the recursive call to strip_typedefs will apply TYPE_MAIN_VARIANT,
so it doesn't seem necessary to do it here too.  I think there was also
a missing "remove_attributes" argument, since wrapping something in a
typedef shouldn't change which attributes get removed.]

2019-10-11  Richard Sandiford  

gcc/cp/
* cp-tree.h (STF_USER_VISIBLE): New constant.
(strip_typedefs, strip_typedefs_expr): Take a flags argument.
* tree.c (strip_typedefs, strip_typedefs_expr): Likewise,
updating mutual calls accordingly.  When STF_USER_VISIBLE is true,
only look through typedefs if user_facing_original_type_p.
* error.c (dump_template_bindings, type_to_string): Pass
STF_USER_VISIBLE to strip_typedefs.
(dump_type): Likewise, unless pp_c_flag_gnu_v3 is set.

gcc/testsuite/
* g++.dg/diagnostic/aka5.h: New test.
* g++.dg/diagnostic/aka5a.C: Likewise.
* g++.dg/diagnostic/aka5b.C: Likewise.
* g++.target/aarch64/diag_aka_1.C: Likewise.

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h2019-10-08 09:23:43.0 +0100
+++ gcc/cp/cp-tree.h2019-10-11 15:12:12.544911643 +0100
@@ -5696,6 +5696,13 @@ #define TFF_NO_OMIT_DEFAULT_TEMPLATE_ARG
  #define TFF_NO_TEMPLATE_BINDINGS  (1 << 13)
  #define TFF_POINTER   (1 << 14)
  
+/* These constants can be used as bit flags to control strip_typedefs.
+
+   STF_USER_VISIBLE: use heuristics to try to avoid stripping user-facing
+   aliases of internal details.  This is intended for diagnostics,
+   where it should (for example) give more useful "aka" types.  */
+const unsigned int STF_USER_VISIBLE = 1U;
+
  /* Returns the TEMPLATE_DECL associated to a TEMPLATE_TEMPLATE_PARM
 node.  */
  #define TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL(NODE)\
@@ -7254,8 +7261,10 @@ extern int zero_init_p   
(const_tree);
  extern bool check_abi_tag_redeclaration   (const_tree, const_tree,
 const_tree);
  extern bool check_abi_tag_args(tree, tree);
-extern tree strip_typedefs (tree, bool * = NULL);
-extern tree strip_typedefs_expr(tree, bool * = NULL);
+extern tree strip_typedefs (tree, bool * = NULL,
+ 

Re: [C++ PATCH] Implement P1073R3: Immediate functions (PR c++/88335)

2019-10-21 Thread Jakub Jelinek
On Mon, Oct 21, 2019 at 01:59:38PM -0400, Jason Merrill wrote:
> I agree that cp_fold isn't the right place, since it does lowering for GCC
> internals, and consteval expansion is part of the language semantics.  So it
> should happen before constexpr body saving, as in your patch.
> 
> It seems like we want a finish_full_expression, and do consteval folding
> there (and not from finish_function).

Do we have a reasonable place to catch the full expressions?

I mean, perhaps add_stmt if stmts_are_full_exprs_p () and the spots where
unevaluated operands are handled in the patch, but an immediate invocation
is also a full-expression, but for default argument expressions we don't
know whether the calls will be immediate invocations or not.
Aren't there hundreds of other places where full expressions are handled
though?  I mean, e.g. all expressions in OpenMP/OpenACC clauses, etc.?
If the consteval evaluation is performed from this finish_full_expression,
we'd also need to *walk_subtrees = 0; if we encounter already handled full
expressions in there (e.g. statement expressions etc.).

The finish_function was just an attempt to limit the number of places where
we need to call it to a minimum.

Jakub



Re: [C++] Don't fold __builtin_constant_p prematurely

2019-10-21 Thread Jason Merrill

OK, thanks.

On 9/27/19 2:07 PM, Marc Glisse wrote:

Ping https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00808.html
with one more potential reviewer in Cc.

On Wed, 11 Sep 2019, Marc Glisse wrote:


Ping

On Tue, 3 Sep 2019, Marc Glisse wrote:


On Fri, 2 Aug 2019, Marc Glisse wrote:


Ping

On Tue, 16 Jul 2019, Marc Glisse wrote:


Adding a C++ maintainer in Cc:
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00808.html

On Wed, 10 Jul 2019, Marc Glisse wrote:


Hello,

this avoids folding __builtin_constant_p to 0 early when we are 
not forced to do so. Clearly this has an effect, since it 
uncovered a bug in wi::lshift, fixed today ;-)


I wasn't sure about using |= or just =, the first one seemed more 
conservative.
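For readers unfamiliar with the builtin, here is a minimal sketch of its semantics (assuming GCC or Clang, where __builtin_constant_p is available); it shows why folding it to 0 too early loses information:

```cpp
#include <cassert>

// __builtin_constant_p (a GCC/Clang extension) evaluates to 1 when the
// compiler can prove its argument is a compile-time constant, and to 0
// otherwise.  Folding it to 0 prematurely in the front end would
// pessimize cases that only become constant later, e.g. after inlining.
static int literal_is_constant ()
{
  return __builtin_constant_p (42);   // always a provable constant
}

static int runtime_is_constant (int x)
{
  return __builtin_constant_p (x);    // depends on what the optimizer knows
}
```

A call like `runtime_is_constant (7)` may fold to 1 once inlining exposes the literal, which is exactly the kind of case kept alive by deferring the folding.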


Bootstrap+regtest on x86_64-pc-linux-gnu.

2019-07-11  Marc Glisse  

gcc/cp/
* constexpr.c (cxx_eval_builtin_function_call): Only set
force_folding_builtin_constant_p if manifestly_const_eval.

gcc/testsuite/
* g++.dg/pr85746.C: New file.









Re: [C++ Patch] Improve cp_parser_class_head error recovery

2019-10-21 Thread Jason Merrill

On 10/18/19 11:35 AM, Paolo Carlini wrote:

Hi,

a few days ago I noticed that for, say, g++.dg/parse/qualified2.C we 
were issuing two additional misleading errors after the first one, 
mentioning in particular a certain "unnamed class" (I'm reproducing only 
the error messages proper):


namespace Glib {
   template  class Value {};
   template <> class Glib::Value {}; // { dg-error "" }
}

qualified2.C:3:29: error: extra qualification not allowed [\-fpermissive\]
qualified2.C:3:46: error: explicit specialization of non-template 
‘Glib::’
qualified2.C:3:47: error: abstract declarator ‘Glib::’ 
used as declaration


Let's see if I can explain clearly enough what I think is going on.

In cp_parser_class_head, upon the permerror about the extra 
qualification, we try to do error recovery, which is particularly 
tricky, because we are dealing with a permerror thus we have to make 
sure that in case of -fpermissive everything remains internally 
consistent anyway. In this context, clearing 'nested_name_specifier' and 
'num_templates' doesn't seem a good idea because it does *not* give us 
an internal state similar to the one normally obtained when the nested 
name specifier is not there, the reason being that, earlier in the 
function, when a nested name specifier really isn't there we try 
cp_parser_template_id or in case cp_parser_identifier, which set the 
locale 'id' and possibly 'template_id' and 'num_templates', whereas 
during error recovery we remain so to speak completely empty handed. 
Thus, what about not clearing anything? That seems to work at least for 
the two testcases below and doesn't cause regressions.


Thanks, Paolo.

/




OK.

Jason



Re: C++ PATCH for c++/92062 - ODR-use ignored for static member of class template

2019-10-21 Thread Jason Merrill

On 10/11/19 4:23 PM, Marek Polacek wrote:

has_value_dependent_address wasn't stripping location wrappers so it
gave the wrong answer for "" in the static_assert.  That led us to
thinking that the expression isn't instantiation-dependent, and we
skipped static initialization of A<0>::x.

This patch adds stripping so that has_value_dependent_address gives the
same answer as it used to before the location wrappers addition.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 9?

2019-10-11  Marek Polacek  

PR c++/92062 - ODR-use ignored for static member of class template.
* pt.c (has_value_dependent_address): Strip location wrappers.

* g++.dg/cpp0x/constexpr-odr1.C: New test.
* g++.dg/cpp0x/constexpr-odr2.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 84464436991..521d0c56002 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -6542,6 +6542,8 @@ check_valid_ptrmem_cst_expr (tree type, tree expr,
  static bool
  has_value_dependent_address (tree op)
  {
+  STRIP_ANY_LOCATION_WRAPPER (op);
+
/* We could use get_inner_reference here, but there's no need;
   this is only relevant for template non-type arguments, which
   can only be expressed as   */
diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-odr1.C 
gcc/testsuite/g++.dg/cpp0x/constexpr-odr1.C
new file mode 100644
index 000..cf3f95f0565
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-odr1.C
@@ -0,0 +1,19 @@
+// PR c++/92062 - ODR-use ignored for static member of class template.
+// { dg-do run { target c++11 } }
+
+template struct A {
+  static const bool x;
+  static_assert(, ""); // odr-uses A<...>::x
+};
+
+int g;
+
+template
+const bool A::x = (g = 42, false);
+
+void f(A<0>) {}// A<0> must be complete, so is instantiated
+int main()
+{
+  if (g != 42)
+__builtin_abort ();
+}
diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-odr2.C 
gcc/testsuite/g++.dg/cpp0x/constexpr-odr2.C
new file mode 100644
index 000..0927488e569
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-odr2.C
@@ -0,0 +1,19 @@
+// PR c++/92062 - ODR-use ignored for static member of class template.
+// { dg-do run { target c++11 } }
+
+template struct A {
+  static const bool x;
+  enum { force_instantiation =! }; // odr-uses A<...>::x
+};
+
+int g;
+
+template
+const bool A::x = (g = 42, false);
+
+void f(A<0>) {}// A<0> must be complete, so is instantiated
+int main()
+{
+  if (g != 42)
+__builtin_abort ();
+}



OK.

Jason



Re: [PATCH] fix constrained auto parsing issue

2019-10-21 Thread Jason Merrill

On 10/17/19 10:36 AM, Andrew Sutton wrote:

This fixes a parsing bug with constrained placeholders uses as the
first parameter of a constructor.



+Parse with an empty set of declaration specifiers since we're
+trying to match a type-specifier of the first parameter.  */


Why does that make a difference?

Jason



Re: [C++ PATCH] Fix handling of location wrappers in constexpr evaluation (PR c++/92015)

2019-10-21 Thread Jason Merrill

On 10/17/19 3:18 AM, Jakub Jelinek wrote:

Hi!

As written in the PR, location wrappers are stripped by
cxx_eval_constant_expression as the first thing it does after dealing with
jump_target.  The problem is that when this function is called on a
CONSTRUCTOR, is TREE_CONSTANT and reduced_constant_expression_p (which
allows location wrappers around constants), we don't recurse on the
CONSTRUCTOR elements and so nothing strips them away.  Then when
cxx_eval_component_reference or bit_field_ref we actually can run into those
location wrappers and the callers assume they don't appear anymore.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.


I admit I'm not 100% sure about the second case which will call the function
to do the stripping for all the elts until the one found.  The alternative
would be:
tree bitpos = bit_position (field);
if (bitpos == start && DECL_SIZE (field) == TREE_OPERAND (t, 1))
-   return value;
+   {
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ return value;
+   }
if (TREE_CODE (TREE_TYPE (field)) == INTEGER_TYPE
- && TREE_CODE (value) == INTEGER_CST
  && tree_fits_shwi_p (bitpos)
  && tree_fits_shwi_p (DECL_SIZE (field)))
{
  HOST_WIDE_INT bit = tree_to_shwi (bitpos);
  HOST_WIDE_INT sz = tree_to_shwi (DECL_SIZE (field));
  HOST_WIDE_INT shift;
  if (bit >= istart && bit + sz <= istart + isize)
{
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ if (TREE_CODE (value) != INTEGER_CST)
+   continue;
but then we'd do those tree_fits_shwi_p/tree_to_shwi.  Guess
a micro-optimization without a clear winner.

2019-10-17  Jakub Jelinek  

PR c++/92015
* constexpr.c (cxx_eval_component_reference, cxx_eval_bit_field_ref):
Use STRIP_ANY_LOCATION_WRAPPER on CONSTRUCTOR elts.

* g++.dg/cpp0x/constexpr-92015.C: New test.

--- gcc/cp/constexpr.c.jj   2019-10-16 09:30:57.300112739 +0200
+++ gcc/cp/constexpr.c  2019-10-16 17:06:03.943539476 +0200
@@ -2887,7 +2887,10 @@ cxx_eval_component_reference (const cons
  : field == part)
{
  if (value)
-   return value;
+   {
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ return value;
+   }
  else
/* We're in the middle of initializing it.  */
break;
@@ -2977,6 +2980,7 @@ cxx_eval_bit_field_ref (const constexpr_
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (whole), i, field, value)
  {
tree bitpos = bit_position (field);
+  STRIP_ANY_LOCATION_WRAPPER (value);
if (bitpos == start && DECL_SIZE (field) == TREE_OPERAND (t, 1))
return value;
if (TREE_CODE (TREE_TYPE (field)) == INTEGER_TYPE
--- gcc/testsuite/g++.dg/cpp0x/constexpr-92015.C.jj 2019-10-16 
17:16:44.204871319 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-92015.C2019-10-16 
17:14:14.773127884 +0200
@@ -0,0 +1,7 @@
+// PR c++/92015
+// { dg-do compile { target c++11 } }
+
+struct S1 { char c[6] {'h', 'e', 'l', 'l', 'o', 0}; };
+struct S2 { char c[6] = "hello"; };
+static_assert (S1{}.c[0] == 'h', "");
+static_assert (S2{}.c[0] == 'h', "");

Jakub





Re: [PATCH] diagnose hard errors in concept satisfaction

2019-10-21 Thread Jason Merrill

On 10/15/19 3:57 PM, Andrew Sutton wrote:

Certain errors encountered during constraint satisfaction render the
program ill-formed. Emit those as errors during satisfaction and not
when diagnosing constraint errors.

The errors should include the full context for failure (i.e., when
satisfying X, when satisfying Y, this failed), but we don't build that
information until we know we need to diagnose an error. This patch
does not include that context.


I guess that's a reason to do a second pass to diagnose these things if 
we got error_mark_node the first time.



+  if (info.quiet ())
+   error_at (loc, "constraint does not have type %");


Giving the error only when quiet needs a comment.

Jason



Re: C++ PATCH for c++/92106 - ICE with structured bindings and -Wreturn-local-addr

2019-10-21 Thread Jason Merrill

On 10/15/19 2:41 PM, Jakub Jelinek wrote:

On Tue, Oct 15, 2019 at 02:28:12PM -0400, Marek Polacek wrote:

--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/decomp50.C
@@ -0,0 +1,51 @@
+// PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr.
+// { dg-do compile { target c++17 } }
+
+template  struct B;
+template  struct B<_Tp *> { typedef _Tp reference; };


I've looked at the unreduced testcase and it seems to have
template  struct B<_Tp *> { typedef _Tp& reference; };
for this line instead.  I'd think it would be better to change the testcase,
so that it is well defined rather than undefined then.
It is struct iterator_traits<_Tp*>, right?

Otherwise, the patch LGTM, but I'll defer to Jason as the maintainer.


OK with that change.

Jason



Re: [C++ PATCH] Implement P1073R3: Immediate functions (PR c++/88335)

2019-10-21 Thread Jason Merrill

On 10/15/19 1:04 PM, Jakub Jelinek wrote:

Hi!

The following patch implements P1073R3, i.e. consteval, except that
virtual consteval is not supported (I think support for that would need to
include the consteval virtual methods at the end of the binfo structures
after all non-consteval virtual methods, but make sure we don't actually
emit those in the actual virtual tables etc.).


Perhaps we should bypass the existing virtual function call mechanism 
for consteval, and instead find the complete object directly and call 
non-virtually.



Unlike the previous implementation, this doesn't invoke consteval function
already during parsing, but later on, so there aren't issues with say
consteval constructors or consteval in default arguments.


Right, we can't immediately evaluate a class prvalue before we know what 
object it's initializing.



Initially I thought the best spot to do that would be during cp_fold, but
that isn't called e.g. on template function bodies before instantiation,
or on the expressions from unevaluated contexts, so I had to add a function
to walk trees and evaluate consteval function calls found in there.
Furthermore, cp_fold isn't called just during cp_fold_function, but also
during fold_for_warn etc., so some consteval functions might be evaluated
multiple times, which can be a problem that if there are errors, they might
be emitted multiple times.


I agree that cp_fold isn't the right place, since it does lowering for 
GCC internals, and consteval expansion is part of the language 
semantics.  So it should happen before constexpr body saving, as in your 
patch.


It seems like we want a finish_full_expression, and do consteval folding 
there (and not from finish_function).


Jason



[Patch, fortran] PR91926 - assumed rank optional

2019-10-21 Thread Paul Richard Thomas
Please find attached a patch to keep 9-branch up to speed with trunk
as far as the ISO_Fortran_binding feature is concerned.

It bootstraps and regtests on 9-branch and incorporates the correction
for PR92027, which caused problems for trunk on certain platforms.

OK to commit?

Paul

2019-10-21  Paul Thomas  

Backport from trunk
PR fortran/91926
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Correct the
assignment of the attribute field to account correctly for an
assumed shape dummy. Assign separately to the gfc and cfi
descriptors since the attribute can be different. Add branch to
correctly handle missing optional dummies.

2019-10-21  Paul Thomas  

Backport from trunk
PR fortran/91926
* gfortran.dg/ISO_Fortran_binding_13.f90 : New test.
* gfortran.dg/ISO_Fortran_binding_13.c : Additional source.
* gfortran.dg/ISO_Fortran_binding_14.f90 : New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c	(revision 276015)
--- gcc/fortran/trans-expr.c	(working copy)
*** gfc_conv_gfc_desc_to_cfi_desc (gfc_se *p
*** 4989,4995 
--- 5006,5014 
tree gfc_desc_ptr;
tree type;
tree cond;
+   tree desc_attr;
int attribute;
+   int cfi_attribute;
symbol_attribute attr = gfc_expr_attr (e);
stmtblock_t block;

*** gfc_conv_gfc_desc_to_cfi_desc (gfc_se *p
*** 4998,5009 
attribute = 2;
if (!e->rank || gfc_get_full_arrayspec_from_expr (e))
  {
!   if (fsym->attr.pointer)
  	attribute = 0;
!   else if (fsym->attr.allocatable)
  	attribute = 1;
  }

if (e->rank != 0)
  {
parmse->force_no_tmp = 1;
--- 5017,5036 
attribute = 2;
if (!e->rank || gfc_get_full_arrayspec_from_expr (e))
  {
!   if (attr.pointer)
  	attribute = 0;
!   else if (attr.allocatable)
  	attribute = 1;
  }

+   /* If the formal argument is assumed shape and neither a pointer nor
+      allocatable, it is unconditionally CFI_attribute_other.  */
+   if (fsym->as->type == AS_ASSUMED_SHAPE
+       && !fsym->attr.pointer && !fsym->attr.allocatable)
+     cfi_attribute = 2;
+   else
+     cfi_attribute = attribute;
+
if (e->rank != 0)
  {
parmse->force_no_tmp = 1;
*** gfc_conv_gfc_desc_to_cfi_desc (gfc_se *p
*** 5070,5080 
  		parmse->expr, attr);
  }

!   /* Set the CFI attribute field.  */
!   tmp = gfc_conv_descriptor_attribute (parmse->expr);
tmp = fold_build2_loc (input_location, MODIFY_EXPR,
! 			 void_type_node, tmp,
! 			 build_int_cst (TREE_TYPE (tmp), attribute));
gfc_add_expr_to_block (>pre, tmp);

/* Now pass the gfc_descriptor by reference.  */
--- 5097,5108 
  		parmse->expr, attr);
  }

!   /* Set the CFI attribute field through a temporary value for the
!  gfc attribute.  */
!   desc_attr = gfc_conv_descriptor_attribute (parmse->expr);
tmp = fold_build2_loc (input_location, MODIFY_EXPR,
! 			 void_type_node, desc_attr,
! 			 build_int_cst (TREE_TYPE (desc_attr), cfi_attribute));
gfc_add_expr_to_block (>pre, tmp);

/* Now pass the gfc_descriptor by reference.  */
*** gfc_conv_gfc_desc_to_cfi_desc (gfc_se *p
*** 5092,5097 
--- 5120,5131 
  			 gfor_fndecl_gfc_to_cfi, 2, tmp, gfc_desc_ptr);
gfc_add_expr_to_block (>pre, tmp);

+   /* Now set the gfc descriptor attribute.  */
+   tmp = fold_build2_loc (input_location, MODIFY_EXPR,
+ 			 void_type_node, desc_attr,
+ 			 build_int_cst (TREE_TYPE (desc_attr), attribute));
+   gfc_add_expr_to_block (>pre, tmp);
+
/* The CFI descriptor is passed to the bind_C procedure.  */
parmse->expr = cfi_desc_ptr;

*** gfc_conv_gfc_desc_to_cfi_desc (gfc_se *p
*** 5112,5117 
--- 5146,5170 
tmp = build_call_expr_loc (input_location,
  			 gfor_fndecl_cfi_to_gfc, 2, gfc_desc_ptr, tmp);
gfc_prepend_expr_to_block (>post, tmp);
+
+   /* Deal with an optional dummy being passed to an optional formal arg
+  by finishing the pre and post blocks and making their execution
+  conditional on the dummy being present.  */
+   if (fsym->attr.optional && e->expr_type == EXPR_VARIABLE
+   && e->symtree->n.sym->attr.optional)
+ {
+   cond = gfc_conv_expr_present (e->symtree->n.sym);
+   tmp = fold_build2 (MODIFY_EXPR, void_type_node,
+ 			 cfi_desc_ptr,
+ 			 build_int_cst (pvoid_type_node, 0));
+   tmp = build3_v (COND_EXPR, cond,
+ 		  gfc_finish_block (>pre), tmp);
+   gfc_add_expr_to_block (>pre, tmp);
+   tmp = build3_v (COND_EXPR, cond,
+ 		  gfc_finish_block (>post),
+ 		  build_empty_stmt (input_location));
+   gfc_add_expr_to_block (>post, tmp);
+ }
  }


Index: gcc/testsuite/gfortran.dg/ISO_Fortran_binding_13.c
===
*** gcc/testsuite/gfortran.dg/ISO_Fortran_binding_13.c	(nonexistent)
--- 

Re: RFA [PATCH] * lock-and-run.sh: Check for process existence rather than timeout.

2019-10-21 Thread Jason Merrill

On 10/21/19 11:35 AM, Jason Merrill wrote:
On Sat, Oct 19, 2019 at 12:08 AM Alexandre Oliva wrote:


Hello, Jason,

On Oct 14, 2019, Jason Merrill wrote:

 > Alex, you had a lot of helpful comments when I first wrote this,
any thoughts
 > on this revision?

I think the check of the pid file could be made slightly simpler and
cheaper if we created it using:

    echo $$ > $lockdir/pidT && mv $lockdir/pidT $lockdir/pid

instead of

 > +touch $lockdir/$$



 > +     pid="`(cd $lockdir; echo *)`"

The ""s are implicit in a shell assignment, though there really
shouldn't be more than one PID-named file in the dir.  With the change
suggested above, this would become

         pid=`cat $lockdir/pid 2>/dev/null`


Done.

There's a slight possibility of hitting this right between the creation
of the dir and the creation of the pid file, thus the 2>/dev/null.

 > +     if ps "$pid" >/dev/null; then

could be tested with much lower overhead:

         if test -z "$pid" || kill -0 $pid ; then


Done

though it might make sense to have a different test and error message
for the case of the absent pid file.


Done.

We might also wish to use different lock-breaking logic for that case,
too, e.g. checking that the timestamp of the dir didn't change by
comparing `ls -ld $lockdir` with what we got 30 seconds before.  If it
changed or the output is now empty, we just lost the race again.

It's unlikely that the dir would remain unchanged for several seconds
without the pid file, so if we get the same timestamp after 30 seconds,
it's likely that something went wrong with the lock holder, though it's
not impossible to imagine a scenario in which the lock program that just
succeeded in creating the dir got stopped (^Z) or killed-9 just before
creating the PID file.


This patch uses stamps in the lock directory so that they are 
automatically cleared when the lock changes hands.


Even then, maybe breaking the lock is not such a great idea in
general...

Though mkdir is an operation that forces a synchronization, reading a
file without a filesystem lock isn't.  The rename alleviates that a bit,
but it's not entirely unreasonable for an attempt to read the file to
cache the absence of the file and not notice a creation shortly
afterward.  This would be a lot more of an issue in case of contention
for the lock between different clients of a shared networked filesystem,
though we might imagine highly parallel systems to eventually hit such
issues as well.

But just the possibility of contention across a shared network
filesystem would give me pause, out of realizing that checking for a
local process with the same PID would do no good.  And then, kill -0
would not succeed if the lock holder was started by a different user,
unlike ps.

What if we printed an error message suggesting the command to clean up,
and then errored out, instead of retrying forever or breaking the lock
and proceeding?  Several programs that rely on lock files (git and svn
come to mind) seem to be taking such an approach these days, presumably
because of all the difficulties in automating the double-checking in all
potential scenarios.


It seems to me that the downside of stealing the lock (greater resource 
contention) is less than the downside of erroring out (build fails, 
user has to intervene manually, possibly many hours later when they notice).


How's this?


...once more, with less HTML.
commit 157d9c29a21ab0a5d854f53ca51ca9a7986a4e64
Author: Jason Merrill 
Date:   Sat Sep 14 14:02:20 2019 -0400

* lock-and-run.sh: Check for process existence rather than timeout.

Matthias Klose noted that on less powerful targets, a link might take more
than 5 minutes; he mentions a figure of 3 hours for an LTO link.  So this
patch changes the timeout to a check for whether the locking process still
exists.  If the lock exists in an erroneous state (no pid file or can't
signal the pid) for 30 sec, steal it.
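The core of the script's technique can be sketched in plain POSIX C++ (the helper names here are invented for illustration; the real implementation is the shell script below): mkdir() acts as an atomic test-and-create, and kill(pid, 0) probes whether the recorded lock holder still exists without actually sending a signal.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Sketch of the mkdir-based mutex (POSIX assumed; helper names are
// hypothetical).  mkdir() either atomically creates the lock directory
// or fails because it already exists, so exactly one contender wins.
// The winner records its pid so others can check its liveness.
static bool try_acquire (const std::string &lockdir)
{
  if (mkdir (lockdir.c_str (), 0777) != 0)
    return false;                       // lock already held (or error)
  FILE *f = fopen ((lockdir + "/pid").c_str (), "w");
  if (f)
    {
      fprintf (f, "%ld\n", (long) getpid ());
      fclose (f);
    }
  return true;
}

static bool holder_alive (long pid)
{
  // Signal 0 performs only the existence/permission check.
  return kill ((pid_t) pid, 0) == 0 || errno == EPERM;
}
```

A contender that loses try_acquire would read lockdir/pid and poll holder_alive for a while before deciding the lock is stale, mirroring the 30-second check in the script.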

diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..c1ae16bbb01 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -1,33 +1,43 @@
 #! /bin/sh
 # Shell-based mutex using mkdir.
 
+self=`basename $0`
 lockdir="$1" prog="$2"; shift 2 || exit 1
 
 # Remember when we started trying to acquire the lock.
 count=0
-touch lock-stamp.$$
 
-trap 'rm -r "$lockdir" lock-stamp.$$' 0
+err () {
+if test -f $lockdir/lock-$1.$$; then
+	echo "$self: *** (PID $$) removing stale $lockdir" >&2
+	rm -rf $lockdir
+else
+	touch $lockdir/lock-$1.$$
+fi
+}
 
 until mkdir "$lockdir" 2>/dev/null; do
 # Say something periodically so the user knows what's up.
 if [ `expr $count % 30` = 0 ]; then
-	# Reset if the lock has been renewed.
-	if [ -n 

Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 05:06:20PM +0100, Richard Earnshaw (lists) wrote:
> On 21/10/2019 16:46, Segher Boessenkool wrote:
> >>>There also is the insn_cost hook, which especially for RISC-like targets
> >>>is a lot easier to define.
> >>
> >>Easier, but not a complete replacement for rtx_costs, so not necessarily
> >>easier in the end...
> >
> >It isn't a full replacement *yet*, still chipping away at it.  If your
> >port has an rtx_cost already, adding ai reasonable insn_cost will only
> >improve it, not regress anything.
> >
> >But there are some places that still need rtx_costs, yes.
> >
> >Do you have anything in particular in mind?  PRs welcome!
> 
> Nothing specific.  I have vague recollections of a few places that 
> generated quite complex RTL expressions and then expected rtx_costs to 
> do the decomposition into the cost of the sequence for computing that. 
> Perhaps the mul synthesis code was like that, but I'm probably 
> misremembering it.

Yeah, and there are a few other (less important) places that do not have
anything like an insn.

> Certainly an API where you *know* the RTL is a legal insn and not 
> something made up by the mid-end would be a real step forward -

That is exactly what insn_cost is.  Well, it takes an rtx_insn *, even.

> I think 
> there are still places in the mid-end that generate (PLUS:QI ...), for 
> example, and simple expect rtx_cost to return a useful answer even on a 
> RISC target with no such instruction.

Yup.

CSE still uses rtx_cost directly.  avoid_expensive_constant does, too, and
some other optabs stuff.

Less important are the uses in dojump, expr, expmed (there is your "mult"),

But the really big ones left are set_src_cost and set_rtx_cost I think.


Segher


Re: [PATCH, Fortran] Allow CHARACTER literals in assignments and DATA statements - for review

2019-10-21 Thread Steve Kargl
On Mon, Oct 21, 2019 at 03:40:27PM +0100, Mark Eggleston wrote:
> This is an extension to support a legacy feature supported by other 
> compilers such as flang and the sun compiler.  As I understand it this 
> feature is associated with DEC so it is enabled using 
> -fdec-char-conversions and by -fdec.
> 

Without this patch, the following code

% cat a.f90
  integer(4) a
  a = '1234'
  print *, a
  end

produces an error

% gfcx -c a.f90
a.f90:2:6:

2 |   a = '1234'
  |  1
Error: Cannot convert CHARACTER(4) to INTEGER(4) at (1)

With the new option or -fdec, I would hope/expect a
warning would be issued, which can only be suppressed
by -w, e.g., 

% gfcx -c -fdec a.f90
a.f90:2:6:

2 |   a = '1234'
  |  1
Warning: Nonstandard conversion of CHARACTER(4) to INTEGER(4) at (1)

This hopefully will encourage people to fix their codes.

-- 
Steve


Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Richard Earnshaw (lists)

On 21/10/2019 16:46, Segher Boessenkool wrote:

On Mon, Oct 21, 2019 at 03:46:53PM +0100, Richard Earnshaw (lists) wrote:

On 19/10/2019 14:00, Segher Boessenkool wrote:

On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:


The cost routine for Arm and Thumb2 was not recognising the idioms that
describe the addition with carry, this results in the instructions
appearing more expensive than they really are, which occasionally can lead
to poor choices by combine.  Recognising all the possible variants is
a little trickier than normal because the expressions can become complex
enough that there is no single canonical form.


There also is the insn_cost hook, which especially for RISC-like targets
is a lot easier to define.


Easier, but not a complete replacement for rtx_costs, so not necessarily
easier in the end...


It isn't a full replacement *yet*, still chipping away at it.  If your
port has an rtx_cost already, adding a reasonable insn_cost will only
improve it, not regress anything.

But there are some places that still need rtx_costs, yes.

Do you have anything in particular in mind?  PRs welcome!


Segher



Nothing specific.  I have vague recollections of a few places that 
generated quite complex RTL expressions and then expected rtx_costs to 
do the decomposition into the cost of the sequence for computing that. 
Perhaps the mul synthesis code was like that, but I'm probably 
misremembering it.


Certainly an API where you *know* the RTL is a legal insn and not 
something made up by the mid-end would be a real step forward - I think 
there are still places in the mid-end that generate (PLUS:QI ...), for 
example, and simple expect rtx_cost to return a useful answer even on a 
RISC target with no such instruction.


R.




Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 08:20:33AM -0700, Mike Stump wrote:
> On Oct 21, 2019, at 3:30 AM, Segher Boessenkool  
> wrote:
> > 
> > On Sun, Oct 20, 2019 at 06:06:30PM +0200, Gerald Pfeifer wrote:
> >> On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
> >>> +Use "git commit" and "git push origin
> >>> +master" to check in the patch.
> >> 
> >> I will admit I made a couple of first commits without reading those
> >> details and just used a plain "git push".  
> >> 
> >> Is there any problem with that, any drawback?
> >> 
> >> Or could we simplify those instructions?
> > 
> > Neither command works in all cases
> 
> This isn't helpful.

I was answering Gerald's question.  I hope the answer was useful.

I was not recommending either "git push" or "git push origin master".
Each has its drawbacks.  That was my *point*.

Either is fine in this doc, in my opinion.


Segher


Re: Move code out of vect_slp_analyze_bb_1

2019-10-21 Thread Richard Biener
On October 21, 2019 4:06:49 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford
>>  wrote:
>>>
>>> Richard Biener  writes:
>>> > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford
> wrote:
>>> >>Richard Biener  writes:
>>> >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
>>> >> wrote:
>>> After the previous patch, it seems more natural to apply the
>>> PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
>>> the region is, rather than delaying it to vect_slp_analyze_bb_1.
>>> (But rather than carve out the biggest region possible and then
>>> reject it, wouldn't it be better to stop when the region gets
>>> too big, so we at least have a chance of vectorising something?)
>>> 
>>> It also seems more natural for vect_slp_bb_region to create the
>>> bb_vec_info itself rather than (a) having to pass bits of data
>down
>>> for the initialisation and (b) forcing vect_slp_analyze_bb_1 to
>free
>>> on every failure return.
>>> 
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to
>install?
>>> >>>
>>> >>> Ok. But I wonder what the reason was for this limit? Dependency
>>> >>analysis was greatly simplified, being no longer quadratic here.
>Can
>>> >>you check where the limit originally came from? But indeed
>splitting
>>> >>the region makes more sense then, but at dataref group boundaries
>I'd
>>> >>say.
>>> >>
>>> >>Yeah, looks it was the complexity of dependence analysis:
>>> >>
>>> >>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
>>> >
>>> > OK.  We no longer compute dependences between all memory refs but only verify we
>can do those code-motions we need to do. That's of course much harder
>to check a bound on upfront (it's still quadratic in the worst case).
>I'm also not sure this is ever a problem, but we might instead count
>the number of stmts involving memory?
>>>
>>> OK, here's what I tried.  It exposes quadratic behaviour for
>>> gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
>>> all in the name of "vectorising" using V1DI. :-)
>>>
>>> In that test we have N data references in which the even elements
>appear
>>> to be vectorisable but the odd elements don't.  First time round, we
>hit:
>>>
>>>   /* For basic block SLP, try to break the group up into multiples
>of the
>>>  vector size.  */
>>>   unsigned HOST_WIDE_INT const_nunits;
>>>   if (is_a  (vinfo)
>>>   && STMT_VINFO_GROUPED_ACCESS (stmt_info)
>>>   && DR_GROUP_FIRST_ELEMENT (stmt_info)
>>>   && nunits.is_constant (&const_nunits))
>>>
>>> This breaks off the leading even element and recurses, discards the
>>> leading odd element, and then recurses on the rest.  We then come
>round
>>> to the same code when recursing on the rest.
>>>
>>> It looks like matches is valid for all elements, so I guess we
>should
>>> divide the group based on all false elements, not just the first?
>>
>> What do you mean with "valid for all elements"?  matches[i] is
>> true or false for all elements?  If it is true we shouldn't split
>> at all.
>
>Initially I'd wondered whether the elements were undefined after the
>first "false", i.e. if we broke early at that point.  But it looks like
>true and false elements after the first false are still meaningful.
>
>> But indeed the recursion is bad if we split out one vector a step,
>> we should split the whole group in "half" instead.
>>
>> Still the code should ensure we have a "half" that works OK
>> and one that possibly doesn't.
>>
>> OK, I see we have {true, true, false  } for
>> matches but the 2nd iteration gets {true, false, ... } and doesn't
>> split for me.  Of course if you have V1DI then you'll notice
>> that matches[0] is _always_ true ... I guess we should simply
>> never try splitting for const_nunits == 1?  Or even never
>> do BB vectorization with such a vector type.
>
>I don't think it's specific to const_nunits == 1.  E.g. if we have
>matches == { true, true, false, true } repeating for const_nunits == 2,
>we'll have the same problem of trying:
>
>  N
>  2, N-4
> 2, N-8
>2, N-12,
>   2, N-16
>  
>
>which is still quadratic.

Yeah. 

>Dividing genuinely into half would fix that, so would dividing the
>whole group based on the group1_size. 

I think the smaller split was meant as optimization and originally we didn't 
recurse on the rest... So yes, simply halving might help. 

 Or we could try using the
>whole matches array to divide the group up into chunks that are
>worth recursing on, with each chunk having at most half the original
>group size and with the matches elements for each chunk being all-true
>or all-false.
>
>Thanks,
>Richard



[arm] clean up alu+shift patterns

2019-10-21 Thread Richard Earnshaw (lists)
My DImode arithmetic patches introduced a bug on thumb2 where we could 
generate a register controlled shift into an ALU operation.  In fairness 
the bug was always present, but latent.


As part of cleaning this up (and auditing to ensure I've caught them all 
this time) I've gone through all the shift generating patterns in the MD 
files and cleaned them up, reducing some duplicate patterns between the 
arm and thumb2 descriptions where we can now share the same pattern.  In 
some cases we were missing the shift attribute; in most cases I've 
eliminated an ugly attribute setting using the fact that we normally 
need separate alternatives for shift immediate and shift reg to simplify 
the logic.


* config/arm/iterators.md (t2_binop0): Fix typo in comment.
* config/arm/arm.md (addsi3_carryin_shift): Simplify selection of the
type attribute.
(subsi3_carryin_shift): Separate into register and constant controlled
alternatives.  Use shift_amount_operand for operand 4.  Set shift
attribute and simplify type attribute.
(subsi3_carryin_shift_alt): Likewise.
(rsbsi3_carryin_shift): Likewise.
(rsbsi3_carryin_shift_alt): Likewise.
(andsi_not_shiftsi_si): Enable for TARGET_32BIT.  Separate constant
and register controlled shifts into distinct alternatives.
(andsi_not_shiftsi_si_scc_no_reuse): Likewise.
(andsi_not_shiftsi_si_scc): Likewise.
(arm_cmpsi_negshiftsi_si): Likewise.
(not_shiftsi): Remove redundant M constraint from alternative 1.
(not_shiftsi_compare0): Likewise.
(arm_cmpsi_insn): Remove redundant alternative 2.
(cmpsi_shift_swp): Likewise.
(sub_shiftsi): Likewise.
(sub_shiftsi_compare0_scratch): Likewise.
* config/arm/thumb2.md (thumb_andsi_not_shiftsi_si): Delete pattern.
(thumb2_cmpsi_neg_shiftsi): Likewise.

Committed to trunk.

R.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7ef0c16580d..039fdd02479 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1338,9 +1338,7 @@ (define_insn "*addsi3_carryin_shift"
(set_attr "arch" "32,a")
(set_attr "shift" "3")
(set_attr "predicable" "yes")
-   (set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
-		  (const_string "alu_shift_imm")
-		  (const_string "alu_shift_reg")))]
+   (set_attr "type" "alu_shift_imm,alu_shift_reg")]
 )
 
 (define_insn "*addsi3_carryin_clobercc"
@@ -1719,71 +1717,68 @@ (define_insn "*subsi3_carryin_const0"
 )
 
 (define_insn "*subsi3_carryin_shift"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(minus:SI (minus:SI
-		   (match_operand:SI 1 "s_register_operand" "r")
+		   (match_operand:SI 1 "s_register_operand" "r,r")
 		   (match_operator:SI 2 "shift_operator"
-		[(match_operand:SI 3 "s_register_operand" "r")
-		 (match_operand:SI 4 "reg_or_int_operand" "rM")]))
+		[(match_operand:SI 3 "s_register_operand" "r,r")
+		 (match_operand:SI 4 "shift_amount_operand" "M,r")]))
 		  (match_operand:SI 5 "arm_borrow_operation" "")))]
   "TARGET_32BIT"
   "sbc%?\\t%0, %1, %3%S2"
   [(set_attr "conds" "use")
+   (set_attr "arch" "32,a")
+   (set_attr "shift" "3")
(set_attr "predicable" "yes")
-   (set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
-(const_string "alu_shift_imm")
-(const_string "alu_shift_reg")))]
+   (set_attr "type" "alu_shift_imm,alu_shift_reg")]
 )
 
 (define_insn "*subsi3_carryin_shift_alt"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(minus:SI (minus:SI
-		   (match_operand:SI 1 "s_register_operand" "r")
+		   (match_operand:SI 1 "s_register_operand" "r,r")
 		   (match_operand:SI 5 "arm_borrow_operation" ""))
 		  (match_operator:SI 2 "shift_operator"
-		   [(match_operand:SI 3 "s_register_operand" "r")
-		(match_operand:SI 4 "reg_or_int_operand" "rM")])))]
+		   [(match_operand:SI 3 "s_register_operand" "r,r")
+		(match_operand:SI 4 "shift_amount_operand" "M,r")])))]
   "TARGET_32BIT"
   "sbc%?\\t%0, %1, %3%S2"
   [(set_attr "conds" "use")
+   (set_attr "arch" "32,a")
+   (set_attr "shift" "3")
(set_attr "predicable" "yes")
-   (set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
-(const_string "alu_shift_imm")
-(const_string "alu_shift_reg")))]
+   (set_attr "type" "alu_shift_imm,alu_shift_reg")]
 )
 
+;; No RSC in Thumb2
 (define_insn "*rsbsi3_carryin_shift"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(minus:SI (minus:SI
 		   (match_operator:SI 2 "shift_operator"
-		[(match_operand:SI 3 "s_register_operand" "r")
-		 (match_operand:SI 4 "reg_or_int_operand" "rM")])
-		   (match_operand:SI 1 "s_register_operand" "r"))
+		[(match_operand:SI 3 

Re: [PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

2019-10-21 Thread Richard Earnshaw (lists)

On 19/10/2019 17:17, Segher Boessenkool wrote:

On Fri, Oct 18, 2019 at 08:48:42PM +0100, Richard Earnshaw wrote:


Consider this sequence during combine:

Trying 18, 7 -> 22:
18: r118:SI=r122:SI
   REG_DEAD r122:SI
 7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
   REG_DEAD r118:SI
   REG_DEAD cc:CC_RSB
22: r1:SI=r114:SI
   REG_DEAD r114:SI


r122 is dead here.  combine will have tried just 7+22 before, and that
failed; it now only succeeds because the register copy is removed.

All like you say and what your patch is about.  But, this is a generic
problem, that should be solved in combine itself (whether or not you also
want to reduce the insn cost here).


This patch can easily be reverted if combine is changed; it doesn't 
really interfere with the other patches in the series.


My real problem was that this case didn't really show up until I 
introduced other patches in this series and this combine simplification 
was then causing the generated code to be significantly worse.




Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
combining more than two insns, always?  I'll experiment.





We don't want to prevent combine from eliminating such moves, as this
can expose more combine opportunities, but we shouldn't rate them as
profitable in themselves.  We can do this by adjusting the costs
slightly so that the benefit of eliminating such a simple insn is
reduced.


Yes, but combine should have removed the move in a 2->1 combination
already, if it is beneficial: both 18->7 and 7->22 should have combined
just fine.  This also points to a potential target problem: why are those
two combinations not allowed?





In this case it's because we have a generic pattern for

reg = const - reg - ltu(ccreg)

and the specific case of const == 1, which is then re-ordered as

reg = (1 - ltu(ccreg)) - reg

=> reg = geu(ccreg) - reg

and we don't have a pattern for that.  It would be possible to write one 
if really needed, but because the GEU has to match several different 
modes, it's a bit fiddly and results in a lot of code for not much real 
benefit.


For the 2-insn case we don't try a split and simply give up; but when we 
have a three-insn sequence, combine tries harder and the generic 
splitter does rewrite that in 2 insns.


Perhaps combine should never try a 3- or 4-insn combination where i0 or 
i1 is a simple reg-reg move and only feeds one subsequent insn in the 
sequence on the basis that if that was really going to help then it 
would have been caught by the two-insn sequence.  If it feeds both i2 
and i3 then that's a different matter, of course.


R.


Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 03:46:53PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 14:00, Segher Boessenkool wrote:
> >On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
> >>
> >>The cost routine for Arm and Thumb2 was not recognising the idioms that
> >>describe the addition with carry; this results in the instructions
> >>appearing more expensive than they really are, which occasionally can lead
> >>to poor choices by combine.  Recognising all the possible variants is
> >>a little trickier than normal because the expressions can become complex
> >>enough that there is no single canonical form.
> >
> >There also is the insn_cost hook, which especially for RISC-like targets
> >is a lot easier to define.
> 
> Easier, but not a complete replacement for rtx_costs, so not necessarily 
> easier in the end...

It isn't a full replacement *yet*, still chipping away at it.  If your
port has an rtx_cost already, adding a reasonable insn_cost will only
improve it, not regress anything.

But there are some places that still need rtx_costs, yes.

Do you have anything in particular in mind?  PRs welcome!


Segher


PING*2 : Fwd: [PATCH][gcov-profile/91971]Profile directory concatenated with object file path

2019-10-21 Thread Qing Zhao
Hi,

This is 2nd ping of this simple patch.

We are waiting for this patch for one of our important projects.

Please let me know whether this patch is reasonable or not.

Thanks a lot.

Qing

> Begin forwarded message:
> 
> From: Qing Zhao 
> Subject: PING: Fwd: [PATCH][gcov-profile/91971]Profile directory concatenated 
> with object file path
> Date: October 14, 2019 at 9:52:34 AM CDT
> To: hubi...@ucw.cz
> Cc: gcc-patches@gcc.gnu.org
> 
> Hi, 
> 
> This is a ping for the following patch.
> 
> Thanks,
> 
> Qing
> 
>> Begin forwarded message:
>> 
>> From: Qing Zhao 
>> Subject: [PATCH][gcov-profile/91971]Profile directory concatenated with 
>> object file path
>> Date: October 9, 2019 at 10:30:04 AM CDT
>> To: hubi...@ucw.cz
>> Cc: gcc-patches@gcc.gnu.org
>> 
>> Hi,
>> 
>> This is a patch to fix PR 91971: Profile directory concatenated with object 
>> file path
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91971
>> 
>> Gcc/ChangeLog:
>> 
>> +2019-10-09  qing zhao  
>> +
>> +   * coverage.c (coverage_init): Mangle the full path of filename when
>> +   filename is an absolute path.
>> +
>> 2019-10-07  Jozef Lawrynowicz  
>> 
>> $ git diff  gcc/coverage.c
>> diff --git a/gcc/coverage.c b/gcc/coverage.c
>> index 0d5138f..bcba61c 100644
>> --- a/gcc/coverage.c
>> +++ b/gcc/coverage.c
>> @@ -1229,6 +1229,14 @@ coverage_init (const char *filename)
>>  else
>>   profile_data_prefix = getpwd ();
>>}
>> +  else
>> +{
>> +  /* When filename is an absolute path, we also need to mangle the full
>> +  path of filename to prevent the profiling data being stored into a
>> +  different path than that specified by profile_data_prefix.  */
>> +  filename = mangle_path (filename);
>> +  len = strlen (filename);
>> +}
>> 
>>  if (profile_data_prefix)
>>prefix_len = strlen (profile_data_prefix);
>> 
>> Okay to commit?
>> 
>> thanks.
>> 
>> Qing
>> 
>> 
>>> On Oct 2, 2019, at 11:05 AM, Qing Zhao  wrote:
>>> 
>>> Ping on:
>>> 
>>> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01554.html 
>>> 
>>> 
>>> Anyway, I filed an gcc bug on this issue as:
>>> 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91971 
>>> 
>>> 
>>> If you think that this is NOT a bug, please let me know.
>>> 
>>> thanks.
>>> 
>>> Qing
>> 
> 



Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Mike Stump
On Oct 21, 2019, at 3:30 AM, Segher Boessenkool  
wrote:
> 
> On Sun, Oct 20, 2019 at 06:06:30PM +0200, Gerald Pfeifer wrote:
>> On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
>>> +Use "git commit" and "git push origin
>>> +master" to check in the patch.
>> 
>> I will admit I made a couple of first commits without reading those
>> details and just used a plain "git push".  
>> 
>> Is there any problem with that, any drawback?
>> 
>> Or could we simplify those instructions?
> 
> Neither command works in all cases

This isn't helpful.  The advanced person already knows this and the limitations 
of it, and the non-advanced people are confused by the lack of certainty.  So, 
the web page should not say it, and saying it here doesn't help much either.  
Trying to have a git tutorial that covers all the bases in two lines, isn't 
useful either.  So, to that end, the documentation should only list git commit, 
git push, alone.  All other complexity, should rely upon the mountains of git 
tutorials and git books that can cover the details.  The simple commands always 
work for the simple cases.  The people that are not simple, unfortunately have 
that hard up-hill climb that is git, and nothing we can do in this manual can 
change that.

> , and both work in the simpler cases.
> 
> "git push" will push what you have configured to push for the current branch
> you are on.  If you are on your local "master", and you have followed the
> simple instructions above, that will probably push your local "master" to
> the "master" on your remote "origin" (that is, "git push origin 
> master:master").
> 
> "git push origin master" makes it explicit what remote you want to push to,
> and what local branch you want to push, but not the remote branch you want
> to push to.
> 
> "git push" is slightly more dangerous, but git will prevent you from doing
> most dangerous things, so that's alright.  The instructions as-is are a good
> habit to get into, and for more advanced things you *do* need to name your
> remote and branches; you'll have to learn them some time, why not now.

Cause, the gcc doc cannot do a good job explaining them.  And, if it tried, it 
would be wrong.  So, only list the simple things that work in the simple cases, 
and defer everyone else to more advanced documentation that we don't provide.  
If you want to cookbook it with gcc specific things, do that in the wiki, if 
you want, please, not the gcc doc.

Re: [PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-21 Thread Richard Earnshaw (lists)

On 19/10/2019 14:00, Segher Boessenkool wrote:

On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:


The cost routine for Arm and Thumb2 was not recognising the idioms that
describe the addition with carry; this results in the instructions
appearing more expensive than they really are, which occasionally can lead
to poor choices by combine.  Recognising all the possible variants is
a little trickier than normal because the expressions can become complex
enough that there is no single canonical form.


There also is the insn_cost hook, which especially for RISC-like targets
is a lot easier to define.


Segher



Easier, but not a complete replacement for rtx_costs, so not necessarily 
easier in the end...


R.


Re: [PATCH] add __has_builtin (PR 66970)

2019-10-21 Thread Iain Sandoe
Hi Martin,

Martin Sebor  wrote:

> Ping: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00062.html
> 
> On 10/1/19 11:16 AM, Martin Sebor wrote:
>> Attached is an implementation of the __has_builtin special
>> preprocessor operator/macro analogous to __has_attribute and
>> (hopefully) compatible with the synonymous Clang feature (I
>> couldn't actually find tests for it in the Clang test suite
>> but if someone points me at them I'll verify it).
>> Tested on x86_64-linux.
>> Martin
>> PS I couldn't find an existing API to test whether a reserved
>> symbol like __builtin_offsetof is a function-like built-in so
>> I hardwired the tests for C and C++ into the new names_builtin_p
>> functions.  I don't like this very much because the next time
>> such an operator is added there is nothing to remind us to update
>> the functions.  Adding a flag to the c_common_reswords array would
>> solve the problem but at the expense of a linear search through
>> it.  Does anyone have a suggestion for how to do this better?

This patch will also contribute one of the pieces needed for PR90835 
(and, by inheritance, 90709).

I’ve tested this on Darwin16, and looked through the code - LGTM,
and it’s also somewhat reusable for supporting __has_feature/extension
for which I have patches in progress.

So, +1 for applying it.

thanks
Iain



[PATCH, Fortran] Allow CHARACTER literals in assignments and DATA statements - for review

2019-10-21 Thread Mark Eggleston
This is an extension to support a legacy feature supported by other 
compilers such as flang and the Sun compiler.  As I understand it, this 
feature is associated with DEC, so it is enabled using 
-fdec-char-conversions and by -fdec.


It allows character literals to be assigned to numeric (INTEGER, REAL, 
COMPLEX) and LOGICAL variables by direct assignment or in DATA statements.


Please find attached the patch which includes changes to the gfortran 
manual.


Tested on x86_64 using "make check-fortran".

Change logs:

gcc/fortran/ChangeLog

    Jim MacArthur  
    Mark Eggleston  

    * arith.c (hollerith2representation): Use OPT_Wcharacter_truncation in
    call to gfc_warning.  Add character2representation, gfc_character2int,
    gfc_character2real, gfc_character2complex and gfc_character2logical.
    * arith.h: Add prototypes for gfc_character2int, gfc_character2real,
    gfc_character2complex and gfc_character2logical.
    * expr.c (gfc_check_assign): Return true if left hand side is numeric
    or logical and the right hand side is character.
    * gfortran.texi: Add -fdec-char-conversions.
    * intrinsic.c (add_conversions): Add conversions from character to
    integer, real, complex and logical types for their supported kinds.
    * invoke.texi: Add option to list of options.
    * invoke.texi: Add Character conversion subsection to Extensions
    section.
    * lang.opt: Add new option.
    * options.c (set_dec_flags): Add SET_BITFLAG for
    flag_dec_char_conversions.
    * resolve.c (resolve_ordinary_assign): Issue error if the left hand
    side is numeric or logical and the right hand side is a character
    variable.
    * simplify.c (gfc_convert_constant): Assign the conversion function
    depending on destination type.
    * trans-const.c (gfc_constant_to_tree): Use OPT_Wsurprising in
    gfc_warning allowing the warning to be switched off.

gcc/testsuite/ChangeLog

    Jim MacArthur 
    Mark Eggleston 

    PR fortran/89103
    * gfortran.dg/dec_char_conversion_in_assignment_1.f90: New test.
    * gfortran.dg/dec_char_conversion_in_assignment_2.f90: New test.
    * gfortran.dg/dec_char_conversion_in_assignment_3.f90: New test.
    * gfortran.dg/dec_char_conversion_in_data_1.f90: New test.
    * gfortran.dg/dec_char_conversion_in_data_2.f90: New test.
    * gfortran.dg/dec_char_conversion_in_data_3.f90: New test.
    * gfortran.dg/dec_char_conversion_in_data_4.f90: New test.
    * gfortran.dg/hollerith5.f90: Add -Wsurprising to options.
    * gfortran.dg/hollerith_legacy.f90: Add -Wsurprising to options.
    * gfortran.dg/no_char_to_numeric_assign.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From 26a2a7f4a65331f519ced628dfe7e0fa7b3ce513 Mon Sep 17 00:00:00 2001
From: Jim MacArthur 
Date: Thu, 4 Feb 2016 17:18:30 +
Subject: [PATCH] Allow CHARACTER literals in assignments and data statements

Warnings are raised when this happens.

Enable using -fdec-char-conversions or -fdec
---
 gcc/fortran/arith.c| 94 +-
 gcc/fortran/arith.h|  4 +
 gcc/fortran/expr.c |  5 ++
 gcc/fortran/gfortran.texi  | 24 ++
 gcc/fortran/intrinsic.c| 32 +++-
 gcc/fortran/invoke.texi| 17 ++--
 gcc/fortran/lang.opt   |  5 ++
 gcc/fortran/options.c  |  1 +
 gcc/fortran/resolve.c  |  9 +++
 gcc/fortran/simplify.c | 29 ++-
 gcc/fortran/trans-const.c  |  6 +-
 .../dec_char_conversion_in_assignment_1.f90| 61 ++
 .../dec_char_conversion_in_assignment_2.f90| 61 ++
 .../dec_char_conversion_in_assignment_3.f90| 61 ++
 .../gfortran.dg/dec_char_conversion_in_data_1.f90  | 69 
 .../gfortran.dg/dec_char_conversion_in_data_2.f90  | 69 
 .../gfortran.dg/dec_char_conversion_in_data_3.f90  | 69 
 .../gfortran.dg/dec_char_conversion_in_data_4.f90  |  9 +++
 gcc/testsuite/gfortran.dg/hollerith5.f90   |  5 +-
 gcc/testsuite/gfortran.dg/hollerith_legacy.f90 |  2 +-
 .../gfortran.dg/no_char_to_numeric_assign.f90  | 21 +
 21 files changed, 634 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_assignment_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_assignment_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_assignment_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_data_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_data_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_data_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_char_conversion_in_data_4.f90
 create mode 100644 

Re: [PATCH, OpenACC] Fortran deviceptr

2019-10-21 Thread Chung-Lin Tang

On 2019/10/19 9:04 PM, Bernhard Reutner-Fischer wrote:

On 18 October 2019 17:08:54 CEST, Chung-Lin Tang  
wrote:


Also, I've added a new libgomp.oacc-fortran/deviceptr-2.f90 testcase
that
actually copies out and verifies the deviceptr computation.


In testcases please do not 'call abort' which is nonstandard but use 'stop N' 
which is standard, ideally with different stop integers so one can see easily 
which test failed.

We went through all of the testsuite a while ago to remove the nonstandard 
abort, FYI.
TIA,


Hi Bernhard,
I've adjusted the testcases as advised, updated patch attached.

That said, there seem to be many more such cases in 
libgomp/testsuite/libgomp.oacc-fortran,
to be updated later.

Thanks,
Chung-Lin



Is this okay for trunk now?

Thanks,
Chung-Lin

2019-10-18  Cesar Philippidis  
 Chung-Lin Tang  

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data
mappings for deviceptr clauses.
(gfc_trans_omp_clauses): Likewise.

gcc/
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICEPTR.
(omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate.
(gimplify_scan_omp_clauses): Likewise.
(gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for
implicit deviceptr mappings.
gcc/testsuite/
* c-c++-common/goacc/deviceptr-4.c: Update expected data mapping.

2019-10-18  Chung-Lin Tang  
 James Norris  

libgomp/
* oacc-parallel.c (handle_ftn_pointers): Delete function.
(GOACC_parallel_keyed): Remove call to handle_ftn_pointers.
* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test.
* testsuite/libgomp.oacc-fortran/deviceptr-2.f90: New test.


Index: gcc/fortran/trans-openmp.c
===
--- gcc/fortran/trans-openmp.c  (revision 277237)
+++ gcc/fortran/trans-openmp.c  (working copy)
@@ -1099,7 +1099,8 @@ gfc_omp_clause_dtor (tree clause, tree decl)
 void
 gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 {
-  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
+  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
 return;
 
   tree decl = OMP_CLAUSE_DECL (c);
@@ -2173,6 +2174,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp
  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
{
  if (POINTER_TYPE_P (TREE_TYPE (decl))
+ && n->u.map_op == OMP_MAP_FORCE_DEVICEPTR)
+   {
+ OMP_CLAUSE_DECL (node) = decl;
+ goto finalize_map_clause;
+   }
+ else if (POINTER_TYPE_P (TREE_TYPE (decl))
  && (gfc_omp_privatize_by_reference (decl)
  || GFC_DECL_GET_SCALAR_POINTER (decl)
  || GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
@@ -2346,6 +2353,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp
  OMP_CLAUSE_SIZE (node3)
= fold_build2 (MINUS_EXPR, sizetype, ptr, ptr2);
}
+   finalize_map_clause:
  switch (n->u.map_op)
{
case OMP_MAP_ALLOC:
Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 277237)
+++ gcc/gimplify.c  (working copy)
@@ -123,6 +123,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_REDUCTION: inscan seen in {in,ex}clusive clause.  */
   GOVD_REDUCTION_INSCAN = 0x200,
 
+  /* Flag for OpenACC deviceptrs.  */
+  GOVD_DEVICEPTR = 0x400,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -7426,6 +7429,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx,
error ("variable %qE declared in enclosing "
   "% region", DECL_NAME (decl));
  nflags |= GOVD_MAP;
+ nflags |= (n2->value & GOVD_DEVICEPTR);
  if (octx->region_type == ORT_ACC_DATA
  && (n2->value & GOVD_MAP_0LEN_ARRAY))
nflags |= GOVD_MAP_0LEN_ARRAY;
@@ -8943,6 +8947,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TO
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_TOFROM)
flags |= GOVD_MAP_ALWAYS_TO;
+ else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
+   flags |= GOVD_DEVICEPTR;
  goto do_add;
 
case OMP_CLAUSE_DEPEND:
@@ -9727,7 +9733,8 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n,
   | GOVD_MAP_FORCE
   | GOVD_MAP_FORCE_PRESENT
 

Re: [PATCH][wwwdocs] Update GCC 9 release note

2019-10-21 Thread H.J. Lu
On Sun, Oct 20, 2019 at 10:19 AM Gerald Pfeifer  wrote:
>
> On Thu, 10 Oct 2019, H.J. Lu wrote:
> > Here is the same patch for git repo.  Is it OK?
>
> Has this been available since GCC 9.1, or has it been added later?

They have been checked into GCC 9.1.

> (If the latter, please add this to a GCC 9.2 or GCC 9.3 section in
> that file).
>
> Ok.
>
> Gerald



-- 
H.J.


Re: [PATCH] OpenACC reference count overhaul

2019-10-21 Thread Thomas Schwinge
Hi!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> This patch has been broken out of the patch supporting OpenACC 2.6 manual
> deep copy last posted here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html

Thanks.

Remember to look into  "Potential null
pointer dereference in 'gomp_acc_remove_pointer'", which may be relevant
here.

I see you've merged in the relevant parts of my incremental patch '[WIP]
OpenACC 2.6 manual deep copy support (attach/detach): adjust for
"goacc_async_unmap_tgt" removal', that I included in
,
which tells me that I supposedly understood that part alright.  ;-D

> As part of developing that patch, libgomp's OpenACC reference counting
> implementation proved to be somewhat inconsistent, especially when
> used in combination with the deep copy support which exercises it
> more thoroughly.
>
> So, this patch contains just the changes to reference-counting behaviour,
> for ease of (re-)review.  The other parts of OpenACC 2.6 manual deep
> copy support are forthcoming, but some changes in this patch anticipate
> that support.

As we're discussing these separately, please for now remove the changes
related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving 'link_key'
into an union (to later be shared with 'attach_count');
<http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>.

> Tested with offloading to NVPTX, with good results (though a couple of
> tests need fixing also).

The testsuite changes we're discussing separately, and need to go in
before this one, obviously.

> OK for trunk?

I haven't understood all the changes related to replacing
'dynamic_refcount' with 'virtual_refcount', getting rid of
'data_environ', the 'lookup_dev' rework, but I trust you got that right.
In particular, these seem to remove special-case OpenACC code in favor of
generic OMP code, which is good.

A few more comments:

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h

>  typedef struct acc_dispatch_t
>  {
> -  /* This is a linked list of data mapped using the
> - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
> - Unlike mapped_data in the goacc_thread struct, unmapping can
> - happen out-of-order with respect to mapping.  */
> -  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
> -  struct target_mem_desc *data_environ;

As mentioned before, please also accordingly update the comment attached
to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'.

That code:

> -/* Free address mapping tables.  MM must be locked on entry, and remains locked
> -   on return.  */
> -
> -attribute_hidden void
> -gomp_free_memmap (struct splay_tree_s *mem_map)
> -{
> -  while (mem_map->root)
> -{
> -  struct target_mem_desc *tgt = mem_map->root->key.tgt;
> -
> -  splay_tree_remove (mem_map, &mem_map->root->key);
> -  free (tgt->array);
> -  free (tgt);
> -}
> -}

... kind-of gets inlined here:

> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d)
>  
>if (walk->dev)
>   {
> -   gomp_mutex_lock (&walk->dev->lock);
> -   gomp_free_memmap (&walk->dev->mem_map);
> -   gomp_mutex_unlock (&walk->dev->lock);
> +   while (walk->dev->mem_map.root)
> + {
> +   splay_tree_key k = &walk->dev->mem_map.root->key;
> +   gomp_remove_var (walk->dev, k);
> + }
>  
> walk->dev = NULL;
> walk->base_dev = NULL;

It's not obvious to me why it's OK to remove the locking?  Don't all
operations on the 'mem_map' have to have the device locked?

Does that code now still have the previous (and expected?) "finalize"
semantics (don't consider 'refcount', always unmap)?  (Should we assert
here that 'gomp_remove_var' always returns 'true'?  And/or, if it
doesn't, what does that mean then?)  Or am I confused?  ;-)

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -427,6 +418,7 @@ acc_unmap_data (void *h)
>  {
>struct goacc_thread *thr = goacc_thread ();
>struct gomp_device_descr *acc_dev = thr->dev;
> +  struct splay_tree_key_s cur_node;

I know it's often not the case in existing code, but when adding new
code, please move definitions next to their first use.

> @@ -438,12 +430,11 @@ acc_unmap_data (void *h)
>acc_api_info api_info;
>bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
>  
>gomp_mutex_lock (&acc_dev->lock);
>  
> -  splay_tree_key n = lookup_host (acc_dev, h, 1);
> -  struct target_mem_desc *t;
> +  cur_node.host_start = (uintptr_t) h;
> +  cur_node.host_end = cur_node.host_start + 1;
> +  splay_tree_key n = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
>  
>if (!n)
>  {

Isn't this just inlining 'lookup_host'?  There may be a good reason to do
that, but what is it?

> @@ -451,47 +442,28 @@ acc_unmap_data (void *h)

> - 

Re: Move code out of vect_slp_analyze_bb_1

2019-10-21 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
>> >  wrote:
>> >>Richard Biener  writes:
>> >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
>> >> wrote:
>> After the previous patch, it seems more natural to apply the
>> PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
>> the region is, rather than delaying it to vect_slp_analyze_bb_1.
>> (But rather than carve out the biggest region possible and then
>> reject it, wouldn't it be better to stop when the region gets
>> too big, so we at least have a chance of vectorising something?)
>> 
>> It also seems more natural for vect_slp_bb_region to create the
>> bb_vec_info itself rather than (a) having to pass bits of data down
>> for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
>> on every failure return.
>> 
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >>>
>> >>> Ok. But I wonder what the reason was for this limit? Dependency
>> >>analysis was greatly simplified, being no longer quadratic here. Can
>> >>you check where the limit originally came from? But indeed splitting
>> >>the region makes more sense then, but at dataref group boundaries I'd
>> >>say.
>> >>
>> >>Yeah, looks it was the complexity of dependence analysis:
>> >>
>> >>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
>> >
>> > OK. We no longer
>> > Call compute dependence between all memory refs but only verify we can do 
>> > those code-motions we need to do. That's of course much harder to check a 
>> > bound on upfront (it's still quadratic in the worst case). I'm also not 
>> > sure this is ever a problem, but we might instead count the number of 
>> > stmts involving memory?
>>
>> OK, here's what I tried.  It exposes quadratic behaviour for
>> gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
>> all in the name of "vectorising" using V1DI. :-)
>>
>> In that test we have N data references in which the even elements appear
>> to be vectorisable but the odd elements don't.  First time round, we hit:
>>
>>   /* For basic block SLP, try to break the group up into multiples of the
>>  vector size.  */
>>   unsigned HOST_WIDE_INT const_nunits;
>>   if (is_a <bb_vec_info> (vinfo)
>>   && STMT_VINFO_GROUPED_ACCESS (stmt_info)
>>   && DR_GROUP_FIRST_ELEMENT (stmt_info)
>>   && nunits.is_constant (&const_nunits))
>>
>> This breaks off the leading even element and recurses, discards the
>> leading odd element, and then recurses on the rest.  We then come round
>> to the same code when recursing on the rest.
>>
>> It looks like matches is valid for all elements, so I guess we should
>> divide the group based on all false elements, not just the first?
>
> What do you mean with "valid for all elements"?  matches[i] is
> true or false for all elements?  If it is true we shouldn't split
> at all.

Initially I'd wondered whether the elements were undefined after the
first "false", i.e. if we broke early at that point.  But it looks like
true and false elements after the first false are still meaningful.

> But indeed the recursion is bad if we split out one vector a step,
> we should split the whole group in "half" instead.
>
> Still the code should ensure we have a "half" that works OK
> and one that possibly doesn't.
>
> OK, I see we have {true, true, false  } for
> matches but the 2nd iteration gets {true, false, ... } and doesn't
> split for me.  Of course if you have V1DI then you'll notice
> that matches[0] is _always_ true ... I guess we should simply
> never try splitting for const_nunits == 1?  Or even never
> do BB vectorization with such a vector type.

I don't think it's specific to const_nunits == 1.  E.g. if we have
matches == { true, true, false, true } repeating for const_nunits == 2,
we'll have the same problem of trying:

  N
    2, N-4
      2, N-8
        2, N-12
          2, N-16
            ...

which is still quadratic.

Dividing genuinely into half would fix that, so would dividing the
whole group based on the group1_size.  Or we could try using the
whole matches array to divide the group up into chunks that are
worth recursing on, with each chunk having at most half the original
group size and with the matches elements for each chunk being all-true
or all-false.

Thanks,
Richard


[PATCH] Fix PR92162

2019-10-21 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-21  Richard Biener  

PR tree-optimization/92162
* tree-vect-loop.c (vect_create_epilog_for_reduction): Lookup
STMT_VINFO_REDUC_IDX in reduc_info.
* tree-vect-stmts.c (vectorizable_condition): Likewise.

* gcc.dg/pr92162.c: New testcase.

Index: gcc/testsuite/gcc.dg/pr92162.c
===
--- gcc/testsuite/gcc.dg/pr92162.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr92162.c  (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+short int s8;
+
+void __attribute__ ((simd))
+gn (void)
+{
+  s8 = 0;
+}
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277242)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4259,9 +4259,9 @@ vect_create_epilog_for_reduction (stmt_v
 (CCOMPARE).  The then and else values mirror the main VEC_COND_EXPR:
 the reduction phi corresponds to NEW_PHI_TREE and the new values
 correspond to INDEX_BEFORE_INCR.  */
-  gcc_assert (STMT_VINFO_REDUC_IDX (stmt_info) >= 1);
+  gcc_assert (STMT_VINFO_REDUC_IDX (reduc_info) >= 1);
   tree index_cond_expr;
-  if (STMT_VINFO_REDUC_IDX (stmt_info) == 2)
+  if (STMT_VINFO_REDUC_IDX (reduc_info) == 2)
index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
  ccompare, indx_before_incr, new_phi_tree);
   else
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 277242)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -9818,7 +9818,7 @@ vectorizable_condition (stmt_vec_info st
return false;
   reduc_info = info_for_reduction (stmt_info);
   reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
-  reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
+  reduc_index = STMT_VINFO_REDUC_IDX (reduc_info);
   gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION
  || reduc_index != -1);
 }


Re: Move code out of vect_slp_analyze_bb_1

2019-10-21 Thread Richard Biener
On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
> >  wrote:
> >>Richard Biener  writes:
> >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
> >> wrote:
> After the previous patch, it seems more natural to apply the
> PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
> the region is, rather than delaying it to vect_slp_analyze_bb_1.
> (But rather than carve out the biggest region possible and then
> reject it, wouldn't it be better to stop when the region gets
> too big, so we at least have a chance of vectorising something?)
> 
> It also seems more natural for vect_slp_bb_region to create the
> bb_vec_info itself rather than (a) having to pass bits of data down
> for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
> on every failure return.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>>
> >>> Ok. But I wonder what the reason was for this limit? Dependency
> >>analysis was greatly simplified, being no longer quadratic here. Can
> >>you check where the limit originally came from? But indeed splitting
> >>the region makes more sense then, but at dataref group boundaries I'd
> >>say.
> >>
> >>Yeah, looks it was the complexity of dependence analysis:
> >>
> >>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
> >
> > OK. We no longer
> > Call compute dependence between all memory refs but only verify we can do 
> > those code-motions we need to do. That's of course much harder to check a 
> > bound on upfront (it's still quadratic in the worst case). I'm also not 
> > sure this is ever a problem, but we might instead count the number of stmts 
> > involving memory?
>
> OK, here's what I tried.  It exposes quadratic behaviour for
> gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
> all in the name of "vectorising" using V1DI. :-)
>
> In that test we have N data references in which the even elements appear
> to be vectorisable but the odd elements don't.  First time round, we hit:
>
>   /* For basic block SLP, try to break the group up into multiples of the
>  vector size.  */
>   unsigned HOST_WIDE_INT const_nunits;
>   if (is_a <bb_vec_info> (vinfo)
>   && STMT_VINFO_GROUPED_ACCESS (stmt_info)
>   && DR_GROUP_FIRST_ELEMENT (stmt_info)
>   && nunits.is_constant (&const_nunits))
>
> This breaks off the leading even element and recurses, discards the
> leading odd element, and then recurses on the rest.  We then come round
> to the same code when recursing on the rest.
>
> It looks like matches is valid for all elements, so I guess we should
> divide the group based on all false elements, not just the first?

What do you mean with "valid for all elements"?  matches[i] is
true or false for all elements?  If it is true we shouldn't split
at all.

But indeed the recursion is bad if we split out one vector a step,
we should split the whole group in "half" instead.

Still the code should ensure we have a "half" that works OK
and one that possibly doesn't.

OK, I see we have {true, true, false  } for
matches but the 2nd iteration gets {true, false, ... } and doesn't
split for me.  Of course if you have V1DI then you'll notice
that matches[0] is _always_ true ... I guess we should simply
never try splitting for const_nunits == 1?  Or even never
do BB vectorization with such a vector type.

Richard.

> Richard
>
>
> 2019-10-21  Richard Sandiford  
>
> gcc/
> * params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with...
> (PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param.
> * doc/invoke.texi: Update accordingly.
> * tree-vect-slp.c (vect_slp_bb): Compute region_end during the
> search loop.  Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB
> data references.
>
> Index: gcc/params.def
> ===
> --- gcc/params.def  2019-10-16 10:50:00.064243704 +0100
> +++ gcc/params.def  2019-10-21 10:48:45.769347466 +0100
> @@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID
>   0, 0, 1)
>
>  /* Avoid SLP vectorization of large basic blocks.  */
> -DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
> - "slp-max-insns-in-bb",
> - "Maximum number of instructions in basic block to be considered for "
> - "SLP vectorization.", 1000, 0, 0)
> +DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB,
> + "slp-max-memrefs-in-bb",
> + "Maximum number of memory references to be considered for "
> + "SLP vectorization.", 1024, 0, 0)
>
>  DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
>   "min-insn-to-prefetch-ratio",
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100
> +++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100
> @@ -12248,9 

Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-21 Thread Mark Eggleston

PING - OK to commit?

On 02/10/2019 09:00, Mark Eggleston wrote:


On 28/09/2019 17:50, Steve Kargl wrote:

On Thu, Sep 26, 2019 at 09:45:28AM +0100, Mark Eggleston wrote:

Original thread starts here
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01185.html

OK to commit?


I'm not a big fan of option proliferation.  If you don't
want to see warns just use -w.  Of course, this is just
MHO.

Unfortunately -w switches off ALL warnings.


--
https://www.codethink.co.uk/privacy.html



Re: [Darwin, testsuite, committed] Fix Wnonnull on Darwin.

2019-10-21 Thread Iain Sandoe
Hi Martin,

Martin Sebor  wrote:

> On 10/20/2019 07:27 AM, Iain Sandoe wrote:
>> Martin Sebor  wrote:
>>> On 10/19/19 2:56 AM, Iain Sandoe wrote:
 Andreas Schwab  wrote:
> On Okt 19 2019, Iain Sandoe  wrote:
> 
>> This test has failed always on Darwin, because Darwin does not mark
>> entries in string.h with nonnull attributes.  Since the purpose of the 
>> test
>> is to check that the warnings are issued for an inlined function, not 
>> that
>> the target headers are marked up, we can provide locally marked up
>> function declarations for Darwin.
> 
> If the test depends on the non-std declarations, then it should use them
> everywhere.
 from my perspective, agreed, Martin?
>>> 
>>> I don't see a problem with it.  I prefer tests that don't depend
>>> on system headers to avoid these kinds of issues
>> We can do that anyway then - I can just adjust the current code to remove the
>> special-casing, and to use __SIZE_TYPE__ instead of size_t everywhere, OK?
> 
> Sure.

Will try to get to it later today.
(also to backport to 9 and 8 if that’s OK, IIRC the test was introduced in 8.x 
- slowly
 catching up on long-standing Darwin test fails).

>>> That said, it shouldn't matter if the declaration of a built-in
>>> function has the nonnull attribute.  As long as the built-in has
>>> one and isn't disabled GCC should use it.  I'd be curious to know
>>> what is actually preventing the warning from triggering there.
>> This is a secondary problem: the Darwin header implementation expands the
>> memcpy to:
>> __builtin___memcpy_chk (dst, src,  __builtin_object_size (dst),  len);
>> Which, obviously, isn’t doing what you expect.
> 
> That suggests -Wno-system-headers might be getting in the way of
> the warning.  I thought most middle-end warnings coped with it
> but maybe not this one.  But getting rid of the include does
> sound like the right way to deal with the test failing.

In this case it seems like the Darwin headers are applying _FORTIFY_SOURCE
(at least to the strings.h) automatically when the compiler is __GNUC__.  It 
might
be yet more fallout from the various __has_xxx that we don’t yet support.  In 
any
case, a separate issue I think.

thanks
Iain

> 
> Thanks
> Martin
> 
>> So, this does suggest that
>> a) for the test to be well-controlled, we need to avoid system headers
>> b) there might be some other builtins to check.
>> (but if we restrict the discussion to my understood purpose of the Wnonnull
>>  test, only (a) matters and that’s what’s proposed above).
>> thanks.
>> Iain




Re: [Darwin, testsuite, committed] Fix Wnonnull on Darwin.

2019-10-21 Thread Martin Sebor

On 10/20/2019 07:27 AM, Iain Sandoe wrote:

Martin Sebor  wrote:


On 10/19/19 2:56 AM, Iain Sandoe wrote:

Andreas Schwab  wrote:

On Okt 19 2019, Iain Sandoe  wrote:


This test has failed always on Darwin, because Darwin does not mark
entries in string.h with nonnull attributes.  Since the purpose of the test
is to check that the warnings are issued for an inlined function, not that
the target headers are marked up, we can provide locally marked up
function declarations for Darwin.


If the test depends on the non-std declarations, then it should use them
everywhere.

from my perspective, agreed, Martin?


I don't see a problem with it.  I prefer tests that don't depend
on system headers to avoid these kinds of issues


We can do that anyway then - I can just adjust the current code to remove the
special-casing, and to use __SIZE_TYPE__ instead of size_t everywhere, OK?


Sure.




That said, it shouldn't matter if the declaration of a built-in
function has the nonnull attribute.  As long as the built-in has
one and isn't disabled GCC should use it.  I'd be curious to know
what is actually preventing the warning from triggering there.


This is a secondary problem: the Darwin header implementation expands the
memcpy to:
__builtin___memcpy_chk (dst, src,  __builtin_object_size (dst),  len);

Which, obviously, isn’t doing what you expect.


That suggests -Wno-system-headers might be getting in the way of
the warning.  I thought most middle-end warnings coped with it
but maybe not this one.  But getting rid of the include does
sound like the right way to deal with the test failing.

Thanks
Martin



So, this does suggest that
a) for the test to be well-controlled, we need to avoid system headers
b) there might be some other builtins to check.

(but if we restrict the discussion to my understood purpose of the Wnonnull
  test, only (a) matters and that’s what’s proposed above).

thanks.
Iain




Remove vars before TLS data during OpenACC shutdown

2019-10-21 Thread Julian Brown
This patch fixes a minor issue that could occur during OpenACC shutdown
on nvidia devices. The acc_shutdown_1 function freed TLS data before
unmapping variables, but the nvptx libgomp plugin uses the lack of TLS
data in nvptx_free as an indication that it is running from callback
context of the CUDA driver runtime, meaning it must add the block to a
list to be freed later (in nvptx_close_device).

With this patch, the data will be freed immediately instead. OK for
trunk?

Thanks,

Julian

ChangeLog

libgomp/
* oacc-init.c (acc_shutdown_1): Remove variable mappings
before TLS data during OpenACC shutdown.
commit bbf990a35ab5a7b5100b03c5c5aca11ef0729e7b
Author: Julian Brown 
Date:   Fri Oct 18 11:47:37 2019 -0700

Remove vars before TLS data during OpenACC shutdown

libgomp/
* oacc-init.c (acc_shutdown_1): Remove variable mappings before TLS
data during OpenACC shutdown.

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index e0395ef43b2..ba51d09fa82 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -334,6 +334,20 @@ acc_shutdown_1 (acc_device_t d)
   /* Free target-specific TLS data and close all devices.  */
   for (walk = goacc_threads; walk != NULL; walk = walk->next)
 {
+  if (walk->dev)
+	{
+	  /* Free the splay tree before the TLS data below: the nvptx plugin
+	 uses the lack of TLS data as an indicator that device memory is
+	 being freed from (CUDA) callback context.  */
+	  while (walk->dev->mem_map.root)
+	{
+	  splay_tree_key k = &walk->dev->mem_map.root->key;
+	  if (k->virtual_refcount == VREFCOUNT_LINK_KEY)
+		k->u.link_key = NULL;
+	  gomp_remove_var (walk->dev, k);
+	}
+	}
+
   if (walk->target_tls)
 	base_dev->openacc.destroy_thread_data_func (walk->target_tls);
 
@@ -354,19 +368,8 @@ acc_shutdown_1 (acc_device_t d)
 	  gomp_fatal ("shutdown during host fallback");
 	}
 
-  if (walk->dev)
-	{
-	  while (walk->dev->mem_map.root)
-	{
-	  splay_tree_key k = &walk->dev->mem_map.root->key;
-	  if (k->virtual_refcount == VREFCOUNT_LINK_KEY)
-		k->u.link_key = NULL;
-	  gomp_remove_var (walk->dev, k);
-	}
-
-	  walk->dev = NULL;
-	  walk->base_dev = NULL;
-	}
+  walk->dev = NULL;
+  walk->base_dev = NULL;
 }
 
   gomp_mutex_unlock (&goacc_thread_lock);


Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-21 Thread Segher Boessenkool
On Mon, Oct 21, 2019 at 01:27:23PM +0200, Jakub Jelinek wrote:
> On Mon, Oct 21, 2019 at 05:45:16AM -0500, Segher Boessenkool wrote:
> > On Sun, Oct 20, 2019 at 09:51:22PM +0200, Uros Bizjak wrote:
> > > On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
> > > > As mentioned in the PR, the x86 backend has various 
> > > > define_insn_and_split
> > > > patterns that are meant to match usually during combine, are then
> > > > unconditionally split during split1 pass and as they have && 
> > > > can_create_pseudo_p ()
> > > > in their define_insn condition, if they get matched after split1, 
> > > > nothing
> > > > would split them anymore and they wouldn't match after reload.
> > > >
> > > > The split1 pass already sets a property that can be used.
> > > >
> > > > I've first tried to remove some constraints and associated attributes, 
> > > > but
> > > > it seems from further discussions in the PR I don't know much about the
> > > > reasons why they were added and if they are still needed or not, so this
> > > > version of the patch just replaces the can_create_pseudo_p () conditions
> > > > with a new predicate that stops matching already after the split1 pass.
> > > 
> > > As explained by Segher in the PR, there is no 100% guarantee that
> > > combine won't produce a pattern with a wrong hard register as a e.g.
> > > count reg of a shift insn. RA will die on this kind of pattern with
> > > reload failure, so these constraints are used together with
> > > ix86_legitimate_combined_insn target hook to reject invalid
> > > combinations involving hard registers.
> > 
> > And even that isn't completely safe, there is nothing that stops other
> > passes from creating the offending insns.
> 
> Sure, but with the patch either they are created before split1, then they
> will be split there and all is fine,

No, they can end up as needing reloads for hard registers in exactly the
same way as they can during combine.  The combine hook is only used by
combine, after all.

Other passes generally try only very conservative instruction modifications
(comparatively), I don't expect problems in reality, certainly not problems
that stay hidden for years.

It's just not really safe to use (non-fixed) hard registers early.


Segher


Re: [PATCH] OpenACC reference count overhaul

2019-10-21 Thread Julian Brown
On Tue, 15 Oct 2019 17:30:06 +0200
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2019-10-03T09:35:04-0700, Julian Brown 
> wrote:
> > This patch has been broken out of the patch supporting OpenACC 2.6
> > manual deep copy last posted here:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html  
> 
> Thanks.
> 
> 
> > a couple of
> > tests need fixing also  
> 
> Let's look at these first, and independently.
> 
> The overall goal not being to bend test cases until they (again) work,
> but rather to verify what they're testing, so that they're valid
> OpenACC code, or if not that, then they're testing specifics of the
> GCC implementation (for example, the 'dg-shouldfail' test cases).

Indeed, the tests looked "obviously wrong", but actually none of them
should have failed with the reference-count overhaul patch. As far as I
can tell, only the context-2.c test now fails with the current og9
branch, intermittently, with the last version of the patch sent. Turns
out that was a real bug! So, good catch.

> ACK -- but do we understand why the same shouldn't be applied to the
> very similar 'libgomp.oacc-c-c++-common/context-1.c' and
> 'libgomp.oacc-c-c++-common/context-3.c', too?
> 
> I suppose your testing of the "OpenACC reference count overhaul"
> tripped over these constructs?  (Why just some, then?)

Yeah. Just blind luck, I think.

> The same pattern ('acc_copyin', 'acc_free') also appears in
> 'libgomp.oacc-c-c++-common/clauses-1.c', does that also need to be
> corrected?  Same in 'libgomp.oacc-c-c++-common/lib-13.c' (... where
> that test case actually is titled "Check acc_is_present and
> acc_delete" instead of "[...] acc_free", huh),
> 'libgomp.oacc-c-c++-common/lib-14.c',
> 'libgomp.oacc-c-c++-common/lib-18.c'.
> 
> Then, the 'acc_deviceptr', 'acc_unmap_data', 'acc_free' usage in
> 'libgomp.oacc-c-c++-common/clauses-1.c' also seems strange, as the
> respective 'acc_free' argument certainly is not (at least not
> directly) a "pointer value that was returned by a call to
> 'acc_malloc'".  Does it make sense to (continue to) support that,
> assuming that's how it's implemented internally, or should these be
> corrected to valid OpenACC, too?  Same in
> 'libgomp.oacc-c-c++-common/present-1.c'.
> 
> Same in 'libgomp.oacc-c-c++-common/clauses-2.c' (we 'dg-shouldfail'
> earlier, but the later code should otherwise be made correct anyway).
> 
> Several of these things again in
> 'libgomp.oacc-c-c++-common/nested-1.c'.

I'm not sure if *all* of those are wrong. I have a patch (forthcoming)
that fixes some of the pedantically-wrong OpenACC usage, but none of
the tests now regress with this version of the patch, so the urgency
is gone.

This version of the patch fixes the lookup_dev_1 helper function --
previously I had:

static splay_tree_key
lookup_dev_1 (splay_tree_node node, uintptr_t d, size_t s)
{
  splay_tree_key k = &node->key;
  struct target_mem_desc *t = k->tgt;

  if (d >= t->tgt_start && d + s <= t->tgt_end)
return k;

  if (node->left)
return lookup_dev_1 (node->left, d, s);

  if (node->right)
return lookup_dev_1 (node->right, d, s);

  return NULL;
}

which would never recurse into a right-hand branch if there was a
left-hand node! Oops. So, device-address lookups would sometimes fail
when there was a valid mapping, depending on the balance of the splay
tree. (As an aside, I think calling lookup_dev unconditionally in
several of the OpenACC API calls as we do is a bad idea -- it takes time
linear in the number of mappings, with no way to avoid that overhead.
But that's another matter.)

Re-testing shows that the previously-regressing tests no longer
regress, but I haven't yet made any changes to VREFCOUNT_LINK_KEY, etc.
as suggested in the review of the attach/detach patch:

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01374.html

OK? (ChangeLog as before.)

Julian
From e7b21dfe343f1ca556652cc54793e0656eb95685 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 5 Nov 2018 15:51:46 -0800
Subject: [PATCH] OpenACC reference count overhaul

2019-10-02  Julian Brown  

	libgomp/
	* libgomp.h (VREFCOUNT_LINK_KEY): New macro.
	(struct splay_tree_key_s): Put link_key field into a new union.
	Substitute dynamic_refcount field for virtual_refcount.
	(struct acc_dispatch_t): Remove data_environ field.
	(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
	(gomp_acc_insert_pointer): Remove prototype.
	(gomp_acc_remove_pointer): Update prototype.
	(gomp_free_memmap): Remove prototype.
	(gomp_remove_var_async): Add prototype.
	* oacc-host.c (host_dispatch): Don't initialise removed data_environ
	field.
	* oacc-init.c (acc_shutdown_1): Use gomp_remove_var instead of
	gomp_free_memmap.
	* oacc-mem.c (lookup_dev_1): New function.
	(lookup_dev): Reimplement using above.
	(acc_free, acc_hostptr): Update calls to lookup_dev.
	(acc_map_data): Likewise.  Don't add to data_environ list.
	(acc_unmap_data): Remove call to gomp_unmap_vars.  Fix semantics to
	remove mapping, but not 

Re: [PATCH 2/2] gcc/riscv: Add a mechanism to remove some calls to _riscv_save_0

2019-10-21 Thread Andrew Burgess
Below is a new versions of this patch, I believe that this addresses
the review comments from the earlier version.  In addition this has
been tested using Jim's idea of forcing -msave-restore (if the flag is
not otherwise given) and I now see no test failures for newlib and
linux toolchains.

One thing that has been mentioned briefly was adding a flag to control
just this optimisation, I haven't done that yet, but can if that is
required - obviously this optimisation doesn't do anything if
-msave-restore is turned off, so that is always an option.  With no
test failures I don't know if a separate flag is required.

Thanks,
Andrew

---

commit f1b824e94f3ea396bd0ef46692d5211f855d4b4c
Author: Andrew Burgess 
Date:   Thu Jul 18 16:03:10 2019 +0100

gcc/riscv: Add a mechanism to remove some calls to _riscv_save_0

When using the -msave-restore flag we end up with calls to
_riscv_save_0 and _riscv_restore_0.  These functions adjust the stack
and save or restore the return address.  Due to grouping multiple
save/restore stub functions together the save/restore 0 calls actually
save s0, s1, s2, and the return address, but only the return address
actually matters.  Leaf functions don't call the save/restore stubs,
so whenever we do see a call to the save/restore stubs, the store of
the return address is required.

If we look in gcc/config/riscv/riscv.c at the function
riscv_expand_prologue and riscv_expand_epilogue we can see that it
would be reasonably easy to adjust these functions to avoid the calls
to the save/restore stubs for those cases where we are about to call
_riscv_save_0 and _riscv_restore_0, however, the actual code size
saving this would give is debatable, with linker relaxation, the calls
to save/restore are often just 4-bytes, and can sometimes even be
2-bytes, while leaving the stack adjust and return address save inline
is always going to be 4-bytes.

The interesting case is when we call _riscv_save_0 and
_riscv_restore_0, and also have a frame that would (without
save/restore) have resulted in a tail call.  In this case if we could
remove the save/restore calls, and restore the tail call then we would
get a real size saving.

The problem is that the choice of generating a tail call or not is
done during the gimple expand pass, at which point we don't know how
many registers we need to save (or restore).

The solution presented in this patch offers a partial solution to this
problem.  By using the TARGET_MACHINE_DEPENDENT_REORG pass to
implement a very limited pattern matching we identify functions that
call _riscv_save_0 and _riscv_restore_0, and which could be converted
to make use of a tail call.  These functions are then converted to the
non save/restore tail call form.

This should result in a code size reduction when compiling with -Os
and with the -msave-restore flag.

gcc/ChangeLog:

* config.gcc: Add riscv-sr.o to extra_objs for riscv.
* config/riscv/riscv-sr.c: New file.
* config/riscv/riscv.c (riscv_reorg): New function.
(TARGET_MACHINE_DEPENDENT_REORG): Define.
* config/riscv/riscv.h (SIBCALL_REG_P): Define.
(riscv_remove_unneeded_save_restore_calls): Declare.
* config/riscv/t-riscv (riscv-sr.o): New build rule.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-2.c: New file.
* gcc.target/riscv/save-restore-3.c: New file.
* gcc.target/riscv/save-restore-4.c: New file.
* gcc.target/riscv/save-restore-5.c: New file.
* gcc.target/riscv/save-restore-6.c: New file.
* gcc.target/riscv/save-restore-7.c: New file.
* gcc.target/riscv/save-restore-8.c: New file.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bdc2253f8ef..b4440877e99 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o"
d_target_objs="riscv-d.o"
;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-sr.c b/gcc/config/riscv/riscv-sr.c
new file mode 100644
index 000..977f21b3e37
--- /dev/null
+++ b/gcc/config/riscv/riscv-sr.c
@@ -0,0 +1,465 @@
+/* This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public 

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Richard Earnshaw (lists)

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a 
way that it will run the GCC testsuite after each of your patches, so 
I'll keep in mind not to report regressions (I've noticed several already).



I can perform a manual validation taking your 29 patches as a single one 
and compare the results with those of the revision preceding the one 
where you committed patch #1. Do you think it would be useful?



Christophe




I think it would be best if you could filter out any regressions that are 
removed by later patches and report the rest against the patch that 
caused them.  But I realise that would be more work for you, so a 
round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

 reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.


[committed] Backports to gcc-9-branch

2019-10-21 Thread Jakub Jelinek
Hi!

I've backported the following 18 patches (20 trunk commits in total)
from trunk to gcc-9-branch, bootstrapped/regtested on x86_64-linux and
i686-linux and committed to gcc-9-branch.

Jakub
2019-10-21  Jakub Jelinek  

Backported from mainline
2019-08-02  Jakub Jelinek  

* quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq,
M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q,
M_SQRT1_2q): Use two more decimal places.

--- libquadmath/quadmath.h  (revision 273996)
+++ libquadmath/quadmath.h  (revision 273997)
@@ -1,5 +1,5 @@
 /* GCC Quad-Precision Math Library
-   Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+   Copyright (C) 2010-2019 Free Software Foundation, Inc.
Written by Francois-Xavier Coudert  
 
 This file is part of the libquadmath library.
@@ -165,19 +165,19 @@ extern int quadmath_snprintf (char *str,
(floating constant exceeds range of ‘__float128’)  */
 /* #define HUGE_VALQ (__extension__ 0x1.0p32767Q) */
 
-#define M_Eq           2.7182818284590452353602874713526625Q  /* e */
-#define M_LOG2Eq       1.4426950408889634073599246810018921Q  /* log_2 e */
-#define M_LOG10Eq      0.4342944819032518276511289189166051Q  /* log_10 e */
-#define M_LN2q         0.6931471805599453094172321214581766Q  /* log_e 2 */
-#define M_LN10q        2.3025850929940456840179914546843642Q  /* log_e 10 */
-#define M_PIq          3.1415926535897932384626433832795029Q  /* pi */
-#define M_PI_2q        1.5707963267948966192313216916397514Q  /* pi/2 */
-#define M_PI_4q        0.7853981633974483096156608458198757Q  /* pi/4 */
-#define M_1_PIq        0.3183098861837906715377675267450287Q  /* 1/pi */
-#define M_2_PIq        0.6366197723675813430755350534900574Q  /* 2/pi */
-#define M_2_SQRTPIq    1.1283791670955125738961589031215452Q  /* 2/sqrt(pi) */
-#define M_SQRT2q       1.4142135623730950488016887242096981Q  /* sqrt(2) */
-#define M_SQRT1_2q     0.7071067811865475244008443621048490Q  /* 1/sqrt(2) */
+#define M_Eq           2.718281828459045235360287471352662498Q  /* e */
+#define M_LOG2Eq       1.442695040888963407359924681001892137Q  /* log_2 e */
+#define M_LOG10Eq      0.434294481903251827651128918916605082Q  /* log_10 e */
+#define M_LN2q         0.693147180559945309417232121458176568Q  /* log_e 2 */
+#define M_LN10q        2.302585092994045684017991454684364208Q  /* log_e 10 */
+#define M_PIq          3.141592653589793238462643383279502884Q  /* pi */
+#define M_PI_2q        1.570796326794896619231321691639751442Q  /* pi/2 */
+#define M_PI_4q        0.785398163397448309615660845819875721Q  /* pi/4 */
+#define M_1_PIq        0.318309886183790671537767526745028724Q  /* 1/pi */
+#define M_2_PIq        0.636619772367581343075535053490057448Q  /* 2/pi */
+#define M_2_SQRTPIq    1.128379167095512573896158903121545172Q  /* 2/sqrt(pi) */
+#define M_SQRT2q       1.414213562373095048801688724209698079Q  /* sqrt(2) */
+#define M_SQRT1_2q     0.707106781186547524400844362104849039Q  /* 1/sqrt(2) */
 
 #define __quadmath_extern_inline \
   extern inline __attribute__ ((__gnu_inline__))
2019-10-21  Jakub Jelinek  

Backported from mainline
2019-08-09  Jakub Jelinek  

PR c/91401
* c-parser.c (c_parser_omp_clause_dist_schedule): Fix up typos in the
check_no_duplicate_clause call.  Comment it out, instead emit a
warning for duplicate dist_schedule clauses.

* parser.c (cp_parser_omp_clause_dist_schedule): Comment out the
check_no_duplicate_clause call, instead emit a warning for duplicate
dist_schedule clauses.

* c-c++-common/gomp/pr91401-1.c: New test.
* c-c++-common/gomp/pr91401-2.c: New test.

--- gcc/c/c-parser.c(revision 274225)
+++ gcc/c/c-parser.c(revision 274226)
@@ -14811,7 +14811,10 @@ c_parser_omp_clause_dist_schedule (c_par
 c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
   "expected %<,%> or %<)%>");
 
-  check_no_duplicate_clause (list, OMP_CLAUSE_SCHEDULE, "schedule");
+  /* check_no_duplicate_clause (list, OMP_CLAUSE_DIST_SCHEDULE,
+   "dist_schedule"); */
+  if (omp_find_clause (list, OMP_CLAUSE_DIST_SCHEDULE))
+warning_at (loc, 0, "too many %qs clauses", "dist_schedule");
   if (t == error_mark_node)
 return list;
 
--- gcc/cp/parser.c (revision 274225)
+++ gcc/cp/parser.c (revision 274226)
@@ -35258,8 +35258,10 @@ cp_parser_omp_clause_dist_schedule (cp_p
   else if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_COMMA_CLOSE_PAREN))
 goto resync_fail;
 
-  check_no_duplicate_clause (list, OMP_CLAUSE_DIST_SCHEDULE, "dist_schedule",
-location);
+  /* check_no_duplicate_clause (list, OMP_CLAUSE_DIST_SCHEDULE,
+   "dist_schedule", location); */
+  if (omp_find_clause (list, OMP_CLAUSE_DIST_SCHEDULE))
+  

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Christophe Lyon

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a way 
that it will run the GCC testsuite after each of your patches, so I'll keep in 
mind not to report regressions (I've noticed several already).


I can perform a manual validation taking your 29 patches as a single one and 
compare the results with those of the revision preceding the one where you 
committed patch #1. Do you think it would be useful?


Christophe




[wwwdocs] readings.html - pubs.opengroup.org has moved to https

2019-10-21 Thread Gerald Pfeifer
Committed.

Gerald

- Log -
commit 2ae4a89035240f0df48cefdc0bd3270e193f49f9
Author: Gerald Pfeifer 
Date:   Mon Oct 21 13:45:44 2019 +0200

pubs.opengroup.org has moved to https.

diff --git a/htdocs/readings.html b/htdocs/readings.html
index fd42aaf..203b590 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -641,7 +641,7 @@ names.
   300pp. ISBN: 1-8-179-X.
 
  <a href="https://www.opengroup.org/">The Open Group</a> has quite a bit
-  on <a href="http://pubs.opengroup.org/onlinepubs/009695399/toc.htm">POSIX</a>
+  on <a href="https://pubs.opengroup.org/onlinepubs/009695399/toc.htm">POSIX</a>
   and friends.
   
   <a href="http://www.unicode.org">Unicode</a> and 

[PATCH] Fix PR92161

2019-10-21 Thread Richard Biener


Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-21  Richard Biener  

PR tree-optimization/92161
* tree-vect-loop.c (vect_analyze_loop_2): Reset stmts def-type
for reductions.

* gfortran.dg/pr92161.f: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277237)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2260,6 +2260,17 @@ again:
{
  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (si));
  STMT_SLP_TYPE (stmt_info) = loop_vect;
+ if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
+ || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
+   {
+ /* vectorizable_reduction adjusts reduction stmt def-types,
+restore them to that of the PHI.  */
+ STMT_VINFO_DEF_TYPE (STMT_VINFO_REDUC_DEF (stmt_info))
+   = STMT_VINFO_DEF_TYPE (stmt_info);
+ STMT_VINFO_DEF_TYPE (vect_stmt_to_vectorize
+   (STMT_VINFO_REDUC_DEF (stmt_info)))
+   = STMT_VINFO_DEF_TYPE (stmt_info);
+   }
}
   for (gimple_stmt_iterator si = gsi_start_bb (bb);
    !gsi_end_p (si); gsi_next (&si))
Index: gcc/testsuite/gfortran.dg/pr92161.f
===
--- gcc/testsuite/gfortran.dg/pr92161.f (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92161.f (working copy)
@@ -0,0 +1,23 @@
+! { dg-do compile }
+! { dg-options "-O1 -ftree-loop-vectorize -fno-signed-zeros -fno-trapping-math" }
+! { dg-additional-options "-mvsx" { target { powerpc*-*-* } } }
+  COMPLEX FUNCTION R1 (ZR, CC, EA, U6)
+
+  INTEGER ZR, U6, FZ, J2
+  COMPLEX EA(*), CC
+  DOUBLE PRECISION OS, GA, YU, XT
+
+  OS = DBLE(REAL(CC))
+  GA = DBLE(AIMAG(CC))
+  J2 = 1
+
+  DO 5 FZ = 1, ZR
+YU = DBLE(REAL(EA(J2)))
+XT = DBLE(AIMAG(EA(J2)))
+OS = OS + (YU * 2) - (XT * 2)
+GA = GA + (YU * 3) + (XT * 3)
+J2 = J2 + U6
+5 CONTINUE
+  R1 = CMPLX(REAL(OS), REAL(GA))
+  RETURN
+  END


Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-21 Thread Jakub Jelinek
On Mon, Oct 21, 2019 at 05:45:16AM -0500, Segher Boessenkool wrote:
> On Sun, Oct 20, 2019 at 09:51:22PM +0200, Uros Bizjak wrote:
> > On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
> > > As mentioned in the PR, the x86 backend has various define_insn_and_split
> > > patterns that are meant to match usually during combine, are then
> > > unconditionally split during split1 pass and as they have && 
> > > can_create_pseudo_p ()
> > > in their define_insn condition, if they get matched after split1, nothing
> > > would split them anymore and they wouldn't match after reload.
> > >
> > > The split1 pass already sets a property that can be used.
> > >
> > > I've first tried to remove some constraints and associated attributes, but
> > > it seems from further discussions in the PR I don't know much about the
> > > reasons why they were added and if they are still needed or not, so this
> > > version of the patch just replaces the can_create_pseudo_p () conditions
> > > with a new predicate that stops matching already after the split1 pass.
> > 
> > As explained by Segher in the PR, there is no 100% guarantee that
> > combine won't produce a pattern with a wrong hard register as a e.g.
> > count reg of a shift insn. RA will die on this kind of pattern with
> > reload failure, so these constraints are used together with
> > ix86_legitimate_combined_insn target hook to reject invalid
> > combinations involving hard registers.
> 
> And even that isn't completely safe, there is nothing that stops other
> passes from creating the offending insns.

Sure, but with the patch either they are created before split1, in which
case they will be split there and all is fine, or they are created after
split1 and will not match.  The only problematic case would be if they are
created during split1 while splitting other insns; I'm not really sure
whether split1 recurses on the newly created insns, but even if it doesn't,
I'd say it would be a backend bug to create them, and I strongly doubt the
i386 backend does that.

Jakub



Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)

2019-10-21 Thread Jakub Jelinek
On Mon, Oct 21, 2019 at 06:41:46PM +0800, Bin.Cheng wrote:
> The overflow check is difficult, IIUC, checks on bit precision in the
> patch is not enough?  Considering an unsigned 64-bit candidate with

I think it can cover some cases, especially on 64-bit targets, but will
leave other cases out.

I guess the first question is what iv->no_overflow means, the IVs
are base + iteration * step, does no_overflow mean that neither iteration *
step, nor base + iteration * step overflows?
For the debug info and division, guess we care primarily if just the
iteration * step part doesn't overflow.

There are still cases where I fear even the precision check might not be
enough.  If ustep is a power of two and so is rat, then I think we should be
fine, or if use->iv->no_overflow, but otherwise I fear of say unsigned char
use and unsigned short candidate, with ustep 3 and cstep 9 (i.e. rat 3).
If the loop has more than 65536 / 9 iterations, the var - cbase
will overflow, so will have values 0, 9, ..., 0xfff0, 0xfff9, 2, 11, ...
and corresponding use computed using division (assuming ubase is 0)
0 + (unsigned char) ((var - cbase) / 3):
0, 3, ..., 0x50, 0x53, 0, 3, ...
which is incorrect.
Simple testcase -O3 -g -fno-tree-dce:

void
foo (unsigned short c, unsigned char *p)
{
  unsigned char a = 0;
  unsigned short b = 0;
  for (; b != c; a += 3, b += 9)
p[b]++;
}

where I think with the patch it will be wrong-debug from the 7282nd iteration onwards.

So I wonder if for correctness I don't need to add:

  if (!use->iv->no_overflow
  && !cand->iv->no_overflow
  && !integer_pow2p (cstep))
return NULL_TREE;

with some of the above as comment explaining why.

On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
precision test?

And related to the first question above, perhaps incrementally, couldn't we
track separately whether the whole base + iteration * step overflows (i.e.
what we track right now) and also just whether iteration * step overflows in
a separate bool?  Because when we subtract the base from the value, all we
care about is iteration * step.

Jakub



[PATCH] [MIPS] PR82981 - mulditi3 pattern for MIPS64R6

2019-10-21 Thread Mihailo Stojanovic
This expands the existing MIPS mulditi3 pattern by adding support for
MIPS64R6 multiplication instructions.

gcc/ChangeLog:

* config/mips/mips.md (mulditi3): Generate patterns for high
doubleword and low doubleword result of multiplication on
MIPS64R6.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips64r6-ti-mult.c: New test.

---

Not sure if I should add "PR target/82981" above the ChangeLog entries,
as there was already one patch which addressed the issue, but didn't
resolve it completely.
---
 gcc/config/mips/mips.md  | 15 ---
 gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c | 16 
 2 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 4ad5c62..658f5e6 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2464,9 +2464,11 @@
   [(set (match_operand:TI 0 "register_operand")
(mult:TI (any_extend:TI (match_operand:DI 1 "register_operand"))
 (any_extend:TI (match_operand:DI 2 "register_operand"]
-  "ISA_HAS_DMULT && !( == ZERO_EXTEND && TARGET_FIX_VR4120)"
+  "ISA_HAS_R6DMUL
+   || (ISA_HAS_DMULT
+   && !( == ZERO_EXTEND && TARGET_FIX_VR4120))"
 {
-  rtx hilo;
+  rtx hilo, hi, lo;
 
   if (TARGET_MIPS16)
 {
@@ -2476,9 +2478,16 @@
 }
   else if (TARGET_FIX_R4000)
 emit_insn (gen_mulditi3_r4000 (operands[0], operands[1], operands[2]));
-  else
+  else if (ISA_HAS_DMULT)
 emit_insn (gen_mulditi3_internal (operands[0], operands[1],
 operands[2]));
+  else
+{
+  hi = mips_subword (operands[0], 1);
+  lo = mips_subword (operands[0], 0);
+  emit_insn (gen_muldi3_mul3_nohilo (lo, operands[1], operands[2]));
+  emit_insn (gen_muldi3_highpart_r6 (hi, operands[1], operands[2]));
+}
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c 
b/gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c
new file mode 100644
index 000..f969e76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=64 -march=mips64r6" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } } */
+
+typedef unsigned __int128 u128;
+typedef unsigned long long u64;
+
+u128
+test (u64 a, u64 b)
+{
+  return (u128)a * (u128)b;
+}
+
+/* { dg-final { scan-assembler-not "__multi3" } } */
+/* { dg-final { scan-assembler "dmul" } } */
+/* { dg-final { scan-assembler "dmuhu" } } */
-- 
2.7.4



Re: [PATCH] Fix (hypothetical) problem with pre-reload splitters (PR target/92140)

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 09:51:22PM +0200, Uros Bizjak wrote:
> On Sun, Oct 20, 2019 at 1:24 PM Jakub Jelinek  wrote:
> > As mentioned in the PR, the x86 backend has various define_insn_and_split
> > patterns that are meant to match usually during combine, are then
> > unconditionally split during split1 pass and as they have && 
> > can_create_pseudo_p ()
> > in their define_insn condition, if they get matched after split1, nothing
> > would split them anymore and they wouldn't match after reload.
> >
> > The split1 pass already sets a property that can be used.
> >
> > I've first tried to remove some constraints and associated attributes, but
> > it seems from further discussions in the PR I don't know much about the
> > reasons why they were added and if they are still needed or not, so this
> > version of the patch just replaces the can_create_pseudo_p () conditions
> > with a new predicate that stops matching already after the split1 pass.
> 
> As explained by Segher in the PR, there is no 100% guarantee that
> combine won't produce a pattern with a wrong hard register as a e.g.
> count reg of a shift insn. RA will die on this kind of pattern with
> reload failure, so these constraints are used together with
> ix86_legitimate_combined_insn target hook to reject invalid
> combinations involving hard registers.

And even that isn't completely safe, there is nothing that stops other
passes from creating the offending insns.

i386 isn't the only target doing nasty things here, btw., the sky hasn't
come crashing down for many years, we'll all be okay.  Fingers crossed.


Segher


Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)

2019-10-21 Thread Bin.Cheng
On Sat, Oct 19, 2019 at 2:27 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, the following patch attempts to address two issues.
> In remove_unused_ivs we first find the best iv_cand (we prefer primarily the
> same step, next same mode and lastly constant base) and only then call
> get_computation_at to determine the replacement expression.  Unfortunately,
> in various cases get_computation_at can return NULL_TREE and in that case we
> don't try any other candidate and just leave the vars for debug as is, which
> results in => NULL and the IVs .
>
> The following patch will only consider candidates for which
> get_computation_at succeeds, while it can be slower, it can handle more
> cases.  Perhaps alternative would be to have two passes, pick up the best
> candidate without calling get_computation_at, calling it on the best
> candidate and if that fails, retry the best candidate search with calling
> get_computation_at.
>
> Another thing is that get_computation_at can only handle cases where
> use can be expressed as ubase + (var - cbase) * ratio for integral ratio.
> In the PR testcase we don't have any, one (the one we pick as best)
> has a ratio 1/15 and the other 1/4.  Using a division in ivopts at runtime
> is hardly ever desirable, but for debug info we don't mind that, all we need
> to ensure is that we don't result in wrong-debug.
> The patch implements expressing use as ubase + (var - cbase) / ratio for
> these debug info only uses, but so far requires that if the use IV couldn't
> wrap around, that the candidate (var - cbase) can't wrap around either,
> which should be true whenever the candidate type is IMHO at least
> ceil_log2 (ratio) bits larger than the use type.  Do I need to punt if
> !use->iv->no_overflow, or is
> ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
> And as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
> without those extra precision checks)?
>
> Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Hi Jakub, thanks very much for working on this.

As for choosing the best candidate, I was thinking to reuse existing
get_computation_cost interface, but looks like it requires non-trivial
modification.  Your patch is simpler.

The overflow check is difficult.  IIUC, the checks on bit precision in the
patch are not enough?  Consider an unsigned 64-bit candidate with a start
value unknown at compilation time.  My original idea was to use the
no_overflow flag, but apparently this information is not well computed and
is inaccurate in most cases?  Lastly, for now there is no link between a
candidate and its original IV.  IIRC, when a candidate is derived from a
no_overflow pointer candidate, it's not safe to mark the candidate as
no_overflow.  I am not sure whether I remember this wrong; unfortunately,
I don't have examples either.

Thanks,
bin
>
> Comparing bootstrapped cc1plus without and with this patch (the former with
> the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
> identical) shows:
> -  [33] .debug_info   PROGBITS 22a4788  749275e 00      0   0  1
> -  [34] .debug_abbrev PROGBITS 9736ee6  204aad  00      0   0  1
> -  [35] .debug_line   PROGBITS 993b993  1688464 00      0   0  1
> -  [36] .debug_str    PROGBITS afc3df7  6f65aa  01  MS  0   0  1
> -  [37] .debug_loc    PROGBITS b6ba3a1  71a2dde 00      0   0  1
> -  [38] .debug_ranges PROGBITS 1285d17f 16414d0 00      0   0  1
> -  [39] .symtab       SYMTAB   13e9e650 166ff8  18  40 38193  8
> -  [40] .strtab       STRTAB   14005648 2ad809  00      0   0  1
> -  [41] .shstrtab     STRTAB   142b2e51 0001a0  00      0   0  1
> +  [33] .debug_info   PROGBITS 22a4788  749365e 00      0   0  1
> +  [34] .debug_abbrev PROGBITS 9737de6  204a9f  00      0   0  1
> +  [35] .debug_line   PROGBITS 993c885  1688f0c 00      0   0  1
> +  [36] .debug_str    PROGBITS afc5791  6f65aa  01  MS  0   0  1
> +  [37] .debug_loc    PROGBITS b6bbd3b  71cd404 00      0   0  1
> +  [38] .debug_ranges PROGBITS 1288913f 16414b0 00      0   0  1
> +  [39] .symtab       SYMTAB   13eca5f0 166ff8  18  40 38193  8
> +  [40] .strtab       STRTAB   140315e8 2ad809  00      0   0  1
> +  [41] .shstrtab     STRTAB   142dedf1 0001a0  00      0   0  1
> so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
> (0.15%), so there are some changes with this, but not 

Re: gcc-wwwdocs branch master updated. cdc7bf90357701877546f8bac160d0fb9e20b334

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 06:06:30PM +0200, Gerald Pfeifer wrote:
> On Wed, 9 Oct 2019, js...@gcc.gnu.org wrote:
> > +Use "git commit" and "git push origin
> > +master" to check in the patch.
> 
> I will admit I made a couple of first commits without reading those
> details and just used a plain "git push".  
> 
> Is there any problem with that, any drawback?
> 
> Or could we simplify those instructions?

Neither command works in all cases, and both work in the simpler cases.

"git push" will push what you have configured to push for the current branch
you are on.  If you are on your local "master", and you have followed the
simple instructions above, that will probably push your local "master" to
the "master" on your remote "origin" (that is, "git push origin master:master").

"git push origin master" makes it explicit what remote you want to push to,
and what local branch you want to push, but not the remote branch you want
to push to.

"git push" is slightly more dangerous, but git will prevent you from doing
most dangerous things, so that's alright.  The instructions as-is are a good
habit to get into, and for more advanced things you *do* need to name your
remote and branches; you'll have to learn them some time, why not now.

Both commands are fine, if you have your changes on local master.


However.  *Never ever* use "git push -f".  Don't drive without safety belts.

(Use "+" when you want to force push a branch, don't force push the world).


Segher


[committed][vect] Only change base alignment if more restrictive

2019-10-21 Thread Andre Vieira (lists)

Hi all,

This patch was pre-approved by richi.  This patch makes sure 
ensure_base_align only changes alignment if the new alignment is more 
restrictive.  It already did this if we were dealing with symbols, but 
it now does it for all types of declarations.


Committed in revision r277238.

Cheers,
Andre

2019-10-21  Andre Vieira  

* tree-vect-stmts (ensure_base_align): Only change alignment if new
alignment is more restrictive.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e606945d536ad3353b2317aed5b504a89eec9fc3..a3f99eecdf0976147ddd48725d62b9f18aea08cd 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6288,7 +6288,7 @@ ensure_base_align (dr_vec_info *dr_info)
 
   if (decl_in_symtab_p (base_decl))
 	symtab_node::get (base_decl)->increase_alignment (align_base_to);
-  else
+  else if (DECL_ALIGN (base_decl) < align_base_to)
 	{
 	  SET_DECL_ALIGN (base_decl, align_base_to);
   DECL_USER_ALIGN (base_decl) = 1;


Re: Move code out of vect_slp_analyze_bb_1

2019-10-21 Thread Richard Sandiford
Richard Biener  writes:
> On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
>  wrote:
>>Richard Biener  writes:
>>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
>> wrote:
After the previous patch, it seems more natural to apply the
PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
the region is, rather than delaying it to vect_slp_analyze_bb_1.
(But rather than carve out the biggest region possible and then
reject it, wouldn't it be better to stop when the region gets
too big, so we at least have a chance of vectorising something?)

It also seems more natural for vect_slp_bb_region to create the
bb_vec_info itself rather than (a) having to pass bits of data down
for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
on every failure return.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> Ok. But I wonder what the reason was for this limit? Dependency
>>analysis was greatly simplified, being no longer quadratic here. Can
>>you check where the limit originally came from? But indeed splitting
>>the region makes more sense then, but at dataref group boundaries I'd
>>say. 
>>
>>Yeah, looks it was the complexity of dependence analysis:
>>
>>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
>
> OK.  We no longer compute dependences between all memory refs, but only
> verify that we can do those code motions we need to do.  That's of course
> much harder to check a bound on upfront (it's still quadratic in the
> worst case).  I'm also not sure this is ever a problem, but we might
> instead count the number of stmts involving memory?

OK, here's what I tried.  It exposes quadratic behaviour for
gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
all in the name of "vectorising" using V1DI. :-)

In that test we have N data references in which the even elements appear
to be vectorisable but the odd elements don't.  First time round, we hit:

  /* For basic block SLP, try to break the group up into multiples of the
 vector size.  */
  unsigned HOST_WIDE_INT const_nunits;
  if (is_a  (vinfo)
  && STMT_VINFO_GROUPED_ACCESS (stmt_info)
  && DR_GROUP_FIRST_ELEMENT (stmt_info)
  && nunits.is_constant (_nunits))

This breaks off the leading even element and recurses, discards the
leading odd element, and then recurses on the rest.  We then come round
to the same code when recursing on the rest.

It looks like matches is valid for all elements, so I guess we should
divide the group based on all false elements, not just the first?

Richard


2019-10-21  Richard Sandiford  

gcc/
* params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with...
(PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param.
* doc/invoke.texi: Update accordingly.
* tree-vect-slp.c (vect_slp_bb): Compute region_end during the
search loop.  Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB
data references.

Index: gcc/params.def
===
--- gcc/params.def  2019-10-16 10:50:00.064243704 +0100
+++ gcc/params.def  2019-10-21 10:48:45.769347466 +0100
@@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID
  0, 0, 1)
 
 /* Avoid SLP vectorization of large basic blocks.  */
-DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
- "slp-max-insns-in-bb",
- "Maximum number of instructions in basic block to be considered for "
- "SLP vectorization.", 1000, 0, 0)
+DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB,
+ "slp-max-memrefs-in-bb",
+ "Maximum number of memory references to be considered for "
+ "SLP vectorization.", 1024, 0, 0)
 
 DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
  "min-insn-to-prefetch-ratio",
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100
+++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100
@@ -12248,9 +12248,8 @@ by the copy loop headers pass.
 @item vect-epilogues-nomask
 Enable loop epilogue vectorization using smaller vector size.
 
-@item slp-max-insns-in-bb
-Maximum number of instructions in basic block to be
-considered for SLP vectorization.
+@item slp-max-memrefs-in-bb
+Maximum number of memory references to be considered for SLP vectorization.
 
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2019-10-21 07:41:32.997886232 +0100
+++ gcc/tree-vect-slp.c 2019-10-21 10:48:45.769347466 +0100
@@ -3075,12 +3075,19 @@ vect_slp_bb (basic_block bb)
   while (!gsi_end_p (gsi))
 {
   gimple_stmt_iterator region_begin = gsi;
+  gimple_stmt_iterator region_end;
   vec datarefs = vNULL;
+  unsigned int max_datarefs = PARAM_VALUE 

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 12:21:21PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:31, Segher Boessenkool wrote:
> > I have a bunch of testcases from when I did something similar for PowerPC
> > that I wanted to test...  But I cannot get your series to apply.  Do you
> > have a git repo I can pull from?
> 
> Perhaps because it's already committed to trunk?

Oh probably.  Duh.  Thanks :-)

> > u64 addH(u64 a) { return a + 0x12345678ULL; }
> > u64 addH0(u64 a) { return a + 0x1234ULL; }

(If you change those to 0x340078ULL etc. it'll test as meant on arm
as well: to see if it uses immediates in the insn where it can.  It looks
like it'll work fine fwiw).

> We do pretty well on this.  Only addSHm1 needs three insns (except where
> the constant isn't valid for arm), and I think that's the minimum for
> this case anyway.  Several of the tests only need one insn.

Yeah, very nice :-)


Segher


[PATCH][AArch64] Implement __rndr, __rndrrs intrinsics

2019-10-21 Thread Kyrill Tkachov

Hi all,

This patch implements the recently published[1] __rndr and __rndrrs 
intrinsics used to access the RNG in Armv8.5-A.

The __rndrrs intrinsic can be used to reseed the generator too.
They are guarded by the __ARM_FEATURE_RNG feature macro.
A quirk with these intrinsics is that they store the random number in
their pointer argument and return a status code indicating whether the
generation succeeded.

The instructions themselves write the CC flags indicating the success of 
the operation that we can then read with a CSET.
Therefore this implementation makes use of the IGNORE indicator to the 
builtin expand machinery to avoid generating
the CSET if its result is unused (the CC reg clobbering effect is still 
reflected in the pattern).
I've checked that using unspec_volatile prevents undesirable CSEing of 
the instructions.


Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk.
Thanks,
Kyrill

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics


2019-10-21  Kyrylo Tkachov  

    * config/aarch64/aarch64.md (UNSPEC_RNDR, UNSPEC_RNDRRS): Define.
    (aarch64_rndr): New define_insn.
    (aarch64_rndrrs): Likewise.
    * config/aarch64/aarch64.h (AARCH64_ISA_RNG): Define.
    (TARGET_RNG): Likewise.
    * config/aarch64/aarch64.c (aarch64_expand_builtin): Use IGNORE
    argument.
    * config/aarch64/aarch64-protos.h (aarch64_general_expand_builtin):
    Add fourth argument in prototype.
    * config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
    Add AARCH64_BUILTIN_RNG_RNDR, AARCH64_BUILTIN_RNG_RNDRRS.
    (aarch64_init_rng_builtins): Define.
    (aarch64_general_init_builtins): Call aarch64_init_rng_builtins.
    (aarch64_expand_rng_builtin): Define.
    (aarch64_general_expand_builtin): Use IGNORE argument, handle
    RNG builtins.
    * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
    __ARM_FEATURE_RNG when TARGET_RNG.
    * config/aarch64/arm_acle.h (__rndr, __rndrrs): Define.

2019-10-21  Kyrylo Tkachov  

    * gcc.target/aarch64/acle/rng_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index e02ece8672a633833a63993e24a156dd5ff64e69..23a0db4f74f56c48b0d0c34d5ac61e5d8334bc43 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -445,6 +445,9 @@ enum aarch64_builtins
   AARCH64_TME_BUILTIN_TCOMMIT,
   AARCH64_TME_BUILTIN_TTEST,
   AARCH64_TME_BUILTIN_TCANCEL,
+  /* Armv8.5-a RNG instruction builtins.  */
+  AARCH64_BUILTIN_RNG_RNDR,
+  AARCH64_BUILTIN_RNG_RNDRRS,
   AARCH64_BUILTIN_MAX
 };
 
@@ -,7 +1114,25 @@ aarch64_init_tme_builtins (void)
    AARCH64_TME_BUILTIN_TCANCEL);
 }
 
+/* Add builtins for Random Number instructions.  */
+
+static void
+aarch64_init_rng_builtins (void)
+{
+  tree unsigned_ptr_type = build_pointer_type (unsigned_intDI_type_node);
+  tree ftype
+= build_function_type_list (integer_type_node, unsigned_ptr_type, NULL);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RNG_RNDR]
+= aarch64_general_add_builtin ("__builtin_aarch64_rndr", ftype,
+   AARCH64_BUILTIN_RNG_RNDR);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RNG_RNDRRS]
+= aarch64_general_add_builtin ("__builtin_aarch64_rndrrs", ftype,
+   AARCH64_BUILTIN_RNG_RNDRRS);
+}
+
+
 /* Initialize all builtins in the AARCH64_BUILTIN_GENERAL group.  */
+
 void
 aarch64_general_init_builtins (void)
 {
@@ -1144,6 +1165,7 @@ aarch64_general_init_builtins (void)
 
   aarch64_init_crc32_builtins ();
   aarch64_init_builtin_rsqrt ();
+  aarch64_init_rng_builtins ();
 
   tree ftype_jcvt
 = build_function_type_list (intSI_type_node, double_type_node, NULL);
@@ -1607,10 +1629,48 @@ aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
 return target;
 }
 
+/* Expand a random number builtin EXP with code FCODE, putting the result
+   in TARGET.  If IGNORE is true the return value is ignored.  */
+
+rtx
+aarch64_expand_rng_builtin (tree exp, rtx target, int fcode, int ignore)
+{
+  rtx pat;
+  enum insn_code icode;
+  if (fcode == AARCH64_BUILTIN_RNG_RNDR)
+icode = CODE_FOR_aarch64_rndr;
+  else if (fcode == AARCH64_BUILTIN_RNG_RNDRRS)
+icode = CODE_FOR_aarch64_rndrrs;
+  else
+gcc_unreachable ();
+
+  rtx rand = gen_reg_rtx (DImode);
+  pat = GEN_FCN (icode) (rand);
+  if (!pat)
+return NULL_RTX;
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  rtx res_addr = expand_normal (arg0);
+  res_addr = convert_memory_address (Pmode, res_addr);
+  rtx res_mem = gen_rtx_MEM (DImode, res_addr);
+  emit_insn (pat);
+  emit_move_insn (res_mem, rand);
+  /* If the status result is unused don't generate the CSET code.  */
+  if (ignore)
+return target;
+
+  rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+  rtx cmp_rtx = gen_rtx_fmt_ee (NE, SImode, cc_reg, const0_rtx);
+  emit_insn (gen_aarch64_cstoresi (target, cmp_rtx, cc_reg));
+  return target;
+}
+
 /* Expand an expression EXP that calls built-in function FCODE,
-   with result going to 

Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-10-21 Thread Stam Markianos-Wright


On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> however, on my native Aarch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * config/arm/arm.md: Update b for Thumb2 range checks.
>>  * config/arm/arm.c: New function arm_gen_far_branch.
>>  * config/arm/arm-protos.h: New function arm_gen_far_branch
>>  prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * testsuite/gcc.target/arm/pr91816.c: New test.
> 
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>> cpu_arch_option *,
>>   
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>   
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
>> +
>> +
> 
> Lets get the nits out of the way.
> 
> Unnecessary extra new line, need a space between int and const above.
> 
> 

.Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>   
>> +
>> +/* Generate code to enable conditional branches in functions over 1 MiB.  */
>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +const char * branch_format)
> 
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
> 

.Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>> label_ptr);
>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
> 
> Unnecessary extra newline.
> 

.Fixed!

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
>> (-2040->2048).
>>   ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 ->256).
>>   
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'   +/- 16MB is not checked for.
>> +;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
> 
> Unnecessary extra newline.
> 

.Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>>(match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>>(pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -{
>> -  arm_ccfsm_state += 2;
>> -  return \"\";
>> -}
>> -  return \"b%d1\\t%l0\";
>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> +  {
>> +arm_ccfsm_state += 2;
>> +return \"\";
>> +  }
>> + switch (get_attr_length (insn))
>> +  {
>> +// Thumb2 16-bit b{cond}
>> +case 2:
>> +
>> +// Thumb2 32-bit b{cond}
>> +case 4: return \"b%d1\\t%l0\";break;
>> +
>> +// Thumb2 b{cond} out of range.  Use unconditional branch.
>> +case 8: return arm_gen_far_branch \
>> +(operands, 0, \"Lbcond\", \"b%D1\t\");
>> +break;
>> +
>> +// A32 b{cond}
>> +default: return \"b%d1\\t%l0\";
>> +  }
> 
> Please fix indentation here.
> 

.Fixed together with below changes.

>> "
>> [(set_attr "conds" "use")
>>  (set_attr "type" "branch")
>>  (set (attr "length")
>> -(if_then_else
>> -   (and (match_test "TARGET_THUMB2")
>> -(and (ge (minus (match_dup 0) (pc)) (const_int -250))
>> - (le (minus (match_dup 0) (pc)) (const_int 256
>> -   (const_int 2)
>> -   (const_int 4)))]
>> +(if_then_else (match_test 

[PATCH] Record externals/invariants in the SLP graph

2019-10-21 Thread Richard Biener


This patch adds SLP nodes for invariant/external operands of stmts
to the SLP graph.  That in turn allows to simplfy vectorized
operand gathering.  The new SLP nodes are not yet first-class
citizens but this is a step in the correct direction plus it
allows us to scrap more of the IL-operand swapping (with the
stmt rewriting for swapping COND_EXPR operands remaining).

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2019-10-21  Richard Biener  

* tree-vectorizer.h (_slp_tree::ops): New member.
(SLP_TREE_SCALAR_OPS): New.
(vect_get_slp_defs): Adjust prototype.
* tree-vect-slp.c (vect_free_slp_tree): Release
SLP_TREE_SCALAR_OPS.
(vect_create_new_slp_node): Initialize it.  New overload for
initializing by an operands array.
(_slp_oprnd_info::ops): New member.
(vect_create_oprnd_info): Initialize it.
(vect_free_oprnd_info): Release it.
(vect_get_and_check_slp_defs): Populate the operands array.
Do not swap operands in the IL when not necessary.
(vect_build_slp_tree_2): Build SLP nodes for invariant operands.
Record SLP_TREE_SCALAR_OPS for all invariant nodes.  Also
swap operands in the operands array.  Do not swap operands in
the IL.
(vect_slp_rearrange_stmts): Re-arrange SLP_TREE_SCALAR_OPS as well.
(vect_gather_slp_loads): Fix.
(vect_detect_hybrid_slp_stmts): Likewise.
(vect_slp_analyze_node_operations_1): Search for an internal
def child for computing reduction SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_slp_analyze_node_operations): Skip ops-only stmts for
the def-type push/pop dance.
(vect_get_constant_vectors): Compute number_of_vectors here.
Use SLP_TREE_SCALAR_OPS and simplify greatly.
(vect_get_slp_vect_defs): Use gimple_get_lhs also for PHIs.
(vect_get_slp_defs): Simplify greatly.
* tree-vect-loop.c (vectorize_fold_left_reduction): Simplify.
(vect_transform_reduction): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs): Simplify.
(vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277237)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5301,10 +5301,7 @@ vectorize_fold_left_reduction (stmt_vec_
   if (slp_node)
 {
   auto_vec > vec_defs (2);
-  auto_vec sops(2);
-  sops.quick_push (ops[0]);
-  sops.quick_push (ops[1]);
-  vect_get_slp_defs (sops, slp_node, _defs);
+  vect_get_slp_defs (slp_node, _defs);
   vec_oprnds0.safe_splice (vec_defs[1 - reduc_index]);
   vec_defs[0].release ();
   vec_defs[1].release ();
@@ -6473,16 +6470,8 @@ vect_transform_reduction (stmt_vec_info
{
  /* Get vec defs for all the operands except the reduction index,
 ensuring the ordering of the ops in the vector is kept.  */
- auto_vec slp_ops;
  auto_vec, 3> vec_defs;
-
- slp_ops.quick_push (ops[0]);
- slp_ops.quick_push (ops[1]);
- if (op_type == ternary_op)
-   slp_ops.quick_push (ops[2]);
-
- vect_get_slp_defs (slp_ops, slp_node, _defs);
-
+ vect_get_slp_defs (slp_node, _defs);
  vec_oprnds0.safe_splice (vec_defs[0]);
  vec_defs[0].release ();
  vec_oprnds1.safe_splice (vec_defs[1]);
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 277237)
+++ gcc/tree-vect-slp.c (working copy)
@@ -79,6 +79,7 @@ vect_free_slp_tree (slp_tree node, bool
 
   SLP_TREE_CHILDREN (node).release ();
   SLP_TREE_SCALAR_STMTS (node).release ();
+  SLP_TREE_SCALAR_OPS (node).release ();
   SLP_TREE_VEC_STMTS (node).release ();
   SLP_TREE_LOAD_PERMUTATION (node).release ();
 
@@ -122,6 +123,7 @@ vect_create_new_slp_node (vec ops)
+{
+  slp_tree node;
+
+  node = XNEW (struct _slp_tree);
+  SLP_TREE_SCALAR_STMTS (node) = vNULL;
+  SLP_TREE_SCALAR_OPS (node) = ops;
+  SLP_TREE_VEC_STMTS (node).create (0);
+  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
+  SLP_TREE_CHILDREN (node) = vNULL;
+  SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
+  SLP_TREE_TWO_OPERATORS (node) = false;
+  SLP_TREE_DEF_TYPE (node) = vect_external_def;
+  node->refcnt = 1;
+  node->max_nunits = 1;
+
+  return node;
+}
+
 
 /* This structure is used in creation of an SLP tree.  Each instance
corresponds to the same operand in a group of scalar stmts in an SLP
@@ -146,6 +170,8 @@ typedef struct _slp_oprnd_info
 {
   /* Def-stmts for the operands.  */
   vec def_stmts;
+  /* Operands.  */
+  vec ops;
   /* Information about the first statement, 

Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-21 Thread Kugan Vivekanandarajah
Hi Richard,

Thanks for the pointers.



On Fri, 11 Oct 2019 at 22:33, Richard Biener  wrote:
>
> On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
>  wrote:
> >
> > Hi Richard,
> > Thanks for the review.
> >
> > On Wed, 2 Oct 2019 at 20:41, Richard Biener  
> > wrote:
> > >
> > > On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > > passing assembler options specified with -Wa, to the link-time driver.
> > > >
> > > > The proposed solution only works for uniform -Wa options across all
> > > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > > would require either adjusting partitioning according to flags or
> > > > emitting multiple object files  from a single LTRANS CU. We could
> > > > consider this as a follow up.
> > > >
> > > > Bootstrapped and regression tests on  arm-linux-gcc. Is this OK for 
> > > > trunk?
> > >
> > > While it works for your simple cases it is unlikely to work in practice 
> > > since
> > > your implementation needs the assembler options to be present at the link
> > > command line.  I agree that this might be the way for people to go when
> > > they face the issue but then it needs to be documented somewhere
> > > in the manual.
> > >
> > > That is, with COLLECT_AS_OPTION (why singular?  I'd expected
> > > COLLECT_AS_OPTIONS) available to cc1 we could stream this string
> > > to lto_options and re-materialize it at link time (and diagnose mismatches
> > > even if we like).
> > OK. I will try to implement this. So the idea is if we provide
> > -Wa,options as part of the lto compile, this should be available
> > during link time. Like in:
> >
> > arm-linux-gnueabihf-gcc -march=armv7-a -mthumb -O2 -flto
> > -Wa,-mimplicit-it=always,-mthumb -c test.c
> > arm-linux-gnueabihf-gcc  -flto  test.o
> >
> > I am not sure where should we stream this. Currently, cl_optimization
> > has all the optimization flag provided for compiler and it is
> > autogenerated and all the flags are integer values. Do you have any
> > preference or example where this should be done.
>
> In lto_write_options, I'd simply append the contents of COLLECT_AS_OPTIONS
> (with -Wa, prepended to each of them), then recover them in lto-wrapper
> for each TU and pass them down to the LTRANS compiles (if they agree
> for all TUs, otherwise I'd warn and drop them).

Attached patch streams it and also makes sure that the options are the
same for all the TUs.  Maybe it is a bit restrictive.

What is the best place to document COLLECT_AS_OPTIONS?  We don't seem
to document COLLECT_GCC_OPTIONS anywhere.

The attached patch passes regression testing and also fixes the original
ARM kernel build issue with Thumb2.

Thanks,
Kugan
>
> Richard.
>
> > Thanks,
> > Kugan
> >
> >
> >
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Kugan
> > > >
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-10-02  kugan.vivekanandarajah  
> > > >
> > > > PR lto/78353
> > > > * gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in env.
> > > > (driver::main): Call putenv_COLLECT_AS_OPTION.
> > > > * lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2019-10-02  kugan.vivekanandarajah  
> > > >
> > > > PR lto/78353
> > > > * gcc.target/arm/pr78353-1.c: New test.
> > > > * gcc.target/arm/pr78353-2.c: New test.
gcc/testsuite/ChangeLog:

2019-10-21  Kugan Vivekanandarajah  

PR lto/78353
* gcc.target/arm/pr78353-1.c: New test.
* gcc.target/arm/pr78353-2.c: New test.


gcc/ChangeLog:

2019-10-21  Kugan Vivekanandarajah  
PR lto/78353
* gcc.c (putenv_COLLECT_AS_OPTIONS): New to set COLLECT_AS_OPTIONS in 
env.
(driver::main): Call putenv_COLLECT_AS_OPTIONS.
* lto-opts.c (lto_write_options): Stream COLLECT_AS_OPTIONS.
* lto-wrapper.c (merge_and_complain): Get COLLECT_AS_OPTIONS and
  make sure they are the same from all TUs.
(find_and_merge_options): Get COLLECT_AS_OPTIONS.
(run_gcc): Likewise.

From 120d61790236cbde5a5d0cb8455b0e3583dd90b2 Mon Sep 17 00:00:00 2001
From: Kugan 
Date: Sat, 28 Sep 2019 02:11:49 +1000
Subject: [PATCH] COLLECT_AS support

Change-Id: Ic05ab2cec895fd64bec75449785e42b809aa73cc
---
 gcc/gcc.c| 29 +++
 gcc/lto-opts.c   | 16 +++--
 gcc/lto-wrapper.c| 30 
 gcc/testsuite/gcc.target/arm/pr78353-1.c |  9 +++
 gcc/testsuite/gcc.target/arm/pr78353-2.c |  9 +++
 5 files changed, 86 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr78353-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr78353-2.c

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 1216cdd505a..dede47fb055 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -5239,6 +5239,34 @@ do_specs_vec (vec vec)
 }
 }
 
+/* Store 

Re: [PATCH] Use narrow mode of constant when expanding widening multiplication

2019-10-21 Thread Richard Sandiford
Jozef Lawrynowicz  writes:
> I experienced the following ICE when working on a downstream patch for msp430:
>
> void
> foo (unsigned int r, unsigned int y)
> {
>   __builtin_umul_overflow ((unsigned int) (-1), y, );
> }
>
>> msp430-elf-gcc -S tester.c -O0
>
> tester.c: In function 'foo':
> tester.c:4:1: error: unrecognizable insn:
> 4 | }
>   | ^
> (insn 16 15 17 2 (set (reg:HI 32)
> (const_int 65535 [0x])) "tester.c":3:3 -1
>  (nil))
> during RTL pass: vregs
> dump file: tester.c.234r.vregs
> tester.c:4:1: internal compiler error: in extract_insn, at recog.c:2311
>
> Following discussions on ml/gcc
> (https://gcc.gnu.org/ml/gcc/2019-10/msg00083.html), I narrowed this down to a
> call to expand_mult_highpart_adjust in expand_expr_real_2.
>
> If one of the operands is a constant, its mode had been converted to the wide
> mode of our multiplication to generate some RTL, but not converted back to the
> narrow mode before expanding what will be the high part of the result of the
> multiplication.
>
> If we look at the other two uses of expand_mult_highpart_adjust in the 
> sources,
> (both in expmed.c (expmed_mult_highpart_optab)) we can see that the narrow
> version of the constant is always used:
>   if (tem)
> /* We used the wrong signedness.  Adjust the result.  */
> return expand_mult_highpart_adjust (mode, tem, op0, narrow_op1,
> tem, unsignedp);
>
> So the attached patch updates the use in expand_expr_real_2 to also use the
> narrow version of the constant operand.
> This fixes the aforementioned ICE.

TBH, I don't understand why we have:

  if (TREE_CODE (treeop1) == INTEGER_CST)
op1 = convert_modes (mode, word_mode, op1,
 TYPE_UNSIGNED (TREE_TYPE (treeop1)));

All the following code treats op1 as having the original mode (word_mode),
and having the same mode as op0.  That's what both the optab and (like you
say) expand_mult_highpart_adjust expect.

So unless I'm missing something, it looks like all the code does is
introduce exactly this bug.

AFAICT from the history, having separate code for INTEGER_CST dates back
to when this was an expand peephole for normal MULT_EXPRs.  In that case
we had to handle two cases: when op1 was a conversion from a narrower type,
and when it was an INTEGER_CST with a suitable range.  The separate
INTEGER_CST case then got carried along in further updates but I think
became redundant when the code was moved to WIDEN_MULT_EXPR.

Removing the above is pre-approved if it works.

Thanks,
Richard


[PATCH] Report errors on inconsistent OpenACC nested reduction, clauses

2019-10-21 Thread Harwath, Frederik

Hi,
OpenACC requires that, if a variable is used in reduction clauses on two
nested loops, then there must be reduction clauses for that variable on
all loops that are nested in between the two loops, and all these
reduction clauses must use the same operator; this was first clarified
by OpenACC 2.6.  This commit introduces a check for that property which
reports errors if the property is violated.

I have tested the patch by comparing "make check" results and I am not aware of 
any regressions.

Gergö has implemented the check and it works, but I was wondering if the way in 
which the patch
avoids issuing errors about operator switches more than once by modifying the 
clauses (cf. the
corresponding comment in omp-low.c) could lead to problems - the processing 
might still continue
after the error on the modified tree, right? I was also wondering about the 
best place for such
checks.  Should this be a part of "pass_lower_omp" (as in the patch), or
should it run earlier like, for instance, "pass_diagnose_omp_blocks"?

Can the patch be included in trunk?

Frederik



>From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 21 Oct 2019 08:27:58 +0200
Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction
 clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
errors if it is violated.

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2019-10-21  Gergö Barany  
		Frederik Harwath  

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC
	 constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-fail.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing reduction clauses.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	 Likewise.
---
 gcc/omp-low.c | 107 +++-
 .../goacc/nested-reductions-fail.c| 492 ++
 .../c-c++-common/goacc/nested-reductions.c| 420 +++
 .../c-c++-common/goacc/reduction-6.c  |  11 -
 .../par-loop-comb-reduction-1.c   |   2 +-
 .../par-loop-comb-reduction-2.c   |   2 +-
 .../par-loop-comb-reduction-3.c   |   2 +-
 .../par-loop-comb-reduction-4.c   |   2 +-
 8 files changed, 1022 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 279b6ef893a..a2212274685 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -127,6 +127,12 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -902,6 +908,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
   ctx->depth = outer_ctx->depth + 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
 }
   else
 {
@@ -917,6 +925,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
   ctx->depth = 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = NULL;
 }
 
   ctx->cb.decl_map = new hash_map;
@@ -1131,6 +1141,9 @@ scan_sharing_clauses 

Re: [testsuite] Add test for PR91532

2019-10-21 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On Sat, 19 Oct 2019 at 23:45, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > Hi Richard,
>> > Sorry for not adding the test in PR91532 fix.
>> > Is the attached patch OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> > 2019-10-18  Prathamesh Kulkarni  
>> >
>> >   PR tree-optimization/91532
>> > testsuite/
>> >   * gcc.target/aarch64/sve/fmla_2.c: Add dg-scan check for deleted 
>> > store.
>> >
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c 
>> > b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
>> > index 5c04bcdb3f5..bebb073d1f8 100644
>> > --- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
>> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
>> > @@ -1,4 +1,4 @@
>> > -/* { dg-options "-O3" } */
>> > +/* { dg-options "-O3 -fdump-tree-ifcvt-details" } */
>> >
>> >  #include 
>> >
>> > @@ -15,5 +15,6 @@ f (double *restrict a, double *restrict b, double 
>> > *restrict c,
>> >  }
>> >  }
>> >
>> > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "ifcvt" } } */
>> >  /* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, 
>> > z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
>> >  /* { dg-final { scan-assembler-not {\tfmad\t} } } */
>>
>> I think it'd be better to have a scan-assembler-times for st1d instead,
>> so that we're testing the end result rather than how we get there.
> Hi Richard,
> Thanks for the suggestions, is the attached patch OK ?

OK, thanks.

Richard


Re: Move code out of vect_slp_analyze_bb_1

2019-10-21 Thread Richard Biener
On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
> wrote:
>>>After the previous patch, it seems more natural to apply the
>>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
>>>the region is, rather than delaying it to vect_slp_analyze_bb_1.
>>>(But rather than carve out the biggest region possible and then
>>>reject it, wouldn't it be better to stop when the region gets
>>>too big, so we at least have a chance of vectorising something?)
>>>
>>>It also seems more natural for vect_slp_bb_region to create the
>>>bb_vec_info itself rather than (a) having to pass bits of data down
>>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
>>>on every failure return.
>>>
>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Ok. But I wonder what the reason was for this limit? Dependency
>analysis was greatly simplified, being no longer quadratic here. Can
>you check where the limit originally came from? But indeed splitting
>the region makes more sense then, but at dataref group boundaries I'd
>say. 
>
>Yeah, looks it was the complexity of dependence analysis:
>
>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html

OK.  We no longer compute dependences between all memory refs but only
verify we can do those code motions we need to do.  That's of course
much harder to bound up front (it's still quadratic in the worst case).
I'm also not sure this is ever a problem, but we might instead count
the number of stmts involving memory?

>  > Is there any limit on the size of the BB you consider for
>  > vectorization?  I see we do compute_all_dependences on it - that
>  > might be quite costly.
>
>  I added slp-max-insns-in-bb parameter with initial value 1000.
>
>Thanks,
>Richard