date:20180409

Re: [PATCH] Assorted -masm=intel fixes (PR target/85281)

2018-04-09 Thread Jan Beulich

>>> "H.J. Lu"  04/09/18 9:09 PM >>>
>On Mon, Apr 9, 2018 at 11:37 AM, Jakub Jelinek  wrote:
>> BTW, -masm=intel seems to be in quite bad shape even in the assembler, in
>> various testcases I'm getting errors like on the following reduced one:
>> int k1, xmm0;
>> int foo (void) { return k1; }
>> int bar (void) { return xmm0; }
>> gcc -masm=intel -O2
>> /tmp/cch0mo1K.s: Assembler messages:
>> /tmp/cch0mo1K.s:10: Error: invalid use of register
>> /tmp/cch0mo1K.s:21: Error: invalid use of register
>> As ICC generates the same assembly on the instructions:
>> mov eax, DWORD PTR k1[rip]
>> ...
>> mov eax, DWORD PTR xmm0[rip]
>> I think either the intel syntax spec is faulty, or gas is buggy and should
>> figure out that after *WORD PTR and before [ there is symbol rather than
>> register name.  Some testcases e.g. have k1 as function name and that
>> results in other asm errors (about .size directive).
>
>How does Intel syntax support symbols like eax, k1 and xmm0 with
>".intel_syntax noprefix"?

I've noticed this problem about two weeks ago as well, and have a patch
mostly ready (but need to get around to both produce a proper testcase
and regression test the whole thing); that won't be until in a couple of
weeks time, though - if you think this is needed earlier, I can hand you
the fragments I have. As an aside - you realize this isn't an Intel syntax
only issue, as "noprefix" can as well be specified with .att_syntax.

I should note though that the fix won't go as far a Jakub suggests: Context
doesn't matter for recognizing whether a symbol is a register. For something
like the above to compile, .arch would need to be used to disable the
respective register groups (e.g. .arch .noavx512f to make k1 an ordinary
symbol).

Jan

Re: [PATCH/doc] - mention all optimization options that enable inlining options

2018-04-09 Thread Martin Sebor


On 04/09/2018 01:36 AM, Richard Biener wrote:

On Sun, 8 Apr 2018, Martin Sebor wrote:


While updating the -Wrestrict option to mention that it works
best not just with -O2 but also at higher optimization levels
I looked around for other options that might benefit from
a similar clarification.  I found a few inlining options that
only mention -O2 but that (according to -Q --help=optimizers)
are enabled at other levels as well.  The attached patch
changes the manual to reflect that.


OK.


I committed it earlier today in r259250.




Given the large number of options I didn't take the time to
check all optimization options so there could others that could
stand to be similarly improved if we want to be 100% accurate.
If that is, in fact, what we want then we might want to script
this.


Yeah, note we now have -Og and -Ofast as well...


Right, those aren't mentioned.  With so many flavors of -O does
enumerating them all for each optimization option still make
sense?  I wonder if this could this be generated by some script
instead (it would probably have to be presented in the form of
a table but that might actually make things easier to find).

Martin

[RFC] Improve tree DSE

2018-04-09 Thread Kugan Vivekanandarajah

I would like to queue this patch for stage1 review.

In DSE, while in dse_classify_store, as soon as we see a PHI use
statement that is part of the loop, we are immediately giving up.

As far as I understand, this can be improved. Attached patch is trying
to walk the uses of the PHI statement (by recursively calling
dse_classify_store) and then making sure the obtained store is indeed
redundant.

This is partly as reported in one of the testcase from PR44612. But
this PR is about other issues that is not handled in this patch.

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Is this OK for next stage1?

Thanks,
Kugan


gcc/ChangeLog:

2018-04-10  Kugan Vivekanandarajah  

* tree-ssa-dse.c (dse_classify_store): Handle recursive PHI.
(dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store.

gcc/testsuite/ChangeLog:

2018-04-10  Kugan Vivekanandarajah  

* gcc.dg/tree-ssa/ssa-dse-31.c: New test.
* gcc.dg/tree-ssa/ssa-dse-32.c: New test.
From 5751eaff3d1c263e8631d5a07e43fecaaa0e9d26 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Tue, 10 Apr 2018 09:49:10 +1000
Subject: [PATCH] improve dse

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 ++
 gcc/tree-ssa-dse.c | 51 --
 3 files changed, 81 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
new file mode 100644
index 000..e4d71b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
@@ -0,0 +1,16 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+int main ()
+{
+  static float a[SIZE];
+  int i;
+  for (i = 0; i < SIZE; i++)
+   __builtin_memset ((void *) a, 0, sizeof(float)*3);
+   __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
new file mode 100644
index 000..3d8fd5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
@@ -0,0 +1,23 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+void s4 (float *restrict a)
+{
+  (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+}
+
+
+int main ()
+{
+  int i;
+  float a[10];
+  printf("Start\n");
+  for (i = 0; i < SIZE; i++)
+s4 (a);
+  printf("Done\n");
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9220fea..3513fda 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -521,11 +521,11 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
Return TRUE if the above conditions are met, otherwise FALSE.  */
 
 static dse_store_status
-dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
-		bool byte_tracking_enabled, sbitmap live_bytes)
+dse_classify_store (ao_ref *ref, gimple *stmt_outer, gimple *stmt,
+		gimple **use_stmt, bool byte_tracking_enabled,
+		sbitmap live_bytes, unsigned cnt = 0)
 {
   gimple *temp;
-  unsigned cnt = 0;
 
   *use_stmt = NULL;
 
@@ -556,9 +556,11 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 	{
 	  cnt++;
 
+	  if (use_stmt == stmt_outer)
+	continue;
 	  /* If we ever reach our DSE candidate stmt again fail.  We
 	 cannot handle dead stores in loops.  */
-	  if (use_stmt == stmt)
+	  else if (use_stmt == stmt)
 	{
 	  fail = true;
 	  BREAK_FROM_IMM_USE_STMT (ui);
@@ -572,8 +574,6 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 	  if (temp
 		  /* Make sure we are not in a loop latch block.  */
 		  || gimple_bb (stmt) == gimple_bb (use_stmt)
-		  || dominated_by_p (CDI_DOMINATORS,
- gimple_bb (stmt), gimple_bb (use_stmt))
 		  /* We can look through PHIs to regions post-dominating
 		 the DSE candidate stmt.  */
 		  || !dominated_by_p (CDI_POST_DOMINATORS,
@@ -582,8 +582,41 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 		  fail = true;
 		  BREAK_FROM_IMM_USE_STMT (ui);
 		}
+	  else if (dominated_by_p (CDI_DOMINATORS,
+   gimple_bb (stmt), gimple_bb (use_stmt)))
+		{
+		  gphi *phi = as_a  (use_stmt);
+		  gimple *def_stmt = SSA_NAME_DEF_STMT (PHI_RESULT (phi));
+		  enum dse_store_status status = DSE_STORE_LIVE;
+		  ao_ref use_ref;
+		  gimple *inner_use_stmt;
+
+		  /* If stmt dominates PHI stmt, follow the PHI stmt.  */
+		  if (!temp)
+		status = dse_classify_store (ref, stmt, def_stmt, _use_stmt,
+

Re: [doc patch] remove hyphen from side-effect

2018-04-09 Thread Sandra Loosemore


On 04/09/2018 05:58 PM, Martin Sebor wrote:

Ping:  I will go ahead and commit this patch sometime this
week unless I hear objections:

   https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00036.html


This patch is mostly OK, except for these hunks:


Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 259017)
+++ gcc/doc/invoke.texi (working copy)
@@ -8710,7 +8710,7 @@ default at @option{-O} and higher.
 @item -ftree-builtin-call-dce
 @opindex ftree-builtin-call-dce
 Perform conditional dead code elimination (DCE) for calls to built-in functions
-that may set @code{errno} but are otherwise side-effect free.  This flag is
+that may set @code{errno} but are otherwise side effect free.  This flag is
 enabled by default at @option{-O2} and higher if @option{-Os} is not also
 specified.
 


s/side-effect free/free of side effects/


@@ -14229,7 +14229,7 @@ not overridden} will do.
 This option is implicitly passed to the compiler for the second
 compilation requested by @option{-fcompare-debug}, along with options to
 silence warnings, and omitting other options that would cause
-side-effect compiler outputs to files or to the standard output.  Dump
+side effect compiler outputs to files or to the standard output.  Dump
 files and preserved temporary files are renamed so as to contain the
 @code{.gk} additional extension during the second compilation, to avoid
 overwriting those generated by the first.


This is impossible to parse either with or without the hyphen.  How 
about rewriting the end of that sentence as


...that would cause the compiler to produce output to files or to 
standard output as a side effect.


??


@@ -3496,7 +3496,7 @@ instructions.
 @cindex RTL predecrement
 @cindex RTL postdecrement
 
-Six special side-effect expression codes appear as memory addresses.

+Six special side effect expression codes appear as memory addresses.
 
 @table @code

 @findex pre_dec


This should either remain hyphenated (adjective phrase immediately 
before the noun), or rewritten as something like


Six special expression codes represent memory addresses with side effects.


Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  (revision 259017)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -3967,7 +3967,7 @@ pre-decrement, post-increment, or post-decrement a
 @defmac HAVE_PRE_MODIFY_DISP
 @defmacx HAVE_POST_MODIFY_DISP
 A C expression that is nonzero if the machine supports pre- or
-post-address side-effect generation involving constants other than
+post-address side effect generation involving constants other than
 the size of the memory operand.
 @end defmac
 
@@ -3974,7 +3974,7 @@ the size of the memory operand.

 @defmac HAVE_PRE_MODIFY_REG
 @defmacx HAVE_POST_MODIFY_REG
 A C expression that is nonzero if the machine supports pre- or
-post-address side-effect generation involving a register displacement.
+post-address side effect generation involving a register displacement.
 @end defmac
 
 @defmac CONSTANT_ADDRESS_P (@var{x})


I think both of these should also remain hyphenated (adjective phrase 
immediately before the modified noun).


-Sandra

Re: [doc patch] remove hyphen from side-effect

2018-04-09 Thread Martin Sebor


Ping:  I will go ahead and commit this patch sometime this
week unless I hear objections:

  https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00036.html

Martin

On 04/02/2018 03:55 PM, Martin Sebor wrote:

Attached is a patch to remove the hyphen from uses of side-effect
in the manual as per the comment in:

  https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00021.html

Although both forms are common, "side effect" (without the hyphen)
is more prevalent in current English and I think it's better to be
consistent than to have to rehash this question each time someone
happens to copy the less common form from existing text.

https://english.stackexchange.com/questions/889/when-should-compound-words-be-written-as-one-word-with-hyphens-or-with-spaces


Martin

[PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure

2018-04-09 Thread Jonathan Wakely

Define a new exception type derived from std::ios::failure[abi:cxx11]
which also aggregates an object of the gcc4-compatible ios::failure
type. Make __throw_ios_failure throw this new type for iostream errors
that raise exceptions. Provide custom type info for the new type so that
it can be caught by handlers for the gcc4-compatible ios::failure type
as well as handlers for ios::failure[abi:cxx11] and its bases.

PR libstdc++/85222
* src/c++11/Makefile.am [ENABLE_DUAL_ABI]: Add special rules for
cxx11-ios_failure.cc to rewrite type info for __ios_failure.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cxx11-ios_failure.cc (__ios_failure, __iosfail_type_info):
New types.
[_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* src/c++11/ios.cc (__throw_ios_failure): Remove definition.
* src/c++98/ios_failure.cc (__construct_ios_failure)
(__destroy_ios_failure, is_ios_failure_handler): New functions.
[!_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* testsuite/27_io/ios_base/failure/dual_abi.cc: New.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Revert changes to
handler types, to always catch std::ios_base::failure.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.

Tested x86_64-linux and powerpc64-linux, with the default config, and
--disable-libstdcxx-dual-abi, and
--with-default-libstdcxx-abi=gcc4-compatible. I intend to commit this
to trunk and gcc-7-branch soon.
commit bdcc071eaa58aa07824b01a6da64662787abece7
Author: Jonathan Wakely 
Date:   Tue Apr 10 00:19:25 2018 +0100

PR libstdc++/85222 allow catching iostream errors as gcc4-compatible 
ios::failure

Define a new exception type derived from std::ios::failure[abi:cxx11]
which also aggregates an object of the gcc4-compatible ios::failure
type. Make __throw_ios_failure throw this new type for iostream errors
that raise exceptions. Provide custom type info for the new type so that
it can be caught by handlers for the gcc4-compatible ios::failure type
as well as handlers for ios::failure[abi:cxx11] and its bases.

PR libstdc++/85222
* src/c++11/Makefile.am [ENABLE_DUAL_ABI]: Add special rules for
cxx11-ios_failure.cc to rewrite type info for __ios_failure.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cxx11-ios_failure.cc (__ios_failure, 
__iosfail_type_info):
New types.
[_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* src/c++11/ios.cc (__throw_ios_failure): Remove definition.
* src/c++98/ios_failure.cc (__construct_ios_failure)
(__destroy_ios_failure, is_ios_failure_handler): New functions.
[!_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* testsuite/27_io/ios_base/failure/dual_abi.cc: New.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Revert changes to
handler types, to always catch std::ios_base::failure.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.

diff --git a/libstdc++-v3/src/c++11/Makefile.am

Re: C++ PATCH for c++/85264, ICE with extra template parameter list

2018-04-09 Thread Paolo Carlini


Hi,

On 09/04/2018 21:39, Jason Merrill wrote:

cp_parser_check_template_parameters is supposed to catch when we have
the wrong number of template parameter lists, but it wasn't diagnosing
this case.  Fixed by checking whether the thing we're declaring used a
template-id; the case of declaring a primary template, when we allow
one more template parameter list, uses a plain identifier.
Nice, it looks like this also fixes the prehistoric c++/24314, which I 
still had assigned. When I worked a bit on it, relatively recently, in 
2012, I failed to properly follow-up to your feedback on the mailing 
list, but I don't think we really nailed the problem, did we?


    https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00504.html

Or maybe we just tried to do too much, like doing away completely with 
cp_parser_check_template_parameters in favor of a bit of additional 
checking in maybe_process_partial_specialization.


Paolo.

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-09 Thread Jeff Law

On 04/04/2018 06:43 AM, Peter Bergner wrote:
> On 4/4/18 4:06 AM, Tamar Christina wrote:
>> Now that I know how the loads are done, I have a patch should be both 
>> correct and generate better code in most cases.
>> It just calculates bitsize inside the loop and does the copying in the 
>> largest mode possible that's equal or less than the bits
>> That are to be copied. This avoids the issue with reading too much, honors 
>> padding and alignment and still generates better code
>> In most cases.
>>
>> I'm running the regression tests and should have a final version soon.
> 
> If you give me your patch when it's ready, I'll do bootstrap and regression
> testing on powerpc*-linux and verify it fixes the issues we hit.
Similarly, I've got a jenkins instance here were I can get a bootstrap
and regression test on the usual targets like aarch64, armv7, i686,
powerpc (32, 64, 64le), s390x, sparc64, x86_64.  But more interestingly
it'll also do a bootstrap test on alpha, hppa, m68k, sh4 and other
oddballs like aarch64-be.

A patch for the tip of the trunk is all I need.

It doesn't run the testsuite, but the ability to bootstrap on the lesser
used targets gives a level of code generator validation that is helpful.

Takes about 24hrs to cycle through everything...

jeff

Re: [PATCH][RFC] Fix PR85180, find_base_term is exponential

2018-04-09 Thread Jeff Law

On 04/05/2018 06:32 AM, Richard Biener wrote:
> 
> This fixes exponential time spent in find_base_term by extending the
> existing fix of NULL-ifying VALUE locations during their processing
> to keep them NULL-ified during the whole recursion.
> 
> As Jakub figured this has the chance of returning NULL prematurely
> for the case of the plus/minus case rejecting a found base on the
> ground of
> 
> if (base != NULL_RTX
> && ((REG_P (tmp1) && REG_POINTER (tmp1))
>  || known_base_value_p (base)))
>   return base;
> 
> so any VALUE visited during a such rejected base will not be
> visited again (but we'll get a NULL result).
> 
> Quoting him from IRC: still no differences, out of 850mil calls
> 
> for an instrumented version.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Alternative (safer in the above regard) variants using a hash-map
> to cache the base for a VALUE are quite a bit slower if not implemented
> in ways that also look dangerous at this point.  See the PR for
> some of those.
> 
> Any comments?  Otherwise we decided to go ahead with this tomorrow.
> 
> Thanks,
> Richard.
> 
> 2018-04-05  Richard Biener  
> 
>   PR middle-end/85180
>   * alias.c (find_base_term): New wrapper around find_base_term
>   unwinding CSELIB_VAL_PTR changes.
>   (find_base_term): Do not restore CSELIB_VAL_PTR during the
>   recursion.
> 
>   * gcc.dg/pr85180.c: New testcase.
Seems reasonable.  I do wonder how much all the complexity in alias.c
buys us these days and whether or not we should look to do some dramatic
simplifications.  The core parts of this code date back to the late 90s
when RTL memory disambiguation was much more important than it is today.

jeff

Re: [PATCH][explow] PR target/85173: validize memory before passing it on to target probe_stack

2018-04-09 Thread Jeff Law

On 04/05/2018 08:20 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> In this PR the expansion code emits an invalid memory address for the
> stack probe, which the backend fails to recognise.
> The address is created explicitly in
> anti_adjust_stack_and_probe_stack_clash in explow.c and passed down to
> gen_probe_stack
> without any validation in emit_stack_probe.
> 
> This patch fixes the ICE by calling validize_mem on the memory location
> before passing it down to the target.
> Jakub pointed out that we also want to create valid addresses for the
> probe_stack_address case, so this patch
> creates an expand operand and legitimizes it before passing it down to
> the probe_stack_address expander.
> 
> This patch passes bootstrap and testing on arm-none-linux-gnueabihf and
> aarch64-none-linux-gnu
> and ppc64le-redhat-linux on gcc112 in the compile farm.
> 
> Is this ok for trunk?
> 
> Thanks,
> Kyrill
> 
> P.S. Uros, the alpha probe_stack expander in alpha.md seems incompatible
> with the way the probe_stack name is
> used in the midend. It accepts a const_int operand that is used as an
> offset from the stack pointer, rather than accepting
> a full memory operand like other targets. Do you think it's better to
> rename the probe_stack pattern there to something
> that doesn't conflict with the name the midend assumes?
> 
> 2018-04-05  Kyrylo Tkachov  
> 
>     PR target/85173
>     * explow.c (emit_stack_probe): Call validize_mem on memory location
>     before passing it to gen_probe_stack.  Create address operand and
>     legitimize it for the probe_stack_address case.
> 
> 2018-04-05  Kyrylo Tkachov  
> 
>     PR target/85173
>     * gcc.target/arm/pr85173.c: New test.
Alpha should be fixed -- the docs clearly state that the operand is "the
memory reference in the stack that needs to be probed".  Just passing in
the offset seems wrong.

Note that -fstack-clash-protection on ARM (32bit) relies on the old
-fstack-check code which has a variety of issues.  It's better than
nothing, but the real way to go here is to fix prologue generation as
well as alloca/vla handling to have stack clash specific sequences.

OK for the trunk.

jeff

[PR 84149] From 3b967c133f7f62e3c2951a971d8690d242364f5d Mon Sep 17 00:00:00 2001

2018-04-09 Thread Martin Jambor

Hi,

changes in predictors caused that LTO IPA-CP no longer clones the qsort
implementation for both of the comparators used in 505.mcf_r from Spec
2017 to be quite and this resulted in quite some slowdown of the
benchmark.

I have tried various way of teaching IPA-CP that the remaining contexts
all have a single constant, but all of them required that (I either
reorder stuff on a higher level or) I special case self-recursive edges
and so I decided to do it consistently even in the analysis phase in the
patch below.  Consequently, using default parameters, IPA-CP now knows
that it is beneficial to clone for both comparators again.  However, I
have verified that when I tweak --param ipa-cp-eval-threshold so that
only one clone is created, the one created "for all remaining contexts"
knows that they only contain one comparator.

Because IPA-CP now can count self-recursive edges among those bringing a
value when cloning (as opposed next time around when we come to the node
in the SCC) and we need to make sure the original node keeps its
self-recursive edge pointing to itself, create_specialized_node must
make sure that such edges are handled properly and redirect only clones
of these edges, rather than relying on cgraphclones.c machinery.

The patch passes bootstrap, LTO-bootstrap and testing on x86_64-linux, I
have also verified the 505.mcf_r regression is gone and that it can
compile all of SPEC 2017 and 2006 with LTO (well, the config I used
failed for three benchmarks, but so it did on trunk).  I have also LTO
built Firefox with it and it browsed a few pages fine too.  OK for
trunk?

Thanks,

Martin


2018-04-08  Martin Jambor  

PR ipa/84149
* ipa-cp.c (propagate_vals_across_pass_through): Expand comment.
(cgraph_edge_brings_value_p): New parameter dest_val, check if it is
not the same as the source val.
(cgraph_edge_brings_value_p): New parameter.
(gather_edges_for_value): Pass destination value to
cgraph_edge_brings_value_p.
(perhaps_add_new_callers): Likewise.
(get_info_about_necessary_edges): Likewise and exclude values brought
only by self-recursive edges.
(create_specialized_node): Redirect only clones of self-calling edges.
(+self_recursive_pass_through_p): New function.
(find_more_scalar_values_for_callers_subset): Use it.
(find_aggregate_values_for_callers_subset): Likewise.
(known_aggs_to_agg_replacement_list): Removed.
(decide_whether_version_node): Re-calculate known constants for all
remaining context clones.
---
 gcc/ipa-cp.c | 121 ++-
 1 file changed, 78 insertions(+), 43 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index ee41a8d55b7..6a9a695fc2c 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1601,7 +1601,9 @@ propagate_vals_across_pass_through (cgraph_edge *cs, 
ipa_jump_func *jfunc,
 
   /* Do not create new values when propagating within an SCC because if there
  are arithmetic functions with circular dependencies, there is infinite
- number of them and we would just make lattices bottom.  */
+ number of them and we would just make lattices bottom.  If this condition
+ is ever relaxed we have to detect self-feeding recursive calls in
+ cgraph_edge_brings_value_p in a smarter way.  */
   if ((ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR)
   && ipa_edge_within_scc (cs))
 ret = dest_lat->set_contains_variable ();
@@ -3470,12 +3472,12 @@ same_node_or_its_all_contexts_clone_p (cgraph_node 
*node, cgraph_node *dest)
   return info->is_all_contexts_clone && info->ipcp_orig_node == dest;
 }
 
-/* Return true if edge CS does bring about the value described by SRC to node
-   DEST or its clone for all contexts.  */
+/* Return true if edge CS does bring about the value described by SRC to
+   DEST_VAL of node DEST or its clone for all contexts.  */
 
 static bool
 cgraph_edge_brings_value_p (cgraph_edge *cs, ipcp_value_source *src,
-   cgraph_node *dest)
+   cgraph_node *dest, ipcp_value *dest_val)
 {
   struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
   enum availability availability;
@@ -3485,7 +3487,9 @@ cgraph_edge_brings_value_p (cgraph_edge *cs, 
ipcp_value_source *src,
   || availability <= AVAIL_INTERPOSABLE
   || caller_info->node_dead)
 return false;
-  if (!src->val)
+  /* At the moment we do not propagate over arithmetic jump functions in SCCs,
+ so it is safe to detect self-feeding recursive calls in this way.  */
+  if (!src->val || src->val == dest_val)
 return true;
 
   if (caller_info->ipcp_orig_node)
@@ -3521,13 +3525,14 @@ cgraph_edge_brings_value_p (cgraph_edge *cs, 
ipcp_value_source *src,
 }
 }
 
-/* Return true if edge CS does bring about the value described by SRC to node
-   DEST or its clone for all contexts.  */
+/* Return true

Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-09 Thread Steve Kargl

On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote:
> > On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> >>
> >> the attached patch removes the parallel annotation from DO CONCURRENT.
> >> As discussed in the PR, the autoparallellizer currently generates
> >> wrong code. The only feasible way is to disable the annotation for
> >> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> >> regression.
> >>
> >> Regression-tested. OK for trunk?
> >>
> > 
> > Yes.
> 
> Thanks for the review!
> 
> However, just as I was about to commit, I thought of something else
> and, hopefully, better.
> 
> The underlying problem is that the annotation does not go together
> with -ftree-parallelize-loops - so let's simply not set it if that
> flag is set.
> 
> This way, we can avoid speed regressions for people who do not use
> -ftree-parallelize-loops, and it will be possible to downgrade the
> PR to a missed-optimization bug; it is also not necessary to xfail
> vect-do-concurrent-1.f90
> 
> Regression-tested. OK for trunk, for this version?

In reviewing the PR audit trailing, I thought Richard indicated
the problem is in the autopar stage in the middle.  Removing the
annotation simply prevents gfortran's DO CURRENT from a middle 
in bug.

I'm fine with the new patch.  I'll leave it up to you to decide
which is preferred.

-- 
Steve

C++ PATCH for c++/85279, dump_expr vs. decltype

2018-04-09 Thread Jason Merrill

For some reason, cp_parser_name_lookup_error uses %E for printing the
parsing scope, so dump_expr needs to know how to handle some _TYPEs.
DECLTYPE_TYPE was missing.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4d380464839058b3d70e1262339ed973642d6741
Author: Jason Merrill 
Date:   Mon Apr 9 16:51:01 2018 -0400

PR c++/85279 - dump_expr doesn't understand decltype.

* error.c (dump_expr): Handle DECLTYPE_TYPE.

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 75e853a1428..f27b700a434 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -2714,6 +2714,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case INTEGER_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
+case DECLTYPE_TYPE:
   pp_type_specifier_seq (pp, t);
   break;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype67.C b/gcc/testsuite/g++.dg/cpp0x/decltype67.C
new file mode 100644
index 000..e8042ac59e7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype67.C
@@ -0,0 +1,7 @@
+// PR c++/85279
+// { dg-do compile { target c++11 } }
+
+template struct A
+{
+  void foo(decltype(T())::Y);	// { dg-error {decltype\(T\(\)\)::Y} }
+};

Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-09 Thread Thomas Koenig


Hi Steve,


On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:


the attached patch removes the parallel annotation from DO CONCURRENT.
As discussed in the PR, the autoparallellizer currently generates
wrong code. The only feasible way is to disable the annotation for
gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
regression.

Regression-tested. OK for trunk?



Yes.


Thanks for the review!

However, just as I was about to commit, I thought of something else
and, hopefully, better.

The underlying problem is that the annotation does not go together
with -ftree-parallelize-loops - so let's simply not set it if that
flag is set.

This way, we can avoid speed regressions for people who do not use
-ftree-parallelize-loops, and it will be possible to downgrade the
PR to a missed-optimization bug; it is also not necessary to xfail
vect-do-concurrent-1.f90

Regression-tested. OK for trunk, for this version?


2018-04-09  Thomas Koenig  

PR fortran/83064
* trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
parallell processing of DO CONCURRENT -ftree-parallelize-loops
is set.

2018-04-09  Thomas Koenig  

PR fortran/83064
* gfortran.dg/do_concurrent_5.f90: New test.
* gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus
message.
Index: fortran/trans-stmt.c
===
--- fortran/trans-stmt.c	(Revision 259222)
+++ fortran/trans-stmt.c	(Arbeitskopie)
@@ -3642,7 +3642,10 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr
   /* The exit condition.  */
   cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			  count, build_int_cst (TREE_TYPE (count), 0));
-  if (forall_tmp->do_concurrent)
+
+  /* PR 83064 means that we cannot use the annotation if the
+	 autoparallelizer is active.  */
+  if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node,
   annot_expr_parallel_kind),
Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90
===
--- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Revision 259222)
+++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Arbeitskopie)
@@ -12,4 +12,3 @@ subroutine test(n, a, b, c)
 end subroutine test
 
 ! { dg-message "loop vectorized" "" { target *-*-* } 0 }
-! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 }
! { dg-do  run }
! PR 83064 - this used to give wrong results.
! { dg-options "-O3 -ftree-parallelize-loops=2" }
! Original test case by Christian Felter

program main
use, intrinsic :: iso_fortran_env
implicit none

integer, parameter :: nsplit = 4
integer(int64), parameter :: ne = 2000
integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
real(real64), dimension(nsplit) :: pi

edof(1::4) = 1
edof(2::4) = 2
edof(3::4) = 3
edof(4::4) = 4

stride = ceiling(real(ne)/nsplit)
do i = 1, nsplit
high(i) = stride*i
end do
do i = 2, nsplit
low(i) = high(i-1) + 1
end do
low(1) = 1
high(nsplit) = ne

pi = 0
do concurrent (i = 1:nsplit)
pi(i) = sum(compute( low(i), high(i) ))
end do
if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort

contains

pure function compute( low, high ) result( ttt )
integer(int64), intent(in) :: low, high
real(real64), dimension(nsplit) :: ttt
integer(int64) :: j, k

ttt = 0

! Unrolled loop
! do j = low, high, 4
! k = 1
! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
! k = 2
! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )
! k = 3
! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )
! k = 4
! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )
! end do

! Loop with modulo operation
! do j = low, high
! k = mod( j, nsplit ) + 1
! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
! end do

! Loop with subscripting via host association
do j = low, high
k = edof(j)
ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )
end do
end function

end program main

C++ PATCH for c++/85262, ICE with redundant qualification on constructor

2018-04-09 Thread Jason Merrill

A::A names the constructor, but you can't call the constructor
directly, you should just write A(args).  Our error recovery was
failing in a template because we'd wrapped the arguments with
NON_DEPENDENT_EXPR, and then kept the result as part of the template
tree, whereas we're supposed to throw away anything containing
NON_DEPENDENT_EXPR once we've used it to determine types.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4fde624556763974da407ffcf9858d5126730646
Author: Jason Merrill 
Date:   Mon Apr 9 15:16:32 2018 -0400

PR c++/85262 - ICE with redundant qualification on constructor.

* call.c (build_new_method_call_1): Move make_args_non_dependent
after A::A() handling.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index b22a3cc132e..f978ea73f3d 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9104,14 +9104,6 @@ build_new_method_call_1 (tree instance, tree fns, vec **args,
   basetype = TYPE_MAIN_VARIANT (TREE_TYPE (instance));
   gcc_assert (CLASS_TYPE_P (basetype));
 
-  if (processing_template_decl)
-{
-  orig_args = args == NULL ? NULL : make_tree_vector_copy (*args);
-  instance = build_non_dependent_expr (instance);
-  if (args != NULL)
-	make_args_non_dependent (*args);
-}
-
   user_args = args == NULL ? NULL : *args;
   /* Under DR 147 A::A() is an invalid constructor call,
  not a functional cast.  */
@@ -9132,12 +9124,21 @@ build_new_method_call_1 (tree instance, tree fns, vec **args,
   return call;
 }
 
+  if (processing_template_decl)
+{
+  orig_args = args == NULL ? NULL : make_tree_vector_copy (*args);
+  instance = build_non_dependent_expr (instance);
+  if (args != NULL)
+	make_args_non_dependent (*args);
+}
+
   /* Process the argument list.  */
   if (args != NULL && *args != NULL)
 {
   *args = resolve_args (*args, complain);
   if (*args == NULL)
 	return error_mark_node;
+  user_args = *args;
 }
 
   /* Consider the object argument to be used even if we end up selecting a
diff --git a/gcc/testsuite/g++.dg/parse/ctor10.C b/gcc/testsuite/g++.dg/parse/ctor10.C
new file mode 100644
index 000..99d3ca8c18d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/ctor10.C
@@ -0,0 +1,14 @@
+// PR c++/85262
+// { dg-options -fpermissive }
+
+struct A {};
+
+template struct B : A
+{
+  B()
+  {
+A::A(A());			// { dg-warning "constructor" }
+  }
+};
+
+B<0> b;

[PATCH] Assorted -masm=intel fixes (PR target/85281)

2018-04-09 Thread Jakub Jelinek

Hi!

I've tested make check-gcc
RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} i386.exp 
vect.exp'
testing and looked solely at assembly Error: (there are many scan-assembler*
directives that just fail, and some tests use e.g. only att inline asm
etc.).

The following patch fixes what I found that way.  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

The patch fixes:
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors)

BTW, -masm=intel seems to be in quite bad shape even in the assembler, in
various testcases I'm getting errors like on the following reduced one:
int k1, xmm0;
int foo (void) { return k1; }
int bar (void) { return xmm0; }
gcc -masm=intel -O2 
/tmp/cch0mo1K.s: Assembler messages:
/tmp/cch0mo1K.s:10: Error: invalid use of register
/tmp/cch0mo1K.s:21: Error: invalid use of register
As ICC generates the same assembly on the instructions:
mov eax, DWORD PTR k1[rip]
...
mov eax, DWORD PTR xmm0[rip]
I think either the intel syntax spec is faulty, or gas is buggy and should
figure out that after *WORD PTR and before [ there is symbol rather than
register name.  Some testcases e.g. have k1 as function name and that
results in other asm errors (about .size directive).

2018-04-09  Jakub Jelinek  

PR target/85281
* config/i386/sse.md (reduces,
avx512f_vmcmp3,
avx512f_vmcmp3_mask,
avx512f_sgetexp,
avx512f_rndscale,
avx512dq_ranges,
avx512f_vgetmant):
Use %2 instead of %2 for -masm=intel.
(avx512f_vcvtss2usi, avx512f_vcvtss2usiq,
avx512f_vcvttss2usi,
avx512f_vcvttss2usiq): Use %k1 instead of %1 for
-masm=intel.
(avx512f_vcvtsd2usi, avx512f_vcvtsd2usiq,
avx512f_vcvttsd2usi,
avx512f_vcvttsd2usiq, ufloatv2siv2df2):
Use %q1 instead of %1 for -masm=intel.
(avx512f_sfixupimm,
avx512f_sfixupimm_mask): Use %3 instead
of %3 for -masm=intel.
(sse2_shufpd_v2df_mask): Fix a typo, change %{6%} to %{%6%} for
-masm=intel.
(*avx512vl_v2div2qi2_store): Use %w0 instead of %0 for
-masm=intel.
(*avx512vl_v4qi2_store): Use %k0 instead of %0 for
-masm=intel.
(avx512vl_v4qi2_mask_store): Use a single pattern with
%k0 and %1 for -masm=intel rather than two patterns, one with %0 and
%g1.
(*avx512vl_v8qi2_store): Use %q0 instead of %0 for
-masm=intel.
(avx512vl_v8qi2_mask_store): Use a single pattern with
%q0 and %1 for -masm=intel rather than two patterns, one with %0 and
%g1 and one with %0 and %1.
(avx512er_vmrcp28,
avx512er_vmrsqrt28): Use %1 instead of
%1 for -masm=intel.
(avx5124fmaddps_4fmaddps_mask, avx5124fmaddps_4fmaddss_mask,
avx5124fmaddps_4fnmaddps_mask, avx5124fmaddps_4fnmaddss_mask,
avx5124vnniw_vp4dpwssd_mask, avx5124vnniw_vp4dpwssds_mask): Swap order
of %0 and %{%4%} for -masm=intel.
(avx5124fmaddps_4fmaddps_maskz, avx5124fmaddps_4fmaddss_maskz,
avx5124fmaddps_4fnmaddps_maskz, avx5124fmaddps_4fnmaddss_maskz,
avx5124vnniw_vp4dpwssd_maskz, avx5124vnniw_vp4dpwssds_maskz): Swap
order of %0 and %{%5%}%{z%} for -masm=intel.

--- gcc/config/i386/sse.md.jj   2018-04-09 12:05:38.044703296 +0200
+++ gcc/config/i386/sse.md  2018-04-09 15:15:50.033414875 +0200
@@ -2628,7 +2628,7 @@ (define_insn "reduces\t{%3, %2, %1, 
%0|%0, %1, %2, %3}"
+  "vreduce\t{%3, %2, %1, 
%0|%0, %1, %2, %3}"
   [(set_attr "type" "sse")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -2796,7 +2796,7 @@ (define_insn "avx512f_vmcmp3\t{%3, %2, %1, %0|%0, %1, 
%2, %3}"
+  "vcmp\t{%3, %2, %1, %0|%0, %1, 
%2, %3}"
   [(set_attr "type" "ssecmp")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
@@ -2814,7

Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-09 Thread Steve Kargl

On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> 
> the attached patch removes the parallel annotation from DO CONCURRENT.
> As discussed in the PR, the autoparallellizer currently generates
> wrong code. The only feasible way is to disable the annotation for
> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> regression.
> 
> Regression-tested. OK for trunk?
> 

Yes.

-- 
Steve

C++ PATCH for c++/85277, ICE with invalid offsetof

2018-04-09 Thread Jason Merrill

Here, the code was improperly assuming that the member argument must
be a decl at the point of the error, but since we stopped folding
foo[0] to foo, in this case it's an INDIRECT_REF.

And the existing error for INDIRECT_REF is inaccurate for this case,
so let's stick to complaining about asking for offsetof a function.

While I was here, I updated the non-standard-layout diagnostic to say
"conditionally-supported" rather than undefined.

Tested x86_64-cp-linux-gnu, applying to trunk.
commit cb3221a892dc8ea6c06d148b3756d82067644869
Author: Jason Merrill 
Date:   Mon Apr 9 12:51:38 2018 -0400

PR c++/85277 - ICE with invalid offsetof.

* semantics.c (finish_offsetof): Avoid passing non-DECL to %qD.
Adjust -Winvalid-offsetof diagnostic to say conditionally supported.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 59cac77f6b7..8c893ed64b0 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -4043,17 +4043,17 @@ finish_offsetof (tree object_ptr, tree expr, location_t loc)
   || TREE_CODE (TREE_TYPE (expr)) == METHOD_TYPE
   || TREE_TYPE (expr) == unknown_type_node)
 {
-  if (INDIRECT_REF_P (expr))
-	error ("second operand of % is neither a single "
-	   "identifier nor a sequence of member accesses and "
-	   "array references");
-  else
+  while (TREE_CODE (expr) == COMPONENT_REF
+	 || TREE_CODE (expr) == COMPOUND_EXPR)
+	expr = TREE_OPERAND (expr, 1);
+
+  if (DECL_P (expr))
 	{
-	  if (TREE_CODE (expr) == COMPONENT_REF
-	  || TREE_CODE (expr) == COMPOUND_EXPR)
-	expr = TREE_OPERAND (expr, 1);
 	  error ("cannot apply % to member function %qD", expr);
+	  inform (DECL_SOURCE_LOCATION (expr), "declared here");
 	}
+  else
+	error ("cannot apply % to member function");
   return error_mark_node;
 }
   if (TREE_CODE (expr) == CONST_DECL)
@@ -4069,9 +4069,9 @@ finish_offsetof (tree object_ptr, tree expr, location_t loc)
   && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr)))
   && CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr)))
   && cp_unevaluated_operand == 0)
-pedwarn (loc, OPT_Winvalid_offsetof,
-	 "offsetof within non-standard-layout type %qT is undefined",
-	 TREE_TYPE (TREE_TYPE (object_ptr)));
+warning_at (loc, OPT_Winvalid_offsetof, "offsetof within "
+		"non-standard-layout type %qT is conditionally-supported",
+		TREE_TYPE (TREE_TYPE (object_ptr)));
   return fold_offsetof (expr);
 }
 
diff --git a/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C b/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C
index 5c5e9cf246b..cbc2daafbdd 100644
--- a/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C
+++ b/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C
@@ -1,9 +1,11 @@
 // PR c++/51413
-// { dg-options "-w" }
+// PR c++/85277
+// { dg-options "-Wno-pointer-arith" }
 
 struct A
 {
   static void foo();
 };
 
-int i = __builtin_offsetof(A, foo[1]);  // { dg-error "neither a single identifier nor a sequence of member accesses and array references" }
+int i = __builtin_offsetof(A, foo[1]);  // { dg-error "offsetof" }
+int j = __builtin_offsetof(volatile A, foo[0]);  // { dg-error "offsetof" }
diff --git a/gcc/testsuite/g++.dg/other/offsetof3.C b/gcc/testsuite/g++.dg/other/offsetof3.C
index 8d982426560..779fc72670a 100644
--- a/gcc/testsuite/g++.dg/other/offsetof3.C
+++ b/gcc/testsuite/g++.dg/other/offsetof3.C
@@ -12,4 +12,4 @@ protected:
 typedef X* pX;
 typedef __SIZE_TYPE__ size_t;
 
-size_t yoff = __builtin_offsetof (X, y); /* { dg-error "35:non-standard-layout" } */
+size_t yoff = __builtin_offsetof (X, y); /* { dg-message "35:non-standard-layout" } */
diff --git a/gcc/testsuite/g++.dg/other/offsetof5.C b/gcc/testsuite/g++.dg/other/offsetof5.C
index 86b14488246..8514af087ad 100644
--- a/gcc/testsuite/g++.dg/other/offsetof5.C
+++ b/gcc/testsuite/g++.dg/other/offsetof5.C
@@ -9,14 +9,14 @@ struct A
   int 
 };
 
-int j = offsetof (A, i);		// { dg-error "offsetof" }
+int j = offsetof (A, i);		// { dg-message "offsetof" }
 
 template 
 struct S
 {
   T h;
   T 
-  static const int j = offsetof (S, i);	// { dg-error "offsetof" }
+  static const int j = offsetof (S, i);	// { dg-message "offsetof" }
 };
 
 int k = S::j;			// { dg-message "required from here" }
diff --git a/gcc/testsuite/g++.dg/other/offsetof8.C b/gcc/testsuite/g++.dg/other/offsetof8.C
index 0668199b366..211c5127026 100644
--- a/gcc/testsuite/g++.dg/other/offsetof8.C
+++ b/gcc/testsuite/g++.dg/other/offsetof8.C
@@ -9,4 +9,4 @@ struct B: virtual A { };
 int a[]  = {
   !&((B*)0)->i,// { dg-error "invalid access to non-static data member" }
   __builtin_offsetof (B, i)   // { dg-error "invalid access to non-static" }
-};			  // { dg-error "offsetof within non-standard-layout type" "" { target *-*-* } .-1 }
+};			  // { dg-message "offsetof within non-standard-layout type" "" { target *-*-* } .-1 }

Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-09 Thread Jakub Jelinek

On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> Hello world,
> 
> the attached patch removes the parallel annotation from DO CONCURRENT.
> As discussed in the PR, the autoparallellizer currently generates
> wrong code. The only feasible way is to disable the annotation for
> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> regression.

Can't it be turned into annot_expr_ivdep_kind instead of
annot_expr_parallel_kind for GCC8?  I mean, the potential local addressable
variables shouldn't be a problem for vectorization, just autopar, right?

Jakub

[patch, fortran] Remove parallell annotation from DO CONCURRENT

2018-04-09 Thread Thomas Koenig


Hello world,

the attached patch removes the parallel annotation from DO CONCURRENT.
As discussed in the PR, the autoparallellizer currently generates
wrong code. The only feasible way is to disable the annotation for
gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
regression.

Regression-tested. OK for trunk?

Regards

Thomas

2018-04-09  Thomas Koenig  

PR fortran/83064
* trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
parallell processing of DO CONCURRENT.

2018-04-09  Thomas Koenig  

PR fortran/83064
* gfortran.dg/do_concurrent_5.f90: New test.
* gfortran.dg/vect/vect-do-concurrent-1.f90: Xfail. Adjust
dg-bogus message.
Index: fortran/trans-stmt.c
===
--- fortran/trans-stmt.c	(Revision 259222)
+++ fortran/trans-stmt.c	(Arbeitskopie)
@@ -3642,12 +3642,6 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr
   /* The exit condition.  */
   cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			  count, build_int_cst (TREE_TYPE (count), 0));
-  if (forall_tmp->do_concurrent)
-	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
-		   build_int_cst (integer_type_node,
-  annot_expr_parallel_kind),
-		   integer_zero_node);
-
   tmp = build1_v (GOTO_EXPR, exit_label);
   tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
 			 cond, tmp, build_empty_stmt (input_location));
Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90
===
--- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Revision 259222)
+++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Arbeitskopie)
@@ -1,6 +1,8 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_float }
 ! { dg-additional-options "-O3 -fopt-info-vec-optimized" }
+! { xfail *-*-* }
+! PR 83064 - DO CONCURRENT is no longer vectorized for this case.
 
 subroutine test(n, a, b, c)
   integer, value :: n
@@ -12,4 +14,4 @@ subroutine test(n, a, b, c)
 end subroutine test
 
 ! { dg-message "loop vectorized" "" { target *-*-* } 0 }
-! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 }
+!  Restoration of the test case will need  dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 
! { dg-do  run }
! PR 83064 - this used to give wrong results.
! { dg-options "-O3 -ftree-parallelize-loops=2" }
! Original test case by Christian Felter

program main
use, intrinsic :: iso_fortran_env
implicit none

integer, parameter :: nsplit = 4
integer(int64), parameter :: ne = 2000
integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
real(real64), dimension(nsplit) :: pi

edof(1::4) = 1
edof(2::4) = 2
edof(3::4) = 3
edof(4::4) = 4

stride = ceiling(real(ne)/nsplit)
do i = 1, nsplit
high(i) = stride*i
end do
do i = 2, nsplit
low(i) = high(i-1) + 1
end do
low(1) = 1
high(nsplit) = ne

pi = 0
do concurrent (i = 1:nsplit)
pi(i) = sum(compute( low(i), high(i) ))
end do
if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort

contains

pure function compute( low, high ) result( ttt )
integer(int64), intent(in) :: low, high
real(real64), dimension(nsplit) :: ttt
integer(int64) :: j, k

ttt = 0

! Unrolled loop
! do j = low, high, 4
! k = 1
! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
! k = 2
! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )
! k = 3
! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )
! k = 4
! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )
! end do

! Loop with modulo operation
! do j = low, high
! k = mod( j, nsplit ) + 1
! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )
! end do

! Loop with subscripting via host association
do j = low, high
k = edof(j)
ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )
end do
end function

end program main

Re: [C++ PATCH] Fix invalid structured binding range for parsing error recovery (PR c++/85194)

2018-04-09 Thread Jason Merrill

OK.

On Wed, Apr 4, 2018 at 3:48 PM, Jakub Jelinek  wrote:
> Hi!
>
> The maybe_range_for_decl argument is documented on
> cp_parser_simple_declaration as:
>If MAYBE_RANGE_FOR_DECL is not NULL, the pointed tree will be set to the
>parsed declaration if it is an uninitialized single declarator not followed
>by a `;', or to error_mark_node otherwise. Either way, the trailing `;',
>if present, will not be consumed.  */
> but we initialize *maybe_range_for_decl to NULL_TREE at the beginning, to
> detect if we saw more than one decl; in case of a structured binding, we
> only set it to non-NULL on success and thus violate what the comment says,
> and on testcase like below we return is_range_for from the callers because
> it is followed by colon, yet range_decl is NULL and we ICE trying to
> dereference it.  This patch ensures it is error_mark_node instead, it can
> only happen in case of errors.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-04-04  Jakub Jelinek  
>
> PR c++/85194
> * parser.c (cp_parser_simple_declaration): For structured bindings,
> if *maybe_range_for_decl is NULL after parsing it, set it to
> error_mark_node.
>
> * g++.dg/cpp1z/decomp37.C: New test.
>
> --- gcc/cp/parser.c.jj  2018-04-03 18:05:26.0 +0200
> +++ gcc/cp/parser.c 2018-04-04 14:59:53.562859572 +0200
> @@ -12993,8 +12993,14 @@ cp_parser_simple_declaration (cp_parser*
> /* The next token should be either a `,' or a `;'.  */
> cp_token *token = cp_lexer_peek_token (parser->lexer);
> /* If it's a `;', we are done.  */
> -   if (token->type == CPP_SEMICOLON || maybe_range_for_decl)
> +   if (token->type == CPP_SEMICOLON)
>   goto finish;
> +   else if (maybe_range_for_decl)
> + {
> +   if (*maybe_range_for_decl == NULL_TREE)
> + *maybe_range_for_decl = error_mark_node;
> +   goto finish;
> + }
> /* Anything else is an error.  */
> else
>   {
> --- gcc/testsuite/g++.dg/cpp1z/decomp37.C.jj2018-04-04 15:12:33.724191835 
> +0200
> +++ gcc/testsuite/g++.dg/cpp1z/decomp37.C   2018-04-04 15:13:40.803227382 
> +0200
> @@ -0,0 +1,14 @@
> +// PR c++/85194
> +// { dg-do compile { target c++11 } }
> +// { dg-options "" }
> +
> +struct A { int i; };
> +
> +A x[2];
> +
> +void
> +foo ()
> +{
> +  for (auto [i] = A () : x)// { dg-error "initializer in range-based 
> 'for' loop" }
> +;  // { dg-warning "structured bindings only 
> available with" "" { target c++14_down } .-1 }
> +}
>
> Jakub

C++ PATCH for c++/85264, ICE with extra template parameter list

2018-04-09 Thread Jason Merrill

cp_parser_check_template_parameters is supposed to catch when we have
the wrong number of template parameter lists, but it wasn't diagnosing
this case.  Fixed by checking whether the thing we're declaring used a
template-id; the case of declaring a primary template, when we allow
one more template parameter list, uses a plain identifier.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 770f6eaf7320d908cd39ace3aa155e7ec829982d
Author: Jason Merrill 
Date:   Mon Apr 9 14:03:15 2018 -0400

PR c++/85264 - ICE with excess template-parameter-list.

* parser.c (cp_parser_check_template_parameters): Add template_id_p
parameter.  Don't allow an extra template header if true.
(cp_parser_class_head): Pass template_id_p.
(cp_parser_elaborated_type_specifier): Likewise.
(cp_parser_alias_declaration): Likewise.
(cp_parser_check_declarator_template_parameters): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0ffa13de537..0e469259008 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2498,7 +2498,7 @@ static tree cp_parser_maybe_treat_template_as_class
 static bool cp_parser_check_declarator_template_parameters
   (cp_parser *, cp_declarator *, location_t);
 static bool cp_parser_check_template_parameters
-  (cp_parser *, unsigned, location_t, cp_declarator *);
+  (cp_parser *, unsigned, bool, location_t, cp_declarator *);
 static cp_expr cp_parser_simple_cast_expression
   (cp_parser *);
 static tree cp_parser_global_scope_opt
@@ -17917,6 +17917,7 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
 	  if (template_parm_lists_apply
 	  && !cp_parser_check_template_parameters (parser,
 		   /*num_templates=*/0,
+		   /*template_id*/false,
 		   token->location,
 		   /*declarator=*/NULL))
 	return error_mark_node;
@@ -18971,6 +18972,7 @@ cp_parser_alias_declaration (cp_parser* parser)
   if (parser->num_template_parameter_lists
   && !cp_parser_check_template_parameters (parser,
 	   /*num_templates=*/0,
+	   /*template_id*/false,
 	   id_location,
 	   /*declarator=*/NULL))
 return error_mark_node;
@@ -23117,6 +23119,7 @@ cp_parser_class_head (cp_parser* parser,
   /* Make sure that the right number of template parameters were
  present.  */
   if (!cp_parser_check_template_parameters (parser, num_templates,
+	template_id_p,
 	type_start_token->location,
 	/*declarator=*/NULL))
 {
@@ -26391,17 +26394,22 @@ cp_parser_check_declarator_template_parameters (cp_parser* parser,
   {
 	unsigned num_templates = 0;
 	tree scope = declarator->u.id.qualifying_scope;
+	bool template_id_p = false;
 
 	if (scope)
 	  num_templates = num_template_headers_for_class (scope);
 	else if (TREE_CODE (declarator->u.id.unqualified_name)
 		 == TEMPLATE_ID_EXPR)
-	  /* If the DECLARATOR has the form `X' then it uses one
-	 additional level of template parameters.  */
-	  ++num_templates;
+	  {
+	/* If the DECLARATOR has the form `X' then it uses one
+	   additional level of template parameters.  */
+	++num_templates;
+	template_id_p = true;
+	  }
 
 	return cp_parser_check_template_parameters 
-	  (parser, num_templates, declarator_location, declarator);
+	  (parser, num_templates, template_id_p, declarator_location,
+	   declarator);
   }
 
 case cdk_function:
@@ -26430,6 +26438,7 @@ cp_parser_check_declarator_template_parameters (cp_parser* parser,
 static bool
 cp_parser_check_template_parameters (cp_parser* parser,
  unsigned num_templates,
+ bool template_id_p,
  location_t location,
  cp_declarator *declarator)
 {
@@ -26439,7 +26448,8 @@ cp_parser_check_template_parameters (cp_parser* parser,
 return true;
   /* If there are more, but only one more, then we are referring to a
  member template.  That's OK too.  */
-  if (parser->num_template_parameter_lists == num_templates + 1)
+  if (!template_id_p
+  && parser->num_template_parameter_lists == num_templates + 1)
 return true;
   /* If there are more template classes than parameter lists, we have
  something like:
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic176.C b/gcc/testsuite/g++.dg/cpp0x/variadic176.C
new file mode 100644
index 000..1d6e3c2f10a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic176.C
@@ -0,0 +1,10 @@
+// PR c++/85264
+// { dg-do compile { target c++11 } }
+
+template struct A {};
+
+template
+template
+struct A {};// { dg-error "too many template-parameter-lists" }
+
+A a;

Re: [PATCH] Assorted -masm=intel fixes (PR target/85281)

2018-04-09 Thread H.J. Lu

On Mon, Apr 9, 2018 at 11:37 AM, Jakub Jelinek  wrote:
> Hi!
>
> I've tested make check-gcc
> RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} 
> i386.exp vect.exp'
> testing and looked solely at assembly Error: (there are many scan-assembler*
> directives that just fail, and some tests use e.g. only att inline asm
> etc.).
>
> The following patch fixes what I found that way.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?
>
> The patch fixes:
> -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
> -FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors)
>
> BTW, -masm=intel seems to be in quite bad shape even in the assembler, in
> various testcases I'm getting errors like on the following reduced one:
> int k1, xmm0;
> int foo (void) { return k1; }
> int bar (void) { return xmm0; }
> gcc -masm=intel -O2
> /tmp/cch0mo1K.s: Assembler messages:
> /tmp/cch0mo1K.s:10: Error: invalid use of register
> /tmp/cch0mo1K.s:21: Error: invalid use of register
> As ICC generates the same assembly on the instructions:
> mov eax, DWORD PTR k1[rip]
> ...
> mov eax, DWORD PTR xmm0[rip]
> I think either the intel syntax spec is faulty, or gas is buggy and should
> figure out that after *WORD PTR and before [ there is symbol rather than
> register name.  Some testcases e.g. have k1 as function name and that
> results in other asm errors (about .size directive).
>

Hi Jan,

How does Intel syntax support symbols like eax, k1 and xmm0 with
".intel_syntax noprefix"?

-- 
H.J.

Re: [PATCH] v2: Show pertinent parameter (PR c++/85110)

2018-04-09 Thread Jason Merrill

OK.

On Fri, Apr 6, 2018 at 3:54 PM, David Malcolm  wrote:
> On Thu, 2018-03-29 at 09:20 -0400, Jason Merrill wrote:
>> On Wed, Mar 28, 2018 at 4:44 PM, David Malcolm 
>> wrote:
>> > This followup patch updates the specific error-handling path
>> > to add a note showing the pertinent parameter decl, taking
>> > the output from:
>> >
>> > test.cc: In function 'void caller(const char*)':
>> > test.cc:6:14: error: cannot convert 'const char*' to 'const char**'
>> > for argument '2' to 'void callee(int, const char**, int)'
>> >callee (1, fmt, 3);
>> >   ^~~
>> >
>> > to:
>> >
>> > test.cc: In function 'void caller(const char*)':
>> > test.cc:6:14: error: cannot convert 'const char*' to 'const char**'
>> > for argument '2' to 'void callee(int, const char**, int)'
>> >callee (1, fmt, 3);
>> >   ^~~
>> > test.cc:1:36: note:   initializing argument 2 of 'void callee(int,
>> > const char**, int)'
>> >  void callee (int one, const char **two, int three);
>> >~^~~
>> >
>> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
>> > a further 18 PASS results to g++.sum.
>> >
>> > Again, not a regression as such, but I've been calling out the
>> > underlined
>> > arguments as a feature of gcc 8, so would be good to fix.
>> >
>> > OK for trunk?
>> >
>> > gcc/cp/ChangeLog:
>> > PR c++/85110
>> > * call.c (get_fndecl_argument_location): Make non-static.
>> > * cp-tree.h (get_fndecl_argument_location): New decl.
>> > * typeck.c (convert_for_assignment): When complaining due
>> > to
>> > conversions for an argument, show the location of the
>> > parameter
>> > within the decl.
>> >
>> > gcc/testsuite/ChangeLog:
>> > PR c++/85110
>> > * g++.dg/diagnostic/param-type-mismatch-2.C: Update for the
>> > cases
>> > where we now show the pertinent parameter.
>> > ---
>> >  gcc/cp/call.c   |  2 +-
>> >  gcc/cp/cp-tree.h|  2 ++
>> >  gcc/cp/typeck.c | 10 +++
>> > ---
>> >  .../g++.dg/diagnostic/param-type-mismatch-2.C   | 21
>> > ++---
>> >  4 files changed, 28 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/gcc/cp/call.c b/gcc/cp/call.c
>> > index 1a87f99..e1a0639 100644
>> > --- a/gcc/cp/call.c
>> > +++ b/gcc/cp/call.c
>> > @@ -6598,7 +6598,7 @@ maybe_print_user_conv_context (conversion
>> > *convs)
>> > ARGNUM is zero based, -1 indicates the `this' argument of a
>> > method.
>> > Return the location of the FNDECL itself if there are
>> > problems.  */
>> >
>> > -static location_t
>> > +location_t
>> >  get_fndecl_argument_location (tree fndecl, int argnum)
>> >  {
>> >int i;
>> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>> > index d5382c2..b45880d 100644
>> > --- a/gcc/cp/cp-tree.h
>> > +++ b/gcc/cp/cp-tree.h
>> > @@ -5965,6 +5965,8 @@ extern bool
>> > can_convert_arg   (tree, tree, tree, int,
>> >  tsubst_flags_t);
>> >  extern bool can_convert_arg_bad(tree,
>> > tree, tree, int,
>> >  tsubst_flags_t);
>> > +extern location_t get_fndecl_argument_location  (tree, int);
>> > +
>> >
>> >  /* A class for recording information about access failures (e.g.
>> > private
>> > fields), so that we can potentially supply a fix-it hint about
>> > diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
>> > index e733c79..742b2e9 100644
>> > --- a/gcc/cp/typeck.c
>> > +++ b/gcc/cp/typeck.c
>> > @@ -8781,9 +8781,13 @@ convert_for_assignment (tree type, tree rhs,
>> >parmnum,
>> > complain, flags);
>> > }
>> >   else if (fndecl)
>> > -   error_at (EXPR_LOC_OR_LOC (rhs, input_location),
>> > - "cannot convert %qH to %qI for argument
>> > %qP to %qD",
>> > - rhstype, type, parmnum, fndecl);
>> > +   {
>> > + error_at (EXPR_LOC_OR_LOC (rhs, input_location),
>> > +   "cannot convert %qH to %qI for argument
>> > %qP to %qD",
>> > +   rhstype, type, parmnum, fndecl);
>> > + inform (get_fndecl_argument_location (fndecl,
>> > parmnum),
>> > + "  initializing argument %P of %qD",
>> > parmnum, fndecl);
>>
>> If you're adding the inform, you don't need the %P of %D in the
>> initial error.
>>
>> Jason
>
> Thanks.  Here's an updated version of the patch which removes them.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
> 18 PASS results to g++.sum.
>
> OK for trunk?
>
> gcc/cp/ChangeLog:
> PR c++/85110
> * call.c (get_fndecl_argument_location): Make non-static.
> * cp-tree.h

Re: [C++ Patch] PR 85227 ("[7/8/ Regression] ICE with structured binding of a forward declared variable")

2018-04-09 Thread Jason Merrill

OK.

On Fri, Apr 6, 2018 at 3:05 PM, Paolo Carlini  wrote:
> Hi,
>
> On 06/04/2018 19:04, Jason Merrill wrote:
>>
>> On Fri, Apr 6, 2018 at 5:01 AM, Paolo Carlini 
>> wrote:
>>>
>>> here, for an incomplete type we ICE pretty soon in
>>> find_decomp_class_base.
>>> Comparing to other cases too, I convinced myself that trying to complete
>>> the
>>> type is Ok. Also, it seems that in these functions we want to talk about
>>> structured binding and use an appropriate location, thus no
>>> complete_type_or_maybe_complain. Tested x86_64-linux.
>>
>> What if, in a template, we defer trying to do bindings to an incomplete
>> type, so
>>
>> extern struct A a;
>>
>> template
>> void f()
>> {
>>auto [x] = a;
>> }
>>
>> struct A { int i; };
>>
>> int main()
>> {
>>f<0>();
>> }
>>
>> works?  Probably with a pedwarn, as in xref_basetypes or
>> cp_parser_dot_deref_incomplete.
>
> Ok... I tested the very simple patch below, wasnt sure between pedwarn (loc,
> 0, ...) and pedwarn (loc, OPT_Wpedantic, ...) but probably we want to former
> in order not to be too permissive (for comparison, clang rejects with an
> hard error both tests). What do you think?
>
> Thanks!
> Paolo.
>
> /

[PATCH] Fix some broadcasts in -masm=intel mode (PR target/85281)

2018-04-09 Thread Jakub Jelinek

Hi!

As the following testcase shows, we emit an incorrect PTR prefix in a
vpbroadcastb instruction in -masm=intel mode; gas accepts and the manual
documents that the input operand is xmm2/m8 for vpbroadcastb and
xmm2/m16 for vpbroadcastw, so we need to use BYTE PTR and WORD PTR instead
of XMMWORD PTR.

The first two hunks are just a simplification, the only reason we couldn't
use  used in many other spots is that it wasn't covering the 512-bit
floating point vectors.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-04-09  Jakub Jelinek  

PR target/85281
* config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases.
(_vec_dup): Use a single pattern for modes
other than V2DFmode using iptr mode attribute.
(_vec_dup): Use iptr mode attribute.

* gcc.target/i386/pr85281.c: New test.

--- gcc/config/i386/sse.md.jj   2018-04-06 19:19:12.098129938 +0200
+++ gcc/config/i386/sse.md  2018-04-09 12:05:38.044703296 +0200
@@ -804,6 +804,7 @@ (define_mode_attr iptr
   [(V64QI "b") (V32HI "w") (V16SI "k") (V8DI "q")
(V32QI "b") (V16HI "w") (V8SI "k") (V4DI "q")
(V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q")
+   (V16SF "k") (V8DF "q")
(V8SF "k") (V4DF "q")
(V4SF "k") (V2DF "q")
(SF "k") (DF "q")])
@@ -17686,10 +17687,7 @@ (define_insn "_vec_dupmode == V2DFmode)
 return "vpbroadcastq\t{%1, %0|%0, %q1}";
 
-  if (GET_MODE_SIZE (GET_MODE_INNER (mode)) == 4)
-return "vbroadcast\t{%1, 
%0|%0, %k1}";
-  else
-return "vbroadcast\t{%1, 
%0|%0, %q1}";
+  return "vbroadcast\t{%1, 
%0|%0, %1}";
 }
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -17702,7 +17700,7 @@ (define_insn "_vec_dup 1 "nonimmediate_operand" "vm")
(parallel [(const_int 0)]]
   "TARGET_AVX512BW"
-  "vpbroadcast\t{%1, %0|%0, %1}"
+  "vpbroadcast\t{%1, %0|%0, 
%1}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
--- gcc/testsuite/gcc.target/i386/pr85281.c.jj  2018-04-09 12:15:57.204757347 
+0200
+++ gcc/testsuite/gcc.target/i386/pr85281.c 2018-04-09 12:17:56.938766026 
+0200
@@ -0,0 +1,15 @@
+/* PR target/85281 */
+/* { dg-do assemble { target avx512bw } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target masm_intel } */
+/* { dg-options "-O -mavx512bw -masm=intel -w" } */
+
+typedef char V __attribute__ ((__vector_size__ (64)));
+
+V
+foo (V v)
+{
+  v[8] /= (unsigned __int128) 0;
+  v[0] -= ~255;
+  return v;
+}

Jakub

Re: Fix PRs 80463, 83972, 83480

2018-04-09 Thread Jakub Jelinek

On Mon, Apr 09, 2018 at 01:30:12PM +0300, Andrey Belevantsev wrote:
> I think that should be fine, that is, as long as the insn moved up through
> all those debug insns, the copy will do that as well.  It's that
> problematic conditional in sched-deps.c that we should take care of.
> 
> I've reworded the comment and committed the attached patch.  Thanks for
> your help.

The C++ testcase FAILs everywhere:
FAIL: g++.dg/pr80463.C  -std=gnu++98 (test for excess errors)
Excess errors:
cc1plus: warning: var-tracking-assignments changes selective scheduling

The other testcases in the patch used -w probably to disable the same
warning, so I've committed following as obvious to trunk after regtesting it
on x86_64-linux and i686-linux:

2018-04-09  Jakub Jelinek  

PR rtl-optimization/80463
* g++.dg/pr80463.C: Add -w to dg-options.

--- gcc/testsuite/g++.dg/pr80463.C.jj   2018-04-09 20:15:47.226631780 +0200
+++ gcc/testsuite/g++.dg/pr80463.C  2018-04-09 20:19:43.783616136 +0200
@@ -1,5 +1,5 @@
 /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
-/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } 
*/
+/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments -w" 
} */
 
 int *a;
 int b, c;


Jakub

Re: C++ PATCH for c++/85032, rejects-valid with if constexpr in template

2018-04-09 Thread Jason Merrill

On Fri, Apr 6, 2018 at 2:17 PM, Marek Polacek  wrote:
> On Mon, Mar 26, 2018 at 01:17:22PM -0400, Jason Merrill wrote:
>> On Sat, Mar 24, 2018 at 6:59 AM, Marek Polacek  wrote:
>> > Recently the code in finish_static_assert was changed to use
>> > perform_implicit_conversion_flags followed by fold_non_dependent_expr.  
>> > That
>> > broke this test becase when in a template, p_i_c_f merely wraps the expr in
>> > an IMPLICIT_CONV_EXPR.  fold_non_dependent_expr should be able to fold it 
>> > to
>> > a constant but it gave up because is_nondependent_constant_expression 
>> > returned
>> > false.  Jason suggested to fix this roughly like the following, i.e. 
>> > consider
>> > conversions from classes to literal types potentially constant.
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >
>> > 2018-03-24  Marek Polacek  
>> >
>> > PR c++/85032
>> > * constexpr.c (potential_constant_expression_1): Consider 
>> > conversions
>> > from classes to literal types potentially constant.
>> >
>> > * g++.dg/cpp0x/pr51225.C: Adjust error message.
>> > * g++.dg/cpp1z/constexpr-if17.C: New test.
>> >
>> > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
>> > index bebd9f5b5d0..c4b5afe90a2 100644
>> > --- gcc/cp/constexpr.c
>> > +++ gcc/cp/constexpr.c
>> > @@ -5768,6 +5768,23 @@ potential_constant_expression_1 (tree t, bool 
>> > want_rval, bool strict, bool now,
>> >   TREE_TYPE (t));
>> >   return false;
>> > }
>> > +  /* This might be a conversion from a class to a literal type.  Let's
>> > +consider it potentially constant since the conversion might be
>> > +a constexpr user-defined conversion.  */
>> > +  else if (cxx_dialect >= cxx11
>> > +  && COMPLETE_TYPE_P (TREE_TYPE (t))
>> > +  && literal_type_p (TREE_TYPE (t))
>>
>> We probably need to allow dependent types here, too.  And incomplete
>> classes, which might turn out to be literal later.
>
> Ok, I've allowed incomplete types, too.  And I think the patch also allows
> dependent types.  Or did you mean using
>&& (TREE_TYPE (t) == NULL_TREE
>|| !COMPLETE_TYPE_P (TREE_TYPE (t))
>|| literal_type_p (TREE_TYPE (t)))
> ?  That doesn't seem to be needed.

I meant dependent_type_p (TREE_TYPE (t)).  I suppose checking
COMPLETE_TYPE_P will cover that by accident, but I'd prefer to make it
explicit.

> + /* If this is a dependent type, it could end up being a class
> +with conversions.  */
> + if (type == NULL_TREE || WILDCARD_TYPE_P (type))
> +   return true;
> + /* Or a non-dependent class which has conversions.  */
> + else if (CLASS_TYPE_P (type) && TYPE_HAS_CONVERSION (type))
> +   return true;

And here, a dependent class type like A could fail both of these
tests and still end up with conversions when instantiated.  We should
check dependent_scope_p as well as TYPE_HAS_CONVERSION.

Jason

Re: Add missing cases to vect_get_smallest_scalar_type (PR 85286)

2018-04-09 Thread Jakub Jelinek

On Mon, Apr 09, 2018 at 06:47:45PM +0100, Richard Sandiford wrote:
> In this PR we used WIDEN_SUM_EXPR to vectorise:
> 
>   short i, y;
>   int sum;
>   [...]
>   for (i = x; i > 0; i--)
> sum += y;
> 
> with 4 ints and 8 shorts per vector.  The problem was that we set
> the VF based only on the ints, then calculated the number of vector
> copies based on the shorts, giving 4/8.  Previously that led to
> ncopies==0, but after r249897 we pick it up as an ICE.
> 
> In this particular case we could vectorise the reduction by setting
> ncopies based on the output type rather than the input type, but it
> doesn't seem worth adding a special "optimisation" for such a
> pathological case.  I think it's really an instance of the more general
> problem that we can't vectorise using combinations of (say) 64-bit and
> 128-bit vectors on targets that support both.

We badly need that, there are plenty of PRs where we generate really large
vectorized loop because of it e.g. on x86 where we can easily use 128-bit,
256-bit and 512-bit vectors; but I'm afraid it is not a stage4 material.

Jakub

Add missing cases to vect_get_smallest_scalar_type (PR 85286)

2018-04-09 Thread Richard Sandiford

In this PR we used WIDEN_SUM_EXPR to vectorise:

  short i, y;
  int sum;
  [...]
  for (i = x; i > 0; i--)
sum += y;

with 4 ints and 8 shorts per vector.  The problem was that we set
the VF based only on the ints, then calculated the number of vector
copies based on the shorts, giving 4/8.  Previously that led to
ncopies==0, but after r249897 we pick it up as an ICE.

In this particular case we could vectorise the reduction by setting
ncopies based on the output type rather than the input type, but it
doesn't seem worth adding a special "optimisation" for such a
pathological case.  I think it's really an instance of the more general
problem that we can't vectorise using combinations of (say) 64-bit and
128-bit vectors on targets that support both.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-04-09  Richard Sandiford  

gcc/
PR tree-optimization/85286
* tree-vect-data-refs.c (vect_get_smallest_scalar_type):

gcc/testsuite/
* gcc.dg/vect/pr85286.c: New test.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2018-03-24 10:52:25.616823316 +
+++ gcc/tree-vect-data-refs.c   2018-04-09 18:44:09.676561821 +0100
@@ -132,6 +132,8 @@ vect_get_smallest_scalar_type (gimple *s
 
   if (is_gimple_assign (stmt)
   && (gimple_assign_cast_p (stmt)
+  || gimple_assign_rhs_code (stmt) == DOT_PROD_EXPR
+  || gimple_assign_rhs_code (stmt) == WIDEN_SUM_EXPR
   || gimple_assign_rhs_code (stmt) == WIDEN_MULT_EXPR
   || gimple_assign_rhs_code (stmt) == WIDEN_LSHIFT_EXPR
   || gimple_assign_rhs_code (stmt) == FLOAT_EXPR))
Index: gcc/testsuite/gcc.dg/vect/pr85286.c
===
--- /dev/null   2018-04-08 19:55:28.217132277 +0100
+++ gcc/testsuite/gcc.dg/vect/pr85286.c 2018-04-09 18:44:09.675561881 +0100
@@ -0,0 +1,19 @@
+/* PR tree-optimization/45241 */
+/* { dg-do compile } */
+/* { dg-additional-options "--param scev-max-expr-complexity=0" } */
+
+int
+foo (short x)
+{
+  short i, y;
+  int sum;
+
+  for (i = 0; i < x; i++)
+y = x * i;
+
+  for (i = x; i > 0; i--)
+sum += y;
+
+  return sum;
+}
+

Set insn_last_address in final_1

2018-04-09 Thread Richard Sandiford

final_1 already sets insn_current_address for each instruction, making
it possible to use some of the address functions in final.c during
assembly generation.  This patch also sets insn_last_address, since
as the comment says, we can treat final as a shorten_branches pass that
does nothing.  It's then possible to use insn_current_reference_address
during final as well.

This is needed for the aarch64.md definitions of far_branch to work:

   (set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
  (const_int 0)
  (const_int 1)))]

This value (tested only during final) uses the difference between
the INSN_ADDRESSES of operand 2 and insn_current_reference_address
to calculate a conservatively-correct estimate of the branch distance.
It takes into account the worst-case gap due to alignment, whereas
a direct comparison of INSN_ADDRESSES would give an unreliable,
optimistic result.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Sameera confirms
that it fixes the original bug, but there's no sharable testcase.
OK to install?

Richard


2018-04-09  Richard Sandiford  

gcc/
* final.c (final_1): Set insn_last_address as well as
insn_current_address.

Index: gcc/final.c
===
--- gcc/final.c 2018-04-09 18:40:26.887628387 +0100
+++ gcc/final.c 2018-04-09 18:40:27.033624471 +0100
@@ -2081,6 +2081,9 @@ final_1 (rtx_insn *first, FILE *file, in
}
  else
insn_current_address = INSN_ADDRESSES (INSN_UID (insn));
+ /* final can be seen as an iteration of shorten_branches that
+does nothing (since a fixed point has already been reached).  */
+ insn_last_address = insn_current_address;
}
 
   dump_basic_block_info (file, insn, start_to_bb, end_to_bb,

C++ PATCH for c++/85256, ICE capturing pointer to VLA

2018-04-09 Thread Jason Merrill

Here if we return immediately after the diagnostic about unsupported
variably-modified type capture, we avoid the crash.

I'm also changing the diagnostic from error to sorry, since it would
make sense to support this usage and apparently Clang already does.

The second patch below is the beginning of work for more general
variably-modified type capture, based on the approach of capturing and
remapping all the uses of outer automatic vars in the VLA type when
the VLA variable is captured.  I've also thought about handling this
later, when we actually do something that involves the dimensions, but
currently that's mostly hidden in ARRAY_REF.

First patch tested x86_64-pc-linux-gnu, applying to trunk.
commit c77d362bbe6abdfdf486b061cf2ec5897723dba9
Author: Jason Merrill 
Date:   Sun Apr 8 14:15:35 2018 -0400

PR c++/85256 - ICE capturing pointer to VLA.

* lambda.c (add_capture): Distinguish between variable-size and
variably-modified types.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 374060626c1..e9b962a8f33 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -554,13 +554,13 @@ add_capture (tree lambda, tree id, tree orig_init, bool by_reference_p,
   else if (!dependent_type_p (type)
 	   && variably_modified_type_p (type, NULL_TREE))
 {
-  error ("capture of variable-size type %qT that is not an N3639 array "
+  sorry ("capture of variably-modified type %qT that is not an N3639 array "
 	 "of runtime bound", type);
   if (TREE_CODE (type) == ARRAY_TYPE
 	  && variably_modified_type_p (TREE_TYPE (type), NULL_TREE))
 	inform (input_location, "because the array element type %qT has "
 		"variable size", TREE_TYPE (type));
-  type = error_mark_node;
+  return error_mark_node;
 }
   else
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C
index d4de131fc23..aee9694852d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C
@@ -7,6 +7,6 @@ void f() {
   int m = 1;
   int d[n][m];
   [&]() {
-return d[1];		// { dg-error "variabl" }
+return d[1];		// { dg-prune-output "sorry" }
   }();
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C
new file mode 100644
index 000..eebdbcdbb93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C
@@ -0,0 +1,10 @@
+// PR c++/85256
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -Wno-vla }
+
+void foo(int i)
+{
+  int (*x)[i];
+  [=]{ [=]{ 0 ? x : x; }; };	// { dg-bogus "sorry" "" { xfail *-*-* } }
+
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla7.C b/gcc/testsuite/g++.dg/cpp1y/vla7.C
index df34c8219db..afa5fac508d 100644
--- a/gcc/testsuite/g++.dg/cpp1y/vla7.C
+++ b/gcc/testsuite/g++.dg/cpp1y/vla7.C
@@ -6,7 +6,7 @@ int main(int argc, char** argv)
 {
   int x[1][argc];
 
-  [](int i) {			// { dg-error "variable.size" }
-x[0][i] = 0;	 	// { dg-prune-output "assignment" }
+  [](int i) {			// { dg-prune-output "sorry" }
+x[0][i] = 0;	 	// { dg-prune-output "not captured" }
   }(5);
 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla9.C b/gcc/testsuite/g++.dg/cpp1y/vla9.C
index 939de30a3c1..2c5b3a5404e 100644
--- a/gcc/testsuite/g++.dg/cpp1y/vla9.C
+++ b/gcc/testsuite/g++.dg/cpp1y/vla9.C
@@ -25,7 +25,7 @@ int main(){
 fa[0][1]=1.8;
 auto fx=[&](){
 for(int c=0; c<2; c++){
-printf("use me", fa[0][c]);	// { dg-error "capture of variable-size type" }
+printf("use me", fa[0][c]);	// { dg-prune-output "sorry" }
 }
 };
 call(fx);
commit 2918e20db4270ced5705eabac37162c7068c1be7
Author: Jason Merrill 
Date:   Sun Apr 8 14:19:00 2018 -0400

no-vla

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 204791e51cf..a7db19da65f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -441,7 +441,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
5: CLASS_TYPE_P (in RECORD_TYPE and UNION_TYPE)
   ENUM_FIXED_UNDERLYING_TYPE_P (in ENUMERAL_TYPE)
   AUTO_IS_DECLTYPE (in TEMPLATE_TYPE_PARM)
-  REFERENCE_VLA_OK (in REFERENCE_TYPE)
6: TYPE_DEPENDENT_P_VALID
 
Usage of DECL_LANG_FLAG_?:
@@ -455,7 +454,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   DECL_TEMPLATE_INSTANTIATED (in a VAR_DECL or a FUNCTION_DECL)
   DECL_MEMBER_TEMPLATE_P (in TEMPLATE_DECL)
   USING_DECL_TYPENAME_P (in USING_DECL)
-  DECL_VLA_CAPTURE_P (in FIELD_DECL)
   DECL_ARRAY_PARAMETER_P (in PARM_DECL)
   LABEL_DECL_CONTINUE (in LABEL_DECL)
2: DECL_THIS_EXTERN (in VAR_DECL or FUNCTION_DECL).
@@ -3632,11 +3630,6 @@ extern void decl_shadowed_for_var_insert (tree, tree);
&& (TREE_CODE (TREE_TYPE (TREE_OPERAND ((NODE), 0)))	\
== REFERENCE_TYPE))
 
-/* True if NODE is a REFERENCE_TYPE which is OK to instantiate to be a
-   reference to VLA type,

Re: Improve code quality across partition boundaries

2018-04-09 Thread Richard Biener

On April 9, 2018 4:17:45 PM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch fixes most offending artifacts across the function partition
>bounary with -freorder-blocks-and-partitions which is now on by default
>for i386 (no other targets).  The problem is that most of cfgcleanup
>and
>cfgrtl is disabled on crossing edges and thus we produce pretty ugly
>code.
>
>The comment I removed reffers to non-existing comment, but the fixup
>of partitioning only looks for crossing jumps. It generates new
>patterns for unconditional jumps and ads trampolines for conditional
>jumps if taret requests it (i386 doesn't).
>
>It is thus safe to turn any far jump to near jump (as this patch does)
>and based on target we can do more (this patch doesn't)
>
>For this late of stage4 I think we are fine with enabling edge
>forwarding
>only. This saves about 0.5% of code segment on cc1 (measured with and
>without patch so the code pays back at least) with profiledbootstrap.
>I have patches to fix other issues but I think they can wait for next
>stage1.  In some cases we would get better code with crossjumping
>too, but it does not seem critical.
>
>profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64
>and also
>profiledbootstrapped on power. I am running firefox build with LTO+FDO
>on
>x86_64 too and regtesting on power. OK?

OK. 

Richard. 

>Honza
>
>   PR rtl/84058
>   * cfgcleanup.c (try_forward_edges): Do not give up on crossing
>   jumps; choose last target that matches the criteria (i.e.
>   no partition changes for non-crossing jumps).
>   * cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
>   support for redirecting crossing jumps to non-crossing.
>Index: cfgcleanup.c
>===
>--- cfgcleanup.c   (revision 259167)
>+++ cfgcleanup.c   (working copy)
>@@ -394,19 +394,6 @@
>   edge_iterator ei;
>   edge e, *threaded_edges = NULL;
> 
>-  /* If we are partitioning hot/cold basic blocks, we don't want to
>- mess up unconditional or indirect jumps that cross between hot
>- and cold sections.
>-
>- Basic block partitioning may result in some jumps that appear to
>- be optimizable (or blocks that appear to be mergeable), but which
>really
>- must be left untouched (they are required to make it safely
>across
>- partition boundaries).  See the comments at the top of
>- bb-reorder.c:partition_hot_cold_basic_blocks for complete
>details.  */
>-
>-  if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b)))
>-return false;
>-
>   for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); )
> {
>   basic_block target, first;
>@@ -415,6 +402,7 @@
>   bool threaded = false;
>   int nthreaded_edges = 0;
>   bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0;
>+  bool new_target_threaded = false;
> 
>   /* Skip complex edges because we don't know how to update them.
> 
>@@ -431,29 +419,12 @@
>   counter = NUM_FIXED_BLOCKS;
>   goto_locus = e->goto_locus;
> 
>-  /* If we are partitioning hot/cold basic_blocks, we don't want
>to mess
>-   up jumps that cross between hot/cold sections.
>-
>-   Basic block partitioning may result in some jumps that appear
>-   to be optimizable (or blocks that appear to be mergeable), but which
>-   really must be left untouched (they are required to make it safely
>-   across partition boundaries).  See the comments at the top of
>-   bb-reorder.c:partition_hot_cold_basic_blocks for complete
>-   details.  */
>-
>-  if (first != EXIT_BLOCK_PTR_FOR_FN (cfun)
>-&& JUMP_P (BB_END (first))
>-&& CROSSING_JUMP_P (BB_END (first)))
>-  return changed;
>-
>   while (counter < n_basic_blocks_for_fn (cfun))
>   {
> basic_block new_target = NULL;
>-bool new_target_threaded = false;
> may_thread |= (target->flags & BB_MODIFIED) != 0;
> 
> if (FORWARDER_BLOCK_P (target)
>-&& !(single_succ_edge (target)->flags & EDGE_CROSSING)
> && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun))
>   {
> /* Bypass trivial infinite loops.  */
>@@ -543,8 +514,14 @@
>   break;
> 
> counter++;
>-target = new_target;
>-threaded |= new_target_threaded;
>+/* Do not turn non-crossing jump to crossing.  Depending on target
>+   it may require different instruction pattern.  */
>+if ((e->flags & EDGE_CROSSING)
>+|| BB_PARTITION (first) == BB_PARTITION (new_target))
>+  {
>+target = new_target;
>+threaded |= new_target_threaded;
>+  }
>   }
> 
>   if (counter >= n_basic_blocks_for_fn (cfun))
>Index: cfgrtl.c
>===
>--- cfgrtl.c   (revision 259167)
>+++ cfgrtl.c   (working copy)
>@@ -4361,6 +4361,18 @@
>   if (e->dest == dest)
>

RE: [PATCH] [ARC] Fix stack usage info for naked functions.

2018-04-09 Thread Claudiu Zissulescu

Accepted & committed, thank you for your contribution,
Claudiu

> -Original Message-
> From: Alexey Brodkin [mailto:abrod...@synopsys.com]
> Sent: Friday, April 06, 2018 6:02 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; Alexey Brodkin
> 
> Subject: [PATCH] [ARC] Fix stack usage info for naked functions.
> 
> When code containing "naked" function gets compiled with '-fstack-usage'
> compiler prints the following warning:
> -->8-
> board/synopsys/hsdk/hsdk.c: In function 'hsdk_core_init_f':
> board/synopsys/hsdk/hsdk.c:345:1: warning: stack usage computation not
> supported for this target
>  }
>  ^
> -->8-
> 
> Even though stack calculation makes no sense for "naked" function
> we still need to correctly report stack size back to make compiler
> happy.
> 
> This problem was caught in U-Boot here:
> https://lists.denx.de/pipermail/u-boot/2018-March/324325.html
> 
> The same fix was done earlier for ARM, see:
> "config/arm/arm.c (arm_expand_prologue): Set the stack usage to 0 "
> https://github.com/gcc-mirror/gcc/commit/023a7c5dd37
> 
> gcc/
> 2018-04-06  Alexey Brodkin 
> 
>   * config/arc/arc.c (arc_expand_prologue): Set stack usage info
>   also for naked functions.
> ---
>  gcc/config/arc/arc.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 32fcb81880a2..3cb4ba5b4dd7 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -3149,7 +3149,11 @@ arc_expand_prologue (void)
> 
>/* Naked functions don't have prologue.  */
>if (ARC_NAKED_P (fn_type))
> -return;
> +{
> +  if (flag_stack_usage_info)
> + current_function_static_stack_size = 0;
> +  return;
> +}
> 
>/* Compute total frame size.  */
>size = arc_compute_frame_size ();
> --
> 2.14.3

RE: [PATCH] [ARC] Add/update combiner patterns.

2018-04-09 Thread Claudiu Zissulescu

Committed. Thank you for your review,
Claudiu

> -Original Message-
> From: Andrew Burgess [mailto:andrew.burg...@embecosm.com]
> Sent: Saturday, April 07, 2018 11:07 AM
> To: Claudiu Zissulescu 
> Cc: gcc-patches@gcc.gnu.org; claudiu.zissule...@synopsys.com;
> francois.bed...@synopsys.com
> Subject: Re: [PATCH] [ARC] Add/update combiner patterns.
> 
> * Claudiu Zissulescu  [2018-03-07 13:59:03 +0200]:
> 
> > From: claziss 
> >
> > Hi Andrew,
> >
> > Please find the following patch which improves generating more
> instructions with .f flag (i.e., compare with zero).
> >
> > Ok to apply?
> 
> Looks good.
> 
> Thanks.
> Andrew
> 
> 
> > Claudiu
> >
> > gcc/
> > 2018-01-26  Claudiu Zissulescu  
> >
> > * config/arc/arc.md (add_shift): New pattern.
> > (add_shift2): Likewise.
> > (sub_shift): Likewise.
> > (sub_shift_cmp0_noout): Likewise.
> > (compare_si_ashiftsi): Likewise.
> > (xbfu_cmp0_noout): New combine pattern.
> > (xbfu_cmp0"): Likewise.
> > (movsi_set_cc_insn): Place the predicable variant first.
> > (commutative_binary_cmp0_noout): Remove clobber.
> > (commutative_binary_cmp0): New pattern.
> > (noncommutative_binary_cmp0): Likewise.
> > (noncommutative_binary_cmp0_noout): Likewise.
> > (noncommutative_binary_comparison_result_used): Removed.
> > (rsub_cmp0): New pattern.
> > (rsub_cmp0_noout): Likewise.
> > (extzvsi): Changed, keep only meaningful variants.
> > (SQH, SEZ): New iterators.
> > (SQH_postfix): New mode attribute.
> > (SEZ_prefix): New code attribute.
> > (xt_cmp0_noout): New instruction
> pattern.
> > (xt_cmp0): Likewise.
> > * config/arc/predicates.md (cc_set_register): Use CC_REG instead
> > of numerical value.
> > (noncommutative_operator): Check the availability of barrel
> > shifter option.
> > ---
> >  gcc/config/arc/arc.md| 274
> ---
> >  gcc/config/arc/predicates.md |   6 +-
> >  2 files changed, 233 insertions(+), 47 deletions(-)
> >
> > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> > index 7633d36b5a6..fb3432964ac 100644
> > --- a/gcc/config/arc/arc.md
> > +++ b/gcc/config/arc/arc.md
> > @@ -810,20 +810,90 @@ archs4x, archs4xd, archs4xd_slow"
> >"st%U0 %1,%0\;st%U0.di %1,%0"
> >[(set_attr "type" "store")])
> >
> > +;; Combiner patterns for compare with zero
> > +(define_mode_iterator SQH [QI HI])
> > +(define_mode_attr SQH_postfix [(QI "b") (HI "%_")])
> > +
> > +(define_code_iterator SEZ [sign_extend zero_extend])
> > +(define_code_attr SEZ_prefix [(sign_extend "sex") (zero_extend "ext")])
> > +
> > +(define_insn "*xt_cmp0_noout"
> > +  [(set (match_operand 0 "cc_set_register" "")
> > +   (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand"
> "r"))
> > +  (const_int 0)))]
> > +  ""
> > +  ".f\\t0,%1"
> > +  [(set_attr "type" "compare")
> > +   (set_attr "cond" "set_zn")])
> > +
> > +(define_insn "*xt_cmp0"
> > +  [(set (match_operand 0 "cc_set_register" "")
> > +   (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand"
> "r"))
> > +  (const_int 0)))
> > +   (set (match_operand:SI 2 "register_operand" "=r")
> > +   (SEZ:SI (match_dup 1)))]
> > +  ""
> > +  ".f\\t%2,%1"
> > +  [(set_attr "type" "compare")
> > +   (set_attr "cond" "set_zn")])
> > +
> > +(define_insn "*xbfu_cmp0_noout"
> > +  [(set (match_operand 0 "cc_set_register" "")
> > +   (compare:CC_Z
> > +(zero_extract:SI
> > + (match_operand:SI 1 "register_operand"  "  r,r")
> > + (match_operand:SI 2 "const_int_operand" "C3p,n")
> > + (match_operand:SI 3 "const_int_operand" "  n,n"))
> > +(const_int 0)))]
> > +  "TARGET_HS && TARGET_BARREL_SHIFTER"
> > +  {
> > +   int assemble_op2 = (((INTVAL (operands[2]) - 1) & 0x1f) << 5) | (INTVAL
> (operands[3]) & 0x1f);
> > +   operands[2] = GEN_INT (assemble_op2);
> > +   return "xbfu%?.f\\t0,%1,%2";
> > +  }
> > +  [(set_attr "type"   "shift")
> > +   (set_attr "iscompact"  "false")
> > +   (set_attr "length" "4,8")
> > +   (set_attr "predicable" "no")
> > +   (set_attr "cond"   "set_zn")])
> > +
> > +(define_insn "*xbfu_cmp0"
> > +  [(set (match_operand 4 "cc_set_register" "")
> > +   (compare:CC_Z
> > +(zero_extract:SI
> > + (match_operand:SI 1 "register_operand"  "0  ,r,0")
> > + (match_operand:SI 2 "const_int_operand" "C3p,n,n")
> > + (match_operand:SI 3 "const_int_operand" "n  ,n,n"))
> > +(const_int 0)))
> > +   (set (match_operand:SI 0 "register_operand"  "=r,r,r")
> > +   (zero_extract:SI (match_dup 1) (match_dup 2) (match_dup 3)))]
> > +  "TARGET_HS && TARGET_BARREL_SHIFTER"
> > +  {
> > +   int assemble_op2 = (((INTVAL (operands[2]) - 1) & 0x1f) << 5) | (INTVAL
> (operands[3]) & 0x1f);
> > +   operands[2] = GEN_INT (assemble_op2);
> > +   return "xbfu%?.f\\t%0,%1,%2";
> > +  }
> > +  [(set_attr "type"

Re: [PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin

2018-04-09 Thread Thomas Preudhomme


Hi Ramana,

On 06/04/18 17:17, Thomas Preudhomme wrote:



On 06/04/18 17:08, Ramana Radhakrishnan wrote:

On 06/04/2018 16:54, Thomas Preudhomme wrote:

Instruction pattern for setting the FPSCR expects the input value to be
in a register. However, __builtin_arm_set_fpscr expander does not ensure
that this is the case and as a result GCC ICEs when the builtin is
called with a constant literal.

This commit fixes the builtin to force the input value into a register.
It also remove the unneeded volatile in the existing fpscr test and
fixes the function prototype.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-04-06  Thomas Preud'homme  

PR target/85261
* config/arm/arm-builtins.c (arm_expand_builtin): Force input operand
into register.

*** gcc/testsuite/ChangeLog ***

2018-04-06  Thomas Preud'homme  

PR target/85261
* gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with
literal value.  Expect 2 MCR instruction.  Fix function prototype.
Remove volatile keyword.

Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows
no regression.

Is this ok for stage4?

Best regards,

Thomas



(sorry about the duplicate for those who get it)


LGTM, though in this case I would prefer a bootstrap and regression run
as this is automatically exercised most with gcc.dg/atomic_*.c and you
really need this tested on linux than just bare-metal as I'm not sure
how this gets tested on arm-none-eabi.


Oh it is indeed. Didn't realized it was used anywhere. Will start bootstrap 
right away.


Done with --with-arch=armv8-a --with-mode=thumb --with-fpu=neon-vfpv4 
--with-float=hard --enable-languages=c,c++,fortran --with-system-zlib 
--enable-plugins --enable-bootstrap. Testsuite for that GCC does not show any 
regression either.


Ok to commit?





What about earlier branches, have you looked ? This is a silly target
bug and fixes should go back to older branches in this particular case
after baking this on trunk for some time.


GCC 6 and 7 are affected as well and a backport will be done once it has baked 
long enough of course.


Will now bootstrap and regtest against GCC 6 and 7. Will let you know once that 
is finished.


Best regards,

Thomas

Improve code quality across partition boundaries

2018-04-09 Thread Jan Hubicka

Hi,
this patch fixes most offending artifacts across the function partition
bounary with -freorder-blocks-and-partitions which is now on by default
for i386 (no other targets).  The problem is that most of cfgcleanup and
cfgrtl is disabled on crossing edges and thus we produce pretty ugly
code.

The comment I removed reffers to non-existing comment, but the fixup
of partitioning only looks for crossing jumps. It generates new
patterns for unconditional jumps and ads trampolines for conditional
jumps if taret requests it (i386 doesn't).

It is thus safe to turn any far jump to near jump (as this patch does)
and based on target we can do more (this patch doesn't)

For this late of stage4 I think we are fine with enabling edge forwarding
only. This saves about 0.5% of code segment on cc1 (measured with and
without patch so the code pays back at least) with profiledbootstrap.
I have patches to fix other issues but I think they can wait for next
stage1.  In some cases we would get better code with crossjumping
too, but it does not seem critical.

profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64 and also
profiledbootstrapped on power. I am running firefox build with LTO+FDO on
x86_64 too and regtesting on power. OK?

Honza

PR rtl/84058
* cfgcleanup.c (try_forward_edges): Do not give up on crossing
jumps; choose last target that matches the criteria (i.e.
no partition changes for non-crossing jumps).
* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
support for redirecting crossing jumps to non-crossing.
Index: cfgcleanup.c
===
--- cfgcleanup.c(revision 259167)
+++ cfgcleanup.c(working copy)
@@ -394,19 +394,6 @@
   edge_iterator ei;
   edge e, *threaded_edges = NULL;
 
-  /* If we are partitioning hot/cold basic blocks, we don't want to
- mess up unconditional or indirect jumps that cross between hot
- and cold sections.
-
- Basic block partitioning may result in some jumps that appear to
- be optimizable (or blocks that appear to be mergeable), but which really
- must be left untouched (they are required to make it safely across
- partition boundaries).  See the comments at the top of
- bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
-
-  if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b)))
-return false;
-
   for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); )
 {
   basic_block target, first;
@@ -415,6 +402,7 @@
   bool threaded = false;
   int nthreaded_edges = 0;
   bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0;
+  bool new_target_threaded = false;
 
   /* Skip complex edges because we don't know how to update them.
 
@@ -431,29 +419,12 @@
   counter = NUM_FIXED_BLOCKS;
   goto_locus = e->goto_locus;
 
-  /* If we are partitioning hot/cold basic_blocks, we don't want to mess
-up jumps that cross between hot/cold sections.
-
-Basic block partitioning may result in some jumps that appear
-to be optimizable (or blocks that appear to be mergeable), but which
-really must be left untouched (they are required to make it safely
-across partition boundaries).  See the comments at the top of
-bb-reorder.c:partition_hot_cold_basic_blocks for complete
-details.  */
-
-  if (first != EXIT_BLOCK_PTR_FOR_FN (cfun)
- && JUMP_P (BB_END (first))
- && CROSSING_JUMP_P (BB_END (first)))
-   return changed;
-
   while (counter < n_basic_blocks_for_fn (cfun))
{
  basic_block new_target = NULL;
- bool new_target_threaded = false;
  may_thread |= (target->flags & BB_MODIFIED) != 0;
 
  if (FORWARDER_BLOCK_P (target)
- && !(single_succ_edge (target)->flags & EDGE_CROSSING)
  && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun))
{
  /* Bypass trivial infinite loops.  */
@@ -543,8 +514,14 @@
break;
 
  counter++;
- target = new_target;
- threaded |= new_target_threaded;
+ /* Do not turn non-crossing jump to crossing.  Depending on target
+it may require different instruction pattern.  */
+ if ((e->flags & EDGE_CROSSING)
+ || BB_PARTITION (first) == BB_PARTITION (new_target))
+   {
+ target = new_target;
+ threaded |= new_target_threaded;
+   }
}
 
   if (counter >= n_basic_blocks_for_fn (cfun))
Index: cfgrtl.c
===
--- cfgrtl.c(revision 259167)
+++ cfgrtl.c(working copy)
@@ -4361,6 +4361,18 @@
   if (e->dest == dest)
 return e;
 
+  if (e->flags & EDGE_CROSSING
+  && BB_PARTITION (e->src) == BB_PARTITION (dest)
+  && simplejump_p (BB_END (src)))
+{
+  if (dump_file)
+

Re: [PATCH] Be more carefull about DECL merging in LTO (PR lto/85248).

2018-04-09 Thread Richard Biener

On Mon, Apr 9, 2018 at 3:55 PM, Martin Liška  wrote:
> Hi.
>
> Following patch adds checking of noreturn attribute when we merge DECLs in 
> LTO.
> Note that proper fix is probably 
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43884=diff
> but that would be applicable after MPX is removed. Thus in next stage1.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

+ return false;
+   }
+
 }

excessive vertical space.

Otherwise OK.

Thanks,
Richard.

> Ready to be installed?
> Martin
>
> gcc/lto/ChangeLog:
>
> 2018-04-06  Richard Biener  
> Martin Liska  
>
> PR lto/85248
> * lto-symtab.c (lto_symtab_merge_p): Handle noreturn attribute.
>
> gcc/testsuite/ChangeLog:
>
> 2018-04-06  Jakub Jelinek  
>
> PR lto/85248
> * gcc.dg/lto/pr85248_0.c: New test.
> * gcc.dg/lto/pr85248_1.c: New test.
> ---
>  gcc/lto/lto-symtab.c | 17 ++
>  gcc/testsuite/gcc.dg/lto/pr85248_0.c | 45 
> 
>  gcc/testsuite/gcc.dg/lto/pr85248_1.c |  9 
>  3 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_0.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_1.c
>
>

[PATCH] Be more carefull about DECL merging in LTO (PR lto/85248).

2018-04-09 Thread Martin Liška

Hi.

Following patch adds checking of noreturn attribute when we merge DECLs in LTO.
Note that proper fix is probably 
https://gcc.gnu.org/bugzilla/attachment.cgi?id=43884=diff
but that would be applicable after MPX is removed. Thus in next stage1.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Martin

gcc/lto/ChangeLog:

2018-04-06  Richard Biener  
Martin Liska  

PR lto/85248
* lto-symtab.c (lto_symtab_merge_p): Handle noreturn attribute.

gcc/testsuite/ChangeLog:

2018-04-06  Jakub Jelinek  

PR lto/85248
* gcc.dg/lto/pr85248_0.c: New test.
* gcc.dg/lto/pr85248_1.c: New test.
---
 gcc/lto/lto-symtab.c | 17 ++
 gcc/testsuite/gcc.dg/lto/pr85248_0.c | 45 
 gcc/testsuite/gcc.dg/lto/pr85248_1.c |  9 
 3 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_1.c


diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
index 4f186aed059..fbad96c0707 100644
--- a/gcc/lto/lto-symtab.c
+++ b/gcc/lto/lto-symtab.c
@@ -572,6 +572,9 @@ lto_symtab_merge_p (tree prevailing, tree decl)
 	  return false;
 	}
 }
+
+  /* FIXME: after MPX is removed, use flags_from_decl_or_type
+ function instead.  PR lto/85248.  */
   if (DECL_ATTRIBUTES (prevailing) != DECL_ATTRIBUTES (decl))
 {
   tree prev_attr = lookup_attribute ("error", DECL_ATTRIBUTES (prevailing));
@@ -599,6 +602,20 @@ lto_symtab_merge_p (tree prevailing, tree decl)
 		 "warning attribute mismatch\n");
 	  return false;
 	}
+
+  prev_attr = lookup_attribute ("noreturn", DECL_ATTRIBUTES (prevailing));
+  attr = lookup_attribute ("noreturn", DECL_ATTRIBUTES (decl));
+  if ((prev_attr == NULL) != (attr == NULL)
+	  || (prev_attr
+	  && TREE_VALUE (TREE_VALUE (prev_attr))
+		 != TREE_VALUE (TREE_VALUE (attr
+	{
+  if (symtab->dump_file)
+	fprintf (symtab->dump_file, "Not merging decls; "
+		 "noreturn attribute mismatch\n");
+	  return false;
+	}
+
 }
   return true;
 }
diff --git a/gcc/testsuite/gcc.dg/lto/pr85248_0.c b/gcc/testsuite/gcc.dg/lto/pr85248_0.c
new file mode 100644
index 000..df61ac976a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr85248_0.c
@@ -0,0 +1,45 @@
+/* PR lto/85248 */
+/* { dg-lto-do run } */
+/* { dg-lto-options { { -flto -O2 } } } */
+
+extern void test_alias (int s, int e) __asm__ (__USER_LABEL_PREFIX__ "test");
+extern void test_noreturn (int s, int e) __asm__ (__USER_LABEL_PREFIX__ "test")
+  __attribute__ ((__noreturn__));
+
+extern inline __attribute__ ((__always_inline__, __gnu_inline__)) void
+test (int s, int e)
+{
+  if (__builtin_constant_p (s) && s != 0)
+test_noreturn (s, e);
+  else
+test_alias (s, e);
+}
+
+int
+foo (void)
+{
+  static volatile int a;
+  return a;
+}
+
+static void
+bar (void)
+{
+  test (0, 1);
+  __builtin_exit (0);
+}
+
+static void
+baz ()
+{
+  test (1, 0);
+}
+
+int
+main ()
+{
+  if (foo ())
+baz ();
+  bar ();
+  __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr85248_1.c b/gcc/testsuite/gcc.dg/lto/pr85248_1.c
new file mode 100644
index 000..418c0697028
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr85248_1.c
@@ -0,0 +1,9 @@
+/* { dg-options "-fno-lto" } */
+
+__attribute__((__noipa__)) void
+test (int s, int e)
+{
+  asm volatile ("" : "+g" (s), "+g" (e) : : "memory");
+  if (s)
+__builtin_abort ();
+}

Re: [PATCH v2, rs6000] Tidy implementation of vec_ldl

2018-04-09 Thread Segher Boessenkool

Hi!

On Wed, Apr 04, 2018 at 10:32:58AM -0500, Kelvin Nilsen wrote:
> 2018-04-03  Kelvin Nilsen  
> 
>     * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
>     erroneous entries for
>     "vector int vec_ldl (int, long int *)", and
>     "vector unsigned int vec_ldl (int, unsigned long int *)".
>     Add comments and entries for
>     "vector bool char vec_ldl (int, bool char *)",
>     "vector bool short vec_ldl (int, bool short *)",
>     "vector bool int vec_ldl (int, bool int *)",
>     "vector bool long long vec_ldl (int, bool long long *)",
>     "vector pixel vec_ldl (int, pixel *)",
>     "vector long long vec_ldl (int, long long *)",
>     "vector unsigned long long vec_ldl (int, unsigned long long *)".
>     * config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new
>     type tree bool_long_long_type_node and correct definition of
>     bool_V2DI_type_node to make reference to this new type tree.
>     (rs6000_mangle_type): Replace erroneous reference to
>     bool_long_type_node with bool_long_long_type_node.
>     * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
>     comments to emphasize sign distinctions for char and int types and
>     replace RS6000_BTI_bool_long constant with
>     RS6000_BTI_bool_long_long constant.  Also add comment to restrict
>     use of RS6000_BTI_pixel.
>     (bool_long_type_node): Remove this macro definition.
>     (bool_long_long_type_node): New macro definition
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-04-03  Kelvin Nilsen  
> 
>     * gcc.target/powerpc/vec-ldl-1.c: New test.
>     * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect
>     corrections to ABI implementation.

> --- gcc/config/rs6000/rs6000-c.c    (revision 258800)
> +++ gcc/config/rs6000/rs6000-c.c    (working copy)
> @@ -1656,27 +1656,45 @@ const struct altivec_builtin_types altivec_overloa
>  RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
>    { ALTIVEC_BUILTIN_VEC_LVEBX, ALTIVEC_BUILTIN_LVEBX,
>  RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 
> },

You sent this as format=flowed, which a) makes replies look funny like
this, but also b) makes it impossible to apply your patches for someone
else.  Please fix this (for future patches, no need to resend this).

> --- gcc/config/rs6000/rs6000.h    (revision 258800)
> +++ gcc/config/rs6000/rs6000.h    (working copy)
> @@ -2578,7 +2578,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_opaque_V2SF,
>    RS6000_BTI_opaque_p_V2SI,
>    RS6000_BTI_opaque_V4SI,
> -  RS6000_BTI_V16QI,
> +  RS6000_BTI_V16QI,  /* __vector signed char */
>    RS6000_BTI_V1TI,
>    RS6000_BTI_V2SI,
>    RS6000_BTI_V2SF,
> @@ -2588,7 +2588,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_V4SI,
>    RS6000_BTI_V4SF,
>    RS6000_BTI_V8HI,
> -  RS6000_BTI_unsigned_V16QI,
> +  RS6000_BTI_unsigned_V16QI, /* __vector unsigned char */
>    RS6000_BTI_unsigned_V1TI,
>    RS6000_BTI_unsigned_V8HI,
>    RS6000_BTI_unsigned_V4SI,

I don't think these comments help anything.


Okay for trunk.  Thanks!


Segher

[PATCH] Fix PR85284

2018-04-09 Thread Richard Biener


The following fixes the new capability of analyzing a wrapping IV as
non-wrapping under a condition that ensures we do not wrap by
restricting it to the case where the exit test must be executed
when it reaches that bound.  Otherwise we may not analyze the
IV with the assumption that the exit test triggers exactly once.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-04-09  Richard Biener  

PR tree-optimization/85284
* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
Only use the niter constraining form of simple_iv when the exit
is always executed.

* gcc.dg/torture/pr85284.c: New testcase.

Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 259227)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -2356,11 +2356,11 @@ number_of_iterations_exit_assumptions (s
 
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
- op0, , _niters, false))
+ op0, , safe ? _niters : NULL, false))
 return false;
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
- op1, , _niters, false))
+ op1, , safe ? _niters : NULL, false))
 return false;
   /* Give up on complicated case.  */
   if (iv0_niters && iv1_niters)
Index: gcc/testsuite/gcc.dg/torture/pr85284.c
===
--- gcc/testsuite/gcc.dg/torture/pr85284.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr85284.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+
+static int p[48], v;
+
+int
+main ()
+{
+  p[32] = 1;
+  for (int i = 48; i--;)
+{
+  if (!p[i])
+   continue;
+  if ((i & 7) > 2)
+   break;
+  v = i & 1;
+}
+  if (v != 0)
+__builtin_abort ();
+  return 0;
+}

[nvptx, PR84041] Add memory_barrier insn

2018-04-09 Thread Tom de Vries


Hi,

we've been having hanging OpenMP tests for nvptx offloading: 
for-{3,5,6}.c and the corresponding C++ test-cases.


The failures have now been analyzed down to gomp_ptrlock_get in 
libgomp/config/nvptx/ptrlock.h:

...
 static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
  uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
  if (v > 2)
return (void *) v;

  if (v == 0
  && __atomic_compare_exchange_n (ptrlock, , 1, false,
  MEMMODEL_ACQUIRE,
  MEMMODEL_ACQUIRE))
return NULL;

  while (v == 1)
v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

  return (void *) v;
}
...

There's no atomic load insn defined for nvptx, and also no memory 
barrier insn, so the atomic load ends up generating a normal load. The 
JIT compiler does loop-invariant code motion, and moves the load out of 
the loop, which turns the while into an eternal loop.



Fix conservatively by defining the memory_barrier insn. This can 
possibly be fixed more optimally by implementing an atomic load 
operation in nvptx.


Build x86_64 with nvptx accelerator and reg-tested libgomp.

Committed to stage4 trunk.

Thanks,
- Tom
[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  

	PR target/84041
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
	(define_expand "*memory_barrier"): New define_expand.
	(define_insn "memory_barrier"): New insn.

---
 gcc/config/nvptx/nvptx.md | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@
UNSPECV_CAS
UNSPECV_XCHG
UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
UNSPECV_DIM_POS
 
UNSPECV_FORK
@@ -1459,6 +1460,27 @@
   "\\tbar.sync\\t%0;"
   [(set_attr "predicable" "false")])
 
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions threadfence_block, threadfence and
+;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
 (define_insn "nvptx_nounroll"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
   ""

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-09 Thread Martin Liška

Hi.

I'm sending updated version of the patch.

Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives 
regression tests.

Martin
>From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c  | 15 ++-
 gcc/config.gcc  |  1 +
 gcc/config/i386/i386.c  |  5 
 gcc/config/i386/linux.h |  2 ++
 gcc/config/i386/linux64.h   |  2 ++
 gcc/config/i386/t-linux |  6 +
 gcc/config/i386/x86-linux.c | 54 +
 gcc/config/linux-protos.h   |  1 +
 gcc/coretypes.h |  7 +
 gcc/doc/tm.texi |  4 +++
 gcc/doc/tm.texi.in  |  1 +
 gcc/expr.c  | 11 +++-
 gcc/expr.h  |  3 ++-
 gcc/target.def  |  7 +
 gcc/targhooks.c |  9 +++
 gcc/targhooks.h |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 17 files changed, 129 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+ In situations when a libc library provides fast implementation
+ of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+= (endp == 1
+   && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+   && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
  CALL_EXPR_TAILCALL (exp)
  && (endp == 0 || target == const0_rtx)
  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
  expected_align, expected_size,
- min_size, max_size, probable_max_size);
+ min_size, max_size, probable_max_size,
+ avoid_libcall ? _avoided : NULL);
+
+  if (libcall_avoided)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..6445ff569b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 	x86_64-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
 		extra_options="${extra_options} linux-android.opt"
+		extra_objs="${extra_objs} x86-linux.o"
 	  	;;
 	x86_64-*-kfreebsd*-gnu)
 		tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index 69f97f15b0d..6c59cbd6d62 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.

Re: [PATCH] MIPS/GCC: Mark text contents as code or data

2018-04-09 Thread Maciej W. Rozycki

Hi Paul,

 Apologies for the delay in reply, my @imgtec.com address has since become 
defunct and I have only come across your e-mail in my ever-lagging routine 
mailing traffic review.

On Wed, 14 Mar 2018, Paul Hua wrote:

> I noticed that data-sym-pool.c fails on -O0 flags.
> 
> -O0 output :
> -cut--
> frob:
> .frame  $17,8,$31   # vars= 0, regs= 1/0, args= 0, gp= 0
> .mask   0x0002,0
> .fmask  0x,0
> addiu   $sp,-8
> sd  $17,0($sp)
> move$17,$sp
> lw  $2,$L4
> move$sp,$17
> ld  $17,0($sp)
> addiu   $sp,8
> jr  $31
> .type   __pool_frob_3, @object
> __pool_frob_3:
> .align  2
> $L3:
> .word   __gnu_local_gp
> $L4:
> .word   305419896
> .type   __pend_frob_3, @function
> __pend_frob_3:
> .insn
> .endfrob
> .size   frob, .-frob
> .ident  "GCC: (gcc trunk r258495 mips64el o32 n32 n64) 8.0.1
> 20180313 (experimental)"
> -end--
> 
> Is it expected ? maybe we should add skip-if  -O0 flags.

 Thanks for your report.

 I think instead we should relax the matching pattern and also accept an 
arbitrary number of label-`.word' pairs ahead of and/or after the required 
one in the constant pool.  I'll post a fix.

  Maciej

Re: [nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp (fwd)

2018-04-09 Thread Richard Biener


Somehow ended up off-list only.

-- Forwarded message --
Date: Mon, 9 Apr 2018 09:30:15 +0200 (CEST)
From: Richard Biener 
To: Gerald Pfeifer 
Subject: Re: [nvptx,
PR81352] Add exit insn after noreturn call for neutered threads in warp

On Sun, 8 Apr 2018, Gerald Pfeifer wrote:

> On Wed, 24 Jan 2018, Tom de Vries wrote:
> > On 01/24/2018 12:53 PM, Richard Biener wrote:
> >> wrong-code bugs qualify for stage4 if a fix isn't too invasive.  Target
> >> maintainers have an extra say to override stage4 rules anyway and for
> >> non-primary/secondary targets nobody cares anyway.
> > Maybe then we should be more clear then in formulation of stage 4 criteria?
> 
> Richi?  I did not see anyone approve or reject Tom's patch, and it
> did not make it into the tree.
> 
> What is your take?  (It's fine as far as wwwdocs go, but this really
> is a question on the RMs.)
> 
> Gerald
> 
> > [ Change validated as XHTML 1.0 Transitional ]
> > 
> > Index: htdocs/develop.html
> > ===
> > RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v
> > retrieving revision 1.178
> > diff -u -r1.178 develop.html
> > --- htdocs/develop.html 15 Jan 2018 08:23:26 -  1.178
> > +++ htdocs/develop.html 24 Jan 2018 13:40:30 -
> > @@ -130,10 +130,10 @@
> >  Stage 4
> > 
> >  During this period, the only (non-documentation) changes that may
> > -be made are changes that fix regressions.  Other changes may not be
> > -done during this period.  Note that the same constraints apply
> > -to release branches.  This period lasts until stage 1 opens for
> > -the next release.
> > +be made are changes that fix regressions, or that fix wrong-code bugs
> > +in a non-invasive way.  Other changes may not be done during this
> > +period.  Note that the same constraints apply to release branches.
> > +This period lasts until stage 1 opens for the next release.

Given "non-invasive way" applies generally, also to regression fixes
I'd just reformulate it to include other important bugs but mention
the extra caution expected (not sure if my wording is good).

 During this period, the only (non-documentation) changes that may
 be made are changes that fix regressions.  Other important bugs
 like wrong-code, rejects-valid or build issues may be fixed as well.
 All changes during this period should be done with extra care on
 not introducing new regressions - fixing bugs at all cost is not
 wanted.  Note that the same constraints apply to release branches.
 This period lasts until stage 1 opens for the next release.


> >  Rationale
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] Handle empty infinite loops in OpenACC for PR84955

2018-04-09 Thread Richard Biener

On Fri, 6 Apr 2018, Jakub Jelinek wrote:

> On Fri, Apr 06, 2018 at 06:48:52AM -0700, Cesar Philippidis wrote:
> > 2018-04-06  Cesar Philippidis  
> > 
> > PR middle-end/84955
> > 
> > gcc/
> > * cfgloop.c (flow_loops_find): Add assert.
> > * omp-expand.c (expand_oacc_for): Add dummy false branch for
> > tiled basic blocks without omp continue statements.
> > * tree-cfg.c (execute_fixup_cfg): Handle calls to internal
> > functions like regular functions.
> > 
> > libgomp/
> > * testsuite/libgomp.oacc-c-c++-common/pr84955.c: New test.
> > * testsuite/libgomp.oacc-fortran/pr84955.f90: New test.
> 
> I'd like to defer the cfgloop.c and tree-cfg.c changes to Richard, just want 
> to
> mention that:
> 
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -9586,10 +9586,7 @@ execute_fixup_cfg (void)
> >for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
> > {
> >   gimple *stmt = gsi_stmt (gsi);
> > - tree decl = is_gimple_call (stmt)
> > - ? gimple_call_fndecl (stmt)
> > - : NULL;
> > - if (decl)
> > + if (is_gimple_call (stmt))
> 
> This change doesn't affect just internal functions, but also all indirect
> calls through function pointers with const, pure or noreturn attributes.

I think the change is desirable nevertheless.  The question is if we
want to do it at this point in time.

The description of the problem sounds more like LTO writing writing out
loops without previously fixing up state.  So sth like the following
which I'd prefer at this stage (the above hunk is ok for stage1 then).

Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 259227)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -2084,6 +2151,9 @@ output_function (struct cgraph_node *nod
   /* Set current_function_decl and cfun.  */
   push_cfun (fn);
 
+  /* Fixup loops if required to match discovery done in the reader.  */
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+
   /* Make string 0 be a NULL string.  */
   streamer_write_char_stream (ob->string_stream, 0);
 
@@ -2176,12 +2246,13 @@ output_function (struct cgraph_node *nod
   streamer_write_record_start (ob, LTO_null);
 
   output_cfg (ob, fn);
-
-  pop_cfun ();
}
   else
 streamer_write_uhwi (ob, 0);
 
+  loop_optimizer_finalize ();
+  pop_cfun ();
+
   /* Create a section to hold the pickled output of this function.   */
   produce_asm (ob, function);
 


> > --- a/gcc/omp-expand.c
> > +++ b/gcc/omp-expand.c
> > @@ -5439,6 +5439,13 @@ expand_oacc_for (struct omp_region *region, struct 
> > omp_for_data *fd)
> >  
> >   split->flags ^= EDGE_FALLTHRU | EDGE_TRUE_VALUE;
> >  
> > + /* Add a dummy exit for the tiled block when cont_bb is missing.  */
> > + if (cont_bb == NULL)
> > +   {
> > + edge e = make_edge (body_bb, exit_bb, EDGE_FALSE_VALUE);
> > + e->probability = profile_probability::even ();
> > +   }
> 
> I miss here updating of split->probability, if you make e even (the other edge
> needs to have probability of 100%-the probability, i.e. even as well.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: Fix PRs 80463, 83972, 83480

2018-04-09 Thread Andrey Belevantsev

On 06.04.2018 19:18, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
> 
>> Hello,
>>
>> In these PRs we deal with the dependencies we are forced to make between a
>> debug insn and its previous insn (unless bb head).  In the latter case, if
>> such an insn has been moved, we fixup its movement so it aligns with the
>> sel-sched invariants.  We also carefully adjust seqnos in the case we had a
>> single debug insn left in the block.
> 
> This is OK with overlong lines fixed and the new comment reworded for clarity
> (see below).
> 
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  
>>
>>  PR rtl-optimization/80463
>>  PR rtl-optimization/83972
>>  PR rtl-optimization/83480
>>
>>  * sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the
>> correct producer for the insn.
>>  (tidy_control_flow): Fixup seqnos in case of debug insns.
>>
>>  * gcc.dg/pr80463.c: New test.
>>  * g++.dg/pr80463.C: Likewise.
>>  * gcc.dg/pr83972.c: Likewise.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem 
>> ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED,
>> +has_dependence_note_dep (insn_t pro,
>>  ds_t ds ATTRIBUTE_UNUSED)
>>  {
>> -  if (!sched_insns_conditions_mutex_p (has_dependence_data.pro,
>> -  VINSN_INSN_RTX
>> (has_dependence_data.con)))
>> +  insn_t real_pro = has_dependence_data.pro;
>> +  insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con);
>> +
>> +  /* We do not allow for debug insns to move through others unless they
>> + are at the start of bb.  Such insns may create bookkeeping copies
>> + that would not be able to move up breaking sel-sched invariants.
> 
> I have trouble parsing this, it seems a word is accidentally between "move up"
> and "breaking". Also the "such" is a bit ambiguous.
> 
>> + Detect that here and allow that movement if we allowed it before
>> + in the first place.  */
>> +  if (DEBUG_INSN_P (real_con)
>> +  && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con))
>> +return;
> 
> Should we be concerned about debug insns appearing in sequence here? E.g. if
> pro and real_con are not consecutive, but all insns in between are debug 
> insns?

I think that should be fine, that is, as long as the insn moved up through
all those debug insns, the copy will do that as well.  It's that
problematic conditional in sched-deps.c that we should take care of.

I've reworded the comment and committed the attached patch.  Thanks for
your help.

Andrey

> 
>> +
>> +  if (!sched_insns_conditions_mutex_p (real_pro, real_con))
>>  {
>>ds_t *dsp = _dependence_data.has_dep_p[has_dependence_data.where];
>>
>> @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>
>>gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU);
>>
>> +  /* We could have skipped some debug insns which did not get removed
>> with the block,
>> + and the seqnos could become incorrect.  Fix them up here.  */
>> +  if (MAY_HAVE_DEBUG_INSNS && (sel_bb_head (xbb) != first ||
>> sel_bb_end (xbb) != last))
>> +   {
>> + if (!sel_bb_empty_p (xbb->prev_bb))
>> +   {
>> + int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb));
>> + if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb)))
>> +   for (insn_t insn = sel_bb_head (xbb); insn != first; insn =
>> NEXT_INSN (insn))
>> + INSN_SEQNO (insn) = prev_seqno + 1;
>> +   }
>> +   }
>> +
>>/* It can turn out that after removing unused jump, basic block
>>   that contained that jump, becomes empty too.  In such case
>>   remove it too.  */
>> diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C
>> new file mode 100644
>> index 000..5614c28ca45
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/pr80463.C
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-g -fselective-scheduling2 -O2
>> -fvar-tracking-assignments" } */
>> +
>> +int *a;
>> +int b, c;
>> +void
>> +d ()
>> +{
>> +  for (int e; c; e++)
>> +switch (e)
>> +  {
>> +  case 0:
>> +   a[e] = 1;
>> +  case 1:
>> +   b = 2;
>> +   break;
>> +  default:
>> +   a[e] = 3;
>> +  }
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/pr80463.c b/gcc/testsuite/gcc.dg/pr80463.c
>> new file mode 100644
>> index 000..cebf2fef1f3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr80463.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-g -O2

Re: Fix PR 83913

2018-04-09 Thread Andrey Belevantsev

On 06.04.2018 19:10, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
> 
>> Hello,
>>
>> This issue ended up being fixed the way different from described in the PR.
>>  We do not want to walk away from the invariant "zero SCHED_TIMES -- insn
>> is not scheduled" even for bookkeeping copies (testing showed it trips over
>> asserts designed to catch this).  Rather we choose merging exprs in the way
>> the larger sched-times wins.
> 
> My understanding is this is not a "complete" solution to the problem, and a
> chance for a similar blowup on some other testcase remains. Still, avoiding
> picking the minimum sched-times value should be a good mitigation.

Well, it's not much different with any other situation when we pose a limit
on pipelining with the sched-times values.  At least for now I can't think
of something better.

Adjusted the comment as per your suggestion and committed the attached patch.

Andrey

> 
> Please consider adding a comment that the average sched-times value is taken
> as a compromise to thwart "endless" pipelining of bookkeeping-producing insns
> available anywhere vs. pipelining of useful insns, or something like that?
> 
> OK with that considered/added.
> 
>>
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  
>>
>>  PR rtl-optimization/83913
>>
>>  * sel-sched-ir.c (merge_expr_data): Choose the middle between two
>> different sched-times
>>  when merging exprs.
>>
>>  * gcc.dg/pr83913.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t
>> split_point)
>>if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from))
>>  EXPR_PRIORITY (to) = EXPR_PRIORITY (from);
>>
>> -  if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from))
>> -EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from);
>> +  if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from))
>> +EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES 
>> (to)
>> + + 1) / 2);
>>
>>if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from))
>>  EXPR_ORIG_BB_INDEX (to) = 0;
>> @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem 
>> ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c
>> new file mode 100644
>> index 000..c898d71a261
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83913.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling
>> -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate
>> -fno-rerun-cse-after-loop -fno-web" } */
>> +
>> +int jo, z4;
>> +
>> +int
>> +be (long unsigned int l7, int nt)
>> +{
>> +  int en;
>> +
>> +  jo = l7;
>> +  for (en = 0; en < 24; ++en)
>> +{
>> +  jo = (jo / z4) * (!!jo >= ((!!nt) & 2));
>> +  if (jo == 0)
>> +nt = 0;
>> +  else
>> +{
>> +  nt = z4;
>> +  ++z4;
>> +  nt = (long unsigned int) nt == (l7 + 1);
>> +}
>> +}
>> +
>> +  return nt;
>> +}
>>
>>
>>

Index: gcc/ChangeLog
===
*** gcc/ChangeLog   (revision 259229)
--- gcc/ChangeLog   (revision 259230)
***
*** 1,5 
--- 1,12 
  2018-04-09  Andrey Belevantsev  
  
+   PR rtl-optimization/83913
+ 
+   * sel-sched-ir.c (merge_expr_data): Choose the middle between two
+   different sched-times when merging exprs.
+ 
+ 2018-04-09  Andrey Belevantsev  
+ 
PR rtl-optimization/83962
  
* sel-sched-ir.c (tidy_control_flow): Correct the order in which we call
Index: gcc/testsuite/gcc.dg/pr83913.c
===
*** gcc/testsuite/gcc.dg/pr83913.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr83913.c  (revision 259230)
***
*** 0 
--- 1,26 
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -funroll-all-loops -fselective-scheduling 
-fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate 
-fno-rerun-cse-after-loop -fno-web" } */
+ 
+ int jo, z4;
+ 
+ int
+ be (long unsigned int l7, int nt)
+ {
+   int en;
+ 
+   jo = l7;
+   for (en = 0; en < 24; ++en)
+ {
+   jo = (jo / z4) * (!!jo >= ((!!nt) & 2));
+   if (jo == 0)
+ nt = 0;
+   else
+ {
+   nt = z4;
+   ++z4;
+   nt = (long unsigned int) nt == (l7 + 1);
+ }
+ }
+ 
+   return nt;
+ }
Index: gcc/testsuite/ChangeLog
===
*** gcc/testsuite/ChangeLog (revision 259229)
---

Re: Fix PR 83962

2018-04-09 Thread Andrey Belevantsev

On 06.04.2018 18:59, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
> 
>> Hello,
>>
>> This issues is about the correct order in which we need to call the
>> routines that fix up the control flow for us.
> 
> OK with formatting both in the new comment and the Changelog fixed.

Thanks, fixed that in rev. 259229.

Andrey

> 
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  
>>
>>  PR rtl-optimization/83962
>>
>>  * sel-sched-ir.c (tidy_control_flow): Correct the order in which we call
>> tidy_fallthru_edge
>>  and tidy_control_flow.
>>
>>  * gcc.dg/pr83962.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>&& INSN_SCHED_TIMES (BB_END (xbb)) == 0
>>&& !IN_CURRENT_FENCE_P (BB_END (xbb)))
>>  {
>> -  if (sel_remove_insn (BB_END (xbb), false, false))
>> -return true;
>> +  /* We used to call sel_remove_insn here that can trigger
>> tidy_control_flow
>> + before we fix up the fallthru edge.  Correct that ordering by
>> +explicitly doing the latter before the former.  */
>> +  clear_expr (INSN_EXPR (BB_END (xbb)));
>>tidy_fallthru_edge (EDGE_SUCC (xbb, 0));
>> +  if (tidy_control_flow (xbb, false))
>> +   return true;
>>  }
>>
>>first = sel_bb_head (xbb);
>> diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c
>> new file mode 100644
>> index 000..0547e218715
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83962.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2
>> -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */
>> +unsigned int ca;
>> +
>> +void
>> +v6 (long long unsigned int as, int p9)
>> +{
>> +  while (p9 < 1)
>> +as = (as != ca) || (as > 1);
>> +}
>>

Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround

2018-04-09 Thread Gerald Pfeifer

On Mon, 9 Apr 2018, Daniel Hellstrom wrote:
> I received feedback from the webmaster that the preferred link is
> "http://www.gaisler.com/notes; so I updated it just be pushing.

Maybe, but then they should reconfigure their server such that
it does not redirect.

We've had to many cases in the past where redirects ended up
being malign or simply broken, and I am not looking forward
towards manually verifying this one every month.

Gerald

Re: Fix PR 83530

2018-04-09 Thread Andrey Belevantsev

On 06.04.2018 18:55, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
> 
>> Hello,
>>
>> This issue is when we cannot correctly make insn tick data for the
>> unscheduled code (but processed by the modulo scheduler).  Fixed by closely
>> following usual scheduling process as described in the PR.
> 
> This is ok with the following nit-picks fixed.

Thank, I've committed the attached.

Andrey

> 
>> 2018-04-03  Andrey Belevantsev  
>>
>>  PR rtl-optimization/83530
>>
>>  * sel-sched.c (force_next_insn): New global variable.
>>  (remove_insn_for_debug): When force_next_insn is true, also leave only
>> next insn
>>  in the ready list.
>>  (sel_sched_region): When the region wasn't scheduled, make another pass
>> over it
>>  with force_next_insn set to 1.
> 
> Overlong lines.
> 
>>  * gcc.dg/pr8350.c: New test.
> 
> Typo in test name.
>  
>> --- a/gcc/sel-sched.c
>> +++ b/gcc/sel-sched.c
>> @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying)
>> distinguishing between bookkeeping copies and original insns.  */
>>  static int max_uid_before_move_op = 0;
>>
>> +/* When true, we're always scheduling next insn on the already scheduled 
>> code
>> +   to get the right insn data for the following bundling or other passes.  
>> */
>> +static int force_next_insn = 0;
>> +
>>  /* Remove from AV_VLIW_P all instructions but next when debug counter
>> tells us so.  Next instruction is fetched from BNDS.  */
>>  static void
>>  remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p)
>>  {
>> -  if (! dbg_cnt (sel_sched_insn_cnt))
>> +  if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn)
>>  /* Leave only the next insn in av_vliw.  */
>>  {
>>av_set_iterator av_it;
>> @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn)
>>  sel_sched_region_1 ();
>>else
>>  /* Force initialization of INSN_SCHED_CYCLEs for correct bundling.  */
> 
> I believe this comment needs updating.
> 
> Please also consider moving both assignments of reset_sched_cycles_p to
> after the if-else statement, just before sel_region_finish call.
> 
>> -reset_sched_cycles_p = true;
>> +{
>> +  reset_sched_cycles_p = false;
>> +  pipelining_p = false;
>> +  force_next_insn = 1;
>> +  sel_sched_region_1 ();
>> +  force_next_insn = 0;
>> +}
>>
>>sel_region_finish (reset_sched_cycles_p);
>>  }

Index: gcc/ChangeLog
===
*** gcc/ChangeLog   (revision 259227)
--- gcc/ChangeLog   (revision 259228)
***
*** 1,3 
--- 1,13 
+ 2018-04-09  Andrey Belevantsev  
+ 
+   PR rtl-optimization/83530
+ 
+   * sel-sched.c (force_next_insn): New global variable.
+   (remove_insn_for_debug): When force_next_insn is true, also leave only
+   next insn in the ready list.
+   (sel_sched_region): When the region wasn't scheduled, make another pass
+   over it with force_next_insn set to 1.
+ 
  2018-04-08  Monk Chiang  
  
* config.gcc (nds32le-*-*, nds32be-*-*): Add nds32/nds32_intrinsic.h
Index: gcc/testsuite/gcc.dg/pr83530.c
===
*** gcc/testsuite/gcc.dg/pr83530.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr83530.c  (revision 259228)
***
*** 0 
--- 1,15 
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */
+ int vm, z0;
+ short int mz;
+ 
+ int
+ ny (void)
+ {
+   int ch;
+ 
+   for (ch = 0; ch < 6; ++ch)
+ vm += ch / vm;
+ 
+   return z0 + mz;
+ }
Index: gcc/testsuite/ChangeLog
===
*** gcc/testsuite/ChangeLog (revision 259227)
--- gcc/testsuite/ChangeLog (revision 259228)
***
*** 1,3 
--- 1,8 
+ 2018-04-09  Andrey Belevantsev  
+ 
+   PR rtl-optimization/83530
+   * gcc.dg/pr83530.c: New test.
+ 
  2018-04-07  Thomas Koenig  
  
PR middle-end/82976
Index: gcc/sel-sched.c
===
*** gcc/sel-sched.c (revision 259227)
--- gcc/sel-sched.c (revision 259228)
*** remove_temp_moveop_nops (bool full_tidyi
*** 5004,5015 
 distinguishing between bookkeeping copies and original insns.  */
  static int max_uid_before_move_op = 0;
  
  /* Remove from AV_VLIW_P all instructions but next when debug counter
 tells us so.  Next instruction is fetched from BNDS.  */
  static void
  remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p)
  {
!   if (! dbg_cnt (sel_sched_insn_cnt))
  /* Leave only the next insn in av_vliw.  */
  {
av_set_iterator av_it;
--- 5004,5019 
 distinguishing between bookkeeping copies and original insns.  */
  static int

Re: [Aarch64] Fix conditional branches with target far away.

2018-04-09 Thread Sameera Deshpande

Hi Richard,

I do not see the said patch applied in ToT yet. When do you expect it
to be available in ToT?

- Thanks and regards,
  Sameera D.

On 30 March 2018 at 17:01, Sameera Deshpande
 wrote:
> Hi Richard,
>
> The testcase is working with the patch you suggested, thanks for
> pointing that out.
>
> On 30 March 2018 at 16:54, Sameera Deshpande
>  wrote:
>> On 30 March 2018 at 16:39, Richard Sandiford
>>  wrote:
 Hi Sudakshina,

 Thanks for pointing that out. Updated the conditions for attribute
 length to take care of boundary conditions for offset range.

 Please find attached the updated patch.

 I have tested it for gcc testsuite and the failing testcase. Ok for trunk?

 On 22 March 2018 at 19:06, Sudakshina Das  wrote:
> Hi Sameera
>
> On 22/03/18 02:07, Sameera Deshpande wrote:
>>
>> Hi Sudakshina,
>>
>> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
>> far branch instruction offset is inclusive of both the offsets. Hence,
>> I am using <=||=> and not <||>= as it was in previous implementation.
>
>
> I have to admit earlier I was only looking at the patch mechanically and
> found a difference with the previous implementation in offset comparison.
> After you pointed out, I looked up the ARMv8 ARM and I have a couple of
> doubts:
>
> 1. My understanding is that any offset in [-1048576 ,1048572] both 
> inclusive
> qualifies as an 'in range' offset. However, the code for both attribute
> length and far_branch has been using [-1048576 ,1048572), that is, ( >= 
> && <
> ). If the far_branch was incorrectly calculated, then maybe the length
> calculations with similar magic numbers should also be corrected? Of 
> course,
> I am not an expert in this and maybe this was a conscience decision so I
> would ask Ramana to maybe clarify if he remembers.
>
> 2. Now to come back to your patch, if my understanding is correct, I 
> think a
> far_branch would be anything outside of this range, that is,
> (offset < -1048576 || offset > 1048572), anything that can not be
> represented in the 21-bit range.
>
> Thanks
> Sudi
>>>
>>> [...]
>>>
 @@ -466,14 +459,9 @@
[(set_attr "type" "branch")
 (set (attr "length")
   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int 
 -1048576))
 -(lt (minus (match_dup 2) (pc)) (const_int 
 1048572)))
 +(le (minus (match_dup 2) (pc)) (const_int 
 1048572)))
 (const_int 4)
 -   (const_int 8)))
>>>
>>> Sorry for not replying earlier, but I think the use of "lt" rather than
>>> "le" in the current length attribute is deliberate.  Distances measured
>>> from (pc) in "length" are a bit special in that backward distances are
>>> measured from the start of the instruction and forward distances are
>>> measured from the end of the instruction:
>>>
>>>   /* The address of the current insn.  We implement this actually as the
>>>  address of the current insn for backward branches, but the last
>>>  address of the next insn for forward branches, and both with
>>>  adjustments that account for the worst-case possible stretching of
>>>  intervening alignments between this insn and its destination.  */
>>>
>>> This avoids the chicken-and-egg situation of the length of the branch
>>> depending on the forward distance and the forward distance depending
>>> on the length of the branch.
>>>
>>> In contrast, this code:
>>>
 @@ -712,7 +695,11 @@
{
  if (get_attr_length (insn) == 8)
{
 - if (get_attr_far_branch (insn) == 1)
 + long long int offset;
 + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
 +   - INSN_ADDRESSES (INSN_UID (insn));
 +
 + if (offset < -1048576 || offset > 1048572)
 return aarch64_gen_far_branch (operands, 2, "Ltb",
"\\t%0, %1, ");
   else
>>>
>>> is reading the final computed addresses, so the code is right to use
>>> the real instruction range.  (FWIW I agree with Kyrill that using
>>> IN_RANGE with hex constants would be clearer.)
>>>
>>> That said... a possible problem comes from situations like:
>>>
>>>address length insn
>>>..c  8 A
>>>   ..align to 8 bytes...
>>>..8B
>>>..c  4 C
>>>   ..align to 16 bytes...
>>>..0D, branch to B
>>>
>>> when D is at the maximum extent of the branch range and when GCC's length
>>> for A is only a conservative estimate.  If the length of A turns out to
>>> be 4 rather than 8 at assembly time, the alignment to 8

Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround

2018-04-09 Thread Daniel Hellstrom


On 2018-04-07 22:08, Gerald Pfeifer wrote:

On Thu, 1 Feb 2018, Daniel Hellstrom wrote:

I would like to commit the following comment about LEON3-FT errata now
available in GCC-7.3.

Thanks, I have now committed it.

Apparently the ultimate URL is the one below (which I applied for
now)?

Gerald


I received feedback from the webmaster that the preferred link is 
"http://www.gaisler.com/notes; so I updated it just be pushing.


Regards,
Daniel



Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.101
diff -u -r1.101 changes.html
--- changes.html2 Apr 2018 15:22:38 -   1.101
+++ changes.html7 Apr 2018 20:05:52 -
@@ -1324,7 +1324,8 @@
  
  SPARC


-Work arounds for the four http://www.gaisler.com/notes;>
+Work arounds for the four https://www.gaisler.com/index.php/information/app-tech-notes;>
  LEON3FT errata GRLIB-TN-0010..0013 have been added. Relevant 
errata
  are activated by the target specific -mfix-ut699,
  -mfix-ut700 and -mfix-gr712rc switches.

Re: [PATCH/doc] - mention all optimization options that enable inlining options

2018-04-09 Thread Richard Biener

On Sun, 8 Apr 2018, Martin Sebor wrote:

> While updating the -Wrestrict option to mention that it works
> best not just with -O2 but also at higher optimization levels
> I looked around for other options that might benefit from
> a similar clarification.  I found a few inlining options that
> only mention -O2 but that (according to -Q --help=optimizers)
> are enabled at other levels as well.  The attached patch
> changes the manual to reflect that.

OK.

> Given the large number of options I didn't take the time to
> check all optimization options so there could others that could
> stand to be similarly improved if we want to be 100% accurate.
> If that is, in fact, what we want then we might want to script
> this.

Yeah, note we now have -Og and -Ofast as well...

Richard.

54 matches

Mail list logo