Re: [PATCH] Assorted -masm=intel fixes (PR target/85281)
>>> "H.J. Lu"04/09/18 9:09 PM >>> >On Mon, Apr 9, 2018 at 11:37 AM, Jakub Jelinek wrote: >> BTW, -masm=intel seems to be in quite bad shape even in the assembler, in >> various testcases I'm getting errors like on the following reduced one: >> int k1, xmm0; >> int foo (void) { return k1; } >> int bar (void) { return xmm0; } >> gcc -masm=intel -O2 >> /tmp/cch0mo1K.s: Assembler messages: >> /tmp/cch0mo1K.s:10: Error: invalid use of register >> /tmp/cch0mo1K.s:21: Error: invalid use of register >> As ICC generates the same assembly on the instructions: >> mov eax, DWORD PTR k1[rip] >> ... >> mov eax, DWORD PTR xmm0[rip] >> I think either the intel syntax spec is faulty, or gas is buggy and should >> figure out that after *WORD PTR and before [ there is symbol rather than >> register name. Some testcases e.g. have k1 as function name and that >> results in other asm errors (about .size directive). > >How does Intel syntax support symbols like eax, k1 and xmm0 with >".intel_syntax noprefix"? I've noticed this problem about two weeks ago as well, and have a patch mostly ready (but need to get around to both produce a proper testcase and regression test the whole thing); that won't be until in a couple of weeks time, though - if you think this is needed earlier, I can hand you the fragments I have. As an aside - you realize this isn't an Intel syntax only issue, as "noprefix" can as well be specified with .att_syntax. I should note though that the fix won't go as far a Jakub suggests: Context doesn't matter for recognizing whether a symbol is a register. For something like the above to compile, .arch would need to be used to disable the respective register groups (e.g. .arch .noavx512f to make k1 an ordinary symbol). Jan
Re: [PATCH/doc] - mention all optimization options that enable inlining options
On 04/09/2018 01:36 AM, Richard Biener wrote: On Sun, 8 Apr 2018, Martin Sebor wrote: While updating the -Wrestrict option to mention that it works best not just with -O2 but also at higher optimization levels I looked around for other options that might benefit from a similar clarification. I found a few inlining options that only mention -O2 but that (according to -Q --help=optimizers) are enabled at other levels as well. The attached patch changes the manual to reflect that. OK. I committed it earlier today in r259250. Given the large number of options I didn't take the time to check all optimization options so there could others that could stand to be similarly improved if we want to be 100% accurate. If that is, in fact, what we want then we might want to script this. Yeah, note we now have -Og and -Ofast as well... Right, those aren't mentioned. With so many flavors of -O does enumerating them all for each optimization option still make sense? I wonder if this could this be generated by some script instead (it would probably have to be presented in the form of a table but that might actually make things easier to find). Martin
[RFC] Improve tree DSE
I would like to queue this patch for stage1 review. In DSE, while in dse_classify_store, as soon as we see a PHI use statement that is part of the loop, we are immediately giving up. As far as I understand, this can be improved. Attached patch is trying to walk the uses of the PHI statement (by recursively calling dse_classify_store) and then making sure the obtained store is indeed redundant. This is partly as reported in one of the testcase from PR44612. But this PR is about other issues that is not handled in this patch. Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions. Is this OK for next stage1? Thanks, Kugan gcc/ChangeLog: 2018-04-10 Kugan Vivekanandarajah* tree-ssa-dse.c (dse_classify_store): Handle recursive PHI. (dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store. gcc/testsuite/ChangeLog: 2018-04-10 Kugan Vivekanandarajah * gcc.dg/tree-ssa/ssa-dse-31.c: New test. * gcc.dg/tree-ssa/ssa-dse-32.c: New test. From 5751eaff3d1c263e8631d5a07e43fecaaa0e9d26 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Tue, 10 Apr 2018 09:49:10 +1000 Subject: [PATCH] improve dse --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 ++ gcc/tree-ssa-dse.c | 51 -- 3 files changed, 81 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c new file mode 100644 index 000..e4d71b2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c @@ -0,0 +1,16 @@ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dse-details" } */ +#define SIZE 4 + +int main () +{ + static float a[SIZE]; + int i; + for (i = 0; i < SIZE; i++) + __builtin_memset ((void *) a, 0, sizeof(float)*3); + __builtin_memset ((void *) a, 0, sizeof(float)*SIZE); + return 0; +} + +/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c new file mode 100644 index 000..3d8fd5f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c @@ -0,0 +1,23 @@ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dse-details" } */ +#define SIZE 4 + +void s4 (float *restrict a) +{ + (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE); +} + + +int main () +{ + int i; + float a[10]; + printf("Start\n"); + for (i = 0; i < SIZE; i++) +s4 (a); + printf("Done\n"); + return 0; +} + +/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 9220fea..3513fda 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -521,11 +521,11 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live) Return TRUE if the above conditions are met, otherwise FALSE. */ static dse_store_status -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, - bool byte_tracking_enabled, sbitmap live_bytes) +dse_classify_store (ao_ref *ref, gimple *stmt_outer, gimple *stmt, + gimple **use_stmt, bool byte_tracking_enabled, + sbitmap live_bytes, unsigned cnt = 0) { gimple *temp; - unsigned cnt = 0; *use_stmt = NULL; @@ -556,9 +556,11 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, { cnt++; + if (use_stmt == stmt_outer) + continue; /* If we ever reach our DSE candidate stmt again fail. We cannot handle dead stores in loops. */ - if (use_stmt == stmt) + else if (use_stmt == stmt) { fail = true; BREAK_FROM_IMM_USE_STMT (ui); @@ -572,8 +574,6 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, if (temp /* Make sure we are not in a loop latch block. */ || gimple_bb (stmt) == gimple_bb (use_stmt) - || dominated_by_p (CDI_DOMINATORS, - gimple_bb (stmt), gimple_bb (use_stmt)) /* We can look through PHIs to regions post-dominating the DSE candidate stmt. */ || !dominated_by_p (CDI_POST_DOMINATORS, @@ -582,8 +582,41 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, fail = true; BREAK_FROM_IMM_USE_STMT (ui); } + else if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (stmt), gimple_bb (use_stmt))) + { + gphi *phi = as_a (use_stmt); + gimple *def_stmt = SSA_NAME_DEF_STMT (PHI_RESULT (phi)); + enum dse_store_status status = DSE_STORE_LIVE; + ao_ref use_ref; + gimple *inner_use_stmt; + + /* If stmt dominates PHI stmt, follow the PHI stmt. */ + if (!temp) + status = dse_classify_store (ref, stmt, def_stmt, _use_stmt, +
Re: [doc patch] remove hyphen from side-effect
On 04/09/2018 05:58 PM, Martin Sebor wrote: Ping: I will go ahead and commit this patch sometime this week unless I hear objections: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00036.html This patch is mostly OK, except for these hunks: Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 259017) +++ gcc/doc/invoke.texi (working copy) @@ -8710,7 +8710,7 @@ default at @option{-O} and higher. @item -ftree-builtin-call-dce @opindex ftree-builtin-call-dce Perform conditional dead code elimination (DCE) for calls to built-in functions -that may set @code{errno} but are otherwise side-effect free. This flag is +that may set @code{errno} but are otherwise side effect free. This flag is enabled by default at @option{-O2} and higher if @option{-Os} is not also specified. s/side-effect free/free of side effects/ @@ -14229,7 +14229,7 @@ not overridden} will do. This option is implicitly passed to the compiler for the second compilation requested by @option{-fcompare-debug}, along with options to silence warnings, and omitting other options that would cause -side-effect compiler outputs to files or to the standard output. Dump +side effect compiler outputs to files or to the standard output. Dump files and preserved temporary files are renamed so as to contain the @code{.gk} additional extension during the second compilation, to avoid overwriting those generated by the first. This is impossible to parse either with or without the hyphen. How about rewriting the end of that sentence as ...that would cause the compiler to produce output to files or to standard output as a side effect. ?? @@ -3496,7 +3496,7 @@ instructions. @cindex RTL predecrement @cindex RTL postdecrement -Six special side-effect expression codes appear as memory addresses. +Six special side effect expression codes appear as memory addresses. @table @code @findex pre_dec This should either remain hyphenated (adjective phrase immediately before the noun), or rewritten as something like Six special expression codes represent memory addresses with side effects. Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 259017) +++ gcc/doc/tm.texi.in (working copy) @@ -3967,7 +3967,7 @@ pre-decrement, post-increment, or post-decrement a @defmac HAVE_PRE_MODIFY_DISP @defmacx HAVE_POST_MODIFY_DISP A C expression that is nonzero if the machine supports pre- or -post-address side-effect generation involving constants other than +post-address side effect generation involving constants other than the size of the memory operand. @end defmac @@ -3974,7 +3974,7 @@ the size of the memory operand. @defmac HAVE_PRE_MODIFY_REG @defmacx HAVE_POST_MODIFY_REG A C expression that is nonzero if the machine supports pre- or -post-address side-effect generation involving a register displacement. +post-address side effect generation involving a register displacement. @end defmac @defmac CONSTANT_ADDRESS_P (@var{x}) I think both of these should also remain hyphenated (adjective phrase immediately before the modified noun). -Sandra
Re: [doc patch] remove hyphen from side-effect
Ping: I will go ahead and commit this patch sometime this week unless I hear objections: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00036.html Martin On 04/02/2018 03:55 PM, Martin Sebor wrote: Attached is a patch to remove the hyphen from uses of side-effect in the manual as per the comment in: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00021.html Although both forms are common, "side effect" (without the hyphen) is more prevalent in current English and I think it's better to be consistent than to have to rehash this question each time someone happens to copy the less common form from existing text. https://english.stackexchange.com/questions/889/when-should-compound-words-be-written-as-one-word-with-hyphens-or-with-spaces Martin
[PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure
Define a new exception type derived from std::ios::failure[abi:cxx11] which also aggregates an object of the gcc4-compatible ios::failure type. Make __throw_ios_failure throw this new type for iostream errors that raise exceptions. Provide custom type info for the new type so that it can be caught by handlers for the gcc4-compatible ios::failure type as well as handlers for ios::failure[abi:cxx11] and its bases. PR libstdc++/85222 * src/c++11/Makefile.am [ENABLE_DUAL_ABI]: Add special rules for cxx11-ios_failure.cc to rewrite type info for __ios_failure. * src/c++11/Makefile.in: Regenerate. * src/c++11/cxx11-ios_failure.cc (__ios_failure, __iosfail_type_info): New types. [_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here. * src/c++11/ios.cc (__throw_ios_failure): Remove definition. * src/c++98/ios_failure.cc (__construct_ios_failure) (__destroy_ios_failure, is_ios_failure_handler): New functions. [!_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here. * testsuite/27_io/ios_base/failure/dual_abi.cc: New. * testsuite/27_io/basic_ios/copyfmt/char/1.cc: Revert changes to handler types, to always catch std::ios_base::failure. * testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/char/ exceptions_failbit.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/ exceptions_failbit.cc: Likewise. * testsuite/27_io/basic_istream/extractors_other/char/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_istream/extractors_other/wchar_t/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise. * testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise. * testsuite/27_io/basic_ostream/inserters_other/char/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_ostream/inserters_other/wchar_t/ exceptions_null.cc: Likewise. * testsuite/27_io/ios_base/storage/2.cc: Likewise. Tested x86_64-linux and powerpc64-linux, with the default config, and --disable-libstdcxx-dual-abi, and --with-default-libstdcxx-abi=gcc4-compatible. I intend to commit this to trunk and gcc-7-branch soon. commit bdcc071eaa58aa07824b01a6da64662787abece7 Author: Jonathan WakelyDate: Tue Apr 10 00:19:25 2018 +0100 PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure Define a new exception type derived from std::ios::failure[abi:cxx11] which also aggregates an object of the gcc4-compatible ios::failure type. Make __throw_ios_failure throw this new type for iostream errors that raise exceptions. Provide custom type info for the new type so that it can be caught by handlers for the gcc4-compatible ios::failure type as well as handlers for ios::failure[abi:cxx11] and its bases. PR libstdc++/85222 * src/c++11/Makefile.am [ENABLE_DUAL_ABI]: Add special rules for cxx11-ios_failure.cc to rewrite type info for __ios_failure. * src/c++11/Makefile.in: Regenerate. * src/c++11/cxx11-ios_failure.cc (__ios_failure, __iosfail_type_info): New types. [_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here. * src/c++11/ios.cc (__throw_ios_failure): Remove definition. * src/c++98/ios_failure.cc (__construct_ios_failure) (__destroy_ios_failure, is_ios_failure_handler): New functions. [!_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here. * testsuite/27_io/ios_base/failure/dual_abi.cc: New. * testsuite/27_io/basic_ios/copyfmt/char/1.cc: Revert changes to handler types, to always catch std::ios_base::failure. * testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/char/ exceptions_failbit.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/ exceptions_failbit.cc: Likewise. * testsuite/27_io/basic_istream/extractors_other/char/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_istream/extractors_other/wchar_t/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise. * testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise. * testsuite/27_io/basic_ostream/inserters_other/char/ exceptions_null.cc: Likewise. * testsuite/27_io/basic_ostream/inserters_other/wchar_t/ exceptions_null.cc: Likewise. * testsuite/27_io/ios_base/storage/2.cc: Likewise. diff --git a/libstdc++-v3/src/c++11/Makefile.am
Re: C++ PATCH for c++/85264, ICE with extra template parameter list
Hi, On 09/04/2018 21:39, Jason Merrill wrote: cp_parser_check_template_parameters is supposed to catch when we have the wrong number of template parameter lists, but it wasn't diagnosing this case. Fixed by checking whether the thing we're declaring used a template-id; the case of declaring a primary template, when we allow one more template parameter list, uses a plain identifier. Nice, it looks like this also fixes the prehistoric c++/24314, which I still had assigned. When I worked a bit on it, relatively recently, in 2012, I failed to properly follow-up to your feedback on the mailing list, but I don't think we really nailed the problem, did we? https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00504.html Or maybe we just tried to do too much, like doing away completely with cp_parser_check_template_parameters in favor of a bit of additional checking in maybe_process_partial_specialization. Paolo.
Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]
On 04/04/2018 06:43 AM, Peter Bergner wrote: > On 4/4/18 4:06 AM, Tamar Christina wrote: >> Now that I know how the loads are done, I have a patch should be both >> correct and generate better code in most cases. >> It just calculates bitsize inside the loop and does the copying in the >> largest mode possible that's equal or less than the bits >> That are to be copied. This avoids the issue with reading too much, honors >> padding and alignment and still generates better code >> In most cases. >> >> I'm running the regression tests and should have a final version soon. > > If you give me your patch when it's ready, I'll do bootstrap and regression > testing on powerpc*-linux and verify it fixes the issues we hit. Similarly, I've got a jenkins instance here were I can get a bootstrap and regression test on the usual targets like aarch64, armv7, i686, powerpc (32, 64, 64le), s390x, sparc64, x86_64. But more interestingly it'll also do a bootstrap test on alpha, hppa, m68k, sh4 and other oddballs like aarch64-be. A patch for the tip of the trunk is all I need. It doesn't run the testsuite, but the ability to bootstrap on the lesser used targets gives a level of code generator validation that is helpful. Takes about 24hrs to cycle through everything... jeff
Re: [PATCH][RFC] Fix PR85180, find_base_term is exponential
On 04/05/2018 06:32 AM, Richard Biener wrote: > > This fixes exponential time spent in find_base_term by extending the > existing fix of NULL-ifying VALUE locations during their processing > to keep them NULL-ified during the whole recursion. > > As Jakub figured this has the chance of returning NULL prematurely > for the case of the plus/minus case rejecting a found base on the > ground of > > if (base != NULL_RTX > && ((REG_P (tmp1) && REG_POINTER (tmp1)) > || known_base_value_p (base))) > return base; > > so any VALUE visited during a such rejected base will not be > visited again (but we'll get a NULL result). > > Quoting him from IRC: still no differences, out of 850mil calls > > for an instrumented version. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > Alternative (safer in the above regard) variants using a hash-map > to cache the base for a VALUE are quite a bit slower if not implemented > in ways that also look dangerous at this point. See the PR for > some of those. > > Any comments? Otherwise we decided to go ahead with this tomorrow. > > Thanks, > Richard. > > 2018-04-05 Richard Biener> > PR middle-end/85180 > * alias.c (find_base_term): New wrapper around find_base_term > unwinding CSELIB_VAL_PTR changes. > (find_base_term): Do not restore CSELIB_VAL_PTR during the > recursion. > > * gcc.dg/pr85180.c: New testcase. Seems reasonable. I do wonder how much all the complexity in alias.c buys us these days and whether or not we should look to do some dramatic simplifications. The core parts of this code date back to the late 90s when RTL memory disambiguation was much more important than it is today. jeff
Re: [PATCH][explow] PR target/85173: validize memory before passing it on to target probe_stack
On 04/05/2018 08:20 AM, Kyrill Tkachov wrote: > Hi all, > > In this PR the expansion code emits an invalid memory address for the > stack probe, which the backend fails to recognise. > The address is created explicitly in > anti_adjust_stack_and_probe_stack_clash in explow.c and passed down to > gen_probe_stack > without any validation in emit_stack_probe. > > This patch fixes the ICE by calling validize_mem on the memory location > before passing it down to the target. > Jakub pointed out that we also want to create valid addresses for the > probe_stack_address case, so this patch > creates an expand operand and legitimizes it before passing it down to > the probe_stack_address expander. > > This patch passes bootstrap and testing on arm-none-linux-gnueabihf and > aarch64-none-linux-gnu > and ppc64le-redhat-linux on gcc112 in the compile farm. > > Is this ok for trunk? > > Thanks, > Kyrill > > P.S. Uros, the alpha probe_stack expander in alpha.md seems incompatible > with the way the probe_stack name is > used in the midend. It accepts a const_int operand that is used as an > offset from the stack pointer, rather than accepting > a full memory operand like other targets. Do you think it's better to > rename the probe_stack pattern there to something > that doesn't conflict with the name the midend assumes? > > 2018-04-05 Kyrylo Tkachov> > PR target/85173 > * explow.c (emit_stack_probe): Call validize_mem on memory location > before passing it to gen_probe_stack. Create address operand and > legitimize it for the probe_stack_address case. > > 2018-04-05 Kyrylo Tkachov > > PR target/85173 > * gcc.target/arm/pr85173.c: New test. Alpha should be fixed -- the docs clearly state that the operand is "the memory reference in the stack that needs to be probed". Just passing in the offset seems wrong. Note that -fstack-clash-protection on ARM (32bit) relies on the old -fstack-check code which has a variety of issues. It's better than nothing, but the real way to go here is to fix prologue generation as well as alloca/vla handling to have stack clash specific sequences. OK for the trunk. jeff
[PR 84149] From 3b967c133f7f62e3c2951a971d8690d242364f5d Mon Sep 17 00:00:00 2001
Hi, changes in predictors caused that LTO IPA-CP no longer clones the qsort implementation for both of the comparators used in 505.mcf_r from Spec 2017 to be quite and this resulted in quite some slowdown of the benchmark. I have tried various way of teaching IPA-CP that the remaining contexts all have a single constant, but all of them required that (I either reorder stuff on a higher level or) I special case self-recursive edges and so I decided to do it consistently even in the analysis phase in the patch below. Consequently, using default parameters, IPA-CP now knows that it is beneficial to clone for both comparators again. However, I have verified that when I tweak --param ipa-cp-eval-threshold so that only one clone is created, the one created "for all remaining contexts" knows that they only contain one comparator. Because IPA-CP now can count self-recursive edges among those bringing a value when cloning (as opposed next time around when we come to the node in the SCC) and we need to make sure the original node keeps its self-recursive edge pointing to itself, create_specialized_node must make sure that such edges are handled properly and redirect only clones of these edges, rather than relying on cgraphclones.c machinery. The patch passes bootstrap, LTO-bootstrap and testing on x86_64-linux, I have also verified the 505.mcf_r regression is gone and that it can compile all of SPEC 2017 and 2006 with LTO (well, the config I used failed for three benchmarks, but so it did on trunk). I have also LTO built Firefox with it and it browsed a few pages fine too. OK for trunk? Thanks, Martin 2018-04-08 Martin JamborPR ipa/84149 * ipa-cp.c (propagate_vals_across_pass_through): Expand comment. (cgraph_edge_brings_value_p): New parameter dest_val, check if it is not the same as the source val. (cgraph_edge_brings_value_p): New parameter. (gather_edges_for_value): Pass destination value to cgraph_edge_brings_value_p. (perhaps_add_new_callers): Likewise. (get_info_about_necessary_edges): Likewise and exclude values brought only by self-recursive edges. (create_specialized_node): Redirect only clones of self-calling edges. (+self_recursive_pass_through_p): New function. (find_more_scalar_values_for_callers_subset): Use it. (find_aggregate_values_for_callers_subset): Likewise. (known_aggs_to_agg_replacement_list): Removed. (decide_whether_version_node): Re-calculate known constants for all remaining context clones. --- gcc/ipa-cp.c | 121 ++- 1 file changed, 78 insertions(+), 43 deletions(-) diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index ee41a8d55b7..6a9a695fc2c 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -1601,7 +1601,9 @@ propagate_vals_across_pass_through (cgraph_edge *cs, ipa_jump_func *jfunc, /* Do not create new values when propagating within an SCC because if there are arithmetic functions with circular dependencies, there is infinite - number of them and we would just make lattices bottom. */ + number of them and we would just make lattices bottom. If this condition + is ever relaxed we have to detect self-feeding recursive calls in + cgraph_edge_brings_value_p in a smarter way. */ if ((ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR) && ipa_edge_within_scc (cs)) ret = dest_lat->set_contains_variable (); @@ -3470,12 +3472,12 @@ same_node_or_its_all_contexts_clone_p (cgraph_node *node, cgraph_node *dest) return info->is_all_contexts_clone && info->ipcp_orig_node == dest; } -/* Return true if edge CS does bring about the value described by SRC to node - DEST or its clone for all contexts. */ +/* Return true if edge CS does bring about the value described by SRC to + DEST_VAL of node DEST or its clone for all contexts. */ static bool cgraph_edge_brings_value_p (cgraph_edge *cs, ipcp_value_source *src, - cgraph_node *dest) + cgraph_node *dest, ipcp_value *dest_val) { struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller); enum availability availability; @@ -3485,7 +3487,9 @@ cgraph_edge_brings_value_p (cgraph_edge *cs, ipcp_value_source *src, || availability <= AVAIL_INTERPOSABLE || caller_info->node_dead) return false; - if (!src->val) + /* At the moment we do not propagate over arithmetic jump functions in SCCs, + so it is safe to detect self-feeding recursive calls in this way. */ + if (!src->val || src->val == dest_val) return true; if (caller_info->ipcp_orig_node) @@ -3521,13 +3525,14 @@ cgraph_edge_brings_value_p (cgraph_edge *cs, ipcp_value_source *src, } } -/* Return true if edge CS does bring about the value described by SRC to node - DEST or its clone for all contexts. */ +/* Return true
Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT
On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote: > > On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > >> > >> the attached patch removes the parallel annotation from DO CONCURRENT. > >> As discussed in the PR, the autoparallellizer currently generates > >> wrong code. The only feasible way is to disable the annotation for > >> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > >> regression. > >> > >> Regression-tested. OK for trunk? > >> > > > > Yes. > > Thanks for the review! > > However, just as I was about to commit, I thought of something else > and, hopefully, better. > > The underlying problem is that the annotation does not go together > with -ftree-parallelize-loops - so let's simply not set it if that > flag is set. > > This way, we can avoid speed regressions for people who do not use > -ftree-parallelize-loops, and it will be possible to downgrade the > PR to a missed-optimization bug; it is also not necessary to xfail > vect-do-concurrent-1.f90 > > Regression-tested. OK for trunk, for this version? In reviewing the PR audit trailing, I thought Richard indicated the problem is in the autopar stage in the middle. Removing the annotation simply prevents gfortran's DO CURRENT from a middle in bug. I'm fine with the new patch. I'll leave it up to you to decide which is preferred. -- Steve
C++ PATCH for c++/85279, dump_expr vs. decltype
For some reason, cp_parser_name_lookup_error uses %E for printing the parsing scope, so dump_expr needs to know how to handle some _TYPEs. DECLTYPE_TYPE was missing. Tested x86_64-pc-linux-gnu, applying to trunk. commit 4d380464839058b3d70e1262339ed973642d6741 Author: Jason MerrillDate: Mon Apr 9 16:51:01 2018 -0400 PR c++/85279 - dump_expr doesn't understand decltype. * error.c (dump_expr): Handle DECLTYPE_TYPE. diff --git a/gcc/cp/error.c b/gcc/cp/error.c index 75e853a1428..f27b700a434 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -2714,6 +2714,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags) case INTEGER_TYPE: case COMPLEX_TYPE: case VECTOR_TYPE: +case DECLTYPE_TYPE: pp_type_specifier_seq (pp, t); break; diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype67.C b/gcc/testsuite/g++.dg/cpp0x/decltype67.C new file mode 100644 index 000..e8042ac59e7 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/decltype67.C @@ -0,0 +1,7 @@ +// PR c++/85279 +// { dg-do compile { target c++11 } } + +template struct A +{ + void foo(decltype(T())::Y); // { dg-error {decltype\(T\(\)\)::Y} } +};
Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT
Hi Steve, On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: the attached patch removes the parallel annotation from DO CONCURRENT. As discussed in the PR, the autoparallellizer currently generates wrong code. The only feasible way is to disable the annotation for gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 regression. Regression-tested. OK for trunk? Yes. Thanks for the review! However, just as I was about to commit, I thought of something else and, hopefully, better. The underlying problem is that the annotation does not go together with -ftree-parallelize-loops - so let's simply not set it if that flag is set. This way, we can avoid speed regressions for people who do not use -ftree-parallelize-loops, and it will be possible to downgrade the PR to a missed-optimization bug; it is also not necessary to xfail vect-do-concurrent-1.f90 Regression-tested. OK for trunk, for this version? 2018-04-09 Thomas KoenigPR fortran/83064 * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for parallell processing of DO CONCURRENT -ftree-parallelize-loops is set. 2018-04-09 Thomas Koenig PR fortran/83064 * gfortran.dg/do_concurrent_5.f90: New test. * gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus message. Index: fortran/trans-stmt.c === --- fortran/trans-stmt.c (Revision 259222) +++ fortran/trans-stmt.c (Arbeitskopie) @@ -3642,7 +3642,10 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr /* The exit condition. */ cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - if (forall_tmp->do_concurrent) + + /* PR 83064 means that we cannot use the annotation if the + autoparallelizer is active. */ + if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, annot_expr_parallel_kind), Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 === --- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Revision 259222) +++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Arbeitskopie) @@ -12,4 +12,3 @@ subroutine test(n, a, b, c) end subroutine test ! { dg-message "loop vectorized" "" { target *-*-* } 0 } -! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 } ! { dg-do run } ! PR 83064 - this used to give wrong results. ! { dg-options "-O3 -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main use, intrinsic :: iso_fortran_env implicit none integer, parameter :: nsplit = 4 integer(int64), parameter :: ne = 2000 integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i real(real64), dimension(nsplit) :: pi edof(1::4) = 1 edof(2::4) = 2 edof(3::4) = 3 edof(4::4) = 4 stride = ceiling(real(ne)/nsplit) do i = 1, nsplit high(i) = stride*i end do do i = 2, nsplit low(i) = high(i-1) + 1 end do low(1) = 1 high(nsplit) = ne pi = 0 do concurrent (i = 1:nsplit) pi(i) = sum(compute( low(i), high(i) )) end do if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort contains pure function compute( low, high ) result( ttt ) integer(int64), intent(in) :: low, high real(real64), dimension(nsplit) :: ttt integer(int64) :: j, k ttt = 0 ! Unrolled loop ! do j = low, high, 4 ! k = 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! k = 2 ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) ! k = 3 ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) ! k = 4 ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) ! end do ! Loop with modulo operation ! do j = low, high ! k = mod( j, nsplit ) + 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! end do ! Loop with subscripting via host association do j = low, high k = edof(j) ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) end do end function end program main
C++ PATCH for c++/85262, ICE with redundant qualification on constructor
A::A names the constructor, but you can't call the constructor directly, you should just write A(args). Our error recovery was failing in a template because we'd wrapped the arguments with NON_DEPENDENT_EXPR, and then kept the result as part of the template tree, whereas we're supposed to throw away anything containing NON_DEPENDENT_EXPR once we've used it to determine types. Tested x86_64-pc-linux-gnu, applying to trunk. commit 4fde624556763974da407ffcf9858d5126730646 Author: Jason MerrillDate: Mon Apr 9 15:16:32 2018 -0400 PR c++/85262 - ICE with redundant qualification on constructor. * call.c (build_new_method_call_1): Move make_args_non_dependent after A::A() handling. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index b22a3cc132e..f978ea73f3d 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -9104,14 +9104,6 @@ build_new_method_call_1 (tree instance, tree fns, vec **args, basetype = TYPE_MAIN_VARIANT (TREE_TYPE (instance)); gcc_assert (CLASS_TYPE_P (basetype)); - if (processing_template_decl) -{ - orig_args = args == NULL ? NULL : make_tree_vector_copy (*args); - instance = build_non_dependent_expr (instance); - if (args != NULL) - make_args_non_dependent (*args); -} - user_args = args == NULL ? NULL : *args; /* Under DR 147 A::A() is an invalid constructor call, not a functional cast. */ @@ -9132,12 +9124,21 @@ build_new_method_call_1 (tree instance, tree fns, vec **args, return call; } + if (processing_template_decl) +{ + orig_args = args == NULL ? NULL : make_tree_vector_copy (*args); + instance = build_non_dependent_expr (instance); + if (args != NULL) + make_args_non_dependent (*args); +} + /* Process the argument list. */ if (args != NULL && *args != NULL) { *args = resolve_args (*args, complain); if (*args == NULL) return error_mark_node; + user_args = *args; } /* Consider the object argument to be used even if we end up selecting a diff --git a/gcc/testsuite/g++.dg/parse/ctor10.C b/gcc/testsuite/g++.dg/parse/ctor10.C new file mode 100644 index 000..99d3ca8c18d --- /dev/null +++ b/gcc/testsuite/g++.dg/parse/ctor10.C @@ -0,0 +1,14 @@ +// PR c++/85262 +// { dg-options -fpermissive } + +struct A {}; + +template struct B : A +{ + B() + { +A::A(A()); // { dg-warning "constructor" } + } +}; + +B<0> b;
[PATCH] Assorted -masm=intel fixes (PR target/85281)
Hi! I've tested make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} i386.exp vect.exp' testing and looked solely at assembly Error: (there are many scan-assembler* directives that just fail, and some tests use e.g. only att inline asm etc.). The following patch fixes what I found that way. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? The patch fixes: -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors) -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors) -FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors) BTW, -masm=intel seems to be in quite bad shape even in the assembler, in various testcases I'm getting errors like on the following reduced one: int k1, xmm0; int foo (void) { return k1; } int bar (void) { return xmm0; } gcc -masm=intel -O2 /tmp/cch0mo1K.s: Assembler messages: /tmp/cch0mo1K.s:10: Error: invalid use of register /tmp/cch0mo1K.s:21: Error: invalid use of register As ICC generates the same assembly on the instructions: mov eax, DWORD PTR k1[rip] ... mov eax, DWORD PTR xmm0[rip] I think either the intel syntax spec is faulty, or gas is buggy and should figure out that after *WORD PTR and before [ there is symbol rather than register name. Some testcases e.g. have k1 as function name and that results in other asm errors (about .size directive). 2018-04-09 Jakub JelinekPR target/85281 * config/i386/sse.md (reduces, avx512f_vmcmp3, avx512f_vmcmp3_mask, avx512f_sgetexp, avx512f_rndscale, avx512dq_ranges, avx512f_vgetmant): Use %2 instead of %2 for -masm=intel. (avx512f_vcvtss2usi, avx512f_vcvtss2usiq, avx512f_vcvttss2usi, avx512f_vcvttss2usiq): Use %k1 instead of %1 for -masm=intel. (avx512f_vcvtsd2usi, avx512f_vcvtsd2usiq, avx512f_vcvttsd2usi, avx512f_vcvttsd2usiq, ufloatv2siv2df2): Use %q1 instead of %1 for -masm=intel. (avx512f_sfixupimm, avx512f_sfixupimm_mask): Use %3 instead of %3 for -masm=intel. (sse2_shufpd_v2df_mask): Fix a typo, change %{6%} to %{%6%} for -masm=intel. (*avx512vl_v2div2qi2_store): Use %w0 instead of %0 for -masm=intel. (*avx512vl_v4qi2_store): Use %k0 instead of %0 for -masm=intel. (avx512vl_v4qi2_mask_store): Use a single pattern with %k0 and %1 for -masm=intel rather than two patterns, one with %0 and %g1. (*avx512vl_v8qi2_store): Use %q0 instead of %0 for -masm=intel. (avx512vl_v8qi2_mask_store): Use a single pattern with %q0 and %1 for -masm=intel rather than two patterns, one with %0 and %g1 and one with %0 and %1. (avx512er_vmrcp28, avx512er_vmrsqrt28): Use %1 instead of %1 for -masm=intel. (avx5124fmaddps_4fmaddps_mask, avx5124fmaddps_4fmaddss_mask, avx5124fmaddps_4fnmaddps_mask, avx5124fmaddps_4fnmaddss_mask, avx5124vnniw_vp4dpwssd_mask, avx5124vnniw_vp4dpwssds_mask): Swap order of %0 and %{%4%} for -masm=intel. (avx5124fmaddps_4fmaddps_maskz, avx5124fmaddps_4fmaddss_maskz, avx5124fmaddps_4fnmaddps_maskz, avx5124fmaddps_4fnmaddss_maskz, avx5124vnniw_vp4dpwssd_maskz, avx5124vnniw_vp4dpwssds_maskz): Swap order of %0 and %{%5%}%{z%} for -masm=intel. --- gcc/config/i386/sse.md.jj 2018-04-09 12:05:38.044703296 +0200 +++ gcc/config/i386/sse.md 2018-04-09 15:15:50.033414875 +0200 @@ -2628,7 +2628,7 @@ (define_insn "reduces\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vreduce\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -2796,7 +2796,7 @@ (define_insn "avx512f_vmcmp3\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssecmp") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") @@ -2814,7
Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT
On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > > the attached patch removes the parallel annotation from DO CONCURRENT. > As discussed in the PR, the autoparallellizer currently generates > wrong code. The only feasible way is to disable the annotation for > gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > regression. > > Regression-tested. OK for trunk? > Yes. -- Steve
C++ PATCH for c++/85277, ICE with invalid offsetof
Here, the code was improperly assuming that the member argument must be a decl at the point of the error, but since we stopped folding foo[0] to foo, in this case it's an INDIRECT_REF. And the existing error for INDIRECT_REF is inaccurate for this case, so let's stick to complaining about asking for offsetof a function. While I was here, I updated the non-standard-layout diagnostic to say "conditionally-supported" rather than undefined. Tested x86_64-cp-linux-gnu, applying to trunk. commit cb3221a892dc8ea6c06d148b3756d82067644869 Author: Jason MerrillDate: Mon Apr 9 12:51:38 2018 -0400 PR c++/85277 - ICE with invalid offsetof. * semantics.c (finish_offsetof): Avoid passing non-DECL to %qD. Adjust -Winvalid-offsetof diagnostic to say conditionally supported. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 59cac77f6b7..8c893ed64b0 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -4043,17 +4043,17 @@ finish_offsetof (tree object_ptr, tree expr, location_t loc) || TREE_CODE (TREE_TYPE (expr)) == METHOD_TYPE || TREE_TYPE (expr) == unknown_type_node) { - if (INDIRECT_REF_P (expr)) - error ("second operand of % is neither a single " - "identifier nor a sequence of member accesses and " - "array references"); - else + while (TREE_CODE (expr) == COMPONENT_REF + || TREE_CODE (expr) == COMPOUND_EXPR) + expr = TREE_OPERAND (expr, 1); + + if (DECL_P (expr)) { - if (TREE_CODE (expr) == COMPONENT_REF - || TREE_CODE (expr) == COMPOUND_EXPR) - expr = TREE_OPERAND (expr, 1); error ("cannot apply % to member function %qD", expr); + inform (DECL_SOURCE_LOCATION (expr), "declared here"); } + else + error ("cannot apply % to member function"); return error_mark_node; } if (TREE_CODE (expr) == CONST_DECL) @@ -4069,9 +4069,9 @@ finish_offsetof (tree object_ptr, tree expr, location_t loc) && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr))) && CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr))) && cp_unevaluated_operand == 0) -pedwarn (loc, OPT_Winvalid_offsetof, - "offsetof within non-standard-layout type %qT is undefined", - TREE_TYPE (TREE_TYPE (object_ptr))); +warning_at (loc, OPT_Winvalid_offsetof, "offsetof within " + "non-standard-layout type %qT is conditionally-supported", + TREE_TYPE (TREE_TYPE (object_ptr))); return fold_offsetof (expr); } diff --git a/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C b/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C index 5c5e9cf246b..cbc2daafbdd 100644 --- a/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C +++ b/gcc/testsuite/g++.dg/ext/builtin-offsetof1.C @@ -1,9 +1,11 @@ // PR c++/51413 -// { dg-options "-w" } +// PR c++/85277 +// { dg-options "-Wno-pointer-arith" } struct A { static void foo(); }; -int i = __builtin_offsetof(A, foo[1]); // { dg-error "neither a single identifier nor a sequence of member accesses and array references" } +int i = __builtin_offsetof(A, foo[1]); // { dg-error "offsetof" } +int j = __builtin_offsetof(volatile A, foo[0]); // { dg-error "offsetof" } diff --git a/gcc/testsuite/g++.dg/other/offsetof3.C b/gcc/testsuite/g++.dg/other/offsetof3.C index 8d982426560..779fc72670a 100644 --- a/gcc/testsuite/g++.dg/other/offsetof3.C +++ b/gcc/testsuite/g++.dg/other/offsetof3.C @@ -12,4 +12,4 @@ protected: typedef X* pX; typedef __SIZE_TYPE__ size_t; -size_t yoff = __builtin_offsetof (X, y); /* { dg-error "35:non-standard-layout" } */ +size_t yoff = __builtin_offsetof (X, y); /* { dg-message "35:non-standard-layout" } */ diff --git a/gcc/testsuite/g++.dg/other/offsetof5.C b/gcc/testsuite/g++.dg/other/offsetof5.C index 86b14488246..8514af087ad 100644 --- a/gcc/testsuite/g++.dg/other/offsetof5.C +++ b/gcc/testsuite/g++.dg/other/offsetof5.C @@ -9,14 +9,14 @@ struct A int }; -int j = offsetof (A, i); // { dg-error "offsetof" } +int j = offsetof (A, i); // { dg-message "offsetof" } template struct S { T h; T - static const int j = offsetof (S, i); // { dg-error "offsetof" } + static const int j = offsetof (S, i); // { dg-message "offsetof" } }; int k = S::j; // { dg-message "required from here" } diff --git a/gcc/testsuite/g++.dg/other/offsetof8.C b/gcc/testsuite/g++.dg/other/offsetof8.C index 0668199b366..211c5127026 100644 --- a/gcc/testsuite/g++.dg/other/offsetof8.C +++ b/gcc/testsuite/g++.dg/other/offsetof8.C @@ -9,4 +9,4 @@ struct B: virtual A { }; int a[] = { !&((B*)0)->i,// { dg-error "invalid access to non-static data member" } __builtin_offsetof (B, i) // { dg-error "invalid access to non-static" } -}; // { dg-error "offsetof within non-standard-layout type" "" { target *-*-* } .-1 } +}; // { dg-message "offsetof within non-standard-layout type" "" { target *-*-* } .-1 }
Re: [patch, fortran] Remove parallell annotation from DO CONCURRENT
On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > Hello world, > > the attached patch removes the parallel annotation from DO CONCURRENT. > As discussed in the PR, the autoparallellizer currently generates > wrong code. The only feasible way is to disable the annotation for > gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > regression. Can't it be turned into annot_expr_ivdep_kind instead of annot_expr_parallel_kind for GCC8? I mean, the potential local addressable variables shouldn't be a problem for vectorization, just autopar, right? Jakub
[patch, fortran] Remove parallell annotation from DO CONCURRENT
Hello world, the attached patch removes the parallel annotation from DO CONCURRENT. As discussed in the PR, the autoparallellizer currently generates wrong code. The only feasible way is to disable the annotation for gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 regression. Regression-tested. OK for trunk? Regards Thomas 2018-04-09 Thomas KoenigPR fortran/83064 * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for parallell processing of DO CONCURRENT. 2018-04-09 Thomas Koenig PR fortran/83064 * gfortran.dg/do_concurrent_5.f90: New test. * gfortran.dg/vect/vect-do-concurrent-1.f90: Xfail. Adjust dg-bogus message. Index: fortran/trans-stmt.c === --- fortran/trans-stmt.c (Revision 259222) +++ fortran/trans-stmt.c (Arbeitskopie) @@ -3642,12 +3642,6 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr /* The exit condition. */ cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - if (forall_tmp->do_concurrent) - cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, - build_int_cst (integer_type_node, - annot_expr_parallel_kind), - integer_zero_node); - tmp = build1_v (GOTO_EXPR, exit_label); tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, tmp, build_empty_stmt (input_location)); Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 === --- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Revision 259222) +++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Arbeitskopie) @@ -1,6 +1,8 @@ ! { dg-do compile } ! { dg-require-effective-target vect_float } ! { dg-additional-options "-O3 -fopt-info-vec-optimized" } +! { xfail *-*-* } +! PR 83064 - DO CONCURRENT is no longer vectorized for this case. subroutine test(n, a, b, c) integer, value :: n @@ -12,4 +14,4 @@ subroutine test(n, a, b, c) end subroutine test ! { dg-message "loop vectorized" "" { target *-*-* } 0 } -! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 } +! Restoration of the test case will need dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 ! { dg-do run } ! PR 83064 - this used to give wrong results. ! { dg-options "-O3 -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main use, intrinsic :: iso_fortran_env implicit none integer, parameter :: nsplit = 4 integer(int64), parameter :: ne = 2000 integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i real(real64), dimension(nsplit) :: pi edof(1::4) = 1 edof(2::4) = 2 edof(3::4) = 3 edof(4::4) = 4 stride = ceiling(real(ne)/nsplit) do i = 1, nsplit high(i) = stride*i end do do i = 2, nsplit low(i) = high(i-1) + 1 end do low(1) = 1 high(nsplit) = ne pi = 0 do concurrent (i = 1:nsplit) pi(i) = sum(compute( low(i), high(i) )) end do if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort contains pure function compute( low, high ) result( ttt ) integer(int64), intent(in) :: low, high real(real64), dimension(nsplit) :: ttt integer(int64) :: j, k ttt = 0 ! Unrolled loop ! do j = low, high, 4 ! k = 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! k = 2 ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) ! k = 3 ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) ! k = 4 ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) ! end do ! Loop with modulo operation ! do j = low, high ! k = mod( j, nsplit ) + 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! end do ! Loop with subscripting via host association do j = low, high k = edof(j) ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) end do end function end program main
Re: [C++ PATCH] Fix invalid structured binding range for parsing error recovery (PR c++/85194)
OK. On Wed, Apr 4, 2018 at 3:48 PM, Jakub Jelinekwrote: > Hi! > > The maybe_range_for_decl argument is documented on > cp_parser_simple_declaration as: >If MAYBE_RANGE_FOR_DECL is not NULL, the pointed tree will be set to the >parsed declaration if it is an uninitialized single declarator not followed >by a `;', or to error_mark_node otherwise. Either way, the trailing `;', >if present, will not be consumed. */ > but we initialize *maybe_range_for_decl to NULL_TREE at the beginning, to > detect if we saw more than one decl; in case of a structured binding, we > only set it to non-NULL on success and thus violate what the comment says, > and on testcase like below we return is_range_for from the callers because > it is followed by colon, yet range_decl is NULL and we ICE trying to > dereference it. This patch ensures it is error_mark_node instead, it can > only happen in case of errors. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2018-04-04 Jakub Jelinek > > PR c++/85194 > * parser.c (cp_parser_simple_declaration): For structured bindings, > if *maybe_range_for_decl is NULL after parsing it, set it to > error_mark_node. > > * g++.dg/cpp1z/decomp37.C: New test. > > --- gcc/cp/parser.c.jj 2018-04-03 18:05:26.0 +0200 > +++ gcc/cp/parser.c 2018-04-04 14:59:53.562859572 +0200 > @@ -12993,8 +12993,14 @@ cp_parser_simple_declaration (cp_parser* > /* The next token should be either a `,' or a `;'. */ > cp_token *token = cp_lexer_peek_token (parser->lexer); > /* If it's a `;', we are done. */ > - if (token->type == CPP_SEMICOLON || maybe_range_for_decl) > + if (token->type == CPP_SEMICOLON) > goto finish; > + else if (maybe_range_for_decl) > + { > + if (*maybe_range_for_decl == NULL_TREE) > + *maybe_range_for_decl = error_mark_node; > + goto finish; > + } > /* Anything else is an error. */ > else > { > --- gcc/testsuite/g++.dg/cpp1z/decomp37.C.jj2018-04-04 15:12:33.724191835 > +0200 > +++ gcc/testsuite/g++.dg/cpp1z/decomp37.C 2018-04-04 15:13:40.803227382 > +0200 > @@ -0,0 +1,14 @@ > +// PR c++/85194 > +// { dg-do compile { target c++11 } } > +// { dg-options "" } > + > +struct A { int i; }; > + > +A x[2]; > + > +void > +foo () > +{ > + for (auto [i] = A () : x)// { dg-error "initializer in range-based > 'for' loop" } > +; // { dg-warning "structured bindings only > available with" "" { target c++14_down } .-1 } > +} > > Jakub
C++ PATCH for c++/85264, ICE with extra template parameter list
cp_parser_check_template_parameters is supposed to catch when we have the wrong number of template parameter lists, but it wasn't diagnosing this case. Fixed by checking whether the thing we're declaring used a template-id; the case of declaring a primary template, when we allow one more template parameter list, uses a plain identifier. Tested x86_64-pc-linux-gnu, applying to trunk. commit 770f6eaf7320d908cd39ace3aa155e7ec829982d Author: Jason MerrillDate: Mon Apr 9 14:03:15 2018 -0400 PR c++/85264 - ICE with excess template-parameter-list. * parser.c (cp_parser_check_template_parameters): Add template_id_p parameter. Don't allow an extra template header if true. (cp_parser_class_head): Pass template_id_p. (cp_parser_elaborated_type_specifier): Likewise. (cp_parser_alias_declaration): Likewise. (cp_parser_check_declarator_template_parameters): Likewise. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 0ffa13de537..0e469259008 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -2498,7 +2498,7 @@ static tree cp_parser_maybe_treat_template_as_class static bool cp_parser_check_declarator_template_parameters (cp_parser *, cp_declarator *, location_t); static bool cp_parser_check_template_parameters - (cp_parser *, unsigned, location_t, cp_declarator *); + (cp_parser *, unsigned, bool, location_t, cp_declarator *); static cp_expr cp_parser_simple_cast_expression (cp_parser *); static tree cp_parser_global_scope_opt @@ -17917,6 +17917,7 @@ cp_parser_elaborated_type_specifier (cp_parser* parser, if (template_parm_lists_apply && !cp_parser_check_template_parameters (parser, /*num_templates=*/0, + /*template_id*/false, token->location, /*declarator=*/NULL)) return error_mark_node; @@ -18971,6 +18972,7 @@ cp_parser_alias_declaration (cp_parser* parser) if (parser->num_template_parameter_lists && !cp_parser_check_template_parameters (parser, /*num_templates=*/0, + /*template_id*/false, id_location, /*declarator=*/NULL)) return error_mark_node; @@ -23117,6 +23119,7 @@ cp_parser_class_head (cp_parser* parser, /* Make sure that the right number of template parameters were present. */ if (!cp_parser_check_template_parameters (parser, num_templates, + template_id_p, type_start_token->location, /*declarator=*/NULL)) { @@ -26391,17 +26394,22 @@ cp_parser_check_declarator_template_parameters (cp_parser* parser, { unsigned num_templates = 0; tree scope = declarator->u.id.qualifying_scope; + bool template_id_p = false; if (scope) num_templates = num_template_headers_for_class (scope); else if (TREE_CODE (declarator->u.id.unqualified_name) == TEMPLATE_ID_EXPR) - /* If the DECLARATOR has the form `X' then it uses one - additional level of template parameters. */ - ++num_templates; + { + /* If the DECLARATOR has the form `X' then it uses one + additional level of template parameters. */ + ++num_templates; + template_id_p = true; + } return cp_parser_check_template_parameters - (parser, num_templates, declarator_location, declarator); + (parser, num_templates, template_id_p, declarator_location, + declarator); } case cdk_function: @@ -26430,6 +26438,7 @@ cp_parser_check_declarator_template_parameters (cp_parser* parser, static bool cp_parser_check_template_parameters (cp_parser* parser, unsigned num_templates, + bool template_id_p, location_t location, cp_declarator *declarator) { @@ -26439,7 +26448,8 @@ cp_parser_check_template_parameters (cp_parser* parser, return true; /* If there are more, but only one more, then we are referring to a member template. That's OK too. */ - if (parser->num_template_parameter_lists == num_templates + 1) + if (!template_id_p + && parser->num_template_parameter_lists == num_templates + 1) return true; /* If there are more template classes than parameter lists, we have something like: diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic176.C b/gcc/testsuite/g++.dg/cpp0x/variadic176.C new file mode 100644 index 000..1d6e3c2f10a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic176.C @@ -0,0 +1,10 @@ +// PR c++/85264 +// { dg-do compile { target c++11 } } + +template struct A {}; + +template +template +struct A {};// { dg-error "too many template-parameter-lists" } + +A a;
Re: [PATCH] Assorted -masm=intel fixes (PR target/85281)
On Mon, Apr 9, 2018 at 11:37 AM, Jakub Jelinekwrote: > Hi! > > I've tested make check-gcc > RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} > i386.exp vect.exp' > testing and looked solely at assembly Error: (there are many scan-assembler* > directives that just fail, and some tests use e.g. only att inline asm > etc.). > > The following patch fixes what I found that way. Bootstrapped/regtested on > x86_64-linux and i686-linux, ok for trunk? > > The patch fixes: > -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors) > -FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors) > > BTW, -masm=intel seems to be in quite bad shape even in the assembler, in > various testcases I'm getting errors like on the following reduced one: > int k1, xmm0; > int foo (void) { return k1; } > int bar (void) { return xmm0; } > gcc -masm=intel -O2 > /tmp/cch0mo1K.s: Assembler messages: > /tmp/cch0mo1K.s:10: Error: invalid use of register > /tmp/cch0mo1K.s:21: Error: invalid use of register > As ICC generates the same assembly on the instructions: > mov eax, DWORD PTR k1[rip] > ... > mov eax, DWORD PTR xmm0[rip] > I think either the intel syntax spec is faulty, or gas is buggy and should > figure out that after *WORD PTR and before [ there is symbol rather than > register name. Some testcases e.g. have k1 as function name and that > results in other asm errors (about .size directive). > Hi Jan, How does Intel syntax support symbols like eax, k1 and xmm0 with ".intel_syntax noprefix"? -- H.J.
Re: [PATCH] v2: Show pertinent parameter (PR c++/85110)
OK. On Fri, Apr 6, 2018 at 3:54 PM, David Malcolmwrote: > On Thu, 2018-03-29 at 09:20 -0400, Jason Merrill wrote: >> On Wed, Mar 28, 2018 at 4:44 PM, David Malcolm >> wrote: >> > This followup patch updates the specific error-handling path >> > to add a note showing the pertinent parameter decl, taking >> > the output from: >> > >> > test.cc: In function 'void caller(const char*)': >> > test.cc:6:14: error: cannot convert 'const char*' to 'const char**' >> > for argument '2' to 'void callee(int, const char**, int)' >> >callee (1, fmt, 3); >> > ^~~ >> > >> > to: >> > >> > test.cc: In function 'void caller(const char*)': >> > test.cc:6:14: error: cannot convert 'const char*' to 'const char**' >> > for argument '2' to 'void callee(int, const char**, int)' >> >callee (1, fmt, 3); >> > ^~~ >> > test.cc:1:36: note: initializing argument 2 of 'void callee(int, >> > const char**, int)' >> > void callee (int one, const char **two, int three); >> >~^~~ >> > >> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds >> > a further 18 PASS results to g++.sum. >> > >> > Again, not a regression as such, but I've been calling out the >> > underlined >> > arguments as a feature of gcc 8, so would be good to fix. >> > >> > OK for trunk? >> > >> > gcc/cp/ChangeLog: >> > PR c++/85110 >> > * call.c (get_fndecl_argument_location): Make non-static. >> > * cp-tree.h (get_fndecl_argument_location): New decl. >> > * typeck.c (convert_for_assignment): When complaining due >> > to >> > conversions for an argument, show the location of the >> > parameter >> > within the decl. >> > >> > gcc/testsuite/ChangeLog: >> > PR c++/85110 >> > * g++.dg/diagnostic/param-type-mismatch-2.C: Update for the >> > cases >> > where we now show the pertinent parameter. >> > --- >> > gcc/cp/call.c | 2 +- >> > gcc/cp/cp-tree.h| 2 ++ >> > gcc/cp/typeck.c | 10 +++ >> > --- >> > .../g++.dg/diagnostic/param-type-mismatch-2.C | 21 >> > ++--- >> > 4 files changed, 28 insertions(+), 7 deletions(-) >> > >> > diff --git a/gcc/cp/call.c b/gcc/cp/call.c >> > index 1a87f99..e1a0639 100644 >> > --- a/gcc/cp/call.c >> > +++ b/gcc/cp/call.c >> > @@ -6598,7 +6598,7 @@ maybe_print_user_conv_context (conversion >> > *convs) >> > ARGNUM is zero based, -1 indicates the `this' argument of a >> > method. >> > Return the location of the FNDECL itself if there are >> > problems. */ >> > >> > -static location_t >> > +location_t >> > get_fndecl_argument_location (tree fndecl, int argnum) >> > { >> >int i; >> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h >> > index d5382c2..b45880d 100644 >> > --- a/gcc/cp/cp-tree.h >> > +++ b/gcc/cp/cp-tree.h >> > @@ -5965,6 +5965,8 @@ extern bool >> > can_convert_arg (tree, tree, tree, int, >> > tsubst_flags_t); >> > extern bool can_convert_arg_bad(tree, >> > tree, tree, int, >> > tsubst_flags_t); >> > +extern location_t get_fndecl_argument_location (tree, int); >> > + >> > >> > /* A class for recording information about access failures (e.g. >> > private >> > fields), so that we can potentially supply a fix-it hint about >> > diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c >> > index e733c79..742b2e9 100644 >> > --- a/gcc/cp/typeck.c >> > +++ b/gcc/cp/typeck.c >> > @@ -8781,9 +8781,13 @@ convert_for_assignment (tree type, tree rhs, >> >parmnum, >> > complain, flags); >> > } >> > else if (fndecl) >> > - error_at (EXPR_LOC_OR_LOC (rhs, input_location), >> > - "cannot convert %qH to %qI for argument >> > %qP to %qD", >> > - rhstype, type, parmnum, fndecl); >> > + { >> > + error_at (EXPR_LOC_OR_LOC (rhs, input_location), >> > + "cannot convert %qH to %qI for argument >> > %qP to %qD", >> > + rhstype, type, parmnum, fndecl); >> > + inform (get_fndecl_argument_location (fndecl, >> > parmnum), >> > + " initializing argument %P of %qD", >> > parmnum, fndecl); >> >> If you're adding the inform, you don't need the %P of %D in the >> initial error. >> >> Jason > > Thanks. Here's an updated version of the patch which removes them. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds > 18 PASS results to g++.sum. > > OK for trunk? > > gcc/cp/ChangeLog: > PR c++/85110 > * call.c (get_fndecl_argument_location): Make non-static. > * cp-tree.h
Re: [C++ Patch] PR 85227 ("[7/8/ Regression] ICE with structured binding of a forward declared variable")
OK. On Fri, Apr 6, 2018 at 3:05 PM, Paolo Carliniwrote: > Hi, > > On 06/04/2018 19:04, Jason Merrill wrote: >> >> On Fri, Apr 6, 2018 at 5:01 AM, Paolo Carlini >> wrote: >>> >>> here, for an incomplete type we ICE pretty soon in >>> find_decomp_class_base. >>> Comparing to other cases too, I convinced myself that trying to complete >>> the >>> type is Ok. Also, it seems that in these functions we want to talk about >>> structured binding and use an appropriate location, thus no >>> complete_type_or_maybe_complain. Tested x86_64-linux. >> >> What if, in a template, we defer trying to do bindings to an incomplete >> type, so >> >> extern struct A a; >> >> template >> void f() >> { >>auto [x] = a; >> } >> >> struct A { int i; }; >> >> int main() >> { >>f<0>(); >> } >> >> works? Probably with a pedwarn, as in xref_basetypes or >> cp_parser_dot_deref_incomplete. > > Ok... I tested the very simple patch below, wasnt sure between pedwarn (loc, > 0, ...) and pedwarn (loc, OPT_Wpedantic, ...) but probably we want to former > in order not to be too permissive (for comparison, clang rejects with an > hard error both tests). What do you think? > > Thanks! > Paolo. > > /
[PATCH] Fix some broadcasts in -masm=intel mode (PR target/85281)
Hi! As the following testcase shows, we emit an incorrect PTR prefix in a vpbroadcastb instruction in -masm=intel mode; gas accepts and the manual documents that the input operand is xmm2/m8 for vpbroadcastb and xmm2/m16 for vpbroadcastw, so we need to use BYTE PTR and WORD PTR instead of XMMWORD PTR. The first two hunks are just a simplification, the only reason we couldn't use used in many other spots is that it wasn't covering the 512-bit floating point vectors. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-04-09 Jakub JelinekPR target/85281 * config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases. (_vec_dup): Use a single pattern for modes other than V2DFmode using iptr mode attribute. (_vec_dup): Use iptr mode attribute. * gcc.target/i386/pr85281.c: New test. --- gcc/config/i386/sse.md.jj 2018-04-06 19:19:12.098129938 +0200 +++ gcc/config/i386/sse.md 2018-04-09 12:05:38.044703296 +0200 @@ -804,6 +804,7 @@ (define_mode_attr iptr [(V64QI "b") (V32HI "w") (V16SI "k") (V8DI "q") (V32QI "b") (V16HI "w") (V8SI "k") (V4DI "q") (V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q") + (V16SF "k") (V8DF "q") (V8SF "k") (V4DF "q") (V4SF "k") (V2DF "q") (SF "k") (DF "q")]) @@ -17686,10 +17687,7 @@ (define_insn "_vec_dupmode == V2DFmode) return "vpbroadcastq\t{%1, %0|%0, %q1}"; - if (GET_MODE_SIZE (GET_MODE_INNER (mode)) == 4) -return "vbroadcast\t{%1, %0|%0, %k1}"; - else -return "vbroadcast\t{%1, %0|%0, %q1}"; + return "vbroadcast\t{%1, %0|%0, %1}"; } [(set_attr "type" "ssemov") (set_attr "prefix" "evex") @@ -17702,7 +17700,7 @@ (define_insn "_vec_dup 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]] "TARGET_AVX512BW" - "vpbroadcast\t{%1, %0|%0, %1}" + "vpbroadcast\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) --- gcc/testsuite/gcc.target/i386/pr85281.c.jj 2018-04-09 12:15:57.204757347 +0200 +++ gcc/testsuite/gcc.target/i386/pr85281.c 2018-04-09 12:17:56.938766026 +0200 @@ -0,0 +1,15 @@ +/* PR target/85281 */ +/* { dg-do assemble { target avx512bw } } */ +/* { dg-require-effective-target int128 } */ +/* { dg-require-effective-target masm_intel } */ +/* { dg-options "-O -mavx512bw -masm=intel -w" } */ + +typedef char V __attribute__ ((__vector_size__ (64))); + +V +foo (V v) +{ + v[8] /= (unsigned __int128) 0; + v[0] -= ~255; + return v; +} Jakub
Re: Fix PRs 80463, 83972, 83480
On Mon, Apr 09, 2018 at 01:30:12PM +0300, Andrey Belevantsev wrote: > I think that should be fine, that is, as long as the insn moved up through > all those debug insns, the copy will do that as well. It's that > problematic conditional in sched-deps.c that we should take care of. > > I've reworded the comment and committed the attached patch. Thanks for > your help. The C++ testcase FAILs everywhere: FAIL: g++.dg/pr80463.C -std=gnu++98 (test for excess errors) Excess errors: cc1plus: warning: var-tracking-assignments changes selective scheduling The other testcases in the patch used -w probably to disable the same warning, so I've committed following as obvious to trunk after regtesting it on x86_64-linux and i686-linux: 2018-04-09 Jakub JelinekPR rtl-optimization/80463 * g++.dg/pr80463.C: Add -w to dg-options. --- gcc/testsuite/g++.dg/pr80463.C.jj 2018-04-09 20:15:47.226631780 +0200 +++ gcc/testsuite/g++.dg/pr80463.C 2018-04-09 20:19:43.783616136 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ -/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } */ +/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments -w" } */ int *a; int b, c; Jakub
Re: C++ PATCH for c++/85032, rejects-valid with if constexpr in template
On Fri, Apr 6, 2018 at 2:17 PM, Marek Polacekwrote: > On Mon, Mar 26, 2018 at 01:17:22PM -0400, Jason Merrill wrote: >> On Sat, Mar 24, 2018 at 6:59 AM, Marek Polacek wrote: >> > Recently the code in finish_static_assert was changed to use >> > perform_implicit_conversion_flags followed by fold_non_dependent_expr. >> > That >> > broke this test becase when in a template, p_i_c_f merely wraps the expr in >> > an IMPLICIT_CONV_EXPR. fold_non_dependent_expr should be able to fold it >> > to >> > a constant but it gave up because is_nondependent_constant_expression >> > returned >> > false. Jason suggested to fix this roughly like the following, i.e. >> > consider >> > conversions from classes to literal types potentially constant. >> > >> > Bootstrapped/regtested on x86_64-linux, ok for trunk? >> > >> > 2018-03-24 Marek Polacek >> > >> > PR c++/85032 >> > * constexpr.c (potential_constant_expression_1): Consider >> > conversions >> > from classes to literal types potentially constant. >> > >> > * g++.dg/cpp0x/pr51225.C: Adjust error message. >> > * g++.dg/cpp1z/constexpr-if17.C: New test. >> > >> > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c >> > index bebd9f5b5d0..c4b5afe90a2 100644 >> > --- gcc/cp/constexpr.c >> > +++ gcc/cp/constexpr.c >> > @@ -5768,6 +5768,23 @@ potential_constant_expression_1 (tree t, bool >> > want_rval, bool strict, bool now, >> > TREE_TYPE (t)); >> > return false; >> > } >> > + /* This might be a conversion from a class to a literal type. Let's >> > +consider it potentially constant since the conversion might be >> > +a constexpr user-defined conversion. */ >> > + else if (cxx_dialect >= cxx11 >> > + && COMPLETE_TYPE_P (TREE_TYPE (t)) >> > + && literal_type_p (TREE_TYPE (t)) >> >> We probably need to allow dependent types here, too. And incomplete >> classes, which might turn out to be literal later. > > Ok, I've allowed incomplete types, too. And I think the patch also allows > dependent types. Or did you mean using >&& (TREE_TYPE (t) == NULL_TREE >|| !COMPLETE_TYPE_P (TREE_TYPE (t)) >|| literal_type_p (TREE_TYPE (t))) > ? That doesn't seem to be needed. I meant dependent_type_p (TREE_TYPE (t)). I suppose checking COMPLETE_TYPE_P will cover that by accident, but I'd prefer to make it explicit. > + /* If this is a dependent type, it could end up being a class > +with conversions. */ > + if (type == NULL_TREE || WILDCARD_TYPE_P (type)) > + return true; > + /* Or a non-dependent class which has conversions. */ > + else if (CLASS_TYPE_P (type) && TYPE_HAS_CONVERSION (type)) > + return true; And here, a dependent class type like A could fail both of these tests and still end up with conversions when instantiated. We should check dependent_scope_p as well as TYPE_HAS_CONVERSION. Jason
Re: Add missing cases to vect_get_smallest_scalar_type (PR 85286)
On Mon, Apr 09, 2018 at 06:47:45PM +0100, Richard Sandiford wrote: > In this PR we used WIDEN_SUM_EXPR to vectorise: > > short i, y; > int sum; > [...] > for (i = x; i > 0; i--) > sum += y; > > with 4 ints and 8 shorts per vector. The problem was that we set > the VF based only on the ints, then calculated the number of vector > copies based on the shorts, giving 4/8. Previously that led to > ncopies==0, but after r249897 we pick it up as an ICE. > > In this particular case we could vectorise the reduction by setting > ncopies based on the output type rather than the input type, but it > doesn't seem worth adding a special "optimisation" for such a > pathological case. I think it's really an instance of the more general > problem that we can't vectorise using combinations of (say) 64-bit and > 128-bit vectors on targets that support both. We badly need that, there are plenty of PRs where we generate really large vectorized loop because of it e.g. on x86 where we can easily use 128-bit, 256-bit and 512-bit vectors; but I'm afraid it is not a stage4 material. Jakub
Add missing cases to vect_get_smallest_scalar_type (PR 85286)
In this PR we used WIDEN_SUM_EXPR to vectorise: short i, y; int sum; [...] for (i = x; i > 0; i--) sum += y; with 4 ints and 8 shorts per vector. The problem was that we set the VF based only on the ints, then calculated the number of vector copies based on the shorts, giving 4/8. Previously that led to ncopies==0, but after r249897 we pick it up as an ICE. In this particular case we could vectorise the reduction by setting ncopies based on the output type rather than the input type, but it doesn't seem worth adding a special "optimisation" for such a pathological case. I think it's really an instance of the more general problem that we can't vectorise using combinations of (say) 64-bit and 128-bit vectors on targets that support both. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard 2018-04-09 Richard Sandifordgcc/ PR tree-optimization/85286 * tree-vect-data-refs.c (vect_get_smallest_scalar_type): gcc/testsuite/ * gcc.dg/vect/pr85286.c: New test. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c 2018-03-24 10:52:25.616823316 + +++ gcc/tree-vect-data-refs.c 2018-04-09 18:44:09.676561821 +0100 @@ -132,6 +132,8 @@ vect_get_smallest_scalar_type (gimple *s if (is_gimple_assign (stmt) && (gimple_assign_cast_p (stmt) + || gimple_assign_rhs_code (stmt) == DOT_PROD_EXPR + || gimple_assign_rhs_code (stmt) == WIDEN_SUM_EXPR || gimple_assign_rhs_code (stmt) == WIDEN_MULT_EXPR || gimple_assign_rhs_code (stmt) == WIDEN_LSHIFT_EXPR || gimple_assign_rhs_code (stmt) == FLOAT_EXPR)) Index: gcc/testsuite/gcc.dg/vect/pr85286.c === --- /dev/null 2018-04-08 19:55:28.217132277 +0100 +++ gcc/testsuite/gcc.dg/vect/pr85286.c 2018-04-09 18:44:09.675561881 +0100 @@ -0,0 +1,19 @@ +/* PR tree-optimization/45241 */ +/* { dg-do compile } */ +/* { dg-additional-options "--param scev-max-expr-complexity=0" } */ + +int +foo (short x) +{ + short i, y; + int sum; + + for (i = 0; i < x; i++) +y = x * i; + + for (i = x; i > 0; i--) +sum += y; + + return sum; +} +
Set insn_last_address in final_1
final_1 already sets insn_current_address for each instruction, making it possible to use some of the address functions in final.c during assembly generation. This patch also sets insn_last_address, since as the comment says, we can treat final as a shorten_branches pass that does nothing. It's then possible to use insn_current_reference_address during final as well. This is needed for the aarch64.md definitions of far_branch to work: (set (attr "far_branch") (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) (lt (minus (match_dup 2) (pc)) (const_int 1048572))) (const_int 0) (const_int 1)))] This value (tested only during final) uses the difference between the INSN_ADDRESSES of operand 2 and insn_current_reference_address to calculate a conservatively-correct estimate of the branch distance. It takes into account the worst-case gap due to alignment, whereas a direct comparison of INSN_ADDRESSES would give an unreliable, optimistic result. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Sameera confirms that it fixes the original bug, but there's no sharable testcase. OK to install? Richard 2018-04-09 Richard Sandifordgcc/ * final.c (final_1): Set insn_last_address as well as insn_current_address. Index: gcc/final.c === --- gcc/final.c 2018-04-09 18:40:26.887628387 +0100 +++ gcc/final.c 2018-04-09 18:40:27.033624471 +0100 @@ -2081,6 +2081,9 @@ final_1 (rtx_insn *first, FILE *file, in } else insn_current_address = INSN_ADDRESSES (INSN_UID (insn)); + /* final can be seen as an iteration of shorten_branches that +does nothing (since a fixed point has already been reached). */ + insn_last_address = insn_current_address; } dump_basic_block_info (file, insn, start_to_bb, end_to_bb,
C++ PATCH for c++/85256, ICE capturing pointer to VLA
Here if we return immediately after the diagnostic about unsupported variably-modified type capture, we avoid the crash. I'm also changing the diagnostic from error to sorry, since it would make sense to support this usage and apparently Clang already does. The second patch below is the beginning of work for more general variably-modified type capture, based on the approach of capturing and remapping all the uses of outer automatic vars in the VLA type when the VLA variable is captured. I've also thought about handling this later, when we actually do something that involves the dimensions, but currently that's mostly hidden in ARRAY_REF. First patch tested x86_64-pc-linux-gnu, applying to trunk. commit c77d362bbe6abdfdf486b061cf2ec5897723dba9 Author: Jason MerrillDate: Sun Apr 8 14:15:35 2018 -0400 PR c++/85256 - ICE capturing pointer to VLA. * lambda.c (add_capture): Distinguish between variable-size and variably-modified types. diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 374060626c1..e9b962a8f33 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -554,13 +554,13 @@ add_capture (tree lambda, tree id, tree orig_init, bool by_reference_p, else if (!dependent_type_p (type) && variably_modified_type_p (type, NULL_TREE)) { - error ("capture of variable-size type %qT that is not an N3639 array " + sorry ("capture of variably-modified type %qT that is not an N3639 array " "of runtime bound", type); if (TREE_CODE (type) == ARRAY_TYPE && variably_modified_type_p (TREE_TYPE (type), NULL_TREE)) inform (input_location, "because the array element type %qT has " "variable size", TREE_TYPE (type)); - type = error_mark_node; + return error_mark_node; } else { diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C index d4de131fc23..aee9694852d 100644 --- a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla2.C @@ -7,6 +7,6 @@ void f() { int m = 1; int d[n][m]; [&]() { -return d[1]; // { dg-error "variabl" } +return d[1]; // { dg-prune-output "sorry" } }(); } diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C new file mode 100644 index 000..eebdbcdbb93 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-vla3.C @@ -0,0 +1,10 @@ +// PR c++/85256 +// { dg-do compile { target c++11 } } +// { dg-additional-options -Wno-vla } + +void foo(int i) +{ + int (*x)[i]; + [=]{ [=]{ 0 ? x : x; }; }; // { dg-bogus "sorry" "" { xfail *-*-* } } + +} diff --git a/gcc/testsuite/g++.dg/cpp1y/vla7.C b/gcc/testsuite/g++.dg/cpp1y/vla7.C index df34c8219db..afa5fac508d 100644 --- a/gcc/testsuite/g++.dg/cpp1y/vla7.C +++ b/gcc/testsuite/g++.dg/cpp1y/vla7.C @@ -6,7 +6,7 @@ int main(int argc, char** argv) { int x[1][argc]; - [](int i) { // { dg-error "variable.size" } -x[0][i] = 0; // { dg-prune-output "assignment" } + [](int i) { // { dg-prune-output "sorry" } +x[0][i] = 0; // { dg-prune-output "not captured" } }(5); } diff --git a/gcc/testsuite/g++.dg/cpp1y/vla9.C b/gcc/testsuite/g++.dg/cpp1y/vla9.C index 939de30a3c1..2c5b3a5404e 100644 --- a/gcc/testsuite/g++.dg/cpp1y/vla9.C +++ b/gcc/testsuite/g++.dg/cpp1y/vla9.C @@ -25,7 +25,7 @@ int main(){ fa[0][1]=1.8; auto fx=[&](){ for(int c=0; c<2; c++){ -printf("use me", fa[0][c]); // { dg-error "capture of variable-size type" } +printf("use me", fa[0][c]); // { dg-prune-output "sorry" } } }; call(fx); commit 2918e20db4270ced5705eabac37162c7068c1be7 Author: Jason Merrill Date: Sun Apr 8 14:19:00 2018 -0400 no-vla diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 204791e51cf..a7db19da65f 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -441,7 +441,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX]; 5: CLASS_TYPE_P (in RECORD_TYPE and UNION_TYPE) ENUM_FIXED_UNDERLYING_TYPE_P (in ENUMERAL_TYPE) AUTO_IS_DECLTYPE (in TEMPLATE_TYPE_PARM) - REFERENCE_VLA_OK (in REFERENCE_TYPE) 6: TYPE_DEPENDENT_P_VALID Usage of DECL_LANG_FLAG_?: @@ -455,7 +454,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX]; DECL_TEMPLATE_INSTANTIATED (in a VAR_DECL or a FUNCTION_DECL) DECL_MEMBER_TEMPLATE_P (in TEMPLATE_DECL) USING_DECL_TYPENAME_P (in USING_DECL) - DECL_VLA_CAPTURE_P (in FIELD_DECL) DECL_ARRAY_PARAMETER_P (in PARM_DECL) LABEL_DECL_CONTINUE (in LABEL_DECL) 2: DECL_THIS_EXTERN (in VAR_DECL or FUNCTION_DECL). @@ -3632,11 +3630,6 @@ extern void decl_shadowed_for_var_insert (tree, tree); && (TREE_CODE (TREE_TYPE (TREE_OPERAND ((NODE), 0))) \ == REFERENCE_TYPE)) -/* True if NODE is a REFERENCE_TYPE which is OK to instantiate to be a - reference to VLA type,
Re: Improve code quality across partition boundaries
On April 9, 2018 4:17:45 PM GMT+02:00, Jan Hubickawrote: >Hi, >this patch fixes most offending artifacts across the function partition >bounary with -freorder-blocks-and-partitions which is now on by default >for i386 (no other targets). The problem is that most of cfgcleanup >and >cfgrtl is disabled on crossing edges and thus we produce pretty ugly >code. > >The comment I removed reffers to non-existing comment, but the fixup >of partitioning only looks for crossing jumps. It generates new >patterns for unconditional jumps and ads trampolines for conditional >jumps if taret requests it (i386 doesn't). > >It is thus safe to turn any far jump to near jump (as this patch does) >and based on target we can do more (this patch doesn't) > >For this late of stage4 I think we are fine with enabling edge >forwarding >only. This saves about 0.5% of code segment on cc1 (measured with and >without patch so the code pays back at least) with profiledbootstrap. >I have patches to fix other issues but I think they can wait for next >stage1. In some cases we would get better code with crossjumping >too, but it does not seem critical. > >profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64 >and also >profiledbootstrapped on power. I am running firefox build with LTO+FDO >on >x86_64 too and regtesting on power. OK? OK. Richard. >Honza > > PR rtl/84058 > * cfgcleanup.c (try_forward_edges): Do not give up on crossing > jumps; choose last target that matches the criteria (i.e. > no partition changes for non-crossing jumps). > * cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic > support for redirecting crossing jumps to non-crossing. >Index: cfgcleanup.c >=== >--- cfgcleanup.c (revision 259167) >+++ cfgcleanup.c (working copy) >@@ -394,19 +394,6 @@ > edge_iterator ei; > edge e, *threaded_edges = NULL; > >- /* If we are partitioning hot/cold basic blocks, we don't want to >- mess up unconditional or indirect jumps that cross between hot >- and cold sections. >- >- Basic block partitioning may result in some jumps that appear to >- be optimizable (or blocks that appear to be mergeable), but which >really >- must be left untouched (they are required to make it safely >across >- partition boundaries). See the comments at the top of >- bb-reorder.c:partition_hot_cold_basic_blocks for complete >details. */ >- >- if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b))) >-return false; >- > for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); ) > { > basic_block target, first; >@@ -415,6 +402,7 @@ > bool threaded = false; > int nthreaded_edges = 0; > bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0; >+ bool new_target_threaded = false; > > /* Skip complex edges because we don't know how to update them. > >@@ -431,29 +419,12 @@ > counter = NUM_FIXED_BLOCKS; > goto_locus = e->goto_locus; > >- /* If we are partitioning hot/cold basic_blocks, we don't want >to mess >- up jumps that cross between hot/cold sections. >- >- Basic block partitioning may result in some jumps that appear >- to be optimizable (or blocks that appear to be mergeable), but which >- really must be left untouched (they are required to make it safely >- across partition boundaries). See the comments at the top of >- bb-reorder.c:partition_hot_cold_basic_blocks for complete >- details. */ >- >- if (first != EXIT_BLOCK_PTR_FOR_FN (cfun) >-&& JUMP_P (BB_END (first)) >-&& CROSSING_JUMP_P (BB_END (first))) >- return changed; >- > while (counter < n_basic_blocks_for_fn (cfun)) > { > basic_block new_target = NULL; >-bool new_target_threaded = false; > may_thread |= (target->flags & BB_MODIFIED) != 0; > > if (FORWARDER_BLOCK_P (target) >-&& !(single_succ_edge (target)->flags & EDGE_CROSSING) > && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun)) > { > /* Bypass trivial infinite loops. */ >@@ -543,8 +514,14 @@ > break; > > counter++; >-target = new_target; >-threaded |= new_target_threaded; >+/* Do not turn non-crossing jump to crossing. Depending on target >+ it may require different instruction pattern. */ >+if ((e->flags & EDGE_CROSSING) >+|| BB_PARTITION (first) == BB_PARTITION (new_target)) >+ { >+target = new_target; >+threaded |= new_target_threaded; >+ } > } > > if (counter >= n_basic_blocks_for_fn (cfun)) >Index: cfgrtl.c >=== >--- cfgrtl.c (revision 259167) >+++ cfgrtl.c (working copy) >@@ -4361,6 +4361,18 @@ > if (e->dest == dest) >
RE: [PATCH] [ARC] Fix stack usage info for naked functions.
Accepted & committed, thank you for your contribution, Claudiu > -Original Message- > From: Alexey Brodkin [mailto:abrod...@synopsys.com] > Sent: Friday, April 06, 2018 6:02 PM > To: gcc-patches@gcc.gnu.org > Cc: Claudiu Zissulescu; Alexey Brodkin > > Subject: [PATCH] [ARC] Fix stack usage info for naked functions. > > When code containing "naked" function gets compiled with '-fstack-usage' > compiler prints the following warning: > -->8- > board/synopsys/hsdk/hsdk.c: In function 'hsdk_core_init_f': > board/synopsys/hsdk/hsdk.c:345:1: warning: stack usage computation not > supported for this target > } > ^ > -->8- > > Even though stack calculation makes no sense for "naked" function > we still need to correctly report stack size back to make compiler > happy. > > This problem was caught in U-Boot here: > https://lists.denx.de/pipermail/u-boot/2018-March/324325.html > > The same fix was done earlier for ARM, see: > "config/arm/arm.c (arm_expand_prologue): Set the stack usage to 0 " > https://github.com/gcc-mirror/gcc/commit/023a7c5dd37 > > gcc/ > 2018-04-06 Alexey Brodkin > > * config/arc/arc.c (arc_expand_prologue): Set stack usage info > also for naked functions. > --- > gcc/config/arc/arc.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c > index 32fcb81880a2..3cb4ba5b4dd7 100644 > --- a/gcc/config/arc/arc.c > +++ b/gcc/config/arc/arc.c > @@ -3149,7 +3149,11 @@ arc_expand_prologue (void) > >/* Naked functions don't have prologue. */ >if (ARC_NAKED_P (fn_type)) > -return; > +{ > + if (flag_stack_usage_info) > + current_function_static_stack_size = 0; > + return; > +} > >/* Compute total frame size. */ >size = arc_compute_frame_size (); > -- > 2.14.3
RE: [PATCH] [ARC] Add/update combiner patterns.
Committed. Thank you for your review, Claudiu > -Original Message- > From: Andrew Burgess [mailto:andrew.burg...@embecosm.com] > Sent: Saturday, April 07, 2018 11:07 AM > To: Claudiu Zissulescu> Cc: gcc-patches@gcc.gnu.org; claudiu.zissule...@synopsys.com; > francois.bed...@synopsys.com > Subject: Re: [PATCH] [ARC] Add/update combiner patterns. > > * Claudiu Zissulescu [2018-03-07 13:59:03 +0200]: > > > From: claziss > > > > Hi Andrew, > > > > Please find the following patch which improves generating more > instructions with .f flag (i.e., compare with zero). > > > > Ok to apply? > > Looks good. > > Thanks. > Andrew > > > > Claudiu > > > > gcc/ > > 2018-01-26 Claudiu Zissulescu > > > > * config/arc/arc.md (add_shift): New pattern. > > (add_shift2): Likewise. > > (sub_shift): Likewise. > > (sub_shift_cmp0_noout): Likewise. > > (compare_si_ashiftsi): Likewise. > > (xbfu_cmp0_noout): New combine pattern. > > (xbfu_cmp0"): Likewise. > > (movsi_set_cc_insn): Place the predicable variant first. > > (commutative_binary_cmp0_noout): Remove clobber. > > (commutative_binary_cmp0): New pattern. > > (noncommutative_binary_cmp0): Likewise. > > (noncommutative_binary_cmp0_noout): Likewise. > > (noncommutative_binary_comparison_result_used): Removed. > > (rsub_cmp0): New pattern. > > (rsub_cmp0_noout): Likewise. > > (extzvsi): Changed, keep only meaningful variants. > > (SQH, SEZ): New iterators. > > (SQH_postfix): New mode attribute. > > (SEZ_prefix): New code attribute. > > (xt_cmp0_noout): New instruction > pattern. > > (xt_cmp0): Likewise. > > * config/arc/predicates.md (cc_set_register): Use CC_REG instead > > of numerical value. > > (noncommutative_operator): Check the availability of barrel > > shifter option. > > --- > > gcc/config/arc/arc.md| 274 > --- > > gcc/config/arc/predicates.md | 6 +- > > 2 files changed, 233 insertions(+), 47 deletions(-) > > > > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > > index 7633d36b5a6..fb3432964ac 100644 > > --- a/gcc/config/arc/arc.md > > +++ b/gcc/config/arc/arc.md > > @@ -810,20 +810,90 @@ archs4x, archs4xd, archs4xd_slow" > >"st%U0 %1,%0\;st%U0.di %1,%0" > >[(set_attr "type" "store")]) > > > > +;; Combiner patterns for compare with zero > > +(define_mode_iterator SQH [QI HI]) > > +(define_mode_attr SQH_postfix [(QI "b") (HI "%_")]) > > + > > +(define_code_iterator SEZ [sign_extend zero_extend]) > > +(define_code_attr SEZ_prefix [(sign_extend "sex") (zero_extend "ext")]) > > + > > +(define_insn "*xt_cmp0_noout" > > + [(set (match_operand 0 "cc_set_register" "") > > + (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand" > "r")) > > + (const_int 0)))] > > + "" > > + ".f\\t0,%1" > > + [(set_attr "type" "compare") > > + (set_attr "cond" "set_zn")]) > > + > > +(define_insn "*xt_cmp0" > > + [(set (match_operand 0 "cc_set_register" "") > > + (compare:CC_ZN (SEZ:SI (match_operand:SQH 1 "register_operand" > "r")) > > + (const_int 0))) > > + (set (match_operand:SI 2 "register_operand" "=r") > > + (SEZ:SI (match_dup 1)))] > > + "" > > + ".f\\t%2,%1" > > + [(set_attr "type" "compare") > > + (set_attr "cond" "set_zn")]) > > + > > +(define_insn "*xbfu_cmp0_noout" > > + [(set (match_operand 0 "cc_set_register" "") > > + (compare:CC_Z > > +(zero_extract:SI > > + (match_operand:SI 1 "register_operand" " r,r") > > + (match_operand:SI 2 "const_int_operand" "C3p,n") > > + (match_operand:SI 3 "const_int_operand" " n,n")) > > +(const_int 0)))] > > + "TARGET_HS && TARGET_BARREL_SHIFTER" > > + { > > + int assemble_op2 = (((INTVAL (operands[2]) - 1) & 0x1f) << 5) | (INTVAL > (operands[3]) & 0x1f); > > + operands[2] = GEN_INT (assemble_op2); > > + return "xbfu%?.f\\t0,%1,%2"; > > + } > > + [(set_attr "type" "shift") > > + (set_attr "iscompact" "false") > > + (set_attr "length" "4,8") > > + (set_attr "predicable" "no") > > + (set_attr "cond" "set_zn")]) > > + > > +(define_insn "*xbfu_cmp0" > > + [(set (match_operand 4 "cc_set_register" "") > > + (compare:CC_Z > > +(zero_extract:SI > > + (match_operand:SI 1 "register_operand" "0 ,r,0") > > + (match_operand:SI 2 "const_int_operand" "C3p,n,n") > > + (match_operand:SI 3 "const_int_operand" "n ,n,n")) > > +(const_int 0))) > > + (set (match_operand:SI 0 "register_operand" "=r,r,r") > > + (zero_extract:SI (match_dup 1) (match_dup 2) (match_dup 3)))] > > + "TARGET_HS && TARGET_BARREL_SHIFTER" > > + { > > + int assemble_op2 = (((INTVAL (operands[2]) - 1) & 0x1f) << 5) | (INTVAL > (operands[3]) & 0x1f); > > + operands[2] = GEN_INT (assemble_op2); > > + return "xbfu%?.f\\t%0,%1,%2"; > > + } > > + [(set_attr "type"
Re: [PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin
Hi Ramana, On 06/04/18 17:17, Thomas Preudhomme wrote: On 06/04/18 17:08, Ramana Radhakrishnan wrote: On 06/04/2018 16:54, Thomas Preudhomme wrote: Instruction pattern for setting the FPSCR expects the input value to be in a register. However, __builtin_arm_set_fpscr expander does not ensure that this is the case and as a result GCC ICEs when the builtin is called with a constant literal. This commit fixes the builtin to force the input value into a register. It also remove the unneeded volatile in the existing fpscr test and fixes the function prototype. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2018-04-06 Thomas Preud'hommePR target/85261 * config/arm/arm-builtins.c (arm_expand_builtin): Force input operand into register. *** gcc/testsuite/ChangeLog *** 2018-04-06 Thomas Preud'homme PR target/85261 * gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with literal value. Expect 2 MCR instruction. Fix function prototype. Remove volatile keyword. Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows no regression. Is this ok for stage4? Best regards, Thomas (sorry about the duplicate for those who get it) LGTM, though in this case I would prefer a bootstrap and regression run as this is automatically exercised most with gcc.dg/atomic_*.c and you really need this tested on linux than just bare-metal as I'm not sure how this gets tested on arm-none-eabi. Oh it is indeed. Didn't realized it was used anywhere. Will start bootstrap right away. Done with --with-arch=armv8-a --with-mode=thumb --with-fpu=neon-vfpv4 --with-float=hard --enable-languages=c,c++,fortran --with-system-zlib --enable-plugins --enable-bootstrap. Testsuite for that GCC does not show any regression either. Ok to commit? What about earlier branches, have you looked ? This is a silly target bug and fixes should go back to older branches in this particular case after baking this on trunk for some time. GCC 6 and 7 are affected as well and a backport will be done once it has baked long enough of course. Will now bootstrap and regtest against GCC 6 and 7. Will let you know once that is finished. Best regards, Thomas
Improve code quality across partition boundaries
Hi, this patch fixes most offending artifacts across the function partition bounary with -freorder-blocks-and-partitions which is now on by default for i386 (no other targets). The problem is that most of cfgcleanup and cfgrtl is disabled on crossing edges and thus we produce pretty ugly code. The comment I removed reffers to non-existing comment, but the fixup of partitioning only looks for crossing jumps. It generates new patterns for unconditional jumps and ads trampolines for conditional jumps if taret requests it (i386 doesn't). It is thus safe to turn any far jump to near jump (as this patch does) and based on target we can do more (this patch doesn't) For this late of stage4 I think we are fine with enabling edge forwarding only. This saves about 0.5% of code segment on cc1 (measured with and without patch so the code pays back at least) with profiledbootstrap. I have patches to fix other issues but I think they can wait for next stage1. In some cases we would get better code with crossjumping too, but it does not seem critical. profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64 and also profiledbootstrapped on power. I am running firefox build with LTO+FDO on x86_64 too and regtesting on power. OK? Honza PR rtl/84058 * cfgcleanup.c (try_forward_edges): Do not give up on crossing jumps; choose last target that matches the criteria (i.e. no partition changes for non-crossing jumps). * cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic support for redirecting crossing jumps to non-crossing. Index: cfgcleanup.c === --- cfgcleanup.c(revision 259167) +++ cfgcleanup.c(working copy) @@ -394,19 +394,6 @@ edge_iterator ei; edge e, *threaded_edges = NULL; - /* If we are partitioning hot/cold basic blocks, we don't want to - mess up unconditional or indirect jumps that cross between hot - and cold sections. - - Basic block partitioning may result in some jumps that appear to - be optimizable (or blocks that appear to be mergeable), but which really - must be left untouched (they are required to make it safely across - partition boundaries). See the comments at the top of - bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - - if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b))) -return false; - for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); ) { basic_block target, first; @@ -415,6 +402,7 @@ bool threaded = false; int nthreaded_edges = 0; bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0; + bool new_target_threaded = false; /* Skip complex edges because we don't know how to update them. @@ -431,29 +419,12 @@ counter = NUM_FIXED_BLOCKS; goto_locus = e->goto_locus; - /* If we are partitioning hot/cold basic_blocks, we don't want to mess -up jumps that cross between hot/cold sections. - -Basic block partitioning may result in some jumps that appear -to be optimizable (or blocks that appear to be mergeable), but which -really must be left untouched (they are required to make it safely -across partition boundaries). See the comments at the top of -bb-reorder.c:partition_hot_cold_basic_blocks for complete -details. */ - - if (first != EXIT_BLOCK_PTR_FOR_FN (cfun) - && JUMP_P (BB_END (first)) - && CROSSING_JUMP_P (BB_END (first))) - return changed; - while (counter < n_basic_blocks_for_fn (cfun)) { basic_block new_target = NULL; - bool new_target_threaded = false; may_thread |= (target->flags & BB_MODIFIED) != 0; if (FORWARDER_BLOCK_P (target) - && !(single_succ_edge (target)->flags & EDGE_CROSSING) && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun)) { /* Bypass trivial infinite loops. */ @@ -543,8 +514,14 @@ break; counter++; - target = new_target; - threaded |= new_target_threaded; + /* Do not turn non-crossing jump to crossing. Depending on target +it may require different instruction pattern. */ + if ((e->flags & EDGE_CROSSING) + || BB_PARTITION (first) == BB_PARTITION (new_target)) + { + target = new_target; + threaded |= new_target_threaded; + } } if (counter >= n_basic_blocks_for_fn (cfun)) Index: cfgrtl.c === --- cfgrtl.c(revision 259167) +++ cfgrtl.c(working copy) @@ -4361,6 +4361,18 @@ if (e->dest == dest) return e; + if (e->flags & EDGE_CROSSING + && BB_PARTITION (e->src) == BB_PARTITION (dest) + && simplejump_p (BB_END (src))) +{ + if (dump_file) +
Re: [PATCH] Be more carefull about DECL merging in LTO (PR lto/85248).
On Mon, Apr 9, 2018 at 3:55 PM, Martin Liškawrote: > Hi. > > Following patch adds checking of noreturn attribute when we merge DECLs in > LTO. > Note that proper fix is probably > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43884=diff > but that would be applicable after MPX is removed. Thus in next stage1. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. + return false; + } + } excessive vertical space. Otherwise OK. Thanks, Richard. > Ready to be installed? > Martin > > gcc/lto/ChangeLog: > > 2018-04-06 Richard Biener > Martin Liska > > PR lto/85248 > * lto-symtab.c (lto_symtab_merge_p): Handle noreturn attribute. > > gcc/testsuite/ChangeLog: > > 2018-04-06 Jakub Jelinek > > PR lto/85248 > * gcc.dg/lto/pr85248_0.c: New test. > * gcc.dg/lto/pr85248_1.c: New test. > --- > gcc/lto/lto-symtab.c | 17 ++ > gcc/testsuite/gcc.dg/lto/pr85248_0.c | 45 > > gcc/testsuite/gcc.dg/lto/pr85248_1.c | 9 > 3 files changed, 71 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_0.c > create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_1.c > >
[PATCH] Be more carefull about DECL merging in LTO (PR lto/85248).
Hi. Following patch adds checking of noreturn attribute when we merge DECLs in LTO. Note that proper fix is probably https://gcc.gnu.org/bugzilla/attachment.cgi?id=43884=diff but that would be applicable after MPX is removed. Thus in next stage1. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Martin gcc/lto/ChangeLog: 2018-04-06 Richard BienerMartin Liska PR lto/85248 * lto-symtab.c (lto_symtab_merge_p): Handle noreturn attribute. gcc/testsuite/ChangeLog: 2018-04-06 Jakub Jelinek PR lto/85248 * gcc.dg/lto/pr85248_0.c: New test. * gcc.dg/lto/pr85248_1.c: New test. --- gcc/lto/lto-symtab.c | 17 ++ gcc/testsuite/gcc.dg/lto/pr85248_0.c | 45 gcc/testsuite/gcc.dg/lto/pr85248_1.c | 9 3 files changed, 71 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr85248_1.c diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c index 4f186aed059..fbad96c0707 100644 --- a/gcc/lto/lto-symtab.c +++ b/gcc/lto/lto-symtab.c @@ -572,6 +572,9 @@ lto_symtab_merge_p (tree prevailing, tree decl) return false; } } + + /* FIXME: after MPX is removed, use flags_from_decl_or_type + function instead. PR lto/85248. */ if (DECL_ATTRIBUTES (prevailing) != DECL_ATTRIBUTES (decl)) { tree prev_attr = lookup_attribute ("error", DECL_ATTRIBUTES (prevailing)); @@ -599,6 +602,20 @@ lto_symtab_merge_p (tree prevailing, tree decl) "warning attribute mismatch\n"); return false; } + + prev_attr = lookup_attribute ("noreturn", DECL_ATTRIBUTES (prevailing)); + attr = lookup_attribute ("noreturn", DECL_ATTRIBUTES (decl)); + if ((prev_attr == NULL) != (attr == NULL) + || (prev_attr + && TREE_VALUE (TREE_VALUE (prev_attr)) + != TREE_VALUE (TREE_VALUE (attr + { + if (symtab->dump_file) + fprintf (symtab->dump_file, "Not merging decls; " + "noreturn attribute mismatch\n"); + return false; + } + } return true; } diff --git a/gcc/testsuite/gcc.dg/lto/pr85248_0.c b/gcc/testsuite/gcc.dg/lto/pr85248_0.c new file mode 100644 index 000..df61ac976a5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr85248_0.c @@ -0,0 +1,45 @@ +/* PR lto/85248 */ +/* { dg-lto-do run } */ +/* { dg-lto-options { { -flto -O2 } } } */ + +extern void test_alias (int s, int e) __asm__ (__USER_LABEL_PREFIX__ "test"); +extern void test_noreturn (int s, int e) __asm__ (__USER_LABEL_PREFIX__ "test") + __attribute__ ((__noreturn__)); + +extern inline __attribute__ ((__always_inline__, __gnu_inline__)) void +test (int s, int e) +{ + if (__builtin_constant_p (s) && s != 0) +test_noreturn (s, e); + else +test_alias (s, e); +} + +int +foo (void) +{ + static volatile int a; + return a; +} + +static void +bar (void) +{ + test (0, 1); + __builtin_exit (0); +} + +static void +baz () +{ + test (1, 0); +} + +int +main () +{ + if (foo ()) +baz (); + bar (); + __builtin_abort (); +} diff --git a/gcc/testsuite/gcc.dg/lto/pr85248_1.c b/gcc/testsuite/gcc.dg/lto/pr85248_1.c new file mode 100644 index 000..418c0697028 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr85248_1.c @@ -0,0 +1,9 @@ +/* { dg-options "-fno-lto" } */ + +__attribute__((__noipa__)) void +test (int s, int e) +{ + asm volatile ("" : "+g" (s), "+g" (e) : : "memory"); + if (s) +__builtin_abort (); +}
Re: [PATCH v2, rs6000] Tidy implementation of vec_ldl
Hi! On Wed, Apr 04, 2018 at 10:32:58AM -0500, Kelvin Nilsen wrote: > 2018-04-03 Kelvin Nilsen> > * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove > erroneous entries for > "vector int vec_ldl (int, long int *)", and > "vector unsigned int vec_ldl (int, unsigned long int *)". > Add comments and entries for > "vector bool char vec_ldl (int, bool char *)", > "vector bool short vec_ldl (int, bool short *)", > "vector bool int vec_ldl (int, bool int *)", > "vector bool long long vec_ldl (int, bool long long *)", > "vector pixel vec_ldl (int, pixel *)", > "vector long long vec_ldl (int, long long *)", > "vector unsigned long long vec_ldl (int, unsigned long long *)". > * config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new > type tree bool_long_long_type_node and correct definition of > bool_V2DI_type_node to make reference to this new type tree. > (rs6000_mangle_type): Replace erroneous reference to > bool_long_type_node with bool_long_long_type_node. > * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add > comments to emphasize sign distinctions for char and int types and > replace RS6000_BTI_bool_long constant with > RS6000_BTI_bool_long_long constant. Also add comment to restrict > use of RS6000_BTI_pixel. > (bool_long_type_node): Remove this macro definition. > (bool_long_long_type_node): New macro definition > > gcc/testsuite/ChangeLog: > > 2018-04-03 Kelvin Nilsen > > * gcc.target/powerpc/vec-ldl-1.c: New test. > * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect > corrections to ABI implementation. > --- gcc/config/rs6000/rs6000-c.c (revision 258800) > +++ gcc/config/rs6000/rs6000-c.c (working copy) > @@ -1656,27 +1656,45 @@ const struct altivec_builtin_types altivec_overloa > RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 }, > { ALTIVEC_BUILTIN_VEC_LVEBX, ALTIVEC_BUILTIN_LVEBX, > RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 > }, You sent this as format=flowed, which a) makes replies look funny like this, but also b) makes it impossible to apply your patches for someone else. Please fix this (for future patches, no need to resend this). > --- gcc/config/rs6000/rs6000.h (revision 258800) > +++ gcc/config/rs6000/rs6000.h (working copy) > @@ -2578,7 +2578,7 @@ enum rs6000_builtin_type_index > RS6000_BTI_opaque_V2SF, > RS6000_BTI_opaque_p_V2SI, > RS6000_BTI_opaque_V4SI, > - RS6000_BTI_V16QI, > + RS6000_BTI_V16QI, /* __vector signed char */ > RS6000_BTI_V1TI, > RS6000_BTI_V2SI, > RS6000_BTI_V2SF, > @@ -2588,7 +2588,7 @@ enum rs6000_builtin_type_index > RS6000_BTI_V4SI, > RS6000_BTI_V4SF, > RS6000_BTI_V8HI, > - RS6000_BTI_unsigned_V16QI, > + RS6000_BTI_unsigned_V16QI, /* __vector unsigned char */ > RS6000_BTI_unsigned_V1TI, > RS6000_BTI_unsigned_V8HI, > RS6000_BTI_unsigned_V4SI, I don't think these comments help anything. Okay for trunk. Thanks! Segher
[PATCH] Fix PR85284
The following fixes the new capability of analyzing a wrapping IV as non-wrapping under a condition that ensures we do not wrap by restricting it to the case where the exit test must be executed when it reaches that bound. Otherwise we may not analyze the IV with the assumption that the exit test triggers exactly once. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2018-04-09 Richard BienerPR tree-optimization/85284 * tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions): Only use the niter constraining form of simple_iv when the exit is always executed. * gcc.dg/torture/pr85284.c: New testcase. Index: gcc/tree-ssa-loop-niter.c === --- gcc/tree-ssa-loop-niter.c (revision 259227) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -2356,11 +2356,11 @@ number_of_iterations_exit_assumptions (s tree iv0_niters = NULL_TREE; if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt), - op0, , _niters, false)) + op0, , safe ? _niters : NULL, false)) return false; tree iv1_niters = NULL_TREE; if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt), - op1, , _niters, false)) + op1, , safe ? _niters : NULL, false)) return false; /* Give up on complicated case. */ if (iv0_niters && iv1_niters) Index: gcc/testsuite/gcc.dg/torture/pr85284.c === --- gcc/testsuite/gcc.dg/torture/pr85284.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr85284.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do run } */ + +static int p[48], v; + +int +main () +{ + p[32] = 1; + for (int i = 48; i--;) +{ + if (!p[i]) + continue; + if ((i & 7) > 2) + break; + v = i & 1; +} + if (v != 0) +__builtin_abort (); + return 0; +}
[nvptx, PR84041] Add memory_barrier insn
Hi, we've been having hanging OpenMP tests for nvptx offloading: for-{3,5,6}.c and the corresponding C++ test-cases. The failures have now been analyzed down to gomp_ptrlock_get in libgomp/config/nvptx/ptrlock.h: ... static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock) { uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE); if (v > 2) return (void *) v; if (v == 0 && __atomic_compare_exchange_n (ptrlock, , 1, false, MEMMODEL_ACQUIRE, MEMMODEL_ACQUIRE)) return NULL; while (v == 1) v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE); return (void *) v; } ... There's no atomic load insn defined for nvptx, and also no memory barrier insn, so the atomic load ends up generating a normal load. The JIT compiler does loop-invariant code motion, and moves the load out of the loop, which turns the while into an eternal loop. Fix conservatively by defining the memory_barrier insn. This can possibly be fixed more optimally by implementing an atomic load operation in nvptx. Build x86_64 with nvptx accelerator and reg-tested libgomp. Committed to stage4 trunk. Thanks, - Tom [nvptx] Add memory_barrier insn 2018-04-09 Tom de VriesPR target/84041 * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR. (define_expand "*memory_barrier"): New define_expand. (define_insn "memory_barrier"): New insn. --- gcc/config/nvptx/nvptx.md | 22 ++ 1 file changed, 22 insertions(+) diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 4f4453d..68bba36 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -55,6 +55,7 @@ UNSPECV_CAS UNSPECV_XCHG UNSPECV_BARSYNC + UNSPECV_MEMBAR UNSPECV_DIM_POS UNSPECV_FORK @@ -1459,6 +1460,27 @@ "\\tbar.sync\\t%0;" [(set_attr "predicable" "false")]) +(define_expand "memory_barrier" + [(set (match_dup 0) + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] + "" +{ + operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); + MEM_VOLATILE_P (operands[0]) = 1; +}) + +;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys +;; (corresponding to cuda functions threadfence_block, threadfence and +;; threadfence_system). For the insn memory_barrier we use membar.sys. This +;; may be overconservative, but before using membar.gl instead we'll need to +;; explain in detail why it's safe to use. For now, use membar.sys. +(define_insn "*memory_barrier" + [(set (match_operand:BLK 0 "" "") + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] + "" + "\\tmembar.sys;" + [(set_attr "predicable" "false")]) + (define_insn "nvptx_nounroll" [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)] ""
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
Hi. I'm sending updated version of the patch. Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives regression tests. Martin >From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001 From: marxinDate: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/linux-protos.h (ix86_linux_libc_func_speed): New. (TARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define macro. * config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/t-linux: Add x86-linux.o. * config.gcc: Likewise. * config/i386/x86-linux.c: New file. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-28 Martin Liska * gcc.dg/string-opt-1.c: gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 15 ++- gcc/config.gcc | 1 + gcc/config/i386/i386.c | 5 gcc/config/i386/linux.h | 2 ++ gcc/config/i386/linux64.h | 2 ++ gcc/config/i386/t-linux | 6 + gcc/config/i386/x86-linux.c | 54 + gcc/config/linux-protos.h | 1 + gcc/coretypes.h | 7 + gcc/doc/tm.texi | 4 +++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 11 +++- gcc/expr.h | 3 ++- gcc/target.def | 7 + gcc/targhooks.c | 9 +++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 ++-- 17 files changed, 129 insertions(+), 5 deletions(-) create mode 100644 gcc/config/i386/x86-linux.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 487d9d58db2..98ee3fb272d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall += (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ + bool libcall_avoided = false; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall ? _avoided : NULL); + + if (libcall_avoided) +return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 1b58c060a92..6445ff569b3 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu) x86_64-*-linux*) tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h" extra_options="${extra_options} linux-android.opt" + extra_objs="${extra_objs} x86-linux.o" ;; x86_64-*-kfreebsd*-gnu) tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h" diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b4f6aec1434..2471ff7b99a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -52105,6 +52105,11 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#ifdef SUBTARGET_LIBC_FUNC_SPEED +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED +#endif + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h index 69f97f15b0d..6c59cbd6d62 100644 --- a/gcc/config/i386/linux.h +++ b/gcc/config/i386/linux.h @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.
Re: [PATCH] MIPS/GCC: Mark text contents as code or data
Hi Paul, Apologies for the delay in reply, my @imgtec.com address has since become defunct and I have only come across your e-mail in my ever-lagging routine mailing traffic review. On Wed, 14 Mar 2018, Paul Hua wrote: > I noticed that data-sym-pool.c fails on -O0 flags. > > -O0 output : > -cut-- > frob: > .frame $17,8,$31 # vars= 0, regs= 1/0, args= 0, gp= 0 > .mask 0x0002,0 > .fmask 0x,0 > addiu $sp,-8 > sd $17,0($sp) > move$17,$sp > lw $2,$L4 > move$sp,$17 > ld $17,0($sp) > addiu $sp,8 > jr $31 > .type __pool_frob_3, @object > __pool_frob_3: > .align 2 > $L3: > .word __gnu_local_gp > $L4: > .word 305419896 > .type __pend_frob_3, @function > __pend_frob_3: > .insn > .endfrob > .size frob, .-frob > .ident "GCC: (gcc trunk r258495 mips64el o32 n32 n64) 8.0.1 > 20180313 (experimental)" > -end-- > > Is it expected ? maybe we should add skip-if -O0 flags. Thanks for your report. I think instead we should relax the matching pattern and also accept an arbitrary number of label-`.word' pairs ahead of and/or after the required one in the constant pool. I'll post a fix. Maciej
Re: [nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp (fwd)
Somehow ended up off-list only. -- Forwarded message -- Date: Mon, 9 Apr 2018 09:30:15 +0200 (CEST) From: Richard BienerTo: Gerald Pfeifer Subject: Re: [nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp On Sun, 8 Apr 2018, Gerald Pfeifer wrote: > On Wed, 24 Jan 2018, Tom de Vries wrote: > > On 01/24/2018 12:53 PM, Richard Biener wrote: > >> wrong-code bugs qualify for stage4 if a fix isn't too invasive. Target > >> maintainers have an extra say to override stage4 rules anyway and for > >> non-primary/secondary targets nobody cares anyway. > > Maybe then we should be more clear then in formulation of stage 4 criteria? > > Richi? I did not see anyone approve or reject Tom's patch, and it > did not make it into the tree. > > What is your take? (It's fine as far as wwwdocs go, but this really > is a question on the RMs.) > > Gerald > > > [ Change validated as XHTML 1.0 Transitional ] > > > > Index: htdocs/develop.html > > === > > RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v > > retrieving revision 1.178 > > diff -u -r1.178 develop.html > > --- htdocs/develop.html 15 Jan 2018 08:23:26 - 1.178 > > +++ htdocs/develop.html 24 Jan 2018 13:40:30 - > > @@ -130,10 +130,10 @@ > > Stage 4 > > > > During this period, the only (non-documentation) changes that may > > -be made are changes that fix regressions. Other changes may not be > > -done during this period. Note that the same constraints apply > > -to release branches. This period lasts until stage 1 opens for > > -the next release. > > +be made are changes that fix regressions, or that fix wrong-code bugs > > +in a non-invasive way. Other changes may not be done during this > > +period. Note that the same constraints apply to release branches. > > +This period lasts until stage 1 opens for the next release. Given "non-invasive way" applies generally, also to regression fixes I'd just reformulate it to include other important bugs but mention the extra caution expected (not sure if my wording is good). During this period, the only (non-documentation) changes that may be made are changes that fix regressions. Other important bugs like wrong-code, rejects-valid or build issues may be fixed as well. All changes during this period should be done with extra care on not introducing new regressions - fixing bugs at all cost is not wanted. Note that the same constraints apply to release branches. This period lasts until stage 1 opens for the next release. > > Rationale > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Handle empty infinite loops in OpenACC for PR84955
On Fri, 6 Apr 2018, Jakub Jelinek wrote: > On Fri, Apr 06, 2018 at 06:48:52AM -0700, Cesar Philippidis wrote: > > 2018-04-06 Cesar Philippidis> > > > PR middle-end/84955 > > > > gcc/ > > * cfgloop.c (flow_loops_find): Add assert. > > * omp-expand.c (expand_oacc_for): Add dummy false branch for > > tiled basic blocks without omp continue statements. > > * tree-cfg.c (execute_fixup_cfg): Handle calls to internal > > functions like regular functions. > > > > libgomp/ > > * testsuite/libgomp.oacc-c-c++-common/pr84955.c: New test. > > * testsuite/libgomp.oacc-fortran/pr84955.f90: New test. > > I'd like to defer the cfgloop.c and tree-cfg.c changes to Richard, just want > to > mention that: > > > --- a/gcc/tree-cfg.c > > +++ b/gcc/tree-cfg.c > > @@ -9586,10 +9586,7 @@ execute_fixup_cfg (void) > >for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) > > { > > gimple *stmt = gsi_stmt (gsi); > > - tree decl = is_gimple_call (stmt) > > - ? gimple_call_fndecl (stmt) > > - : NULL; > > - if (decl) > > + if (is_gimple_call (stmt)) > > This change doesn't affect just internal functions, but also all indirect > calls through function pointers with const, pure or noreturn attributes. I think the change is desirable nevertheless. The question is if we want to do it at this point in time. The description of the problem sounds more like LTO writing writing out loops without previously fixing up state. So sth like the following which I'd prefer at this stage (the above hunk is ok for stage1 then). Index: gcc/lto-streamer-out.c === --- gcc/lto-streamer-out.c (revision 259227) +++ gcc/lto-streamer-out.c (working copy) @@ -2084,6 +2151,9 @@ output_function (struct cgraph_node *nod /* Set current_function_decl and cfun. */ push_cfun (fn); + /* Fixup loops if required to match discovery done in the reader. */ + loop_optimizer_init (AVOID_CFG_MODIFICATIONS); + /* Make string 0 be a NULL string. */ streamer_write_char_stream (ob->string_stream, 0); @@ -2176,12 +2246,13 @@ output_function (struct cgraph_node *nod streamer_write_record_start (ob, LTO_null); output_cfg (ob, fn); - - pop_cfun (); } else streamer_write_uhwi (ob, 0); + loop_optimizer_finalize (); + pop_cfun (); + /* Create a section to hold the pickled output of this function. */ produce_asm (ob, function); > > --- a/gcc/omp-expand.c > > +++ b/gcc/omp-expand.c > > @@ -5439,6 +5439,13 @@ expand_oacc_for (struct omp_region *region, struct > > omp_for_data *fd) > > > > split->flags ^= EDGE_FALLTHRU | EDGE_TRUE_VALUE; > > > > + /* Add a dummy exit for the tiled block when cont_bb is missing. */ > > + if (cont_bb == NULL) > > + { > > + edge e = make_edge (body_bb, exit_bb, EDGE_FALSE_VALUE); > > + e->probability = profile_probability::even (); > > + } > > I miss here updating of split->probability, if you make e even (the other edge > needs to have probability of 100%-the probability, i.e. even as well. > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Fix PRs 80463, 83972, 83480
On 06.04.2018 19:18, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> In these PRs we deal with the dependencies we are forced to make between a >> debug insn and its previous insn (unless bb head). In the latter case, if >> such an insn has been moved, we fixup its movement so it aligns with the >> sel-sched invariants. We also carefully adjust seqnos in the case we had a >> single debug insn left in the block. > > This is OK with overlong lines fixed and the new comment reworded for clarity > (see below). > >> Best, >> Andrey >> >> 2018-04-03 Andrey Belevantsev>> >> PR rtl-optimization/80463 >> PR rtl-optimization/83972 >> PR rtl-optimization/83480 >> >> * sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the >> correct producer for the insn. >> (tidy_control_flow): Fixup seqnos in case of debug insns. >> >> * gcc.dg/pr80463.c: New test. >> * g++.dg/pr80463.C: Likewise. >> * gcc.dg/pr83972.c: Likewise. >> >> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c >> index a965d2ec42f..f6de96a7f3d 100644 >> --- a/gcc/sel-sched-ir.c >> +++ b/gcc/sel-sched-ir.c >> @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem >> ATTRIBUTE_UNUSED, >> >> /* Note a dependence. */ >> static void >> -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED, >> +has_dependence_note_dep (insn_t pro, >> ds_t ds ATTRIBUTE_UNUSED) >> { >> - if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, >> - VINSN_INSN_RTX >> (has_dependence_data.con))) >> + insn_t real_pro = has_dependence_data.pro; >> + insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con); >> + >> + /* We do not allow for debug insns to move through others unless they >> + are at the start of bb. Such insns may create bookkeeping copies >> + that would not be able to move up breaking sel-sched invariants. > > I have trouble parsing this, it seems a word is accidentally between "move up" > and "breaking". Also the "such" is a bit ambiguous. > >> + Detect that here and allow that movement if we allowed it before >> + in the first place. */ >> + if (DEBUG_INSN_P (real_con) >> + && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con)) >> +return; > > Should we be concerned about debug insns appearing in sequence here? E.g. if > pro and real_con are not consecutive, but all insns in between are debug > insns? I think that should be fine, that is, as long as the insn moved up through all those debug insns, the copy will do that as well. It's that problematic conditional in sched-deps.c that we should take care of. I've reworded the comment and committed the attached patch. Thanks for your help. Andrey > >> + >> + if (!sched_insns_conditions_mutex_p (real_pro, real_con)) >> { >>ds_t *dsp = _dependence_data.has_dep_p[has_dependence_data.where]; >> >> @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying) >> >>gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU); >> >> + /* We could have skipped some debug insns which did not get removed >> with the block, >> + and the seqnos could become incorrect. Fix them up here. */ >> + if (MAY_HAVE_DEBUG_INSNS && (sel_bb_head (xbb) != first || >> sel_bb_end (xbb) != last)) >> + { >> + if (!sel_bb_empty_p (xbb->prev_bb)) >> + { >> + int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb)); >> + if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb))) >> + for (insn_t insn = sel_bb_head (xbb); insn != first; insn = >> NEXT_INSN (insn)) >> + INSN_SEQNO (insn) = prev_seqno + 1; >> + } >> + } >> + >>/* It can turn out that after removing unused jump, basic block >> that contained that jump, becomes empty too. In such case >> remove it too. */ >> diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C >> new file mode 100644 >> index 000..5614c28ca45 >> --- /dev/null >> +++ b/gcc/testsuite/g++.dg/pr80463.C >> @@ -0,0 +1,20 @@ >> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ >> +/* { dg-options "-g -fselective-scheduling2 -O2 >> -fvar-tracking-assignments" } */ >> + >> +int *a; >> +int b, c; >> +void >> +d () >> +{ >> + for (int e; c; e++) >> +switch (e) >> + { >> + case 0: >> + a[e] = 1; >> + case 1: >> + b = 2; >> + break; >> + default: >> + a[e] = 3; >> + } >> +} >> diff --git a/gcc/testsuite/gcc.dg/pr80463.c b/gcc/testsuite/gcc.dg/pr80463.c >> new file mode 100644 >> index 000..cebf2fef1f3 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pr80463.c >> @@ -0,0 +1,54 @@ >> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ >> +/* { dg-options "-g -O2
Re: Fix PR 83913
On 06.04.2018 19:10, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> This issue ended up being fixed the way different from described in the PR. >> We do not want to walk away from the invariant "zero SCHED_TIMES -- insn >> is not scheduled" even for bookkeeping copies (testing showed it trips over >> asserts designed to catch this). Rather we choose merging exprs in the way >> the larger sched-times wins. > > My understanding is this is not a "complete" solution to the problem, and a > chance for a similar blowup on some other testcase remains. Still, avoiding > picking the minimum sched-times value should be a good mitigation. Well, it's not much different with any other situation when we pose a limit on pipelining with the sched-times values. At least for now I can't think of something better. Adjusted the comment as per your suggestion and committed the attached patch. Andrey > > Please consider adding a comment that the average sched-times value is taken > as a compromise to thwart "endless" pipelining of bookkeeping-producing insns > available anywhere vs. pipelining of useful insns, or something like that? > > OK with that considered/added. > >> >> Best, >> Andrey >> >> 2018-04-03 Andrey Belevantsev>> >> PR rtl-optimization/83913 >> >> * sel-sched-ir.c (merge_expr_data): Choose the middle between two >> different sched-times >> when merging exprs. >> >> * gcc.dg/pr83913.c: New test. >> >> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c >> index a965d2ec42f..f6de96a7f3d 100644 >> --- a/gcc/sel-sched-ir.c >> +++ b/gcc/sel-sched-ir.c >> @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t >> split_point) >>if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from)) >> EXPR_PRIORITY (to) = EXPR_PRIORITY (from); >> >> - if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from)) >> -EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from); >> + if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from)) >> +EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES >> (to) >> + + 1) / 2); >> >>if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from)) >> EXPR_ORIG_BB_INDEX (to) = 0; >> @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem >> ATTRIBUTE_UNUSED, >> >> /* Note a dependence. */ >> static void >> diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c >> new file mode 100644 >> index 000..c898d71a261 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pr83913.c >> @@ -0,0 +1,26 @@ >> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ >> +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling >> -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate >> -fno-rerun-cse-after-loop -fno-web" } */ >> + >> +int jo, z4; >> + >> +int >> +be (long unsigned int l7, int nt) >> +{ >> + int en; >> + >> + jo = l7; >> + for (en = 0; en < 24; ++en) >> +{ >> + jo = (jo / z4) * (!!jo >= ((!!nt) & 2)); >> + if (jo == 0) >> +nt = 0; >> + else >> +{ >> + nt = z4; >> + ++z4; >> + nt = (long unsigned int) nt == (l7 + 1); >> +} >> +} >> + >> + return nt; >> +} >> >> >> Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 259229) --- gcc/ChangeLog (revision 259230) *** *** 1,5 --- 1,12 2018-04-09 Andrey Belevantsev + PR rtl-optimization/83913 + + * sel-sched-ir.c (merge_expr_data): Choose the middle between two + different sched-times when merging exprs. + + 2018-04-09 Andrey Belevantsev + PR rtl-optimization/83962 * sel-sched-ir.c (tidy_control_flow): Correct the order in which we call Index: gcc/testsuite/gcc.dg/pr83913.c === *** gcc/testsuite/gcc.dg/pr83913.c (revision 0) --- gcc/testsuite/gcc.dg/pr83913.c (revision 259230) *** *** 0 --- 1,26 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop -fno-web" } */ + + int jo, z4; + + int + be (long unsigned int l7, int nt) + { + int en; + + jo = l7; + for (en = 0; en < 24; ++en) + { + jo = (jo / z4) * (!!jo >= ((!!nt) & 2)); + if (jo == 0) + nt = 0; + else + { + nt = z4; + ++z4; + nt = (long unsigned int) nt == (l7 + 1); + } + } + + return nt; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 259229) ---
Re: Fix PR 83962
On 06.04.2018 18:59, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> This issues is about the correct order in which we need to call the >> routines that fix up the control flow for us. > > OK with formatting both in the new comment and the Changelog fixed. Thanks, fixed that in rev. 259229. Andrey > >> Best, >> Andrey >> >> 2018-04-03 Andrey Belevantsev>> >> PR rtl-optimization/83962 >> >> * sel-sched-ir.c (tidy_control_flow): Correct the order in which we call >> tidy_fallthru_edge >> and tidy_control_flow. >> >> * gcc.dg/pr83962.c: New test. >> >> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c >> index a965d2ec42f..f6de96a7f3d 100644 >> --- a/gcc/sel-sched-ir.c >> +++ b/gcc/sel-sched-ir.c >> @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying) >>&& INSN_SCHED_TIMES (BB_END (xbb)) == 0 >>&& !IN_CURRENT_FENCE_P (BB_END (xbb))) >> { >> - if (sel_remove_insn (BB_END (xbb), false, false)) >> -return true; >> + /* We used to call sel_remove_insn here that can trigger >> tidy_control_flow >> + before we fix up the fallthru edge. Correct that ordering by >> +explicitly doing the latter before the former. */ >> + clear_expr (INSN_EXPR (BB_END (xbb))); >>tidy_fallthru_edge (EDGE_SUCC (xbb, 0)); >> + if (tidy_control_flow (xbb, false)) >> + return true; >> } >> >>first = sel_bb_head (xbb); >> diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c >> new file mode 100644 >> index 000..0547e218715 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pr83962.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ >> +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2 >> -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */ >> +unsigned int ca; >> + >> +void >> +v6 (long long unsigned int as, int p9) >> +{ >> + while (p9 < 1) >> +as = (as != ca) || (as > 1); >> +} >>
Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround
On Mon, 9 Apr 2018, Daniel Hellstrom wrote: > I received feedback from the webmaster that the preferred link is > "http://www.gaisler.com/notes; so I updated it just be pushing. Maybe, but then they should reconfigure their server such that it does not redirect. We've had to many cases in the past where redirects ended up being malign or simply broken, and I am not looking forward towards manually verifying this one every month. Gerald
Re: Fix PR 83530
On 06.04.2018 18:55, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> This issue is when we cannot correctly make insn tick data for the >> unscheduled code (but processed by the modulo scheduler). Fixed by closely >> following usual scheduling process as described in the PR. > > This is ok with the following nit-picks fixed. Thank, I've committed the attached. Andrey > >> 2018-04-03 Andrey Belevantsev>> >> PR rtl-optimization/83530 >> >> * sel-sched.c (force_next_insn): New global variable. >> (remove_insn_for_debug): When force_next_insn is true, also leave only >> next insn >> in the ready list. >> (sel_sched_region): When the region wasn't scheduled, make another pass >> over it >> with force_next_insn set to 1. > > Overlong lines. > >> * gcc.dg/pr8350.c: New test. > > Typo in test name. > >> --- a/gcc/sel-sched.c >> +++ b/gcc/sel-sched.c >> @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying) >> distinguishing between bookkeeping copies and original insns. */ >> static int max_uid_before_move_op = 0; >> >> +/* When true, we're always scheduling next insn on the already scheduled >> code >> + to get the right insn data for the following bundling or other passes. >> */ >> +static int force_next_insn = 0; >> + >> /* Remove from AV_VLIW_P all instructions but next when debug counter >> tells us so. Next instruction is fetched from BNDS. */ >> static void >> remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p) >> { >> - if (! dbg_cnt (sel_sched_insn_cnt)) >> + if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn) >> /* Leave only the next insn in av_vliw. */ >> { >>av_set_iterator av_it; >> @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn) >> sel_sched_region_1 (); >>else >> /* Force initialization of INSN_SCHED_CYCLEs for correct bundling. */ > > I believe this comment needs updating. > > Please also consider moving both assignments of reset_sched_cycles_p to > after the if-else statement, just before sel_region_finish call. > >> -reset_sched_cycles_p = true; >> +{ >> + reset_sched_cycles_p = false; >> + pipelining_p = false; >> + force_next_insn = 1; >> + sel_sched_region_1 (); >> + force_next_insn = 0; >> +} >> >>sel_region_finish (reset_sched_cycles_p); >> } Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 259227) --- gcc/ChangeLog (revision 259228) *** *** 1,3 --- 1,13 + 2018-04-09 Andrey Belevantsev + + PR rtl-optimization/83530 + + * sel-sched.c (force_next_insn): New global variable. + (remove_insn_for_debug): When force_next_insn is true, also leave only + next insn in the ready list. + (sel_sched_region): When the region wasn't scheduled, make another pass + over it with force_next_insn set to 1. + 2018-04-08 Monk Chiang * config.gcc (nds32le-*-*, nds32be-*-*): Add nds32/nds32_intrinsic.h Index: gcc/testsuite/gcc.dg/pr83530.c === *** gcc/testsuite/gcc.dg/pr83530.c (revision 0) --- gcc/testsuite/gcc.dg/pr83530.c (revision 259228) *** *** 0 --- 1,15 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */ + int vm, z0; + short int mz; + + int + ny (void) + { + int ch; + + for (ch = 0; ch < 6; ++ch) + vm += ch / vm; + + return z0 + mz; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 259227) --- gcc/testsuite/ChangeLog (revision 259228) *** *** 1,3 --- 1,8 + 2018-04-09 Andrey Belevantsev + + PR rtl-optimization/83530 + * gcc.dg/pr83530.c: New test. + 2018-04-07 Thomas Koenig PR middle-end/82976 Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 259227) --- gcc/sel-sched.c (revision 259228) *** remove_temp_moveop_nops (bool full_tidyi *** 5004,5015 distinguishing between bookkeeping copies and original insns. */ static int max_uid_before_move_op = 0; /* Remove from AV_VLIW_P all instructions but next when debug counter tells us so. Next instruction is fetched from BNDS. */ static void remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p) { ! if (! dbg_cnt (sel_sched_insn_cnt)) /* Leave only the next insn in av_vliw. */ { av_set_iterator av_it; --- 5004,5019 distinguishing between bookkeeping copies and original insns. */ static int
Re: [Aarch64] Fix conditional branches with target far away.
Hi Richard, I do not see the said patch applied in ToT yet. When do you expect it to be available in ToT? - Thanks and regards, Sameera D. On 30 March 2018 at 17:01, Sameera Deshpandewrote: > Hi Richard, > > The testcase is working with the patch you suggested, thanks for > pointing that out. > > On 30 March 2018 at 16:54, Sameera Deshpande > wrote: >> On 30 March 2018 at 16:39, Richard Sandiford >> wrote: Hi Sudakshina, Thanks for pointing that out. Updated the conditions for attribute length to take care of boundary conditions for offset range. Please find attached the updated patch. I have tested it for gcc testsuite and the failing testcase. Ok for trunk? On 22 March 2018 at 19:06, Sudakshina Das wrote: > Hi Sameera > > On 22/03/18 02:07, Sameera Deshpande wrote: >> >> Hi Sudakshina, >> >> As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the >> far branch instruction offset is inclusive of both the offsets. Hence, >> I am using <=||=> and not <||>= as it was in previous implementation. > > > I have to admit earlier I was only looking at the patch mechanically and > found a difference with the previous implementation in offset comparison. > After you pointed out, I looked up the ARMv8 ARM and I have a couple of > doubts: > > 1. My understanding is that any offset in [-1048576 ,1048572] both > inclusive > qualifies as an 'in range' offset. However, the code for both attribute > length and far_branch has been using [-1048576 ,1048572), that is, ( >= > && < > ). If the far_branch was incorrectly calculated, then maybe the length > calculations with similar magic numbers should also be corrected? Of > course, > I am not an expert in this and maybe this was a conscience decision so I > would ask Ramana to maybe clarify if he remembers. > > 2. Now to come back to your patch, if my understanding is correct, I > think a > far_branch would be anything outside of this range, that is, > (offset < -1048576 || offset > 1048572), anything that can not be > represented in the 21-bit range. > > Thanks > Sudi >>> >>> [...] >>> @@ -466,14 +459,9 @@ [(set_attr "type" "branch") (set (attr "length") (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) -(lt (minus (match_dup 2) (pc)) (const_int 1048572))) +(le (minus (match_dup 2) (pc)) (const_int 1048572))) (const_int 4) - (const_int 8))) >>> >>> Sorry for not replying earlier, but I think the use of "lt" rather than >>> "le" in the current length attribute is deliberate. Distances measured >>> from (pc) in "length" are a bit special in that backward distances are >>> measured from the start of the instruction and forward distances are >>> measured from the end of the instruction: >>> >>> /* The address of the current insn. We implement this actually as the >>> address of the current insn for backward branches, but the last >>> address of the next insn for forward branches, and both with >>> adjustments that account for the worst-case possible stretching of >>> intervening alignments between this insn and its destination. */ >>> >>> This avoids the chicken-and-egg situation of the length of the branch >>> depending on the forward distance and the forward distance depending >>> on the length of the branch. >>> >>> In contrast, this code: >>> @@ -712,7 +695,11 @@ { if (get_attr_length (insn) == 8) { - if (get_attr_far_branch (insn) == 1) + long long int offset; + offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0))) + - INSN_ADDRESSES (INSN_UID (insn)); + + if (offset < -1048576 || offset > 1048572) return aarch64_gen_far_branch (operands, 2, "Ltb", "\\t%0, %1, "); else >>> >>> is reading the final computed addresses, so the code is right to use >>> the real instruction range. (FWIW I agree with Kyrill that using >>> IN_RANGE with hex constants would be clearer.) >>> >>> That said... a possible problem comes from situations like: >>> >>>address length insn >>>..c 8 A >>> ..align to 8 bytes... >>>..8B >>>..c 4 C >>> ..align to 16 bytes... >>>..0D, branch to B >>> >>> when D is at the maximum extent of the branch range and when GCC's length >>> for A is only a conservative estimate. If the length of A turns out to >>> be 4 rather than 8 at assembly time, the alignment to 8
Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround
On 2018-04-07 22:08, Gerald Pfeifer wrote: On Thu, 1 Feb 2018, Daniel Hellstrom wrote: I would like to commit the following comment about LEON3-FT errata now available in GCC-7.3. Thanks, I have now committed it. Apparently the ultimate URL is the one below (which I applied for now)? Gerald I received feedback from the webmaster that the preferred link is "http://www.gaisler.com/notes; so I updated it just be pushing. Regards, Daniel Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.101 diff -u -r1.101 changes.html --- changes.html2 Apr 2018 15:22:38 - 1.101 +++ changes.html7 Apr 2018 20:05:52 - @@ -1324,7 +1324,8 @@ SPARC -Work arounds for the four http://www.gaisler.com/notes;> +Work arounds for the four https://www.gaisler.com/index.php/information/app-tech-notes;> LEON3FT errata GRLIB-TN-0010..0013 have been added. Relevant errata are activated by the target specific -mfix-ut699, -mfix-ut700 and -mfix-gr712rc switches.
Re: [PATCH/doc] - mention all optimization options that enable inlining options
On Sun, 8 Apr 2018, Martin Sebor wrote: > While updating the -Wrestrict option to mention that it works > best not just with -O2 but also at higher optimization levels > I looked around for other options that might benefit from > a similar clarification. I found a few inlining options that > only mention -O2 but that (according to -Q --help=optimizers) > are enabled at other levels as well. The attached patch > changes the manual to reflect that. OK. > Given the large number of options I didn't take the time to > check all optimization options so there could others that could > stand to be similarly improved if we want to be 100% accurate. > If that is, in fact, what we want then we might want to script > this. Yeah, note we now have -Og and -Ofast as well... Richard.