[PR71478] Fix ICE in tree-ssa-reassoc.c
Hi, In PR71478, for vector negation of ssa produced by call stmt, we add vector (-1) and ssa to the ops list. However, in the place where we remove the (-1) from ops list, we failed to do this for vector integer. As a result, rewrite_expr_tree wrongly assumes that it is working with gimple_assign. Attached patch fixes the place where we remove the vector (-1). Regression tested on x86-64-linux-gnu with no new regressions. Regression testing on aarc64-linux-gnu is ongoing. Is this OK for trunk? Thanks, Kugan gcc/testsuite/ChangeLog: 2016-06-10 Kugan Vivekanandarajah* gcc.dg/pr71478.c: New test. gcc/ChangeLog: 2016-06-10 Kugan Vivekanandarajah * tree-ssa-reassoc.c (reassociate_bb): Remove (-1) from ops list for vector integer type. diff --git a/gcc/testsuite/gcc.dg/pr71478.c b/gcc/testsuite/gcc.dg/pr71478.c index e69de29..7b9cce6 100644 --- a/gcc/testsuite/gcc.dg/pr71478.c +++ b/gcc/testsuite/gcc.dg/pr71478.c @@ -0,0 +1,21 @@ +/* PR target/71478 */ +/* { dg-require-effective-target vect_int } */ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +typedef unsigned int __attribute__ ((vector_size (8))) uv2si; +typedef int __attribute__ ((vector_size (8))) v2si; + +uv2si bar (v2si); + +uv2si +foo (void) +{ + v2si x = (v2si) (0x7fff80008000UL); + v2si y = (v2si) (0x8f997fffUL); + uv2si z = x >= y; + uv2si k = bar (x); + uv2si j = k * __builtin_shuffle (z, z, (uv2si) {1, 3}); + return k * j; +} + diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 36b34d3..e32d503 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -5303,8 +5303,7 @@ reassociate_bb (basic_block bb) && rhs_code == MULT_EXPR) { last = ops.last (); - if (((TREE_CODE (last->op) == INTEGER_CST - && integer_minus_onep (last->op)) + if ((integer_minus_onep (last->op) || real_minus_onep (last->op)) && !HONOR_SNANS (TREE_TYPE (lhs)) && (!HONOR_SIGNED_ZEROS (TREE_TYPE (lhs))
Re: [C++ RFC / Patch] Again about PR 70202
... and this version passes testing. For real. Paolo. Index: cp/decl.c === --- cp/decl.c (revision 237285) +++ cp/decl.c (working copy) @@ -6002,6 +6002,13 @@ check_initializer (tree decl, tree init, int flags tree init_code = NULL; tree core_type; + if (init && init != error_mark_node + && TREE_TYPE (init) == error_mark_node) +{ + TREE_TYPE (decl) = error_mark_node; + return NULL_TREE; +} + /* Things that are going to be initialized need to have complete type. */ TREE_TYPE (decl) = type = complete_type (TREE_TYPE (decl)); Index: testsuite/g++.dg/inherit/crash6.C === --- testsuite/g++.dg/inherit/crash6.C (revision 0) +++ testsuite/g++.dg/inherit/crash6.C (working copy) @@ -0,0 +1,10 @@ +// PR c++/70202 + +class A +{ + virtual void foo () { } +}; +class B : public A, A { }; // { dg-error "duplicate base type" } + +B b1, = b1; // { dg-error "incomplete type" } +A a = b2;
Go patch committed: fix error message quoting
This patch to the Go frontend fixes the quoting in an error message. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 236804) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -a87af72757d9a2e4479062a459a41d4540398005 +054ff1ece3dd5888a445efeaf3ae197b16d4186f The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 236804) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -11463,7 +11463,7 @@ Selector_expression::lower_method_expres if (method != NULL && !is_pointer && !method->is_value_method()) { - error_at(location, "method requires pointer (use %<(*%s).%s)%>", + error_at(location, "method requires pointer (use %<(*%s).%s%>)", nt->message_name().c_str(), Gogo::message_name(name).c_str()); return Expression::make_error(location);
[wwwdocs] Follow up on usability improvements patch
I submitted a patch for the website a few weeks ago, but received no response in the negative or positive. Wanted to check if it had been overlooked. Original patch can be found here: https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00767.html Thanks, Stuart
[PATCH, applied] Backport PowerPC ISA 3.0 splat changes to GCC 6.2
I have applied the following patches for ISA 3.0 (power9) that were installed on the trunk in May to the GCC 6.2 branch after doing a bootstrap and regression testing. This patch combines the original patch, and the two followup patches that fixed some bugs. It does remove the all_ones_predicate that had also been used by the ISA 3.0 min/max patch, and it was installed when that patch was back ported. [gcc] 2016-06-09 Michael MeissnerBack port from trunk 2016-05-31 Michael Meissner * config/rs6000/vsx.md (vsx_splat_, V2DI/V2DF): Simplify alternatives, eliminating preferred register class. Add support for the MTVSRDD instruction in ISA 3.0. (vsx_splat_v4si_internal): Use splat_input_operand instead of reg_or_indexed_operand. (vsx_splat_v4sf_internal): Likewise. Back port from trunk 2016-05-31 Michael Meissner PR target/71186 * config/rs6000/vsx.md (xxspltib__nosplit): Add alternatives for loading up all 0's or all 1's. Back port from trunk 2016-05-18 Michael Meissner PR target/70915 * config/rs6000/constraints.md (wE constraint): New constraint for a vector constant that can be loaded with XXSPLTIB. (wM constraint): New constraint for a vector constant of a 1's. (wS constraint): New constraint for a vector constant that can be loaded with XXSPLTIB and a vector sign extend instruction. * config/rs6000/predicates.md (xxspltib_constant_split): New predicates for wE/wS constraints. (xxspltib_constant_nosplit): Likewise. (easy_vector_constant): Add support for constants that can be loaded via XXSPLTIB. (splat_input_operand): Add support for ISA 3.0 word splat operations. * config/rs6000/rs6000.c (xxspltib_constant_p): New function to return if a constant can be loaded with the ISA 3.0 XXSPLTIB instruction and possibly with a sign extension. (output_vec_const_move): Add support for XXSPLTIB. If we are loading up 0/-1 into Altivec registers, prefer using VSPLTISW instead of XXLXOR/XXLORC. (rs6000_expand_vector_init): Add support for ISA 3.0 word splat operations. (rs6000_legitimize_reload_address): Likewise. (rs6000_output_move_128bit): Use output_vec_const_move to emit constants. * config/rs6000/vsx.md (VSX_M): Add TImode (if -mvsx-timode) and combine VSX_M and VSX_M2 into one iterator. (VSX_M2): Likewise. (VSINT_84): New iterators for loading constants with XXSPLTIB. (VSINT_842): Likewise. (UNSPEC_VSX_SIGN_EXTEND): New UNSPEC. (xxspltib_v16qi): New insns to load up constants with the ISA 3.0 XXSPLTIB instruction. (xxspltib__nosplit): Likewise. (xxspltib__split): New insn to load up constants with XXSPLTIB and a sign extend instruction. (vsx_mov): Replace single move that handled all vector types with separate 32-bit and 64-bit moves. Combine the movti_ moves (when -mvsx-timode is in effect) into the main vector moves. Eliminate separate moves for , where the preferred register class () is listed first, and the secondary register class () is listed second with a '?' to discourage use. Prefer loading 0/-1 in any VSX register for ISA 3.0, and Altivec registers for ISA 2.06/2.07 (PR target/70915) so that if the register was involved in a slow operation, the clear/set operation does not wait for the slow operation to finish. Adjust the length attributes for 32-bit mode. Use rs6000_output_move_128bit and drop the use of the string instructions for 32-bit movti when -mvsx-timode is in effect. Use spacing so that the alternatives and attributes don't generate long lines, and put things in columns, so that it is easier to match up the operands and attributes with the insn alternatives. (vsx_mov_64bit): Likewise. (vsx_mov_32bit): Likewise. (vsx_movti_64bit): Fold movti into normal vector moves. (vsx_movti_32bit): Likewise. (vsx_splat_, V4SI/V4SF modes): Add support for ISA 3.0 word splat instructions. (vsx_splat_v4si_internal): Likewise. (vsx_splat_v4sf_internal): Likewise. (vector fusion peepholes): Use VSX_M instead of VSX_M2. (vsx_sign_extend_qi_): New ISA 3.0 instructions to sign extend vector elements. (vsx_sign_extend_hi_): Likewise. (vsx_sign_extend_si_v2di): Likewise. * config/rs6000/rs6000-protos.h (xxspltib_constant_p): Add declaration. * doc/md.texi (PowerPC constraints): Document the wE, wM, and wS constraints. Add
Re: [C++ RFC / Patch] Again about PR 70202
Hi, On 09/06/2016 23:38, Jason Merrill wrote: I would think that when we see a duplicated base we should just drop the duplicates and continue. If the type of b1 is error_mark_node, why isn't the type of b2 also error_mark_node? Thanks Jason. Normally check_initializer massages the decl basing on the init and so to speak transforms an error_mark_node as TREE_TYPE of the init into an error_mark_node as DECL_INITIAL of a reference. Thus we don't see error_mark_node as TREE_TYPE of b2. Now if I do something as trivial as the attached the testcase passes. Next step, analyze the huge breakage caused by that change, I don't dare running the testsuite ;) Thanks, Paolo. /// Index: decl.c === --- decl.c (revision 237280) +++ decl.c (working copy) @@ -6002,6 +6002,12 @@ check_initializer (tree decl, tree init, int flags tree init_code = NULL; tree core_type; + if (TREE_TYPE (init) == error_mark_node) +{ + TREE_TYPE (decl) = error_mark_node; + return NULL_TREE; +} + /* Things that are going to be initialized need to have complete type. */ TREE_TYPE (decl) = type = complete_type (TREE_TYPE (decl));
Re: [PATCH][AArch64] Enable -frename-registers at -O2 and higher
On Tue, May 31, 2016 at 2:56 AM, James Greenhalghwrote: > As you're proposing to have this on by default, I'd like to give a chance > to hear whether there is consensus as to this being the right choice for > the thunderx, xgene1, exynos-m1 and qdf24xx subtargets. I tested this on qdf24xx using SPEC 2006. I'm seeing a very small performance decrease on the int benchmarks which may just be noise, and an even smaller decrease on the FP side which is definitely in the noise. This is an out-of-order part with hardware register renaming, so this optimization probably doesn't do much for this target. I may also be seeing secondary issues, e.g. the pipeline description isn't as good as I would like yet. Jim
Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc
Bruce, The current AIX stdlib.h header file uses #define. This is exactly what I am trying to remove with the fixincludes patch. #define malloc __linux_malloc needs to be replaced with extern void *malloc(size_t) __asm__("__linux_malloc"); and so forth. Thanks, David On Thu, Jun 9, 2016 at 5:18 PM, Bruce Korbwrote: > It ought to work: > > $ for f in m re c v ; do printf '#define %salloc __linux_%salloc\n' $f $f >> done > foo.h > $ grep -E '[ \t](m|re|c|v)alloc +__linux_\1alloc' foo.h > #define malloc __linux_malloc > #define realloc __linux_realloc > #define calloc __linux_calloc > #define valloc __linux_valloc > > and your "%1" is then "m" or "re" or "c" or "v". > I can't test right now tho. > > On Thu, Jun 9, 2016 at 10:49 AM, David Edelsohn wrote: >> >> Hi, Bruce! >> >> I thought about a regex, but the aliases require a full function >> signature and the original, narrow context does not provide a function >> signature. If it was just alias XXXalloc as __linux_XXXalloc, it >> would be more straight forward. If there's a convenient way to add >> the other information, I'm eager to learn. >> >> Thanks, David >> >> On Thu, Jun 9, 2016 at 1:44 PM, Bruce Korb wrote: >> > He's retired, but he ain't dead. >> > I think these could be accomplished with a single fix. >> > Please try a regex expression in the selection and utilize the selection >> > in >> > the replacement. >> > I'll look at it when I have time (give me a few days) >> > >> > On Thu, Jun 9, 2016 at 10:29 AM, Jeff Law wrote: >> >> >> >> On 06/09/2016 11:25 AM, David Edelsohn wrote: >> >>> >> >>> AIX has added variants of malloc, realloc, calloc and valloc with >> >>> greater compatibility with Linux semantics, especially for NULL >> >>> addresses. The variants are declared in stdlib.h and use #define to >> >>> override the normal definition if _LINUX_SOURCE_COMPAT is defined, >> >>> e.g., >> >>> >> >>> #define malloc __linux_malloc >> >>> #define calloc __linux_calloc >> >>> #define realloc __linux_realloc >> >>> #define valloc __linux_valloc >> >>> >> >>> libstdc++-v3 cstdlib specifically undefines a number of stdlib.h >> >>> macros, >> >>> e.g., >> >>> >> >>> // Get rid of those macros defined in in lieu of real >> >>> functions. >> >>> ... >> >>> #undef malloc >> >>> #undef realloc >> >>> >> >>> C++ applications on AIX, especially users of BOOST that include >> >>> cstdlib, encounter unexpected behavior when the definition of malloc >> >>> changes from the expected / requested version. >> >>> >> >>> The following patch updates fixincludes to correct the AIX stdlib.h >> >>> header by converting the #define to GCC asm aliases. I created a >> >>> separate fix for each definition because the order is not guaranteed. >> >>> >> >>> Bootstrapped on powerpc-ibm-aix7.1.0.0. This fixes a recent node.js >> >>> build failure on AIX due to additional dependence on BOOST. >> >>> >> >>> Okay for trunk, GCC 6 and GCC 5? >> >>> >> >>> Thanks, David >> >>> >> >>> * inclhack.def (aix_stdlib_malloc): New fix. >> >>> (aix_stdlib_realloc): New fix. >> >>> (aix_stdlib_calloc): New fix. >> >>> (aix_stdlib_valloc): New fix. >> >>> * fixincl.x: Regenerate. >> >>> * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test. >> >>> [AIX_STDLIB_REALLOC]: New test. >> >>> [AIX_STDLIB_CALLOC]: New test. >> >>> [AIX_STDLIB_VALLOC]: New test. >> >> >> >> Wow, fixincludesI'm not even sure if Bruce is around anymore... >> >> >> >> GIven these are conditional on mach= *-*-aix*, I think you can >> >> self-approve them. >> >> >> >> jeff >> >> >> > > >
Re: [PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently
On Thu, Jun 9, 2016 at 3:39 AM, Richard Bienerwrote: > On Thu, 9 Jun 2016, Jakub Jelinek wrote: > >> On Thu, Jun 09, 2016 at 08:50:15AM +0200, Richard Biener wrote: >> > On Wed, 8 Jun 2016, Jason Merrill wrote: >> > >> > > On Wed, Jun 8, 2016 at 11:16 AM, Marc Glisse >> > > wrote: >> > > > On Wed, 8 Jun 2016, Richard Biener wrote: >> > > > >> > > >> The following works around PR70992 but the issue came up repeatedly >> > > >> that we are not very consistent in preserving the undefined behavior >> > > >> of division or modulo by zero. Ok - the only inconsistency is >> > > >> that we fold 0 % x to 0 but not 0 % 0 (with literal zero). >> > > >> >> > > >> After folding is now no longer done early in the C family FEs the >> > > >> number of diagnostic regressions with the patch below is two. >> > > >> >> > > >> FAIL: g++.dg/cpp1y/constexpr-sfinae.C -std=c++14 (test for excess >> > > >> errors) >> > > >> > > Yep. We don't want to fold away undefined behavior in a constexpr >> > > function, since constexpr evaluation wants to detect undefined >> > > behavior and treat the expression as non-constant in that case. >> > >> > Hmm. So 0 / x is not constant because x might be zero but 0 * x is >> > constant because it can never invoke undefined behavior? Does this mean >> > that 0 << n is not const because n might be too large (I suppose >> > 0 << 12000 is not const already)? Is 0 * (-x) const? x might be INT_MIN. >> >> E.g. for the shifts the C++ FE has cxx_eval_check_shift_p which should >> optionally warn and/or set *non_constant_p. 0 * (-INT_MIN) would be >> non-constant too, etc. >> constexpr int foo (int x) { return -x; } >> constexpr int bar (int x) { return 0 * (-x); } >> constexpr int a = foo (-__INT_MAX__ - 1); >> constexpr int b = bar (-__INT_MAX__ - 1); >> shows that we don't diagnose the latter though, most likely because >> constexpr evaluation is done on the folded tree. I don't think there's any folding before constexpr evaluation in this case, since this doesn't involve a call. >> So, either whatever cp_fold does (which uses fold* underneath) should avoid >> folding such cases, or the constexpr evaluation should be done on a copy >> made before folding. Then cp_fold doesn't have to prohibit optimizations >> and it is a matter of constexpr.c routines to detect all the undefined >> behavior. After all, I think doing constexpr evaluation on unfolded trees >> will have also the advantage of better diagnostic locations. > > Yes, I think constexpr diagnostic / detection should be done on > unfolded trees (not sure why we'd need to a copy here, just do folding > later?). If cp_fold already avoids folding 0 * (-x) then it should > simply avoid folding 0 / x or 0 % x as well (for the particular issue > this thread started on). The issue is with constexpr functions, which get delayed folding like any other function before they are used in constant expression evaluation. The copy would be to preserve the unfolded trees through cp_fold_function. I've been thinking about doing this anyway; this may be the motivation to go ahead with it. Jason
Re: [PATCH] Add ggc-tests.c
On 06/06/2016 03:31 PM, David Malcolm wrote: Jeff approved an earlier version of this (as unittests/test-ggc.c): https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03306.html Not terribly happy with that counter to used to create a big list to detect recursion. But I'm not offhand sure how to avoid without exposing more of the ggc system that is wise. OK if/when prereqs are approved. Minor twiddling if we end up moving it elsewhere or standardizing/reducing header files is pre-approved. This version moves it to gcc/ggc-tests.c and ports it to the new -fself-test approach. For now, I also reduced the count within TEST_F (ggc_test, chain_next) from 2 million to 10, to avoid slowing down the test (though the former takes only about 0.5s on my box). I've also fixed things so that it works with both checked and unchecked builds. Note that the GTY roots within ggc-tests.c are wrapped with #if CHECKING_P which implies that PCH files would be incompatible between release vs checked builds (I'm not sure whether or not that's already the case). I wouldn't expect PCH files to be compatible across release vs checked builds. It might work, but that'd likely be more accident than design (though I am trying to squash out cases where structures change their lengths based on things like ENABLE_CHECKING). I've also added various new tests since Jeff's review, for: * GTY((length)), * unions, and * GTY((user)) Successfully bootstrapped on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * Makefile.in (OBJS): Add ggc-tests.o. (GTFILES): Add ggc-tests.c. * ggc-tests.c: New file. + +static void +test_chain_next () +{ + /* 2 million nodes (and thus the same number of stack frames) ought + to be deep enough to crash if gengtype has created something + that recurses. + + This length reliably causes the test to segfault without the + chain_next/prev optimization on this box (Fedora 20 x86_64 with 128GB + of RAM), but causes this test to take about 0.5s, dominating the time + taken by the overall testsuite. + + We could perhaps lower this by not increasing the stack size so much + in toplev.c, or perhaps reducing the stack size when running this + testcase. */ + const int count = 10; // 200; ISTM like you need to update the comment. And the trailing // 200 should be eliminated. I'm just not sure how to really test for this properly. I'd be tempted to remove it. OK with the comment fixes for test_chain_next. jeff
Minor tweak to df_note_bb_compute
This simply prevents valgrind from complaining about an invalid read when pass_free_cfg::execute calls df_analyze for targets with delay slots, because var_location instructions are present in the RTL stream at this point. Tested on x86_64-suse-linux, applied on the mainline as obvious. 2016-06-09 Eric Botcazou* df-problems.c (df_note_bb_compute): Guard use of DF_INSN_INFO_GET. -- Eric BotcazouIndex: df-problems.c === --- df-problems.c (revision 237165) +++ df-problems.c (working copy) @@ -3498,13 +3498,13 @@ df_note_bb_compute (unsigned int bb_inde FOR_BB_INSNS_REVERSE (bb, insn) { + if (!INSN_P (insn)) + continue; + df_insn_info *insn_info = DF_INSN_INFO_GET (insn); df_mw_hardreg *mw; int debug_insn; - if (!INSN_P (insn)) - continue; - debug_insn = DEBUG_INSN_P (insn); bitmap_clear (do_not_gen);
Re: [PATCH] Fix PR 71407 : Use correct types for live usage of live operations
On 06/09/2016 10:03 AM, Alan Hayward wrote: This patch fixes PR 71407 by ensuring the BIT_FIELD_REF is created using the given vectype and then casted to the result type. Tested on x86 and aarch64. Note that the test vectorizes on x86 but does not vectorize on aarch64 or power (due to a != statement failing to vectorize) Ok to commit? gcc/ PR tree-optimization/71407 PR tree-optimization/71416 * tree-vect-loop.c (vectorizable_live_operation): Use vectype for BIT_FIELD_REF type. Testsuite/ PR tree-optimization/71407 PR tree-optimization/71416 * gcc.dg/vect/pr71407.c: New * gcc.dg/vect/pr71416-1.c: New * gcc.dg/vect/pr71416-2.c: New OK. jeff
Re: [C++ RFC / Patch] Again about PR 70202
I would think that when we see a duplicated base we should just drop the duplicates and continue. If the type of b1 is error_mark_node, why isn't the type of b2 also error_mark_node? Jason On Thu, Jun 9, 2016 at 1:01 PM, Paolo Carliniwrote: > Hi again, > > thus today I had to revert my first try at resolving this error recovery > issue because it caused c++/71465. In retrospect it was a bit heavy handed > anyway, the zeroing of the type served us well for many years... > > Today I tried to investigate the issue a bit more. I remind you that we are > crashing on the gcc_unreachable () at the end of build_simple_base_path > while, during error recovery, we are processing the initialization of the > final 'a' in: > > class A > { > virtual void foo () { } > }; > class B : public A, A { }; // { dg-error "duplicate base type" } > > B b1, = b1; > A a = b2; > > Now the first comment I have about this, about which I'd like to have some > feedback, is that the gcc_unreachable () at the end of > build_simple_base_path without a return seems a bit weird to me, because the > function is *supposed to return a tree*, not void. I guess some front-ends > would even warn on this?!? Anyway, if we change: > > > /* */ > > gcc_unreachable (); > } > > to: > > /* */ > > gcc_assert (seen_error ()); > return error_mark_node; > } > > we get something which, by and large is equally safe in terms of > miscompilations and helps a lot error recovery, ie, we would not have > c++/70202. It's also true however that we would not make much progress in > trying to understand if we could solve the specific problem at issue in a > more insightful way. > > Then my next observation is the following: if we change the testcase to: > > class A > { > virtual void foo () { } > }; > class B : public A, A { }; // { dg-error "duplicate base type" } > > B b1, = b1; > A a = b1; > > thus everything is the same but at the end we have 'b1', not 'b2', we also > don't crash. We are saved not by something particularly smart ;) but by this > check at the beginning of ocp_convert: > > if (error_operand_p (e) || type == error_mark_node) > return error_mark_node; > > that is, the TREE_TYPE of 'e', the VAR_DECL for 'b1', is error_mark_node and > we return early the error_mark_node, we don't reach the problematic > build_base_path. With 'b2' we have a more complicated REFERENCE_REF which we > don't catch, but otherwise everything is the same, we reach build_base_path > and we crash on the gcc_unreachable. In practice, if I add at the beginning > of ocp_convert: > > + if (REFERENCE_REF_P (expr) > + && VAR_P (TREE_OPERAND (expr, 0)) > + && DECL_INITIAL (TREE_OPERAND (expr, 0)) == error_mark_node) > +return error_mark_node; > > it also passes testing. Then the final observation is that we also don't > crash if the type of 'A' is simpler, thus we don't have the declaration of > > virtual void foo () { } > > Then I gather that when the type is "complex" enough we are not cleaning > things up completely enough when we encounter it as a duplicated base. Maybe > as optimal error recovery when we see a duplicated base we should clean up > everything having to do with bases end up with something equivalent to an > incomplete class B; or something like that?!? I don't know what we want to > try short / medium /long term... > > Thanks! > Paolo. > >
[PATCH] PR bootstrap/71481: fix input.c selftest
input.c's selftest::test_reading_source_line attempted to read from __FILE__, which doesn't work if the binary is run from a different location than the build dir. Fix it by rewriting the test to write out a tempfile, and read from that, rather than from __FILE__. I used make_temp_file to create the name for the temporary file, on the grounds that that's what the driver uses for that purpose. This is on top of the patch kit posted as: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00735.html Successfully bootstrapped on x86_64-pc-linux-gnu Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 OK for trunk? gcc/ChangeLog: PR bootstrap/71481 * input.c (selftest::test_reading_source_line): Avoid reading from __FILE__ by creating a tempfile with known content and reading from that instead. --- gcc/input.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/gcc/input.c b/gcc/input.c index 704ee75..08019c9 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1219,23 +1219,33 @@ test_builtins () static void test_reading_source_line () { - /* We will read *this* source file, using __FILE__. - Here is some specific text to read and test for: - The quick brown fox jumps over the lazy dog. */ - const int linenum_after_test_message = __LINE__; - const int linenum = linenum_after_test_message - 1; - + /* Create a tempfile and write some text to it. */ + char *filename = make_temp_file (".txt"); + ASSERT_NE (filename, NULL); + FILE *out = fopen (filename, "w"); + if (!out) +::selftest::fail_formatted (SELFTEST_LOCATION, + "unable to open tempfile: %s", filename); + fprintf (out, + "01234567890123456789\n" + "This is the test text\n" + "This is the 3rd line\n"); + fclose (out); + + /* Read back a specific line from the tempfile. */ int line_size; - const char *source_line = location_get_source_line (__FILE__, linenum, _size); + const char *source_line = location_get_source_line (filename, 2, _size); ASSERT_TRUE (source_line != NULL); - ASSERT_EQ (53, line_size); - if (!strncmp (" The quick brown fox jumps over the lazy dog. */", - source_line, line_size)) + ASSERT_EQ (21, line_size); + if (!strncmp ("This is the test text", + source_line, line_size)) ::selftest::pass (SELFTEST_LOCATION, "source_line matched expected value"); else ::selftest::fail (SELFTEST_LOCATION, "source_line did not match expected value"); + + free (filename); } /* Run all of the selftests within this file. */ -- 1.8.5.3
Re: [PING] [PATCH] c/71392 - SEGV calling integer overflow built-ins with a null pointer
On 06/09/2016 12:34 PM, Martin Sebor wrote: Attached is an updated version of the original patch described below to annotate with the nonnull attribute the Built-In Functions to Perform Arithmetic with Overflow Checking. Since the machinery that's in place doesn't handle the attribute on type-generic built-ins changes to the (handle_nonnull_attribute function in Ada and LTO besides C were required so I CC the maintainers for these areas. Thanks Martin On 06/02/2016 05:22 PM, Martin Sebor wrote: In a discussion of a patch in a this area (c/68120 and c++/70507) Jakub noticed that the integer arithmetic built-ins with overflow checking that expect a pointer to an integer as the last argument silently (i.e., without a warning) accept a null pointer. As the test case in the bug referenced in in subject shows, such calls then crash at runtime. The attached patch follows the same approach used by other built ins that take a pointer to an object (such as __built_strlen) to issue a -Wnonnull warning for such invalid calls. Martin gcc-71392.diff PR c/71392 - SEGV calling integer overflow built-ins with a null pointer gcc/ChangeLog: PR c/71392 * builtin-attrs.def (ATTR_NOTHROW_NONNULL_LEAF_LIST): New macro. (ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF): Same. * builtins.def (BUILT_IN_SADD_OVERFLOW, BUILT_IN_SADDL_OVERFLOW): Use them. (BUILT_IN_SADDLL_OVERFLOW, BUILT_IN_SSUB_OVERFLOW): Same. (BUILT_IN_SSUBL_OVERFLOW, BUILT_IN_SSUBLL_OVERFLOW): Same. (BUILT_IN_SMUL_OVERFLOW, BUILT_IN_SMULL_OVERFLOW): Same. (BUILT_IN_SMULLL_OVERFLOW, BUILT_IN_UADD_OVERFLOW): Same. (BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW): Same. (BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW): Same. (BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW): Same. (BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW): gcc/ada/ChangeLog: PR c/71392 * gcc/ada/gcc-interface/utils.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/c-family/ChangeLog: PR c/71392 * gcc/c-family/c-common.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/lto/ChangeLog: PR c/71392 * gcc/lto/lto-lang.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/testsuite/ChangeLog: PR c/71392 * c-c++-common/builtin-arith-overflow-1.c: Add test cases. OK for the trunk. THanks, Jeff
Re: [PING] [PATCH] Fix ICE with x87 asm operands (PR inline-asm/68843)
On 05/29/2016 08:37 AM, Bernd Edlinger wrote: gcc: 2016-05-22 Bernd EdlingerPR inline-asm/68843 * reg-stack.c (check_asm_stack_operands): Explicit input arguments must be grouped on top of stack. Don't force early clobber on ordinary reg outputs. testsuite: 2016-05-22 Bernd Edlinger PR inline-asm/68843 * gcc.target/i386/pr68843-1.c: New test. * gcc.target/i386/pr68843-2.c: New test. OK for the trunk. Sorry for the wait, Jeff
Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x
On Thu, Jun 09, 2016 at 04:06:53PM -0400, Michael Meissner wrote: > I'm including the global reviewers on the list. I just want to be sure that > there is no problem installing these patches on the GCC 6.2 branch. While it > is technically an enchancement, it is needed to be able to install the glibc > support that is needed to complete the work to add IEEE 128-bit floating > point. > > The issue being fixed is that when we are creating the complex type, we used > to > do a lookup for the size, and that fails on the PowerPC which has 2 128-bit > floating point types (__ibm128 and __float128, with long double currently > defaulting to __ibm128). The rs6000 parts are okay to backport to 6. Thanks, Segher
Re: [PATCH] Fixes to must-tail-call tests
On 05/27/2016 11:44 AM, David Malcolm wrote: On Fri, 2016-05-27 at 13:29 +0100, Thomas Preudhomme wrote: Hi Rainer, On Wednesday 25 May 2016 11:31:12 Rainer Orth wrote: David Malcolmwrites: The following fixes the known failures of the must-tail-call tests. Tested with --target= * aarch64-unknown-linux-gnu * ia64-unknown-linux-gnu * m68k-unknown-linux-gnu * x86_64-pc-linux-gnu Even with this patch, there are still failures on sparc-sun -solaris2.12: FAIL: gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail -call-1.c:1 2:10: error: cannot tail-call: target is not able to optimize the call into a sibling call FAIL: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail -call-2.c:3 2:10: error: cannot tail-call: target is not able to optimize the call into a sibling call My aim with these tests was to try to cover the various ways in which mandatory tail-call optimization can fail. However, this is very target-dependent, and, as written, the test over -specifies the output. Sorry about this. I've run the test on all of the configurations in contrib/config -list.mk (using the patch kit in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02100.html ) Collated output can be seen here: https://dmalcolm.fedorapeople.org/gcc/2016-05-27/must-tail-call-logs.txt showing all the different error messages across every configuration. It's not clear to me what the best way forward here is. We could simply check for error: cannot tail-call: and leave the precise messages we're checking for unspecified. If we care about checking for the precise messages, one of more duplicate copy(s) of the test could be provided, filtering by target to specify precise targets, giving more precise output, where this is known. I'm attaching a patch (sans ChangeLog) which strips the precise messages in the manner described above. Retesting on all targets with this patch, the only remaining failures are of the form: must-tail-call-1.c:12:10: error: cannot tail-call: machine description does not have a sibcall_epilogue instruction pattern on targets lacking the pattern. Thoughts? I probably should have caught this -- there's all kinds of oddball reasons why we might not be able to optimize a tail call, some of which are target dependent. I think trying to give highly precise messages is doomed to long term maintenance headaches because they're going to be target dependent. So I'd go with just stripping the precise messages like you've done. jeff
Re: Revert gcc r227962
On 05/29/2016 08:33 AM, JonY wrote: On 5/23/2016 16:56, JonY wrote: On 5/20/2016 06:36, JonY wrote: On 5/20/2016 02:11, Jeff Law wrote: So if we make this change (revert 227962), my understanding is that cygwin bootstraps will fail because they won't find kernel32 and perhaps other libraries. Jeff I'll need to double check with trunk but gcc-5.3.0 built OK without it. The other alternative is to search /usr/lib before w32api. yep it reached stage 3 but failed from another error building target libraries (libcilkrts), meaning it was able to find the w32api libraries even with this patch reverted. Has it been reverted? I managed to bootstrap after disabling the failing libraries. I've reverted the patch on the trunk. THanks for your patience. jeff>
Re: [PATCH] Remove old diagnostic macros.
On 06/09/2016 12:35 PM, Marcin Baczyński wrote: 2016-06-09 18:35 GMT+02:00 Jeff Law: On 05/30/2016 05:19 PM, Marcin Baczyński wrote: Hi, this is my first GCC patch, so please bear with me if something is wrong with it in an obvious way. I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is? Bootstrapped on x86_64-pc-linux-gnu. 2016-05-31 Marcin Baczyński * gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer): delete. This is fine for the trunk. Please install. Thanks for the approval. As for installing, I don't have write access to the repository, so could someone with the access do it, please? OK. I took care of committing to the repository. Thanks, jeff
Re: move increase_alignment from simple to regular ipa pass
On 8 June 2016 at 20:38, Jan Hubickawrote: >> I think it would be nice to work towards transitioning >> flag_section_anchors to a flag on varpool nodes, thereby removing >> the Optimization flag from common.opt:fsection-anchors >> >> That would simplify the walk over varpool candidates. > > Makes sense to me, too. There are more candidates for sutff that should be > variable specific in common.opt (such as variable alignment, -fdata-sctions, > -fmerge-constants) and targets. We may try to do it in an easy to extend way > so incrementally we can get rid of those global flags, too. In this version I removed Optimization from fsection-anchors entry in common.opt, and gated the increase_alignment pass on flag_section_anchors != 0. Cross tested on arm*-*-*, aarch64*-*-*. Does it look OK ? > > One thing that needs to be done for LTO is sane merging, I guess in this case > it is clear that the variable should be anchored when its previaling > definition > is. Um could we determine during WPA if symbol is a section anchor for merging ? Seems to me SYMBOL_REF_ANCHOR_P is defined only on DECL_RTL and not at tree level. Do we have DECL_RTL info available during WPA ? Thanks, Prathamesh > > Honza >> >> Richard. >> >> > Thanks, >> > Prathamesh >> > > >> > > Honza >> > >> >> > >> Richard. >> > >> > >> >> -- >> Richard Biener >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB >> 21284 (AG Nuernberg) diff --git a/gcc/common.opt b/gcc/common.opt index f0d7196..f93f26c 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2133,7 +2133,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization Enable the dependent count heuristic in the scheduler. fsection-anchors -Common Report Var(flag_section_anchors) Optimization +Common Report Var(flag_section_anchors) Access data in the same section from shared anchor points. fsee diff --git a/gcc/passes.def b/gcc/passes.def index 3647e90..3a8063c 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -138,12 +138,12 @@ along with GCC; see the file COPYING3. If not see PUSH_INSERT_PASSES_WITHIN (pass_ipa_tree_profile) NEXT_PASS (pass_feedback_split_functions); POP_INSERT_PASSES () - NEXT_PASS (pass_ipa_increase_alignment); NEXT_PASS (pass_ipa_tm); NEXT_PASS (pass_ipa_lower_emutls); TERMINATE_PASS_LIST (all_small_ipa_passes) INSERT_PASSES_AFTER (all_regular_ipa_passes) + NEXT_PASS (pass_ipa_increase_alignment); NEXT_PASS (pass_ipa_whole_program_visibility); NEXT_PASS (pass_ipa_profile); NEXT_PASS (pass_ipa_icf); diff --git a/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c new file mode 100644 index 000..74eaed8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target section_anchors } */ +/* { dg-require-effective-target vect_int } */ + +#define N 32 + +/* Clone of section-anchors-vect-70.c with foo() having -fno-tree-loop-vectorize. */ + +static struct A { + int p1, p2; + int e[N]; +} a, b, c; + +__attribute__((optimize("-fno-tree-loop-vectorize"))) +int foo(void) +{ + for (int i = 0; i < N; i++) +a.e[i] = b.e[i] + c.e[i]; + + return a.e[0]; +} + +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target arm*-*-* } } } */ diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 36299a6..d36aa1d 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -483,7 +483,7 @@ extern simple_ipa_opt_pass *make_pass_local_optimization_passes (gcc::context *c extern ipa_opt_pass_d *make_pass_ipa_whole_program_visibility (gcc::context *ctxt); -extern simple_ipa_opt_pass *make_pass_ipa_increase_alignment (gcc::context +extern ipa_opt_pass_d *make_pass_ipa_increase_alignment (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt); diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2669813..d34e560 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -899,6 +899,34 @@ get_vec_alignment_for_type (tree type) return (alignment > TYPE_ALIGN (type)) ? alignment : 0; } +/* Return true if alignment should be increased for this vnode. + This is done if every function that references/referring to vnode + has flag_tree_loop_vectorize and flag_section_anchors set. */ + +static bool +increase_alignment_p (varpool_node *vnode) +{ + ipa_ref *ref; +
Re: move increase_alignment from simple to regular ipa pass
> On 8 June 2016 at 20:38, Jan Hubickawrote: > >> I think it would be nice to work towards transitioning > >> flag_section_anchors to a flag on varpool nodes, thereby removing > >> the Optimization flag from common.opt:fsection-anchors > >> > >> That would simplify the walk over varpool candidates. > > > > Makes sense to me, too. There are more candidates for sutff that should be > > variable specific in common.opt (such as variable alignment, -fdata-sctions, > > -fmerge-constants) and targets. We may try to do it in an easy to extend > > way > > so incrementally we can get rid of those global flags, too. > In this version I removed Optimization from fsection-anchors entry in > common.opt, > and gated the increase_alignment pass on flag_section_anchors != 0. > Cross tested on arm*-*-*, aarch64*-*-*. > Does it look OK ? If you go this way you will need to do something sane for LTO. Here one can compile some object files with -fsection-anchors and other without and link with random setting (because in traditional compilation linktime flags does not matter). For global flags we have magic in merge_and_complain that determines flags to pass to the LTO compiler. It is not very robust though. > > > > One thing that needs to be done for LTO is sane merging, I guess in this > > case > > it is clear that the variable should be anchored when its previaling > > definition > > is. > Um could we determine during WPA if symbol is a section anchor for merging ? > Seems to me SYMBOL_REF_ANCHOR_P is defined only on DECL_RTL and not at > tree level. > Do we have DECL_RTL info available during WPA ? We don't have anchros computed, but we can decide whether we want to potentially anchor the variable if we can. I would say all you need is to have section_anchor flag in varpool node itself which controls RTL production. At varpool_finalize_decl you will set it according to flag_varpool and stream it to LTO objects. At WPA when doing linking, the section_anchor flag of the previaling decl wins.. Honza > > Thanks, > Prathamesh > > > > Honza > >> > >> Richard. > >> > >> > Thanks, > >> > Prathamesh > >> > > > >> > > Honza > >> > >> > >> > >> Richard. > >> > > >> > > >> > >> -- > >> Richard Biener > >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > >> 21284 (AG Nuernberg) > diff --git a/gcc/common.opt b/gcc/common.opt > index f0d7196..f93f26c 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2133,7 +2133,7 @@ Common Report Var(flag_sched_dep_count_heuristic) > Init(1) Optimization > Enable the dependent count heuristic in the scheduler. > > fsection-anchors > -Common Report Var(flag_section_anchors) Optimization > +Common Report Var(flag_section_anchors) > Access data in the same section from shared anchor points. > > fsee > diff --git a/gcc/passes.def b/gcc/passes.def > index 3647e90..3a8063c 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -138,12 +138,12 @@ along with GCC; see the file COPYING3. If not see >PUSH_INSERT_PASSES_WITHIN (pass_ipa_tree_profile) >NEXT_PASS (pass_feedback_split_functions); >POP_INSERT_PASSES () > - NEXT_PASS (pass_ipa_increase_alignment); >NEXT_PASS (pass_ipa_tm); >NEXT_PASS (pass_ipa_lower_emutls); >TERMINATE_PASS_LIST (all_small_ipa_passes) > >INSERT_PASSES_AFTER (all_regular_ipa_passes) > + NEXT_PASS (pass_ipa_increase_alignment); >NEXT_PASS (pass_ipa_whole_program_visibility); >NEXT_PASS (pass_ipa_profile); >NEXT_PASS (pass_ipa_icf); > diff --git a/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c > b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c > new file mode 100644 > index 000..74eaed8 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target section_anchors } */ > +/* { dg-require-effective-target vect_int } */ > + > +#define N 32 > + > +/* Clone of section-anchors-vect-70.c with foo() having > -fno-tree-loop-vectorize. */ > + > +static struct A { > + int p1, p2; > + int e[N]; > +} a, b, c; > + > +__attribute__((optimize("-fno-tree-loop-vectorize"))) > +int foo(void) > +{ > + for (int i = 0; i < N; i++) > +a.e[i] = b.e[i] + c.e[i]; > + > + return a.e[0]; > +} > + > +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 > "increase_alignment" { target aarch64*-*-* } } } */ > +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 > "increase_alignment" { target powerpc64*-*-* } } } */ > +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 > "increase_alignment" { target arm*-*-* } } } */ > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > index 36299a6..d36aa1d 100644 > --- a/gcc/tree-pass.h > +++ b/gcc/tree-pass.h > @@ -483,7 +483,7 @@ extern simple_ipa_opt_pass > *make_pass_local_optimization_passes (gcc::context *c > > extern
Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)
On Thu, 9 Jun 2016, Nathan Sidwell wrote: > > (define_expand "restore_stack_block" > >[(match_operand 0 "register_operand" "") > > (match_operand 1 "register_operand" "")] > > you've not addressed my previous comments about this. To be clear -- do you mean that "restore_stack_block" should have a comment mentioning why "save_stack_block" is not overridden? > > @@ -0,0 +1,25 @@ > > > +char *__nvptx_stacks[32] __attribute__((shared,nocommon)); > > Is there a reason this can't be in crt0? It's also needed for offloading compilation, which doesn't link crt0. > It should be 'void *' Also, why '32' when only slot zero is initialized? > ISTM that this should be size 1. Offloading will need the full width. Thanks. Alexander
Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x
I'm including the global reviewers on the list. I just want to be sure that there is no problem installing these patches on the GCC 6.2 branch. While it is technically an enchancement, it is needed to be able to install the glibc support that is needed to complete the work to add IEEE 128-bit floating point. The issue being fixed is that when we are creating the complex type, we used to do a lookup for the size, and that fails on the PowerPC which has 2 128-bit floating point types (__ibm128 and __float128, with long double currently defaulting to __ibm128). On Fri, Jun 03, 2016 at 09:33:35AM -0400, Michael Meissner wrote: > These patches were installed on the trunk on May 2nd, with a fix from Alan > Modra on May 11th. Unless I here objections in the next few days, I will > commit these changes to the GCC 6.x branch. These changes will allow people > to > use complex __float128 types (via an attribute) on the PowerPC. > > Note, we will need patches to libgcc to fully enable complex __float128 > support > on the PowerPC. These patches enable the compiler support, so that the libgcc > changes can be coded. > > In addition to bootstrapping and regtesting on the PowerPC (little endian > power8), I also bootstrapped and regested the changes on x86_64 running RHEL > 6.2. There were no regressions in either case. > > [gcc] > 2016-06-02 Michael Meissner> > Back port from trunk > 2016-05-11 Alan Modra > > * config/rs6000/rs6000.c (is_complex_IBM_long_double, > abi_v4_pass_in_fpr): New functions. > (rs6000_function_arg_boundary): Exclude complex IBM long double > from 64-bit alignment when ABI_V4. > (rs6000_function_arg, rs6000_function_arg_advance_1, > rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr. > > Back port from trunk > 2016-05-02 Michael Meissner > > * machmode.h (mode_complex): Add support to give the complex mode > for a given mode. > (GET_MODE_COMPLEX_MODE): Likewise. > * stor-layout.c (layout_type): For COMPLEX_TYPE, use the mode > stored by build_complex_type and gfc_build_complex_type instead of > trying to figure out the appropriate mode based on the size. Raise > an assertion error, if the type was not set. > * genmodes.c (struct mode_data): Add field for the complex type of > the given type. > (blank_mode): Likewise. > (make_complex_modes): Remember the complex mode created in the > base type. > (emit_mode_complex): Write out the mode_complex array to map a > type mode to the complex version. > (emit_insn_modes_c): Likewise. > * tree.c (build_complex_type): Set the complex type to use before > calling layout_type. > * config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Add > support for __float128 complex datatypes. > (rs6000_hard_regno_mode_ok): Likewise. > (rs6000_setup_reg_addr_masks): Likewise. > (rs6000_complex_function_value): Likewise. > * config/rs6000/rs6000.h (FLOAT128_IEEE_P): Likewise. > __float128 and __ibm128 complex. > (FLOAT128_IBM_P): Likewise. > (ALTIVEC_ARG_MAX_RETURN): Likewise. > * doc/extend.texi (Additional Floating Types): Document that > -mfloat128 must be used to enable __float128. Document complex > __float128 and __ibm128 support. > > [gcc/fortran] > 2016-06-02 Michael Meissner > > Back port from trunk > 2016-05-02 Michael Meissner > > * trans-types.c (gfc_build_complex_type): > > [gcc/testsuite] > 2016-06-02 Michael Meissner > > Back port from trunk > 2016-05-02 Michael Meissner > > * gcc.target/powerpc/float128-complex-1.c: New tests for complex > __float128. > * gcc.target/powerpc/float128-complex-2.c: Likewise. > > -- > Michael Meissner, IBM > IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA > email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 > Index: gcc/machmode.h > === > --- gcc/machmode.h(revision 237045) > +++ gcc/machmode.h(working copy) > @@ -269,6 +269,10 @@ extern const unsigned char mode_wider[NU > extern const unsigned char mode_2xwider[NUM_MACHINE_MODES]; > #define GET_MODE_2XWIDER_MODE(MODE) ((machine_mode) mode_2xwider[MODE]) > > +/* Get the complex mode from the component mode. */ > +extern const unsigned char mode_complex[NUM_MACHINE_MODES]; > +#define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE]) > + > /* Return the mode for data of a given size SIZE and mode class CLASS. > If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE. > The value is BLKmode if no other mode is found. */ >
Re: [PATCH, RS6000] Add RS6000_BTM_MODULO to set of RS6000_BTM_COMMON flags
[ Sorry I missed this ] On Thu, Jun 02, 2016 at 02:51:37PM -0600, Kelvin Nilsen wrote: > This patch adds the RS6000_BTM_MODULO flag to to the set of flags > associated with the RS6000_BTM_COMMON variable. > > This patch has bootstrapped with the trunk and the gcc-6-branch on > powerpc64le-unknown-linux-gnu and there were no regressions. > > Is it ok to merge this with the trunk? Can I merge with the > gcc-6-branch after waiting a few days following the trunk integration? Okay, thank you! Segher
Re: [PATCH 0/9] separate shrink-wrapping
On Thu, Jun 09, 2016 at 10:12:53AM -0600, Jeff Law wrote: > I'm going to largely let Bernd own the review on this. Just a few comments. > > I certainly like the concept. My mental model is that parts of the > prologue might sink further than other parts of the prologue. It's not > an exact match, but I think it's a "close enough" mental model. The "normal", "main" shrink-wrapping sinks the prologue, duplicating blocks that can be run both with and without prologue. It always keeps one prologue, and any path through the function will execute the prologue at most once. Separate shrink-wrapping can put the prologue pieces in more than one spot (each), and also execute them more than once. But it does not copy blocks, that can explode the code size too much if there are many concerns. The standard example: | 1 / \ 2 3 \ / 4 / \ 5 6 \ / 7 | where 3 and 6 have some concern. If 3 and 6 together have a lower execution frequency than 1 does, it is better to put prologues and epilogues around 3 and 6; otherwise, it is better to have a prologue before 1 and an epilogue after 7. > It looks like the "concerns" are all target defined and its the target's > responsibility to deal with dependencies between the "concerns". Right? Exactly. There are not many dependencies in general (except for the obvious "setting up a stack frame has to come before almost everything else"). I have found no reasonably simple way yet to have the target inform the generic code about the dependencies, and nothing uses it so far anyway. So I just punt the responsibility it to the target code, for now anyway. > (BTW, I think "concerns" is a particularly poor name, perhaps > "reasons" would be better?). I think that's even worse. "pieces", maybe? > Right now it looks like shrink_wrapped_separate essentially says "we did > something special here" -- while I don't think it's necessary for this > patch, describing what we did might be better. Essentially different > paths would have a set of prologue/epilogue attributes that apply to > that path. It is used by a few later passes to say "don't touch some stuff, you do not have all the info you need". This is analogous to the existing "shrink_wrapped" flag. > This would be useful for things like cfgcleanup where you could join > noreturn calls more aggressively. Though this may not matter in any > significant way in practice. Oh, it already tried to join such paths more aggressively. Which then blows up big time in the usual dwarf2cfi spot. > There may be enough commonality that we can promote some "concerns" from > target dependent to target independent -- I've often wondered if we > could handle a fair amount of prologue/epilogue generation in target > independent ways.For simple targets, I wouldn't be terribly > surprised if all prologue/epilogue generation could be handled > generically and they'd get separate shrink-wrapping "for free". But > that's all blue-sky. I don't think that will work to well, unfortunately. Segher
Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)
On 06/02/16 17:22, Alexander Monakov wrote: On Wed, 25 May 2016, Nathan Sidwell wrote: It seems like we should reject the combination of -msoft-stack -fopenacc? Possibly; the doc text makes it explicit that the option is exposed only for the purpose of testing the compiler, anyway. It is always best to prevent the user doing something you don't recommend, rather than presume they'll be sensible. No change for that yet in the respin; do you want something like if (flag_openacc && TARGET_SOFT_STACK) sorry ("-fopenacc and -msoft-stack are mutually exclusive"); in nvptx_override_options? Yes, but it's not a sorry, just a regular error. +static void +init_softstack_frame (FILE *file, unsigned alignment, unsigned size) +{ + /* Maintain 64-bit stack alignment. */ This still needs blank lines to aid readibility. @@ -1027,6 +1186,21 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) REGNO (cfun->machine->axis_predicate[1]), "x"); } +/* Output instruction that sets soft stack pointer in shared memory to the + value in register given by SRC_REGNO. */ + +const char * +nvptx_output_set_softstack (unsigned src_regno) +{ + if (cfun->machine->has_softstack && !crtl->is_leaf) +{ + fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ", + POINTER_SIZE, reg_names[SOFTSTACK_SLOT_REGNUM]); + output_reg (asm_out_file, src_regno, VOIDmode); + fprintf (asm_out_file, ";\n"); +} + return ""; I think this would be clearer by having the softstack insn's pattern mention both the SRC and softstack reg and then simply return either "" or "st.shared%t\\t%0, %1" (and do that directly in the insn's emitter section rather than an fn call?) Then also have an epilogue expander emit the insn, rather than embed it in the return emitter. --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md (define_expand "restore_stack_block" [(match_operand 0 "register_operand" "") (match_operand 1 "register_operand" "")] you've not addressed my previous comments about this. @@ -0,0 +1,25 @@ +char *__nvptx_stacks[32] __attribute__((shared,nocommon)); Is there a reason this can't be in crt0? It should be 'void *' Also, why '32' when only slot zero is initialized? ISTM that this should be size 1.
Re: [PATCH] Remove old diagnostic macros.
2016-06-09 18:35 GMT+02:00 Jeff Law: > On 05/30/2016 05:19 PM, Marcin Baczyński wrote: >> >> Hi, >> this is my first GCC patch, so please bear with me if something is wrong >> with >> it in an obvious way. >> >> I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is? >> Bootstrapped on x86_64-pc-linux-gnu. >> >> 2016-05-31 Marcin Baczyński >> >>* gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer): >> delete. > > This is fine for the trunk. Please install. > Thanks for the approval. As for installing, I don't have write access to the repository, so could someone with the access do it, please? > THanks, > jeff
[PING] [PATCH] c/71392 - SEGV calling integer overflow built-ins with a null pointer
Attached is an updated version of the original patch described below to annotate with the nonnull attribute the Built-In Functions to Perform Arithmetic with Overflow Checking. Since the machinery that's in place doesn't handle the attribute on type-generic built-ins changes to the (handle_nonnull_attribute function in Ada and LTO besides C were required so I CC the maintainers for these areas. Thanks Martin On 06/02/2016 05:22 PM, Martin Sebor wrote: In a discussion of a patch in a this area (c/68120 and c++/70507) Jakub noticed that the integer arithmetic built-ins with overflow checking that expect a pointer to an integer as the last argument silently (i.e., without a warning) accept a null pointer. As the test case in the bug referenced in in subject shows, such calls then crash at runtime. The attached patch follows the same approach used by other built ins that take a pointer to an object (such as __built_strlen) to issue a -Wnonnull warning for such invalid calls. Martin PR c/71392 - SEGV calling integer overflow built-ins with a null pointer gcc/ChangeLog: PR c/71392 * builtin-attrs.def (ATTR_NOTHROW_NONNULL_LEAF_LIST): New macro. (ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF): Same. * builtins.def (BUILT_IN_SADD_OVERFLOW, BUILT_IN_SADDL_OVERFLOW): Use them. (BUILT_IN_SADDLL_OVERFLOW, BUILT_IN_SSUB_OVERFLOW): Same. (BUILT_IN_SSUBL_OVERFLOW, BUILT_IN_SSUBLL_OVERFLOW): Same. (BUILT_IN_SMUL_OVERFLOW, BUILT_IN_SMULL_OVERFLOW): Same. (BUILT_IN_SMULLL_OVERFLOW, BUILT_IN_UADD_OVERFLOW): Same. (BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW): Same. (BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW): Same. (BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW): Same. (BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW): gcc/ada/ChangeLog: PR c/71392 * gcc/ada/gcc-interface/utils.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/c-family/ChangeLog: PR c/71392 * gcc/c-family/c-common.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/lto/ChangeLog: PR c/71392 * gcc/lto/lto-lang.c (handle_nonnull_attribute): Accept the nonnull attribute in type-generic builtins. gcc/testsuite/ChangeLog: PR c/71392 * c-c++-common/builtin-arith-overflow-1.c: Add test cases. Index: gcc/ada/gcc-interface/utils.c === --- gcc/ada/gcc-interface/utils.c (revision 237267) +++ gcc/ada/gcc-interface/utils.c (working copy) @@ -5833,10 +5833,14 @@ /* If no arguments are specified, all pointer arguments should be non-null. Verify a full prototype is given so that the arguments - will have the correct types when we actually check them later. */ + will have the correct types when we actually check them later. + Avoid diagnosing type-generic built-ins since those have no + prototype. */ if (!args) { - if (!prototype_p (type)) + if (!prototype_p (type) + && TYPE_ATTRIBUTES (type) + && !lookup_attribute ("type generic", TYPE_ATTRIBUTES (type))) { error ("nonnull attribute without arguments on a non-prototype"); *no_add_attrs = true; Index: gcc/builtin-attrs.def === --- gcc/builtin-attrs.def (revision 237267) +++ gcc/builtin-attrs.def (working copy) @@ -165,6 +165,7 @@ /* Nothrow leaf functions whose pointer parameter(s) are all nonnull. */ DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_LEAF, ATTR_NONNULL, ATTR_NULL, \ ATTR_NOTHROW_LEAF_LIST) +DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_LEAF_LIST, ATTR_LEAF, ATTR_NULL, ATTR_NOTHROW_NONNULL_LEAF) /* Nothrow functions whose first parameter is a nonnull pointer. */ DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_1, ATTR_NONNULL, ATTR_LIST_1, \ ATTR_NOTHROW_LIST) @@ -183,6 +184,10 @@ /* Nothrow leaf functions which are type-generic. */ DEF_ATTR_TREE_LIST (ATTR_NOTHROW_TYPEGENERIC_LEAF, ATTR_TYPEGENERIC, ATTR_NULL, \ ATTR_NOTHROW_LEAF_LIST) +/* Nothrow nonnull leaf functions that are type-generic. */ +DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF, + ATTR_TYPEGENERIC, ATTR_NULL, + ATTR_NOTHROW_NONNULL_LEAF) /* Nothrow const functions whose pointer parameter(s) are all nonnull. */ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL) Index: gcc/builtins.def === --- gcc/builtins.def (revision 237267) +++ gcc/builtins.def (working copy) @@ -707,31 +707,31 @@ DEF_C94_BUILTIN(BUILT_IN_TOWUPPER, "towupper", BT_FN_WINT_WINT, ATTR_PURE_NOTHROW_LEAF_LIST) /* Category: integer overflow checking builtins. */ -DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW, "add_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) -DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW, "sub_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) -DEF_GCC_BUILTIN
Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN
On Thu, Jun 09, 2016 at 04:25:32PM +, Joseph Myers wrote: > On Thu, 9 Jun 2016, Jeff Law wrote: > > > On 05/26/2016 09:04 AM, David Malcolm wrote: > > > When using config-list.mk to build all configurations, it's useful > > > to filter out the configurations that are known to be broken. > > > > > > This patch does so, adding a KNOWN_BROKEN variable. > > > > > > contrib/ChangeLog: > > > * config-list.mk (LIST): Rename to... > > > (FULL_LIST): ...this. > > > (KNOWN_BROKEN): New variable. > > > (LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN. > > I'd rather fix, deprecate and/or remove these kinds of targets. > > > > Given that interix has been broken for a long time, I'll approve a patch > > that > > removes it. > > The rule was meant to be that a target is deprecated in one release, then > removed in the next - the error message given without --enable-obsolete > says so. So all the targets that require --enable-obsolete in GCC 6 > should be removed for GCC 7 in the absence of interest in reviving them. yeah, I have patches to do that (including interix which was obsoleted), but moving and a hard drive failure got in the way of sending them. I'll try and get those out shortly. Trev
[PATCH 3/3] pretty-print.c: skip color selftests if GCC_COLORS is set
gcc/ChangeLog: * pretty-print.c (assert_pp_format_colored): Skip the test if GCC_COLORS is set. (test_pp_format): Remove comment about GCC_COLORS. --- gcc/pretty-print.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c index 86ae3a5..3713953 100644 --- a/gcc/pretty-print.c +++ b/gcc/pretty-print.c @@ -1266,6 +1266,12 @@ static void assert_pp_format_colored (const location , const char *expected, const char *fmt, ...) { + /* The tests of colorization assume that the default color scheme. + If GCC_COLORS is set, then the colors have potentially been + overridden; skip the test. */ + if (getenv ("GCC_COLORS")) +return; + va_list ap; va_start (ap, fmt); @@ -1347,7 +1353,6 @@ test_pp_format () ASSERT_PP_FORMAT_2 ("normal colored normal 12345678", "normal %rcolored%R normal %x", "error", 0x12345678); - /* The following assumes an empty value for GCC_COLORS. */ assert_pp_format_colored (SELFTEST_LOCATION, "normal \33[01;31m\33[Kcolored\33[m\33[K normal 12345678", -- 1.8.5.3
[PATCH 2/3] selftests: improve reported failure locations
This patch introduce a selftest::location struct to wrap up __FILE__ and __LINE__ information (and __FUNCTION__) throughout the selftests, allowing location information to be passed around. It updates the helper functions in pretty-print.c to pass through the precise location of each test, so that if a failure occurs, the correct line number is printed, rather than a line within a helper function. gcc/ChangeLog: * input.c (test_reading_source_line): Use SELFTEST_LOCATION. * pretty-print.c (assert_pp_format_va): Add location param and use it with ASSERT_STREQ_AT. (assert_pp_format): Add location param and pass it to assert_pp_format_va. (assert_pp_format_colored): Likewise. (ASSERT_PP_FORMAT_1): New. (ASSERT_PP_FORMAT_2): New. (ASSERT_PP_FORMAT_3): New. (test_pp_format): Provide SELFTEST_LOCATION throughout, either explicitly, or implicitly via the above macros. * selftest.c (selftest::pass): Use a selftest::location rather than file and line. (selftest::fail): Likewise. Print the function name. (selftest::fail_formatted): Likewise. (selftest::assert_streq): Use a selftest::location rather than file and line. * selftest.h (selftest::location): New struct. (SELFTEST_LOCATION): New macro. (selftest::pass): Accept a const location & rather than file and line. (selftest::fail): Likewise. (selftest::fail_formatted): Likewise. (selftest::assert_streq): Likewise. (ASSERT_TRUE): Update for above changes, using SELFTEST_LOCATION. (ASSERT_FALSE): Likewise. (ASSERT_EQ): Likewise. (ASSERT_NE): Likewise. (ASSERT_STREQ): Likewise. (ASSERT_PRED1): Likewise. (ASSERT_STREQ_AT): New macro. --- gcc/input.c| 4 +- gcc/pretty-print.c | 128 - gcc/selftest.c | 20 - gcc/selftest.h | 60 ++--- 4 files changed, 134 insertions(+), 78 deletions(-) diff --git a/gcc/input.c b/gcc/input.c index 0b340a8..704ee75 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1231,10 +1231,10 @@ test_reading_source_line () ASSERT_EQ (53, line_size); if (!strncmp (" The quick brown fox jumps over the lazy dog. */", source_line, line_size)) -::selftest::pass (__FILE__, __LINE__, +::selftest::pass (SELFTEST_LOCATION, "source_line matched expected value"); else -::selftest::fail (__FILE__, __LINE__, +::selftest::fail (SELFTEST_LOCATION, "source_line did not match expected value"); } diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c index 8febda0..86ae3a5 100644 --- a/gcc/pretty-print.c +++ b/gcc/pretty-print.c @@ -1227,8 +1227,8 @@ test_basic_printing () prints EXPECTED, assuming that pp_show_color is SHOW_COLOR. */ static void -assert_pp_format_va (const char *expected, bool show_color, const char *fmt, -va_list *ap) +assert_pp_format_va (const location , const char *expected, +bool show_color, const char *fmt, va_list *ap) { pretty_printer pp; text_info ti; @@ -1243,34 +1243,59 @@ assert_pp_format_va (const char *expected, bool show_color, const char *fmt, pp_show_color () = show_color; pp_format (, ); pp_output_formatted_text (); - ASSERT_STREQ (expected, pp_formatted_text ()); + ASSERT_STREQ_AT (loc, expected, pp_formatted_text ()); } /* Verify that pp_format (FMT, ...) followed by pp_output_formatted_text prints EXPECTED, with show_color disabled. */ static void -assert_pp_format (const char *expected, const char *fmt, ...) +assert_pp_format (const location , const char *expected, + const char *fmt, ...) { va_list ap; va_start (ap, fmt); - assert_pp_format_va (expected, false, fmt, ); + assert_pp_format_va (loc, expected, false, fmt, ); va_end (ap); } /* As above, but with colorization enabled. */ static void -assert_pp_format_colored (const char *expected, const char *fmt, ...) +assert_pp_format_colored (const location , const char *expected, + const char *fmt, ...) { va_list ap; va_start (ap, fmt); - assert_pp_format_va (expected, true, fmt, ); + assert_pp_format_va (loc, expected, true, fmt, ); va_end (ap); } +/* Helper function for calling testing pp_format, + by calling assert_pp_format with various numbers of arguments. + These exist mostly to avoid having to write SELFTEST_LOCATION + throughout test_pp_format. */ + +#define ASSERT_PP_FORMAT_1(EXPECTED, FMT, ARG1) \ + SELFTEST_BEGIN_STMT\ +assert_pp_format ((SELFTEST_LOCATION), (EXPECTED), (FMT), \ + (ARG1));\ + SELFTEST_END_STMT + +#define ASSERT_PP_FORMAT_2(EXPECTED, FMT,
[PATCH 1/3] selftest: show values when ASSERT_STREQ fails
Rework ASSERT_STREQ so that it prints the actual and expected values to stderr when it fails (by moving it to a helper function). gcc/ChangeLog: * selftest.c (selftest::fail_formatted): New function. (selftest::assert_streq): New function. * selftest.h (selftests::fail_formatted): New decl. (selftest::assert_streq): New decl. (ASSERT_STREQ): Reimplement in terms of selftest::assert_streq. --- gcc/selftest.c | 32 gcc/selftest.h | 24 +++- 2 files changed, 47 insertions(+), 9 deletions(-) diff --git a/gcc/selftest.c b/gcc/selftest.c index de804df..e5332db 100644 --- a/gcc/selftest.c +++ b/gcc/selftest.c @@ -44,4 +44,36 @@ selftest::fail (const char *file, int line, const char *msg) abort (); } +/* As "fail", but using printf-style formatted output. */ + +void +selftest::fail_formatted (const char *file, int line, const char *fmt, ...) +{ + va_list ap; + + fprintf (stderr, "%s:%i: FAIL: ", file, line); + /* TODO: add calling function name as well? */ + va_start (ap, fmt); + vfprintf (stderr, fmt, ap); + va_end (ap); + fprintf (stderr, "\n"); + abort (); +} + +/* Implementation detail of ASSERT_STREQ. */ + +void +selftest::assert_streq (const char *file, int line, + const char *desc_expected, const char *desc_actual, + const char *val_expected, const char *val_actual) +{ + if (0 == strcmp (val_expected, val_actual)) +::selftest::pass (file, line, "ASSERT_STREQ"); + else +::selftest::fail_formatted + (file, line, "ASSERT_STREQ (%s, %s) expected=\"%s\" actual=\"%s\"", +desc_expected, desc_actual, val_expected, val_actual); +} + + #endif /* #if CHECKING_P */ diff --git a/gcc/selftest.h b/gcc/selftest.h index d1f8acc..6759734 100644 --- a/gcc/selftest.h +++ b/gcc/selftest.h @@ -39,6 +39,17 @@ extern void pass (const char *file, int line, const char *msg); extern void fail (const char *file, int line, const char *msg); +/* As "fail", but using printf-style formatted output. */ + +extern void fail_formatted (const char *file, int line, const char *fmt, ...) + ATTRIBUTE_PRINTF_3; + +/* Implementation detail of ASSERT_STREQ. */ + +extern void assert_streq (const char *file, int line, + const char *desc_expected, const char *desc_actual, + const char *val_expected, const char *val_actual); + /* Declarations for specific families of tests (by source file), in alphabetical order. */ extern void bitmap_c_tests (); @@ -123,15 +134,10 @@ extern int num_passes; ::selftest::pass if they are equal, ::selftest::fail if they are non-equal. */ -#define ASSERT_STREQ(EXPECTED, ACTUAL)\ - SELFTEST_BEGIN_STMT \ - const char *desc = "ASSERT_STREQ (" #EXPECTED ", " #ACTUAL ")"; \ - const char *expected_ = (EXPECTED);\ - const char *actual_ = (ACTUAL);\ - if (0 == strcmp (expected_, actual_)) \ -::selftest::pass (__FILE__, __LINE__, desc); \ - else\ -::selftest::fail (__FILE__, __LINE__, desc); \ +#define ASSERT_STREQ(EXPECTED, ACTUAL) \ + SELFTEST_BEGIN_STMT \ + ::selftest::assert_streq (__FILE__, __LINE__, #EXPECTED, #ACTUAL, \ + (EXPECTED), (ACTUAL)); \ SELFTEST_END_STMT /* Evaluate PRED1 (VAL1), calling ::selftest::pass if it is true, -- 1.8.5.3
[PATCH 0/3] selftest improvements
PR bootstrap/71471 highlighted the need for selftests to provide more information when they fail. The report contained: src/gcc/pretty-print.c:1246: FAIL: ASSERT_STREQ (expected, pp_formatted_text ()) This showed a string-equality failure within the helper function assert_pp_format_va, but didn't show which actual comparison was failing. Patch 1 updates ASSERT_STREQ so that the output includes the actual and expected strings, which would have allowed us to identify the failing test by searching the source. Patch 2 introduces a way to pass location information to helper functions so that the file/line that's printed for a failure points to the specific case, rather than to the inside of the helper function, and uses it within pretty-print.c's selftests. This would have allowed us to directly identify the failing test. Patch 3 fixes a bug that turned out not to be the cause of the PR. I've tested the combination of these patches: * successful bootstrap on x86_64-pc-linux-gnu * successful build/-fselftest of stage 1 on powerpc-ibm-aix7.1.3.0 OK for trunk? David Malcolm (3): selftest: show values when ASSERT_STREQ fails selftests: improve reported failure locations pretty-print.c: skip color selftests if GCC_COLORS is set gcc/input.c| 4 +- gcc/pretty-print.c | 135 + gcc/selftest.c | 40 ++-- gcc/selftest.h | 76 +- 4 files changed, 177 insertions(+), 78 deletions(-) -- 1.8.5.3
Re: [PATCH] Add selftest for pretty-print.c (v2)
On Thu, Jun 9, 2016 at 1:53 PM, David Malcolmwrote: > On Thu, 2016-06-09 at 11:22 -0600, Jeff Law wrote: >> On 06/09/2016 07:30 AM, David Edelsohn wrote: >> > >> > The self-tests specifically abort the build and break bootstrap >> > upon >> > failure. Most other changes that inadvertently have bugs or tickle >> > a >> > latent issue in a target will introduce some additional testsuite >> > failures, not a bootstrap failure. x86 developers seem to get >> > quite >> > annoyed when a patch causes a bootstrap failure for an x86 >> > configuration. >> > >> > Second, all of the large changes that may have unknown effects on >> > various targets have been tested extensively on multiple >> > architectures, as have most global optimization changes. It may >> > not >> > be required, but it generally has been considered "good form" and >> > has >> > been a stipulation of patch approval by some reviewers. It would >> > be >> > very unfortunate for GCC to lower the bar for patches by some >> > developers and not others. >> Let's all calm down a bit here. Everyone here just wants to make a >> better compiler and mistakes happen. > >> What I see in David Malcolm's change is a fairly minor bug. I don't >> think David (or anyone) could have really expected that %p is printed >> differently across different hosts and thus his patch would need >> wider >> host testing. And AFAICT David addressed this issue as soon as he >> started his day. >> >> So let's all take a deep breath and get back to improving GCC rather >> than taking jabs at each other. >> > > Sorry about the breakage. I've committed a fix as r237271, which I've > tested on PPC AIX (and on x86_64 linux). > > The selftest code is very new. I tested both it and the pretty-print.c > tests for every known-good *target* in config-list.mk; the issue here > was a *host*-specific issue. > > Maybe the current "fail the build on any selftest failures" is too > aggressive. That said, note that if one knows which file the failing > test is in (which we did), it's trivial to disable the tests in that > file by hacking gcc/selftests-run-tests.c and commenting out/deleting > the call: > > diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c > index 934e700..1c8128b 100644 > --- a/gcc/selftest-run-tests.c > +++ b/gcc/selftest-run-tests.c > @@ -46,7 +46,7 @@ selftest::run_tests () >hash_map_tests_c_tests (); >hash_set_tests_c_tests (); >vec_c_tests (); > - pretty_print_c_tests (); > + //pretty_print_c_tests (); >wide_int_cc_tests (); > > > whilst the underlying failure is investigated, so adding a new selftest > is presumably not as risky an event as, say, changing an optimizer: the > change is localized and can be readily disabled if it turns out to have > a config-specific assumption. > > The selftests currently in trunk aren't the most exciting; I'm much > more interested in the ggc-tests.c patch (awaiting review), since this > would finally give us self-testing of gengtype and ggc, which AFAIK we > haven't been able to test directly before. I hate gengtype, and it's > been a goal of mine to try to tame it since I started working on gcc. > (FWIW I've successfully tested the ggc patch on AIX PPC now, for stage > 1 at least, and for all targets in config-list.mk). > > Sorry again about the breakage. Thanks for fixing this so quickly. Maybe we need to consider some sort of "warn on failure" beta testing period for new self-tests before they cause errors. If self-tests can trigger host-dependent behavior and cause bootstrap failure as a consequence, we need to think about how this interacts with other GCC development policies. Thanks, David
Re: [PATCH] Add selftest for pretty-print.c (v2)
On Thu, 2016-06-09 at 11:22 -0600, Jeff Law wrote: > On 06/09/2016 07:30 AM, David Edelsohn wrote: > > > > The self-tests specifically abort the build and break bootstrap > > upon > > failure. Most other changes that inadvertently have bugs or tickle > > a > > latent issue in a target will introduce some additional testsuite > > failures, not a bootstrap failure. x86 developers seem to get > > quite > > annoyed when a patch causes a bootstrap failure for an x86 > > configuration. > > > > Second, all of the large changes that may have unknown effects on > > various targets have been tested extensively on multiple > > architectures, as have most global optimization changes. It may > > not > > be required, but it generally has been considered "good form" and > > has > > been a stipulation of patch approval by some reviewers. It would > > be > > very unfortunate for GCC to lower the bar for patches by some > > developers and not others. > Let's all calm down a bit here. Everyone here just wants to make a > better compiler and mistakes happen. > What I see in David Malcolm's change is a fairly minor bug. I don't > think David (or anyone) could have really expected that %p is printed > differently across different hosts and thus his patch would need > wider > host testing. And AFAICT David addressed this issue as soon as he > started his day. > > So let's all take a deep breath and get back to improving GCC rather > than taking jabs at each other. > Sorry about the breakage. I've committed a fix as r237271, which I've tested on PPC AIX (and on x86_64 linux). The selftest code is very new. I tested both it and the pretty-print.c tests for every known-good *target* in config-list.mk; the issue here was a *host*-specific issue. Maybe the current "fail the build on any selftest failures" is too aggressive. That said, note that if one knows which file the failing test is in (which we did), it's trivial to disable the tests in that file by hacking gcc/selftests-run-tests.c and commenting out/deleting the call: diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c index 934e700..1c8128b 100644 --- a/gcc/selftest-run-tests.c +++ b/gcc/selftest-run-tests.c @@ -46,7 +46,7 @@ selftest::run_tests () hash_map_tests_c_tests (); hash_set_tests_c_tests (); vec_c_tests (); - pretty_print_c_tests (); + //pretty_print_c_tests (); wide_int_cc_tests (); whilst the underlying failure is investigated, so adding a new selftest is presumably not as risky an event as, say, changing an optimizer: the change is localized and can be readily disabled if it turns out to have a config-specific assumption. The selftests currently in trunk aren't the most exciting; I'm much more interested in the ggc-tests.c patch (awaiting review), since this would finally give us self-testing of gengtype and ggc, which AFAIK we haven't been able to test directly before. I hate gengtype, and it's been a goal of mine to try to tame it since I started working on gcc. (FWIW I've successfully tested the ggc patch on AIX PPC now, for stage 1 at least, and for all targets in config-list.mk). Sorry again about the breakage. Dave
Re: [PATCH] Fix tests for x86 interrupt for -fpic and -march=corei7 targets
On Thu, Jun 9, 2016 at 4:01 PM, Koval, Juliawrote: > Hi, > > Here is a trivial patch, that fixes tests for these targets. Ok for trunk? > > gcc/testsuite/ChangeLog: > * gcc.target/i386/interrupt-12.c: Fix test for -fpic and corei7. > * gcc.target/i386/interrupt-13.c: Likewise. > * gcc.target/i386/interrupt-15.c: Likewise. > * gcc.target/i386/interrupt-14.c: Fix test for -fpic. > * gcc.target/i386/interrupt-24.c: Likewise. > * gcc.target/i386/interrupt-3.c: Fix test for corei7. > * gcc.target/i386/interrupt-9.c: Likewise. > * gcc.target/i386/interrupt-redzone-2.c: Likewise. OK. Thanks, Uros.
Re: Moving backwards/FSM threader into its own pass
On 06/09/2016 05:18 AM, Martin Liška wrote: Hello. The patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71466 THanks. I've got a few issues to address related to that change -- I've just been swamped with some personal stuff the last week. I'm seriously considering reverting the change at this point, addressing the issues, then re-installing as I expect to stay pretty busy in the immediate future. jeff
Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc
Hi, Bruce! I thought about a regex, but the aliases require a full function signature and the original, narrow context does not provide a function signature. If it was just alias XXXalloc as __linux_XXXalloc, it would be more straight forward. If there's a convenient way to add the other information, I'm eager to learn. Thanks, David On Thu, Jun 9, 2016 at 1:44 PM, Bruce Korbwrote: > He's retired, but he ain't dead. > I think these could be accomplished with a single fix. > Please try a regex expression in the selection and utilize the selection in > the replacement. > I'll look at it when I have time (give me a few days) > > On Thu, Jun 9, 2016 at 10:29 AM, Jeff Law wrote: >> >> On 06/09/2016 11:25 AM, David Edelsohn wrote: >>> >>> AIX has added variants of malloc, realloc, calloc and valloc with >>> greater compatibility with Linux semantics, especially for NULL >>> addresses. The variants are declared in stdlib.h and use #define to >>> override the normal definition if _LINUX_SOURCE_COMPAT is defined, >>> e.g., >>> >>> #define malloc __linux_malloc >>> #define calloc __linux_calloc >>> #define realloc __linux_realloc >>> #define valloc __linux_valloc >>> >>> libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros, >>> e.g., >>> >>> // Get rid of those macros defined in in lieu of real >>> functions. >>> ... >>> #undef malloc >>> #undef realloc >>> >>> C++ applications on AIX, especially users of BOOST that include >>> cstdlib, encounter unexpected behavior when the definition of malloc >>> changes from the expected / requested version. >>> >>> The following patch updates fixincludes to correct the AIX stdlib.h >>> header by converting the #define to GCC asm aliases. I created a >>> separate fix for each definition because the order is not guaranteed. >>> >>> Bootstrapped on powerpc-ibm-aix7.1.0.0. This fixes a recent node.js >>> build failure on AIX due to additional dependence on BOOST. >>> >>> Okay for trunk, GCC 6 and GCC 5? >>> >>> Thanks, David >>> >>> * inclhack.def (aix_stdlib_malloc): New fix. >>> (aix_stdlib_realloc): New fix. >>> (aix_stdlib_calloc): New fix. >>> (aix_stdlib_valloc): New fix. >>> * fixincl.x: Regenerate. >>> * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test. >>> [AIX_STDLIB_REALLOC]: New test. >>> [AIX_STDLIB_CALLOC]: New test. >>> [AIX_STDLIB_VALLOC]: New test. >> >> Wow, fixincludesI'm not even sure if Bruce is around anymore... >> >> GIven these are conditional on mach= *-*-aix*, I think you can >> self-approve them. >> >> jeff >> >
Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc
On 06/09/2016 11:25 AM, David Edelsohn wrote: AIX has added variants of malloc, realloc, calloc and valloc with greater compatibility with Linux semantics, especially for NULL addresses. The variants are declared in stdlib.h and use #define to override the normal definition if _LINUX_SOURCE_COMPAT is defined, e.g., #define malloc __linux_malloc #define calloc __linux_calloc #define realloc __linux_realloc #define valloc __linux_valloc libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros, e.g., // Get rid of those macros defined in in lieu of real functions. ... #undef malloc #undef realloc C++ applications on AIX, especially users of BOOST that include cstdlib, encounter unexpected behavior when the definition of malloc changes from the expected / requested version. The following patch updates fixincludes to correct the AIX stdlib.h header by converting the #define to GCC asm aliases. I created a separate fix for each definition because the order is not guaranteed. Bootstrapped on powerpc-ibm-aix7.1.0.0. This fixes a recent node.js build failure on AIX due to additional dependence on BOOST. Okay for trunk, GCC 6 and GCC 5? Thanks, David * inclhack.def (aix_stdlib_malloc): New fix. (aix_stdlib_realloc): New fix. (aix_stdlib_calloc): New fix. (aix_stdlib_valloc): New fix. * fixincl.x: Regenerate. * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test. [AIX_STDLIB_REALLOC]: New test. [AIX_STDLIB_CALLOC]: New test. [AIX_STDLIB_VALLOC]: New test. Wow, fixincludesI'm not even sure if Bruce is around anymore... GIven these are conditional on mach= *-*-aix*, I think you can self-approve them. jeff
[PATCH,FIXINCLUDES] AIX stdlib.h #define malloc
AIX has added variants of malloc, realloc, calloc and valloc with greater compatibility with Linux semantics, especially for NULL addresses. The variants are declared in stdlib.h and use #define to override the normal definition if _LINUX_SOURCE_COMPAT is defined, e.g., #define malloc __linux_malloc #define calloc __linux_calloc #define realloc __linux_realloc #define valloc __linux_valloc libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros, e.g., // Get rid of those macros defined in in lieu of real functions. ... #undef malloc #undef realloc C++ applications on AIX, especially users of BOOST that include cstdlib, encounter unexpected behavior when the definition of malloc changes from the expected / requested version. The following patch updates fixincludes to correct the AIX stdlib.h header by converting the #define to GCC asm aliases. I created a separate fix for each definition because the order is not guaranteed. Bootstrapped on powerpc-ibm-aix7.1.0.0. This fixes a recent node.js build failure on AIX due to additional dependence on BOOST. Okay for trunk, GCC 6 and GCC 5? Thanks, David * inclhack.def (aix_stdlib_malloc): New fix. (aix_stdlib_realloc): New fix. (aix_stdlib_calloc): New fix. (aix_stdlib_valloc): New fix. * fixincl.x: Regenerate. * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test. [AIX_STDLIB_REALLOC]: New test. [AIX_STDLIB_CALLOC]: New test. [AIX_STDLIB_VALLOC]: New test. Index: inclhack.def === --- inclhack.def(revision 237258) +++ inclhack.def(working copy) @@ -911,7 +911,49 @@ test_text = "#ifdef __cplusplus\n}\n\n#ifdef ferror"; }; +/* + * stdlib.h on AIX uses #define on malloc and friends. + */ +fix = { +hackname = aix_stdlib_malloc; +mach = "*-*-aix*"; +files = stdlib.h; +select= "#define[ \t]+malloc[ \t]+__linux_malloc"; +c_fix = format; +c_fix_arg = "extern void malloc(size_t) __asm__(\"__linux_malloc\");"; +test_text = "#define malloc __linux_malloc"; +}; +fix = { +hackname = aix_stdlib_realloc; +mach = "*-*-aix*"; +files = stdlib.h; +select= "#define[ \t]+realloc[ \t]+__linux_realloc"; +c_fix = format; +c_fix_arg = "extern void realloc(void *, size_t) __asm__(\"__linux_realloc\");"; +test_text = "#define realloc __linux_realloc"; +}; + +fix = { +hackname = aix_stdlib_calloc; +mach = "*-*-aix*"; +files = stdlib.h; +select= "#define[ \t]+calloc[ \t]+__linux_calloc"; +c_fix = format; +c_fix_arg = "extern void calloc(size_t, size_t) __asm__(\"__linux_calloc\");"; +test_text = "#define calloc __linux_calloc"; +}; + +fix = { +hackname = aix_stdlib_valloc; +mach = "*-*-aix*"; +files = stdlib.h; +select= "#define[ \t]+valloc[ \t]+__linux_valloc"; +c_fix = format; +c_fix_arg = "extern void valloc(size_t) __asm__(\"__linux_valloc\");"; +test_text = "#define valloc __linux_valloc"; +}; + /* * stdlib.h on AIX 4.3 declares strtof() with a non-const first argument. */
Re: [PATCH] PR bootstrap/71471: remove selftest for pp_format (%p)
On 06/09/2016 10:45 AM, David Malcolm wrote: I was confused by the comment to pp_format: /* The following format specifiers are recognized as being client independent: ... %p: pointer ... */ into thinking that %p is printed in a host-independent manner, when "client independent" means in relation to different pretty-printers e.g the C++ pretty printer, or normal tree one etc. The following patch removes the overzealous test case and clarifies the comment. OK for trunk? I'm working on a followup that makes selftest failures easier to track down. gcc/ChangeLog: PR bootstrap/71471 * pretty-print.c (pp_indent): Specify that %p is printed in a host-dependent manner. (test_pp_format): Remove the test for %p. OK. jeff
Re: [PATCH] Add selftest for pretty-print.c (v2)
On 06/09/2016 07:30 AM, David Edelsohn wrote: The self-tests specifically abort the build and break bootstrap upon failure. Most other changes that inadvertently have bugs or tickle a latent issue in a target will introduce some additional testsuite failures, not a bootstrap failure. x86 developers seem to get quite annoyed when a patch causes a bootstrap failure for an x86 configuration. Second, all of the large changes that may have unknown effects on various targets have been tested extensively on multiple architectures, as have most global optimization changes. It may not be required, but it generally has been considered "good form" and has been a stipulation of patch approval by some reviewers. It would be very unfortunate for GCC to lower the bar for patches by some developers and not others. Let's all calm down a bit here. Everyone here just wants to make a better compiler and mistakes happen. What I see in David Malcolm's change is a fairly minor bug. I don't think David (or anyone) could have really expected that %p is printed differently across different hosts and thus his patch would need wider host testing. And AFAICT David addressed this issue as soon as he started his day. So let's all take a deep breath and get back to improving GCC rather than taking jabs at each other. jeff
Re: Update probabilities in predict.def to match reality
> On 06/07/2016 09:27 PM, Jan Hubicka wrote: > > There are bugs in few predictors - goto predictor is dead because the FE > > code was dropped, > > return predictor is bit random because CFG is optimized (it should probably > > be done in FE), > > loop iv compare seems bogus and fortran fail alloc does not seem to work as > > intended. > > I added FIXME and will addres them incrementally. > > Hi. > > I've investigated why 'loop iv compare heuristics' provides bogus values and > I've just created > PR for that: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71474 > > And I would like to apply following patch, that adds 'XFAIL' test-case > described in the PR and > I would distinguish scanning for 'loop iv compare' and 'guess loop iv > compare'. > > Can I install the patch? > Thanks, > Martin > > >From 6272402f76d4e6ff496d55e9a4fac7ee9a696e4e Mon Sep 17 00:00:00 2001 > From: marxin> Date: Thu, 9 Jun 2016 16:53:32 +0200 > Subject: [PATCH] Make 'loop iv compare' heuristics scanning more precise in > test-suite > > gcc/testsuite/ChangeLog: > > 2016-06-09 Martin Liska > > * gcc.dg/predict-1.c: Distinguish between "loop iv compare" > and "guess loop iv compared" heuristics. > * gcc.dg/predict-2.c: Likewise. > * gcc.dg/predict-3.c: Likewise. > * gcc.dg/predict-4.c: Likewise. > * gcc.dg/predict-5.c: Likewise. > * gcc.dg/predict-6.c: Likewise. > * gfortran.dg/predict-1.f90: New test. I think the usual strategy is to add testcases once PR is fixed (so failures track regressions). So please just fix the predict-?.c templates and leave predict-1.f90 to go with the patch (I see you arleady proposed one in the PR log. OK for the predict-?.c changes. Thanks, Honza
Re: [C++ RFC / Patch] Again about PR 70202
On Thu, Jun 09, 2016 at 07:01:47PM +0200, Paolo Carlini wrote: > Now the first comment I have about this, about which I'd like to have some > feedback, is that the gcc_unreachable () at the end of > build_simple_base_path without a return seems a bit weird to me, because the > function is *supposed to return a tree*, not void. I guess some front-ends > would even warn on this?!? Anyway, if we change: > > > /* */ > > gcc_unreachable (); > } I'll just comment on this. gcc_unreachable () is a noreturn function, so this is perfectly fine IMHO. Jakub
[C++ RFC / Patch] Again about PR 70202
Hi again, thus today I had to revert my first try at resolving this error recovery issue because it caused c++/71465. In retrospect it was a bit heavy handed anyway, the zeroing of the type served us well for many years... Today I tried to investigate the issue a bit more. I remind you that we are crashing on the gcc_unreachable () at the end of build_simple_base_path while, during error recovery, we are processing the initialization of the final 'a' in: class A { virtual void foo () { } }; class B : public A, A { }; // { dg-error "duplicate base type" } B b1, = b1; A a = b2; Now the first comment I have about this, about which I'd like to have some feedback, is that the gcc_unreachable () at the end of build_simple_base_path without a return seems a bit weird to me, because the function is *supposed to return a tree*, not void. I guess some front-ends would even warn on this?!? Anyway, if we change: /* */ gcc_unreachable (); } to: /* */ gcc_assert (seen_error ()); return error_mark_node; } we get something which, by and large is equally safe in terms of miscompilations and helps a lot error recovery, ie, we would not have c++/70202. It's also true however that we would not make much progress in trying to understand if we could solve the specific problem at issue in a more insightful way. Then my next observation is the following: if we change the testcase to: class A { virtual void foo () { } }; class B : public A, A { }; // { dg-error "duplicate base type" } B b1, = b1; A a = b1; thus everything is the same but at the end we have 'b1', not 'b2', we also don't crash. We are saved not by something particularly smart ;) but by this check at the beginning of ocp_convert: if (error_operand_p (e) || type == error_mark_node) return error_mark_node; that is, the TREE_TYPE of 'e', the VAR_DECL for 'b1', is error_mark_node and we return early the error_mark_node, we don't reach the problematic build_base_path. With 'b2' we have a more complicated REFERENCE_REF which we don't catch, but otherwise everything is the same, we reach build_base_path and we crash on the gcc_unreachable. In practice, if I add at the beginning of ocp_convert: + if (REFERENCE_REF_P (expr) + && VAR_P (TREE_OPERAND (expr, 0)) + && DECL_INITIAL (TREE_OPERAND (expr, 0)) == error_mark_node) +return error_mark_node; it also passes testing. Then the final observation is that we also don't crash if the type of 'A' is simpler, thus we don't have the declaration of virtual void foo () { } Then I gather that when the type is "complex" enough we are not cleaning things up completely enough when we encounter it as a duplicated base. Maybe as optimal error recovery when we see a duplicated base we should clean up everything having to do with bases end up with something equivalent to an incomplete class B; or something like that?!? I don't know what we want to try short / medium /long term... Thanks! Paolo.
Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches
On Thu, Jun 09, 2016 at 07:53:52PM +0300, Alexander Monakov wrote: > I'm sending updated patch series with backend prerequisites for OpenMP > offloading to the NVIDIA PTX ISA. The first patch has already received some > comments and this version reflects review feedback. The other patches have > been adjusted for clarity and re-cut in a more rigorous manner. All patches > are > rebased onto current trunk. > > Jakub, can you offer wishes/recommendations for sending the rest of > (middle-end and libgomp) patches? As you know there's a branch with Once all the prerequisites are in (I assume the patches depend on the NVPTX backend patches you've just posted), then I'd prefer if you rebase the rest to current trunk and post in reasonably reviewable chunks (that can be all of middle-end changes in one patch, all of libgomp plugin changes, all of other libgomp changes, or if some of those would be too large, split that a little bit). Jakub
Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models
On 06/02/2016 10:53 AM, James Greenhalgh wrote: Hi, When I was working in the area last year, I promised to revisit the cost model for noce if-conversion and see if I could improve the modeling. This turned out to be more tricky than I expected. This patch set rewrites the cost model for noce if-conversion. The goal is to rationalise the units used in the calculations away from BRANCH_COST, which is only defined relative to itself. Right. I think we all agreed that the key weakness of BRANCH_COST was that its meaning is only defined relative to itself. What we want is a costing metric that would allow us to estimate the cost of different forms of the computation, which might include branches and which may include edge probabilty information. If you're looking at that and thinking it doesn't sound much different from our current call to BRANCH_COST, you're right. This isn't as large a departure from the existing cost model as I had originally intended. Perhaps not as large of a change as you intended, but I think you're hitting the key issue with BRANCH_COST. This act of making the cost models consistent will cause code generation changes on a number of targets - most notably x86_64. On x86_64 the RTX cost of a conditional move comes out at "20" - this is far higher than COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots of if-conversion. The easy fix for this would be to implement the new hook. I measured the performance impact on Spec2000 as a smoke test, it didn't seem to harm anything, and the result was a slight (< 3%) uplift on Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for the reasons. I'd be comfortable with Uros guiding the implementation of the target hook for x86 so that we don't take a major step backward. Jeff
[PATCH 3/8] nvptx -muniform-simt
This patch implements -muniform-simt code generation option, which is used to emit code for OpenMP offloading. The goal is to emit code that can either execute "normally", or can execute in a way that keeps all lanes in a given warp active, their local state synchronized, and observable effects from execution happening as if only one lane was active. The latter mode is how OpenMP offloaded code runs outside of SIMD regions. To achieve that, the compiler instruments atomic instructions and calls to functions provided by the CUDA runtime (malloc, free, vprintf), i.e. those that GCC itself doesn't compile. Instrumentation converts an atomic instruction to a predicated atomic instruction followed by a warp shuffle. To illustrate, atom.op dest, becomes @PRED atom.op dest, shfl.idx dest, dest, MASTER where, outside of SIMD regions: - PRED is true in lane 0, false in lanes 1-31, so the side effect happens once - MASTER is 0 in all lanes, so the shuffle synchronizes 'dest' among all lanes and inside of SIMD regions: - PRED is true in all lanes, so the atomic is done in all lanes independently - MASTER equals to current lane number, so the shuffle is a no-op. To keep track of current state and compute PRED and MASTER, the compiler uses shared memory array 'unsigned __nvptx_uni[]' with per-warp all-zeros or all-ones masks. The mask word is zero outside of SIMD regions, all-ones inside. Function prologue uses mask to compute MASTER and PRED via: MASTER = LANE_ID & MASK; PRED = LANE_ID == MASTER; Calls are handled like atomics. gcc/ * config/nvptx/nvptx.c (need_unisimt_decl): New variable. Set it... (nvptx_init_unisimt_predicate): ...here (new function) and use it... (nvptx_file_end): ...here to emit declaration of __nvptx_uni array. (nvptx_declare_function_name): Call nvptx_init_unisimt_predicate. (nvptx_get_unisimt_master): New helper function. (nvptx_get_unisimt_predicate): Ditto. (nvptx_call_insn_is_syscall_p): Ditto. (nvptx_unisimt_handle_set): Ditto. (nvptx_reorg_uniform_simt): New. Transform code for -muniform-simt. (nvptx_reorg): Call nvptx_reorg_uniform_simt. * config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Define __nvptx_unisimt__ when -muniform-simt option is active. (struct machine_function): Add unisimt_master, unisimt_predicate rtx fields. * config/nvptx/nvptx.md (atomic): New attribute. (atomic_compare_and_swap_1): Mark with atomic attribute. (atomic_exchange): Ditto. (atomic_fetch_add): Ditto. (atomic_fetch_addsf): Ditto. (atomic_fetch_): Ditto. * config/nvptx/nvptx.opt (muniform-simt): New option. * doc/invoke.texi (-muniform-simt): Document. gcc/testsuite/ * gcc.target/nvptx/unisimt.c: New test. libgcc/ * config/nvptx/crt0.c (__main) [__nvptx_unisimt__]: Setup __nvptx_uni. * config/nvptx/stacks.c (__nvptx_uni): Define. --- gcc/config/nvptx/nvptx.c | 124 +++ gcc/config/nvptx/nvptx.h | 4 + gcc/config/nvptx/nvptx.md| 18 +++-- gcc/config/nvptx/nvptx.opt | 4 + gcc/doc/invoke.texi | 10 +++ gcc/testsuite/gcc.target/nvptx/unisimt.c | 22 ++ libgcc/config/nvptx/crt0.c | 4 + libgcc/config/nvptx/stacks.c | 3 +- 8 files changed, 183 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/nvptx/unisimt.c diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 7b90cb1..f91573a 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -142,6 +142,9 @@ static GTY(()) tree global_lock_var; /* True if any function references __nvptx_stacks. */ static bool need_softstack_decl; +/* True if any function references __nvptx_uni. */ +static bool need_unisimt_decl; + /* Allocate a new, cleared machine_function structure. */ static struct machine_function * @@ -1049,6 +1052,34 @@ nvptx_init_axis_predicate (FILE *file, int regno, const char *name) fprintf (file, "\t}\n"); } +/* Emit code to initialize predicate and master lane index registers for + -muniform-simt code generation variant. */ + +static void +nvptx_init_unisimt_predicate (FILE *file) +{ + int bits = POINTER_SIZE; + int master = REGNO (cfun->machine->unisimt_master); + int pred = REGNO (cfun->machine->unisimt_predicate); + fprintf (file, "\t{\n"); + fprintf (file, "\t\t.reg.u32 %%ustmp0;\n"); + fprintf (file, "\t\t.reg.u%d %%ustmp1;\n", bits); + fprintf (file, "\t\t.reg.u%d %%ustmp2;\n", bits); + fprintf (file, "\t\tmov.u32 %%ustmp0, %%tid.y;\n"); + fprintf (file, "\t\tmul%s.u32 %%ustmp1, %%ustmp0, 4;\n", + bits == 64 ? ".wide" : ".lo"); + fprintf (file, "\t\tmov.u%d %%ustmp2, __nvptx_uni;\n", bits); + fprintf (file, "\t\tadd.u%d %%ustmp2, %%ustmp2, %%ustmp1;\n", bits);
[PATCH 1/8] nvptx -msoft-stack
This is a respin of the recently reviewed -msoft-stack patch that addresses review feedback and adds required libgcc changes. gcc/: * config/nvptx/nvptx-protos.h (nvptx_output_set_softstack): Declare. * config/nvptx/nvptx.c: (need_softstack_decl): New variable. (init_softstack_frame): New. (nvptx_declare_function_name): Handle TARGET_SOFT_STACK. (nvptx_output_set_softstack): New. (nvptx_output_return): Emit stack restore if needed. (nvptx_get_drap_rtx): Return %argp as the DRAP if needed. (nvptx_file_end): Handle need_softstack_decl. * config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define __nvptx_softstack__ when -msoft-stack is active. (STACK_SIZE_MODE): Define. (FIXED_REGISTERS): Adjust. (SOFTSTACK_SLOT_REGNUM): New. (SOFTSTACK_PREV_REGNUM): New. (REGISTER_NAMES): Adjust. (struct machine_function): New bool field has_softstack. * config/nvptx/nvptx.md (UNSPEC_SET_SOFTSTACK): New. (allocate_stack): Implement for TARGET_SOFT_STACK. Remove unused code. (allocate_stack_): Remove unused pattern. (set_softstack_insn): New pattern. (restore_stack_block): Handle for TARGET_SOFT_STACK. * config/nvptx/nvptx.opt: (msoft-stack): New option. * doc/invoke.texi (msoft-stack): Document. gcc/testsuite/: * gcc.target/nvptx/softstack.c: New test. * lib/target-supports.exp (check_effective_target_alloca): Use a compile test. libgcc/: * config/nvptx/crt0.c (__main) [__nvptx_softstack__]: Setup __nvptx_stacks. * config/nvptx/stacks.c: New file. * config/nvptx/t-nvptx (LIB2ADD): Add stacks.c. --- gcc/config/nvptx/nvptx-protos.h| 1 + gcc/config/nvptx/nvptx.c | 111 ++--- gcc/config/nvptx/nvptx.h | 15 +++- gcc/config/nvptx/nvptx.md | 33 ++--- gcc/config/nvptx/nvptx.opt | 4 ++ gcc/doc/invoke.texi| 12 gcc/testsuite/gcc.target/nvptx/softstack.c | 23 ++ gcc/testsuite/lib/target-supports.exp | 5 +- libgcc/config/nvptx/crt0.c | 6 ++ libgcc/config/nvptx/stacks.c | 24 +++ libgcc/config/nvptx/t-nvptx| 3 +- 11 files changed, 213 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/gcc.target/nvptx/softstack.c create mode 100644 libgcc/config/nvptx/stacks.c diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index ec4588e..647607d 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -41,5 +41,6 @@ extern const char *nvptx_ptx_type_from_mode (machine_mode, bool); extern const char *nvptx_output_mov_insn (rtx, rtx); extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx); extern const char *nvptx_output_return (void); +extern const char *nvptx_output_set_softstack (unsigned); #endif #endif diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 39e4145..1c95fdf 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -139,6 +139,9 @@ static GTY(()) rtx worker_red_sym; /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; +/* True if any function references __nvptx_stacks. */ +static bool need_softstack_decl; + /* Allocate a new, cleared machine_function structure. */ static struct machine_function * @@ -979,6 +982,60 @@ init_frame (FILE *file, int regno, unsigned align, unsigned size) POINTER_SIZE, reg_names[regno], reg_names[regno]); } +/* Emit soft stack frame setup sequence. */ + +static void +init_softstack_frame (FILE *file, unsigned alignment, HOST_WIDE_INT size) +{ + /* Maintain 64-bit stack alignment. */ + unsigned keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; + size = ROUND_UP (size, keep_align); + int bits = POINTER_SIZE; + const char *reg_stack = reg_names[STACK_POINTER_REGNUM]; + const char *reg_frame = reg_names[FRAME_POINTER_REGNUM]; + const char *reg_sspslot = reg_names[SOFTSTACK_SLOT_REGNUM]; + const char *reg_sspprev = reg_names[SOFTSTACK_PREV_REGNUM]; + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_stack); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_frame); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspslot); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspprev); + fprintf (file, "\t{\n"); + fprintf (file, "\t\t.reg.u32 %%fstmp0;\n"); + fprintf (file, "\t\t.reg.u%d %%fstmp1;\n", bits); + fprintf (file, "\t\t.reg.u%d %%fstmp2;\n", bits); + fprintf (file, "\t\tmov.u32 %%fstmp0, %%tid.y;\n"); + fprintf (file, "\t\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n", + bits == 64 ? ".wide" : ".lo", bits / 8); + fprintf (file, "\t\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits); + /* Initialize %sspslot = &__nvptx_stacks[tid.y]. */ + fprintf
[PATCH 0/8] NVPTX offloading to NVPTX: backend patches
Hi, I'm sending updated patch series with backend prerequisites for OpenMP offloading to the NVIDIA PTX ISA. The first patch has already received some comments and this version reflects review feedback. The other patches have been adjusted for clarity and re-cut in a more rigorous manner. All patches are rebased onto current trunk. Jakub, can you offer wishes/recommendations for sending the rest of (middle-end and libgomp) patches? As you know there's a branch with development history; is that of interest, or would it be easier if I rebased all stuff anew on current trunk? Thanks. Alexander
[PATCH 8/8] nvptx: handle OpenMP "omp target entrypoint"
This patch implements emission of OpenMP target region entrypoints: the compiler emits the target function with '$impl' appended to the name, and under the original name it emits a short entry sequence that sets up shared memory arrays and calls the target function via 'gomp_nvptx_main' (which is implemented in libgomp). * config/nvptx/nvptx.c (write_as_kernel): Restrict to OpenACC target regions. (write_omp_entry): New. Use it... (nvptx_declare_function_name): ...here to emit OpenMP target region entrypoints. (nvptx_record_offload_symbol): Handle NULL attributes. --- gcc/config/nvptx/nvptx.c | 82 +--- 1 file changed, 78 insertions(+), 4 deletions(-) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index d959f85..5e47002 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -723,7 +723,10 @@ static bool write_as_kernel (tree attrs) { return (lookup_attribute ("kernel", attrs) != NULL_TREE - || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE); + || (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE + && lookup_attribute ("oacc function", attrs) != NULL_TREE)); + /* For OpenMP target regions, the corresponding kernel entry is emitted from + write_omp_entry as a separate function. */ } /* Emit a linker marker for a function decl or defn. */ @@ -1073,6 +1076,69 @@ nvptx_init_unisimt_predicate (FILE *file) need_unisimt_decl = true; } +/* Emit kernel NAME for function ORIG outlined for an OpenMP 'target' region: + + extern void gomp_nvptx_main (void (*fn)(void*), void *fnarg); + void __attribute__((kernel)) NAME (void *arg, char *stack, size_t stacksize) + { + __nvptx_stacks[tid.y] = stack + stacksize * (ctaid.x * ntid.y + tid.y + 1); + __nvptx_uni[tid.y] = 0; + gomp_nvptx_main (ORIG, arg); + } + ORIG itself should not be emitted as a PTX .entry function. */ + +static void +write_omp_entry (FILE *file, const char *name, const char *orig) +{ + static bool gomp_nvptx_main_declared; + if (!gomp_nvptx_main_declared) +{ + gomp_nvptx_main_declared = true; + write_fn_marker (func_decls, false, true, "gomp_nvptx_main"); + func_decls << ".extern .func gomp_nvptx_main (.param.u" << POINTER_SIZE +<< " %in_ar1, .param.u" << POINTER_SIZE << " %in_ar2);\n"; +} +#define ENTRY_TEMPLATE(PS, PS_BYTES, MAD_PS_32) "\ + (.param.u" PS " %arg, .param.u" PS " %stack, .param.u" PS " %sz)\n\ +{\n\ + .reg.u32 %r<3>;\n\ + .reg.u" PS " %R<4>;\n\ + mov.u32 %r0, %tid.y;\n\ + mov.u32 %r1, %ntid.y;\n\ + mov.u32 %r2, %ctaid.x;\n\ + cvt.u" PS ".u32 %R1, %r0;\n\ + " MAD_PS_32 " %R1, %r1, %r2, %R1;\n\ + mov.u" PS " %R0, __nvptx_stacks;\n\ + " MAD_PS_32 " %R0, %r0, " PS_BYTES ", %R0;\n\ + ld.param.u" PS " %R2, [%stack];\n\ + ld.param.u" PS " %R3, [%sz];\n\ + add.u" PS " %R2, %R2, %R3;\n\ + mad.lo.u" PS " %R2, %R1, %R3, %R2;\n\ + st.shared.u" PS " [%R0], %R2;\n\ + mov.u" PS " %R0, __nvptx_uni;\n\ + " MAD_PS_32 " %R0, %r0, 4, %R0;\n\ + mov.u32 %r0, 0;\n\ + st.shared.u32 [%R0], %r0;\n\ + mov.u" PS " %R0, \0;\n\ + ld.param.u" PS " %R1, [%arg];\n\ + {\n\ + .param.u" PS " %P<2>;\n\ + st.param.u" PS " [%P0], %R0;\n\ + st.param.u" PS " [%P1], %R1;\n\ + call.uni gomp_nvptx_main, (%P0, %P1);\n\ + }\n\ + ret.uni;\n\ +}\n" + static const char entry64[] = ENTRY_TEMPLATE ("64", "8", "mad.wide.u32"); + static const char entry32[] = ENTRY_TEMPLATE ("32", "4", "mad.lo.u32 "); +#undef ENTRY_TEMPLATE + const char *entry_1 = TARGET_ABI64 ? entry64 : entry32; + /* Position ENTRY_2 after the embedded nul using strlen of the prefix. */ + const char *entry_2 = entry_1 + strlen (entry64) + 1; + fprintf (file, ".visible .entry %s%s%s%s", name, entry_1, orig, entry_2); + need_softstack_decl = need_unisimt_decl = true; +} + /* Implement ASM_DECLARE_FUNCTION_NAME. Writes the start of a ptx function, including local var decls and copies from the arguments to local regs. */ @@ -1084,6 +1150,14 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) tree result_type = TREE_TYPE (fntype); int argno = 0; + if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) +{ + char *buf = (char *) alloca (strlen (name) + sizeof ("$impl")); + sprintf (buf, "%s$impl", name); + write_omp_entry (file, name, buf); + name = buf; +} /* We construct the initial part of the function into a string stream, in order to share the prototype writing code. */ std::stringstream s; @@ -4153,13 +4227,13 @@ nvptx_record_offload_symbol (tree decl) case FUNCTION_DECL: {
[PATCH 4/8] nvptx -mgomp
This patch adds option -mgomp which enables -msoft-stack plus -muniform-simt, and builds a multilib with it. This codegen convention is used for OpenMP offloading. * config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP. * config/nvptx/nvptx.opt (mgomp): New option. * config/nvptx/t-nvptx (MULTILIB_OPTIONS): New. * doc/invoke.texi (mgomp): Document. --- gcc/config/nvptx/nvptx.c | 3 +++ gcc/config/nvptx/nvptx.opt | 4 gcc/config/nvptx/t-nvptx | 2 ++ gcc/doc/invoke.texi| 5 + 4 files changed, 14 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index f91573a..294002e 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -192,6 +192,9 @@ nvptx_option_override (void) worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red"); SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED); worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + + if (TARGET_GOMP) +target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT; } /* Return a ptx type for MODE. If PROMOTE, then use .u32 for QImode to diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 0d46e1d..cb6194d 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -40,3 +40,7 @@ Use custom stacks instead of local memory for automatic storage. muniform-simt Target Report Mask(UNIFORM_SIMT) Generate code that can keep local state uniform across all lanes. + +mgomp +Target Report Mask(GOMP) +Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt. diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx index e2580c9..6c1010d 100644 --- a/gcc/config/nvptx/t-nvptx +++ b/gcc/config/nvptx/t-nvptx @@ -8,3 +8,5 @@ ALL_HOST_OBJS += mkoffload.o mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS) +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS) + +MULTILIB_OPTIONS = mgomp diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 534025d..bae7f2d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -19631,6 +19631,11 @@ generation variant is used for OpenMP offloading, but the option is exposed on its own for the purpose of testing the compiler; to generate code suitable for linking into programs using OpenMP offloading, use option @option{-mgomp}. +@item -mgomp +@opindex mgomp +Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and +@option{-muniform-simt} options, and selects corresponding multilib variant. + @end table @node PDP-11 Options -- 2.6.6
[PATCH 7/8] nvptx: new insns for OpenMP SIMD-on-SIMT
This patch implements in nvptx.md a few new instruction patterns that are used for OpenMP SIMD code. * config/nvptx/nvptx-protos.h (nvptx_shuffle_kind): Move enum declaration from nvptx.c. (nvptx_gen_shuffle): Declare. * config/nvptx/nvptx.c (nvptx_shuffle_kind): Move to nvptx-protos.h. (nvptx_gen_shuffle): Export. * config/nvptx/nvptx.md (UNSPEC_VOTE_BALLOT): New unspec. (UNSPEC_LANEID): Ditto. (UNSPECV_NOUNROLL): Ditto. (nvptx_vote_ballot): New pattern. (omp_simt_lane): Ditto. (omp_simt_last_lane): Ditto. (omp_simt_ordered): Ditto. (omp_simt_vote_any): Ditto. (omp_simt_xchg_bfly): Ditto. (omp_simt_xchg_idx): Ditto. (nvptx_nounroll): Ditto. * target-insns.def (omp_simt_lane): New. (omp_simt_last_lane): New. (omp_simt_ordered): New. (omp_simt_vote_any): New. (omp_simt_xchg_bfly): New. (omp_simt_xchg_idx): New. --- gcc/config/nvptx/nvptx-protos.h | 11 + gcc/config/nvptx/nvptx.c| 12 +- gcc/config/nvptx/nvptx.md | 94 + gcc/target-insns.def| 6 +++ 4 files changed, 112 insertions(+), 11 deletions(-) diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index 647607d..331ec0a 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -21,6 +21,16 @@ #ifndef GCC_NVPTX_PROTOS_H #define GCC_NVPTX_PROTOS_H +/* The kind of shuffe instruction. */ +enum nvptx_shuffle_kind +{ + SHUFFLE_UP, + SHUFFLE_DOWN, + SHUFFLE_BFLY, + SHUFFLE_IDX, + SHUFFLE_MAX +}; + extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl); extern void nvptx_declare_object_name (FILE *file, const char *name, const_tree decl); @@ -36,6 +46,7 @@ extern void nvptx_register_pragmas (void); extern void nvptx_expand_oacc_fork (unsigned); extern void nvptx_expand_oacc_join (unsigned); extern void nvptx_expand_call (rtx, rtx); +extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind); extern rtx nvptx_expand_compare (rtx); extern const char *nvptx_ptx_type_from_mode (machine_mode, bool); extern const char *nvptx_output_mov_insn (rtx, rtx); diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 46afae4..d959f85 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -70,16 +70,6 @@ /* This file should be included last. */ #include "target-def.h" -/* The kind of shuffe instruction. */ -enum nvptx_shuffle_kind -{ - SHUFFLE_UP, - SHUFFLE_DOWN, - SHUFFLE_BFLY, - SHUFFLE_IDX, - SHUFFLE_MAX -}; - /* The various PTX memory areas an object might reside in. */ enum nvptx_data_area { @@ -1433,7 +1423,7 @@ nvptx_gen_pack (rtx dst, rtx src0, rtx src1) /* Generate an instruction or sequence to broadcast register REG across the vectors of a single warp. */ -static rtx +rtx nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind) { rtx res; diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 31f87ec..b1b27a2 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -42,6 +42,10 @@ UNSPEC_BIT_CONV + UNSPEC_VOTE_BALLOT + + UNSPEC_LANEID + UNSPEC_SHUFFLE UNSPEC_BR_UNIFIED ]) @@ -57,6 +61,8 @@ UNSPECV_FORKED UNSPECV_JOINING UNSPECV_JOIN + + UNSPECV_NOUNROLL ]) (define_attr "subregs_ok" "false,true" @@ -1166,6 +1172,88 @@ "" "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;") +(define_insn "nvptx_vote_ballot" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec:SI [(match_operand:BI 1 "nvptx_register_operand" "R")] + UNSPEC_VOTE_BALLOT))] + "" + "%.\\tvote.ballot.b32\\t%0, %1;") + +;; Patterns for OpenMP SIMD-via-SIMT lowering + +;; Implement IFN_GOMP_SIMT_LANE: set operand 0 to lane index +(define_insn "omp_simt_lane" + [(set (match_operand:SI 0 "nvptx_register_operand" "") + (unspec:SI [(const_int 0)] UNSPEC_LANEID))] + "" + "%.\\tmov.u32\\t%0, %%laneid;") + +;; Implement IFN_GOMP_SIMT_ORDERED: copy operand 1 to operand 0 and +;; place a compiler barrier to disallow unrolling/peeling the containing loop +(define_expand "omp_simt_ordered" + [(match_operand:SI 0 "nvptx_register_operand" "=R") + (match_operand:SI 1 "nvptx_register_operand" "R")] + "" +{ + emit_move_insn (operands[0], operands[1]); + emit_insn (gen_nvptx_nounroll ()); + DONE; +}) + +;; Implement IFN_GOMP_SIMT_XCHG_BFLY: perform a "butterfly" exchange +;; across lanes +(define_expand "omp_simt_xchg_bfly" + [(match_operand 0 "nvptx_register_operand" "=R") + (match_operand 1 "nvptx_register_operand" "R") + (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")] + "" +{ + emit_insn (nvptx_gen_shuffle (operands[0], operands[1], operands[2], + SHUFFLE_BFLY)); + DONE; +}) + +;; Implement
[PATCH 2/8] nvptx: implement predicated instructions
This patch wires up generation of predicated instruction forms in nvptx.md and fixes their handling in nvptx.c. This is a prerequisite for the following patch. On its own it doesn't affect generated code because COND_EXEC instructions are created by if-conversion only after register allocation, which is not performed on NVPTX. * config/nvptx/nvptx.c (nvptx_output_call_insn): Handle COND_EXEC patterns. Emit instruction predicate. (nvptx_print_operand): Fix handling of instruction predicates. * config/nvptx/nvptx.md (predicable): New attribute. Generate predicated forms via define_cond_exec. (br_true): Mark as not predicable. (br_false): Ditto. (br_true_uni): Ditto. (br_false_uni): Ditto. (return): Ditto. (trap_if_true): Ditto. (trap_if_false): Ditto. (nvptx_fork): Ditto. (nvptx_forked): Ditto. (nvptx_joining): Ditto. (nvptx_join): Ditto. (nvptx_barsync): Ditto. --- gcc/config/nvptx/nvptx.c | 14 -- gcc/config/nvptx/nvptx.md | 43 +++ 2 files changed, 39 insertions(+), 18 deletions(-) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 1c95fdf..7b90cb1 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -1919,6 +1919,8 @@ nvptx_output_mov_insn (rtx dst, rtx src) return "%.\tcvt%t0%t1\t%0, %1;"; } +static void nvptx_print_operand (FILE *, rtx, int); + /* Output INSN, which is a call to CALLEE with result RESULT. For ptx, this involves writing .param declarations and in/out copies into them. For indirect calls, also write the .callprototype. */ @@ -1930,6 +1932,8 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee) static int labelno; bool needs_tgt = register_operand (callee, Pmode); rtx pat = PATTERN (insn); + if (GET_CODE (pat) == COND_EXEC) +pat = COND_EXEC_CODE (pat); int arg_end = XVECLEN (pat, 0); tree decl = NULL_TREE; @@ -1974,6 +1978,8 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee) fprintf (asm_out_file, ";\n"); } + /* The '.' stands for the call's predicate, if any. */ + nvptx_print_operand (asm_out_file, NULL_RTX, '.'); fprintf (asm_out_file, "\t\tcall "); if (result != NULL_RTX) fprintf (asm_out_file, "(%s_in), ", reg_names[NVPTX_RETURN_REGNUM]); @@ -2037,8 +2043,6 @@ nvptx_print_operand_punct_valid_p (unsigned char c) return c == '.' || c== '#'; } -static void nvptx_print_operand (FILE *, rtx, int); - /* Subroutine of nvptx_print_operand; used to print a memory reference X to FILE. */ static void @@ -2099,12 +2103,10 @@ nvptx_print_operand (FILE *file, rtx x, int code) x = current_insn_predicate; if (x) { - unsigned int regno = REGNO (XEXP (x, 0)); - fputs ("[", file); + fputs ("@", file); if (GET_CODE (x) == EQ) fputs ("!", file); - fputs (reg_names [regno], file); - fputs ("]", file); + output_reg (file, REGNO (XEXP (x, 0)), VOIDmode); } return; } diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index bbc8b1c..bfe3260 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -126,6 +126,17 @@ return true; }) +(define_attr "predicable" "false,true" + (const_string "true")) + +(define_cond_exec + [(match_operator 0 "predicate_operator" + [(match_operand:BI 1 "nvptx_register_operand" "") + (match_operand:BI 2 "const0_operand" "")])] + "" + "" + ) + (define_constraint "P0" "An integer with the value 0." (and (match_code "const_int") @@ -511,7 +522,8 @@ (label_ref (match_operand 1 "" "")) (pc)))] "" - "%j0\\tbra\\t%l1;") + "%j0\\tbra\\t%l1;" + [(set_attr "predicable" "false")]) (define_insn "br_false" [(set (pc) @@ -520,7 +532,8 @@ (label_ref (match_operand 1 "" "")) (pc)))] "" - "%J0\\tbra\\t%l1;") + "%J0\\tbra\\t%l1;" + [(set_attr "predicable" "false")]) ;; unified conditional branch (define_insn "br_true_uni" @@ -529,7 +542,8 @@ UNSPEC_BR_UNIFIED) (const_int 0)) (label_ref (match_operand 1 "" "")) (pc)))] "" - "%j0\\tbra.uni\\t%l1;") + "%j0\\tbra.uni\\t%l1;" + [(set_attr "predicable" "false")]) (define_insn "br_false_uni" [(set (pc) (if_then_else @@ -537,7 +551,8 @@ UNSPEC_BR_UNIFIED) (const_int 0)) (label_ref (match_operand 1 "" "")) (pc)))] "" - "%J0\\tbra.uni\\t%l1;") + "%J0\\tbra.uni\\t%l1;" + [(set_attr "predicable" "false")]) (define_expand "cbranch4" [(set (pc) @@ -940,7 +955,8 @@ "" { return nvptx_output_return (); -}) +} + [(set_attr "predicable" "false")]) (define_expand "epilogue" [(clobber (const_int 0))] @@ -1029,14 +1045,16 @@ (const_int 0))
[PATCH 6/8] new target hook: TARGET_SIMT_VF
This patch adds a new target hook and implements it in a straightforward manner on NVPTX to indicate that the target is running in SIMT fashion with 32 threads in a synchronous group ("warp"). For use in OpenMP transforms. * config/nvptx/nvptx.c (nvptx_simt_vf): New. (TARGET_SIMT_VF): Define. * doc/tm.texi: Regenerate. * doc/tm.texi.in: (TARGET_SIMT_VF): New hook. * target.def: Define it. --- gcc/config/nvptx/nvptx.c | 11 +++ gcc/doc/tm.texi | 4 gcc/doc/tm.texi.in | 2 ++ gcc/target.def | 12 4 files changed, 29 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 294002e..46afae4 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -4430,6 +4430,14 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget), #define PTX_WORKER_LENGTH 32 #define PTX_GANG_DEFAULT 32 +/* Implement TARGET_SIMT_VF target hook: number of threads in a warp. */ + +static int +nvptx_simt_vf () +{ + return PTX_VECTOR_LENGTH; +} + /* Validate compute dimensions of an OpenACC offload or routine, fill in non-unity defaults. FN_LEVEL indicates the level at which a routine might spawn a loop. It is negative for non-routines. If @@ -5195,6 +5203,9 @@ nvptx_goacc_reduction (gcall *call) #undef TARGET_BUILTIN_DECL #define TARGET_BUILTIN_DECL nvptx_builtin_decl +#undef TARGET_SIMT_VF +#define TARGET_SIMT_VF nvptx_simt_vf + #undef TARGET_GOACC_VALIDATE_DIMS #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b318615..93af006 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5768,6 +5768,10 @@ usable. In that case, the smaller the number is, the more desirable it is to use it. @end deftypefn +@deftypefn {Target Hook} int TARGET_SIMT_VF (void) +Return number of threads in SIMT thread group on the target. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int *@var{dims}, int @var{fn_level}) This hook should check the launch dimensions provided for an OpenACC compute region, or routine. Defaulted values are represented as -1 diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 1e8423c..ccec1b1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4263,6 +4263,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_SIMD_CLONE_USABLE +@hook TARGET_SIMT_VF + @hook TARGET_GOACC_VALIDATE_DIMS @hook TARGET_GOACC_DIM_LIMIT diff --git a/gcc/target.def b/gcc/target.def index a4df363..ba47e5b 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1639,6 +1639,18 @@ int, (struct cgraph_node *), NULL) HOOK_VECTOR_END (simd_clone) +/* Functions relating to OpenMP SIMT vectorization transform. */ +#undef HOOK_PREFIX +#define HOOK_PREFIX "TARGET_SIMT_" +HOOK_VECTOR (TARGET_SIMT, simt) + +DEFHOOK +(vf, +"Return number of threads in SIMT thread group on the target.", +int, (void), NULL) + +HOOK_VECTOR_END (simt) + /* Functions relating to openacc. */ #undef HOOK_PREFIX #define HOOK_PREFIX "TARGET_GOACC_" -- 2.6.6
[PATCH 5/8] nvptx mkoffload: pass -mgomp for OpenMP offloading
This patch wires up use of alternative -mgomp multilib for OpenMP offloading via nvptx mkoffload. It makes OpenACC and OpenMP incompatible for simultaneous offloading compilation, so I've added a diagnostic for that. * config/nvptx/mkoffload.c (main): Check that either OpenACC or OpenMP is selected. Pass -mgomp to offload compiler in OpenMP case. --- gcc/config/nvptx/mkoffload.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c index c8eed45..d876c7b 100644 --- a/gcc/config/nvptx/mkoffload.c +++ b/gcc/config/nvptx/mkoffload.c @@ -460,6 +460,7 @@ main (int argc, char **argv) /* Scan the argument vector. */ bool fopenmp = false; + bool fopenacc = false; for (int i = 1; i < argc; i++) { #define STR "-foffload-abi=" @@ -476,11 +477,15 @@ main (int argc, char **argv) #undef STR else if (strcmp (argv[i], "-fopenmp") == 0) fopenmp = true; + else if (strcmp (argv[i], "-fopenacc") == 0) + fopenacc = true; else if (strcmp (argv[i], "-save-temps") == 0) save_temps = true; else if (strcmp (argv[i], "-v") == 0) verbose = true; } + if (!(fopenacc ^ fopenmp)) +fatal_error (input_location, "either -fopenacc or -fopenmp must be set"); struct obstack argv_obstack; obstack_init (_obstack); @@ -501,6 +506,8 @@ main (int argc, char **argv) default: gcc_unreachable (); } + if (fopenmp) +obstack_ptr_grow (_obstack, "-mgomp"); for (int ix = 1; ix != argc; ix++) { -- 2.6.6
Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)
On Thu, Jun 09, 2016 at 06:43:04PM +0200, Jakub Jelinek wrote: > Yes, I'm all in favor in disabling X constraint for inline asm. > Especially if people actually try to print it as well, rather than make it > unused. That is a sure path to ICEs. Though, on the other side, even our documentation mentions asm volatile ("mtfsf 255,%1" : "=X"(sum): "f"(fpenv)); So perhaps we need to error just in case such an argument is printed? Jakub
Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)
On Thu, Jun 09, 2016 at 10:30:13AM -0600, Jeff Law wrote: > On 06/06/2016 01:40 PM, Jakub Jelinek wrote: > >On Mon, Jun 06, 2016 at 09:27:56PM +0200, Marc Glisse wrote: > >>The last one would miss floating point registers (no 2 platforms use the > >>same letter for those, hence my quest for something more generic). > >> > >>The goal of the experiment is described in PR59159 (for which "+X" is > >>unlikely to be the right answer, in particular because it is meaningless for > >>constants). I don't know in what context people use the "X" constraint, or > >>even better "=X"... > > > >X constraint has been added mainly for uses in match_scratch like: > >(clobber (match_scratch:SI 2 "=X,X,X,")) > >or when the predicate takes care of everything and it is not needed to > >specify anything further: > > [(set (match_operand:SWI12 0 "push_operand" "=X") > >(match_operand:SWI12 1 "nonmemory_no_elim_operand" "rn"))] > >Using it in inline asm generally has resulted in lots of issues, including > >ICEs etc., so nothing I'd recommend to use. > So would it make sense to define it as not available for use in ASMs? I > realize that's potentially a user-visible change, but it might be a > reasonable one to make. Yes, I'm all in favor in disabling X constraint for inline asm. Especially if people actually try to print it as well, rather than make it unused. That is a sure path to ICEs. Jakub
Re: [PATCH] Remove old diagnostic macros.
On 05/30/2016 05:19 PM, Marcin Baczyński wrote: Hi, this is my first GCC patch, so please bear with me if something is wrong with it in an obvious way. I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is? Bootstrapped on x86_64-pc-linux-gnu. 2016-05-31 Marcin Baczyński* gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer): delete. This is fine for the trunk. Please install. THanks, jeff
Re: [PATCH 0/4] BRIG (HSAIL) frontend
Ping: Any further comments to the patch set? Thanks, Pekka On Wed, May 18, 2016 at 8:01 PM, Pekka Jääskeläinenwrote: > Hi Joseph, > > Updated diffstat below: > Makefile.def | 3 + > Makefile.in | 489 + > configure | 1 + > configure.ac | 1 + > gcc/brig/Make-lang.in | 246 + > gcc/brig/brig-c.h |68 + > gcc/brig/brig-lang.c | 454 + > gcc/brig/brigfrontend/brig-arg-block-handler.cc |67 + > gcc/brig/brigfrontend/brig-atomic-inst-handler.cc | 377 + > gcc/brig/brigfrontend/brig-basic-inst-handler.cc | 733 + > gcc/brig/brigfrontend/brig-branch-inst-handler.cc | 217 + > gcc/brig/brigfrontend/brig-cmp-inst-handler.cc| 212 + > gcc/brig/brigfrontend/brig-code-entry-handler.cc | 2325 +++ > gcc/brig/brigfrontend/brig-code-entry-handler.h | 449 + > gcc/brig/brigfrontend/brig-comment-handler.cc |39 + > gcc/brig/brigfrontend/brig-control-handler.cc |29 + > .../brigfrontend/brig-copy-move-inst-handler.cc |56 + > gcc/brig/brigfrontend/brig-cvt-inst-handler.cc| 250 + > gcc/brig/brigfrontend/brig-fbarrier-handler.cc|44 + > gcc/brig/brigfrontend/brig-function-handler.cc| 373 + > gcc/brig/brigfrontend/brig-function.cc| 698 + > gcc/brig/brigfrontend/brig-function.h | 216 + > gcc/brig/brigfrontend/brig-inst-mod-handler.cc| 168 + > gcc/brig/brigfrontend/brig-label-handler.cc |37 + > gcc/brig/brigfrontend/brig-lane-inst-handler.cc |83 + > gcc/brig/brigfrontend/brig-machine.c |37 + > gcc/brig/brigfrontend/brig-machine.h |35 + > gcc/brig/brigfrontend/brig-mem-inst-handler.cc| 181 + > gcc/brig/brigfrontend/brig-module-handler.cc |30 + > gcc/brig/brigfrontend/brig-queue-inst-handler.cc |92 + > gcc/brig/brigfrontend/brig-seg-inst-handler.cc| 134 + > gcc/brig/brigfrontend/brig-signal-inst-handler.cc |42 + > gcc/brig/brigfrontend/brig-util.cc| 348 + > gcc/brig/brigfrontend/brig-util.h |49 + > gcc/brig/brigfrontend/brig-variable-handler.cc| 256 + > gcc/brig/brigfrontend/brig_to_generic.cc | 773 + > gcc/brig/brigfrontend/brig_to_generic.h | 245 + > gcc/brig/brigfrontend/phsa.h |40 + > gcc/brig/brigspec.c | 193 + > gcc/brig/config-lang.in |41 + > gcc/brig/lang-specs.h |28 + > gcc/brig/lang.opt |41 + > gcc/doc/frontends.texi| 2 +- > gcc/doc/invoke.texi | 4 + > gcc/doc/standards.texi| 8 + > gcc/testsuite/brig.dg/README |10 + > gcc/testsuite/brig.dg/dg.exp |27 + > gcc/testsuite/brig.dg/test/gimple/alloca.hsail|37 + > gcc/testsuite/brig.dg/test/gimple/atomics.hsail |33 + > gcc/testsuite/brig.dg/test/gimple/branches.hsail |58 + > gcc/testsuite/brig.dg/test/gimple/fbarrier.hsail |74 + > .../brig.dg/test/gimple/function_calls.hsail |59 + > gcc/testsuite/brig.dg/test/gimple/mem.hsail |39 + > gcc/testsuite/brig.dg/test/gimple/mulhi.hsail |33 + > gcc/testsuite/brig.dg/test/gimple/packed.hsail|78 + > .../brig.dg/test/gimple/smoke_test.hsail |91 + > gcc/testsuite/brig.dg/test/gimple/variables.hsail | 124 + > gcc/testsuite/brig.dg/test/gimple/vector.hsail|57 + > gcc/testsuite/lib/brig-dg.exp |29 + > gcc/testsuite/lib/brig.exp|40 + > include/hsa-interface.h | 630 + > libhsail-rt/Makefile.am | 123 + > libhsail-rt/Makefile.in | 721 + > libhsail-rt/README| 4 + > libhsail-rt/aclocal.m4| 979 + > libhsail-rt/config.h.in | 217 + > libhsail-rt/configure | 17162 ++ > libhsail-rt/configure.ac | 150 + > libhsail-rt/include/internal/phsa-rt.h|97 + > .../include/internal/phsa_queue_interface.h |60 + > libhsail-rt/include/internal/workitems.h | 103 + > libhsail-rt/m4/libtool.m4 | 7997 > libhsail-rt/m4/ltoptions.m4 | 384 + > libhsail-rt/m4/ltsugar.m4 | 123 + > libhsail-rt/m4/ltversion.m4 |23 + > libhsail-rt/m4/lt~obsolete.m4 |98 +
Re: [PATCH] Fix check_GNU_style.sh for BSD / Mac OS X
On 06/03/2016 09:41 AM, Alan Hayward wrote: check_GNU_style.sh fails to detect lines >80 chars on BSD / Mac OS X systems. This is becuase paste is being called with an empty delimiter list. Instead \0 should be used. Tested on Ubuntu 14.04 and OS X 10.9.5 contrib/ * check_GNU_style.sh: Fix paste args for BSD OK. jeff
Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)
On 06/06/2016 01:40 PM, Jakub Jelinek wrote: On Mon, Jun 06, 2016 at 09:27:56PM +0200, Marc Glisse wrote: The last one would miss floating point registers (no 2 platforms use the same letter for those, hence my quest for something more generic). The goal of the experiment is described in PR59159 (for which "+X" is unlikely to be the right answer, in particular because it is meaningless for constants). I don't know in what context people use the "X" constraint, or even better "=X"... X constraint has been added mainly for uses in match_scratch like: (clobber (match_scratch:SI 2 "=X,X,X,")) or when the predicate takes care of everything and it is not needed to specify anything further: [(set (match_operand:SWI12 0 "push_operand" "=X") (match_operand:SWI12 1 "nonmemory_no_elim_operand" "rn"))] Using it in inline asm generally has resulted in lots of issues, including ICEs etc., so nothing I'd recommend to use. So would it make sense to define it as not available for use in ASMs? I realize that's potentially a user-visible change, but it might be a reasonable one to make. jeff
Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)
On 06/07/2016 11:58 AM, Bernd Edlinger wrote: AFACT this is not the only place where overly complex RTL trees can cause an ICE. That wouldn't surprise me at all -- but the design of RTL is such that it can be arbitrarily complex. Essentially, routines can not make assumptions about the complexity of the RTL they're given -- except for any guards/pre-conditions set up in the caller. Yes. That is what I think too, "X" is probably not used often in practice, but if it is allowed, it should at least not result in an ICE. There's no disagreement about "X" should never result in an ICE. The question is whether or not "X" truly means an arbitrary piece of RTL or some tighter subset of RTL. I guess one could also raise the question of whether or not "X" should ever be exposed to asms. "X"'s original intent IIRC was to specify that for a particular alternative the operand didn't matter and would never be used. This comes up with scratch registers for example. "X" should allow to feed "whatsoever" valid C expressions to the asm input, and using the %-expression in the assembler string should not cause an ICE. Actually it doesn't even have to be a valid C expression. It might have started as a C expression from an ASM statement, but been modified by various RTL optimization passes into something that isn't necessarily a valid C expression. However if I compare commond.md ... (define_constraint "i" "Matches a general integer constant." (and (match_test "CONSTANT_P (op)") (match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)"))) ... against recog.c, general_operand seems to allow less i.e. checks targetm.legitimate_constant_p additionally. So that is something I don't really understand, I mean if a constant is not a "legitimate" constant by the target's definition, why should it be safe to use in later in targetm.asm_out.print_operand? Legitimacy in this context is usually something related to operands that have to be reloaded for PIC code generation. They may print just fine, but from a code generation standpoint they require special handling. One could argue that this is inconsistent and it ought to be cleaned up, but that's separate from the "X" issue. jeff
Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN
On Thu, 9 Jun 2016, Jeff Law wrote: > On 05/26/2016 09:04 AM, David Malcolm wrote: > > When using config-list.mk to build all configurations, it's useful > > to filter out the configurations that are known to be broken. > > > > This patch does so, adding a KNOWN_BROKEN variable. > > > > contrib/ChangeLog: > > * config-list.mk (LIST): Rename to... > > (FULL_LIST): ...this. > > (KNOWN_BROKEN): New variable. > > (LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN. > I'd rather fix, deprecate and/or remove these kinds of targets. > > Given that interix has been broken for a long time, I'll approve a patch that > removes it. The rule was meant to be that a target is deprecated in one release, then removed in the next - the error message given without --enable-obsolete says so. So all the targets that require --enable-obsolete in GCC 6 should be removed for GCC 7 in the absence of interest in reviving them. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] PR bootstrap/71471: remove selftest for pp_format (%p)
I was confused by the comment to pp_format: /* The following format specifiers are recognized as being client independent: ... %p: pointer ... */ into thinking that %p is printed in a host-independent manner, when "client independent" means in relation to different pretty-printers e.g the C++ pretty printer, or normal tree one etc. The following patch removes the overzealous test case and clarifies the comment. OK for trunk? I'm working on a followup that makes selftest failures easier to track down. gcc/ChangeLog: PR bootstrap/71471 * pretty-print.c (pp_indent): Specify that %p is printed in a host-dependent manner. (test_pp_format): Remove the test for %p. --- gcc/pretty-print.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c index d805da4..8febda0 100644 --- a/gcc/pretty-print.c +++ b/gcc/pretty-print.c @@ -279,7 +279,7 @@ pp_indent (pretty_printer *pp) %wd, %wi, %wo, %wu, %wx: HOST_WIDE_INT versions. %c: character. %s: string. - %p: pointer. + %p: pointer (printed in a host-dependent manner). %r: if pp_show_color(pp), switch to color identified by const char *. %R: if pp_show_color(pp), reset color. %m: strerror(text->err_no) - does not consume a value from args_ptr. @@ -1317,8 +1317,8 @@ test_pp_format () assert_pp_format ("A 12345678", "%c %x", 'A', 0x12345678); assert_pp_format ("hello world 12345678", "%s %x", "hello world", 0x12345678); - assert_pp_format ("0xcafebabe 12345678", "%p %x", (void *)0xcafebabe, - 0x12345678); + /* We can't test for %p; the pointer is printed in an implementation-defined + manner. */ assert_pp_format ("normal colored normal 12345678", "normal %rcolored%R normal %x", "error", 0x12345678); -- 1.8.5.3
Re: [PATCH 0/9] separate shrink-wrapping
On 06/07/2016 07:47 PM, Segher Boessenkool wrote: This patch series introduces separate shrink-wrapping. There are many things the prologue/epilogue of a function do, and most of those things can be done independently. For example, most of the time, for many targets, the save of callee-saved registers can be done later than the "main" prologue. Doing so helps quite a bit because the prologue is expensive for functions that do not need everything it does done for every path through the function; often, the hot paths do not need much at all, e.g. not those things the prologue needs to do for the function to call other functions. The first patch creates a command-line flag, some hooks, a status flag ("is this function wrapped separately", used by later passes), and documentation for these things. The next six patches are to prevent later passes from mishandling the epilogue instructions that now appear before the epilogue: mostly, you cannot do much to instructions with a REG_CFA_RESTORE note without confusing dwarf2cfi. The cprop one is for prologue instructions. Then, the main patch. And finally a patch for PowerPC that implements separate wrapping for GPRs and LR. Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on powerpc64le-linux. Previous versions of this series also tested on x86_64-linux. Is this okay for trunk? Segher Segher Boessenkool (9): separate shrink-wrap: New command-line flag, status flag, hooks, and doc cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate dce: Don't dead-code delete separately wrapped restores regrename: Don't rename restores regrename: Don't run if function was separately shrink-wrapped sel-sched: Don't mess with register restores cprop: Leave RTX_FRAME_RELATED_P instructions alone shrink-wrap: shrink-wrapping for separate concerns rs6000: Separate shrink-wrapping I'm going to largely let Bernd own the review on this. Just a few comments. I certainly like the concept. My mental model is that parts of the prologue might sink further than other parts of the prologue. It's not an exact match, but I think it's a "close enough" mental model. It looks like the "concerns" are all target defined and its the target's responsibility to deal with dependencies between the "concerns". Right? (BTW, I think "concerns" is a particularly poor name, perhaps "reasons" would be better?). Right now it looks like shrink_wrapped_separate essentially says "we did something special here" -- while I don't think it's necessary for this patch, describing what we did might be better. Essentially different paths would have a set of prologue/epilogue attributes that apply to that path. This would be useful for things like cfgcleanup where you could join noreturn calls more aggressively. Though this may not matter in any significant way in practice. There may be enough commonality that we can promote some "concerns" from target dependent to target independent -- I've often wondered if we could handle a fair amount of prologue/epilogue generation in target independent ways.For simple targets, I wouldn't be terribly surprised if all prologue/epilogue generation could be handled generically and they'd get separate shrink-wrapping "for free". But that's all blue-sky. Jeff
[PATCH] Fix PR 71407 : Use correct types for live usage of live operations
This patch fixes PR 71407 by ensuring the BIT_FIELD_REF is created using the given vectype and then casted to the result type. Tested on x86 and aarch64. Note that the test vectorizes on x86 but does not vectorize on aarch64 or power (due to a != statement failing to vectorize) Ok to commit? gcc/ PR tree-optimization/71407 PR tree-optimization/71416 * tree-vect-loop.c (vectorizable_live_operation): Use vectype for BIT_FIELD_REF type. Testsuite/ PR tree-optimization/71407 PR tree-optimization/71416 * gcc.dg/vect/pr71407.c: New * gcc.dg/vect/pr71416-1.c: New * gcc.dg/vect/pr71416-2.c: New Alan. pr71407.patch Description: Binary data
Re: [PATCH] c/70883 - inconsistent error message for calls to __builtin_add_overflow with too few arguments
On 06/07/2016 08:44 PM, Martin Sebor wrote: Ping: This is a trivial patch to make the text of an error message for an insufficient number of arguments to a built-in consistent across different built-ins. Rather that sometimes saying "not enough arguments" and others "too few arguments" the patch makes GCC print "too few" in all cases. https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00093.html OK. jeff
Re: [PATCH] Do not enable -fcheck-pointer-bounds w/ -fsanitize=bounds
On 06/09/2016 03:12 AM, Martin Liška wrote: Hello. I've prepared patch for the PR that is very similar to what was done in PR65044. I've been currently testing the patch on x86_64-linux. Ready to be installed after it finishes? OK. jeff
Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
On 06/08/2016 10:47 AM, Jiong Wang wrote: As discussed on the PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751, here is the patch. For this particular failure on arm, *arm_movsi_insn has the following operand constraints: operand 0: "=rk,r,r,r,rk,m" operand 1: "rk, I,K,j,mi,rk" gcc won't explicitly refuse an unmatch CT_MEMORY operand (r235184) if it comes from substituion that alternative (alt) 4 got a chance to compete with alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is with rld_nregs=1. I fell it's OK to give alt 4 a chance here, but we should calculate the cost correctly. For alt 4, it should be treated as spill into memory, but currently lra only recognize a spill for pseudo register. While the spilled rtx for alt 4 is a plus after equiv substitution. (plus:SI (reg/f:SI 102 sfp) (const_int 4 [0x4])) This patch thus let lra-constraint cost spill of Non-pseudo as well and fixed the regression. x86_64/aarch64/arm boostrap and regression OK. arm bootstrapped cc1 is about 0.3% smaller in code size. OK for trunk? gcc/ PR rtl-optimization/70751 * lra-constraints.c (process_alt_operands): Recognize Non-pseudo spilled into memory. I think the change itself is fine, but the comment could use some work. Actually I think you just need to remove the first sentence (the one referring to BZ70751 and r235184) and keep everything from "Suppose a target" onward. OK with that fix. jeff
Re: [PATCH][MIPS] Don't split shifts by default for MIPS16.
On Tue, 24 May 2016, Robert Suchanek wrote: > The following changes the default behaviour of shift splitting > for MIPS16 e.g. the shifts will be split only when used with > undocumented -mno-debugd option that is now switched on by default. > > This appears to enable better optimization in certain cases, and hence, > giving slightly better performance. Thank you for your contribution, however I have some issues with your proposal. First of all since TARGET_DEBUG_D_MODE is used across several places I think this really should be split into two separate pieces, independently reviewed. Then we have two parts: 1. The change to the expander is probably all but obviously correct as it complements a similar one applied to the corresponding splitters eons ago (back in 2001), although I wonder why we need to have both the splitters and instructions split manually in the expander in the first place. Can you please investigatie it? 2. The other change is far from obvious since it flips the splitting default, which has been there since forever (or the addition of MIPS16 support back in 1998). So I think it needs a justification more elaborate than just "certain cases", and perhaps a proper command-line option defined rather than using an obscure debugging hook. It may also make sense to set the default dynamically from the CPU tuning selected -- if the "certain cases" change from CPU to CPU, that is. Can you please look into it? Maciej
Re: [PATCH, RFC] First cut at using vec_construct for strided loads
> On Jun 8, 2016, at 9:05 AM, Richard Bienerwrote: > > On Wed, 8 Jun 2016, Bill Schmidt wrote: > >> Hi Richard, >> >>> On Jun 8, 2016, at 7:29 AM, Richard Biener >>> wrote: >>> >>> On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt >>> wrote: This patch is a follow-up to the discussion generated by http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html. I've added vec_construct to the cost model for use in vect_model_load_cost, and implemented a cost calculation that makes sense to me for PowerPC. I'm less certain about the default, i386, and spu implementations. I took a guess at i386 from the discussions we had, and used the same calculation for the default and for spu. I'm hoping you or others can fill in the blanks if I guessed badly. The i386 cost for vec_construct is different from all the others, which are parameterized for each processor description. This should probably be parameterized in some way as well, but thought you'd know better than I how that should be. Perhaps instead of elements / 2 + 1 it should be (elements / 2) * X + Y where X and Y are taken from the processor description, and represent the cost of a merge and a permute, respectively. Let me know what you think. >>> >>> Just trying to understand how you arrived at the above formulas in >>> investigating >>> strangely low cost for v16qi construction of 9. If we pairwise reduce >>> elements >>> with a cost of 1 then we arrive at a cost of elements - 1, that's what you'd >>> get with not accounting an initial move of element zero into a vector and >>> then >>> inserting each other element into that with elements - 1 inserts. >> >> What I wrote there only makes partial sense for certain types on Power, so >> far as >> I can tell, and even then it doesn’t generalize properly. When the scalar >> registers >> are contained in the vector registers (as happens for floating-point on >> Power), then >> you can do some merges and other forms of permutes to combine them faster >> than doing specific inserts. But that isn’t a general solution even on >> Power; for the >> integer modes we still do inserts. > > You mean Power has instructions to combine more than two vector registers > into one? Otherwise you still need n / 2 plus n / 4 plus n / 8 ... > "permutes" which boils down to n - 1. Right, we do not. This is what I meant about it not generalizing properly -- it was an ill-thought-out, off-the-cuff remark, so far as I can tell. I agree with the n - 1, and I need to go in and make a similar change for Power. There is a special case with 32-bit floating-point that probably also needs adjustment. Bill > >> So what you have makes sense to me, and what’s currently in place for Power >> needs >> work also, so far as I can tell. I’ll take a note to revisit this. > > Thanks. > Richard. > >> — Bill >> >>> >>> This also matches up with code-generation on x86_64 for >>> >>> vT foo (T a, T b, ...) >>> { >>> return (vT) {a, b, ... }; >>> } >>> >>> for any vector / element type combination I tried. Thus the patch below. >>> >>> I'll bootstrap / test that on x86_64-linux and I'm leaving other >>> targets to target >>> maintainers. >>> >>> Ok for the i386 parts? >>> >>> Thanks, >>> Richard. >>> >>> 2016-06-08 Richard Biener >>> >>> * targhooks.c (default_builtin_vectorization_cost): Adjust >>> vec_construct cost. >>> * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. >>> >>> Index: gcc/targhooks.c >>> === >>> --- gcc/targhooks.c (revision 237196) >>> +++ gcc/targhooks.c (working copy) >>> @@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum >>>return 3; >>> >>> case vec_construct: >>> - elements = TYPE_VECTOR_SUBPARTS (vectype); >>> - return elements / 2 + 1; >>> + return TYPE_VECTOR_SUBPARTS (vectype) - 1; >>> >>> default: >>>gcc_unreachable (); >>> Index: gcc/config/i386/i386.c >>> === >>> --- gcc/config/i386/i386.c (revision 237196) >>> +++ gcc/config/i386/i386.c (working copy) >>> @@ -49503,8 +49520,6 @@ static int >>> ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, >>> tree vectype, int) >>> { >>> - unsigned elements; >>> - >>> switch (type_of_cost) >>>{ >>> case scalar_stmt: >>> @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve >>>return ix86_cost->vec_stmt_cost; >>> >>> case vec_construct: >>> - elements = TYPE_VECTOR_SUBPARTS (vectype); >>> - return ix86_cost->vec_stmt_cost * (elements / 2 + 1); >>> + return
[ARM] Fix, add tests for FP16 aapcs.
Hello, A number of tests were added to check for FP16 arguments and return values being passed in registers. These require mfloat-abi=hard to be selected but in some test configurations they were run with -mfloat-abi=soft or -mfloat-abi=softfp. Explict skip-if directives are added to the tests to ensure that they only run on valid configurations. In addition, the code in the gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument passing and return values and a softfp variant is added as fp16-aapcs-2.c. Tested for arm-none-linux-gnueabihf with native make check and for arm-none-eabi with cross-compiled check-gcc. Also checked the new tests with cross-compiled arm-eabi-qemu/-mcpu=cortex-m3/-mthumb. Ok for trunk? Matthew 2016-06-09 Matthew Wahab* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for mfloat-abi=soft and mfloat-abi=softfp. Replace arm_neon_fp16_ok with arm_neon_fp16_hw. * testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise. * testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for mfloat-abi=soft and mfloat-abi=softfp. Also, simplify the test and set option -mfloat-abi=hard. * testsuite/gcc.target/arm/fp16-aapcs-2.c: New. >From b02f0283367d4a4c1b012e8ca8e7b5c91f3ac561 Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Tue, 24 May 2016 09:21:11 +0100 Subject: [PATCH] [ARM] Fix, add tests for FP16 aapcs. A number of tests were added to check for FP16 arguments and return values being passed in register. These require mfloat-abi=hard to be selected but, in some test configurations, they were run with -mfloat-abi=soft or -mfloat-abi=softfp. Explict skip-if directives are added to the tests to ensure that they only run on valid configurations. In addition, the code in the gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument passing and return values and a softfp variant is added as fp16-aapcs-2.c. 2016-06-09 Matthew Wahab * testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for mfloat-abi=soft and mfloat-abi=softfp. Replace arm_neon_fp16_ok with arm_neon_fp16_hw. * testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise. * testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise. * testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for mfloat-abi=soft and mfloat-abi=softfp. Also, simplify the test and set option -mfloat-abi=hard. * testsuite/gcc.target/arm/fp16-aapcs-2.c: New. --- gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c | 3 ++- gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c | 3 ++- gcc/testsuite/gcc.target/arm/aapcs/vfp18.c | 3 ++- gcc/testsuite/gcc.target/arm/aapcs/vfp19.c | 3 ++- gcc/testsuite/gcc.target/arm/aapcs/vfp20.c | 3 ++- gcc/testsuite/gcc.target/arm/aapcs/vfp21.c | 3 ++- gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c | 22 +- gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c | 21 + 8 files changed, 46 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c index 680a3b5..6764a19 100644 --- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c +++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c @@ -1,8 +1,9 @@ /* Test AAPCS layout (VFP variant for Neon types) */ /* { dg-do run { target arm_eabi } } */ -/* { dg-require-effective-target arm_neon_fp16_ok } */ +/* { dg-require-effective-target arm_neon_fp16_hw } */ /* { dg-add-options arm_neon_fp16 } */ +/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */ #ifndef IN_FRAMEWORK #define VFP diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c index fc2b13b..b2526a1 100644 --- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c +++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c @@ -1,8 +1,9 @@ /* Test AAPCS layout (VFP variant for Neon types) */ /* { dg-do run { target arm_eabi } } */ -/* { dg-require-effective-target arm_neon_fp16_ok } */ +/* { dg-require-effective-target arm_neon_fp16_hw } */ /* { dg-add-options arm_neon_fp16 } */ +/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */ #ifndef IN_FRAMEWORK #define VFP diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c index 225e9ce..629b884 100644 --- a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c +++
RE: [PATCH][MIPS] Add -minline-intermix to ignore compression flags when inlining
On Wed, 25 May 2016, Robert Suchanek wrote: > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > index 73f1cb6..2f6195e 100644 > > > --- a/gcc/doc/invoke.texi > > > +++ b/gcc/doc/invoke.texi > > > @@ -837,6 +837,7 @@ Objective-C and Objective-C++ Dialects}. > > > -mips16 -mno-mips16 -mflip-mips16 @gol > > > -minterlink-compressed -mno-interlink-compressed @gol > > > -minterlink-mips16 -mno-interlink-mips16 @gol > > > +-minline-intermix -mno-inline-intermix @gol > > > > Funky indentation here > > I'm not sure what's wrong here, it's the diff I'd say. There is no > indentation within "MIPS Options". Two spaces rather than one are used between individual options (modulo formatting errata in some previous additions). Maciej
Re: [PATCH 2/3] config-list.mk: add GCC_SRC_DIR
On 05/26/2016 09:04 AM, David Malcolm wrote: config-list.mk currently requires the pwd to be in a sibling directory of the source tree. However, building using config-list.mk can consume over 400GB of disk space in this build directory (e.g. my machine successfully built cc1 for 206 configurations last night, consuming 442GB of space). I've found it useful to be able to run config-list.mk in an arbitrary build location (i.e. one with plenty of free space), so this patch adds a GCC_SRC_DIR variable which can be overridden. contrib/ChangeLog: * config-list.mk (GCC_SRC_DIR): New variable. (make-log-dir): Use GCC_SRC_DIR. ($(LIST)): Likewise. OK. Good to see this happen, I've found the restriction annoying as well. jeff
Re: [PATCH 3/3] config-list.mk: add OPT-enable-obsolete to 4 targets
On 05/26/2016 09:04 AM, David Malcolm wrote: r233165 marked three deprecated rtems targets as obsolete. r233887 marked mep-elf as obsolete. Update config-list.mk to add OPT-enable-obsolete to these 4 targets. contrib/ChangeLog: * config-list.mk (FULL_LIST): Add OPT-enable-obsolete to avr-rtems, h8300-rtems, m32r-rtems, mep-elf. OK. jeff
Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN
On 05/26/2016 09:04 AM, David Malcolm wrote: When using config-list.mk to build all configurations, it's useful to filter out the configurations that are known to be broken. This patch does so, adding a KNOWN_BROKEN variable. contrib/ChangeLog: * config-list.mk (LIST): Rename to... (FULL_LIST): ...this. (KNOWN_BROKEN): New variable. (LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN. I'd rather fix, deprecate and/or remove these kinds of targets. Given that interix has been broken for a long time, I'll approve a patch that removes it. jeff
Re: Update probabilities in predict.def to match reality
On 06/07/2016 09:27 PM, Jan Hubicka wrote: > There are bugs in few predictors - goto predictor is dead because the FE code > was dropped, > return predictor is bit random because CFG is optimized (it should probably > be done in FE), > loop iv compare seems bogus and fortran fail alloc does not seem to work as > intended. > I added FIXME and will addres them incrementally. Hi. I've investigated why 'loop iv compare heuristics' provides bogus values and I've just created PR for that: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71474 And I would like to apply following patch, that adds 'XFAIL' test-case described in the PR and I would distinguish scanning for 'loop iv compare' and 'guess loop iv compare'. Can I install the patch? Thanks, Martin >From 6272402f76d4e6ff496d55e9a4fac7ee9a696e4e Mon Sep 17 00:00:00 2001 From: marxinDate: Thu, 9 Jun 2016 16:53:32 +0200 Subject: [PATCH] Make 'loop iv compare' heuristics scanning more precise in test-suite gcc/testsuite/ChangeLog: 2016-06-09 Martin Liska * gcc.dg/predict-1.c: Distinguish between "loop iv compare" and "guess loop iv compared" heuristics. * gcc.dg/predict-2.c: Likewise. * gcc.dg/predict-3.c: Likewise. * gcc.dg/predict-4.c: Likewise. * gcc.dg/predict-5.c: Likewise. * gcc.dg/predict-6.c: Likewise. * gfortran.dg/predict-1.f90: New test. --- gcc/testsuite/gcc.dg/predict-1.c| 2 +- gcc/testsuite/gcc.dg/predict-2.c| 2 +- gcc/testsuite/gcc.dg/predict-3.c| 2 +- gcc/testsuite/gcc.dg/predict-4.c| 2 +- gcc/testsuite/gcc.dg/predict-5.c| 2 +- gcc/testsuite/gcc.dg/predict-6.c| 2 +- gcc/testsuite/gfortran.dg/predict-1.f90 | 18 ++ 7 files changed, 24 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/predict-1.f90 diff --git a/gcc/testsuite/gcc.dg/predict-1.c b/gcc/testsuite/gcc.dg/predict-1.c index d0924f2..10d62ba 100644 --- a/gcc/testsuite/gcc.dg/predict-1.c +++ b/gcc/testsuite/gcc.dg/predict-1.c @@ -23,4 +23,4 @@ void foo (int bound) } } -/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 2.0%" 5 "profile_estimate"} } */ +/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 5 "profile_estimate"} } */ diff --git a/gcc/testsuite/gcc.dg/predict-2.c b/gcc/testsuite/gcc.dg/predict-2.c index 3011686..aa91568 100644 --- a/gcc/testsuite/gcc.dg/predict-2.c +++ b/gcc/testsuite/gcc.dg/predict-2.c @@ -23,4 +23,4 @@ void foo (int base, int bound) } } -/* { dg-final { scan-tree-dump-not "loop iv compare heuristics of edge\[^:\]*:" "profile_estimate"} } */ +/* { dg-final { scan-tree-dump-not "guess loop iv compare heuristics of edge\[^:\]*:" "profile_estimate"} } */ diff --git a/gcc/testsuite/gcc.dg/predict-3.c b/gcc/testsuite/gcc.dg/predict-3.c index 663f141..7274963 100644 --- a/gcc/testsuite/gcc.dg/predict-3.c +++ b/gcc/testsuite/gcc.dg/predict-3.c @@ -25,4 +25,4 @@ void foo (int bound) } } -/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 98.0%" 3 "profile_estimate"} } */ +/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 3 "profile_estimate"} } */ diff --git a/gcc/testsuite/gcc.dg/predict-4.c b/gcc/testsuite/gcc.dg/predict-4.c index 5779da3..2ac2ec5 100644 --- a/gcc/testsuite/gcc.dg/predict-4.c +++ b/gcc/testsuite/gcc.dg/predict-4.c @@ -15,4 +15,4 @@ void foo (int bound) } } -/* { dg-final { scan-tree-dump "loop iv compare heuristics of edge\[^:\]*: 50.0%" "profile_estimate"} } */ +/* { dg-final { scan-tree-dump " loop iv compare heuristics of edge\[^:\]*: 50.0%" "profile_estimate"} } */ diff --git a/gcc/testsuite/gcc.dg/predict-5.c b/gcc/testsuite/gcc.dg/predict-5.c index 56ada30..135081d 100644 --- a/gcc/testsuite/gcc.dg/predict-5.c +++ b/gcc/testsuite/gcc.dg/predict-5.c @@ -21,4 +21,4 @@ void foo (int base, int bound) } } -/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 98.0%" 4 "profile_estimate"} } */ +/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 4 "profile_estimate"} } */ diff --git a/gcc/testsuite/gcc.dg/predict-6.c b/gcc/testsuite/gcc.dg/predict-6.c index 9ed41ed..104683f 100644 --- a/gcc/testsuite/gcc.dg/predict-6.c +++ b/gcc/testsuite/gcc.dg/predict-6.c @@ -21,4 +21,4 @@ void foo (int base, int bound) } } -/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */ +/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */ diff --git a/gcc/testsuite/gfortran.dg/predict-1.f90 b/gcc/testsuite/gfortran.dg/predict-1.f90 new file mode 100644 index 000..78a03f6 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/predict-1.f90 @@ -0,0 +1,18 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-profile_estimate" } +
Re: [PATCH][wwwdocs][AArch64] Mention -mcpu=qdf24xx support for GCC 6
On Thu, Jun 9, 2016 at 1:43 AM, Kyrill Tkachovwrote: > On 03/06/16 21:32, Evandro Menezes wrote: >> Shouldn't this read "The Qualcomm QDF24xx processors are now supported via >> the"? >> > I used Jim's suggestion at > https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01423.html. > I don't have a strong opinion on it either. > If Jim objects to the wording I'm happy to change it. I don't have a strong opinion either. The option was named qdf24xx in anticipation that there might be more than one closely related part, but currently there are zero of them, as nothing has been released yet. Jim
[committed] MIPS: Stay within 79 columns in `mips_output_jump'
Hi, I have applied this change, as obvious. gcc/ * config/mips/mips.c (mips_output_jump): Fix formatting. Maciej gcc-mips-output-jump-format.diff Index: gcc/config/mips/mips.c === --- gcc/config/mips/mips.c (revision 237266) +++ gcc/config/mips/mips.c (working copy) @@ -13588,8 +13588,9 @@ mips_output_jump (rtx *operands, int tar else s += sprintf (s, "%%*"); - s += sprintf (s, "%s%s%s%s%s\t%%%d%s", insn_name, and_link, reg, compact, short_delay, - target_opno, nop); + s += sprintf (s, "%s%s%s%s%s\t%%%d%s", + insn_name, and_link, reg, compact, short_delay, + target_opno, nop); if (!reg_p && TARGET_ABICALLS_PIC2) s += sprintf (s, "\n\t.option\tpic2");
Re: [Patch, testsuite] Skip some more tests for targets with int size < 32
On Jun 8, 2016, at 6:14 AM, Senthil Kumar Selvarajwrote: > > This patch requires int32plus support for a few more tests - these > were failing for the avr target. > If this is ok, could someone commit please? I don't have commit access. Ok. Committed revision 237266. > 2016-06-08 Senthil Kumar Selvaraj > > * gcc.c-torture/execute/bswap-2.c: Require int32plus. > * gcc.dg/torture/pr68067-1.c: Likewise. > * gcc.dg/torture/pr68067-2.c: Likewise.
Re: [Patch, avr] Fix broken stack-usage-1.c test
On Jun 8, 2016, at 4:20 AM, Senthil Kumar Selvarajwrote: > > I forgot to send this testcase modification with that patch - here's > the fix for making gcc.dg/stack-usage-1.c pass again for avr. > > If this is ok, could someone commit please? I don't have commit > access. Ok. Committed revision 237266. > 2016-06-08 Senthil Kumar Selvaraj > > * gcc.dg/stack-usage-1.c (SIZE): Consider return address > when setting SIZE.
[PATCH] Fix tests for x86 interrupt for -fpic and -march=corei7 targets
Hi, Here is a trivial patch, that fixes tests for these targets. Ok for trunk? gcc/testsuite/ChangeLog: * gcc.target/i386/interrupt-12.c: Fix test for -fpic and corei7. * gcc.target/i386/interrupt-13.c: Likewise. * gcc.target/i386/interrupt-15.c: Likewise. * gcc.target/i386/interrupt-14.c: Fix test for -fpic. * gcc.target/i386/interrupt-24.c: Likewise. * gcc.target/i386/interrupt-3.c: Fix test for corei7. * gcc.target/i386/interrupt-9.c: Likewise. * gcc.target/i386/interrupt-redzone-2.c: Likewise. Br, Julia fix_tests_09 Description: fix_tests_09
[hsa-branch 5/5] OMP lowering/expansion changes to gridify tiled loops
Hi, the patch below is the main part of the series that enhances the existing gridification code to pattern-match sequences of constructs in which the distribute, parallel and a loop constructs are not in one simple combined construct but the step in the distribute loop exactly matches the iteration size of (possibly many) inner loop constructs. It also checks other conditions, most notably that an unknown non-const and non-pure function is not called as part of the distribute loop but outside of a normal loop construct. If this pattern is matched, iterations of the distribute loop are converted to HSA groups and iterations of inner loop constructs to individual work-items within those groups. In the code, the inner loops are converted to a barrier. I'll commit this to the hsa branch in a few moments and will re-submit it for trunk at some point in summer. Thanks, Martin 2016-06-09 Martin Jamborgcc/ * gimple.h (enum gf_mask): New element GF_OMP_FOR_GRID_GROUP_ITER. (gimple_omp_for_grid_group_iter): New function. (gimple_omp_for_set_grid_group_iter): Likewise. * omp-low.c (check_omp_nesting_restrictions): Allow kernel loop in place of a distribute one. (grid_expand_omp_for_loop): New parameter specifying whether the loop is an intra-group one. If so or if the loop is over groups, use the respective builtins for expansion. Emit barriers for intra-group ones. Moved a branch removeal here from grid_expand_target_grid_body. (grid_expand_target_grid_body): If the loop iterates over groups, find the intra-group ones, expand them and remove them from the OMP construct tree. (grid_prop): New type. (grid_safe_assignment_p): New parameter grid, do not consider safe assignments to variables holding group sizes. (grid_seq_only_contains_local_assignments): New parameter grid, pass it to grid_safe_assignment_p. (grid_find_single_omp_among_assignments_1): Likewise. Also tkae target location from grid. Emit more missed-optimizations information. (grid_find_single_omp_among_assignments): Likewise. (grid_parallel_clauses_gridifiable): New function. (grid_inner_loop_gridifiable_p): Likewise. (grid_dist_follows_simple_pattern): Likewise. (grid_gfor_follows_tiling_pattern): Likewise. (grid_call_permissible_in_distribute_p): Likewise. (grid_handle_call_in_distribute): Likewise. (grid_dist_follows_tiling_pattern): Likewise. (grid_target_follows_gridifiable_pattern): New parameter grid. If distribute is not in a combined construct, attempt tiled gridification. Parts of simple gridification moved to the new functions. (grid_var_segment): New enum. (grid_mark_variable_segment): New function. (grid_copy_leading_local_assignments): Also call grid_mark_variable_segment if requested by a new parameter. (grid_mark_tiling_loops): New function. (grid_mark_tiling_parallels_and_loops): Likewise. (grid_process_kernel_body_copy): Also handle tiled grids. (grid_attempt_target_gridification): new variable grid to be passed around. Get group sizes from it. gcc/testsuite/ * c-c++-common/gomp/gridify-2.c: New test. * c-c++-common/gomp/gridify-3.c: Likewise. libgomp/ * testsuite/libgomp.hsa.c/tiling-1.c: New test. * testsuite/libgomp.hsa.c/tiling-2.c: Likewise. --- gcc/gimple.h| 21 + gcc/omp-low.c | 1057 --- gcc/testsuite/c-c++-common/gomp/gridify-2.c | 66 ++ gcc/testsuite/c-c++-common/gomp/gridify-3.c | 68 ++ libgomp/testsuite/libgomp.hsa.c/tiling-1.c | 212 ++ libgomp/testsuite/libgomp.hsa.c/tiling-2.c | 258 +++ 6 files changed, 1434 insertions(+), 248 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-3.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-2.c diff --git a/gcc/gimple.h b/gcc/gimple.h index 063e29d..2680a13 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -163,6 +163,7 @@ enum gf_mask { GF_OMP_FOR_COMBINED= 1 << 4, GF_OMP_FOR_COMBINED_INTO = 1 << 5, GF_OMP_FOR_GRID_PHONY = 1 << 6, +GF_OMP_FOR_GRID_GROUP_ITER = 1 << 7, GF_OMP_TARGET_KIND_MASK= (1 << 4) - 1, GF_OMP_TARGET_KIND_REGION = 0, GF_OMP_TARGET_KIND_DATA= 1, @@ -5124,6 +5125,26 @@ gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value) omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY; } +/* Return true if iterations of a grid OMP_FOR statement correspond to HSA + groups. */ + +static inline bool +gimple_omp_for_grid_group_iter (const gomp_for
[hsa-branch 2/5] Make emit_insn_operands handle zero operands
Hi, the patch below allows emit_insn_operands to instructions with no operands gracefully. Apparently so far we have not produced any. I'll commit this to the hsa branch in a few moments and then to trunk at some point in summer. Martin 2016-06-02 Martin Jambor* hsa-brig.c (emit_insn_operands): Cope with zero operands in an instruction. --- gcc/hsa-brig.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index 9c74b9a..471533c 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -1236,20 +1236,20 @@ emit_insn_operands (hsa_insn_basic *insn) operand_offsets; unsigned l = insn->operand_count (); - operand_offsets.safe_grow (l); - - for (unsigned i = 0; i < l; i++) -operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i))); /* We have N operands so use 4 * N for the byte_count. */ uint32_t byte_count = lendian32 (4 * l); - unsigned offset = brig_data.add (_count, sizeof (byte_count)); - brig_data.add (operand_offsets.address (), -l * sizeof (BrigOperandOffset32_t)); + if (l > 0) +{ + operand_offsets.safe_grow (l); + for (unsigned i = 0; i < l; i++) + operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i))); + brig_data.add (operand_offsets.address (), +l * sizeof (BrigOperandOffset32_t)); +} brig_data.round_size_up (4); - return offset; } -- 2.8.2
[hsa-branch 3/5] Reorganize HSA branches representation
this patch reorganizes the class hierarchy we use to represent HSA branching and synchronization instructions. The ultimate goal is to find a way of representing a barrier instruction which shares the same kind with branches. It basically renames hsa_insn_br, which we have used to represent only conditional branches anyway, to hsa_insn_cbr and makes it inherit from a new ancestor that will be used to represent barriers. The next patch in the series actually introduces barrier instruction into the IL . I'll commit this to the hsa branch in a few moments and then to trunk at some point in summer. Martin 2016-06-03 Martin Jambor* hsa.h (hsa_insn_br): Renamed to hsa_insn_cbr, renamed all occurences in all files too. (hsa_insn_br): New class, now the ancestor of hsa_incn_cbr. (is_a_helper ::test): New function. (is_a_helper ::test): Adjust to only cover conditional branch instructions. * hsa-brig.c (emit_branch_insn): Renamed to emit_cond_branch_insn. Emit the width stored in the class. (emit_generic_branch_insn): New function. (emit_insn): Call emit_generic_branch_insn. * hsa-dump.c (hsa_width_specifier_name): New function. (dump_hsa_insn_1): Dump generic branch instructions. * hsa-gen.c (hsa_insn_br::hsa_insn_br): New. (hsa_insn_br::operator new): Likewise. (hsa_insn_cbr::hsa_insn_cbr): Set width via ancestor constructor. * hsa.c (hsa_destroy_insn): Also handle instances of hsa_insn_br. --- gcc/hsa-brig.c | 32 gcc/hsa-dump.c | 92 -- gcc/hsa-gen.c | 37 +-- gcc/hsa.c | 6 ++-- gcc/hsa.h | 49 +-- 5 files changed, 188 insertions(+), 28 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index 471533c..716d8f5 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -1496,11 +1496,29 @@ emit_cmp_insn (hsa_insn_cmp *cmp) brig_insn_count++; } -/* Emit an HSA branching instruction and all necessary directives, schedule - necessary operands for writing. */ +/* Emit an HSA generic branching/sycnronization instruction. */ + +static void +emit_generic_branch_insn (hsa_insn_br *br) +{ + struct BrigInstBr repr; + repr.base.base.byteCount = lendian16 (sizeof (repr)); + repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR); + repr.base.opcode = lendian16 (br->m_opcode); + repr.width = br->m_width; + repr.base.type = lendian16 (br->m_type); + repr.base.operands = lendian32 (emit_insn_operands (br)); + memset (, 0, sizeof (repr.reserved)); + + brig_code.add (, sizeof (repr)); + brig_insn_count++; +} + +/* Emit an HSA conditional branching instruction and all necessary directives, + schedule necessary operands for writing. */ static void -emit_branch_insn (hsa_insn_br *br) +emit_cond_branch_insn (hsa_insn_cbr *br) { struct BrigInstBr repr; @@ -1513,7 +1531,7 @@ emit_branch_insn (hsa_insn_br *br) repr.base.base.byteCount = lendian16 (sizeof (repr)); repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR); repr.base.opcode = lendian16 (br->m_opcode); - repr.width = BRIG_WIDTH_1; + repr.width = br->m_width; /* For Conditional jumps the type is always B1. */ repr.base.type = lendian16 (BRIG_TYPE_B1); @@ -1885,8 +1903,8 @@ emit_insn (hsa_insn_basic *insn) emit_segment_insn (seg); else if (hsa_insn_cmp *cmp = dyn_cast (insn)) emit_cmp_insn (cmp); - else if (hsa_insn_br *br = dyn_cast (insn)) -emit_branch_insn (br); + else if (hsa_insn_cbr *br = dyn_cast (insn)) +emit_cond_branch_insn (br); else if (hsa_insn_sbr *sbr = dyn_cast (insn)) { if (switch_instructions == NULL) @@ -1895,6 +1913,8 @@ emit_insn (hsa_insn_basic *insn) switch_instructions->safe_push (sbr); emit_switch_insn (sbr); } + else if (hsa_insn_br *br = dyn_cast (insn)) +emit_generic_branch_insn (br); else if (hsa_insn_arg_block *block = dyn_cast (insn)) emit_arg_block_insn (block); else if (hsa_insn_call *call = dyn_cast (insn)) diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c index 985caca..3b65684 100644 --- a/gcc/hsa-dump.c +++ b/gcc/hsa-dump.c @@ -621,6 +621,88 @@ hsa_m_atomicop_name (enum BrigAtomicOperation op) } } +/* Return textual name for atomic operation. */ + +static const char * +hsa_width_specifier_name (BrigWidth8_t width) +{ + switch (width) +{ +case BRIG_WIDTH_NONE: + return "none"; +case BRIG_WIDTH_1: + return "1"; +case BRIG_WIDTH_2: + return "2"; +case BRIG_WIDTH_4: + return "4"; +case BRIG_WIDTH_8: + return "8"; +case BRIG_WIDTH_16: + return "16"; +case BRIG_WIDTH_32: + return "32"; +case BRIG_WIDTH_64: + return "64"; +case BRIG_WIDTH_128: + return "128"; +case BRIG_WIDTH_256: + return "256"; +case BRIG_WIDTH_512: + return
[hsa-branch 1/5] Allow putting local variables into group and global segments
Hi, the following patch adds the capability to put local HSA variables into the group or global segment as indicated by new declaration attributes. In the process I had to fix how we differentiate between local and global attributes because context and allocation properties now can diverge. This patch does not do the things necessary to expose the attributes to the user. I wonder whether it even makes sense for OpenMP uses. I will do so when we provide the users with a more direct way of generating HSAIL. I will commit the patch to the HSA branch in a few moments. The ultimate goal is trunk but I'd like to keep it there for a number of weeks. Martin 2016-06-02 Martin Jambor* hsa-gen.c (get_symbol_for_decl): Fix dinstinguishing between global and local functions. Put local variables into a segment according to their attribute or static flag, if there is one. --- gcc/hsa-gen.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index c08f4a8..2ead76a 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -794,8 +794,8 @@ get_symbol_for_decl (tree decl) dummy.m_decl = decl; - bool is_in_global_vars -= TREE_CODE (decl) == VAR_DECL && is_global_var (decl); + bool is_in_global_vars = ((TREE_CODE (decl) == VAR_DECL) + && !decl_function_context (decl)); if (is_in_global_vars) slot = hsa_global_variable_symbols->find_slot (, INSERT); @@ -861,8 +861,17 @@ get_symbol_for_decl (tree decl) /* PARM_DECL and RESULT_DECL should be already in m_local_symbols. */ gcc_assert (TREE_CODE (decl) == VAR_DECL); - sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_PRIVATE, - BRIG_LINKAGE_FUNCTION); + BrigSegment8_t segment; + if (lookup_attribute ("hsa_group_segment", DECL_ATTRIBUTES (decl))) + segment = BRIG_SEGMENT_GROUP; + else if (TREE_STATIC (decl) + || lookup_attribute ("hsa_global_segment", + DECL_ATTRIBUTES (decl))) + segment = BRIG_SEGMENT_GLOBAL; + else + segment = BRIG_SEGMENT_PRIVATE; + + sym = new hsa_symbol (BRIG_TYPE_NONE, segment, BRIG_LINKAGE_FUNCTION); sym->m_align = align; sym->fillup_for_decl (decl); hsa_cfun->m_private_variables.safe_push (sym); -- 2.8.2
[hsa-branch 4/5] New HSA builtins needed for tiling
Hi, this patch adds two hsa builtins and code for emission of HSAIL for them and for BUILT_IN_GOMP_BARRIER. These built-ins are going to be introduced to GIMPLE IL at OMP expansion time by the subsequent patch in the series. I plan to experiment with adding more builtins for special HSAIL instructions but that is a separate effort. I'll commit this to the hsa branch in a few moments and then to trunk at some point in summer. Thanks, Martin 2016-06-03 Martin Jambor* hsa-builtins.def (BUILT_IN_HSA_GET_WORKGROUP_ID): New. (BUILT_IN_HSA_GET_WORKITEM_ID): Likewise. * hsa-gen.c (gen_hsa_insns_for_call): Emit HSAIL for the above builtins and for BUILT_IN_GOMP_BARRIER. Move emiting of BUILT_IN_HSA_GET_WORKITEM_ABSID up in the function. --- gcc/hsa-builtins.def | 4 gcc/hsa-gen.c| 38 ++ 2 files changed, 34 insertions(+), 8 deletions(-) diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def index e4681c1..3f183f1 100644 --- a/gcc/hsa-builtins.def +++ b/gcc/hsa-builtins.def @@ -27,5 +27,9 @@ along with GCC; see the file COPYING3. If not see /* The reason why they aren't in gcc/builtins.def is that the Fortran front end doesn't source those. */ +DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKGROUP_ID, "hsa_get_workgroup_id", +BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKITEM_ID, "hsa_get_workitem_id", +BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKITEM_ABSID, "hsa_get_workitem_absid", BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index fb376a1..efcba8c 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -5783,6 +5783,36 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb) hbb->append_insn (atominsn); break; } + +case BUILT_IN_HSA_GET_WORKGROUP_ID: + { + hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true); + if (bdim->m_type != BRIG_TYPE_U32) + bdim->get_in_type (BRIG_TYPE_U32, hbb); + query_hsa_grid (stmt, BRIG_OPCODE_WORKGROUPID, bdim, hbb); + break; + } +case BUILT_IN_HSA_GET_WORKITEM_ID: + { + hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true); + if (bdim->m_type != BRIG_TYPE_U32) + bdim->get_in_type (BRIG_TYPE_U32, hbb); + query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMID, bdim, hbb); + break; + } +case BUILT_IN_HSA_GET_WORKITEM_ABSID: + { + hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true); + if (bdim->m_type != BRIG_TYPE_U32) + bdim->get_in_type (BRIG_TYPE_U32, hbb); + query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, bdim, hbb); + break; + } + +case BUILT_IN_GOMP_BARRIER: + hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER, BRIG_TYPE_NONE, +BRIG_WIDTH_ALL)); + break; case BUILT_IN_GOMP_PARALLEL: { gcc_checking_assert (gimple_call_num_args (stmt) == 4); @@ -5798,14 +5828,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb) break; } -case BUILT_IN_HSA_GET_WORKITEM_ABSID: - { - hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true); - if (bdim->m_type != BRIG_TYPE_U32) - bdim->get_in_type (BRIG_TYPE_U32, hbb); - query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, bdim, hbb); - break; - } case BUILT_IN_OMP_GET_THREAD_NUM: { query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb); -- 2.8.2
[hsa-branch 0/5] Gridification support for tiling algorithms
Hi, this patch series, currently intended just for the branch but eventually also for trunk in time for gcc 7, enables gridification, that is the expansion of OpenMP loops for HSA GPUs, to work with separate distribute and loop construct, provided that the step size of the distribute loop is equal to the iteration size of iteration space of the "normal" loops in it. It also allows the HSA back-end to emit group-segment variables and expands any variables private to a distribute construct as such. Apart from increased flexibility, one of the main motivations is to enable tiling. The patches enable the compiler to grok the matrix multiplication code example and and emit it to HSA, which than runs 2.5 times faster (in my very non-scientific settings) than a naive implementation (compiled for HSA). Thanks, Martin #define BLOCK_SIZE 16 void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA, const float*B, const int LDB, const float beta, float*C, const int LDC){ #pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N]) #pragma omp distribute collapse(2) for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE) for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE) { /* Each team has a local copy of these mini matrices */ float As[BLOCK_SIZE][BLOCK_SIZE]; float Bs[BLOCK_SIZE][BLOCK_SIZE]; #pragma omp parallel { int C_row, C_col; float Cval = 0.0; for (int kblock = 0; kblock < K ; kblock += BLOCK_SIZE ) { #pragma omp for collapse(2) for (int row=0 ; row < BLOCK_SIZE ; row++) for (int col=0 ; col < BLOCK_SIZE ; col++) { C_row = C_row_start + row; C_col = C_col_start + col; if ((C_row < M) && (kblock + col < K)) As[row][col] = A[(C_row*LDA)+ kblock + col]; else As[row][col] = 0; if ((kblock + row < K) && C_col < N) Bs[row][col] = B[((kblock+row)*LDB)+ C_col]; else Bs[row][col] = 0; } #pragma omp for collapse(2) for (int row=0 ; row < BLOCK_SIZE ; row++) for (int col=0 ; col < BLOCK_SIZE ; col++) { for (int e = 0; e < BLOCK_SIZE; ++e) Cval += As[row][e] * Bs[e][col]; } } /* End for kblock .. */ #pragma omp for collapse(2) for (int row=0 ; row < BLOCK_SIZE ; row++) for (int col=0 ; col < BLOCK_SIZE ; col++) { C_row = C_row_start + row; C_col = C_col_start + col; if ((C_row < M) && (C_col < N)) C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col]; } } /* end parallel */ }/* end target teams distribute */ }
[hsa-branch] Fix issue with an undefined builtin
Hi, when I added HSA-only builtins to the HSA branch, I added them only conditionally if HSA was enabled, which however broke non-HSA compilation. This patch fixes this be removing the ifdefs. Now that we have reorganized HSA so that all GTY stuff is in one file (hsa.c), the right thing to do is to conditionally compile all of it. However, this will mean putting ifdefs also to omp-lowering, which is something that I would like to do only after we split omp-low.c and gridification gets its own file. At the branch this does not do any real harm anyway. I will commit this to the branch straight away. Thanks, Martin 2016-06-06 Martin Jambor* builtins.def: Do not enclose DEF_HSA_BUILTIN by an ifdef ENABLE_HSA. diff --git a/gcc/builtins.def b/gcc/builtins.def index 2bc933b..4e1c0ac 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -189,14 +189,10 @@ along with GCC; see the file COPYING3. If not see || flag_offload_abi != OFFLOAD_ABI_UNSET)) #undef DEF_HSA_BUILTIN -#ifdef ENABLE_HSA #define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ false, false, true, ATTRS, false, \ (!flag_disable_hsa)) -#else -#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) -#endif /* Builtin used by implementation of Cilk Plus. Most of these are decomposed by the compiler but a few are implemented in libcilkrts. */
Re: [PATCH] Add selftest for pretty-print.c (v2)
On Thu, Jun 9, 2016 at 9:10 AM, Jakub Jelinekwrote: > On Thu, Jun 09, 2016 at 09:02:27AM -0400, David Edelsohn wrote: >> On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidt wrote: >> > On 06/09/2016 02:21 PM, David Edelsohn wrote: >> > >> >> This is a completely unacceptable way to introduce these self-tests. >> >> Please stop adding self-tests that only are tested on x86 Linux and >> >> cause bootstrap failures. >> > >> > >> > We have no requirement to test patches on more than one target. I think >> > your >> > request is unreasonable. >> >> Bernd, >> >> This is a completely inappropriate response. GCC must maintain a >> stable, working development base on which developers can work. GCC >> specifically supports multiple architectures and targets. >> >> GCC is not your personal kingdom and playground. > > For patches that the submitter can easily expect problems on some > architectures, testing there is desirable, but we certainly don't and should > not require every single generic code change to be expected on all supported > targets, not even on primary targets. That just doesn't scale, various > people submit multiple changes a day and having to wait days or weeks before > testing is over is unacceptable. > > You've reported a problem, David is going to look at it and fix. > I don't see why it would be unacceptable to add further self-tests, as long > as there is nothing in them where one should expect issues on targets other > than the tested one. I didn't request that every patch be tested on every architecture. Please don't play a rhetorical game of arguing against the strawman that you created. The self-tests specifically abort the build and break bootstrap upon failure. Most other changes that inadvertently have bugs or tickle a latent issue in a target will introduce some additional testsuite failures, not a bootstrap failure. x86 developers seem to get quite annoyed when a patch causes a bootstrap failure for an x86 configuration. Second, all of the large changes that may have unknown effects on various targets have been tested extensively on multiple architectures, as have most global optimization changes. It may not be required, but it generally has been considered "good form" and has been a stipulation of patch approval by some reviewers. It would be very unfortunate for GCC to lower the bar for patches by some developers and not others. Thanks, David
[PATCH] Handle undefined extern vars in output_in_order
Hi, This patch teaches cgraphunit.c:output_in_order to output undefined external variables via assemble_undefined_decl. At the moment that is only done for -ftoplevel-reorder in varpool.c:symbol_table::output_variables. This patch makes both behave the same way. I've also made handling of variables in both functions look similar to each other. I'll admit it's rather unclear to me what the '!DECL_EXTERNAL (pv->decl)' condition is guarding in output_in_order (removed in my patch). AIUI, it should be always true for defined variables (or, conversely, if it's false for a defined variable, why are we skipping it?). I have plans to reimplement handling of "omp declare target link" in a more clean manner. Now it looks quite out of place there. Bootstrapped and regtested on x86_64, OK for trunk? Alexander * cgraphunit.c (cgraph_order_sort_kind): New entry ORDER_VAR_UNDEF. (output_in_order): Loop over undefined variables too. Output them via assemble_undefined_decl. Skip "omp declare target link" variables earlier. * varpool.c (symbol_table::output_variables): Handle undefined variables together with defined ones. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 4bfcad7..5a04902 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2141,6 +2141,7 @@ enum cgraph_order_sort_kind ORDER_UNDEFINED = 0, ORDER_FUNCTION, ORDER_VAR, + ORDER_VAR_UNDEF, ORDER_ASM }; @@ -2187,16 +2188,25 @@ output_in_order (bool no_reorder) } } - FOR_EACH_DEFINED_VARIABLE (pv) -if (!DECL_EXTERNAL (pv->decl)) - { - if (no_reorder && !pv->no_reorder) - continue; - i = pv->order; - gcc_assert (nodes[i].kind == ORDER_UNDEFINED); - nodes[i].kind = ORDER_VAR; - nodes[i].u.v = pv; - } + FOR_EACH_VARIABLE (pv) +{ + if (no_reorder && !pv->no_reorder) + continue; + if (DECL_HARD_REGISTER (pv->decl)) + continue; + if (DECL_HAS_VALUE_EXPR_P (pv->decl)) + { + gcc_checking_assert (lookup_attribute ("omp declare target link", +DECL_ATTRIBUTES (pv->decl))); +#ifdef ACCEL_COMPILER + continue; +#endif + } + i = pv->order; + gcc_assert (nodes[i].kind == ORDER_UNDEFINED); + nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF; + nodes[i].u.v = pv; +} for (pa = symtab->first_asm_symbol (); pa; pa = pa->next) { @@ -,16 +2232,13 @@ output_in_order (bool no_reorder) break; case ORDER_VAR: -#ifdef ACCEL_COMPILER - /* Do not assemble "omp declare target link" vars. */ - if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl) - && lookup_attribute ("omp declare target link", - DECL_ATTRIBUTES (nodes[i].u.v->decl))) - break; -#endif nodes[i].u.v->assemble_decl (); break; + case ORDER_VAR_UNDEF: + assemble_undefined_decl (nodes[i].u.v->decl); + break; + case ORDER_ASM: assemble_asm (nodes[i].u.a->asm_str); break; diff --git a/gcc/varpool.c b/gcc/varpool.c index ab615fa..b513026 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -733,11 +733,6 @@ symbol_table::output_variables (void) timevar_push (TV_VAROUT); - FOR_EACH_VARIABLE (node) -if (!node->definition - && !DECL_HAS_VALUE_EXPR_P (node->decl) - && !DECL_HARD_REGISTER (node->decl)) - assemble_undefined_decl (node->decl); FOR_EACH_DEFINED_VARIABLE (node) { /* Handled in output_in_order. */ @@ -747,20 +742,24 @@ symbol_table::output_variables (void) node->finalize_named_section_flags (); } - FOR_EACH_DEFINED_VARIABLE (node) + FOR_EACH_VARIABLE (node) { - /* Handled in output_in_order. */ if (node->no_reorder) continue; + if (DECL_HARD_REGISTER (node->decl)) + continue; + if (DECL_HAS_VALUE_EXPR_P (node->decl)) + { + gcc_checking_assert (lookup_attribute ("omp declare target link", +DECL_ATTRIBUTES (node->decl))); #ifdef ACCEL_COMPILER - /* Do not assemble "omp declare target link" vars. */ - if (DECL_HAS_VALUE_EXPR_P (node->decl) - && lookup_attribute ("omp declare target link", - DECL_ATTRIBUTES (node->decl))) - continue; + continue; #endif - if (node->assemble_decl ()) -changed = true; + } + if (node->definition) + changed |= node->assemble_decl (); + else + assemble_undefined_decl (node->decl); } timevar_pop (TV_VAROUT); return changed;
Re: [PATCH] Add selftest for pretty-print.c (v2)
On Thu, Jun 09, 2016 at 09:02:27AM -0400, David Edelsohn wrote: > On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidtwrote: > > On 06/09/2016 02:21 PM, David Edelsohn wrote: > > > >> This is a completely unacceptable way to introduce these self-tests. > >> Please stop adding self-tests that only are tested on x86 Linux and > >> cause bootstrap failures. > > > > > > We have no requirement to test patches on more than one target. I think your > > request is unreasonable. > > Bernd, > > This is a completely inappropriate response. GCC must maintain a > stable, working development base on which developers can work. GCC > specifically supports multiple architectures and targets. > > GCC is not your personal kingdom and playground. For patches that the submitter can easily expect problems on some architectures, testing there is desirable, but we certainly don't and should not require every single generic code change to be expected on all supported targets, not even on primary targets. That just doesn't scale, various people submit multiple changes a day and having to wait days or weeks before testing is over is unacceptable. You've reported a problem, David is going to look at it and fix. I don't see why it would be unacceptable to add further self-tests, as long as there is nothing in them where one should expect issues on targets other than the tested one. Jakub
Re: [PATCH] Add selftest for pretty-print.c (v2)
On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidtwrote: > On 06/09/2016 02:21 PM, David Edelsohn wrote: > >> This is a completely unacceptable way to introduce these self-tests. >> Please stop adding self-tests that only are tested on x86 Linux and >> cause bootstrap failures. > > > We have no requirement to test patches on more than one target. I think your > request is unreasonable. Bernd, This is a completely inappropriate response. GCC must maintain a stable, working development base on which developers can work. GCC specifically supports multiple architectures and targets. GCC is not your personal kingdom and playground. Thanks, David > > Have you at least investigated whether the self-test exposes a real bug or > not? > > > Bernd
Re: [PATCH] Add selftest for pretty-print.c (v2)
On 06/09/2016 02:21 PM, David Edelsohn wrote: This is a completely unacceptable way to introduce these self-tests. Please stop adding self-tests that only are tested on x86 Linux and cause bootstrap failures. We have no requirement to test patches on more than one target. I think your request is unreasonable. Have you at least investigated whether the self-test exposes a real bug or not? Bernd
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Thu, Jun 09, 2016 at 02:40:43PM +0200, Christophe Lyon wrote: > > Bet it depends if this happens before the signal(SIGILL, sig_ill_handler); > > call or after it. If before, then I guess you'd better rewrite the > > long long a = 0, b = 1; > > asm ("vorr %P0, %P1, %P2" > > : "=w" (a) > > : "0" (a), "w" (b)); > > if (a != 1) > > Of course you are right: it happens just before the call to signal, > to build the sig_ill_handler address in r1. > > So it's not even a problem with rewriting the asm. Ugh, so the added options don't affect just vectorized code, but normal integer only code? check_vect is fragile, there is always a risk that some instruction is scheduled before the call. If you have working target attribute support, I think you should compile check_vect with attribute set to some lowest common denominator that every ARM CPU supports (if there is any, that is). Though most likely you'll need to tweak the inline asm, because maybe "w" constraint won't be available then. Jakub
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On 9 June 2016 at 14:31, Jakub Jelinekwrote: > On Thu, Jun 09, 2016 at 02:18:44PM +0200, Christophe Lyon wrote: >> On 8 June 2016 at 16:50, Jakub Jelinek wrote: >> > On Wed, Jun 08, 2016 at 04:44:00PM +0200, Christophe Lyon wrote: >> >> I've tried the attached patch (which does only dg-options -> >> >> dg-additional-options). >> >> For GCC, it's better, except that on arm-none-eabi qemu complains about >> >> an illegal instruction when asked to use arm926 and GCC is configured with >> >> the default cpu. Maybe that's because check_vect does not have the >> >> expected >> > >> > check_vect installs a SIGILL handler and if the insn is invalid, excepts >> > a signal to be raised. Is that not the case with qemu? Or is qemu just >> > being too noisy? >> > >> qemu complains when executing check_vect's prologue, which contains >> movwr1, #35712 >> which is not supported either on arm926. > > Bet it depends if this happens before the signal(SIGILL, sig_ill_handler); > call or after it. If before, then I guess you'd better rewrite the > long long a = 0, b = 1; > asm ("vorr %P0, %P1, %P2" > : "=w" (a) > : "0" (a), "w" (b)); > if (a != 1) Of course you are right: it happens just before the call to signal, to build the sig_ill_handler address in r1. So it's not even a problem with rewriting the asm. > fully into inline asm, if after, then it is likely either just too noisy > qemu, or misdesigned issue in qemu. > > Jakub
Re: Maintain loop iteration count estimates
> On Thu, 9 Jun 2016, Jan Hubicka wrote: > > > Hi, > > after we read the profile, we know expected number of iterations. > > We know the average ;) It may make sense to add some histogram > value profiling for niter now that we should easily able to do so. I always interpreted the estimated number of iterations to be the same as expected number of iterations and to be same as average. So it seems to be sane to feed the info from profile. I am thinking to add the histograms, yes. It is midly anoying to do so becuase one needs to intrstrument all exit edges out of loop. I guess we don't care much if histogram will get lost on abnormals. One option I am thinking about is to introduce counter that will take two parameters A,B. It will record linear histogram for values in range 0...A and logarithmic histogram for values greater than A capping by 2^B (so we don't need 64 counters for every loop). I think we are not really that interested in precise histograms for loops that iterate more than, say, 2^10 times. We only need to know that it loops a lot. We however care about low iteration counts to make good decision for peeling. This is still 26 counters per loop that is quite a lot. A lot cheaper alternative may be to simply measure loop peeling limit by having counter that counts how often loop exits in first PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is the maximal number of iterations in this case (we really want likely maximal number of iterations but that seems harder to get). This will determine peeling limit which we can then store into loop structure. Vectorizer/unroller/prefetcher and most of other classical loop opts care about the average being large enough so the current expected value seems to do the trick. This does not let you to update profile very precisely after peeling (and also peeling done by vectorizer), but it needs only 2 counters per loop. What other passes, beside peeling, would immediately benefit from a historgram? I wonder if you can think of better scheme? > > > Currently we use profile each time estimate_numbers_of_iterations_loop > > is called to recompute this value. This is not very safe because the > > profile may be misupdated. It seems safer to compute it at once and > > maintain thorough the compilation. > > > > Notice that I removed: > > - /* Force estimate compuation but leave any existing upper bound in > > place. */ > > - loop->any_estimate = false; > > From beggining of estimate_numbers_of_iterations_loop. I can not make > > sense of > > this. Even w/o profile if we have estimate, we are better to maintain it > > because > > later we may not be able to derive it again. > > There seems to be no code that is forced by setting loop->any_estimate = > > true. > > Only code that cares seems to be record_niter_bound that only decreases > > existing > > estimates. THis seems sane procedure - we don't roll loops. > > > > Bootstrapped/regtested x86_64-linux, OK? > > Ok. Did you check what this does to SPEC with FDO? I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I have readily available. Any regressions would however point to loop info updating bugs (or wrong use of the profile) so I will look into them if they appear. I am trying to benchmark firefox now. Honza
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Thu, Jun 09, 2016 at 02:18:44PM +0200, Christophe Lyon wrote: > On 8 June 2016 at 16:50, Jakub Jelinekwrote: > > On Wed, Jun 08, 2016 at 04:44:00PM +0200, Christophe Lyon wrote: > >> I've tried the attached patch (which does only dg-options -> > >> dg-additional-options). > >> For GCC, it's better, except that on arm-none-eabi qemu complains about > >> an illegal instruction when asked to use arm926 and GCC is configured with > >> the default cpu. Maybe that's because check_vect does not have the expected > > > > check_vect installs a SIGILL handler and if the insn is invalid, excepts > > a signal to be raised. Is that not the case with qemu? Or is qemu just > > being too noisy? > > > qemu complains when executing check_vect's prologue, which contains > movwr1, #35712 > which is not supported either on arm926. Bet it depends if this happens before the signal(SIGILL, sig_ill_handler); call or after it. If before, then I guess you'd better rewrite the long long a = 0, b = 1; asm ("vorr %P0, %P1, %P2" : "=w" (a) : "0" (a), "w" (b)); if (a != 1) fully into inline asm, if after, then it is likely either just too noisy qemu, or misdesigned issue in qemu. Jakub