Re: [PATCH] [PR c++/85039] no type definitions in builtin offsetof
On Fri, Apr 13, 2018, 6:09 PM Alexandre Olivawrote: > On Apr 11, 2018, Jason Merrill wrote: > > > I raised this issue with the C++ committee, and it seems that nobody > > expects defining a type here to work. So let's go back to your first > > patch, removing the offending part of semicolon3.C. > > All right, here's the adjusted patch. Retested on i686- and > x86_64-linux-gnu. Ok to install? > Ok. Jason [PR c++/85039] no type definitions in builtin offsetof > > Types defined within a __builtin_offsetof argument don't always get > properly recorded as members of their context types, so if they're > anonymous, we may fail to assign them an anon type index for mangling > and ICE. > > We shouldn't allow types to be introduced in __builtin_offsetof, I > think, and Jason says the std committee agrees, so I've arranged for > us to reject them. > > Even then, we still parse the definitions and attempt to assign > mangled names to its member functions, so the ICE remains. Since > we've already reported an error, we might as well complete the name > assignment with an arbitrary index, thus avoiding the ICE. > > We used to have a test that expected to be able to define types in > __builtin_offsetof; this patch removes that specific test. > > > for gcc/cp/ChangeLog > > PR c++/85039 > * parser.c (cp_parser_builtin_offset): Reject type definitions. > * mangle.c (nested_anon_class_index): Avoid crash returning -1 > if we've seen errors. > > for gcc/testsuite/ChangeLog > > PR c++/85039 > * g++.dg/pr85039-1.C: New. > * g++.dg/pr85039-2.C: New. > * g++.dg/parse/semicolon3.C: Remove test_offsetof. > --- > gcc/cp/mangle.c |3 +++ > gcc/cp/parser.c |8 +++- > gcc/testsuite/g++.dg/parse/semicolon3.C |7 --- > gcc/testsuite/g++.dg/pr85039-1.C| 17 + > gcc/testsuite/g++.dg/pr85039-2.C| 10 ++ > 5 files changed, 37 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/pr85039-1.C > create mode 100644 gcc/testsuite/g++.dg/pr85039-2.C > > diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c > index 94c4bed28486..a7f9d686345d 100644 > --- a/gcc/cp/mangle.c > +++ b/gcc/cp/mangle.c > @@ -1623,6 +1623,9 @@ nested_anon_class_index (tree type) > ++index; >} > > + if (seen_error ()) > +return -1; > + >gcc_unreachable (); > } > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c > index 8b1b271b53d1..bf46165f5ae1 100644 > --- a/gcc/cp/parser.c > +++ b/gcc/cp/parser.c > @@ -9823,7 +9823,13 @@ cp_parser_builtin_offsetof (cp_parser *parser) >parens.require_open (parser); >/* Parse the type-id. */ >location_t loc = cp_lexer_peek_token (parser->lexer)->location; > - type = cp_parser_type_id (parser); > + { > +const char *saved_message = parser->type_definition_forbidden_message; > +parser->type_definition_forbidden_message > + = G_("types may not be defined within __builtin_offsetof"); > +type = cp_parser_type_id (parser); > +parser->type_definition_forbidden_message = saved_message; > + } >/* Look for the `,'. */ >cp_parser_require (parser, CPP_COMMA, RT_COMMA); >token = cp_lexer_peek_token (parser->lexer); > diff --git a/gcc/testsuite/g++.dg/parse/semicolon3.C > b/gcc/testsuite/g++.dg/parse/semicolon3.C > index 8a2b1ac46301..0d46be9ed654 100644 > --- a/gcc/testsuite/g++.dg/parse/semicolon3.C > +++ b/gcc/testsuite/g++.dg/parse/semicolon3.C > @@ -20,13 +20,6 @@ struct OK3 > } // no complaints >(s7); > > -__SIZE_TYPE__ > -test_offsetof (void) > -{ > - // no complaints about a missing semicolon > - return __builtin_offsetof (struct OK4 { int a; int b; }, b); > -} > - > struct OK5 > { >int a; > diff --git a/gcc/testsuite/g++.dg/pr85039-1.C > b/gcc/testsuite/g++.dg/pr85039-1.C > new file mode 100644 > index ..f57c8a261dee > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr85039-1.C > @@ -0,0 +1,17 @@ > +// { dg-do compile { target c++14 } } > + > +constexpr int a() { > + return > + __builtin_offsetof(struct { // { dg-error "types may not be defined" } > +int i; > +short b { > + __builtin_offsetof(struct { // { dg-error "types may not be > defined" } > + int j; > +struct c { // { dg-error "types may not be defined" } > + void d() { > + } > +}; > + }, j) > +}; > + }, i); > +} > diff --git a/gcc/testsuite/g++.dg/pr85039-2.C > b/gcc/testsuite/g++.dg/pr85039-2.C > new file mode 100644 > index ..e6d16325105b > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr85039-2.C > @@ -0,0 +1,10 @@ > +// { dg-do compile } > + > +struct d { > + static d *b; > +} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be > defined" } > + int i; > + struct a { // { dg-error "types may not be defined" } > +int c() { return .1f; } > + }; > +}, i)); > > >
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
On Fri, Apr 13, 2018, 7:53 PM Paolo Carliniwrote: > Hi again, > > On 14/04/2018 00:12, Paolo Carlini wrote: > > Hi, > > > > On 13/04/2018 19:45, Jason Merrill wrote: > >> Ah, I see. The problem is that even though convert_to_integer_1 was > >> called with dofold false, we lose that when it calls convert(). Why > >> not recurse directly to convert_to_integer_1 with the underlying type? > >> That would avoid the undesired folding. > > Interesting. I had no idea that this seemingly simple error-recovery > > issue was ultimately due to a substantive if latent issue in > > convert.c. Anyway, in the meanwhile I tried a few variants of direct > > recursion without much success, my boostraps all failed pretty quickly > > and pretty badly. However the below - which isn't far from the spirit > > of your analysis, I think - is holding up well, at least bootstrap + > > C++ testsuite are definitely OK. Should I continue testing it over the > > weekend? > The below seems much better, also bootstraps fine and I'm now fully > testing it. > > Thanks, > Paolo. > > PS: no idea why earlier today I wasted a lot of time trying to > completely avoid maybe_fold_build1_loc :( > Ok if testing passes. Jason >
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
Hi again, On 14/04/2018 00:12, Paolo Carlini wrote: Hi, On 13/04/2018 19:45, Jason Merrill wrote: Ah, I see. The problem is that even though convert_to_integer_1 was called with dofold false, we lose that when it calls convert(). Why not recurse directly to convert_to_integer_1 with the underlying type? That would avoid the undesired folding. Interesting. I had no idea that this seemingly simple error-recovery issue was ultimately due to a substantive if latent issue in convert.c. Anyway, in the meanwhile I tried a few variants of direct recursion without much success, my boostraps all failed pretty quickly and pretty badly. However the below - which isn't far from the spirit of your analysis, I think - is holding up well, at least bootstrap + C++ testsuite are definitely OK. Should I continue testing it over the weekend? The below seems much better, also bootstraps fine and I'm now fully testing it. Thanks, Paolo. PS: no idea why earlier today I wasted a lot of time trying to completely avoid maybe_fold_build1_loc :( / Index: convert.c === --- convert.c (revision 259376) +++ convert.c (working copy) @@ -741,8 +741,10 @@ convert_to_integer_1 (tree type, tree expr, bool d else if (TREE_CODE (type) == ENUMERAL_TYPE || maybe_ne (outprec, GET_MODE_PRECISION (TYPE_MODE (type { - expr = convert (lang_hooks.types.type_for_mode - (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr); + expr + = convert_to_integer_1 (lang_hooks.types.type_for_mode + (TYPE_MODE (type), TYPE_UNSIGNED (type)), + expr, dofold); return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr); } Index: testsuite/g++.dg/cpp0x/pr85112.C === --- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent) +++ testsuite/g++.dg/cpp0x/pr85112.C(working copy) @@ -0,0 +1,17 @@ +// PR c++/85112 +// { dg-do compile { target c++11 } } + +struct A +{ + int m; + int n : 4; +}; + +int i; // { dg-message "not const" } + +void foo() +{ + constexpr int j = i; // { dg-error "not usable" } + A a; + a.n = j; +}
[PATCH] rs6000: Disable -m[no-]direct-move (PR85293)
The -mno-direct-move option causes a lot of problems, since it forces us to be able to generate code for p8 and up with some crucial instructions missing. This patch removes the -m[no-]direct-move options so that the user cannot put us into this unexpected situation anymore. Internally we still have all the same flags, and they are automatically set based on -mcpu; getting rid of that is a lot more work and will have to wait for GCC 9 (in some places the flag is used to see if we are compiling for a p8 _at all_). Tested on powerpc64-linux {-m32,-m64}. Will also test on a p9 before committing. Segher 2018-04-13 Segher BoessenkoolPR target/85293 * config/rs6000/rs6000.opt (mdirect-move): Make deprecated. * doc/invoke.texi (RS/6000 and PowerPC Options): Remove -mdirect-move and -mno-direct-move. gcc/testsuite/ PR target/85293 * gcc.target/powerpc/pr80098-2.c: Remove -mdirect-move. Remove the corresponding dg-error clause. * gcc.target/powerpc/pr80098-3.c: Ditto. * gcc.target/powerpc/pr80103-1.c: Delete. --- gcc/config/rs6000/rs6000.opt | 3 +-- gcc/doc/invoke.texi | 12 ++-- gcc/testsuite/gcc.target/powerpc/pr80098-2.c | 3 +-- gcc/testsuite/gcc.target/powerpc/pr80098-3.c | 3 +-- gcc/testsuite/gcc.target/powerpc/pr80103-1.c | 15 --- 5 files changed, 5 insertions(+), 31 deletions(-) delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr80103-1.c diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 596f3fb..bbe3a64 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -534,8 +534,7 @@ Target Report Mask(CRYPTO) Var(rs6000_isa_flags) Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions. mdirect-move -Target Report Mask(DIRECT_MOVE) Var(rs6000_isa_flags) -Use ISA 2.07 direct move between GPR & VSX register instructions. +Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) Ignore Warn(%qs is deprecated) mhtm Target Report Mask(HTM) Var(rs6000_isa_flags) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index a8e2672..70896f7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1095,7 +1095,7 @@ See RS/6000 and PowerPC Options. -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol -msave-toc-indirect -mno-save-toc-indirect @gol -mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector @gol --mcrypto -mno-crypto -mhtm -mno-htm -mdirect-move -mno-direct-move @gol +-mcrypto -mno-crypto -mhtm -mno-htm @gol -mquad-memory -mno-quad-memory @gol -mquad-memory-atomic -mno-quad-memory-atomic @gol -mcompat-align-parm -mno-compat-align-parm @gol @@ -23353,7 +23353,7 @@ following options: -mpopcntb -mpopcntd -mpowerpc64 @gol -mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol -msimple-fpu -mmulhw -mdlmzb -mmfpgpr -mvsx @gol --mcrypto -mdirect-move -mhtm -mpower8-fusion -mpower8-vector @gol +-mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol -mquad-memory -mquad-memory-atomic -mfloat128 -mfloat128-hardware} The particular options set for any particular CPU varies between @@ -23495,14 +23495,6 @@ Enable the use (disable) of the built-in functions that allow direct access to the cryptographic instructions that were added in version 2.07 of the PowerPC ISA. -@item -mdirect-move -@itemx -mno-direct-move -@opindex mdirect-move -@opindex mno-direct-move -Generate code that uses (does not use) the instructions to move data -between the general purpose registers and the vector/scalar (VSX) -registers that were added in version 2.07 of the PowerPC ISA. - @item -mhtm @itemx -mno-htm @opindex mhtm diff --git a/gcc/testsuite/gcc.target/powerpc/pr80098-2.c b/gcc/testsuite/gcc.target/powerpc/pr80098-2.c index 88f7ee4..5a6421b 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr80098-2.c +++ b/gcc/testsuite/gcc.target/powerpc/pr80098-2.c @@ -1,9 +1,8 @@ /* { dg-do compile { target { powerpc64*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-options "-mcpu=power8 -mno-power8-vector -mdirect-move -mcrypto" } */ +/* { dg-options "-mcpu=power8 -mno-power8-vector -mcrypto" } */ int i; -/* { dg-error "'-mno-power8-vector' turns off '-mdirect-move'" "PR80098" { target *-*-* } 0 } */ /* { dg-error "'-mno-power8-vector' turns off '-mcrypto'" "PR80098" { target *-*-* } 0 } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr80098-3.c b/gcc/testsuite/gcc.target/powerpc/pr80098-3.c index aae8fa1..e62418b 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr80098-3.c +++ b/gcc/testsuite/gcc.target/powerpc/pr80098-3.c @@ -1,9 +1,8 @@ /* { dg-do compile { target { powerpc64*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } {
Re: [PATCH] rs6000 PR83660 fix ICE with vec_extract
Hi! On Fri, Apr 13, 2018 at 03:37:08PM -0500, Aaron Sawdey wrote: > Per the discussion on the 83660, I've come to a minimal patch to > prevent this. Basically marking the vec_extract tree as having side > effects later makes sure that it gets all the cleanup points it needs > so that gimplify_cleanup_point_expr () is happy. Also because > vec_insert puts a MODIFY_EXPR in there, it has side effects and this > problem will not occur. > > Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is > where this issue arises. OK for trunk if everything passes? > --- testsuite/gcc.target/powerpc/pr83660.C(nonexistent) > +++ testsuite/gcc.target/powerpc/pr83660.C(working copy) > @@ -0,0 +1,14 @@ > +/* PR target/83660 */ > +/* { dg-do compile } */ > +/* { dg-options "-mcpu=power7" } */ You'll need to skip this test if the user overrides -mcpu=; see the many other testcases that do this. With that fixed, okay for trunk. Thanks! Segher
[PATCH] [PR c++/80290] recycle tinst garbage sooner
tinst_level objects are created during template instantiation, and they're most often quite short-lived, but since there's no intervening garbage collection, they accumulate throughout the pass while most by far could be recycled after a short while. The original testcase in PR80290, for example, creates almost 55 million tinst_level objects, all but 10 thousand of them without intervening GC, but we don't need more than 284 of them live at a time. Furthermore, in many cases, TREE_LIST objects are created to stand for the decl in tinst_level. In most cases, those can be recycled as soon as the tinst_level object is recycled; in some relatively common cases, they are modified and reused in up to 3 tinst_level objects. In the same testcase, TREE_LISTs are used in all but 3 thousand of the tinst_level objects, and we don't need more than 89 tinst_level objects with TREE_LISTs live at a time. Furthermore, all but 2 thousand of those are created without intervening GC. This patch arranges for tinst_level objects to be refcounted (I've seen as many as 20 live references to a single tinst_level object in my testing), and for pending_template, tinst_level and TREE_LIST objects that can be recycled to be added to freelists; that's much faster than ggc_free()ing them and reallocating them shortly thereafter. In fact, the introduction of such freelists is what makes this entire patch lighter-weight, when it comes to memory use, and faster. With refcounting alone, the total memory footprint was still about the same, and we spent more time. In order to further reduce memory use, I've arranged for us to create TREE_LISTs lazily, only at points that really need them (when printing error messages). This grows tinst_level by an additional pointer, but since a TREE_LIST holds a lot more than an extra pointer, and tinst_levels with TREE_LISTs used to be allocated tens of thousands of times more often than plain decl ones, we still save memory overall. I was still not quite happy about growing memory use in cases that used template classes but not template overload resolution, so I changed the type of the errors field to unsigned short, from int. With that change, in_system_header_p and refcount move into the same word or half-word that used to hold errors, releasing a full word, bringing tinst_level back to its original size, now without any padding. The errors field is only used to test whether we've had any errors since the expansion of some template, to skip the expansion of further templates. If we get 2^16 errors or more, it is probably reasonable to refrain from expanding further templates, even if we would expand them before this change. With these changes, compile time for the original testcase at -O0, with default checking enabled, is cut down by some 3.7%, total GCable memory allocation is cut down by almost 20%, and total memory use (as reported by GNU time as maxresident) is cut down by slightly over 15%. Regstrapped on i686- and x86_64-linux-gnu. Ok to install? (See below for an incremental patch that introduces some statistics collection and reporting, that shows how I collected some of the data above. That one is not meant for inclusion) for gcc/cp/ChangeLog PR c++/80290 * cp-tree.h (struct tinst_level): Split decl into tldcl and targs. Add private split_list_p, tree_list_p, and not_list_p inline const predicates; to_list private member function declaration; list_p, get_node, and get_decl_maybe accessors, and refcount private data member. Narrow errors to unsigned short. * error.c (print_instantiation_full_context): Use new accessors. (print_instantiation_partial_context_line): Likewise. * mangle.c (mangle_decl_string): Likewise. * pt.c (freelist): New template class. (tree_list_freelist_head): New var. (tree_list_freelist): New fn, along with specializations. (tinst_level_freelist_head): New var. (pending_template_freelist_head): Likewise. (tinst_level_freelist, pending_template_freelist): New fns. (tinst_level::to_list): Define. (inc_refcount_use, dec_refcount_use): New fns for tinst_level. (set_refcount_ptr): New template fn. (add_pending_template): Adjust for refcounting, freelists and new accessors. (neglectable_inst_p): Take a NULL d as a non-DECL. (limit_bad_template_recursion): Use new accessors. (push_tinst_level): New overload to create the list. (push_tinst_level_loc): Make it static, split decl into two args, adjust tests and initialization to cope with split lists, use freelist, adjust for refcounting. (push_tinst_level_loc): New wrapper with the old interface. (pop_tinst_level): Adjust for refcounting. (record_last_problematic_instantiation): Likewise. (reopen_tinst_level): Likewise. Use new accessors.
Re: RFA[powerpc]: patch to fix PR79916
On 04/13/2018 05:26 PM, Segher Boessenkool wrote: Hi! On Fri, Apr 13, 2018 at 04:43:02PM -0400, Vladimir Makarov wrote: On 04/13/2018 03:58 PM, Alexander Monakov wrote: Here's another compact variant: regno = reg_renumber[regno]; if (regno < 0) regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; Thanks, Alexander. I like your variant more. Me too :-) --- testsuite/gcc.target/powerpc/pr79916.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr79916.c (working copy) @@ -0,0 +1,556 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ Is the testcase specific to LE? If not, please use powerpc*-*-*, or powerpc*-*-* && lp64 if it needs 64-bit. I've just tried powerpc64 BE and powerpc. The bug is reproducible everywhere. So I change it to powerpc*-*-* +/* { dg-options "-Idfp -fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */ I think you can drop the -Idfp ? If that option would do anything we have bigger problems ;-) I removed it. Looks fine to me otherwise. Thanks for working on this! Thank you for the quick review. I committed the patch to trunk (rev. 259379). The final variant is in the attachment. Index: ChangeLog === --- ChangeLog (revision 259378) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2018-04-13 Vladimir Makarov+ + PR rtl-optimization/79916 + * config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard + regs (if any) to define how to gnerate SD moves when LRA is in + progress. + 2018-04-13 Jakub Jelinek PR rtl-optimization/85393 Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 259378) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-04-13 Vladimir Makarov + + PR rtl-optimization/79916 + * gcc.target/powerpc/pr79916.c: New. + 2018-04-13 Jakub Jelinek PR rtl-optimization/85393 Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 259330) +++ config/rs6000/rs6000.c (working copy) @@ -10610,7 +10610,9 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10635,7 +10637,9 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { @@ -10664,7 +10668,9 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10689,7 +10695,9 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { Index: testsuite/gcc.target/powerpc/pr79916.c === --- testsuite/gcc.target/powerpc/pr79916.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr79916.c (working copy) @@ -0,0 +1,556 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-options "-fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */ + +#define PASTE2(A,B) A ## B +#define PASTE(A,B) PASTE2(A,B) + +#define TESTVAL_NEG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + si = VAL;\ + sll = PASTE(VAL,LL); \ + a = si;\ + b = sll;\ + c = VAL;\ + d = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NEG_BIG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + sll = PASTE(VAL,LL); \ + a = sll;\ + b = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NONNEG(VAL,SUF,SIZE)
Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins
Hi! On Fri, Apr 13, 2018 at 03:27:40PM -0700, Carl Love wrote: > On Fri, 2018-04-13 at 16:54 -0500, Segher Boessenkool wrote: > > On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote: > > > diff --git a/gcc/config/rs6000/rs6000.c > > > b/gcc/config/rs6000/rs6000.c > > > index a0c9b5c..855be43 100644 > > > --- a/gcc/config/rs6000/rs6000.c > > > +++ b/gcc/config/rs6000/rs6000.c > > > @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin > > > (gimple_stmt_iterator *gsi) > > > { > > > arg0 = gimple_call_arg (stmt, 0); > > > lhs = gimple_call_lhs (stmt); > > > - /* Only fold the vec_splat_*() if arg0 is constant. */ > > > - if (TREE_CODE (arg0) != INTEGER_CST) > > > + /* Only fold the vec_splat_*() if arg0 is a 5-bit > > > constant. */ > > > + if (TREE_CODE (arg0) != INTEGER_CST > > > + || TREE_INT_CST_LOW (arg0) & ~0x1f) > > > return false; > > > > Should this test for *signed* 5-bit constants only? > > > > if (TREE_CODE (arg0) != INTEGER_CST > > || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15)) > > The buitins for the unsigned vec_splat_u[8, 16,32] are actually mapped > to their corresponding signed version vec_splat_s[8, 16,32] in > altivec.h. Both the vec_splat_u[8, 16,32] and vec_splat_s[8, 16,32] > builtins will get you to the case statement > > /* flavors of vec_splat_[us]{8,16,32}. */ > case ALTIVEC_BUILTIN_VSPLTISB: > case ALTIVEC_BUILTIN_VSPLTISH: > case ALTIVEC_BUILTIN_VSPLTISW: > > under which the change is being made. So technically arg0 could be a > signed 5-bit or an unsiged 5-bit value. Either way the 5-bit value is > splatted into the vector with sign extension to 8/16/32 bits. So I > would argue that the IN_RANGE test for signed values is maybe > misleading as arg0 could represent a signed or unsigned value. Both > tests, the one from the patch or Segher's suggestion, are really just > looking to see if any of the upper bits are 1. From a functional > standpoint, I don't see any difference and feel either one would do the > job. Am I missing anything? But then the & ~0x1f test is not good either, it does not work for values -16..-1 ? You cannot handle signed and unsigned exactly the same for the test for allowed values. Segher
Re: [PATCH v2, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative
Hi! On Fri, Apr 13, 2018 at 03:21:14PM -0500, Paul Clarke wrote: > - if (__builtin_constant_p(__B)) > - lshift = (__v4su) vec_splat_s32(__B); > + if (__builtin_constant_p(__B) && __B < 16) > +lshift = (__v4su) vec_splat_s32(__B); >else > - lshift = vec_splats ((unsigned int) __B); > +lshift = vec_splats ((unsigned int) __B); You changed the tab chars to spaces here. Please don't. I'll fix it. Rest looks fine... Let's see if I manage to commit it :-) Thanks, Segher
Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins
On Fri, 2018-04-13 at 16:54 -0500, Segher Boessenkool wrote: > Hi Carl, > > On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote: > > diff --git a/gcc/config/rs6000/rs6000.c > > b/gcc/config/rs6000/rs6000.c > > index a0c9b5c..855be43 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin > > (gimple_stmt_iterator *gsi) > > { > > arg0 = gimple_call_arg (stmt, 0); > > lhs = gimple_call_lhs (stmt); > > - /* Only fold the vec_splat_*() if arg0 is constant. */ > > - if (TREE_CODE (arg0) != INTEGER_CST) > > + /* Only fold the vec_splat_*() if arg0 is a 5-bit > > constant. */ > > + if (TREE_CODE (arg0) != INTEGER_CST > > + || TREE_INT_CST_LOW (arg0) & ~0x1f) > > return false; > > Should this test for *signed* 5-bit constants only? > >if (TREE_CODE (arg0) != INTEGER_CST > || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15)) Segher: The buitins for the unsigned vec_splat_u[8, 16,32] are actually mapped to their corresponding signed version vec_splat_s[8, 16,32] in altivec.h. Both the vec_splat_u[8, 16,32] and vec_splat_s[8, 16,32] builtins will get you to the case statement /* flavors of vec_splat_[us]{8,16,32}. */ case ALTIVEC_BUILTIN_VSPLTISB: case ALTIVEC_BUILTIN_VSPLTISH: case ALTIVEC_BUILTIN_VSPLTISW: under which the change is being made. So technically arg0 could be a signed 5-bit or an unsiged 5-bit value. Either way the 5-bit value is splatted into the vector with sign extension to 8/16/32 bits. So I would argue that the IN_RANGE test for signed values is maybe misleading as arg0 could represent a signed or unsigned value. Both tests, the one from the patch or Segher's suggestion, are really just looking to see if any of the upper bits are 1. From a functional standpoint, I don't see any difference and feel either one would do the job. Am I missing anything? Carl Love
Re: [PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32
Hi! On Fri, Apr 13, 2018 at 03:15:09PM -0500, Kelvin Nilsen wrote: > Twelve failures have been occuring in the bfp test directory during -m32 > regression testing. > This patch: > > 1. Changes the expected error messages in certain test programs. > > 2. Disables certain test programs from being exercised on 32-bit targets. > > 3. Adds a "note" error message to explain the mapping from overloaded > built-in functions > to non-overloaded built-in functions. Ah, very good plan! > @@ -6932,14 +6933,20 @@ > && rs6000_builtin_type_compatible (types[1], desc->op2)) > { > if (rs6000_builtin_decls[desc->overloaded_code] != NULL_TREE) > - return altivec_build_resolved_builtin (args, n, desc); > + { > + result = altivec_build_resolved_builtin (args, n, desc); > + /* overloaded_code is set above */ > + if (!rs6000_builtin_is_supported_p (overloaded_code)) > + unsupported_builtin = true; > + else > + return result; > + } > else > unsupported_builtin = true; > } > } The indentation (here and elsewhere) is off; probably because of the way you sent this? Please check. > + else > + /* on 64-bit, i seem to end up here */ > + error ("builtin function %qs not supported in this compiler " > + "configuration", name); I think you don't want that comment? > + /* If an error-representing result tree was returned from > + altivec_build_resolved_builtin above, use it. */ > + return (result != NULL)? result: error_mark_node; Spaces before ? and :. x != 0 as a condition is pretty silly btw, but that's just style ;-) Okay for trunk with those nits fixed. Thanks! Segher
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
Hi, On 13/04/2018 19:45, Jason Merrill wrote: Ah, I see. The problem is that even though convert_to_integer_1 was called with dofold false, we lose that when it calls convert(). Why not recurse directly to convert_to_integer_1 with the underlying type? That would avoid the undesired folding. Interesting. I had no idea that this seemingly simple error-recovery issue was ultimately due to a substantive if latent issue in convert.c. Anyway, in the meanwhile I tried a few variants of direct recursion without much success, my boostraps all failed pretty quickly and pretty badly. However the below - which isn't far from the spirit of your analysis, I think - is holding up well, at least bootstrap + C++ testsuite are definitely OK. Should I continue testing it over the weekend? Thanks! Paolo. /// Index: convert.c === --- convert.c (revision 259376) +++ convert.c (working copy) @@ -741,8 +741,12 @@ convert_to_integer_1 (tree type, tree expr, bool d else if (TREE_CODE (type) == ENUMERAL_TYPE || maybe_ne (outprec, GET_MODE_PRECISION (TYPE_MODE (type { - expr = convert (lang_hooks.types.type_for_mode - (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr); + tree utype = (lang_hooks.types.type_for_mode + (TYPE_MODE (type), TYPE_UNSIGNED (type))); + if (dofold) + expr = convert (utype, expr); + else + expr = build1 (CONVERT_EXPR, utype, expr); return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr); } Index: testsuite/g++.dg/cpp0x/pr85112.C === --- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent) +++ testsuite/g++.dg/cpp0x/pr85112.C(working copy) @@ -0,0 +1,17 @@ +// PR c++/85112 +// { dg-do compile { target c++11 } } + +struct A +{ + int m; + int n : 4; +}; + +int i; // { dg-message "not const" } + +void foo() +{ + constexpr int j = i; // { dg-error "not usable" } + A a; + a.n = j; +}
Re: [PATCH] [PR c++/85039] no type definitions in builtin offsetof
On Apr 11, 2018, Jason Merrillwrote: > I raised this issue with the C++ committee, and it seems that nobody > expects defining a type here to work. So let's go back to your first > patch, removing the offending part of semicolon3.C. All right, here's the adjusted patch. Retested on i686- and x86_64-linux-gnu. Ok to install? [PR c++/85039] no type definitions in builtin offsetof Types defined within a __builtin_offsetof argument don't always get properly recorded as members of their context types, so if they're anonymous, we may fail to assign them an anon type index for mangling and ICE. We shouldn't allow types to be introduced in __builtin_offsetof, I think, and Jason says the std committee agrees, so I've arranged for us to reject them. Even then, we still parse the definitions and attempt to assign mangled names to its member functions, so the ICE remains. Since we've already reported an error, we might as well complete the name assignment with an arbitrary index, thus avoiding the ICE. We used to have a test that expected to be able to define types in __builtin_offsetof; this patch removes that specific test. for gcc/cp/ChangeLog PR c++/85039 * parser.c (cp_parser_builtin_offset): Reject type definitions. * mangle.c (nested_anon_class_index): Avoid crash returning -1 if we've seen errors. for gcc/testsuite/ChangeLog PR c++/85039 * g++.dg/pr85039-1.C: New. * g++.dg/pr85039-2.C: New. * g++.dg/parse/semicolon3.C: Remove test_offsetof. --- gcc/cp/mangle.c |3 +++ gcc/cp/parser.c |8 +++- gcc/testsuite/g++.dg/parse/semicolon3.C |7 --- gcc/testsuite/g++.dg/pr85039-1.C| 17 + gcc/testsuite/g++.dg/pr85039-2.C| 10 ++ 5 files changed, 37 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/g++.dg/pr85039-1.C create mode 100644 gcc/testsuite/g++.dg/pr85039-2.C diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c index 94c4bed28486..a7f9d686345d 100644 --- a/gcc/cp/mangle.c +++ b/gcc/cp/mangle.c @@ -1623,6 +1623,9 @@ nested_anon_class_index (tree type) ++index; } + if (seen_error ()) +return -1; + gcc_unreachable (); } diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 8b1b271b53d1..bf46165f5ae1 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -9823,7 +9823,13 @@ cp_parser_builtin_offsetof (cp_parser *parser) parens.require_open (parser); /* Parse the type-id. */ location_t loc = cp_lexer_peek_token (parser->lexer)->location; - type = cp_parser_type_id (parser); + { +const char *saved_message = parser->type_definition_forbidden_message; +parser->type_definition_forbidden_message + = G_("types may not be defined within __builtin_offsetof"); +type = cp_parser_type_id (parser); +parser->type_definition_forbidden_message = saved_message; + } /* Look for the `,'. */ cp_parser_require (parser, CPP_COMMA, RT_COMMA); token = cp_lexer_peek_token (parser->lexer); diff --git a/gcc/testsuite/g++.dg/parse/semicolon3.C b/gcc/testsuite/g++.dg/parse/semicolon3.C index 8a2b1ac46301..0d46be9ed654 100644 --- a/gcc/testsuite/g++.dg/parse/semicolon3.C +++ b/gcc/testsuite/g++.dg/parse/semicolon3.C @@ -20,13 +20,6 @@ struct OK3 } // no complaints (s7); -__SIZE_TYPE__ -test_offsetof (void) -{ - // no complaints about a missing semicolon - return __builtin_offsetof (struct OK4 { int a; int b; }, b); -} - struct OK5 { int a; diff --git a/gcc/testsuite/g++.dg/pr85039-1.C b/gcc/testsuite/g++.dg/pr85039-1.C new file mode 100644 index ..f57c8a261dee --- /dev/null +++ b/gcc/testsuite/g++.dg/pr85039-1.C @@ -0,0 +1,17 @@ +// { dg-do compile { target c++14 } } + +constexpr int a() { + return + __builtin_offsetof(struct { // { dg-error "types may not be defined" } +int i; +short b { + __builtin_offsetof(struct { // { dg-error "types may not be defined" } + int j; +struct c { // { dg-error "types may not be defined" } + void d() { + } +}; + }, j) +}; + }, i); +} diff --git a/gcc/testsuite/g++.dg/pr85039-2.C b/gcc/testsuite/g++.dg/pr85039-2.C new file mode 100644 index ..e6d16325105b --- /dev/null +++ b/gcc/testsuite/g++.dg/pr85039-2.C @@ -0,0 +1,10 @@ +// { dg-do compile } + +struct d { + static d *b; +} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be defined" } + int i; + struct a { // { dg-error "types may not be defined" } +int c() { return .1f; } + }; +}, i)); -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins
Hi Carl, On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote: > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index a0c9b5c..855be43 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) >{ >arg0 = gimple_call_arg (stmt, 0); >lhs = gimple_call_lhs (stmt); > - /* Only fold the vec_splat_*() if arg0 is constant. */ > - if (TREE_CODE (arg0) != INTEGER_CST) > + /* Only fold the vec_splat_*() if arg0 is a 5-bit constant. */ > + if (TREE_CODE (arg0) != INTEGER_CST > + || TREE_INT_CST_LOW (arg0) & ~0x1f) > return false; Should this test for *signed* 5-bit constants only? if (TREE_CODE (arg0) != INTEGER_CST || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15)) or similar? Approved for trunk (with such a change if appropriate, and testing then of course). Thanks! Segher
Re: RFA[powerpc]: patch to fix PR79916
Hi! On Fri, Apr 13, 2018 at 04:43:02PM -0400, Vladimir Makarov wrote: > On 04/13/2018 03:58 PM, Alexander Monakov wrote: > >Here's another compact variant: > > > > regno = reg_renumber[regno]; > > if (regno < 0) > > regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; > > > Thanks, Alexander. I like your variant more. Me too :-) > --- testsuite/gcc.target/powerpc/pr79916.c(nonexistent) > +++ testsuite/gcc.target/powerpc/pr79916.c(working copy) > @@ -0,0 +1,556 @@ > +/* { dg-do compile { target { powerpc64le-*-* } } } */ Is the testcase specific to LE? If not, please use powerpc*-*-*, or powerpc*-*-* && lp64 if it needs 64-bit. > +/* { dg-options "-Idfp -fno-expensive-optimizations --param > ira-max-conflict-table-size=0 -mno-popcntd -O3" } */ I think you can drop the -Idfp ? If that option would do anything we have bigger problems ;-) Looks fine to me otherwise. Thanks for working on this! Segher
Re: RFA[powerpc]: patch to fix PR79916
On 04/13/2018 03:58 PM, Alexander Monakov wrote: On Fri, 13 Apr 2018, Jakub Jelinek wrote: if (reg_renumber[regno] >= 0) regno = reg_renumber[regno]; else regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; or regno = (reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]); is better, the latter is perhaps more compact. Here's another compact variant: regno = reg_renumber[regno]; if (regno < 0) regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; Thanks, Alexander. I like your variant more. Sorry for forgetting GNU standard (it might happen when you are working on a few different projects simultaneously). The revised patch is in the attachment. Index: ChangeLog === --- ChangeLog (revision 259376) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2018-04-13 Vladimir Makarov+ + PR rtl-optimization/79916 + * config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard + regs (if any) to define how to gnerate SD moves when LRA is in + progress. + 2018-04-13 Jan Hubicka Bin Cheng Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 259376) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-04-13 Vladimir Makarov + + PR rtl-optimization/79916 + * gcc.target/powerpc/pr79916.c: New. + 2018-04-13 Andrey Belevantsev PR rtl-optimization/83852 Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 259330) +++ config/rs6000/rs6000.c (working copy) @@ -10610,7 +10610,9 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10635,7 +10637,9 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { @@ -10664,7 +10668,9 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10689,7 +10695,9 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno]; + if (regno < 0) + regno = ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { Index: testsuite/gcc.target/powerpc/pr79916.c === --- testsuite/gcc.target/powerpc/pr79916.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr79916.c (working copy) @@ -0,0 +1,556 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-options "-Idfp -fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */ + +#define PASTE2(A,B) A ## B +#define PASTE(A,B) PASTE2(A,B) + +#define TESTVAL_NEG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + si = VAL;\ + sll = PASTE(VAL,LL); \ + a = si;\ + b = sll;\ + c = VAL;\ + d = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NEG_BIG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + sll = PASTE(VAL,LL); \ + a = sll;\ + b = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NONNEG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + si = VAL;\ + ui = VAL;\ + sll = PASTE(VAL,LL); \ + ull = PASTE(VAL,ULL); \ + a = si;\ + b = sll;\ + c = ui;\ + d = ull;\ + e = VAL;\ + f = VAL;\ + g = PASTE(VAL,LL); \ + h = PASTE(VAL,ULL); \ + if ((__builtin_memcmp ((void *),
[PATCH] rs6000 PR83660 fix ICE with vec_extract
Per the discussion on the 83660, I've come to a minimal patch to prevent this. Basically marking the vec_extract tree as having side effects later makes sure that it gets all the cleanup points it needs so that gimplify_cleanup_point_expr () is happy. Also because vec_insert puts a MODIFY_EXPR in there, it has side effects and this problem will not occur. Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is where this issue arises. OK for trunk if everything passes? Thanks, Aaron 2018-04-13 Aaron SawdeyPR target/83660 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Mark vec_extract expression as having side effects to make sure it gets a cleanup point. 2018-04-13 Aaron Sawdey PR target/83660 * gcc.target/powerpc/pr83660.C: New test. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: config/rs6000/rs6000-c.c === --- config/rs6000/rs6000-c.c (revision 259353) +++ config/rs6000/rs6000-c.c (working copy) @@ -6705,6 +6705,15 @@ stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1); stmt = build_indirect_ref (loc, stmt, RO_NULL); + /* PR83660: We mark this as having side effects so that + downstream in fold_build_cleanup_point_expr () it will get a + CLEANUP_POINT_EXPR. If it does not we can run into an ICE + later in gimplify_cleanup_point_expr (). Potentially this + causes missed optimization because the actually is no side + effect. */ + if (c_dialect_cxx ()) + TREE_SIDE_EFFECTS (stmt) = 1; + return stmt; } Index: testsuite/gcc.target/powerpc/pr83660.C === --- testsuite/gcc.target/powerpc/pr83660.C (nonexistent) +++ testsuite/gcc.target/powerpc/pr83660.C (working copy) @@ -0,0 +1,14 @@ +/* PR target/83660 */ +/* { dg-do compile } */ +/* { dg-options "-mcpu=power7" } */ + +#include + +typedef __vector unsigned int uvec32_t __attribute__((__aligned__(16))); + +unsigned get_word(uvec32_t v) +{ +return ({const unsigned _B1 = 32; +vec_extract((uvec32_t)v, 2);}); +} +
Re: [PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)
On Fri, Apr 13, 2018 at 09:32:07PM +0200, Eric Botcazou wrote: > > This works nicely if the landing pad bb hasn't been really modified much, > > in particular it still needs to contain only the instructions emitted > > by expand_dw2_landing_pad_for_region function call and then an unconditional > > jump or at least single successor and nothing else in the bb. > > How is that supposed to work for SJLJ though? Dunno, does SJLJ emit .gcc_except_table section with this limitation? > OK, but also make expand_dw2_landing_pad_for_region private so that the same > mistake is not made again in the future. Done, thanks. Jakub
[PATCH v2, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative
The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h do not properly handle shift values between 16 and 31, inclusive. These are setting up the shift with vec_splat_s32, which only accepts *5 bit signed* shift values, or a range of -16 to 15. Values above 15 produce an error: error: argument 1 must be a 5-bit signed literal Fix is to effectively reduce the range for which vec_splat_s32 is used to < 32 and use vec_splats otherwise. Also, __mm_slli_epi{16,32,64}, when given a negative shift value, should always return a vector of {0}. 2018-04-13 Paul A. ClarkeChanges in v2: - fixed the "shift by 0" cases, which were being treated as negative shifts and returning {0} instead of a no-op. (Segher) - fixed the test cases to correctly test the shift-by zero cases. gcc/ PR target/83402 * config/rs6000/emmintrin.h (_mm_slli_epi{16,32,64}): Ensure that vec_splat_s32 is only called with 0 <= shift < 16. Ensure negative shifts result in {0}. gcc/testsuite/ PR target/83402 * gcc.target/powerpc/sse2-psllw-1.c: Refactor and add tests for several values: positive, negative, and zero. * gcc.target/powerpc/sse2-pslld-1.c: Same. * gcc.target/powerpc/sse2-psllq-1.c: Same. Index: gcc/config/rs6000/emmintrin.h === --- gcc/config/rs6000/emmintrin.h (revision 259375) +++ gcc/config/rs6000/emmintrin.h (working copy) @@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B) __v8hu lshift; __v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 }; - if (__B < 16) + if (__B >= 0 && __B < 16) { if (__builtin_constant_p(__B)) lshift = (__v8hu) vec_splat_s16(__B); @@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B) __v4su lshift; __v4si result = { 0, 0, 0, 0 }; - if (__B < 32) + if (__B >= 0 && __B < 32) { - if (__builtin_constant_p(__B)) - lshift = (__v4su) vec_splat_s32(__B); + if (__builtin_constant_p(__B) && __B < 16) +lshift = (__v4su) vec_splat_s32(__B); else - lshift = vec_splats ((unsigned int) __B); +lshift = vec_splats ((unsigned int) __B); result = vec_vslw ((__v4si) __A, lshift); } @@ -1527,17 +1527,12 @@ _mm_slli_epi64 (__m128i __A, int __B) __v2du lshift; __v2di result = { 0, 0 }; - if (__B < 64) + if (__B >= 0 && __B < 64) { - if (__builtin_constant_p(__B)) - { - if (__B < 32) - lshift = (__v2du) vec_splat_s32(__B); - else - lshift = (__v2du) vec_splats((unsigned long long)__B); - } + if (__builtin_constant_p(__B) && __B < 16) + lshift = (__v2du) vec_splat_s32(__B); else - lshift = (__v2du) vec_splats ((unsigned int) __B); + lshift = (__v2du) vec_splats ((unsigned int) __B); result = vec_vsld ((__v2di) __A, lshift); } Index: gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c === --- gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c (revision 259375) +++ gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c (working copy) @@ -13,32 +13,50 @@ #define TEST sse2_test_pslld_1 #endif -#define N 0xf - #include -static __m128i -__attribute__((noinline, unused)) -test (__m128i s1) -{ - return _mm_slli_epi32 (s1, N); -} +#define TEST_FUNC(id, N) \ + static __m128i \ + __attribute__((noinline, unused)) \ + test##id (__m128i s1) \ + { \ +return _mm_slli_epi32 (s1, N); \ + } +TEST_FUNC(0, 0) +TEST_FUNC(15, 15) +TEST_FUNC(16, 16) +TEST_FUNC(31, 31) +TEST_FUNC(neg1, -1) +TEST_FUNC(neg16, -16) +TEST_FUNC(neg32, -32) +TEST_FUNC(neg64, -64) +TEST_FUNC(neg128, -128) + +#define TEST_CODE(id, N) \ + { \ +int e[4] = {0}; \ +union128i_d u, s; \ +int i; \ +s.x = _mm_set_epi32 (1, -2, 3, 4); \ +u.x = test##id (s.x); \ +if (N >= 0 && N < 32) \ + for (i = 0; i < 4; i++) \ +e[i] = s.a[i] << (N * (N >= 0)); \ +if (check_union128i_d (u, e)) \ + abort (); \ + } + static void TEST (void) { - union128i_d u, s; - int e[4] = {0}; - int i; - - s.x = _mm_set_epi32 (1, -2, 3, 4); - - u.x = test (s.x); - - if (N < 32) -for (i = 0; i < 4; i++) - e[i] = s.a[i] << N; - - if (check_union128i_d (u, e)) -abort (); + TEST_CODE(0, 0); + TEST_CODE(15, 15); + TEST_CODE(16, 16); + TEST_CODE(31, 31); + TEST_CODE(neg1, -1); + TEST_CODE(neg16, -16); + TEST_CODE(neg32, -32); + TEST_CODE(neg64, -64); + TEST_CODE(neg128, -128); } Index: gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c === --- gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c (revision 259375) +++ gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c (working copy) @@ -13,36 +13,56 @@ #define TEST sse2_test_psllq_1 #endif -#define N 60 - #include #ifdef
[PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32
Twelve failures have been occuring in the bfp test directory during -m32 regression testing. The cause of these failures was two-fold: 1. Patches added subsequent to development of the tests caused new error messages to be emitted that are different than the error messages expected in the dejagnu patterns. These new patches also changed which built-in functions are legal when compiling with the -m32 command-line option. 2. The implementation of overloaded built-in functions maps overloaded function names to non-overloaded names. Depending on the stage at which an error is recognized, error messages may refer either to the overloaded built-in function name or the non-overloaded name. This patch: 1. Changes the expected error messages in certain test programs. 2. Disables certain test programs from being exercised on 32-bit targets. 3. Adds a "note" error message to explain the mapping from overloaded built-in functions to non-overloaded built-in functions. This patch has bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2018-04-13 Kelvin Nilsen* config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p): New prototype. * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add note to error message to explain internal mapping of overloaded built-in function name to non-overloaded built-in function name. * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New function. gcc/testsuite/ChangeLog: 2018-04-13 Kelvin Nilsen * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to prevent cascading of errors and change expected error message. * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test to 64-bit targets. * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise. * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected error message. * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise. Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c (working copy) @@ -8,10 +8,10 @@ error because the builtin requires 64 bits. */ #include -unsigned __int128 /* { dg-error "'__int128' is not supported on this target" } */ +unsigned long long int get_significand (__ieee128 *p) { __ieee128 source = *p; - return __builtin_vec_scalar_extract_sig (source); /* { dg-error "builtin function '__builtin_vec_scalar_extract_sig' not supported in this compiler configuration" } */ + return (long long int) __builtin_vec_scalar_extract_sig (source); /* { dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */ } Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target powerpc_p9vector_ok } */ /* { dg-options "-mcpu=power9" } */ @@ -11,6 +12,8 @@ { __ieee128 source = *p; + /* IEEE 128-bit floating point operations are only supported + on 64-bit targets. */ return scalar_test_neg (source); } Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target powerpc_p9vector_ok } */ /* { dg-options "-mcpu=power9" } */ @@ -11,6 +12,8 @@ { __ieee128 source = *p; + /* IEEE 128-bit floating point operations are only supported + on 64-bit targets. */ return scalar_test_data_class (source, 3); } Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c (revision 259316) +++
Re: RFA[powerpc]: patch to fix PR79916
On Fri, 13 Apr 2018, Jakub Jelinek wrote: > if (reg_renumber[regno] >= 0) > regno = reg_renumber[regno]; > else > regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; > or > regno = (reg_renumber[regno] >= 0 > ? reg_renumber[regno] > : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]); > is better, the latter is perhaps more compact. Here's another compact variant: regno = reg_renumber[regno]; if (regno < 0) regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; Alexander
Re: RFA[powerpc]: patch to fix PR79916
On Fri, Apr 13, 2018 at 03:29:47PM -0400, Vladimir Makarov wrote: > The attached patch fixes > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79916 > > The PR is about LRA cycling on some tests when SD data are used. The > problem was in that actual assigned reg to pseudo was not in the pseudo > preferred class and this resulted in wrong generated code which LRA tried to > change again and again. > > The patch was successfully bootstrapped and tested on ppc64 (gcc110). > > Is it ok to commit? I have just formatting nits and will defer the actual review to the PowerPC maintainers. The nit is that all the lines are too long. Not sure if if (reg_renumber[regno] >= 0) regno = reg_renumber[regno]; else regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; or regno = (reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]); is better, the latter is perhaps more compact. > --- config/rs6000/rs6000.c(revision 259330) > +++ config/rs6000/rs6000.c(working copy) > @@ -10610,7 +10610,7 @@ rs6000_emit_move (rtx dest, rtx source, >if (regno >= FIRST_PSEUDO_REGISTER) > { > cl = reg_preferred_class (regno); > - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; > + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == > NO_REGS ? -1 : ira_class_hard_regs[cl][1]; > } >if (regno >= 0 && ! FP_REGNO_P (regno)) > { > @@ -10635,7 +10635,7 @@ rs6000_emit_move (rtx dest, rtx source, > { > cl = reg_preferred_class (regno); > gcc_assert (cl != NO_REGS); > - regno = ira_class_hard_regs[cl][0]; > + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : > ira_class_hard_regs[cl][0]; > } >if (FP_REGNO_P (regno)) > { > @@ -10664,7 +10664,7 @@ rs6000_emit_move (rtx dest, rtx source, >if (regno >= FIRST_PSEUDO_REGISTER) > { > cl = reg_preferred_class (regno); > - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; > + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == > NO_REGS ? -1 : ira_class_hard_regs[cl][0]; > } >if (regno >= 0 && ! FP_REGNO_P (regno)) > { > @@ -10689,7 +10689,7 @@ rs6000_emit_move (rtx dest, rtx source, > { > cl = reg_preferred_class (regno); > gcc_assert (cl != NO_REGS); > - regno = ira_class_hard_regs[cl][0]; > + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : > ira_class_hard_regs[cl][0]; > } >if (FP_REGNO_P (regno)) > { Jakub
Re: [PATCH, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative
On 04/13/2018 02:37 PM, Segher Boessenkool wrote: > On Thu, Apr 12, 2018 at 07:07:21PM -0500, Paul Clarke wrote: >> The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h >> do not properly handle shift values between 16 and 31, inclusive. >> These were setting up the shift with vec_splat_s32, which only accepts >> *5 bit signed* shift values, or a range of -16 to 15. Values above 15 >> produced an error: >> >> error: argument 1 must be a 5-bit signed literal >> >> Fix is to effectively reduce the range for which vec_splat_s32 is used >> to < 32 and use vec_splats otherwise. >> >> Also, __mm_slli_epi{16,32,64}, when given a negative shift value, >> should always return a vector of {0}. > > Yup. > >> 2018-04-12 Paul A. Clarke>> >> gcc/config > > There is no changelog there; the changelog is in gcc/. I'll fix it up. At some point, I'll need to figure out what that means, I guess. :-) >> --- gcc/config/rs6000/emmintrin.h(revision 259016) >> +++ gcc/config/rs6000/emmintrin.h(working copy) >> @@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B) >>__v8hu lshift; >>__v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 }; >> >> - if (__B < 16) >> + if (__B > 0 && __B < 16) >> { >>if (__builtin_constant_p(__B)) >>lshift = (__v8hu) vec_splat_s16(__B); >> @@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B) >>__v4su lshift; >>__v4si result = { 0, 0, 0, 0 }; >> >> - if (__B < 32) >> + if (__B > 0 && __B < 32) > > These should >= 0 (I'll fix it). Oh, you're right, of course. That also means the test cases have to change. Shall I submit a new patch with those changes? > Thanks for the patch! I'll commit it; please get your commit access in > order if you plan any further patches. I'll work on that. PC
Re: [PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)
On April 13, 2018 9:33:31 PM GMT+02:00, Eric Botcazouwrote: >> 2018-04-13 Jakub Jelinek >> >> PR rtl-optimization/85376 >> * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ >and >> zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX >> instead of a specific value. >> >> * gcc.dg/pr85376.c: New test. > >My preference would be for this one. Likewise. Richard.
Re: [PATCH, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative
Hi! On Thu, Apr 12, 2018 at 07:07:21PM -0500, Paul Clarke wrote: > The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h > do not properly handle shift values between 16 and 31, inclusive. > These were setting up the shift with vec_splat_s32, which only accepts > *5 bit signed* shift values, or a range of -16 to 15. Values above 15 > produced an error: > > error: argument 1 must be a 5-bit signed literal > > Fix is to effectively reduce the range for which vec_splat_s32 is used > to < 32 and use vec_splats otherwise. > > Also, __mm_slli_epi{16,32,64}, when given a negative shift value, > should always return a vector of {0}. Yup. > 2018-04-12 Paul A. Clarke> > gcc/config There is no changelog there; the changelog is in gcc/. I'll fix it up. > --- gcc/config/rs6000/emmintrin.h (revision 259016) > +++ gcc/config/rs6000/emmintrin.h (working copy) > @@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B) >__v8hu lshift; >__v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 }; > > - if (__B < 16) > + if (__B > 0 && __B < 16) > { >if (__builtin_constant_p(__B)) > lshift = (__v8hu) vec_splat_s16(__B); > @@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B) >__v4su lshift; >__v4si result = { 0, 0, 0, 0 }; > > - if (__B < 32) > + if (__B > 0 && __B < 32) These should >= 0 (I'll fix it). Thanks for the patch! I'll commit it; please get your commit access in order if you plan any further patches. Cheers, Segher
[PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)
On Fri, Apr 13, 2018 at 12:33:02AM +0200, Jakub Jelinek wrote: > The following patch let us punt in these cases. Bootstrapped/regtested on > x86_64-linux and i686-linux, ok for trunk? > > Another option would be to tweak simplify-rtx.c and instead of doing > else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value)) > int_value = GET_MODE_PRECISION (imode); > do > else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value)) > return NULL_RTX; > and similarly for CLZ, haven't tested what would break if anything; > we've been doing something like that since r62453 when the > C?Z_DEFINED_VALUE_AT_ZERO macros have been introduced, and before that > actually the same, just unconditionally assumed the value is undefined at 0. And here is the variant patch, also bootstrapped/regtested on x86_64-linux and i686-linux. Is this one ok for trunk, or the other one (or something else)? 2018-04-13 Jakub JelinekPR rtl-optimization/85376 * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX instead of a specific value. * gcc.dg/pr85376.c: New test. --- gcc/simplify-rtx.c.jj 2018-03-21 11:13:00.322234030 +0100 +++ gcc/simplify-rtx.c 2018-04-13 11:47:05.689621937 +0200 @@ -1877,7 +1877,7 @@ simplify_const_unary_operation (enum rtx if (wi::ne_p (op0, 0)) int_value = wi::clz (op0); else if (! CLZ_DEFINED_VALUE_AT_ZERO (imode, int_value)) - int_value = GET_MODE_PRECISION (imode); + return NULL_RTX; result = wi::shwi (int_value, result_mode); break; @@ -1889,7 +1889,7 @@ simplify_const_unary_operation (enum rtx if (wi::ne_p (op0, 0)) int_value = wi::ctz (op0); else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value)) - int_value = GET_MODE_PRECISION (imode); + return NULL_RTX; result = wi::shwi (int_value, result_mode); break; --- gcc/testsuite/gcc.dg/pr85376.c.jj 2018-04-12 17:44:41.506370642 +0200 +++ gcc/testsuite/gcc.dg/pr85376.c 2018-04-12 17:45:11.669401115 +0200 @@ -0,0 +1,32 @@ +/* PR rtl-optimization/85376 */ +/* { dg-do run { target int128 } } */ +/* { dg-options "-Og -fno-dce -fgcse -fno-tree-ccp -fno-tree-copy-prop -Wno-psabi" } */ + +typedef unsigned int U __attribute__ ((vector_size (64))); +typedef unsigned __int128 V __attribute__ ((vector_size (64))); +unsigned int e, i, l; +unsigned char f; +U g, h, k, j; + +static inline V +foo (unsigned char n, unsigned short o, unsigned int p, U q, U r, U s) +{ + unsigned int t; + o <<= 5; + q[7] >>= __builtin_add_overflow (0xfff0, __builtin_ffs (n), [5]); + t = __builtin_ffs (g[7]); + e *= __builtin_sub_overflow (o, t, ); + return f + (V) g + (V) h + (V) q + i + (V) j + (V) s + (V) k + l; +} + +int +main () +{ + if (__SIZEOF_INT128__ != 16 || __SIZEOF_INT__ != 4 || __CHAR_BIT__ != 8) +return 0; + V x = foo (0, 1, 5, (U) { }, (U) { }, (U) { }); + for (unsigned i = 0; i < 4; i++) +if ((unsigned int) x[i] != 0x20) + __builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)
> 2018-04-13 Jakub Jelinek> > PR rtl-optimization/85376 > * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and > zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX > instead of a specific value. > > * gcc.dg/pr85376.c: New test. My preference would be for this one. -- Eric Botcazou
Re: [PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)
> This works nicely if the landing pad bb hasn't been really modified much, > in particular it still needs to contain only the instructions emitted > by expand_dw2_landing_pad_for_region function call and then an unconditional > jump or at least single successor and nothing else in the bb. How is that supposed to work for SJLJ though? > 2018-04-13 Jakub Jelinek> > PR rtl-optimization/85393 > * bb-reorder.c (fix_up_crossing_landing_pad): In new_bb emit just > a label and unconditional jump to old_bb, rather than > expand_dw2_landing_pad_for_region insn(s) and jump to single_succ > basic block. OK, but also make expand_dw2_landing_pad_for_region private so that the same mistake is not made again in the future. -- Eric Botcazou
RFA[powerpc]: patch to fix PR79916
The attached patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79916 The PR is about LRA cycling on some tests when SD data are used. The problem was in that actual assigned reg to pseudo was not in the pseudo preferred class and this resulted in wrong generated code which LRA tried to change again and again. The patch was successfully bootstrapped and tested on ppc64 (gcc110). Is it ok to commit? Index: ChangeLog === --- ChangeLog (revision 259376) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2018-04-13 Vladimir Makarov+ + PR rtl-optimization/79916 + * config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard + regs (if any) to define how to gnerate SD moves when LRA is in + progress. + 2018-04-13 Jan Hubicka Bin Cheng Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 259376) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-04-13 Vladimir Makarov + + PR rtl-optimization/79916 + * gcc.target/powerpc/pr79916.c: New. + 2018-04-13 Andrey Belevantsev PR rtl-optimization/83852 Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 259330) +++ config/rs6000/rs6000.c (working copy) @@ -10610,7 +10610,7 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10635,7 +10635,7 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { @@ -10664,7 +10664,7 @@ rs6000_emit_move (rtx dest, rtx source, if (regno >= FIRST_PSEUDO_REGISTER) { cl = reg_preferred_class (regno); - regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0]; } if (regno >= 0 && ! FP_REGNO_P (regno)) { @@ -10689,7 +10689,7 @@ rs6000_emit_move (rtx dest, rtx source, { cl = reg_preferred_class (regno); gcc_assert (cl != NO_REGS); - regno = ira_class_hard_regs[cl][0]; + regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : ira_class_hard_regs[cl][0]; } if (FP_REGNO_P (regno)) { Index: testsuite/gcc.target/powerpc/pr79916.c === --- testsuite/gcc.target/powerpc/pr79916.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr79916.c (working copy) @@ -0,0 +1,556 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-options "-Idfp -fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */ + +#define PASTE2(A,B) A ## B +#define PASTE(A,B) PASTE2(A,B) + +#define TESTVAL_NEG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + si = VAL;\ + sll = PASTE(VAL,LL); \ + a = si;\ + b = sll;\ + c = VAL;\ + d = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NEG_BIG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + sll = PASTE(VAL,LL); \ + a = sll;\ + b = PASTE(VAL,LL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0)) \ +__builtin_abort (); + +#define TESTVAL_NONNEG(VAL,SUF,SIZE) \ + x = PASTE(PASTE(VAL,.),SUF); \ + si = VAL;\ + ui = VAL;\ + sll = PASTE(VAL,LL); \ + ull = PASTE(VAL,ULL); \ + a = si;\ + b = sll;\ + c = ui;\ + d = ull;\ + e = VAL;\ + f = VAL;\ + g = PASTE(VAL,LL); \ + h = PASTE(VAL,ULL); \ + if ((__builtin_memcmp ((void *), (void *), SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void *), (void *),SIZE) != 0) \ + || (__builtin_memcmp ((void
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
On Fri, Apr 13, 2018 at 1:18 PM, Paolo Carliniwrote: > Hi, > > > On 13/04/2018 19:14, Jason Merrill wrote: >> >> On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carlini >> wrote: >>> >>> Hi, >>> >>> On 13/04/2018 16:06, Jason Merrill wrote: On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini wrote: > > Hi, > > in this error-recovery regression, after a sensible error message > issued > by > cxx_constant_init, store_init_value stores an error_mark_node as > DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears > much > later, to cause a crash during gimplification. As far as I know, the > choice > of storing error_mark_nodes too is the outcome of a rather delicate > error-recovery strategy and I don't think we want to revisit it at this > time, thus the remaining option is catching later the error_mark_node, > at > a > "good" time. I note, in passing, that the do loop in gimplify_expr > which > uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking > from > the > error-recovery point of view, because at each iteration it *does* cover > for > error_operand_p (save_expr) but only immediately after the call, when > it's > too late. > > All the above said, I believe that at least for the 8.1.0 needs we may > want > to catch the error_mark_node in cp_build_modify_expr, when we are > handling > the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR > from > the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep > in > convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone > form, > with the error_mark_node as the first operand. Passes testing on > x86_64-linux. We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place. >>> >>> Basing on my analysis, that's easy to do, in convert_to_integer_1. >> >> Are we passing an error_mark_node down into convert_to_integer_1? >> Where are we folding away the VAR_DECL? > > That convert gets a NOP_EXPR with the VAR_DECL for 'j' as argument and > returns the error_mark_node. Ah, I see. The problem is that even though convert_to_integer_1 was called with dofold false, we lose that when it calls convert(). Why not recurse directly to convert_to_integer_1 with the underlying type? That would avoid the undesired folding. Jason
[PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)
Hi! As mentioned in the comments, the .gcc_except_table format doesn't allow the throwing region and corresponding landing pad to be in different partitions, because it uses landing_pad_symbol - the start of the partition in which the throwing region is and we don't generally have relocations for arbitrary symbol subtraction. The fix_up_crossing_landing_pad caller checks if all the EH incoming edges are from the same partition, in that case makes sure the landing pad is also in that partition; in case they are from different partitions, "duplicates" the landing pad into another bb, redirects EH edges from one of the partitions to the new landing pad and makes sure we jump from that landing pad to the single_succ of the old landing pad. This works nicely if the landing pad bb hasn't been really modified much, in particular it still needs to contain only the instructions emitted by expand_dw2_landing_pad_for_region function call and then an unconditional jump or at least single successor and nothing else in the bb. On the following testcase, GCSE pre pass adds another instruction into the landing pad bb and because fix_up_crossing_landing_pad doesn't really duplicate the bb, but instead regenerates it from scratch based from what the landing pad should contain, that pre inserted insn is lost from the new landing pad and we segfault trying to dereference uninitialized register. Similarly, if e.g. the landing pad bb is merged with some other bb, or has more than one successor etc. Instead of trying to duplicate the whole landing pad (which also wastes space), this patch instead just emits into the new landing pad's bb a label and immediately jumps to the other landing pad (that is a crossing jump of course). This way we don't need to duplicate anything. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-04-13 Jakub JelinekPR rtl-optimization/85393 * bb-reorder.c (fix_up_crossing_landing_pad): In new_bb emit just a label and unconditional jump to old_bb, rather than expand_dw2_landing_pad_for_region insn(s) and jump to single_succ basic block. * g++.dg/opt/pr85393.C: New test. * g++.dg/opt/pr85393-aux.cc: New file. --- gcc/bb-reorder.c.jj 2018-01-12 19:20:32.412976124 +0100 +++ gcc/bb-reorder.c2018-04-13 15:59:41.094859051 +0200 @@ -1409,14 +1409,14 @@ get_uncond_jump_length (void) } /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. - Duplicate the landing pad and split the edges so that no EH edge - crosses partitions. */ + Add a new landing pad that will just jump to the old one and split the + edges so that no EH edge crosses partitions. */ static void fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb) { eh_landing_pad new_lp; - basic_block new_bb, last_bb, post_bb; + basic_block new_bb, last_bb; rtx_insn *jump; unsigned new_partition; edge_iterator ei; @@ -1431,24 +1431,20 @@ fix_up_crossing_landing_pad (eh_landing_ /* Put appropriate instructions in new bb. */ rtx_code_label *new_label = emit_label (new_lp->landing_pad); - expand_dw2_landing_pad_for_region (old_lp->region); - - post_bb = BLOCK_FOR_INSN (old_lp->landing_pad); - post_bb = single_succ (post_bb); - rtx_code_label *post_label = block_label (post_bb); - jump = emit_jump_insn (targetm.gen_jump (post_label)); - JUMP_LABEL (jump) = post_label; + rtx_code_label *old_label = block_label (old_bb); + jump = emit_jump_insn (targetm.gen_jump (old_label)); + JUMP_LABEL (jump) = old_label; /* Create new basic block to be dest for lp. */ last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; new_bb = create_basic_block (new_label, jump, last_bb); new_bb->aux = last_bb->aux; - new_bb->count = post_bb->count; + new_bb->count = old_bb->count; last_bb->aux = new_bb; emit_barrier_after_bb (new_bb); - make_single_succ_edge (new_bb, post_bb, 0); + make_single_succ_edge (new_bb, old_bb, 0); /* Make sure new bb is in the other partition. */ new_partition = BB_PARTITION (old_bb); @@ -1457,7 +1453,7 @@ fix_up_crossing_landing_pad (eh_landing_ /* Fix up the edges. */ for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; ) -if (BB_PARTITION (e->src) == new_partition) +if (e->src != new_bb && BB_PARTITION (e->src) == new_partition) { rtx_insn *insn = BB_END (e->src); rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX); --- gcc/testsuite/g++.dg/opt/pr85393.C.jj 2018-04-13 12:05:46.935137936 +0200 +++ gcc/testsuite/g++.dg/opt/pr85393.C 2018-04-13 12:10:03.473256722 +0200 @@ -0,0 +1,29 @@ +// PR rtl-optimization/85393 +// { dg-do run { target c++11 } } +// { dg-options "-O2" } +// { dg-additional-sources "pr85393-aux.cc" } + +#include +#include + +void foo (char const *s); +struct S { ~S () noexcept (false) { throw std::runtime_error ("foo"); } }; + +int +main
Re: [PATCH] Fix -fsanitize=address VLA instrumentation (PR sanitizer/85230)
On Fri, Apr 13, 2018 at 12:16:06AM +0200, Jakub Jelinek wrote: > Bootstrapped on > {x86_64,i686,powerpc64,powerpc64le,aarch64,s390x,armv7hl}-linux, regtested > on {x86_64,i686,powerpc64,powerpc64le}-linux so far, but on the power* ones > on virtual address space size that isn't really supported (likely > https://github.com/google/sanitizers/issues/933#issuecomment-380058705 > issue, so while nothing regresses there, pretty much all asan tests fail > there before and after the patch); also tested successfully with > asan.exp=alloca* on gcc110 and gcc112 on compile farm where it doesn't > suffer from the VA issue. Ok if testing passes also on aarch64, s390x > and armv7hl? Passed regtest also on aarch64, s390x and armv7hl, additionally with the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85394#c2 patch on top of this patch bootstrapped/regtested on powerpc64{,le}-linux, this time without any asan related issues. Ok for trunk? Jakub
Re: [PATCH] Make redirection only for target_clones: V2 (PR ipa/85329).
On Fri, Apr 13, 2018 at 04:33:40PM +0200, Martin Liška wrote: > > Ah, but we emit the resolver only if we see a use of it. That sounds quite > > broken, resolver in each TU that uses it? Better to have one at each > > definition... > > > > Jakub > > > > So after quite some time I would need some brainstorming as I'm not sure how > to > fix that properly. Let's start how 'target' attribute works and I believe it > should > behave same as target_clones attribute: I think the target_clones behavior if you put the .resolver into .text. section in comdat rather than .text..resolver and .resolver comdat is reasonable. The thing is that both the .resolver and symbols are defined in the section in which the resolver is defined. Maybe it would be nice if the resolver was made not exported out of the TU, and eventually, not necessarily now for GCC8, turn the specializations into non-exported symbols too and put them into the same section or different sections of the same comdat group. For ctors and dtors we need extra care about the aliases, not sure if we can have an alias to ifunc symbol, or if we need to emit two ifunc symbols with the same resolver or what exactly. And yes, it would be nice if the target attribute multiversioning worked similarly to this. It would change behavior for it in ABI incompatible way, the question is how much work it would be as well. What you can do right now with the target attribute multiversioning and couldn't do after the change would be e.g. mv.h: __attribute__((target ("default"))) void foo (); __attribute__((target ("avx"))) void foo (); mv1.C: #include "mv.h" __attribute__((target ("default"))) void foo () {} mv2.C: #include "mv.h" __attribute__((target ("avx"))) void foo () {} mv3.C: #include "mv.h" void bar () { foo (); } You couldn't this in the semantics closer to target_clones, all the definitions would need to be done in the same file which is where you'd also get the ifunc symbol. IMHO it is worth changing the semantics anyway, because the current one isn't very well thought out, it doesn't really work well with comdats, can have too many resolvers with the associated costs, etc. Honza, do you agree? Jakub
[PATCH] configure.ac: honor --with-gcc-major-version in gcc-driver-name.h (PR jit/85384, variant)
On Thu, Apr 12, 2018 at 04:51:21PM -0400, David Malcolm wrote: > This patch updates gcc/configure.ac to use gcc_base_ver. > > I had to drop the \$\$ from the sed expression to get it to work > within the configure script; I'm not entirely sure what their purpose > is. Without them, it's still matching on the first group of numeric > characters in BASE-VER. > > Tested with and without --with-gcc-major-version; in each case, > gcc-driver-name.h is correctly determined. > > Fixes the linker issue reported downstream in > https://bugzilla.redhat.com/show_bug.cgi?id=1566178 > and fixes the driver not found issue with: > gcc_jit_context_set_bool_use_external_driver (ctxt, 1); > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > OK for trunk? > > config/ChangeLog: > PR jit/85384 > * acx.m4 (GCC_BASE_VER): Remove \$\$ from sed expression. > > gcc/ChangeLog: > PR jit/85384 > * configure.ac (gcc-driver-name.h): Honor --with-gcc-major-version > by using gcc_base_ver to generate a gcc_driver_version, and use > it when generating GCC_DRIVER_NAME. > * configure.ac: Regenerate. Here is the variant I've talked about in patch form. Bootstrapped/regtested on x86_64-linux and i686-linux and tested with --enable-languages=jit --enable-host-shared --disable-bootstrap and --enable-languages=jit --enable-host-shared --disable-bootstrap --with-gcc-major-version-only Ok for trunk? 2018-04-13 Jakub JelinekPR jit/85384 * configure.ac (GCC_DRIVER_NAME): For --with-gcc-major-version-only use just major version in the driver filename rather than full version. * configure: Regenerated. --- gcc/configure.ac.jj 2018-04-12 10:22:56.179162225 +0200 +++ gcc/configure.ac2018-04-13 16:16:02.712459619 +0200 @@ -6499,8 +6499,14 @@ AC_DEFINE_UNQUOTED(DIAGNOSTICS_COLOR_DEF # Generate gcc-driver-name.h containing GCC_DRIVER_NAME for the benefit # of jit/jit-playback.c. +changequote(,)dnl +gcc_driver_version=$gcc_BASEVER +if test x$with_gcc_major_version_only = xyes ; then + gcc_driver_version=`echo $gcc_BASEVER | sed -e 's/^\([0-9]*\).*$/\1/'` +fi +changequote([,])dnl cat > gcc-driver-name.h < gcc-driver-name.h <
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
Hi, On 13/04/2018 19:14, Jason Merrill wrote: On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carliniwrote: Hi, On 13/04/2018 16:06, Jason Merrill wrote: On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini wrote: Hi, in this error-recovery regression, after a sensible error message issued by cxx_constant_init, store_init_value stores an error_mark_node as DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much later, to cause a crash during gimplification. As far as I know, the choice of storing error_mark_nodes too is the outcome of a rather delicate error-recovery strategy and I don't think we want to revisit it at this time, thus the remaining option is catching later the error_mark_node, at a "good" time. I note, in passing, that the do loop in gimplify_expr which uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the error-recovery point of view, because at each iteration it *does* cover for error_operand_p (save_expr) but only immediately after the call, when it's too late. All the above said, I believe that at least for the 8.1.0 needs we may want to catch the error_mark_node in cp_build_modify_expr, when we are handling the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form, with the error_mark_node as the first operand. Passes testing on x86_64-linux. We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place. Basing on my analysis, that's easy to do, in convert_to_integer_1. Are we passing an error_mark_node down into convert_to_integer_1? Where are we folding away the VAR_DECL? That convert gets a NOP_EXPR with the VAR_DECL for 'j' as argument and returns the error_mark_node. Paolo.
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carliniwrote: > Hi, > > On 13/04/2018 16:06, Jason Merrill wrote: >> >> On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini >> wrote: >>> >>> Hi, >>> >>> in this error-recovery regression, after a sensible error message issued >>> by >>> cxx_constant_init, store_init_value stores an error_mark_node as >>> DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much >>> later, to cause a crash during gimplification. As far as I know, the >>> choice >>> of storing error_mark_nodes too is the outcome of a rather delicate >>> error-recovery strategy and I don't think we want to revisit it at this >>> time, thus the remaining option is catching later the error_mark_node, at >>> a >>> "good" time. I note, in passing, that the do loop in gimplify_expr which >>> uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from >>> the >>> error-recovery point of view, because at each iteration it *does* cover >>> for >>> error_operand_p (save_expr) but only immediately after the call, when >>> it's >>> too late. >>> >>> All the above said, I believe that at least for the 8.1.0 needs we may >>> want >>> to catch the error_mark_node in cp_build_modify_expr, when we are >>> handling >>> the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR >>> from >>> the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in >>> convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone >>> form, >>> with the error_mark_node as the first operand. Passes testing on >>> x86_64-linux. >> >> We should avoid wrapping an error_mark_node in a NOP_EXPR in the first >> place. > > Basing on my analysis, that's easy to do, in convert_to_integer_1. Are we passing an error_mark_node down into convert_to_integer_1? Where are we folding away the VAR_DECL? Jason
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
Hi, On 13/04/2018 16:06, Jason Merrill wrote: On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carliniwrote: Hi, in this error-recovery regression, after a sensible error message issued by cxx_constant_init, store_init_value stores an error_mark_node as DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much later, to cause a crash during gimplification. As far as I know, the choice of storing error_mark_nodes too is the outcome of a rather delicate error-recovery strategy and I don't think we want to revisit it at this time, thus the remaining option is catching later the error_mark_node, at a "good" time. I note, in passing, that the do loop in gimplify_expr which uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the error-recovery point of view, because at each iteration it *does* cover for error_operand_p (save_expr) but only immediately after the call, when it's too late. All the above said, I believe that at least for the 8.1.0 needs we may want to catch the error_mark_node in cp_build_modify_expr, when we are handling the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form, with the error_mark_node as the first operand. Passes testing on x86_64-linux. We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place. Basing on my analysis, that's easy to do, in convert_to_integer_1. I wasn't sure we wanted to touch such basic facilities ;) Implementation-wise, I wondered whether we wanted to handle NOP_EXPR and error_mark_node specially inside build1 itself, but, again, that seems a bit invasive, plus all the build* facilities always allocate a new node as the first step. Anyway, fully testing the below will require a bit of time, shall I go ahead with that, given that g++.dg passed? Thanks! Paolo. Index: convert.c === --- convert.c (revision 259376) +++ convert.c (working copy) @@ -743,6 +743,8 @@ convert_to_integer_1 (tree type, tree expr, bool d { expr = convert (lang_hooks.types.type_for_mode (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr); + if (expr == error_mark_node) + return error_mark_node; return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr); } Index: testsuite/g++.dg/cpp0x/pr85112.C === --- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent) +++ testsuite/g++.dg/cpp0x/pr85112.C(working copy) @@ -0,0 +1,17 @@ +// PR c++/85112 +// { dg-do compile { target c++11 } } + +struct A +{ + int m; + int n : 4; +}; + +int i; // { dg-message "not const" } + +void foo() +{ + constexpr int j = i; // { dg-error "not usable" } + A a; + a.n = j; +}
[PATCH] avoid -Wrestrict for null pointers (PR 85365)
PR 85365 is another example of a false positive caused by the interaction between the instrumentation inserted by sanitizers, jump threading, and a middle-end warning. In this case, the warning is easy to avoid by suppressing -Wrestrict for null pointers. Although undefined, since they do no point to an object, strictly speaking they do not indicate an overlap, and so issuing a -Wrestrict warning is not quite appropriate anyway. Testing in progress. Martin PR c/85365 - -Wrestrict false positives with -fsanitize=undefined gcc/ChangeLog: PR c/85365 * gimple-fold.c (gimple_fold_builtin_strcpy): Suppress -Wrestrict for null pointers. (gimple_fold_builtin_stxcpy_chk): Same. * gimple-ssa-warn-restrict.c (check_bounds_or_overlap): Same. gcc/testsuite/ChangeLog: PR c/85365 * gcc.dg/Wrestrict-15.c: New test. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 259298) +++ gcc/gimple-fold.c (working copy) @@ -1612,7 +1612,11 @@ gimple_fold_builtin_strcpy (gimple_stmt_iterator * /* If SRC and DEST are the same (and not volatile), return DEST. */ if (operand_equal_p (src, dest, 0)) { - if (!gimple_no_warning_p (stmt)) + /* Issue -Wrestrict unless the pointers are null (those do + not point to objects and so do not indicate an overlap; + such calls could be the result of sanitization and jump + threading). */ + if (!integer_zerop (dest) && !gimple_no_warning_p (stmt)) { tree func = gimple_call_fndecl (stmt); @@ -2593,7 +2597,11 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterat /* If SRC and DEST are the same (and not volatile), return DEST. */ if (fcode == BUILT_IN_STRCPY_CHK && operand_equal_p (src, dest, 0)) { - if (!gimple_no_warning_p (stmt)) + /* Issue -Wrestrict unless the pointers are null (those do + not point to objects and so do not indicate an overlap; + such calls could be the result of sanitization and jump + threading). */ + if (!integer_zerop (dest) && !gimple_no_warning_p (stmt)) { tree func = gimple_call_fndecl (stmt); Index: gcc/gimple-ssa-warn-restrict.c === --- gcc/gimple-ssa-warn-restrict.c (revision 259298) +++ gcc/gimple-ssa-warn-restrict.c (working copy) @@ -1880,11 +1880,20 @@ check_bounds_or_overlap (gcall *call, tree dst, tr if (operand_equal_p (dst, src, 0)) { - warning_at (loc, OPT_Wrestrict, - "%G%qD source argument is the same as destination", - call, func); - gimple_set_no_warning (call, true); - return false; + /* Issue -Wrestrict unless the pointers are null (those do + not point to objects and so do not indicate an overlap; + such calls could be the result of sanitization and jump + threading). */ + if (!integer_zerop (dst) && !gimple_no_warning_p (call)) + { + warning_at (loc, OPT_Wrestrict, + "%G%qD source argument is the same as destination", + call, func); + gimple_set_no_warning (call, true); + return false; + } + + return true; } /* Return false when overlap has been detected. */ Index: gcc/testsuite/gcc.dg/Wrestrict-15.c === --- gcc/testsuite/gcc.dg/Wrestrict-15.c (nonexistent) +++ gcc/testsuite/gcc.dg/Wrestrict-15.c (working copy) @@ -0,0 +1,38 @@ +/* PR 85365 - -Wrestrict false positives with -fsanitize=undefined + { dg-do compile } + { dg-options "-O2 -Wrestrict -fsanitize=undefined" } */ + +typedef __SIZE_TYPE__ size_t; + +char *strcpy (char *, const char *); +char *strcat (char *, const char *); + +size_t strlen (char *); + +extern char a[], b[], d[]; + +size_t t1 (char *g, int i) +{ + /* The following exercises the handling in gimple-fold.c. */ + strcpy (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */ + return strlen (g + 4); +} + +void t2 (char *g, int i) +{ + strcpy (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */ + strcat (g + 4, d); +} + +void t3 (char *g, int i) +{ + /* The following exercises the handling in gimple-ssa-warn-restrict.c. */ + strcat (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */ + strcat (g + 4, d); +} + +void t4 (char *p, char *q) +{ + strcpy (p, q);/* { dg-bogus "\\\[-Wrestrict]" } */ + strcat (p, q + 32); +}
[PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins
GCC Maintainers: GCC revision 255549 implemented early gimple folding for the vec_splat_s[8,16,32] builtins. However, as a consequence of the implementation, we lost error checking for out-of-range values for the expected vspltis[bhw] instructions. The result of not having the out- of-range checking is we get and ICE. This patch adds the missing error checking on argument zero for the vec_splat_s[8,16,32] builtins. The argument must be a 5-bit const int as specified for the vspltis[bhw] instructions. The regression testing for the patch was done on GCC mainline on powerpc64le-unknown-linux-gnu (Power 8 LE) with no regressions. Additional hand testing was done as well to test the various cases such as vec_splat_s8 (31) vec_splat_s8 (32) vec_splat_s8 (value) where "value" is an integer variable to verify vector result is correct for a 5-bit const int argument ( i.e. 31). The error message "error: argument 1 must be a 5-bit signed literal" is generated in the other cases. Please let me know if the patch looks OK for the GCC 7 branch. Carl Love --- gcc/ChangeLog: 2018-04-13 Carl LovePR target/83402 * config/rs6000/rs6000-c.c (rs6000_gimple_fold_builtin): Add size check for arg0. --- gcc/config/rs6000/rs6000.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index a0c9b5c..855be43 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) { arg0 = gimple_call_arg (stmt, 0); lhs = gimple_call_lhs (stmt); -/* Only fold the vec_splat_*() if arg0 is constant. */ -if (TREE_CODE (arg0) != INTEGER_CST) +/* Only fold the vec_splat_*() if arg0 is a 5-bit constant. */ +if (TREE_CODE (arg0) != INTEGER_CST +|| TREE_INT_CST_LOW (arg0) & ~0x1f) return false; gimple_seq stmts = NULL; location_t loc = gimple_location (stmt); -- 2.7.4
[ARM,testsuite] Restrict armv8_2-fp16-scalar-2 to hf targets
Hi, It seems gcc.target/arm/armv8_2-fp16-scalar-2.c needs an armhf target to pass: it wants 4 fmov.f16, but there are only 2 on arm-linux-gnueabi for instance. Since the test includes arm_fp16.h, adding -mfloat-abi=hard implies the usual problem of missing gnu/stubs-hard.h, so it seems to me that the easiest way is just to skip this test on non-hf targets. Is this simple patch OK or do we want something smarter? Thanks, Christophe 2018-04-13 Christophe Lyon* gcc.target/arm/armv8_2-fp16-scalar-2.c: Require arm_hf_eabi. diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c index fa4828d..fcd7db4 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_hf_eabi } */ /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ /* { dg-options "-O2 -std=c11" } */ /* { dg-add-options arm_v8_2a_fp16_scalar } */
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote: > On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, >> wrote: > On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote: > > Hi, > > > > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande > > >: > > > Hi Christophe, > > > > > > Please find attached the updated patch with testcases. > > > > > > Ok for trunk? > > > > Thanks for the update. > > > > Since the new intrinsics are only available on aarch64, you want to > > prevent the tests from running on arm. > > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two > > targets. > > There are several examples on how to do that in that directory. > > > > I have also noticed that the tests fail at execution on aarch64_be. > > I think this is important to fix. We don't want the big-endian target to have > failing implementations of the Neon intrinsics. What is the nature of the > failure? > > From what I can see, nothing in the patch prevents using these intrinsics > on big-endian, so either the intrinsics behaviour is wrong (we have a wrong > code bug), or the testcase expected behaviour is wrong. > > I don't think disabling the test for big-endian is the right fix. We should > either fix the intrinsics, or fix the testcase. > > Thanks, > James > > Hi James, > > As the tests assume the little endian order of elements while checking the > results, the tests are failing for big endian targets. So, the failures are > not because of intrinsic implementations, but because of the testcase. The testcase is a little hard to follow through the macros, but why would this be the case? ld1 is deterministic on big and little endian for which elements will be loaded from memory, as is st1. My expectation would be that: int __attribute__ ((noinline)) test_vld_u16_x3 () { uint16_t data[3 * 3]; uint16_t temp[3 * 3]; uint16x4x3_t vectors; int i,j; for (i = 0; i < 3 * 3; i++) data [i] = (uint16_t) 3*i; asm volatile ("" : : : "memory"); vectors = vld1_u16_x3 (data); vst1_u16 (temp, vectors.val[0]); vst1_u16 ([3], vectors.val[1]); vst1_u16 ([3 * 2], vectors.val[2]); asm volatile ("" : : : "memory"); for (j = 0; j < 3 * 3; j++) if (temp[j] != data[j]) return 1; return 0; } would work equally well for big- or little-endian. I think this is more likely to be an intrinsics implementation bug. Thanks, James
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh,wrote: > On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote: > > Hi, > > > > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande < > sameera.deshpa...@linaro.org>: > > > Hi Christophe, > > > > > > Please find attached the updated patch with testcases. > > > > > > Ok for trunk? > > > > Thanks for the update. > > > > Since the new intrinsics are only available on aarch64, you want to > > prevent the tests from running on arm. > > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two > targets. > > There are several examples on how to do that in that directory. > > > > I have also noticed that the tests fail at execution on aarch64_be. > > I think this is important to fix. We don't want the big-endian target to > have > failing implementations of the Neon intrinsics. What is the nature of the > failure? > > From what I can see, nothing in the patch prevents using these intrinsics > on big-endian, so either the intrinsics behaviour is wrong (we have a wrong > code bug), or the testcase expected behaviour is wrong. > > I don't think disabling the test for big-endian is the right fix. We should > either fix the intrinsics, or fix the testcase. > > Thanks, > James > > Hi James, As the tests assume the little endian order of elements while checking the results, the tests are failing for big endian targets. So, the failures are not because of intrinsic implementations, but because of the testcase. - Thanks and regards, Sameera D.
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote: > Hi, > > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande: > > Hi Christophe, > > > > Please find attached the updated patch with testcases. > > > > Ok for trunk? > > Thanks for the update. > > Since the new intrinsics are only available on aarch64, you want to > prevent the tests from running on arm. > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two > targets. > There are several examples on how to do that in that directory. > > I have also noticed that the tests fail at execution on aarch64_be. I think this is important to fix. We don't want the big-endian target to have failing implementations of the Neon intrinsics. What is the nature of the failure? >From what I can see, nothing in the patch prevents using these intrinsics on big-endian, so either the intrinsics behaviour is wrong (we have a wrong code bug), or the testcase expected behaviour is wrong. I don't think disabling the test for big-endian is the right fix. We should either fix the intrinsics, or fix the testcase. Thanks, James
Re: [PATCH] Make redirection only for target_clones: V2 (PR ipa/85329).
On 04/12/2018 07:59 PM, Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 07:53:35PM +0200, Jakub Jelinek wrote: >> On Thu, Apr 12, 2018 at 03:46:26PM +0200, Jan Hubicka wrote: >>> If you make C++ inline and get the idea to use target cloning attribute on >>> this, >>> this will likely lead to link error if you compile multiple files because >>> you >>> turn comdat to non-comdat. >>> >>> For comdats this woudl effectivly need to become C++ abi extension and we >>> would >>> need to define comdat sections for these. Perhaps easiest way is to simply >>> reject the attribute on comdats and probaby also extern functions? >> >> I'm not really sure we can do that, various packages in the wild are already >> using this. >> What is the problem with comdats and multi-versioning? >> The question is what comdat groups we should use for the comdat resolver and >> the versioned functions, shall the ifunc symbol be the original mangling of >> the method (or other comdat) and the other entrypoints just be .local >> non-weak symbols inside of the same section? > > Ah, but we emit the resolver only if we see a use of it. That sounds quite > broken, resolver in each TU that uses it? Better to have one at each > definition... > > Jakub > So after quite some time I would need some brainstorming as I'm not sure how to fix that properly. Let's start how 'target' attribute works and I believe it should behave same as target_clones attribute: $ cat mv.h int foo (void); void test (); $ cat mv.cpp #include "mv.h" __attribute__ ((target ("default"))) int foo () { return 0; } __attribute__ ((target ("sse4.2"))) int foo () { return 1; } int main () { __builtin_printf ("in main: %d\n", foo ()); test (); return 0; } $ cat mv2.cpp #include "mv.h" void test() { int (*f) (void) = __builtin_printf ("value: %d\n", foo ()); } $ gcc -fdump-ipa-cgraph=/dev/stdout mv.cpp -c _Z13_Z3foov.ifuncv/2 (int _Z3foov.ifunc()) @0x767182e0 Type: function definition analyzed alias cpp_implicit_alias Visibility: externally_visible asm_written public comdat comdat_group:_Z3foov.resolver one_only section:.text._Z3foov.resolver (implicit_section) artificial Same comdat group as: _Z3foov.resolver/6 References: _Z3foov.resolver/6 (alias) Referring: Availability: overwritable First run: 0 Version info: next: _Z3foov/0 dispatcher: _Z3foov.resolver Function flags: Called by: Calls: _Z3foov.sse4.2/1 (int foo()) @0x76718170 Type: function definition analyzed Visibility: force_output externally_visible asm_written public Address is taken. References: Referring: Availability: available First run: 0 Version info: prev: _Z3foov/0 dispatcher: int _Z3foov.ifunc() Function flags: Called by: Calls: _Z3foov/0 (int foo()) @0x76718000 Type: function definition analyzed Visibility: force_output externally_visible asm_written public Address is taken. References: Referring: Availability: available First run: 0 Version info: next: _Z3foov.sse4.2/1 dispatcher: int _Z3foov.ifunc() Function flags: Called by: Calls: _Z3foov.resolver/6 (_Z3foov.resolver) @0x767188a0 Type: function definition analyzed Visibility: externally_visible asm_written public weak comdat comdat_group:_Z3foov.resolver one_only section:.text._Z3foov.resolver (implicit_section) artificial Same comdat group as: _Z13_Z3foov.ifuncv/2 References: Referring: _Z13_Z3foov.ifuncv/2 (alias) Availability: available First run: 0 Function flags: Called by: Calls: So note that resolver and ifunc live in a unique comdat_group. And as opposed to target_clones, default implementation has not appended '.default' suffix. That's problem because an address taken returns default implementation as seen here: $ gcc mv.cpp mv2.cpp && ./a.out in main: 1 value: 0 Which is the same problem is fixed for target_clones in this release cycle. So it's broken. Apart from that, following is broken for target attribute: cat Crypto-target.ii struct aaa { static __attribute__((target("avx512f"))) void foo() { __builtin_printf ("avx512f\n"); } static __attribute__((target("sse"))) void foo() {__builtin_printf ("sse\n");} static __attribute__((target("default"))) void foo() {__builtin_printf ("default\n");} }; void (*initializer) (void) = { ::foo }; int main() { initializer (); // aaa::foo (); } $ g++ Crypto-target.ii && ./a.out /tmp/ccJMMaDc.o:(.data+0x0): undefined reference to `_ZN3aaa3fooEv.ifunc()' collect2: error: ld returned 1 exit status For target_clones, I have a patch candidate that works for the test-case in PR85329: $ cat ~/Programming/testcases/Crypto.ii struct a { __attribute__((target_clones("sse", "default"))) void operator^=(a) {} } * b; class c { public: a *d(); }; class f { void g(); c e; }; void f::g() { *e.d() ^= b[0]; } $ g++ ~/Programming/testcases/Crypto.ii -c -fdump-ipa-cgraph=/dev/stdout _ZN1aeOES_.resolver/6
Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carliniwrote: > Hi, > > in this error-recovery regression, after a sensible error message issued by > cxx_constant_init, store_init_value stores an error_mark_node as > DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much > later, to cause a crash during gimplification. As far as I know, the choice > of storing error_mark_nodes too is the outcome of a rather delicate > error-recovery strategy and I don't think we want to revisit it at this > time, thus the remaining option is catching later the error_mark_node, at a > "good" time. I note, in passing, that the do loop in gimplify_expr which > uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the > error-recovery point of view, because at each iteration it *does* cover for > error_operand_p (save_expr) but only immediately after the call, when it's > too late. > > All the above said, I believe that at least for the 8.1.0 needs we may want > to catch the error_mark_node in cp_build_modify_expr, when we are handling > the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from > the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in > convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form, > with the error_mark_node as the first operand. Passes testing on > x86_64-linux. We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place. Jason
[PATCH] i386: Insert ENDBR after __morestack call
Since __morestack will jump back to its callee via indirect call, we need to insert ENDBR after calling __morestack. OK for trunk? H.J. gcc/ PR target/85388 * config/i386/i386.c (ix86_expand_split_stack_prologue): Insert ENDBR after calling __morestack. gcc/testsuite/ PR target/85388 * gcc.dg/pr85388-1.c: New test. * gcc.dg/pr85388-2.c: Likewise. * gcc.dg/pr85388-3.c: Likewise. * gcc.dg/pr85388-4.c: Likewise. * gcc.dg/pr85388-5.c: Likewise. * gcc.dg/pr85388-6.c: Likewise. --- gcc/config/i386/i386.c | 11 ++- gcc/testsuite/gcc.dg/pr85388-1.c | 50 + gcc/testsuite/gcc.dg/pr85388-2.c | 56 gcc/testsuite/gcc.dg/pr85388-3.c | 65 + gcc/testsuite/gcc.dg/pr85388-4.c | 69 gcc/testsuite/gcc.dg/pr85388-5.c | 54 +++ gcc/testsuite/gcc.dg/pr85388-6.c | 56 7 files changed, 360 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/pr85388-1.c create mode 100644 gcc/testsuite/gcc.dg/pr85388-2.c create mode 100644 gcc/testsuite/gcc.dg/pr85388-3.c create mode 100644 gcc/testsuite/gcc.dg/pr85388-4.c create mode 100644 gcc/testsuite/gcc.dg/pr85388-5.c create mode 100644 gcc/testsuite/gcc.dg/pr85388-6.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 03e5c433574..8b4fd8ae30b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -15242,7 +15242,16 @@ ix86_expand_split_stack_prologue (void) instruction--we need control flow to continue at the subsequent label. Therefore, we use an unspec. */ gcc_assert (crtl->args.pops_args < 65536); - emit_insn (gen_split_stack_return (GEN_INT (crtl->args.pops_args))); + rtx_insn *ret_insn += emit_insn (gen_split_stack_return (GEN_INT (crtl->args.pops_args))); + + if ((flag_cf_protection & CF_BRANCH) && TARGET_IBT) +{ + /* Insert ENDBR since __morestack will jump back here via indirect +call. */ + rtx cet_eb = gen_nop_endbr (); + emit_insn_after (cet_eb, ret_insn); +} /* If we are in 64-bit mode and this function uses a static chain, we saved %r10 in %rax before calling _morestack. */ diff --git a/gcc/testsuite/gcc.dg/pr85388-1.c b/gcc/testsuite/gcc.dg/pr85388-1.c new file mode 100644 index 000..86d4737e32b --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr85388-1.c @@ -0,0 +1,50 @@ +/* This test needs to use setrlimit to set the stack size, so it can + only run on Unix. */ +/* { dg-do run { target { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */ +/* { dg-require-effective-target cet } */ +/* { dg-require-effective-target split_stack } */ +/* { dg-options "-fsplit-stack -fcf-protection -mcet" } */ + +#include +#include +#include + +/* Use a noinline function to ensure that the buffer is not removed + from the stack. */ +static void use_buffer (char *buf) __attribute__ ((noinline)); +static void +use_buffer (char *buf) +{ + buf[0] = '\0'; +} + +/* Each recursive call uses 10,000 bytes. We call it 1000 times, + using a total of 10,000,000 bytes. If -fsplit-stack is not + working, that will overflow our stack limit. */ + +static void +down (int i) +{ + char buf[1]; + + if (i > 0) +{ + use_buffer (buf); + down (i - 1); +} +} + +int +main (void) +{ + struct rlimit r; + + /* We set a stack limit because we are usually invoked via make, and + make sets the stack limit to be as large as possible. */ + r.rlim_cur = 8192 * 1024; + r.rlim_max = 8192 * 1024; + if (setrlimit (RLIMIT_STACK, ) != 0) +abort (); + down (1000); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/pr85388-2.c b/gcc/testsuite/gcc.dg/pr85388-2.c new file mode 100644 index 000..fd13d984c50 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr85388-2.c @@ -0,0 +1,56 @@ +/* { dg-do run { target { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */ +/* { dg-require-effective-target cet } */ +/* { dg-require-effective-target split_stack } */ +/* { dg-require-effective-target pthread_h } */ +/* { dg-options "-pthread -fsplit-stack -fcf-protection -mcet" } */ + +#include +#include + +/* Use a noinline function to ensure that the buffer is not removed + from the stack. */ +static void use_buffer (char *buf) __attribute__ ((noinline)); +static void +use_buffer (char *buf) +{ + buf[0] = '\0'; +} + +/* Each recursive call uses 10,000 bytes. We call it 1000 times, + using a total of 10,000,000 bytes. If -fsplit-stack is not + working, that will overflow our stack limit. */ + +static void +down (int i) +{ + char buf[1]; + + if (i > 0) +{ + use_buffer (buf); + down (i - 1); +} +} + +static void * +thread_routine (void *arg __attribute__ ((unused))) +{ + down (1000); + return NULL; +} + +int +main (void) +{ + int i; + pthread_t
[x86, patch] Add tuning options to skylake-avx512
Hi, This patch adds 2 tuning options to -march=skylake-avx512. Ok for trunk? gcc/ PR target/84413 * config/i386/x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Add m_SKYLAKE_AVX512. Thanks, Julia 0001-pts.patch Description: 0001-pts.patch
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)
On Fri, Apr 13, 2018 at 12:10:33PM +, Wilco Dijkstra wrote: > > Anyway, here is what I think Richard was asking for, that I'm currently > > bootstrapping/regtesting. It can be easily combined with Martin's target > > hook if needed, or do it only for > > endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp) > > This patch causes regressions on AArch64 since it now always calls mempcpy > again, so yes either it would need to be done only for tailcalls (which fixes > the No, it only uses mempcpy if we'd otherwise call memcpy library function and user wrote mempcpy. AFAIK 7.x and earlier behaved that way too, so it isn't a regression, regression is only from released GCC versions. And, a fix is easy, just implement fast mempcpy on aarch64 ;) Jakub
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)
On Fri, Apr 13, 2018 at 5:10 AM, Wilco Dijkstrawrote: > This patch causes regressions on AArch64 since it now always calls mempcpy > again, so yes either it would need to be done only for tailcalls (which fixes > the > reported regression) or we need Martin's change too so each target can state > whether > they have a fast mempcpy or not. > You should open a bug report to track it. -- H.J.
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)
Jakub Jelinek wrote: >On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote: >> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got >> > added, >> > in 2013 other power versions, in 2016 s390*, etc. Doing a decent mempcpy >> > isn't hard if you have asm version of memcpy and one spare register. >> >> More mempcpy implementations have been added in recent years indeed, but >> almost all >> add an extra copy of the memcpy code rather than using a single combined >> implementation. >> That means it is still better to call memcpy (which is frequently used and >> thus likely in L1/L2) >> rather than mempcpy (which is more likely to be cold and thus not cached). > > That really depends, usually when some app uses mempcpy, it uses it very > heavily. But it would have to not use memcpy nearby. Do you have examples of apps using mempcpy significantly? GLIBC is the only case I've seen that uses mempcpy. > And all the proposed patches do is honor what the user asked, if > you use memcpy () + n, we aren't transforming that into mempcpy behind the > user's back. We transform a lot of calls behind the user's back so that's not a plausible argument for "honoring" the original call. Eg. (void)mempcpy always gets changed to memcpy, bcopy to memmove, bzero to memset, strchr (s, 0) into strlen(s) + s - the list is long and there are plenty cases where these expansions block tailcalls. > Anyway, here is what I think Richard was asking for, that I'm currently > bootstrapping/regtesting. It can be easily combined with Martin's target > hook if needed, or do it only for > endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp) This patch causes regressions on AArch64 since it now always calls mempcpy again, so yes either it would need to be done only for tailcalls (which fixes the reported regression) or we need Martin's change too so each target can state whether they have a fast mempcpy or not. Wilco
Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure
On 13 April 2018 at 11:22, Jakub Jelinek wrote: > On Fri, Apr 13, 2018 at 11:19:35AM +0100, Jonathan Wakely wrote: >> --- a/libstdc++-v3/src/c++11/Makefile.am >> +++ b/libstdc++-v3/src/c++11/Makefile.am >> @@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc >> >> if ENABLE_DUAL_ABI >> # Rewrite the type info for __ios_failure. >> -rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \ >> - -e 'n' \ >> - -e >> 's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \ >> - -e '}' >> +rewrite_ios_failure_typeinfo = sed -e >> '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' > > I miss / in between , and _ZTVN10__cxxabiv120__si_class_type_infoE > > $ echo | sed -e > '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_info_ZTVSt19__iosfail_type_info/' > sed: -e expression #1, char 29: unexpected `,' > $ echo | sed -e > '/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' > > Jakub Not sure how that got past my testing, so I've made sure to test with a clean build this time. Committed to trunk and gcc-7-branch. commit 6b12637ba78f29a2a697d3fb78e3e23076d8dee4 Author: Jonathan WakelyDate: Fri Apr 13 11:30:48 2018 +0100 Fix broken sed command from previous commit * src/c++11/Makefile.am: Fix sed command. * src/c++11/Makefile.in: Regenerate. diff --git a/libstdc++-v3/src/c++11/Makefile.am b/libstdc++-v3/src/c++11/Makefile.am index 12bc004a2ea..cdc49bb7f9b 100644 --- a/libstdc++-v3/src/c++11/Makefile.am +++ b/libstdc++-v3/src/c++11/Makefile.am @@ -128,7 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc if ENABLE_DUAL_ABI # Rewrite the type info for __ios_failure. -rewrite_ios_failure_typeinfo = sed -e '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' +rewrite_ios_failure_typeinfo = sed -e '/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' cxx11-ios_failure-lt.s: cxx11-ios_failure.cc $(LTCXXCOMPILE) -S $< -o tmp-cxx11-ios_failure-lt.s
Add test from PR 83852 (was Re: Fix PR 83962)
On 09.04.2018 12:16, Andrey Belevantsev wrote: > On 06.04.2018 18:59, Alexander Monakov wrote: >> On Tue, 3 Apr 2018, Andrey Belevantsev wrote: >> >>> Hello, >>> >>> This issues is about the correct order in which we need to call the >>> routines that fix up the control flow for us. >> >> OK with formatting both in the new comment and the Changelog fixed. > > Thanks, fixed that in rev. 259229. I've found out that this patch also fixes PR 83852, so I've committed the test from that PR as obvious after verifying that it works on cross-ppc compiler and on x86-64. Andrey Index: gcc.dg/pr83852.c === *** gcc.dg/pr83852.c(revision 0) --- gcc.dg/pr83852.c(revision 259373) *** *** 0 --- 1,33 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-std=gnu99 -O2 -fselective-scheduling -fno-if-conversion -fno-tree-dse -w" } */ + long long int uo; + unsigned int vt; + + void + r5 (long long int h8, long long int pu) + { + short int wj; + long long int *mh = h8; + + for (wj = 0; wj < 3; ++wj) + { + int oq; + long long int ns, xf; + + h8 += 2; + oq = !!h8 && !!wj; + ++uo; + vt ^= oq + uo; + ns = !!uo && !!vt; + xf = (h8 != 0) ? mh : 1; + pu += ns < xf; + } + + for (pu = 0; pu < 1; ++pu) + { + int *sc; + + sc = (int *) + *sc = 0; + } + } Index: ChangeLog === *** ChangeLog (revision 259372) --- ChangeLog (revision 259373) *** *** 1,3 --- 1,8 + 2018-04-13 Andrey Belevantsev+ + PR rtl-optimization/83852 + * gcc.dg/pr83852.c: New testcase. + 2018-04-13 Andreas Krebbel PR testsuite/85326
Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure
On Fri, Apr 13, 2018 at 11:19:35AM +0100, Jonathan Wakely wrote: > --- a/libstdc++-v3/src/c++11/Makefile.am > +++ b/libstdc++-v3/src/c++11/Makefile.am > @@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc > > if ENABLE_DUAL_ABI > # Rewrite the type info for __ios_failure. > -rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \ > - -e 'n' \ > - -e > 's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \ > - -e '}' > +rewrite_ios_failure_typeinfo = sed -e > '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' I miss / in between , and _ZTVN10__cxxabiv120__si_class_type_infoE $ echo | sed -e '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_info_ZTVSt19__iosfail_type_info/' sed: -e expression #1, char 29: unexpected `,' $ echo | sed -e '/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' Jakub
Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure
Darwin has double underscores at the start of mangled names, so this fixes the sed command to be more flexible. Committed to trunk and gcc-7-branch. commit f80944837b4c21016d826bff5f497ceda85b9894 Author: Jonathan WakelyDate: Fri Apr 13 10:44:08 2018 +0100 Fix __iosfail_type_info hack to work on darwin * src/c++11/Makefile.am: Rewrite sed rule to be less fragile and to handle mangled names starting with double underscores on darwin. * src/c++11/Makefile.in: Regenerate. diff --git a/libstdc++-v3/src/c++11/Makefile.am b/libstdc++-v3/src/c++11/Makefile.am index 8d524b67232..12bc004a2ea 100644 --- a/libstdc++-v3/src/c++11/Makefile.am +++ b/libstdc++-v3/src/c++11/Makefile.am @@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc if ENABLE_DUAL_ABI # Rewrite the type info for __ios_failure. -rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \ - -e 'n' \ - -e 's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \ - -e '}' +rewrite_ios_failure_typeinfo = sed -e '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' cxx11-ios_failure-lt.s: cxx11-ios_failure.cc $(LTCXXCOMPILE) -S $< -o tmp-cxx11-ios_failure-lt.s
[Committed] IBM Z: Get rid of target specific C++ testcase
gcc/testsuite/ChangeLog: 2018-04-13 Andreas KrebbelPR testsuite/85326 * gcc.target/s390/pr77822-1.C: Rename to ... * gcc.target/s390/pr77822-1.c: ... this. Add asm scan check. * gcc.target/s390/pr77822-2.c: Add asm scan check. * gcc.target/s390/s390.exp: Remove C from testcase regexps. --- .../gcc.target/s390/{pr77822-1.C => pr77822-1.c} | 16 +--- gcc/testsuite/gcc.target/s390/pr77822-2.c| 2 ++ gcc/testsuite/gcc.target/s390/s390.exp | 13 - 3 files changed, 19 insertions(+), 12 deletions(-) rename gcc/testsuite/gcc.target/s390/{pr77822-1.C => pr77822-1.c} (62%) diff --git a/gcc/testsuite/gcc.target/s390/pr77822-1.C b/gcc/testsuite/gcc.target/s390/pr77822-1.c similarity index 62% rename from gcc/testsuite/gcc.target/s390/pr77822-1.C rename to gcc/testsuite/gcc.target/s390/pr77822-1.c index bd5a9b4..9bf7bf4 100644 --- a/gcc/testsuite/gcc.target/s390/pr77822-1.C +++ b/gcc/testsuite/gcc.target/s390/pr77822-1.c @@ -3,15 +3,15 @@ /* { dg-do compile } */ /* { dg-options "-O3 -march=zEC12" } */ -class A { - void m_fn1(); - char m_datawidth; - char m_subunits; - int m_subunit_infos[]; -}; +void m_fn1(); + +char m_datawidth; +char m_subunits; +int m_subunit_infos[1]; + int a; long b; -void A::m_fn1() { +void m_fn1() { int c = 32, d = m_datawidth / c; for (int e = 0; e < d; e++) { int f = e * 32; @@ -19,3 +19,5 @@ void A::m_fn1() { m_subunit_infos[m_subunits] = a; } } + +/* { dg-final { scan-assembler-not "risbg.*-\[0-9\]+\\\+1\n" } } */ diff --git a/gcc/testsuite/gcc.target/s390/pr77822-2.c b/gcc/testsuite/gcc.target/s390/pr77822-2.c index 6789152..9a0fad2 100644 --- a/gcc/testsuite/gcc.target/s390/pr77822-2.c +++ b/gcc/testsuite/gcc.target/s390/pr77822-2.c @@ -305,3 +305,5 @@ void sizepos_c_13 (signed char b) if (b >> 13 & 1) g = b; } + +/* { dg-final { scan-assembler-not "risbg.*-\[0-9\]+\\\+1\n" } } */ diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp index bb13bfd..93c570a 100644 --- a/gcc/testsuite/gcc.target/s390/s390.exp +++ b/gcc/testsuite/gcc.target/s390/s390.exp @@ -199,23 +199,26 @@ dg-init set md_tests $srcdir/$subdir/md/*.c +# C++ tests belong into g++.dg with a target check. Do NOT add C to +# these regexps! + # Main loop. -dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.{c,S,C}] \ +dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.{c,S}] \ $md_tests]] "" $DEFAULT_CFLAGS -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.{c,S,C}]] \ +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.{c,S}]] \ "" $DEFAULT_CFLAGS -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/target-attribute/*.{c,S,C}]] \ +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/target-attribute/*.{c,S}]] \ "" $DEFAULT_CFLAGS -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/arch12/*.{c,S,C}]] \ +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/arch12/*.{c,S}]] \ "" "-O3 -march=arch12 -mzarch" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vxe/*.{c,S}]] \ "" "-O3 -march=arch12 -mzarch" -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.{c,S,C}]] \ +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.{c,S}]] \ "" $DEFAULT_CFLAGS # Additional hotpatch torture tests. -- 2.9.1
Re: [PATCH] Fix -fsanitize=address VLA instrumentation (PR sanitizer/85230)
On 04/13/2018 01:16 AM, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, we need to unpoison the red zones when leaving a > scope with VLA variable(s); this is done through __asan_allocas_unpoison > call, unfortunately it is called after the __builtin_stack_restore which > restores the stack pointer; now, if an interrupt comes in between the stack > restore and the __asan_allocas_unpoison call, the interrupt handler might > have some stack bytes marked as red zones in the shadow memory and might > diagnose sanitizing error even when there is none in the original program. > > The following patch ought to fix this by swapping the two calls, so we first > unpoison and only after it is unpoisoned in shadow memory release the stack. > The second argument to the __asan_allocas_unpoison call is meant to > be virtual_dynamic_stack_rtx after the __builtin_stack_restore, i.e. the new > stack_pointer_rtx value + STACK_DYNAMIC_OFFSET (current_function_decl). > As the STACK_DYNAMIC_OFFSET value isn't known until the vregs pass, the code > used a hack where it ignored the second argument and replaced it by > virtual_dynamic_stack_rtx. With the asan.c change below this doesn't work > anymore, because virtual_dynamic_stack_rtx aka stack_pointer_rtx + > STACK_DYNAMIC_OFFSET (current_function_decl) before the > __builtin_stack_restore is a different value. The patch instead uses the > argument passed to the __asan_allocas_unpoison at GIMPLE time, which is the > same as passed to __builtin_stack_restore; this is the new stack_pointer_rtx > value after __builtin_stack_restore. And, because we don't want that value, > but that + STACK_DYNAMIC_OFFSET (current_function_decl), we compute > arg1 + (virtual_dynamic_stack_rtx - stack_pointer_rtx) and let CSE/combiner > optimize it into arg1 (on targets like x86_64 where STACK_DYNAMIC_OFFSET can > be even 0 when not accumulating outgoing args or when that size is 0) or > arg1 + some_constant. > > Bootstrapped on > {x86_64,i686,powerpc64,powerpc64le,aarch64,s390x,armv7hl}-linux, regtested > on {x86_64,i686,powerpc64,powerpc64le}-linux so far, but on the power* ones > on virtual address space size that isn't really supported (likely > https://github.com/google/sanitizers/issues/933#issuecomment-380058705 > issue, so while nothing regresses there, pretty much all asan tests fail > there before and after the patch); also tested successfully with > asan.exp=alloca* on gcc110 and gcc112 on compile farm where it doesn't > suffer from the VA issue. Ok if testing passes also on aarch64, s390x > and armv7hl? OK with me, thanks. -Maxim > 2018-04-12 Jakub Jelinek> > PR sanitizer/85230 > * asan.c (handle_builtin_stack_restore): Adjust comment. Emit > __asan_allocas_unpoison call and last_alloca_addr = new_sp before > __builtin_stack_restore rather than after it. > * builtins.c (expand_asan_emit_allocas_unpoison): Pass > arg1 + (virtual_dynamic_stack_rtx - stack_pointer_rtx) as second > argument instead of virtual_dynamic_stack_rtx. > > --- gcc/asan.c.jj 2018-01-09 21:53:38.821577722 +0100 > +++ gcc/asan.c2018-04-12 13:22:59.166095523 +0200 > @@ -554,14 +554,14 @@ get_last_alloca_addr () > return last_alloca_addr; > } > > -/* Insert __asan_allocas_unpoison (top, bottom) call after > +/* Insert __asan_allocas_unpoison (top, bottom) call before > __builtin_stack_restore (new_sp) call. > The pseudocode of this routine should look like this: > - __builtin_stack_restore (new_sp); >top = last_alloca_addr; >bot = new_sp; >__asan_allocas_unpoison (top, bot); >last_alloca_addr = new_sp; > + __builtin_stack_restore (new_sp); > In general, we can't use new_sp as bot parameter because on some > architectures SP has non zero offset from dynamic stack area. Moreover, > on > some architectures this offset (STACK_DYNAMIC_OFFSET) becomes known for > each > @@ -570,9 +570,8 @@ get_last_alloca_addr () > > http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK. > To overcome the issue we use following trick: pass new_sp as a second > parameter to __asan_allocas_unpoison and rewrite it during expansion with > - virtual_dynamic_stack_rtx later in expand_asan_emit_allocas_unpoison > - function. > -*/ > + new_sp + (virtual_dynamic_stack_rtx - sp) later in > + expand_asan_emit_allocas_unpoison function. */ > > static void > handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter) > @@ -584,9 +583,9 @@ handle_builtin_stack_restore (gcall *cal > tree restored_stack = gimple_call_arg (call, 0); > tree fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCAS_UNPOISON); > gimple *g = gimple_build_call (fn, 2, last_alloca, restored_stack); > - gsi_insert_after (iter, g, GSI_NEW_STMT); > + gsi_insert_before (iter, g, GSI_SAME_STMT); > g = gimple_build_assign (last_alloca,
[C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")
Hi, in this error-recovery regression, after a sensible error message issued by cxx_constant_init, store_init_value stores an error_mark_node as DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much later, to cause a crash during gimplification. As far as I know, the choice of storing error_mark_nodes too is the outcome of a rather delicate error-recovery strategy and I don't think we want to revisit it at this time, thus the remaining option is catching later the error_mark_node, at a "good" time. I note, in passing, that the do loop in gimplify_expr which uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the error-recovery point of view, because at each iteration it *does* cover for error_operand_p (save_expr) but only immediately after the call, when it's too late. All the above said, I believe that at least for the 8.1.0 needs we may want to catch the error_mark_node in cp_build_modify_expr, when we are handling the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form, with the error_mark_node as the first operand. Passes testing on x86_64-linux. Thanks, Paolo. / /cp 2018-04-13 Paolo CarliniPR c++/85112 * typeck.c (cp_build_modify_expr): Return error_mark_node upon an error_mark_node wrapped in a NOP_EXPR too. /testsuite 2018-04-13 Paolo Carlini PR c++/85112 * g++.dg/cpp0x/pr85112.C: New. Index: cp/typeck.c === --- cp/typeck.c (revision 259359) +++ cp/typeck.c (working copy) @@ -8234,7 +8234,9 @@ cp_build_modify_expr (location_t loc, tree lhs, en TREE_OPERAND (newrhs, 0)); } - if (newrhs == error_mark_node) + if (newrhs == error_mark_node + || (TREE_CODE (newrhs) == NOP_EXPR + && TREE_OPERAND (newrhs, 0) == error_mark_node)) return error_mark_node; if (c_dialect_objc () && flag_objc_gc) Index: testsuite/g++.dg/cpp0x/pr85112.C === --- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent) +++ testsuite/g++.dg/cpp0x/pr85112.C(working copy) @@ -0,0 +1,17 @@ +// PR c++/85112 +// { dg-do compile { target c++11 } } + +struct A +{ + int m; + int n : 4; +}; + +int i; // { dg-message "not const" } + +void foo() +{ + constexpr int j = i; // { dg-error "not usable" } + A a; + a.n = j; +}
Be more permissive for always inlining across target attribute changes
Hi, this patch makes it possible to always inline across target attribute changes when doing so will not lead to incorrect code. We used to be permissive here with default options and overly restrictie without. This fixes most common anoyances seen with these, but not all (i.e. zen) Bootstrapped/regteste x86_64-linux, commited. Honza PR lto/71991 * config/i386/i386.c (ix86_can_inline_p): Allow safe transitions for always inline. * gcc.target/i386/pr71991.c: New testcase. Index: config/i386/i386.c === --- config/i386/i386.c (revision 259345) +++ config/i386/i386.c (working copy) @@ -5766,6 +5766,19 @@ ix86_can_inline_p (tree caller, tree cal { tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); + + /* Changes of those flags can be tolerated for always inlines. Lets hope + user knows what he is doing. */ + const unsigned HOST_WIDE_INT always_inline_safe_mask += (MASK_USE_8BIT_IDIV | MASK_ACCUMULATE_OUTGOING_ARGS + | MASK_NO_ALIGN_STRINGOPS | MASK_AVX256_SPLIT_UNALIGNED_LOAD + | MASK_AVX256_SPLIT_UNALIGNED_STORE | MASK_CLD + | MASK_NO_FANCY_MATH_387 | MASK_IEEE_FP | MASK_INLINE_ALL_STRINGOPS + | MASK_INLINE_STRINGOPS_DYNAMICALLY | MASK_RECIP | MASK_STACK_PROBE + | MASK_STV | MASK_TLS_DIRECT_SEG_REFS | MASK_VZEROUPPER + | MASK_NO_PUSH_ARGS | MASK_OMIT_LEAF_FRAME_POINTER); + + if (!callee_tree) callee_tree = target_option_default_node; if (!caller_tree) @@ -5776,6 +5789,10 @@ ix86_can_inline_p (tree caller, tree cal struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); bool ret = false; + bool always_inline = + (DECL_DISREGARD_INLINE_LIMITS (callee) + && lookup_attribute ("always_inline", + DECL_ATTRIBUTES (callee))); /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 function can inline a SSE2 function but a SSE2 function can't inline @@ -5787,14 +5804,17 @@ ix86_can_inline_p (tree caller, tree cal ret = false; /* See if we have the same non-isa options. */ - else if (caller_opts->x_target_flags != callee_opts->x_target_flags) + else if ((!always_inline + && caller_opts->x_target_flags != callee_opts->x_target_flags) + || (caller_opts->x_target_flags & ~always_inline_safe_mask) + != (callee_opts->x_target_flags & ~always_inline_safe_mask)) ret = false; /* See if arch, tune, etc. are the same. */ else if (caller_opts->arch != callee_opts->arch) ret = false; - else if (caller_opts->tune != callee_opts->tune) + else if (!always_inline && caller_opts->tune != callee_opts->tune) ret = false; else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath @@ -5807,7 +5827,8 @@ ix86_can_inline_p (tree caller, tree cal (cgraph_node::get (callee))->fp_expressions)) ret = false; - else if (caller_opts->branch_cost != callee_opts->branch_cost) + else if (!always_inline + && caller_opts->branch_cost != callee_opts->branch_cost) ret = false; else Index: testsuite/gcc.target/i386/pr71991.c === --- testsuite/gcc.target/i386/pr71991.c (revision 0) +++ testsuite/gcc.target/i386/pr71991.c (working copy) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +static inline __attribute__ ((__always_inline__)) int fn1 () { return 0; } +static __attribute__ ((target ("inline-all-stringops"))) int fn2 () { fn1 (); } + +int main() +{ + fn2(); +} +
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)
On April 13, 2018 12:35:54 AM GMT+02:00, Jakub Jelinekwrote: >On Thu, Apr 12, 2018 at 07:37:22PM +0200, Jakub Jelinek wrote: >> On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote: >> > > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy >got added, >> > > in 2013 other power versions, in 2016 s390*, etc. Doing a decent >mempcpy >> > > isn't hard if you have asm version of memcpy and one spare >register. >> > >> > More mempcpy implementations have been added in recent years >indeed, but almost all >> > add an extra copy of the memcpy code rather than using a single >combined implementation. >> > That means it is still better to call memcpy (which is frequently >used and thus likely in L1/L2) >> > rather than mempcpy (which is more likely to be cold and thus not >cached). >> >> That really depends, usually when some app uses mempcpy, it uses it >very >> heavily. And all the proposed patches do is honor what the user >asked, if >> you use memcpy () + n, we aren't transforming that into mempcpy >behind the >> user's back. >> >> Anyway, here is what I think Richard was asking for, that I'm >currently >> bootstrapping/regtesting. It can be easily combined with Martin's >target >> hook if needed, or do it only for >> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp) >> etc. >> >> 2018-04-12 Martin Liska >> Jakub Jelinek >> >> PR middle-end/81657 >> * expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET. >> * expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET. >> * builtins.c (expand_builtin_memory_copy_args): Use >> BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target, >> handle dest_addr == pc_rtx. >> >> * gcc.dg/string-opt-1.c: Remove bogus comment. Expect a mempcpy >> call. > >Successfully bootstrapped/regtested on x86_64-linux and i686-linux. OK. Thanks, Richard. > Jakub
Fix gcc.dg/debug/pr41893-1.c with Solaris ld (PR lto/81968)
The last LTO early debug related failure on Solaris 11 is FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors) FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess errors) FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess errors) Excess errors: ld: fatal: relocation error: R_386_32: file /var/tmp//ccyfLclbdebugobjtem: section [6].rel.debug_macro: symbol .debug_macro (section): symbol has been discarded with discarded section: [7].debug_macro As detailed in the PR, the rewritten objects lto-wrapper passes to ld -r for a partial link violate the ELF gABI access rules for COMDAT sections. On closer inspection, however, the input objects do so, too. Solaris ld has heuristics to relax the rules in objects produced by GCC, which trigger on the presence of a .comment section containing the string "GCC: (GNU)". This same relaxation can be enabled with -z relaxreloc/-z relax=comdat if need be. Right now, simple_object_elf_copy_lto_debug_sections doesn't copy .comment sections, so the heuristic doesn't trigger. Fixed trivially by keeping .comment sections in the output. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11. Approved by Richard in the PR, installed on mainline. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2018-04-11 Rainer OrthPR lto/81968 * simple-object.c (handle_lto_debug_sections): Keep .comment section. # HG changeset patch # Parent 6314816f4181dd47c34310bc29a5996d925c9b8d Fix gcc.dg/debug/pr41893-1.c with Solaris ld (PR lto/81968) libiberty: PR lto/81968 * simple-object.c (handle_lto_debug_sections): Keep .comment section. diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c --- a/libiberty/simple-object.c +++ b/libiberty/simple-object.c @@ -284,6 +284,11 @@ handle_lto_debug_sections (const char *n /* Copy over .note.GNU-stack section under the same name if present. */ else if (strcmp (name, ".note.GNU-stack") == 0) return strcpy (newname, name); + /* Copy over .comment section under the same name if present. Solaris + ld uses them to relax its checking of ELF gABI access rules for + COMDAT sections in objects produced by GCC. */ + else if (strcmp (name, ".comment") == 0) +return strcpy (newname, name); return NULL; }
Re: [Ping, Fortran, Patch, PR81773, PR83606, coarray, v1] Fix coarray get to array with vector indexing
Ping On Sun, 8 Apr 2018 14:25:50 +0200 Andre Vehreschildwrote: > Hi all, > > attached patch fixes (to my knowledge) the two PRs 81773 and 83606 where the > result of a coarray get() operation is assigned to an array using a vector as > index. It took me quite long to get it right, because I had to use the > scalarizer which I haven't use directly before. So reviewers with expertise on > using the scalarizer are especially welcome. > > Bootstrapped and regtested on x86_64-linux/f27. > > Ok for trunk? Backports? > > Regards, > Andre -- Andre Vehreschild * Email: vehre ad gmx dot de gcc/fortran/ChangeLog: 2018-04-08 Andre Vehreschild PR fortran/81773 PR fortran/83606 * dependency.c (gfc_dep_resolver): Coarray indexes are to be ignored during dependency computation. They define no data dependency. * trans-array.c (conv_array_index_offset): The stride can not be set here, prevent fail. * trans-intrinsic.c (conv_caf_send): Add creation of temporary array for caf_get's result and copying to the array with vectorial indexing. gcc/testsuite/ChangeLog: 2018-04-08 Andre Vehreschild PR fortran/81773 PR fortran/83606 * gfortran.dg/coarray/get_to_indexed_array_1.f90: New test. * gfortran.dg/coarray/get_to_indirect_array.f90: New test. diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c index a0bbd584947..3e14ddc25d8 100644 --- a/gcc/fortran/dependency.c +++ b/gcc/fortran/dependency.c @@ -2238,8 +2238,9 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse) break; /* Exactly matching and forward overlapping ranges don't cause a - dependency. */ - if (fin_dep < GFC_DEP_BACKWARD) + dependency, when they are not part of a coarray ref. */ + if (fin_dep < GFC_DEP_BACKWARD + && lref->u.ar.codimen == 0 && rref->u.ar.codimen == 0) return 0; /* Keep checking. We only have a dependency if diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index bd731689031..b68e77d5281 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -3215,7 +3215,7 @@ conv_array_index_offset (gfc_se * se, gfc_ss * ss, int dim, int i, } /* Multiply by the stride. */ - if (!integer_onep (stride)) + if (stride != NULL && !integer_onep (stride)) index = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, index, stride); diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index a45aec708fb..00edd447bb2 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -1907,34 +1907,124 @@ conv_caf_send (gfc_code *code) { } else { - /* If has_vector, pass descriptor for whole array and the - vector bounds separately. */ - gfc_array_ref *ar, ar2; - bool has_vector = false; + bool has_vector = gfc_has_vector_subscript (lhs_expr); - if (gfc_is_coindexed (lhs_expr) && gfc_has_vector_subscript (lhs_expr)) + if (gfc_is_coindexed (lhs_expr) || !has_vector) { - has_vector = true; - ar = gfc_find_array_ref (lhs_expr); - ar2 = *ar; - memset (ar, '\0', sizeof (*ar)); - ar->as = ar2.as; - ar->type = AR_FULL; + /* If has_vector, pass descriptor for whole array and the + vector bounds separately. */ + gfc_array_ref *ar, ar2; + bool has_tmp_lhs_array = false; + if (has_vector) + { + has_tmp_lhs_array = true; + ar = gfc_find_array_ref (lhs_expr); + ar2 = *ar; + memset (ar, '\0', sizeof (*ar)); + ar->as = ar2.as; + ar->type = AR_FULL; + } + lhs_se.want_pointer = 1; + gfc_conv_expr_descriptor (_se, lhs_expr); + /* Using gfc_conv_expr_descriptor, we only get the descriptor, but + that has the wrong type if component references are done. */ + lhs_type = gfc_typenode_for_spec (_expr->ts); + tmp = build_fold_indirect_ref_loc (input_location, lhs_se.expr); + gfc_add_modify (_se.pre, gfc_conv_descriptor_dtype (tmp), + gfc_get_dtype_rank_type (has_vector ? ar2.dimen + : lhs_expr->rank, + lhs_type)); + if (has_tmp_lhs_array) + { + vec = conv_caf_vector_subscript (, lhs_se.expr, ); + *ar = ar2; + } } - lhs_se.want_pointer = 1; - gfc_conv_expr_descriptor (_se, lhs_expr); - /* Using gfc_conv_expr_descriptor, we only get the descriptor, but that - has the wrong type if component references are done. */ - lhs_type = gfc_typenode_for_spec (_expr->ts); - tmp = build_fold_indirect_ref_loc (input_location, lhs_se.expr); - gfc_add_modify (_se.pre, gfc_conv_descriptor_dtype (tmp), - gfc_get_dtype_rank_type (has_vector ? ar2.dimen - : lhs_expr->rank, - lhs_type)); - if (has_vector) + else { - vec = conv_caf_vector_subscript (,
[PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64
From: Vladimir MezentsevWhen weakref_targets is not empty a target cannot be removed from weak_decls. A small example is below when 'wv12' is removed from the weak list on aarch64: static vtype Wv12 __attribute__((weakref ("wv12"))); extern vtype wv12 __attribute__((weak)); Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). Tested on aarch64-linux-gnu. No regression. The attr-weakref-1.c test passed. ChangeLog: 2018-04-12 Vladimir Mezentsev PR gcc/84923 * varasm.c (weak_finish): clean up weak_decls --- gcc/varasm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/varasm.c b/gcc/varasm.c index d24bac4..2a70234 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -5683,8 +5683,7 @@ weak_finish (void) nor multiple .weak directives for the latter. */ for (p = _decls; (t2 = *p) ; ) { - if (TREE_VALUE (t2) == alias_decl - || target == DECL_ASSEMBLER_NAME (TREE_VALUE (t2))) + if (TREE_VALUE (t2) == alias_decl) *p = TREE_CHAIN (t2); else p = _CHAIN (t2); -- 1.8.3.1