Re: [RFC] PR64703, glibc sysdeps/powerpc/powerpc64/dl-machine.h miscompile
On Mon, Jan 26, 2015 at 10:11:14AM +0100, Richard Biener wrote: On Sat, Jan 24, 2015 at 12:23 AM, Alan Modra amo...@gmail.com wrote: How does this look as a potential fix for PR64703? I haven't made many forays into gimple code, so even though this patch passes bootstrap and regression testing on powerpc64-linux it's quite possible this is the wrong place to change. If it does look to be OK, then I'll fill out the targetm changes, include a testcase etc. It looks mostly ok, comments below. Thanks for looking! PR target/64703 * tree-ssa-alias.c (pt_solution_includes_base): New function, extracted from.. (ref_maybe_used_by_call_p_1): ..here. Delete dead code checking for NULL return from ao_ref_base. Handle potential memory reference by indirect calls on targets using function descriptors. Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 220025) +++ gcc/tree-ssa-alias.c(working copy) @@ -1532,6 +1532,23 @@ refs_output_dependent_p (tree store1, tree store2) return refs_may_alias_p_1 (r1, r2, false); } +static bool +pt_solution_includes_base (struct pt_solution *pt, tree base) Needs a comment. Sure, that was part of etc. above. ;) +{ + if (DECL_P (base)) +return pt_solution_includes (pt, base); + + if ((TREE_CODE (base) == MEM_REF + || TREE_CODE (base) == TARGET_MEM_REF) + TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME) +{ + struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)); + if (pi) + return pt_solutions_intersect (pt, pi-pt); +} + return true; +} + /* If the call CALL may use the memory reference REF return true, otherwise return false. */ @@ -1542,15 +1559,24 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *r unsigned i; int flags = gimple_call_flags (call); + base = ao_ref_base (ref); You dropped the if (!base) return true; check - please put it back. Hmm, calls to ao_ref_base in tree-ssa-alias.c are a mixed bag. Some check the return for NULL, others don't. All the checks for NULL are dead code since ao_ref_base never returns NULL. OK, I'll put it back and leave cleanup for another day. + callee = gimple_call_fn (call); + if (callee TREE_CODE (callee) == SSA_NAME Do we never propagate the address of a function descriptor here? That is, can we generate a testcase like descr fn; setup fn foo (fn); void foo (fn) { (*fn) (a, b); } and then inline foo so the call becomes (*fn) (a, b); ? We'd then still implicitely read from 'fn'. I also wonder whether (and where!) we resolve such a descriptor reference to the actual function on GIMPLE? Well, if we did end up with a direct call then the value in 'fn' won't matter, I think. A direct call implies gcc knows the value of a function pointer, and can thus use the value rather than the pointer. In this case it implies gcc has looked through 'fn' to its initialization, and seen that 'fn' is a function address. ie. your setup fn above is something like fn = *(descr *) some_function; Now it appears that gcc isn't clever enough to do this, but if it did, then surely the value of the pointer would be 'some_function', not 'fn'. I can't see how we could end up with a direct call any other way. As far as I know, gcc doesn't have any knowledge that 'descr' has anything to do with a function address unless that knowledge is imparted by an initialization such as the above. ie. The answer to your and where! question is nowhere. That is - don't you simply want to use if (targetm.function_descriptors ptr_deref_mayalias_ref_p_1 (callee, ref)) return true; here? No. I don't want to do needless work on direct calls, particularly since it appears that ptr_deref_may_alias_ref_p_1 does return true for direct calls like memcpy. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH] Fix ICE due to invalid thunk (PR ipa/64776)
On Mon, 26 Jan 2015, Jakub Jelinek wrote: Hi! On x86_64-darwin, we ICE on one of the pr64307.c testcase, because expand_thunk doesn't load non-gimple_val arguments into registers for the first argument, only for all the other ones. Supposedly normally thunks were meant to have this argument as pointer first and thus it wasn't an issue, but in the -O0 -fipa-icf case a thunk is created even for a non-method. This patch fixes it by special-casing the first argument only if this_adjusting - then we know it is a pointer that is being adjusted. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2015-01-26 Jakub Jelinek ja...@redhat.com PR ipa/64776 * cgraphunit.c (cgraph_node::expand_thunk): If not this_adjusting, handle the first argument in the same loop as all the other arguments. --- gcc/cgraphunit.c.jj 2015-01-15 14:05:05.0 +0100 +++ gcc/cgraphunit.c 2015-01-26 17:26:18.629818527 +0100 @@ -1610,14 +1610,18 @@ cgraph_node::expand_thunk (bool output_a for (arg = a; arg; arg = DECL_CHAIN (arg)) nargs++; auto_vectree vargs (nargs); + i = 0; + arg = a; if (this_adjusting) -vargs.quick_push (thunk_adjust (bsi, a, 1, fixed_offset, - virtual_offset)); - else if (nargs) -vargs.quick_push (a); + { + vargs.quick_push (thunk_adjust (bsi, a, 1, fixed_offset, + virtual_offset)); + arg = DECL_CHAIN (a); + i = 1; + } if (nargs) -for (i = 1, arg = DECL_CHAIN (a); i nargs; i++, arg = DECL_CHAIN (arg)) + for (; i nargs; i++, arg = DECL_CHAIN (arg)) { tree tmp = arg; if (!is_gimple_val (arg)) Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
[PATCH] Fix PR64798
The new exceptional EH allocator failed to align exception objects properly (it ended up aligning to __alignof__((std::size_t))). The following fixes that by aligning to what __attribute__((aligned)) would align to (this is what _Unwind_Exception is aligned to, a member of __cxa_refcounted_exception). Bootstrapped and tested on x86_64-unknown-linux-gnu - Rainer is testing this on sparc-solaris where it broke g++.old-deja/g++.eh/badalloc1.C. Ok for trunk? Thanks, Richard. 2015-01-27 Richard Biener rguent...@suse.de PR libstdc++/64798 * libsupc++/eh_alloc.cc (struct allocated_entry): Align data member. (pool::allocate): Adjust allocation size and alignment to that change. (pool::free): Adjust pointer offsetting. Index: libstdc++-v3/libsupc++/eh_alloc.cc === --- libstdc++-v3/libsupc++/eh_alloc.cc (revision 220164) +++ libstdc++-v3/libsupc++/eh_alloc.cc (working copy) @@ -94,7 +94,7 @@ namespace }; struct allocated_entry { std::size_t size; - char data[]; + char data[] __attribute__((aligned)); }; // A single mutex controlling emergency allocations. @@ -133,17 +133,18 @@ namespace void *pool::allocate (std::size_t size) { __gnu_cxx::__scoped_lock sentry(emergency_mutex); - // We need an additional size_t member. - size += sizeof (std::size_t); + // We need an additional size_t member plus the padding to + // ensure proper alignment of data. + size += offsetof (allocated_entry, data); // And we need to at least hand out objects of the size of // a freelist entry. if (size sizeof (free_entry)) size = sizeof (free_entry); - // And we need to align objects we hand out to the required - // alignment of a freelist entry (this really aligns the + // And we need to align objects we hand out to the maximum + // alignment required on the target (this really aligns the // tail which will become a new freelist entry). - size = ((size + __alignof__(free_entry) - 1) - ~(__alignof__(free_entry) - 1)); + size = ((size + __alignof__ (allocated_entry::data) - 1) + ~(__alignof__ (allocated_entry::data) - 1)); // Search for an entry of proper size on the freelist. free_entry **e; for (e = first_free_entry; @@ -185,7 +186,7 @@ namespace { __gnu_cxx::__scoped_lock sentry(emergency_mutex); allocated_entry *e = reinterpret_cast allocated_entry * - (reinterpret_cast char * (data) - sizeof (std::size_t)); + (reinterpret_cast char * (data) - offsetof (allocated_entry, data)); std::size_t sz = e-size; if (!first_free_entry) {
Re: RFA: patch to fix a bad code generation for PR64110 -- new constraints addition
Jeff Law l...@redhat.com writes: On 01/24/15 04:29, Richard Sandiford wrote: Yeah. I expect in practice most people who used ? and ! attached them to a particular operand for a reason. From a quick scan through 386.exp it looked like almost all uses would either want this behaviour or wouldn't care. An interesting exception is: (define_insn extendsidi2_1 [(set (match_operand:DI 0 nonimmediate_operand =*A,r,?r,?*o) (sign_extend:DI (match_operand:SI 1 register_operand 0,0,r,r))) (clobber (reg:CC FLAGS_REG)) (clobber (match_scratch:SI 2 =X,X,X,r))] !TARGET_64BIT #) I don't know how effective the third alternative is with LRA. Surely a r-0 alternative is by definition a case where r-r is possible only with a ?-cost reload? Seems to me we could just delete it. But assuming it does some good, I suppose the ? really does apply to the alternative as a whole. If we had to reload operand 1 or operand 0, there's an extra cost if it can't use the same register as the other operand. Wouldn't it be better to make ? and ! behave the new way and only add new constraints if it turns out that the old behaviour really is useful in some cases? Maybe stage 4 isn't the time to be making that kind of change. Still, it'd be great if someone who's set up do x86_64 benchmarking could measure the effect of making ? and ! behave like the new constraints. My worry isn't the x86_64 port, but all the others that folks don't test as regularly. I'd rather go the other direction, have folks familiar with the port go through it changing the constraints where it makes sense. That just seems a hell of a lot safer. A port maintainer could certainly hack something together for testing purposes to guide them as to whether or not there's something to be gained by converting many/most of the ?! to the new constraints. Yeah, but in practice that's only ever going to be a partial transition. Many port maintainers won't look at this, so we'll have to support both versions indefinitely, even if the new behaviour turns out to be the best for all cases. I just think we're going to regret having two sets of constraints with such subtly different meanings. Looking back at the original PR, Jakub said: The ! has been added by me for PR63594, so it isn't there from the era when i?86 backend was using reload. If there is a better way to express that RA should prefer to use memory or xmm register and only use r constraint if it already is in a r register and doesn't need to be reloaded, I can use that. Whether it is ?, ??? or something else. ! description in gcc docs just fitted most what I wanted... In some ways this seems to match the intention of *. Originally I think it was just an RA-only thing and was ignored by reload, but LRA does take it into account too (which sounds like progress to me). If I revert the patch locally and change the *vec_dupmode pattern to use *, it passes both the test for PR64110 and the tests for PR63594. Would that be OK as an alternative? Thanks, Richard
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Thanks, Julian
Re: [Patch, Fortran, OOP] PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component
On Tue, Jan 27, 2015 at 07:20:10PM +0100, Janus Weil wrote: 2015-01-27 10:30 GMT+01:00 Jakub Jelinek ja...@redhat.com: Yeah, if you want to add ubsan tests, you need to add gfortran.dg/ubsan/ directory and hack up ubsan.exp in there Thanks for the remark, I was suspecting something like that. However, for this case it's not really worth the hassle. In fact the test case does not really need the sanitizer and should also work without it. So I'll just remove the -fsanitize option: Index: gcc/testsuite/gfortran.dg/class_allocate_18.f90 === --- gcc/testsuite/gfortran.dg/class_allocate_18.f90(Revision 220180) +++ gcc/testsuite/gfortran.dg/class_allocate_18.f90(Arbeitskopie) @@ -1,5 +1,4 @@ ! { dg-do run } -! { dg-options -fsanitize=undefined } ! ! PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component ! LGTM. Jakub
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On 01/26/15 22:11, Segher Boessenkool wrote: On Mon, Jan 26, 2015 at 08:07:29PM -0700, Jeff Law wrote: The second change we need is an additional simplification. If we have (subreg:M1 (zero_extend:M2 (x)) Where M1 M2 and both are scalar integer modes. It's advantageous to strip the SUBREG and instead have a wider extension. Should you also check M1 is not multiple registers? We're generally working with pseudos, so we could estimate, but not know for sure if we're dealing with multiple hard regs. But more importantly, I'm not sure what that check would buy us. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Thoughts? It looks fine to me. Well, some comments... @@ -2643,6 +2644,24 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, || GET_CODE (src) == LSHIFTRT) nshift++; } + + /* If I0 loads a memory and I3 sets the same memory, then I2 and I3 +are likely manipulating its value. Ideally we'll be able to combine +all four insns into a bitfield insertion of some kind. + +Note the source in I0 might be inside a sign/zero extension and the +memory modes in I0 and I3 might be different. So extract the address +from the destination of I3 and search for it in the source of I0. + +In the event that there's a match but the source/dest do not actually +refer to the same memory, the worst that happens is we try some +combinations that we wouldn't have otherwise. */ + if ((set0 = single_set (i0)) + (set3 = single_set (i3)) + GET_CODE (SET_DEST (set3)) == MEM + rtx_referenced_p (XEXP (SET_DEST (set3), 0), SET_SRC (set0))) + ngood += 2; I think you should test MEM_P (SET_SRC (set0)), too. Or even just test rtx_equal_p (SET_DEST (set3), SET_SRC (set0)) ? Yea, we need a tighter test on set0 to ensure it's a MEM. That code got twidded before the last testrun. I'll take care of that. Earlier versions checked reg_equal_p on the MEM. But that's often a mistake because the modes of the two memory references may be different. I don't recall which of the various tests, but I was definitely seeing SImode in the load and HImode in the store. Similarly you don't want to check reg_equal_p on the addresses as they aren't necessarily the same either (they're obviously related). That's how I ultimately settled on rtx_referenced_p form you see above. I'm still not sure that's 100% what I want, but I don't have any tests yet which require something more complex. + if (ngood 2 nshift 2) return 0; } @@ -5663,6 +5682,25 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest, return CONST0_RTX (mode); } + /* If we have (subreg:M1 (zero_extend:M2 (x))) or +(subreg:M1 (sign_extend: M2 (x))) where M1 is wider +then M2, then go ahead and just widen the original extension. + +While the subreg is useful in saying I don't care about those +upper bits. Squashing out the subreg results in simpler RTL that +is more easily matched. */ Closing quote missing. Fixed locally. + if ((GET_CODE (SUBREG_REG (x)) == ZERO_EXTEND + || GET_CODE (SUBREG_REG (x)) == SIGN_EXTEND) + SCALAR_INT_MODE_P (GET_MODE (x)) + SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (x))) + GET_MODE (x) GET_MODE (SUBREG_REG (x))) GET_MODE_SIZE instead? It's work as-is. But using GET_MODE_SIZE shows the intent clearer. I'll fix that momentarily. Does this do anything good for the dec mem thing on x86? That would be a nice bonus :-) It might, but I haven't tested for that specifically. If you've got sample code or a PR in mind, pass it along and I'll take a look. I'd think dec mem would generally be handled by 3-1 insn combination code unless there's something else going on. jef
[RFC PATCH] Emit DW_LANG_Fortran{03,08}
Hi! DW_LANG_Fortran03 and DW_LANG_Fortran08 DW_AT_language values were recently accepted into DWARF5. This patch changes GCC to handle those similarly to how e.g. the -std=c++11, -std=c++14 or -std=c11 are handled. As it will take some time for consumers to catch up, I'm enabling that only if -gdwarf-5 is used for now. 2015-01-27 Jakub Jelinek ja...@redhat.com * dwarf2.h (enum dwarf_source_language): Add DW_LANG_Fortran03 and DW_LANG_Fortran08. * dwarf2out.c (is_fortran): Also return true for DW_LANG_Fortran03 or DW_LANG_Fortran08. (lower_bound_default): Return 1 for DW_LANG_Fortran03 or DW_LANG_Fortran08. (gen_compile_unit_die): Handle GNU Fortran2003 and GNU Fortran2008 language strings. * dbxout.c (get_lang_number): Use lang_GNU_Fortran. * langhooks.h (lang_GNU_Fortran): New prototype. * langhooks.c (lang_GNU_Fortran): New function. fortran/ * options.c: Include langhooks.h. (gfc_post_options): Change lang_hooks.name based on selected -std= mode. --- include/dwarf2.h.jj 2014-11-26 20:35:01.0 +0100 +++ include/dwarf2.h2015-01-27 17:55:18.086122137 +0100 @@ -312,6 +312,8 @@ enum dwarf_source_language DW_LANG_C_plus_plus_11 = 0x001a, /* dwarf5.20141029.pdf DRAFT */ DW_LANG_C11 = 0x001d, DW_LANG_C_plus_plus_14 = 0x0021, +DW_LANG_Fortran03 = 0x0022, +DW_LANG_Fortran08 = 0x0023, DW_LANG_lo_user = 0x8000, /* Implementation-defined range start. */ DW_LANG_hi_user = 0x, /* Implementation-defined range start. */ --- gcc/dwarf2out.c.jj 2015-01-27 17:54:13.0 +0100 +++ gcc/dwarf2out.c 2015-01-27 19:03:30.632411565 +0100 @@ -4736,7 +4736,9 @@ is_fortran (void) return (lang == DW_LANG_Fortran77 || lang == DW_LANG_Fortran90 - || lang == DW_LANG_Fortran95); + || lang == DW_LANG_Fortran95 + || lang == DW_LANG_Fortran03 + || lang == DW_LANG_Fortran08); } /* Return TRUE if the language is Ada. */ @@ -16720,6 +16722,8 @@ lower_bound_default (void) case DW_LANG_Fortran77: case DW_LANG_Fortran90: case DW_LANG_Fortran95: +case DW_LANG_Fortran03: +case DW_LANG_Fortran08: return 1; case DW_LANG_UPC: case DW_LANG_D: @@ -19781,8 +19785,17 @@ gen_compile_unit_die (const char *filena { if (strcmp (language_string, GNU Ada) == 0) language = DW_LANG_Ada95; - else if (strcmp (language_string, GNU Fortran) == 0) - language = DW_LANG_Fortran95; + else if (strncmp (language_string, GNU Fortran, 11) == 0) + { + language = DW_LANG_Fortran95; + if (dwarf_version = 5 /* || !dwarf_strict */) + { + if (strcmp (language_string, GNU Fortran2003) == 0) + language = DW_LANG_Fortran03; + else if (strcmp (language_string, GNU Fortran2008) == 0) + language = DW_LANG_Fortran08; + } + } else if (strcmp (language_string, GNU Java) == 0) language = DW_LANG_Java; else if (strcmp (language_string, GNU Objective-C) == 0) @@ -19796,7 +19809,7 @@ gen_compile_unit_die (const char *filena } } /* Use a degraded Fortran setting in strict DWARF2 so is_fortran works. */ - else if (strcmp (language_string, GNU Fortran) == 0) + else if (strncmp (language_string, GNU Fortran, 11) == 0) language = DW_LANG_Fortran90; add_AT_unsigned (die, DW_AT_language, language); @@ -19806,6 +19819,8 @@ gen_compile_unit_die (const char *filena case DW_LANG_Fortran77: case DW_LANG_Fortran90: case DW_LANG_Fortran95: +case DW_LANG_Fortran03: +case DW_LANG_Fortran08: /* Fortran has case insensitive identifiers and the front-end lowercases everything. */ add_AT_unsigned (die, DW_AT_identifier_case, DW_ID_down_case); --- gcc/dbxout.c.jj 2015-01-15 20:25:30.0 +0100 +++ gcc/dbxout.c2015-01-27 18:58:58.286033152 +0100 @@ -967,7 +967,7 @@ get_lang_number (void) return N_SO_CC; else if (strcmp (language_string, GNU F77) == 0) return N_SO_FORTRAN; - else if (strcmp (language_string, GNU Fortran) == 0) + else if (lang_GNU_Fortran ()) return N_SO_FORTRAN90; /* CHECKME */ else if (strcmp (language_string, GNU Pascal) == 0) return N_SO_PASCAL; --- gcc/langhooks.c.jj 2015-01-09 21:59:54.0 +0100 +++ gcc/langhooks.c 2015-01-27 18:58:37.375387995 +0100 @@ -731,3 +731,11 @@ lang_GNU_CXX (void) { return strncmp (lang_hooks.name, GNU C++, 7) == 0; } + +/* Returns true if the current lang_hooks represents the GNU Fortran frontend. */ + +bool +lang_GNU_Fortran (void) +{ + return strncmp (lang_hooks.name, GNU Fortran, 11) == 0; +} --- gcc/langhooks.h.jj 2015-01-05 13:07:13.0 +0100 +++ gcc/langhooks.h 2015-01-27 18:57:51.139172602 +0100 @@ -509,5 +509,6 @@ extern tree add_builtin_type (const char extern bool lang_GNU_C
Re: [RFC PATCH] Emit DW_LANG_Fortran{03,08}
On Tue, 2015-01-27 at 19:19 +0100, Jakub Jelinek wrote: Hi! DW_LANG_Fortran03 and DW_LANG_Fortran08 DW_AT_language values were recently accepted into DWARF5. This patch changes GCC to handle those similarly to how e.g. the -std=c++11, -std=c++14 or -std=c11 are handled. As it will take some time for consumers to catch up, I'm enabling that only if -gdwarf-5 is used for now. 2015-01-27 Jakub Jelinek ja...@redhat.com * dwarf2.h (enum dwarf_source_language): Add DW_LANG_Fortran03 and DW_LANG_Fortran08. * dwarf2out.c (is_fortran): Also return true for DW_LANG_Fortran03 or DW_LANG_Fortran08. (lower_bound_default): Return 1 for DW_LANG_Fortran03 or DW_LANG_Fortran08. (gen_compile_unit_die): Handle GNU Fortran2003 and GNU Fortran2008 language strings. * dbxout.c (get_lang_number): Use lang_GNU_Fortran. * langhooks.h (lang_GNU_Fortran): New prototype. * langhooks.c (lang_GNU_Fortran): New function. fortran/ * options.c: Include langhooks.h. (gfc_post_options): Change lang_hooks.name based on selected -std= mode. (...snip...) --- gcc/fortran/options.c.jj 2015-01-12 21:29:11.0 +0100 +++ gcc/fortran/options.c 2015-01-27 19:07:33.729285229 +0100 @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. #include cpp.h #include diagnostic.h /* For global_dc. */ #include tm.h +#include langhooks.h gfc_option_t gfc_option; @@ -398,6 +399,11 @@ gfc_post_options (const char **pfilename gfc_cpp_post_options (); + if (gfc_option.allow_std GFC_STD_F2008) +lang_hooks.name = GNU Fortran2008; + else if (gfc_option.allow_std GFC_STD_F2003) +lang_hooks.name = GNU Fortran2003; + Did you test this on rs6000? In particular, rs6000_output_function_epilogue has a: else if (! strcmp (language_string, GNU F77) || ! strcmp (language_string, GNU Fortran)) i = 1; Does that conditional need updating to track the langhooks.name change (maybe to use your new lang_GNU_Fortran function?) Dave
Re: [PATCH][2/2] Improve array-bound warnings and VRP
Richard Biener wrote: On Mon, 26 Jan 2015, Jakub Jelinek wrote: Then it probably should be ok. I'm really afraid of emitting more warnings with such high false positive rate now. As the patch also mitigates some of the code bloat we get with the complete peeling (regression against 4.7) I have installed it. It's also the easiest vehicle to verify range-info is not broken by passes between vrp1 and vrp2. You could make warnings appear only for warn_array_bounds 1 if there are concerns about false positives. For what it's worth, I tested the old version of both patches on one of my projects (mostly numerical algorithms) and it did not produce additional warnings. I really appreciate all improvements in this area. Martin
Re: [RFC PATCH] Emit DW_LANG_Fortran{03,08}
Jakub Jelinek wrote: DW_LANG_Fortran03 and DW_LANG_Fortran08 DW_AT_language values were recently accepted into DWARF5. This patch changes GCC to handle those similarly to how e.g. the -std=c++11, -std=c++14 or -std=c11 are handled. For completeness: gfortran currently produces GNU Fortran and DW_LANG_Fortran95; GCC itself also handles ...Fortran77 and ...Fortran90, but those are not produced with gfortran. With the patch, it produces for -gdwarf-2/3/4 (4 is default) or -gdwarf-5 -std=f95 the same as above. For -std=f2003 -gdwarf-5, it yields GNU Fortran2003 and DW_LANG_Fortran2003. And for -gdwarf-5 and the rest of -std= (f2008, f2008ts, gnu, legacy), it produces GNU Fortran2008 and DW_LANG_Fortran2008. (In principle, they could have prepared for the future and added Fortran 2015 as well.) Regarding the change: it is fine with me. (However, I wonder how much will break, once the || !dwarf_strict is enabled, knowing that compilers are often more frequently updated as debuggers, valgrind and similar programs. On the other, except of debuggers, most tools should care much about the DW_LANG.) Tobias PS: Talking about DWARF5, do you know when it will be available as public draft? I am especially looking forward to http://dwarfstd.org/ShowIssue.php?issue=121221.1 (Allow DW_AT_type with DW_TAG_string_type), which would be a low-hanging fruit in terms of implementation. Contrary to the array additions of 130313.5. As it will take some time for consumers to catch up, I'm enabling that only if -gdwarf-5 is used for now. 2015-01-27 Jakub Jelinek ja...@redhat.com * dwarf2.h (enum dwarf_source_language): Add DW_LANG_Fortran03 and DW_LANG_Fortran08. * dwarf2out.c (is_fortran): Also return true for DW_LANG_Fortran03 or DW_LANG_Fortran08. (lower_bound_default): Return 1 for DW_LANG_Fortran03 or DW_LANG_Fortran08. (gen_compile_unit_die): Handle GNU Fortran2003 and GNU Fortran2008 language strings. * dbxout.c (get_lang_number): Use lang_GNU_Fortran. * langhooks.h (lang_GNU_Fortran): New prototype. * langhooks.c (lang_GNU_Fortran): New function. fortran/ * options.c: Include langhooks.h. (gfc_post_options): Change lang_hooks.name based on selected -std= mode.
Re: [Patch, Fortran] PR64771 - Fix coarray ICE
Tobias Burnus wrote: This one compiles just as well, of course. From my side, that patch (using MAX) is fine. Thanks for bearing the bootstrap failure and for the patch. I have now committed it (i.e. Rainer's patch) as Rev. 220182. I have also committed the fixed-up/combined patch to the 4.9 branch as Rev. 220184. (BTW: The original patch was approved by Paul on IRC.) Tobias
Re: [Patch, Fortran, OOP] PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component
On Tue, Jan 27, 2015 at 10:24:47AM +0100, Andreas Schwab wrote: 2015-01-19 Janus Weil ja...@gcc.gnu.org PR fortran/64230 * gfortran.dg/class_allocate_18.f90: Extended. FAIL: gfortran.dg/class_allocate_18.f90 -O0 (test for excess errors) Excess errors: /usr/ia64-suse-linux/bin/ld: cannot find -lubsan Sorry for the breakage, guys! 2015-01-27 10:30 GMT+01:00 Jakub Jelinek ja...@redhat.com: Yeah, if you want to add ubsan tests, you need to add gfortran.dg/ubsan/ directory and hack up ubsan.exp in there Thanks for the remark, I was suspecting something like that. However, for this case it's not really worth the hassle. In fact the test case does not really need the sanitizer and should also work without it. So I'll just remove the -fsanitize option: Index: gcc/testsuite/gfortran.dg/class_allocate_18.f90 === --- gcc/testsuite/gfortran.dg/class_allocate_18.f90(Revision 220180) +++ gcc/testsuite/gfortran.dg/class_allocate_18.f90(Arbeitskopie) @@ -1,5 +1,4 @@ ! { dg-do run } -! { dg-options -fsanitize=undefined } ! ! PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component ! Cheers, Janus
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Thomas, Any plans to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64635 soon? On x86_64 darwin, the OpenACC merge resulted a huge number of failures in the libgomp test suite… === libgomp Summary === # of expected passes 10628 # of unexpected failures 724 # of unsupported tests 562 which are resolved with a fix similar to https://gcc.gnu.org/bugzilla/attachment.cgi?id=34480. Jack On Mon, Jan 26, 2015 at 8:44 AM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! Sorry for the late answer -- I've been on sick leave, and just now returning to work. Julian, would you please have a look at the following issues? In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin iver...@gmail.com wrote: On 17 Jan 02:16, Ilya Verbin wrote: Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Sorry for that! Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) So, you don't assume that a device can have multiple images from multiple libs? Ping? This probably is just a bug that we introduced with our changes? (Julian?) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? Currently I'm trying to rebase on trunk my old patch, which fixes offloading from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX. Grüße, Thomas
Re: [Patch, Fortran, OOP] PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component
2015-01-27 19:23 GMT+01:00 Jakub Jelinek ja...@redhat.com: On Tue, Jan 27, 2015 at 07:20:10PM +0100, Janus Weil wrote: 2015-01-27 10:30 GMT+01:00 Jakub Jelinek ja...@redhat.com: Yeah, if you want to add ubsan tests, you need to add gfortran.dg/ubsan/ directory and hack up ubsan.exp in there Thanks for the remark, I was suspecting something like that. However, for this case it's not really worth the hassle. In fact the test case does not really need the sanitizer and should also work without it. So I'll just remove the -fsanitize option: Index: gcc/testsuite/gfortran.dg/class_allocate_18.f90 === --- gcc/testsuite/gfortran.dg/class_allocate_18.f90(Revision 220180) +++ gcc/testsuite/gfortran.dg/class_allocate_18.f90(Arbeitskopie) @@ -1,5 +1,4 @@ ! { dg-do run } -! { dg-options -fsanitize=undefined } ! ! PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component ! LGTM. Good, committed as r220181. Since I had already backported the original patch to 4.9 yesterday, I'll do the same there ... Cheers, Janus
Re: [RFC PATCH] Emit DW_LANG_Fortran{03,08}
On Tue, Jan 27, 2015 at 01:52:12PM -0500, David Malcolm wrote: @@ -398,6 +399,11 @@ gfc_post_options (const char **pfilename gfc_cpp_post_options (); + if (gfc_option.allow_std GFC_STD_F2008) +lang_hooks.name = GNU Fortran2008; + else if (gfc_option.allow_std GFC_STD_F2003) +lang_hooks.name = GNU Fortran2003; + Did you test this on rs6000? In particular, rs6000_output_function_epilogue has a: else if (! strcmp (language_string, GNU F77) || ! strcmp (language_string, GNU Fortran)) i = 1; You're right, missed that. Consider that changed to lang_GNU_Fortran (). Jakub
C++ PATCH for c++/63889 (ICE with member variable template)
We were trying to instantiate is_ok with only the innermost set of template arguments; we need to make sure that the outer args are provided as well. Tested x86_64-pc-linux-gnu, applying to trunk. commit e2df55ffbe254dfc15801a204af16d012aeb4cb5 Author: Jason Merrill ja...@redhat.com Date: Mon Jan 26 10:55:42 2015 -0500 PR c++/63889 * pt.c (finish_template_variable): Move from semantics.c. Handle multiple template arg levels. Handle coercion here. (lookup_template_variable): Not here. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index bc26530..d377daa 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8091,13 +8091,28 @@ tree lookup_template_variable (tree templ, tree arglist) { tree type = unknown_type_node; - tsubst_flags_t complain = tf_warning_or_error; - tree parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (templ)); - arglist = coerce_template_parms (parms, arglist, templ, complain, - /*req_all*/true, /*use_default*/true); return build2 (TEMPLATE_ID_EXPR, type, templ, arglist); } +/* Instantiate a variable declaration from a TEMPLATE_ID_EXPR for use. */ + +tree +finish_template_variable (tree var) +{ + tree templ = TREE_OPERAND (var, 0); + + tree arglist = TREE_OPERAND (var, 1); + tree tmpl_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (templ)); + arglist = add_outermost_template_args (tmpl_args, arglist); + + tree parms = DECL_TEMPLATE_PARMS (templ); + tsubst_flags_t complain = tf_warning_or_error; + arglist = coerce_innermost_template_parms (parms, arglist, templ, complain, + /*req_all*/true, + /*use_default*/true); + + return instantiate_template (templ, arglist, complain); +} struct pair_fn_data { diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 915048d..75aa501 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -2454,15 +2454,6 @@ finish_call_expr (tree fn, vectree, va_gc **args, bool disallow_virtual, return result; } -/* Instantiate a variable declaration from a TEMPLATE_ID_EXPR for use. */ - -tree -finish_template_variable (tree var) -{ - return instantiate_template (TREE_OPERAND (var, 0), TREE_OPERAND (var, 1), - tf_error); -} - /* Finish a call to a postfix increment or decrement or EXPR. (Which is indicated by CODE, which should be POSTINCREMENT_EXPR or POSTDECREMENT_EXPR.) */ diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ22.C b/gcc/testsuite/g++.dg/cpp1y/var-templ22.C new file mode 100644 index 000..9ddc925 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ22.C @@ -0,0 +1,14 @@ +// PR c++/63889 +// { dg-do compile { target c++14 } } + +templateclass T +struct A +{ + templateclass + static constexpr bool is_ok = true; + + templatebool v = is_okT + A(T) { } +}; + +Aint p(42);
Re: [Patch, Fortran] PR63861 - fix OpenMP/ACC's gfc_has_alloc_comps
On Tue, Jan 27, 2015 at 08:27:07AM +0100, Tobias Burnus wrote: 2015-01-27 Tobias Burnus bur...@net-b.de PR fortran/63861 gcc/fortran/ * trans-openmp.c (gfc_has_alloc_comps, gfc_trans_omp_clauses): Fix handling for scalar coarrays. * trans-types.c (gfc_get_element_type): Add comment. gcc/testsuite/ * gfortran.dg/goacc/coarray_2.f90: New. Ok, thanks. Jakub
Re: [PATCH] Fix ICE during ipa dumping (PR ipa/64730)
On Mon, 26 Jan 2015, Jakub Jelinek wrote: Hi! On various targets, %s in fprintf can't handle NULL arguments, and even when edge-call_stmt is non-NULL, it still might have UNKNOWN_LOCATION or BUILTINS_LOCATION, which have NULL filename. In this particular case it is a fnsplit created call. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2015-01-26 Jakub Jelinek ja...@redhat.com PR ipa/64730 * ipa-inline.c (inline_small_functions): Print unknown even if edge-call_stmt is non-NULL, but has builtins or unknown location. --- gcc/ipa-inline.c.jj 2015-01-22 21:45:18.0 +0100 +++ gcc/ipa-inline.c 2015-01-26 15:41:57.193640527 +0100 @@ -1822,6 +1822,9 @@ inline_small_functions (void) Estimated badness is %f, frequency %.2f.\n, edge-caller-name (), edge-caller-order, edge-call_stmt + (LOCATION_LOCUS (gimple_location ((const_gimple) + edge-call_stmt)) + BUILTINS_LOCATION) ? gimple_filename ((const_gimple) edge-call_stmt) : unknown, edge-call_stmt Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
Re: [Patch, Fortran, OOP] PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component
On Tue, Jan 27, 2015 at 10:24:47AM +0100, Andreas Schwab wrote: Janus Weil ja...@gcc.gnu.org writes: 2015-01-19 Janus Weil ja...@gcc.gnu.org PR fortran/64230 * gfortran.dg/class_allocate_18.f90: Extended. FAIL: gfortran.dg/class_allocate_18.f90 -O0 (test for excess errors) Excess errors: /usr/ia64-suse-linux/bin/ld: cannot find -lubsan Yeah, if you want to add ubsan tests, you need to add gfortran.dg/ubsan/ directory and hack up ubsan.exp in there, from say gcc.dg/ubsan/ubsan.exp and gfortran.dg/dg.exp. Jakub
Re: [PATCH] Update BBs in cleanup_barriers pass (PR rtl-optimization/61058)
Yes, they do, that is why it crashed during final. OK. Why wouldn't it work to call reorder_insns instead of reorder_insns_nobb? -- Eric Botcazou
[PATCH] S/390: -mhotpatch v2
The attached patch updates the -mhotpatch option and the hopatch function attribute with (incompatible) new semantics. Please refer to the commit in the patch for details. -- 2015-01-27 Dominik Vogt v...@linux.vnet.ibm.com * doc/extend.texi: s/390: Update documentation of hotpatch attribute. * doc/invoke.texi (-mhotpatch): s/390: Update documentation of -mhotpatch= option. * config/s390/s390.opt (mhotpatch): s/390: Remove -mhotpatch and -mno-hotpatch options. Change syntax of -mhotpatch= option. * config/s390/s390.c (s390_hotpatch_trampoline_halfwords_default): Renamed. (s390_hotpatch_trampoline_halfwords_max): Renamed. (s390_hotpatch_hw_max): New name. (s390_hotpatch_trampoline_halfwords): Renamed. (s390_hotpatch_hw_before_label): New name. (get_hotpatch_attribute): Removed. (s390_hotpatch_hw_after_label): New name. (s390_handle_hotpatch_attribute): Add second parameter to hotpatch attribute. (s390_attribute_table): Ditto. (s390_function_num_hotpatch_trampoline_halfwords): Renamed. (s390_function_num_hotpatch_hw): New name. Remove special handling of inline functions and hotpatching. Return number of nops before and after the function label. (s390_can_inline_p): Removed. (s390_asm_output_function_label): Emit a configurable number of nops after the function label. (s390_option_override): Update -mhotpatch= syntax and remove -mhotpatch. (TARGET_CAN_INLINE_P) Removed. (TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): New. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany From 9123265bb1d6e325f4edc99a2d1f33a862b3ba53 Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Mon, 1 Dec 2014 15:59:42 +0100 Subject: [PATCH] S/390: -mhotpatch v2 Update the -mhotpatch option and the hotpatch function attribute to take exactly two arguments. The first is the number of halfwords to be filled with two-byte-nops before the function label. The second is the number of halfwords to be filled with nops after the label (the biggest available nop instructions are used). Further changes are: * Artificial functions and the main function are also patched. * Functions selected for hotpatching can still be inlined. It's the responsibility of the user to take care of this when patching, or to explicitly disable inlining. --- gcc/config/s390/s390.c | 227 - gcc/config/s390/s390.opt | 12 +- gcc/doc/extend.texi| 17 +- gcc/doc/invoke.texi| 16 +- gcc/testsuite/gcc.target/s390/hotpatch-1.c | 14 +- gcc/testsuite/gcc.target/s390/hotpatch-10.c| 15 +- gcc/testsuite/gcc.target/s390/hotpatch-11.c| 12 +- gcc/testsuite/gcc.target/s390/hotpatch-12.c| 14 +- gcc/testsuite/gcc.target/s390/hotpatch-13.c| 17 ++ gcc/testsuite/gcc.target/s390/hotpatch-14.c| 17 ++ gcc/testsuite/gcc.target/s390/hotpatch-15.c| 17 ++ gcc/testsuite/gcc.target/s390/hotpatch-16.c| 17 ++ gcc/testsuite/gcc.target/s390/hotpatch-17.c| 17 ++ gcc/testsuite/gcc.target/s390/hotpatch-18.c| 16 ++ gcc/testsuite/gcc.target/s390/hotpatch-19.c| 23 +++ gcc/testsuite/gcc.target/s390/hotpatch-2.c | 12 +- gcc/testsuite/gcc.target/s390/hotpatch-20.c| 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-3.c | 10 +- gcc/testsuite/gcc.target/s390/hotpatch-4.c | 18 +- gcc/testsuite/gcc.target/s390/hotpatch-5.c | 15 +- gcc/testsuite/gcc.target/s390/hotpatch-6.c | 13 +- gcc/testsuite/gcc.target/s390/hotpatch-7.c | 13 +- gcc/testsuite/gcc.target/s390/hotpatch-8.c | 24 +-- gcc/testsuite/gcc.target/s390/hotpatch-9.c | 15 +- gcc/testsuite/gcc.target/s390/hotpatch-compile-1.c | 24 +-- .../gcc.target/s390/hotpatch-compile-10.c | 12 ++ .../gcc.target/s390/hotpatch-compile-11.c | 12 ++ .../gcc.target/s390/hotpatch-compile-12.c | 12 ++ .../gcc.target/s390/hotpatch-compile-13.c | 29 +++ .../gcc.target/s390/hotpatch-compile-14.c | 11 + .../gcc.target/s390/hotpatch-compile-15.c | 43 .../gcc.target/s390/hotpatch-compile-16.c | 24 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-2.c | 24 +-- gcc/testsuite/gcc.target/s390/hotpatch-compile-3.c | 24 +-- gcc/testsuite/gcc.target/s390/hotpatch-compile-4.c | 2 +- gcc/testsuite/gcc.target/s390/hotpatch-compile-5.c | 23 +-- gcc/testsuite/gcc.target/s390/hotpatch-compile-6.c | 4 +- gcc/testsuite/gcc.target/s390/hotpatch-compile-7.c | 66 +- gcc/testsuite/gcc.target/s390/hotpatch-compile-8.c | 23 +-- gcc/testsuite/gcc.target/s390/hotpatch-compile-9.c | 12 ++ 40 files changed, 532 insertions(+), 404
Re: [PATCH] Fix PR64277
On Tue, 27 Jan 2015, Jakub Jelinek wrote: On Tue, Jan 27, 2015 at 10:25:48AM +0100, Richard Biener wrote: This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? So nothing in the testsuite needed to change? Nice. Yes. Ok for trunk. I'll search for duplicates and add a few testcases. Thanks. Committed as follows (first testcase in PR59124 not fixed - it warns from the first pass). 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/56273 PR tree-optimization/59124 PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. * g++.dg/warn/Warray-bounds-6.C: New testcase. * gcc.dg/Warray-bounds-12.c: Likewise. * gcc.dg/Warray-bounds-13.c: Likewise. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2015-01-27 10:34:26.453743828 +0100 --- gcc/tree-vrp.c 2015-01-27 10:43:04.970610102 +0100 *** vrp_finalize (void) *** 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds) check_all_array_refs (); /* We must identify jump threading opportunities before we release --- 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release Index: gcc/testsuite/g++.dg/warn/Warray-bounds-6.C === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/g++.dg/warn/Warray-bounds-6.C 2015-01-27 10:40:31.311871855 +0100 *** *** 0 --- 1,26 + // { dg-do compile } + // { dg-options -O3 -Warray-bounds } + + struct type { + bool a, b; + bool get_b() { return b; } + }; + + type stuff[9u]; + + void bar(); + + void foo() + { + for(unsigned i = 0u; i 9u; i++) + { + if(!stuff[i].a) + continue; + + bar(); + + for(unsigned j = i + 1u; j 9u; j++) + if(stuff[j].a stuff[j].get_b()) // { dg-bogus above array bounds } + return; + } + } Index: gcc/testsuite/gcc.dg/Warray-bounds-12.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-12.c 2015-01-27 10:40:58.196175989 +0100 *** *** 0 --- 1,26 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + /* { dg-additional-options -mssse3 { target x86_64-*-* i?86-*-* } } */ + + void foo(short a[], short m) + { + int i, j; + int f1[10]; + short nc; + + nc = m + 1; + if (nc 3) + { + for (i = 0; i = nc; i++) + { + f1[i] = f1[i] + 1; + } + } + + for (i = 0, j = m; i nc; i++, j--) + { + a[i] = f1[i]; /* { dg-bogus above array bounds } */ + a[j] = i; + } + return; + } Index: gcc/testsuite/gcc.dg/Warray-bounds-13.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-13.c 2015-01-27 10:42:43.738369929 +0100 *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + + extern char *bar[17]; + + int foo(int argc, char **argv) + { + int i; + int n = 0; + + for (i = 0; i argc; i++) + n++; + + for (i = 0; i argc; i++) + argv[i] = bar[i + n]; /* { dg-bogus above array bounds } */ + + return 0; + }
Re: [PATCH][AArch64] Use target builtin instead of __builtin_sqrt for vsqrt_f64
On 19/01/15 15:46, Kyrill Tkachov wrote: On 19/01/15 15:44, James Greenhalgh wrote: On Mon, Jan 12, 2015 at 05:30:46PM +, Andrew Pinski wrote: On Mon, Jan 12, 2015 at 7:52 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, As raised in https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01237.html and discussed in that thread, using __builtin_sqrt for vsqrt_f64 may end up in a call to the library sqrt at -O0. To avoid that this patch uses a target builtin for sqrt on DF mode and uses that to implement the intrinsic. With this patch I don't see sqrt calls being created at -O0 on a large arm_neon.h testcase where they were generated before. aarch64-none-elf testing and the intrinsics testsuite in particular are clean. Ok for trunk? Maybe have a target fold which folds this into sqrt if -fno-math-errno is supplied. This might be useful the -ffast-math case. Maybe also fold it when a constant is supplied too. Given that we are now in Stage 4, I'd rather see this fixed for GCC 5.0 in the way Kyrill proposed than languishing on a TODO list. Though an IOU ticket on bugzilla for the missed optimization seems a good idea to me. Unless Kyrill already has something in the works to address your comment, this looks like the right short-term solution to me (Though Marcus/Richard will have to approve it). Sorry, this slipped through the cracks. I agree with James. A missed-optimization issue on bugzilla would be helpful to keep track of this. I've filed PR 64821 to keep track of this for GCC 6. Can I ping https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00710.html then? It's a regression fix at -O0 so should be appropriate for stage4 Thanks, Kyrill Kyrill Thanks, James 2015-01-12 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-simd-builtins.def (sqrt): Use BUILTIN_VDQF_DF. * config/aarch64/arm_neon.h (vsqrt_f64): Use __builtin_aarch64_sqrtdf instead of __builtin_sqrt.
Re: [PATCH] Fix PR64277
2015-01-27 12:47 GMT+03:00 Richard Biener rguent...@suse.de: On Tue, 27 Jan 2015, Jakub Jelinek wrote: On Tue, Jan 27, 2015 at 10:25:48AM +0100, Richard Biener wrote: This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? So nothing in the testsuite needed to change? Nice. Yes. Ok for trunk. I'll search for duplicates and add a few testcases. Thanks. Committed as follows (first testcase in PR59124 not fixed - it warns from the first pass). Are you going to port it to 4.9 branch? Thanks, Ilya 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/56273 PR tree-optimization/59124 PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. * g++.dg/warn/Warray-bounds-6.C: New testcase. * gcc.dg/Warray-bounds-12.c: Likewise. * gcc.dg/Warray-bounds-13.c: Likewise. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2015-01-27 10:34:26.453743828 +0100 --- gcc/tree-vrp.c 2015-01-27 10:43:04.970610102 +0100 *** vrp_finalize (void) *** 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds) check_all_array_refs (); /* We must identify jump threading opportunities before we release --- 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release Index: gcc/testsuite/g++.dg/warn/Warray-bounds-6.C === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/g++.dg/warn/Warray-bounds-6.C 2015-01-27 10:40:31.311871855 +0100 *** *** 0 --- 1,26 + // { dg-do compile } + // { dg-options -O3 -Warray-bounds } + + struct type { + bool a, b; + bool get_b() { return b; } + }; + + type stuff[9u]; + + void bar(); + + void foo() + { + for(unsigned i = 0u; i 9u; i++) + { + if(!stuff[i].a) + continue; + + bar(); + + for(unsigned j = i + 1u; j 9u; j++) + if(stuff[j].a stuff[j].get_b()) // { dg-bogus above array bounds } + return; + } + } Index: gcc/testsuite/gcc.dg/Warray-bounds-12.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-12.c 2015-01-27 10:40:58.196175989 +0100 *** *** 0 --- 1,26 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + /* { dg-additional-options -mssse3 { target x86_64-*-* i?86-*-* } } */ + + void foo(short a[], short m) + { + int i, j; + int f1[10]; + short nc; + + nc = m + 1; + if (nc 3) + { + for (i = 0; i = nc; i++) + { + f1[i] = f1[i] + 1; + } + } + + for (i = 0, j = m; i nc; i++, j--) + { + a[i] = f1[i]; /* { dg-bogus above array bounds } */ + a[j] = i; + } + return; + } Index: gcc/testsuite/gcc.dg/Warray-bounds-13.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-13.c 2015-01-27 10:42:43.738369929 +0100 *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + + extern char *bar[17]; + + int foo(int argc, char **argv) + { + int i; + int n = 0; + + for (i = 0; i argc; i++) + n++; + + for (i = 0; i argc; i++) + argv[i] = bar[i + n]; /* { dg-bogus above array bounds } */ + + return 0; + }
Re: [PATCH] Fix for PR64741 (UBSan/ASan integration)
On Tue, Jan 27, 2015 at 09:19:20AM +0300, Yury Gribov wrote: As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64741 , ASan may currently report false positives for UBSan internal variables due to their incomplete type information. This patch fixes this. Bootstrapped and regtested on Linux x64. Ok to commit? -Y commit cf083510ece7b7bde1ab5a41e293b5a6a5bb4550 Author: Yury Gribov y.gri...@samsung.com Date: Mon Jan 26 10:19:03 2015 +0300 2015-01-26 Yury Gribov y.gri...@samsung.com PR ubsan/64741 * ubsan.c (ubsan_type_descriptor): Update type size. No extra newline between PR and * ubsan.c lines. --- a/gcc/ubsan.c +++ b/gcc/ubsan.c @@ -504,6 +504,14 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style pstyle) tinfo = get_ubsan_type_info_for_type (type); /* Create a new VAR_DECL of type descriptor. */ + const char *tmp = pp_formatted_text (pretty_name); + size_t len = strlen (tmp); + tree str = build_string (len + 1, tmp); + TREE_TYPE (str) = build_array_type (char_type_node, + build_index_type (size_int (len))); + TREE_READONLY (str) = 1; + TREE_STATIC (str) = 1; While touching this, could you please rewrite it as: const char *tmp = pp_formatted_text (pretty_name); size_t len = strlen (tmp) + 1; tree str = build_string (len, tmp); TREE_TYPE (str) = build_array_type_nelts (char_type_node, len); TREE_READONLY (str) = 1; TREE_STATIC (str) = 1; ? Or, if you want, do it as a follow-up. There is another occurrence of this in ubsan_source_location. Ok for trunk with or without this change. Jakub
Re: [PATCH] Update BBs in cleanup_barriers pass (PR rtl-optimization/61058)
Because reorder_insns doesn't handle the case of moving a barrier into a middle of basic block. Right, I should have read the audit trail. :-) The patch is OK then, but add a ??? note at the end of the comment saying that the proper thing to do here is probably not to run cleanup_barrier for this back-end. -- Eric Botcazou
Re: [PATCH] wide-int division fix (PR tree-optimization/64807)
On Mon, 26 Jan 2015, Jakub Jelinek wrote: Hi! On the following testcase we generate wrong code, because apparently divmod_internal_2 relies on 0 being the topmost element (at b_dividend[m]): algorithm. M is the number of significant elements of U however there needs to be at least one extra element of B_DIVIDEND allocated, N is the number of elements of B_DIVISOR. */ The comment talks just about allocation, but from the code it seems it really relies on it being 0. There is space for it: unsigned HOST_HALF_WIDE_INT b_dividend[(4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT) + 1]; unsigned HOST_HALF_WIDE_INT b_divisor[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; (the + 1), and usually there already is a zero in there: m = dividend_blocks_needed; while (m 1 b_dividend[m - 1] == 0) m--; so the only problematic case is if m isn't decreased. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2015-01-26 Jakub Jelinek ja...@redhat.com PR tree-optimization/64807 * wide-int.cc (wi::divmod_internal): Clear b_dividend[dividend_blocks_needed]. * gcc.dg/pr64807.c: New test. --- gcc/wide-int.cc.jj2015-01-09 21:59:38.0 +0100 +++ gcc/wide-int.cc 2015-01-26 19:21:56.114316481 +0100 @@ -1819,6 +1819,7 @@ wi::divmod_internal (HOST_WIDE_INT *quot divisor_blocks_needed, divisor_prec, sgn); m = dividend_blocks_needed; + b_dividend[m] = 0; while (m 1 b_dividend[m - 1] == 0) m--; --- gcc/testsuite/gcc.dg/pr64807.c.jj 2015-01-26 19:24:13.612943033 +0100 +++ gcc/testsuite/gcc.dg/pr64807.c2015-01-26 19:32:34.502237566 +0100 @@ -0,0 +1,19 @@ +/* PR tree-optimization/64807 */ +/* { dg-do run { target int128 } } */ +/* { dg-options -O2 } */ + +__uint128_t +foo (void) +{ + __uint128_t a = -1; + __uint128_t b = -1; + return a / b; +} + +int +main () +{ + if (foo () != 1) +__builtin_abort (); + return 0; +} Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
[PATCH, PR tree-optimization/64277] Improve loop iterations count estimation
Hi, This patch was supposed to fix PR tree-optimization/64277. Tracker is now fixed by warnings disabling but I think patch is still useful to avoid dead code generated by complete unroll. Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Ilya -- gcc/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-loop-niter.c (record_nonwrapping_iv): Use base range info when possible to refine estimation. gcc/testsuite/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/pr64277.c: New. diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c new file mode 100644 index 000..0d5ef11 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr64277.c @@ -0,0 +1,21 @@ +/* PR tree-optimization/64277 */ +/* { dg-do compile } */ +/* { dg-options -O3 -Wall -Werror } */ + + +int f1[10]; +void test1 (short a[], short m, unsigned short l) +{ + int i = l; + for (i = i + 5; i m; i++) +f1[i] = a[i]++; +} + +void test2 (short a[], short m, short l) +{ + int i; + if (m 5) +m = 5; + for (i = m; i l; i--) +f1[i] = a[i]++; +} diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 919f5c0..6a55c6f 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, { tree niter_bound, extreme, delta; tree type = TREE_TYPE (base), unsigned_type; + tree orig_base = base; if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step)) return; @@ -2777,16 +2778,32 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, if (tree_int_cst_sign_bit (step)) { + wide_int min, max; extreme = fold_convert (unsigned_type, low); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (high) == INTEGER_CST + !POINTER_TYPE_P (TREE_TYPE (orig_base)) + SSA_NAME_RANGE_INFO (orig_base) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (wide_int (high), max)) + base = wide_int_to_tree (unsigned_type, max); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, high); delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme); step = fold_build1 (NEGATE_EXPR, unsigned_type, step); } else { + wide_int min, max; extreme = fold_convert (unsigned_type, high); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (low) == INTEGER_CST + !POINTER_TYPE_P (TREE_TYPE (orig_base)) + SSA_NAME_RANGE_INFO (orig_base) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (min, wide_int (low))) + base = wide_int_to_tree (unsigned_type, min); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, low); delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base); }
Re: [PATCH] Update BBs in cleanup_barriers pass (PR rtl-optimization/61058)
On Tue, Jan 27, 2015 at 09:25:32AM +0100, Eric Botcazou wrote: Yes, they do, that is why it crashed during final. OK. Why wouldn't it work to call reorder_insns instead of reorder_insns_nobb? Because reorder_insns doesn't handle the case of moving a barrier into a middle of basic block. if (!BARRIER_P (from) (bb2 = BLOCK_FOR_INSN (from))) { if (BB_END (bb2) == to) BB_END (bb2) = prev; df_set_bb_dirty (bb2); } if (BB_END (bb) == after) BB_END (bb) = to; for (x = from; x != NEXT_INSN (to); x = NEXT_INSN (x)) if (!BARRIER_P (x)) df_insn_change_bb (x, bb); from == to is a BARRIER in this case, BB_END (bb) != after (BB_END is actually PREV_INSN (from)), so this doesn't do anything at all. While what we need is: 1) set BB_END to after 2) clear BLOCK_FOR_INSN on the notes after AFTER (after addition of barrier after FROM == TO) until former PREV_INSN (FROM) (inclusive) Jakub
Re: [Patch, Fortran, OOP] PR 64230: [4.9/5 Regression] Invalid memory reference in a compiler-generated finalizer for allocatable component
Janus Weil ja...@gcc.gnu.org writes: 2015-01-19 Janus Weil ja...@gcc.gnu.org PR fortran/64230 * gfortran.dg/class_allocate_18.f90: Extended. FAIL: gfortran.dg/class_allocate_18.f90 -O0 (test for excess errors) Excess errors: /usr/ia64-suse-linux/bin/ld: cannot find -lubsan Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[PATCH] Fix PR64277
This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? I'll search for duplicates and add a few testcases. Thanks, Richard. 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 220107) +++ gcc/tree-vrp.c (working copy) @@ -10229,7 +10197,7 @@ vrp_finalize (void) substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); - if (warn_array_bounds) + if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release
Re: [PATCH] Fix PR64277
On Tue, Jan 27, 2015 at 10:25:48AM +0100, Richard Biener wrote: This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? So nothing in the testsuite needed to change? Nice. Ok for trunk. I'll search for duplicates and add a few testcases. Thanks. 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c(revision 220107) +++ gcc/tree-vrp.c(working copy) @@ -10229,7 +10197,7 @@ vrp_finalize (void) substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); - if (warn_array_bounds) + if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release Jakub
Re: [debug-early] C++ clones and limbo DIEs
On 01/23/2015 01:45 PM, Aldy Hernandez wrote: It would expect [the flush] to be before free_lang_data and LTO streaming. The reason this wouldn't make a difference is because, as it stands, dwarf for the clones are not generated until final.c: if (!DECL_IGNORED_P (current_function_decl)) debug_hooks-function_decl (current_function_decl); which happens after free_lang_data. I agree that the current code doesn't have this effect, but we're talking about changing things, right? :) Unfortunately, this sets DECL_ABSTRACT_P for the static_p above, and refuses to unset it after the call to dwarf2out_decl. Well, that sounds like a bug. Why isn't it being unset? Is it because DECL_ABSTRACT_P was already set for the function, so we don't call set_decl_abstract_flags (decl, 0)? Perhaps a solution to that would be to avoid calling set_decl_abstract_flags (decl, 1) if the function is already marked as abstract. Or to teach set_decl_abstract_flags not to mess with static local variables. Jason
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On Tue, Jan 27, 2015 at 01:53:34PM -0700, Jeff Law wrote: I do have a specific PR in mind, but I cannot currently find it. It was about x86, dec mem and then using the flags... Must have sent 100 emails in that thread... And cannot find it now! Are you referring to 61225? That is the one, thanks. Segher
C++ PATCH for c++/58597 (lambda in default arg)
Here, sometimes we can end up in maybe_add_lambda_conv_op with current_function_decl set but not cfun. If we push_function_context in that case, the later pop doesn't clear cfun, but leaves it with a value that leads to a crash later on. So let's avoid calling push_function_context in that case. Tested x86_64-pc-linux-gnu, applying to trunk. commit ce65568ba19c4613c25f48064a0d5e66454265ac Author: Jason Merrill ja...@redhat.com Date: Tue Jan 27 14:26:18 2015 -0500 PR c++/58597 * lambda.c (maybe_add_lambda_conv_op): Check cfun rather than current_function_decl. diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 6c9e224..b160c8c 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -854,7 +854,7 @@ prepare_op_call (tree fn, int nargs) void maybe_add_lambda_conv_op (tree type) { - bool nested = (current_function_decl != NULL_TREE); + bool nested = (cfun != NULL); bool nested_def = decl_function_context (TYPE_MAIN_DECL (type)); tree callop = lambda_function (type); diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-defarg6.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-defarg6.C new file mode 100644 index 000..fe8767a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-defarg6.C @@ -0,0 +1,9 @@ +// PR c++/58597 +// { dg-do compile { target c++11 } } + +templatetypename struct A +{ + templatetypename T A(T, int = []{ return 0; }()) {} +}; + +Aint a = 0;
Bug 62044 - [4.8/4.9 Regression] ICE in USE statement with RENAME for extended derived type
Dear All, The highly embarrassing bug in mold = allocations to class entities has been fixed in revisions 220140 and 220191 for trunk and 4.9 respectively. The PR has been set as RESOLVED. Cheers Paul
[Patch, fortran] PR63205 - [OOP] Wrongly rejects type = class (for identical declared type)
Dear All, This patch enables the passing of an allocatable class object, scalar or array, to a derived type of the declared type, either in an assignment or as an actual argument. Much of the effort went into sorting out the finalization call so that the 'left over' allocatable components added by the dynamic type do not leak memory. At the moment, the existence of the finalization function is tested for. A check to see if the dynamic type is the same as the declared type could be added. Note that adding the 'must_finalize' field to gfc_expr will be useful in enabling the missing mandatory finalization calls. There are still interrogation marks about the patch; especially in build_class_array_ref, where I do not understand why the added code does not work in general, except for hidden function results. Nonetheless, the code does not leak memory, apart perhaps from the compound derived type constructors, with allocatable components that already show leaks elsewhere. It is also well ringfenced and so should not cause any regressions... touch wood! Bootstraps and regtests on x86_64/FC21 - OK for trunk? Paul 2015-01-27 Paul Thomas pa...@gcc.gnu.org PR fortran/63205 * gfortran.h: Add 'must finalize' field to gfc_expr and prototypes for gfc_is_alloc_class_scalar_function and for gfc_is_alloc_class_array_function. * expr.c (gfc_is_alloc_class_scalar_function, gfc_is_alloc_class_array_function): New functions. * trans-array.c (gfc_add_loop_ss_code): Do not move the expression for allocatable class scalar functions outside the loop. (conv_array_index_offset): Cope with deltas being NULL_TREE. (build_class_array_ref): Do not return with allocatable class array functions. Add code to pick out the returned class array. Dereference if necessary and return if not a class object. (gfc_conv_scalarized_array_ref): Cope with offsets being NULL. (gfc_walk_function_expr): Return an array ss for the result of an allocatable class array function. * trans-expr.c (gfc_conv_subref_array_arg): Remove the assert that the argument should be a variable. If an allocatable class array function, set the offset to zero and skip the write-out loop in this case. (gfc_conv_procedure_call): Add allocatable class array function to the assert. Call gfc_conv_subref_array_arg for allocatable class array function arguments with derived type formal arg.. Add the code for handling allocatable class functions, including finalization calls to prevent memory leaks. (arrayfunc_assign_needs_temporary): Return if an allocatable class array function. (gfc_trans_assignment_1): Set must_finalize to rhs expression for allocatable class functions. Set scalar_to_array as needed for scalar class allocatable functions assigned to an array. Nullify the allocatable components corresponding the the lhs derived type so that the finalization does not free them. 2015-01-27 Paul Thomas pa...@gcc.gnu.org PR fortran/63205 * gfortran.dg/class_to_type_4.f90: New test Index: gcc/fortran/gfortran.h === *** gcc/fortran/gfortran.h (revision 208092) --- gcc/fortran/gfortran.h (working copy) *** typedef struct gfc_expr *** 1753,1758 --- 1753,1761 /* Mark an expression as being a MOLD argument of ALLOCATE. */ unsigned int mold : 1; + /* Will require finalization after use. */ + unsigned int must_finalize : 1; + /* If an expression comes from a Hollerith constant or compile-time evaluation of a transfer statement, it may have a prescribed target- memory representation, and these cannot always be backformed from *** bool gfc_expr_check_typed (gfc_expr*, gf *** 2804,2809 --- 2807,2814 gfc_component * gfc_get_proc_ptr_comp (gfc_expr *); bool gfc_is_proc_ptr_comp (gfc_expr *); + bool gfc_is_alloc_class_scalar_function (gfc_expr *); + bool gfc_is_alloc_class_array_function (gfc_expr *); bool gfc_ref_this_image (gfc_ref *ref); bool gfc_is_coindexed (gfc_expr *); Index: gcc/fortran/expr.c === *** gcc/fortran/expr.c (revision 208092) --- gcc/fortran/expr.c (working copy) *** gfc_is_proc_ptr_comp (gfc_expr *expr) *** 4274,4279 --- 4274,4313 } + /* Determine if an expression is a function with an allocatable class scalar +result. */ + bool + gfc_is_alloc_class_scalar_function (gfc_expr *expr) + { + if (expr-expr_type == EXPR_FUNCTION +expr-value.function.esym +expr-value.function.esym-result +expr-value.function.esym-result-ts.type == BT_CLASS +!CLASS_DATA (expr-value.function.esym-result)-attr.dimension +CLASS_DATA (expr-value.function.esym-result)-attr.allocatable) + return true; + + return false; + } + + + /* Determine if an expression is a
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On Tue, Jan 27, 2015 at 12:27:38PM -0700, Jeff Law wrote: On 01/26/15 22:11, Segher Boessenkool wrote: On Mon, Jan 26, 2015 at 08:07:29PM -0700, Jeff Law wrote: The second change we need is an additional simplification. If we have (subreg:M1 (zero_extend:M2 (x)) Where M1 M2 and both are scalar integer modes. It's advantageous to strip the SUBREG and instead have a wider extension. Should you also check M1 is not multiple registers? We're generally working with pseudos, so we could estimate, but not know for sure if we're dealing with multiple hard regs. But more importantly, I'm not sure what that check would buy us. I mean e.g. DI on a 32-bit target. My worry is that zero_extend:DI then is more expensive -- if say, it is implemented as a split, combine itself cannot get rid of the redundancy. Earlier versions checked reg_equal_p on the MEM. But that's often a mistake because the modes of the two memory references may be different. I don't recall which of the various tests, but I was definitely seeing SImode in the load and HImode in the store. Similarly you don't want to check reg_equal_p on the addresses as they aren't necessarily the same either (they're obviously related). That's how I ultimately settled on rtx_referenced_p form you see above. I'm still not sure that's 100% what I want, but I don't have any tests yet which require something more complex. Okay, if there are actual real cases like that :-) All this code does is cull cases that are not useful to try to combine, since without that combining four insns is very expensive. Does this do anything good for the dec mem thing on x86? That would be a nice bonus :-) It might, but I haven't tested for that specifically. If you've got sample code or a PR in mind, pass it along and I'll take a look. I'd think dec mem would generally be handled by 3-1 insn combination code unless there's something else going on. I do have a specific PR in mind, but I cannot currently find it. It was about x86, dec mem and then using the flags... Must have sent 100 emails in that thread... And cannot find it now! Segher
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On 01/27/15 13:36, Segher Boessenkool wrote: On Tue, Jan 27, 2015 at 12:27:38PM -0700, Jeff Law wrote: On 01/26/15 22:11, Segher Boessenkool wrote: On Mon, Jan 26, 2015 at 08:07:29PM -0700, Jeff Law wrote: The second change we need is an additional simplification. If we have (subreg:M1 (zero_extend:M2 (x)) Where M1 M2 and both are scalar integer modes. It's advantageous to strip the SUBREG and instead have a wider extension. Should you also check M1 is not multiple registers? We're generally working with pseudos, so we could estimate, but not know for sure if we're dealing with multiple hard regs. But more importantly, I'm not sure what that check would buy us. I mean e.g. DI on a 32-bit target. My worry is that zero_extend:DI then is more expensive -- if say, it is implemented as a split, combine itself cannot get rid of the redundancy. We might lose for a case like (subreg:DI ({zero,sign}_extend:SI (x))) on a 32 bit target if something were able to recognize that the upper bits were don't cares. The most likely place for that to happen would be at assembly output time -- but that would require the target to exploit the don't care semantics of those bits. I don't recall any port doing that. We could exploit this in generic splitting code, but I don't think we do. lower-subreg slams in a zero when it finds a paradoxical subreg and we've asked for the high word. I don't immediately see that does anything special when the operand of the subreg is anything other than another reg or mem. combine exploits the don't care nature of those bits to eliminate masking and such. It's not going to be able to eliminate the subreg entirely unless it folks into some later insn and we're ultimately able to narrow the operation back down to SImode. Also note that while ports may not have special cases around the subreg variant, several have special cases for ZERO_EXTEND. Basically they slam in a zero to the upper word, either via a splitter or during assembly code output. Those special cases will be recognized more often now. Jeff
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On 01/27/15 13:36, Segher Boessenkool wrote: I mean e.g. DI on a 32-bit target. My worry is that zero_extend:DI then is more expensive -- if say, it is implemented as a split, combine itself cannot get rid of the redundancy. OK. Let me play with that a bit. Okay, if there are actual real cases like that :-) All this code does is cull cases that are not useful to try to combine, since without that combining four insns is very expensive. There are :-) It surprised me as well. Does this do anything good for the dec mem thing on x86? That would be a nice bonus :-) It might, but I haven't tested for that specifically. If you've got sample code or a PR in mind, pass it along and I'll take a look. I'd think dec mem would generally be handled by 3-1 insn combination code unless there's something else going on. I do have a specific PR in mind, but I cannot currently find it. It was about x86, dec mem and then using the flags... Must have sent 100 emails in that thread... And cannot find it now! Are you referring to 61225? Jeff
Re: Fix 59828 - Broken assembly on ppc* with two -mcpu= options
On Wed, Jan 21, 2015 at 02:01:44PM -0500, David Edelsohn wrote: I want to avoid duplicating the -mcpu parsing logic or the Rube Goldberg mechanism to re-generate the -mXXX assembler directive. Oh well, I had fun writing the patch. I thought it reasonably elegant, meeting the goals you state above. You think differently, and I won't push my approach further. The bug isn't important enough to argue over. -- Alan Modra Australia Development Lab, IBM
RE: [Patch][wwwdocs]Deprecate the ARM TPCS related options in gcc 5.0
-Original Message- From: Gerald Pfeifer [mailto:ger...@pfeifer.com] Sent: Monday, January 26, 2015 7:34 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan Subject: Re: [Patch][wwwdocs]Deprecate the ARM TPCS related options in gcc 5.0 On Monday 2015-01-26 16:47, Terry Guo wrote: This patch intends to update gcc 5.0 change.html to deprecate TPCS related options because TPCS is obsoleted per the ABI document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapc s.pdf. Is it OK? From a language perspective I suggest to say The options list the options here related to the old ABI... or The options related to the old ABI -- list the options -- ..., where I somewhat prefer the former. Please wait for Richard or Ramana for final review and approval. Gerald Thanks Gerald. Patch is updated. Is this one OK? BR, Terry Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.73 diff -u -p -r1.73 changes.html --- htdocs/gcc-5/changes.html 26 Jan 2015 09:40:03 - 1.73 +++ htdocs/gcc-5/changes.html 27 Jan 2015 09:35:32 - @@ -513,8 +513,9 @@ void operator delete[] (void *, std::siz li The deprecated option code-mwords-little-endian/code has been removed. /li - li The options relating to the old ABI code-mapcs/code and - code-mapcs-frame/code have been deprecated. + li The options code-mapcs/code, code-mapcs-frame/code, + code-mtpcs-frame/code and code-mtpcs-leaf-frame/code + which are only applicable to the old ABI have been deprecated. /li liThe transitional options code-mlra/code and code-mno-lra/code have been removed. The ARM backend now uses the local register allocator
Re: Fix 59828 - Broken assembly on ppc* with two -mcpu= options
On Tue, Jan 27, 2015 at 7:27 PM, Alan Modra amo...@gmail.com wrote: On Wed, Jan 21, 2015 at 02:01:44PM -0500, David Edelsohn wrote: I want to avoid duplicating the -mcpu parsing logic or the Rube Goldberg mechanism to re-generate the -mXXX assembler directive. Oh well, I had fun writing the patch. I thought it reasonably elegant, meeting the goals you state above. You think differently, and I won't push my approach further. The bug isn't important enough to argue over. Alan, I am sorry that you do not want to finish the patch. I don't understand why you find the command line argument so appealing when the .machine pseudo-op was designed for this purpose. Thanks, David
Re: [PATCH][AArch64] Improve bit-test-branch pattern to avoid unnecessary register clobber
On Tue, Jan 27, 2015 at 02:31:14PM +, Jiong Wang wrote: testcase changed to execution version, and moved to gcc.dg. the compile time only take several seconds. (previously I am using cc1 built by O0 which at most take 24s) ok to install? Ok for the testcase. The config/aarch64/ bits I'll defer to aarch64 maintainers. 2015-01-19 Ramana Radhakrishnan ramana.radhakrish...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (tboptabmode1): Clobber CC reg instead of scratch reg. (cboptabmode1): Likewise. * config/aarch64/iterators.md (bcond): New define_code_attr. gcc/testsuite/ * gcc.dg/long_branch.c: New testcase. Jakub
RE: [PATCH RFA MIPS] Prohibit vector modes in accumulators
-Original Message- From: Matthew Fortune [mailto:matthew.fort...@imgtec.com] Sent: Tuesday, January 27, 2015 7:19 AM To: Richard Sandiford Cc: Robert Suchanek; gcc-patches@gcc.gnu.org; Moore, Catherine Subject: RE: [PATCH RFA MIPS] Prohibit vector modes in accumulators Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: 2015-01-23 Robert Suchanek robert.sucha...@imgtec.com * config/mips/mips.c (mips_hard_regno_mode_ok_p): Prohibit accumulators for all vector modes. This seems like a genuine bug and although it can only be triggered by loongson or paired-single support it probably qualifies for fixing. Agreed FWIW. We shouldn't mark something as valid for a mode if even the mode's move pattern can't handle it. I think this kind of thing should go in regardless of development stage. Given that it was one of the pre-existing tests that failed I'm happy that we are covering this issue. All of these LRA related issues are likely to phase in and out with subtle changes to code-gen so I don't think we can always get a test case that fails on trunk. That's true. Since Catherine asked for further info then I will leave her to say if she is happy to accept on this basis. I withdraw my request for a testcase. Catherine
[RFC PATCH] Avoid most of the BUILT_IN_*_CHKP enum values
Hi! I've grepped for BUILT_IN_.*_CHKP in the sources and we actually need far fewer enum values than the 1204 that are being defined. This patch requires builtins.def to say explicitly (by using DEF_*BUILTIN_CHKP macro instead of corresponding DEF_*BUILTIN) which ones need that, for all the others only space in the enum is reserved and nothing else. I'd hope this could work around the buggy AIX stabs handling, but even on say x86_64-linux it has a benefit of decreasing cc1plus .debug_info by about 2.7MB (of course, with dwz that benefit goes to almost nothing, just the ~ 7000 bytes or so, plus .debug_str cost (that is merged even without dwz between TUs). The cost without dwz is obviously mainly from repeating that in most of the translation units. But why declare BUILT_IN_*_CHKP enums that are never used by anything... 2015-01-27 Jakub Jelinek ja...@redhat.com * builtins.def (DEF_BUILTIN_CHKP): Define if not defined. (DEF_LIB_BUILTIN_CHKP, DEF_EXT_LIB_BUILTIN_CHKP): Redefine. (DEF_CHKP_BUILTIN): Define using DEF_BUILTIN_CHKP instead of DEF_BUILTIN. (BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCAT, BUILT_IN_STRCHR, BUILT_IN_STRCPY, BUILT_IN_STRLEN): Use DEF_LIB_BUILTIN_CHKP macro instead of DEF_LIB_BUILTIN. (BUILT_IN_MEMCPY_CHK, BUILT_IN_MEMMOVE_CHK, BUILT_IN_MEMPCPY_CHK, BUILT_IN_MEMPCPY, BUILT_IN_MEMSET_CHK, BUILT_IN_STPCPY_CHK, BUILT_IN_STPCPY, BUILT_IN_STRCAT_CHK, BUILT_IN_STRCPY_CHK): Use DEF_EXT_LIB_BUILTIN_CHKP macro instead of DEF_EXT_LIB_BUILTIN. * tree-core.h (enum built_in_function): In between BEGIN_CHKP_BUILTINS and END_CHKP_BUILTINS only define enum values for builtins that use DEF_BUILTIN_CHKP macro. --- gcc/builtins.def.jj 2015-01-15 23:39:10.0 +0100 +++ gcc/builtins.def2015-01-27 15:04:44.860924664 +0100 @@ -63,6 +63,16 @@ along with GCC; see the file COPYING3. The builtins is registered only if COND is true. */ +/* A macro for builtins where the + BUILT_IN_*_CHKP = BUILT_IN_* + BEGIN_CHKP_BUILTINS + 1 + enums should be defined too. */ +#ifndef DEF_BUILTIN_CHKP +#define DEF_BUILTIN_CHKP(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, \ +FALLBACK_P, NONANSI_P, ATTRS, IMPLICIT, COND) \ + DEF_BUILTIN(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, FALLBACK_P,\ + NONANSI_P, ATTRS, IMPLICIT, COND) +#endif + /* A GCC builtin (like __builtin_saveregs) is provided by the compiler, but does not correspond to a function in the standard library. */ @@ -87,6 +97,10 @@ along with GCC; see the file COPYING3. #define DEF_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, false, ATTRS, true, true) +#undef DEF_LIB_BUILTIN_CHKP +#define DEF_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, false, ATTRS, true, true) /* Like DEF_LIB_BUILTIN, except that the function is not one that is specified by ANSI/ISO C. So, when we're being fully conformant we @@ -96,6 +110,10 @@ along with GCC; see the file COPYING3. #define DEF_EXT_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, true, ATTRS, false, true) +#undef DEF_EXT_LIB_BUILTIN_CHKP +#define DEF_EXT_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, true, ATTRS, false, true) /* Like DEF_LIB_BUILTIN, except that the function is only a part of the standard in C94 or above. */ @@ -199,8 +217,8 @@ along with GCC; see the file COPYING3. /* Builtin used by the implementation of Pointer Bounds Checker. */ #undef DEF_CHKP_BUILTIN #define DEF_CHKP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ - DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ - true, true, false, ATTRS, true, true) + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, false, ATTRS, true, true) /* Define an attribute list for math functions that are normally impure because some of them may write into global memory for @@ -595,22 +613,22 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_BZERO, DEF_EXT_LIB_BUILTIN(BUILT_IN_INDEX, index, BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, memchr, BT_FN_PTR_CONST_PTR_INT_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, memcmp, BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_MEMCPY, memcpy, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_MEMMOVE, memmove,
Re: RFA: patch to fix a bad code generation for PR64110 -- new constraints addition
On 01/27/15 07:08, Richard Sandiford wrote: Yeah, but in practice that's only ever going to be a partial transition. Many port maintainers won't look at this, so we'll have to support both versions indefinitely, even if the new behaviour turns out to be the best for all cases. Yes, most likely. I find myself pondering the related question of how we get ports to transition to LRA and if we could tie these together. Maintainers are going to need to transition to LRA if we're ever going to start removing blobs of reload. As a part of that transition they're presumably going to be looking closely at their backend and could make the constraint transition. In an ideal world, we'd declare release X.Y has a cut-off point. Ports that haven't transitioned to LRA get deprecated at that point. Those ports are the ones most likely not to make the constraint transition as well. I think we would have to consider any uses of ?! that remain after that point as intentional. I just think we're going to regret having two sets of constraints with such subtly different meanings. But isn't that inevitable? While I suspect that most instances of ?! should be converted, there may be some that should not. If that's the case then we're going to have both forever. Looking back at the original PR, Jakub said: The ! has been added by me for PR63594, so it isn't there from the era when i?86 backend was using reload. If there is a better way to express that RA should prefer to use memory or xmm register and only use r constraint if it already is in a r register and doesn't need to be reloaded, I can use that. Whether it is ?, ??? or something else. ! description in gcc docs just fitted most what I wanted... In some ways this seems to match the intention of *. Originally I think it was just an RA-only thing and was ignored by reload, but LRA does take it into account too (which sounds like progress to me). If I revert the patch locally and change the *vec_dupmode pattern to use *, it passes both the test for PR64110 and the tests for PR63594. Would that be OK as an alternative? I think that's up to Uros and Jakub to sort out. Jeff
Re: [PATCH] Fix ada bootstrap under cygwin-64
this patch fixes the ada bootstrap under cygwin-64. Boot-strapped under x86_64-pc-cygwin. OK for trunk? OK
Re: [RFC PATCH] Avoid most of the BUILT_IN_*_CHKP enum values
2015-01-27 17:27 GMT+03:00 Jakub Jelinek ja...@redhat.com: Hi! I've grepped for BUILT_IN_.*_CHKP in the sources and we actually need far fewer enum values than the 1204 that are being defined. This patch requires builtins.def to say explicitly (by using DEF_*BUILTIN_CHKP macro instead of corresponding DEF_*BUILTIN) which ones need that, for all the others only space in the enum is reserved and nothing else. I'd hope this could work around the buggy AIX stabs handling, but even on say x86_64-linux it has a benefit of decreasing cc1plus .debug_info by about 2.7MB (of course, with dwz that benefit goes to almost nothing, just the ~ 7000 bytes or so, plus .debug_str cost (that is merged even without dwz between TUs). The cost without dwz is obviously mainly from repeating that in most of the translation units. But why declare BUILT_IN_*_CHKP enums that are never used by anything... Enum values not mentioned in the code are not fully useless. When we have builtin functions defined as 'always_inline' functions, they are instrumented and enum names may be used in dumps and debugging. That's not a big value though. Thanks a lot for taking care of it! Ilya 2015-01-27 Jakub Jelinek ja...@redhat.com * builtins.def (DEF_BUILTIN_CHKP): Define if not defined. (DEF_LIB_BUILTIN_CHKP, DEF_EXT_LIB_BUILTIN_CHKP): Redefine. (DEF_CHKP_BUILTIN): Define using DEF_BUILTIN_CHKP instead of DEF_BUILTIN. (BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCAT, BUILT_IN_STRCHR, BUILT_IN_STRCPY, BUILT_IN_STRLEN): Use DEF_LIB_BUILTIN_CHKP macro instead of DEF_LIB_BUILTIN. (BUILT_IN_MEMCPY_CHK, BUILT_IN_MEMMOVE_CHK, BUILT_IN_MEMPCPY_CHK, BUILT_IN_MEMPCPY, BUILT_IN_MEMSET_CHK, BUILT_IN_STPCPY_CHK, BUILT_IN_STPCPY, BUILT_IN_STRCAT_CHK, BUILT_IN_STRCPY_CHK): Use DEF_EXT_LIB_BUILTIN_CHKP macro instead of DEF_EXT_LIB_BUILTIN. * tree-core.h (enum built_in_function): In between BEGIN_CHKP_BUILTINS and END_CHKP_BUILTINS only define enum values for builtins that use DEF_BUILTIN_CHKP macro. --- gcc/builtins.def.jj 2015-01-15 23:39:10.0 +0100 +++ gcc/builtins.def2015-01-27 15:04:44.860924664 +0100 @@ -63,6 +63,16 @@ along with GCC; see the file COPYING3. The builtins is registered only if COND is true. */ +/* A macro for builtins where the + BUILT_IN_*_CHKP = BUILT_IN_* + BEGIN_CHKP_BUILTINS + 1 + enums should be defined too. */ +#ifndef DEF_BUILTIN_CHKP +#define DEF_BUILTIN_CHKP(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, \ +FALLBACK_P, NONANSI_P, ATTRS, IMPLICIT, COND) \ + DEF_BUILTIN(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, FALLBACK_P,\ + NONANSI_P, ATTRS, IMPLICIT, COND) +#endif + /* A GCC builtin (like __builtin_saveregs) is provided by the compiler, but does not correspond to a function in the standard library. */ @@ -87,6 +97,10 @@ along with GCC; see the file COPYING3. #define DEF_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, false, ATTRS, true, true) +#undef DEF_LIB_BUILTIN_CHKP +#define DEF_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, false, ATTRS, true, true) /* Like DEF_LIB_BUILTIN, except that the function is not one that is specified by ANSI/ISO C. So, when we're being fully conformant we @@ -96,6 +110,10 @@ along with GCC; see the file COPYING3. #define DEF_EXT_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, true, ATTRS, false, true) +#undef DEF_EXT_LIB_BUILTIN_CHKP +#define DEF_EXT_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, true, ATTRS, false, true) /* Like DEF_LIB_BUILTIN, except that the function is only a part of the standard in C94 or above. */ @@ -199,8 +217,8 @@ along with GCC; see the file COPYING3. /* Builtin used by the implementation of Pointer Bounds Checker. */ #undef DEF_CHKP_BUILTIN #define DEF_CHKP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ - DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ - true, true, false, ATTRS, true, true) + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, false, ATTRS, true, true) /* Define an attribute list for math functions that are normally impure because some of them may write into global memory for @@ -595,22 +613,22 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_BZERO, DEF_EXT_LIB_BUILTIN(BUILT_IN_INDEX, index, BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF)
[PATCH] Add comdat_group effective target (PR bootstrap/64612)
Hi! This patch introduces a new effective target check and adds it to the pr64612.C - if comdat groups aren't used, there is no guarantee that the D2 dtor will be emitted always alongside of D1 dtor. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-01-27 Jakub Jelinek ja...@redhat.com PR bootstrap/64612 * lib/target-supports.exp (check_effective_target_comdat_group): New. * g++.dg/ipa/pr64612.C: Guard scan-assembler test with { target comdat_group }. * doc/sourcebuild.texi (comdat_group): Document. --- gcc/testsuite/lib/target-supports.exp.jj2015-01-15 23:39:06.0 +0100 +++ gcc/testsuite/lib/target-supports.exp 2015-01-26 15:24:55.325236098 +0100 @@ -6198,3 +6198,13 @@ proc check_effective_target_pie_copyrelo return $pie_copyreloc_available_saved } + +# Return 1 if the target uses comdat groups. + +proc check_effective_target_comdat_group {} { +return [check_no_messages_and_pattern comdat_group \.section\[^\n\r]*,comdat assembly { + // C++ + inline int foo () { return 1; } + int (*fn) () = foo; +}] +} --- gcc/testsuite/g++.dg/ipa/pr64612.C.jj 2015-01-26 15:25:43.301410027 +0100 +++ gcc/testsuite/g++.dg/ipa/pr64612.C 2015-01-26 15:23:11.380025863 +0100 @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options -O3 -std=c++11 } */ -/* { dg-final { scan-assembler _ZN5QListI7QStringED1Ev } } */ +/* { dg-final { scan-assembler _ZN5QListI7QStringED1Ev { target comdat_group } } } */ class A { --- gcc/doc/sourcebuild.texi.jj 2015-01-15 23:39:02.0 +0100 +++ gcc/doc/sourcebuild.texi2015-01-27 16:07:37.504081520 +0100 @@ -1930,6 +1930,9 @@ Target supports @code{wchar_t} that is c @item wchar_t_char32_t_compatible Target supports @code{wchar_t} that is compatible with @code{char32_t}. + +@item comdat_group +Target uses comdat groups. @end table @subsubsection Local to tests in @code{gcc.target/i386} Jakub
[PATCH PR64809]
Hi All, Here is a simple patch that cures ICE - skip debug gimples. Test is also included. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? ChangeLog: 2015-01-27 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/64809 * cfgexpand.c (reorder_operands): Skip debug gimples. gcc/testsuite/ChangeLog * gcc.dg/pr64809.c: New test. patch Description: Binary data
Re: [Patch, Fortran] PR64771 - Fix coarray ICE
Steve Kargl s...@troutmask.apl.washington.edu writes: On Sat, Jan 24, 2015 at 06:13:04PM +0100, Tobias Burnus wrote: if (s1-as-type == AS_EXPLICIT) -for (i = 0; i s1-as-rank + s1-as-corank; i++) +for (i = 0; i s1-as-rank + std::max(0, s1-as-corank-1); i++) Doesn't this require '#include algorithms'? I suspect that you are depending on namespace pollution via some other header (coretypes.h?). It was committed with that change, which unfortunately broke Solaris bootstrap: In file included from ./config.h:6:0, from /vol/gcc/src/hg/trunk/local/gcc/fortran/interface.c:68: ./auto-host.h:2055:0: error: _FILE_OFFSET_BITS redefined [-Werror] #define _FILE_OFFSET_BITS 64 ^ In file included from /usr/include/iso/stdlib_iso.h:24:0, from /usr/include/stdlib.h:11, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/cstdlib:72, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_algo.h:59, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/algorithm:62, from /vol/gcc/src/hg/trunk/local/gcc/fortran/interface.c:66: /var/gcc/regression/trunk/11-gcc/build/prev-gcc/include-fixed/sys/feature_tests.h:213:0: note: this is the location of the previous definition #define _FILE_OFFSET_BITS 32 ^ The problem is (as so often) that algorithm was included *before* config.h. Moving it after the other includes allows interface.c to compile without warnings. Ok for mainline? Rainer 2015-01-27 Rainer Orth r...@cebitec.uni-bielefeld.de gcc/fortran: * interface.c: Include algorithm after config.h diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -63,8 +63,6 @@ along with GCC; see the file COPYING3. formal argument list points to symbols within the same namespace as the program unit name. */ -#include algorithm /* For std::max. */ - #include config.h #include system.h #include coretypes.h @@ -73,6 +71,8 @@ along with GCC; see the file COPYING3. #include match.h #include arith.h +#include algorithm /* For std::max. */ + /* The current_interface structure holds information about the interface currently being parsed. This structure is saved and restored during recursive interfaces. */ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [Patch, Fortran] PR64771 - Fix coarray ICE
On Tue, Jan 27, 2015 at 03:55:17PM +0100, Rainer Orth wrote: Steve Kargl s...@troutmask.apl.washington.edu writes: On Sat, Jan 24, 2015 at 06:13:04PM +0100, Tobias Burnus wrote: if (s1-as-type == AS_EXPLICIT) - for (i = 0; i s1-as-rank + s1-as-corank; i++) + for (i = 0; i s1-as-rank + std::max(0, s1-as-corank-1); i++) Doesn't this require '#include algorithms'? I suspect that you are depending on namespace pollution via some other header (coretypes.h?). It was committed with that change, which unfortunately broke Solaris bootstrap: In file included from ./config.h:6:0, from /vol/gcc/src/hg/trunk/local/gcc/fortran/interface.c:68: ./auto-host.h:2055:0: error: _FILE_OFFSET_BITS redefined [-Werror] #define _FILE_OFFSET_BITS 64 ^ In file included from /usr/include/iso/stdlib_iso.h:24:0, from /usr/include/stdlib.h:11, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/cstdlib:72, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_algo.h:59, from /var/gcc/regression/trunk/11-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/algorithm:62, from /vol/gcc/src/hg/trunk/local/gcc/fortran/interface.c:66: /var/gcc/regression/trunk/11-gcc/build/prev-gcc/include-fixed/sys/feature_tests.h:213:0: note: this is the location of the previous definition #define _FILE_OFFSET_BITS 32 ^ The problem is (as so often) that algorithm was included *before* config.h. Moving it after the other includes allows interface.c to compile without warnings. Why don't you use MAX macro instead of std::max as everywhere else in the gcc sources? Your change is wrong, you can't include system headers after including system.h and other headers. Jakub
Re: [PATCH][AArch64] Improve bit-test-branch pattern to avoid unnecessary register clobber
On 27 January 2015 at 14:31, Jiong Wang jiong.w...@arm.com wrote: 2015-01-19 Ramana Radhakrishnan ramana.radhakrish...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (tboptabmode1): Clobber CC reg instead of scratch reg. (cboptabmode1): Likewise. * config/aarch64/iterators.md (bcond): New define_code_attr. OK /Marcus gcc/testsuite/ * gcc.dg/long_branch.c: New testcase.
Re: [PATCH][AArch64] Improve bit-test-branch pattern to avoid unnecessary register clobber
On 19/01/15 10:58, Jakub Jelinek wrote: On Mon, Jan 19, 2015 at 10:52:14AM +, Ramana Radhakrishnan wrote: What is aarch64 specific on the testcase? The number of if-then-else's required to get the compiler to generate cmp branch sequences rather than the tbnz instruction. That doesn't mean the same testcase couldn't be tested on other targets and perhaps find bugs in there. That said, if the testcase is too expensive to compile (several seconds is ok, minutes is not), then perhaps it shouldn't be included at all, or should be guarded with run_expensive_tests target. Jakub testcase changed to execution version, and moved to gcc.dg. the compile time only take several seconds. (previously I am using cc1 built by O0 which at most take 24s) ok to install? Thanks. 2015-01-19 Ramana Radhakrishnan ramana.radhakrish...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (tboptabmode1): Clobber CC reg instead of scratch reg. (cboptabmode1): Likewise. * config/aarch64/iterators.md (bcond): New define_code_attr. gcc/testsuite/ * gcc.dg/long_branch.c: New testcase. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 597ff8c..1e00396 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -466,13 +466,17 @@ (const_int 0)) (label_ref (match_operand 2 )) (pc))) - (clobber (match_scratch:DI 3 =r))] + (clobber (reg:CC CC_REGNUM))] - * - if (get_attr_length (insn) == 8) -return \ubfx\\t%w3, %w0, %1, #1\;cbz\\t%w3, %l2\; - return \tbz\\t%w0, %1, %l2\; - + { +if (get_attr_length (insn) == 8) + { + operands[1] = GEN_INT (HOST_WIDE_INT_1U UINTVAL (operands[1])); + return tst\t%w0, %1\;bcond\t%l2; + } +else + return tbz\t%w0, %1, %l2; + } [(set_attr type branch) (set (attr length) (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768)) @@ -486,13 +490,21 @@ (const_int 0)) (label_ref (match_operand 1 )) (pc))) - (clobber (match_scratch:DI 2 =r))] + (clobber (reg:CC CC_REGNUM))] - * - if (get_attr_length (insn) == 8) -return \ubfx\\t%w2, %w0, sizem1, #1\;cbz\\t%w2, %l1\; - return \tbz\\t%w0, sizem1, %l1\; - + { +if (get_attr_length (insn) == 8) + { + char buf[64]; + uint64_t val = ((uint64_t ) 1) + (GET_MODE_SIZE (MODEmode) * BITS_PER_UNIT - 1); + sprintf (buf, tst\t%%w0, %PRId64, val); + output_asm_insn (buf, operands); + return bcond\t%l1; + } +else + return tbz\t%w0, sizem1, %l1; + } [(set_attr type branch) (set (attr length) (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768)) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 7dd3917..bd144f9 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -823,6 +823,9 @@ (smax s) (umax u) (smin s) (umin u)]) +;; Emit conditional branch instructions. +(define_code_attr bcond [(eq beq) (ne bne) (lt bne) (ge beq)]) + ;; Emit cbz/cbnz depending on comparison type. (define_code_attr cbz [(eq cbz) (ne cbnz) (lt cbnz) (ge cbz)]) diff --git a/gcc/testsuite/gcc.dg/long_branch.c b/gcc/testsuite/gcc.dg/long_branch.c new file mode 100644 index 000..f388a80 --- /dev/null +++ b/gcc/testsuite/gcc.dg/long_branch.c @@ -0,0 +1,198 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-reorder-blocks } */ + +void abort (); + +__attribute__((noinline, noclone)) int +restore (int a, int b) +{ + return a * b; +} + +__attribute__((noinline, noclone)) void +do_nothing (int *input) +{ + *input = restore (*input, 1); + return; +} + +#define CASE_ENTRY(n) \ + case n: \ +sum = sum / (n + 1); \ +sum = restore (sum, n + 1); \ +if (sum == (n + addend)) \ + break;\ +sum = sum / (n + 2); \ +sum = restore (sum, n + 2); \ +sum = sum / (n + 3); \ +sum = restore (sum, n + 3); \ +sum = sum / (n + 4); \ +sum = restore (sum, n + 4); \ +sum = sum / (n + 5); \ +sum = restore (sum, n + 5); \ +sum = sum / (n + 6); \ +sum = restore (sum, n + 6); \ +sum = sum / (n + 7); \ +sum = restore (sum, n + 7); \ +sum = sum / (n + 8); \ +sum = restore (sum, n + 8); \ +sum = sum / (n + 9); \ +sum = restore (sum, n + 9); \ +sum = sum / (n + 10); \ +sum = restore (sum, n + 10); \ +sum = sum / (n + 11); \ +sum = restore (sum, n + 11); \ +sum = sum / (n + 12); \ +sum = restore (sum, n + 12); \ +sum = sum / (n + 13); \ +sum = restore (sum, n + 13); \ +sum = sum / (n + 14); \ +sum = restore (sum, n + 14); \ +sum = sum / (n + 15); \ +sum = restore (sum, n + 15); \ +sum = sum / (n + 16); \ +sum = restore (sum, n + 16); \ +sum = sum / (n + 17); \ +sum = restore (sum, n + 17); \ +sum = sum / (n + 18); \ +sum = restore (sum, n + 18); \ +sum = sum / (n + 19); \ +sum = restore (sum, n + 19);
[PATCH] Fix ada bootstrap under cygwin-64
Hi, this patch fixes the ada bootstrap under cygwin-64. Boot-strapped under x86_64-pc-cygwin. OK for trunk? Thanks Bernd. 2015-01-27 Bernd Edlinger bernd.edlin...@hotmail.de Fix build under cygwin/64. * adaint.h: Add check for __CYGWIN__. * mingw32.h: Prevent windows.h from including x86intrin.h in GCC. patch-ada-cygwin64.diff Description: Binary data
Re: RFA: patch to fix a bad code generation for PR64110 -- new constraints addition
On 01/27/2015 12:11 PM, Richard Sandiford wrote: Vladimir Makarov vmaka...@redhat.com writes: On 01/27/2015 09:08 AM, Richard Sandiford wrote: Yeah, but in practice that's only ever going to be a partial transition. Many port maintainers won't look at this, so we'll have to support both versions indefinitely, even if the new behaviour turns out to be the best for all cases. I just think we're going to regret having two sets of constraints with such subtly different meanings. Looking back at the original PR, Jakub said: The ! has been added by me for PR63594, so it isn't there from the era when i?86 backend was using reload. If there is a better way to express that RA should prefer to use memory or xmm register and only use r constraint if it already is in a r register and doesn't need to be reloaded, I can use that. Whether it is ?, ??? or something else. ! description in gcc docs just fitted most what I wanted... In some ways this seems to match the intention of *. Originally I think it was just an RA-only thing and was ignored by reload, but LRA does take it into account too (which sounds like progress to me). I guess we don't need '*' in many cases. It is overused. Imho, IRA should decide what class is better based on costs of alternatives and the explicit exclusion of register class by using '*' is a bad practice. Saying that I believe we should do register class preferrencing algorithm more alternative oriented. The algorithm should choose first an alternative (of may be subset of alternatives) and then register classes. I think it is more logical. It would permits us to rid off all such constraints including '*' and use only one like '?' which increases the alternative cost. In perspective it is even better to rid of '?' too and have some hook (or attribute) to get insn alternative costs which can be depended on sub-target or other run-time characteristics. Otherwise we need to duplicate insn descriptions and put different insn guards. I am going to work on this. But it is hard to say will it work well (may be I have some performance issues with this). This hook somehow (min or average of the values for all alternatives) can be used in combiner and other algorithms need an insn cost. That is how I see the solution of the problem in a long perspective. Definitely agree that it'd be better to remove these constraints in favour of a new attribute. preferred_for_size and preferred_for_speed give something similar, though they're much more stringent than what we need here. If I revert the patch locally and change the *vec_dupmode pattern to use *, it passes both the test for PR64110 and the tests for PR63594. Would that be OK as an alternative? I don't think it will work in general case. It probably works because a different class is chosen in IRA. If IRA for some reasons choose the same class, we might see the same problem in LRA. But isn't that the point of '*'? It should stop IRA from using the 'r' alternative as an indication that 'r' is a good choice for this instruction. If IRA chooses 'r' anyway, it must be because other instructions that use the same allocno strongly prefer 'r'. And in those some circumstances -- i.e. if IRA does choose 'r' despite the constraints in this instruction -- then I think we do want to use the 'r' alternative. And AIUI that's also what the new constraint is designed to do. If IRA chooses 'r' anyway, the new constraint causes LRA to prefer the 'r' alternative _even if_ another operand (the destination) has to be reloaded, which is the fundamental difference between the new constraint and '!'. So I'm still not sure why '*' wouldn't do what we want. Frequently use of '*' (and sometimes '!' for reload) means that we need splitting for this alternative probably into 2/3 insns. Instead of '*' use we would need to set up costs of all these insns. I believe just ignoring the class with '*' is wrong. There are some cases where we need '*' to avoid definitely this reg class, e.g. mmx when we use other classes for fp values. But I guess this solution is not reliable and without the constraints we could set the alternative cost very high to have a reliable right solution. I also don't like when register classes are excluded by '*' for IRA (see my thoughts above). Understood, and I agree it would be good to move to attributes. But in a way, I think that's an even better reason to try to avoid adding these new constraints. It sounds like we're hoping to get rid of them as soon as we've added them :-) Sometimes to get rid off, you should add more :) But to be serious, what I wrote can not be implemented for GCC-5.0 (and the generated code performance is still unknown for the proposed approach). I believe the current solution is more reliable than using '*'. Ridding off the new constraints will be much much smaller problem than ridding of other constraints.
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
On 01/27/15 14:21, Segher Boessenkool wrote: On Tue, Jan 27, 2015 at 01:53:34PM -0700, Jeff Law wrote: I do have a specific PR in mind, but I cannot currently find it. It was about x86, dec mem and then using the flags... Must have sent 100 emails in that thread... And cannot find it now! Are you referring to 61225? That is the one, thanks. It's not going to help 61225. The key insns in 61225 are: (insn 6 3 7 2 (set (reg:SI 91 [ *x_3(D) ]) (mem:SI (reg/v/f:SI 90 [ x ]) [1 *x_3(D)+0 S4 A32])) k.c:11 90 {*movsi_internal} (nil)) (insn 7 6 8 2 (parallel [ (set (reg:SI 88 [ D.1494 ]) (plus:SI (reg:SI 91 [ *x_3(D) ]) (const_int -1 [0x]))) (clobber (reg:CC 17 flags)) ]) k.c:11 220 {*addsi_1} (expr_list:REG_DEAD (reg:SI 91 [ *x_3(D) ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (plus:SI (mem:SI (reg/v/f:SI 90 [ x ]) [1 *x_3(D)+0 S4 A32]) (const_int -1 [0x])) (nil) (insn 8 7 9 2 (set (mem:SI (reg/v/f:SI 90 [ x ]) [1 *x_3(D)+0 S4 A32]) (reg:SI 88 [ D.1494 ])) k.c:11 90 {*movsi_internal} (nil)) (insn 9 8 10 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 88 [ D.1494 ]) (const_int 0 [0]))) k.c:11 3 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 88 [ D.1494 ]) (nil))) Note how REG:SI 88 has two uses. Never do we pass a set of insns into try_combine that are useful to optimize this particular case (we never include insn #9 in any of the attempted combinations). Even a bridge pattern isn't going to help here. jeff
Re: [PATCH] Add comdat_group effective target (PR bootstrap/64612)
On Jan 27, 2015, at 7:10 AM, Jakub Jelinek ja...@redhat.com wrote: This patch introduces a new effective target check and adds it to the pr64612.C - if comdat groups aren't used, there is no guarantee that the D2 dtor will be emitted always alongside of D1 dtor. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok.
Re: [RFC PATCH] Avoid most of the BUILT_IN_*_CHKP enum values
On Tue, Jan 27, 2015 at 06:04:53PM +0300, Ilya Enkovich wrote: 2015-01-27 17:27 GMT+03:00 Jakub Jelinek ja...@redhat.com: I've grepped for BUILT_IN_.*_CHKP in the sources and we actually need far fewer enum values than the 1204 that are being defined. This patch requires builtins.def to say explicitly (by using DEF_*BUILTIN_CHKP macro instead of corresponding DEF_*BUILTIN) which ones need that, for all the others only space in the enum is reserved and nothing else. I'd hope this could work around the buggy AIX stabs handling, but even on say x86_64-linux it has a benefit of decreasing cc1plus .debug_info by about 2.7MB (of course, with dwz that benefit goes to almost nothing, just the ~ 7000 bytes or so, plus .debug_str cost (that is merged even without dwz between TUs). The cost without dwz is obviously mainly from repeating that in most of the translation units. But why declare BUILT_IN_*_CHKP enums that are never used by anything... Enum values not mentioned in the code are not fully useless. When we have builtin functions defined as 'always_inline' functions, they are instrumented and enum names may be used in dumps and debugging. That's not a big value though. Thanks a lot for taking care of it! Note, patch successfully bootstrapped/regtested on x86_64-linux and i686-linux, and David said that on AIX it passed stage1 cc1 linking. Ok for trunk? As for the enums, I doubt the pain is worth the trouble. What perhaps could be done (apparently preexisting issue, because you include builtins.def just once in built_in_names, would be to tweak fprintf (file, built-in %s:%s, built_in_class_names[(int) DECL_BUILT_IN_CLASS (node)], built_in_names[(int) DECL_FUNCTION_CODE (node)]); so that if DECL_FUNCTION_CODE is in between BEGIN_CHKP_BUILTINS and END_CHKP_BUILTINS you don't print (null) on glibc there or crash (on various other hosts), but actually print built_in_names[(int) DECL_FUNCTION_CODE (node) - (int) BEGIN_CHKP_BUILTINS - 1] concatenated with _CHKP. 2015-01-27 Jakub Jelinek ja...@redhat.com * builtins.def (DEF_BUILTIN_CHKP): Define if not defined. (DEF_LIB_BUILTIN_CHKP, DEF_EXT_LIB_BUILTIN_CHKP): Redefine. (DEF_CHKP_BUILTIN): Define using DEF_BUILTIN_CHKP instead of DEF_BUILTIN. (BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCAT, BUILT_IN_STRCHR, BUILT_IN_STRCPY, BUILT_IN_STRLEN): Use DEF_LIB_BUILTIN_CHKP macro instead of DEF_LIB_BUILTIN. (BUILT_IN_MEMCPY_CHK, BUILT_IN_MEMMOVE_CHK, BUILT_IN_MEMPCPY_CHK, BUILT_IN_MEMPCPY, BUILT_IN_MEMSET_CHK, BUILT_IN_STPCPY_CHK, BUILT_IN_STPCPY, BUILT_IN_STRCAT_CHK, BUILT_IN_STRCPY_CHK): Use DEF_EXT_LIB_BUILTIN_CHKP macro instead of DEF_EXT_LIB_BUILTIN. * tree-core.h (enum built_in_function): In between BEGIN_CHKP_BUILTINS and END_CHKP_BUILTINS only define enum values for builtins that use DEF_BUILTIN_CHKP macro. --- gcc/builtins.def.jj 2015-01-15 23:39:10.0 +0100 +++ gcc/builtins.def2015-01-27 15:04:44.860924664 +0100 @@ -63,6 +63,16 @@ along with GCC; see the file COPYING3. The builtins is registered only if COND is true. */ +/* A macro for builtins where the + BUILT_IN_*_CHKP = BUILT_IN_* + BEGIN_CHKP_BUILTINS + 1 + enums should be defined too. */ +#ifndef DEF_BUILTIN_CHKP +#define DEF_BUILTIN_CHKP(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, \ +FALLBACK_P, NONANSI_P, ATTRS, IMPLICIT, COND) \ + DEF_BUILTIN(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, FALLBACK_P,\ + NONANSI_P, ATTRS, IMPLICIT, COND) +#endif + /* A GCC builtin (like __builtin_saveregs) is provided by the compiler, but does not correspond to a function in the standard library. */ @@ -87,6 +97,10 @@ along with GCC; see the file COPYING3. #define DEF_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, false, ATTRS, true, true) +#undef DEF_LIB_BUILTIN_CHKP +#define DEF_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ + TYPE, true, true, false, ATTRS, true, true) /* Like DEF_LIB_BUILTIN, except that the function is not one that is specified by ANSI/ISO C. So, when we're being fully conformant we @@ -96,6 +110,10 @@ along with GCC; see the file COPYING3. #define DEF_EXT_LIB_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, true, ATTRS, false, true) +#undef DEF_EXT_LIB_BUILTIN_CHKP +#define DEF_EXT_LIB_BUILTIN_CHKP(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN_CHKP (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE,\ +
Re: [PATCH, PR tree-optimization/64277] Improve loop iterations count estimation
On 27 Jan 12:40, Ilya Enkovich wrote: Hi, This patch was supposed to fix PR tree-optimization/64277. Tracker is now fixed by warnings disabling but I think patch is still useful to avoid dead code generated by complete unroll. Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Ilya -- gcc/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-loop-niter.c (record_nonwrapping_iv): Use base range info when possible to refine estimation. gcc/testsuite/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/pr64277.c: New. Here is a new version fixed according to comments in the tracker. I also fixed a test to scan cunroll dumps. Does it look OK? What are possible branches for this patch? Thanks, Ilya -- diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c new file mode 100644 index 000..c6ef331 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr64277.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/64277 */ +/* { dg-do compile } */ +/* { dg-options -O3 -Wall -Werror -fdump-tree-cunroll-details } */ +/* { dg-final { scan-tree-dump loop with 5 iterations completely unrolled cunroll } } */ +/* { dg-final { scan-tree-dump loop with 6 iterations completely unrolled cunroll } } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ + +int f1[10]; +void test1 (short a[], short m, unsigned short l) +{ + int i = l; + for (i = i + 5; i m; i++) +f1[i] = a[i]++; +} + +void test2 (short a[], short m, short l) +{ + int i; + if (m 5) +m = 5; + for (i = m; i l; i--) +f1[i] = a[i]++; +} diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 919f5c0..1cd297d 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, { tree niter_bound, extreme, delta; tree type = TREE_TYPE (base), unsigned_type; + tree orig_base = base; if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step)) return; @@ -2777,16 +2778,30 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, if (tree_int_cst_sign_bit (step)) { + wide_int min, max; extreme = fold_convert (unsigned_type, low); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (high) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (wide_int (high), max)) + base = wide_int_to_tree (unsigned_type, max); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, high); delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme); step = fold_build1 (NEGATE_EXPR, unsigned_type, step); } else { + wide_int min, max; extreme = fold_convert (unsigned_type, high); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (low) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (min, wide_int (low))) + base = wide_int_to_tree (unsigned_type, min); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, low); delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base); }
Re: [PATCH] Fix PR64277
On Tue, 27 Jan 2015, Ilya Enkovich wrote: 2015-01-27 12:47 GMT+03:00 Richard Biener rguent...@suse.de: On Tue, 27 Jan 2015, Jakub Jelinek wrote: On Tue, Jan 27, 2015 at 10:25:48AM +0100, Richard Biener wrote: This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? So nothing in the testsuite needed to change? Nice. Yes. Ok for trunk. I'll search for duplicates and add a few testcases. Thanks. Committed as follows (first testcase in PR59124 not fixed - it warns from the first pass). Are you going to port it to 4.9 branch? I plan to do that (4.8 as well) after some time. Richard. Thanks, Ilya 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/56273 PR tree-optimization/59124 PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. * g++.dg/warn/Warray-bounds-6.C: New testcase. * gcc.dg/Warray-bounds-12.c: Likewise. * gcc.dg/Warray-bounds-13.c: Likewise. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2015-01-27 10:34:26.453743828 +0100 --- gcc/tree-vrp.c 2015-01-27 10:43:04.970610102 +0100 *** vrp_finalize (void) *** 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds) check_all_array_refs (); /* We must identify jump threading opportunities before we release --- 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release Index: gcc/testsuite/g++.dg/warn/Warray-bounds-6.C === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/g++.dg/warn/Warray-bounds-6.C 2015-01-27 10:40:31.311871855 +0100 *** *** 0 --- 1,26 + // { dg-do compile } + // { dg-options -O3 -Warray-bounds } + + struct type { + bool a, b; + bool get_b() { return b; } + }; + + type stuff[9u]; + + void bar(); + + void foo() + { + for(unsigned i = 0u; i 9u; i++) + { + if(!stuff[i].a) + continue; + + bar(); + + for(unsigned j = i + 1u; j 9u; j++) + if(stuff[j].a stuff[j].get_b()) // { dg-bogus above array bounds } + return; + } + } Index: gcc/testsuite/gcc.dg/Warray-bounds-12.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-12.c 2015-01-27 10:40:58.196175989 +0100 *** *** 0 --- 1,26 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + /* { dg-additional-options -mssse3 { target x86_64-*-* i?86-*-* } } */ + + void foo(short a[], short m) + { + int i, j; + int f1[10]; + short nc; + + nc = m + 1; + if (nc 3) + { + for (i = 0; i = nc; i++) + { + f1[i] = f1[i] + 1; + } + } + + for (i = 0, j = m; i nc; i++, j--) + { + a[i] = f1[i]; /* { dg-bogus above array bounds } */ + a[j] = i; + } + return; + } Index: gcc/testsuite/gcc.dg/Warray-bounds-13.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-13.c 2015-01-27 10:42:43.738369929 +0100 *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + + extern char *bar[17]; + + int foo(int argc, char **argv) + { + int i; + int n = 0; + + for (i = 0; i argc; i++) + n++; + + for (i = 0; i argc; i++) + argv[i] = bar[i + n]; /* { dg-bogus above array bounds } */ + + return 0; + } -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
[PATCH] pr 64047 - explicitly handle target_option_default_node in rs6000_set_current_function
From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, the compiler crashes on pr52429.c because this_target_ira_int gets initialized with null x_init_costs and x_op_costs. While I don't really understand this option handling mess r217659 made the analogous change to i386 when it broke this. So it seems likely this is the right way to fix the regression. bootstrapped + regtested ppc64-linux-gnu, without regression and pr52429.c is fixed, ok? Trev gcc/ * config/rs6000/rs6000.c (rs6000_set_current_function): Handle explicit default options. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 85eb0fd..207fc55 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -32609,7 +32609,7 @@ rs6000_set_current_function (tree fndecl) if (old_tree == new_tree) ; - else if (new_tree) + else if (new_tree new_tree != target_option_default_node) { cl_target_option_restore (global_options, TREE_TARGET_OPTION (new_tree)); @@ -32620,7 +32620,7 @@ rs6000_set_current_function (tree fndecl) = save_target_globals_default_opts (); } - else if (old_tree) + else if (old_tree old_tree != target_option_default_node) { new_tree = target_option_current_node; cl_target_option_restore (global_options, -- 2.1.4
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Mon, 26 Jan 2015 14:44:19 +0100 Thomas Schwinge tho...@codesourcery.com wrote: On 17 Jan 02:16, Ilya Verbin wrote: Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Sorry for that! Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) So, you don't assume that a device can have multiple images from multiple libs? Ping? This probably is just a bug that we introduced with our changes? (Julian?) AFAICR, we haven't yet figured out how to make (shared) libraries work with PTX. Actually I'm not entirely sure if static libraries containing PTX code will work either. But, multiple images (e.g. from different object files) are supported, via the loop in gomp_target_init. (The semantics of gomp_register_image_for_device were changed, but not -- intentionally! -- to limit the number of offloaded images to one.) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? This was certainly necessary at some point, when the support for multiple devices of the same type in the OpenACC runtime was delegated entirely to target-dependent code. Later (after one round of refactoring), the gomp_device_descr and the memory map were still separate, with the former possibly representing a number of devices, and the latter having independent copies for each instance of a device. That's largely been refactored (again) away now though -- a gomp_device_descr and its memory map are stored together, per-device instance. So this separation of their initialisation can probably go away, although some (somewhat delicate) code in oacc-init.c would need to be tweaked. Julian
RE: [PATCH RFA MIPS] Prohibit vector modes in accumulators
Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: 2015-01-23 Robert Suchanek robert.sucha...@imgtec.com * config/mips/mips.c (mips_hard_regno_mode_ok_p): Prohibit accumulators for all vector modes. This seems like a genuine bug and although it can only be triggered by loongson or paired-single support it probably qualifies for fixing. Agreed FWIW. We shouldn't mark something as valid for a mode if even the mode's move pattern can't handle it. I think this kind of thing should go in regardless of development stage. Given that it was one of the pre-existing tests that failed I'm happy that we are covering this issue. All of these LRA related issues are likely to phase in and out with subtle changes to code-gen so I don't think we can always get a test case that fails on trunk. Since Catherine asked for further info then I will leave her to say if she is happy to accept on this basis. Matthew
Re: [PATCH] Fix PR64277
2015-01-27 13:59 GMT+03:00 Richard Biener rguent...@suse.de: On Tue, 27 Jan 2015, Ilya Enkovich wrote: 2015-01-27 12:47 GMT+03:00 Richard Biener rguent...@suse.de: On Tue, 27 Jan 2015, Jakub Jelinek wrote: On Tue, Jan 27, 2015 at 10:25:48AM +0100, Richard Biener wrote: This disables array-bound warnings from VRP2 as discussed. Bootstrapped and tested on x86_64-unknown-linux-gnu - ok for trunk? So nothing in the testsuite needed to change? Nice. Yes. Ok for trunk. I'll search for duplicates and add a few testcases. Thanks. Committed as follows (first testcase in PR59124 not fixed - it warns from the first pass). Are you going to port it to 4.9 branch? I plan to do that (4.8 as well) after some time. Great, thanks! Ilya Richard. Thanks, Ilya 2015-01-27 Richard Biener rguent...@suse.de PR tree-optimization/56273 PR tree-optimization/59124 PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass. * g++.dg/warn/Warray-bounds-6.C: New testcase. * gcc.dg/Warray-bounds-12.c: Likewise. * gcc.dg/Warray-bounds-13.c: Likewise. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2015-01-27 10:34:26.453743828 +0100 --- gcc/tree-vrp.c 2015-01-27 10:43:04.970610102 +0100 *** vrp_finalize (void) *** 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds) check_all_array_refs (); /* We must identify jump threading opportunities before we release --- 10229,10235 substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt, false); ! if (warn_array_bounds first_pass_instance) check_all_array_refs (); /* We must identify jump threading opportunities before we release Index: gcc/testsuite/g++.dg/warn/Warray-bounds-6.C === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/g++.dg/warn/Warray-bounds-6.C 2015-01-27 10:40:31.311871855 +0100 *** *** 0 --- 1,26 + // { dg-do compile } + // { dg-options -O3 -Warray-bounds } + + struct type { + bool a, b; + bool get_b() { return b; } + }; + + type stuff[9u]; + + void bar(); + + void foo() + { + for(unsigned i = 0u; i 9u; i++) + { + if(!stuff[i].a) + continue; + + bar(); + + for(unsigned j = i + 1u; j 9u; j++) + if(stuff[j].a stuff[j].get_b()) // { dg-bogus above array bounds } + return; + } + } Index: gcc/testsuite/gcc.dg/Warray-bounds-12.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-12.c 2015-01-27 10:40:58.196175989 +0100 *** *** 0 --- 1,26 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + /* { dg-additional-options -mssse3 { target x86_64-*-* i?86-*-* } } */ + + void foo(short a[], short m) + { + int i, j; + int f1[10]; + short nc; + + nc = m + 1; + if (nc 3) + { + for (i = 0; i = nc; i++) + { + f1[i] = f1[i] + 1; + } + } + + for (i = 0, j = m; i nc; i++, j--) + { + a[i] = f1[i]; /* { dg-bogus above array bounds } */ + a[j] = i; + } + return; + } Index: gcc/testsuite/gcc.dg/Warray-bounds-13.c === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/testsuite/gcc.dg/Warray-bounds-13.c 2015-01-27 10:42:43.738369929 +0100 *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O3 -Warray-bounds } */ + + extern char *bar[17]; + + int foo(int argc, char **argv) + { + int i; + int n = 0; + + for (i = 0; i argc; i++) + n++; + + for (i = 0; i argc; i++) + argv[i] = bar[i + n]; /* { dg-bogus above array bounds } */ + + return 0; + } -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
Re: [PATCH][2/2] Improve array-bound warnings and VRP
On Mon, 26 Jan 2015, Jakub Jelinek wrote: On Mon, Jan 26, 2015 at 04:18:32PM +0100, Richard Biener wrote: Ok for trunk? Or should I delay this to GCC 6? Does this work even without the other patch? Yes, I've actually developed 2/2 first. The other patch only ever emits more warnings... Then it probably should be ok. I'm really afraid of emitting more warnings with such high false positive rate now. As the patch also mitigates some of the code bloat we get with the complete peeling (regression against 4.7) I have installed it. It's also the easiest vehicle to verify range-info is not broken by passes between vrp1 and vrp2. Thanks, Richard.
Re: [PATCH, PR tree-optimization/64277] Improve loop iterations count estimation
On Tue, Jan 27, 2015 at 11:47 AM, Ilya Enkovich enkovich@gmail.com wrote: On 27 Jan 12:40, Ilya Enkovich wrote: Hi, This patch was supposed to fix PR tree-optimization/64277. Tracker is now fixed by warnings disabling but I think patch is still useful to avoid dead code generated by complete unroll. Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Ilya -- gcc/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-loop-niter.c (record_nonwrapping_iv): Use base range info when possible to refine estimation. gcc/testsuite/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/pr64277.c: New. Here is a new version fixed according to comments in the tracker. I also fixed a test to scan cunroll dumps. Does it look OK? Minor comments below. What are possible branches for this patch? You can probably create a testcase that shows code-size regressions against a version that didn't peel completely (GCC 4.7). Thus I'd say it would apply to 4.9 as well (4.8 doesn't have range information). Thanks, Ilya -- diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c new file mode 100644 index 000..c6ef331 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr64277.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/64277 */ +/* { dg-do compile } */ +/* { dg-options -O3 -Wall -Werror -fdump-tree-cunroll-details } */ +/* { dg-final { scan-tree-dump loop with 5 iterations completely unrolled cunroll } } */ +/* { dg-final { scan-tree-dump loop with 6 iterations completely unrolled cunroll } } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ + +int f1[10]; +void test1 (short a[], short m, unsigned short l) +{ + int i = l; + for (i = i + 5; i m; i++) +f1[i] = a[i]++; +} + +void test2 (short a[], short m, short l) +{ + int i; + if (m 5) +m = 5; + for (i = m; i l; i--) +f1[i] = a[i]++; +} diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 919f5c0..1cd297d 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, { tree niter_bound, extreme, delta; tree type = TREE_TYPE (base), unsigned_type; + tree orig_base = base; if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step)) return; @@ -2777,16 +2778,30 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, if (tree_int_cst_sign_bit (step)) { + wide_int min, max; extreme = fold_convert (unsigned_type, low); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (high) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (wide_int (high), max)) For me a simple wi::gts_p (high, max) worked fine. + base = wide_int_to_tree (unsigned_type, max); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, high); delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme); step = fold_build1 (NEGATE_EXPR, unsigned_type, step); } else { + wide_int min, max; extreme = fold_convert (unsigned_type, high); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (low) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (min, wide_int (low))) Likewise. Ok for trunk with that changes. For the 4.9 branch you need to adjust the patch to not use wide-ints. I'd leave it on trunk for a while and eventually open a bugreport for the size regression to keep track of it. Thanks, Richard. + base = wide_int_to_tree (unsigned_type, min); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, low); delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base); }
Re: [PATCH 6/8] Handle SCRATCH in decompose_address
On Oct 23, 2014, at 4:18 AM, Jeff Law l...@redhat.com wrote: On 10/22/14 17:01, Maxim Kuvyrkov wrote: On Oct 23, 2014, at 9:02 AM, Jeff Law l...@redhat.com wrote: On 10/20/14 21:35, Maxim Kuvyrkov wrote: Hi, This patch is a simple fix to allow decompose_address to handle SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite for a scheduler improvement that relies on decompose_address to parse insns. Bootstrapped and regtested on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu. I'd like to see some further discussion here. get_base_term is supposed to look at its argument as a base address. I'm curious under what circumstances you want to have a SCRATCH as a base address? I didn't see anything in patch #8 which obviously dependended on this, but maybe it's in there, but more subtle than expected. If you can justify why it's useful to handle scratch in here, then the patch will be fine. Without this patch decompose_address() ICEs during second scheduler pass on prologue instructions that usually have (clobber (mem:BLK (scratch)). The only reason for this patch is to prevent that fault and enable use of decompose_address during 2nd scheduler pass. Does this answer your question, or are you looking for a more in-depth reason? Yea, that's everything I needed to know. Patch approved. Hi, Turns out that the above patch applies without conflicts to two functions in rtlanal.c: get_base_term(), for which the patch is intended, and get_index_term(), for which the patch is not. Due to git rebases and patch updates, I have accidentally pushed the patch twice and unintentionally changed get_index_term(). From what I can tell the change is benign, but, still, it is unnecessary. The attached patch reverts the accidental commit. It was bootstrapped arm-linux-gnueabihf. OK for stage 1? I'll regtest it before committing, just in case. Thanks, -- Maxim Kuvyrkov www.linaro.org 0001-Revert-accidental-commit-get_base_index-was-the-inte.patch Description: Binary data
Re: [PATCH][RFA][PR target/15184] Partial fix for direct byte access on x86
I'm withdrawing the combine_simplify_rtx hunk of this patch. While working cleaning up my improvements for the remaining of testcases I stumbled upon a simpler change which covers all the tests. What's kind of funny is I'd been staring at the relevant code a goodly part of the weekend without seeing how easily it could be extended and that the result doesn't have to match a pattern as combine can split the horrid mess in such a way that we two matched insns which when combined with other nearby insns ultimately collapse into precisely what we want. We're still going to need the changes to the heuristic to enable 4 insn combinations as we need to be giving nice big blobs of code to combine_simplify_rtx and its children. It's actually kind of cool to see something like this flow into make_field_assignment: (set (mem/c:HI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7670bcf0 y) [2 y+0 S2 A16]) (subreg:HI (ior:SI (zero_extend:SI (mem/c:QI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7670bcf0 y) [2 y+0 S1 A16])) (reg:SI 100 [ D.1569 ])) 0)) And make_field_assignment turns it into: (set (mem/c:QI (const:SI (plus:SI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7670bcf0 y) (const_int 1 [0x1]))) [2 y+1 S1 A8]) (subreg:QI (lshiftrt:SI (reg:SI 100 [ D.1569 ]) (const_int 8 [0x8])) 0)) Combine then chooses the lshift expression as a split point and emits an insn for the lshift and substitutes a nice simple reg into that hunk of RTL above for the lshift (set (mem/c:QI (const:SI (plus:SI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7670bcf0 y) (const_int 1 [0x1]))) [2 y+1 S1 A8]) (subreg:QI (reg:SI 103) 0)) oh, that looks perfect. Now it's just a simple matter of cleanup :-) Which is actually kindof fun to watch: We'll have this after the combine split step: (insn 9 8 10 2 (parallel [ (set (reg:SI 99 [ D.1569 ]) (ashift:SI (reg:SI 96 [ c ]) (const_int 8 [0x8]))) (clobber (reg:CC 17 flags)) ]) j.c:33 510 {*ashlsi3_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_DEAD (reg:SI 96 [ c ]) (nil (insn 10 9 13 2 (parallel [ (set (reg:SI 100 [ D.1569 ]) (and:SI (reg:SI 99 [ D.1569 ]) (const_int 65280 [0xff00]))) (clobber (reg:CC 17 flags)) ]) j.c:33 380 {*andsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_DEAD (reg:SI 99 [ D.1569 ]) (nil (insn 13 10 14 2 (parallel [ (set (reg:SI 103) (lshiftrt:SI (reg:SI 100 [ D.1569 ]) (const_int 8 [0x8]))) (clobber (reg:CC 17 flags)) ]) j.c:33 543 {*lshrsi3_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_DEAD (reg:SI 100 [ D.1569 ]) (nil (insn 14 13 0 2 (set (mem/c:QI (const:SI (plus:SI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7670bcf0 y) (const_int 1 [0x1]))) [2 y+1 S1 A8]) (subreg:QI (reg:SI 103) 0)) j.c:33 93 {*movqi_internal} (expr_list:REG_DEAD (reg:SI 103) (nil))) Eventually all those participate in combinations again resulting in just: (insn 14 13 0 2 (set (mem/c:QI (const:SI (plus:SI (symbol_ref:SI (y) [flags 0x40] var_decl 0x7fd247c16cf0 y) (const_int 1 [0x1]))) [2 y+1 S1 A8]) (subreg:QI (reg:SI 96 [ c ]) 0)) j.c:33 93 {*movqi_internal} (expr_list:REG_DEAD (reg:SI 96 [ c ]) (nil))) ie, movb %al, y+1 Sometimes I malign the combiner, but there are also days when I just have to say wow, not bad given its last major revamp was in 1990/1991 (which brought in the splitting code noted above. Anyway, onward bootstrapping and testing... Jeff On 01/26/15 20:07, Jeff Law wrote: Segher: I know you're not officially noted as a maintainer or reviewer for combine.c, but that's something I'd like to change if you're interested in a larger role. In the mean time, any feedback you have would be appreciated. So the issue mentioned in the BZ is that fairly obvious code sequences that ought to use simple byte moves are expanding into hideous sequences (load, store, couple bitwise logicals, maybe a shift or extension thrown in for good measure). As mentioned in the BZ, one of the issues is that combine is limited in terms of how many insns it will look at. As it turns out that was addressed not terribly low ago and we can do 4 insn combinations. With just a little work in combine.c we can get the desired code for the first two testcases as well as two of my own. The first issue is 4 insn combinations are (reasonably) guarded in such a way as to avoid them if they are unlikely to succeed. We basically look at the operands of the 4 insns and try to guess if there's a reasonable chance a combination would succeed. If not, no 4 insn combinations are
Re: [patch, libobjc] export __objc_get_forward_imp, get_imp again
On 01/22/2015 05:09 PM, Matthias Klose wrote: On 01/22/2015 12:56 AM, Andrew Pinski wrote: On Wed, Jan 21, 2015 at 8:51 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Jan 21, 2015 at 08:41:46AM -0800, pins...@gmail.com wrote: On Jan 21, 2015, at 1:02 AM, Matthias Klose d...@ubuntu.com wrote: __objc_get_forward_imp and get_imp were exported in libobjc since GCC 4.1, for some reason these are not exported anymore in GCC 5 (both declared inline). So either export these as before, or don't export them and bump the soname. The latter seems to be unwanted, and at least gnustep-base is using the get_imp function. So better keep the references in GCC 5? Is this an intended change in GCC 5 to not to export inline methods anymore? Just remove the inline instead. The comments like: /* The new name of get_imp(). */ IMP class_getMethodImplementation (Class class_, SEL selector) { if (class_ == Nil || selector == NULL) return NULL; /* get_imp is inlined, so we're good. */ return get_imp (class_, selector); } don't make me very confident in such a change. The extern prototypes really work with both -std=gnu89 and -std=gnu11 and thus will at least keep status quo. Let's do that then. get_imp was renamed to class_getMethodImplementation, which is exported from objc/runtime.h. GNUstep-base uses get_imp to define it's own class_getMethodImplementation, so get_imp isn't really needed. So either make the two functions inline, and don't export them, or declare the prototypes. For the latter I would suggest objc-private/runtime.h (class_getMethodImplementation is declared in objc/runtime.h). now commited the following patch after approval on IRC. Matthias libobjc/ 2015-01-27 Matthias Klose d...@ubuntu.com * sendmsg.c: Add prototypes for __objc_get_forward_imp and get_imp. Index: libobjc/sendmsg.c === --- libobjc/sendmsg.c (revision 220167) +++ libobjc/sendmsg.c (working copy) @@ -104,6 +104,10 @@ struct objc_method * search_for_method_in_list (struct objc_method_list * list, SEL op); id nil_method (id, SEL); +/* Make sure this inline function is exported regardless of GNU89 or C99 + inlining semantics as it is part of the libobjc ABI. */ +extern IMP __objc_get_forward_imp (id, SEL); + /* Given a selector, return the proper forwarding implementation. */ inline IMP @@ -320,6 +324,10 @@ return res; } +/* Make sure this inline function is exported regardless of GNU89 or C99 + inlining semantics as it is part of the libobjc ABI. */ +extern IMP get_imp (Class, SEL); + inline IMP get_imp (Class class, SEL sel)
Re: [PATCH] Fix PR64798
On 27/01/15 14:43 +0100, Richard Biener wrote: The new exceptional EH allocator failed to align exception objects properly (it ended up aligning to __alignof__((std::size_t))). The following fixes that by aligning to what __attribute__((aligned)) would align to (this is what _Unwind_Exception is aligned to, a member of __cxa_refcounted_exception). Bootstrapped and tested on x86_64-unknown-linux-gnu - Rainer is testing this on sparc-solaris where it broke g++.old-deja/g++.eh/badalloc1.C. Ok for trunk? Yes, thanks.
[PATCH][AArch32] Testcase fix for __ATOMIC_CONSUME
Hi, This patch fixes arm/atomic-op-consume.c test to expect safe LDAEX instruction to be generated when __ATOMIC_CONSUME semantics is requested. This patch was tested by running the modified test on arm-none-eabi and arm-none-linux-gnueabi compilers. Is this patch ok? Alex 2015-01-27 Alex Velenko alex.vele...@arm.com gcc/testsuite/ * gcc.target/arm/atomic-op-consume.c (scan-assember-times): Adjust scan-assembler-times pattern. diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c index 0354717..cc6c028 100644 --- a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c +++ b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c @@ -5,6 +5,9 @@ #include ../aarch64/atomic-op-consume.x -/* { dg-final { scan-assembler-times ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ +/* To workaround Bugzilla 59448 issue, a request for __ATOMIC_CONSUME is always + promoted to __ATOMIC_ACQUIRE, implemented as MEMMODEL_ACQUIRE. This causes + LDAEX to be generated instead of LDREX. */ +/* { dg-final { scan-assembler-times ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ /* { dg-final { scan-assembler-times strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ /* { dg-final { scan-assembler-not dmb } } */
Re: [patch] libstdc++/64368 add configure check for timed mutex operations
This isn't related to the last patch for this bug, except that the PR is currently being used for all darwin FAILs. We need to check a configure macro before using pthread_rwlock_timedrdlock because Darwin doesn't define the _POSIX_TIMEOUTS option. Tested x86_64-linux, committed to trunk. commit d48fe00ea96b3515a6a1f7a6926dfe2ff7db643c Author: Jonathan Wakely jwak...@redhat.com Date: Tue Jan 27 10:38:09 2015 + PR libstdc++/64368 * include/std/shared_mutex (shared_timed_mutex::try_lock_for, shared_timed_mutex::try_lock_until): Only define when POSIX thread timeouts option is supported. (shared_timed_mutex::try_shared_lock_for, shared_timed_mutex::try_shared_lock_until): Likewise. diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex index 643768c..47cfc64 100644 --- a/libstdc++-v3/include/std/shared_mutex +++ b/libstdc++-v3/include/std/shared_mutex @@ -108,6 +108,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return true; } +#if _GTHREAD_USE_MUTEX_TIMEDLOCK templatetypename _Rep, typename _Period bool try_lock_for(const chrono::duration_Rep, _Period __rel_time) @@ -149,6 +150,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const auto __s_atime = __s_entry + __delta; return try_lock_until(__s_atime); } +#endif void unlock() @@ -186,6 +188,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return true; } +#if _GTHREAD_USE_MUTEX_TIMEDLOCK templatetypename _Rep, typename _Period bool try_lock_shared_for(const chrono::duration_Rep, _Period __rel_time) @@ -230,6 +233,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const auto __s_atime = __s_entry + __delta; return try_lock_shared_until(__s_atime); } +#endif void unlock_shared()
Re: [PATCH, PR tree-optimization/64277] Improve loop iterations count estimation
On 27 Jan 12:29, Richard Biener wrote: On Tue, Jan 27, 2015 at 11:47 AM, Ilya Enkovich enkovich@gmail.com wrote: On 27 Jan 12:40, Ilya Enkovich wrote: Hi, This patch was supposed to fix PR tree-optimization/64277. Tracker is now fixed by warnings disabling but I think patch is still useful to avoid dead code generated by complete unroll. Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Ilya -- gcc/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-loop-niter.c (record_nonwrapping_iv): Use base range info when possible to refine estimation. gcc/testsuite/ 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/pr64277.c: New. Here is a new version fixed according to comments in the tracker. I also fixed a test to scan cunroll dumps. Does it look OK? Minor comments below. What are possible branches for this patch? You can probably create a testcase that shows code-size regressions against a version that didn't peel completely (GCC 4.7). Thus I'd say it would apply to 4.9 as well (4.8 doesn't have range information). Thanks, Ilya -- diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c new file mode 100644 index 000..c6ef331 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr64277.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/64277 */ +/* { dg-do compile } */ +/* { dg-options -O3 -Wall -Werror -fdump-tree-cunroll-details } */ +/* { dg-final { scan-tree-dump loop with 5 iterations completely unrolled cunroll } } */ +/* { dg-final { scan-tree-dump loop with 6 iterations completely unrolled cunroll } } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ + +int f1[10]; +void test1 (short a[], short m, unsigned short l) +{ + int i = l; + for (i = i + 5; i m; i++) +f1[i] = a[i]++; +} + +void test2 (short a[], short m, short l) +{ + int i; + if (m 5) +m = 5; + for (i = m; i l; i--) +f1[i] = a[i]++; +} diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 919f5c0..1cd297d 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, { tree niter_bound, extreme, delta; tree type = TREE_TYPE (base), unsigned_type; + tree orig_base = base; if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step)) return; @@ -2777,16 +2778,30 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt, if (tree_int_cst_sign_bit (step)) { + wide_int min, max; extreme = fold_convert (unsigned_type, low); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (high) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (wide_int (high), max)) For me a simple wi::gts_p (high, max) worked fine. + base = wide_int_to_tree (unsigned_type, max); + else if (TREE_CODE (base) != INTEGER_CST) base = fold_convert (unsigned_type, high); delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme); step = fold_build1 (NEGATE_EXPR, unsigned_type, step); } else { + wide_int min, max; extreme = fold_convert (unsigned_type, high); - if (TREE_CODE (base) != INTEGER_CST) + if (TREE_CODE (orig_base) == SSA_NAME + TREE_CODE (low) == INTEGER_CST + INTEGRAL_TYPE_P (TREE_TYPE (orig_base)) + get_range_info (orig_base, min, max) == VR_RANGE + wi::gts_p (min, wide_int (low))) Likewise. Ok for trunk with that changes. For the 4.9 branch you need to adjust the patch to not use wide-ints. I'd leave it on trunk for a while and eventually open a bugreport for the size regression to keep track of it. Thanks, Richard. Thanks a lot for review! Here is a final version for GCC 5.0. Will prepare 4.9 version later. Thanks, Ilya -- diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c new file mode 100644 index 000..c6ef331 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr64277.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/64277 */ +/* { dg-do compile } */ +/* { dg-options -O3 -Wall -Werror -fdump-tree-cunroll-details } */ +/* { dg-final { scan-tree-dump loop with 5 iterations completely unrolled cunroll } } */ +/* { dg-final { scan-tree-dump loop with 6 iterations completely unrolled cunroll } } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ + +int f1[10]; +void test1 (short a[], short m, unsigned short l) +{ + int i = l; + for (i = i + 5; i m; i++) +f1[i] = a[i]++; +} + +void test2 (short a[], short m, short l) +{ + int i; + if (m 5) +m = 5; + for (i = m; i
[PATCH, CHKP] Fix PR middle-end/64805
Hi, Some time ago removal of not instrumented version of funtion with 'always_inline' was delayed to enable their inlining. With this change we may have situations when we inline into a not instrumented version of a function which also has an instrumented version (happens when both of them have 'always_inline'). It causes ICE in cgraph_node verifier because we clear all references before inlining and verifier expects IPA_REF_CHKP reference for all functions having instrumented version. This patch fixes it by rebuilding IPA_REF_CHKP reference. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com PR middle-end/64805 * ipa-inline.c (early_inliner): Rebuild IPA_REF_CHKP reference to avoid error in cgraph node verification. 2015-01-27 Ilya Enkovich ilya.enkov...@intel.com PR middle-end/64805 * gcc.target/i386/pr64805.c: New. diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c index c0ff329..d341619 100644 --- a/gcc/ipa-inline.c +++ b/gcc/ipa-inline.c @@ -2464,6 +2464,13 @@ early_inliner (function *fun) #endif node-remove_all_references (); + /* Rebuild this reference because it dosn't depend on + function's body and it's required to pass cgraph_node + verification. */ + if (node-instrumented_version + !node-instrumentation_clone) +node-create_reference (node-instrumented_version, IPA_REF_CHKP, NULL); + /* Even when not optimizing or not inlining inline always-inline functions. */ inlined = inline_always_inline_functions (node); diff --git a/gcc/testsuite/gcc.target/i386/pr64805.c b/gcc/testsuite/gcc.target/i386/pr64805.c new file mode 100644 index 000..8ba0a97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr64805.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -fcheck-pointer-bounds -mmpx } */ + +#include stdio.h + +static inline void __attribute ((always_inline)) functionA(void) +{ + return; +} + +static inline void __attribute ((always_inline)) functionB(void) +{ + functionA(); +} + +int test(void) +{ + functionB(); + + return 0; +}
Re: [PATCH] Workaround -Wmaybe-uninitialized false positives during profiledbootstrap
On 01/27/2015 05:23 AM, DJ Delorie wrote: +/* Workaround -Wstrict-overflow false positive during profiledbootstrap. */ + +# if GCC_VERSION = 4004 +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored -Wstrict-overflow +#endif + #pragma diagnostic ignored was added in 4.4 but #pragma diagnostic push/pop wasn't added until a later release (4.6 I think). Attempts to build with 4.4 (i.e. on RHEL 6) causes warnings on most files. Hello. Thank you for pointing out, thus changing to 4006 would be the right fix? Thanks, Martin 2010-06-21 DJ Delorie d...@redhat.com * diagnostic.h (diagnostic_classification_change_t): New. (diagnostic_context): Add history and push/pop list. (diagnostic_push_diagnostics): Declare. (diagnostic_pop_diagnostics): Declare. * diagnostic.c (diagnostic_classify_diagnostic): Store changes from pragmas in a history chain instead of the global table. (diagnostic_push_diagnostics): New. (diagnostic_pop_diagnostics): New. (diagnostic_report_diagnostic): Scan history chain to find state of diagnostics as of the diagnostic location. * opts.c (set_option): Pass UNKNOWN_LOCATION to diagnostic_classify_diagnostic. (enable_warning_as_error): Likewise. * diagnostic-core.h (DK_POP): Add after real diagnostics, for use in the history chain. * doc/extend.texi: Document pragma GCC diagnostic changes.
Re: [Patch, ARM/Thumb1]Add a Thumb1 insn pattern to legalize the instruction that moves pc to low register
On Fri, Jan 9, 2015 at 7:43 AM, Terry Guo terry@arm.com wrote: -Original Message- From: Richard Earnshaw Sent: Monday, December 08, 2014 7:31 PM To: Terry Guo; gcc-patches@gcc.gnu.org Cc: Ramana Radhakrishnan Subject: Re: [Patch, ARM/Thumb1]Add a Thumb1 insn pattern to legalize the instruction that moves pc to low register On 08/12/14 08:24, Terry Guo wrote: Hi there, When compile below simple code: terguo01@terry-pc01:mtpcs-frame$ cat test.c int main(void) { return 0; } I got ICE with option -mtpcs-leaf-frame (no error if remove this option). terguo01@terry-pc01:mtpcs-frame$ /work/terguo01/tools/gcc-arm-none-eabi-5_0-2014q4/bin/arm-none-eabi- gc c -mtpcs-leaf-frame test.c -c -mcpu=cortex-m0plus -mthumb -da test.c: In function 'main': test.c:4:1: error: unrecognizable insn: } ^ (insn 20 19 21 (set (reg:SI 2 r2) (reg:SI 15 pc)) test.c:2 -1 (nil)) test.c:4:1: internal compiler error: in extract_insn, at recog.c:2327 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html\ for instructions. This RTL is generated in function thumb1_expand_prologue. The expected insn pattern is thumb1_movsi_insn in thumb1.md. And instruction like mov r2, pc is a legal instruction. Because gcc returns NO_REG for PC register, so no valid pattern to match instruction that move pc to low register. This patch intends to add a new insn pattern to legalize such thing. Tested with GCC regression test. No regression. Is it OK to trunk? BR, Terry 2014-12-08 Terry Guo terry@arm.com * config/arm/predicates.md (pc_register): New to match PC register. * config/arm/thumb1.md (*thumb1_movpc_insn): New insn pattern. gcc/testsuite/ChangeLog: 2014-12-08 Terry Guo terry@arm.com * gcc.target/arm/thumb1-mov-pc.c: New test. thumb1-move-pc-v1.txt diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index 032808c..c5ef5ed 100644 --- a/gcc/config/arm/predicates.md +++ b/gcc/config/arm/predicates.md @@ -361,6 +361,10 @@ (and (match_code smin,smax,umin,umax) (match_test mode == GET_MODE (op +(define_special_predicate pc_register + (and (match_code reg) + (match_test REGNO (op) == PC_REGNUM))) + (define_special_predicate cc_register (and (match_code reg) (and (match_test REGNO (op) == CC_REGNUM) diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index ddedc39..8e6057c 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -1780,6 +1780,16 @@ ) +(define_insn *thumb1_movpc_insn + [(set (match_operand:SI 0 low_register_operand) This needs constraints. The constraint is used now. Is this one OK? This is OK now. Ramana BR, Terry 2015-01-09 Terry Guo terry@arm.com * config/arm/thumb1.md (*thumb1_movpc_insn): New insn pattern.
[Committed] S/390: Increase register move costs for FPR-GPR moves
Hi, I've committed the attached patch which fixes a 4.8 vs 4.9/5.0 performance regression introduced with the aggressive use of FPRs as spill slots. Committed to mainline and 4.9 branch. Bye, -Andreas- 2015-01-27 Andreas Krebbel andreas.kreb...@de.ibm.com * config/s390/s390.c (s390_register_move_cost): Increase costs for FPR-GPR moves. diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 36b547d..fcde638 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -2393,16 +2393,29 @@ s390_float_const_zero_p (rtx value) /* Implement TARGET_REGISTER_MOVE_COST. */ static int -s390_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED, +s390_register_move_cost (machine_mode mode, reg_class_t from, reg_class_t to) { - /* On s390, copy between fprs and gprs is expensive as long as no - ldgr/lgdr can be used. */ - if ((!TARGET_Z10 || GET_MODE_SIZE (mode) != 8) - ((reg_classes_intersect_p (from, GENERAL_REGS) - reg_classes_intersect_p (to, FP_REGS)) - || (reg_classes_intersect_p (from, FP_REGS) - reg_classes_intersect_p (to, GENERAL_REGS + /* On s390, copy between fprs and gprs is expensive. */ + + /* It becomes somewhat faster having ldgr/lgdr. */ + if (TARGET_Z10 GET_MODE_SIZE (mode) == 8) +{ + /* ldgr is single cycle. */ + if (reg_classes_intersect_p (from, GENERAL_REGS) + reg_classes_intersect_p (to, FP_REGS)) + return 1; + /* lgdr needs 3 cycles. */ + if (reg_classes_intersect_p (to, GENERAL_REGS) + reg_classes_intersect_p (from, FP_REGS)) + return 3; +} + + /* Otherwise copying is done via memory. */ + if ((reg_classes_intersect_p (from, GENERAL_REGS) +reg_classes_intersect_p (to, FP_REGS)) + || (reg_classes_intersect_p (from, FP_REGS) + reg_classes_intersect_p (to, GENERAL_REGS))) return 10; return 1;
[Committed] S/390: Increase memory access costs
Hi, I've committed the attached patch which fixes a 4.8 vs 4.9/5.0 performance regression introduced with the aggressive use of FPRs as spill slots. Committed to mainline and 4.9 branch. Bye, -Andreas- 2015-01-27 Andreas Krebbel andreas.kreb...@de.ibm.com * config/s390/s390.c (s390_memory_move_cost): Increase costs for memory accesses. diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 1409fa8..9c67157 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -2434,7 +2434,7 @@ s390_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED, reg_class_t rclass ATTRIBUTE_UNUSED, bool in ATTRIBUTE_UNUSED) { - return 1; + return 2; } /* Compute a (partial) cost for rtx X. Return true if the complete
Re: [Patch, Fortran] PR64771 - Fix coarray ICE
Jakub Jelinek ja...@redhat.com writes: The problem is (as so often) that algorithm was included *before* config.h. Moving it after the other includes allows interface.c to compile without warnings. Why don't you use MAX macro instead of std::max as everywhere else in the gcc sources? No idea, ask Tobias :-) Anyway, the original patch would most likely have worked: system.h already includes algorithm. This one compiles just as well, of course. Rainer 2015-01-27 Rainer Orth r...@cebitec.uni-bielefeld.de * interface.c: Remove algorithm. (check_dummy_characteristics): Use MAX instead of std::max. # HG changeset patch # Parent a742f8ce2a00e481ddf92dbecaf8d1ee01448911 Avoid std::max diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -63,8 +63,6 @@ along with GCC; see the file COPYING3. formal argument list points to symbols within the same namespace as the program unit name. */ -#include algorithm /* For std::max. */ - #include config.h #include system.h #include coretypes.h @@ -1215,7 +1213,7 @@ check_dummy_characteristics (gfc_symbol } if (s1-as-type == AS_EXPLICIT) - for (i = 0; i s1-as-rank + std::max(0, s1-as-corank-1); i++) + for (i = 0; i s1-as-rank + MAX (0, s1-as-corank-1); i++) { shape1 = gfc_subtract (gfc_copy_expr (s1-as-upper[i]), gfc_copy_expr (s1-as-lower[i])); -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [Patch, Fortran] PR64771 - Fix coarray ICE
Rainer Orth wrote: Why don't you use MAX macro instead of std::max as everywhere else in the gcc sources? No idea, ask Tobias :-) No real reason - presumably, because I had MAX not in mind and thought of the general move towards standard features. Anyway, the original patch would most likely have worked: system.h already includes algorithm. It did on my system :-) This one compiles just as well, of course. From my side, that patch (using MAX) is fine. Thanks for bearing the bootstrap failure and for the patch. Tobias 2015-01-27 Rainer Orth r...@cebitec.uni-bielefeld.de * interface.c: Remove algorithm. (check_dummy_characteristics): Use MAX instead of std::max. # HG changeset patch # Parent a742f8ce2a00e481ddf92dbecaf8d1ee01448911 Avoid std::max diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -63,8 +63,6 @@ along with GCC; see the file COPYING3. formal argument list points to symbols within the same namespace as the program unit name. */ -#include algorithm /* For std::max. */ - #include config.h #include system.h #include coretypes.h @@ -1215,7 +1213,7 @@ check_dummy_characteristics (gfc_symbol } if (s1-as-type == AS_EXPLICIT) - for (i = 0; i s1-as-rank + std::max(0, s1-as-corank-1); i++) + for (i = 0; i s1-as-rank + MAX (0, s1-as-corank-1); i++) { shape1 = gfc_subtract (gfc_copy_expr (s1-as-upper[i]), gfc_copy_expr (s1-as-lower[i]));
Re: RFA: patch to fix a bad code generation for PR64110 -- new constraints addition
On 01/27/2015 09:08 AM, Richard Sandiford wrote: Yeah, but in practice that's only ever going to be a partial transition. Many port maintainers won't look at this, so we'll have to support both versions indefinitely, even if the new behaviour turns out to be the best for all cases. I just think we're going to regret having two sets of constraints with such subtly different meanings. Looking back at the original PR, Jakub said: The ! has been added by me for PR63594, so it isn't there from the era when i?86 backend was using reload. If there is a better way to express that RA should prefer to use memory or xmm register and only use r constraint if it already is in a r register and doesn't need to be reloaded, I can use that. Whether it is ?, ??? or something else. ! description in gcc docs just fitted most what I wanted... In some ways this seems to match the intention of *. Originally I think it was just an RA-only thing and was ignored by reload, but LRA does take it into account too (which sounds like progress to me). I guess we don't need '*' in many cases. It is overused. Imho, IRA should decide what class is better based on costs of alternatives and the explicit exclusion of register class by using '*' is a bad practice. Saying that I believe we should do register class preferrencing algorithm more alternative oriented. The algorithm should choose first an alternative (of may be subset of alternatives) and then register classes. I think it is more logical. It would permits us to rid off all such constraints including '*' and use only one like '?' which increases the alternative cost. In perspective it is even better to rid of '?' too and have some hook (or attribute) to get insn alternative costs which can be depended on sub-target or other run-time characteristics. Otherwise we need to duplicate insn descriptions and put different insn guards. I am going to work on this. But it is hard to say will it work well (may be I have some performance issues with this). This hook somehow (min or average of the values for all alternatives) can be used in combiner and other algorithms need an insn cost. That is how I see the solution of the problem in a long perspective. If I revert the patch locally and change the *vec_dupmode pattern to use *, it passes both the test for PR64110 and the tests for PR63594. Would that be OK as an alternative? I don't think it will work in general case. It probably works because a different class is chosen in IRA. If IRA for some reasons choose the same class, we might see the same problem in LRA. I also don't like when register classes are excluded by '*' for IRA (see my thoughts above).
Fix ICE in ipa-devirt
Hi, the two testcases show somewhat crazy layout of C++ object that goes in order base1,base2,virtual_base_of_base1 this confuses the walk in get_binfo_at_offset while looking for virtual_base_of_base1 to look into base2 instead of base1. It seems that in the case of virtual inheritance we simply want to do fully recursive search for the given binfo - it is not that expensiv ebecause bases are not many. Bootstrapped/regtested x86_64-linux. Will commit it today after rebuilding firefox. PR IPA/60871 PR IPA/64139 * tree.c (lookup_binfo_at_offset): New function. (get_binfo_at_offset): Use it. * g++.dg/torture/pr64139.C: New testcase. * g++.dg/torture/pr60871.C: Likewise. Index: tree.c === --- tree.c (revision 220142) +++ tree.c (working copy) @@ -11990,6 +11990,23 @@ type_in_anonymous_namespace_p (const_tre return (TYPE_STUB_DECL (t) !TREE_PUBLIC (TYPE_STUB_DECL (t))); } +/* Lookup sub-BINFO of BINFO of TYPE at offset POS. */ + +tree +lookup_binfo_at_offset (tree binfo, tree type, HOST_WIDE_INT pos) +{ + unsigned int i; + tree base_binfo, b; + + for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++) +if (pos == tree_to_shwi (BINFO_OFFSET (base_binfo)) +types_same_for_odr (TREE_TYPE (base_binfo), type)) + return base_binfo; +else if ((b = lookup_binfo_at_offset (base_binfo, type, pos)) != NULL) + return b; + return NULL; +} + /* Try to find a base info of BINFO that would have its field decl at offset OFFSET within the BINFO type and which is of EXPECTED_TYPE. If it can be found, return, otherwise return NULL_TREE. */ @@ -12027,42 +12044,22 @@ get_binfo_at_offset (tree binfo, HOST_WI represented in the binfo for the derived class. */ else if (offset != 0) { - tree base_binfo, binfo2 = binfo; - - /* Find BINFO corresponding to FLD. This is bit harder -by a fact that in virtual inheritance we may need to walk down -the non-virtual inheritance chain. */ - while (true) - { - tree containing_binfo = NULL, found_binfo = NULL; - for (i = 0; BINFO_BASE_ITERATE (binfo2, i, base_binfo); i++) - if (types_same_for_odr (TREE_TYPE (base_binfo), TREE_TYPE (fld))) - { - found_binfo = base_binfo; - break; - } - else - if ((tree_to_shwi (BINFO_OFFSET (base_binfo)) - - tree_to_shwi (BINFO_OFFSET (binfo))) - * BITS_PER_UNIT pos - /* Rule out types with no virtual methods or we can get confused -here by zero sized bases. */ - TYPE_BINFO (BINFO_TYPE (base_binfo)) - BINFO_VTABLE (TYPE_BINFO (BINFO_TYPE (base_binfo))) - (!containing_binfo - || (tree_to_shwi (BINFO_OFFSET (containing_binfo)) - tree_to_shwi (BINFO_OFFSET (base_binfo) - containing_binfo = base_binfo; - if (found_binfo) - { - binfo = found_binfo; - break; - } - if (!containing_binfo) - return NULL_TREE; - binfo2 = containing_binfo; - } - } + tree found_binfo = NULL, base_binfo; + int offset = (tree_to_shwi (BINFO_OFFSET (binfo)) + pos + / BITS_PER_UNIT); + + for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++) + if (tree_to_shwi (BINFO_OFFSET (base_binfo)) == offset +types_same_for_odr (TREE_TYPE (base_binfo), TREE_TYPE (fld))) + { + found_binfo = base_binfo; + break; + } + if (found_binfo) + binfo = found_binfo; + else + binfo = lookup_binfo_at_offset (binfo, TREE_TYPE (fld), offset); +} type = TREE_TYPE (fld); offset -= pos; Index: testsuite/g++.dg/torture/pr64139.C === --- testsuite/g++.dg/torture/pr64139.C (revision 0) +++ testsuite/g++.dg/torture/pr64139.C (revision 0) @@ -0,0 +1,34 @@ +// { dg-do compile } +class IObject { +public: + virtual ~IObject(); +}; +class A { + virtual int m_fn1(); +}; +class B { +public: + virtual int m_fn2(B) const; +}; +class D : IObject, public virtual B {}; +class G : public D, A { +public: + G(A); +}; +class F : B { + friend class C; +}; +class C { + void m_fn3(const IObject , int ); + void m_fn4(const B , int ); +}; +A a; +void C::m_fn3(const IObject , int p2) { + G r(a); + m_fn4(r, p2); +} +void C::m_fn4(const B p1, int ) { + F b; + p1.m_fn2(b); +} + Index: testsuite/g++.dg/torture/pr60871.C
Re: [PATCH][AArch32] Testcase fix for __ATOMIC_CONSUME
On Tue, Jan 27, 2015 at 4:06 PM, Alex Velenko alex.vele...@arm.com wrote: Hi, This patch fixes arm/atomic-op-consume.c test to expect safe LDAEX instruction to be generated when __ATOMIC_CONSUME semantics is requested. This patch was tested by running the modified test on arm-none-eabi and arm-none-linux-gnueabi compilers. Is this patch ok? Ok. Please remember James's comments in the future about cover notes. Ramana Alex 2015-01-27 Alex Velenko alex.vele...@arm.com gcc/testsuite/ * gcc.target/arm/atomic-op-consume.c (scan-assember-times): Adjust scan-assembler-times pattern. diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c index 0354717..cc6c028 100644 --- a/gcc/testsuite/gcc.target/arm/atomic-op-consume.c +++ b/gcc/testsuite/gcc.target/arm/atomic-op-consume.c @@ -5,6 +5,9 @@ #include ../aarch64/atomic-op-consume.x -/* { dg-final { scan-assembler-times ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ +/* To workaround Bugzilla 59448 issue, a request for __ATOMIC_CONSUME is always + promoted to __ATOMIC_ACQUIRE, implemented as MEMMODEL_ACQUIRE. This causes + LDAEX to be generated instead of LDREX. */ +/* { dg-final { scan-assembler-times ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ /* { dg-final { scan-assembler-times strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\] 6 } } */ /* { dg-final { scan-assembler-not dmb } } */
[PATCH][AArch64] Testcase fix for __ATOMIC_CONSUME
Hi, This patch fixes aarch64/atomic-op-consume.c test to expect safe LDAXR instruction to be generated when __ATOMIC_CONSUME semantics is requested. This patch was tested by running the modified test on aarch64-none-elf compiler. Is this patch ok? Alex 2015-01-27 Alex Velenko alex.vele...@arm.com gcc/testsuite/ * gcc.target/aarch64/atomic-op-consume.c (scan-assember-times): Adjust scan-assembler-times pattern. diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c index 38d6c2c..7ece5b1 100644 --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c @@ -3,5 +3,8 @@ #include atomic-op-consume.x -/* { dg-final { scan-assembler-times ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\] 6 } } */ +/* To workaround Bugzilla 59448 issue, a request for __ATOMIC_CONSUME is always + promoted to __ATOMIC_ACQUIRE, implemented as MEMMODEL_ACQUIRE. This causes + LDAXR to be generated instead of LDXR. */ +/* { dg-final { scan-assembler-times ldaxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\] 6 } } */ /* { dg-final { scan-assembler-times stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\] 6 } } */
[PATCH][AArch64][test][committed] Fix FAIL: gcc.target/aarch64/store-pair-1.c scan-assembler stp\tw[0-9]+, w[0-9]+
Hi all, I notice this test fails on aarch64-none-elf because the scan-assembler scans for wnum registers when one of them can be the wzr reg since we store a 0 into *a. This patch updates the pattern that is scanned for. Committed as obvious with r220176. Thanks, Kyrill 2015-01-27 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/store-pair-1.c: Update scan-assembler to check for wzr reg.diff --git a/gcc/testsuite/gcc.target/aarch64/store-pair-1.c b/gcc/testsuite/gcc.target/aarch64/store-pair-1.c index a726d64..a90fc61 100644 --- a/gcc/testsuite/gcc.target/aarch64/store-pair-1.c +++ b/gcc/testsuite/gcc.target/aarch64/store-pair-1.c @@ -9,4 +9,4 @@ int f(int *a, int b) } /* We should be able to produce store pair for the store of 28/29 store. */ -/* { dg-final { scan-assembler stp\tw\[0-9\]+, w\[0-9\]+ } } */ +/* { dg-final { scan-assembler stp\tw(\[0-9\]+)\|(zr), w\[0-9\]+ } } */
[PATCH] PR jit/64780: configure: --enable-host-shared and the jit
Currently the jit requires you to specify --enable-host-shared, or the build eventually fails with linker errors (this is something of a FAQ for people trying out the jit). We seem to have two choices here: (A) default to --enable-host-shared when jit is an enabled language (B) have the toplevel configure reject jit as language if --enable-host-shared is not supplied. FWIW apparently Darwin defaults to position-independent code, so it's not explicitly needed there. I think (B) is the better option for us, since there is a performance cost: there are people who perform benchmarking of GCC (and publish their results on prominent websites). If they turn on the jit and use the same configuration to do their benchmarking of the rest of GCC, they'll see GCC 5 be apparently slower than earlier releases. This is sufficiently subtle that I don't think it's reasonable to simply document it and expect 3rd-party reviewers to see such a note in the documentation before benchmarking. The attached patch implements (B), with a note in the error message recommending that people configure and build GCC twice to avoid the performance hit, so that it can be self-documenting. Tested by hand with various combinations of values for --enable-host-shared and --enable-languages. OK for stage 4? PR jit/64780 * configure.ac: Require the user to explicitly specify --enable-host-shared if the jit is enabled. * configure: Regenerate. --- configure| 24 configure.ac | 24 2 files changed, 48 insertions(+) diff --git a/configure b/configure index 5860241..dd794db 100755 --- a/configure +++ b/configure @@ -14750,6 +14750,30 @@ fi +# PR jit/64780: Require the user to explicitly specify +# --enable-host-shared if the jit is enabled, hinting +# that they might want to do a separate configure/build of +# the jit, to avoid users from slowing down the rest of the +# compiler by enabling the jit. +if test ${host_shared} = no ; then + case ${enable_languages} in +*jit*) + as_fn_error +Enabling language \jit\ requires --enable-host-shared. + +--enable-host-shared typically slows the rest of the compiler down by +a few %, so you must explicitly enable it. + +If you want to build both the jit and the regular compiler, it is often +best to do this via two separate configure/builds, in separate +directories, to avoid imposing the performance cost of +--enable-host-shared on the regular compiler. $LINENO 5 + ;; +*) + ;; + esac +fi + # Specify what files to not compare during bootstrap. compare_exclusions=gcc/cc*-checksum\$(objext) | gcc/ada/*tools/* diff --git a/configure.ac b/configure.ac index 267c8e6..4ea5e00 100644 --- a/configure.ac +++ b/configure.ac @@ -3467,6 +3467,30 @@ AC_ARG_ENABLE(host-shared, [host_shared=$enableval], [host_shared=no]) AC_SUBST(host_shared) +# PR jit/64780: Require the user to explicitly specify +# --enable-host-shared if the jit is enabled, hinting +# that they might want to do a separate configure/build of +# the jit, to avoid users from slowing down the rest of the +# compiler by enabling the jit. +if test ${host_shared} = no ; then + case ${enable_languages} in +*jit*) + AC_MSG_ERROR([ +Enabling language jit requires --enable-host-shared. + +--enable-host-shared typically slows the rest of the compiler down by +a few %, so you must explicitly enable it. + +If you want to build both the jit and the regular compiler, it is often +best to do this via two separate configure/builds, in separate +directories, to avoid imposing the performance cost of +--enable-host-shared on the regular compiler.]) + ;; +*) + ;; + esac +fi + # Specify what files to not compare during bootstrap. compare_exclusions=gcc/cc*-checksum\$(objext) | gcc/ada/*tools/* -- 1.8.5.3
Re: RFA: patch to fix a bad code generation for PR64110 -- new constraints addition
Vladimir Makarov vmaka...@redhat.com writes: On 01/27/2015 09:08 AM, Richard Sandiford wrote: Yeah, but in practice that's only ever going to be a partial transition. Many port maintainers won't look at this, so we'll have to support both versions indefinitely, even if the new behaviour turns out to be the best for all cases. I just think we're going to regret having two sets of constraints with such subtly different meanings. Looking back at the original PR, Jakub said: The ! has been added by me for PR63594, so it isn't there from the era when i?86 backend was using reload. If there is a better way to express that RA should prefer to use memory or xmm register and only use r constraint if it already is in a r register and doesn't need to be reloaded, I can use that. Whether it is ?, ??? or something else. ! description in gcc docs just fitted most what I wanted... In some ways this seems to match the intention of *. Originally I think it was just an RA-only thing and was ignored by reload, but LRA does take it into account too (which sounds like progress to me). I guess we don't need '*' in many cases. It is overused. Imho, IRA should decide what class is better based on costs of alternatives and the explicit exclusion of register class by using '*' is a bad practice. Saying that I believe we should do register class preferrencing algorithm more alternative oriented. The algorithm should choose first an alternative (of may be subset of alternatives) and then register classes. I think it is more logical. It would permits us to rid off all such constraints including '*' and use only one like '?' which increases the alternative cost. In perspective it is even better to rid of '?' too and have some hook (or attribute) to get insn alternative costs which can be depended on sub-target or other run-time characteristics. Otherwise we need to duplicate insn descriptions and put different insn guards. I am going to work on this. But it is hard to say will it work well (may be I have some performance issues with this). This hook somehow (min or average of the values for all alternatives) can be used in combiner and other algorithms need an insn cost. That is how I see the solution of the problem in a long perspective. Definitely agree that it'd be better to remove these constraints in favour of a new attribute. preferred_for_size and preferred_for_speed give something similar, though they're much more stringent than what we need here. If I revert the patch locally and change the *vec_dupmode pattern to use *, it passes both the test for PR64110 and the tests for PR63594. Would that be OK as an alternative? I don't think it will work in general case. It probably works because a different class is chosen in IRA. If IRA for some reasons choose the same class, we might see the same problem in LRA. But isn't that the point of '*'? It should stop IRA from using the 'r' alternative as an indication that 'r' is a good choice for this instruction. If IRA chooses 'r' anyway, it must be because other instructions that use the same allocno strongly prefer 'r'. And in those some circumstances -- i.e. if IRA does choose 'r' despite the constraints in this instruction -- then I think we do want to use the 'r' alternative. And AIUI that's also what the new constraint is designed to do. If IRA chooses 'r' anyway, the new constraint causes LRA to prefer the 'r' alternative _even if_ another operand (the destination) has to be reloaded, which is the fundamental difference between the new constraint and '!'. So I'm still not sure why '*' wouldn't do what we want. I also don't like when register classes are excluded by '*' for IRA (see my thoughts above). Understood, and I agree it would be good to move to attributes. But in a way, I think that's an even better reason to try to avoid adding these new constraints. It sounds like we're hoping to get rid of them as soon as we've added them :-) Thanks, Richard
Re: [ping] Re: proper name of i386/x86-64/etc targets
On Tue, Jan 27, 2015 at 2:56 AM, Sandra Loosemore san...@codesourcery.com wrote: On 01/20/2015 12:02 PM, H.J. Lu wrote: On Tue, Jan 20, 2015 at 10:51 AM, Eric Botcazou ebotca...@adacore.com wrote: Ping? Any thoughts? x86 for the family and x86-32/x86-64 for the 2 architectures? Works for me. [redirecting from gcc@ to gcc-patches@] OK, here is a patch that attempts to implement that convention. I'd appreciate review from a target maintainer to check that I've correctly disambiguated places where i386 was referring to both 32- and 64-bit variants vs 32-bit only. I've left alone some instances of i386 where it seemed appropriate to name a specific processor -- e.g. there are a bunch of examples in the inline asm section that are described as i386 code. If this is OK to commit, I will follow it up with another patch to re-alphabetize the renamed sections (i386 whatever to x86 whatever). Trying to do both the renaming and the shuffling in a single patch would have made it impossible to review the actual changes to content. When I was working on this I also realized that some of the x86-specific material in extend.texi really needs copy-editing; again, best to do that in a separate patch. -@node i386 and x86-64 Options -@subsection Intel 386 and AMD x86-64 Options +@node x86 Options +@subsection x86 Options @cindex i386 Options -@cindex x86-64 Options +@cindex x86 Options +@cindex IA-32 Options @cindex Intel 386 Options @cindex AMD x86-64 Options Let's go all the way and remove all but @cindex x86 Options. -These @samp{-m} options are defined for the i386 and x86-64 family of -computers: +These @samp{-m} options are defined for the x86 family of computers, +including both x86-32 (IA-32 and Intel 386) and AMD x86-64: Also here. ... the x86 family of computers.. Without the including ... part. -@node i386 and x86-64 Windows Options -@subsection i386 and x86-64 Windows Options -@cindex i386 and x86-64 Windows Options +@node x86 Windows Options +@subsection x86 Windows Options +@cindex x86 Windows Options +@cindex i386 Windows Options +@cindex Intel 386 Windows Options +@cindex AMD x86-64 Windows Options +@cindex Windows Options for x86 IMO, all but @cindex x86 Windows Options should be removed. Others LGTM. Thanks, Uros.