Re: [PATCH] Builtin function roundeven folding implementation
Hi. Here is a clean patch that does not fold roundeven when the result is of
integer type, and in which the conditions for folding round/ceil/floor/roundeven
and trunc only check for a signaling NaN. This patch also conforms to GNU
coding style and standards. The fact that it should be CASE_MATHFN_FLOATN
rather than CASE_MATHFN for roundeven is corrected in the i386 expansion patch.

Thanks,
Tejas

gcc/ChangeLog:

2019-06-12  Tejas Joshi

	* builtins.c (mathfn_built_in_2): Added CASE_MATHFN for ROUNDEVEN.
	* builtins.def: Added function definitions for roundeven function
	variants.
	* fold-const-call.c (fold_const_call_ss): Added case for roundeven
	function call.
	* fold-const.c (negate_mathfn_p): Added case for roundeven function.
	(tree_call_nonnegative_warnv_p): Added case for roundeven function.
	(integer_valued_real_call_p): Added case for roundeven function.
	* real.c (is_even): New function. Returns true if real number is
	even, otherwise returns false.
	(is_halfway_below): New function. Returns true if real number is
	halfway between two integers, else returns false.
	(real_roundeven): New function. Round real number to nearest
	integer, rounding halfway cases towards even.
	* real.h (real_value): Added descriptive comments. Added function
	declaration for roundeven function.

gcc/testsuite/ChangeLog:

2019-06-12  Tejas Joshi

	* gcc.dg/torture/builtin-round-roundeven.c: New test.
	* gcc.dg/torture/builtin-round-roundevenf128.c: New test.

On Sat, 10 Aug 2019 at 02:15, Joseph Myers wrote:
>
> On Fri, 28 Jun 2019, Tejas Joshi wrote:
>
> > +    CASE_CFN_ROUNDEVEN:
> > +    CASE_CFN_ROUNDEVEN_FN:
> > +      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
>
> Checking flag_errno_math here does not make sense.
> roundeven never sets errno (at least, TS 18661-1 makes it
> implementation-defined whether sNaN input counts as a domain error, but
> I'm not aware of implementations that make it a domain error and set
> errno, and typically GCC follows glibc in such cases in the absence of
> known implementations requiring a different approach).
>
> The only case where you need to avoid folding is where the argument is a
> signaling NaN (it's fine to fold for quiet NaNs). In that case, you need
> to avoid folding to avoid losing an exception (if the user cares about
> signaling NaNs, they probably care about exceptions) - so it still
> doesn't matter whether the library implementation also sets errno or not.
>
> (Yes, this means the existing ceil / floor / round checks should be
> adjusted just to check for signaling NaN, though that's fairly cosmetic
> as calls to those functions with quiet NaN argument still get folded via
> match.pd. trunc ought also to check for signaling NaN rather than
> folding unconditionally, so all those functions should end up with the
> same conditions for folding.)
>
> > @@ -898,6 +907,10 @@ fold_const_call_ss (wide_int *result, combined_fn fn,
> >        return fold_const_conversion (result, real_round, arg,
> >                                      precision, format);
> >
> > +    CASE_CFN_ROUNDEVEN:
> > +    CASE_CFN_ROUNDEVEN_FN:
> > +      return fold_const_conversion (result, real_roundeven, arg,
> > +                                    precision, format);
> > +
>
> This is the version of fold_const_call_ss for functions returning a
> result of integer type; roundeven returns an integer value in a
> floating-point type. I don't think this code should be there, and I
> don't think this version of the function should be called at all for
> roundeven.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e2ba356c0d3..8ceb077b0bf 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2056,6 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
     CASE_MATHFN (REMQUO)
     CASE_MATHFN_FLOATN (RINT)
     CASE_MATHFN_FLOATN (ROUND)
+    CASE_MATHFN (ROUNDEVEN)
     CASE_MATHFN (SCALB)
     CASE_MATHFN (SCALBLN)
     CASE_MATHFN (SCALBN)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 6d41bdb4f44..8bb7027aac7 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -542,12 +542,18 @@ DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, AT
 #define RINT_TYPE(F) BT_FN_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_RINT, "rint", RINT_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #undef RINT_TYPE
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVEN, "roundeven", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENF, "roundevenf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENL, "roundevenl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUND, "round", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDF, "roundf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDL, "roundl",
[Committed] PR fortran/87991 -- Pointer data object requires a target
Committed as obvious.

2019-08-13  Steven G. Kargl

	PR fortran/87991
	* resolve.c (check_data_variable): A data-stmt-object with the
	pointer attribute requires a data-stmt-value with the target
	attribute.

2019-08-13  Steven G. Kargl

	PR fortran/87991
	* gfortran.dg/pr87991.f90: New test.

--
Steve

Index: gcc/fortran/resolve.c
===================================================================
--- gcc/fortran/resolve.c	(revision 274398)
+++ gcc/fortran/resolve.c	(working copy)
@@ -15726,8 +15726,6 @@ check_data_variable (gfc_data_variable *var, locus *wh
       return false;
     }

-  has_pointer = sym->attr.pointer;
-
   if (gfc_is_coindexed (e))
     {
       gfc_error ("DATA element %qs at %L cannot have a coindex", sym->name,
@@ -15735,19 +15733,30 @@ check_data_variable (gfc_data_variable *var, locus *wh
       return false;
     }

+  has_pointer = sym->attr.pointer;
+
   for (ref = e->ref; ref; ref = ref->next)
     {
       if (ref->type == REF_COMPONENT && ref->u.c.component->attr.pointer)
	has_pointer = 1;

-      if (has_pointer
-	  && ref->type == REF_ARRAY
-	  && ref->u.ar.type != AR_FULL)
-	{
-	  gfc_error ("DATA element %qs at %L is a pointer and so must "
-		     "be a full array", sym->name, where);
-	  return false;
-	}
+      if (has_pointer)
+	{
+	  if (ref->type == REF_ARRAY && ref->u.ar.type != AR_FULL)
+	    {
+	      gfc_error ("DATA element %qs at %L is a pointer and so must "
+			 "be a full array", sym->name, where);
+	      return false;
+	    }
+
+	  if (values.vnode->expr->expr_type == EXPR_CONSTANT)
+	    {
+	      gfc_error ("DATA object near %L has the pointer attribute "
+			 "and the corresponding DATA value is not a valid "
+			 "initial-data-target", where);
+	      return false;
+	    }
+	}
     }

   if (e->rank == 0 || has_pointer)
Index: gcc/testsuite/gfortran.dg/pr87991.f90
===================================================================
--- gcc/testsuite/gfortran.dg/pr87991.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr87991.f90	(working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-w" }
+! PR fortran/87991
+program p
+   type t
+      character(:), pointer :: c
+   end type
+   type(t) :: x
+   allocate (character(3) :: x%c)
+   data x%c /'abc'/ ! { dg-error "has the pointer attribute" }
+end
Re: enforce canonicalization of value_range's
On 8/12/19 7:32 PM, Jeff Law wrote:
> On 8/12/19 12:48 PM, Aldy Hernandez wrote:
>> This is a ping of:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-07/msg8.html
>>
>> I have addressed the comments Jeff mentioned there.  This patch goes
>> before the VR_VARYING + types patch, but I can adapt either one to come
>> first.  I can even munge them together into one patch, if it helps
>> review.
>>
>> Tested on x86-64 Linux.
>>
>> OK (assuming ChangeLog entries)?
>> Aldy
>>
>> curr.patch
>>
>> commit ea908bdbfc8cdb4bb63e7d42630d01203edbac41
>> Author: Aldy Hernandez
>> Date:   Mon Jul 15 18:09:27 2019 +0200
>>
>>     Enforce canonicalization in value_range.
>
> Per your other message, I'll assume you'll write a suitable ChangeLog
> entry before committing :-)

Attached patch has ChangeLog entries.

>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>> index 594ee9adc17..de2f39d8487 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -69,23 +69,20 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "builtins.h"
>>  #include "wide-int-range.h"
>>
>> +static bool
>> +ranges_from_anti_range (const value_range_base *ar,
>> +			value_range_base *vr0, value_range_base *vr1,
>> +			bool handle_pointers = false);
>> +
>
> Presumably this was better than moving the implementation earlier.

Actually, it was for ease of review.  I made some changes to the function,
and I didn't want the reviewer to miss them because I had moved the
function wholesale.  I can move the function earlier, after we agree on
the changes (see below).

>> +/* Normalize symbolics into constants.  */
>> +
>> +value_range_base
>> +value_range_base::normalize_symbolics () const
>> +{
>> +  if (varying_p () || undefined_p ())
>> +    return *this;
>> +  tree ttype = type ();
>> +  bool min_symbolic = !is_gimple_min_invariant (min ());
>> +  bool max_symbolic = !is_gimple_min_invariant (max ());
>
> Sadly "symbolic" is a particularly poor term.  When I think symbolics, I
> think SYMBOL_REF, LABEL_REF and the tree equivalents.  They're *symbols*.
> Symbolics in this case are really things like SSA_NAMEs.  Confused the
> hell out of me briefly.

Sigh, yeah.
Unfortunately, symbolics is what VRP calls anything that is not an
INTEGER_CST (that is, PLUS_EXPR and SSA_NAMEs, etc.).

> If we weren't on a path to kill VRP I'd probably suggest a distinct
> effort to constify this code.  Some of the changes were a bit confusing
> when it looked like we'd dropped a call to set the range of an object.
> But those were just local copies, so setting the type/min/max directly
> was actually fine.  constification would make this a bit clearer.  But
> again, I don't think it's worth the effort given the long term
> trajectory for tree-vrp.c.

I shouldn't be introducing any new confusion.  Did I add any new methods
that should've been const that aren't?  I can't see any.  I'm happy to
fix anything I introduced.

> So where does the handle_pointers stuff matter?  I'm a bit surprised we
> have to do anything special for them.

I've learned to touch as little of VRP as is necessary, as changing
anything to be more consistent breaks things in unexpected ways ;-).  In
this particular case, TYPE_MIN_VALUE and TYPE_MAX_VALUE are not defined
for pointers, and I didn't want to change the meaning of vrp_val_{min,max}
throughout.  I was trying to minimize the changes to existing behavior.
If it bothers you too much, we could remove it as a follow up when we are
sure there are no expected side-effects from the rest of the patch.

> OK.  I don't expect the answers to the minor questions above will
> ultimately change anything.

I could appreciate a final nod before I commit.  And even then, I will
wait until the other patch is approved and commit them simultaneously.
They are a bit intertwined.

Thanks for your prompt reviews.
Aldy

commit 39be54a2e0ef19b3fb1cc3e51b380a6811e50ab4
Author: Aldy Hernandez
Date:   Mon Jul 15 18:09:27 2019 +0200

    Enforce canonicalization in value_range.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 93e600d68f2..3fa864624c9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,37 @@
+2019-08-13  Aldy Hernandez
+
+	* tree-vrp.c (value_range_base::set): Merge in code from
+	value_range_base::set_and_canonicalize.
+	Enforce canonicalization at set time.
+	Normalize [MIN, MAX] into VARYING and ~[MIN, MAX] into UNDEFINED.
+	(value_range_base::set_undefined): Inline call to set().
+	(value_range_base::set_varying): Same.
+	(value_range_base::singleton_p): Handle VR_ANTI_RANGEs.
+	(vrp_val_max): New argument handle_pointers.
+	(vrp_val_min): Same.
+	(ranges_from_anti_range): Same.
+	(extract_range_into_wide_ints): Use tree argument instead of sign
+	and precision.
+	(extract_range_from_multiplicative_op): Take in tree type instead
+	of precision and sign.  Adapt function for canonicalized ranges.
+	(extract_range_from_binary_expr): Pass type to
+	extract_range_from_multiplicative_op.
+	Adapt for canonicalized ranges.
+	(extract_range_from_unary_expr): Same.
+
Re: types for VR_VARYING
On 8/12/19 7:46 PM, Jeff Law wrote:
> On 8/12/19 12:43 PM, Aldy Hernandez wrote:
>> This is a fresh re-post of:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html
>>
>> Andrew gave me some feedback a week ago, and I obviously don't remember
>> what it was because I was about to leave on PTO.  However, I do
>> remember I addressed his concerns before getting drunk on rum in
>> tropical islands.
>
> FWIW found a great coffee infused rum while in Kauai last week.  I'm
> not a coffee fan, but it was wonderful.  The one bottle we brought back
> isn't going to last until Cauldron and I don't think I can get a
> special order filled before I leave :(

You must bring some to Cauldron before we believe you. :)

Their more traditional rums were good as well.  Koloa Rum.  See if you
can get it locally :-)

>> This patch adds MIN/MAX values for VR_VARYING.  As was discussed
>> earlier, we are only changing VR_VARYING, not VR_UNDEFINED as was the
>> original plan.
>>
>> As I mentioned earlier, I am tired of re-doing ChangeLogs, so I'll
>> whip up the changelog when the review starts.
>
> ACK.

ChangeLog entries included in attached patch.

>> @@ -150,12 +166,11 @@ vr_values::set_defs_to_varying (gimple *stmt)
>>    ssa_op_iter i;
>>    tree def;
>>    FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
>> -    {
>> -      value_range *vr = get_value_range (def);
>> -      /* Avoid writing to vr_const_varying get_value_range may return.  */
>> -      if (!vr->varying_p ())
>> -	vr->set_varying ();
>> -    }
>> +    if (value_range_base::supports_type_p (TREE_TYPE (def)))
>> +      {
>> +	value_range *vr = get_value_range (def);
>> +	vr->set_varying (TREE_TYPE (def));
>> +      }
>>  }
>
> Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
> can live with it in the short term, but it really feels like there
> should be something in the ipa-cp client that avoids this silliness.

I am not happy with this either, but there are various places where
statements that are !stmt_interesting_for_vrp() are still setting a range
of VARYING, which is then being ignored at a later time.
For example, vrp_initialize:

      if (!stmt_interesting_for_vrp (phi))
	{
	  tree lhs = PHI_RESULT (phi);
	  set_def_to_varying (lhs);
	  prop_set_simulate_again (phi, false);
	}

Also in evrp_range_analyzer::record_ranges_from_stmt(), where, if the
statement is interesting for VRP but extract_range_from_stmt() does not
produce a useful range, we also set a varying for a range we will never
use.  Similarly for a statement that is not interesting in this hunk.

Then there is vrp_prop::visit_stmt() where we also set VARYING for types
that VRP will never handle:

    case IFN_ADD_OVERFLOW:
    case IFN_SUB_OVERFLOW:
    case IFN_MUL_OVERFLOW:
    case IFN_ATOMIC_COMPARE_EXCHANGE:
      /* These internal calls return _Complex integer type,
	 which VRP does not track, but the immediate uses
	 thereof might be interesting.  */
      if (lhs && TREE_CODE (lhs) == SSA_NAME)
	{
	  imm_use_iterator iter;
	  use_operand_p use_p;
	  enum ssa_prop_result res = SSA_PROP_VARYING;
	  set_def_to_varying (lhs);

I've adjusted the patch so that set_def_to_varying will set the range to
VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't really
do anything with a nonsensical range.  I just don't want to leave the
range in an indeterminate state.

>> /* Update the value range and equivalence set for variable VAR to
>> @@ -1920,7 +1935,7 @@ vr_values::dump_all_value_ranges (FILE *file)
>>  vr_values::vr_values () : vrp_value_range_pool ("Tree VRP value ranges")
>>  {
>>    values_propagated = false;
>> -  num_vr_values = num_ssa_names;
>> +  num_vr_values = num_ssa_names + num_ssa_names / 10;
>>    vr_value = XCNEWVEC (value_range *, num_vr_values);
>>    vr_phi_edge_counts = XCNEWVEC (int, num_ssa_names);
>>    bitmap_obstack_initialize (&vrp_equiv_obstack);
>
> My recollection was someone (Richi) proposed 2X for the initial
> allocation, which should ensure that in all but the most pathological
> cases we never trigger a reallocation.  In terms of "wasted" space, it
> shouldn't matter in practice.
> So if you could check the old thread and verify that Richi recommended
> the 2X initial allocation and if so, make that minor update.

Yes, it was 2X.

I noticed that Richi made some changes to the lattice handling for
VARYING while the discussion was on-going.  I missed these, and had
failed to adapt the patch for it.  I would appreciate a final review of
the attached patch, especially the vr-values.c changes, which I have
modified to play nice with current trunk.

I also noticed that Andrew's patch was setting num_vr_values to
num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
num_vr_values / 10.  Please verify the current incantation makes sense.

Tested on x86-64 Linux.

OK for trunk?
Aldy

commit 4c21b0f667176933f67469372da7459b8cf0ef71
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
On Tue, 13 Aug 2019, Robin Dapp wrote:

> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>       (if (cst && !TREE_OVERFLOW (cst))
>        (plus { cst; } @0
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
> +#if GIMPLE
> +  (simplify
> +    (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> +      (if (INTEGRAL_TYPE_P (type)
> +	   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +       /* We actually want to perform two simplifications here:

Maybe two "transformations"?  Since (1) is the exact opposite of the
other simplification you are adding...  It would have been nice if it
could indeed be split into 2 simplifications, in particular because we
already have a better (?) version of (2) not far above in the file.  But
we cannot add (1) and I can't think of a good way to reuse the existing
version of (2) :-(

> +	  (1) (T)(A + CST1) + CST2 --> (T)(A) + (T)(CST1)
> +	      If for (A + CST1) we either do not care about overflow (e.g.
> +	      for a signed inner type) or the overflow is ok for an
> +	      unsigned inner type.

From the code, it seems like the condition is really "If we know that no
overflow can occur, either because it would be UB (signed type), or
thanks to range information".  Maybe that's the meaning of what you
wrote, but I had trouble interpreting it.

> +	  (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2)
> +	      If (CST1 + CST2) does not overflow and we do care about
> +	      overflow (for a signed outer type) or we do not care about
> +	      overflow in an unsigned outer type.  */
> +       (with
> +	{
> +	  tree inner_type = TREE_TYPE (@0);
> +	  wide_int wmin0, wmax0;
> +	  wide_int cst1 = wi::to_wide (@1);
> +
> +	  wi::overflow_type min_ovf = wi::OVF_OVERFLOW,

There seems to be an inconsistency between spaces and TABs depending on
the lines.

> +			    max_ovf = wi::OVF_OVERFLOW;
> +
> +	  /* Get overflow behavior.  */
> +	  bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
> +	  bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
> +
> +	  /* Get value range of A.  */
> +	  enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
> +
> +	  /* If we have a proper range, determine min and max overflow
> +	     of (A + CST1).

At some point we may want to split this into a helper function
operation_may_overflow or something, that takes an arbitrary mix of
SSA_NAME and constants.  Not necessary for this patch though.

> +	     ??? We might also want to handle anti ranges.  */
> +	  if (vr0 == VR_RANGE)
> +	    {
> +	      wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
> +	      wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
> +	    }
> +
> +	  /* Inner overflow does not matter in this case.  */
> +	  if (ovf_undef_inner)
> +	    {
> +	      min_ovf = wi::OVF_NONE;
> +	      max_ovf = wi::OVF_NONE;
> +	    }

Seems that you could switch the conditions, to avoid unnecessary
additions:

  if (ovf_undef_inner) ...
  else if (vr0 == VR_RANGE) ...

Maybe even move get_range_info then.

> +	  /* Extend CST from INNER_TYPE to TYPE.  */
> +	  cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

wide_int::from?  It looks strange to use an arbitrary variable to access
a static function, but that's a style preference, it is ok as is.

> +	  /* Check for overflow of (TYPE)(CST1 + CST2).  */
> +	  wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
> +	  wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
> +				  &outer_ovf);
> +
> +	  /* We *do* care about an overflow here as we do not want to
> +	     introduce new undefined behavior that was not there before.  */

There is always the possibility to cast to an unsigned type, perform the
operation, then cast back, like the "other (2)" does.  But that makes
the transformation even longer, and can lose some information.

> +	  if (ovf_undef_outer && outer_ovf)
> +	    {
> +	      /* Set these here to prevent the final conversion below
> +		 to take place instead of introducing a new guard
> +		 variable.  */
> +	      min_ovf = wi::OVF_OVERFLOW;
> +	      max_ovf = wi::OVF_OVERFLOW;
> +	    }
> +	}
> +	(if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +	 (plus (convert @0) { wide_int_to_tree (type, cst); }
> +	 )
> +#endif
> +
> +/* ((T)(A)) + CST -> (T)(A + CST) */

The other transformation is a true simplification.  This one is more a
canonicalization.  It does make sense to perform the operation in the
narrower type, I just hope we have enough machinery to revert this later
for platforms that do not like arithmetic in sub-word size.  Or probably
this is still restricted enough (doesn't narrow all operations) that it
doesn't matter.

> +#if
Re: [PATCH] fold more string comparison with known result (PR 90879)
On 8/13/19 3:43 PM, Martin Sebor wrote:
> On 8/13/19 2:07 PM, Jeff Law wrote:
>> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>>
>>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>>> longer string and a smaller array
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>>	PR tree-optimization/90879
>>>	* c.opt (-Wstring-compare): New option.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>	PR tree-optimization/90879
>>>	* gcc.dg/Wstring-compare-2.c: New test.
>>>	* gcc.dg/Wstring-compare.c: New test.
>>>	* gcc.dg/strcmpopt_3.c: Scan the optimized dump instead of strlen.
>>>	* gcc.dg/strcmpopt_6.c: New test.
>>>	* gcc.dg/strlenopt-65.c: Remove unnecessary declarations, add
>>>	test cases.
>>>	* gcc.dg/strlenopt-66.c: Run it.
>>>	* gcc.dg/strlenopt-68.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>>	PR tree-optimization/90879
>>>	* builtins.c (check_access): Avoid using maxbound when null.
>>>	* calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>>	change.
>>>	* doc/invoke.texi (-Wstring-compare): Document new warning option.
>>>	* gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>>	conditional.
>>>	(get_range_strlen): Overwrite initial maxbound when non-null.
>>>	* gimple-ssa-sprintf.c (get_string_length): Adjust to
>>>	get_range_strlen change.
>>>	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>>	(used_only_for_zero_equality): New function.
>>>	(handle_builtin_memcmp): Call it.
>>>	(determine_min_objsize): Return an integer instead of tree.
>>>	(get_len_or_size, strxcmp_eqz_result): New functions.
>>>	(maybe_warn_pointless_strcmp): New function.
>>>	(handle_builtin_string_cmp): Call it.  Fold zero-equality of
>>>	strcmp between a longer string and a smaller array.
>>>
>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>> index 4af47855e7c..31e012b741b 100644
>>> --- a/gcc/tree-ssa-strlen.c
>>> +++ b/gcc/tree-ssa-strlen.c
>>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>>    type = TYPE_MAIN_VARIANT (type);
>>>
>>> -  /* We cannot determine the size of the array if it's a flexible
>>> -     array, which is declared at the end of a structure.  */
>>> -  if (TREE_CODE (type) == ARRAY_TYPE
>>> -      && !array_at_struct_end_p (dest))
>>> +  /* The size of a flexible array cannot be determined.  Otherwise,
>>> +     for arrays with more than one element, return the size of its
>>> +     type.  GCC itself misuses arrays of both zero and one elements
>>> +     as flexible array members so they are excluded as well.  */
>>> +  if (TREE_CODE (type) != ARRAY_TYPE
>>> +      || !array_at_struct_end_p (dest))
>>>      {
>>> -      tree size_t = TYPE_SIZE_UNIT (type);
>>> -      if (size_t && TREE_CODE (size_t) == INTEGER_CST
>>> -	  && !integer_zerop (size_t))
>>> -	return size_t;
>>> +      tree type_size = TYPE_SIZE_UNIT (type);
>>> +      if (type_size && TREE_CODE (type_size) == INTEGER_CST
>>> +	  && !integer_onep (type_size)
>>> +	  && !integer_zerop (type_size))
>>> +	return tree_to_uhwi (type_size);
>>
>> So I nearly commented on this when looking at the original patch.  Can
>> we really depend on the size when we've got an array at the end of a
>> struct with a declared size other than 0/1?  While 0/1 are by far the
>> most common way to declare them, couldn't someone have used other
>> sizes?  I think we pondered doing that at one time to cut down on the
>> noise from Coverity for RTL and TREE operand accessors.
>>
>> Your code makes us safer, so I'm not saying you've done anything
>> wrong, just trying to decide if we need to tighten this up even
>> further.
>
> This patch issues a warning in these cases, i.e., when it sees a call
> like, say, strcmp("foobar", A) with an A that's smaller than the
> string, because it seems they are likely (rare) bugs.  I haven't seen
> the warning in any of the projects I tested it with (Binutils/GDB,
> GCC, Glibc, the Linux kernel, and LLVM).
>
> The warning uses strcmp to detect these mistakes (or misuses) but I'd
> like to add similar warnings for other string functions as well and
> have code out there that does this on purpose use true flexible array
> members (or the zero-length extension) instead.  That makes the intent
> clear.
>
> It's a judgment call whether to also fold (or do something else like
> insert a trap) in addition to issuing a warning.  In this case
> (reading) I don't think it matters as much as it does for writes.
> Either way, it would be nice to set a policy and document it in the
> manual so users know what to expect and so we don't have to revisit
> this question for each patch that touches on this subject.

The GCC manual documents zero length arrays at the end of an aggregate as
a GNU extension for variable length objects.  The manual also documents
that it could be done with single element arrays, but that doing so does
contribute to the base
Re: [PATCH] fold more string comparison with known result (PR 90879)
On 8/13/19 2:07 PM, Jeff Law wrote:
> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>
>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>> longer string and a smaller array
>>
>> gcc/c-family/ChangeLog:
>>
>>	PR tree-optimization/90879
>>	* c.opt (-Wstring-compare): New option.
>>
>> gcc/testsuite/ChangeLog:
>>
>>	PR tree-optimization/90879
>>	* gcc.dg/Wstring-compare-2.c: New test.
>>	* gcc.dg/Wstring-compare.c: New test.
>>	* gcc.dg/strcmpopt_3.c: Scan the optimized dump instead of strlen.
>>	* gcc.dg/strcmpopt_6.c: New test.
>>	* gcc.dg/strlenopt-65.c: Remove unnecessary declarations, add
>>	test cases.
>>	* gcc.dg/strlenopt-66.c: Run it.
>>	* gcc.dg/strlenopt-68.c: New test.
>>
>> gcc/ChangeLog:
>>
>>	PR tree-optimization/90879
>>	* builtins.c (check_access): Avoid using maxbound when null.
>>	* calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>	change.
>>	* doc/invoke.texi (-Wstring-compare): Document new warning option.
>>	* gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>	conditional.
>>	(get_range_strlen): Overwrite initial maxbound when non-null.
>>	* gimple-ssa-sprintf.c (get_string_length): Adjust to
>>	get_range_strlen change.
>>	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>	(used_only_for_zero_equality): New function.
>>	(handle_builtin_memcmp): Call it.
>>	(determine_min_objsize): Return an integer instead of tree.
>>	(get_len_or_size, strxcmp_eqz_result): New functions.
>>	(maybe_warn_pointless_strcmp): New function.
>>	(handle_builtin_string_cmp): Call it.  Fold zero-equality of
>>	strcmp between a longer string and a smaller array.
>>
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index 4af47855e7c..31e012b741b 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>    type = TYPE_MAIN_VARIANT (type);
>>
>> -  /* We cannot determine the size of the array if it's a flexible
>> -     array, which is declared at the end of a structure.  */
>> -  if (TREE_CODE (type) == ARRAY_TYPE
>> -      && !array_at_struct_end_p (dest))
>> +  /* The size of a flexible array cannot be determined.  Otherwise,
>> +     for arrays with more than one element, return the size of its
>> +     type.  GCC itself misuses arrays of both zero and one elements
>> +     as flexible array members so they are excluded as well.  */
>> +  if (TREE_CODE (type) != ARRAY_TYPE
>> +      || !array_at_struct_end_p (dest))
>>      {
>> -      tree size_t = TYPE_SIZE_UNIT (type);
>> -      if (size_t && TREE_CODE (size_t) == INTEGER_CST
>> -	  && !integer_zerop (size_t))
>> -	return size_t;
>> +      tree type_size = TYPE_SIZE_UNIT (type);
>> +      if (type_size && TREE_CODE (type_size) == INTEGER_CST
>> +	  && !integer_onep (type_size)
>> +	  && !integer_zerop (type_size))
>> +	return tree_to_uhwi (type_size);
>
> So I nearly commented on this when looking at the original patch.  Can
> we really depend on the size when we've got an array at the end of a
> struct with a declared size other than 0/1?  While 0/1 are by far the
> most common way to declare them, couldn't someone have used other
> sizes?  I think we pondered doing that at one time to cut down on the
> noise from Coverity for RTL and TREE operand accessors.
>
> Your code makes us safer, so I'm not saying you've done anything wrong,
> just trying to decide if we need to tighten this up even further.

This patch issues a warning in these cases, i.e., when it sees a call
like, say, strcmp("foobar", A) with an A that's smaller than the string,
because it seems they are likely (rare) bugs.  I haven't seen the
warning in any of the projects I tested it with (Binutils/GDB, GCC,
Glibc, the Linux kernel, and LLVM).

The warning uses strcmp to detect these mistakes (or misuses) but I'd
like to add similar warnings for other string functions as well and have
code out there that does this on purpose use true flexible array members
(or the zero-length extension) instead.  That makes the intent clear.

It's a judgment call whether to also fold (or do something else like
insert a trap) in addition to issuing a warning.  In this case (reading)
I don't think it matters as much as it does for writes.  Either way, it
would be nice to set a policy and document it in the manual so users
know what to expect and so we don't have to revisit this question for
each patch that touches on this subject.

Martin
[PATCH 3/3] [og9] Wait on queue-full condition in AMD GCN libgomp offloading plugin
This patch lets the AMD GCN libgomp plugin wait for asynchronous queues to
have some space to push new operations when they are full, rather than
just erroring out immediately on that condition.  This fixes the
libgomp.oacc-c-c++-common/da-4.c test.

Julian

ChangeLog

libgomp/
	* plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
	condition.
---
 libgomp/ChangeLog.openacc   |  5 +
 libgomp/plugin/plugin-gcn.c | 11 +--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 2a9a7f18ca2..f9d8e6ecd39 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-08-13  Julian Brown
+
+	* plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
+	condition.
+
 2019-08-13  Julian Brown

 	* plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 65690e643ed..099f70b647c 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1416,8 +1416,15 @@ queue_push_callback (struct goacc_asyncqueue *aq, void (*fn)(void *),
 		     void *data)
 {
   if (aq->queue_n == ASYNC_QUEUE_SIZE)
-    GOMP_PLUGIN_fatal ("Async thread %d:%d: error: queue overflowed",
-		       aq->agent->device_id, aq->id);
+    {
+      pthread_mutex_lock (&aq->mutex);
+
+      /* Queue is full.  Wait for it to not be full.  */
+      while (aq->queue_n == ASYNC_QUEUE_SIZE)
+	pthread_cond_wait (&aq->queue_cond_out, &aq->mutex);
+
+      pthread_mutex_unlock (&aq->mutex);
+    }

   pthread_mutex_lock (&aq->mutex);
--
2.22.0
[PATCH 2/3] [og9] Use temporary buffers for async host2dev copies
In libgomp, host-to-device transfers are instigated in several places
where the source data is either on the stack, or in an unstable heap
location (i.e. which is immediately freed after performing the
host-to-device transfer).  When the transfer is asynchronous, this means
that taking the address of the source data and attempting the copy from
that at some later point is extremely likely to fail.

A previous fix for this problem (from our internal branch, and included
with the AMD GCN offloading patches) attempted to separate transfers from
the stack (performing them immediately) from transfers from the heap
(which can safely be done some time later).  Unfortunately that doesn't
work well with more recent changes to libgomp and the GCN plugin.

So instead, this patch copies the source data for asynchronous
host-to-device copies immediately to a temporary buffer, then the
transfer to the device can safely take place asynchronously some time
later.

Julian

ChangeLog

libgomp/
	* plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
	(copy_data): Free temporary buffer if using.
	(queue_push_copy): Add using_src_copy parameter.
	(GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls
	to queue_push_copy.
	(GOMP_OFFLOAD_async_host2dev): Likewise.  Allocate temporary buffer
	and copy source data to it immediately.
	* target.c (gomp_copy_host2dev): Update function comment.
	(copy_host2dev_immediate): Remove.
	(gomp_map_pointer, gomp_map_vars_internal): Replace calls to
	copy_host2dev_immediate with calls to gomp_copy_host2dev.
---
 libgomp/ChangeLog.openacc   | 14 ++
 libgomp/plugin/plugin-gcn.c | 20 ++
 libgomp/target.c            | 56 +++--
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 2279545f361..2a9a7f18ca2 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,17 @@
+2019-08-13  Julian Brown
+
+	* plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
+ (copy_data): Free temporary buffer if using. + (queue_push_copy): Add using_src_copy parameter. + (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to + queue_push_copy. + (GOMP_OFFLOAD_async_host2dev): Likewise. Allocate temporary buffer and + copy source data to it immediately. + * target.c (gomp_copy_host2dev): Update function comment. + (copy_host2dev_immediate): Remove. + (gomp_map_pointer, gomp_map_vars_internal): Replace calls to + copy_host2dev_immediate with calls to gomp_copy_host2dev. + 2019-08-08 Julian Brown * plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index a41568b3306..65690e643ed 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3063,6 +3063,7 @@ struct copy_data const void *src; size_t len; bool use_hsa_memory_copy; + bool using_src_copy; struct goacc_asyncqueue *aq; }; @@ -3077,12 +3078,14 @@ copy_data (void *data_) hsa_fns.hsa_memory_copy_fn (data->dst, data->src, data->len); else memcpy (data->dst, data->src, data->len); + if (data->using_src_copy) +free ((void *) data->src); free (data); } static void queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, -size_t len, bool use_hsa_memory_copy) +size_t len, bool use_hsa_memory_copy, bool using_src_copy) { if (DEBUG_QUEUES) HSA_DEBUG ("queue_push_copy %d:%d: %zu bytes from (%p) to (%p)\n", @@ -3093,6 +3096,7 @@ queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, data->src = src; data->len = len; data->use_hsa_memory_copy = use_hsa_memory_copy; + data->using_src_copy = using_src_copy; data->aq = aq; queue_push_callback (aq, copy_data, data); } @@ -3137,7 +3141,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) { struct agent_info *agent = get_agent_info (device); maybe_init_omp_async (agent); - queue_push_copy (agent->omp_async_queue, dst, src, n, false); + queue_push_copy 
(agent->omp_async_queue, dst, src, n, false, false); return true; } @@ -3469,7 +3473,15 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, image_address_p (agent, dst)); + /* The source data does not necessarily remain live until the deferred + copy happens. Taking a snapshot of the data here avoids reading + uninitialised data later, but means that (a) data is copied twice and + (b) modifications to the copied data between the "spawning" point of + the
[PATCH 1/3] [og9] Wait at end of OpenACC asynchronous kernels regions
This patch provides a workaround for unreliable operation of asynchronous kernels regions on AMD GCN. At present, kernels regions are decomposed into a series of parallel regions surrounded by a data region capturing the data-movement clauses needed by the region as a whole: #pragma acc kernels async(n) { ... } is translated to: #pragma acc data copyin(...) copyout(...) { #pragma acc parallel async(n) present(...) { ... } #pragma acc parallel async(n) present(...) { ... } } This is however problematic for two reasons: - Variables mapped by the data clause will be unmapped immediately at the end of the data region, regardless of whether the inner asynchronous parallels have completed. (This causes crashes for GCN.) - Even if the "present" clause caused the reference count to stay above zero at the end of the data region -- which it doesn't -- the "present" clauses on the inner parallel regions would not cause "copyout" variables to be transferred back to the host at the appropriate time, i.e. when the async parallel region had completed. There is no "async" data construct in OpenACC, so the correct solution (which I am deferring on for now) is probably to use asynchronous "enter data" and "exit data" directives when translating asynchronous kernels regions instead. The attached patch just adds a "wait" operation before the end of the enclosing data region. This works, but introduces undesirable synchronisation with the host. Julian ChangeLog gcc/ * omp-oacc-kernels.c (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. 
--- gcc/ChangeLog.openacc | 7 +++ gcc/omp-oacc-kernels.c | 28 +--- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc index 84d80511603..a22f07c817c 100644 --- a/gcc/ChangeLog.openacc +++ b/gcc/ChangeLog.openacc @@ -1,3 +1,10 @@ +2019-08-13 Julian Brown + + * omp-oacc-kernels.c (add_wait): New function, split out of... + (add_async_clauses_and_wait): ...here. Call new outlined function. + (decompose_kernels_region_body): Add wait at the end of + explicitly-asynchronous kernels regions. + 2019-08-08 Julian Brown * config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c index 20913859c12..a6c4220f472 100644 --- a/gcc/omp-oacc-kernels.c +++ b/gcc/omp-oacc-kernels.c @@ -900,6 +900,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body, return body; } +static void +add_wait (location_t loc, gimple_seq *region_body) +{ + /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ + tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); + tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); + gimple *wait_call = gimple_build_call (wait_fn, 2, + sync_arg, integer_zero_node); + gimple_set_location (wait_call, loc); + gimple_seq_add_stmt (region_body, wait_call); +} + /* Helper function of decompose_kernels_region_body. The statements in REGION_BODY are expected to be decomposed parallel regions; add an "async" clause to each. Also add a "wait" pragma at the end of the @@ -923,13 +935,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body) gimple_omp_target_set_clauses (as_a (stmt), target_clauses); } - /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). 
*/ - tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); - tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); - gimple *wait_call = gimple_build_call (wait_fn, 2, - sync_arg, integer_zero_node); - gimple_set_location (wait_call, loc); - gimple_seq_add_stmt (region_body, wait_call); + add_wait (loc, region_body); } /* Auxiliary analysis of the body of a kernels region, to determine for each @@ -1378,6 +1384,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses) a wait directive at the end. */ if (async_clause == NULL) add_async_clauses_and_wait (loc, _body); + else +/* !!! If we have asynchronous parallel blocks inside a (synchronous) data + region, then target memory will get unmapped at the point the data + region ends, even if the inner asynchronous parallels have not yet + completed. For kernels marked "async", we might want to use "enter data + async(...)" and "exit data async(...)" instead. + For now, insert a (synchronous) wait at the end of the block. */ +add_wait (loc, _body); tree kernels_locals = gimple_bind_vars (as_a (kernels_body)); gimple *body
[PATCH 0/3] [og9] OpenACC async fixes for AMD GCN
These patches stabilise async support for AMD GCN. Several tests that previously failed (some intermittently) now work. Further commentary is provided alongside each patch. Tested with offloading to AMD GCN. I will apply shortly to the openacc-gcc-9-branch. Thanks, Julian Julian Brown (3): [og9] Wait at end of OpenACC asynchronous kernels regions [og9] Use temporary buffers for async host2dev copies [og9] Wait on queue-full condition in AMD GCN libgomp offloading plugin gcc/ChangeLog.openacc | 7 + gcc/omp-oacc-kernels.c | 28 ++- libgomp/ChangeLog.openacc | 19 + libgomp/plugin/plugin-gcn.c | 31 libgomp/target.c| 56 +++-- 5 files changed, 94 insertions(+), 47 deletions(-) -- 2.22.0
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
On 8/13/19 2:42 PM, Robin Dapp wrote: >> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ >> Do you want to handle MINUS? What about POINTER_PLUS_EXPR? > > When I last attempted this patch I had the MINUS still in it but got > confused easily by needing to think of too many cases at once leading to > lots of stupid mistakes. Hence, I left it out in the last revision :) > Once everything is in place, I guess the additional cases can be added > more easily - hope that's reasonable. It's fine by me. Richi has the final say though... jeff
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ > Do you want to handle MINUS? What about POINTER_PLUS_EXPR? When I last attempted this patch I had the MINUS still in it but got confused easily by needing to think of too many cases at once leading to lots of stupid mistakes. Hence, I left it out in the last revision :) Once everything is in place, I guess the additional cases can be added more easily - hope that's reasonable. Regards Robin
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
On 8/13/19 2:33 AM, Robin Dapp wrote: > We would like to simplify code like > (larger_type)(var + const1) + const2 > to > (larger_type)(var + combined_const1_const2) > when we know that no overflow happens. > --- > gcc/match.pd | 101 +++ > 1 file changed, 101 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 0317bc704f7..94400529ad8 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (cst && !TREE_OVERFLOW (cst)) > (plus { cst; } @0 > > +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ Do you want to handle MINUS? What about POINTER_PLUS_EXPR? Richi and Marc are better suited to review the guts of this change than I am. I though we had these transformations in fold-const.c and I was going to suggest removing them as a part of this patch. *But* I can't seem to find them. Jeff
Re: [PATCH 2/2] Add more entries to the C++ get_std_name_hint array
On 8/13/19 9:36 AM, Jonathan Wakely wrote: This adds some commonly-used C++11/14 names, and some new C++17/20 names. The latter aren't available when using the -std=gnu++14 default, so the fix-it suggesting to use a newer dialect is helpful. * name-lookup.c (get_std_name_hint): Add more entries. Tested x86_64-linux. OK for trunk? OK. Jason
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On 8/13/19 9:32 AM, Jonathan Wakely wrote: * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. I'm not comfortable removing this test coverage entirely. Doesn't it give a useful diagnostic in C++11 mode as well? Jason
Re: [PATCH] fold more string comparison with known result (PR 90879)
On 8/9/19 10:51 AM, Martin Sebor wrote: > > PR tree-optimization/90879 - fold zero-equality of strcmp between a longer > string and a smaller array > > gcc/c-family/ChangeLog: > > PR tree-optimization/90879 > * c.opt (-Wstring-compare): New option. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/90879 > * gcc.dg/Wstring-compare-2.c: New test. > * gcc.dg/Wstring-compare.c: New test. > * gcc.dg/strcmpopt_3.c: Scan the optimized dump instead of strlen. > * gcc.dg/strcmpopt_6.c: New test. > * gcc.dg/strlenopt-65.c: Remove unnecessary declarations, add > test cases. > * gcc.dg/strlenopt-66.c: Run it. > * gcc.dg/strlenopt-68.c: New test. > > gcc/ChangeLog: > > PR tree-optimization/90879 > * builtins.c (check_access): Avoid using maxbound when null. > * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change. > * doc/invoke.texi (-Wstring-compare): Document new warning option. > * gimple-fold.c (get_range_strlen_tree): Make setting maxbound > conditional. > (get_range_strlen): Overwrite initial maxbound when non-null. > * gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen > change. > * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same. > (used_only_for_zero_equality): New function. > (handle_builtin_memcmp): Call it. > (determine_min_objsize): Return an integer instead of tree. > (get_len_or_size, strxcmp_eqz_result): New functions. > (maybe_warn_pointless_strcmp): New function. > (handle_builtin_string_cmp): Call it. Fold zero-equality of strcmp > between a longer string and a smaller array. > > diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c > index 4af47855e7c..31e012b741b 100644 > --- a/gcc/tree-ssa-strlen.c > +++ b/gcc/tree-ssa-strlen.c > @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest) > >type = TYPE_MAIN_VARIANT (type); > > - /* We cannot determine the size of the array if it's a flexible array, > - which is declared at the end of a structure. 
*/ > - if (TREE_CODE (type) == ARRAY_TYPE > - && !array_at_struct_end_p (dest)) > + /* The size of a flexible array cannot be determined. Otherwise, > + for arrays with more than one element, return the size of its > + type. GCC itself misuses arrays of both zero and one elements > + as flexible array members so they are excluded as well. */ > + if (TREE_CODE (type) != ARRAY_TYPE > + || !array_at_struct_end_p (dest)) > { > - tree size_t = TYPE_SIZE_UNIT (type); > - if (size_t && TREE_CODE (size_t) == INTEGER_CST > - && !integer_zerop (size_t)) > -return size_t; > + tree type_size = TYPE_SIZE_UNIT (type); > + if (type_size && TREE_CODE (type_size) == INTEGER_CST > + && !integer_onep (type_size) > + && !integer_zerop (type_size)) > +return tree_to_uhwi (type_size); So I nearly commented on this when looking at the original patch. Can we really depend on the size when we've got an array at the end of a struct with a declared size other than 0/1? While 0/1 are by far the most common way to declare them, couldn't someone have used other sizes? I think we pondered doing that at one time to cut down on the noise from Coverity for RTL and TREE operand accessors. Your code makes us safer, so I'm not saying you've done anything wrong, just trying to decide if we need to tighten this up even further. No additional comments beyond what I pointed out yesterday against the original patch. Jeff >
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
On Tue, Aug 13, 2019 at 8:20 AM Jeff Law wrote: > > On 8/12/19 6:27 AM, Richard Biener wrote: > > On Fri, 9 Aug 2019, Uros Bizjak wrote: > > > >> On Fri, Aug 9, 2019 at 3:00 PM Richard Biener wrote: > >> > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI > "TARGET_AVX512F"]) > > and then we need to split DImode for 32bits, too. > >>> > >>> For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode > >>> condition, I'll provide _doubleword splitter later. > >> > >> Shouldn't that be TARGET_AVX512VL instead? Or does the insn use > >> %g0 etc. > >> to force use of %zmmN? > > > > It generates V4SI mode, so - yes, AVX512VL. > > case SMAX: > case SMIN: > case UMAX: > case UMIN: > if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) > || (mode == SImode && !TARGET_SSE4_1)) > return false; > > so there's no way to use AVX512VL for 32bit? > >>> > >>> There is a way, but on 32bit targets, we need to split DImode > >>> operation to a sequence of SImode operations for unconverted pattern. > >>> This is of course doable, but somehow more complex than simply > >>> emitting a DImode compare + DImode cmove, which is what current > >>> splitter does. So, a follow-up task. > >> > >> Please find attached the complete .md part that enables SImode for > >> TARGET_SSE4_1 and DImode for TARGET_AVX512VL for both, 32bit and 64bit > >> targets. The patterns also allows for memory operand 2, so STV has > >> chance to create the vector pattern with implicit load. In case STV > >> fails, the memory operand 2 is loaded to the register first; operand > >> 2 is used in compare and cmove instruction, so pre-loading of the > >> operand should be beneficial. > > > > Thanks. > > > >> Also note, that splitting should happen rarely. Due to the cost > >> function, STV should effectively always convert minmax to a vector > >> insn. 
> > > > I've analyzed the 464.h264ref slowdown on Haswell and it is due to > > this kind of "simple" conversion: > > > > 5.50 │1d0: test %esi,%es > > 0.07 │ mov$0x0,%ex > >│ cmovs %eax,%es > > 5.84 │ imul %r8d,%es > > > > to > > > > 0.65 │1e0: vpxor %xmm0,%xmm0,%xmm0 > > 0.32 │ vpmaxs -0x10(%rsp),%xmm0,%xmm0 > > 40.45 │ vmovd %xmm0,%eax > > 2.45 │ imul %r8d,%eax > > > > which looks like a RA artifact in the end. We spill %esi only > > with -mstv here as STV introduces a (subreg:V4SI ...) use > > of a pseudo ultimatively set from di. STV creates an additional > > pseudo for this (copy-in) but it places that copy next to the > > original def rather than next to the start of the chain it > > converts which is probably the issue why we spill. And this > > is because it inserts those at each definition of the pseudo > > rather than just at the reaching definition(s) or at the > > uses of the pseudo in the chain (that because there may be > > defs of that pseudo in the chain itself). Note that STV emits > > such "conversion" copies as simple reg-reg moves: > > > > (insn 1094 3 4 2 (set (reg:SI 777) > > (reg/v:SI 438 [ y ])) "refbuf.i":4:1 -1 > > (nil)) > > > > but those do not prevail very long (this one gets removed by CSE2). > > So IRA just sees the (subreg:V4SI (reg/v:SI 438 [ y ]) 0) use > > and computes > > > > r438: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS > > a297(r438,l0) costs: SSE_REGS:5628,5628 MEM:3618,3618 > > > > so I wonder if STV shouldn't instead emit gpr->xmm moves > > here (but I guess nothing again prevents RTL optimizers from > > combining that with the single-use in the max instruction...). > > > > So this boils down to STV splitting live-ranges but other > > passes undoing that and then RA not considering splitting > > live-ranges here, arriving at unoptimal allocation. 
> > > > A testcase showing this issue is (simplified from 464.h264ref > > UMVLine16Y_11): > > > > unsigned short > > UMVLine16Y_11 (short unsigned int * Pic, int y, int width) > > { > > if (y != width) > > { > > y = y < 0 ? 0 : y; > > return Pic[y * width]; > > } > > return Pic[y]; > > } > > > > where the condition and the Pic[y] load mimics the other use of y. > > Different, even worse spilling is generated by > > > > unsigned short > > UMVLine16Y_11 (short unsigned int * Pic, int y, int width) > > { > > y = y < 0 ? 0 : y; > > return Pic[y * width] + y; > > } >
[Darwin, committed] There is no need to distinguish PIC/non-PIC symbol stubs.
As per $SUBJECT, We can use a single flag for both pic and non-pic. Rename this now, before a confusing name gets into the wild. Other than the name, this is NFC. tested on i686/powerpc/x86-64 darwin, x86_64-linux-gnu, powerpc64-linux-gnu applied to mainline. thanks Iain gcc/ 2019-08-13 Iain Sandoe * config/darwin.c (machopic_indirect_call_target): Rename symbol stub flag. (darwin_override_options): Likewise. * config/darwin.h: Likewise. * config/darwin.opt: Likewise. * config/i386/i386.c (output_pic_addr_const): Likewise. * config/rs6000/darwin.h: Likewise. * config/rs6000/rs6000.c (rs6000_call_darwin_1): Likewise. * config/i386/darwin.h (TARGET_MACHO_PICSYM_STUBS): Rename to ... ... this TARGET_MACHO_SYMBOL_STUBS. (FUNCTION_PROFILER):Likewise. * config/i386/i386.h: Likewise. gcc/testsuite/ 2019-08-13 Iain Sandoe * obj-c++.dg/stubify-1.mm: Rename symbol stub option. * obj-c++.dg/stubify-2.mm: Likewise. * objc.dg/stubify-1.m: Likewise. * objc.dg/stubify-2.m: Likewise. diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c index 5ac092583b..fdd23c49bb 100644 --- a/gcc/config/darwin.c +++ b/gcc/config/darwin.c @@ -788,7 +788,7 @@ machopic_indirect_data_reference (rtx orig, rtx reg) rtx machopic_indirect_call_target (rtx target) { - if (! darwin_picsymbol_stubs) + if (! darwin_symbol_stubs) return target; if (GET_CODE (target) != MEM) @@ -3268,13 +3268,13 @@ darwin_override_options (void) Linkers that don't need stubs, don't need the EH symbol markers either. */ - if (!global_options_set.x_darwin_picsymbol_stubs) + if (!global_options_set.x_darwin_symbol_stubs) { if (darwin_target_linker) { if (strverscmp (darwin_target_linker, MIN_LD64_OMIT_STUBS) < 0) { - darwin_picsymbol_stubs = true; + darwin_symbol_stubs = true; ld_needs_eh_markers = true; } } @@ -3283,15 +3283,15 @@ darwin_override_options (void) /* If we don't know the linker version and we're targeting an old system, we know no better than to assume the use of an earlier linker. 
*/ - darwin_picsymbol_stubs = true; + darwin_symbol_stubs = true; ld_needs_eh_markers = true; } } - else if (DARWIN_X86 && darwin_picsymbol_stubs && TARGET_64BIT) + else if (DARWIN_X86 && darwin_symbol_stubs && TARGET_64BIT) { inform (input_location, "%<-mpic-symbol-stubs%> is not required for 64b code (ignored)"); - darwin_picsymbol_stubs = false; + darwin_symbol_stubs = false; } if (generating_for_darwin_version >= 9) diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h index ab5ad03f17..93dc638f8b 100644 --- a/gcc/config/darwin.h +++ b/gcc/config/darwin.h @@ -1019,7 +1019,7 @@ extern void darwin_driver_init (unsigned int *,struct cl_decoded_option **); _tested_ version known to support this so far. */ #define MIN_LD64_NO_COAL_SECTS "236.4" -/* From at least version 62.1, ld64 can build PIC indirection stubs as +/* From at least version 62.1, ld64 can build symbol indirection stubs as needed, and there is no need for the compiler to emit them. */ #define MIN_LD64_OMIT_STUBS "62.1" diff --git a/gcc/config/darwin.opt b/gcc/config/darwin.opt index d7e5e7bfca..7f5616cbe0 100644 --- a/gcc/config/darwin.opt +++ b/gcc/config/darwin.opt @@ -75,9 +75,9 @@ mone-byte-bool Target RejectNegative Report Var(darwin_one_byte_bool) Set sizeof(bool) to 1. -mpic-symbol-stubs -Target Report Var(darwin_picsymbol_stubs) Init(0) -Force generation of PIC symbol stubs. +msymbol-stubs +Target Report Var(darwin_symbol_stubs) Init(0) +Force generation of external symbol indirection stubs. ; Some code-gen may be improved / adjusted if the linker is sufficiently modern. mtarget-linker= diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h index 84afedccfb..385d253692 100644 --- a/gcc/config/i386/darwin.h +++ b/gcc/config/i386/darwin.h @@ -91,8 +91,8 @@ along with GCC; see the file COPYING3. If not see #define WCHAR_TYPE_SIZE 32 /* Generate pic symbol indirection stubs if this is true. 
*/ -#undef TARGET_MACHO_PICSYM_STUBS -#define TARGET_MACHO_PICSYM_STUBS (darwin_picsymbol_stubs) +#undef TARGET_MACHO_SYMBOL_STUBS +#define TARGET_MACHO_SYMBOL_STUBS (darwin_symbol_stubs) /* For compatibility with OSX system tools, use the new style of pic stub if this is set (default). */ @@ -243,7 +243,7 @@ along with GCC; see the file COPYING3. If not see #undef FUNCTION_PROFILER #define FUNCTION_PROFILER(FILE, LABELNO) \ do { \ -if (TARGET_MACHO_PICSYM_STUBS \ +if (TARGET_MACHO_SYMBOL_STUBS \
Re: [PATCH] PR fortran/89647 -- Allow host associated procedure to be a binding target
On Tue, Aug 13, 2019 at 09:52:01AM +0300, Janne Blomqvist wrote: > On Tue, Aug 13, 2019 at 1:32 AM Steve Kargl > wrote: > > > > The attached patch fixes PR fortran/89647, and has been > > regression tested on x86_64-*-freebsd. > > > > If a procedure is made accessible by host association, then > > that procedure can be used as a binding target. During the > > resolution of a type-bound procedure, gfortran needs to > > check the parent namespace for a procedure before it fails. > > > > 2019-08-12 Steven G. Kargl > > > > PR fortran/89647 > > * resolve.c (resolve_typebound_procedure): Allow host associated > > procedure to be a binding target. While here, wrap long line. > > > > 2019-08-12 Steven G. Kargl > > > > PR fortran/89647 > > * gfortran.dg/pr89647.f90: New test. > > > > OK to commit? > > Ok, thanks. Committed to trunk. Committed to 9-branch after successful regression testing. -- Steve
[patch, fortran, committed] Fix PR 90563, error while warning about do subscripts
Hi, I just committed as simple and obvious the patch below. This fixes an 8/9/10 regression where a false positive with -Wdo-subscript (which we know about) was compounded by an invalid error. Fixed by suppressing errors at the right time. I will also commit the patch to the other affected branches. Regards Thomas 2019-08-13 Thomas Koenig PR fortran/90563 * frontend-passes.c (insert_index): Suppress errors while simplifying the resulting expression. 2019-08-13 Thomas Koenig PR fortran/90563 * gfortran.dg/do_subscript_5.f90: New test. Index: testsuite/gfortran.dg/do_subscript_5.f90 === --- testsuite/gfortran.dg/do_subscript_5.f90 (Revision 274394) +++ testsuite/gfortran.dg/do_subscript_5.f90 (Arbeitskopie) @@ -1,4 +1,5 @@ ! { dg-do compile } +! { dg-additional-options "-Wdo-subscript" } ! PR 90563 - this used to be rejected, wrongly ! Original test case by Tobias Neumann program test @@ -9,9 +10,11 @@ p = 0.0 - do j=1,6 + ! The following warnings are actually bogus, but we are not yet + ! clever enough to suppress them. + do j=1,6 ! { dg-warning "out of bounds" } if (j<5) then - p(j) = p(swap(j)) + p(j) = p(swap(j)) ! { dg-warning "out of bounds" } endif enddo end program ! { dg-do compile } ! { dg-additional-options "-Wdo-subscript" } ! PR 90563 - this used to be rejected, wrongly ! Original test case by Tobias Neumann program test implicit none integer, parameter :: swap(4) = [2,1,3,4] real :: p(20) integer :: j p = 0.0 ! The following warnings are actually bogus, but we are not yet ! clever enough to suppress them. do j=1,6 ! { dg-warning "out of bounds" } if (j<5) then p(j) = p(swap(j)) ! { dg-warning "out of bounds" } endif enddo end program
Re: Use checking forms of DECL_FUNCTION_CODE (PR 91421)
On 8/13/19 3:36 AM, Richard Sandiford wrote: > We were shoe-horning all built-in enumerations (including frontend > and target-specific ones) into a field of type built_in_function. This > was accessed as either an lvalue or an rvalue using DECL_FUNCTION_CODE. > > The obvious danger with this (as was noted by several ??? comments) > is that the ranges have nothing to do with each other, and targets can > easily have more built-in functions than generic code. But my patch to > make the field bigger was the straw that finally made the problem visible. > > This patch therefore: > > - replaces the field with a plain unsigned int > > - turns DECL_FUNCTION_CODE into an rvalue-only accessor that checks > that the function really is BUILT_IN_NORMAL > > - adds corresponding DECL_MD_FUNCTION_CODE and DECL_FE_FUNCTION_CODE > accessors for BUILT_IN_MD and BUILT_IN_FRONTEND respectively > > - adds DECL_UNCHECKED_FUNCTION_CODE for places that need to access the > underlying field (should be low-level code only) > > - adds new helpers for setting the built-in class and function code > > - makes DECL_BUILT_IN_CLASS an rvalue-only accessor too, since all > assignments should go through the new helpers > > Tested on aarch64-linux-gnu and x86_64-linux-gnu (--enable-languages=all > --enable-host-shared). OK to install? > > Richard > > > 2019-08-13 Richard Sandiford > > gcc/ > PR middle-end/91421 > * tree-core.h (function_decl::function_code): Change type to > unsigned int. > * tree.h (DECL_FUNCTION_CODE): Rename old definition to... > (DECL_UNCHECKED_FUNCTION_CODE): ...this. > (DECL_BUILT_IN_CLASS): Make an rvalue macro only. > (DECL_FUNCTION_CODE): New function. Assert that the built-in class > is BUILT_IN_NORMAL. > (DECL_MD_FUNCTION_CODE, DECL_FE_FUNCTION_CODE): New functions. > (set_decl_built_in_function, copy_decl_built_in_function): Likewise. > (fndecl_built_in_p): Change the type of the "name" argument to > unsigned int. 
> * builtins.c (expand_builtin): Move DECL_FUNCTION_CODE use > after check for DECL_BUILT_IN_CLASS. > * cgraphclones.c (build_function_decl_skip_args): Use > set_decl_built_in_function. > * ipa-param-manipulation.c (ipa_modify_formal_parameters): Likewise. > * ipa-split.c (split_function): Likewise. > * langhooks.c (add_builtin_function_common): Likewise. > * omp-simd-clone.c (simd_clone_create): Likewise. > * tree-streamer-in.c (unpack_ts_function_decl_value_fields): Likewise. > * config/darwin.c (darwin_init_cfstring_builtins): Likewise. > (darwin_fold_builtin): Use DECL_MD_FUNCTION_CODE instead of > DECL_FUNCTION_CODE. > * fold-const.c (operand_equal_p): Compare DECL_UNCHECKED_FUNCTION_CODE > instead of DECL_FUNCTION_CODE. > * lto-streamer-out.c (hash_tree): Use DECL_UNCHECKED_FUNCTION_CODE > instead of DECL_FUNCTION_CODE. > * tree-streamer-out.c (pack_ts_function_decl_value_fields): Likewise. > * print-tree.c (print_node): Use DECL_MD_FUNCTION_CODE when > printing DECL_BUILT_IN_MD. Handle DECL_BUILT_IN_FRONTEND. > * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin) > (aarch64_fold_builtin, aarch64_gimple_fold_builtin): Use > DECL_MD_FUNCTION_CODE instead of DECL_FUNCTION_CODE. > * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise. > * config/alpha/alpha.c (alpha_expand_builtin, alpha_fold_builtin): > (alpha_gimple_fold_builtin): Likewise. > * config/arc/arc.c (arc_expand_builtin): Likewise. > * config/arm/arm-builtins.c (arm_expand_builtin): Likewise. > * config/avr/avr-c.c (avr_resolve_overloaded_builtin): Likewise. > * config/avr/avr.c (avr_expand_builtin, avr_fold_builtin): Likewise. > * config/bfin/bfin.c (bfin_expand_builtin): Likewise. > * config/c6x/c6x.c (c6x_expand_builtin): Likewise. > * config/frv/frv.c (frv_expand_builtin): Likewise. > * config/gcn/gcn.c (gcn_expand_builtin_1): Likewise. > (gcn_expand_builtin): Likewise. > * config/i386/i386-builtins.c (ix86_builtin_reciprocal): Likewise. > (fold_builtin_cpu): Likewise. 
> * config/i386/i386-expand.c (ix86_expand_builtin): Likewise. > * config/i386/i386.c (ix86_fold_builtin): Likewise. > (ix86_gimple_fold_builtin): Likewise. > * config/ia64/ia64.c (ia64_fold_builtin): Likewise. > (ia64_expand_builtin): Likewise. > * config/iq2000/iq2000.c (iq2000_expand_builtin): Likewise. > * config/mips/mips.c (mips_expand_builtin): Likewise. > * config/msp430/msp430.c (msp430_expand_builtin): Likewise. > * config/nds32/nds32-intrinsic.c (nds32_expand_builtin_impl): Likewise. > * config/nios2/nios2.c (nios2_expand_builtin): Likewise. > * config/nvptx/nvptx.c (nvptx_expand_builtin): Likewise. > * config/pa/pa.c (pa_expand_builtin): Likewise. >
Re: [PATCH] PR fortran/87993 -- An array can have a kind type inquiry suffix
On Tue, Aug 13, 2019 at 09:53:34AM +0300, Janne Blomqvist wrote: > On Tue, Aug 13, 2019 at 3:35 AM Steve Kargl > wrote: > > > > The attached patch has been regression tested on x86_64-*-freebsd. > > It probably borders on obvious, but I'll ask away. OK to commit? > > > > 2019-08-12 Steven G. Kargl > > > > PR fortran/87993 > > * expr.c (gfc_simplify_expr): Simplification of an array with a kind > > type inquiry suffix yields a constant expression. > > > > 2019-08-12 Steven G. Kargl > > > > PR fortran/87993 > > * gfortran.dg/pr87993.f90: New test. > > Ok. > Committed to trunk. Committed to 9-branch after successful regression testing. -- Steve
Re: Optimise constant IFN_WHILE_ULTs
On 8/13/19 4:54 AM, Richard Sandiford wrote: > This patch is a combination of two changes that have to be committed as > a single unit, one target-independent and one target-specific: > > (1) Try to fold IFN_WHILE_ULTs with constant arguments to a VECTOR_CST > (which is always possible for fixed-length vectors but is not > necessarily so for variable-length vectors) > > (2) Make the SVE port recognise constants that map to PTRUE VLn, > which includes those generated by the new fold. > > (2) can't be tested without (1) and (1) would be a significant > pessimisation without (2). > > The target-specific parts also start moving towards doing predicate > manipulation in a canonical VNx16BImode form, using rtx_vector_builders. > > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and > x86_64-linux-gnu. OK for the generic bits (= the first three files > in the diff)? > > Thanks, > Richard > > > 2019-08-13 Richard Sandiford > > gcc/ > * tree.h (build_vector_a_then_b): Declare. > * tree.c (build_vector_a_then_b): New function. > * fold-const-call.c (fold_while_ult): Likewise. > (fold_const_call): Use it to handle IFN_WHILE_ULT. > * config/aarch64/aarch64-protos.h (AARCH64_FOR_SVPATTERN): New macro. > (aarch64_svpattern): New enum. > * config/aarch64/aarch64-sve.md (mov): Pass > constants through aarch64_expand_mov_immediate. > (*aarch64_sve_mov): Use aarch64_mov_operand rather > than general_operand as the predicate for operand 1. > (while_ult): Add a '@' marker. > * config/aarch64/aarch64.c (simd_immediate_info::PTRUE): New > insn_type. > (simd_immediate_info::simd_immediate_info): New overload that > takes a scalar_int_mode and an svpattern. > (simd_immediate_info::u): Add a "pattern" field. > (svpattern_token): New function. > (aarch64_get_sve_pred_bits, aarch64_widest_sve_pred_elt_size) > (aarch64_partial_ptrue_length, aarch64_svpattern_for_vl) > (aarch64_sve_move_pred_via_while): New functions. 
> (aarch64_expand_mov_immediate): Try using > aarch64_sve_move_pred_via_while for predicates that contain N ones > followed by M zeros but that do not correspond to a VLnnn pattern. > (aarch64_sve_pred_valid_immediate): New function. > (aarch64_simd_valid_immediate): Use it instead of dealing directly > with PTRUE and PFALSE. > (aarch64_output_sve_mov_immediate): Handle new simd_immediate_info > forms. > > gcc/testsuite/ > * gcc.target/aarch64/sve/spill_2.c: Increase iteration counts > beyond the range of a PTRUE. > * gcc.target/aarch64/sve/while_6.c: New test. > * gcc.target/aarch64/sve/while_7.c: Likewise. > * gcc.target/aarch64/sve/while_8.c: Likewise. > * gcc.target/aarch64/sve/while_9.c: Likewise. > * gcc.target/aarch64/sve/while_10.c: Likewise. > Generic bits are fine. jeff
[PATCH, i386]: Add missing *mmx_pextrb and *mmx_pextr{b,w}_zext patterns
We can implement these for TARGET_MMX_WITH_SSE (and TARGET_SSE4_1 in case of pextrb). 2019-08-13 Uroš Bizjak * config/i386/i386.md (ix86_expand_vector_extract) : Use vec_extr path for TARGET_MMX_WITH_SSE && TARGET_SSE4_1. : Ditto. * config/i386/mmx.md (*mmx_pextrw_zext): Rename from mmx_pextrw. Use SWI48 mode iterator. Use %k to output operand 0. (*mmx_pextrw): New insn pattern. (*mmx_pextrb): Ditto. (*mmx_pextrb_zext): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN. Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 718de73395c4..176347cd4e6d 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -14617,6 +14617,11 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) switch (mode) { case E_V2SImode: + use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1; + if (use_vec_extr) + break; + /* FALLTHRU */ + case E_V2SFmode: if (!mmx_ok) break; @@ -14866,7 +14871,10 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) return; case E_V8QImode: + use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1; /* ??? Could extract the appropriate HImode element and shift.
*/ + break; + default: break; } diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 5ae27c85fe5c..33eb15fae207 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -1510,23 +1510,73 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_insn "mmx_pextrw" - [(set (match_operand:SI 0 "register_operand" "=r,r") -(zero_extend:SI +(define_insn "*mmx_pextrw" + [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,r,m") + (vec_select:HI + (match_operand:V4HI 1 "register_operand" "y,Yv,Yv") + (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))] + "(TARGET_MMX || TARGET_MMX_WITH_SSE) + && (TARGET_SSE || TARGET_3DNOW_A)" + "@ + pextrw\t{%2, %1, %k0|%k0, %1, %2} + %vpextrw\t{%2, %1, %k0|%k0, %1, %2} + %vpextrw\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "isa" "*,sse2,sse4") + (set_attr "mmx_isa" "native,*,*") + (set_attr "type" "mmxcvt,sselog1,sselog1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "orig,maybe_vex,maybe_vex") + (set_attr "mode" "DI,TI,TI")]) + +(define_insn "*mmx_pextrw_zext" + [(set (match_operand:SWI48 0 "register_operand" "=r,r") + (zero_extend:SWI48 (vec_select:HI (match_operand:V4HI 1 "register_operand" "y,Yv") (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]] "(TARGET_MMX || TARGET_MMX_WITH_SSE) && (TARGET_SSE || TARGET_3DNOW_A)" "@ - pextrw\t{%2, %1, %0|%0, %1, %2} - %vpextrw\t{%2, %1, %0|%0, %1, %2}" + pextrw\t{%2, %1, %k0|%k0, %1, %2} + %vpextrw\t{%2, %1, %k0|%k0, %1, %2}" [(set_attr "isa" "*,sse2") (set_attr "mmx_isa" "native,*") (set_attr "type" "mmxcvt,sselog1") (set_attr "length_immediate" "1") + (set_attr "prefix" "orig,maybe_vex") (set_attr "mode" "DI,TI")]) +(define_insn "*mmx_pextrb" + [(set (match_operand:QI 0 "nonimmediate_operand" "=r,m") + (vec_select:QI + (match_operand:V8QI 1 "register_operand" "Yv,Yv") + (parallel [(match_operand:SI 2 "const_0_to_7_operand" "n,n")])))] + "TARGET_MMX_WITH_SSE && TARGET_SSE4_1" + "@ + %vpextrb\t{%2, %1, %k0|%k0, %1, %2} + 
%vpextrb\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sselog1") + (set_attr "prefix_data16" "1") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "maybe_vex") + (set_attr "mode" "TI")]) + +(define_insn "*mmx_pextrb_zext" + [(set (match_operand:SWI248 0 "register_operand" "=r") + (zero_extend:SWI248 + (vec_select:QI + (match_operand:V8QI 1 "register_operand" "Yv") + (parallel [(match_operand:SI 2 "const_0_to_7_operand" "n")]] + "TARGET_MMX_WITH_SSE && TARGET_SSE4_1" + "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}" + [(set_attr "type" "sselog1") + (set_attr "prefix_data16" "1") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "maybe_vex") + (set_attr "mode" "TI")]) + (define_expand "mmx_pshufw" [(match_operand:V4HI 0 "register_operand") (match_operand:V4HI 1 "register_mmxmem_operand")
Re: [PR other/79543] Fix GNU ld --version scanning to conform to the GNU Coding Standards
Hi Joseph, > However, I strongly encourage a followup to refactor this code > (*_CHECK_LINKER_FEATURES and *_ENABLE_SYMVERS that use it, not just the > fragment that determines the linker version number), which is evidently > duplicated far too much in different target library directories, into > common macros in a .m4 file in the top-level config/ directory, so it's > more maintainable in future. (Note 1: I don't know what differences there > might be between the versions in different directories; that would need > investigating as part of such a refactoring; differences need not be > deliberate, they could easily have arisen by accident. Note 2: although > libffi is maintained outside of GCC, I think such a refactoring should > still be applied to the libffi directory along with the others; standalone > libffi would simply need its own copy of the relevant .m4 file. Note 3: > it should be possible to do such a refactoring bit by bit if that's more > approachable, rather than necessarily doing a complete refactoring of all > the definitions of all these macros at once.) as it happens, I've been working on exactly this. I'd noticed that *_CHECK_LINKER_FEATURES enabled --gc-sections only with gld although recent Solaris ld supports it as well. What's worse, even with gld the option is detected and used only for some of the libraries due to inconsistencies between the different versions of the macro. I'd meant to unify *_ENABLE_SYMVERS for a long time and this seemed the perfect opportunity to do so, among others because several libs require it from *_CHECK_LINKER_FEATURES to get the linker version info ;-( On top of that, there are many different copies of the code that handles --enable-version-specific-runtime-libs, sets toolexeclibdir and --enable-generated-files-in-srcdir, as well as several instances of *_CHECK_ATTRIBUTE_ALIAS, *_CHECK_ATTRIBUTE_DLLEXPORT, and *_CHECK_ATTRIBUTE_VISIBILITY, and probably more that I've missed so far. 
Overall, we've created an incredible mess here, and I'm probably partially responsible at least for the *_ENABLE_SYMVERS part. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Add generic support for "noinit" attribute
Ping? On Tue, 30 Jul 2019 at 15:35, Christophe Lyon wrote: > > Hi, > > Thanks for the useful feedback. > > > On Tue, 16 Jul 2019 at 11:54, Richard Sandiford > wrote: > > > > Thanks for doing this in a generic way. > > > > Christophe Lyon writes: > > > @@ -2224,6 +2234,50 @@ handle_weak_attribute (tree *node, tree name, > > >return NULL_TREE; > > > } > > > > > > +/* Handle a "noinit" attribute; arguments as in struct > > > + attribute_spec.handler. Check whether the attribute is allowed > > > + here and add the attribute to the variable decl tree or otherwise > > > + issue a diagnostic. This function checks NODE is of the expected > > > + type and issues diagnostics otherwise using NAME. If it is not of > > > + the expected type *NO_ADD_ATTRS will be set to true. */ > > > + > > > +static tree > > > +handle_noinit_attribute (tree * node, > > > + tree name, > > > + tree args, > > > + intflags ATTRIBUTE_UNUSED, > > > + bool *no_add_attrs) > > > +{ > > > + const char *message = NULL; > > > + > > > + gcc_assert (DECL_P (*node)); > > > + gcc_assert (args == NULL); > > > + > > > + if (TREE_CODE (*node) != VAR_DECL) > > > +message = G_("%qE attribute only applies to variables"); > > > + > > > + /* Check that it's possible for the variable to have a section. */ > > > + if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p) > > > + && DECL_SECTION_NAME (*node)) > > > +message = G_("%qE attribute cannot be applied to variables " > > > + "with specific sections"); > > > + > > > + /* If this var is thought to be common, then change this. Common > > > + variables are assigned to sections before the backend has a > > > + chance to process them. */ > > > + if (DECL_COMMON (*node)) > > > +DECL_COMMON (*node) = 0; > > > + > > > + if (message) > > > +{ > > > + warning (OPT_Wattributes, message, name); > > > + *no_add_attrs = true; > > > +} > > > + > > > + return NULL_TREE; > > > +} > > > > This might cause us to clear DECL_COMMON even when rejecting the > > attribute. 
Also, the first message should win over the others > > (e.g. for a function in a particular section). > > > > So I think the "/* Check that it's possible ..." should be an else-if > > and the DECL_COMMON stuff should be specific to !message. > > You are right, thanks. > > Jozef, this is true for msp430_data_attr() too. I'll let you update it. > > > > > Since this is specific to ELF targets, I think we should also > > warn if !targetm.have_switchable_bss_sections. > > > OK > > > > @@ -2338,6 +2336,8 @@ msp430_select_section (tree decl, int reloc, > > > unsigned HOST_WIDE_INT align) > > > { > > >if (TREE_CODE (decl) == FUNCTION_DECL) > > > return text_section; > > > + /* FIXME: ATTR_NOINIT is handled generically in > > > + default_elf_select_section. */ > > >else if (has_attr (ATTR_NOINIT, decl)) > > > return noinit_section; > > >else if (has_attr (ATTR_PERSIST, decl)) > > > > I guess adding a FIXME is OK. It's very tempting to remove > > the noinit stuff and use default_elf_select_section instead of > > default_select_section, but I realise that'd be difficult to test. > > I added that as suggested by Jozef, it's best if he handles the > change you propose, I guess he can do proper testing. > > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > > > index f2619e1..850153e 100644 > > > --- a/gcc/doc/extend.texi > > > +++ b/gcc/doc/extend.texi > > > @@ -7129,6 +7129,12 @@ The @code{visibility} attribute is described in > > > The @code{weak} attribute is described in > > > @ref{Common Function Attributes}. > > > > > > +@item noinit > > > +@cindex @code{noinit} variable attribute > > > +Any data with the @code{noinit} attribute will not be initialized by > > > +the C runtime startup code, or the program loader. Not initializing > > > +data in this way can reduce program startup times. > > > + > > > @end table > > > > > > @node ARC Variable Attributes > > > > Would be good to mention that the attribute is specific to ELF targets. 
> > (Yeah, we don't seem to do that consistently for other attributes.) > > It might also be worth saying that it requires specific linker support. > OK, done. > > > > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > > new file mode 100644 > > > index 000..f33c0d0 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > > @@ -0,0 +1,59 @@ > > > +/* { dg-do run } */ > > > +/* { dg-require-effective-target noinit */ > > > +/* { dg-options "-O2" } */ > > > + > > > +/* This test checks that noinit data is handled correctly. */ > > > + > > > +extern void _start (void) __attribute__ ((noreturn)); > > > +extern void abort (void) __attribute__ ((noreturn)); > > > +extern void exit (int)
Re: [PR other/79543] Fix GNU ld --version scanning to conform to the GNU Coding Standards
On Thu, 4 Jul 2019, Chung-Lin Tang wrote: > Bringing back this issue, as this is still bothering our OpenACC toolchains. > > If the main variance in format was the 2007 ' ' to '.' change for non-release > binutils builds, then is the attached patch okay? > > What the patch does is to first look for an 8-digit part at the end > (preceded by either a space or '.'), chop it off if it exists, and then take > the last > space-preceded string. This patch is using \+ with sed, which is a GNU extension rather than part of POSIX sed. I don't think we require GNU sed to build GCC. The patch is OK for trunk with \+ changed to \{1,\} throughout (presuming that does indeed work). However, I strongly encourage a followup to refactor this code (*_CHECK_LINKER_FEATURES and *_ENABLE_SYMVERS that use it, not just the fragment that determines the linker version number), which is evidently duplicated far too much in different target library directories, into common macros in a .m4 file in the top-level config/ directory, so it's more maintainable in future. (Note 1: I don't know what differences there might be between the versions in different directories; that would need investigating as part of such a refactoring; differences need not be deliberate, they could easily have arisen by accident. Note 2: although libffi is maintained outside of GCC, I think such a refactoring should still be applied to the libffi directory along with the others; standalone libffi would simply need its own copy of the relevant .m4 file. Note 3: it should be possible to do such a refactoring bit by bit if that's more approachable, rather than necessarily doing a complete refactoring of all the definitions of all these macros at once.) > (seeking approval for trunk and all active release branches) For release branches you should at least wait a few weeks to see if any issues show up on trunk. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add missing popcount simplifications (PR90693)
On Tue, Aug 13, 2019 at 8:50 AM Wilco Dijkstra wrote: > > Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and > popcount (x) == 1 into (x-1) < (x & -x) for single-use cases and support an optional convert. A microbenchmark > shows a speedup of 2-2.5x on both x64 and AArch64. > > Bootstrap OK, OK for commit? I think this should be in expand stage where there could be comparison of the cost of the RTLs. The only reason why it is faster for AARCH64 is the requirement of moving between the GPRs and the SIMD registers. Thanks, Andrew Pinski > > ChangeLog: > 2019-08-13 Wilco Dijkstra > > gcc/ > PR middle-end/90693 > * match.pd: Add popcount simplifications. > > testsuite/ > PR middle-end/90693 > * gcc.dg/fold-popcount-5.c: Add new test. > > --- > > diff --git a/gcc/match.pd b/gcc/match.pd > index > 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d > 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > rep (eq eq ne ne) > (simplify >(cmp (popcount @0) integer_zerop) > - (rep @0 { build_zero_cst (TREE_TYPE (@0)); } > + (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))) > + /* popcount(X) == 1 -> (X-1) < (X & -X). */ > + (for cmp (eq ne) > + rep (lt ge) > +(simplify > + (cmp (convert? (popcount:s @0)) integer_onep) > + (with { > + tree utype = unsigned_type_for (TREE_TYPE (@0)); > + tree a0 = fold_convert (utype, @0); } > + (rep (plus { a0; } { build_minus_one_cst (utype); }) > +(bit_and (negate { a0; }) { a0; }) > + /* popcount(X) > 1 -> (X & (X-1)) != 0. */ > + (for cmp (gt le) > + rep (ne eq) > +(simplify > + (cmp (convert?
> (popcount:s @0)) integer_onep) > + (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0) > + { build_zero_cst (TREE_TYPE (@0)); } > > /* Simplify: > > diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c > b/gcc/testsuite/gcc.dg/fold-popcount-5.c > new file mode 100644 > index > ..fcf3910587caacb8e39cf437dc3971df892f405a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c > @@ -0,0 +1,69 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +/* Test popcount (x) > 1 -> (x & (x-1)) != 0. */ > + > +int test_1 (long x) > +{ > + return __builtin_popcountl (x) >= 2; > +} > + > +int test_2 (int x) > +{ > + return (unsigned) __builtin_popcount (x) <= 1u; > +} > + > +int test_3 (unsigned x) > +{ > + return __builtin_popcount (x) > 1u; > +} > + > +int test_4 (unsigned long x) > +{ > + return (unsigned char) __builtin_popcountl (x) > 1; > +} > + > +int test_5 (unsigned long x) > +{ > + return (signed char) __builtin_popcountl (x) <= (signed char)1; > +} > + > +int test_6 (unsigned long long x) > +{ > + return 2u <= __builtin_popcountll (x); > +} > + > +/* Test popcount (x) == 1 -> (x-1) < (x & -x). */ > + > +int test_7 (unsigned long x) > +{ > + return __builtin_popcountl (x) != 1; > +} > + > +int test_8 (long long x) > +{ > + return (unsigned) __builtin_popcountll (x) == 1u; > +} > + > +int test_9 (int x) > +{ > + return (unsigned char) __builtin_popcount (x) != 1u; > +} > + > +int test_10 (unsigned x) > +{ > + return (unsigned char) __builtin_popcount (x) == 1; > +} > + > +int test_11 (long x) > +{ > + return (signed char) __builtin_popcountl (x) == 1; > +} > + > +int test_12 (long x) > +{ > + return 1u == __builtin_popcountl (x); > +} > + > +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */ > +
Re: [PATCH] Add missing popcount simplifications (PR90693)
On Tue, 13 Aug 2019, Wilco Dijkstra wrote: Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and popcount (x) == 1 into (x-1) < (x & -x) Is that true even on targets that have a popcount instruction? (-mpopcnt for x64) diff --git a/gcc/match.pd b/gcc/match.pd index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) rep (eq eq ne ne) (simplify (cmp (popcount @0) integer_zerop) - (rep @0 { build_zero_cst (TREE_TYPE (@0)); } + (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))) + /* popcount(X) == 1 -> (X-1) < (X & -X). */ + (for cmp (eq ne) + rep (lt ge) +(simplify + (cmp (convert? (popcount:s @0)) integer_onep) + (with { + tree utype = unsigned_type_for (TREE_TYPE (@0)); + tree a0 = fold_convert (utype, @0); } That doesn't seem right for a gimple transformation. I assume you didn't write (convert:utype @0) in the output because you want to avoid doing it 3 times? IIRC you are allowed to write (convert:utype@1 @0) in the output and reuse @1 several times. + (rep (plus { a0; } { build_minus_one_cst (utype); }) +(bit_and (negate { a0; }) { a0; }) + /* popcount(X) > 1 -> (X & (X-1)) != 0. */ + (for cmp (gt le) + rep (ne eq) +(simplify + (cmp (convert? (popcount:s @0)) integer_onep) + (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0) + { build_zero_cst (TREE_TYPE (@0)); } Are there any types where this could be a problem? Say if you cast to a 1-bit type. Actually, even converting popcnt(__uint128_t(-1)) to signed char may be problematic. -- Marc Glisse
[PATCH] Add missing popcount simplifications (PR90693)
Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and popcount (x) == 1 into (x-1) < (x & -x) for single-use cases and support an optional convert. A microbenchmark shows a speedup of 2-2.5x on both x64 and AArch64. Bootstrap OK, OK for commit? ChangeLog: 2019-08-13 Wilco Dijkstra gcc/ PR middle-end/90693 * match.pd: Add popcount simplifications. testsuite/ PR middle-end/90693 * gcc.dg/fold-popcount-5.c: Add new test. --- diff --git a/gcc/match.pd b/gcc/match.pd index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) rep (eq eq ne ne) (simplify (cmp (popcount @0) integer_zerop) - (rep @0 { build_zero_cst (TREE_TYPE (@0)); } + (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))) + /* popcount(X) == 1 -> (X-1) < (X & -X). */ + (for cmp (eq ne) + rep (lt ge) +(simplify + (cmp (convert? (popcount:s @0)) integer_onep) + (with { + tree utype = unsigned_type_for (TREE_TYPE (@0)); + tree a0 = fold_convert (utype, @0); } + (rep (plus { a0; } { build_minus_one_cst (utype); }) +(bit_and (negate { a0; }) { a0; }) + /* popcount(X) > 1 -> (X & (X-1)) != 0. */ + (for cmp (gt le) + rep (ne eq) +(simplify + (cmp (convert? (popcount:s @0)) integer_onep) + (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0) + { build_zero_cst (TREE_TYPE (@0)); } /* Simplify: diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c new file mode 100644 index ..fcf3910587caacb8e39cf437dc3971df892f405a --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +/* Test popcount (x) > 1 -> (x & (x-1)) != 0. */ + +int test_1 (long x) +{ + return __builtin_popcountl (x) >= 2; +} + +int test_2 (int x) +{ + return (unsigned) __builtin_popcount (x) <= 1u; +} + +int test_3 (unsigned x) +{ + return __builtin_popcount (x) > 1u; +} + +int test_4 (unsigned long x) +{ + return (unsigned char) __builtin_popcountl (x) > 1; +} + +int test_5 (unsigned long x) +{ + return (signed char) __builtin_popcountl (x) <= (signed char)1; +} + +int test_6 (unsigned long long x) +{ + return 2u <= __builtin_popcountll (x); +} + +/* Test popcount (x) == 1 -> (x-1) < (x & -x). */ + +int test_7 (unsigned long x) +{ + return __builtin_popcountl (x) != 1; +} + +int test_8 (long long x) +{ + return (unsigned) __builtin_popcountll (x) == 1u; +} + +int test_9 (int x) +{ + return (unsigned char) __builtin_popcount (x) != 1u; +} + +int test_10 (unsigned x) +{ + return (unsigned char) __builtin_popcount (x) == 1; +} + +int test_11 (long x) +{ + return (signed char) __builtin_popcountl (x) == 1; +} + +int test_12 (long x) +{ + return 1u == __builtin_popcountl (x); +} + +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
On 8/12/19 6:27 AM, Richard Biener wrote: > On Fri, 9 Aug 2019, Uros Bizjak wrote: > >> On Fri, Aug 9, 2019 at 3:00 PM Richard Biener wrote: >> (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI "TARGET_AVX512F"]) and then we need to split DImode for 32bits, too. >>> >>> For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode >>> condition, I'll provide _doubleword splitter later. >> >> Shouldn't that be TARGET_AVX512VL instead? Or does the insn use %g0 >> etc. >> to force use of %zmmN? > > It generates V4SI mode, so - yes, AVX512VL. case SMAX: case SMIN: case UMAX: case UMIN: if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) || (mode == SImode && !TARGET_SSE4_1)) return false; so there's no way to use AVX512VL for 32bit? >>> >>> There is a way, but on 32bit targets, we need to split DImode >>> operation to a sequence of SImode operations for unconverted pattern. >>> This is of course doable, but somehow more complex than simply >>> emitting a DImode compare + DImode cmove, which is what current >>> splitter does. So, a follow-up task. >> >> Please find attached the complete .md part that enables SImode for >> TARGET_SSE4_1 and DImode for TARGET_AVX512VL for both, 32bit and 64bit >> targets. The patterns also allows for memory operand 2, so STV has >> chance to create the vector pattern with implicit load. In case STV >> fails, the memory operand 2 is loaded to the register first; operand >> 2 is used in compare and cmove instruction, so pre-loading of the >> operand should be beneficial. > > Thanks. > >> Also note, that splitting should happen rarely. Due to the cost >> function, STV should effectively always convert minmax to a vector >> insn. 
> > I've analyzed the 464.h264ref slowdown on Haswell and it is due to > this kind of "simple" conversion: > > 5.50 │1d0: test %esi,%es > 0.07 │ mov$0x0,%ex >│ cmovs %eax,%es > 5.84 │ imul %r8d,%es > > to > > 0.65 │1e0: vpxor %xmm0,%xmm0,%xmm0 > 0.32 │ vpmaxs -0x10(%rsp),%xmm0,%xmm0 > 40.45 │ vmovd %xmm0,%eax > 2.45 │ imul %r8d,%eax > > which looks like a RA artifact in the end. We spill %esi only > with -mstv here as STV introduces a (subreg:V4SI ...) use > of a pseudo ultimatively set from di. STV creates an additional > pseudo for this (copy-in) but it places that copy next to the > original def rather than next to the start of the chain it > converts which is probably the issue why we spill. And this > is because it inserts those at each definition of the pseudo > rather than just at the reaching definition(s) or at the > uses of the pseudo in the chain (that because there may be > defs of that pseudo in the chain itself). Note that STV emits > such "conversion" copies as simple reg-reg moves: > > (insn 1094 3 4 2 (set (reg:SI 777) > (reg/v:SI 438 [ y ])) "refbuf.i":4:1 -1 > (nil)) > > but those do not prevail very long (this one gets removed by CSE2). > So IRA just sees the (subreg:V4SI (reg/v:SI 438 [ y ]) 0) use > and computes > > r438: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS > a297(r438,l0) costs: SSE_REGS:5628,5628 MEM:3618,3618 > > so I wonder if STV shouldn't instead emit gpr->xmm moves > here (but I guess nothing again prevents RTL optimizers from > combining that with the single-use in the max instruction...). > > So this boils down to STV splitting live-ranges but other > passes undoing that and then RA not considering splitting > live-ranges here, arriving at unoptimal allocation. > > A testcase showing this issue is (simplified from 464.h264ref > UMVLine16Y_11): > > unsigned short > UMVLine16Y_11 (short unsigned int * Pic, int y, int width) > { > if (y != width) > { > y = y < 0 ? 
0 : y; > return Pic[y * width]; > } > return Pic[y]; > } > > where the condition and the Pic[y] load mimics the other use of y. > Different, even worse spilling is generated by > > unsigned short > UMVLine16Y_11 (short unsigned int * Pic, int y, int width) > { > y = y < 0 ? 0 : y; > return Pic[y * width] + y; > } > > I guess this all shows that STVs "trick" of simply wrapping > integer mode pseudos in (subreg:vector-mode ...) is bad? > > I've added a (failing) testcase to reflect the above. Experimenting a bit with just for the conversion insns using V4SImode
Re: [PATCH] Add --with-static-standard-libraries to the top level
Jonathan> What I don't understand is why GDB crashes. It should still be able to Jonathan> catch exceptions from a shared library even if linked to libstdc++.a, Jonathan> unless the static libstdc++.a is somehow incompatible with the shared Jonathan> libstdc++.so the shared lib linked to. Jonathan> Is this on GNU/Linux, or something with a different linking model? GNU/Linux, Fedora 29 in particular. I didn't look into why it fails but the gcc docs specifically mention this problem: '-static-libgcc' [...] There are several situations in which an application should use the shared 'libgcc' instead of the static version. The most common of these is when the application wishes to throw and catch exceptions across different shared libraries. In that case, each of the libraries as well as the application itself should use the shared 'libgcc'. I was able to reproduce it with a simple test program: $ cd /tmp $ cat t.cc void thrower() { throw 23; } $ g++ -fPIC -shared -o libt.so t.cc $ cat a.cc extern void thrower (); int main () { try { thrower (); } catch (...) { } return 0; } $ g++ -o a -static-libgcc -static-libstdc++ a.cc -L$(pwd) -lt $ LD_LIBRARY_PATH=$(pwd) ./a Aborted (core dumped) thanks, Tom
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
On 8/9/19 7:00 AM, Richard Biener wrote: > On Fri, 9 Aug 2019, Richard Biener wrote: > >> On Fri, 9 Aug 2019, Richard Biener wrote: >> >>> On Fri, 9 Aug 2019, Uros Bizjak wrote: >>> On Mon, Aug 5, 2019 at 3:09 PM Uros Bizjak wrote: >> (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI >> "TARGET_AVX512F"]) >> >> and then we need to split DImode for 32bits, too. > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode > condition, I'll provide _doubleword splitter later. Shouldn't that be TARGET_AVX512VL instead? Or does the insn use %g0 etc. to force use of %zmmN? >>> >>> It generates V4SI mode, so - yes, AVX512VL. >> >> case SMAX: >> case SMIN: >> case UMAX: >> case UMIN: >> if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) >> || (mode == SImode && !TARGET_SSE4_1)) >> return false; >> >> so there's no way to use AVX512VL for 32bit? > > There is a way, but on 32bit targets, we need to split DImode > operation to a sequence of SImode operations for unconverted pattern. > This is of course doable, but somehow more complex than simply > emitting a DImode compare + DImode cmove, which is what current > splitter does. So, a follow-up task. Please find attached the complete .md part that enables SImode for TARGET_SSE4_1 and DImode for TARGET_AVX512VL for both, 32bit and 64bit targets. The patterns also allows for memory operand 2, so STV has chance to create the vector pattern with implicit load. In case STV fails, the memory operand 2 is loaded to the register first; operand 2 is used in compare and cmove instruction, so pre-loading of the operand should be beneficial. >>> >>> Thanks. >>> Also note, that splitting should happen rarely. Due to the cost function, STV should effectively always convert minmax to a vector insn. 
>>> >>> I've analyzed the 464.h264ref slowdown on Haswell and it is due to >>> this kind of "simple" conversion: >>> >>> 5.50 │1d0: test %esi,%es >>> 0.07 │ mov$0x0,%ex >>>│ cmovs %eax,%es >>> 5.84 │ imul %r8d,%es >>> >>> to >>> >>> 0.65 │1e0: vpxor %xmm0,%xmm0,%xmm0 >>> 0.32 │ vpmaxs -0x10(%rsp),%xmm0,%xmm0 >>> 40.45 │ vmovd %xmm0,%eax >>> 2.45 │ imul %r8d,%eax >>> >>> which looks like a RA artifact in the end. We spill %esi only >>> with -mstv here as STV introduces a (subreg:V4SI ...) use >>> of a pseudo ultimatively set from di. STV creates an additional >>> pseudo for this (copy-in) but it places that copy next to the >>> original def rather than next to the start of the chain it >>> converts which is probably the issue why we spill. And this >>> is because it inserts those at each definition of the pseudo >>> rather than just at the reaching definition(s) or at the >>> uses of the pseudo in the chain (that because there may be >>> defs of that pseudo in the chain itself). Note that STV emits >>> such "conversion" copies as simple reg-reg moves: >>> >>> (insn 1094 3 4 2 (set (reg:SI 777) >>> (reg/v:SI 438 [ y ])) "refbuf.i":4:1 -1 >>> (nil)) >>> >>> but those do not prevail very long (this one gets removed by CSE2). >>> So IRA just sees the (subreg:V4SI (reg/v:SI 438 [ y ]) 0) use >>> and computes >>> >>> r438: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS >>> a297(r438,l0) costs: SSE_REGS:5628,5628 MEM:3618,3618 >>> >>> so I wonder if STV shouldn't instead emit gpr->xmm moves >>> here (but I guess nothing again prevents RTL optimizers from >>> combining that with the single-use in the max instruction...). >>> >>> So this boils down to STV splitting live-ranges but other >>> passes undoing that and then RA not considering splitting >>> live-ranges here, arriving at unoptimal allocation. 
>>> >>> A testcase showing this issue is (simplified from 464.h264ref >>> UMVLine16Y_11): >>> >>> unsigned short >>> UMVLine16Y_11 (short unsigned int * Pic, int y, int width) >>> { >>> if (y != width) >>> { >>> y = y < 0 ? 0 : y; >>> return Pic[y * width]; >>> } >>> return Pic[y]; >>> } >>> >>> where the condition and the Pic[y] load mimics the other use of y. >>> Different, even worse spilling is generated by >>> >>> unsigned short >>> UMVLine16Y_11 (short unsigned int * Pic, int y, int width) >>> { >>> y = y < 0 ? 0 : y; >>> return Pic[y * width] + y; >>> } >>> >>> I guess this all shows that STVs "trick" of simply wrapping >>> integer mode pseudos in (subreg:vector-mode ...) is bad? >>> >>> I've added a (failing) testcase to reflect the above. >> >> Experimenting a bit with just for the conversion insns using >> V4SImode pseudos we end up preserving those moves (but I >> do have to use a lowpart set, using reg:V4SI = subreg:V4SI Simode-reg >> ends up using
Re: [PATCH] Improve DSE to handle redundant zero initializations.
Hello, This should have the changes you all asked for! Let me know if I missed anything! Thank you, Matthew Beliveau On Tue, Aug 13, 2019 at 5:15 AM Richard Biener wrote: > > On Tue, Aug 13, 2019 at 10:18 AM Richard Sandiford > wrote: > > > > Thanks for doing this. > > > > Matthew Beliveau writes: > > > diff --git gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c > > > index 5b7c4fc6d1a..dcaeb8edbfe 100644 > > > --- gcc/tree-ssa-dse.c > > > +++ gcc/tree-ssa-dse.c > > > @@ -628,11 +628,8 @@ dse_optimize_redundant_stores (gimple *stmt) > > >tree fndecl; > > >if ((is_gimple_assign (use_stmt) > > > && gimple_vdef (use_stmt) > > > -&& ((gimple_assign_rhs_code (use_stmt) == CONSTRUCTOR > > > - && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (use_stmt)) == 0 > > > - && !gimple_clobber_p (stmt)) > > > -|| (gimple_assign_rhs_code (use_stmt) == INTEGER_CST > > > -&& integer_zerop (gimple_assign_rhs1 (use_stmt) > > > +&& initializer_zerop (gimple_op (use_stmt, 1), NULL) > > > +&& !gimple_clobber_p (stmt)) > > > || (gimple_call_builtin_p (use_stmt, BUILT_IN_NORMAL) > > > && (fndecl = gimple_call_fndecl (use_stmt)) != NULL > > > && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET > > > @@ -1027,15 +1024,13 @@ dse_dom_walker::dse_optimize_stmt > > > (gimple_stmt_iterator *gsi) > > > { > > >bool by_clobber_p = false; > > > > > > - /* First see if this store is a CONSTRUCTOR and if there > > > - are subsequent CONSTRUCTOR stores which are totally > > > - subsumed by this statement. If so remove the subsequent > > > - CONSTRUCTOR store. > > > + /* Check if this store initalizes zero, or some aggregate of zeros, > > > + and check if there are subsequent stores which are subsumed by this > > > + statement. If so, remove the subsequent store. > > > > > >This will tend to make fewer calls into memset with longer > > >arguments. 
*/ > > > - if (gimple_assign_rhs_code (stmt) == CONSTRUCTOR > > > - && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt)) == 0 > > > + if (initializer_zerop (gimple_op (stmt, 1), NULL) > > > && !gimple_clobber_p (stmt)) > > > dse_optimize_redundant_stores (stmt); > > > > > > > In addition to Jeff's comment, the original choice of gimple_assign_rhs1 > > is the preferred way to write this (applies to both hunks). > > And the !gimple_clobber_p test is now redundant. > > > Richard Bootstrapped/regtested on x86_64-linux, ok for trunk? 2019-08-13 Matthew Beliveau * tree-ssa-dse.c (dse_optimize_redundant_stores): Improved check to catch more redundant zero initialization cases. (dse_dom_walker::dse_optimize_stmt): Likewise. * gcc.dg/tree-ssa/redundant-assign-zero-1.c: New test. * gcc.dg/tree-ssa/redundant-assign-zero-2.c: New test. diff --git gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-1.c gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-1.c new file mode 100644 index 000..b8d01d1644b --- /dev/null +++ gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dse-details" } */ + +void blah (char *); + +void bar () +{ + char a[256] = ""; + a[3] = 0; + blah (a); +} + +/* { dg-final { scan-tree-dump-times "Deleted redundant store" 1 "dse1"} } */ diff --git gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-2.c gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-2.c new file mode 100644 index 000..8cefa6f0cb7 --- /dev/null +++ gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dse-details" } */ + +#include <string.h> + +void blahd (double *); + +void fubar () +{ + double d; + double *x = &d; + + memset (&d, 0, sizeof d); + *x = 0.0; + blahd (x); +} + +/* { dg-final { scan-tree-dump-times "Deleted redundant store" 1 "dse1"} } */ diff --git gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c index 5b7c4fc6d1a..14b66228e1e 100644 --- 
gcc/tree-ssa-dse.c +++ gcc/tree-ssa-dse.c @@ -628,11 +628,7 @@ dse_optimize_redundant_stores (gimple *stmt) tree fndecl; if ((is_gimple_assign (use_stmt) && gimple_vdef (use_stmt) - && ((gimple_assign_rhs_code (use_stmt) == CONSTRUCTOR - && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (use_stmt)) == 0 - && !gimple_clobber_p (stmt)) - || (gimple_assign_rhs_code (use_stmt) == INTEGER_CST - && integer_zerop (gimple_assign_rhs1 (use_stmt) + && initializer_zerop (gimple_assign_rhs1 (use_stmt))) || (gimple_call_builtin_p (use_stmt, BUILT_IN_NORMAL) && (fndecl = gimple_call_fndecl (use_stmt)) != NULL && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET @@ -1027,16 +1023,10 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) { bool by_clobber_p = false; - /* First see if this store is a CONSTRUCTOR and if there
Re: [PATCH, contrib]: Do not escape "=".
On 8/13/19 2:55 AM, Uros Bizjak wrote: > Attached patch fixes the following gawk-5.0.1 warning: > > gawk: cmd. line:36: warning: regexp escape sequence `\=' is not a > known regexp operator > > 2019-08-13 Uros Bizjak > > * test_summary: Do not escape "=". > > Tested by building mail-report.log with gawk-5.0.1 and gawk-4.2.1. > > OK for mainline? > > Uros. > OK jeff
Re: Protect some checks of DECL_FUNCTION_CODE
On 8/13/19 3:34 AM, Richard Sandiford wrote: > This patch protects various uses of DECL_FUNCTION_CODE that didn't > obviously check for BUILT_IN_NORMAL first (either directly or in callers). > They could therefore trigger for functions that either aren't built-ins > or are a different kind of built-in. > > Also, the patch removes a redundant GIMPLE_CALL check from > optimize_stdarg_builtin, since it gave the impression that the stmt > was less well checked than it actually is. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu (--enable-languages=all > --enable-host-shared). OK to install? > > Richard > > > 2019-08-13 Richard Sandiford > > gcc/ > PR middle-end/91421 > * attribs.c (decl_attributes): Check the DECL_BUILT_IN_CLASS > before the DECL_FUNCTION_CODE. > * calls.c (maybe_warn_alloc_args_overflow): Use fndecl_built_in_p > to check for a BUILT_IN_ALLOCA call. > * ipa-cp.c (ipa_get_indirect_edge_target_1): Likewise for > BUILT_IN_UNREACHABLE. Don't check for a FUNCTION_TYPE. > * ipa-devirt.c (possible_polymorphic_call_target_p): Likewise. > * ipa-prop.c (try_make_edge_direct_virtual_call): Likewise. > * gimple-ssa-isolate-paths.c (is_addr_local): Check specifically > for BUILT_IN_NORMAL functions. > * trans-mem.c (expand_block_edges): Use gimple_call_builtin_p to > test for BUILT_IN_TM_ABORT. > * tree-ssa-ccp.c (optimize_stack_restore): Use fndecl_built_in_p > to check for a BUILT_IN_STACK_RESTORE call. > (optimize_stdarg_builtin): Remove redundant check for GIMPLE_CALL. > * tree-ssa-threadedge.c > (record_temporary_equivalences_from_stmts_at_dest): Check for a > BUILT_IN_NORMAL decl before checking its DECL_FUNCTION_CODE. > * tree-vect-patterns.c (vect_recog_pow_pattern): Use a positive > test for a BUILT_IN_NORMAL call instead of a negative test for > an internal function call. > > gcc/c/ > PR middle-end/91421 > * c-decl.c (header_for_builtin_fn): Take a FUNCTION_DECL instead > of a built_in_function. 
> (diagnose_mismatched_decls, implicitly_declare): Update accordingly. OK jeff
Re: [PATCH] Improve documentation of target hooks for libc functions
On 8/13/19 7:24 AM, Jonathan Wakely wrote: > * target.def (libc_has_function, libc_has_fast_function): Improve > documentation strings. > * doc/tm.texi: Regenerate. > > Bootstrapped x86_64-linux. > > OK for trunk? > > OK jeff
Re: [C++ Patch] Improve grok_array_decl location
On 8/13/19 5:22 AM, Paolo Carlini wrote: Hi, for this error message about the types involved in array subscripting I think we can exploit the available location argument and point to the open square bracket instead of simply using input_location. Tested x86_64-linux. Thanks, Paolo Carlini. OK. Jason
Re: C++ PATCH for c++/91416 - GC during late parsing collects live data
On 8/12/19 4:37 AM, Richard Biener wrote: On Sun, Aug 11, 2019 at 7:12 PM Marek Polacek wrote: This is a crash that points to a GC problem. Consider this test: __attribute__ ((unused)) struct S { S() { } } s; We're parsing a simple-declaration. While parsing the decl specs, we parse the attribute, which means creating a TREE_LIST using ggc_alloc_*. A function body is a complete-class context so when parsing the member-specification of this class-specifier, we parse the bodies of the functions we'd queued in cp_parser_late_parsing_for_member. This then leads to this call chain: cp_parser_function_definition_after_declarator -> expand_or_defer_fn -> expand_or_defer_fn_1 -> maybe_clone_body -> expand_or_defer_fn -> cgraph_node::finalize_function -> ggc_collect. In this test, the ggc_collect call collects the TREE_LIST we had allocated, and a crash duly ensues. We can't avoid late parsing of members in this context, so my fix is to bump function_depth, exactly following cp_parser_lambda_body. Since we are performing late parsing, we know we have to be nested in a class. (We still ggc_collect later, in c_parse_final_cleanups.) So the struct S itself is properly referenced by a GC root? If so why not attach the attribute list to the tentative struct instead? Or do we fear we have other non-rooted data live at the point we collect? If so shouldn't we instead bump function_depth when parsing a declaration in general? It's already a significant issue for C++ that we only collect between function declarations, I'm concerned that this change will cause more memory problems. Perhaps if we start parsing a class-specifier we could push the decl_specifiers onto a vec that is a GC root? Jason But here's the thing. This goes back to ancient r104500, at least. How has this not broken before? All you need to trigger it is to enable GC checking and have a class with a ctor/member function, that has an attribute. ...and is defined in the declaration of something else? 
Does it happen without declaring the variable 's'? Jason
Re: [PATCH 0/3] Libsanitizer: merge from trunk
On 8/13/19 7:07 AM, Martin Liska wrote: > Hi. > > For this year, I decided to make a first merge now and the > next (much smaller) at the end of October. > > The biggest change is rename of many files from .cc to .cpp. > > I bootstrapped the patch set on x86_64-linux-gnu and run > asan/ubsan/tsan tests on x86_64, ppc64le (power8) and > aarch64. > > Libasan SONAME has been already bumped compared to GCC 9. > > For other libraries, I don't see a reason for library bumping: > > $ abidiff /usr/lib64/libubsan.so.1.0.0 > ./x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0 --stat > Functions changes summary: 0 Removed, 0 Changed, 4 Added functions > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable > Function symbols changes summary: 3 Removed, 0 Added function symbols not > referenced by debug info > Variable symbols changes summary: 0 Removed, 0 Added variable symbol not > referenced by debug info > > $ abidiff /usr/lib64/libtsan.so.0.0.0 > ./x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.0.0.0 --stat > Functions changes summary: 0 Removed, 0 Changed, 47 Added functions > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable > Function symbols changes summary: 1 Removed, 2 Added function symbols not > referenced by debug info > Variable symbols changes summary: 0 Removed, 0 Added variable symbol not > referenced by debug info > > Ready to be installed? ISTM that a sanitizer merge during stage1 should be able to move forward without ACKs. Similarly for other runtimes where we pull from some upstream master. I'd be slightly concerned about the function removals, but I don't think we've really tried to be ABI stable for the sanitizer runtimes. jeff
Re: C++ PATCH for more checking of DECL_MUTABLE_P
On 8/10/19 2:57 PM, Marek Polacek wrote: Some preparatory work before adding constinit. mutable is only appropriate for FIELD_DECLs (when they're not static), but we've never made sure that we're not setting DECL_MUTABLE_P on a different _DECL. I mean to use DECL_LANG_FLAG_0 in a VAR_DECL for DECL_DECLARED_CONSTINIT_P. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2019-08-10 Marek Polacek * cp-tree.h (DECL_MUTABLE_P): Use FIELD_DECL_CHECK. OK. Jason
Re: [C++ Patch] Improve start_function and grokmethod locations
On 8/9/19 7:46 AM, Paolo Carlini wrote: + error_at (id_loc, + "typedef may not be a function definition"); We might use the location of the typedef keyword. OK either way. Jason
Re: C++ PATCH for c++/90473 - wrong code with nullptr in default argument
On 8/8/19 8:56 PM, Marek Polacek wrote: This is a wrong-code bug where we are throwing away the function call in the default argument here: void fn1 (void* p = (f(), nullptr)) { } and thus dropping the possible side-effects of that call to the floor. That happens because check_default_argument returns nullptr_node when it sees a null_ptr_cst_p argument. In this case the argument is a COMPOUND_EXPR "f(), nullptr" and null_ptr_cst_p returns true for it. And so the effects of the LHS expression of the compound expr are forgotten. Fixed as below. It's tempting to change null_ptr_cst_p to return false when it sees something that has TREE_SIDE_EFFECTS, but that would be wrong I think: [conv.ptr] says that a null pointer constant is an integer literal with value zero or a prvalue of type std::nullptr_t. An expression "a, b", the built-in comma expression, is a prvalue if 'b' is an rvalue. And so "f(), nullptr" is a prvalue of type std::nullptr_t. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2019-08-08 Marek Polacek PR c++/90473 - wrong code with nullptr in default argument. * call.c (null_ptr_cst_p): Update quote from the standard. * decl.c (check_default_argument): Don't return nullptr when the arg has side-effects. OK. Jason
Re: [PATCH][2/2][i386] STV changes, reg-copy as vector
On Tue, 13 Aug 2019, Uros Bizjak wrote: > On Tue, Aug 13, 2019 at 2:33 PM Richard Biener wrote: > > > > > > The following splits out the change that makes the live-range split / > > mode change copy pseudos vector mode pseudos. I've tried reproducing > > the issue I faced with SImode chains but failed likely because we're > > never using simple moves in the end while for SImode we can end up > > with > > > > (set (subreg:V4SI (reg:SI 42) 0) (subreg:V4SI (reg:SI 41) 0)) > > > > which we happily propagate out. So pursuing this patch independently > > doesn't make much sense. Still the main change is in > > dimode_scalar_chain::make_vector_copies where 'vreg' is now V2DImode > > instead of DImode and the change to emit_move_insn (vreg, tmp) > > hints at the change done to the above problematic insn. > > > > The rest of the changes deal with some regs already being in > > the appropriate vector mode. > > > > I realize I might have avoided my original issue by emitting not > > the above reg-reg move but > > > > (set (subreg:V4SI (reg:SI 42) 0) > >(vec_merge:V4SI > > (vec_duplicate:V4SI (reg:SI 41)) > > (const_vec [0 0 0 0]) > > (const_1))) > > > > but I didn't try (the full patch contained a testcase for the > > issue). Still it seems odd to use DImode pseudos when all uses > > are wrapped in (subreg:V2DI ...). > > It looks to me that the above is the way to go. make_vector_copies > creates a scalar pseudo, and all other support functions expect a > scalar that will be processed with "replace_with_subreg". I think that > the safest way is to emit the code above for SImode for the > problematic move insn, and > > > + emit_move_insn (gen_rtx_SUBREG (V2DImode, vreg 0), > > + gen_rtx_VEC_MERGE (V2DImode, > > + gen_rtx_VEC_DUPLICATE > > (V2DImode, > > + tmp), > > + CONST0_RTX (V2DImode), > > + GEN_INT (HOST_WIDE_INT_1U))); > > for DImode. This way, you won't need other changes (conditional > generation of SUBREGs), the above should be enough. 
> > Oh, and don't use emit_move_insn, emit insn with gen_rtx_SET > (gen_rtx_SUBREG (...)) should do the trick. OK, that seems to work on the testcases, which makes splitting out this patch pointless. Integrated the change with the full patch. Thanks, Richard.
Re: [PATCH][2/2][i386] STV changes, reg-copy as vector
On Tue, Aug 13, 2019 at 2:33 PM Richard Biener wrote: > > > The following splits out the change that makes the live-range split / > mode change copy pseudos vector mode pseudos. I've tried reproducing > the issue I faced with SImode chains but failed likely because we're > never using simple moves in the end while for SImode we can end up > with > > (set (subreg:V4SI (reg:SI 42) 0) (subreg:V4SI (reg:SI 41) 0)) > > which we happily propagate out. So pursuing this patch independently > doesn't make much sense. Still the main change is in > dimode_scalar_chain::make_vector_copies where 'vreg' is now V2DImode > instead of DImode and the change to emit_move_insn (vreg, tmp) > hints at the change done to the above problematic insn. > > The rest of the changes deal with some regs already being in > the appropriate vector mode. > > I realize I might have avoided my original issue by emitting not > the above reg-reg move but > > (set (subreg:V4SI (reg:SI 42) 0) >(vec_merge:V4SI > (vec_duplicate:V4SI (reg:SI 41)) > (const_vec [0 0 0 0]) > (const_1))) > > but I didn't try (the full patch contained a testcase for the > issue). Still it seems odd to use DImode pseudos when all uses > are wrapped in (subreg:V2DI ...). It looks to me that the above is the way to go. make_vector_copies creates a scalar pseudo, and all other support functions expect a scalar that will be processed with "replace_with_subreg". I think that the safest way is to emit the code above for SImode for the problematic move insn, and > + emit_move_insn (gen_rtx_SUBREG (V2DImode, vreg 0), > + gen_rtx_VEC_MERGE (V2DImode, > + gen_rtx_VEC_DUPLICATE > (V2DImode, > + tmp), > + CONST0_RTX (V2DImode), > + GEN_INT (HOST_WIDE_INT_1U))); for DImode. This way, you won't need other changes (conditional generation of SUBREGs), the above should be enough. Oh, and don't use emit_move_insn, emit insn with gen_rtx_SET (gen_rtx_SUBREG (...)) should do the trick. Uros. 
> As (unrelated) change I propose to error with fatal_insn_not_found > in case recog doesn't recognize the altered insns which make it > easier to spot errors early. > > Bootstrap / regtest running on x86_64-unknown-linux-gnu ontop of [1/2], OK > for trunk if that succeeds? > > Thanks, > Richard. > > 2019-08-13 Richard Biener > > * config/i386/i386-features.c > (dimode_scalar_chain::replace_with_subreg): Elide the subreg if the > reg is already vector. > (dimode_scalar_chain::convert_op): Likewise. > (dimode_scalar_chain::make_vector_copies): Use a vector-mode pseudo as > destination to avoid later passes copy-propagating it out. > (dimode_chain::convert_insn): Convert the source of a > memory op in case it is not of the appropriate scalar mode. > Raise fatal_insn_not_found if the converted insn is not recognized. > > Index: gcc/config/i386/i386-features.c > === > --- gcc/config/i386/i386-features.c (revision 274328) > +++ gcc/config/i386/i386-features.c (working copy) > @@ -573,7 +573,8 @@ rtx > dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) > { >if (x == reg) > -return gen_rtx_SUBREG (V2DImode, new_reg, 0); > +return (GET_MODE (new_reg) == V2DImode > + ? 
new_reg : gen_rtx_SUBREG (V2DImode, new_reg, 0)); > >const char *fmt = GET_RTX_FORMAT (GET_CODE (x)); >int i, j; > @@ -627,7 +628,7 @@ void > dimode_scalar_chain::make_vector_copies (unsigned regno) > { >rtx reg = regno_reg_rtx[regno]; > - rtx vreg = gen_reg_rtx (DImode); > + rtx vreg = gen_reg_rtx (V2DImode); >df_ref ref; > >for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref)) > @@ -641,7 +642,12 @@ dimode_scalar_chain::make_vector_copies > gen_rtx_SUBREG (SImode, reg, 0)); > emit_move_insn (adjust_address (tmp, SImode, 4), > gen_rtx_SUBREG (SImode, reg, 4)); > - emit_move_insn (vreg, tmp); > + emit_move_insn (vreg, > + gen_rtx_VEC_MERGE (V2DImode, > + gen_rtx_VEC_DUPLICATE > (V2DImode, > + tmp), > + CONST0_RTX (V2DImode), > + GEN_INT (HOST_WIDE_INT_1U))); > } > else if (TARGET_SSE4_1) > { > @@ -841,7 +847,8 @@ dimode_scalar_chain::convert_op (rtx *op > gcc_assert (!DF_REF_CHAIN (ref)); > break; > } > - *op = gen_rtx_SUBREG (V2DImode, *op, 0); > + if (GET_MODE (*op)
[PATCH] Update to MAINTAINERS
I have added myself to the MAINTAINERS file. -- https://www.codethink.co.uk/privacy.html Index: MAINTAINERS === --- MAINTAINERS (revision 274379) +++ MAINTAINERS (working copy) @@ -371,6 +371,7 @@ Jason Eckhardt Bernd Edlinger Phil Edwards +Mark Eggleston Steve Ellcey Mohan Embar Revital Eres
Re: [patch, fortran] Fix PR 90561
Hi Thomas, Yes that's good to apply to 9 and 10 branches. I am certain that there are other places where the new function will be very helpful. Thanks for the patch. Paul On Tue, 13 Aug 2019 at 13:59, Thomas König wrote: > > Hello world, > > this patch fixes a 9/10 regression by placing the variable used to > hold a string length at function scope. > > I chose to implement this version of gfc_evaluate_now as a separate > function because I have a sneaking suspicion this may not be the > last time we are going to encounter something like that - better > have the function there for future use. > > Regression-tested. OK for trunk and gcc-9? > > Regards > > Thomas -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
[PATCH 2/2] Add more entries to the C++ get_std_name_hint array
This adds some commonly-used C++11/14 names, and some new C++17/20 names. The latter aren't available when using the -std=gnu++14 default, so the fix-it suggesting to use a newer dialect is helpful. * name-lookup.c (get_std_name_hint): Add more entries. Tested x86_64-linux. OK for trunk? commit 0c12d092e5659689951b5e134b84c5b506b543d5 Author: Jonathan Wakely Date: Tue Aug 13 13:42:40 2019 +0100 Add more entries to the C++ get_std_name_hint array * name-lookup.c (get_std_name_hint): Add more entries. diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index 96b2d90540d..68ebd7655c7 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -5550,9 +5550,11 @@ get_std_name_hint (const char *name) {"make_any", "<any>", cxx17}, /* <array>. */ {"array", "<array>", cxx11}, +{"to_array", "<array>", cxx2a}, /* <atomic>. */ {"atomic", "<atomic>", cxx11}, {"atomic_flag", "<atomic>", cxx11}, +{"atomic_ref", "<atomic>", cxx2a}, /* <bitset>. */ {"bitset", "<bitset>", cxx11}, /* <complex>. */ @@ -5575,9 +5577,17 @@ get_std_name_hint (const char *name) {"ofstream", "<fstream>", cxx98}, /* <functional>. */ {"bind", "<functional>", cxx11}, +{"bind_front", "<functional>", cxx2a}, {"function", "<functional>", cxx11}, {"hash", "<functional>", cxx11}, +{"invoke", "<functional>", cxx17}, {"mem_fn", "<functional>", cxx11}, +{"not_fn", "<functional>", cxx17}, +{"reference_wrapper", "<functional>", cxx11}, +{"unwrap_reference", "<functional>", cxx2a}, +{"unwrap_reference_t", "<functional>", cxx2a}, +{"unwrap_ref_decay", "<functional>", cxx2a}, +{"unwrap_ref_decay_t", "<functional>", cxx2a}, /* <future>. */ {"async", "<future>", cxx11}, {"future", "<future>", cxx11}, @@ -5618,11 +5628,16 @@ get_std_name_hint (const char *name) {"map", "<map>", cxx98}, {"multimap", "<map>", cxx98}, /* <memory>. */ +{"allocate_shared", "<memory>", cxx11}, +{"allocator", "<memory>", cxx98}, +{"allocator_traits", "<memory>", cxx11}, {"make_shared", "<memory>", cxx11}, {"make_unique", "<memory>", cxx14}, {"shared_ptr", "<memory>", cxx11}, {"unique_ptr", "<memory>", cxx11}, {"weak_ptr", "<memory>", cxx11}, +/* <memory_resource>. */ +{"pmr", "<memory_resource>", cxx17}, /* <mutex>. */ {"mutex", "<mutex>", cxx11}, {"timed_mutex", "<mutex>", cxx11}, @@ -5672,14 +5687,39 @@ get_std_name_hint (const char *name) {"u16string", "<string>", cxx11}, {"u32string", "<string>", cxx11}, /* <string_view>. 
*/ +{"basic_string_view", "<string_view>", cxx17}, {"string_view", "<string_view>", cxx17}, /* <thread>. */ {"thread", "<thread>", cxx11}, +{"this_thread", "<thread>", cxx11}, /* <tuple>. */ +{"apply", "<tuple>", cxx17}, +{"forward_as_tuple", "<tuple>", cxx11}, +{"make_from_tuple", "<tuple>", cxx17}, {"make_tuple", "<tuple>", cxx11}, +{"tie", "<tuple>", cxx11}, {"tuple", "<tuple>", cxx11}, +{"tuple_cat", "<tuple>", cxx11}, {"tuple_element", "<tuple>", cxx11}, +{"tuple_element_t", "<tuple>", cxx14}, {"tuple_size", "<tuple>", cxx11}, +{"tuple_size_v", "<tuple>", cxx17}, +/* <type_traits>. */ +{"enable_if", "<type_traits>", cxx11}, +{"enable_if_t", "<type_traits>", cxx14}, +{"invoke_result", "<type_traits>", cxx17}, +{"invoke_result_t", "<type_traits>", cxx17}, +{"remove_cvref", "<type_traits>", cxx17}, +{"remove_cvref_t", "<type_traits>", cxx17}, +{"type_identity", "<type_traits>", cxx2a}, +{"type_identity_t", "<type_traits>", cxx2a}, +{"void_t", "<type_traits>", cxx17}, +{"conjunction", "<type_traits>", cxx17}, +{"conjunction_v", "<type_traits>", cxx17}, +{"disjunction", "<type_traits>", cxx17}, +{"disjunction_v", "<type_traits>", cxx17}, +{"negation", "<type_traits>", cxx17}, +{"negation_v", "<type_traits>", cxx17}, /* <unordered_map>. */ {"unordered_map", "<unordered_map>", cxx11}, {"unordered_multimap", "<unordered_map>", cxx11},
[PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
The std::make_unique function wasn't added until C++14, and neither was the std::complex_literals namespace. gcc/cp: PR c++/91436 * name-lookup.c (get_std_name_hint): Fix min_dialect field for complex_literals and make_unique entries. gcc/testsuite: PR c++/91436 * g++.dg/lookup/missing-std-include-5.C: Limit test to C++14 and up. * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. Tested x86_64-linux. OK for trunk? commit b58e771fe21ae380200fdec00aa899d4b15b73d5 Author: Jonathan Wakely Date: Tue Aug 13 13:25:39 2019 +0100 PR c++/91436 fix C++ dialect for std::make_unique fix-it hint The std::make_unique function wasn't added until C++14, and neither was the std::complex_literals namespace. gcc/cp: PR c++/91436 * name-lookup.c (get_std_name_hint): Fix min_dialect field for complex_literals and make_unique entries. gcc/testsuite: PR c++/91436 * g++.dg/lookup/missing-std-include-5.C: Limit test to C++14 and up. * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index 9f278220df3..96b2d90540d 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -5557,7 +5557,7 @@ get_std_name_hint (const char *name) {"bitset", "<bitset>", cxx11}, /* <complex>. */ {"complex", "<complex>", cxx98}, -{"complex_literals", "<complex>", cxx98}, +{"complex_literals", "<complex>", cxx14}, /* <condition_variable>. */ {"condition_variable", "<condition_variable>", cxx11}, {"condition_variable_any", "<condition_variable>", cxx11}, @@ -5619,7 +5619,7 @@ get_std_name_hint (const char *name) {"multimap", "<map>", cxx98}, /* <memory>. 
*/ {"make_shared", "<memory>", cxx11}, -{"make_unique", "<memory>", cxx11}, +{"make_unique", "<memory>", cxx14}, {"shared_ptr", "<memory>", cxx11}, {"unique_ptr", "<memory>", cxx11}, {"weak_ptr", "<memory>", cxx11}, diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C index fe880a6263b..3ec9abd9316 100644 --- a/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C +++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C @@ -1,2 +1,3 @@ +// { dg-do compile { target c++14 } } using namespace std::complex_literals; // { dg-error "" } // { dg-message "#include <complex>" "" { target *-*-* } .-1 } diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C index d9eeb4284e8..a8f27473e6d 100644 --- a/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C +++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C @@ -11,15 +11,6 @@ void test_make_shared () // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } } -template -void test_make_unique () -{ - auto p = std::make_unique(); // { dg-error "'make_unique' is not a member of 'std'" } - // { dg-message "'#include <memory>'" "" { target *-*-* } .-1 } - // { dg-error "expected primary-expression before '>' token" "" { target *-*-* } .-2 } - // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } -} - std::shared_ptr test_shared_ptr; // { dg-error "'shared_ptr' in namespace 'std' does not name a template type" } // { dg-message "'#include <memory>'" "" { target *-*-* } .-1 }
[PATCH] Improve documentation of target hooks for libc functions
* target.def (libc_has_function, libc_has_fast_function): Improve documentation strings. * doc/tm.texi: Regenerate. Bootstrapped x86_64-linux. OK for trunk? commit c3645cae6c78896218c60135349056a5bf943b8a Author: Jonathan Wakely Date: Tue Aug 13 14:01:01 2019 +0100 Improve documentation of target hooks for libc functions * target.def (libc_has_function, libc_has_fast_function): Improve documentation strings. * doc/tm.texi: Regenerate. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8e5b01c9383..89990cbc871 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5591,12 +5591,12 @@ macro, a reasonable default is used. @deftypefn {Target Hook} bool TARGET_LIBC_HAS_FUNCTION (enum function_class @var{fn_class}) This hook determines whether a function from a class of functions -@var{fn_class} is present at the runtime. +@var{fn_class} is present in the target C library. @end deftypefn @deftypefn {Target Hook} bool TARGET_LIBC_HAS_FAST_FUNCTION (int @var{fcode}) This hook determines whether a function from a class of functions -@var{fn_class} has a fast implementation. +@code{(enum function_class)}@var{fcode} has a fast implementation. @end deftypefn @defmac NEXT_OBJC_RUNTIME diff --git a/gcc/target.def b/gcc/target.def index 7cc0f37a0d1..73334e0e8fc 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2545,14 +2545,14 @@ set via @code{__attribute__}.", DEFHOOK (libc_has_function, "This hook determines whether a function from a class of functions\n\ -@var{fn_class} is present at the runtime.", +@var{fn_class} is present in the target C library.", bool, (enum function_class fn_class), default_libc_has_function) DEFHOOK (libc_has_fast_function, "This hook determines whether a function from a class of functions\n\ -@var{fn_class} has a fast implementation.", +@code{(enum function_class)}@var{fcode} has a fast implementation.", bool, (int fcode), default_libc_has_fast_function)
[PATCH 0/3] Libsanitizer: merge from trunk
Hi. For this year, I decided to make a first merge now and the next (much smaller) at the end of October. The biggest change is rename of many files from .cc to .cpp. I bootstrapped the patch set on x86_64-linux-gnu and run asan/ubsan/tsan tests on x86_64, ppc64le (power8) and aarch64. Libasan SONAME has been already bumped compared to GCC 9. For other libraries, I don't see a reason for library bumping: $ abidiff /usr/lib64/libubsan.so.1.0.0 ./x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0 --stat Functions changes summary: 0 Removed, 0 Changed, 4 Added functions Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 3 Removed, 0 Added function symbols not referenced by debug info Variable symbols changes summary: 0 Removed, 0 Added variable symbol not referenced by debug info $ abidiff /usr/lib64/libtsan.so.0.0.0 ./x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.0.0.0 --stat Functions changes summary: 0 Removed, 0 Changed, 47 Added functions Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 1 Removed, 2 Added function symbols not referenced by debug info Variable symbols changes summary: 0 Removed, 0 Added variable symbol not referenced by debug info Ready to be installed? Thanks, Martin Martin Liska (3): Libsanitizer merge from trunk r368656. Reapply all revisions mentioned in LOCAL_PATCHES. Fix a test-case scan pattern. 
gcc/testsuite/c-c++-common/asan/memcmp-1.c|4 +- libsanitizer/MERGE|2 +- libsanitizer/Makefile.in |8 +- libsanitizer/aclocal.m4 | 191 +- libsanitizer/asan/Makefile.am | 63 +- libsanitizer/asan/Makefile.in | 323 +- ...asan_activation.cc => asan_activation.cpp} |7 +- libsanitizer/asan/asan_activation.h |5 +- libsanitizer/asan/asan_activation_flags.inc |5 +- .../{asan_allocator.cc => asan_allocator.cpp} | 18 +- libsanitizer/asan/asan_allocator.h| 44 +- .../{asan_debugging.cc => asan_debugging.cpp} |7 +- ..._descriptions.cc => asan_descriptions.cpp} |7 +- libsanitizer/asan/asan_descriptions.h |7 +- .../asan/{asan_errors.cc => asan_errors.cpp} | 22 +- libsanitizer/asan/asan_errors.h | 27 +- ...asan_fake_stack.cc => asan_fake_stack.cpp} |7 +- libsanitizer/asan/asan_fake_stack.h |7 +- .../asan/{asan_flags.cc => asan_flags.cpp}| 13 +- libsanitizer/asan/asan_flags.h|5 +- libsanitizer/asan/asan_flags.inc |9 +- .../{asan_fuchsia.cc => asan_fuchsia.cpp} | 20 +- .../{asan_globals.cc => asan_globals.cpp} | 31 +- ...an_globals_win.cc => asan_globals_win.cpp} |7 +- libsanitizer/asan/asan_init_version.h |5 +- ..._interceptors.cc => asan_interceptors.cpp} | 18 +- libsanitizer/asan/asan_interceptors.h | 39 +- ...cc => asan_interceptors_memintrinsics.cpp} |7 +- .../asan/asan_interceptors_memintrinsics.h|7 +- libsanitizer/asan/asan_interceptors_vfork.S | 12 + libsanitizer/asan/asan_interface.inc |6 +- libsanitizer/asan/asan_interface_internal.h |7 +- libsanitizer/asan/asan_internal.h | 22 +- .../asan/{asan_linux.cc => asan_linux.cpp}| 14 +- .../asan/{asan_mac.cc => asan_mac.cpp}| 11 +- ..._malloc_linux.cc => asan_malloc_linux.cpp} | 17 +- libsanitizer/asan/asan_malloc_local.h | 30 +- ...asan_malloc_mac.cc => asan_malloc_mac.cpp} | 46 +- libsanitizer/asan/asan_malloc_win.cc | 259 -- libsanitizer/asan/asan_malloc_win.cpp | 553 libsanitizer/asan/asan_mapping.h | 25 +- libsanitizer/asan/asan_mapping_myriad.h |5 +- libsanitizer/asan/asan_mapping_sparc64.h |5 +- ...ory_profile.cc => 
asan_memory_profile.cpp} |7 +- ...asan_new_delete.cc => asan_new_delete.cpp} | 37 +- .../{asan_poisoning.cc => asan_poisoning.cpp} |7 +- libsanitizer/asan/asan_poisoning.h| 15 +- .../asan/{asan_posix.cc => asan_posix.cpp}| 53 +- .../{asan_preinit.cc => asan_preinit.cpp} |7 +- ...remap_shadow.cc => asan_premap_shadow.cpp} |7 +- libsanitizer/asan/asan_premap_shadow.h|5 +- .../asan/{asan_report.cc => asan_report.cpp} | 16 +- libsanitizer/asan/asan_report.h |7 +- .../asan/{asan_rtems.cc => asan_rtems.cpp}| 17 +- .../asan/{asan_rtl.cc => asan_rtl.cpp}| 45 +- libsanitizer/asan/asan_scariness_score.h |5 +- ..._shadow_setup.cc => asan_shadow_setup.cpp} | 11 +- libsanitizer/asan/asan_stack.cc | 38 - libsanitizer/asan/asan_stack.cpp | 88 + libsanitizer/asan/asan_stack.h| 47 +- .../asan/{asan_stats.cc => asan_stats.cpp}|7 +- libsanitizer/asan/asan_stats.h
[PATCH 2/3] Reapply all revisions mentioned in LOCAL_PATCHES.
libsanitizer/ChangeLog: 2019-08-13 Martin Liska * asan/asan_globals.cpp (CheckODRViolationViaIndicator): Reapply patch from trunk. (CheckODRViolationViaPoisoning): Likewise. (RegisterGlobal): Likewise. * asan/asan_mapping.h: Likewise. * sanitizer_common/sanitizer_linux_libcdep.cpp (defined): Likewise. * sanitizer_common/sanitizer_mac.cpp (defined): Likewise. * sanitizer_common/sanitizer_platform_limits_linux.cpp (defined): Likewise. * sanitizer_common/sanitizer_platform_limits_posix.h (defined): Likewise. * sanitizer_common/sanitizer_stacktrace.cpp (GetCanonicFrame): Likewise. * ubsan/ubsan_handlers.cpp (__ubsan::__ubsan_handle_cfi_bad_icall): Likewise. (__ubsan::__ubsan_handle_cfi_bad_icall_abort): Likewise. * ubsan/ubsan_handlers.h (struct CFIBadIcallData): Likewise. (struct CFICheckFailData): Likewise. (RECOVERABLE): Likewise. * ubsan/ubsan_platform.h: Likewise. --- libsanitizer/asan/asan_globals.cpp| 19 --- libsanitizer/asan/asan_mapping.h | 2 +- .../sanitizer_linux_libcdep.cpp | 4 .../sanitizer_common/sanitizer_mac.cpp| 2 +- .../sanitizer_platform_limits_linux.cpp | 7 +-- .../sanitizer_platform_limits_posix.h | 2 +- .../sanitizer_common/sanitizer_stacktrace.cpp | 17 - libsanitizer/ubsan/ubsan_handlers.cpp | 15 +++ libsanitizer/ubsan/ubsan_handlers.h | 8 libsanitizer/ubsan/ubsan_platform.h | 2 ++ 10 files changed, 49 insertions(+), 29 deletions(-) diff --git a/libsanitizer/asan/asan_globals.cpp b/libsanitizer/asan/asan_globals.cpp index 54e75f3cee7..c77e5357bf9 100644 --- a/libsanitizer/asan/asan_globals.cpp +++ b/libsanitizer/asan/asan_globals.cpp @@ -154,23 +154,6 @@ static void CheckODRViolationViaIndicator(const Global *g) { } } -// Check ODR violation for given global G by checking if it's already poisoned. -// We use this method in case compiler doesn't use private aliases for global -// variables. 
-static void CheckODRViolationViaPoisoning(const Global *g) { - if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) { -// This check may not be enough: if the first global is much larger -// the entire redzone of the second global may be within the first global. -for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) { - if (g->beg == l->g->beg && - (flags()->detect_odr_violation >= 2 || g->size != l->g->size) && - !IsODRViolationSuppressed(g->name)) -ReportODRViolation(g, FindRegistrationSite(g), - l->g, FindRegistrationSite(l->g)); -} - } -} - // Clang provides two different ways for global variables protection: // it can poison the global itself or its private alias. In former // case we may poison same symbol multiple times, that can help us to @@ -216,8 +199,6 @@ static void RegisterGlobal(const Global *g) { // where two globals with the same name are defined in different modules. if (UseODRIndicator(g)) CheckODRViolationViaIndicator(g); -else - CheckODRViolationViaPoisoning(g); } if (CanPoisonMemory()) PoisonRedZones(*g); diff --git a/libsanitizer/asan/asan_mapping.h b/libsanitizer/asan/asan_mapping.h index 41fb49ee46d..09be904270c 100644 --- a/libsanitizer/asan/asan_mapping.h +++ b/libsanitizer/asan/asan_mapping.h @@ -163,7 +163,7 @@ static const u64 kDefaultShort64bitShadowOffset = static const u64 kAArch64_ShadowOffset64 = 1ULL << 36; static const u64 kMIPS32_ShadowOffset32 = 0x0aaa; static const u64 kMIPS64_ShadowOffset64 = 1ULL << 37; -static const u64 kPPC64_ShadowOffset64 = 1ULL << 44; +static const u64 kPPC64_ShadowOffset64 = 1ULL << 41; static const u64 kSystemZ_ShadowOffset64 = 1ULL << 52; static const u64 kSPARC64_ShadowOffset64 = 1ULL << 43; // 0x800 static const u64 kFreeBSD_ShadowOffset32 = 1ULL << 30; // 0x4000 diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp index 1f584a2add6..7dc38a0b703 100644 --- 
a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp +++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp @@ -701,9 +701,13 @@ u32 GetNumberOfCPUs() { #elif SANITIZER_SOLARIS return sysconf(_SC_NPROCESSORS_ONLN); #else +#if defined(CPU_COUNT) cpu_set_t CPUs; CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0); return CPU_COUNT(&CPUs); +#else + return 1; +#endif #endif } diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cpp b/libsanitizer/sanitizer_common/sanitizer_mac.cpp index bd6301aebad..7552b7aa965 100644 --- a/libsanitizer/sanitizer_common/sanitizer_mac.cpp +++ b/libsanitizer/sanitizer_common/sanitizer_mac.cpp @@ -36,7 +36,7 @@ extern char **environ; #endif -#if defined(__has_include) &&
[PATCH 3/3] Fix a test-case scan pattern.
gcc/testsuite/ChangeLog: 2019-08-13 Martin Liska * c-c++-common/asan/memcmp-1.c: There's a new function in the stack-trace on the top. So shift expected output in stack trace. --- gcc/testsuite/c-c++-common/asan/memcmp-1.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/c-c++-common/asan/memcmp-1.c b/gcc/testsuite/c-c++-common/asan/memcmp-1.c index 5915988be5b..0a513c05ee1 100644 --- a/gcc/testsuite/c-c++-common/asan/memcmp-1.c +++ b/gcc/testsuite/c-c++-common/asan/memcmp-1.c @@ -16,5 +16,5 @@ main () } /* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow.*(\n|\r\n|\r)" } */ -/* { dg-output "#0 0x\[0-9a-f\]+ +(in _*(interceptor_|wrap_|)memcmp|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#1 0x\[0-9a-f\]+ +(in _*main|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "#1 0x\[0-9a-f\]+ +(in _*(interceptor_|wrap_|)memcmp|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "#2 0x\[0-9a-f\]+ +(in _*main|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
Re: [PATCH v3] Missed function specialization + partial devirtualization
I would also suggest coming up with parameters that drive the minimum probability, the maximum number of promotions, and maybe the minimum number of edge executions required to consider a speculation. Clang provides all of these:
https://github.com/microsoft/llvm-1/blob/master/lib/Analysis/IndirectCallPromotionAnalysis.cpp#L37

Martin
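Such knobs might look like the following sketch. The threshold names and values here are illustrative stand-ins, not actual GCC or Clang parameters:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tuning knobs, mirroring the kind Clang exposes.
constexpr double kMinPromotionFraction = 0.30; // min share of the site's hits
constexpr unsigned kMaxPromotions = 3;         // cap on direct edges per call
constexpr uint64_t kMinEdgeExecutions = 100;   // ignore cold edges entirely

struct target { uint64_t count; int id; };

// Return the ids of the profiled targets worth speculating on at one
// indirect call site, hottest first.
std::vector<int> pick_promotions (std::vector<target> targets,
                                  uint64_t site_total)
{
  std::sort (targets.begin (), targets.end (),
             [] (const target &a, const target &b)
             { return a.count > b.count; });
  std::vector<int> picked;
  for (const target &t : targets)
    {
      if (picked.size () >= kMaxPromotions)
        break;
      if (t.count < kMinEdgeExecutions
          || (double) t.count / site_total < kMinPromotionFraction)
        break; // targets are sorted, so no later one qualifies either
      picked.push_back (t.id);
    }
  return picked;
}
```

With a 1000-hit site whose targets were hit 500, 400, 50 and 50 times, only the first two clear the 30% and 100-execution thresholds.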
[patch, fortran] Fix PR 90561
Hello world, this patch fixes a 9/10 regression by placing the variable used to hold a string length at function scope. I chose to implement this version of gfc_evaluate_now as a separate function because I have a sneaking suspicion this may not be the last time we are going to encounter something like that - better have the function there for future use. Regression-tested. OK for trunk and gcc-9? Regards Thomas Index: trans.h === --- trans.h (Revision 274370) +++ trans.h (Arbeitskopie) @@ -507,6 +507,7 @@ void gfc_conv_label_variable (gfc_se * se, gfc_exp /* If the value is not constant, Create a temporary and copy the value. */ tree gfc_evaluate_now_loc (location_t, tree, stmtblock_t *); tree gfc_evaluate_now (tree, stmtblock_t *); +tree gfc_evaluate_now_function_scope (tree, stmtblock_t *); /* Find the appropriate variant of a math intrinsic. */ tree gfc_builtin_decl_for_float_kind (enum built_in_function, int); Index: trans.c === --- trans.c (Revision 274370) +++ trans.c (Arbeitskopie) @@ -118,7 +118,20 @@ gfc_evaluate_now (tree expr, stmtblock_t * pblock) return gfc_evaluate_now_loc (input_location, expr, pblock); } +/* Like gfc_evaluate_now, but add the created variable to the + function scope. */ +tree +gfc_evaluate_now_function_scope (tree expr, stmtblock_t * pblock) +{ + tree var; + var = gfc_create_var_np (TREE_TYPE (expr), NULL); + gfc_add_decl_to_function (var); + gfc_add_modify (pblock, var, expr); + + return var; +} + /* Build a MODIFY_EXPR node and add it to a given statement block PBLOCK. A MODIFY_EXPR is an assignment: LHS <- RHS. 
*/ Index: trans-expr.c === --- trans-expr.c (Revision 274370) +++ trans-expr.c (Arbeitskopie) @@ -10796,7 +10796,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr if (expr1->ts.deferred && gfc_expr_attr (expr1).allocatable && gfc_check_dependency (expr1, expr2, true)) - rse.string_length = gfc_evaluate_now (rse.string_length, ); + rse.string_length = + gfc_evaluate_now_function_scope (rse.string_length, ); string_length = rse.string_length; } else ! { dg-do run } ! PR fortran/90561 ! This used to ICE. ! Original test case by Gerhard Steinmetz. program p character(:), allocatable :: z(:) z = [character(2):: 'ab', 'xy'] z = z(2) if (any(z /= 'xy')) stop 1 end
Re: [PATCH v3] Missed function specialization + partial devirtualization
On 7/31/19 7:29 AM, luoxhu wrote:
> This patch aims to fix PR69678 caused by PGO indirect call profiling
> performance issues.
> The bug that profiling data is never working was fixed by Martin's pull
> back of topN patches; performance got a GEOMEAN ~1% improvement.
> Still, currently the default profile only generates a SINGLE indirect
> target that is called more than 75% of the time. This patch leverages
> MULTIPLE indirect targets in the LTO-WPA and LTO-LTRANS stages; as a
> result, function specialization, profiling, partial devirtualization,
> inlining and cloning could be done successfully based on it.
> Performance improved from 0.70 sec to 0.38 sec on simple tests.
> Details are:
> 1. PGO with topn is enabled by default now, but only one indirect
> target edge will be generated in the ipa-profile pass, so add variables
> to enable multiple speculative edges through passes: speculative_id
> records the direct edge index bound to the indirect edge, and num_of_ics
> records how many direct edges are owned by the indirect edge. Postpone
> gimple_ic to ipa-profile, as the inline pass will decide whether it is
> beneficial to transform the indirect call.
> 2. Enable LTO WPA/LTRANS stage multiple indirect call target analysis
> for full profile support in ipa passes and cgraph_edge functions.
> speculative_id can be set by make_speculative when multiple targets are
> bound to one indirect edge, and is cloned if a new edge is cloned.
> speculative_id is streamed out and streamed in by LTO like lto_stmt_uid.
> 3. Add 1 in-module testcase and 2 cross-module testcases.
> 4. Bootstrap and regression test passed on Power8-LE.
>
> v3 Changes:
> 1. Rebase to trunk.
> 2. Use speculative_id to track and search the reference node matched
> with the direct edge's callee for multiple targets. This could
> eliminate the strstr workaround used before. Actually, it is the
> caller's responsibility to handle the direct edges mapped to the same
> indirect edge.
> speculative_call_info will still return one of the direct edges
> specified; this will mostly leverage the current IPA edge processing
> framework.

Thank you for the next version of the patch. Inline comments follow:

>
> gcc/ChangeLog
>
> 2019-07-31  Xiong Hu Luo
>
> 	PR ipa/69678
> 	* cgraph.c (symbol_table::create_edge): Init speculative_id.
> 	(cgraph_edge::make_speculative): Add param for setting speculative_id.
> 	(cgraph_edge::speculative_call_info): Find reference by
> 	speculative_id for multiple indirect targets.
> 	(cgraph_edge::resolve_speculation): Decrease the speculations
> 	for the indirect edge, drop its speculative if no direct target
> 	is left.
> 	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> 	(cgraph_node::verify_node): Don't report error if a speculative
> 	edge does not include a statement.
> 	* cgraph.h (struct indirect_target_info): New struct.
> 	(indirect_call_targets): New vector variable.
> 	(num_of_ics): New variable.
> 	(make_speculative): Add param for setting speculative_id.
> 	(speculative_id): New variable.
> 	* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
> 	* ipa-inline.c (inline_small_functions): Add iterator update.
> 	* ipa-profile.c (ipa_profile_generate_summary): Add indirect
> 	multiple targets logic.
> 	(ipa_profile): Likewise.
> 	* ipa-ref.h (speculative_id): New variable.
> 	* ipa.c (process_references): Fix typo.
> 	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
> 	logic.  Stream out speculative_id.
> 	(input_edge): Likewise.
> 	* predict.c (dump_prediction): Remove edges count assert to be
> 	precise.
> 	* symtab.c (symtab_node::create_reference): Init speculative_id.
> 	(symtab_node::clone_references): Clone speculative_id.
> 	(symtab_node::clone_referring): Clone speculative_id.
> 	(symtab_node::clone_reference): Clone speculative_id.
> 	(symtab_node::clear_stmts_in_references): Clear speculative_id.
> 	* tree-inline.c (copy_bb): Duplicate all the speculative edges
> 	if the indirect call contains multiple speculative targets.
> 	* tree-profile.c (gimple_gen_ic_profiler): Use the new variables
> 	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
> 	(gimple_gen_ic_func_profiler): Likewise.
> 	(pass_ipa_tree_profile::gate): Fix comment typos.
> 	* value-prof.c (gimple_ic_transform): Handle topn case.
> 	Fix comment typos.
>
> gcc/testsuite/ChangeLog
>
> 2019-07-31  Xiong Hu Luo
>
> 	PR ipa/69678
> 	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
> ---
> gcc/cgraph.c | 70
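For readers following the thread: the transformation that the series postpones to ipa-profile and the inliner is, at source level, the classic speculative-call rewrite. This is a hand-written model of what gimple_ic emits for one promoted target, not GCC output:

```cpp
#include <cassert>

// Stand-in targets; in a real build these come from the topn profile counters.
static int hits_likely, hits_other;
static void likely_target () { ++hits_likely; }
static void other_target () { ++hits_other; }

// Source-level equivalent of the speculative call: compare the function
// pointer against the profiled hot target and call it directly (which
// enables inlining and specialization), falling back to the indirect call.
static void dispatch (void (*fp) ())
{
  if (fp == likely_target)
    likely_target (); // direct, speculatively devirtualized call
  else
    fp ();            // original indirect call
}
```

With multiple speculative targets (the point of this patch), the `if` becomes a chain of compares, one per direct edge bound to the indirect edge.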
Re: [PATCH][testsuite] Fix PR91419
> From: Richard Biener
> Date: Tue, 13 Aug 2019 09:50:34 +0200
>
> 2019-08-13  Richard Biener
>
> 	PR testsuite/91419
> 	* lib/target-supports.exp (natural_alignment_32): Amend target
> 	list based on BIGGEST_ALIGNMENT.
> 	(natural_alignment_64): Targets not natural_alignment_32 cannot
> 	be natural_alignment_64.
> 	* gcc.dg/tree-ssa/pr91091-2.c: XFAIL for !natural_alignment_32.
> 	* gcc.dg/tree-ssa/ssa-fre-77.c: Likewise.
> 	* gcc.dg/tree-ssa/ssa-fre-61.c: Require natural_alignment_32.

LGTM, thanks. (Not tested myself but my cris-elf autotester will pick it up.)

brgds, H-p
[PATCH][2/2][i386] STV changes, reg-copy as vector
The following splits out the change that makes the live-range split / mode change copy pseudos vector mode pseudos. I've tried reproducing the issue I faced with SImode chains but failed, likely because we're never using simple moves in the end, while for SImode we can end up with (set (subreg:V4SI (reg:SI 42) 0) (subreg:V4SI (reg:SI 41) 0)) which we happily propagate out. So pursuing this patch independently doesn't make much sense. Still, the main change is in dimode_scalar_chain::make_vector_copies where 'vreg' is now V2DImode instead of DImode, and the change to emit_move_insn (vreg, tmp) hints at the change done to the above problematic insn. The rest of the changes deal with some regs already being in the appropriate vector mode. I realize I might have avoided my original issue by emitting not the above reg-reg move but (set (subreg:V4SI (reg:SI 42) 0) (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 41)) (const_vec [0 0 0 0]) (const_1))) but I didn't try (the full patch contained a testcase for the issue). Still, it seems odd to use DImode pseudos when all uses are wrapped in (subreg:V2DI ...). As an (unrelated) change I propose to error with fatal_insn_not_found in case recog doesn't recognize the altered insns, which makes it easier to spot errors early. Bootstrap / regtest running on x86_64-unknown-linux-gnu on top of [1/2], OK for trunk if that succeeds? Thanks, Richard.

2019-08-13  Richard Biener

	* config/i386/i386-features.c
	(dimode_scalar_chain::replace_with_subreg): Elide the subreg if
	the reg is already vector.
	(dimode_scalar_chain::convert_op): Likewise.
	(dimode_scalar_chain::make_vector_copies): Use a vector-mode
	pseudo as destination to avoid later passes copy-propagating it
	out.
	(dimode_chain::convert_insn): Convert the source of a memory op
	in case it is not of the appropriate scalar mode.  Raise
	fatal_insn_not_found if the converted insn is not recognized.
Index: gcc/config/i386/i386-features.c === --- gcc/config/i386/i386-features.c (revision 274328) +++ gcc/config/i386/i386-features.c (working copy) @@ -573,7 +573,8 @@ rtx dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) { if (x == reg) -return gen_rtx_SUBREG (V2DImode, new_reg, 0); +return (GET_MODE (new_reg) == V2DImode + ? new_reg : gen_rtx_SUBREG (V2DImode, new_reg, 0)); const char *fmt = GET_RTX_FORMAT (GET_CODE (x)); int i, j; @@ -627,7 +628,7 @@ void dimode_scalar_chain::make_vector_copies (unsigned regno) { rtx reg = regno_reg_rtx[regno]; - rtx vreg = gen_reg_rtx (DImode); + rtx vreg = gen_reg_rtx (V2DImode); df_ref ref; for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref)) @@ -641,7 +642,12 @@ dimode_scalar_chain::make_vector_copies gen_rtx_SUBREG (SImode, reg, 0)); emit_move_insn (adjust_address (tmp, SImode, 4), gen_rtx_SUBREG (SImode, reg, 4)); - emit_move_insn (vreg, tmp); + emit_move_insn (vreg, + gen_rtx_VEC_MERGE (V2DImode, + gen_rtx_VEC_DUPLICATE (V2DImode, + tmp), + CONST0_RTX (V2DImode), + GEN_INT (HOST_WIDE_INT_1U))); } else if (TARGET_SSE4_1) { @@ -841,7 +847,8 @@ dimode_scalar_chain::convert_op (rtx *op gcc_assert (!DF_REF_CHAIN (ref)); break; } - *op = gen_rtx_SUBREG (V2DImode, *op, 0); + if (GET_MODE (*op) != V2DImode) + *op = gen_rtx_SUBREG (V2DImode, *op, 0); } else if (CONST_INT_P (*op)) { @@ -931,6 +938,8 @@ dimode_scalar_chain::convert_insn (rtx_i case MEM: if (!REG_P (dst)) convert_op (, insn); + else if (GET_MODE (src) != DImode) + src = gen_rtx_SUBREG (DImode, src, 0); break; case REG: @@ -977,7 +986,9 @@ dimode_scalar_chain::convert_insn (rtx_i PATTERN (insn) = def_set; INSN_CODE (insn) = -1; - recog_memoized (insn); + int patt = recog_memoized (insn); + if (patt == -1) +fatal_insn_not_found (insn); df_insn_rescan (insn); }
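A scalar model of what the vec_merge / vec_duplicate form used in the patch computes: lane i takes the duplicated scalar when bit i of the mask is set, and the corresponding lane of the second operand (here zero) otherwise. The helpers are illustrative, not GCC internals:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of (vec_duplicate:M x): every lane holds the scalar x.
template <std::size_t N>
std::array<int64_t, N> vec_duplicate (int64_t x)
{
  std::array<int64_t, N> r{};
  r.fill (x);
  return r;
}

// Model of (vec_merge a b mask): lane i comes from a when bit i of the
// mask is set, from b otherwise.
template <std::size_t N>
std::array<int64_t, N> vec_merge (const std::array<int64_t, N> &a,
                                  const std::array<int64_t, N> &b,
                                  unsigned mask)
{
  std::array<int64_t, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((mask >> i) & 1) ? a[i] : b[i];
  return r;
}
```

So (vec_merge (vec_duplicate x) (const_vector 0) (const_int 1)) places x in lane 0 and zeros the rest, which is exactly the scalar-to-vector move the patch emits and which cannot be propagated away as a plain subreg copy.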
[PATCH][1/2][i386] STV changes, DImode chain cost
The following splits out the DImode chain cost changes from the patch adding SImode chain handling. There are two main parts: 1) fix REG_P (src) && REG_P (dst) costing, which currently favors SSE because we use COSTS_N_INSNS based costs for GPR but move costs for SSE. 2) Use ix86_cost->sse_op as the cost of the SSE op rather than just computing the gain as one GPR insn cost. The vectorizer already compares GPR insn costs and SSE insn costs, so this shouldn't be apples and oranges. Also, when handling SImode chains we'd be left with a zero gain everywhere (but the minmax special-casing). Besides that, for debugging I found it useful to dump the per-stmt gain if not zero (and spotted the reg-reg move issue that way). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. OK for trunk? Thanks, Richard.

2019-08-13  Richard Biener

	* config/i386/i386-features.c
	(dimode_scalar_chain::compute_convert_gain): Compute and dump
	individual instruction gain.  Fix reg-reg copy GPR cost.  Use
	ix86_cost->sse_op for vector instruction costs.
Index: gcc/config/i386/i386-features.c === --- gcc/config/i386/i386-features.c (revision 274328) +++ gcc/config/i386/i386-features.c (working copy) @@ -497,22 +497,23 @@ dimode_scalar_chain::compute_convert_gai rtx def_set = single_set (insn); rtx src = SET_SRC (def_set); rtx dst = SET_DEST (def_set); + int igain = 0; if (REG_P (src) && REG_P (dst)) - gain += COSTS_N_INSNS (2) - ix86_cost->xmm_move; + igain += 2 - ix86_cost->xmm_move; else if (REG_P (src) && MEM_P (dst)) - gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; + igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; else if (MEM_P (src) && REG_P (dst)) - gain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1]; + igain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1]; else if (GET_CODE (src) == ASHIFT || GET_CODE (src) == ASHIFTRT || GET_CODE (src) == LSHIFTRT) { if (CONST_INT_P (XEXP (src, 0))) - gain -= vector_const_cost (XEXP (src, 0)); - gain += ix86_cost->shift_const; + igain -= vector_const_cost (XEXP (src, 0)); + igain += 2 * ix86_cost->shift_const - ix86_cost->sse_op; if (INTVAL (XEXP (src, 1)) >= 32) - gain -= COSTS_N_INSNS (1); + igain -= COSTS_N_INSNS (1); } else if (GET_CODE (src) == PLUS || GET_CODE (src) == MINUS @@ -520,20 +521,20 @@ dimode_scalar_chain::compute_convert_gai || GET_CODE (src) == XOR || GET_CODE (src) == AND) { - gain += ix86_cost->add; + igain += 2 * ix86_cost->add - ix86_cost->sse_op; /* Additional gain for andnot for targets without BMI. 
*/ if (GET_CODE (XEXP (src, 0)) == NOT && !TARGET_BMI) - gain += 2 * ix86_cost->add; + igain += 2 * ix86_cost->add; if (CONST_INT_P (XEXP (src, 0))) - gain -= vector_const_cost (XEXP (src, 0)); + igain -= vector_const_cost (XEXP (src, 0)); if (CONST_INT_P (XEXP (src, 1))) - gain -= vector_const_cost (XEXP (src, 1)); + igain -= vector_const_cost (XEXP (src, 1)); } else if (GET_CODE (src) == NEG || GET_CODE (src) == NOT) - gain += ix86_cost->add - COSTS_N_INSNS (1); + igain += 2 * ix86_cost->add - ix86_cost->sse_op - COSTS_N_INSNS (1); else if (GET_CODE (src) == COMPARE) { /* Assume comparison cost is the same. */ @@ -541,13 +542,20 @@ dimode_scalar_chain::compute_convert_gai else if (CONST_INT_P (src)) { if (REG_P (dst)) - gain += COSTS_N_INSNS (2); + igain += 2 * COSTS_N_INSNS (1); else if (MEM_P (dst)) - gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; - gain -= vector_const_cost (src); + igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; + igain -= vector_const_cost (src); } else gcc_unreachable (); + + if (igain != 0 && dump_file) + { + fprintf (dump_file, " Instruction gain %d for ", igain); + dump_insn_slim (dump_file, insn); + } + gain += igain; } if (dump_file)
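The per-instruction gain logic in the hunk above boils down to a simple model: a DImode operation needs two 32-bit GPR instructions, its replacement needs one SSE instruction, and the chain is only converted when the summed gain is positive. The cost numbers below are made up, not ix86_cost values:

```cpp
#include <cassert>
#include <vector>

// Illustrative per-instruction costs in the spirit of compute_convert_gain.
struct insn_cost
{
  int gpr_cost; // cost of one 32-bit GPR instruction
  int sse_cost; // cost of the single SSE replacement
};

// Sum the per-instruction gains over a chain; convert only if positive.
int chain_gain (const std::vector<insn_cost> &chain)
{
  int gain = 0;
  for (const insn_cost &c : chain)
    gain += 2 * c.gpr_cost - c.sse_cost; // igain for one instruction
  return gain;
}
```

The point of the patch is precisely to make both terms real instruction costs (2 * ix86_cost->add vs. ix86_cost->sse_op) instead of comparing an instruction count against a move cost.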
Re: [PATCH] Come up with json::integer_number and use it in GCOV.
On 8/2/19 2:40 PM, David Malcolm wrote:
> Something that occurred to me reading the updated patch: maybe it would
> make things easier to have utility member functions of json::object to
> implicitly make the child, e.g.:
>
> void
> json::object::set (const char *key, long v)
> {
>   set (key, new json::integer_number (v));
> }
>
> so that all those calls can be just:
>
> obj->set ("line", exploc.line);
> obj->set ("column", exploc.column);
>
> etc (assuming overloading is unambiguous).
>
> But that's probably orthogonal to this patch.

Looks good to me. It's a candidate for a follow-up patch.

>> And I changed all occurrences of float_number with integer_number
>> as you suggested.
> Thanks.
>> I'm currently testing the updated patch.
>> Martin
> The updated patch looks good to me, but technically I'm not a reviewer
> for these files.

Sure. I hope @Jakub or @Richi can approve that?

Thanks,
Martin

> Dave
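To make David's suggestion concrete, here is a sketch with stand-in types. The real classes are GCC's json::value, json::object and json::integer_number in gcc/json.h; these stubs only show the shape of the proposed convenience overload:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Minimal stand-ins for GCC's json classes.
struct value
{
  virtual ~value () = default;
};

struct integer_number : value
{
  explicit integer_number (long v) : val (v) {}
  long val;
};

struct object : value
{
  // Existing interface: caller allocates the child node.
  void set (const char *key, value *v) { map_[key].reset (v); }

  // Proposed convenience overload: implicitly wrap a long in an
  // integer_number, so call sites can pass the number directly.
  void set (const char *key, long v) { set (key, new integer_number (v)); }

  std::map<std::string, std::unique_ptr<value>> map_;
};
```

On the ambiguity question: an int argument converts to long via integral conversion, which beats any pointer interpretation, so calls like obj.set ("line", 42) resolve to the new overload (a literal 0 would be ambiguous, though).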
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
> I have become rather wary of INTEGRAL_TYPE_P recently because it
> includes enum types, which with -fstrict-enums can have a surprising
> behavior. If I have
> enum E { A, B, C };
> and e has type enum E, with -fstrict-enums, do your tests manage to
> prevent (long)e+1 from becoming (long)(e+1) with an unsafe operation
> done in the enum type?

Ah, I didn't specifically check for this. I'm going to change the checks to TREE_CODE (type) == INTEGER_TYPE instead, then.

Regards
Robin
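The hazard Marc describes is an instance of the general narrow-type problem: once the addition is pushed into a type whose value range is smaller than the outer type's, the result can wrap before it is widened. Here signed char stands in for the restricted-range enum (assuming the usual 8-bit two's-complement signed char, as on GCC targets):

```cpp
#include <climits>

// Addition performed in the wide type: always gives the mathematical result.
long widen_then_add (signed char e)
{
  return (long) e + 1;
}

// Model of the unsafe fold (long)(e + 1) with the addition really done in
// the narrow type: the value wraps before it is widened to long.
long add_then_widen (signed char e)
{
  return (long) (signed char) (e + 1);
}
```

At the top of the range the two disagree: widening first gives SCHAR_MAX + 1 = 128, while adding in the narrow type wraps to SCHAR_MIN. The -fstrict-enums case is the same story with the enum's restricted value range playing the role of signed char's.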
Re: [PATCH 1/3] Perform fold when propagating.
> May I suggest to add a parameter to the substitute-and-fold engine > so we can do the folding on all stmts only when enabled and enable > it just for VRP? That also avoids the testsuite noise. Would something along these lines do? diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c index 7a8f1e037b0..6c0d743b823 100644 --- a/gcc/tree-ssa-propagate.c +++ b/gcc/tree-ssa-propagate.c @@ -814,7 +814,6 @@ ssa_propagation_engine::ssa_propagate (void) ssa_prop_fini (); } - /* Return true if STMT is of the form 'mem_ref = RHS', where 'mem_ref' is a non-volatile pointer dereference, a structure reference or a reference to a single _DECL. Ignore volatile memory references @@ -1064,11 +1063,10 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb) /* Replace real uses in the statement. */ did_replace |= substitute_and_fold_engine->replace_uses_in (stmt); - if (did_replace) - gimple_set_modified (stmt, true); - - if (fold_stmt (, follow_single_use_edges)) + /* If we made a replacement, fold the statement. */ + if (did_replace || substitute_and_fold_engine->should_fold_all_stmts ()) { + fold_stmt (, follow_single_use_edges); did_replace = true; stmt = gsi_stmt (i); gimple_set_modified (stmt, true); diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h index 81b635e0787..939680f487c 100644 --- a/gcc/tree-ssa-propagate.h +++ b/gcc/tree-ssa-propagate.h @@ -107,6 +107,13 @@ class substitute_and_fold_engine bool substitute_and_fold (basic_block = NULL); bool replace_uses_in (gimple *); bool replace_phi_args_in (gphi *); + + /* Users like VRP can overwrite this when they want to perform + folding for every propagation. 
*/ + virtual bool should_fold_all_stmts (void) +{ + return false; +} }; #endif /* _TREE_SSA_PROPAGATE_H */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index e2850682da2..8c8fa6f2bec 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -6271,6 +6271,9 @@ class vrp_folder : public substitute_and_fold_engine { return vr_values->simplify_stmt_using_ranges (gsi); } tree op_with_constant_singleton_value_range (tree op) { return vr_values->op_with_constant_singleton_value_range (op); } + + /* Enable aggressive folding in every propagation. */ + bool should_fold_all_stmts (void) { return true; } }; /* If the statement pointed by SI has a predicate whose value can be > I think it's also only necessary to fold a stmt when a (indirect) use > after substitution has either been folded or has (new) SSA name > info (range/known-bits) set? Where would this need to be changed? Regards Robin
Re: [patch] handle casesi dispatch insns in create_trace_edges
Hi Jeff,

> On 12 Aug 2019, at 16:48, Jeff Law wrote:
> I think part of the reason I settled on rtl.c was because we're
> really just looking at the form, not doing any real "analysis". But I
> think either location for the implementation is fine.

The reference to pc_rtx and to single_set turned out problematic in rtl.c, as they end up unresolved in helpers such as genpreds. I didn't want to start messing with those, so I have finally stuck the definition in rtlanal.c, next to tablejump_p. This is rev 274377; ChangeLog below and patch attached for the list records. rev 274378 follows to fix the placement of the prototype in rtl.h, very minor.

Thanks again for your feedback,

With Kind Regards,

Olivier

2019-08-13  Olivier Hainque

	* rtlanal.c (tablejump_casesi_pattern): New function, to
	determine if a tablejump insn is a casesi dispatcher.  Extracted
	from patch_jump_insn.
	* rtl.h (tablejump_casesi_pattern): Declare.
	* cfgrtl.c (patch_jump_insn): Use it.
	* dwarf2cfi.c (create_trace_edges): Use it.

testsuite/

	* gnat.dg/casesi.ad[bs], test_casesi.adb: New test.

[Attachment: casesi.diff]
Re: [PATCH 2/3] C++20 constexpr lib part 2/3 - swappish functions.
On 01/08/19 13:16 -0400, Ed Smith-Rowland via libstdc++ wrote: Greetings, Here is a patch for C++20 p0879 - Constexpr for swap and swap related functions. This essentially constexprifies the rest of . Built and tested with C++20 (and pre-c++20) on x86_64-linux. Ok? Regards, Ed Smith-Rowland 2019-08-01 Edward Smith-Rowland <3dw...@verizon.net> Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. (iter_swap, make_heap, next_permutation, partial_sort_copy, There should be a newline after "New macro." and before the next parenthesized list of identifiers. The parenthesized lists should not span multiple lines, so close and reopen the parens, i.e. Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. (iter_swap, make_heap, next_permutation, partial_sort_copy, pop_heap) (prev_permutation, push_heap, reverse, rotate, sort_heap, swap) (swap_ranges, nth_element, partial_sort, sort): Add constexpr. @@ -193,6 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if __cplusplus > 201703L # define __cpp_lib_constexpr_algorithms 201711L +# define __cpp_lib_constexpr_swap_algorithms 201712L Should this value be 201806L? The new macro also needs to be added to .
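What P0879 buys at the call site is swap usable inside constant evaluation. The helper below mirrors the now-constexpr std::swap; with a C++20 library the static_assert would hold for std::swap itself:

```cpp
#include <cassert>
#include <utility>

// Illustrative constexpr swap, the shape P0879 gives std::swap.
template <typename T>
constexpr void my_swap (T &a, T &b)
{
  T tmp = std::move (a);
  a = std::move (b);
  b = std::move (tmp);
}

// A swap inside a constexpr function, evaluated entirely at compile time.
constexpr int swapped_pair ()
{
  int x = 1, y = 2;
  my_swap (x, y);
  return x * 10 + y; // 21 after the swap
}

static_assert (swapped_pair () == 21, "swap usable in constant evaluation");
```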
Optimise constant IFN_WHILE_ULTs
This patch is a combination of two changes that have to be committed as a single unit, one target-independent and one target-specific: (1) Try to fold IFN_WHILE_ULTs with constant arguments to a VECTOR_CST (which is always possible for fixed-length vectors but is not necessarily so for variable-length vectors) (2) Make the SVE port recognise constants that map to PTRUE VLn, which includes those generated by the new fold. (2) can't be tested without (1) and (1) would be a significant pessimisation without (2). The target-specific parts also start moving towards doing predicate manipulation in a canonical VNx16BImode form, using rtx_vector_builders. Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and x86_64-linux-gnu. OK for the generic bits (= the first three files in the diff)? Thanks, Richard 2019-08-13 Richard Sandiford gcc/ * tree.h (build_vector_a_then_b): Declare. * tree.c (build_vector_a_then_b): New function. * fold-const-call.c (fold_while_ult): Likewise. (fold_const_call): Use it to handle IFN_WHILE_ULT. * config/aarch64/aarch64-protos.h (AARCH64_FOR_SVPATTERN): New macro. (aarch64_svpattern): New enum. * config/aarch64/aarch64-sve.md (mov): Pass constants through aarch64_expand_mov_immediate. (*aarch64_sve_mov): Use aarch64_mov_operand rather than general_operand as the predicate for operand 1. (while_ult): Add a '@' marker. * config/aarch64/aarch64.c (simd_immediate_info::PTRUE): New insn_type. (simd_immediate_info::simd_immediate_info): New overload that takes a scalar_int_mode and an svpattern. (simd_immediate_info::u): Add a "pattern" field. (svpattern_token): New function. (aarch64_get_sve_pred_bits, aarch64_widest_sve_pred_elt_size) (aarch64_partial_ptrue_length, aarch64_svpattern_for_vl) (aarch64_sve_move_pred_via_while): New functions. (aarch64_expand_mov_immediate): Try using aarch64_sve_move_pred_via_while for predicates that contain N ones followed by M zeros but that do not correspond to a VLnnn pattern. 
(aarch64_sve_pred_valid_immediate): New function. (aarch64_simd_valid_immediate): Use it instead of dealing directly with PTRUE and PFALSE. (aarch64_output_sve_mov_immediate): Handle new simd_immediate_info forms. gcc/testsuite/ * gcc.target/aarch64/sve/spill_2.c: Increase iteration counts beyond the range of a PTRUE. * gcc.target/aarch64/sve/while_6.c: New test. * gcc.target/aarch64/sve/while_7.c: Likewise. * gcc.target/aarch64/sve/while_8.c: Likewise. * gcc.target/aarch64/sve/while_9.c: Likewise. * gcc.target/aarch64/sve/while_10.c: Likewise. Index: gcc/tree.h === --- gcc/tree.h 2019-08-13 10:38:29.003944679 +0100 +++ gcc/tree.h 2019-08-13 11:46:02.110720978 +0100 @@ -4314,6 +4314,7 @@ extern tree build_vector_from_val (tree, extern tree build_uniform_cst (tree, tree); extern tree build_vec_series (tree, tree, tree); extern tree build_index_vector (tree, poly_uint64, poly_uint64); +extern tree build_vector_a_then_b (tree, unsigned int, tree, tree); extern void recompute_constructor_flags (tree); extern void verify_constructor_flags (tree); extern tree build_constructor (tree, vec * CXX_MEM_STAT_INFO); Index: gcc/tree.c === --- gcc/tree.c 2019-08-13 10:38:29.003944679 +0100 +++ gcc/tree.c 2019-08-13 11:46:02.110720978 +0100 @@ -1981,6 +1981,23 @@ build_index_vector (tree vec_type, poly_ return v.build (); } +/* Return a VECTOR_CST of type VEC_TYPE in which the first NUM_A + elements are A and the rest are B. */ + +tree +build_vector_a_then_b (tree vec_type, unsigned int num_a, tree a, tree b) +{ + gcc_assert (known_le (num_a, TYPE_VECTOR_SUBPARTS (vec_type))); + unsigned int count = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vec_type)); + /* Optimize the constant case. */ + if ((count & 1) == 0 && TYPE_VECTOR_SUBPARTS (vec_type).is_constant ()) +count /= 2; + tree_vector_builder builder (vec_type, count, 2); + for (unsigned int i = 0; i < count * 2; ++i) +builder.quick_push (i < num_a ? 
a : b); + return builder.build (); +} + /* Something has messed with the elements of CONSTRUCTOR C after it was built; calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */ Index: gcc/fold-const-call.c === --- gcc/fold-const-call.c 2019-03-08 18:15:31.292760907 + +++ gcc/fold-const-call.c 2019-08-13 11:46:02.106721009 +0100 @@ -691,6 +691,36 @@ fold_const_vec_convert (tree ret_type, t /* Try to evaluate: + IFN_WHILE_ULT (ARG0, ARG1, (TYPE) { ... }) + + Return the value on success and null on failure. */ + +static tree
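A scalar model of the IFN_WHILE_ULT semantics being folded: lane i of the predicate is active exactly when arg0 + i < arg1 (unsigned compare), so a call with constant arguments yields min (arg1 - arg0, nlanes) leading true lanes followed by false lanes, which is the "num_a copies of A, then B" shape build_vector_a_then_b constructs. This is an illustration, not GCC code:

```cpp
#include <cassert>
#include <vector>

// Model of WHILE_ULT (a, b) for an n-lane predicate: lane i is true
// while a + i is (unsigned) less than b.
std::vector<bool> while_ult (unsigned long a, unsigned long b, unsigned n)
{
  std::vector<bool> pred (n);
  for (unsigned i = 0; i < n; ++i)
    pred[i] = a + i < b;
  return pred;
}
```

For fixed-length vectors this is always a foldable constant; for variable-length vectors it only is when the active-lane count is provably within the minimum vector length, which is why the fold is conditional there.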
[committed][AArch64] Improve SVE constant moves
If there's no SVE instruction to load a given constant directly, this patch instead tries to use an Advanced SIMD constant move and then duplicates the constant to fill an SVE vector. The main use of this is to support constants in which each byte is in { 0, 0xff }. Also, the patch prefers a simple integer move followed by a duplicate over a load from memory, like we already do for Advanced SIMD. This is a useful option to have and would be easy to turn off via a tuning parameter if necessary. The patch also extends the handling of wide LD1Rs to big endian, whereas previously we punted to a full LD1RQ. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274375 (treating the machmode.h bit as obvious). Richard 2019-08-13 Richard Sandiford gcc/ * machmode.h (opt_mode::else_mode): New function. (opt_mode::else_blk): Use it. * config/aarch64/aarch64-protos.h (aarch64_vq_mode): Declare. (aarch64_full_sve_mode, aarch64_sve_ld1rq_operand_p): Likewise. (aarch64_gen_stepped_int_parallel): Likewise. (aarch64_stepped_int_parallel_p): Likewise. (aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate argument. * config/aarch64/aarch64.c (aarch64_expand_sve_widened_duplicate): Delete. (aarch64_expand_sve_dupq, aarch64_expand_sve_ld1rq): New functions. (aarch64_expand_sve_const_vector): Rewrite to handle more cases. (aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate argument. Use early returns in the !CONST_INT_P handling. Pass all SVE data vectors to aarch64_expand_sve_const_vector rather than handling some inline. (aarch64_full_sve_mode, aarch64_vq_mode): New functions, split out from... (aarch64_simd_container_mode): ...here. (aarch64_gen_stepped_int_parallel, aarch64_stepped_int_parallel_p) (aarch64_sve_ld1rq_operand_p): New functions. * config/aarch64/predicates.md (descending_int_parallel) (aarch64_sve_ld1rq_operand): New predicates. * config/aarch64/constraints.md (UtQ): New constraint. 
* config/aarch64/aarch64.md (UNSPEC_REINTERPRET): New unspec. * config/aarch64/aarch64-sve.md (mov): Remove the gen_vec_duplicate from call to aarch64_expand_mov_immediate. (@aarch64_sve_reinterpret): New expander. (*aarch64_sve_reinterpret): New pattern. (@aarch64_vec_duplicate_vq_le): New pattern. (@aarch64_vec_duplicate_vq_be): Likewise. (*sve_ld1rq): Replace with... (@aarch64_sve_ld1rq): ...this new pattern. gcc/testsuite/ * gcc.target/aarch64/sve/init_2.c: Expect ld1rd to be used instead of a full vector load. * gcc.target/aarch64/sve/init_4.c: Likewise. * gcc.target/aarch64/sve/ld1r_2.c: Remove constants that no longer need to be loaded from memory. * gcc.target/aarch64/sve/slp_2.c: Expect the same output for big and little endian. * gcc.target/aarch64/sve/slp_3.c: Likewise. Expect 3 of the doubles to be moved via integer registers rather than loaded from memory. * gcc.target/aarch64/sve/slp_4.c: Likewise but for 4 doubles. * gcc.target/aarch64/sve/spill_4.c: Expect 16-bit constants to be loaded via an integer register rather than from memory. * gcc.target/aarch64/sve/const_1.c: New test. * gcc.target/aarch64/sve/const_2.c: Likewise. * gcc.target/aarch64/sve/const_3.c: Likewise. Index: gcc/machmode.h === --- gcc/machmode.h 2019-03-08 18:14:25.849009683 + +++ gcc/machmode.h 2019-08-13 11:28:59.890111027 +0100 @@ -251,7 +251,8 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS) ALWAYS_INLINE opt_mode (from_int m) : m_mode (machine_mode (m)) {} machine_mode else_void () const; - machine_mode else_blk () const; + machine_mode else_blk () const { return else_mode (BLKmode); } + machine_mode else_mode (machine_mode) const; T require () const; bool exists () const; @@ -271,13 +272,13 @@ opt_mode::else_void () const return m_mode; } -/* If the T exists, return its enum value, otherwise return E_BLKmode. */ +/* If the T exists, return its enum value, otherwise return FALLBACK. 
*/ template<typename T> inline machine_mode -opt_mode<T>::else_blk () const +opt_mode<T>::else_mode (machine_mode fallback) const { - return m_mode == E_VOIDmode ? E_BLKmode : m_mode; + return m_mode == E_VOIDmode ? fallback : m_mode; } /* Assert that the object contains a T and return it. */ Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h 2019-08-13 11:21:04.501550177 +0100 +++ gcc/config/aarch64/aarch64-protos.h 2019-08-13 11:28:59.882111087 +0100 @@ -416,6 +416,8 @@ unsigned HOST_WIDE_INT aarch64_and_split
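The else_blk/else_mode restructuring can be modelled in isolation: the old fixed-fallback accessor becomes a thin wrapper around a new fallback-taking one (a reduced stand-in for machmode.h, not the real template class):

```cpp
// Reduced model of opt_mode: m_mode is E_VOIDmode when no mode exists.
enum machine_mode { E_VOIDmode, E_BLKmode, E_SImode };

struct opt_mode
{
  machine_mode m_mode;

  // New generalized accessor: return the mode if present, else FALLBACK.
  machine_mode else_mode (machine_mode fallback) const
  { return m_mode == E_VOIDmode ? fallback : m_mode; }

  // else_blk is now just else_mode with a fixed fallback.
  machine_mode else_blk () const { return else_mode (E_BLKmode); }
};
```

This keeps the existing else_blk callers unchanged while letting new code (such as aarch64_vq_mode users) pick their own fallback.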
Re: [PATCH] Add --with-static-standard-libraries to the top level
On 08/08/19 14:53 -0600, Jeff Law wrote: On 8/5/19 12:02 PM, Tom Tromey wrote: gdb should normally not be linked with -static-libstdc++. Currently this has not caused problems, but it's incompatible with catching an exception thrown from a shared library -- and a subsequent patch changes gdb to do just this. This patch adds a new --with-static-standard-libraries flag to the top-level configure. It defaults to "auto", which means enabled if gcc is being built, and disabled otherwise. Tom 2019-07-27 Tom Tromey * configure: Rebuild. * configure.ac: Add --with-static-standard-libraries. Deferring to Jon. It might be worth reviewing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56750 And this thread: https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00403.html While I NAK'd Aldy's patch in the email thread, if Jon thinks we should have this capability I won't object. The new option (and its default value) seem reasonable to me. I don't see why GDB should be forced to link to libstdc++.a just because GCC wants to. What I don't understand is why GDB crashes. It should still be able to catch exceptions from a shared library even if linked to libstdc++.a, unless the static libstdc++.a is somehow incompatible with the shared libstdc++.so the shared lib linked to. Is this on GNU/Linux, or something with a different linking model?
Re: [patch, fortran] Some corrections for DO loop index warnings
Hi Steve, Ok. If it regtests cleanly on gcc 9, go ahead and commit there as well. Committed to both (after doing a regression test on gcc 9 and also waiting for gcc-testresults containing the revision). Thanks for the review!
[committed][AArch64] Use simd_immediate_info for SVE predicate constants
This patch makes predicate constants use the normal simd_immediate_info machinery, rather than treating PFALSE and PTRUE as special cases. This makes it easier to add other types of predicate constant later. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274372. Richard 2019-08-13 Richard Sandiford gcc/ * config/aarch64/aarch64-protos.h (aarch64_output_ptrue): Delete. * config/aarch64/aarch64-sve.md (*aarch64_sve_mov): Use a single Dn alternative instead of separate Dz and Dm alternatives. Use aarch64_output_sve_move_immediate. * config/aarch64/aarch64.c (aarch64_sve_element_int_mode): New function. (aarch64_simd_valid_immediate): Fill in the simd_immediate_info for predicates too. (aarch64_output_sve_mov_immediate): Handle predicate modes. (aarch64_output_ptrue): Delete. Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h 2019-08-08 18:11:22.0 +0100 +++ gcc/config/aarch64/aarch64-protos.h 2019-08-13 11:20:41.965713252 +0100 @@ -462,7 +462,6 @@ char *aarch64_output_scalar_simd_mov_imm char *aarch64_output_simd_mov_immediate (rtx, unsigned, enum simd_immediate_check w = AARCH64_CHECK_MOV); char *aarch64_output_sve_mov_immediate (rtx); -char *aarch64_output_ptrue (machine_mode, char); bool aarch64_pad_reg_upward (machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); bool aarch64_regno_ok_for_index_p (int, bool); Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-13 10:38:35.963894971 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-13 11:20:41.965713252 +0100 @@ -453,8 +453,8 @@ (define_expand "mov" ) (define_insn "*aarch64_sve_mov" - [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa, Upa") - (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dz, Dm"))] + [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa") + (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dn"))] 
"TARGET_SVE && (register_operand (operands[0], mode) || register_operand (operands[1], mode))" @@ -462,8 +462,7 @@ (define_insn "*aarch64_sve_mov" mov\t%0.b, %1.b str\t%1, %0 ldr\t%0, %1 - pfalse\t%0.b - * return aarch64_output_ptrue (mode, '');" + * return aarch64_output_sve_mov_immediate (operands[1]);" ) ;; = Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-13 11:18:06.762836453 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-13 11:20:41.969713225 +0100 @@ -1635,6 +1635,16 @@ aarch64_get_mask_mode (poly_uint64 nunit return default_get_mask_mode (nunits, nbytes); } +/* Return the integer element mode associated with SVE mode MODE. */ + +static scalar_int_mode +aarch64_sve_element_int_mode (machine_mode mode) +{ + unsigned int elt_bits = vector_element_size (BITS_PER_SVE_VECTOR, + GET_MODE_NUNITS (mode)); + return int_mode_for_size (elt_bits, 0).require (); +} + /* Implement TARGET_PREFERRED_ELSE_VALUE. For binary operations, prefer to use the first arithmetic operand as the else value if the else value doesn't matter, since that exactly matches the SVE @@ -14700,8 +14710,18 @@ aarch64_simd_valid_immediate (rtx op, si /* Handle PFALSE and PTRUE. 
*/ if (vec_flags & VEC_SVE_PRED) -return (op == CONST0_RTX (mode) - || op == CONSTM1_RTX (mode)); +{ + if (op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode)) + { + if (info) + { + scalar_int_mode int_mode = aarch64_sve_element_int_mode (mode); + *info = simd_immediate_info (int_mode, op == CONSTM1_RTX (mode)); + } + return true; + } + return false; +} scalar_float_mode elt_float_mode; if (n_elts == 1 @@ -16393,6 +16413,21 @@ aarch64_output_sve_mov_immediate (rtx co element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode)); + machine_mode vec_mode = GET_MODE (const_vector); + if (aarch64_sve_pred_mode_p (vec_mode)) +{ + static char buf[sizeof ("ptrue\t%0.N, vlN")]; + unsigned int total_bytes; + if (info.u.mov.value == const0_rtx) + snprintf (buf, sizeof (buf), "pfalse\t%%0.b"); + else if (BYTES_PER_SVE_VECTOR.is_constant (_bytes)) + snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", element_char, + total_bytes / GET_MODE_SIZE (info.elt_mode)); + else + snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", element_char); + return buf; +} + if
[committed][AArch64] Make simd_immediate_info INDEX explicit
This patch tweaks the representation of SVE INDEX instructions in simd_immediate_info so that it's easier to add new types of constant on top. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274371. Richard 2019-08-13 Richard Sandiford gcc/ * config/aarch64/aarch64.c (simd_immediate_info::insn_type): Add INDEX. (simd_immediate_info::value, simd_immediate_info::step) (simd_immediate_info::modifier, simd_immediate_info::shift): Replace with... (simd_immediate_info::u): ...this new union. (simd_immediate_info::simd_immediate_info): Update accordingly. (aarch64_output_simd_mov_immediate): Likewise. (aarch64_output_sve_mov_immediate): Likewise. Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-13 10:50:31.050777017 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-13 11:16:57.723336150 +0100 @@ -83,7 +83,7 @@ #define POINTER_BYTES (POINTER_SIZE / BI /* Information about a legitimate vector immediate operand. */ struct simd_immediate_info { - enum insn_type { MOV, MVN }; + enum insn_type { MOV, MVN, INDEX }; enum modifier_type { LSL, MSL }; simd_immediate_info () {} @@ -96,29 +96,43 @@ struct simd_immediate_info /* The mode of the elements. */ scalar_mode elt_mode; - /* The value of each element if all elements are the same, or the - first value if the constant is a series. */ - rtx value; - - /* The value of the step if the constant is a series, null otherwise. */ - rtx step; - /* The instruction to use to move the immediate into a vector. */ insn_type insn; - /* The kind of shift modifier to use, and the number of bits to shift. - This is (LSL, 0) if no shift is needed. */ - modifier_type modifier; - unsigned int shift; + union + { +/* For MOV and MVN. */ +struct +{ + /* The value of each element. */ + rtx value; + + /* The kind of shift modifier to use, and the number of bits to shift. +This is (LSL, 0) if no shift is needed. */ + modifier_type modifier; + unsigned int shift; +} mov; + +/* For INDEX. 
*/ +struct +{ + /* The value of the first element and the step to be added for each +subsequent element. */ + rtx base, step; +} index; + } u; }; /* Construct a floating-point immediate in which each element has mode ELT_MODE_IN and value VALUE_IN. */ inline simd_immediate_info ::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in) - : elt_mode (elt_mode_in), value (value_in), step (NULL_RTX), insn (MOV), -modifier (LSL), shift (0) -{} + : elt_mode (elt_mode_in), insn (MOV) +{ + u.mov.value = value_in; + u.mov.modifier = LSL; + u.mov.shift = 0; +} /* Construct an integer immediate in which each element has mode ELT_MODE_IN and value VALUE_IN. The other parameters are as for the structure @@ -128,17 +142,22 @@ struct simd_immediate_info unsigned HOST_WIDE_INT value_in, insn_type insn_in, modifier_type modifier_in, unsigned int shift_in) - : elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)), -step (NULL_RTX), insn (insn_in), modifier (modifier_in), shift (shift_in) -{} + : elt_mode (elt_mode_in), insn (insn_in) +{ + u.mov.value = gen_int_mode (value_in, elt_mode_in); + u.mov.modifier = modifier_in; + u.mov.shift = shift_in; +} /* Construct an integer immediate in which each element has mode ELT_MODE_IN - and where element I is equal to VALUE_IN + I * STEP_IN. */ + and where element I is equal to BASE_IN + I * STEP_IN. */ inline simd_immediate_info -::simd_immediate_info (scalar_mode elt_mode_in, rtx value_in, rtx step_in) - : elt_mode (elt_mode_in), value (value_in), step (step_in), insn (MOV), -modifier (LSL), shift (0) -{} +::simd_immediate_info (scalar_mode elt_mode_in, rtx base_in, rtx step_in) + : elt_mode (elt_mode_in), insn (INDEX) +{ + u.index.base = base_in; + u.index.step = step_in; +} /* The current code model. 
*/ enum aarch64_code_model aarch64_cmodel; @@ -16275,17 +16294,18 @@ aarch64_output_simd_mov_immediate (rtx c if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT) { - gcc_assert (info.shift == 0 && info.insn == simd_immediate_info::MOV); + gcc_assert (info.insn == simd_immediate_info::MOV + && info.u.mov.shift == 0); /* For FP zero change it to a CONST_INT 0 and use the integer SIMD move immediate path. */ - if (aarch64_float_const_zero_rtx_p (info.value)) -info.value = GEN_INT (0); + if (aarch64_float_const_zero_rtx_p (info.u.mov.value)) +info.u.mov.value = GEN_INT (0); else { const unsigned int buf_size = 20; char float_buf[buf_size] = {'\0'};
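The shape of this change — the insn field acting as a discriminator that selects which member of an anonymous union is valid — can be shown with a cut-down model (member types simplified to integers; the real structure stores rtx values and shift modifiers):

```cpp
#include <cassert>

// Cut-down model of simd_immediate_info after the patch: insn is the
// tag, and only the matching union member may be read.
struct imm_info
{
  enum insn_type { MOV, MVN, INDEX } insn;

  union
  {
    struct { int value; unsigned int shift; } mov;  // valid for MOV/MVN
    struct { int base, step; } index;               // valid for INDEX
  } u;

  // Each constructor initializes exactly the member its tag selects.
  imm_info (insn_type t, int value, unsigned int shift) : insn (t)
  { u.mov.value = value; u.mov.shift = shift; }

  imm_info (int base, int step) : insn (INDEX)
  { u.index.base = base; u.index.step = step; }
};
```

The benefit over the flat layout is exactly what the patch notes: new kinds of constant can add their own union member without widening every simd_immediate_info instance or leaving unrelated fields half-initialized.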
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
On Tue, 13 Aug 2019, Robin Dapp wrote: +/* ((T)(A)) + CST -> (T)(A + CST) */ +#if GIMPLE + (simplify + (plus (convert SSA_NAME@0) INTEGER_CST@1) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) I have become rather wary of INTEGRAL_TYPE_P recently because it includes enum types, which with -fstrict-enums can have a surprising behavior. If I have enum E { A, B, C }; and e has type enum E, with -fstrict-enums, do your tests manage to prevent (long)e+1 from becoming (long)(e+1) with an unsafe operation done in the enum type? + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && int_fits_type_p (@1, TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits + and (A + CST)'s range does not overflow. */ + (with + { + wi::overflow_type min_ovf = wi::OVF_OVERFLOW, + max_ovf = wi::OVF_OVERFLOW; + tree inner_type = TREE_TYPE (@0); + + wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type), + TYPE_SIGN (inner_type)); + + wide_int wmin0, wmax0; + if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE) + { + wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); + wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + } + } + (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) + (convert (plus @0 { wide_int_to_tree (TREE_TYPE (@0), w1); }))) + ))) +#endif + -- Marc Glisse
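The overflow condition the rule checks can be demonstrated directly: moving the addition inside the cast only preserves the result when A + CST does not wrap in the narrower type. A standalone illustration (add_inside models the wrapped inner addition with unsigned arithmetic to stay clear of undefined behavior; the final unsigned-to-int conversion is implementation-defined but wraps on GCC):

```cpp
#include <climits>

// (long long)a + 1, computed in the wide type: always exact.
long long add_outside (int a)
{
  return (long long) a + 1;
}

// (long long)(a + 1), with the addition wrapping in the narrow type.
long long add_inside (int a)
{
  unsigned int wrapped = (unsigned int) a + 1u;  // wraps modulo 2^32
  return (long long) (int) wrapped;
}
```

For any a whose incremented value still fits in int, the two agree; for a = INT_MAX the inner addition wraps to INT_MIN and they differ, which is exactly the case the VR_RANGE/overflow test must exclude.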
Re: [MSP430][PATCH 2/2] Read MCU data from external file
On Mon, 12 Aug 2019 14:34:39 -0600 Jeff Law wrote: > On 8/8/19 6:17 AM, Jozef Lawrynowicz wrote: > > This patch extends the MCU data handling so that MCU data can be provided > > in an external file (devices.csv). This means the compiler doesn't have to > > be > > updated and rebuilt to support new devices when they are released. > > > > TI distribute devices.csv with other support files (header files, linker > > scripts) on their website. I have also tested that a program builds > > successfully > > for every MCU and the correct hwmult library is passed to the linker. > > I also built the msp430 cross compiler using a trunk build of native GCC to > > ensure all the error and warning messages added by my code are correct, as > > checked by -Wformat-diag. > > > > > > 0002-MSP430-Devices-2-Read-MCU-data-from-external-devices.patch > > > > From 6f67cdd282f2351d7450e343314beeaa745f0159 Mon Sep 17 00:00:00 2001 > > From: Jozef Lawrynowicz > > Date: Tue, 6 Aug 2019 10:52:54 +0100 > > Subject: [PATCH 2/2] MSP430: Devices [2]: Read MCU data from external > > devices.csv file, if it exists > > > > gcc/ChangeLog: > > > > 2019-08-XX Jozef Lawrynowicz > > > > * gcc/config/msp430/driver-msp430.c (msp430_set_driver_var): New. > > * gcc/config/msp430/msp430-devices.c (canonicalize_path_dirsep): New. > > (msp430_check_path_for_devices): New. > > (parse_devices_csv_1): New. > > (parse_devices_csv): New. > > (msp430_extract_mcu_data): Try to find devices.csv and search for the > > MCU data in devices.csv before using the hard-coded data. > > Warn if devices.csv isn't found and the MCU wasn't found in the > > hard-coded data either. > > * gcc/config/msp430/msp430.h (DRIVER_SELF_SPECS): Call > > msp430_set_driver_var for -mno-warn-devices-csv and -mdevices-csv-loc. > > Search for devices.csv on -I and -L paths. > > (EXTRA_SPEC_FUNCTIONS): Add msp430_check_path_for_devices and > > msp430_set_driver_var. 
> > * gcc/config/msp430/msp430.opt: Add -mwarn-devices-csv and > > -mdevices-csv-loc=. > > * gcc/doc/invoke.texi (-mmcu): Document that -I and -L paths are > > searched for devices.csv. > > (mwarn-devices-csv): Document option. > > > > gcc/testsuite/ChangeLog: > > > > 2019-08-XX Jozef Lawrynowicz > > > > * gcc.target/msp430/msp430.exp (msp430_device_permutations_runtest): > > Handle csv-* and bad-devices-* tests. > > * gcc.target/msp430/devices/README: Document how bad-devices-* tests > > work. > > * gcc.target/msp430/devices/bad-devices-1.c: New test. > > * gcc.target/msp430/devices/bad-devices-2.c: Likewise. > > * gcc.target/msp430/devices/bad-devices-3.c: Likewise. > > * gcc.target/msp430/devices/bad-devices-4.c: Likewise. > > * gcc.target/msp430/devices/bad-devices-5.c: Likewise. > > * gcc.target/msp430/devices/bad-devices-6.c: Likewise. > > * gcc.target/msp430/devices/csv-device-order.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_00.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_01.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_02.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_04.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_08.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_10.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_11.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_12.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_14.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_18.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_20.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_21.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_22.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_24.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430_28.c: Likewise. > > * gcc.target/msp430/devices/csv-msp430fr5969.c: Likewise. > > * gcc.target/msp430/devices/hard-foo.c: Likewise. > > * gcc.target/msp430/devices/bad-devices-1.csv: New test support file. 
> > * gcc.target/msp430/devices/bad-devices-2.csv: Likewise. > > * gcc.target/msp430/devices/bad-devices-3.csv: Likewise. > > * gcc.target/msp430/devices/bad-devices-4.csv: Likewise. > > * gcc.target/msp430/devices/bad-devices-5.csv: Likewise. > > * gcc.target/msp430/devices/bad-devices-6.csv: Likewise. > > * gcc.target/msp430/devices/devices.csv: Likewise. > > --- > So it's good we don't have to do updates when a new devices.csv is > released -- except when new devices require new multilibs, but I don't > offhand see a sensible way to deal with that. For a given device, the only choice to make regarding multilibs at the moment is whether to use a 430 or 430X multilib. The large memory model multilib will only be used if a user passes -mlarge for a 430X device. So if a device is added which uses a new extension to the ISA (which is not
Re: [AArch64] Add a "y" constraint for V0-V7
James Greenhalgh writes: > On Wed, Aug 07, 2019 at 07:19:12PM +0100, Richard Sandiford wrote: >> Some indexed SVE FCMLA operations have a 3-bit register field that >> requires one of Z0-Z7. This patch adds a public "y" constraint for that. >> >> The patch also documents "x", which is again intended to be a public >> constraint. >> >> Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. >> OK to install? > > > I had the vague recollection that 'y' already meant something... I'm > guessing you already checked, but just in case, please check. Yeah, it's definitely unused at the moment. It might be this usage that you're remembering, since we unofficially "reserved" "y" for it a few years back. (It was a nice coincidence that the w/x/y sequence was still free to mean 32/16/8 FP registers. :-)) Thanks, Richard > Otherwise, this is OK. > > Thanks, > James > > >> >> Richard >> >> >> 2019-08-07 Richard Sandiford >> >> gcc/ >> * doc/md.texi: Document the x and y constraints for AArch64. >> * config/aarch64/aarch64.h (FP_LO8_REGNUM_P): New macro. >> (FP_LO8_REGS): New reg_class. >> (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add an entry for FP_LO8_REGS. >> * config/aarch64/aarch64.c (aarch64_hard_regno_nregs) >> (aarch64_regno_regclass, aarch64_class_max_nregs): Handle FP_LO8_REGS. >> * config/aarch64/predicates.md (aarch64_simd_register): Use >> FP_REGNUM_P instead of checking the classes manually. >> * config/aarch64/constraints.md (y): New constraint. >> >> gcc/testsuite/ >> * gcc.target/aarch64/asm-x-constraint-1.c: New test. >> * gcc.target/aarch64/asm-y-constraint-1.c: Likewise. >>
Use checking forms of DECL_FUNCTION_CODE (PR 91421)
We were shoe-horning all built-in enumerations (including frontend and target-specific ones) into a field of type built_in_function. This was accessed as either an lvalue or an rvalue using DECL_FUNCTION_CODE. The obvious danger with this (as was noted by several ??? comments) is that the ranges have nothing to do with each other, and targets can easily have more built-in functions than generic code. But my patch to make the field bigger was the straw that finally made the problem visible. This patch therefore: - replaces the field with a plain unsigned int - turns DECL_FUNCTION_CODE into an rvalue-only accessor that checks that the function really is BUILT_IN_NORMAL - adds corresponding DECL_MD_FUNCTION_CODE and DECL_FE_FUNCTION_CODE accessors for BUILT_IN_MD and BUILT_IN_FRONTEND respectively - adds DECL_UNCHECKED_FUNCTION_CODE for places that need to access the underlying field (should be low-level code only) - adds new helpers for setting the built-in class and function code - makes DECL_BUILT_IN_CLASS an rvalue-only accessor too, since all assignments should go through the new helpers Tested on aarch64-linux-gnu and x86_64-linux-gnu (--enable-languages=all --enable-host-shared). OK to install? Richard 2019-08-13 Richard Sandiford gcc/ PR middle-end/91421 * tree-core.h (function_decl::function_code): Change type to unsigned int. * tree.h (DECL_FUNCTION_CODE): Rename old definition to... (DECL_UNCHECKED_FUNCTION_CODE): ...this. (DECL_BUILT_IN_CLASS): Make an rvalue macro only. (DECL_FUNCTION_CODE): New function. Assert that the built-in class is BUILT_IN_NORMAL. (DECL_MD_FUNCTION_CODE, DECL_FE_FUNCTION_CODE): New functions. (set_decl_built_in_function, copy_decl_built_in_function): Likewise. (fndecl_built_in_p): Change the type of the "name" argument to unsigned int. * builtins.c (expand_builtin): Move DECL_FUNCTION_CODE use after check for DECL_BUILT_IN_CLASS. * cgraphclones.c (build_function_decl_skip_args): Use set_decl_built_in_function. 
* ipa-param-manipulation.c (ipa_modify_formal_parameters): Likewise. * ipa-split.c (split_function): Likewise. * langhooks.c (add_builtin_function_common): Likewise. * omp-simd-clone.c (simd_clone_create): Likewise. * tree-streamer-in.c (unpack_ts_function_decl_value_fields): Likewise. * config/darwin.c (darwin_init_cfstring_builtins): Likewise. (darwin_fold_builtin): Use DECL_MD_FUNCTION_CODE instead of DECL_FUNCTION_CODE. * fold-const.c (operand_equal_p): Compare DECL_UNCHECKED_FUNCTION_CODE instead of DECL_FUNCTION_CODE. * lto-streamer-out.c (hash_tree): Use DECL_UNCHECKED_FUNCTION_CODE instead of DECL_FUNCTION_CODE. * tree-streamer-out.c (pack_ts_function_decl_value_fields): Likewise. * print-tree.c (print_node): Use DECL_MD_FUNCTION_CODE when printing DECL_BUILT_IN_MD. Handle DECL_BUILT_IN_FRONTEND. * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin) (aarch64_fold_builtin, aarch64_gimple_fold_builtin): Use DECL_MD_FUNCTION_CODE instead of DECL_FUNCTION_CODE. * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise. * config/alpha/alpha.c (alpha_expand_builtin, alpha_fold_builtin): (alpha_gimple_fold_builtin): Likewise. * config/arc/arc.c (arc_expand_builtin): Likewise. * config/arm/arm-builtins.c (arm_expand_builtin): Likewise. * config/avr/avr-c.c (avr_resolve_overloaded_builtin): Likewise. * config/avr/avr.c (avr_expand_builtin, avr_fold_builtin): Likewise. * config/bfin/bfin.c (bfin_expand_builtin): Likewise. * config/c6x/c6x.c (c6x_expand_builtin): Likewise. * config/frv/frv.c (frv_expand_builtin): Likewise. * config/gcn/gcn.c (gcn_expand_builtin_1): Likewise. (gcn_expand_builtin): Likewise. * config/i386/i386-builtins.c (ix86_builtin_reciprocal): Likewise. (fold_builtin_cpu): Likewise. * config/i386/i386-expand.c (ix86_expand_builtin): Likewise. * config/i386/i386.c (ix86_fold_builtin): Likewise. (ix86_gimple_fold_builtin): Likewise. * config/ia64/ia64.c (ia64_fold_builtin): Likewise. (ia64_expand_builtin): Likewise. 
* config/iq2000/iq2000.c (iq2000_expand_builtin): Likewise. * config/mips/mips.c (mips_expand_builtin): Likewise. * config/msp430/msp430.c (msp430_expand_builtin): Likewise. * config/nds32/nds32-intrinsic.c (nds32_expand_builtin_impl): Likewise. * config/nios2/nios2.c (nios2_expand_builtin): Likewise. * config/nvptx/nvptx.c (nvptx_expand_builtin): Likewise. * config/pa/pa.c (pa_expand_builtin): Likewise. * config/pru/pru.c (pru_expand_builtin): Likewise. * config/riscv/riscv-builtins.c (riscv_expand_builtin):
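The core idea of the patch — one raw code field whose interpretation depends on the built-in class, with accessors that assert the expected class — can be modelled in a few lines (a standalone sketch, not the actual tree.h implementation; field and accessor names loosely mirror the patch):

```cpp
#include <cassert>

enum built_in_class
{
  NOT_BUILT_IN, BUILT_IN_NORMAL, BUILT_IN_MD, BUILT_IN_FRONTEND
};

// Reduced decl: function_code is a plain unsigned int whose meaning
// depends on built_in_class, as in the patched tree_function_decl.
struct decl
{
  built_in_class klass = NOT_BUILT_IN;
  unsigned int function_code = 0;

  // DECL_FUNCTION_CODE analogue: only valid for BUILT_IN_NORMAL.
  unsigned int fn_code () const
  { assert (klass == BUILT_IN_NORMAL); return function_code; }

  // DECL_MD_FUNCTION_CODE analogue: only valid for BUILT_IN_MD.
  unsigned int md_fn_code () const
  { assert (klass == BUILT_IN_MD); return function_code; }

  // set_decl_built_in_function analogue: the one place that writes both.
  void set_built_in (built_in_class c, unsigned int code)
  { klass = c; function_code = code; }
};
```

With this shape, reading a target built-in's code through the generic accessor trips the assert instead of silently misinterpreting an unrelated enum value, which is the bug class the patch closes.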
Protect some checks of DECL_FUNCTION_CODE
This patch protects various uses of DECL_FUNCTION_CODE that didn't obviously check for BUILT_IN_NORMAL first (either directly or in callers). They could therefore trigger for functions that either aren't built-ins or are a different kind of built-in. Also, the patch removes a redundant GIMPLE_CALL check from optimize_stdarg_builtin, since it gave the impression that the stmt was less well checked than it actually is. Tested on aarch64-linux-gnu and x86_64-linux-gnu (--enable-languages=all --enable-host-shared). OK to install? Richard 2019-08-13 Richard Sandiford gcc/ PR middle-end/91421 * attribs.c (decl_attributes): Check the DECL_BUILT_IN_CLASS before the DECL_FUNCTION_CODE. * calls.c (maybe_warn_alloc_args_overflow): Use fndecl_built_in_p to check for a BUILT_IN_ALLOCA call. * ipa-cp.c (ipa_get_indirect_edge_target_1): Likewise for BUILT_IN_UNREACHABLE. Don't check for a FUNCTION_TYPE. * ipa-devirt.c (possible_polymorphic_call_target_p): Likewise. * ipa-prop.c (try_make_edge_direct_virtual_call): Likewise. * gimple-ssa-isolate-paths.c (is_addr_local): Check specifically for BUILT_IN_NORMAL functions. * trans-mem.c (expand_block_edges): Use gimple_call_builtin_p to test for BUILT_IN_TM_ABORT. * tree-ssa-ccp.c (optimize_stack_restore): Use fndecl_built_in_p to check for a BUILT_IN_STACK_RESTORE call. (optimize_stdarg_builtin): Remove redundant check for GIMPLE_CALL. * tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Check for a BUILT_IN_NORMAL decl before checking its DECL_FUNCTION_CODE. * tree-vect-patterns.c (vect_recog_pow_pattern): Use a positive test for a BUILT_IN_NORMAL call instead of a negative test for an internal function call. gcc/c/ PR middle-end/91421 * c-decl.c (header_for_builtin_fn): Take a FUNCTION_DECL instead of a built_in_function. (diagnose_mismatched_decls, implicitly_declare): Update accordingly. 
Index: gcc/attribs.c === --- gcc/attribs.c 2019-07-18 09:22:13.409771785 +0100 +++ gcc/attribs.c 2019-08-13 10:31:18.211019996 +0100 @@ -691,6 +691,7 @@ decl_attributes (tree *node, tree attrib if (!built_in || !DECL_P (*anode) + || DECL_BUILT_IN_CLASS (*anode) != BUILT_IN_NORMAL || (DECL_FUNCTION_CODE (*anode) != BUILT_IN_UNREACHABLE && (DECL_FUNCTION_CODE (*anode) != BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE))) Index: gcc/calls.c === --- gcc/calls.c 2019-07-29 09:39:49.750165306 +0100 +++ gcc/calls.c 2019-08-13 10:31:18.215019968 +0100 @@ -1350,7 +1350,6 @@ maybe_warn_alloc_args_overflow (tree fn, location_t loc = EXPR_LOCATION (exp); tree fntype = fn ? TREE_TYPE (fn) : TREE_TYPE (TREE_TYPE (exp)); - built_in_function fncode = fn ? DECL_FUNCTION_CODE (fn) : BUILT_IN_NONE; bool warned = false; /* Validate each argument individually. */ @@ -1376,11 +1375,10 @@ maybe_warn_alloc_args_overflow (tree fn, friends. Also avoid issuing the warning for calls to function named "alloca". */ - if ((fncode == BUILT_IN_ALLOCA - && IDENTIFIER_LENGTH (DECL_NAME (fn)) != 6) - || (fncode != BUILT_IN_ALLOCA - && !lookup_attribute ("returns_nonnull", - TYPE_ATTRIBUTES (fntype + if (fn && fndecl_built_in_p (fn, BUILT_IN_ALLOCA) + ? 
IDENTIFIER_LENGTH (DECL_NAME (fn)) != 6 + : !lookup_attribute ("returns_nonnull", + TYPE_ATTRIBUTES (fntype))) warned = warning_at (loc, OPT_Walloc_zero, "%Kargument %i value is zero", exp, idx[i] + 1); Index: gcc/ipa-cp.c === --- gcc/ipa-cp.c2019-07-16 09:11:05.561423510 +0100 +++ gcc/ipa-cp.c2019-08-13 10:31:18.219019937 +0100 @@ -2436,8 +2436,7 @@ ipa_get_indirect_edge_target_1 (struct c if (can_refer) { if (!target - || (TREE_CODE (TREE_TYPE (target)) == FUNCTION_TYPE - && DECL_FUNCTION_CODE (target) == BUILT_IN_UNREACHABLE) + || fndecl_built_in_p (target, BUILT_IN_UNREACHABLE) || !possible_polymorphic_call_target_p (ie, cgraph_node::get (target))) { Index: gcc/ipa-devirt.c === --- gcc/ipa-devirt.c2019-08-05 17:46:20.173727579 +0100 +++ gcc/ipa-devirt.c2019-08-13
Re: [PATCH 1/3] Perform fold when propagating.
On Tue, Aug 13, 2019 at 10:36 AM Robin Dapp wrote: > > This patch performs more aggressive folding in order for the > match.pd changes to kick in later. > > Some test cases rely on VRP doing something which now already > happens during CCP so adjust them accordingly. > > Also, the loop versioning pass was missing one case when > deconstructing addresses that would only be triggered after > this patch for me: > It could handle addition and subsequent convert/nop but not > a convert/nop directly. This would cause the hash to be > calculated differently and, in turn, cause the pass to miss > a versioning opportunity. Fixed this by adding the missing > case. > --- > gcc/fortran/trans-intrinsic.c | 5 +++-- > gcc/gimple-loop-versioning.cc | 6 ++ > gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C | 4 +++- > gcc/testsuite/gcc.dg/pr35691-1.c | 4 ++-- > gcc/testsuite/gcc.dg/pr35691-2.c | 4 ++-- > gcc/testsuite/gcc.dg/pr35691-3.c | 4 ++-- > gcc/testsuite/gcc.dg/pr35691-4.c | 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c | 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/bit-assoc.c | 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/forwprop-16.c| 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/loop-15.c| 2 +- > gcc/testsuite/gcc.dg/tree-ssa/pr23744.c| 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/pr52631.c| 4 ++-- > gcc/testsuite/gcc.dg/vect/vect-over-widen-23.c | 2 +- > gcc/tree-ssa-propagate.c | 6 -- > 15 files changed, 36 insertions(+), 25 deletions(-) > > diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c > index a6e33833680..99ec5f34319 100644 > --- a/gcc/fortran/trans-intrinsic.c > +++ b/gcc/fortran/trans-intrinsic.c > @@ -5428,8 +5428,9 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr) >tree type; >tree tmp; >tree found; > - tree forward_branch; > - tree back_branch; > + /* Initialize here to avoid 'maybe used uninitialized'. 
*/ > + tree forward_branch = NULL_TREE; > + tree back_branch = NULL_TREE; >gfc_loopinfo loop; >gfc_ss *arrayss; >gfc_ss *maskss; > diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc > index 8fa19488490..36c70543402 100644 > --- a/gcc/gimple-loop-versioning.cc > +++ b/gcc/gimple-loop-versioning.cc > @@ -1266,6 +1266,12 @@ loop_versioning::record_address_fragment (gimple *stmt, > continue; > } > } > + if (CONVERT_EXPR_CODE_P (code)) > + { > + tree op1 = gimple_assign_rhs1 (assign); > + address->terms[i].expr = strip_casts (op1); > + continue; > + } > } >i += 1; > } > diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C > b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C > index ca7e18f66a9..83a8f0668df 100644 > --- a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C > +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -fdump-tree-dse2-details -Wno-return-type" } */ > +/* { dg-options "-O2 -fdump-tree-dse2-details -fdump-tree-vrp1-details > -Wno-return-type" } */ > > typedef __SIZE_TYPE__ size_t; > extern "C" > @@ -55,3 +55,5 @@ fill_vec_av_set (av_set_t av) > > /* { dg-final { scan-tree-dump-not "Trimming statement .head = -" "dse2" } } > */ > /* { dg-final { scan-tree-dump-not "mem\[^\r\n\]*, 0\\);" "dse2" } } */ > +/* { dg-final { scan-tree-dump "Folded into: GIMPLE_NOP" "vrp1" } } */ > +/* { dg-final { scan-tree-dump-not "memmove" "dse2" } } */ } > diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c > b/gcc/testsuite/gcc.dg/pr35691-1.c > index 34dc02ab560..2e1d8bd0f43 100644 > --- a/gcc/testsuite/gcc.dg/pr35691-1.c > +++ b/gcc/testsuite/gcc.dg/pr35691-1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ > +/* { dg-options "-O2 -fdump-tree-ccp1-details" } */ Hmm. So this shows that while CCP doesn't produce anything interesting in the lattice we still substitute (and now fold) everything.
The immediate next pass is forwprop so we're doing everything twice. In total we have 3 CCP passes, 5 copy-prop and two VRP passes. So we're re-folding everything an additional 10 times with the patch. I know I suggested this - eh - but the number looks a bit excessive. Mostly due to the fact we have so many of these passes. May I suggest to add a parameter to the substitute-and-fold engine so we can do the folding on all stmts only when enabled and enable it just for VRP? That also avoids the testsuite noise. I think it's also only necessary to fold a stmt when a (indirect) use after substitution has either been folded or has (new) SSA name info (range/known-bits) set? Thanks, Richard. > int foo(int z0, unsigned z1) > { > @@ -9,4 +9,4 @@ int foo(int z0, unsigned z1) >return t2; > } > > -/* { dg-final {
Re: [MSP430][PATCH 1/2] Consolidate handling of hard-coded MCU data
On Mon, 12 Aug 2019 14:30:06 -0600 Jeff Law wrote: > On 8/8/19 6:14 AM, Jozef Lawrynowicz wrote: > > This patch improves the handling of MCU data by consolidating multiple > > copies of hard-coded MCU data into a single location, and adds a new > > function > > to be used as a single entry point for the extraction of MCU data for the > > selected MCU. > > > > This ensures the data is only extracted once per invocation of the > > driver/compiler, whilst previously, the data for the MCU is extracted each > > time > > it is needed. > > > > Some notes: > > - The GNU assembler doesn't do anything with the -mmcu option beyond > > setting up > > the CPU ISA, so if the GCC driver passes it the -mcpu option, which it > > will > > always do if -mmcu is specified, then it is redundant to also pass it > > -mmcu. > > - The indenting in some places (e.g. msp430_select_hwmult_lib) looks wrong > > in > > the patched file, but to make the diff a lot easier to read I have kept > > the > > indenting the same as it was before. I can fix this after the patch is > > accepted. > FWIW, another way to address the indentation issue is with the diff -b > option which says to ignore whitespace differences. Ah right, thanks for the tip. > > Regardless, yes, please clean up any indentation issues. > > > > > > > > 0001-MSP430-Devices-1-Consolidate-handling-of-hard-coded-.patch > > > > From cd131b07e0447d104c99317e7ac37c2420c1bf6e Mon Sep 17 00:00:00 2001 > > From: Jozef Lawrynowicz > > Date: Wed, 31 Jul 2019 22:53:50 +0100 > > Subject: [PATCH 1/2] MSP430: Devices [1]: Consolidate handling of hard-coded > > MCU data > > > > gcc/ChangeLog: > > > > 2019-08-XX Jozef Lawrynowicz > > > > * gcc/config.gcc (msp430*-*-*): Add msp430-devices.o to extra_objs and > > extra_gcc_objs. > > * gcc/config/msp430/driver-msp430.c: Remove msp430_mcu_data. > > (msp430_select_cpu): New spec function. > > (msp430_select_hwmult_lib): Use msp430_extract_mcu_data to extract > > MCU data. 
> > * gcc/config/msp430/msp430-devices.c: New file. > > * gcc/config/msp430/msp430-devices.h: New file. > > * gcc/config/msp430/msp430.c: Remove msp430_mcu_data. > > (msp430_option_override): Use msp430_extract_mcu_data to extract > > MCU data. > > (msp430_use_f5_series_hwmult): Likewise. > > (use_32bit_hwmult): Likewise. > > (msp430_no_hwmult): Likewise. > > * gcc/config/msp430/msp430.h (ASM_SPEC): Don't pass -mmcu to the > > assembler. > > (DRIVER_SELF_SPECS): Call msp430_select_cpu if -mmcu is used without > > an -mcpu option. > > (EXTRA_SPEC_FUNCTIONS): Add msp430_select_cpu. > > * gcc/config/msp430/t-msp430: Add rule to build msp430-devices.o. > > Remove hard-coded MCU multilib data. > > > > gcc/testsuite/ChangeLog: > > > > 2019-08-XX Jozef Lawrynowicz > > > > * gcc.target/msp430/msp430.exp > > (check_effective_target_msp430_430_selected): New. > > (check_effective_target_msp430_430x_selected): New. > > (check_effective_target_msp430_mlarge_selected): New. > > (check_effective_target_msp430_hwmul_not_none): New. > > (check_effective_target_msp430_hwmul_not_16bit): New. > > (check_effective_target_msp430_hwmul_not_32bit): New. > > (check_effective_target_msp430_hwmul_not_f5): New. > > (msp430_get_opts): New. > > (msp430_device_permutations_runtest): New. > > * gcc.target/msp430/devices/README: New file. > > * gcc.target/msp430/devices-main.c: New test. > > * gcc.target/msp430/devices/hard-cc430f5123.c: Likewise. > > * gcc.target/msp430/devices/hard-foo.c: Likewise. > > * gcc.target/msp430/devices/hard-msp430afe253.c: Likewise. > > * gcc.target/msp430/devices/hard-msp430cg4616.c: Likewise. > > * gcc.target/msp430/devices/hard-msp430f4783.c: Likewise. > > * gcc.target/msp430/devices/hard-rf430frl154h_rom.c: Likewise. > Presumably we aren't supporting switching the selected mcu via > attributes on a per-function basis? No that's not supported at the moment.
Given that 430X devices support 430 and 430X ISA instructions, I suppose this is something that could be added in the future. Similar to (but not as complicated as), ARM arm/thumb interworking. > > > +/* Main entry point to load the MCU data for the given -mmcu into > > + extracted_mcu_data. hard_msp430_mcu_data (initialized at the bottom of > > this > > + file) is searched for the MCU name. > > + This function only needs to be executed once, but it can be first called > > + from a number of different locations. */ > > +void > > +msp430_extract_mcu_data (const char * mcu_name) > > +{ > > + static int executed = 0; > > + int i; > > + if (mcu_name == NULL || executed == 1) > > +return; > > + executed = 1; > > + /* FIXME: This array is alpha sorted - we could use a binary search. */ > > + for (i = ARRAY_SIZE (hard_msp430_mcu_data); i--;) > > +if (strcasecmp (mcu_name, hard_msp430_mcu_data[i].name)
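The FIXME in msp430_extract_mcu_data above notes that hard_msp430_mcu_data is alphabetically sorted, so the linear scan could become a case-insensitive binary search. A sketch of that idea — the table, struct fields, and values below are illustrative stand-ins, not the real msp430-devices.c definitions:

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp */

/* Illustrative stand-in for the real MCU record and table; the
   hardware-multiply values here are made up.  */
struct mcu_data { const char *name; int hwmpy; };

static const struct mcu_data mcu_table[] = {
  { "cc430f5123", 8 },
  { "msp430afe253", 2 },
  { "msp430cg4616", 1 },
  { "msp430f4783", 2 },
};

/* Case-insensitive binary search over a table sorted by NAME.
   Returns the matching entry, or NULL if the MCU is unknown.  */
static const struct mcu_data *
find_mcu (const char *name, const struct mcu_data *table, size_t n)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcasecmp (name, table[mid].name);
      if (cmp == 0)
	return &table[mid];
      if (cmp < 0)
	hi = mid;
      else
	lo = mid + 1;
    }
  return NULL;
}
```

The lookup becomes O(log n) and, like the linear scan it would replace, matches case-insensitively, so a user's -mmcu=MSP430F4783 still resolves.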
[C++ Patch] Improve grok_array_decl location
Hi, for this error message about the types involved in array subscripting I think we can exploit the available location argument and point to the open square bracket instead of simply using input_location. Tested x86_64-linux. Thanks, Paolo Carlini. /cp 2019-08-13 Paolo Carlini * decl2.c (grok_array_decl): Use the location of the open square bracket in error message about invalid types. /testsuite 2019-08-13 Paolo Carlini * g++.dg/conversion/simd4.C: Test locations. Index: cp/decl2.c === --- cp/decl2.c (revision 274330) +++ cp/decl2.c (working copy) @@ -434,8 +434,8 @@ grok_array_decl (location_t loc, tree array_expr, array_expr = p2, index_exp = i1; else { - error ("invalid types %<%T[%T]%> for array subscript", -type, TREE_TYPE (index_exp)); + error_at (loc, "invalid types %<%T[%T]%> for array subscript", + type, TREE_TYPE (index_exp)); return error_mark_node; } Index: testsuite/g++.dg/conversion/simd4.C === --- testsuite/g++.dg/conversion/simd4.C (revision 274330) +++ testsuite/g++.dg/conversion/simd4.C (working copy) @@ -12,13 +12,13 @@ void foo () { b[t]; - b[u];// { dg-error "invalid types" } - b[v];// { dg-error "invalid types" } - b[w];// { dg-error "invalid types" } + b[u];// { dg-error "4:invalid types" } + b[v];// { dg-error "4:invalid types" } + b[w];// { dg-error "4:invalid types" } t[b]; - u[b];// { dg-error "invalid types" } - v[b];// { dg-error "invalid types" } - w[b];// { dg-error "invalid types" } + u[b];// { dg-error "4:invalid types" } + v[b];// { dg-error "4:invalid types" } + w[b];// { dg-error "4:invalid types" } new int[t]; new int[u]; // { dg-error "new-declarator must have integral" } new int[v]; // { dg-error "new-declarator must have integral" }
Re: [PATCH] Improve DSE to handle redundant zero initializations.
On Tue, Aug 13, 2019 at 10:18 AM Richard Sandiford wrote: > > Thanks for doing this. > > Matthew Beliveau writes: > > diff --git gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c > > index 5b7c4fc6d1a..dcaeb8edbfe 100644 > > --- gcc/tree-ssa-dse.c > > +++ gcc/tree-ssa-dse.c > > @@ -628,11 +628,8 @@ dse_optimize_redundant_stores (gimple *stmt) > >tree fndecl; > >if ((is_gimple_assign (use_stmt) > > && gimple_vdef (use_stmt) > > -&& ((gimple_assign_rhs_code (use_stmt) == CONSTRUCTOR > > - && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (use_stmt)) == 0 > > - && !gimple_clobber_p (stmt)) > > -|| (gimple_assign_rhs_code (use_stmt) == INTEGER_CST > > -&& integer_zerop (gimple_assign_rhs1 (use_stmt) > > +&& initializer_zerop (gimple_op (use_stmt, 1), NULL) > > +&& !gimple_clobber_p (stmt)) > > || (gimple_call_builtin_p (use_stmt, BUILT_IN_NORMAL) > > && (fndecl = gimple_call_fndecl (use_stmt)) != NULL > > && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET > > @@ -1027,15 +1024,13 @@ dse_dom_walker::dse_optimize_stmt > > (gimple_stmt_iterator *gsi) > > { > >bool by_clobber_p = false; > > > > - /* First see if this store is a CONSTRUCTOR and if there > > - are subsequent CONSTRUCTOR stores which are totally > > - subsumed by this statement. If so remove the subsequent > > - CONSTRUCTOR store. > > + /* Check if this store initializes zero, or some aggregate of zeros, > > + and check if there are subsequent stores which are subsumed by this > > + statement. If so, remove the subsequent store. > > > >This will tend to make fewer calls into memset with longer > >arguments. */ > > - if (gimple_assign_rhs_code (stmt) == CONSTRUCTOR > > - && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt)) == 0 > > + if (initializer_zerop (gimple_op (stmt, 1), NULL) > > && !gimple_clobber_p (stmt)) > > dse_optimize_redundant_stores (stmt); > > > > In addition to Jeff's comment, the original choice of gimple_assign_rhs1 > is the preferred way to write this (applies to both hunks).
And the !gimple_clobber_p test is now redundant. > Richard
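At the source level, the redundancy this patch lets DSE recognize now covers any store that initializer_zerop accepts as all-zeros, not just an empty CONSTRUCTOR or a literal zero. A minimal hand-written illustration of the pattern (the pass itself operates on GIMPLE, not on C source):

```c
#include <string.h>

struct s { int a; int b[4]; };

/* The memset stores zeros over bytes the aggregate initializer has
   already zeroed, so the later store is redundant and a dead-store
   elimination pass may delete one of the two without changing the
   result.  */
int
redundant_zero_init (void)
{
  struct s x = { 0 };		/* zero-initializing CONSTRUCTOR */
  memset (&x, 0, sizeof x);	/* redundant zero store */
  return x.a + x.b[3];
}
```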
Re: [Patch v2] Enable math functions linking with static library for LTO
On Tue, Aug 13, 2019 at 4:22 AM luoxhu wrote: > > Hi Richard, > > On 2019/8/12 16:51, Richard Biener wrote: > > On Mon, Aug 12, 2019 at 8:50 AM luoxhu wrote: > >> > >> Hi Richard, > >> Thanks for your comments, updated the v2 patch as below: > >> 1. Define and use builtin_with_linkage_p. > >> 2. Add comments. > >> 3. Add a testcase. > >> > >> In LTO mode, if static library and dynamic library contains same > >> function and both libraries are passed as arguments, linker will link > >> the function in dynamic library no matter the sequence. This patch > >> will output LTO symbol node as UNDEF if BUILT_IN_NORMAL function FNDECL > >> is a math function, then the function in static library will be linked > >> first if its sequence is ahead of the dynamic library. > > > > Comments below > > > >> gcc/ChangeLog > >> > >> 2019-08-12 Xiong Hu Luo > >> > >> PR lto/91287 > >> * builtins.c (builtin_with_linkage_p): New function. > >> * builtins.h (builtin_with_linkage_p): New function. > >> * symtab.c (write_symbol): Use builtin_with_linkage_p. > >> * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p): > >> Likewise. > >> > >> gcc/testsuite/ChangeLog > >> > >> 2019-08-12 Xiong Hu Luo > >> > >> PR lto/91287 > >> * gcc.dg/pr91287.c: New testcase. > >> --- > >> gcc/builtins.c | 89 ++ > >> gcc/builtins.h | 2 + > >> gcc/lto-streamer-out.c | 4 +- > >> gcc/symtab.c | 13 - > >> gcc/testsuite/gcc.dg/pr91287.c | 40 +++ > >> 5 files changed, 145 insertions(+), 3 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.dg/pr91287.c > >> > >> diff --git a/gcc/builtins.c b/gcc/builtins.c > >> index 695a9d191af..f4dea941a27 100644 > >> --- a/gcc/builtins.c > >> +++ b/gcc/builtins.c > >> @@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p) > >> *p = (char)tree_to_uhwi (t); > >> return true; > >> } > >> + > >> +/* Return true if DECL is a specified builtin math function. 
These > >> functions > >> + should have symbol in symbol table to provide linkage with faster > >> version of > >> + libraries. */ > > > > The comment should read like > > > > /* Return true if the builtin DECL is implemented in a standard > > library. Otherwise > > returns false which doesn't guarantee it is not (thus the list of > > handled builtins > > below may be incomplete). */ > > > >> +bool > >> +builtin_with_linkage_p (tree decl) > >> +{ > >> + if (!decl) > >> +return false; > > > > Omit this check please. > > > >> + if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL) > >> +switch (DECL_FUNCTION_CODE (decl)) > >> +{ > >> + CASE_FLT_FN (BUILT_IN_ACOS): > >> + CASE_FLT_FN (BUILT_IN_ACOSH): > >> + CASE_FLT_FN (BUILT_IN_ASIN): > >> + CASE_FLT_FN (BUILT_IN_ASINH): > >> + CASE_FLT_FN (BUILT_IN_ATAN): > >> + CASE_FLT_FN (BUILT_IN_ATANH): > >> + CASE_FLT_FN (BUILT_IN_ATAN2): > >> + CASE_FLT_FN (BUILT_IN_CBRT): > >> + CASE_FLT_FN (BUILT_IN_CEIL): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL): > >> + CASE_FLT_FN (BUILT_IN_COPYSIGN): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): > >> + CASE_FLT_FN (BUILT_IN_COS): > >> + CASE_FLT_FN (BUILT_IN_COSH): > >> + CASE_FLT_FN (BUILT_IN_ERF): > >> + CASE_FLT_FN (BUILT_IN_ERFC): > >> + CASE_FLT_FN (BUILT_IN_EXP): > >> + CASE_FLT_FN (BUILT_IN_EXP2): > >> + CASE_FLT_FN (BUILT_IN_EXPM1): > >> + CASE_FLT_FN (BUILT_IN_FABS): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): > >> + CASE_FLT_FN (BUILT_IN_FDIM): > >> + CASE_FLT_FN (BUILT_IN_FLOOR): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR): > >> + CASE_FLT_FN (BUILT_IN_FMA): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): > >> + CASE_FLT_FN (BUILT_IN_FMAX): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX): > >> + CASE_FLT_FN (BUILT_IN_FMIN): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN): > >> + CASE_FLT_FN (BUILT_IN_FMOD): > >> + CASE_FLT_FN (BUILT_IN_FREXP): > >> + CASE_FLT_FN (BUILT_IN_HYPOT): > >> + CASE_FLT_FN (BUILT_IN_ILOGB): > >> + CASE_FLT_FN (BUILT_IN_LDEXP): > >> + CASE_FLT_FN 
(BUILT_IN_LGAMMA): > >> + CASE_FLT_FN (BUILT_IN_LLRINT): > >> + CASE_FLT_FN (BUILT_IN_LLROUND): > >> + CASE_FLT_FN (BUILT_IN_LOG): > >> + CASE_FLT_FN (BUILT_IN_LOG10): > >> + CASE_FLT_FN (BUILT_IN_LOG1P): > >> + CASE_FLT_FN (BUILT_IN_LOG2): > >> + CASE_FLT_FN (BUILT_IN_LOGB): > >> + CASE_FLT_FN (BUILT_IN_LRINT): > >> + CASE_FLT_FN (BUILT_IN_LROUND): > >> + CASE_FLT_FN (BUILT_IN_MODF): > >> + CASE_FLT_FN (BUILT_IN_NAN): > >> + CASE_FLT_FN (BUILT_IN_NEARBYINT): > >> + CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT): > >> + CASE_FLT_FN (BUILT_IN_NEXTAFTER): > >> + CASE_FLT_FN
Re: [PATCH] PR fortran/91414: Improved PRNG
On Mon, Aug 12, 2019 at 11:23 PM Steve Kargl wrote: > > On Sun, Aug 11, 2019 at 12:37:49PM +0300, Janne Blomqvist wrote: > > Update the PRNG from xorshift1024* to xoshiro256** by the same > > author. For details see > > > > http://prng.di.unimi.it/ > > > > and the paper at > > > > https://arxiv.org/abs/1805.01407 > > > > Also the seeding is slightly improved, by reading only 8 bytes from > > the operating system and using the simple splitmix64 PRNG to fill in > > the rest of the PRNG state (as recommended by the xoshiro author), > > instead of reading the entire state from the OS. > > > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > > > Looks good to me. Thanks, committed to trunk (with a minor bugfix I noticed). I backported the initialization changes to the gcc9 and gcc8 branches as well with the attached patch. -- Janne Blomqvist From 16abae420bcff75950868a30327bf7b242e0388e Mon Sep 17 00:00:00 2001 From: Janne Blomqvist Date: Tue, 13 Aug 2019 10:35:05 +0300 Subject: [PATCH] PR fortran/91414 Improve initialization of PRNG As part of PR 91414 an improved PRNG was contributed to trunk. This is a partial backport of some related changes to the PRNG. Namely when seeding the PRNG, it needs only 8 bytes of randomness from the OS, and uses a simple splitmix64 PRNG to fill in the rest of the state, instead of getting all the state from the OS. This can be useful for operating systems that can run out of entropy. libgfortran/ChangeLog: 2019-08-13 Janne Blomqvist Partial backport from trunk PR fortran/91414 * intrinsics/random.c (lcg_parkmiller): Replace with splitmix64. (splitmix64): New function. (getosrandom): Fix return value, simplify. (init_rand_state): Use getosrandom only to get 8 bytes, splitmix64 to fill rest of state. 
--- libgfortran/intrinsics/random.c | 51 ++--- 1 file changed, 21 insertions(+), 30 deletions(-) diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c index 7476439647c..9283477482f 100644 --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -275,30 +275,19 @@ jump (xorshift1024star_state* rs) } -/* Super-simple LCG generator used in getosrandom () if /dev/urandom - doesn't exist. */ +/* Splitmix64 recommended by xorshift author for initializing. After + getting one uint64_t value from the OS, this is used to fill in the + rest of the state. */ -#define M 2147483647 /* 2^31 - 1 (A large prime number) */ -#define A 16807 /* Prime root of M, passes statistical tests and produces a full cycle */ -#define Q 127773 /* M / A (To avoid overflow on A * seed) */ -#define R 2836 /* M % A (To avoid overflow on A * seed) */ - -__attribute__((unused)) static uint32_t -lcg_parkmiller(uint32_t seed) +static uint64_t +splitmix64 (uint64_t x) { -uint32_t hi = seed / Q; -uint32_t lo = seed % Q; -int32_t test = A * lo - R * hi; -if (test <= 0) -test += M; -return test; + uint64_t z = (x += 0x9e3779b97f4a7c15); + z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9; + z = (z ^ (z >> 27)) * 0x94d049bb133111eb; + return z ^ (z >> 31); } -#undef M -#undef A -#undef Q -#undef R - /* Get some random bytes from the operating system in order to seed the PRNG. 
*/ @@ -315,7 +304,7 @@ getosrandom (void *buf, size_t buflen) #else #ifdef HAVE_GETENTROPY if (getentropy (buf, buflen) == 0) -return 0; +return buflen; #endif int flags = O_RDONLY; #ifdef O_CLOEXEC @@ -328,7 +317,7 @@ getosrandom (void *buf, size_t buflen) close (fd); return res; } - uint32_t seed = 1234567890; + uint64_t seed = 0x047f7684e9fc949dULL; time_t secs; long usecs; if (gf_gettime (&secs, &usecs) == 0) @@ -340,13 +329,9 @@ getosrandom (void *buf, size_t buflen) pid_t pid = getpid(); seed ^= pid; #endif - uint32_t* ub = buf; - for (size_t i = 0; i < buflen / sizeof (uint32_t); i++) -{ - ub[i] = seed; - seed = lcg_parkmiller (seed); -} - return buflen; + size_t size = buflen < sizeof (uint64_t) ? buflen : sizeof (uint64_t); + memcpy (buf, &seed, size); + return size; #endif /* __MINGW64_VERSION_MAJOR */ } @@ -361,7 +346,13 @@ init_rand_state (xorshift1024star_state* rs, const bool locked) __gthread_mutex_lock (&random_lock); if (!master_init) { - getosrandom (master_state, sizeof (master_state)); + uint64_t os_seed; + getosrandom (&os_seed, sizeof (os_seed)); + for (uint64_t i = 0; i < sizeof (master_state) / sizeof (uint64_t); i++) + { + os_seed = splitmix64 (os_seed); + master_state[i] = os_seed; +} njumps = 0; master_init = true; } -- 2.17.1
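The seeding scheme in the backport is easy to state in isolation: fetch one 64-bit value from the OS, then run it through splitmix64 repeatedly to fill the rest of the PRNG state. A self-contained sketch using the constants from the patch (the fixed 16-word state of xorshift1024* is replaced here by a caller-chosen N):

```c
#include <stddef.h>
#include <stdint.h>

/* One splitmix64 step, with the multiplier and shift constants used
   in the patch.  */
static uint64_t
splitmix64 (uint64_t x)
{
  uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

/* Expand a single OS-provided seed into N state words, mirroring the
   loop the patch adds to init_rand_state: each output word becomes
   the input of the next step.  */
static void
seed_state (uint64_t os_seed, uint64_t *state, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      os_seed = splitmix64 (os_seed);
      state[i] = os_seed;
    }
}
```

Since splitmix64 composes only bijective 64-bit operations, distinct OS seeds always produce distinct state sequences, and only 8 bytes of OS entropy are consumed per seeding.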
[PATCH, contrib]: Do not escape "=".
Attached patch fixes the following gawk-5.0.1 warning: gawk: cmd. line:36: warning: regexp escape sequence `\=' is not a known regexp operator 2019-08-13 Uros Bizjak * test_summary: Do not escape "=". Tested by building mail-report.log with gawk-5.0.1 and gawk-4.2.1. OK for mainline? Uros. diff --git a/contrib/test_summary b/contrib/test_summary index 3560a64c4f19..5760b053ec27 100755 --- a/contrib/test_summary +++ b/contrib/test_summary @@ -127,7 +127,7 @@ NR == 1 { if (lang == "") lang = " "$2" "; else lang = " "; } $2 == "version" { save = $0; $1 = ""; $2 = ""; version = $0; gsub(/^ */, "", version); gsub(/\r$/, "", version); $0 = save; } -/\===.*Summary/ { print ""; print; blanks=1; } +/===.*Summary/ { print ""; print; blanks=1; } /tests ===/ || /^(Target|Host|Native)/ || $2 == "version" { print; blanks=1; } /^(XPASS|FAIL|UNRESOLVED|WARNING|ERROR|# of )/ { sub ("\r", ""); print; } /^using:/ { print ""; print; print ""; }
[Ada] Do not remove side-effects in an others_clause with function calls
An aggregate can be handled by the backend if it consists of static constants of an elementary type, or null. If a component is a type conversion we must preanalyze and resolve it to determine whether the ultimate value is in one of these categories. Previously we did a full analysis and resolution of the expression for the component, which could lead to a removal of side-effects; this is semantically incorrect if the expression includes functions with side-effects (e.g. a call to a random generator). Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Ed Schonberg gcc/ada/ * exp_aggr.adb (Aggr_Assignment_OK_For_Backend): Preanalyze expression, rather than do a full analysis, to prevent unwanted removal of side effects which mask the intent of the expression. gcc/testsuite/ * gnat.dg/aggr27.adb: New testcase.--- gcc/ada/exp_aggr.adb +++ gcc/ada/exp_aggr.adb @@ -5321,6 +5321,16 @@ package body Exp_Aggr is return False; end if; + -- If the expression has side effects (e.g. contains calls with + -- potential side effects) reject as well. We only preanalyze the + -- expression to prevent the removal of intended side effects. + + Preanalyze_And_Resolve (Expr, Ctyp); + + if not Side_Effect_Free (Expr) then +return False; + end if; + -- The expression needs to be analyzed if True is returned Analyze_And_Resolve (Expr, Ctyp); --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/aggr27.adb @@ -0,0 +1,26 @@ +-- { dg-do run } +-- { dg-options "-gnatws -gnata" } + +with GNAT.Random_Numbers; + +procedure Aggr27 is + + Gen: GNAT.Random_Numbers.Generator; + + function Random return Long_Long_Integer is + Rand : Integer := GNAT.Random_Numbers.Random(Gen); + begin + return Long_Long_Integer(Rand); + end Random; + + type Values is range 1 .. 4; + + Seq_LLI : array (Values) of Long_Long_Integer := (others => Random); + Seq_I : array (Values) of Integer := (others => Integer(Random)); + +begin + -- Verify that there are at least two different entries in each.
+ + pragma Assert (For some E of Seq_LLI => E /= Seq_LLI (Values'First)); + pragma Assert (For some E of Seq_I => E /= Seq_I (Values'First)); +end Aggr27;
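The hazard being fixed is language-independent: when a component expression has side effects, evaluating it once and reusing the value is not equivalent to evaluating it once per component. A C analogue of the aggr27.adb test's intent, with a stand-in counter replacing the random generator:

```c
static int call_count = 0;

/* Stand-in for a side-effecting component expression, such as the
   call to the random generator in aggr27.adb.  */
static int
next_value (void)
{
  return ++call_count;
}

/* Each element must come from its own evaluation; "optimizing" the
   initializer into a single call would change the program, just as
   removing side effects from the Ada aggregate would.  */
static void
fill (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = next_value ();
}
```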
[Ada] Legality rule on ancestors of type extensions in generic bodies
This patch adds an RM reference for the rule that in a generic body a type extension cannot have ancestors that are generic formal types. The patch also extends the check to interface progenitors that may appear in a derived type declaration or private extension declaration. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Ed Schonberg gcc/ada/ * sem_ch3.adb (Check_Generic_Ancestors): New subprogram, subsidiary to Build_Derived_Record_Type, to enforce the rule that a type extension declared in a generic body cannot have an ancestor that is a generic formal (RM 3.9.1 (4/2)). The rule applies to all ancestors of the type, including interface progenitors. gcc/testsuite/ * gnat.dg/tagged4.adb: New testcase.--- gcc/ada/sem_ch3.adb +++ gcc/ada/sem_ch3.adb @@ -8574,6 +8574,84 @@ package body Sem_Ch3 is -- An empty Discs list means that there were no constraints in the -- subtype indication or that there was an error processing it. + procedure Check_Generic_Ancestors; + -- In Ada 2005 (AI-344), the restriction that a derived tagged type + -- cannot be declared at a deeper level than its parent type is + -- removed. The check on derivation within a generic body is also + -- relaxed, but there's a restriction that a derived tagged type + -- cannot be declared in a generic body if it's derived directly + -- or indirectly from a formal type of that generic. This applies + -- to progenitors as well. + + - + -- Check_Generic_Ancestors -- + - + + procedure Check_Generic_Ancestors is + Ancestor_Type : Entity_Id; + Intf_List : List_Id; + Intf_Name : Node_Id; + + procedure Check_Ancestor; + -- For parent and progenitors. + + + -- Check_Ancestor -- + + + procedure Check_Ancestor is + begin +-- If the derived type does have a formal type as an ancestor +-- then it's an error if the derived type is declared within +-- the body of the generic unit that declares the formal type +-- in its generic formal part.
It's sufficient to check whether +-- the ancestor type is declared inside the same generic body +-- as the derived type (such as within a nested generic spec), +-- in which case the derivation is legal. If the formal type is +-- declared outside of that generic body, then it's certain +-- that the derived type is declared within the generic body +-- of the generic unit declaring the formal type. + +if Is_Generic_Type (Ancestor_Type) + and then Enclosing_Generic_Body (Ancestor_Type) /= + Enclosing_Generic_Body (Derived_Type) +then + Error_Msg_NE + ("ancestor type& is formal type of enclosing" +& " generic unit (RM 3.9.1 (4/2))", + Indic, Ancestor_Type); +end if; + end Check_Ancestor; + + begin + if Nkind (N) = N_Private_Extension_Declaration then +Intf_List := Interface_List (N); + else +Intf_List := Interface_List (Type_Definition (N)); + end if; + + if Present (Enclosing_Generic_Body (Derived_Type)) then +Ancestor_Type := Parent_Type; + +while not Is_Generic_Type (Ancestor_Type) + and then Etype (Ancestor_Type) /= Ancestor_Type +loop + Ancestor_Type := Etype (Ancestor_Type); +end loop; + +Check_Ancestor; + +if Present (Intf_List) then + Intf_Name := First (Intf_List); + while Present (Intf_Name) loop + Ancestor_Type := Entity (Intf_Name); + Check_Ancestor; + Next (Intf_Name); + end loop; +end if; + end if; + end Check_Generic_Ancestors; + begin if Ekind (Parent_Type) = E_Record_Type_With_Private and then Present (Full_View (Parent_Type)) @@ -8680,7 +8758,8 @@ package body Sem_Ch3 is -- Indic can either be an N_Identifier if the subtype indication -- contains no constraint or an N_Subtype_Indication if the subtype - -- indication has a constraint. + -- indication has a constraint. In either case it can include an + -- interface list.
Indic := Subtype_Indication (Type_Def); Constraint_Present := (Nkind (Indic) = N_Subtype_Indication); @@ -8909,52 +8988,8 @@ package body Sem_Ch3 is Freeze_Before (N, Parent_Type); end if; - -- In Ada 2005 (AI-344), the restriction that a derived tagged type - -- cannot be declared at a
[PATCH 2/3] Add simplify rules for wrapped binary operations.
We would like to simplify code like (larger_type)(var + const1) + const2 to (larger_type)(var + combined_const1_const2) when we know that no overflow happens. --- gcc/match.pd | 101 +++ 1 file changed, 101 insertions(+) diff --git a/gcc/match.pd b/gcc/match.pd index 0317bc704f7..94400529ad8 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst && !TREE_OVERFLOW (cst)) (plus { cst; } @0 +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ +#if GIMPLE + (simplify +(plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))) + /* We actually want to perform two simplifications here: + (1) (T)(A + CST1) + CST2 --> (T)(A) + (T)(CST1) + If for (A + CST1) we either do not care about overflow (e.g. + for a signed inner type) or the overflow is ok for an unsigned + inner type. + (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2) + If (CST1 + CST2) does not overflow and we do care about overflow + (for a signed outer type) or we do not care about overflow in an + unsigned outer type. */ + (with + { + tree inner_type = TREE_TYPE (@0); + wide_int wmin0, wmax0; + wide_int cst1 = wi::to_wide (@1); + +wi::overflow_type min_ovf = wi::OVF_OVERFLOW, + max_ovf = wi::OVF_OVERFLOW; + +/* Get overflow behavior. */ + bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type); + bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type); + +/* Get value range of A. */ + enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0); + +/* If we have a proper range, determine min and max overflow + of (A + CST1). + ??? We might also want to handle anti ranges. */ +if (vr0 == VR_RANGE) + { +wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf); +wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf); + } + +/* Inner overflow does not matter in this case.
*/ +if (ovf_undef_inner) + { +min_ovf = wi::OVF_NONE; +max_ovf = wi::OVF_NONE; + } + +/* Extend CST from INNER_TYPE to TYPE. */ +cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type)); + +/* Check for overflow of (TYPE)(CST1 + CST2). */ +wi::overflow_type outer_ovf = wi::OVF_OVERFLOW; +wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type), + &outer_ovf); + +/* We *do* care about an overflow here as we do not want to introduce + new undefined behavior that was not there before. */ +if (ovf_undef_outer && outer_ovf) + { +/* Set these here to prevent the final conversion below + to take place instead of introducing a new guard variable. */ +min_ovf = wi::OVF_OVERFLOW; +max_ovf = wi::OVF_OVERFLOW; + } + } + (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) +(plus (convert @0) { wide_int_to_tree (type, cst); } + ) +#endif + +/* ((T)(A)) + CST -> (T)(A + CST) */ +#if GIMPLE + (simplify + (plus (convert SSA_NAME@0) INTEGER_CST@1) +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && int_fits_type_p (@1, TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits +and (A + CST)'s range does not overflow. */ + (with + { + wi::overflow_type min_ovf = wi::OVF_OVERFLOW, + max_ovf = wi::OVF_OVERFLOW; +tree inner_type = TREE_TYPE (@0); + +wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type), + TYPE_SIGN (inner_type)); + +wide_int wmin0, wmax0; +if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE) + { +wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); +wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + } + } + (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) + (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; }))) + ))) +#endif + /* ~A + A -> -1 */ (simplify (plus:c (bit_not @0) @0) -- 2.17.0
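Concretely, the rule rewrites a narrow add followed by a widening conversion and a second add into a single widened add with a combined constant, but only when the guards above prove that neither the inner addition nor the combined constant can overflow. Both sides of the identity, hand-simplified into C for inputs where the guard holds (this is source-level illustration, not the generated GIMPLE):

```c
/* Original shape: narrow add, widen, wide add.  */
static long
before (int a)
{
  return (long) (a + 2) - 1;
}

/* Shape after the simplification: widen once, add the combined
   constant 2 + (-1) = 1.  Equivalent as long as a + 2 does not
   overflow in int, which is what the range check establishes.  */
static long
after (int a)
{
  return (long) a + 1;
}
```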
[PATCH 3/3] Add new test cases for wrapped binop simplification.
--- .../gcc.dg/tree-ssa/copy-headers-5.c | 2 +- .../gcc.dg/tree-ssa/copy-headers-7.c | 2 +- .../gcc.dg/wrapped-binop-simplify-run.c | 52 .../gcc.dg/wrapped-binop-simplify-signed-1.c | 60 +++ .../wrapped-binop-simplify-unsigned-1.c | 36 +++ 5 files changed, 150 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-run.c create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c index 3d9940558cb..7f2e4dc2e4d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-ch2-details" } */ +/* { dg-options "-O2 -fno-tree-ccp -fno-tree-vrp -fdump-tree-ch2-details" } */ int is_sorted(int *a, int n) { diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c index a0a6e6a9b57..12fca8fd2c4 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-ch2-details --param logical-op-non-short-circuit=0" } */ +/* { dg-options "-O2 -fdump-tree-ch2-details -fno-tree-ccp -fno-tree-vrp --param logical-op-non-short-circuit=0" } */ int is_sorted(int *a, int n, int m, int k) { diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-run.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-run.c new file mode 100644 index 000..06bf348f1c6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-run.c @@ -0,0 +1,52 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include +#include + +unsigned int a = 3; +int aa = 3; +int bb = 1; +int cc = 4; + +__attribute__((noinline)) +long foo(int a) +{ + if (a >= -30 && a <= -20) +return (long)(a + 10) + 
LONG_MAX; +} + +int main() +{ + volatile unsigned long b = (unsigned long)(UINT_MAX + 1) - 1; + assert (b == 18446744073709551615ul); + + volatile unsigned long c = (unsigned long)(a - 4) + 1; + assert (c == 4294967296); + + volatile unsigned long d = (unsigned long)(a + UINT_MAX - 4) + 2; + assert (d == 4294967296); + + volatile unsigned long e = (unsigned long)(a - UINT_MAX) + UINT_MAX; + assert (e == 4294967299); + + volatile unsigned long f = (unsigned long)(a + UINT_MAX) - UINT_MAX; + assert (f == 18446744069414584323ul); + + volatile long g = (long)(a - 4) + 1; + assert (g == 4294967296); + + volatile long h = (long)(aa + UINT_MAX) + 1; + assert (h == 3); + + /* Zero-extend. */ + volatile unsigned long i = (unsigned long)(bb + 4294967294u) + 50; + assert (i == 9294967295); + + /* Sign-extend. */ + volatile unsigned long j = (unsigned long)(cc + 4294967294u) + 80; + assert (j == 82); + + volatile unsigned long k = foo (-25); + assert (k == 9223372036854775792l); +} diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c new file mode 100644 index 000..19b787b61b2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ccp1-details" } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 11 "ccp1" } } */ + +#include + +long foo(int a) +{ + return (long)(a - 2) + 1; +} + +long bar(int a) +{ + return (long)(a + 3) - 1; +} + +long baz(int a) +{ + return (long)(a - 1) + 2; +} + +long baf(int a) +{ + return (long)(a + 1) - 2; +} + +long bak(int a) +{ + return (long)(a + 1) + 3; +} + +long bal(int a) +{ + return (long)(a - 7) - 4; +} + +long bam(int a) +{ + return (long)(a - 1) - INT_MAX; +} + +long bam2(int a) +{ + return (long)(a + 1) + INT_MAX; +} + +long ban(int a) +{ + return (long)(a - 1) + INT_MIN; +} + +long ban2(int a) +{ + return (long)(a + 1) - INT_MIN; +} + +unsigned long baq(int a) 
+{ + return (unsigned long)(a + 1) - 1; +} diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c new file mode 100644 index 000..71ab0807d7e --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ccp1-details -fdump-tree-evrp-details -fdump-tree-vrp1-details" } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 1 "ccp1" } } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "vrp1" } } */ + +#include + +unsigned long foo(unsigned int a) +{ + if (a > 3 && a < INT_MAX - 100) +return (unsigned long)(a - 1) + 1; +} + +unsigned
[Ada] Wrong dispatching call in type with aspect Implicit_Dereference
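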
When a record type with an access to class-wide type discriminant has aspect Implicit_Dereference, and the discriminant is used as the controlling argument of a dispatching call, the compiler may generate wrong code to dispatch the call. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Javier Miranda gcc/ada/ * sem_res.adb (Resolve_Selected_Component): When the type of the component is an access to a class-wide type and the type of the context is an access to a tagged type the relevant type is that of the component (since in such case we may need to generate implicit type conversions or dispatching calls). gcc/testsuite/ * gnat.dg/tagged3.adb, gnat.dg/tagged3_pkg.adb, gnat.dg/tagged3_pkg.ads: New testcase.--- gcc/ada/sem_res.adb +++ gcc/ada/sem_res.adb @@ -10598,6 +10598,10 @@ package body Sem_Res is pragma Assert (Found); Resolve (P, It1.Typ); + + -- In general the expected type is the type of the context, not the + -- type of the candidate selected component. + Set_Etype (N, Typ); Set_Entity_With_Checks (S, Comp1); @@ -10610,6 +10614,17 @@ package body Sem_Res is if Ekind (Typ) = E_Anonymous_Access_Subprogram_Type then Set_Etype (N, Etype (Comp1)); + + -- When the type of the component is an access to a class-wide type + -- the relevant type is that of the component (since in such case we + -- may need to generate implicit type conversions or dispatching + -- calls). 
+ + elsif Is_Access_Type (Typ) + and then not Is_Class_Wide_Type (Designated_Type (Typ)) + and then Is_Class_Wide_Type (Designated_Type (Etype (Comp1))) + then +Set_Etype (N, Etype (Comp1)); end if; else --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/tagged3.adb @@ -0,0 +1,42 @@ +-- { dg-do run } + +with Tagged3_Pkg; use Tagged3_Pkg; +procedure Tagged3 is + package SP is + type Ref is tagged private; + + procedure Set (Self : in out Ref'Class; Data : Parent'Class); + + type Reference_Type (Element : access Parent'Class) + is limited null record with Implicit_Dereference => Element; + + function Get (Self : Ref'Class) return Reference_Type; + + private + type Element_Access is access all Parent'Class; + type Ref is tagged record + Data : Element_Access; + end record; + end; + + package body SP is + procedure Set (Self : in out Ref'Class; Data : Parent'Class) is + begin + Self.Data := new Parent'Class'(Data); + end; + + function Get (Self : Ref'Class) return Reference_Type is + begin + return Reference_Type'(Element => Self.Data); + end; + end; + + DC : Child; + RC : SP.Ref; +begin + RC.Set (DC); + Prim1 (RC.Get.Element); -- Test + if not Tagged3_Pkg.Child_Prim1_Called then + raise Program_Error; + end if; +end; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/tagged3_pkg.adb @@ -0,0 +1,12 @@ +with Ada.Text_IO; use Ada.Text_IO; +package body Tagged3_Pkg is + procedure Prim1 (Self : access Parent) is + begin + raise Program_Error; + end; + + procedure Prim1 (Self : access Child) is + begin + Child_Prim1_Called := True; + end; +end; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/tagged3_pkg.ads @@ -0,0 +1,9 @@ +package Tagged3_Pkg is + type Parent is tagged null record; + procedure Prim1 (Self : access Parent); + + type Child is new Parent with null record; + procedure Prim1 (Self : access Child); + + Child_Prim1_Called : Boolean := False; +end;
[PATCH 0/3] Simplify wrapped binops.
Hi, after a long time I remembered to dig out this patch again. It deals with simplifying additions wrapped inside a cast like (larger_type)(var + const1) + const2 to (larger_type)(var + combined_const1_const2). The original pattern is created quite frequently on S/390 (see PR 69526) and manifests similarly to addr1,-1 extend r1,r1 addr1,1 where the adds could be avoided entirely. This is the tree part of the fix, it will still be necessary to correct rtl code generation in doloop later. Bootstrapped and regtested on s390x, x86 running. Regards Robin -- Robin Dapp (3): Perform fold when propagating. Add simplify rules for wrapped binary operations. Add new test cases for wrapped binop simplification. gcc/fortran/trans-intrinsic.c | 5 +- gcc/gimple-loop-versioning.cc | 6 ++ gcc/match.pd | 101 ++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C | 4 +- gcc/testsuite/gcc.dg/pr35691-1.c | 4 +- gcc/testsuite/gcc.dg/pr35691-2.c | 4 +- gcc/testsuite/gcc.dg/pr35691-3.c | 4 +- gcc/testsuite/gcc.dg/pr35691-4.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c| 4 +- gcc/testsuite/gcc.dg/tree-ssa/bit-assoc.c | 4 +- .../gcc.dg/tree-ssa/copy-headers-5.c | 2 +- .../gcc.dg/tree-ssa/copy-headers-7.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/forwprop-16.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/loop-15.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr23744.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr52631.c | 4 +- .../gcc.dg/vect/vect-over-widen-23.c | 2 +- .../gcc.dg/wrapped-binop-simplify-run.c | 52 + .../gcc.dg/wrapped-binop-simplify-signed-1.c | 60 +++ .../wrapped-binop-simplify-unsigned-1.c | 36 +++ gcc/tree-ssa-propagate.c | 6 +- 21 files changed, 287 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-run.c create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c create mode 100644 gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c -- 2.17.0
[PATCH 1/3] Perform fold when propagating.
This patch performs more aggressive folding in order for the match.pd changes to kick in later. Some test cases rely on VRP doing something which now already happens during CCP so adjust them accordingly. Also, the loop versioning pass was missing one case when deconstructing addresses that would only be triggered after this patch for me: It could handle addition and subsequent convert/nop but not a convert/nop directly. This would cause the hash to be calculated differently and, in turn, cause the pass to miss a versioning opportunity. Fixed this by adding the missing case. --- gcc/fortran/trans-intrinsic.c | 5 +++-- gcc/gimple-loop-versioning.cc | 6 ++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C | 4 +++- gcc/testsuite/gcc.dg/pr35691-1.c | 4 ++-- gcc/testsuite/gcc.dg/pr35691-2.c | 4 ++-- gcc/testsuite/gcc.dg/pr35691-3.c | 4 ++-- gcc/testsuite/gcc.dg/pr35691-4.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/bit-assoc.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/forwprop-16.c| 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/loop-15.c| 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr23744.c| 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr52631.c| 4 ++-- gcc/testsuite/gcc.dg/vect/vect-over-widen-23.c | 2 +- gcc/tree-ssa-propagate.c | 6 -- 15 files changed, 36 insertions(+), 25 deletions(-) diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index a6e33833680..99ec5f34319 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -5428,8 +5428,9 @@ gfc_conv_intrinsic_findloc (gfc_se *se, gfc_expr *expr) tree type; tree tmp; tree found; - tree forward_branch; - tree back_branch; + /* Initialize here to avoid 'maybe used uninitialized'. 
*/ + tree forward_branch = NULL_TREE; + tree back_branch = NULL_TREE; gfc_loopinfo loop; gfc_ss *arrayss; gfc_ss *maskss; diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc index 8fa19488490..36c70543402 100644 --- a/gcc/gimple-loop-versioning.cc +++ b/gcc/gimple-loop-versioning.cc @@ -1266,6 +1266,12 @@ loop_versioning::record_address_fragment (gimple *stmt, continue; } } + if (code == NOP_EXPR) + { + tree op1 = gimple_assign_rhs1 (assign); + address->terms[i].expr = strip_casts (op1); + continue; + } } i += 1; } diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C index ca7e18f66a9..83a8f0668df 100644 --- a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-dse2-details -Wno-return-type" } */ +/* { dg-options "-O2 -fdump-tree-dse2-details -fdump-tree-vrp1-details -Wno-return-type" } */ typedef __SIZE_TYPE__ size_t; extern "C" @@ -55,3 +55,5 @@ fill_vec_av_set (av_set_t av) /* { dg-final { scan-tree-dump-not "Trimming statement .head = -" "dse2" } } */ /* { dg-final { scan-tree-dump-not "mem\[^\r\n\]*, 0\\);" "dse2" } } */ +/* { dg-final { scan-tree-dump "Folded into: GIMPLE_NOP" "vrp1" } } */ +/* { dg-final { scan-tree-dump-not "memmove" "dse2" } } */ } diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c index 34dc02ab560..2e1d8bd0f43 100644 --- a/gcc/testsuite/gcc.dg/pr35691-1.c +++ b/gcc/testsuite/gcc.dg/pr35691-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ +/* { dg-options "-O2 -fdump-tree-ccp1-details" } */ int foo(int z0, unsigned z1) { @@ -9,4 +9,4 @@ int foo(int z0, unsigned z1) return t2; } -/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) 
z1_\[0-9\]*\\(D\\);" "ccp1" } } */ diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c index b89ce481c6c..d5e335bb72a 100644 --- a/gcc/testsuite/gcc.dg/pr35691-2.c +++ b/gcc/testsuite/gcc.dg/pr35691-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ +/* { dg-options "-O2 -fdump-tree-ccp1-details" } */ int foo(int z0, unsigned z1) { @@ -9,4 +9,4 @@ int foo(int z0, unsigned z1) return t2; } -/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "ccp1" } } */ diff --git a/gcc/testsuite/gcc.dg/pr35691-3.c b/gcc/testsuite/gcc.dg/pr35691-3.c index 75b49a6e2ca..a1896e47b59 100644 --- a/gcc/testsuite/gcc.dg/pr35691-3.c +++ b/gcc/testsuite/gcc.dg/pr35691-3.c @@ -1,5 +1,5 @@ /* { dg-do
[Ada] Build full derivation for private concurrent type
This extends the processing done for the derivation of private discriminated types to concurrent types, which is now required because this derivation is no longer redone when a subtype of the derived concurrent type is built. This increases the number of entities generated internally in the compiler but this case is sufficiently rare as not to be a real concern. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Eric Botcazou gcc/ada/ * sem_ch3.adb (Build_Derived_Concurrent_Type): Add a couple of local variables and use them. When the derived type fully constrains the parent type, rewrite it as a subtype of an implicit (unconstrained) derived type instead of the other way around. (Copy_And_Build): Deal with concurrent types and use predicates. (Build_Derived_Private_Type): Build the full derivation if needed for concurrent types too. (Build_Derived_Record_Type): Add marker comment. (Complete_Private_Subtype): Use predicates. gcc/testsuite/ * gnat.dg/discr56.adb, gnat.dg/discr56.ads, gnat.dg/discr56_pkg1.adb, gnat.dg/discr56_pkg1.ads, gnat.dg/discr56_pkg2.ads: New testcase.--- gcc/ada/sem_ch3.adb +++ gcc/ada/sem_ch3.adb @@ -6831,7 +6831,9 @@ package body Sem_Ch3 is Parent_Type : Entity_Id; Derived_Type : Entity_Id) is - Loc : constant Source_Ptr := Sloc (N); + Loc : constant Source_Ptr := Sloc (N); + Def : constant Node_Id:= Type_Definition (N); + Indic : constant Node_Id:= Subtype_Indication (Def); Corr_Record : constant Entity_Id := Make_Temporary (Loc, 'C'); Corr_Decl: Node_Id; @@ -6842,8 +6844,7 @@ package body Sem_Ch3 is -- this case. 
Constraint_Present : constant Boolean := - Nkind (Subtype_Indication (Type_Definition (N))) = - N_Subtype_Indication; + Nkind (Indic) = N_Subtype_Indication; D_Constraint : Node_Id; New_Constraint : Elist_Id := No_Elist; @@ -6918,36 +6919,50 @@ package body Sem_Ch3 is Expand_To_Stored_Constraint (Parent_Type, Build_Discriminant_Constraints - (Parent_Type, -Subtype_Indication (Type_Definition (N)), True)); + (Parent_Type, Indic, True)); end if; End_Scope; elsif Constraint_Present then - -- Build constrained subtype, copying the constraint, and derive - -- from it to create a derived constrained type. + -- Build an unconstrained derived type and rewrite the derived type + -- as a subtype of this new base type. declare -Loc : constant Source_Ptr := Sloc (N); -Anon : constant Entity_Id := - Make_Defining_Identifier (Loc, - Chars => New_External_Name (Chars (Derived_Type), 'T')); -Decl : Node_Id; +Parent_Base : constant Entity_Id := Base_Type (Parent_Type); +New_Base: Entity_Id; +New_Decl: Node_Id; +New_Indic : Node_Id; begin -Decl := +New_Base := + Create_Itype (Ekind (Derived_Type), N, Derived_Type, 'B'); + +New_Decl := + Make_Full_Type_Declaration (Loc, + Defining_Identifier => New_Base, + Type_Definition => + Make_Derived_Type_Definition (Loc, + Abstract_Present => Abstract_Present (Def), + Limited_Present => Limited_Present (Def), + Subtype_Indication=> + New_Occurrence_Of (Parent_Base, Loc))); + +Mark_Rewrite_Insertion (New_Decl); +Insert_Before (N, New_Decl); +Analyze (New_Decl); + +New_Indic := + Make_Subtype_Indication (Loc, +Subtype_Mark => New_Occurrence_Of (New_Base, Loc), +Constraint => Relocate_Node (Constraint (Indic))); + +Rewrite (N, Make_Subtype_Declaration (Loc, -Defining_Identifier => Anon, -Subtype_Indication => - New_Copy_Tree (Subtype_Indication (Type_Definition (N; -Insert_Before (N, Decl); -Analyze (Decl); +Defining_Identifier => Derived_Type, +Subtype_Indication => New_Indic)); -Rewrite (Subtype_Indication (Type_Definition (N)), - 
New_Occurrence_Of (Anon, Loc)); -Set_Analyzed (Derived_Type, False); Analyze (N); return; end; @@ -6978,10 +6993,7 @@ package body Sem_Ch3 is -- Verify that new discriminants are used to constrain old
[Ada] Show Bit_Order and Scalar_Storage_Order in -gnatR4 output
This patch modifies the behavior of -gnatR4 so that representation information for bit and scalar storage order gets displayed in all cases and not just when defaults are overridden. -- Source -- -- pkg.ads package Pkg is type Root is tagged record Data0 : Integer; end record; type Derived is new Root with record Data1 : Integer; end record; end Pkg; - -- Compilation -- - $ gnatmake -gnatR4 pkg.ads Representation information for unit Pkg (spec) -- for Root'Size use 128; for Root'Alignment use 8; for Root use record Data0 at 8 range 0 .. 31; end record; for Root'Bit_Order use System.Low_Order_First; for Root'Scalar_Storage_Order use System.Low_Order_First; for Trootc'Size use 0; for Trootc'Alignment use 0; for Trootc use record end record; for Trootc'Bit_Order use System.Low_Order_First; for Trootc'Scalar_Storage_Order use System.Low_Order_First; for Derived'Size use 192; for Derived'Alignment use 8; for Derived use record Data0 at 8 range 0 .. 31; Data1 at 16 range 0 .. 31; end record; for Derived'Bit_Order use System.Low_Order_First; for Derived'Scalar_Storage_Order use System.Low_Order_First; for Tderivedc'Size use 0; for Tderivedc'Alignment use 0; for Tderivedc use record Data0 at 8 range 0 .. 31; Data1 at 16 range 0 .. 
31; end record; for Tderivedc'Bit_Order use System.Low_Order_First; for Tderivedc'Scalar_Storage_Order use System.Low_Order_First; Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Justin Squirek gcc/ada/ * repinfo.adb (List_Scalar_Storage_Order): Modify conditionals for displaying ordering to always be triggered when -gnatR4 is in effect.--- gcc/ada/repinfo.adb +++ gcc/ada/repinfo.adb @@ -1816,8 +1816,15 @@ package body Repinfo is begin -- For record types, list Bit_Order if not default, or if SSO is shown + -- Also, when -gnatR4 is in effect always list bit order and scalar + -- storage order explicitly, so that you don't need to know the native + -- endianness of the target for which the output was produced in order + -- to interpret it. + if Is_Record_Type (Ent) -and then (List_SSO or else Reverse_Bit_Order (Ent)) +and then (List_SSO + or else Reverse_Bit_Order (Ent) + or else List_Representation_Info = 4) then List_Attr ("Bit_Order", Reverse_Bit_Order (Ent)); end if; @@ -1825,7 +1832,7 @@ package body Repinfo is -- List SSO if required. If not, then storage is supposed to be in -- native order. - if List_SSO then + if List_SSO or else List_Representation_Info = 4 then List_Attr ("Scalar_Storage_Order", Reverse_Storage_Order (Ent)); else pragma Assert (not Reverse_Storage_Order (Ent));
[Ada] Add conformance check on actual subp. in instance of child unit
This patch properly diagnoses a conformance error between a formal subprogram and the corresponding actual, when the instance is that of a child unit that is instantiated as a compilation unit. The error is normally suppressed on an instantiation nested within another generic, given that analysis of the enclosing generic will have performed the conformance check on the nested instance already. In the case of a child unit, its instantiation requires an explicit check if it is a compilation unit, because it has not been analyzed in the context of the parent generic. Compiling test.adb must yield: container-list.ads:3:01: instantiation error at new_container_g-list_g.ads:12 container-list.ads:3:01: not mode conformant with declaration at types.ads:5 container-list.ads:3:01: mode of "Self" does not match with New_Container_G.List_G; pragma Elaborate_All (New_Container_G.List_G); package Container.List is new Container.List_G (Init => Types.Init_Object); with Types; with Erreur; with New_Container_G; pragma Elaborate_All (New_Container_G); package Container is new New_Container_G ( Element_T => Types.Integer_T, Pos_Range_T=> Types.Integer_Idx_T, Container_Name => Erreur.None); package Erreur is type Container_Name_T is (None, Everything); end; package body New_Container_G.List_G is function Get_Element_At_Pos (Self : access List_T; Pos : in Index_Range_T) return Element_Ptr is begin if not Self.T_Status (Pos) then Erreur.Treat_Container_Error (Error_Name => Erreur.Element_Not_Valid, Container_Name => Container_Name, Procedure_Name => Erreur.Get_Element_Ptr_At_Pos, Context=> Erreur.Null_Context_C); end if; return Pos; end Get_Element_At_Pos; function Get_Element_At_Pos (Self : in List_T; Pos : in Index_Range_T) return Element_T is begin if not Self.T_Status (Pos) then Erreur.Treat_Container_Error (Error_Name => Erreur.Element_Not_Valid, Container_Name => Container_Name, Procedure_Name => Erreur.Get_Element_At_Pos, Context=> Erreur.Null_Context_C); end if; return 
Self.Data (Pos); end Get_Element_At_Pos; procedure Add_New (Self : in out List_T; Pos :out Pos_Range_T) is Free_Found : Boolean := False; begin if Self.First_Free = Rbc_Constants.Null_Pos then Pos := Rbc_Constants.Null_Pos; else Self.Size := Self.Size + 1; Self.T_Status (Self.First_Free) := True; Pos := Self.First_Free; Init (Self.Data (Pos)); if Self.First_Relevant not in Rbc_Constants.Null_Pos + 1 .. Self.First_Free then Self.First_Relevant := Self.First_Free; end if; while not (Free_Found or Self.First_Free = Rbc_Constants.Null_Pos) loop if Self.First_Free = Pos_Range_T'Last then Self.First_Free := Rbc_Constants.Null_Pos; else Self.First_Free := Self.First_Free + 1; if not Self.T_Status (Self.First_Free) then Free_Found := True; end if; end if; end loop; end if; end Add_New; procedure Add_New_At_Pos (Self : in out List_T; Pos : in out Index_Range_T) is Free_Found : Boolean := False; begin if Self.T_Status (Pos) then Erreur.Treat_Container_Error (Error_Name => Erreur.Element_Not_Valid, Container_Name => Container_Name, Procedure_Name => Erreur.Add_New_At_Pos, Context=> Erreur.Null_Context_C); else Self.Size := Self.Size + 1; Self.T_Status (Pos) := True; Init (Self.Data (Pos)); if Self.First_Relevant = Rbc_Constants.Null_Pos or Pos < Self.First_Relevant then Self.First_Relevant := Pos; end if; if Self.First_Free = Pos then -- Look for a new First_Free while not (Free_Found or Self.First_Free = Rbc_Constants.Null_Pos) loop if Self.First_Free = Pos_Range_T'Last then Self.First_Free := Rbc_Constants.Null_Pos; else Self.First_Free := Self.First_Free + 1; if not Self.T_Status (Self.First_Free) then Free_Found := True; end if; end if; end loop; end if; - end if; end Add_New_At_Pos; procedure Clear (Self : out List_T) is begin Self.T_Status := (others => False); Self.First_Free := Init_First_Free; Self.First_Relevant := Rbc_Constants.Null_Pos; Self.Size := Empty; end Clear; procedure Remove_Element_At_Pos (Self : in out List_T; Pos : in Index_Range_T) is Relevant_Found : 
Boolean := False; begin -- REMOVE ITEM IF VALID - if not Self.T_Status (Pos) then
[Ada] Implement pragma Max_Entry_Queue_Length
This patch implements AI12-0164-1 for the aspect/pragma Max_Entry_Queue_Length. Previously, the GNAT specific pragma Max_Queue_Length fulfilled this role, but was not named to match the standard and thus was insufficient. -- Source -- -- pass.ads with System; package Pass is SOMETHING : constant Integer := 5; Variable : Boolean := False; protected type Protected_Example is entry A (Item : Integer) with Max_Entry_Queue_Length => 2;-- OK entry B (Item : Integer); pragma Max_Entry_Queue_Length (SOMETHING); -- OK entry C (Item : Integer); -- OK entry D (Item : Integer) with Max_Entry_Queue_Length => 4;-- OK entry D (Item : Integer; Item_B : Integer) with Max_Entry_Queue_Length => Float'Digits; -- OK entry E (Item : Integer); pragma Max_Entry_Queue_Length (SOMETHING * 2); -- OK entry E (Item : Integer; Item_B : Integer); pragma Max_Entry_Queue_Length (11); -- OK entry F (Item : Integer; Item_B : Integer); pragma Pre (Variable = True); pragma Max_Entry_Queue_Length (11); -- OK entry G (Item : Integer; Item_B : Integer) with Pre => (Variable = True), Max_Entry_Queue_Length => 11; -- OK private Data : Boolean := True; end Protected_Example; Prot_Ex : Protected_Example; end Pass; -- fail.ads package Fail is -- Not near entry pragma Max_Entry_Queue_Length (40);-- ERROR -- Task type task type Task_Example is entry Insert (Item : in Integer) with Max_Entry_Queue_Length => 10; -- ERROR -- Entry family in task type entry A (Positive) (Item : in Integer) with Max_Entry_Queue_Length => 10; -- ERROR end Task_Example; Task_Ex : Task_Example; -- Aspect applied to protected type protected type Protected_Failure_0 with Max_Entry_Queue_Length => 50 is-- ERROR entry A (Item : Integer); private Data : Integer := 0; end Protected_Failure_0; Protected_Failure_0_Ex : Protected_Failure_0; protected type Protected_Failure is pragma Max_Entry_Queue_Length (10); -- ERROR -- Duplicates entry A (Item : Integer) with Max_Entry_Queue_Length => 10; -- OK pragma Max_Entry_Queue_Length (4); -- ERROR 
entry B (Item : Integer); pragma Max_Entry_Queue_Length (40); -- OK pragma Max_Entry_Queue_Length (4); -- ERROR entry C (Item : Integer) with Max_Entry_Queue_Length => 10, -- OK Max_Entry_Queue_Length => 40; -- ERROR -- Duplicates with the same value entry AA (Item : Integer) with Max_Entry_Queue_Length => 10; -- OK pragma Max_Entry_Queue_Length (10); -- ERROR entry BB (Item : Integer); pragma Max_Entry_Queue_Length (40); -- OK pragma Max_Entry_Queue_Length (40); -- ERROR entry CC (Item : Integer) with Max_Entry_Queue_Length => 10, -- OK Max_Entry_Queue_Length => 10; -- ERROR -- On subprogram procedure D (Item : Integer) with Max_Entry_Queue_Length => 10; -- ERROR procedure E (Item : Integer); pragma Max_Entry_Queue_Length (4); -- ERROR function F (Item : Integer) return Integer with Max_Entry_Queue_Length => 10; -- ERROR function G (Item : Integer) return Integer; pragma Max_Entry_Queue_Length (4); -- ERROR -- Bad parameters entry H (Item : Integer) with Max_Entry_Queue_Length => 0;-- ERROR entry I (Item : Integer) with Max_Entry_Queue_Length => -1; -- ERROR entry J (Item : Integer) with Max_Entry_Queue_Length => 16#____#; -- ERROR entry K (Item : Integer) with Max_Entry_Queue_Length => False;-- ERROR entry L (Item : Integer) with Max_Entry_Queue_Length => "JUNK"; -- ERROR entry M (Item : Integer) with Max_Entry_Queue_Length => 1.0; -- ERROR entry N (Item : Integer) with Max_Entry_Queue_Length => Long_Integer'(3); -- ERROR -- Entry family entry
[Ada] Fix spurious instantiation error on private record type
This change was initially aimed at fixing a spurious instantiation error due to a disambiguation issue which happens when a generic unit with two formal type parameters is instantiated on a single actual type that is private. The compiler internally sets the Is_Generic_Actual_Type flag on the actual subtypes built for the instantiation in order to ease the disambiguation, but it would fail to set it on the full view if the subtypes are private. The change makes it so that the flag is properly set and reset on the full view in this case. But this uncovered an issue in Subtypes_Statically_Match, which was relying on a stale Is_Generic_Actual_Type flag set on a full view outside of the instantiation to return a positive answer. This bypass was meant to solve an issue arising with a private discriminated record type whose completion is a discriminated record type itself derived from a private discriminated record type, which is used as actual type in an instantiation in another unit, and the instantiation is used in a child unit of the original unit. In this case, the private and full views of the generic actual type are swapped in the child unit, but there was a mismatch between the chain of full and underlying full views of the private discriminated record type and that of the generic actual type. This secondary issue is solved by not skipping the full view in the preparation of the completion of the private subtype and by directly constraining the underlying full view of the full view of the base type instead of building an underlying full view from scratch. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-13 Eric Botcazou gcc/ada/ * sem_ch3.adb (Build_Underlying_Full_View): Delete. (Complete_Private_Subtype): Do not set the full view on the private subtype here. If the full base is itself derived from private, do not re-derive the parent type but instead constrain an existing underlying full view. 
(Prepare_Private_Subtype_Completion): Do not get to the underlying full view, if any. Set the full view on the private subtype here. (Process_Full_View): Likewise. * sem_ch12.adb (Check_Generic_Actuals): Also set Is_Generic_Actual_Type on the full view if the type of the actual is private. (Restore_Private_Views): Also reset Is_Generic_Actual_Type on the full view if the type of the actual is private. * sem_eval.adb (Subtypes_Statically_Match): Remove bypass for generic actual types. gcc/testsuite/ * gnat.dg/generic_inst10.adb, gnat.dg/generic_inst10_pkg.ads: New testcase.--- gcc/ada/sem_ch12.adb +++ gcc/ada/sem_ch12.adb @@ -6804,7 +6804,12 @@ package body Sem_Ch12 is Check_Private_View (Subtype_Indication (Parent (E))); end if; -Set_Is_Generic_Actual_Type (E, True); +Set_Is_Generic_Actual_Type (E); + +if Is_Private_Type (E) and then Present (Full_View (E)) then + Set_Is_Generic_Actual_Type (Full_View (E)); +end if; + Set_Is_Hidden (E, False); Set_Is_Potentially_Use_Visible (E, In_Use (Instance)); @@ -14603,6 +14608,10 @@ package body Sem_Ch12 is null; else Set_Is_Generic_Actual_Type (E, False); + + if Is_Private_Type (E) and then Present (Full_View (E)) then + Set_Is_Generic_Actual_Type (Full_View (E), False); + end if; end if; -- An unusual case of aliasing: the actual may also be directly --- gcc/ada/sem_ch3.adb +++ gcc/ada/sem_ch3.adb @@ -232,18 +232,6 @@ package body Sem_Ch3 is -- Needs a more complete spec--what are the parameters exactly, and what -- exactly is the returned value, and how is Bound affected??? - procedure Build_Underlying_Full_View - (N : Node_Id; - Typ : Entity_Id; - Par : Entity_Id); - -- If the completion of a private type is itself derived from a private - -- type, or if the full view of a private subtype is itself private, the - -- back-end has no way to compute the actual size of this type. We build - -- an internal subtype declaration of the proper parent type to convey - -- this information. 
This extra mechanism is needed because a full - -- view cannot itself have a full view (it would get clobbered during - -- view exchanges). - procedure Check_Access_Discriminant_Requires_Limited (D : Node_Id; Loc : Node_Id); @@ -10447,111 +10435,6 @@ package body Sem_Ch3 is return New_Bound; end Build_Scalar_Bound; - - -- Build_Underlying_Full_View -- - - - procedure Build_Underlying_Full_View - (N : Node_Id; - Typ : Entity_Id; - Par : Entity_Id) - is - Loc : constant Source_Ptr := Sloc (N); - Subt :
[Ada] Small cleanup and improvement in inlining machinery
This is a small cleanup in the inlining machinery of the front-end dealing
with back-end inlining.  It should save a few cycles at -O0 by stopping it
from doing useless work.  No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-08-13  Eric Botcazou

gcc/ada/

	* exp_ch6.adb: Remove with and use clauses for Sem_Ch12.
	(Expand_Call_Helper): Swap the back-end inlining case and the
	special front-end expansion case.  In back-end inlining mode, do
	not invoke Add_Inlined_Body unless the call may be inlined.
	* inline.ads (Add_Pending_Instantiation): New function moved
	from...
	* inline.adb (Add_Inlined_Body): Simplify comment.  Turn test on
	the enclosing unit into assertion.
	(Add_Pending_Instantiation): New function moved from...
	* sem_ch12.ads (Add_Pending_Instantiation): ...here.
	* sem_ch12.adb (Add_Pending_Instantiation): ...here.

--- gcc/ada/exp_ch6.adb
+++ gcc/ada/exp_ch6.adb
@@ -60,7 +60,6 @@ with Sem;      use Sem;
 with Sem_Aux;  use Sem_Aux;
 with Sem_Ch6;  use Sem_Ch6;
 with Sem_Ch8;  use Sem_Ch8;
-with Sem_Ch12; use Sem_Ch12;
 with Sem_Ch13; use Sem_Ch13;
 with Sem_Dim;  use Sem_Dim;
 with Sem_Disp; use Sem_Disp;
@@ -4316,15 +4315,15 @@ package body Exp_Ch6 is
       if not Is_Inlined (Subp) then
          null;

-      --  Frontend inlining of expression functions (performed also when
-      --  backend inlining is enabled).
+      --  Front-end inlining of expression functions (performed also when
+      --  back-end inlining is enabled).

       elsif Is_Inlinable_Expression_Function (Subp) then
          Rewrite (N, New_Copy (Expression_Of_Expression_Function (Subp)));
          Analyze (N);
          return;

-      --  Handle frontend inlining
+      --  Handle front-end inlining

       elsif not Back_End_Inlining then
          Inlined_Subprogram : declare
@@ -4420,27 +4419,36 @@ package body Exp_Ch6 is
             end if;
          end Inlined_Subprogram;

-      --  Back end inlining: let the back end handle it
+      --  Front-end expansion of simple functions returning unconstrained
+      --  types (see Check_And_Split_Unconstrained_Function).  Note that the
+      --  case of a simple renaming (Body_To_Inline in N_Entity below, see
+      --  also Build_Renamed_Body) cannot be expanded here because this may
+      --  give rise to order-of-elaboration issues for the types of the
+      --  parameters of the subprogram, if any.
+
+      elsif Present (Unit_Declaration_Node (Subp))
+        and then Nkind (Unit_Declaration_Node (Subp)) =
+                                                   N_Subprogram_Declaration
+        and then Present (Body_To_Inline (Unit_Declaration_Node (Subp)))
+        and then Nkind (Body_To_Inline (Unit_Declaration_Node (Subp))) not in
+                                                                   N_Entity
+      then
+         Expand_Inlined_Call (Call_Node, Subp, Orig_Subp);
+
+      --  Back-end inlining either if optimization is enabled or the call is
+      --  required to be inlined.

-      elsif No (Unit_Declaration_Node (Subp))
-        or else Nkind (Unit_Declaration_Node (Subp)) /=
-                                                   N_Subprogram_Declaration
-        or else No (Body_To_Inline (Unit_Declaration_Node (Subp)))
-        or else Nkind (Body_To_Inline (Unit_Declaration_Node (Subp))) in
-                                                                   N_Entity
+      elsif Optimization_Level > 0
+        or else Has_Pragma_Inline_Always (Subp)
       then
          Add_Inlined_Body (Subp, Call_Node);

-         --  If the inlined call appears within an instantiation and either
-         --  is required to be inlined or optimization is enabled, ensure
+         --  If the inlined call appears within an instance, then ensure
          --  that the enclosing instance body is available so the back end
          --  can actually perform the inlining.

-         if In_Instance
-           and then Comes_From_Source (Subp)
-           and then (Has_Pragma_Inline_Always (Subp)
-                      or else Optimization_Level > 0)
-         then
+         if In_Instance and then Comes_From_Source (Subp) then
            declare
               Decl : Node_Id;
               Inst : Entity_Id;
@@ -4491,16 +4499,6 @@ package body Exp_Ch6 is
             end if;
          end;
       end if;
-
-      --  Front end expansion of simple functions returning unconstrained
-      --  types (see Check_And_Split_Unconstrained_Function).  Note that the
-      --  case of a simple renaming (Body_To_Inline in N_Entity above, see
-      --  also Build_Renamed_Body) cannot be expanded here because this may
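As a reader's aid, here is a minimal, hypothetical Ada sketch (not part of the patch; all names invented) of the case that the reordered Expand_Call_Helper handles first: an inlinable expression function, which the front end rewrites directly even when back-end inlining is enabled, before the Optimization_Level / Has_Pragma_Inline_Always test is ever consulted.

```ada
--  Hypothetical example: Square is an expression function, so a call to
--  it is rewritten by the front end itself (the
--  Is_Inlinable_Expression_Function branch) rather than being passed to
--  Add_Inlined_Body for back-end inlining.

package Demo is
   function Square (X : Integer) return Integer is (X * X);
end Demo;

with Demo;

procedure Main is
   Y : constant Integer := Demo.Square (3);
   --  Rewritten to 3 * 3 during front-end expansion
begin
   null;
end Main;
```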
[Ada] Avoid crash in GNATprove_Mode on allocator inside type
In the special mode for GNATprove, subtypes should be declared for
allocators when constraints are used.  This was done previously, but it
does not work inside spec expressions, as the declaration is not inserted
and analyzed in the AST in that case, leading to a later crash on an
incomplete entity.  Thus, no declaration should be created in such a case,
letting GNATprove later reject such code due to the use of an allocator in
an interfering context.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-08-13  Yannick Moy

gcc/ada/

	* sem_ch4.adb (Analyze_Allocator): Do not insert subtype
	declaration for allocator inside a spec expression.

gcc/testsuite/

	* gnat.dg/allocator2.adb, gnat.dg/allocator2.ads: New testcase.

--- gcc/ada/sem_ch4.adb
+++ gcc/ada/sem_ch4.adb
@@ -676,9 +676,15 @@ package body Sem_Ch4 is

               --  In GNATprove mode we need to preserve the link between
               --  the original subtype indication and the anonymous subtype,
-              --  to extend proofs to constrained acccess types.
-
-              if Expander_Active or else GNATprove_Mode then
+              --  to extend proofs to constrained access types.  We only do
+              --  that outside of spec expressions, otherwise the declaration
+              --  cannot be inserted and analyzed.  In such a case, GNATprove
+              --  later rejects the allocator as it is not used here in
+              --  a non-interfering context (SPARK 4.8(2) and 7.1.3(12)).
+
+              if Expander_Active
+                or else (GNATprove_Mode and then not In_Spec_Expression)
+              then
                  Def_Id := Make_Temporary (Loc, 'S');

                  Insert_Action (E,
--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/allocator2.adb
@@ -0,0 +1,6 @@
+--  { dg-do compile }
+--  { dg-options "-gnatd.F" }
+
+package body Allocator2 is
+   procedure Dummy is null;
+end Allocator2;
--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/allocator2.ads
@@ -0,0 +1,15 @@
+pragma SPARK_Mode;
+package Allocator2 is
+   type Nat_Array is array (Positive range <>) of Natural with
+     Default_Component_Value => 0;
+   type Nat_Stack (Max : Natural) is record
+      Content : Nat_Array (1 .. Max);
+   end record;
+   type Stack_Acc is access Nat_Stack;
+   type My_Rec is private;
+private
+   type My_Rec is record
+      My_Stack : Stack_Acc := new Nat_Stack (Max => 10);
+   end record;
+   procedure Dummy;
+end Allocator2;
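To illustrate the failure mode with a separate, hypothetical sketch (distinct from the testcase above; names invented): an allocator whose constraint appears inside a spec expression, here a default parameter expression. In GNATprove mode the compiler previously tried to insert an anonymous subtype declaration for the constraint at this point, which cannot be analyzed inside a spec expression; with the fix, no declaration is created and GNATprove instead rejects the allocator itself.

```ada
--  Hypothetical sketch: the allocator below sits in a spec expression
--  (a default parameter), so no anonymous subtype declaration can be
--  inserted for the constraint (Max => 10).

package Stacks is
   type Nat_Array is array (Positive range <>) of Natural;

   type Nat_Stack (Max : Natural) is record
      Content : Nat_Array (1 .. Max);
   end record;

   type Stack_Acc is access Nat_Stack;

   --  Default expressions are analyzed as spec expressions
   procedure Push (S : Stack_Acc := new Nat_Stack (Max => 10));
end Stacks;
```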
[Ada] Avoid spurious errors on dimensionality checking in GNATprove
Complete the partial treatment that was started in r273405.  Instead of
checking for the special case of nodes inside inlined bodies at the call
site, check for this condition inside the top-level procedures called for
dimensionality checking.

There is no impact on compilation.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-08-13  Yannick Moy

gcc/ada/

	* sem_dim.adb (Analyze_Dimension, Analyze_Dimension_Array_Aggregate,
	Analyze_Dimension_Call,
	Analyze_Dimension_Extension_Or_Record_Aggregate): Return
	immediately when inside an inlined body.
	* sem_res.adb (Resolve_Call): Remove special checking now done
	inside Analyze_Dimension_Call.

--- gcc/ada/sem_dim.adb
+++ gcc/ada/sem_dim.adb
@@ -1142,6 +1142,11 @@ package body Sem_Dim is
       if Ada_Version < Ada_2012 then
          return;

+      --  Inlined bodies have already been checked for dimensionality
+
+      elsif In_Inlined_Body then
+         return;
+
       elsif not Comes_From_Source (N) then
          if Nkind_In (N, N_Explicit_Dereference,
                          N_Identifier,
@@ -1245,10 +1250,13 @@ package body Sem_Dim is
       --  Aspect is an Ada 2012 feature.  Nothing to do here if the component
       --  base type is not a dimensioned type.

+      --  Inlined bodies have already been checked for dimensionality.
+
       --  Note that here the original node must come from source since the
       --  original array aggregate may not have been entirely decorated.

       if Ada_Version < Ada_2012
+        or else In_Inlined_Body
         or else not Comes_From_Source (Original_Node (N))
         or else not Has_Dimension_System (Base_Type (Comp_Typ))
       then
@@ -1634,10 +1642,11 @@ package body Sem_Dim is
    begin
       --  Aspect is an Ada 2012 feature.  Note that there is no need to check
-      --  dimensions for calls that don't come from source, or those that may
-      --  have semantic errors.
+      --  dimensions for calls in inlined bodies, or calls that don't come
+      --  from source, or those that may have semantic errors.

       if Ada_Version < Ada_2012
+        or else In_Inlined_Body
         or else not Comes_From_Source (N)
         or else Error_Posted (N)
       then
@@ -1966,11 +1975,12 @@ package body Sem_Dim is
    begin
       --  Aspect is an Ada 2012 feature.  Note that there is no need to check
-      --  dimensions for aggregates that don't come from source, or if we are
-      --  within an initialization procedure, whose expressions have been
-      --  checked at the point of record declaration.
+      --  dimensions in inlined bodies, or for aggregates that don't come
+      --  from source, or if we are within an initialization procedure, whose
+      --  expressions have been checked at the point of record declaration.

       if Ada_Version < Ada_2012
+        or else In_Inlined_Body
         or else not Comes_From_Source (N)
         or else Inside_Init_Proc
       then
--- gcc/ada/sem_res.adb
+++ gcc/ada/sem_res.adb
@@ -6952,9 +6952,7 @@ package body Sem_Res is
       --  Check the dimensions of the actuals in the call.  For function calls,
       --  propagate the dimensions from the returned type to N.

-      if not In_Inlined_Body then
-         Analyze_Dimension_Call (N, Nam);
-      end if;
+      Analyze_Dimension_Call (N, Nam);

       --  All done, evaluate call and deal with elaboration issues
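For context, a hypothetical sketch (not part of the patch) using GNAT's predefined MKS dimension system shows the kind of check Analyze_Dimension performs; after this change the same check is simply skipped inside inlined bodies, since the original body was already checked at its declaration point.

```ada
--  Hypothetical sketch: with the GNAT dimension system, sem_dim.adb
--  verifies that the dimensions on each side of an operation agree.

with System.Dim.Mks; use System.Dim.Mks;

procedure Dims is
   Distance : Length := 2.0 * m;   --  metres
   Elapsed  : Time   := 3.0 * s;   --  seconds
begin
   Distance := Distance + 1.0 * m;    --  dimensions agree: accepted
   --  Distance := Distance + Elapsed;
   --  would be rejected by dimensionality checking: metres vs. seconds
end Dims;
```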
[Ada] Spurious error on nested instantiation
This fixes a spurious error given by the compiler for a call to a
subprogram which is the formal subprogram parameter of a generic package,
if the generic package is instantiated in the body of an enclosing generic
package with two formal types and two formal subprogram parameter homonyms
taking them, this instantiation takes one of the two formal types as
actual, the enclosing generic package is instantiated on the same actual
type with a single actual subprogram parameter, and the aforementioned
call is overloaded.

In this case, the renaming generated for the actual subprogram parameter
in the nested instantiation is ambiguous and must be disambiguated using
the corresponding formal parameter of the enclosing instantiation,
otherwise a (sub)type mismatch is created and later subprogram
disambiguation is not really possible.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-08-13  Eric Botcazou

gcc/ada/

	* sem_ch4.adb (Analyze_One_Call): Remove bypass for type mismatch
	in nested instantiations.
	* sem_ch8.adb (Find_Nearer_Entity): New function.
	(Find_Renamed_Entity): Use it to disambiguate the candidates for
	the renaming generated for an instantiation when it is ambiguous.

gcc/testsuite/

	* gnat.dg/generic_inst9.adb, gnat.dg/generic_inst9.ads,
	gnat.dg/generic_inst9_pkg1-operator.ads,
	gnat.dg/generic_inst9_pkg1.ads, gnat.dg/generic_inst9_pkg2.adb,
	gnat.dg/generic_inst9_pkg2.ads: New testcase.

--- gcc/ada/sem_ch4.adb
+++ gcc/ada/sem_ch4.adb
@@ -3619,59 +3619,6 @@ package body Sem_Ch4 is
                Next_Actual (Actual);
                Next_Formal (Formal);

-            --  In a complex case where an enclosing generic and a nested
-            --  generic package, both declared with partially parameterized
-            --  formal subprograms with the same names, are instantiated
-            --  with the same type, the types of the actual parameter and
-            --  that of the formal may appear incompatible at first sight.
-
-            --    generic
-            --       type Outer_T is private;
-            --       with function Func (Formal : Outer_T)
-            --         return ... is <>;
-
-            --    package Outer_Gen is
-            --       generic
-            --          type Inner_T is private;
-            --          with function Func (Formal : Inner_T)  --  (1)
-            --            return ... is <>;
-
-            --       package Inner_Gen is
-            --          function Inner_Func (Formal : Inner_T)  --  (2)
-            --            return ... is (Func (Formal));
-            --       end Inner_Gen;
-            --    end Outer_Generic;
-
-            --    package Outer_Inst is new Outer_Gen (Actual_T);
-            --    package Inner_Inst is new Outer_Inst.Inner_Gen (Actual_T);
-
-            --  In the example above, the type of parameter
-            --  Inner_Func.Formal at (2) is incompatible with the type of
-            --  Func.Formal at (1) in the context of instantiations
-            --  Outer_Inst and Inner_Inst.  In reality both types are generic
-            --  actual subtypes renaming base type Actual_T as part of the
-            --  generic prologues for the instantiations.
-
-            --  Recognize this case and add a type conversion to allow this
-            --  kind of generic actual subtype conformance.  Note that this
-            --  is done only when the call is non-overloaded because the
-            --  resolution mechanism already has the means to disambiguate
-            --  similar cases.
-
-            elsif not Is_Overloaded (Name (N))
-              and then Is_Type (Etype (Actual))
-              and then Is_Type (Etype (Formal))
-              and then Is_Generic_Actual_Type (Etype (Actual))
-              and then Is_Generic_Actual_Type (Etype (Formal))
-              and then Base_Type (Etype (Actual)) =
-                       Base_Type (Etype (Formal))
-            then
-               Rewrite (Actual,
-                 Convert_To (Etype (Formal), Relocate_Node (Actual)));
-               Analyze_And_Resolve (Actual, Etype (Formal));
-               Next_Actual (Actual);
-               Next_Formal (Formal);
-
             --  Handle failed type check

             else
--- gcc/ada/sem_ch8.adb
+++ gcc/ada/sem_ch8.adb
@@ -6721,6 +6721,15 @@ package body Sem_Ch8 is
       Old_S : Entity_Id;
       Inst  : Entity_Id;

+      function Find_Nearer_Entity
+        (New_S  : Entity_Id;
+         Old1_S : Entity_Id;
+         Old2_S : Entity_Id) return Entity_Id;
+      --  Determine whether one of Old1_S and Old2_S is nearer to New_S than
+      --  the other, and return it if so.  Return Empty otherwise.  We use
+      --  this in conjunction with Inherit_Renamed_Profile to
[Ada] Avoid crash in GNATprove due to inlining inside type
The special inlining for GNATprove should not inline calls inside record
types, used for the constraints of components.

There is no impact on compilation.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-08-13  Yannick Moy

gcc/ada/

	* sem_res.adb (Resolve_Call): Do not inline calls inside record
	types.

--- gcc/ada/sem_res.adb
+++ gcc/ada/sem_res.adb
@@ -7062,6 +7062,15 @@ package body Sem_Res is
                   end if;
                end if;

+            --  Cannot inline a call inside the definition of a record type,
+            --  typically inside the constraints of the type.  Calls in
+            --  default expressions are also not inlined, but this is
+            --  filtered out above when testing In_Default_Expr.
+
+            elsif Is_Record_Type (Current_Scope) then
+               Cannot_Inline
+                 ("cannot inline & (inside record type)?", N, Nam_UA);
+
             --  With the one-pass inlining technique, a call cannot be
             --  inlined if the corresponding body has not been seen yet.
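A hypothetical sketch of the situation now excluded (names invented for illustration): while the record definition is being analyzed, Current_Scope is the record type itself, so the call in the component constraint is reported by Cannot_Inline instead of being inlined for GNATprove.

```ada
--  Hypothetical sketch: Default_Size is called inside the definition of
--  record type Buffer, as part of a component constraint.  Under the
--  special GNATprove inlining, this call is no longer inlined.

package Buffers is
   function Default_Size return Positive is (16);

   type Byte_Array is array (Positive range <>) of Natural;

   type Buffer is record
      Data : Byte_Array (1 .. Default_Size);  --  call inside a record type
   end record;
end Buffers;
```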