Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170
On 08/28/2017 04:33 PM, Alan Modra wrote: > On Mon, Aug 28, 2017 at 08:27:35AM -0600, Jeff Law wrote: >> So sorry for the horrible delay. What was the final resolution here? I >> saw a lot of back and forth with HJ and yourself. 80044 is >> CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the >> ASSIGNED state. > > The patch went in trunk as 5a402d649 (r250974) and a9b2df6cc > (r251065). PR81170 and PR81295 are still open due to needing a fix > for powerpc --enable-default-pie on the branches. Last I checked, > both patches apply without any difficulty. > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00831.html I think if you want to backport, go ahead. jeff
Improve DOM's ability to derive equivalences when traversing edges
This is a two part patchkit to improve DOM's ability to derive constant equivalences that arise as a result of traversing a particular edge in the CFG. Until now we only allowed a single NAME = NAME|CONST equivalence to be associated with an edge in the CFG. Patch #1 generalizes that code so that we can record multiple simple equivalences on an edge. Much like expression equivalences, we just shove them into a vec and iterate on the vec in the appropriate places. Patch #2 has the interesting bits. Back in the gcc-7 effort I added the ability to look at the operands of a BIT_IOR_EXPR that had a zero result. In that case each operand of the BIT_IOR must have a zero value. This was to address a missed optimization regression bug during stage4. The plan was always to add analogous BIT_AND support, but I didn't feel like handling BIT_AND was appropriate at the time (no bz entry and no regressions related to that capability). I'd also had the sense that further improvements could be made here. For example, it is common for the BIT_IOR or BIT_AND to be fed by a comparison and we ought to be able to record the result of the comparison. If the comparison happened to be an equality test, then we may ultimately derive a constant for on operand of the equality test as well. It also seemed like the NOP/CONVERT_EXPR handling could be incorporated into such a change. So I pulled together some instrumentation. Lots of things generate equivalences -- but a much smaller subset of those equivalences are ultimately useful. Probably the most surprising was BIT_XOR, which allows us to generate all kinds of equivalences, but none that were useful for ultimate simplification in any of the tests I looked at. The most subtle was COND_EXPRs. We might have something like res = (a != 5) ? x : 1; We can't actually derive anything useful for "a" here, even if we know the result is one. That's because "x" could have the value 1. So you end up only being able to derive equivalences for COND_EXPRs when both arms have a constant value. That restriction dramatically reduces the utility of handling COND_EXPR -- to the point where I'm not including it. So what we end up with is BIT_AND/BIT_IOR, conversions, plus/minus, comparisons and neg/not. So when we determine that a particular SSA_NAME has a constant value, we look at the defining statement and essentially try to derive a value for the input operand(s) based on knowing the result value. If we can derive a constant value for an input operand, we record that value and recurse. In cases where we walk backwards to a condition. We will record the condition into the available expression table. The code is written such that if we find cases where the equivalences for other nodes are useful, they're easy to add. These equivalences are most useful to the threader, but I've seen them help in other cases as well. There's a half-dozen or so new tests reduced from GCC itself. Bootstrapped and regression tested on x86_64, lightly tested on ppc64le via bootstrapping and running the new tests to verify they do the right thing on a !logical_op_short_circuit target. Installing on the trunk. Jeff commit 506ac60cacbc4c4e5e166513ea83c1d2e14eaf3b Author: lawDate: Tue Aug 29 05:03:22 2017 + * tree-ssa-dom.c (class edge_info): Changed from a struct to a class. Add ctor/dtor, methods and data members. (edge_info::edge_info): Renamed from allocate_edge_info. Initialize additional members. (edge_info::~edge_info): New. (free_dom_edge_info): Delete the edge info. (record_edge_info): Use new class & associated member functions. Tighten forms for testing for edge equivalences. (record_temporary_equivalences): Iterate over the simple equivalences rather than assuming there's only one per edge. (cprop_into_successor_phis): Iterate over the simple equivalences rather than assuming there's only one per edge. (optimize_stmt): Use operand_equal_p rather than pointer equality for mini-DSE code. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251396 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index abe7d855260..258d4242f30 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,20 @@ +2017-08-28 Jeff Law + + * tree-ssa-dom.c (class edge_info): Changed from a struct + to a class. Add ctor/dtor, methods and data members. + (edge_info::edge_info): Renamed from allocate_edge_info. + Initialize additional members. + (edge_info::~edge_info): New. + (free_dom_edge_info): Delete the edge info. + (record_edge_info): Use new class & associated member functions. + Tighten forms for testing for edge equivalences. + (record_temporary_equivalences): Iterate over
Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341
ping^3 Thanks, Kugan On 11 August 2017 at 16:09, Kugan Vivekanandarajahwrote: > Ping^2? > > Thanks, > Kugan > > On 21 July 2017 at 20:12, Kugan Vivekanandarajah > wrote: >> Ping ? >> >> Thanks, >> Kugan >> >> On 27 June 2017 at 11:20, Kugan Vivekanandarajah >> wrote: >>> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html added this >>> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419 >>> is enabled. >>> >>> This was added to support building kernel loadable modules. In kernel, >>> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed >>> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and >>> R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid >>> loading objects with possibly offending sequence). Thus, it could only >>> support pc relative literal loads. >>> >>> However, the following patch was posted to kernel to add >>> -mpc-relative-literal-loads >>> http://www.spinics.net/lists/arm-kernel/msg476149.html >>> >>> -mpc-relative-literal-loads is unconditionally added to the kernel >>> build as can be seen from: >>> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile >>> >>> Therefore this patch removes the hunk so that applications like >>> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without >>> -mno-pc-relative-literal-loads >>> >>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>> regressions. >>> >>> Is this OK for trunk? >>> >>> Thanks, >>> Kugan >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2017-06-27 Kugan Vivekanandarajah >>> >>> * gcc.target/aarch64/pr63304_1.c: Remove-mno-fix-cortex-a53-843419. >>> >>> gcc/ChangeLog: >>> >>> 2017-06-27 Kugan Vivekanandarajah >>> >>> * config/aarch64/aarch64.c (aarch64_override_options_after_change_1): >>> Disable pc relative literal load irrespective of >>> TARGET_FIX_ERR_A53_84341 >>> for default.
[PATCH], PR target/82015, add PowerPC warning for unpack_vector_int128 with illegal 2nd argument
One of the local programmers tried to use the __builtin_unpack_vector_int128 function, but his second argument was not the constant 0 or 1. The compiler put the 2nd argument into a register, but there wasn't a valid insn for this, and raised an insn not found message. GCC should warn about this illegal usage. This patch adds such a warning. While I was mucking about with this built-in function, I fixed the constraints to allow the 64-bit integer for unpack result and pack inputs to be in the traditional Altivec registers as well as the traditional floating point registers. I did a bootstrap of the compiler, and there were no regressions on a little endian power8 system. I verified that the new test is run. Can I apply this patch to the trunk? Can I apply the part of the patch from rs6000.c to the existing GCC 6/7 branches as well after a shake down period? The patch to rs6000.md is not appropriate for GCC 6 (since DImode can't go into Altivec registers). For GCC 7, it would need to be modified to use the 'wi' constraint instead of 'wa', since GCC 7 still had support for the -mno-upper-regs-di option to control access to the traditional Altivec register. [gcc] 2017-08-28 Michael MeissnerPR target/82015 * config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Insure that the second argument of the built-in functions to unpack 128-bit scalar types to 64-bit values is 0 or 1. Change to use a switch statement instead a lot of if statements. * config/rs6000/rs6000.md (unpack, FMOVE128_VSX iterator): Allow 64-bit values to be in Altivec registers as well as traditional floating point registers. (pack, FMOVE128_VSX iterator): Likewise. [gcc/testsuite] 2017-08-28 Michael Meissner PR target/82015 * gcc.target/powerpc/pr82015.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 251390) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -14001,14 +14001,17 @@ rs6000_expand_binop_builtin (enum insn_c if (arg0 == error_mark_node || arg1 == error_mark_node) return const0_rtx; - if (icode == CODE_FOR_altivec_vcfux - || icode == CODE_FOR_altivec_vcfsx - || icode == CODE_FOR_altivec_vctsxs - || icode == CODE_FOR_altivec_vctuxs - || icode == CODE_FOR_altivec_vspltb - || icode == CODE_FOR_altivec_vsplth - || icode == CODE_FOR_altivec_vspltw) + switch (icode) { +default: + break; +case CODE_FOR_altivec_vcfux: +case CODE_FOR_altivec_vcfsx: +case CODE_FOR_altivec_vctsxs: +case CODE_FOR_altivec_vctuxs: +case CODE_FOR_altivec_vspltb: +case CODE_FOR_altivec_vsplth: +case CODE_FOR_altivec_vspltw: /* Only allow 5-bit unsigned literals. */ STRIP_NOPS (arg1); if (TREE_CODE (arg1) != INTEGER_CST @@ -14017,16 +14020,15 @@ rs6000_expand_binop_builtin (enum insn_c error ("argument 2 must be a 5-bit unsigned literal"); return CONST0_RTX (tmode); } -} - else if (icode == CODE_FOR_dfptstsfi_eq_dd - || icode == CODE_FOR_dfptstsfi_lt_dd - || icode == CODE_FOR_dfptstsfi_gt_dd - || icode == CODE_FOR_dfptstsfi_unordered_dd - || icode == CODE_FOR_dfptstsfi_eq_td - || icode == CODE_FOR_dfptstsfi_lt_td - || icode == CODE_FOR_dfptstsfi_gt_td - || icode == CODE_FOR_dfptstsfi_unordered_td) -{ + break; +case CODE_FOR_dfptstsfi_eq_dd: +case CODE_FOR_dfptstsfi_lt_dd: +case CODE_FOR_dfptstsfi_gt_dd: +case CODE_FOR_dfptstsfi_unordered_dd: +case CODE_FOR_dfptstsfi_eq_td: +case CODE_FOR_dfptstsfi_lt_td: +case CODE_FOR_dfptstsfi_gt_td: +case CODE_FOR_dfptstsfi_unordered_td: /* Only allow 6-bit unsigned literals. */ STRIP_NOPS (arg0); if (TREE_CODE (arg0) != INTEGER_CST @@ -14035,13 +14037,12 @@ rs6000_expand_binop_builtin (enum insn_c error ("argument 1 must be a 6-bit unsigned literal"); return CONST0_RTX (tmode); } -} - else if (icode == CODE_FOR_xststdcqp - || icode == CODE_FOR_xststdcdp - || icode == CODE_FOR_xststdcsp - || icode == CODE_FOR_xvtstdcdp - || icode == CODE_FOR_xvtstdcsp) -{ + break; +case CODE_FOR_xststdcqp: +case CODE_FOR_xststdcdp: +case CODE_FOR_xststdcsp: +case CODE_FOR_xvtstdcdp: +case CODE_FOR_xvtstdcsp: /* Only allow 7-bit unsigned literals. */ STRIP_NOPS (arg1); if (TREE_CODE (arg1) != INTEGER_CST @@ -14050,6 +14051,21 @@ rs6000_expand_binop_builtin (enum insn_c error ("argument 2 must be a 7-bit unsigned literal"); return CONST0_RTX (tmode); }
Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)
Ping: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01087.html On 08/24/2017 02:43 PM, Martin Sebor wrote: The bulk of this patch touches what I think is considered the middle-end (attribs.c) so let me include its maintainers, Ian, Jeff, and Richard: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01087.html The high-level description and rationale for the change are here: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00602.html Thanks Martin On 08/17/2017 10:03 AM, Martin Sebor wrote: Attached is an updated patch with just the minor editorial tweaks for the issues pointed out by Marek (stray comments and spaces), and with the C++ and libstdc++ bits removed and posted separately. I also added some text to the manual to clarify the const/pure effects. I thought quite a bit more about the const/pure attributes we discussed and tried the approach of warning only on a const declaration after one with attribute pure has been used, but otherwise allowing and silently ignoring pure after const. In the end I decided against it, for a few reasons (most of which I already mentioned but just to summarize). First, there is the risk that someone will write code based on the pure declaration even if there's no intervening call to the function between it and the const one. Code tends to be sloppy, and it's also not uncommon to declare the same function more than once, for whatever reason. (The ssa-ccp-2.c test is an example of the latter.) Second, there are cases of attribute conflicts that GCC already points out that are much more benign in their effects (the ones I know about are always_inline/noinline and cold/hot). In light of the risk above it seems only helpful to include const/pure in the same set. Third, I couldn't find another pair of attributes that GCC would deliberately handle this way (silently accept both but prefer one over the other), and introducing a special case just for these two seemed like a wart. Finally, compiling Binutils, GDB, Glkibc, and the Linux kernel with the enhanced warning didn't turn up any code that does this sort of thing, either intentionally or otherwise. With that, I've left the handling unchanged. I do still have the question whether diagnosing attribute conflicts under -Wattributes is right. The manual implies that -Wattributes is for attributes in the wrong places or on the wrong entities, and that -Wignored-attributes should be expected instead when GCC decides to drop one for some reason. It is a little unfortunate that many -Wattributes warnings print just "attribute ignored" (and have done so for years). I think they should all be enhanced to also print why the attribute is ignored (e.g., "'packed' attribute on function declarations ignored/not valid/not supported" or something like that). Those that ignore attributes that would otherwise be valid e.g., because they conflict with other specifiers of the same attribute but with a different operand might then be candidate for changing to -Wignored-attributes. Thanks Martin
[PING 2] [PATCH 3/4] enhance overflow and truncation detection in strncpy and strncat (PR 81117)
Ping #2: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html On 08/23/2017 01:46 PM, Martin Sebor wrote: Ping: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html Jeff, is this version good to commit or are there any other changes you'd like to see? Martin On 08/14/2017 04:40 PM, Martin Sebor wrote: On 08/10/2017 01:29 PM, Martin Sebor wrote: diff --git a/gcc/builtins.c b/gcc/builtins.c index 016f68d..1aa9e22 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c [ ... ] + + if (TREE_CODE (type) == ARRAY_TYPE) +{ + /* Return the constant size unless it's zero (that's a zero-length + array likely at the end of a struct). */ + tree size = TYPE_SIZE_UNIT (type); + if (size && TREE_CODE (size) == INTEGER_CST + && !integer_zerop (size)) +return size; +} Q. Do we have a canonical test for the trailing array idiom? In some contexts isn't it size 1? ISTM This test needs slight improvement. Ideally we'd use some canonical test for detect the trailing array idiom rather than open-coding it here. You might look at the array index warnings in tree-vrp.c to see if it's got a canonical test you can call or factor and use. You're right, there is an API for this (array_at_struct_end_p, as Richard pointed out). I didn't want to use it because it treats any array at the end of a struct as a flexible array member, but simple tests show that that's what -Wstringop- overflow does now, and it wasn't my intention to tighten up the checking under this change. It surprises me that no tests exposed this. Let me relax the check and think about proposing to tighten it up separately. Done in the attached patch. (I opened bug 81849 for the enhancement to have -Wstringop-overflow diagnose overflows when writing to member arrays bigger than 1 element even if they're last). (I've left the handling for zero size in place because GCC allows global arrays to be declared to have zero elements.) @@ -3883,6 +3920,30 @@ expand_builtin_strncat (tree exp, rtx) return NULL_RTX; } +/* Helper to check the sizes of sequences and the destination of calls + to __builtin_strncpy (DST, SRC, LEN) and __builtin___strncpy_chk. + Returns true on success (no overflow warning), false otherwise. */ + +static bool +check_strncpy_sizes (tree exp, tree dst, tree src, tree len) +{ + tree dstsize = compute_objsize (dst, warn_stringop_overflow - 1); + + if (!check_sizes (OPT_Wstringop_overflow_, +exp, len, /*maxlen=*/NULL_TREE, src, dstsize)) +return false; + + if (!dstsize || TREE_CODE (len) != INTEGER_CST) +return true; + + if (tree_int_cst_lt (dstsize, len)) +warning_at (EXPR_LOCATION (exp), OPT_Wstringop_truncation, +"%K%qD specified bound %E exceeds destination size %E", +exp, get_callee_fndecl (exp), len, dstsize); + + return true; So in the case where you issue the warning, what should the return value be? According to the comment it should be false. It looks like you got the wrong return value for the tree_int_cst_lt (dstsize, len) test. Corrected. The return value is unused by the only caller so there is no test to exercise it. Done in the attached patch. +/* A helper of handle_builtin_stxncpy. Check to see if the specified + bound is a) equal to the size of the destination DST and if so, b) + if it's immediately followed by DST[LEN - 1] = '\0'. If a) holds + and b) does not, warn. Otherwise, do nothing. Return true if + diagnostic has been issued. + + The purpose is to diagnose calls to strncpy and stpncpy that do + not nul-terminate the copy while allowing for the idiom where + such a call is immediately followed by setting the last element + to nul, as in: + char a[32]; + strncpy (a, s, sizeof a); + a[sizeof a - 1] = '\0'; +*/ So using gsi_next to find the next statement could make the heuristic fail to find the a[sizeof a - 1] = '\0'; statement when debugging is enabled. gsi_next_nondebug would be better as it would skip over any debug insns. Thanks. I'll have to remember this. I went with this simple approach for now since it worked for GDB. If it turns out that there are important instances of this idiom that rely on intervening statements the warning can be relaxed. What might be even better would be to use the immediate uses of the memory tag. For your case there should be only one immediate use and it should point to the statement which NUL terminates the destination. Or maybe that would be worse in that you only want to allow this exception when the statements are consecutive. /* Handle a memcpy-like ({mem{,p}cpy,__mem{,p}cpy_chk}) call. If strlen of the second argument is known and length of the third argument is that plus one, strlen of the first argument is the same after this @@ -2512,6 +2789,19 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) You still need to rename strlen_optimize_stmt since after your changes it does both optimizations and warnings.
Re: [PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)
On Aug 28, 2017, at 3:56 PM, Bill Schmidtwrote: > > Hi, > > PR81833 identifies a problem with the little-endian vector multiply-sum > instructions. The original implementation is quite poor (and I am allowed > to say that, since it was mine). This patch fixes the code properly. > > The revised code still uses UNSPECs for these ops, which is not strictly > necessary, although descriptive rtl for them would be pretty complex. I've > put in a FIXME to make note of that for a future cleanup. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. I am > currently testing on powerpc64-linux-gnu for 32- and 64-bit. Provided that > testing succeeds, is this ok for trunk, and for eventual backport to all > supported releases? FYI, big-endian tests have completed successfully. Bill > > Thanks, > Bill > > > [gcc] > > 2017-08-28 Bill Schmidt > > PR target/81833 > * config/rs6000/altivec.md (altivec_vsum2sws): Convert from a > define_insn to a define_expand. > (altivec_vsum2sws_direct): New define_insn. > (altivec_vsumsws): Convert from a define_insn to a define_expand. > > [gcc/testsuite] > > 2017-08-28 Bill Schmidt > > PR target/81833 > * gcc.target/powerpc/pr81833.c: New file. > > > Index: gcc/config/rs6000/altivec.md > === > --- gcc/config/rs6000/altivec.md (revision 251369) > +++ gcc/config/rs6000/altivec.md (working copy) > @@ -1804,51 +1804,61 @@ > "vsum4ss %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -;; FIXME: For the following two patterns, the scratch should only be > -;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should > -;; be emitted separately. > -(define_insn "altivec_vsum2sws" > - [(set (match_operand:V4SI 0 "register_operand" "=v") > -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > - (match_operand:V4SI 2 "register_operand" "v")] > - UNSPEC_VSUM2SWS)) > - (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR)) > - (clobber (match_scratch:V4SI 3 "=v"))] > +(define_expand "altivec_vsum2sws" > + [(use (match_operand:V4SI 0 "register_operand")) > + (use (match_operand:V4SI 1 "register_operand")) > + (use (match_operand:V4SI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > if (VECTOR_ELT_ORDER_BIG) > -return "vsum2sws %0,%1,%2"; > +emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1], > +operands[2])); > else > -return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4"; > -} > - [(set_attr "type" "veccomplex") > - (set (attr "length") > - (if_then_else > - (match_test "VECTOR_ELT_ORDER_BIG") > - (const_string "4") > - (const_string "12")))]) > +{ > + rtx tmp1 = gen_reg_rtx (V4SImode); > + rtx tmp2 = gen_reg_rtx (V4SImode); > + emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2], > + operands[2], GEN_INT (12))); > + emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1)); > + emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2, > + GEN_INT (4))); > +} > + DONE; > +}) > > -(define_insn "altivec_vsumsws" > +; FIXME: This can probably be expressed without an UNSPEC. > +(define_insn "altivec_vsum2sws_direct" > [(set (match_operand:V4SI 0 "register_operand" "=v") > (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > - (match_operand:V4SI 2 "register_operand" "v")] > - UNSPEC_VSUMSWS)) > - (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR)) > - (clobber (match_scratch:V4SI 3 "=v"))] > + (match_operand:V4SI 2 "register_operand" "v")] > + UNSPEC_VSUM2SWS))] > "TARGET_ALTIVEC" > + "vsum2sws %0,%1,%2" > + [(set_attr "type" "veccomplex") > + (set_attr "length" "4")]) > + > +(define_expand "altivec_vsumsws" > + [(use (match_operand:V4SI 0 "register_operand")) > + (use (match_operand:V4SI 1 "register_operand")) > + (use (match_operand:V4SI 2 "register_operand"))] > + "TARGET_ALTIVEC" > { > if (VECTOR_ELT_ORDER_BIG) > -return "vsumsws %0,%1,%2"; > +emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1], > + operands[2])); > else > -return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12"; > -} > - [(set_attr "type" "veccomplex") > - (set (attr "length") > - (if_then_else > - (match_test "(VECTOR_ELT_ORDER_BIG)") > - (const_string "4") > - (const_string "12")))]) > +{ > + rtx tmp1 = gen_reg_rtx (V4SImode); > + rtx tmp2 = gen_reg_rtx (V4SImode); > + emit_insn
Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
On 08/24/2017 04:09 PM, Jeff Law wrote: On 08/22/2017 02:45 AM, Richard Biener wrote: On Mon, Aug 21, 2017 at 10:10 PM, Martin Seborwrote: On 08/09/2017 10:14 AM, Jeff Law wrote: On 08/06/2017 05:08 PM, Martin Sebor wrote: Well, simply because the way as implemented isn't a must-alias query but maybe one that's good enough for warnings (reduces false positives but surely doesn't eliminate them). I'm very interested in reducing the rate of false positives in these and all other warnings. As I mentioned in my comments, I did my best to weed them out of the implementation by building GDB, Glibc, Busybox, and the Linux kernel. That of course isn't a guarantee that there aren't any. But the first implementation of any non-trivial feature is never perfect, and hardly any warning of sufficient complexity is free of false positives, no matter here it's implemented (the middle-end, front-end, or a standalone static analysis tool). If you spotted some cases I had missed I'd certainly be grateful for examples. Otherwise, they will undoubtedly be reported as more software is exposed to the warning and, if possible, fixed, as happens with all other warnings. I think Richi is saying that the must alias query you've built isn't proper/correct. It's less about false positives for Richi and more about building a proper must alias query if I understand him correctly. I suspect he's also saying that you can't reasonably build must-alias on top of a may-alias query framework. They're pretty different queries. If you need something that is close to, but not quite a must alias query, then you're going to have to make a argument for that and you can't call it a must alias query. Attached is an updated and simplified patch that avoids making changes to any of the may-alias functions. It turns out that all the information the logic needs to determine the overlap is already in the ao_ref structures populated by ao_ref_init_from_ptr_and_size. The only changes to the pass are the enhancement to ao_ref_init_from_ptr_and_size to make use of range info and the addition of the two new functions used by the -Wrestrict clients outside the pass. Warning for memcpy (p, p, ...) is going to fire false positives all around given the C++ FE emits those in some cases and optimization can expose that we are dealing with self-assignments. And *p = *p is valid. Correct. I wound my way through this mess a while back. Essentially Red Hat had a customer with code that had overlapped memcpy arguments. We had them use the memstomp interposition library to start tracking these problems down. One of the things that popped up was structure/class copies which were implemented via calls to memcpy.In the case of self assignment, the interposition library would note the overlap and (rightly IMHO) complain. Is this bug 32667? I'm not having any luck reproducing it with any of the test cases there and varying struct sizes, or with the test case in the duplicate bug 65029 I filed for the same thing last year. It would be nice to have a test case. One could argue that GCC should emit memmove by default for structure assignments, only using memcpy when it knows its not doing self assignment (which may be hard to determine). Furthermore, GCC should eliminate self structure/class assignment. If it's still a problem emitting memmove seems like the right thing to do. From what I've read the performance advantage of memcpy over memmove seems debatable at best. Most performance sensitive code avoids making copies of very large objects so the only code that can be impacted doesn't care about efficiency quite so much. For small enough objects, inlining the copy as GCC already does would obviate the efficiency concern altogether. @@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, } } + /* Avoid folding the call if overlap is detected. */ + if (check_overlap && detect_overlap (loc, stmt, dest, src, len)) + return false; + no, please not. You diagnosed the call (which might be a false positive) so why keep it undefined? The folded stmt will either have the same semantics (aggregate copies expanded as memcpy) or have all reads performed before writes. So can we distinguish here between overlap and the self-copy case? Yes, but only in a limited subset of cases. A self-copy should just be folded away. It's no different than x = x on scalars except that we've dropped it to a memcpy in the IL. Doing so makes the code more efficient and removes false positives from tools like the memstomp interposition library, making those tools more useful. It's possible to do in simple cases but not in general. I agree that in the general case when overlap is possible the only safe solution, short of actually testing for it at runtime, is to call memmove. Martin
Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
On 08/22/2017 02:45 AM, Richard Biener wrote: On Mon, Aug 21, 2017 at 10:10 PM, Martin Seborwrote: On 08/09/2017 10:14 AM, Jeff Law wrote: On 08/06/2017 05:08 PM, Martin Sebor wrote: Well, simply because the way as implemented isn't a must-alias query but maybe one that's good enough for warnings (reduces false positives but surely doesn't eliminate them). I'm very interested in reducing the rate of false positives in these and all other warnings. As I mentioned in my comments, I did my best to weed them out of the implementation by building GDB, Glibc, Busybox, and the Linux kernel. That of course isn't a guarantee that there aren't any. But the first implementation of any non-trivial feature is never perfect, and hardly any warning of sufficient complexity is free of false positives, no matter here it's implemented (the middle-end, front-end, or a standalone static analysis tool). If you spotted some cases I had missed I'd certainly be grateful for examples. Otherwise, they will undoubtedly be reported as more software is exposed to the warning and, if possible, fixed, as happens with all other warnings. I think Richi is saying that the must alias query you've built isn't proper/correct. It's less about false positives for Richi and more about building a proper must alias query if I understand him correctly. I suspect he's also saying that you can't reasonably build must-alias on top of a may-alias query framework. They're pretty different queries. If you need something that is close to, but not quite a must alias query, then you're going to have to make a argument for that and you can't call it a must alias query. Attached is an updated and simplified patch that avoids making changes to any of the may-alias functions. It turns out that all the information the logic needs to determine the overlap is already in the ao_ref structures populated by ao_ref_init_from_ptr_and_size. The only changes to the pass are the enhancement to ao_ref_init_from_ptr_and_size to make use of range info and the addition of the two new functions used by the -Wrestrict clients outside the pass. Warning for memcpy (p, p, ...) is going to fire false positives all around given the C++ FE emits those in some cases and optimization can expose that we are dealing with self-assignments. And *p = *p is valid. I changed it to only warn for calls to the library function and not to the __builtin_xxx. Though I haven't been able to reproduce the problem you referring to (bug 32667) which makes me wonder if it's been fixed. @@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, } } + /* Avoid folding the call if overlap is detected. */ + if (check_overlap && detect_overlap (loc, stmt, dest, src, len)) + return false; + no, please not. You diagnosed the call (which might be a false positive) so why keep it undefined? The folded stmt will either have the same semantics (aggregate copies expanded as memcpy) or have all reads performed before writes. My goal was to follow the approach reflected in the comments elsewhere in the file: /* Out of bound array access. Value is undefined, but don't fold. */ While gimple_fold_builtin_memory_op may be able to provide well- defined behavior for the otherwise undefined semantics in a small subset of cases, it doesn't attempt to fold many more that it otherwise could (it only folds calls with sizes that are powers of 2). So it seems of dubious value to make an effort in this relatively small subset of cases. In my experience, users also don't appreciate optimizations that "rely on" undefined behavior one way or the other. What they would like to see instead is that when their compiler detects undefined behavior it diagnoses it but either doesn't use it to make optimization decisions, or uses it to disable them. For calls to library functions, that in my view means making the call and not folding it. (Btw., do we have some sort of a policy or guideline for how to handle such cases in general?) With all that said, I don't see a big problem with proceeding with the folding as you suggest either, so I did and added a comment documenting it and a test to verify this guarantee. I should also acknowledge that in my approach I forgot that once the overlap has been diagnosed and the no-warning bit set, the next call to gimple_fold_builtin_memory_op() with the same statement would just go ahead and fold it anyway. So the tests were ineffective regardless. The ao_ref_init_from_ptr_and_size change misses a changelog entry. +detect_overlap (location_t loc, gimple *stmt, tree dst, tree src, tree size, + bool adjust /* = false */) +{ + ao_ref dstref, srcref; + unsigned HOST_WIDE_INT range[2]; + + /* Initialize and store the lower bound of a constant offset (in + bits), disregarding the offset for the destination. */ + ao_ref_init_from_ptr_and_size (,
Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170
On Mon, Aug 28, 2017 at 08:27:35AM -0600, Jeff Law wrote: > So sorry for the horrible delay. What was the final resolution here? I > saw a lot of back and forth with HJ and yourself. 80044 is > CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the > ASSIGNED state. The patch went in trunk as 5a402d649 (r250974) and a9b2df6cc (r251065). PR81170 and PR81295 are still open due to needing a fix for powerpc --enable-default-pie on the branches. Last I checked, both patches apply without any difficulty. https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00831.html -- Alan Modra Australia Development Lab, IBM
[PATCH, rs6000] Stop non-volatile CR usage from killing shrink-wrap
The following patch allows shrink-wrapping to succeed in the presence of non-volatile CR save/restore. The movesi_from_cr define_insn used to list all CRs as used, even though it's only the non-volatile values that we are interested in saving/restoring. This prevented the prolog from being moved past the early exit test because that compare was defining a register used in the prolog (a volatile CR). The patch removes the mentions of the volatile CRs and renames the functions involved so that it's hopefully clear they are for prolog generation only. Bootstrap/regtest on powerpc64le-linux with no new regressions. Ok for trunk? -Pat 2017-08-28 Pat Haugen* config/rs6000/rs6000.c (rs6000_emit_prolog_move_from_cr): Rename from rs6000_emit_move_from_cr and call renamed function. (rs6000_emit_prologue): Call renamed functions. * config/rs6000/rs6000.md (prolog_movesi_from_cr): Rename from prolog_movesi_from_cr, remove volatile CRs. testsuite/ChangeLog: 2017-08-28 Pat Haugen * gcc.target/powerpc/cr_shrink-wrap.c: New. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 251389) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -26083,10 +26083,14 @@ rs6000_emit_savres_rtx (rs6000_stack_t * return insn; } -/* Emit code to store CR fields that need to be saved into REG. */ +/* Emit prolog code to store CR fields that need to be saved into REG. This + function should only be called when moving the non-volatile CRs to REG, it + is not a general purpose routine to move the entire set of CRs to REG. + Specifically, gen_prolog_movesi_from_cr() does not contain uses of the + volatile CRs. */ static void -rs6000_emit_move_from_cr (rtx reg) +rs6000_emit_prolog_move_from_cr (rtx reg) { /* Only the ELFv2 ABI allows storing only selected fields. */ if (DEFAULT_ABI == ABI_ELFv2 && TARGET_MFCRF) @@ -26117,7 +26121,7 @@ rs6000_emit_move_from_cr (rtx reg) as well, using logical operations to combine the values. */ } - emit_insn (gen_movesi_from_cr (reg)); + emit_insn (gen_prolog_movesi_from_cr (reg)); } /* Return whether the split-stack arg pointer (r12) is used. */ @@ -26857,7 +26861,7 @@ rs6000_emit_prologue (void) { cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno); START_USE (cr_save_regno); - rs6000_emit_move_from_cr (cr_save_rtx); + rs6000_emit_prolog_move_from_cr (cr_save_rtx); } /* Do any required saving of fpr's. If only one or two to save, do @@ -27095,7 +27099,7 @@ rs6000_emit_prologue (void) { START_USE (0); cr_save_rtx = gen_rtx_REG (SImode, 0); - rs6000_emit_move_from_cr (cr_save_rtx); + rs6000_emit_prolog_move_from_cr (cr_save_rtx); } /* Saving CR requires a two-instruction sequence: one instruction @@ -27182,7 +27186,7 @@ rs6000_emit_prologue (void) /* ??? We might get better performance by using multiple mfocrf instructions. */ crsave = gen_rtx_REG (SImode, 0); - emit_insn (gen_movesi_from_cr (crsave)); + emit_insn (gen_prolog_movesi_from_cr (crsave)); for (i = 0; i < 8; i++) if (!call_used_regs[CR0_REGNO + i]) Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 251389) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -13032,12 +13032,14 @@ (define_insn "*movesi_from_cr_one" }" [(set_attr "type" "mfcrf")]) -(define_insn "movesi_from_cr" +;; Don't include the volatile CRs since their values are not used wrt CR save +;; in the prolog and doing so prevents shrink-wrapping because we can't move the +;; prolog past an insn (early exit test) that defines a register used in the +;; prolog. +(define_insn "prolog_movesi_from_cr" [(set (match_operand:SI 0 "gpc_reg_operand" "=r") -(unspec:SI [(reg:CC CR0_REGNO) (reg:CC CR1_REGNO) - (reg:CC CR2_REGNO) (reg:CC CR3_REGNO) - (reg:CC CR4_REGNO) (reg:CC CR5_REGNO) - (reg:CC CR6_REGNO) (reg:CC CR7_REGNO)] +(unspec:SI [(reg:CC CR2_REGNO) (reg:CC CR3_REGNO) + (reg:CC CR4_REGNO)] UNSPEC_MOVESI_FROM_CR))] "" "mfcr %0" Index: gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c === --- gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */ + +void foo(int i) +{ + if (i > 0) +/* Non-volatile CR kill on true path should not prevent shrink-wrap. */ +asm ("" : : : "cr2", "cr3"); +} + +/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */
[PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)
Hi, PR81833 identifies a problem with the little-endian vector multiply-sum instructions. The original implementation is quite poor (and I am allowed to say that, since it was mine). This patch fixes the code properly. The revised code still uses UNSPECs for these ops, which is not strictly necessary, although descriptive rtl for them would be pretty complex. I've put in a FIXME to make note of that for a future cleanup. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. I am currently testing on powerpc64-linux-gnu for 32- and 64-bit. Provided that testing succeeds, is this ok for trunk, and for eventual backport to all supported releases? Thanks, Bill [gcc] 2017-08-28 Bill SchmidtPR target/81833 * config/rs6000/altivec.md (altivec_vsum2sws): Convert from a define_insn to a define_expand. (altivec_vsum2sws_direct): New define_insn. (altivec_vsumsws): Convert from a define_insn to a define_expand. [gcc/testsuite] 2017-08-28 Bill Schmidt PR target/81833 * gcc.target/powerpc/pr81833.c: New file. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 251369) +++ gcc/config/rs6000/altivec.md(working copy) @@ -1804,51 +1804,61 @@ "vsum4ss %0,%1,%2" [(set_attr "type" "veccomplex")]) -;; FIXME: For the following two patterns, the scratch should only be -;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should -;; be emitted separately. -(define_insn "altivec_vsum2sws" - [(set (match_operand:V4SI 0 "register_operand" "=v") -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") - (match_operand:V4SI 2 "register_operand" "v")] -UNSPEC_VSUM2SWS)) - (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR)) - (clobber (match_scratch:V4SI 3 "=v"))] +(define_expand "altivec_vsum2sws" + [(use (match_operand:V4SI 0 "register_operand")) + (use (match_operand:V4SI 1 "register_operand")) + (use (match_operand:V4SI 2 "register_operand"))] "TARGET_ALTIVEC" { if (VECTOR_ELT_ORDER_BIG) -return "vsum2sws %0,%1,%2"; +emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1], +operands[2])); else -return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4"; -} - [(set_attr "type" "veccomplex") - (set (attr "length") - (if_then_else - (match_test "VECTOR_ELT_ORDER_BIG") - (const_string "4") - (const_string "12")))]) +{ + rtx tmp1 = gen_reg_rtx (V4SImode); + rtx tmp2 = gen_reg_rtx (V4SImode); + emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2], + operands[2], GEN_INT (12))); + emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1)); + emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2, + GEN_INT (4))); +} + DONE; +}) -(define_insn "altivec_vsumsws" +; FIXME: This can probably be expressed without an UNSPEC. +(define_insn "altivec_vsum2sws_direct" [(set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") - (match_operand:V4SI 2 "register_operand" "v")] -UNSPEC_VSUMSWS)) - (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR)) - (clobber (match_scratch:V4SI 3 "=v"))] + (match_operand:V4SI 2 "register_operand" "v")] +UNSPEC_VSUM2SWS))] "TARGET_ALTIVEC" + "vsum2sws %0,%1,%2" + [(set_attr "type" "veccomplex") + (set_attr "length" "4")]) + +(define_expand "altivec_vsumsws" + [(use (match_operand:V4SI 0 "register_operand")) + (use (match_operand:V4SI 1 "register_operand")) + (use (match_operand:V4SI 2 "register_operand"))] + "TARGET_ALTIVEC" { if (VECTOR_ELT_ORDER_BIG) -return "vsumsws %0,%1,%2"; +emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1], + operands[2])); else -return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12"; -} - [(set_attr "type" "veccomplex") - (set (attr "length") - (if_then_else - (match_test "(VECTOR_ELT_ORDER_BIG)") - (const_string "4") - (const_string "12")))]) +{ + rtx tmp1 = gen_reg_rtx (V4SImode); + rtx tmp2 = gen_reg_rtx (V4SImode); + emit_insn (gen_altivec_vspltw_direct (tmp1, operands[2], const0_rtx)); + emit_insn (gen_altivec_vsumsws_direct (tmp2, operands[1], tmp1)); + emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2, + GEN_INT (12))); +} + DONE; +}) +; FIXME: This can probably be expressed without an UNSPEC. (define_insn
Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers
Hi Thomas, >>> I think an unconditional warning is OK >>> in this case because >>> >>> - Assigning to a pointer from an obvious non-contiguous target >>>is not useful at all, that I can see >> >> >> I guess you're talking about a *contiguous* pointer here, > > Correct. > >> and in that >> case I would argue that, beyond being "not useful", it's even illegal, >> so why not throw a hard error, if we can infer at compile-time that >> the target is non-contiguous? > > Problem is, we cannot infer this from the tests done. > We would also have to add a test if the array is empty > or that it contains only a single element, and that (I think) > is a) impossible in the general case, and b) not worth the bother. I'm not sure I understand which cases you're worried about here. Maybe you can give an example? In any case, I think your test case is a bit short, so I extended it somewhat (see attachment) and found two cases along the way where your patch throws a warning but shouldn't: r => x(::-1) r => x2(2:3,1) The first one is a stride of minus one, which is valid and contiguous AFAICS. The second one is obviously also contiguous. >> As explained above, I'd vote for an error (or at least a conditional >> warning, so that it can be disabled, like most(?) other gfortran >> warnings). > > Attached is a version which makes this a warning enabled by -Wall; > this should be enough to give people a heads-up. > > I have introduced most of your comments into the revised patch. Thank you, looks much better already. Apart from the two mishandled cases above, one other thing comes to my mind: It might be a good idea to apply your checks not only to pointer assignments, but also to dummy arguments (passing a non-contiguous array to a contiguous dummy pointer), where the same rules should apply. > So, here is the updated patch. Regression-tested on > powerpc-linux, make dvi and make pdf also passed. > OK for trunk? Not quite, but we're getting closer :) Sorry for being such a nag ... Cheers, Janus ! { dg-do compile } ! { dg-additional-options "-Wcontiguous" } program cont_01_neg implicit none real, pointer, contiguous :: r(:) real, pointer, contiguous :: r2(:,:) real, target :: x(45) real, target :: x2(5,9) integer :: i x = (/ (real(i),i=1,45) /) x2 = reshape(x,shape(x2)) i = 4 r => x(::1) r => x(::-1) ! bogus warning here r => x(::3) ! { dg-warning "Assignment to contiguous pointer" } r => x(::-3) ! { dg-warning "Assignment to contiguous pointer" } r => x(::i) ! { dg-warning "Assignment to contiguous pointer" } r2 => x2(:,2:) r2 => x2(1:,:) r2 => x2(2:,:)! { dg-warning "Assignment to contiguous pointer" } r2 => x2(:,:4) r2 => x2(:5,:) r2 => x2(:4,:)! { dg-warning "Assignment to contiguous pointer" } r2 => x2(:,2:3) r2 => x2(1:5,:) r2 => x2(2:3,:) ! { dg-warning "Assignment to contiguous pointer" } r => x2(3,:) ! { dg-warning "Assignment to contiguous pointer" } r => x2(:,3) r2 => x2(3:3,:) ! { dg-warning "Assignment to contiguous pointer" } r2 => x2(:,3:3) r2 => x2(2:3:2,:) ! { dg-warning "Assignment to contiguous pointer" } r2 => x2(:,2:3:2) ! { dg-warning "Assignment to contiguous pointer" } r => x2(1,2:3) r => x2(2:3,1)! bogus warning here end program
[PATCH v3] Fix PR81503 (SLSR invalid fold)
> On Aug 28, 2017, at 7:37 AM, Richard Biener> wrote: > > Not sure, but would it be fixed in a similar way when writing > ? Thanks, Richard, that works very well. I decided this was a good opportunity to also simplify the control flow a little with early returns. Here's the result. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk, and eventually for backport to gcc 5, 6, and 7? (I can omit the control flow cleanups for the older releases if desired.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt Jakub Jelinek Richard Biener PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type; reorder tests for clarity. [gcc/testsuite] 2017-08-03 Bill Schmidt Jakub Jelinek Richard Biener PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 251369) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2089,104 +2089,104 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); - /* It is highly unlikely, but possible, that the resulting - bump doesn't fit in a HWI. Abandon the replacement - in this case. This does not affect siblings or dependents - of C. Restriction to signed HWI is conservative for unsigned - types but allows for safe negation without twisted logic. */ - if (wi::fits_shwi_p (bump) - && bump.to_shwi () != HOST_WIDE_INT_MIN - /* It is not useful to replace casts, copies, negates, or adds of -an SSA name and a constant. */ - && cand_code != SSA_NAME - && !CONVERT_EXPR_CODE_P (cand_code) - && cand_code != PLUS_EXPR - && cand_code != POINTER_PLUS_EXPR - && cand_code != MINUS_EXPR - && cand_code != NEGATE_EXPR) + /* It is not useful to replace casts, copies, negates, or adds of + an SSA name and a constant. */ + if (cand_code == SSA_NAME + || CONVERT_EXPR_CODE_P (cand_code) + || cand_code == PLUS_EXPR + || cand_code == POINTER_PLUS_EXPR + || cand_code == MINUS_EXPR + || cand_code == NEGATE_EXPR) +return; + + enum tree_code code = PLUS_EXPR; + tree bump_tree; + gimple *stmt_to_print = NULL; + + if (wi::neg_p (bump)) { - enum tree_code code = PLUS_EXPR; - tree bump_tree; - gimple *stmt_to_print = NULL; + code = MINUS_EXPR; + bump = -bump; +} - /* If the basis name and the candidate's LHS have incompatible -types, introduce a cast. */ - if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) - basis_name = introduce_cast_before_cand (c, target_type, basis_name); - if (wi::neg_p (bump)) + /* It is possible that the resulting bump doesn't fit in target_type. + Abandon the replacement in this case. This does not affect + siblings or dependents of C. */ + if (bump != wi::ext (bump, TYPE_PRECISION (target_type), + TYPE_SIGN (target_type))) +return; + + bump_tree = wide_int_to_tree (target_type, bump); + + /* If the basis name and the candidate's LHS have incompatible types, + introduce a cast. */ + if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) +basis_name = introduce_cast_before_cand (c, target_type, basis_name); + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fputs ("Replacing: ", dump_file); + print_gimple_stmt (dump_file, c->cand_stmt, 0); +} + + if (bump == 0) +{ + tree lhs = gimple_assign_lhs (c->cand_stmt); + gassign *copy_stmt = gimple_build_assign (lhs, basis_name); + gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt); + slsr_cand_t cc = c; + gimple_set_location (copy_stmt, gimple_location (c->cand_stmt)); + gsi_replace (, copy_stmt, false); + c->cand_stmt = copy_stmt; + while (cc->next_interp) { - code = MINUS_EXPR; - bump = -bump; + cc = lookup_cand (cc->next_interp); + cc->cand_stmt = copy_stmt; } - - bump_tree = wide_int_to_tree (target_type, bump); - if (dump_file && (dump_flags & TDF_DETAILS)) + stmt_to_print = copy_stmt; +} + else +{ + tree rhs1, rhs2; + if (cand_code != NEGATE_EXPR) { + rhs1 = gimple_assign_rhs1 (c->cand_stmt); + rhs2 = gimple_assign_rhs2 (c->cand_stmt); + } + if (cand_code != NEGATE_EXPR + && ((operand_equal_p (rhs1, basis_name,
Rb_tree constructor optimization
Hi Here is the always equal allocator optimization for associative containers. Tested under Linux x86_64. * include/bits/stl_tree.h (_Rb_tree_impl(_Rb_tree_impl&&, _Node_allocator&&)): New. (_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::true_type)): New. (_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::false_type)): Likewise. (_Rb_tree(_Rb_tree&&, _Node_allocator&&)): Adapt, use latters. Ok to apply ? François diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h index c2417f1..f7d34e3 100644 --- a/libstdc++-v3/include/bits/stl_tree.h +++ b/libstdc++-v3/include/bits/stl_tree.h @@ -704,6 +704,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #else _Rb_tree_impl(_Rb_tree_impl&&) = default; + _Rb_tree_impl(_Rb_tree_impl&& __x, _Node_allocator&& __a) noexcept + : _Node_allocator(std::move(__a)), + _Base_key_compare(std::move(__x)), + _Rb_tree_header(std::move(__x)) + { } + _Rb_tree_impl(const _Key_compare& __comp, _Node_allocator&& __a) : _Node_allocator(std::move(__a)), _Base_key_compare(__comp) { } @@ -947,7 +953,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _Rb_tree(std::move(__x), _Node_allocator(__a)) { } - _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a); +private: + _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::true_type) noexcept + : _M_impl(std::move(__x._M_impl), std::move(__a)) + { } + + _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::false_type); + +public: + _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a) + : _Rb_tree(std::move(__x), std::move(__a), + typename _Alloc_traits::is_always_equal{}) + { } #endif ~_Rb_tree() _GLIBCXX_NOEXCEPT @@ -1591,12 +1608,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>:: -_Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a) +_Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::false_type __eq) : _M_impl(__x._M_impl._M_key_compare, std::move(__a)) { - using __eq = typename _Alloc_traits::is_always_equal; if (__x._M_root() != nullptr) - _M_move_data(__x, __eq()); + _M_move_data(__x, __eq); } template
Re: [06/77] Make GET_MODE_WIDER return an opt_mode
On 08/28/2017 01:05 PM, Richard Sandiford wrote: > >> As for the name, get_nonvoid? Ugh. Not sure. Open to suggestions. > > I'd rather avoid "nonvoid", since the use of VOIDmode for "no mode" is > really an implementation detail in things like opt_mode . > Other possiblities might be: Yea, good point on encoding the implementation detail not being a good idea. > > - require > - demand > - mode > - get_mode > - require_mode > - demand_mode > - else_fail (to go with else_void and else_blk) > - noelse require, demand with or without the _mode suffix seem good to me. jeff
Re: std::forward_list optim for always equal allocator
Hi Any news for this patch ? It does remove a constructor: -_Fwd_list_impl(const _Node_alloc_type& __a) -: _Node_alloc_type(__a), _M_head() It was already unused before the patch. Do you think it has ever been used and so do I need to restore it ? I eventually restore the _M_head() in _Fwd_list_impl constructors cause IMO it is the inline init of _M_next in _Fwd_list_node_base which should be removed. But I remember Jonathan that you didn't want to do so because gcc was not good enough in detecting usage of uninitialized variables, is it still true ? François On 17/07/2017 22:10, François Dumont wrote: Hi Here is the patch to implement the always equal alloc optimization for forward_list. With this version there is no abi issue. I also prefer to implement the _Fwd_list_node_base move operator for consistency with the move constructor and used it where applicable. * include/bits/forward_list.h (_Fwd_list_node_base& operator=(_Fwd_list_node_base&&)): Implement. (_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New. (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, std::true_type)): New, use latter. (forward_list(forward_list&&, _Node_alloc_type&&, std::false_type)): New. (forward_list(forward_list&&, _Node_alloc_type&&, std::true_type)): New. (forward_list(forward_list&&, const _Alloc&)): Adapt to use latters. * include/bits/forward_list.tcc (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&)): Adapt to use _M_impl._M_head move assignment. (forward_list<>::merge(forward_list<>&&, _Comp)): Likewise. Tested under Linux x86_64, ok to commit ? François diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h index 9d86fcc..772e9a0 100644 --- a/libstdc++-v3/include/bits/forward_list.h +++ b/libstdc++-v3/include/bits/forward_list.h @@ -54,6 +54,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER struct _Fwd_list_node_base { _Fwd_list_node_base() = default; +_Fwd_list_node_base(_Fwd_list_node_base&& __x) noexcept + : _M_next(__x._M_next) +{ __x._M_next = nullptr; } + +_Fwd_list_node_base(const _Fwd_list_node_base&) = delete; +_Fwd_list_node_base& operator=(const _Fwd_list_node_base&) = delete; + +_Fwd_list_node_base& +operator=(_Fwd_list_node_base&& __x) noexcept +{ + _M_next = __x._M_next; + __x._M_next = nullptr; + return *this; +} _Fwd_list_node_base* _M_next = nullptr; @@ -68,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER __end->_M_next = _M_next; } else - __begin->_M_next = 0; + __begin->_M_next = nullptr; _M_next = __keep; return __end; } @@ -173,7 +187,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER if (_M_node) return _Fwd_list_iterator(_M_node->_M_next); else - return _Fwd_list_iterator(0); + return _Fwd_list_iterator(nullptr); } _Fwd_list_node_base* _M_node; @@ -244,7 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER if (this->_M_node) return _Fwd_list_const_iterator(_M_node->_M_next); else - return _Fwd_list_const_iterator(0); + return _Fwd_list_const_iterator(nullptr); } const _Fwd_list_node_base* _M_node; @@ -285,11 +299,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _Fwd_list_node_base _M_head; _Fwd_list_impl() + noexcept( noexcept(_Node_alloc_type()) ) : _Node_alloc_type(), _M_head() { } -_Fwd_list_impl(const _Node_alloc_type& __a) -: _Node_alloc_type(__a), _M_head() + _Fwd_list_impl(_Fwd_list_impl&&) = default; + + _Fwd_list_impl(_Fwd_list_impl&& __fl, _Node_alloc_type&& __a) + : _Node_alloc_type(std::move(__a)), _M_head(std::move(__fl._M_head)) { } _Fwd_list_impl(_Node_alloc_type&& __a) @@ -312,26 +329,26 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _M_get_Node_allocator() const noexcept { return this->_M_impl; } - _Fwd_list_base() - : _M_impl() { } + _Fwd_list_base() = default; _Fwd_list_base(_Node_alloc_type&& __a) : _M_impl(std::move(__a)) { } + // When allocators are always equal. + _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a, + std::true_type) + : _M_impl(std::move(__lst._M_impl), std::move(__a)) + { } + + // When allocators are not always equal. _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a); - _Fwd_list_base(_Fwd_list_base&& __lst) - : _M_impl(std::move(__lst._M_get_Node_allocator())) - { - this->_M_impl._M_head._M_next = __lst._M_impl._M_head._M_next; - __lst._M_impl._M_head._M_next = 0; - } + _Fwd_list_base(_Fwd_list_base&&) = default; ~_Fwd_list_base() - { _M_erase_after(&_M_impl._M_head, 0); } + { _M_erase_after(&_M_impl._M_head, nullptr); } protected: - _Node* _M_get_node() { @@ -437,10 +454,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER /**
Re: [C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS
Some quick tests show that memory use increased by 1.5% Compilation time reduced by 1% to 3% Your comment on IRC about not needing a identifier->decl map, just a decl hash, because the decl already has the identifier is a good one. I recall considering that when converting namespaces, but there the anonymous namespace has a NULL name, which is annoying. I just used the same map table here. However, it's probably worth the effort. It'd also be nice if the hash table primitive didn't unconditionally hold instrumentation counters. AFAICT they only get folded to global counters upon destruction of the hash table. Which for these things never happens. Remember, when I succeed in folding METHOD_VEC into the same structure, we'll have several members in this table for all but the simplest of structures. nathan -- Nathan Sidwell
Re: [06/77] Make GET_MODE_WIDER return an opt_mode
Jeff Lawwrites: > On 08/11/2017 12:24 PM, Richard Sandiford wrote: >> Jeff Law writes: >>> On 07/13/2017 02:40 AM, Richard Sandiford wrote: GET_MODE_WIDER previously returned VOIDmode if no wider mode existed. That would cause problems with stricter mode classes, since VOIDmode isn't for example a valid scalar integer or floating-point mode. This patch instead makes it return a new opt_mode class, which holds either a T or nothing. 2017-07-13 Richard Sandiford Alan Hayward David Sherwood gcc/ * coretypes.h (opt_mode): New class. * machmode.h (opt_mode): Likewise. (opt_mode::else_void): New function. (opt_mode::operator *): Likewise. (opt_mode::exists): Likewise. (GET_MODE_WIDER_MODE): Turn into a function and return an opt_mode. (GET_MODE_2XWIDER_MODE): Likewise. (mode_iterator::get_wider): Update accordingly. (mode_iterator::get_2xwider): Likewise. (mode_iterator::get_known_wider): Likewise, turning into a template. * combine.c (make_extraction): Update use of GET_MODE_WIDER_MODE, forcing a wider mode to exist. * config/cr16/cr16.h (LONG_REG_P): Likewise. * rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise. * config/c6x/c6x.c (c6x_rtx_costs): Update use of GET_MODE_2XWIDER_MODE, forcing a wider mode to exist. * lower-subreg.c (init_lower_subreg): Likewise. * optabs-libfuncs.c (init_sync_libfuncs_1): Likewise, but not on the final iteration. * config/i386/i386.c (ix86_expand_set_or_movmem): Check whether a wider mode exists before asking for a move pattern. (get_mode_wider_vector): Update use of GET_MODE_WIDER_MODE, forcing a wider mode to exist. (expand_vselect_vconcat): Update use of GET_MODE_2XWIDER_MODE, returning false if no such mode exists. * config/ia64/ia64.c (expand_vselect_vconcat): Likewise. * config/mips/mips.c (mips_expand_vselect_vconcat): Likewise. * expmed.c (init_expmed_one_mode): Update use of GET_MODE_WIDER_MODE. Avoid checking for a MODE_INT if we already know the mode is not a SCALAR_INT_MODE_P. (extract_high_half): Update use of GET_MODE_WIDER_MODE, forcing a wider mode to exist. (expmed_mult_highpart_optab): Likewise. (expmed_mult_highpart): Likewise. * expr.c (expand_expr_real_2): Update use of GET_MODE_WIDER_MODE, using else_void. * lto-streamer-in.c (lto_input_mode_table): Likewise. * optabs-query.c (find_widening_optab_handler_and_mode): Likewise. * stor-layout.c (bit_field_mode_iterator::next_mode): Likewise. * internal-fn.c (expand_mul_overflow): Update use of GET_MODE_2XWIDER_MODE. * omp-low.c (omp_clause_aligned_alignment): Likewise. * tree-ssa-math-opts.c (convert_mult_to_widen): Update use of GET_MODE_WIDER_MODE. (convert_plusminus_to_widen): Likewise. * tree-switch-conversion.c (array_value_type): Likewise. * var-tracking.c (emit_note_insn_var_location): Likewise. * tree-vrp.c (simplify_float_conversion_using_ranges): Likewise. Return false inside rather than outside the loop if no wider mode exists * optabs.c (expand_binop): Update use of GET_MODE_WIDER_MODE and GET_MODE_2XWIDER_MODE (can_compare_p): Use else_void. * gdbhooks.py (OptMachineModePrinter): New class. (build_pretty_printer): Use it for opt_mode. gcc/ada/ * gcc-interface/decl.c (validate_size): Update use of GET_MODE_WIDER_MODE, forcing a wider mode to exist. >>> I'm not a big fan of the API here, particularly using operator* to >>> handle asserting the mode exists. I'd prefer to just use a member >>> function rather than overloading operator*. >>> >>> What's the rationale behind using operator* to imply the assertion? >>> >>> THe changes themsleves look fine, it's really just a question of the API >>> we present. >> >> The original idea was to make opt_mode look pointer-ish, so that >> the dyn_cast <...> result could be used in the same way as for >> dyn_cast etc. The first cut therefore had operator bool () >> to test whether there was a mode and operator * to dereference it. >> >> However, operator bool () created various subtle problems (as it always >> seems to) so we dropped it in favour of exists (). I was neutral >> on whether we should keep '*' or switch to a function, so in the >> end the status quo won out. I'm happy to change it to a named >> accessor though. >> >> Any better ideas than "get ()" for the name? Maybe something >> to emphasis that it is asserting for non-nullness/non-emptiness >> (which '*' does implicitly)? > > Yea, when I was
Re: [Patch, Fortran] PR 81770: [5/6/7 Regression] Bogus warning: Pointer in pointer assignment might outlive the pointer target
2017-08-28 10:31 GMT+02:00 Thomas Koenig: > Hi Janus, > >> the attached patch fixes a bogus warning. The purpose of the warning >> is to detect cases where a pointer lives longer than its target. If >> the target itself is (1) a pointer or (2) a component of a DT pointer, >> we do not know about the lifetime of the target at compile time and no >> warning should be thrown. The existing check only handles case (1) and >> my patch adds the handling of case (2). >> >> Regtestes cleanly on x86_64-linux-gnu. Ok for trunk and the release >> branches? > > > OK, and thanks for the patch! Thanks, Thomas! Committed as r251390 (together with a small typo fix in a related test case): https://gcc.gnu.org/viewcvs/gcc?view=revision=251390 Will take care of the release branches in a couple of days (probably on the weekend). Cheers, Janus
[PATCH], Fix PR 81959 (power9 IEEE 128-bit float convert from 32-bit memory)
When I added the optimization for loading 32-bit values directly into the vector registers from memory to convert to IEEE 128-bit floating point, I forgot to make sure the address did not have PRE_INCREMENT, etc. addressing. I checked the compiler on a little endian power8 system. Is it ok to check this patch into the trunk and back port it GCC 7? GCC 6 did not have the optimization. [gcc] 2017-08-28 Michael MeissnerPR target/81959 * config/rs6000/rs6000.md (float_si2_hw): If register allocation hasn't been done, make sure the memory address is X-FORM (register+register). (floatuns_si2_hw2): Likewise. [gcct/testsuite] 2017-08-28 Michael Meissner PR target/81959 * gcc.target/powerpc/pr81959.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 251358) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_si2_ { if (GET_CODE (operands[2]) == SCRATCH) operands[2] = gen_reg_rtx (DImode); + + if (MEM_P (operands[1]) && !reload_completed) +operands[1] = rs6000_address_for_fpconvert (operands[1]); }) (define_insn_and_split "float2" @@ -14568,6 +14571,9 @@ (define_insn_and_split "floatuns_s { if (GET_CODE (operands[2]) == SCRATCH) operands[2] = gen_reg_rtx (DImode); + + if (MEM_P (operands[1]) && !reload_completed) +operands[1] = rs6000_address_for_fpconvert (operands[1]); }) (define_insn_and_split "floatuns2" Index: gcc/testsuite/gcc.target/powerpc/pr81959.c === --- gcc/testsuite/gcc.target/powerpc/pr81959.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr81959.c (revision 0) @@ -0,0 +1,25 @@ +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */ + +/* PR 81959, the compiler raised on unrecognizable insn message in converting + int to __float128, where the int had a PRE_INC in the address. */ + +#ifndef ARRAY_SIZE +#define ARRAY_SIZE 1024 +#endif + +void +convert_int_to_float128 (__float128 * __restrict__ p, +int * __restrict__ q) +{ + unsigned long i; + + for (i = 0; i < ARRAY_SIZE; i++) +p[i] = (__float128)q[i]; +} + +/* { dg-final { scan-assembler {\mlfiwax\M|\mlxsiwax\M} } } */ +/* { dg-final { scan-assembler {\mxscvsdqp\M} } } */ +/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */ +/* { dg-final { scan-assembler-not {\mmtvsrw[sz]\M} } } */
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Aug 28, 2017, at 1:40 PM, Bill Schmidtwrote: > >> >> On Aug 28, 2017, at 12:57 PM, Bill Schmidt >> wrote: >> >> On Aug 28, 2017, at 7:37 AM, Richard Biener >> wrote: >>> >>> On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt >>> wrote: Hi, Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? Eventually this should be backported to all active releases as well. Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type. [gcc/testsuite] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 250791) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ { tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); + unsigned int prec = TYPE_PRECISION (target_type); + tree maxval = (POINTER_TYPE_P (target_type) +? TYPE_MAX_VALUE (sizetype) +: TYPE_MAX_VALUE (target_type)); /* It is highly unlikely, but possible, that the resulting bump doesn't fit in a HWI. Abandon the replacement @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ types but allows for safe negation without twisted logic. */ if (wi::fits_shwi_p (bump) && bump.to_shwi () != HOST_WIDE_INT_MIN + /* It is more likely that the bump doesn't fit in the target +type, so check whether constraining it to that type changes +the value. For a signed type, the value mustn't change. +For an unsigned type, the value may only change to a +congruent value (for negative bumps). */ + && (TYPE_UNSIGNED (target_type) + ? wi::eq_p (wi::neg_p (bump) + ? bump + wi::to_widest (maxval) + 1 + : bump, + wi::zext (bump, prec)) + : wi::eq_p (bump, wi::sext (bump, prec))) >>> >>> Not sure, but would it be fixed in a similar way when writing >>> >>> @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t >>> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >>> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >>> >>> - /* It is highly unlikely, but possible, that the resulting >>> - bump doesn't fit in a HWI. Abandon the replacement >>> - in this case. This does not affect siblings or dependents >>> - of C. Restriction to signed HWI is conservative for unsigned >>> - types but allows for safe negation without twisted logic. */ >>> - if (wi::fits_shwi_p (bump) >>> - && bump.to_shwi () != HOST_WIDE_INT_MIN >>> - /* It is not useful to replace casts, copies, negates, or adds of >>> -an SSA name and a constant. */ >>> - && cand_code != SSA_NAME >>> + /* It is not useful to replace casts, copies, negates, or adds of >>> + an SSA name and a constant. */ >>> + if (cand_code != SSA_NAME >>> && !CONVERT_EXPR_CODE_P (cand_code) >>> && cand_code != PLUS_EXPR >>> && cand_code != POINTER_PLUS_EXPR >>> @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t >>> tree bump_tree; >>> gimple *stmt_to_print = NULL; >>> >>> - /* If the basis name and the candidate's LHS have incompatible >>> -types, introduce a cast. */ >>> - if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) >>> - basis_name = introduce_cast_before_cand (c, target_type, >>> basis_name); >>> if (wi::neg_p (bump)) >>> { >>>code = MINUS_EXPR; >>>bump = -bump; >>> } >>> + /* It is possible that the resulting bump doesn't fit in target_type. >>> +Abandon the replacement in this case. This does not affect >>> +siblings or dependents of C. */ >>> + if (bump != wi::ext (bump, TYPE_PRECISION (target_type), >>> + TYPE_SIGN (target_type))) >>> + return; >>> >>>
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
> On Aug 28, 2017, at 12:57 PM, Bill Schmidt> wrote: > > On Aug 28, 2017, at 7:37 AM, Richard Biener > wrote: >> >> On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt >> wrote: >>> Hi, >>> >>> Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped >>> and tested on powerpc64le-linux-gnu with no regressions. Is this ok for >>> trunk? >>> >>> Eventually this should be backported to all active releases as well. >>> Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) >>> >>> Thanks, >>> Bill >>> >>> >>> [gcc] >>> >>> 2017-08-03 Bill Schmidt >>> Jakub Jelinek >>> >>> PR tree-optimization/81503 >>> * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure >>> folded constant fits in the target type. >>> >>> [gcc/testsuite] >>> >>> 2017-08-03 Bill Schmidt >>> Jakub Jelinek >>> >>> PR tree-optimization/81503 >>> * gcc.c-torture/execute/pr81503.c: New file. >>> >>> >>> Index: gcc/gimple-ssa-strength-reduction.c >>> === >>> --- gcc/gimple-ssa-strength-reduction.c (revision 250791) >>> +++ gcc/gimple-ssa-strength-reduction.c (working copy) >>> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >>> { >>> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >>> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >>> + unsigned int prec = TYPE_PRECISION (target_type); >>> + tree maxval = (POINTER_TYPE_P (target_type) >>> +? TYPE_MAX_VALUE (sizetype) >>> +: TYPE_MAX_VALUE (target_type)); >>> >>> /* It is highly unlikely, but possible, that the resulting >>> bump doesn't fit in a HWI. Abandon the replacement >>> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >>> types but allows for safe negation without twisted logic. */ >>> if (wi::fits_shwi_p (bump) >>> && bump.to_shwi () != HOST_WIDE_INT_MIN >>> + /* It is more likely that the bump doesn't fit in the target >>> +type, so check whether constraining it to that type changes >>> +the value. For a signed type, the value mustn't change. >>> +For an unsigned type, the value may only change to a >>> +congruent value (for negative bumps). */ >>> + && (TYPE_UNSIGNED (target_type) >>> + ? wi::eq_p (wi::neg_p (bump) >>> + ? bump + wi::to_widest (maxval) + 1 >>> + : bump, >>> + wi::zext (bump, prec)) >>> + : wi::eq_p (bump, wi::sext (bump, prec))) >> >> Not sure, but would it be fixed in a similar way when writing >> >> @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t >> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >> >> - /* It is highly unlikely, but possible, that the resulting >> - bump doesn't fit in a HWI. Abandon the replacement >> - in this case. This does not affect siblings or dependents >> - of C. Restriction to signed HWI is conservative for unsigned >> - types but allows for safe negation without twisted logic. */ >> - if (wi::fits_shwi_p (bump) >> - && bump.to_shwi () != HOST_WIDE_INT_MIN >> - /* It is not useful to replace casts, copies, negates, or adds of >> -an SSA name and a constant. */ >> - && cand_code != SSA_NAME >> + /* It is not useful to replace casts, copies, negates, or adds of >> + an SSA name and a constant. */ >> + if (cand_code != SSA_NAME >> && !CONVERT_EXPR_CODE_P (cand_code) >> && cand_code != PLUS_EXPR >> && cand_code != POINTER_PLUS_EXPR >> @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t >> tree bump_tree; >> gimple *stmt_to_print = NULL; >> >> - /* If the basis name and the candidate's LHS have incompatible >> -types, introduce a cast. */ >> - if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) >> - basis_name = introduce_cast_before_cand (c, target_type, basis_name); >> if (wi::neg_p (bump)) >> { >> code = MINUS_EXPR; >> bump = -bump; >> } >> + /* It is possible that the resulting bump doesn't fit in target_type. >> +Abandon the replacement in this case. This does not affect >> +siblings or dependents of C. */ >> + if (bump != wi::ext (bump, TYPE_PRECISION (target_type), >> + TYPE_SIGN (target_type))) >> + return; >> >> bump_tree = wide_int_to_tree (target_type, bump); >> >> + /* If the basis name and the candidate's LHS have incompatible >> +types, introduce a cast. */ >> +
Re: Fix inconsistent section flags
On Mon, Aug 28, 2017 at 11:42:53AM -0600, Jeff Law wrote: > On 07/17/2017 08:35 AM, Joerg Sonnenberger wrote: > > Hello, > > attached patch fixes inconsistent handling of section flags when using > > the section attribute, i.e.: > > > > __attribute__((section("writable1"))) int foo1; > > __attribute__((section("readonly1"))) const int foo1c; > > __attribute__((section("writable2"))) int foo2 = 42; > > __attribute__((section("readonly2"))) const int foo2c = 42; > > > > should give section attributes of "aw", "a", "aw", "a" in that order. > > Currently, "foo1c" is classified as BSS though and therefore put into a > > writable section. > ISTM the test we need here is whether or not the underlying DECL is > readonly. If it READONLY, then it shouldn't go into .bss, but should > instead to into a readable section. > > Testing based on names seems wrong. > > Does the attached (untested) patch work for you? The intention should work, will test it. The attached patch will likely also be needed on top. Joerg Index: varasm.c === RCS file: /home/joerg/repo/netbsd/src/external/gpl3/gcc/dist/gcc/varasm.c,v retrieving revision 1.2 retrieving revision 1.3 diff -u -p -r1.2 -r1.3 --- varasm.c17 Jul 2017 19:53:10 - 1.2 +++ varasm.c22 Jul 2017 20:52:52 - 1.3 @@ -6428,7 +6428,8 @@ categorize_decl_for_section (const_tree location. -fmerge-all-constants allows even that (at the expense of not conforming). */ ret = SECCAT_RODATA; - else if (TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) + else if (DECL_INITIAL (decl) != NULL + && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) ret = SECCAT_RODATA_MERGE_STR_INIT; else ret = SECCAT_RODATA_MERGE_CONST;
[PATCH] Fix bug in simplify_ternary_operation
Hi, I think I found a bug in r17465: ... * cse.c (simplify_ternary_operation): Handle more IF_THEN_ELSE simplifications. diff --git a/gcc/cse.c b/gcc/cse.c index e001597..3c27387 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4713,6 +4713,17 @@ simplify_ternary_operation (code, mode, op0_mode, op0, op1, op2) Note: the parameters of simplify_ternary_operation have the following meaning: ... /* Simplify CODE, an operation with result mode MODE and three operands, OP0, OP1, and OP2. OP0_MODE was the mode of OP0 before it became a constant. Return 0 if no simplifications is possible. */ rtx simplify_ternary_operation (code, mode, op0_mode, op0, op1, op2) enum rtx_code code; enum machine_mode mode, op0_mode; rtx op0, op1, op2; ... && rtx_equal_p (XEXP (op0, 1), op1) && rtx_equal_p (XEXP (op0, 0), op2)) return op2; + else if (! side_effects_p (op0)) + { + rtx temp; + temp = simplify_relational_operation (GET_CODE (op0), op0_mode, + XEXP (op0, 0), XEXP (op0, 1)); We're handling code == IF_THEN_ELSE here, so op0 is the condition, op1 is the 'then expr' and op2 is the 'else expr'. The parameters of simplify_relational_operation have the following meaning: ... /* Like simplify_binary_operation except used for relational operators. MODE is the mode of the operands, not that of the result. If MODE is VOIDmode, both operands must also be VOIDmode and we compare the operands in "infinite precision". If no simplification is possible, this function returns zero. Otherwise, it returns either const_true_rtx or const0_rtx. */ rtx simplify_relational_operation (code, mode, op0, op1) enum rtx_code code; enum machine_mode mode; rtx op0, op1; ... The problem in the patch is that we use op0_mode argument for the mode parameter. The mode parameter of simplify_relational_operation needs to be the mode of the operands of the condition, while op0_mode is the mode of the condition. Patch below fixes this on current trunk. [ I found this by running into an ICE in gcc.c-torture/compile/pr28776-2.c for gcn target. I haven't been able to reproduce this with an upstream branch yet. ] OK for trunk if bootstrap and reg-test for x86_64 succeeds? Thanks, - Tom diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 0133d43..fbf979b 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -5567,8 +5567,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode, XEXP (op0, 0), XEXP (op0, 1)); } - if (cmp_mode == VOIDmode) - cmp_mode = op0_mode; temp = simplify_relational_operation (GET_CODE (op0), op0_mode, cmp_mode, XEXP (op0, 0), XEXP (op0, 1));
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On 08/10/2017 03:14 PM, Michael Collison wrote: > Hi all, > > One issue that we keep encountering on aarch64 is GCC not making good use of > the flag-setting arithmetic instructions > like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare > the result against zero. > They are represented in a fairly standard way in the backend as PARALLEL > patterns: > (parallel [(set (reg x1) (op (reg x2) (reg x3))) >(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))]) > > GCC isn't forming these from separate arithmetic and comparison instructions > as aggressively as it could. > A particular pain point is when the result of the arithmetic insn is used > before the comparison instruction. > The testcase in this patch is one such example where we have: > (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > (plus:SI (reg:SI 0 x0 [ x ]) > (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > (nil)) > (insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > (nil)) > (insn 34 33 17 2 (set (reg:CC 66 cc) > (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > (nil)) > > This scares combine away as x0 is used in insn 33 as well as the comparison > in insn 34. > I think the compare-elim pass can help us here. Is it the multiple use or the hard register that combine doesn't appreciate. The latter would definitely steer us towards compare-elim. > > This patch extends it by handling comparisons against zero, finding the > defining instruction of the compare > and merging the comparison with the defining instruction into a PARALLEL that > will hopefully match the form > described above. If between the comparison and the defining insn we find an > instruction that uses the condition > registers or any control flow we bail out, and we don't cross basic blocks. > This simple technique allows us to catch cases such as this example. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64. > > Ok for trunk? > > 2017-08-05 Kyrylo Tkachov> Michael Collison > > * compare-elim.c: Include emit-rtl.h. > (can_merge_compare_into_arith): New function. > (try_validate_parallel): Likewise. > (try_merge_compare): Likewise. > (try_eliminate_compare): Call the above when no previous clobber > is available. > (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN > dataflow problems. > > 2017-08-05 Kyrylo Tkachov > Michael Collison > > * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. > > > pr5198v1.patch > > > diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c > index 7e557a2..c65d155 100644 > --- a/gcc/compare-elim.c > +++ b/gcc/compare-elim.c > @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see > #include "tm_p.h" > #include "insn-config.h" > #include "recog.h" > +#include "emit-rtl.h" > #include "cfgrtl.h" > #include "tree-pass.h" > #include "domwalk.h" > @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, > rtx_insn *start) >return reg; > } > > +/* Return true if it is okay to merge the comparison CMP_INSN with > + the instruction ARITH_INSN. Both instructions are assumed to be in the > + same basic block with ARITH_INSN appearing before CMP_INSN. This checks > + that there are no uses or defs of the condition flags or control flow > + changes between the two instructions. */ > + > +static bool > +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn) > +{ > + for (rtx_insn *insn = PREV_INSN (cmp_insn); > + insn && insn != arith_insn; > + insn = PREV_INSN (insn)) > +{ > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + /* Bail if there are jumps or calls in between. */ > + if (!NONJUMP_INSN_P (insn)) > + return false; > + > + df_ref ref; > + /* Find a USE of the flags register. */ > + FOR_EACH_INSN_USE (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > + > + /* Find a DEF of the flags register. */ > + FOR_EACH_INSN_DEF (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > +} > + return true; > +} What about old style asms? Do you need to explicit punt those? I don't think they carry DF info. Don't you also have to verify that the inputs to ARITH_INSN are unchanged between ARITH_INSN and CMP_INSN? > + > +/* Given two SET expressions, SET_A and SET_B determine whether they form > + a recognizable pattern when emitted in parallel. Return that parallel > + if so. Otherwise return NULL. This tries
Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64
On Fri, 2017-08-25 at 15:37 +0100, Szabolcs Nagy wrote: > the use of ifunc in gcc target libraries was a mistake > in my opinion, there are several known bugs in the ifunc > design and uclibc/musl/bionic don't plan to support it. > but at this point i dont have a better proposal for doing > runtime selection of optimal atomics code. > > however in this patch i don't see why would the ctor run > before ifunc resolvers. how does this work on x86_64? > (there the different 16byte atomics are not even compatible, > so if ifunc resolvers in different dsos return different > result because one ran before the ctor, another after then > the runtime behaviour is broken. this can happen when one > dso is bindnow so ifunc relocation is processed before > ctors and another runs resolvers lazily or dlopened later.. > but i didnt test it just looks broken) I am not sure I followed everything here. My basic testing all worked, init_cpu_revision got run when libatomic was loaded and then the IFUNC resolver gets called after that when one of the IFUNC atomic functions is called. init_cpu_revision sets libat_have_lse which, I assume, will not be used by any other libraries. If other libraries have IFUNCs they would have their own static constructors and set their own variables to be checked by their own IFUNCs. So I am not sure how one DSO is going to break another DSO. > note that aarch64 ifunc resolvers get hwcap as an argument > so all this brokenness can be avoided (and if we want to > disable hwcap flags then probably glibc should take care > of that before passing hwcap to the ifunc resolver). I looked at the IFUNC memcpy resolver in glibc and it does not look like it gets hwcap as an argument. When I preprocess everything I see: void *__libc_memcpy_ifunc (void) __asm__ ("__libc_memcpy"); void *__libc_memcpy_ifunc (void) { uint64_t __attribute__((unused)) midr = _dl_aarch64_cpu_features.midr_el1; *res = ** expression that looks at midr value and returns function pointer **; return res; } __asm__ (".type " "__libc_memcpy" ", %gnu_indirect_function"); extern __typeof (__libc_memcpy) memcpy __attribute__ ((alias ("__libc_memcpy"))); Steve Ellcey sell...@cavium.com
Re: [06/77] Make GET_MODE_WIDER return an opt_mode
On 08/11/2017 12:24 PM, Richard Sandiford wrote: > Jeff Lawwrites: >> On 07/13/2017 02:40 AM, Richard Sandiford wrote: >>> GET_MODE_WIDER previously returned VOIDmode if no wider mode existed. >>> That would cause problems with stricter mode classes, since VOIDmode >>> isn't for example a valid scalar integer or floating-point mode. >>> This patch instead makes it return a new opt_mode class, which >>> holds either a T or nothing. >>> >>> 2017-07-13 Richard Sandiford >>> Alan Hayward >>> David Sherwood >>> >>> gcc/ >>> * coretypes.h (opt_mode): New class. >>> * machmode.h (opt_mode): Likewise. >>> (opt_mode::else_void): New function. >>> (opt_mode::operator *): Likewise. >>> (opt_mode::exists): Likewise. >>> (GET_MODE_WIDER_MODE): Turn into a function and return an opt_mode. >>> (GET_MODE_2XWIDER_MODE): Likewise. >>> (mode_iterator::get_wider): Update accordingly. >>> (mode_iterator::get_2xwider): Likewise. >>> (mode_iterator::get_known_wider): Likewise, turning into a template. >>> * combine.c (make_extraction): Update use of GET_MODE_WIDER_MODE, >>> forcing a wider mode to exist. >>> * config/cr16/cr16.h (LONG_REG_P): Likewise. >>> * rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise. >>> * config/c6x/c6x.c (c6x_rtx_costs): Update use of >>> GET_MODE_2XWIDER_MODE, forcing a wider mode to exist. >>> * lower-subreg.c (init_lower_subreg): Likewise. >>> * optabs-libfuncs.c (init_sync_libfuncs_1): Likewise, but not >>> on the final iteration. >>> * config/i386/i386.c (ix86_expand_set_or_movmem): Check whether >>> a wider mode exists before asking for a move pattern. >>> (get_mode_wider_vector): Update use of GET_MODE_WIDER_MODE, >>> forcing a wider mode to exist. >>> (expand_vselect_vconcat): Update use of GET_MODE_2XWIDER_MODE, >>> returning false if no such mode exists. >>> * config/ia64/ia64.c (expand_vselect_vconcat): Likewise. >>> * config/mips/mips.c (mips_expand_vselect_vconcat): Likewise. >>> * expmed.c (init_expmed_one_mode): Update use of GET_MODE_WIDER_MODE. >>> Avoid checking for a MODE_INT if we already know the mode is not a >>> SCALAR_INT_MODE_P. >>> (extract_high_half): Update use of GET_MODE_WIDER_MODE, >>> forcing a wider mode to exist. >>> (expmed_mult_highpart_optab): Likewise. >>> (expmed_mult_highpart): Likewise. >>> * expr.c (expand_expr_real_2): Update use of GET_MODE_WIDER_MODE, >>> using else_void. >>> * lto-streamer-in.c (lto_input_mode_table): Likewise. >>> * optabs-query.c (find_widening_optab_handler_and_mode): Likewise. >>> * stor-layout.c (bit_field_mode_iterator::next_mode): Likewise. >>> * internal-fn.c (expand_mul_overflow): Update use of >>> GET_MODE_2XWIDER_MODE. >>> * omp-low.c (omp_clause_aligned_alignment): Likewise. >>> * tree-ssa-math-opts.c (convert_mult_to_widen): Update use of >>> GET_MODE_WIDER_MODE. >>> (convert_plusminus_to_widen): Likewise. >>> * tree-switch-conversion.c (array_value_type): Likewise. >>> * var-tracking.c (emit_note_insn_var_location): Likewise. >>> * tree-vrp.c (simplify_float_conversion_using_ranges): Likewise. >>> Return false inside rather than outside the loop if no wider mode >>> exists >>> * optabs.c (expand_binop): Update use of GET_MODE_WIDER_MODE >>> and GET_MODE_2XWIDER_MODE >>> (can_compare_p): Use else_void. >>> * gdbhooks.py (OptMachineModePrinter): New class. >>> (build_pretty_printer): Use it for opt_mode. >>> >>> gcc/ada/ >>> * gcc-interface/decl.c (validate_size): Update use of >>> GET_MODE_WIDER_MODE, forcing a wider mode to exist. >> I'm not a big fan of the API here, particularly using operator* to >> handle asserting the mode exists. I'd prefer to just use a member >> function rather than overloading operator*. >> >> What's the rationale behind using operator* to imply the assertion? >> >> THe changes themsleves look fine, it's really just a question of the API >> we present. > > The original idea was to make opt_mode look pointer-ish, so that > the dyn_cast <...> result could be used in the same way as for > dyn_cast etc. The first cut therefore had operator bool () > to test whether there was a mode and operator * to dereference it. > > However, operator bool () created various subtle problems (as it always > seems to) so we dropped it in favour of exists (). I was neutral > on whether we should keep '*' or switch to a function, so in the > end the status quo won out. I'm happy to change it to a named > accessor though. > > Any better ideas than "get ()" for the name? Maybe something > to emphasis that it is asserting for non-nullness/non-emptiness > (which '*' does implicitly)?Yea, when I was reading the first few patches it > felt like you trying to do a pointer-ish
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Aug 28, 2017, at 7:37 AM, Richard Bienerwrote: > > On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt > wrote: >> Hi, >> >> Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped >> and tested on powerpc64le-linux-gnu with no regressions. Is this ok for >> trunk? >> >> Eventually this should be backported to all active releases as well. >> Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) >> >> Thanks, >> Bill >> >> >> [gcc] >> >> 2017-08-03 Bill Schmidt >>Jakub Jelinek >> >>PR tree-optimization/81503 >>* gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure >>folded constant fits in the target type. >> >> [gcc/testsuite] >> >> 2017-08-03 Bill Schmidt >>Jakub Jelinek >> >>PR tree-optimization/81503 >>* gcc.c-torture/execute/pr81503.c: New file. >> >> >> Index: gcc/gimple-ssa-strength-reduction.c >> === >> --- gcc/gimple-ssa-strength-reduction.c (revision 250791) >> +++ gcc/gimple-ssa-strength-reduction.c (working copy) >> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> { >> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >> + unsigned int prec = TYPE_PRECISION (target_type); >> + tree maxval = (POINTER_TYPE_P (target_type) >> +? TYPE_MAX_VALUE (sizetype) >> +: TYPE_MAX_VALUE (target_type)); >> >> /* It is highly unlikely, but possible, that the resulting >> bump doesn't fit in a HWI. Abandon the replacement >> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> types but allows for safe negation without twisted logic. */ >> if (wi::fits_shwi_p (bump) >> && bump.to_shwi () != HOST_WIDE_INT_MIN >> + /* It is more likely that the bump doesn't fit in the target >> +type, so check whether constraining it to that type changes >> +the value. For a signed type, the value mustn't change. >> +For an unsigned type, the value may only change to a >> +congruent value (for negative bumps). */ >> + && (TYPE_UNSIGNED (target_type) >> + ? wi::eq_p (wi::neg_p (bump) >> + ? bump + wi::to_widest (maxval) + 1 >> + : bump, >> + wi::zext (bump, prec)) >> + : wi::eq_p (bump, wi::sext (bump, prec))) > > Not sure, but would it be fixed in a similar way when writing > > @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t > tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); > enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); > > - /* It is highly unlikely, but possible, that the resulting > - bump doesn't fit in a HWI. Abandon the replacement > - in this case. This does not affect siblings or dependents > - of C. Restriction to signed HWI is conservative for unsigned > - types but allows for safe negation without twisted logic. */ > - if (wi::fits_shwi_p (bump) > - && bump.to_shwi () != HOST_WIDE_INT_MIN > - /* It is not useful to replace casts, copies, negates, or adds of > -an SSA name and a constant. */ > - && cand_code != SSA_NAME > + /* It is not useful to replace casts, copies, negates, or adds of > + an SSA name and a constant. */ > + if (cand_code != SSA_NAME > && !CONVERT_EXPR_CODE_P (cand_code) > && cand_code != PLUS_EXPR > && cand_code != POINTER_PLUS_EXPR > @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t > tree bump_tree; > gimple *stmt_to_print = NULL; > > - /* If the basis name and the candidate's LHS have incompatible > -types, introduce a cast. */ > - if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) > - basis_name = introduce_cast_before_cand (c, target_type, basis_name); > if (wi::neg_p (bump)) >{ > code = MINUS_EXPR; > bump = -bump; >} > + /* It is possible that the resulting bump doesn't fit in target_type. > +Abandon the replacement in this case. This does not affect > +siblings or dependents of C. */ > + if (bump != wi::ext (bump, TYPE_PRECISION (target_type), > + TYPE_SIGN (target_type))) > + return; > > bump_tree = wide_int_to_tree (target_type, bump); > > + /* If the basis name and the candidate's LHS have incompatible > +types, introduce a cast. */ > + if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) > + basis_name = introduce_cast_before_cand (c, target_type, basis_name); > + > if (dump_file &&
Re: Fix inconsistent section flags
On 07/17/2017 08:35 AM, Joerg Sonnenberger wrote: > Hello, > attached patch fixes inconsistent handling of section flags when using > the section attribute, i.e.: > > __attribute__((section("writable1"))) int foo1; > __attribute__((section("readonly1"))) const int foo1c; > __attribute__((section("writable2"))) int foo2 = 42; > __attribute__((section("readonly2"))) const int foo2c = 42; > > should give section attributes of "aw", "a", "aw", "a" in that order. > Currently, "foo1c" is classified as BSS though and therefore put into a > writable section. ISTM the test we need here is whether or not the underlying DECL is readonly. If it READONLY, then it shouldn't go into .bss, but should instead to into a readable section. Testing based on names seems wrong. Does the attached (untested) patch work for you? JEff diff --git a/gcc/varasm.c b/gcc/varasm.c index 91680d6..37438c1 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -976,16 +976,16 @@ decode_reg_name (const char *name) bool bss_initializer_p (const_tree decl) { - return (DECL_INITIAL (decl) == NULL - /* In LTO we have no errors in program; error_mark_node is used -to mark offlined constructors. */ - || (DECL_INITIAL (decl) == error_mark_node - && !in_lto_p) - || (flag_zero_initialized_in_bss - /* Leave constant zeroes in .rodata so they -can be shared. */ - && !TREE_READONLY (decl) - && initializer_zerop (DECL_INITIAL (decl; + /* Do not put constants into the .bss section, they belong in a readonly + section. */ + return (!TREE_READONLY (decl) + && (DECL_INITIAL (decl) == NULL + /* In LTO we have no errors in program; error_mark_node is used +to mark offlined constructors. */ + || (DECL_INITIAL (decl) == error_mark_node + && !in_lto_p) + || (flag_zero_initialized_in_bss + && initializer_zerop (DECL_INITIAL (decl); } /* Compute the alignment of variable specified by DECL. @@ -6508,7 +6508,8 @@ categorize_decl_for_section (const_tree decl, int reloc) ret = SECCAT_BSS; else if (! TREE_READONLY (decl) || TREE_SIDE_EFFECTS (decl) - || ! TREE_CONSTANT (DECL_INITIAL (decl))) + || (DECL_INITIAL (decl) + && ! TREE_CONSTANT (DECL_INITIAL (decl { /* Here the reloc_rw_mask is not testing whether the section should be read-only or not, but whether the dynamic link will have to @@ -6528,7 +6529,8 @@ categorize_decl_for_section (const_tree decl, int reloc) location. -fmerge-all-constants allows even that (at the expense of not conforming). */ ret = SECCAT_RODATA; - else if (TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) + else if (DECL_INITIAL (decl) + && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) ret = SECCAT_RODATA_MERGE_STR_INIT; else ret = SECCAT_RODATA_MERGE_CONST;
Re: [PATCH] Add -static-pie to GCC driver to create static PIE
On Mon, Aug 28, 2017 at 9:10 AM, Joseph Myerswrote: > On Tue, 8 Aug 2017, H.J. Lu wrote: > >> This patch adds -static-pie to GCC driver to create static PIE. A static >> position independent executable (PIE) is similar to static executable, >> but can be loaded at any address without a dynamic linker. All linker >> input files must be compiled with -fpie or -fPIE and linker must support >> --no-dynamic-linker to avoid linking with dynamic linker. "-z text" is >> also needed to prevent dynamic relocations in read-only segments. >> >> OK for trunk? > > I think the documentation for various options needs updating to clarify > exactly what they mean. (And potentially help text, which for driver > options is in gcc.c:display_help with the common.opt text being ignored in > that case.) Done. > -static is no longer just "prevents linking with the shared libraries" as > the documentation says, given it's also overriding (explicit or > configure-time default) -pie. -pie is no longer just "Produce a position > independent executable", it's producing a *dynamically linked* PIE. Done. >> +@item -static-pie >> +@opindex static-pie >> +Produce a static position independent executable on targets that support >> +it. A static position independent executable is similar to static >> +executable, but can be loaded at any address without a dynamic linker. > > "to a static executable". > Done. Here is the updated patch. OK for trunk? Thanks. -- H.J. From da4fdd54c53cfafbd45c7a7fe996f907b6d141f4 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Thu, 20 Jul 2017 14:08:18 -0700 Subject: [PATCH] Add -static-pie to GCC driver to create static PIE This patch adds -static-pie to GCC driver to create static PIE. A static position independent executable (PIE) is similar to static executable, but can be loaded at any address without a dynamic linker. All linker input files must be compiled with -fpie or -fPIE and linker must support --no-dynamic-linker to avoid linking with dynamic linker. "-z text" is also needed to prevent dynamic relocations in read-only segments. PR driver/81498 * common.opt (-static-pie): New alias. (shared): Negate static-pie. (-no-pie): Update help text. (-pie): Likewise. (static-pie): New option. * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add -static-pie support. (GNU_USER_TARGET_ENDFILE_SPEC): Likewise. (LINK_EH_SPEC): Likewise. (LINK_GCC_C_SEQUENCE_SPEC): Likewise. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): Likewise. * config/i386/gnu-user64.h (GNU_USER_TARGET_LINK_SPEC): Likewise. * gcc.c (LINK_COMMAND_SPEC): Likewise. (init_gcc_specs): Likewise. (init_spec): Likewise. (display_help): Update help message for -pie. * doc/invoke.texi: Update -pie, -no-pie and -static. Document -static-pie. --- gcc/common.opt | 13 ++--- gcc/config/gnu-user.h| 15 --- gcc/config/i386/gnu-user.h | 7 --- gcc/config/i386/gnu-user64.h | 11 ++- gcc/doc/invoke.texi | 24 +--- gcc/gcc.c| 20 6 files changed, 57 insertions(+), 33 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 1331008f811..fb88ed655fe 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -353,6 +353,9 @@ Common Alias(pedantic-errors) -pie Driver Alias(pie) +-static-pie +Driver Alias(static-pie) + -pipe Driver Alias(pipe) @@ -3062,7 +3065,7 @@ x Driver Joined Separate shared -Driver RejectNegative Negative(pie) +Driver RejectNegative Negative(static-pie) Create a shared library. shared-libgcc @@ -3108,11 +3111,15 @@ Driver no-pie Driver RejectNegative Negative(shared) -Don't create a position independent executable. +Don't create a dynamically linked position independent executable. pie Driver RejectNegative Negative(no-pie) -Create a position independent executable. +Create a dynamically linked position independent executable. + +static-pie +Driver RejectNegative Negative(pie) +Create a static position independent executable. z Driver Joined Separate diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h index de605b0c466..a967b69a350 100644 --- a/gcc/config/gnu-user.h +++ b/gcc/config/gnu-user.h @@ -53,11 +53,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see "%{shared:; \ pg|p|profile:gcrt1.o%s; \ static:crt1.o%s; \ - " PIE_SPEC ":Scrt1.o%s; \ + static-pie|" PIE_SPEC ":Scrt1.o%s; \ :crt1.o%s} \ crti.o%s \ %{static:crtbeginT.o%s; \ - shared|" PIE_SPEC ":crtbeginS.o%s; \ + shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \ :crtbegin.o%s} \ %{fvtable-verify=none:%s; \ fvtable-verify=preinit:vtv_start_preinit.o%s; \ @@ -70,7 +70,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see :crt1.o%s} \ crti.o%s \ %{static:crtbeginT.o%s; \ - shared|pie:crtbeginS.o%s; \ +
Re: [C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS
Hi, On Mon, 28 Aug 2017, Nathan Sidwell wrote: > This patch replaces the sorted field vector with a hash map. Lookup for > non-function members of a complete class is now O(1), not O(log(n)). > We still do linear lookup when the class is incomplete. Fixing that is > still on the todo list. Have you any memory measurement on a non-trivial C++ codebase with many classes (template instantiations?)? On LP64 platforms the sorted list for n members was 8+8*n bytes, the hash-map is 48 bytes itself plus 16*n bytes for the entries, where you have at least 13 entries always, next 31, next 61 entries (and so on). You pay the price for the next larger chunk already at about 2/3 fill rate, so e.g a class with 10 members was 88 bytes before and is 544 bytes now, which doesn't look that attractive given that the lookup before took four tries and now takes one, especially taking into account cache effects. Ciao, Michael.
Re: [PATCH] Fix PR81921
On Mon, 28 Aug 2017, Joseph Myers wrote: > On Sat, 26 Aug 2017, Richard Biener wrote: > > > On August 26, 2017 12:51:57 AM GMT+02:00, Joseph Myers > >wrote: > > >I'm seeing a build failure for s390x-linux-gnu that looks like it could > > >be > > >related to this change (build was OK at r251332, failed at r251358). > > > > Can you please open a bug? Can you confirm it fails the same way before > > the patch if you use - flto? > > Bug 82012 filed (as noted there, I don't know if the problem is in the > compiler or libitm). I can confirm the same failure appears building the > preprocessed source with a GCC 7 branch compiler with -flto added to the > options, rebuilding trunk r251332 to check with that. Now confirmed with -flto with trunk r251332. -- Joseph S. Myers jos...@codesourcery.com
[C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS
This patch replaces the sorted field vector with a hash map. Lookup for non-function members of a complete class is now O(1), not O(log(n)). We still do linear lookup when the class is incomplete. Fixing that is still on the todo list. This permits moving a bunch of sorted_field_vec handling from c-common to the c FE, and I'll post that in a subsequent patch. committed to trunk. nathan -- Nathan Sidwell 2017-08-28 Nathan Sidwell* cp-tree.h (lang_type): Replace sorted_fields vector with bindings map. (CLASSTYPE_SORTED_FIELDS): Delete. (CLASSTYPE_BINDINGS): New. * decl.c (finish_enum_value_list): Swap args of insert_late_enum_def_bindings. * name-lookup.c (lookup_field_1): Replace binary search of sorted fields with map->get. (sorted_fields_type_new, count_fields, add_fields_to_record_type, add_enum_fields_to_record_type): Delete. (add_class_member, add_class_members): New. (set_class_bindings): Create map and insert. (insert_late_enum_def_binding): Swap parms. Use add_clasS_member. * ptree.c (cxx_print_type): Delete sorted fields printing. Index: cp-tree.h === --- cp-tree.h (revision 251387) +++ cp-tree.h (working copy) @@ -2014,10 +2014,10 @@ struct GTY(()) lang_type { as a list of adopted protocols or a pointer to a corresponding @interface. See objc/objc-act.h for details. */ tree objc_info; - /* sorted_fields is sorted based on a pointer, so we need to be able - to resort it if pointers get rearranged. */ - struct sorted_fields_type * GTY ((reorder ("resort_sorted_fields"))) -sorted_fields; + + /* Map from IDENTIFIER nodes to DECLS. */ + hash_map *bindings; + /* FIXME reuse another field? */ tree lambda_expr; }; @@ -3243,10 +3243,9 @@ extern void decl_shadowed_for_var_insert && TREE_CODE (TYPE_NAME (NODE)) == TYPE_DECL \ && TYPE_DECL_ALIAS_P (TYPE_NAME (NODE))) -/* For a class type: if this structure has many fields, we'll sort them - and put them into a TREE_VEC. */ -#define CLASSTYPE_SORTED_FIELDS(NODE) \ - (LANG_TYPE_CLASS_CHECK (NODE)->sorted_fields) +/* The binding map for a class (not always present). */ +#define CLASSTYPE_BINDINGS(NODE) \ + (LANG_TYPE_CLASS_CHECK (NODE)->bindings) /* If non-NULL for a VAR_DECL, FUNCTION_DECL, TYPE_DECL or TEMPLATE_DECL, the entity is either a template specialization (if Index: decl.c === --- decl.c (revision 251387) +++ decl.c (working copy) @@ -14316,7 +14316,8 @@ finish_enum_value_list (tree enumtype) && COMPLETE_TYPE_P (current_class_type) && UNSCOPED_ENUM_P (enumtype)) { - insert_late_enum_def_bindings (enumtype, current_class_type); + insert_late_enum_def_bindings (current_class_type, enumtype); + /* TYPE_FIELDS needs fixup. */ fixup_type_variants (current_class_type); } Index: name-lookup.c === --- name-lookup.c (revision 251387) +++ name-lookup.c (working copy) @@ -1183,58 +1183,33 @@ lookup_fnfields_slot_nolazy (tree type, tree lookup_field_1 (tree type, tree name, bool want_type) { - tree field; + tree field = NULL_TREE; gcc_assert (identifier_p (name) && RECORD_OR_UNION_TYPE_P (type)); - if (CLASSTYPE_SORTED_FIELDS (type)) + if (CLASSTYPE_BINDINGS (type)) { - tree *fields = _SORTED_FIELDS (type)->elts[0]; - int lo = 0, hi = CLASSTYPE_SORTED_FIELDS (type)->len; - int i; - - while (lo < hi) - { - i = (lo + hi) / 2; - - if (DECL_NAME (fields[i]) > name) - hi = i; - else if (DECL_NAME (fields[i]) < name) - lo = i + 1; - else - { - field = NULL_TREE; + tree *slot = CLASSTYPE_BINDINGS (type)->get (name); + + if (slot) + { + field = *slot; - /* We might have a nested class and a field with the - same name; we sorted them appropriately via - field_decl_cmp, so just look for the first or last - field with this name. */ + if (STAT_HACK_P (field)) + { if (want_type) - { - do - field = fields[i--]; - while (i >= lo && DECL_NAME (fields[i]) == name); - if (!DECL_DECLARES_TYPE_P (field)) - field = NULL_TREE; - } + field = STAT_TYPE (field); else - { - do - field = fields[i++]; - while (i < hi && DECL_NAME (fields[i]) == name); - } - - if (field) - { - field = strip_using_decl (field); - if (is_overloaded_fn (field)) - field = NULL_TREE; - } - - return field; + field = STAT_DECL (field); } + + field = strip_using_decl (field); + if (OVL_P (field)) + field = NULL_TREE; + else if (want_type && !DECL_DECLARES_TYPE_P (field)) + field = NULL_TREE; } - return NULL_TREE; + return field; } field = TYPE_FIELDS (type); @@ -1312,113 +1287,62 @@ lookup_fnfields_slot (tree type, tree na
Re: [PATCH] Fix PR81921
On Sat, 26 Aug 2017, Richard Biener wrote: > On August 26, 2017 12:51:57 AM GMT+02:00, Joseph Myers >wrote: > >I'm seeing a build failure for s390x-linux-gnu that looks like it could > >be > >related to this change (build was OK at r251332, failed at r251358). > > Can you please open a bug? Can you confirm it fails the same way before > the patch if you use - flto? Bug 82012 filed (as noted there, I don't know if the problem is in the compiler or libitm). I can confirm the same failure appears building the preprocessed source with a GCC 7 branch compiler with -flto added to the options, rebuilding trunk r251332 to check with that. Options used with GCC 7 branch to get the same error: s390x-glibc-linux-gnu-g++ -S -o /dev/null -ftls-model=initial-exec -mzarch -mhtm -std=gnu++0x -funwind-tables -fno-exceptions -fno-rtti -fabi-version=4 -g -O2 -fPIC beginend.ii -flto -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add -static-pie to GCC driver to create static PIE
On Tue, 8 Aug 2017, H.J. Lu wrote: > This patch adds -static-pie to GCC driver to create static PIE. A static > position independent executable (PIE) is similar to static executable, > but can be loaded at any address without a dynamic linker. All linker > input files must be compiled with -fpie or -fPIE and linker must support > --no-dynamic-linker to avoid linking with dynamic linker. "-z text" is > also needed to prevent dynamic relocations in read-only segments. > > OK for trunk? I think the documentation for various options needs updating to clarify exactly what they mean. (And potentially help text, which for driver options is in gcc.c:display_help with the common.opt text being ignored in that case.) -static is no longer just "prevents linking with the shared libraries" as the documentation says, given it's also overriding (explicit or configure-time default) -pie. -pie is no longer just "Produce a position independent executable", it's producing a *dynamically linked* PIE. > +@item -static-pie > +@opindex static-pie > +Produce a static position independent executable on targets that support > +it. A static position independent executable is similar to static > +executable, but can be loaded at any address without a dynamic linker. "to a static executable". -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][RFC] Patch candidate for other/39851
On Tue, 8 Aug 2017, Martin Liška wrote: > Hi. > > As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39851#c0 we need > to call targetm.target_option.override () in order to properly report which > ISA options are enabled for a -march/-mtune. Currently, opts.c uses just > #include "common/common-target.h" we are unable to call the function direct. > One solution might be to put the hook to gcc/common/common-target.def, but > that would require huge refactoring of i386.c and i386-common.c files. > > Thus I came with a small hook that lives in cl_option_handler_func. > With that I see proper results for test-case mentioned in the PR. This patch is OK. As you note, making the targetm.target_option.override hook into a common option would involve much refactoring, because many of those hooks mix things acting purely on the options structure (which could readily become common) and things relating to other back-end state (which would need to stay in a hook that's only used in the compiler proper, not the driver). -- Joseph S. Myers jos...@codesourcery.com
Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)
Hi Rainer, > On 28 Aug 2017, at 16:33, Rainer Orthwrote: > > Hi Mike, > >> On Aug 12, 2017, at 9:03 AM, Rainer Orth >> wrote: >>> >>> The new gcc.dg/pie-static-[12].c testcases FAIL on several systems: >>> >>> * Solaris 11.4 has PIE support, but lacks static libc, libm >>> >>> * Linux without the static libc, libm installed >>> >>> The following patch fixes this by requiring both PIE and -static >>> support. >>> >>> Tested with the appropriate runtest invocations on i386-pc-solaris2.11 >>> and x86_64-pc-linux-gnu (where the tests come up as UNSUPPORTED; I don't >>> have a Linux system with static libc/libm installed), installed on >>> mainline. >>> >>> The tests also FAIL on Darwin/x86_64, but the failure mode is different: >>> for -static, the executable is linked with -lcrt0.o (done this way to >>> locate crt0.o in the linker's search path, cf. config/darwin.h >>> (STARTFILE_SPEC)), but neither on Darwin 11 nor on Darwin 17 could I >>> find where crt0.o would come from, so I've left this part alone. >> >> darwin isn't exactly like other systems. There is no crt0.o and static is >> more special than you can imagine. There was once 1 program that did a >> static link, but one was exceptionally special. >> >> Indeed, one way to implement it would be as a request option, and then >> ignore the parts that don't make sense for the platform. In that case, >> -static-libgcc and friends might be the end semantics. > > All this begs the question why on earth would darwin.h (STARTFILE_SPEC) > accept -static and try to link libcrt0.o it the latter doesn't exist. > > However, I've just found that the bundled clang on Darwin 11 does the > same (and fails in the same way: "ld: library not found for -lcrt0.o”). I think it’s because when one is bringing up the system, then one does (or at least used to) have a static libc/crt set. Thus the compiler did/does need to support it for that case. I’ve not done a kernel bootstrap since ≈ Darwin9 era, so things might have changed and this could be history leaking through. Iain Iain Sandoe CodeSourcery / Mentor Embedded / Siemens
Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)
Hi Mike, > On Aug 12, 2017, at 9:03 AM, Rainer Orth> wrote: >> >> The new gcc.dg/pie-static-[12].c testcases FAIL on several systems: >> >> * Solaris 11.4 has PIE support, but lacks static libc, libm >> >> * Linux without the static libc, libm installed >> >> The following patch fixes this by requiring both PIE and -static >> support. >> >> Tested with the appropriate runtest invocations on i386-pc-solaris2.11 >> and x86_64-pc-linux-gnu (where the tests come up as UNSUPPORTED; I don't >> have a Linux system with static libc/libm installed), installed on >> mainline. >> >> The tests also FAIL on Darwin/x86_64, but the failure mode is different: >> for -static, the executable is linked with -lcrt0.o (done this way to >> locate crt0.o in the linker's search path, cf. config/darwin.h >> (STARTFILE_SPEC)), but neither on Darwin 11 nor on Darwin 17 could I >> find where crt0.o would come from, so I've left this part alone. > > darwin isn't exactly like other systems. There is no crt0.o and static is > more special than you can imagine. There was once 1 program that did a > static link, but one was exceptionally special. > > Indeed, one way to implement it would be as a request option, and then > ignore the parts that don't make sense for the platform. In that case, > -static-libgcc and friends might be the end semantics. All this begs the question why on earth would darwin.h (STARTFILE_SPEC) accept -static and try to link libcrt0.o it the latter doesn't exist. However, I've just found that the bundled clang on Darwin 11 does the same (and fails in the same way: "ld: library not found for -lcrt0.o"). Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] C/C++: add fix-it hints for various missing symbols
On 07/03/2017 12:37 PM, David Malcolm wrote: > This patch improves our C/C++ frontends' handling of missing > symbols, by making c_parser_require and cp_parser_require use > "better" locations for the diagnostic, and insert fix-it hints, > under certain circumstances (see the comments in the patch for > full details). > > For example, for this code with a missing semicolon: > > $ cat test.c > int missing_semicolon (void) > { > return 42 > } > > trunk currently emits: > > test.c:4:1: error: expected ‘;’ before ‘}’ token >} >^ > > This patch adds a fix-it hint for the missing semicolon, and puts > the error at the location of the missing semicolon, printing the > followup token as a secondary location: > > test.c:3:12: error: expected ‘;’ before ‘}’ token > return 42 > ^ > ; >} >~ > > More examples can be seen in the test cases. > > For reference, clang prints the following: > > test.c:3:12: error: expected ';' after return statement > return 42 > ^ > ; > > i.e. describing what syntactic thing came before, which > I think is likely to be more meaningful to the user. > > clang can also print notes about matching opening symbols > e.g. the note here: > > missing-symbol-2.c:25:22: error: expected ']' > const char test [42; >^ > missing-symbol-2.c:25:19: note: to match this '[' > const char test [42; > ^ > which, although somewhat redundant for this example, seems much more > useful if there's non-trivial nesting of constructs, or more than a few > lines separating the open/close symbols (e.g. showing a stray "namespace {" > that the user forgot to close). > > I'd like to implement both of these ideas as followups, but in > the meantime, is the fix-it hint patch OK for trunk? > (successfully bootstrapped & regrtested on x86_64-pc-linux-gnu) > > gcc/c-family/ChangeLog: > * c-common.c (c_parse_error): Add RICHLOC param, and use it rather > than implicitly using input_location. > (enum missing_token_insertion_kind): New enum. > (get_missing_token_insertion_kind): New function. > (maybe_suggest_missing_token_insertion): New function. > * c-common.h (c_parse_error): Add RICHLOC param. > (maybe_suggest_missing_token_insertion): New decl. > > gcc/c/ChangeLog: > * c-parser.c (struct c_parser): Add "previous_token_loc" field. > (c_parser_consume_token): Set parser->previous_token_loc. > (c_parser_error): Rename to... > (c_parser_error_richloc): ...this, making static, and adding > "richloc" parameter, passing it to the c_parse_error call, > rather than calling c_parser_set_source_position_from_token. > (c_parser_error): Reintroduce, reimplementing in terms of the > above. > (c_parser_require): Add "type_is_unique" param. Use > c_parser_error_richloc rather than c_parser_error, calling > maybe_suggest_missing_token_insertion. > (c_parser_parms_list_declarator): Override default value of new > "type_is_unique" param to c_parser_require. > (c_parser_asm_statement): Likewise. > * c-parser.h (c_parser_require): Add "type_is_unique" param, > defaulting to true. > > gcc/cp/ChangeLog: > * parser.c (cp_parser_error): Add rich_location to call to > c_parse_error. > (get_required_cpp_ttype): New function. > (cp_parser_required_error): Remove calls to cp_parser_error, > instead setting a non-NULL gmsgid, and handling it if set by > calling c_parse_error, potentially with a fix-it hint. > > gcc/testsuite/ChangeLog: > * c-c++-common/cilk-plus/AN/parser_errors.c: Update expected > output to reflect changes to reported locations of missing > symbols. > * c-c++-common/cilk-plus/AN/parser_errors2.c: Likewise. > * c-c++-common/cilk-plus/AN/parser_errors3.c: Likewise. > * c-c++-common/cilk-plus/AN/pr61191.c: Likewise. > * c-c++-common/gomp/pr63326.c: Likewise. > * c-c++-common/missing-symbol.c: New test case. > * g++.dg/cpp1y/digit-sep-neg.C: Update expected output to reflect > changes to reported locations of missing symbols. > * g++.dg/cpp1y/pr65202.C: Likewise. > * g++.dg/other/do1.C: Likewise. > * g++.dg/missing-symbol-2.C: New test case. > * g++.dg/parse/error11.C: Update expected output to reflect > changes to reported locations of missing symbols. > * g++.dg/parse/pragma2.C: Likewise. > * g++.dg/template/error11.C: Likewise. > * gcc.dg/missing-symbol-2.c: New test case. > * gcc.dg/missing-symbol-3.c: New test case. > * gcc.dg/noncompile/940112-1.c: Update expected output to reflect > changes to reported locations of missing symbols. > * gcc.dg/noncompile/971104-1.c: Likewise. > * obj-c++.dg/exceptions-6.mm: Likewise. > * obj-c++.dg/pr48187.mm: Likewise.
[C++ PATCH] Move field_vec creation
this patch moves the FIELD_VEC creation routines from class.c to name-lookup.c It's slightly more than a simple move as I include the following renaming: insert_into_classtype_sorted_fields -> set_class_bindings insert_late_enum_def_into_classtype_sorted_fields -> insert_late_enum_def_bindings Most of this should eventually evaporate, as I intend adding the bindings progressively during class definition. (and late bindings during their definition). But we'll see how far that gets ... nathan -- Nathan Sidwell 2017-08-28 Nathan Sidwell* cp-tree.h (insert_late_enum_def_into_classtype_sorted_fields): Delete. * name-lookup.h (set_class_bindings, insert_late_enum_def_bindings): Declare. * decl.c (finish_enum_value_list): Adjust for insert_late_enum_def_bindings name change. * class.c (finish_struct_1): Call set_class_bindings. (count_fields, add_fields_to_record_type, add_enum_fields_to_record_type, sorted_fields_type_new, insert_into_classtype_sorted_fields, insert_late_enum_def_into_classtype_sorted_fields): Move to ... * name-lookup.h (count_fields, add_fields_to_record_type, add_enum_fields_to_record_type, sorted_fields_type_new, set_class_bindings, insert_late_enum_def_bindings): ... here. Index: class.c === --- class.c (revision 251385) +++ class.c (working copy) @@ -139,9 +139,6 @@ static tree build_simple_base_path (tree static tree build_vtbl_ref_1 (tree, tree); static void build_vtbl_initializer (tree, tree, tree, tree, int *, vec **); -static int count_fields (tree); -static int add_fields_to_record_type (tree, struct sorted_fields_type*, int); -static void insert_into_classtype_sorted_fields (tree, tree, int); static bool check_bitfield_decl (tree); static bool check_field_decl (tree, tree, int *, int *); static void check_field_decls (tree, tree *, int *, int *); @@ -3378,61 +3375,6 @@ add_implicitly_declared_members (tree t, } } -/* Subroutine of insert_into_classtype_sorted_fields. Recursively - count the number of fields in TYPE, including anonymous union - members. */ - -static int -count_fields (tree fields) -{ - tree x; - int n_fields = 0; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are dealt with separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - n_fields += count_fields (TYPE_FIELDS (TREE_TYPE (x))); - else - n_fields += 1; -} - return n_fields; -} - -/* Subroutine of insert_into_classtype_sorted_fields. Recursively add - all the fields in the TREE_LIST FIELDS to the SORTED_FIELDS_TYPE - elts, starting at offset IDX. */ - -static int -add_fields_to_record_type (tree fields, struct sorted_fields_type *field_vec, int idx) -{ - tree x; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are handled separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - idx = add_fields_to_record_type (TYPE_FIELDS (TREE_TYPE (x)), field_vec, idx); - else - field_vec->elts[idx++] = x; -} - return idx; -} - -/* Add all of the enum values of ENUMTYPE, to the FIELD_VEC elts, - starting at offset IDX. */ - -static int -add_enum_fields_to_record_type (tree enumtype, -struct sorted_fields_type *field_vec, -int idx) -{ - tree values; - for (values = TYPE_VALUES (enumtype); values; values = TREE_CHAIN (values)) - field_vec->elts[idx++] = TREE_VALUE (values); - return idx; -} - /* FIELD is a bit-field. We are finishing the processing for its enclosing type. Issue any appropriate messages and set appropriate flags. Returns false if an error has been diagnosed. */ @@ -6592,21 +6534,6 @@ determine_key_method (tree type) return; } - -/* Allocate and return an instance of struct sorted_fields_type with - N fields. */ - -static struct sorted_fields_type * -sorted_fields_type_new (int n) -{ - struct sorted_fields_type *sft; - sft = (sorted_fields_type *) ggc_internal_alloc (sizeof (sorted_fields_type) - + n * sizeof (tree)); - sft->len = n; - - return sft; -} - /* Helper of find_flexarrays. Return true when FLD refers to a non-static class data member of non-zero size, otherwise false. */ @@ -7145,14 +7072,7 @@ finish_struct_1 (tree t) && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t)) SET_DECL_MODE (x, TYPE_MODE (t)); - /* Done with FIELDS...now decide whether to sort these for - faster lookups later. - - We use a small number because most searches fail (succeeding - ultimately as the search bores through the inheritance - hierarchy), and we want this failure to occur quickly. */ - - insert_into_classtype_sorted_fields (TYPE_FIELDS (t), t, 8); + set_class_bindings (t, TYPE_FIELDS (t)); /* Complain if one of the field types requires
Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170
On 06/22/2017 09:28 AM, Alan Modra wrote: > PR80044 notes that -static and -pie together behave differently when > gcc is configured with --enable-default-pie as compared to configuring > without (or --disable-default-pie). This patch removes that > difference. In both cases you now will have -static completely > overriding -pie. > > Fixing this wasn't quite as simple as you'd expect, due to poor > separation of functionality. PIE_SPEC didn't just mean that -pie was > on explicitly or by default, but also -r and -shared were *not* on. > Fortunately the three files touched by this patch are the only places > PIE_SPEC and NO_PIE_SPEC are used, so it isn't too hard to see that > the reason PIE_SPEC and NO_PIE_SPEC are not inverses is the use of > PIE_SPEC in LINK_PIE_SPEC. So, move the inelegant symmetry breaking > addition, to LINK_PIE_SPEC where it belongs. Doing that showed > another problem in gnu-user.h, with PIE_SPEC and NO_PIE_SPEC selection > of crtbegin*.o not properly hooked into a chain of if .. elseif .. > conditions, which required both PIE_SPEC and NO_PIE_SPEC to exclude > -static and -shared. Fixing that particular problem finally allows > PIE_SPEC to serve just one purpose, and NO_PIE_SPEC to disappear. > > Bootstrapped and regression tested powerpc64le-linux c,c++. No > regressions and a bunch of --enable-default-pie failures squashed. > OK mainline and active branches? > > Incidentally, there is a fairly strong case to be made for adding > -static to the -shared, -pie, -no-pie chain of RejectNegative's in > common.opt. Since git 0d6378a9e (svn r48039) 2001-11-15, -static has > done more than just the traditional "prevent linking with dynamic > libraries", as -static selects crtbeginT.o rather than crtbegin.o > on GNU systems. Realizing this is what led me to close pr80044, which > I'd opened with the aim of making -pie -static work together (with the > traditional meaning of -static). I don't that is worth doing, but > mention pr80044 in the changelog due to fixing the insane output > produced by -pie -static with --disable-default-pie. > > PR driver/80044 > PR target/81170 > * gcc.c (NO_PIE_SPEC): Delete. > (PIE_SPEC): Define as !no-pie/pie. Move static|shared|r exclusion.. > (LINK_PIE_SPEC): ..to here. > * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct > chain of crtbegin*.o selection, update for PIE_SPEC changes and format. > (GNU_USER_TARGET_ENDFILE_SPEC): Similarly. > * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly. > (ENDFILE_CRTEND_SPEC): Similarly. > * config/rs6000/sysv4.h (STARTFILE_LINUX_SPEC): Upgrade to > match gnu-user.h startfile. > (ENDFILE_LINUX_SPEC): Similarly. So sorry for the horrible delay. What was the final resolution here? I saw a lot of back and forth with HJ and yourself. 80044 is CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the ASSIGNED state. jeff
[PATCH] small gcc.c cleanup
I committed this update to my post of https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01011.html, deciding it now sufficiently obvious :) nathan -- Nathan Sidwell 2017-08-28 Nathan Sidwell* gcc.c (execute): Fold SIGPIPE handling into switch statement. Adjust internal error message. Index: gcc.c === --- gcc.c (revision 251380) +++ gcc.c (working copy) @@ -3135,49 +3135,50 @@ execute (void) int status = statuses[i]; if (WIFSIGNALED (status)) - { -#ifdef SIGPIPE - /* SIGPIPE is a special case. It happens in -pipe mode - when the compiler dies before the preprocessor is done, - or the assembler dies before the compiler is done. - There's generally been an error already, and this is - just fallout. So don't generate another error unless - we would otherwise have succeeded. */ - if (WTERMSIG (status) == SIGPIPE - && (signal_count || greatest_status >= MIN_FATAL_STATUS)) - { - signal_count++; - ret_code = -1; - } - else -#endif - switch (WTERMSIG (status)) - { - case SIGINT: - /* SIGQUIT and SIGKILL are not available on MinGW. */ + switch (WTERMSIG (status)) + { + case SIGINT: + case SIGTERM: + /* SIGQUIT and SIGKILL are not available on MinGW. */ #ifdef SIGQUIT - case SIGQUIT: + case SIGQUIT: #endif #ifdef SIGKILL - case SIGKILL: + case SIGKILL: #endif - case SIGTERM: - /* The user (or environment) did something to the - inferior. Making this an ICE confuses the user - into thinking there's a compiler bug. Much more - likely is the user or OOM killer nuked it. */ - fatal_error (input_location, - "%s signal terminated program %s", - strsignal (WTERMSIG (status)), - commands[i].prog); + /* The user (or environment) did something to the + inferior. Making this an ICE confuses the user into + thinking there's a compiler bug. Much more likely is + the user or OOM killer nuked it. */ + fatal_error (input_location, + "%s signal terminated program %s", + strsignal (WTERMSIG (status)), + commands[i].prog); + break; + +#ifdef SIGPIPE + case SIGPIPE: + /* SIGPIPE is a special case. It happens in -pipe mode + when the compiler dies before the preprocessor is + done, or the assembler dies before the compiler is + done. There's generally been an error already, and + this is just fallout. So don't generate another + error unless we would otherwise have succeeded. */ + if (signal_count || greatest_status >= MIN_FATAL_STATUS) + { + signal_count++; + ret_code = -1; break; - default: - /* The inferior failed to catch the signal. */ - internal_error_no_backtrace ("%s (program %s)", - strsignal (WTERMSIG (status)), - commands[i].prog); } - } +#endif + /* FALLTHROUGH */ + + default: + /* The inferior failed to catch the signal. */ + internal_error_no_backtrace ("%s signal terminated program %s", + strsignal (WTERMSIG (status)), + commands[i].prog); + } else if (WIFEXITED (status) && WEXITSTATUS (status) >= MIN_FATAL_STATUS) {
Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers
Hi Janus, I think an unconditional warning is OK in this case because - Assigning to a pointer from an obvious non-contiguous target is not useful at all, that I can see I guess you're talking about a *contiguous* pointer here, Correct. and in that case I would argue that, beyond being "not useful", it's even illegal, so why not throw a hard error, if we can infer at compile-time that the target is non-contiguous? Problem is, we cannot infer this from the tests done. We would also have to add a test if the array is empty or that it contains only a single element, and that (I think) is a) impossible in the general case, and b) not worth the bother. - Some language laywer will come up with the fact that it is, in fact, legal if the target is empty or has a single element only, so a hard error would be a rejects-valid. Should be possible to detect and ignore such cases, shouldn't it? Not in general. However, I can also make this into a warning depending on -Wall, if this is preferred. As explained above, I'd vote for an error (or at least a conditional warning, so that it can be disabled, like most(?) other gfortran warnings). Attached is a version which makes this a warning enabled by -Wall; this should be enough to give people a heads-up. I have introduced most of your comments into the revised patch. Also I noticed that there is a function called "gfc_is_simply_contiguous" in expr.c. Could that be useful for your purpose? (Some of the code in there looks very similar to the logic you're adding.) There is a subtle distinction between "simply contiguous" where the compiler _has_ to know at compile-time that the expression is contigous (and failure to diagnose is is a compiler error) and "contiguous" where the burden is on the programmer. My patch is intended to be an aid to the programmer for the "continuous" case. Because of the "simple contiguous" thing, the function does not quite match the requirements. So, here is the updated patch. Regression-tested on powerpc-linux, make dvi and make pdf also passed. OK for trunk? Regards Thomas 2017-08-27 Thomas KoenigPR fortran/49232 * expr.c (gfc_check_pointer_assign): Warn for suspicious assignments with contiguous pointrs if -Wcontiguous is given. * lang.opt (Wcontiguous): New option, implied by -Wall. * invoke.texi: Document -Wcontiguous. 2017-08-27 Thomas Koenig PR fortran/49232 * gfortran.dg/contiguous_4.f90: New test. Index: expr.c === --- expr.c (Revision 251375) +++ expr.c (Arbeitskopie) @@ -3802,6 +3802,54 @@ } } + /* Warn for assignments of contiguous pointers to targets which might + not be contiguous. */ + + if (warn_contiguous && lhs_attr.contiguous) +{ + gfc_array_ref *ar; + int i; + bool do_warn = false; + + ar = gfc_find_array_ref (rvalue, true); + if (ar && ar->type == AR_SECTION) + { + for (i = 0; i < ar->dimen; i++) + { + /* Check for p => a(:,:,2). */ + if (ar->dimen_type[i] == DIMEN_RANGE && ar->stride[i] + && (ar->stride[i]->expr_type != EXPR_CONSTANT + || (ar->stride[i]->expr_type == EXPR_CONSTANT + && mpz_cmp_si (ar->stride[i]->value.integer, 1 + { + do_warn = true; + break; + } + } + if (!do_warn && ar->dimen > 1) + { + /* Check for p => a(2:4,:) */ + for (i = 0; i < ar->dimen - 1; i++) + { + if ((ar->start[i] && ar->as->lower[i] + && gfc_dep_compare_expr (ar->start[i], ar->as->lower[i]) + != 0) + || (ar->end[i] && ar->as->upper[i] + && gfc_dep_compare_expr (ar->end[i], ar->as->upper[i]) + != 0)) + { + do_warn = true; + break; + } + } + } + } + if (do_warn) + gfc_warning (OPT_Wcontiguous, "Assignment to contiguous pointer " + "from possibly non-contiguous target at %L", + >where); +} + /* Warn if it is the LHS pointer may lives longer than the RHS target. */ if (warn_target_lifetime && rvalue->expr_type == EXPR_VARIABLE Index: invoke.texi === --- invoke.texi (Revision 251375) +++ invoke.texi (Arbeitskopie) @@ -145,7 +145,7 @@ @xref{Error and Warning Options,,Options to request or suppress errors and warnings}. @gccoptlist{-Waliasing -Wall -Wampersand -Wargument-mismatch -Warray-bounds --Wc-binding-type -Wcharacter-truncation @gol +-Wc-binding-type -Wcharacter-truncation -Wcontiguous @gol -Wconversion -Wfunction-elimination -Wimplicit-interface @gol -Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only -Wintrinsics-std @gol -Wline-truncation -Wno-align-commons -Wno-tabs -Wreal-q-constant @gol @@ -797,7 +797,7 @@ This currently includes @option{-Waliasing}, @option{-Wampersand}, @option{-Wconversion},
[PATCH] ira-costs: avoid missing base registers in record_address_regs
Hello, The code in record_address_regs shown in the following patch assumes that if a given target cannot have two registers in a memory address, then the sole register, if present, must be the leftmost operand in the PLUS chain. I think this is not true if the target uses unspecs to signify special addressing modes such as TLS. In that case the unspec can be to the left of the register, and this function won't see the register. The proposed fix is to always recurse into non-constant operands like in the adjacent case of index registers being the same as base registers. OK to apply? (most popular targets have MAX_REGS_PER_ADDRESS == 2, so this problem doesn't arise) Alexander * ira-costs.c (record_address_regs): Handle both operands of PLUS for MAX_REGS_PER_ADDRESS == 1. diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c index 2cd102a0810..1690e889471 100644 --- a/gcc/ira-costs.c +++ b/gcc/ira-costs.c @@ -1138,17 +1138,12 @@ record_address_regs (machine_mode mode, addr_space_t as, rtx x, if (code1 == SUBREG) arg1 = SUBREG_REG (arg1), code1 = GET_CODE (arg1); - /* If this machine only allows one register per address, it - must be in the first operand. */ - if (MAX_REGS_PER_ADDRESS == 1) - record_address_regs (mode, as, arg0, 0, PLUS, code1, scale); - - /* If index and base registers are the same on this machine, + /* If index registers do not appear, or coincide with base registers, just record registers in any non-constant operands. We assume here, as well as in the tests below, that all addresses are in canonical form. */ - else if (INDEX_REG_CLASS -== base_reg_class (VOIDmode, as, PLUS, SCRATCH)) + if (MAX_REGS_PER_ADDRESS == 1 + || INDEX_REG_CLASS == base_reg_class (VOIDmode, as, PLUS, SCRATCH)) { record_address_regs (mode, as, arg0, context, PLUS, code1, scale); if (! CONSTANT_P (arg1))
Re: [PATCH] Add libgcc support for cache clearing on ARM VxWorks
Hi Richard, I discussed with the original author, have looked into the ARM cache matters in greater detail and certainly agree with your general concerns. There are pieces related to the insn cache clearing in the code, but I can see how architecture specific everything is here, indeed not appropriate as a single non-specialized entry point in libgcc. Thanks again for your arm-expert input on the topic. Still working on this. In the meantime, opinion on the two patches below ? https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00278.html (Handle DWARF2_UNWIND_INFO in arm_except_unwind_info) https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00271.html (Fix .cfi inconsistency out of builtin_eh_return) Maybe the dwarf2 unwind support could go away when support for the old ABI gets removed. From the VxWorks perspective, dwarf2 is only needed by versions of VxWorks which rely on the old ABI. Thanks in advance for your feedback, With Kind Regards, Olivier
Re: [WIP] Possible Bug in vect_bb_slp_scalar_cost?
On Mon, Aug 28, 2017 at 10:57 AM, Dominik Inführwrote: > Hi, > > As discussed on IRC: This patches fixes the calculation of the scalar costs > of SLP vectorization. I’ve added a test case and the auto_vec allocation is > now reused for all children of a node. Looks good to me and thanks for the testcase. I'll include this in my next round of testing and make sure it's fixed for GCC 8. Thanks, Richard. > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c > b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c > new file mode 100644 > index 000..5121a88 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-fdump-tree-slp-details" } */ > + > +#define N 4 > + > +int s1[N], s2[N], s3[N]; > +void escape(int, int, int, int); > + > +void > +foo () > +{ > + int t1, t2, t3, t4; > + > + t1 = s1[0] + s2[0] + s3[0]; > + t2 = s1[1] + s2[1] + s3[1]; > + t3 = s1[2] + s2[2] + s3[2]; > + t4 = s1[3] + s2[3] + s3[3]; > + > + s3[0] = t1 - s1[0] * s2[0]; > + s3[1] = t2 - s1[1] * s2[1]; > + s3[2] = t3 - s1[2] * s2[2]; > + s3[3] = t4 - s1[3] * s2[3]; > + > + escape (t1, t2, t3, t4); > +} > + > +/* { dg-final { scan-tree-dump-not "vectorization is not profitable" "slp2" > } } */ > +/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */ > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index 04ecaab..cffd19b 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -2690,9 +2690,17 @@ vect_bb_slp_scalar_cost (basic_block bb, > scalar_cost += stmt_cost; > } > > + auto_vec subtree_life; > + > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > - scalar_cost += vect_bb_slp_scalar_cost (bb, child, life); > + { > + /* Do not directly pass LIFE to the recursive call, copy it to > + confine changes in the callee to the current child/subtree. */ > + subtree_life.safe_splice (*life); > + scalar_cost += vect_bb_slp_scalar_cost (bb, child, _life); > + subtree_life.truncate (0); > + } > > return scalar_cost; > } > >> On 04 Aug 2017, at 12:19, Richard Biener wrote: >> >> On Fri, Aug 4, 2017 at 12:08 PM, Dominik Inführ >> wrote: >>> Hi, >>> >>> vect_bb_slp_scalar_cost computes the scalar cost of a SLP node. If there >>> are non-scalar uses of a definition, the costs for it and its operands >>> (children) are ignored. The vector LIFE is used to keep track of this and >>> an element is set to true, such that the def and its children are ignored. >>> But as soon as an element is set to true it is never undone, that means the >>> following sibling and parent nodes of the current node also stay ignored. >> >> Yes, that's intended. They are live as well because they are needed >> to compute the live scalar. >> >> This seems wrong to me, a simple fix would be to clone LIFE for every >> vector, such that changes to LIFE stay in the subtree: >>> >>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c >>> index 032a9444a5a..f919645f28b 100644 >>> --- a/gcc/tree-vect-slp.c >>> +++ b/gcc/tree-vect-slp.c >>> @@ -2590,6 +2590,11 @@ vect_bb_slp_scalar_cost (basic_block bb, >>> gimple *stmt; >>> slp_tree child; >>> >>> + auto_vec subtree_life; >>> + subtree_life.safe_splice (*life); >>> + >>> + life = _life; >>> + >>> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt) >>> { >>> unsigned stmt_cost; >>> >>> >
Re: Improve ECF_NOTHROW flags for direct internal functions
On Mon, Aug 28, 2017 at 10:16 AM, Richard Sandifordwrote: > Richard Biener writes: >> On Thu, Aug 17, 2017 at 1:06 PM, Richard Sandiford >> wrote >>> Richard Biener writes: On Thu, Aug 17, 2017 at 11:49 AM, Richard Sandiford wrote: > Internal functions that map directly to an optab can only throw an > exception for -fnon-call-exceptions. This patch handles that in > internal_fn_flags, in a similar way to ATTR_*NOTHROW in builtins.def. > > (Functions that don't throw even for flag_non_call_exceptions should be > explicitly marked ECF_NOTHROW in internal-fn.def.) > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Hmm. Note the outcome of flag_non_call_exceptions depends on the current function and thus IPA passes querying flags would need to push an appropriate function context. This means the function should get a struct function * argument and opt_for_fn (fn, flag_non_call_exceptions) should be used. It doesn't help very much that all callers don't have any such context either which means this "optimization" looks like in the wrong place :/ (the global value of flag_non_call_exceptions in the IPA case isn't necessarily conservative). So if you insist then add a comment and add a && cfun check so we're sure we are in non-IPA context (or in properly setup context). >>> >>> Bah. In that case, what should happen if a -fno-non-call-exceptions >>> function is inlined into an -fnon-call-exceptions one? Should the call >>> keep the NOTHROWness of the original function, or should it lose >>> NOTHROWness (and thus gain an exception edge)? >> >> nothrow-ness is tracked on the GIMPLE stmt via gimple_call_[set_]nothrow_p, >> GIMPLE shouldn't look at flag_non_call_exceptions, it is basically part >> of the IL. >> >>> I guess the path of least resistance would be to add an extra check >>> for this case in the places that need it, rather than relying solely >>> on gimple_call_flags. >> >> Well, gimple_call_flags works fine already (looking at the above in >> addition to internal_fn_flags). call_expr_flags looks like it might not. > > OK, how does this look? Only the gimple flags matter for the use > case I've seen (which is covered by the SVE tests, but hard to > test as-is). > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. Ok. Thanks, Richard. > Thanks, > Richard > > > 2017-08-28 Richard Sandiford > > gcc/ > * gimplify.c (gimplify_call_expr): Copy the nothrow flag to > calls to internal functions. > (gimplify_modify_expr): Likewise. > * tree-call-cdce.c (use_internal_fn): Likewise. > * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Likewise. > (convert_to_divmod): Set the nothrow flag. > * tree-if-conv.c (predicate_mem_writes): Likewise. > * tree-vect-stmts.c (vectorizable_mask_load_store): Likewise. > (vectorizable_call): Likewise. > (vectorizable_store): Likewise. > (vectorizable_load): Likewise. > * tree-vect-patterns.c (vect_recog_pow_pattern): Likewise. > (vect_recog_mask_conversion_pattern): Likewise. > > Index: gcc/gimplify.c > === > *** gcc/gimplify.c 2017-08-17 13:17:21.266412322 +0100 > --- gcc/gimplify.c 2017-08-28 09:10:13.246297839 +0100 > *** gimplify_call_expr (tree *expr_p, gimple > *** 3150,3156 > > if (EXPR_CILK_SPAWN (*expr_p)) > gimplify_cilk_detach (pre_p); > ! gimple *call = gimple_build_call_internal_vec (ifn, vargs); > gimplify_seq_add_stmt (pre_p, call); > return GS_ALL_DONE; > } > --- 3150,3157 > > if (EXPR_CILK_SPAWN (*expr_p)) > gimplify_cilk_detach (pre_p); > ! gcall *call = gimple_build_call_internal_vec (ifn, vargs); > ! gimple_call_set_nothrow (call, TREE_NOTHROW (*expr_p)); > gimplify_seq_add_stmt (pre_p, call); > return GS_ALL_DONE; > } > *** gimplify_modify_expr (tree *expr_p, gimp > *** 5636,5641 > --- 5637,5643 > vargs.quick_push (CALL_EXPR_ARG (*from_p, i)); > } > call_stmt = gimple_build_call_internal_vec (ifn, vargs); > + gimple_call_set_nothrow (call_stmt, TREE_NOTHROW (*from_p)); > gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p)); > } > else > Index: gcc/tree-call-cdce.c > === > *** gcc/tree-call-cdce.c2017-06-30 12:50:38.243662675 +0100 > --- gcc/tree-call-cdce.c2017-08-28 09:10:13.246297839 +0100 > *** use_internal_fn (gcall *call) > *** 1019,1024 > ---
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidtwrote: > Hi, > > Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped > and tested on powerpc64le-linux-gnu with no regressions. Is this ok for > trunk? > > Eventually this should be backported to all active releases as well. > Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) > > Thanks, > Bill > > > [gcc] > > 2017-08-03 Bill Schmidt > Jakub Jelinek > > PR tree-optimization/81503 > * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure > folded constant fits in the target type. > > [gcc/testsuite] > > 2017-08-03 Bill Schmidt > Jakub Jelinek > > PR tree-optimization/81503 > * gcc.c-torture/execute/pr81503.c: New file. > > > Index: gcc/gimple-ssa-strength-reduction.c > === > --- gcc/gimple-ssa-strength-reduction.c (revision 250791) > +++ gcc/gimple-ssa-strength-reduction.c (working copy) > @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > { >tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); > + unsigned int prec = TYPE_PRECISION (target_type); > + tree maxval = (POINTER_TYPE_P (target_type) > +? TYPE_MAX_VALUE (sizetype) > +: TYPE_MAX_VALUE (target_type)); > >/* It is highly unlikely, but possible, that the resulting > bump doesn't fit in a HWI. Abandon the replacement > @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > types but allows for safe negation without twisted logic. */ >if (wi::fits_shwi_p (bump) >&& bump.to_shwi () != HOST_WIDE_INT_MIN > + /* It is more likely that the bump doesn't fit in the target > +type, so check whether constraining it to that type changes > +the value. For a signed type, the value mustn't change. > +For an unsigned type, the value may only change to a > +congruent value (for negative bumps). */ > + && (TYPE_UNSIGNED (target_type) > + ? wi::eq_p (wi::neg_p (bump) > + ? bump + wi::to_widest (maxval) + 1 > + : bump, > + wi::zext (bump, prec)) > + : wi::eq_p (bump, wi::sext (bump, prec))) Not sure, but would it be fixed in a similar way when writing @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); - /* It is highly unlikely, but possible, that the resulting - bump doesn't fit in a HWI. Abandon the replacement - in this case. This does not affect siblings or dependents - of C. Restriction to signed HWI is conservative for unsigned - types but allows for safe negation without twisted logic. */ - if (wi::fits_shwi_p (bump) - && bump.to_shwi () != HOST_WIDE_INT_MIN - /* It is not useful to replace casts, copies, negates, or adds of -an SSA name and a constant. */ - && cand_code != SSA_NAME + /* It is not useful to replace casts, copies, negates, or adds of + an SSA name and a constant. */ + if (cand_code != SSA_NAME && !CONVERT_EXPR_CODE_P (cand_code) && cand_code != PLUS_EXPR && cand_code != POINTER_PLUS_EXPR @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t tree bump_tree; gimple *stmt_to_print = NULL; - /* If the basis name and the candidate's LHS have incompatible -types, introduce a cast. */ - if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) - basis_name = introduce_cast_before_cand (c, target_type, basis_name); if (wi::neg_p (bump)) { code = MINUS_EXPR; bump = -bump; } + /* It is possible that the resulting bump doesn't fit in target_type. +Abandon the replacement in this case. This does not affect +siblings or dependents of C. */ + if (bump != wi::ext (bump, TYPE_PRECISION (target_type), + TYPE_SIGN (target_type))) + return; bump_tree = wide_int_to_tree (target_type, bump); + /* If the basis name and the candidate's LHS have incompatible +types, introduce a cast. */ + if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name))) + basis_name = introduce_cast_before_cand (c, target_type, basis_name); + if (dump_file && (dump_flags & TDF_DETAILS)) { fputs ("Replacing: ", dump_file); ? >/* It is not useful to replace casts, copies, negates, or adds of > an SSA name and a constant. */ >&& cand_code !=
Re: [RFC] [PATCH] Introduce configure flag --with-stage1-cflags.
On Fri, Aug 25, 2017 at 9:51 PM, Jeff Lawwrote: > On 07/31/2017 01:47 AM, Martin Liška wrote: >> I would like to ping this. Input from other people will be appreciated ;) > I think the thing to keep in mind here is that IIUC this only affects > things when we've configured using the --with-stage1-cflags option. > > So questions about is -O1 more stable than -O2, should we restrict -O2 > to newer compilers, etc are really more about the defaults we set. > > My understanding is the patch is just adding the capability and does not > change the default. Assuming that's the case, then I'm comfortable > acking the raw infrastructure. OTOH you can simply set STAGE1_CFLAGS so the value of this as a configure option is somewhat questionable. > jeff
Re: Statement Frontier Notes, Location Views, and Inlined Entry Point Markers
On Fri, Aug 25, 2017 at 4:26 PM, Alexandre Olivawrote: > On Aug 23, 2017, Richard Biener wrote: > if they are not a problem up until here why care now? > >>> IIRC we do have a limit for VTA notes too, but there's a C++ testcase >>> (g++.dg/tree-ssa/pr14703.C) that expands and inlines fibonacci template >>> functions so deep, more than doubling the number of statements at all >>> but the base recursion levels, so we'd end up with over 2^{85+} debug >>> stmts if we didn't cut them off somehow. > >> Yeah, but I meant we've kept them throughout GIMPLE (for all functions!) >> but are dropping them here at RTL expansion (which we'll have only a >> single live RTL function at a time). That looks odd ;) > > Aah, yeah, the point is, if we find we exceeded the limit, we don't > bother to clean up the gimple, we just refrain from wasting further time > with it, which we would if we converted them to RTL (and then threw them > away), or copied them all when inlining into some other function. We > could clean up at some point, just as we could stop emitting further > markers once the limit is reached, but it didn't seem important enough > to do so. Should it prove to be, I guess it wouldn't be too hard to add > it to gimple verification passes that walk over all stmts. > >> You're already dropping them at inlining as well so the RTL expansion >> check should be superfluous IMHO (yeah, unrolling might push it over >> the edge for example but all real issues should come from inlining). > > The RTL expansion check is indeed not essential, but if we're over the > limit, we'll to throw it all away, so why bother expanding it and > carrying it through all RTL passes just to throw away at the end? Or > should we not throw it away in this case, and make the limit apply only > to inlining? But then, what if we inline lots of very large functions > into a single one, do we still want to use markers for that function? > That's not how I designed it, but I guess it might work that way too. Hmm, all existing markers should be valid, no? So throwing them away at RTL expansion time will only make the function "consistent" but drop still useful stuff? > >> Hmm, yeah. I guess we'd have to have a multi-DEBUG_STMT that covers >> not only multiple markers but also multiple binds. High GIMPLE has >> nested stmts so it might be tempting to wrap adjacent debug-stmts into >> a single one (basically make the IL walking overhead with debug stmts >> smaller). >> Costs extra memory instead of less when compared to my idea of course. > > Yeah. I guess that's doable and it won't make gimple passes much > trickier: in most cases all that matters are the SSA uses in bind value > expressions, so as long as the update function can efficiently pick the > SSA uses from the op array, it could be a significant win. > > We may need some way to reset one specific bind given a use that is no > longer valid, which I don't immediately see how to implement efficiently > in a multi-debug pack . > > Now, I spent some time trying to think of how to pack multiple debug > stmts in a way that made them also save memory. > > For each packed stmt, we need at least one bit to indicate whether it's > a bind or just a marker. Markers then need a locus, and another bit > indicating whether it's a begin stmt marker or an inline entry point > marker. Debug bind stmts need one bit to indicate tell src binds from > regular ones, and two trees (no locus). > > It is unlikely that it would make sense to allocate extra memory, be it > trees holding integral values, be it other arrays to hold them. I'm > thinking we'd be better off storing some of these bits in an analogous > of the trailing op VLA, that would be present in gdebug but that would > deal with GGC and ssa updates in its own way. > > For packs with few stmts, we could use bits from the subcode to indicate > the count and the kinds. We could use the gimple locus for the first > marker, and then perhaps pack pairs of loci in tree pointer operands (if > their sizes are 1:2, as in lp64). > > When packing more than few stmts, we could then define a format for a > 32-bit word to hold the bits for an additional set of stmts, possibly > packed in the same word as a locus or another such bit pack. Ideally, > should we need more than one of these, we should indicate upfront how > many of these there are, or at least how many ops are used. > > I was thinking it would be ideal if combining two many-stmts debug stmts > could require little more than allocating a gimple with a larger ops > array and copying (most of) the original op arrays to the right places. > > But... this all feels far too hackish and not very maintainable or > forward-looking. E.g., if we add more kinds of debug stmts, the bit > counting suddenly no longer applies, and needs to be reworked. > > So I guess that's also doable, and would save some memory indeed, > but... do you think it's
[PING][PATCH 2/3] retire mem_signal_fence pattern
Ping (for this and patch 3/3 in the thread). On Wed, 2 Aug 2017, Alexander Monakov wrote: > Similar to mem_thread_fence issue from the patch 1/3, RTL representation of > __atomic_signal_fence must be a compiler barrier. We have just one backend > offering this pattern, and it does not place a compiler barrier. > > It does not appear useful to expand signal_fence to some kind of hardware > instruction, its documentation is wrong/outdated, and we are using it > incorrectly anyway. So just remove the whole thing and simply emit a compiler > memory barrier in the optabs.c handler. > > * config/s390/s390.md (mem_signal_fence): Remove. > * doc/md.texi (mem_signal_fence): Remove. > * optabs.c (expand_mem_signal_fence): Remove uses of mem_signal_fence. > Update comments. > * target-insns.def (mem_signal_fence): Remove. > > --- > gcc/config/s390/s390.md | 9 - > gcc/doc/md.texi | 13 - > gcc/optabs.c| 17 + > gcc/target-insns.def| 1 - > 4 files changed, 5 insertions(+), 35 deletions(-) > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index d1ac0b8395d..1d63523d3b0 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -10084,15 +10084,6 @@ (define_insn "*basr_tls" > ; memory barrier patterns. > ; > > -(define_expand "mem_signal_fence" > - [(match_operand:SI 0 "const_int_operand")] ;; model > - "" > -{ > - /* The s390 memory model is strong enough not to require any > - barrier in order to synchronize a thread with itself. */ > - DONE; > -}) > - > (define_expand "mem_thread_fence" >[(match_operand:SI 0 "const_int_operand")] ;; model >"" > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 0be161a08dc..73d0e7d83c0 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -7058,19 +7058,6 @@ If this pattern is not specified, all memory models > except > @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize} > barrier pattern. > > -@cindex @code{mem_signal_fence@var{mode}} instruction pattern > -@item @samp{mem_signal_fence@var{mode}} > -This pattern emits code required to implement a signal fence with > -memory model semantics. Operand 0 is the memory model to be used. > - > -This pattern should impact the compiler optimizers the same way that > -mem_signal_fence does, but it does not need to issue any barrier > -instructions. > - > -If this pattern is not specified, all memory models except > -@code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize} > -barrier pattern. > - > @cindex @code{get_thread_pointer@var{mode}} instruction pattern > @cindex @code{set_thread_pointer@var{mode}} instruction pattern > @item @samp{get_thread_pointer@var{mode}} > diff --git a/gcc/optabs.c b/gcc/optabs.c > index d568ca376fe..6884fd4b8aa 100644 > --- a/gcc/optabs.c > +++ b/gcc/optabs.c > @@ -6315,22 +6315,15 @@ expand_mem_thread_fence (enum memmodel model) > } > } > > -/* This routine will either emit the mem_signal_fence pattern or issue a > - sync_synchronize to generate a fence for memory model MEMMODEL. */ > +/* Emit a signal fence with given memory model. */ > > void > expand_mem_signal_fence (enum memmodel model) > { > - if (targetm.have_mem_signal_fence ()) > -emit_insn (targetm.gen_mem_signal_fence (GEN_INT (model))); > - else if (!is_mm_relaxed (model)) > -{ > - /* By default targets are coherent between a thread and the signal > - handler running on the same thread. Thus this really becomes a > - compiler barrier, in that stores must not be sunk past > - (or raised above) a given point. */ > - expand_asm_memory_barrier (); > -} > + /* No machine barrier is required to implement a signal fence, but > + a compiler memory barrier must be issued, except for relaxed MM. */ > + if (!is_mm_relaxed (model)) > +expand_asm_memory_barrier (); > } > > /* This function expands the atomic load operation: > diff --git a/gcc/target-insns.def b/gcc/target-insns.def > index fb92f72ac29..4669439c7e1 100644 > --- a/gcc/target-insns.def > +++ b/gcc/target-insns.def > @@ -58,7 +58,6 @@ DEF_TARGET_INSN (indirect_jump, (rtx x0)) > DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3)) > DEF_TARGET_INSN (jump, (rtx x0)) > DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2)) > -DEF_TARGET_INSN (mem_signal_fence, (rtx x0)) > DEF_TARGET_INSN (mem_thread_fence, (rtx x0)) > DEF_TARGET_INSN (memory_barrier, (void)) > DEF_TARGET_INSN (movstr, (rtx x0, rtx x1, rtx x2)) >
[PATCH] Improve on PR81968
The following avoids invalid flag combinations on removed sections and fixes a word-size bug I noticed when doing so. LTO bootstrap / testing in progress on x86_64-unknown-linux-gnu. Richard. 2017-08-28 Richard BienerPR lto/81968 * simple-object-elf.c (simple_object_elf_copy_lto_debug_section): Adjust field with for sh_type write, set SHF_EXCLUDE only for removed sections. Index: libiberty/simple-object-elf.c === --- libiberty/simple-object-elf.c (revision 251377) +++ libiberty/simple-object-elf.c (working copy) @@ -1382,7 +1382,7 @@ simple_object_elf_copy_lto_debug_section unused. That allows the link editor to remove it in a partial link. */ ELF_SET_FIELD (type_functions, ei_class, Shdr, -shdr, sh_type, Elf_Addr, SHT_NULL); +shdr, sh_type, Elf_Word, SHT_NULL); } flags = ELF_FETCH_FIELD (type_functions, ei_class, Shdr, @@ -1390,7 +1390,7 @@ simple_object_elf_copy_lto_debug_section if (ret == 0) flags &= ~SHF_EXCLUDE; else if (ret == -1) - flags |= SHF_EXCLUDE; + flags = SHF_EXCLUDE; ELF_SET_FIELD (type_functions, ei_class, Shdr, shdr, sh_flags, Elf_Addr, flags); }
[PATCH] Fix PR81993
I am testing the following to cure -gsplit-dwarf with early type pruning. It looks like we have zero testsuite coverage for -gsplit-dwarf (or my grep skills are broken). No testcase, g++.dg/[dwarf2/] isn't set up to do link tests. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2017-08-28 Richard BienerPR debug/81993 * dwarf2out.c (gen_remaining_tmpl_value_param_die_attributes): Do nothing for removed DIEs. Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 251374) +++ gcc/dwarf2out.c (working copy) @@ -26037,7 +26037,8 @@ gen_remaining_tmpl_value_param_die_attri j = 0; FOR_EACH_VEC_ELT (*tmpl_value_parm_die_table, i, e) { - if (!tree_add_const_value_attribute (e->die, e->arg)) + if (!e->die->removed + && !tree_add_const_value_attribute (e->die, e->arg)) { dw_loc_descr_ref loc = NULL; if (! early_dwarf
[PATCH] Fix PR81977
The following fixes a failure to account for lhs_offset (or rather to consistently use the same offset in op0 and off). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2017-08-28 Richard BienerPR tree-optimization/81977 * tree-ssa-sccvn.c (vn_reference_lookup_3): Fix look through memcpy. * g++.dg/torture/pr81977.C: New testcase. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c(revision 251377) +++ gcc/tree-ssa-sccvn.c(working copy) @@ -2334,7 +2334,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree memset (, 0, sizeof (op)); op.type = vr->type; op.opcode = MEM_REF; - op.op0 = build_int_cst (ptr_type_node, at - rhs_offset); + op.op0 = build_int_cst (ptr_type_node, at - lhs_offset + rhs_offset); op.off = at - lhs_offset + rhs_offset; vr->operands[0] = op; op.type = TREE_TYPE (rhs); Index: gcc/testsuite/g++.dg/torture/pr81977.C === --- gcc/testsuite/g++.dg/torture/pr81977.C (nonexistent) +++ gcc/testsuite/g++.dg/torture/pr81977.C (working copy) @@ -0,0 +1,55 @@ +/* { dg-do run } */ +/* { dg-require-effective-target int32plus } */ + +#include + +typedef struct +{ + uint16_t x ; + uint16_t y ; + uint64_t z ; +} __attribute__((packed, aligned(1))) TestMsgType; + +struct Payload +{ + uint16_t header_info[2]; + TestMsgType _pref; + void Pack(uint8_t *buffer) +{ + __builtin_memcpy(buffer, &_pref, sizeof(_pref)); +} + void UnPack(uint8_t *buffer) +{ + __builtin_memcpy(&_pref, buffer, sizeof(_pref)); +} +}; + + +struct Msg +{ + Payload _payload; + void Pack(uint8_t *buffer) +{ + _payload.Pack(buffer); +} + + void UnPack(uint8_t *buffer) +{ + _payload.UnPack(buffer); +} +}; + +int main() +{ + uint8_t * buffer = new uint8_t [30]; + Msg msg; + Msg msg1; + msg._payload._pref.x = 0xabcd; + msg._payload._pref.y = 0xa; + msg._payload._pref.z = 0x0001020304051617; + msg.Pack([0]); + msg1.UnPack([0]); + if (msg1._payload._pref.x != 0xabcd) +__builtin_abort (); + delete [] buffer; +}
Turn FUNCTION_ARG_PADDING into a target hook
This involved renaming the rather general-sounding "enum direction" to "enum pad_direction" to avoid a conflict with the Fortran frontend. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by checking that there were no extra warnings or changes in testsuite assembly output for at least one target per CPU. OK to install? Richard 2017-08-28 Richard SandifordAlan Hayward David Sherwood gcc/ * coretypes.h (pad_direction): New enum. * defaults.h (DEFAULT_FUNCTION_ARG_PADDING): Delete. (FUNCTION_ARG_PADDING): Likewise. * target.def (function_arg_padding): New hook. * targhooks.h (default_function_arg_padding): Declare. * targhooks.c (default_function_arg_padding): New function. * doc/tm.texi.in (FUNCTION_ARG_PADDING): Replace with... (TARGET_FUNCTION_ARG_PADDING): ...this. * doc/tm.texi: Regenerate. * calls.c (store_unaligned_arguments_into_pseudos): Use pad_direction instead of direction. (compute_argument_addresses): Likewise. (load_register_parameters): Likewise. (emit_library_call_value_1): Likewise. (store_one_arg): Use targetm.calls.function_arg_padding instead of FUNCTION_ARG_PADDING. (must_pass_in_stack_var_size_or_pad): Likewise. * expr.c (emit_group_load_1): Use pad_direction instead of direction. (emit_group_store): Likewise. (emit_single_push_insn_1): Use targetm.calls.function_arg_padding instead of FUNCTION_ARG_PADDING. (emit_push_insn): Likewise, and propagate enum change throughout function. * function.h (direction): Delete. (locate_and_pad_arg_data::where_pad): Use pad_direction instead of direction. * function.c (assign_parm_find_stack_rtl): Likewise. (assign_parm_setup_block_p): Likewise. (assign_parm_setup_block): Likewise. (gimplify_parameters): Likewise. (locate_and_pad_parm): Use targetm.calls.function_arg_padding instead of FUNCTION_ARG_PADDING, and propagate enum change throughout function. * config/aarch64/aarch64.h (FUNCTION_ARG_PADDING): Delete. (BLOCK_REG_PADDING): Use pad_direction instead of direction. * config/aarch64/aarch64-protos.h (aarch64_pad_arg_upward): Delete. * config/aarch64/aarch64.c (aarch64_pad_arg_upward): Replace with... (aarch64_function_arg_padding): ...this new function. (aarch64_gimplify_va_arg_expr): Use pad_direction instead of direction. (TARGET_FUNCTION_ARG_PADDING): Redefine. * config/arm/arm.h (FUNCTION_ARG_PADDING): Delete. (BLOCK_REG_PADDING): Use pad_direction instead of direction. * config/arm/arm-protos.h (arm_pad_arg_upward): Delete. * config/arm/arm.c (TARGET_FUNCTION_ARG_PADDING): Redefine. (arm_pad_arg_upward): Replace with... (arm_function_arg_padding): ...this new function. * config/c6x/c6x.h (BLOCK_REG_PADDING): Use pad_direction instead of direction. * config/ia64/hpux.h (FUNCTION_ARG_PADDING): Delete. * config/ia64/ia64-protos.h (ia64_hpux_function_arg_padding): Delete. * config/ia64/ia64.c (TARGET_FUNCTION_ARG_PADDING): Redefine. (ia64_hpux_function_arg_padding): Replace with... (ia64_function_arg_padding): ...this new function. Use pad_direction instead of direction. Check for TARGET_HPUX. * config/iq2000/iq2000.h (FUNCTION_ARG_PADDING): Delete. * config/iq2000/iq2000.c (TARGET_FUNCTION_ARG_PADDING): Redefine. (iq2000_function_arg_padding): New function. * config/mips/mips-protos.h (mips_pad_arg_upward): Delete. * config/mips/mips.c (mips_pad_arg_upward): Replace with... (mips_function_arg_padding): ...this new function. (mips_pad_reg_upward): Update accordingly. (TARGET_FUNCTION_ARG_PADDING): Redefine. * config/mips/mips.h (PAD_VARARGS_DOWN): Use targetm.calls.function_arg_padding. (FUNCTION_ARG_PADDING): Delete. (BLOCK_REG_PADDING): Use pad_direction instead of direction. * config/nios2/nios2.h (FUNCTION_ARG_PADDING): Delete. (PAD_VARARGS_DOWN): Use targetm.calls.function_arg_padding. * config/nios2/nios2-protos.h (nios2_function_arg_padding): Delete. (nios2_block_reg_padding): Return pad_direction instead of direction. * config/nios2/nios2.c (nios2_block_reg_padding): Return pad_direction instead of direction. (nios2_function_arg_padding): Likewise. Make static. (TARGET_FUNCTION_ARG_PADDING): Redefine. * config/pa/pa.h (FUNCTION_ARG_PADDING): Delete. (BLOCK_REG_PADDING): Use targetm.calls.function_arg_padding. * config/pa/pa-protos.h (pa_function_arg_padding): Delete. *
Turn MODES_TIEABLE_P into a target hook
The lack of function comments in msp430.c and rl78.c is deliberate; the local style there is to define the hook macro immediately before the function as a form of documentation. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by checking that there were no extra warnings or changes in testsuite assembly output for at least one target per CPU. OK to install? Richard 2017-08-27 Richard SandifordAlan Hayward David Sherwood gcc/ * target.def (modes_tieable_p): New hook. * doc/tm.texi (MODES_TIEABLE_P): Replace with... (TARGET_MODES_TIEABLE_P): ...this. * doc/tm.texi.in: Regenerate. * hooks.h (hook_bool_mode_mode_true): Declare. * hooks.c (hook_bool_mode_mode_true): New function. * combine.c (subst): Use targetm.modes_tieable_p instead of MODES_TIEABLE_P. * dse.c (find_shift_sequence): Likewise. * expmed.c (extract_low_bits): Likewise. * lower-subreg.c: Include target.h. (find_decomposable_subregs): Use targetm.modes_tieable_p instead of MODES_TIEABLE_P. * rtlanal.c (rtx_cost): Likewise. * config/aarch64/aarch64.h (MODES_TIEABLE_P): Delete. * config/aarch64/aarch64-protos.h (aarch64_modes_tieable_p): Delete. * config/aarch64/aarch64.c (aarch64_modes_tieable_p): Make static. (TARGET_MODES_TIEABLE_P): Redefine. * config/alpha/alpha.h (MODES_TIEABLE_P): Delete. * config/alpha/alpha.c (alpha_modes_tieable_p): New function. (TARGET_MODES_TIEABLE_P): Redefine. * config/arc/arc.h (MODES_TIEABLE_P): Delete. * config/arc/arc.c (TARGET_MODES_TIEABLE_P): Redefine. (arc_modes_tieable_p): New function. * config/arm/arm.h (MODES_TIEABLE_P): Delete. * config/arm/arm-protos.h (arm_modes_tieable_p): Delete. * config/arm/arm.c (TARGET_MODES_TIEABLE_P): Redefine. (arm_modes_tieable_p): Make static. * config/avr/avr.h (MODES_TIEABLE_P): Delete. * config/bfin/bfin.h (MODES_TIEABLE_P): Delete. * config/bfin/bfin.c (bfin_modes_tieable_p): New function. (TARGET_MODES_TIEABLE_P): Redefine. * config/c6x/c6x.h (MODES_TIEABLE_P): Delete. * config/c6x/c6x.c (c6x_modes_tieable_p): New function. (TARGET_MODES_TIEABLE_P): Redefine. * config/cr16/cr16.h (MODES_TIEABLE_P): Delete. * config/cr16/cr16.c (TARGET_MODES_TIEABLE_P): Redefine. (cr16_modes_tieable_p): New function. * config/cris/cris.h (MODES_TIEABLE_P): Delete. * config/epiphany/epiphany.h (MODES_TIEABLE_P): Delete. * config/fr30/fr30.h (MODES_TIEABLE_P): Delete. (TRULY_NOOP_TRUNCATION): Update comment. * config/frv/frv.h (MODES_TIEABLE_P): Delete. (TRULY_NOOP_TRUNCATION): Update comment. * config/frv/frv.c (TARGET_MODES_TIEABLE_P): Redefine. (frv_modes_tieable_p): New function. * config/ft32/ft32.h (MODES_TIEABLE_P): Delete. * config/h8300/h8300.h (MODES_TIEABLE_P): Delete. * config/h8300/h8300.c (h8300_modes_tieable_p): New function. (TARGET_MODES_TIEABLE_P): Redefine. * config/i386/i386.h (MODES_TIEABLE_P): Delete. * config/i386/i386-protos.h (ix86_modes_tieable_p): Delete. * config/i386/i386.c (ix86_modes_tieable_p): Make static. (TARGET_MODES_TIEABLE_P): Redefine. * config/ia64/ia64.h (MODES_TIEABLE_P): Delete. * config/ia64/ia64.c (TARGET_MODES_TIEABLE_P): Redefine. (ia64_modes_tieable_p): New function. * config/iq2000/iq2000.h (MODES_TIEABLE_P): Delete. * config/iq2000/iq2000.c (TARGET_MODES_TIEABLE_P): Redefine. (iq2000_modes_tieable_p): New function. * config/lm32/lm32.h (MODES_TIEABLE_P): Delete. * config/lm32/lm32.c (TARGET_MODES_TIEABLE_P): Redefine. (lm32_modes_tieable_p): New function. * config/m32c/m32c.h (MODES_TIEABLE_P): Delete. * config/m32c/m32c-protos.h (m32c_modes_tieable_p): Delete. * config/m32c/m32c.c (m32c_modes_tieable_p): Make static. (TARGET_MODES_TIEABLE_P): Redefine. * config/m32r/m32r.h (MODES_TIEABLE_P): Delete. * config/m32r/m32r.c (TARGET_MODES_TIEABLE_P): Redefine. (m32r_modes_tieable_p): New function. * config/m68k/m68k.h (MODES_TIEABLE_P): Delete. * config/m68k/m68k.c (TARGET_MODES_TIEABLE_P): Redefine. (m68k_modes_tieable_p): New function. * config/mcore/mcore.h (MODES_TIEABLE_P): Delete. * config/mcore/mcore.c (TARGET_MODES_TIEABLE_P): Redefine. (mcore_modes_tieable_p): New function. * config/microblaze/microblaze.h (MODES_TIEABLE_P): Delete. * config/microblaze/microblaze.c (microblaze_modes_tieable_p): New function. (TARGET_MODES_TIEABLE_P): Redefine. *
RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations
On Mon, 28 Aug 2017, Jon Beniston wrote: > Hi Richard, > > >- if (nunits < 1) /* Support V1SI. */ > >+ if (nunits < 1 || (nunits == 1 && !reduct_p)) > > return NULL_TREE; > > > >doesn't seem to be against trunk which has > > > > if (nunits <= 1) > >return NULL_TREE; > > > >so what happens if you just change that to > > > > if (nunits < 1) > >return NULL_TREE; > > Ah, that's what I first tried, and mistakenly sent the patch against that. > > That did help the initial problem, but then I needed to add a lot of support > for the V1SI type in the backend (which just duplicated SI patterns) and > there are a few places where GCC seems to have sanity checks against vectors > that are only one element wide. A dot product producing a scalar result > seems natural. Where do you need V1SI types? The vectorizer should perform a SI extract from V1SI here. You need to elaborate a bit here. > > Also, as well as ARC benefitting from this, I think the TI c6x port would > too. That currently has a sdot_prodv2hi pattern that uses SI and V2HI types. The vectorizer doesn't really support vector->scalar ops so V1SI feels more natural here. That is, your patch looks a bit ugly -- there's nothing wrong with V1SI vector types. So maybe instead of adjusting that function the respective caller when facing a lane-reducing op such as DOT_PROD needs to consider scalar result types. Richard. > Cheers, > Jon > > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [WIP] Possible Bug in vect_bb_slp_scalar_cost?
Hi, As discussed on IRC: This patches fixes the calculation of the scalar costs of SLP vectorization. I’ve added a test case and the auto_vec allocation is now reused for all children of a node. diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c new file mode 100644 index 000..5121a88 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-slp-details" } */ + +#define N 4 + +int s1[N], s2[N], s3[N]; +void escape(int, int, int, int); + +void +foo () +{ + int t1, t2, t3, t4; + + t1 = s1[0] + s2[0] + s3[0]; + t2 = s1[1] + s2[1] + s3[1]; + t3 = s1[2] + s2[2] + s3[2]; + t4 = s1[3] + s2[3] + s3[3]; + + s3[0] = t1 - s1[0] * s2[0]; + s3[1] = t2 - s1[1] * s2[1]; + s3[2] = t3 - s1[2] * s2[2]; + s3[3] = t4 - s1[3] * s2[3]; + + escape (t1, t2, t3, t4); +} + +/* { dg-final { scan-tree-dump-not "vectorization is not profitable" "slp2" } } */ +/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */ diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 04ecaab..cffd19b 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -2690,9 +2690,17 @@ vect_bb_slp_scalar_cost (basic_block bb, scalar_cost += stmt_cost; } + auto_vecsubtree_life; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) - scalar_cost += vect_bb_slp_scalar_cost (bb, child, life); + { + /* Do not directly pass LIFE to the recursive call, copy it to + confine changes in the callee to the current child/subtree. */ + subtree_life.safe_splice (*life); + scalar_cost += vect_bb_slp_scalar_cost (bb, child, _life); + subtree_life.truncate (0); + } return scalar_cost; } > On 04 Aug 2017, at 12:19, Richard Biener wrote: > > On Fri, Aug 4, 2017 at 12:08 PM, Dominik Inführ > wrote: >> Hi, >> >> vect_bb_slp_scalar_cost computes the scalar cost of a SLP node. If there are >> non-scalar uses of a definition, the costs for it and its operands >> (children) are ignored. The vector LIFE is used to keep track of this and an >> element is set to true, such that the def and its children are ignored. But >> as soon as an element is set to true it is never undone, that means the >> following sibling and parent nodes of the current node also stay ignored. > > Yes, that's intended. They are live as well because they are needed > to compute the live scalar. > > This seems wrong to me, a simple fix would be to clone LIFE for every > vector, such that changes to LIFE stay in the subtree: >> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c >> index 032a9444a5a..f919645f28b 100644 >> --- a/gcc/tree-vect-slp.c >> +++ b/gcc/tree-vect-slp.c >> @@ -2590,6 +2590,11 @@ vect_bb_slp_scalar_cost (basic_block bb, >> gimple *stmt; >> slp_tree child; >> >> + auto_vec subtree_life; >> + subtree_life.safe_splice (*life); >> + >> + life = _life; >> + >> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt) >> { >> unsigned stmt_cost; >> >> signature.asc Description: Message signed with OpenPGP using GPGMail
Turn HARD_REGNO_CALL_PART_CLOBBERED into a target hook
The SVE patches change the size of a machine_mode from a compile-time constant to a runtime invariant. However, target-specific code can continue to treat the modes as constant-sized if the target only has constant-sized modes. The main snag with this approach is that target-independent code still uses macros from the target .h file. This patch is one of several that converts a target macro to a hook. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by checking that there were no extra warnings or changes in testsuite assembly output for at least one target per CPU. OK to install? Richard 2017-08-28 Richard SandifordAlan Hayward David Sherwood gcc/ * target.def (hard_regno_call_part_clobbered): New hook. * doc/tm.texi.in (HARD_REGNO_CALL_PART_CLOBBERED): Replace with... (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): ...this hook. * doc/tm.texi: Regenerate. * hooks.h (hook_bool_uint_mode_false): Declare. * hooks.c (hook_bool_uint_mode_false): New function. * regs.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * cselib.c (cselib_process_insn): Use targetm.hard_regno_call_part_clobbered instead of HARD_REGNO_CALL_PART_CLOBBERED. * ira-conflicts.c (ira_build_conflicts): Likewise. * ira-costs.c (ira_tune_allocno_costs): Likewise. * lra-constraints.c (need_for_call_save_p): Likewise. * lra-lives.c: Include target.h. (check_pseudos_live_through_calls): Use targetm.hard_regno_call_part_clobbered instead of HARD_REGNO_CALL_PART_CLOBBERED. * regcprop.c: Include target.h. (copyprop_hardreg_forward_1): Use targetm.hard_regno_call_part_clobbered instead of HARD_REGNO_CALL_PART_CLOBBERED. * reginfo.c (choose_hard_reg_mode): Likewise. * regrename.c (check_new_reg_p): Likewise. * reload.c (find_equiv_reg): Likewise. * reload1.c (emit_reload_insns): Likewise. * sched-deps.c (deps_analyze_insn): Likewise. * sel-sched.c (init_regs_for_mode): Likewise. (mark_unavailable_hard_regs): Likewise. * targhooks.c (default_dwarf_frame_reg_mode): Likewise. * config/aarch64/aarch64.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/aarch64/aarch64.c (aarch64_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/avr/avr-protos.h (avr_hard_regno_call_part_clobbered): Delete. * config/avr/avr.c (avr_hard_regno_call_part_clobbered): Make static and return a bool. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/i386/i386.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/i386/i386.c (ix86_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/mips/mips.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/mips/mips.c (mips_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/powerpcspe/powerpcspe.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/powerpcspe/powerpcspe.c (rs6000_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/rs6000/rs6000.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/s390/s390.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * config/s390/s390.c (s390_hard_regno_call_part_clobbered): New function. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine. * config/sh/sh.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete. * system.h (HARD_REGNO_CALL_PART_CLOBBERED): Poison. Index: gcc/target.def === --- gcc/target.def 2017-08-28 09:36:09.610827107 +0100 +++ gcc/target.def 2017-08-28 09:36:10.075827140 +0100 @@ -5395,6 +5395,19 @@ The default version of this hook always bool, (unsigned int regno), default_hard_regno_scratch_ok) +DEFHOOK +(hard_regno_call_part_clobbered, + "This hook should return true if @var{regno} is partly call-saved and\n\ +partly call-clobbered, and if a value of mode @var{mode} would be partly\n\ +clobbered by a call. For example, if the low 32 bits of @var{regno} are\n\ +preserved across a call but higher bits are clobbered, this hook should\n\ +return true for a 64-bit mode but false for a 32-bit mode.\n\ +\n\ +The default implementation returns false, which is correct\n\ +for targets
Re: [Patch, Fortran] PR 81770: [5/6/7 Regression] Bogus warning: Pointer in pointer assignment might outlive the pointer target
Hi Janus, the attached patch fixes a bogus warning. The purpose of the warning is to detect cases where a pointer lives longer than its target. If the target itself is (1) a pointer or (2) a component of a DT pointer, we do not know about the lifetime of the target at compile time and no warning should be thrown. The existing check only handles case (1) and my patch adds the handling of case (2). Regtestes cleanly on x86_64-linux-gnu. Ok for trunk and the release branches? OK, and thanks for the patch! Regards Thomas
RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations
Hi Richard, >- if (nunits < 1) /* Support V1SI. */ >+ if (nunits < 1 || (nunits == 1 && !reduct_p)) > return NULL_TREE; > >doesn't seem to be against trunk which has > > if (nunits <= 1) >return NULL_TREE; > >so what happens if you just change that to > > if (nunits < 1) >return NULL_TREE; Ah, that's what I first tried, and mistakenly sent the patch against that. That did help the initial problem, but then I needed to add a lot of support for the V1SI type in the backend (which just duplicated SI patterns) and there are a few places where GCC seems to have sanity checks against vectors that are only one element wide. A dot product producing a scalar result seems natural. Also, as well as ARC benefitting from this, I think the TI c6x port would too. That currently has a sdot_prodv2hi pattern that uses SI and V2HI types. Cheers, Jon
Add wider_subreg_mode helper functions
This patch adds helper functions that say which of the two modes involved in a subreg is the larger, preferring the outer mode in the event of a tie. It also converts IRA and reload to track modes instead of byte sizes, since this is slightly more convenient when variable-sized modes are added later. Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking there was no change in the testsuite assembly output for at least one target per CPU. OK to install? Richard 2017-08-28 Richard SandifordAlan Hayward David Sherwood gcc/ * rtl.h (wider_subreg_mode): New function. * ira.h (ira_sort_regnos_for_alter_reg): Take a machine_mode * rather than an unsigned int *. * ira-color.c (regno_max_ref_width): Replace with... (regno_max_ref_mode): ...this new variable. (coalesced_pseudo_reg_slot_compare): Update accordingly. Use wider_subreg_mode. (ira_sort_regnos_for_alter_reg): Likewise. Take a machine_mode * rather than an unsigned int *. * lra-constraints.c (uses_hard_regs_p): Use wider_subreg_mode. (process_alt_operands): Likewise. (invariant_p): Likewise. * lra-spills.c (assign_mem_slot): Likewise. (add_pseudo_to_slot): Likewise. * lra.c (collect_non_operand_hard_regs): Likewise. (add_regs_to_insn_regno_info): Likewise. * reload1.c (regno_max_ref_width): Replace with... (regno_max_ref_mode): ...this new variable. (reload): Update accordingly. Update call to ira_sort_regnos_for_alter_reg. (alter_reg): Update to use regno_max_ref_mode. Call wider_subreg_mode. (init_eliminable_invariants): Update to use regno_max_ref_mode. (scan_paradoxical_subregs): Likewise. Index: gcc/rtl.h === --- gcc/rtl.h 2017-08-28 09:23:51.181220876 +0100 +++ gcc/rtl.h 2017-08-28 09:23:56.230220860 +0100 @@ -2840,6 +2840,24 @@ subreg_lowpart_offset (machine_mode oute GET_MODE_SIZE (innermode)); } +/* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE, + return the mode that is big enough to hold both the outer and inner + values. Prefer the outer mode in the event of a tie. */ + +inline machine_mode +wider_subreg_mode (machine_mode outermode, machine_mode innermode) +{ + return partial_subreg_p (outermode, innermode) ? innermode : outermode; +} + +/* Likewise for subreg X. */ + +inline machine_mode +wider_subreg_mode (const_rtx x) +{ + return wider_subreg_mode (GET_MODE (x), GET_MODE (SUBREG_REG (x))); +} + extern unsigned int subreg_size_highpart_offset (unsigned int, unsigned int); /* Return the SUBREG_BYTE for an OUTERMODE highpart of an INNERMODE value. */ Index: gcc/ira.h === --- gcc/ira.h 2017-08-28 09:22:51.676686496 +0100 +++ gcc/ira.h 2017-08-28 09:23:56.227520860 +0100 @@ -195,7 +195,7 @@ extern void ira_set_pseudo_classes (bool extern void ira_expand_reg_equiv (void); extern void ira_update_equiv_info_by_shuffle_insn (int, int, rtx_insn *); -extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *); +extern void ira_sort_regnos_for_alter_reg (int *, int, machine_mode *); extern void ira_mark_allocation_change (int); extern void ira_mark_memory_move_deletion (int, int); extern bool ira_reassign_pseudos (int *, int, HARD_REG_SET, HARD_REG_SET *, Index: gcc/ira-color.c === --- gcc/ira-color.c 2017-08-28 09:22:51.676686496 +0100 +++ gcc/ira-color.c 2017-08-28 09:23:56.226620860 +0100 @@ -3908,7 +3908,7 @@ coalesced_pseudo_reg_freq_compare (const /* Widest width in which each pseudo reg is referred to (via subreg). It is used for sorting pseudo registers. */ -static unsigned int *regno_max_ref_width; +static machine_mode *regno_max_ref_mode; /* Sort pseudos according their slot numbers (putting ones with smaller numbers first, or last when the frame pointer is not @@ -3921,7 +3921,7 @@ coalesced_pseudo_reg_slot_compare (const ira_allocno_t a1 = ira_regno_allocno_map[regno1]; ira_allocno_t a2 = ira_regno_allocno_map[regno2]; int diff, slot_num1, slot_num2; - int total_size1, total_size2; + machine_mode mode1, mode2; if (a1 == NULL || ALLOCNO_HARD_REGNO (a1) >= 0) { @@ -3936,11 +3936,11 @@ coalesced_pseudo_reg_slot_compare (const if ((diff = slot_num1 - slot_num2) != 0) return (frame_pointer_needed || (!FRAME_GROWS_DOWNWARD) == STACK_GROWS_DOWNWARD ? diff : -diff); - total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1), -regno_max_ref_width[regno1]); - total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2), -regno_max_ref_width[regno2]); - if ((diff =
Add subreg_memory_offset helper functions
This patch adds routines for converting a SUBREG_BYTE offset into a memory address offset. The two only differ for paradoxical subregs, where SUBREG_BYTE is always 0 but the memory address offset can be negative. Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking there was no change in the testsuite assembly output for at least one target per CPU. OK to install? Richard 2017-08-28 Richard SandifordAlan Hayward David Sherwood gcc/ * rtl.h (subreg_memory_offset): Declare. * emit-rtl.c (subreg_memory_offset): New function. * expmed.c (store_bit_field_1): Use it. * expr.c (undefined_operand_subword_p): Likewise. * simplify-rtx.c (simplify_subreg): Likewise. Index: gcc/rtl.h === --- gcc/rtl.h 2017-08-27 23:19:56.284397205 +0100 +++ gcc/rtl.h 2017-08-28 09:19:13.737714836 +0100 @@ -2827,6 +2827,8 @@ subreg_highpart_offset (machine_mode out } extern int byte_lowpart_offset (machine_mode, machine_mode); +extern int subreg_memory_offset (machine_mode, machine_mode, unsigned int); +extern int subreg_memory_offset (const_rtx); extern rtx make_safe_from (rtx, rtx); extern rtx convert_memory_address_addr_space_1 (machine_mode, rtx, addr_space_t, bool, bool); Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c 2017-08-22 17:14:30.335896832 +0100 +++ gcc/emit-rtl.c 2017-08-28 09:19:13.736814836 +0100 @@ -1013,6 +1013,33 @@ byte_lowpart_offset (machine_mode outer_ else return subreg_lowpart_offset (outer_mode, inner_mode); } + +/* Return the offset of (subreg:OUTER_MODE (mem:INNER_MODE X) OFFSET) + from address X. For paradoxical big-endian subregs this is a + negative value, otherwise it's the same as OFFSET. */ + +int +subreg_memory_offset (machine_mode outer_mode, machine_mode inner_mode, + unsigned int offset) +{ + if (paradoxical_subreg_p (outer_mode, inner_mode)) +{ + gcc_assert (offset == 0); + return -subreg_lowpart_offset (inner_mode, outer_mode); +} + return offset; +} + +/* As above, but return the offset that existing subreg X would have + if SUBREG_REG (X) were stored in memory. The only significant thing + about the current SUBREG_REG is its mode. */ + +int +subreg_memory_offset (const_rtx x) +{ + return subreg_memory_offset (GET_MODE (x), GET_MODE (SUBREG_REG (x)), + SUBREG_BYTE (x)); +} /* Generate a REG rtx for a new pseudo register of mode MODE. This pseudo is assigned the next sequential register number. */ Index: gcc/expmed.c === --- gcc/expmed.c2017-08-27 23:19:56.284397205 +0100 +++ gcc/expmed.c2017-08-28 09:19:13.736814836 +0100 @@ -726,29 +726,7 @@ store_bit_field_1 (rtx str_rtx, unsigned while (GET_CODE (op0) == SUBREG) { - /* The following line once was done only if WORDS_BIG_ENDIAN, -but I think that is a mistake. WORDS_BIG_ENDIAN is -meaningful at a much higher level; when structures are copied -between memory and regs, the higher-numbered regs -always get higher addresses. */ - int inner_mode_size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))); - int outer_mode_size = GET_MODE_SIZE (GET_MODE (op0)); - int byte_offset = 0; - - /* Paradoxical subregs need special handling on big-endian machines. */ - if (paradoxical_subreg_p (op0)) - { - int difference = inner_mode_size - outer_mode_size; - - if (WORDS_BIG_ENDIAN) - byte_offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD; - if (BYTES_BIG_ENDIAN) - byte_offset += difference % UNITS_PER_WORD; - } - else - byte_offset = SUBREG_BYTE (op0); - - bitnum += byte_offset * BITS_PER_UNIT; + bitnum += subreg_memory_offset (op0) * BITS_PER_UNIT; op0 = SUBREG_REG (op0); } Index: gcc/expr.c === --- gcc/expr.c 2017-08-27 23:19:56.149397206 +0100 +++ gcc/expr.c 2017-08-28 09:19:13.737714836 +0100 @@ -3524,30 +3524,12 @@ emit_move_ccmode (machine_mode mode, rtx static bool undefined_operand_subword_p (const_rtx op, int i) { - machine_mode innermode, innermostmode; - int offset; if (GET_CODE (op) != SUBREG) return false; - innermode = GET_MODE (op); - innermostmode = GET_MODE (SUBREG_REG (op)); - offset = i * UNITS_PER_WORD + SUBREG_BYTE (op); - /* The SUBREG_BYTE represents offset, as if the value were stored in - memory, except for a paradoxical subreg where we define - SUBREG_BYTE to be 0; undo this exception as in - simplify_subreg. */ - if (SUBREG_BYTE (op) == 0 -
Re: Improve ECF_NOTHROW flags for direct internal functions
Richard Bienerwrites: > On Thu, Aug 17, 2017 at 1:06 PM, Richard Sandiford > wrote >> Richard Biener writes: >>> On Thu, Aug 17, 2017 at 11:49 AM, Richard Sandiford >>> wrote: Internal functions that map directly to an optab can only throw an exception for -fnon-call-exceptions. This patch handles that in internal_fn_flags, in a similar way to ATTR_*NOTHROW in builtins.def. (Functions that don't throw even for flag_non_call_exceptions should be explicitly marked ECF_NOTHROW in internal-fn.def.) Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>> >>> Hmm. Note the outcome of flag_non_call_exceptions depends on the >>> current function and thus IPA passes querying flags would need to >>> push an appropriate function context. This means the function should >>> get a struct function * argument and opt_for_fn (fn, >>> flag_non_call_exceptions) >>> should be used. It doesn't help very much that all callers don't have >>> any such context either which means this "optimization" looks like >>> in the wrong place :/ (the global value of flag_non_call_exceptions in >>> the IPA case isn't necessarily conservative). >>> >>> So if you insist then add a comment and add a && cfun check so >>> we're sure we are in non-IPA context (or in properly setup context). >> >> Bah. In that case, what should happen if a -fno-non-call-exceptions >> function is inlined into an -fnon-call-exceptions one? Should the call >> keep the NOTHROWness of the original function, or should it lose >> NOTHROWness (and thus gain an exception edge)? > > nothrow-ness is tracked on the GIMPLE stmt via gimple_call_[set_]nothrow_p, > GIMPLE shouldn't look at flag_non_call_exceptions, it is basically part > of the IL. > >> I guess the path of least resistance would be to add an extra check >> for this case in the places that need it, rather than relying solely >> on gimple_call_flags. > > Well, gimple_call_flags works fine already (looking at the above in > addition to internal_fn_flags). call_expr_flags looks like it might not. OK, how does this look? Only the gimple flags matter for the use case I've seen (which is covered by the SVE tests, but hard to test as-is). Tested on aarch64-linux-gnu and x86_64-linux-gnu. Thanks, Richard 2017-08-28 Richard Sandiford gcc/ * gimplify.c (gimplify_call_expr): Copy the nothrow flag to calls to internal functions. (gimplify_modify_expr): Likewise. * tree-call-cdce.c (use_internal_fn): Likewise. * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Likewise. (convert_to_divmod): Set the nothrow flag. * tree-if-conv.c (predicate_mem_writes): Likewise. * tree-vect-stmts.c (vectorizable_mask_load_store): Likewise. (vectorizable_call): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vect-patterns.c (vect_recog_pow_pattern): Likewise. (vect_recog_mask_conversion_pattern): Likewise. Index: gcc/gimplify.c === *** gcc/gimplify.c 2017-08-17 13:17:21.266412322 +0100 --- gcc/gimplify.c 2017-08-28 09:10:13.246297839 +0100 *** gimplify_call_expr (tree *expr_p, gimple *** 3150,3156 if (EXPR_CILK_SPAWN (*expr_p)) gimplify_cilk_detach (pre_p); ! gimple *call = gimple_build_call_internal_vec (ifn, vargs); gimplify_seq_add_stmt (pre_p, call); return GS_ALL_DONE; } --- 3150,3157 if (EXPR_CILK_SPAWN (*expr_p)) gimplify_cilk_detach (pre_p); ! gcall *call = gimple_build_call_internal_vec (ifn, vargs); ! gimple_call_set_nothrow (call, TREE_NOTHROW (*expr_p)); gimplify_seq_add_stmt (pre_p, call); return GS_ALL_DONE; } *** gimplify_modify_expr (tree *expr_p, gimp *** 5636,5641 --- 5637,5643 vargs.quick_push (CALL_EXPR_ARG (*from_p, i)); } call_stmt = gimple_build_call_internal_vec (ifn, vargs); + gimple_call_set_nothrow (call_stmt, TREE_NOTHROW (*from_p)); gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p)); } else Index: gcc/tree-call-cdce.c === *** gcc/tree-call-cdce.c2017-06-30 12:50:38.243662675 +0100 --- gcc/tree-call-cdce.c2017-08-28 09:10:13.246297839 +0100 *** use_internal_fn (gcall *call) *** 1019,1024 --- 1019,1025 args.safe_push (gimple_call_arg (call, i)); gcall *new_call = gimple_build_call_internal_vec (ifn, args); gimple_set_location (new_call, gimple_location (call)); + gimple_call_set_nothrow (new_call, gimple_call_nothrow_p (call)); /* Transfer the
Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations
On Sun, 27 Aug 2017, Jon Beniston wrote: > Hi, > > I have an out-of-tree GCC port and it is struggling supporting > auto-vectorization on some dot product instructions. For example, I have an > instruction that takes three operands which are all 32-bit general > registers. The second and third operands will be treated as V2HI then do dot > product, and then generate an SI result which is then added to the first > operand which is SI as well. > > I do see there is dot product recognizer in tree-vect-patters.c, however, I > found the following testcase still can't be auto-vectorized on my port which > has implemented all necessary dot product standard patterns. This testcase > can't be auto-vectorized on other targets that have similar V2HI dot product > instructions as well, for example ARC. > > === test.c === > #define K 4 > #define M 4 > #define N 256 > int in[N*K][M]; > int out[K]; > int coeff[N][M]; > void > foo (void) > { > int i, j, k; > int sum; > for (k = 0; k < K; k++) > { > sum = 0; > for (j = 0; j < M; j++) > for (i = 0; i < N; i++) > sum += in[i+k][j] * coeff[i][j]; > out[k] = sum; > } > } > === > The reason that auto-vectorizer doesn't work seems to be that GCC doesn't > support single-element vector types in get_vectype_for_scalar_type_and_size. > tree-vect-stmts.c: get_vectype_for_scalar_type_and_size > ... > if (nunits <= 1) > return NULL_TREE; > > So, I am thinking this actually should be relaxed to support more cases. At > least on vector reduction operations which normally will have scalar result > with wider types than the element type of input operands. > > I have tried to make the auto-vectorizer work for my V2HI dot product case, > with the patch attached. Is this the correct approach? Hum, @@ -9039,7 +9040,7 @@ else simd_mode = mode_for_vector (inner_mode, size / nbytes); nunits = GET_MODE_SIZE (simd_mode) / nbytes; - if (nunits < 1) /* Support V1SI. */ + if (nunits < 1 || (nunits == 1 && !reduct_p)) return NULL_TREE; vectype = build_vector_type (scalar_type, nunits); doesn't seem to be against trunk which has if (nunits <= 1) return NULL_TREE; so what happens if you just change that to if (nunits < 1) return NULL_TREE; ? Richard. > Cheers, > Jon > > gcc/ > 2017-08-27 Jon Beniston> > * tree-vectorizer.h (get_vectype_for_scalar_type): New optional > parameter declaration. > * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Add new > optional parameter "reduct_p". Support single element vector types > if it is true. > (get_vectype_for_scalar_type): Add new parameter "reduct_p". > * tree-vect-patterns.c (vect_pattern_recog_1): Pass new parameter > "reduct_p". > * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. > (vect_model_reduction_cost): Likewise. > (get_initial_def_for_induction): Likewise. > (vect_create_epilog_for_reduction): Likewise. > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)