[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Bug 84859 depends on bug 33315, which changed state. Bug 33315 Summary: stores not commoned by sinking https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Richard Biener --- Fixed.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #12 from Richard Biener --- Author: rguenth Date: Mon Mar 19 14:08:58 2018 New Revision: 258645 URL: https://gcc.gnu.org/viewcvs?rev=258645=gcc=rev Log: 2018-03-19 Richard BienerPR tree-optimization/84859 * tree-ssa-phiopt.c (single_trailing_store_in_bb): New function. (cond_if_else_store_replacement): Perform sinking operation on single-store BBs regardless of MAX_STORES_TO_SINK setting. Generalize what a BB with a single eligible store is. * gcc.dg/tree-ssa/pr84859.c: New testcase. * gcc.dg/tree-ssa/pr35286.c: Disable cselim. * gcc.dg/tree-ssa/split-path-6.c: Likewise. * gcc.dg/tree-ssa/split-path-7.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr84859.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c trunk/gcc/tree-ssa-phiopt.c
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #10 from Richard Biener --- Created attachment 43674 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43674=edit patch Final patch I am testing.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #9 from Richard Biener --- Like moving over a const call after the stores might cause us to spill across the call. Moving over any stmt could enlarge lifetimes enough to do that. Register lifetime could be so that we cannot coalesce the copies in the PHI (but that probably is offset by removing one of the stores -- but maybe the fast path is slowed down then). value_replacement has /* Allow up to 2 cheap preparation statements that prepare argument ... */ for example. But I guess preparation stmts are never of a concern?
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #8 from Richard Biener --- Mine. We can handle the case of a single store in the BBs more generally and bypass the limit. The limit was added in r171381 where it was imposed on all transforms when the function was changed to handle the case of more than one stmt (store) in each BB. Previously the transform for the single stores case was always performed. Thus, for example like the following. Could be simplified somewhat iff the stores need to be the last stmt in the BBs (like it was before said re-org). last_and_only_stmt isn't enough here as we have a feeding scalar stmt around in one BB. Opinions? Index: gcc/tree-ssa-phiopt.c === --- gcc/tree-ssa-phiopt.c (revision 258552) +++ gcc/tree-ssa-phiopt.c (working copy) @@ -2061,8 +2061,6 @@ static bool cond_if_else_store_replacement (basic_block then_bb, basic_block else_bb, basic_block join_bb) { - gimple *then_assign = last_and_only_stmt (then_bb); - gimple *else_assign = last_and_only_stmt (else_bb); vec then_datarefs, else_datarefs; vec then_ddrs, else_ddrs; gimple *then_store, *else_store; @@ -2073,14 +2071,49 @@ cond_if_else_store_replacement (basic_bl tree then_lhs, else_lhs; basic_block blocks[3]; - if (MAX_STORES_TO_SINK == 0) + /* Handle the case with single store in THEN_BB and ELSE_BB. That is + cheap enough to always handle. */ + gphi *vphi = NULL; + for (gphi_iterator si = gsi_start_phis (join_bb); !gsi_end_p (si); + gsi_next ()) +if (virtual_operand_p (gimple_phi_result (si.phi ( + { + vphi = si.phi (); + break; + } + if (!vphi) return false; + gimple *then_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE + (vphi, single_succ_edge (then_bb))); + gimple *else_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE + (vphi, single_succ_edge (else_bb))); + + /* Verify there are no other stores or loads in the BBs. That allows us + to elide dependence checking. */ + use_operand_p use_p; + imm_use_iterator imm_iter; + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (then_assign)) +if (USE_STMT (use_p) != then_assign + && gimple_bb (USE_STMT (use_p)) == then_bb) + { + then_assign = NULL; + break; + } + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (else_assign)) +if (USE_STMT (use_p) != else_assign + && gimple_bb (USE_STMT (use_p)) == else_bb) + { + else_assign = NULL; + break; + } - /* Handle the case with single statement in THEN_BB and ELSE_BB. */ if (then_assign && else_assign) return cond_if_else_store_replacement_1 (then_bb, else_bb, join_bb, then_assign, else_assign); + if (MAX_STORES_TO_SINK == 0) +return false; + /* Find data references. */ then_datarefs.create (1); else_datarefs.create (1);
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Richard Biener changed: What|Removed |Added Depends on||33315 --- Comment #7 from Richard Biener --- The patch for PR33315 manages to transform if (n_21 > 255) goto ; [50.00%] else goto ; [50.00%] [local count: 477815112]: a[0] = 255; goto ; [100.00%] [local count: 477815112]: _1 = (unsigned char) n_21; a[0] = _1; [local count: 955630225]: _2 = a[0]; to if (n_21 > 255) goto ; [50.00%] else goto ; [50.00%] [local count: 477815112]: _1 = (unsigned char) n_21; [local count: 955630225]: # _23 = PHI <255(5), _1(6)> a[0] = _23; _2 = a[0]; but as proposed that would happen too late (we've already threaded the jump and PRE already would have done sth similar via the load from a[0]). As given it looks like sinking should be performed before the first phiopt pass (which also runs quite late - there was talk of an early phiopt pass). Possibly cselim could be enhanced to catch the above as well. In fact its comments say it would handle the above. Ah, we do /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion is disabled. */ if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize) || !opts->x_flag_tree_loop_if_convert) maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0, opts->x_param_values, opts_set->x_param_values); for whatever reason :/ --param max-stores-to-sink=2 fixes the testcase. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315 [Bug 33315] stores not commoned by sinking
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #6 from Martin Sebor --- We have seen this sort of thing come up a number of times in the past. Jeff and I have discussed changing jump threading to either avoid introducing paths involving invalid calls, or inserting __builtin_unreachable/__builtin_trap. That could not only avoid false positives but also make it possible to emit more efficient code.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #5 from Jakub Jelinek --- Seems the kernel testcase is actually more like: void f (void*); void __attribute__ ((noinline, noclone)) g (const void *p, unsigned n) { unsigned char a[8]; if (n > sizeof a - 1) return; for (; n > 0; n -= *a) { *a = n > 255 ? 255 : n; __builtin_memcpy (a + 1, p, *a); // no warning (good) f (a); } } void __attribute__ ((noinline, noclone)) h (const void *p, unsigned n) { unsigned char a[8]; if (n > sizeof a - 1) return; for (; n > 0; n -= *a) { if (n > 255) *a = 255; else *a = n; __builtin_memcpy (a + 1, p, *a); // bogus -Warray-bounds f (a); } } where the memcpy calls don't clobber *a, but there is another call that makes a escape and could modify the value, so I'm afraid there is absolutely nothing VRP can do here and the only fix is not to warn on statements affected by jump threading. What the kernel should have used (if it couldn't use get rid of the loop altogether like it did) would be something like: void f (void*); void __attribute__ ((noinline, noclone)) g (const void *p, unsigned n) { unsigned char a[8], t; if (n > sizeof a - 1) return; for (; n > 0; n -= t) { t = n > 255 ? 255 : n; *a = t; __builtin_memcpy (a + 1, p, t); // no warning (good) f (a); } } void __attribute__ ((noinline, noclone)) h (const void *p, unsigned n) { unsigned char a[8], t; if (n > sizeof a - 1) return; for (; n > 0; n -= t) { if (n > 255) t = 255; else t = n; *a = t; __builtin_memcpy (a + 1, p, t); // no warning (good) f (a); } } i.e. introduce a temporary for the size in the current iteration and use that across the calls that might have but don't really change that.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #3 from Martin Sebor --- The warning is in response to the call to memcpy(d, s, 255) in h(): [local count: 477815112]: a[0] = 255; __builtin_memcpy (, p_15(D), 255); _26 = a[0]; The wording seems clear: The function is forming an offset that is out of the bounds of the referenced object. The range of the out-of-bounds offset is [9, 255], while the bounds of the object are [0, 8]. If you think you have of a better/clearer way to phrase it then please propose it.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 --- Comment #4 from Jakub Jelinek --- Note, even teaching VRP to look through stores and loads from memory wouldn't help here, not even for the first function, because it uses *a as the length, but also overwrites it (potentially or for real) with the memcpy, so the compiler can't really know anything about *a in the second and following iterations, except that it is unsigned char and thus [0, 255]. n can wrap around.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- That is just something gimple-ssa-warn-restrict.c mades up from the call with 255 constant size. Figuring out from the warning wording that it complains about a call with constant length 255 is impossible.
[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2018-03-14 Known to work||7.3.1 Target Milestone|--- |8.0 Summary|bogus -Warray-bounds on a |[8 Regression] bogus |memcpy in a loop|-Warray-bounds on a memcpy ||in a loop Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- The reason is a missed phiopt to MIN_EXPR for h and VRP not propagating through memory. we have [local count: 955630223]: if (n_6 > 255) goto ; [50.00%] else goto ; [50.00%] [local count: 477815112]: a[0] = 255; goto ; [100.00%] [local count: 477815112]: _1 = (unsigned char) n_6; a[0] = _1; [local count: 955630223]: _2 = a[0]; where we lack a pass replacing that with # tem = PHI <255(5), _1(6)> a[0] = tem; _2 = a[0]; and then a MIN_EXPR. Not sure why thw warn_restrict pass ends up with this kind of value-range here though. In the IL we have threaded the n > 255 check though and ended up with an explicit memcpy (..., 255); but that's not [9, 255].