[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 --- Comment #9 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:4c5b1160776382772fc0a33130dfaf621699fdbf commit r13-3486-g4c5b1160776382772fc0a33130dfaf621699fdbf Author: Richard Biener Date: Mon Oct 24 08:52:12 2022 +0200 tree-optimization/107176 - SCEV analysis association issue The following fixes a wrong-code issue caused by SCEV analysis associating an addition due trying to use tail-recursion in follow_ssa_edge_expr. That causes us to apply a conversion at the wrong point and thus miscompute the scalar evolution of an induction variable. This reverts the PR66375 fix and revisits the PR42512 fix by keeping the evolution symbolic up to the point we process the first linear function when we then can check for the supported cases and substitute the whole symbolic expression with the built chrec substituting the proper initial value. To simplify passing around things and to clarify scoping of the involved functions this change wraps the SCEV DFS walking code into a class. PR tree-optimization/107176 PR tree-optimization/66375 PR tree-optimization/42512 * tree-scalar-evolution.cc (follow_ssa_edge_expr): Revert the PR66375 fix, do not not associate PLUS_EXPR to be able to use tail-recursion. (follow_ssa_edge_binary): Likewise. (interpret_loop_phi): Revert PR42512 fix, do not throw away analyze_evolution_in_loop result after the fact. (follow_ssa_edge_expr): When reaching halting_phi initalize the evolution to the symbolic value of the PHI result. (add_to_evolution_1): When adding the first evolution verify we can handle the expression wrapping the symbolic evolution and replace that in full using the initial condition. (class scev_dfs): New, contains ... (follow_ssa_edge_expr, follow_ssa_edge_binary, follow_ssa_edge_in_condition_phi_branch, follow_ssa_edge_in_condition_phi, follow_ssa_edge_inner_loop_phi, add_to_evolution, add_to_evolution_1): ... these with loop and halting_phi arguments in class data. (scev_dfs::get_ev): New toplevel DFS entry, start with a chrec_dont_know evolution. (analyze_evolution_in_loop): Use scev_dfs. * gcc.dg/torture/pr107176.c: New testcase.
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 Richard Biener changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=42512 --- Comment #8 from Richard Biener --- PR42512 is also related and its "fix" is even worse... it's worse in that it hides wrong cases but also throws away correct ones. But it also shows that we still mishandle that testcase with or without the fix(es). It seems that the intent of the SCC analysis in interpret_loop_phi is to look at the linear operation (or the chain of linear operations) associated in a way that it matches (T2)(init + ev) from which then (T2) { init, +, ev } follows. So when PHI' = (int)(signed char)PHI + 11 we directly match (int)(signed char){ init, +, 11 } and for PHI' = (long)((unsigned)PHI + -90u) + 91 we'd have to associate the outer + 91 which we can't do (but effectively do due to the present bug).
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 --- Comment #7 from Richard Biener --- (In reply to Richard Biener from comment #2) > final value replacement: > b_lsm.8_26 = PHI > with expr: 1 > final stmt: > b_lsm.8_26 = 1; > > where > > (get_scalar_evolution > (scalar = b_lsm.8_15) > (scalar_evolution = {0, +, 1}_1)) > (chrec_apply > (varying_loop = 1) > (chrec = {0, +, 1}_1) > (x = 1) > (res = 1)) > > and > >[local count: 955630225]: > _1 = (unsigned int) b_lsm.8_15; > _2 = _1 + 4294967206; > # RANGE [irange] long int [0, 4294967295] NONZERO 0x > _12 = (long int) _2; > # RANGE [irange] long int [91, 4294967386] NONZERO 0x1 > _3 = _12 + 91; > >[local count: 1073741824]: > # b_lsm.8_15 = PHI <0(2), _3(3)> For comparison in PR66375 we have : # c_6 = PHI <0(2), c_12(3)> a.1_5 = a; if (a.1_5 <= 12) goto ; [INV] else goto ; [INV] : _1 = (signed char) c_6; _2 = (int) _1; c_12 = _2 + -11; _4 = a.1_5 + 1; a = _4; goto ; so the backedge definition is (long)((unsigned)IV + -90u) + 91 vs. (int)(signed char)IV + -11 I think the issue is the CONVERT_EXPR handling in follow_ssa_edge_expr where it isn't all that clear in which case we can analyze the evolution in the narrower or the wider type. In the case in this PR we mishandled the middle conversion while in the older case we mishandle the "initial" conversion. I suspect that trying to optimize things on-the-fly is difficult (and reasoning about the relevant cases there). I've tried (again) to more correctly have the current evolution tentative until we hit the loop PHI again when following the use-def chain from the latch definition, but then we don't even know whether we will have an evolution in the end (tried { initial, +, scev_not_known }). Going fully symbolic will lead to the issue pointed out in comment 6 of PR66375, we'd get { (int)(signed char)0, +, -11 } which isn't what we want. If, as in this bug, we have two evolutions in different types, we probably have to give up. Maybe we need to think of a PLUS as { unknown, +, val } and instead fend off "wrongly" typed evolutions when we reach the loop PHI node again. But then the PR66375 case _does_ have an expression correctly describing the evolution of the IV. Just in this PRs case there is none I think. Or rather, I guess it would be { (long){ 0u, +, -90u }_1, +, 90 }_1 but that wouldn't be affine at least.
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 --- Comment #6 from Richard Biener --- Hmm, and interestingly this fix causes PR66375 to regress again: FAIL: gcc.dg/torture/pr66375.c -O2 execution test ... OK, so that's because my suggested fix is exactly an (incomplete) reversal of the fix for PR66375 (r6-1346-gb9b79ba4264cf6) :/
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 --- Comment #5 from Richard Biener --- [local count: 1073741824]: # b_lsm.8_15 = PHI <0(2), _3(3)> # b_lsm_flag.9_17 = PHI <0(2), 1(3)> if (b_lsm.8_15 <= 0) goto ; [89.00%] else goto ; [11.00%] [local count: 955630225]: _1 = (unsigned int) b_lsm.8_15; _2 = _1 + 4294967206; _12 = (long int) _2; _3 = _12 + 91; goto ; for this loop we get (number_of_iterations_in_loop = (analyze_scalar_evolution (loop_nb = 1) (scalar = b_lsm.8_15) (get_scalar_evolution (scalar = b_lsm.8_15) (scalar_evolution = )) (analyze_initial_condition (loop_phi_node = b_lsm.8_15 = PHI <0(2), _3(3)> ) (init_cond = 0)) (analyze_evolution_in_loop (loop_phi_node = b_lsm.8_15 = PHI <0(2), _3(3)> ) (add_to_evolution (loop_nb = 1) (chrec_before = 0) (to_add = 91) (res = {0, +, 91}_1)) (add_to_evolution (loop_nb = 1) (chrec_before = {0, +, 91}_1) (to_add = 4294967206) (res = {0, +, 1}_1)) Estimating # of iterations of loop 1 (number_of_iterations_in_loop = (analyze_scalar_evolution (loop_nb = 1) (scalar = b_lsm.8_15) (get_scalar_evolution (scalar = b_lsm.8_15) (scalar_evolution = )) (analyze_initial_condition (loop_phi_node = b_lsm.8_15 = PHI <0(2), _3(3)> ) (init_cond = 0)) (analyze_evolution_in_loop (loop_phi_node = b_lsm.8_15 = PHI <0(2), _3(3)> ) (add_to_evolution (loop_nb = 1) (chrec_before = 0) (to_add = 91) (res = {0, +, 91}_1)) (add_to_evolution (loop_nb = 1) (chrec_before = {0, +, 91}_1) (to_add = 4294967206) (res = {0, +, 1}_1)) (evolution_function = (long int) {0, +, 1}_1)) (set_scalar_evolution instantiated_below = 2 (scalar = b_lsm.8_15) (scalar_evolution = (long int) {0, +, 1}_1)) ) and that's wrong I think. (long int)((unsigned int)b + -90u) + 91 isn't the same as (long int) {0, +, 1} with the evolution in unsigned int. It's the evolution of the value in _unsigned int_. The key somehow happens in follow_ssa_edge_expr where we do /* Match an assignment under the form: "a = b +- ...". Use tail-recursion for the simple case. */ *evolution_of_loop = add_to_evolution (loop->num, chrec_convert (type, *evolution_of_loop, at_stmt), code, rhs1, at_stmt); expr = rhs0; goto tail_recurse; which converts the long int { 0, +, 91 }_1 to unsigned int and will then add -90u to that. We are building the CHREC "backwards", we start with '(long)0', the initial condition but then build the CHREC from the backedge def, first {(long)0, +, 91}, then upon _12 = (long int)_2 we're feeding the {(long)0, +, 91} to the _2 = _1 + -90u handler above which simply truncates the "current" CHREC and the outer recursion will simply convert that back. But that's not how the CHREC expression build and simplification should work? The most significant hint is probably that when reaching the initial truncation '_1 = (unsigned int) b_lsm.8_15;' we get there with a CHREC that is unsigned int already! So it's possibly wrong to "post-recurse" to rhs0 here. Doing diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc index 9f30f78cb5d..b481c488ca8 100644 --- a/gcc/tree-scalar-evolution.cc +++ b/gcc/tree-scalar-evolution.cc @@ -1265,6 +1265,9 @@ tail_recurse: if (TREE_CODE (rhs0) == SSA_NAME && (TREE_CODE (rhs1) != SSA_NAME || code == MINUS_EXPR)) { + + t_bool res = follow_ssa_edge_expr (loop, at_stmt, rhs0, halting_phi, + evolution_of_loop, limit); /* Match an assignment under the form: "a = b +- ...". Use tail-recursion for the simple case. */ @@ -1272,8 +1275,9 @@ tail_recurse: (loop->num, chrec_convert (type, *evolution_of_loop, at_stmt), code, rhs1, at_stmt); - expr = rhs0; - goto tail_recurse; + //expr = rhs0; + //goto tail_recurse; + return res; } /* Else search for the SCC in both rhs0 and rhs1. */ return follow_ssa_edge_binary (loop, at_stmt, type, rhs0, code, rhs1, fixes the testcase.
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #4 from Richard Biener --- I will try to have a closer look.
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu since r7-2012-g43aabfcfd4139e4c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 Martin Liška changed: What|Removed |Added Summary|[10/11/12/13 Regression]|[10/11/12/13 Regression] |Wrong code at -Os on|Wrong code at -Os on |x86_64-pc-linux-gnu |x86_64-pc-linux-gnu since ||r7-2012-g43aabfcfd4139e4c Keywords|needs-bisection | CC||amker at gcc dot gnu.org, ||marxin at gcc dot gnu.org --- Comment #3 from Martin Liška --- Started with r7-2012-g43aabfcfd4139e4c.
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 --- Comment #2 from Richard Biener --- final value replacement: b_lsm.8_26 = PHI with expr: 1 final stmt: b_lsm.8_26 = 1; where (get_scalar_evolution (scalar = b_lsm.8_15) (scalar_evolution = {0, +, 1}_1)) (chrec_apply (varying_loop = 1) (chrec = {0, +, 1}_1) (x = 1) (res = 1)) and [local count: 955630225]: _1 = (unsigned int) b_lsm.8_15; _2 = _1 + 4294967206; # RANGE [irange] long int [0, 4294967295] NONZERO 0x _12 = (long int) _2; # RANGE [irange] long int [91, 4294967386] NONZERO 0x1 _3 = _12 + 91; [local count: 1073741824]: # b_lsm.8_15 = PHI <0(2), _3(3)>
[Bug tree-optimization/107176] [10/11/12/13 Regression] Wrong code at -Os on x86_64-pc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107176 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.5 Component|c |tree-optimization Keywords||wrong-code Known to fail||7.1.0 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Summary|Wrong code at -O0/-Os on|[10/11/12/13 Regression] |x86_64-pc-linux-gnu |Wrong code at -Os on ||x86_64-pc-linux-gnu Last reconfirmed||2022-10-06 Known to work||6.3.0 --- Comment #1 from Andrew Pinski --- Confirmed. Here is a better testcase: #include int a; long b; static inline long c(unsigned d) { return d; } static inline void e(int d) { a = d; } int main() { b = 0; for (; b < 1; b = c(b - 90) + 90 + 1) ; e(b >> 2); printf("%d\n", a); if (a != 1073741824) __builtin_abort(); }