[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Andrew Macleod changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #11 from Andrew Macleod ---
Fixed.
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #10 from CVS Commits ---
The master branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:661c02e54ea72fb55205df0a717951ff28bb739e

commit r12-5522-g661c02e54ea72fb55205df0a717951ff28bb739e
Author: Andrew MacLeod
Date:   Tue Nov 23 14:12:29 2021 -0500

    Check for equivalences between PHI argument and def.

    If a PHI argument on an edge is equivalent with the DEF, then it doesn't
    provide any new information, defer processing it unless they are all
    equivalences.

            PR tree-optimization/103359
            gcc/
            * gimple-range-fold.cc (fold_using_range::range_of_phi): If arg is
            equivalent to def, don't initially include it's range.

            gcc/testsuite/
            * gcc.dg/pr103359.c: New.
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #9 from Andrew Macleod ---
This is an artifact of ranger's hybrid optimistic/pessimistic approach. We optimistically assume things are UNDEFINED until we actually have to resolve them.

The code at the end of FRE is not representative of what we see in EVRP, as the loop code has inserted some PHI copies:

  :
  # h_7 = PHI <4(2), 1(7)>

  :
  # h_18 = PHI

We have to resolve h_22 on the back edge, and there ends up being a series of PHIs, and we commit to VARYING before we fully resolve the cycle:

  :
  # h_22 = PHI

  :
  # h_20 = PHI

  :
  # h_10 = PHI

We currently have code to "ignore" arguments that are the same as the def, because they provide no new info, so when processing something like:

  # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>

we still come up with [1,1][4,4].

I notice that when looking at the PHI coming from the back edge:

  Equivalence set : [h_10, h_18, h_22]
  :
  # h_22 = PHI

h_18 is in the equivalence set of h_22 at this point. I am experimenting with using that information to also decide that, since h_22 and h_18 are equivalent on that back edge, we can avoid using h_22 as well. That means h_18 and h_7 will be equivalent and evaluate to [1,1][4,4], and then we eliminate the store to c like we did before.

Stay tuned.
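The idea in comment #9 can be illustrated with a small model. This is only a sketch in Python, not ranger's code: ranges are modelled as plain sets of values, `None` stands in for VARYING, and the function name `range_of_phi`, its parameters, and the argument list used for `h_18` are all assumptions made for illustration (the real PHI arguments were lost from the dump above).

```python
def range_of_phi(def_name, args, known, equiv):
    """Union the ranges of PHI arguments, deferring arguments equivalent to the def.

    args:  PHI arguments, each an int constant or an SSA name (str).
    known: maps SSA names to a set of possible values, or None for VARYING.
    equiv: SSA names treated as equivalent to def_name on their incoming edge.
    Returns a set of possible values, or None for VARYING.
    """
    result = set()
    deferred = []
    for arg in args:
        # An argument that is the def itself, or equivalent to it, adds no
        # new information, so set it aside for now.
        if isinstance(arg, str) and (arg == def_name or arg in equiv):
            deferred.append(arg)
            continue
        if isinstance(arg, int):
            result.add(arg)
        else:
            arg_range = known.get(arg)
            if arg_range is None:
                return None  # one VARYING argument makes the result VARYING
            result |= arg_range
    if not result:
        # Every argument was an equivalence: fall back to whatever is known.
        for arg in deferred:
            arg_range = known.get(arg)
            if arg_range is None:
                return None
            result |= arg_range
    return result


# h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>: self-references are ignored.
h_10 = range_of_phi("h_10", [4, "h_10", "h_10", 1], {}, set())

# Hypothetical h_18 = PHI <h_7, h_22>, with h_22 not yet resolved (VARYING).
known = {"h_7": {1, 4}, "h_22": None}
without_equiv = range_of_phi("h_18", ["h_7", "h_22"], known, set())
with_equiv = range_of_phi("h_18", ["h_7", "h_22"], known, {"h_22"})
```

In this model, `h_18` is VARYING when `h_22` is taken at face value, and becomes the two-value set {1, 4} once `h_22` is recognised as equivalent to the def and set aside.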
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #8 from Aldy Hernandez ---
For the record, I'm using:

  gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)

as a proxy for gcc11. And I'm using the *.fre1 dump to see what evrp sees on entry. Perhaps there's something going on here such that I don't get the IL Richi got in comment #4 for trunk.
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Aldy Hernandez changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #7 from Aldy Hernandez ---
(In reply to Richard Biener from comment #4)
> So the important part is to eliminate the store to 'c' which GCC11 manages
> to do
> from EVRP while trunk fails at this task. The IL into EVRP is absolutely
> identical:
>
> :
> goto ; [INV]
>
> :
> # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
> d.2_8 = d;
> if (d.2_8 != 0)
> goto ; [INV]
>
> :
> if (h_10 != 0)
> goto ; [INV]
>
> :
> e.0_1 = e;
> if (e.0_1 != 0)
> goto ; [INV]
>
> :
> a.1_2 = a;
> _3 = (short int) a.1_2;
> _14 = a.1_2 & 1;
> _4 = (short int) _14;
> _5 = _4 | h_10;
> _6 = (int) _5;
> f.4_21 = (unsigned short) _5;
> _23 = f.4_21 * 3;
> _24 = (short int) _23;
> if (_5 == 0)
> goto ; [INV]
> else
> goto ; [INV]
>
> :
> c = 0;

In GCC11 we had this right before evrp:

  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_5 == 0)
    goto ; [INV]
  else
    goto ; [INV]

which evrp could fold:

  Visiting conditional with predicate: if (_5 == 0)

  With known ranges
    _5: short int ~[0, 0]

  Predicate evaluates to: 0

However, in trunk the IL is different:

  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_24 == 0)
    goto ; [INV]
  else
    goto ; [INV]

Where did the cast come from in trunk? Because without the cast, we should be able to fold it.

For that matter, in trunk we have:

  Folding statement: if (_24 == 0)

  Visiting conditional with predicate: if (_24 == 0)

  With known ranges
    _24: short int VARYING

  Predicate evaluates to: DON'T KNOW
  gimple_simplified to if (_5 == 0)
  Folded into: if (_5 == 0)

So gimple fold does get rid of the cast for us, and at that point ranger can figure out the conditional:

  [ranger dump]
  if (_5 == 0)
    goto ; [INV]
  else
    goto ; [INV]
  ...
  ...
  _5 : short int [-INF, -1][1, +INF]

Presumably we're asking ranger before calling gimple fold, but the question is: what was going on in GCC11 such that the _24 => _5 substitution was done before arriving in evrp?
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #6 from Richard Biener ---
Which means the CCP difference is that we're failing to propagate out copies when they have a useful value:

  Visiting statement:
  _25 = _24;
  which is likely CONSTANT
  Lattice value changed to CONSTANT 0x0 (0xf).  Adding SSA edges to worklist.
  ssa_edge_worklist: adding SSA use in _7 = _25;

  Simulating statement: _7 = _25;

  Visiting statement:
  _7 = _25;
  which is likely CONSTANT
  Lattice value changed to CONSTANT 0x0 (0xf).  Adding SSA edges to worklist.

That's because the copy and constant lattice are unified and we cannot track both. And while forwprop will propagate out copies, since it does that when it visits uses, it confuses itself with this. Propagating them out immediately fixes this, but the obvious use of replace_uses_by causes excessive folding and redundant work as well as out-of-order debug stmt handling.

The following would fix the testcase. Note there's still the apparent EVRP regression compared to GCC 11. The single_use test was added with g:2fde61e3caf4c4660743e53497f52b65da1fe760 as a fix for PR66916.

diff --git a/gcc/match.pd b/gcc/match.pd
index f059b477f58..b32cc4a9368 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5221,8 +5221,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
          && ((POINTER_TYPE_P (TREE_TYPE (@00))
               && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@00
              || (POINTER_TYPE_P (TREE_TYPE (@10))
-                 && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10))
-      && single_use (@0))
+                 && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10)))
      (if (TYPE_PRECISION (TREE_TYPE (@00)) == TYPE_PRECISION (TREE_TYPE (@0))
          && (TREE_CODE (@10) == INTEGER_CST
              || @1 != @10)
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #5 from Richard Biener ---
(In reply to Andrew Pinski from comment #2)
> The other thing is:
>
> -ftree-bit-ccp
> Visiting statement:
> _4 = _3 & 1;
> which is likely CONSTANT
> Applying pattern match.pd:1641, gimple-match.c:23146
> Lattice value changed to CONSTANT 0x0 (0x1). Adding SSA edges to worklist.
> marking stmt to be not simulated again
>
> vs
>
> -fno-tree-bit-ccp
> _4 = _3 & 1;
> which is likely CONSTANT
> Applying pattern match.pd:1641, gimple-match.c:23146
> Lattice value changed to VARYING. Adding SSA edges to worklist.
>
> In the first case we mark the stmt as not to be simulated again while in the
> second case we didn't.

VARYING values are always 'not simulated again'; the dumping is for an optimization when all uses in a stmt will never be simulated again. So the only difference is the folding, where there are constraints on which folding results we may put into the lattice that are not satisfied here, and that we do not fold likely VARYING statements (all the "likely" stuff is compile-time optimization, eventually premature).

So I don't really see any CCP issue here. It's just that without bit-CCP we get

@@ -365,7 +365,7 @@
   f.4_21 = (unsigned short) _5;
   _23 = f.4_21 * 3;
   _24 = (short int) _23;
-  if (_24 == 0)
+  if (_5 == 0)
     goto ; [INV]

The real issue is IMHO that forwprop doesn't get to it:

@@ -23,13 +23,7 @@
 gimple_simplified to _14 = a.1_2 & 1;
 _4 = (short int) _14;
-gimple_simplified to f_18 = _5;
-Removing dead stmt _7 = _24;
-
-Removing dead stmt _25 = _24;
-
-Removing dead stmt f_18 = _5;
-
+gimple_simplified to if (_5 == 0)

We don't get

/* For integral types with undefined overflow fold
   x * C1 == C2 into x == C2 / C1 or false.
   If overflow wraps and C1 is odd, simplify to x == C2 / C1 in the ring
   Z / 2^n Z.  */
(for cmp (eq ne)
 (simplify
  (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)

applied here.

We do get the fold_sign_changed_comparison pattern applied though, but only from combine_cond_expr_cond, and there we do not apply it because of the single-use restriction and because the propagation phase of forwprop delays stmt removal, which confuses that check. The same issue prevents the GIMPLE variant from applying since it has

/* From fold_sign_changed_comparison and fold_widened_comparison.
   FIXME: the lack of symmetry is disturbing.  */
(for cmp (simple_comparison)
 (simplify
  (cmp (convert@0 @00) (convert?@1 @10))
...
       && single_use (@0))
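The "x * C1 == C2" pattern quoted in comment #5 rests on the fact that an odd constant is invertible modulo 2^n, so multiplication by it is a bijection on n-bit wrapping values. The following is a minimal Python sketch of that arithmetic only, not of match.pd itself; the helper name `fold_mult_eq` and the 16-bit width (chosen to match the `unsigned short` multiply `_23 = f.4_21 * 3`) are assumptions for illustration. It needs Python 3.8 or later for the modular-inverse form of `pow`.

```python
BITS = 16
MOD = 1 << BITS


def fold_mult_eq(c1, c2, bits=BITS):
    """For odd c1, return k such that (x * c1) mod 2**bits == c2 iff x == k."""
    if c1 % 2 == 0:
        raise ValueError("c1 must be odd to be invertible modulo a power of two")
    mod = 1 << bits
    # pow(c1, -1, mod) is the multiplicative inverse of c1 modulo mod.
    return (c2 * pow(c1, -1, mod)) % mod


# x * 3 == 0 with wrapping 16-bit arithmetic holds exactly when x == 0.
k_zero = fold_mult_eq(3, 0)

# Exhaustive check over all 16-bit values.
matches = all(((x * 3) % MOD == 0) == (x == k_zero) for x in range(MOD))
```

Applied to the dump, this is why a test of `_23` (and hence `_24`) against zero can be rewritten as a test of `_5` against zero once the conversions are looked through.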
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org

--- Comment #4 from Richard Biener ---
So the important part is to eliminate the store to 'c', which GCC11 manages to do from EVRP while trunk fails at this task. The IL into EVRP is absolutely identical:

  :
  goto ; [INV]

  :
  # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
  d.2_8 = d;
  if (d.2_8 != 0)
    goto ; [INV]

  :
  if (h_10 != 0)
    goto ; [INV]

  :
  e.0_1 = e;
  if (e.0_1 != 0)
    goto ; [INV]

  :
  a.1_2 = a;
  _3 = (short int) a.1_2;
  _14 = a.1_2 & 1;
  _4 = (short int) _14;
  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_5 == 0)
    goto ; [INV]
  else
    goto ; [INV]

  :
  c = 0;
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener ---
I will have a look.
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

--- Comment #2 from Andrew Pinski ---
The other thing is:

-ftree-bit-ccp
  Visiting statement:
  _4 = _3 & 1;
  which is likely CONSTANT
  Applying pattern match.pd:1641, gimple-match.c:23146
  Lattice value changed to CONSTANT 0x0 (0x1).  Adding SSA edges to worklist.
  marking stmt to be not simulated again

vs

-fno-tree-bit-ccp
  _4 = _3 & 1;
  which is likely CONSTANT
  Applying pattern match.pd:1641, gimple-match.c:23146
  Lattice value changed to VARYING.  Adding SSA edges to worklist.

In the first case we mark the stmt as not to be simulated again, while in the second case we didn't. Someone who understands ccp better should look into this.
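For readers unfamiliar with the `CONSTANT 0x0 (0x1)` notation in the bit-CCP dump, it can be read as a known-bits value followed by a mask of bits that are still unknown. Below is a rough Python sketch of an AND transfer function on such (value, mask) pairs, under that assumed reading; the function name `bit_and`, the 16-bit width, and the tuple encoding are illustrative choices and not GCC's data structures.

```python
WIDTH = 16
ALL_BITS = (1 << WIDTH) - 1


def bit_and(val1, mask1, val2, mask2):
    """AND on (value, mask) pairs; a set mask bit means that bit is unknown."""
    # A result bit stays unknown only if some input bit is unknown and
    # neither input has a known zero in that position.
    mask = (mask1 | mask2) & (val1 | mask1) & (val2 | mask2)
    value = val1 & val2
    return value, mask


varying = (0, ALL_BITS)  # nothing known about any bit of _3
const_one = (1, 0)       # the constant 1, every bit known

# _4 = _3 & 1: only the low bit can still be set.
result = bit_and(*varying, *const_one)
```

With these inputs the result is value 0x0 with mask 0x1, the same pair the `-ftree-bit-ccp` dump reports for `_4 = _3 & 1`.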
[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-11-22
           Keywords|                            |missed-optimization
   Target Milestone|---                         |12.0

--- Comment #1 from Andrew Pinski ---
Confirmed.

In CCP1 on the trunk:

  Visiting statement:
  f_18 = (short intD.25) _6;
  which is likely CONSTANT
  Lattice value changed to CONSTANT 0x4 (0x1).  Adding SSA edges to worklist.

In ccp1 on the trunk with -fno-tree-bit-ccp:

  Visiting statement:
  f_18 = (short intD.25) _6;
  which is likely CONSTANT
  Applying pattern match.pd:3663, gimple-match.c:42920
  Applying pattern match.pd:3580, gimple-match.c:42859
  Match-and-simplified (short int) _6 to _5
  Lattice value changed to CONSTANT _5.  Adding SSA edges to worklist.
  marking stmt to be not simulated again

So ccp1 sometimes applies match-and-simplify and sometimes does not.