[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #13 from Richard Biener --- Fixed.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #12 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:d45ddc2c04e471d0dcee016b6edacc00b8341b16 commit r14-4089-gd45ddc2c04e471d0dcee016b6edacc00b8341b16 Author: Richard Biener Date: Thu Sep 14 13:06:51 2023 +0200 tree-optimization/111294 - backwards threader PHI costing This revives an earlier patch since the problematic code applying extra costs to PHIs in copied blocks we couldn't make any sense of prevents a required threading in this case. Instead of coming up with an artificial other costing the following simply removes the bits. As with all threading changes this requires a plethora of testsuite adjustments, but only the last three are unfortunate as is the libgomp team.c adjustment which is required to avoid a bogus -Werror diagnostic during bootstrap. PR tree-optimization/111294 gcc/ * tree-ssa-threadbackward.cc (back_threader_profitability::m_name): Remove (back_threader::find_paths_to_names): Adjust. (back_threader::maybe_thread_block): Likewise. (back_threader_profitability::possibly_profitable_path_p): Remove code applying extra costs to copies PHIs. libgomp/ * team.c (gomp_team_start): Assert alloca size to avoid false positive alloc-size diagnostic. gcc/testsuite/ * gcc.dg/tree-ssa/pr111294.c: New test. * gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust. * gcc.dg/tree-ssa/pr59597.c: Likewise. * gcc.dg/tree-ssa/pr61839_2.c: Likewise. * gcc.dg/tree-ssa/ssa-sink-18.c: Likewise. * g++.dg/warn/Wstringop-overflow-4.C: XFAIL subtest on ilp32. * gcc.dg/uninit-pred-9_b.c: XFAIL subtest everywhere. * gcc.dg/vect/vect-117.c: Make scan for not Invalid sum conditional on lp64.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #11 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:9ea74d235c7e7816b996a17c61288f02ef767985 commit r14-3982-g9ea74d235c7e7816b996a17c61288f02ef767985 Author: Richard Biener Date: Thu Sep 14 09:31:23 2023 +0200 tree-optimization/111294 - better DCE after forwprop The following adds more aggressive DCE to forwprop to clean up dead stmts when folding a stmt leaves some operands unused. The patch uses simple_dce_from_worklist for this purpose, queueing original operands before substitution and folding, but only if we folded the stmt. This removes one dead stmt biasing threading costs in a later pass but it doesn't resolve the optimization issue in the PR yet. PR tree-optimization/111294 * tree-ssa-forwprop.cc (pass_forwprop::execute): Track operands that eventually become dead and use simple_dce_from_worklist to remove their definitions if they did so. * gcc.dg/tree-ssa/evrp10.c: Adjust. * gcc.dg/tree-ssa/evrp6.c: Likewise. * gcc.dg/tree-ssa/forwprop-31.c: Likewise. * gcc.dg/tree-ssa/neg-cast-3.c: Likewise.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #10 from Richard Biener --- But this stmt isn't the issue, BB7 is [local count: 118111600]: # _31 = PHI _4 = (unsigned char) _31; _6 = (int) a.8_28; j_22 = (short int) _4; _33 = _31 & 255; if (_33 > 11) and that does have one more stmt. It's if (a.8_28 != 0) goto ; [34.00%] else goto ; [66.00%] [local count: 40157944]: [local count: 118111600]: # _31 = PHI _4 = (unsigned char) _31; _6 = (int) a.8_28; j_22 = (short int) _4; _33 = _31 & 255; if (_33 > 11) goto ; [50.00%] else goto ; [50.00%] [local count: 59055800]: [local count: 118111600]: # iftmp.11_27 = PHI so what the cost model fails to see is that j_22 and _4 are only live on one path to BB9. It's that odd code again I attempted to remove at some point: /* PHIs in the path will create degenerate PHIS in the copied path which will then get propagated away, so looking at just the duplicate path the PHIs would seem unimportant. But those PHIs, because they're assignments to objects typically with lives that exist outside the thread path, will tend to generate PHIs (or at least new PHI arguments) at points where we leave the thread path and rejoin the original blocks. So we do want to account for them. We ignore virtual PHIs. We also ignore cases where BB has a single incoming edge. That's the most common degenerate PHI we'll see here. Finally we ignore PHIs that are associated with the value we're tracking as that object likely dies. */ if (EDGE_COUNT (bb->succs) > 1 && EDGE_COUNT (bb->preds) > 1) { for (gphi_iterator gsip = gsi_start_phis (bb); !gsi_end_p (gsip); gsi_next ()) { gphi *phi = gsip.phi (); tree dst = gimple_phi_result (phi); /* Note that if both NAME and DST are anonymous SSA_NAMEs, then we do not have enough information to consider them associated. */ if (dst != name && name && TREE_CODE (name) == SSA_NAME && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name) || !SSA_NAME_VAR (dst)) && !virtual_operand_p (dst)) ++m_n_insns; } } there's also a missed canonicalization I think: _4 = (unsigned char) _31; _6 = (int) a.8_28; j_22 = (short int) _4; _33 = _31 & 255; we canonicalize (int)(unsigned char) _31 to _31 & 255 but we fail to do the same for (short)(unsigned char) _31 or rather we fail to anticipate that (short)_33 could be used for j_22, eliding _4. Maybe costing "lowparts" as zero would be useful here.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #9 from Richard Biener --- Created attachment 55898 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55898=edit alternative patch for simple_dce_from_worklist
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #8 from Richard Biener --- (In reply to Andrew Pinski from comment #7) > Created attachment 55892 [details] > version of using simple_dce_from_worklist in forwprop > > This is a version of using simple_dce_from_worklist in forwprop I had tried > at one point, but I don't remember why I did finish up this patch. That wouldn't catch the case in question. The issue is when we simplify _32 = (char) _31; - _4 = (unsigned char) _32; + _4 = (unsigned char) _31; we don't realize _32 becomes unused. I think it might be enough to add all original SSA uses of a stmt we fold to the DCE worklist if we simplified it (and also before substituting from the fwprop lattice?). I think it doesn't work to look at orig_stmt operands before we do update_stmt on the new stmt. Sth like the attached works though.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #7 from Andrew Pinski --- Created attachment 55892 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55892=edit version of using simple_dce_from_worklist in forwprop This is a version of using simple_dce_from_worklist in forwprop I had tried at one point, but I don't remember why I did finish up this patch.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #6 from Richard Biener --- So the issue is that forwprop & folding has a hard time in cleaning up dead code afterwards but it would also benefit from doing that more aggressively (and early) because of single_use () and friends. I'm thinking of hooking into update_stmt to discover candidates for simple-dce-from-worklist (likely not early and aggressive enough though).
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #5 from Richard Biener --- I will have a look (the const code is old & odd)
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #4 from Andrew Pinski --- (In reply to Andrew Pinski from comment #1) > In .optimized we have: >[local count: 118111600]: > # _31 = PHI <5(2), 0(3)> > _6 = (int) a.9_28; > _8 = _31 <= 3; > _9 = (int) _8; > if (_6 != _9) > goto ; [66.00%] > else > goto ; [34.00%] > > Except nothing optimizes _8 into the phi post pre (note _31 is only used in > the definition of _8 even). But optimizing that does not help here. Someone who understands jump threading should look into this.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2023-09-05 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #3 from Andrew Pinski --- Confirmed.(In reply to Andrew Pinski from comment #2) > > But _32 will be 0, _4 will be 0, j_22 will be 0, _33 will be 0. > But I think jump threading does not recognize j_22 will be 0 on that edge > and considers it a copy ... I think. after: Checking profitability of path (backwards): bb:7 (4 insns) bb:5 Control statement insns: 2 Overall: 2 insns before: Checking profitability of path (backwards): bb:7 (3 insns) bb:5 Control statement insns: 2 Overall: 1 insns Actually I think it is counting an already dead statement. The statement: _32 = (charD.10) _31; is dead but is being counted.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #2 from Andrew Pinski --- The difference is jump threading: old (able to remove foo): ``` Registering killing_def (path_oracle) _4 Registering value_relation (path_oracle) (_4 pe8 _33) (root: bb5) Checking profitability of path (backwards): [1] Registering jump thread: (5, 7) incoming edge; (7, 9) nocopy; path: 5->7->9 SUCCESS ``` vs new (not able to remove foo): ``` Registering killing_def (path_oracle) _33 Registering value_relation (path_oracle) (_33 pe8 _31) (root: bb5) Checking profitability of path (backwards): FAIL: Jump-thread path not considered: duplication of 2 insns is needed and optimizing for size. path: 5->7->xx REJECTED ``` Old IR before jump threading: ``` [local count: 118111600]: a.9_28 = a; if (a.9_28 != 0) goto ; [34.00%] else goto ; [66.00%] [local count: 40157944]: _30 = (char) b.1_2; [local count: 118111600]: # _33 = PHI <_30(6), 0(5)> _4 = (unsigned char) _33; _6 = (int) a.9_28; j_22 = (short int) _4; _20 = (int) _4; if (_4 > 11) goto ; [50.00%] else goto ; [50.00%] ``` New: ``` [local count: 118111600]: a.9_28 = a; if (a.9_28 != 0) goto ; [34.00%] else goto ; [66.00%] [local count: 40157944]: [local count: 118111600]: # _31 = PHI _32 = (char) _31; _4 = (unsigned char) _31; _6 = (int) a.9_28; j_22 = (short int) _4; _33 = _31 & 255; if (_33 > 11) goto ; [50.00%] else goto ; [50.00%] ``` But _32 will be 0, _4 will be 0, j_22 will be 0, _33 will be 0. But I think jump threading does not recognize j_22 will be 0 on that edge and considers it a copy ... I think.
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 --- Comment #1 from Andrew Pinski --- In .optimized we have: [local count: 118111600]: # _31 = PHI <5(2), 0(3)> _6 = (int) a.9_28; _8 = _31 <= 3; _9 = (int) _8; if (_6 != _9) goto ; [66.00%] else goto ; [34.00%] Except nothing optimizes _8 into the phi post pre (note _31 is only used in the definition of _8 even).
[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294 Andrew Pinski changed: What|Removed |Added Keywords||missed-optimization Target Milestone|--- |14.0