[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

--- Comment #3 from Aldy Hernandez ---
(In reply to Jeffrey A. Law from comment #2)
> It almost looks like a costing issue. The threaders find opportunities to
> thread all the incoming edges in the key block to the path which avoids the
> call. But all the paths get rejected.
>
> This is the key block:
>
> ;; basic block 11, loop depth 0, count 976284897 (estimated locally, freq 0.9092), maybe hot
> ;;  prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
> ;;  pred: 10 [99.8% (guessed)] count:225266786 (estimated locally, freq 0.2098) (TRUE_VALUE,EXECUTABLE)
> ;;        14 [100.0% (guessed)] count:751018112 (estimated locally, freq 0.6994) (TRUE_VALUE,EXECUTABLE)
> # o.10_11 = PHI <(10), o.10_28(14)>
> _17 = o.10_11 ==
> _20 = o.10_11 ==
> _27 = _20 | _17;
> if (_27 != 0)
>   goto ; [58.87%]
> else
>   goto ; [41.13%]
>
> It's pretty obvious that 10->11 can thread to 6. If we look at the other
> incoming edge we need o.10_28 which comes from bb14 with the value So
> that path should be 14->10->11 threading to 6.
>
> But they get rejected during threadfull2.

In order for threadfull2 to thread 10->11->6, it would have to handle pointer equivalences, but the new threader doesn't currently handle them.

In VRP we are able to handle equivs through the pointer_equiv_analyzer class, which pushes/pops state. It only works when traversing in dominator order, which I suppose is technically the case when going down a path.

So... I think you could wire pointer_equiv_analyzer to the new threader and get this, though it may also need to be taught about PHIs. The class is very simplistic, and was only a stopgap until we got pointer ranges (with points-to info).

Question though, why didn't the old threader get this? I see bb11 looks exactly the same in DOM3, and the old threader does handle pointer equivalences through its scoped tables. Maybe the IL is too complicated?

If we want to fix this in this release, I think we could do it with pointer_equiv_analyzer, but for future releases we should be able to get it for free when pranges are implemented. Half the work is already done... let's see how far I get, considering I'll be going on paternity leave for a few months this year.
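
To illustrate what "pushes/pops state" means for an equivalence table that is only valid along a dominator-order walk, here is a minimal sketch in C. It is not GCC's pointer_equiv_analyzer (which is a C++ class); every name here (`equiv_stack`, `equiv_enter`, `equiv_set`, `equiv_get`, `equiv_leave`) is hypothetical, and SSA names are modelled as plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scoped table: NOT GCC's implementation, only the idea. */
#define MAX_ENTRIES 64
#define SCOPE_MARKER (-1)

struct equiv_entry {
    int name;            /* stand-in for an SSA name version, or SCOPE_MARKER */
    const void *target;  /* the address the name is known to equal */
};

struct equiv_stack {
    struct equiv_entry entries[MAX_ENTRIES];
    size_t depth;
};

/* Open a scope when the walk enters a block. Returns 0 if the table is full. */
int equiv_enter(struct equiv_stack *s)
{
    if (s->depth == MAX_ENTRIES)
        return 0;
    s->entries[s->depth].name = SCOPE_MARKER;
    s->entries[s->depth].target = NULL;
    s->depth++;
    return 1;
}

/* Record "name == target" in the current scope. Returns 0 if the table is full. */
int equiv_set(struct equiv_stack *s, int name, const void *target)
{
    if (s->depth == MAX_ENTRIES)
        return 0;
    s->entries[s->depth].name = name;
    s->entries[s->depth].target = target;
    s->depth++;
    return 1;
}

/* Look up the innermost equivalence for name, or NULL if none is recorded. */
const void *equiv_get(const struct equiv_stack *s, int name)
{
    for (size_t i = s->depth; i > 0; i--)
        if (s->entries[i - 1].name == name)
            return s->entries[i - 1].target;
    return NULL;
}

/* Close the innermost scope, discarding everything recorded inside it. */
void equiv_leave(struct equiv_stack *s)
{
    while (s->depth > 0) {
        s->depth--;
        if (s->entries[s->depth].name == SCOPE_MARKER)
            break;
    }
}
```

Because entries are discarded when a scope closes, a fact recorded in one block is only visible in blocks visited while that scope is still open, which is why such a table is restricted to dominator-order (or single-path) traversal.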
[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law ---
It almost looks like a costing issue. The threaders find opportunities to thread all the incoming edges in the key block to the path which avoids the call. But all the paths get rejected.

This is the key block:

;; basic block 11, loop depth 0, count 976284897 (estimated locally, freq 0.9092), maybe hot
;;  prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
;;  pred: 10 [99.8% (guessed)] count:225266786 (estimated locally, freq 0.2098) (TRUE_VALUE,EXECUTABLE)
;;        14 [100.0% (guessed)] count:751018112 (estimated locally, freq 0.6994) (TRUE_VALUE,EXECUTABLE)
# o.10_11 = PHI <(10), o.10_28(14)>
_17 = o.10_11 ==
_20 = o.10_11 ==
_27 = _20 | _17;
if (_27 != 0)
  goto ; [58.87%]
else
  goto ; [41.13%]

It's pretty obvious that 10->11 can thread to 6. If we look at the other incoming edge we need o.10_28 which comes from bb14 with the value So that path should be 14->10->11 threading to 6.

But they get rejected during threadfull2.
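
As a rough source-level picture of the threading opportunity described in comment #2, the sketch below mimics the shape of the key block in C. It is only an assumption-laden illustration, not the reporter's test case: the objects `x` and `y`, the function names, and the return values 6 and 12 are all hypothetical stand-ins, chosen so that the pointer is known on each incoming edge and the join-block test is therefore always true.

```c
#include <assert.h>

int x, y;          /* hypothetical objects the pointer may refer to */
int call_count;    /* counts executions of the call we want to avoid */

void expensive_call(void) { call_count++; }

/* Unthreaded shape: both predecessors merge, then the pointer is re-tested. */
int unthreaded(int cond)
{
    int *p = cond ? &x : &y;   /* plays the role of the PHI node */
    int a = (p == &x);
    int b = (p == &y);
    if (a | b)
        return 6;              /* stands in for the block that avoids the call */
    expensive_call();          /* unreachable in practice, but still emitted */
    return 12;
}

/* Threaded shape: each incoming edge jumps straight to its known target,
   so the join-block test and the call are no longer reachable. */
int threaded(int cond)
{
    if (cond)
        goto bb6;              /* edge on which p == &x is known */
    goto bb6;                  /* edge on which p == &y is known */
bb6:
    return 6;
}
```

Once every incoming edge bypasses the test, the call has no remaining predecessors and dead code elimination can remove it, which is the optimization the report says is being missed.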
[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P2
[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-18
           Keywords|                            |needs-bisection
             Status|UNCONFIRMED                 |NEW
   Target Milestone|14.0                        |12.4
            Summary|[14 Regression] Dead Code   |[12/13/14 Regression] Dead
                   |Elimination Regression      |Code Elimination Regression
                   |since r14-4038-gb975c0dc3be |

--- Comment #1 from Andrew Pinski ---
Confirmed, but this just happened to be a side effect.

In forwprop1 we can change:
```
  # iftmp.5_11 = PHI <1(3), 0(4)>
  _4 = iftmp.5_11 <= 0;
  _5 = (short int) _4;
  _25 = (int) _4;
  _26 = _25 << 5;
  _29 = (short int) _4;
  _27 = _29 << 5;
  _6 = (int) _27;
  if (_27 != 0)
```

Into just:
```
  # iftmp.5_11 = PHI <1(3), 0(4)>
  if (iftmp.5_11 <= 0)
```

And then ethread is able to thread through that bb.

You get the same missed optimization by removing the call to a, that is, changing:
```
short v = a((g && p(t)) <= 0, 5);
```
to
```
short v = ((g && p(t)) <= 0);
```

Which then becomes a regression between GCC 11 and GCC 12.
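
To make the forwprop1 simplification in comment #1 concrete, here is a small C sketch of the two forms of the condition. The function names `long_form` and `short_form` are hypothetical, and the temporaries only loosely mirror the GIMPLE names; the point is that a boolean widened to short and shifted left by 5 is nonzero exactly when the boolean is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the longer sequence: compare, widen to short, shift, test. */
int long_form(int iftmp)
{
    bool t4 = iftmp <= 0;            /* _4 = iftmp.5_11 <= 0 */
    short t29 = (short) t4;          /* _29 = (short int) _4 */
    short t27 = (short) (t29 << 5);  /* _27 = _29 << 5: either 0 or 32 */
    return t27 != 0;                 /* if (_27 != 0) */
}

/* Mirrors the simplified sequence: test the comparison directly. */
int short_form(int iftmp)
{
    return iftmp <= 0;               /* if (iftmp.5_11 <= 0) */
}

/* Checks that both forms agree on the two values the PHI can produce. */
void check_phi_values(void)
{
    assert(long_form(1) == short_form(1));
    assert(long_form(0) == short_form(0));
}
```

With the condition reduced to a direct comparison on the PHI result, each incoming edge determines the branch outcome on its own, which is what lets a threader such as ethread bypass the block.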