[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2024-03-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

--- Comment #3 from Aldy Hernandez  ---
(In reply to Jeffrey A. Law from comment #2)
> It almost looks like a costing issue.  The threaders find opportunities to
> thread all the incoming edges in the key block to the path which avoids the
> call.  But all the paths get rejected.
> 
> 
> This is the key block:
> 
> ;;   basic block 11, loop depth 0, count 976284897 (estimated locally, freq
> 0.9092), maybe hot
> ;;prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   10 [99.8% (guessed)]  count:225266786 (estimated locally,
> freq 0.2098) (TRUE_VALUE,EXECUTABLE)
> ;;14 [100.0% (guessed)]  count:751018112 (estimated locally,
> freq 0.6994) (TRUE_VALUE,EXECUTABLE)
>   # o.10_11 = PHI <(10), o.10_28(14)>
>   _17 = o.10_11 == 
>   _20 = o.10_11 == 
>   _27 = _20 | _17;
>   if (_27 != 0)
> goto ; [58.87%]
>   else
> goto ; [41.13%]
> 
> It's pretty obvious that 10->11 can thread to 6.  If we look at the other
> incoming edge we need o.10_28 which comes from bb14 with the value   So
> that path should be 14->10->11 threading to 6.
> 
> But they get rejected during threadfull2.

In order for threadfull2 to thread 10->11->6, it would have to handle pointer
equivalences, but the new threader doesn't currently handle them.  In VRP we
are able to handle equivs through the pointer_equiv_analyzer class which
pushes/pops state.  It only works when traversing in dominator order, which I
suppose is technically the case when going down a path.
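
To make the push/pop idea concrete, here is a minimal sketch of a scoped equivalence table in C.  This is not GCC's pointer_equiv_analyzer; the struct and the names enter_scope, record_equiv, leave_scope and fold_eq are hypothetical and only illustrate why the scheme is sound when blocks are visited in dominator order (or straight down a single path).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAMES 16
#define MAX_UNDO  64

/* Hypothetical table: equiv[i] is the address that SSA name i is known
   to equal on the current path, or NULL when nothing is known.  */
struct scoped_equiv {
  const void *equiv[MAX_NAMES];
  struct { int name; const void *old; } undo[MAX_UNDO];
  int undo_top;
};

/* Entering a block: remember how deep the undo log is.  */
static int enter_scope(const struct scoped_equiv *t)
{
  return t->undo_top;
}

/* Record "name == addr", logging the previous value so it can be restored.  */
static void record_equiv(struct scoped_equiv *t, int name, const void *addr)
{
  assert(name >= 0 && name < MAX_NAMES && t->undo_top < MAX_UNDO);
  t->undo[t->undo_top].name = name;
  t->undo[t->undo_top].old = t->equiv[name];
  t->undo_top++;
  t->equiv[name] = addr;
}

/* Leaving a block: pop everything recorded since the matching enter_scope.  */
static void leave_scope(struct scoped_equiv *t, int marker)
{
  while (t->undo_top > marker) {
    t->undo_top--;
    t->equiv[t->undo[t->undo_top].name] = t->undo[t->undo_top].old;
  }
}

/* Try to fold "name == addr": 1 if known true, -1 if nothing is known.  */
static int fold_eq(const struct scoped_equiv *t, int name, const void *addr)
{
  return (t->equiv[name] != NULL && t->equiv[name] == addr) ? 1 : -1;
}
```

A path walk would call enter_scope at each block, record_equiv for each pointer copy or PHI argument selected by the incoming edge, and leave_scope when backing out, so facts never leak into blocks that are not dominated by the one that established them.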

So...I think you could wire pointer_equiv_analyzer to the new threader and get
this, though it may need to be also taught about PHIs.  The class is very
simplistic, and was only a stopgap until we got pointer ranges (with points-to
info).

Question though, why didn't the old threader get this?  I see bb11 looks
exactly the same in DOM3, and the old threader does handle pointer equivalences
through its scoped tables.  Maybe the IL is too complicated?

If we want to fix this in this release, I think we could do it with
pointer_equiv_analyzer, but for future releases we should be able to get it for
free when pranges are implemented.  Half the work is already done...let's see
how far I get considering I'll be going on paternity leave for a few months
this year.

[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2024-03-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law  ---
It almost looks like a costing issue.  The threaders find opportunities to
thread all the incoming edges in the key block to the path which avoids the
call.  But all the paths get rejected.


This is the key block:

;;   basic block 11, loop depth 0, count 976284897 (estimated locally, freq
0.9092), maybe hot
;;prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
;;pred:   10 [99.8% (guessed)]  count:225266786 (estimated locally,
freq 0.2098) (TRUE_VALUE,EXECUTABLE)
;;14 [100.0% (guessed)]  count:751018112 (estimated locally,
freq 0.6994) (TRUE_VALUE,EXECUTABLE)
  # o.10_11 = PHI <(10), o.10_28(14)>
  _17 = o.10_11 == 
  _20 = o.10_11 == 
  _27 = _20 | _17;
  if (_27 != 0)
goto ; [58.87%]
  else
goto ; [41.13%]

It's pretty obvious that 10->11 can thread to 6.  If we look at the other
incoming edge we need o.10_28 which comes from bb14 with the value   So that
path should be 14->10->11 threading to 6.

But they get rejected during threadfull2.
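
For readers less familiar with the transformation, the following C sketch shows the general shape of what threading the join block would achieve.  It is only an illustration under assumptions: the objects b and c, the function names, and the two-predecessor layout are hypothetical and do not reproduce the actual testcase or the real CFG (where one path runs 14->10->11).

```c
#include <assert.h>

static int calls;      /* counts how often the "call" path runs */
static int b, c;       /* hypothetical objects whose addresses are compared */

static int expensive(void) { calls++; return 1; }

/* Before threading: both predecessors merge into a join block that
   re-tests a pointer whose value each predecessor already knows.  */
static int unthreaded(int cond, int *trace)
{
  int *o;
  if (cond) { *trace += 1; o = &b; }   /* first predecessor */
  else      { *trace += 2; o = &c; }   /* second predecessor */

  /* Join block: the PHI for o, then the comparison.  */
  if (o == &b || o == &c)
    return 0;                          /* path that avoids the call */
  return expensive();
}

/* After threading: each predecessor jumps straight to the block it is
   known to reach, so the join block and the call become dead code.  */
static int threaded(int cond, int *trace)
{
  if (cond) { *trace += 1; return 0; }
  else      { *trace += 2; return 0; }
}
```

Once every incoming edge is threaded this way, nothing reaches the call any more, which is what would let dead code elimination remove it.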

[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P2

[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-18
           Keywords|                            |needs-bisection
             Status|UNCONFIRMED                 |NEW
   Target Milestone|14.0                        |12.4
            Summary|[14 Regression] Dead Code   |[12/13/14 Regression] Dead
                   |Elimination Regression      |Code Elimination Regression
                   |since r14-4038-gb975c0dc3be |

--- Comment #1 from Andrew Pinski  ---
Confirmed, but this just happened to be a side effect.

In forwprop1 we can change:
```
  # iftmp.5_11 = PHI <1(3), 0(4)>
  _4 = iftmp.5_11 <= 0;
  _5 = (short int) _4;
  _25 = (int) _4;
  _26 = _25 << 5;
  _29 = (short int) _4;
  _27 = _29 << 5;
  _6 = (int) _27;
  if (_27 != 0)
```

Into just:
```
  # iftmp.5_11 = PHI <1(3), 0(4)>
  if (iftmp.5_11 <= 0)
```

And then ethread is able to thread through that bb.
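
As a sanity check on that simplification, here is a small C sketch that mirrors the live statements of the first GIMPLE snippet and compares them with the simplified condition.  The function names are hypothetical, and the statements whose results do not feed the condition (_5, _25, _26, _6) are left out.

```c
#include <assert.h>

/* Mirrors the chain feeding the condition before the simplification.  */
static int cond_before(int iftmp)
{
  _Bool t4 = iftmp <= 0;             /* _4  = iftmp.5_11 <= 0   */
  short t29 = (short) t4;            /* _29 = (short int) _4    */
  short t27 = (short) (t29 << 5);    /* _27 = _29 << 5          */
  return t27 != 0;                   /* if (_27 != 0)           */
}

/* The simplified form: test the PHI result directly.  */
static int cond_after(int iftmp)
{
  return iftmp <= 0;
}
```

Since _4 is a boolean, shifting it left by 5 yields either 0 or 32, so comparing the shifted value against zero is the same as testing _4 itself.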

You get the same missed optimization when removing the call to a,
that is, changing:
```
short v = a((g && p(t)) <= 0, 5);
```
to
```
short v = ((g && p(t)) <= 0);
```

Which then becomes a regression between GCC 11 and GCC 12.