[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2022-04-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P3

--- Comment #5 from Richard Biener  ---
There's nothing special about this bug that makes it more important than the
other "DCE" regressions identified against GCC 12 like PRs 102540, 102705,
102892, 102981, 102982, 103388, 104530, 105086.

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2022-03-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Specifically

 ;;   basic block 3, loop depth 1
-;;pred:   12
+;;pred:   10
   foo ();
 ;;succ:   4

...

+;;   basic block 10, loop depth 1
+;;pred:   7
+;;8
+  # _51 = PHI <_55(7), _4(8)>
+  _59 = _30 >> _51;
   iftmp.4_60 = (char) _59;
-;;succ:   12
-
-;;   basic block 12, loop depth 1
-;;pred:   10
-;;11
-  # iftmp.4_61 = PHI <_29(10), iftmp.4_60(11)>
-  _62 = (int) iftmp.4_61;
+  _62 = (int) iftmp.4_60;
   d = _62;
   if (_62 == 0)
-goto ; [33.00%]
+goto ; [66.33%]
   else
-goto ; [67.00%]
+goto ; [33.67%]
 ;;succ:   3
 ;;4


the wrecking occurs in back_jt_path_registry::duplicate_thread_path via
update_bb_profile_for_threading doing

  else if (!(prob == profile_probability::always ()))
    {
      FOR_EACH_EDGE (c, ei, bb->succs)
	c->probability /= prob;

but taken_edge is the skip edge.  So it seems that this might be OK after
all, but we've threaded the "unlikely" path, leaving the "likely" one
exposed to the unroller's most-probable-path analysis, where the call to
foo () now looks more likely to be executed?!  This is

Checking profitability of path (backwards):  bb:12 (4 insns) bb:10
  Control statement insns: 2
  Overall: 2 insns
  [5] Registering jump thread: (10, 12) incoming edge;  (12, 4) nocopy;
path: 10->12->4 SUCCESS

where the 10->12 has 59% probability and the 10->4 67%

Again not sure why we need to adjust the 12->3/4 probabilities on the
path leading through 10->11?  Sure, we need to adjust the incoming
count into 12 and the counts of 3 and 4 but why adjust probabilities?
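
To make the arithmetic in question concrete, here is a minimal sketch of the rescaling for a block with two outgoing edges. It assumes the step works as the quoted snippet suggests: the threaded share is taken off the taken edge and every outgoing probability is then divided by what remains. The struct, the function name and the 0.5 threaded share used in the note below are illustrative assumptions, not GCC code and not values from the dump.

```c
#include <assert.h>

/* Hypothetical model of a block with two outgoing edges; not a GCC type.  */
struct edge_probs {
  double taken;   /* edge followed by the threaded path */
  double other;   /* the remaining edge */
};

/* Assumed rescaling: a fraction THREADED of the block's executions no
   longer passes through it, and all of that fraction went out via the
   taken edge.  Renormalize the outgoing probabilities over what is left.  */
struct edge_probs rescale_after_threading(struct edge_probs p, double threaded)
{
  assert(threaded >= 0.0 && threaded <= p.taken && threaded < 1.0);
  double remaining = 1.0 - threaded;
  struct edge_probs out;
  out.taken = (p.taken - threaded) / remaining;
  out.other = p.other / remaining;
  return out;
}
```

Under these assumptions, starting from 67%/33% and threading away half of the executions along the 67% edge leaves roughly 34%/66%, i.e. the same kind of flip as in the dump above.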

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2022-03-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #3 from Richard Biener  ---
There's an interesting missing value-numbering optimization here,
call_may_clobber_ref_p_1 considers the call to foo () possibly clobbering 'c'
even though 'c' does not escape the TU.

Since 'foo' is external there's no IPA reference or modref data but we
do know that !may_be_aliased (base) so we could amend

  /* If the reference is based on a decl that is not aliased the call
     cannot possibly clobber it.  */
  if (DECL_P (base)
      && !may_be_aliased (base)
      /* But local non-readonly statics can be modified through recursion
         or the call may implement a threading barrier which we must
         treat as may-def.  */
      && (TREE_READONLY (base)
	  || !is_global_var (base)))
    return false;

to constrain the "But local ..." (note nested functions make 'local'
difficult to express so we use !is_global_var).  Of course the
threading barrier issue would still exist, but then the call itself
isn't clobbering it just serves as a barrier for code motion - I'm not
sure what kind of transforms we have to forbid.

Now, we _do_ have to ensure that foo () cannot access 'c' which it for
example might do if there's a

void bar() { c = 3; }
void (*hook)() = bar;

and foo calls the exported *hook.  In the end we have

c/1 (c) @0x77ff3180
  Type: variable definition analyzed
  Visibility: semantic_interposition prevailing_def_ironly
  References:
  Referring: main/4 (write) main/4 (read)
  Availability: available
  Varpool flags: used-by-single-function

(semantic_interposition!?), used-by-single-function might be the "trick"
to use here.  Maybe we can also compute a non-recursive flag on
main/4 to say that control flow cannot possibly be (indirectly) recursive.
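
The hook scenario described above can be sketched in a single file. This is only an illustration: in the real case foo () lives in another translation unit, and the body given to it here (calling through the exported pointer) is an assumption made so the example is self-contained. The name value_of_c_after_call is hypothetical.

```c
#include <assert.h>

static int c;                      /* file-local, address never taken */

static void bar(void) { c = 3; }   /* writes c */
void (*hook)(void) = bar;          /* exported pointer, so bar is reachable
                                      from outside this file */

/* Stand-in for the external foo (); assumed to call through the hook.  */
void foo(void)
{
  assert(hook != 0);
  hook();
}

/* Stores 2 to c, calls foo, and returns what c holds afterwards.  */
int value_of_c_after_call(void)
{
  c = 2;
  foo();
  return c;
}
```

Here the store of 2 does not survive the call even though 'c' itself never escapes, which is why !may_be_aliased (base) alone is not enough to treat the call as non-clobbering.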

For the threading issue we might need a flag like
not-called-by-address-taken-functions (including not address taken itself)
on functions, which should practically rule out being a thread.

Anyway, the testcase in GCC 11 relies on cunrolli unrolling the inner loop
and cunroll unrolling the outer loop while GCC 12 no longer unrolls the
outer loop because

size: 18-3, last_iteration: 17-3
  Loop size: 18
  Estimated size after unrolling: 19
Not unrolling loop 1: contains call and code would grow.

while GCC 11 has

size: 17-3, last_iteration: 16-3
  Loop size: 17
  Estimated size after unrolling: 18
Making edge 14->9 impossible by redistributing probability to other edges.
Making edge 4->5 impossible by redistributing probability to other edges.
t.c:8:21: optimized: loop with 1 iterations completely unrolled (header
execution count 134197598)
Exit condition of peeled iterations was eliminated.
Last iteration exit edge was proved true.
Forced exit to be taken: if (0 != 0)

The difference is get_loop_hot_path (), which on trunk gets presented with
a loop body where some extra path duplication has occurred, duplicating the
store to d and directing the path to foo ().  On trunk the respective edge
has 66% probability vs. 33% for the skip, while on the GCC 11 branch the
situation is reversed, with 67% for the skip over the call.

On trunk threadfull1 duplicates the path with the store to 'd' and that
is also what wrecks the edge probabilities.

I think that's what we definitely need to fix here - the profile wreckage
done by threadfull1.
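
As a rough illustration of why the flipped probability matters to the unroller, the sketch below walks a toy loop body by always following the more probable successor and reports whether that path runs into a call. It is a simplified model under stated assumptions, not GCC's get_loop_hot_path (): the block layout, struct and function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block model with at most two successors; not GCC's CFG.  */
struct block {
  bool has_call;
  int succ[2];      /* successor indices, -1 if the path leaves the body */
  double prob[2];   /* probability of each successor edge */
};

/* Follow the most probable successor from START and report whether a
   block containing a call is reached.  */
bool hot_path_has_call(const struct block *cfg, int nblocks, int start)
{
  int cur = start;
  for (int steps = 0; steps < nblocks && cur >= 0; steps++) {
    if (cfg[cur].has_call)
      return true;
    int best = cfg[cur].prob[1] > cfg[cur].prob[0] ? 1 : 0;
    cur = cfg[cur].succ[best];
  }
  return false;
}

/* Toy body: block 0 branches to the call (block 1) or skips to the
   latch (block 2).  PROB_TO_CALL is the probability of the edge to
   the call.  */
bool call_is_hot(double prob_to_call)
{
  assert(prob_to_call >= 0.0 && prob_to_call <= 1.0);
  struct block cfg[3] = {
    { false, { 1, 2 },   { prob_to_call, 1.0 - prob_to_call } },
    { true,  { 2, -1 },  { 1.0, 0.0 } },
    { false, { -1, -1 }, { 0.0, 0.0 } },
  };
  return hot_path_has_call(cfg, 3, 0);
}
```

In this model a 66% edge to the call puts the call on the hot path, while a 33% edge keeps it off, mirroring the trunk vs. GCC 11 difference described above.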

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2021-10-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2021-10-21 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

--- Comment #2 from Andrew Macleod  ---
(In reply to Aldy Hernandez from comment #1)
>
> 
> So, presumably _62 == 0 cannot be true.  If _62 == 0, then
> 
> 0 = _30 >> _6;
> 
> But that cannot happen because _30 is 2 if you follow the series of copies
> from the "c" global.
> 
> There is no way 0 = 2 >> x can ever be true.
> 
> There's probably a couple things missing here.  Maybe
> operator_rshift::op1_range needs to be taught that ~[0,0] = x >> y implies x
> is non-zero.  But also, we can't see through the load from the c=2 global. 
> Shouldn't that c=2 have been propagated by someone at this point?  (VRP1?)


Well, that depends on _6.  0 = _30 >> 32 is always true.
Now, we do happen to know that _6 is [0,1], so if we did manage to determine
that _30 is [2,2], then we will fold [2,2] >> [0,1] into [1,2] and then
everything should fall into place as we know we can never take that branch.

That means our core issue is
  c.2_28 = c;
  _29 = (char) c.2_28;
  _30 = (int) _29;
  if (_29 >= 0)

that we don't propagate the 2 into c.2_28.  We are limited to a range of
[1,127] because of that.
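
The two ranges mentioned above can be checked by brute force. The sketch below is not how ranger folds shifts; it simply enumerates x >> y over small closed intervals, assuming nonnegative operands and shift counts below the width of int. The type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical closed integer interval; not GCC's irange.  */
struct irange_sketch { int lo, hi; };

/* Enumerate x >> y for every x in X and y in Y and return the bounds of
   the results.  Assumes 0 <= x and 0 <= y < 31.  */
struct irange_sketch fold_rshift(struct irange_sketch x, struct irange_sketch y)
{
  assert(x.lo >= 0 && x.lo <= x.hi);
  assert(y.lo >= 0 && y.lo <= y.hi && y.hi < 31);
  struct irange_sketch r = { x.lo >> y.lo, x.lo >> y.lo };
  for (int a = x.lo; a <= x.hi; a++)
    for (int b = y.lo; b <= y.hi; b++) {
      int v = a >> b;
      if (v < r.lo) r.lo = v;
      if (v > r.hi) r.hi = v;
    }
  return r;
}
```

With _30 known to be [2,2] the result is [1,2] and zero is excluded, so the _62 == 0 branch is dead. With only [1,127] the result is [0,127], since 1 >> 1 is 0, and the branch cannot be removed.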

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2021-10-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Aldy Hernandez  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #1 from Aldy Hernandez  ---
We should've elided the call to foo() in the 8->3 edge.  This is the IL in
vrp-thread1 dump:

int main ()
{
  int h;
  int g;
  int _3;
  _Bool _5;
  int _6;
  char iftmp.4_11;
  int _12;
  int _19;
  int _21;
  char iftmp.4_22;
  int d.0_25;
  _Bool _26;
  int _27;
  int c.2_28;
  char _29;
  int _30;
  int _42;
  char iftmp.4_43;
  int _48;
  int _50;
  _Bool _53;
  int _56;
  int _59;
  char iftmp.4_60;
  int _62;

   [local count: 134197598]:
  c = 2;
  goto ; [100.00%]

   [local count: 88583700]:
  foo ();

   [local count: 268435457]:
  g_18 = g_10 + 1;

   [local count: 402633054]:
  # g_10 = PHI <0(2), g_18(4)>
  if (g_10 != 2)
goto ; [66.67%]
  else
goto ; [33.33%]

   [local count: 268435456]:
  d.0_25 = d;
  _26 = d.0_25 == 0;
  _27 = (int) _26;
  c.2_28 = c;
  _29 = (char) c.2_28;
  _30 = (int) _29;
  if (_29 >= 0)
goto ; [59.00%]
  else
goto ; [41.00%]

   [local count: 110058536]:
  # iftmp.4_11 = PHI <_29(6)>
  _12 = (int) iftmp.4_11;
  d = _12;
  _48 = (int) _29;
  d = _48;
  _19 = (int) _29;
  d = _19;
  goto ; [100.00%]

   [local count: 158376920]:
  _42 = _30 >> _27;
  iftmp.4_43 = (char) _42;
  _50 = _42;
  d = _50;
  _53 = _50 == 0;
  _56 = (int) _53;
  _21 = _30 >> _56;
  iftmp.4_22 = (char) _21;
  _3 = _21;
  d = _3;
  _5 = _3 == 0;
  _6 = (int) _5;
  _59 = _30 >> _6;
  iftmp.4_60 = (char) _59;
  _62 = _59;
  d = _62;
  if (_62 == 0)
goto ; [66.33%]
  else
goto ; [33.67%]

   [local count: 134197598]:
  return 0;

}

So, presumably _62 == 0 cannot be true.  If _62 == 0, then

0 = _30 >> _6;

But that cannot happen because _30 is 2 if you follow the series of copies from
the "c" global.

There is no way 0 = 2 >> x can ever be true.

There's probably a couple things missing here.  Maybe
operator_rshift::op1_range needs to be taught that ~[0,0] = x >> y implies x is
non-zero.  But also, we can't see through the load from the c=2 global. 
Shouldn't that c=2 have been propagated by someone at this point?  (VRP1?)

[Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3

2021-10-21 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-21
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org