[Bug tree-optimization/103359] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2021-11-25 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103359

Andrew Macleod  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #11 from Andrew Macleod  ---
Fixed.

2021-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:661c02e54ea72fb55205df0a717951ff28bb739e

commit r12-5522-g661c02e54ea72fb55205df0a717951ff28bb739e
Author: Andrew MacLeod 
Date:   Tue Nov 23 14:12:29 2021 -0500

Check for equivalences between PHI argument and def.

If a PHI argument on an edge is equivalent to the DEF, it doesn't
provide any new information, so defer processing it unless all the
arguments are equivalences.

PR tree-optimization/103359
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): If arg is
equivalent to def, don't initially include its range.

gcc/testsuite/
* gcc.dg/pr103359.c: New.
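
A minimal sketch of the rule in this commit message, in C. It is not the code in gimple-range-fold.cc: ranges are modelled as a toy 64-bit set of small non-negative values (the empty set standing in for UNDEFINED), and the names `toy_range`, `phi_arg`, `range_of_phi_sketch` and `print_toy_range` are hypothetical. Arguments flagged as equivalent to the PHI result are deferred, and are only included if every argument is such an equivalence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy range (hypothetical, not GCC's irange): bit i set means value i
   (0..63) is possible; the empty set stands in for UNDEFINED.  */
typedef uint64_t toy_range;

toy_range toy_singleton(unsigned v) { return (toy_range) 1 << v; }

/* One PHI argument: its range on the incoming edge, and whether it is
   known to be equivalent to the PHI result (the DEF).  */
struct phi_arg {
  toy_range range;
  bool equiv_to_def;
};

/* Union the argument ranges, deferring arguments equivalent to the DEF
   because they add no new information.  Only if every argument is such
   an equivalence are they used after all.  */
toy_range range_of_phi_sketch(const struct phi_arg *args, size_t n)
{
  toy_range r = 0;
  bool saw_non_equiv = false;
  for (size_t i = 0; i < n; i++)
    if (!args[i].equiv_to_def) {
      r |= args[i].range;
      saw_non_equiv = true;
    }
  if (!saw_non_equiv)
    for (size_t i = 0; i < n; i++)
      r |= args[i].range;
  return r;
}

/* Print maximal runs of possible values, e.g. "[1,1][4,4]".  */
void print_toy_range(toy_range r)
{
  if (r == 0) {
    puts("UNDEFINED");
    return;
  }
  for (unsigned v = 0; v < 64; v++) {
    if (!((r >> v) & 1))
      continue;
    unsigned lo = v;
    while (v + 1 < 64 && ((r >> (v + 1)) & 1))
      v++;
    printf("[%u,%u]", lo, v);
  }
  putchar('\n');
}
```

For `h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>`, passing {4, not equivalent}, two {h_10's current range, equivalent} entries and {1, not equivalent} gives the set {1, 4}, which `print_toy_range` prints as `[1,1][4,4]`.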

2021-11-23 Thread amacleod at redhat dot com via Gcc-bugs

--- Comment #9 from Andrew Macleod  ---
This is an artifact of ranger's hybrid optimistic/pessimistic approach.

We optimistically assume things are UNDEFINED until we actually have to resolve
them.
The code at the end of FRE is not representative of what we see in EVRP, because
the loop code has inserted some PHI copies:

:
# h_7 = PHI <4(2), 1(7)>
 :
# h_18 = PHI 

We have to resolve h_22 on the back edge, and it ends up depending on a series
of PHIs; we commit to VARYING before we fully resolve the cycle:
 :
# h_22 = PHI 
 :
# h_20 = PHI 
 :
# h_10 = PHI 
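
To illustrate the optimistic/pessimistic distinction, here is a toy fixed-point solver in C over a hypothetical three-PHI cycle (the real PHI arguments are not shown in the dump above, so the cycle is made up). The optimistic solver starts every name at UNDEFINED and iterates until nothing changes; the pessimistic one treats any operand it has not resolved yet as VARYING and never revisits that choice. Ranger itself is a hybrid of the two, so this only sketches the extremes; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_range;              /* bit i set: value i possible */
#define TOY_UNDEFINED ((toy_range) 0)
#define TOY_VARYING   (~(toy_range) 0)

/* Hypothetical cycle, not the real IL from this PR:
     a = PHI <4, 1>
     b = PHI <a, c>
     c = PHI <b>        */
enum { VAR_A, VAR_B, VAR_C, NVARS };

typedef toy_range (*eval_fn) (const toy_range *vals);

toy_range eval_a(const toy_range *vals)
{
  (void) vals;
  return ((toy_range) 1 << 4) | ((toy_range) 1 << 1);
}
toy_range eval_b(const toy_range *vals) { return vals[VAR_A] | vals[VAR_C]; }
toy_range eval_c(const toy_range *vals) { return vals[VAR_B]; }

static const eval_fn evals[NVARS] = { eval_a, eval_b, eval_c };

/* Optimistic: start everything at UNDEFINED and re-evaluate until no
   range changes.  The union only grows, so this terminates.  */
void solve_optimistic(toy_range vals[NVARS])
{
  for (int i = 0; i < NVARS; i++)
    vals[i] = TOY_UNDEFINED;
  bool changed;
  do {
    changed = false;
    for (int i = 0; i < NVARS; i++) {
      toy_range r = evals[i](vals);
      if (r != vals[i]) {
        vals[i] = r;
        changed = true;
      }
    }
  } while (changed);
}

/* Pessimistic: one pass in order; any operand not resolved yet is
   assumed VARYING, and that decision is never revisited.  */
void solve_pessimistic(toy_range vals[NVARS])
{
  for (int i = 0; i < NVARS; i++)
    vals[i] = TOY_VARYING;
  for (int i = 0; i < NVARS; i++)
    vals[i] = evals[i](vals);
}
```

The optimistic solve gives {1, 4} for all three names, while the pessimistic pass resolves `a` to {1, 4} but `b` and `c` to VARYING, because `c` was still unresolved when `b` was evaluated.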

We currently have code to "ignore" arguments that are the same as the def,
because they provide no new info, so when processing something like:
 # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
we still come up with [1,1][4,4].

I notice that when looking at the PHI coming from the back edge:

Equivalence set : [h_10, h_18, h_22]
 :
# h_22 = PHI 
that h_18 is in the equivalency set of h_22 at this point...

I am experimenting with using that information: since h_22 and h_18 are
equivalent on that back edge, we can avoid using h_22 as well. That means
h_18 and h_7 will be equivalent and evaluate to [1,1][4,4], and then we
eliminate the store to c as we did before.

Stay tuned.

2021-11-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from Aldy Hernandez  ---
For the record, I'm using:

gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC) 

as a proxy for gcc11.

And I'm using the *.fre1 dump to see what evrp sees on entry.

Perhaps something is going on here that explains why I don't get the IL
Richi got in comment #4 for trunk.

2021-11-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #7 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #4)
> So the important part is to eliminate the store to 'c', which GCC11
> manages to do from EVRP while trunk fails at this task.  The IL into EVRP
> is absolutely identical:
> 
>:
>   goto ; [INV]
> 
>:
>   # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
>   d.2_8 = d;
>   if (d.2_8 != 0)
> goto ; [INV]
> 
>:
>   if (h_10 != 0)
> goto ; [INV]
> 
>:
>   e.0_1 = e;
>   if (e.0_1 != 0)
> goto ; [INV]
> 
>:
>   a.1_2 = a;
>   _3 = (short int) a.1_2;
>   _14 = a.1_2 & 1;
>   _4 = (short int) _14;
>   _5 = _4 | h_10;
>   _6 = (int) _5;
>   f.4_21 = (unsigned short) _5;
>   _23 = f.4_21 * 3;
>   _24 = (short int) _23;
>   if (_5 == 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>:
>   c = 0;

In GCC11 we had this right before evrp:

  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_5 == 0)
goto ; [INV]
  else
goto ; [INV]

which evrp could fold:

Visiting conditional with predicate: if (_5 == 0)

With known ranges
_5: short int ~[0, 0]

Predicate evaluates to: 0

However, in trunk the IL is different:

  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_24 == 0)
goto ; [INV]
  else
goto ; [INV]

Where did the cast come from in trunk?  Without the cast, we should be
able to fold it.

For that matter, in trunk we have:

Folding statement: if (_24 == 0)

Visiting conditional with predicate: if (_24 == 0)

With known ranges
_24: short int VARYING

Predicate evaluates to: DON'T KNOW
gimple_simplified to if (_5 == 0)
Folded into: if (_5 == 0)

So gimple fold does get rid of the cast for us, and at that point ranger can
figure out the conditional:

[ranger dump]
if (_5 == 0)
  goto ; [INV]
else
  goto ; [INV]

...
...
_5 : short int [-INF, -1][1, +INF]

Presumably we're asking ranger before calling gimple fold, but the question is:
what was going on in GCC11 such that the _24 => _5 substitution was done before
arriving in evrp?
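
Why the conditional folds once it tests `_5`: with h_10 in [1,1][4,4] and `_4 = (short int) (a & 1)` limited to 0 or 1, `_5 = _4 | h_10` can only be 1, 4 or 5, since a bitwise OR is nonzero whenever one operand is. A small exhaustive check in C, with the operand value sets hard-coded from the ranges quoted in this thread; the function names are hypothetical.

```c
#include <stdbool.h>

/* Value sets taken from the ranges discussed in this PR:
   h_10 is [1,1][4,4] and _4 = (short int) (a & 1) is 0 or 1.  */
static const short h10_values[] = { 1, 4 };
static const short t4_values[] = { 0, 1 };

#define COUNT(arr) (sizeof (arr) / sizeof (arr)[0])

/* Bitmask of every value _5 = _4 | h_10 can take (all are below 8).  */
unsigned t5_possible_values(void)
{
  unsigned mask = 0;
  for (unsigned i = 0; i < COUNT(h10_values); i++)
    for (unsigned j = 0; j < COUNT(t4_values); j++)
      mask |= 1u << (unsigned) (t4_values[j] | h10_values[i]);
  return mask;
}

/* "if (_5 == 0)" can be folded to false when 0 is not possible.  */
bool t5_can_be_zero(void)
{
  return (t5_possible_values() & 1u) != 0;
}
```

The possible values are exactly {1, 4, 5}, so `if (_5 == 0)` is always false and the `c = 0` store on that path is dead.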

2021-11-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org

2021-11-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #6 from Richard Biener  ---
Which means the CCP difference is that we're failing to propagate out copies
when they have a useful value:

Visiting statement:
_25 = _24;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x0 (0xf).  Adding SSA edges to worklist.
ssa_edge_worklist: adding SSA use in _7 = _25;

Simulating statement: _7 = _25;

Visiting statement:
_7 = _25;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x0 (0xf).  Adding SSA edges to worklist.

That's because the copy and constant lattices are unified, so we cannot
track both.  And while forwprop will propagate out copies (it does so when
it visits uses), it confuses itself with this.  Propagating them out
immediately fixes this, but the obvious use of replace_uses_by causes
excessive folding and redundant work, as well as out-of-order debug stmt
handling.
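
A hypothetical, simplified C model of what "unified" means here: one lattice cell whose value slot holds either partially known bits or the SSA name the result copies, never both. It is not GCC's ccp_prop_value_t; the policy in `visit_copy` (a copy inherits the source's CONSTANT cell, else records the copy) is an assumption chosen to mirror the `_25 = _24` dump above, and all names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified CCP lattice cell (not GCC's ccp_prop_value_t):
   one value slot holds either partially known bits or the SSA name the
   result is a copy of, never both.  */
enum cell_kind { CELL_UNDEFINED, CELL_CONSTANT, CELL_VARYING };
enum value_kind { VALUE_BITS, VALUE_SSA_NAME };

struct ccp_cell {
  enum cell_kind kind;
  enum value_kind which;
  union {
    struct { uint16_t value, mask; } bits;  /* mask bits are unknown */
    unsigned ssa_version;                   /* copy of SSA name _N */
  } u;
};

struct ccp_cell cell_bits(uint16_t value, uint16_t mask)
{
  struct ccp_cell c = { CELL_CONSTANT, VALUE_BITS, { .bits = { value, mask } } };
  return c;
}

struct ccp_cell cell_copy_of(unsigned ssa_version)
{
  struct ccp_cell c = { CELL_CONSTANT, VALUE_SSA_NAME,
                        { .ssa_version = ssa_version } };
  return c;
}

/* Visit "lhs = _SRC".  If _SRC's cell is already CONSTANT, lhs inherits
   it (known bits, or whatever name _SRC copies); otherwise lhs is
   recorded as a copy of _SRC.  Only one of the two facts is kept.  */
struct ccp_cell visit_copy(unsigned src_version, struct ccp_cell src)
{
  if (src.kind == CELL_CONSTANT)
    return src;
  return cell_copy_of(src_version);
}

bool cell_is_copy_of(struct ccp_cell c, unsigned ssa_version)
{
  return c.kind == CELL_CONSTANT && c.which == VALUE_SSA_NAME
         && c.u.ssa_version == ssa_version;
}
```

In this model, with `_24` at CONSTANT 0x0 (0xf), visiting `_25 = _24` copies the bits and nothing in the cell records that `_25` is a copy of `_24`, which is the fact a later pass would need to propagate the copy out.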

The following would fix the testcase.  Note there's still the apparent EVRP
regression compared to GCC 11.  The single_use test was added with
g:2fde61e3caf4c4660743e53497f52b65da1fe760 as a fix for PR66916.

diff --git a/gcc/match.pd b/gcc/match.pd
index f059b477f58..b32cc4a9368 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5221,8 +5221,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& ((POINTER_TYPE_P (TREE_TYPE (@00))
 && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@00
|| (POINTER_TYPE_P (TREE_TYPE (@10))
-   && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10))
-   && single_use (@0))
+   && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10)))
(if (TYPE_PRECISION (TREE_TYPE (@00)) == TYPE_PRECISION (TREE_TYPE (@0))
&& (TREE_CODE (@10) == INTEGER_CST
|| @1 != @10)

2021-11-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #5 from Richard Biener  ---
(In reply to Andrew Pinski from comment #2)
> The other thing is:
> 
> -ftree-bit-ccp
> Visiting statement:
> _4 = _3 & 1;
> which is likely CONSTANT
> Applying pattern match.pd:1641, gimple-match.c:23146
> Lattice value changed to CONSTANT 0x0 (0x1).  Adding SSA edges to worklist.
> marking stmt to be not simulated again
> 
> vs
> 
> -fno-tree-bit-ccp
> _4 = _3 & 1;
> which is likely CONSTANT
> Applying pattern match.pd:1641, gimple-match.c:23146
> Lattice value changed to VARYING.  Adding SSA edges to worklist.
> 
> In the first case we mark the stmt as not to be simulated again, while in the
> second case we don't.

VARYING statements are always 'not simulated again'; that dump message is for
an optimization that applies when all uses in a stmt will never be simulated
again.  So the only difference is in folding: there are constraints on which
folding results we may put into the lattice, which are not satisfied here, and
we do not fold likely-VARYING statements (all the "likely" logic is a
compile-time optimization, possibly a premature one).  So I don't really see
any CCP issue here.  It's just that without bit-CCP we get

@@ -365,7 +365,7 @@
   f.4_21 = (unsigned short) _5;
   _23 = f.4_21 * 3;
   _24 = (short int) _23;
-  if (_24 == 0)
+  if (_5 == 0)
 goto ; [INV]

The real issue, IMHO, is that forwprop doesn't get to it:

@@ -23,13 +23,7 @@

 gimple_simplified to _14 = a.1_2 & 1;
 _4 = (short int) _14;
-gimple_simplified to f_18 = _5;
-Removing dead stmt _7 = _24;
-
-Removing dead stmt _25 = _24;
-
-Removing dead stmt f_18 = _5;
-
+gimple_simplified to if (_5 == 0)

We don't get

/* For integral types with undefined overflow fold
   x * C1 == C2 into x == C2 / C1 or false.
   If overflow wraps and C1 is odd, simplify to x == C2 / C1 in the ring
   Z / 2^n Z.  */
(for cmp (eq ne)
 (simplify
  (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)

applied here.  We do get the fold_sign_changed_comparison pattern applied,
but only from combine_cond_expr_cond, and there we do not apply it because
of the single-use restriction, which the propagation phase of forwprop
confuses by delaying stmt removal.  The same issue prevents the GIMPLE
variant from applying, since it has

/* From fold_sign_changed_comparison and fold_widened_comparison.
   FIXME: the lack of symmetry is disturbing.  */
(for cmp (simple_comparison)
 (simplify
  (cmp (convert@0 @00) (convert?@1 @10))
...
   && single_use (@0))
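
The x * C1 == C2 pattern quoted above relies on odd multipliers being invertible modulo 2^n. A sketch in C for n = 16 (the width of `unsigned short` here), assuming 16-bit `short`; the helper names are hypothetical. It computes the inverse of an odd C1 by Newton iteration, solves x * C1 == C2, and exhaustively checks that `_24 = (short int) ((unsigned short) _5 * 3)` is zero exactly when `_5` is, which is why `if (_24 == 0)` may become `if (_5 == 0)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inverse of an odd C1 modulo 2^16 by Newton iteration: x = c1 is
   correct to 3 bits (c1 * c1 == 1 mod 8), and each step doubles that.
   Arithmetic is done in uint32_t so nothing overflows a signed int.  */
uint16_t inverse_mod_2_16(uint16_t c1)
{
  uint32_t c = c1, x = c1;
  for (int i = 0; i < 4; i++)
    x = x * (2u - c * x);
  return (uint16_t) x;
}

/* For odd C1, x * C1 == C2 (mod 2^16) has exactly one solution.  */
uint16_t solve_mul_eq(uint16_t c1, uint16_t c2)
{
  return (uint16_t) ((uint32_t) c2 * inverse_mod_2_16(c1));
}

/* Mirror _24 = (short int) ((unsigned short) _5 * 3) for every short _5
   and check that _24 == 0 exactly when _5 == 0.  */
bool times3_zero_iff_zero(void)
{
  for (int32_t i = INT16_MIN; i <= INT16_MAX; i++) {
    short t5 = (short) i;
    unsigned short f4 = (unsigned short) t5;
    short t24 = (short) (unsigned short) (f4 * 3u);
    if ((t24 == 0) != (t5 == 0))
      return false;
  }
  return true;
}
```

For C1 = 3 the inverse is 43691 (0xAAAB), since 3 * 43691 = 131073 = 2 * 65536 + 1; so x * 3 == 9 has the single solution x == 3, and x * 3 == 0 holds only for x == 0.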

2021-11-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
So the important part is to eliminate the store to 'c', which GCC11 manages to
do from EVRP while trunk fails at this task.  The IL into EVRP is absolutely
identical:

   :
  goto ; [INV]

   :
  # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
  d.2_8 = d;
  if (d.2_8 != 0)
goto ; [INV]

   :
  if (h_10 != 0)
goto ; [INV]

   :
  e.0_1 = e;
  if (e.0_1 != 0)
goto ; [INV]

   :
  a.1_2 = a;
  _3 = (short int) a.1_2;
  _14 = a.1_2 & 1;
  _4 = (short int) _14;
  _5 = _4 | h_10;
  _6 = (int) _5;
  f.4_21 = (unsigned short) _5;
  _23 = f.4_21 * 3;
  _24 = (short int) _23;
  if (_5 == 0)
goto ; [INV]
  else
goto ; [INV]

   :
  c = 0;

2021-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
I will have a look.

2021-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #2 from Andrew Pinski  ---
The other thing is:

-ftree-bit-ccp
Visiting statement:
_4 = _3 & 1;
which is likely CONSTANT
Applying pattern match.pd:1641, gimple-match.c:23146
Lattice value changed to CONSTANT 0x0 (0x1).  Adding SSA edges to worklist.
marking stmt to be not simulated again

vs

-fno-tree-bit-ccp
_4 = _3 & 1;
which is likely CONSTANT
Applying pattern match.pd:1641, gimple-match.c:23146
Lattice value changed to VARYING.  Adding SSA edges to worklist.

In the first case we mark the stmt as not to be simulated again, while in the
second case we don't.

Someone who understands ccp better should look into this.
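
A simplified C model of the value/mask pairs printed as `CONSTANT 0x0 (0x1)`: bits set in the mask are unknown, and the other bits equal the corresponding bits of the value. The transfer functions below follow the usual bit-CCP rules for AND, OR and the PHI meet, but the types and names are hypothetical and only 16 bits are modelled.

```c
#include <stdint.h>

/* Toy bit-CCP lattice value: bits set in MASK are unknown, the other
   bits are known and equal to the same bits of VALUE.  VALUE & MASK is
   kept at 0.  Hypothetical names; 16 bits only.  */
struct bits {
  uint16_t value, mask;
};

/* AND: a result bit is unknown only if some operand bit is unknown and
   neither operand has it known zero.  */
struct bits bits_and(struct bits a, struct bits b)
{
  uint16_t mask = (uint16_t) ((a.mask | b.mask) & (a.value | a.mask)
                              & (b.value | b.mask));
  struct bits r = { (uint16_t) (a.value & b.value & ~mask), mask };
  return r;
}

/* OR: a result bit is unknown only if some operand bit is unknown and
   neither operand has it known one.  */
struct bits bits_or(struct bits a, struct bits b)
{
  uint16_t known_one = (uint16_t) ((a.value & ~a.mask) | (b.value & ~b.mask));
  uint16_t mask = (uint16_t) ((a.mask | b.mask) & ~known_one);
  struct bits r = { (uint16_t) ((a.value | b.value) & ~mask), mask };
  return r;
}

/* PHI meet: bits unknown in either input, or differing between them,
   become unknown.  */
struct bits bits_meet(struct bits a, struct bits b)
{
  uint16_t mask = (uint16_t) (a.mask | b.mask | (a.value ^ b.value));
  struct bits r = { (uint16_t) (a.value & ~mask), mask };
  return r;
}
```

With `_3` unknown (value 0, mask 0xffff), `bits_and` of `_3` and 1 yields value 0x0, mask 0x1, matching the dump. In this toy model, meeting the PHI inputs 1 and 4 gives value 0x0, mask 0x5, and OR-ing that with `_4` still gives value 0x0, mask 0x5, a set that includes 0; so bit-level tracking alone cannot show `_5 != 0`, whereas the range [1,1][4,4] can.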

2021-11-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-11-22
   Keywords||missed-optimization
   Target Milestone|--- |12.0

--- Comment #1 from Andrew Pinski  ---
Confirmed.

In CCP1 on the trunk:
Visiting statement:
f_18 = (short intD.25) _6;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x4 (0x1).  Adding SSA edges to worklist.



In ccp1 on the trunk with -fno-tree-bit-ccp:
Visiting statement:
f_18 = (short intD.25) _6;
which is likely CONSTANT
Applying pattern match.pd:3663, gimple-match.c:42920
Applying pattern match.pd:3580, gimple-match.c:42859
Match-and-simplified (short int) _6 to _5
Lattice value changed to CONSTANT _5.  Adding SSA edges to worklist.
marking stmt to be not simulated again

So ccp1 sometimes applies match-and-simplify and sometimes does not.
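
The match-and-simplify step in the -fno-tree-bit-ccp dump is valid because `_6 = (int) _5` only widens, so truncating back to `short int` returns `_5` unchanged. An exhaustive C check, assuming 16-bit `short`; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* f_18 = (short int) _6 with _6 = (int) _5: check over every short
   value that widening to int and truncating back is the identity.  */
bool short_int_roundtrip_is_identity(void)
{
  for (int32_t i = INT16_MIN; i <= INT16_MAX; i++) {
    short x = (short) i;
    if ((short) (int) x != x)
      return false;
  }
  return true;
}
```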