[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Richard Biener  ---
Fixed.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d45ddc2c04e471d0dcee016b6edacc00b8341b16

commit r14-4089-gd45ddc2c04e471d0dcee016b6edacc00b8341b16
Author: Richard Biener 
Date:   Thu Sep 14 13:06:51 2023 +0200

tree-optimization/111294 - backwards threader PHI costing

This revives an earlier patch since the problematic code applying
extra costs to PHIs in copied blocks we couldn't make any sense of
prevents a required threading in this case.  Instead of coming up
with an artificial other costing the following simply removes the
bits.

As with all threading changes this requires a plethora of testsuite
adjustments, but only the last three are unfortunate as is the
libgomp team.c adjustment which is required to avoid a bogus -Werror
diagnostic during bootstrap.

PR tree-optimization/111294
gcc/
* tree-ssa-threadbackward.cc (back_threader_profitability::m_name):
Remove
(back_threader::find_paths_to_names): Adjust.
(back_threader::maybe_thread_block): Likewise.
(back_threader_profitability::possibly_profitable_path_p): Remove
code applying extra costs to copies PHIs.

libgomp/
* team.c (gomp_team_start): Assert alloca size to avoid false
positive alloc-size diagnostic.

gcc/testsuite/
* gcc.dg/tree-ssa/pr111294.c: New test.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/tree-ssa/pr61839_2.c: Likewise.
* gcc.dg/tree-ssa/ssa-sink-18.c: Likewise.
* g++.dg/warn/Wstringop-overflow-4.C: XFAIL subtest on ilp32.
* gcc.dg/uninit-pred-9_b.c: XFAIL subtest everywhere.
* gcc.dg/vect/vect-117.c: Make scan for not Invalid sum
conditional on lp64.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:9ea74d235c7e7816b996a17c61288f02ef767985

commit r14-3982-g9ea74d235c7e7816b996a17c61288f02ef767985
Author: Richard Biener 
Date:   Thu Sep 14 09:31:23 2023 +0200

tree-optimization/111294 - better DCE after forwprop

The following adds more aggressive DCE to forwprop to clean up dead
stmts when folding a stmt leaves some operands unused.  The patch
uses simple_dce_from_worklist for this purpose, queueing original
operands before substitution and folding, but only if we folded the
stmt.

This removes one dead stmt biasing threading costs in a later pass
but it doesn't resolve the optimization issue in the PR yet.

PR tree-optimization/111294
* tree-ssa-forwprop.cc (pass_forwprop::execute): Track
operands that eventually become dead and use
simple_dce_from_worklist
to remove their definitions if they did so.

* gcc.dg/tree-ssa/evrp10.c: Adjust.
* gcc.dg/tree-ssa/evrp6.c: Likewise.
* gcc.dg/tree-ssa/forwprop-31.c: Likewise.
* gcc.dg/tree-ssa/neg-cast-3.c: Likewise.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #10 from Richard Biener  ---
But this stmt isn't the issue, BB7 is

   [local count: 118111600]:
  # _31 = PHI 
  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;
  if (_33 > 11)

and that does have one more stmt.  It's

  if (a.8_28 != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 40157944]:

   [local count: 118111600]:
  # _31 = PHI  
  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;
  if (_33 > 11)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 59055800]:

   [local count: 118111600]:
  # iftmp.11_27 = PHI 

so what the cost model fails to see is that j_22 and _4 are only live
on one path to BB9.

It's that odd code again I attempted to remove at some point:

  /* PHIs in the path will create degenerate PHIS in the
 copied path which will then get propagated away, so
 looking at just the duplicate path the PHIs would
 seem unimportant.

 But those PHIs, because they're assignments to objects
 typically with lives that exist outside the thread path,
 will tend to generate PHIs (or at least new PHI arguments)
 at points where we leave the thread path and rejoin
 the original blocks.  So we do want to account for them.

 We ignore virtual PHIs.  We also ignore cases where BB
 has a single incoming edge.  That's the most common
 degenerate PHI we'll see here.  Finally we ignore PHIs
 that are associated with the value we're tracking as
 that object likely dies.  */
  if (EDGE_COUNT (bb->succs) > 1 && EDGE_COUNT (bb->preds) > 1)
{
  for (gphi_iterator gsip = gsi_start_phis (bb);
   !gsi_end_p (gsip);
   gsi_next ())
{
  gphi *phi = gsip.phi ();
  tree dst = gimple_phi_result (phi);

  /* Note that if both NAME and DST are anonymous
 SSA_NAMEs, then we do not have enough information
 to consider them associated.  */
  if (dst != name
  && name
  && TREE_CODE (name) == SSA_NAME
  && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
  || !SSA_NAME_VAR (dst))
  && !virtual_operand_p (dst))
++m_n_insns;
}
}

there's also a missed canonicalization I think:

  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4; 
  _33 = _31 & 255;

we canonicalize (int)(unsigned char) _31 to _31 & 255 but we fail to
do the same for (short)(unsigned char) _31 or rather we fail to
anticipate that (short)_33 could be used for j_22, eliding _4.

Maybe costing "lowparts" as zero would be useful here.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #9 from Richard Biener  ---
Created attachment 55898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55898=edit
alternative patch for simple_dce_from_worklist

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #8 from Richard Biener  ---
(In reply to Andrew Pinski from comment #7)
> Created attachment 55892 [details]
> version of using simple_dce_from_worklist in forwprop
> 
> This is a version of using simple_dce_from_worklist in forwprop I had tried
> at one point, but I don't remember why I did finish up this patch.

That wouldn't catch the case in question.  The issue is when we simplify

   _32 = (char) _31;
-  _4 = (unsigned char) _32;
+  _4 = (unsigned char) _31;

we don't realize _32 becomes unused.  I think it might be enough to add
all original SSA uses of a stmt we fold to the DCE worklist if we simplified
it (and also before substituting from the fwprop lattice?).  I think it
doesn't work to look at orig_stmt operands before we do update_stmt on the
new stmt.

Sth like the attached works though.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #7 from Andrew Pinski  ---
Created attachment 55892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55892=edit
version of using simple_dce_from_worklist in forwprop

This is a version of using simple_dce_from_worklist in forwprop I had tried at
one point, but I don't remember why I did finish up this patch.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #6 from Richard Biener  ---
So the issue is that forwprop & folding has a hard time in cleaning up dead
code afterwards but it would also benefit from doing that more aggressively
(and early) because of single_use () and friends.

I'm thinking of hooking into update_stmt to discover candidates for
simple-dce-from-worklist (likely not early and aggressive enough though).

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #5 from Richard Biener  ---
I will have a look (the const code is old & odd)

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> In .optimized we have:
>[local count: 118111600]:
>   # _31 = PHI <5(2), 0(3)>
>   _6 = (int) a.9_28;
>   _8 = _31 <= 3;
>   _9 = (int) _8;
>   if (_6 != _9)
> goto ; [66.00%]
>   else
> goto ; [34.00%]
> 
> Except nothing optimizes _8 into the phi post pre (note _31 is only used in
> the definition of _8 even).

But optimizing that does not help here.

Someone who understands jump threading should look into this.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-09-05
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
Confirmed.(In reply to Andrew Pinski from comment #2)
> 
> But _32 will be 0, _4 will be 0, j_22 will be 0, _33 will be 0.
> But I think jump threading does not recognize j_22 will be 0 on that edge
> and considers it a copy ... I think.

after:
Checking profitability of path (backwards):  bb:7 (4 insns) bb:5
  Control statement insns: 2
  Overall: 2 insns

before:
Checking profitability of path (backwards):  bb:7 (3 insns) bb:5
  Control statement insns: 2
  Overall: 1 insns


Actually I think it is counting an already dead statement. The statement:
  _32 = (charD.10) _31;


is dead but is being counted.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #2 from Andrew Pinski  ---
The difference is jump threading:
old (able to remove foo):
```
 Registering killing_def (path_oracle) _4
 Registering value_relation (path_oracle) (_4 pe8 _33) (root: bb5)
Checking profitability of path (backwards): 
  [1] Registering jump thread: (5, 7) incoming edge;  (7, 9) nocopy; 
path: 5->7->9 SUCCESS
```

vs
new (not able to remove foo):
```
 Registering killing_def (path_oracle) _33
 Registering value_relation (path_oracle) (_33 pe8 _31) (root: bb5)
Checking profitability of path (backwards):   FAIL: Jump-thread path not
considered: duplication of 2 insns is needed and optimizing for size.
path: 5->7->xx REJECTED
```


Old IR before jump threading:
```

   [local count: 118111600]:
  a.9_28 = a;
  if (a.9_28 != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 40157944]:
  _30 = (char) b.1_2;

   [local count: 118111600]:
  # _33 = PHI <_30(6), 0(5)>
  _4 = (unsigned char) _33;
  _6 = (int) a.9_28;
  j_22 = (short int) _4;
  _20 = (int) _4;
  if (_4 > 11)
goto ; [50.00%]
  else
goto ; [50.00%]
```

New:
```
   [local count: 118111600]:
  a.9_28 = a;
  if (a.9_28 != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 40157944]:

   [local count: 118111600]:
  # _31 = PHI 
  _32 = (char) _31;
  _4 = (unsigned char) _31;
  _6 = (int) a.9_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;
  if (_33 > 11)
goto ; [50.00%]
  else
goto ; [50.00%]
```

But _32 will be 0, _4 will be 0, j_22 will be 0, _33 will be 0.
But I think jump threading does not recognize j_22 will be 0 on that edge and
considers it a copy ... I think.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #1 from Andrew Pinski  ---
In .optimized we have:
   [local count: 118111600]:
  # _31 = PHI <5(2), 0(3)>
  _6 = (int) a.9_28;
  _8 = _31 <= 3;
  _9 = (int) _8;
  if (_6 != _9)
goto ; [66.00%]
  else
goto ; [34.00%]

Except nothing optimizes _8 into the phi post pre (note _31 is only used in the
definition of _8 even).

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |14.0