[Bug tree-optimization/114787] [13/14 Regression] wrong code at -O1 on x86_64-linux-gnu (the generated code hangs)

2024-04-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #16 from GCC Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:cc48418cfc2e555d837ae9138cbfac23acb3cdf9

commit r14-10106-gcc48418cfc2e555d837ae9138cbfac23acb3cdf9
Author: Richard Biener 
Date:   Wed Apr 24 08:42:40 2024 +0200

tree-optimization/114787 - more careful loop update with CFG cleanup

When CFG cleanup removes a backedge we have to be more careful with
loop update.  In particular we need to clear niter info and estimates
and if we remove the last backedge of a loop we have to also mark
it for removal to prevent a following basic block merging to associate
loop info with an unrelated header.

PR tree-optimization/114787
* tree-cfg.cc (remove_edge_and_dominated_blocks): When
removing a loop backedge clear niter info and when removing
the last backedge of a loop mark that loop for removal.

* gcc.dg/torture/pr114787.c: New testcase.
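
The update rule described in the commit message can be sketched with a small toy model. This is not the code in tree-cfg.cc: `struct toy_loop` and `toy_remove_backedge` are hypothetical names made up for illustration, and the real change operates on GCC's own loop and edge structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a loop record.  All names are hypothetical and are
   not GCC's data structures.  */
struct toy_loop
{
  int n_backedges;          /* Backedges still reaching the header.  */
  bool has_estimate;        /* True while the cached bound is usable.  */
  int max_iterations;       /* Cached upper bound on iterations.  */
  bool marked_for_removal;  /* Set once the loop must be discarded.  */
};

/* Model the bookkeeping done when one backedge of LOOP is removed.  */
void
toy_remove_backedge (struct toy_loop *loop)
{
  assert (loop->n_backedges > 0);

  /* Any cached iteration info described the old CFG, so drop it.  */
  loop->has_estimate = false;
  loop->max_iterations = -1;

  /* Without a backedge this is no longer a loop; flag it so a later
     block merge cannot attach the record to an unrelated header.  */
  if (--loop->n_backedges == 0)
    loop->marked_for_removal = true;
}
```

Starting from a toy loop with two backedges, the first call only drops the estimate; the second call also sets `marked_for_removal`.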

2024-04-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #15 from Richard Biener  ---
Created attachment 58023
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58023&action=edit
patch

I'm testing this.

2024-04-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Richard Biener  changed:

       What|Removed                       |Added

   Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Status|NEW                           |ASSIGNED

--- Comment #14 from Richard Biener  ---
(In reply to Jan Hubicka from comment #13)
> -fdump-tree-all-all changing generated code is also bad.  We probably
> should avoid dumping loop bounds when they are not recorded. I added dumping
> of loop bounds and this may be an unexpected side effect. Will take a look.

I think consistently estimating the number of iterations here is correct.

I don't think the bug should be P1; it's latent and exposed only by an
artificial testcase.  We've likely had similar bugs before, where we ended up
associating estimates with the wrong loop after some CFG transform.

In this case we end up with the i-loop header being associated with a former
irreducible region.  The fix in the past was to release estimates/niters
on problematic transforms.  Let me have a look.

2024-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #13 from Jan Hubicka  ---
-fdump-tree-all-all changing generated code is also bad.  We probably should
avoid dumping loop bounds when they are not recorded. I added dumping of loop
bounds and this may be an unexpected side effect. Will take a look.

2024-04-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Jakub Jelinek  changed:

       What|Removed |Added

   Priority|P2      |P1

--- Comment #12 from Jakub Jelinek  ---
I think this should still be P1.  While -fdump-tree-all-all already miscompiled
the testcase before, most users don't use those options, and on the trunk it is
a regression with just -O1.

2024-04-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #11 from Jakub Jelinek  ---
It seems to be {,likely_}max_loop_iterations_int on the for (; i < 1; i++) loop
(aka loop 3) that matters.  Given the i = 0 right before it (a csmith-ism, I
guess; I don't see why it couldn't be in the for init expression), it estimates
that the loop iterates once.
Then the copyprop2 pass removes the i++ latch and the i <= 0 comparison in that
loop header, so from all I can see that loop disappears.
At profile_estimate time, we have loop 1, the b<=0 loop, which iterates just
once, and then loop 4 (f<=0) nested in loop 3 (i<=0) nested in loop 2 (a>=0);
the m loop doesn't seem to be in the loop structure, maybe because of the goto
into the loop.
After copyprop, the loop 1 b<=0 is gone and the i<=0 loop is gone as well, but
not from the loop structure: loop 3 in the loop structure (presumably with the
cached number of iterations estimates) has the f<=0 header, and loop 4 nested
in it has a header testing f<=0 too.

2024-04-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #10 from Andrew Pinski  ---
I suspect there needs to be a call to free_numbers_of_iterations_estimates
somewhere. Maybe it is copyprop, maybe there are a few other missing ones.

2024-04-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #9 from Jakub Jelinek  ---
It is the
  if (dump_file && (dump_flags & TDF_DETAILS)
      && max_loop_iterations_int (loop) >= 0)
    {
      fprintf (dump_file,
               "Loop %d iterates at most %i times.\n", loop->num,
               (int)max_loop_iterations_int (loop));
    }
  if (dump_file && (dump_flags & TDF_DETAILS)
      && likely_max_loop_iterations_int (loop) >= 0)
    {
      fprintf (dump_file, "Loop %d likely iterates at most %i times.\n",
               loop->num, (int)likely_max_loop_iterations_int (loop));
    }
cases which trigger the different code generation with
-fdump-tree-profile_estimate-details -O1 (either of them does).  I guess
max_loop_iterations_int and likely_max_loop_iterations_int cache their results,
and while that doesn't change the IL in the profile_estimate pass, it changes
the behavior of the cunroll pass later on.
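
The suspected mechanism, a query that caches its answer and is only reached when detailed dumping is on, can be illustrated with a toy model. This is a sketch under that assumption, not GCC code: `struct toy_loop`, `toy_max_iterations` and `toy_pipeline` are hypothetical names, and the trip counts 1 and 2 are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy loop record; not GCC's struct loop.  */
struct toy_loop
{
  int trip_count;     /* What a fresh analysis of the loop would find.  */
  bool bound_cached;  /* Set once a bound has been computed.  */
  int cached_bound;   /* The remembered bound.  */
};

/* Return an upper bound on iterations, computing it on first use and
   reusing the remembered value afterwards.  */
int
toy_max_iterations (struct toy_loop *loop)
{
  assert (loop != NULL);
  if (!loop->bound_cached)
    {
      loop->cached_bound = loop->trip_count;
      loop->bound_cached = true;
    }
  return loop->cached_bound;
}

/* Run an early "pass" that queries the bound only when dumping, then a
   transform that changes the loop without invalidating the cache, then
   a later "pass" that relies on the bound.  */
int
toy_pipeline (bool dump_details)
{
  struct toy_loop loop = { 1, false, 0 };

  if (dump_details)
    (void) toy_max_iterations (&loop);  /* Side effect: fills the cache.  */

  loop.trip_count = 2;                  /* The loop changes underneath.  */

  return toy_max_iterations (&loop);    /* Fresh or stale, depending.  */
}
```

In this model `toy_pipeline (false)` returns 2 while `toy_pipeline (true)` returns the stale 1, so merely enabling the dump changes what the later consumer sees.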

2024-04-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #8 from Jakub Jelinek  ---
Seems it is -fdump-tree-profile_estimate-details that changes the code
generation.

2024-04-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Jakub Jelinek  changed:

       What|Removed         |Added

   Priority|P1              |P2
   Keywords|needs-bisection |

--- Comment #7 from Jakub Jelinek  ---
With -fdump-tree-all-all it started with
r13-3898-gaf96500eea72c674a5686b35c66202ef2bd9688f.
The assembly with r13-3897 is the same between -O1 and -O1 -fdump-tree-all-all,
while with r13-3898 it is different and the testcase hangs.

2024-04-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> The first difference (in GCC 13) with/without -fdump-tree-all-all comes from
> cunroll:


I should note that -fdump-tree-cunroll-all still produces correct code with
GCC 13, which makes this bug even odder.

2024-04-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #5 from Andrew Pinski  ---
The first difference (in GCC 13) with/without -fdump-tree-all-all comes from
cunroll:



Broken:
```
Loop 3 iterates 2 times.
Loop 3 iterates at most 1 times.
Loop 3 likely iterates at most 1 times.
Analyzing # of iterations of loop 3
  exit condition [2, + , 4294967295] != 0
  bounds on difference of bases: -2 ... -2
  result:
# of iterations 2, bounded by 2
Removed pointless exit: if (ivtmp_43 != 0)
```

Working:
```
Loop 3 iterates 2 times.
Loop 3 iterates at most 2 times.
Loop 3 likely iterates at most 2 times.
```
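
As a worked check of the arithmetic in the dump, assuming `[2, + , 4294967295]` denotes a 32-bit unsigned induction variable with base 2 and step 4294967295 (that is, -1 modulo 2^32), the exit test `!= 0` becomes false after exactly two updates. The helper below is a hypothetical illustration, not anything from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many times a 32-bit unsigned counter starting at BASE and
   advancing by STEP is updated before it reaches zero.  LIMIT caps the
   search so the helper always terminates.  Hypothetical name.  */
unsigned
toy_iterations_until_zero (uint32_t base, uint32_t step, unsigned limit)
{
  assert (limit > 0);

  unsigned count = 0;
  uint32_t iv = base;
  while (iv != 0 && count < limit)
    {
      iv += step;  /* Wraps modulo 2^32; adding 4294967295 subtracts 1.  */
      count++;
    }
  return count;
}
```

With base 2 and step 4294967295 the helper returns 2, which agrees with "# of iterations 2, bounded by 2". An upper bound of "at most 1" for the same loop, as in the broken dump, is inconsistent with that count.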

2024-04-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Andrew Pinski  changed:

               What|Removed                     |Added

           Keywords|                            |needs-bisection
      Known to work|                            |12.1.0
   Target Milestone|14.0                        |13.3
      Known to fail|                            |13.1.0
            Summary|[14 Regression] wrong code  |[13/14 Regression] wrong
                   |at -O1 on x86_64-linux-gnu  |code at -O1 on
                   |(the generated code hangs)  |x86_64-linux-gnu (the
                   |                            |generated code hangs)

--- Comment #4 from Andrew Pinski  ---
So with GCC 13, with `-fdump-tree-all-all`, we get the same wrong code as on
the trunk.

This is why I misunderstood and thought it was a target issue: I was
comparing the dumps between GCC 13 and the trunk with -all enabled, but it was
broken in GCC 13 too.

Anyway, I tested GCC 12.3.0 and it looks to be working there.

It would be useful to get another bisect done, this time with `-O1
-fdump-tree-all-all`.