[Bug tree-optimization/114787] [13/14 Regression] wrong code at -O1 on x86_64-linux-gnu (the generated code hangs)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #16 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:cc48418cfc2e555d837ae9138cbfac23acb3cdf9

commit r14-10106-gcc48418cfc2e555d837ae9138cbfac23acb3cdf9
Author: Richard Biener
Date:   Wed Apr 24 08:42:40 2024 +0200

    tree-optimization/114787 - more careful loop update with CFG cleanup

    When CFG cleanup removes a backedge we have to be more careful with
    loop update.  In particular we need to clear niter info and estimates
    and if we remove the last backedge of a loop we have to also mark
    it for removal to prevent a following basic block merging to associate
    loop info with an unrelated header.

            PR tree-optimization/114787
            * tree-cfg.cc (remove_edge_and_dominated_blocks): When
            removing a loop backedge clear niter info and when removing
            the last backedge of a loop mark that loop for removal.

            * gcc.dg/torture/pr114787.c: New testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #15 from Richard Biener ---
Created attachment 58023
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58023&action=edit
patch

I'm testing this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #14 from Richard Biener ---
(In reply to Jan Hubicka from comment #13)
> -fdump-tree-all-all changing generated code is also bad. We probably
> should avoid dumping loop bounds when they are not recorded. I added dumping
> of loop bounds and this may be an unexpected side effect. Will take a look.

I think consistently estimating the number of iterations here is correct.

I don't think the bug should be P1; it's latent and exposed only with an
artificial testcase. We've likely had similar bugs before where we end up
associating estimates with a wrong loop after some CFG transform. In this
case we end up with the i-loop header being associated with a former
irreducible region.

The fix in the past was to release estimates/niters on problematic
transforms. Let me have a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #13 from Jan Hubicka ---
-fdump-tree-all-all changing generated code is also bad. We probably should
avoid dumping loop bounds when they are not recorded. I added dumping of loop
bounds and this may be an unexpected side effect. Will take a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1

--- Comment #12 from Jakub Jelinek ---
I think this should still be P1. While -fdump-tree-all-all already miscompiled
the testcase before, most users don't use those options, and on the trunk it
is a regression with just -O1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #11 from Jakub Jelinek ---
Seems it is {,likely_}max_loop_iterations_int on the
for (; i < 1; i++)
loop which matters (aka loop 3). Given the i = 0 right before it (guess
csmith-ism, don't see why it couldn't be in the for init expression) it
estimates that it loops once. Then the copyprop2 pass removes the i++ latch
and the i <= 0 comparison in that loop header, so from all I can see that
loop disappears.

At profile_estimate time, we have loop 1, the b<=0 loop which iterates just
once, and then loop 4 (f<=0) nested in loop 3 (i<=0) nested in loop 2 (a>=0).
The m loop doesn't seem to be in the loop structure, maybe because of the
goto into the loop.

After copyprop, loop 1 (b<=0) is gone and the i<=0 loop is gone as well, but
not from the loop structure: loop 3 in the loop structure (presumably with
the cached iteration estimates) has the f<=0 header, and loop 4 nested in it
has a header testing f<=0 too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #10 from Andrew Pinski ---
I suspect there needs to be a call to free_numbers_of_iterations_estimates
somewhere. Maybe it is copyprop, maybe there are a few other missing ones.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #9 from Jakub Jelinek ---
It is the
      if (dump_file && (dump_flags & TDF_DETAILS)
          && max_loop_iterations_int (loop) >= 0)
        {
          fprintf (dump_file,
                   "Loop %d iterates at most %i times.\n", loop->num,
                   (int)max_loop_iterations_int (loop));
        }
      if (dump_file && (dump_flags & TDF_DETAILS)
          && likely_max_loop_iterations_int (loop) >= 0)
        {
          fprintf (dump_file, "Loop %d likely iterates at most %i times.\n",
                   loop->num, (int)likely_max_loop_iterations_int (loop));
        }
cases which trigger the different code generation with
-fdump-tree-profile_estimate-details -O1, either of them; guess
max_loop_iterations_int and likely_max_loop_iterations_int cache the results
and while it doesn't change the IL from the profile_estimate pass, it changes
the behavior of the cunroll pass later on.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #8 from Jakub Jelinek ---
Seems it is -fdump-tree-profile_estimate-details that changes the code
generation.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2
           Keywords|needs-bisection             |

--- Comment #7 from Jakub Jelinek ---
With -fdump-tree-all-all it started with
r13-3898-gaf96500eea72c674a5686b35c66202ef2bd9688f
The assembly with r13-3897 is the same between -O1 and
-O1 -fdump-tree-all-all, while with r13-3898 it is different and the testcase
hangs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #6 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #5)
> The first difference (in GCC 13) with/without -fdump-tree-all-all comes from
> cunroll:

I should note that -fdump-tree-cunroll-all still produces the correct code
generation for GCC 13, which makes this bug even odder.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

--- Comment #5 from Andrew Pinski ---
The first difference (in GCC 13) with/without -fdump-tree-all-all comes from
cunroll:

Broken:
```
Loop 3 iterates 2 times.
Loop 3 iterates at most 1 times.
Loop 3 likely iterates at most 1 times.
Analyzing # of iterations of loop 3
  exit condition [2, + , 4294967295] != 0
  bounds on difference of bases: -2 ... -2
  result:
    # of iterations 2, bounded by 2
Removed pointless exit: if (ivtmp_43 != 0)
```

Working:
```
Loop 3 iterates 2 times.
Loop 3 iterates at most 2 times.
Loop 3 likely iterates at most 2 times.
```
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
      Known to work|                            |12.1.0
   Target Milestone|14.0                        |13.3
      Known to fail|                            |13.1.0
            Summary|[14 Regression] wrong code  |[13/14 Regression] wrong
                   |at -O1 on x86_64-linux-gnu  |code at -O1 on
                   |(the generated code hangs)  |x86_64-linux-gnu (the
                   |                            |generated code hangs)

--- Comment #4 from Andrew Pinski ---
So with GCC 13, with `-fdump-tree-all-all`, we get the same wrong code as on
the trunk. This is why I mistakenly thought it was a target issue: I was
comparing the dumps between GCC 13 and the trunk with -all enabled, but it was
broken in GCC 13 too.

Anyway, I tested GCC 12.3.0 and it looks to be working there. It would be
useful to get another bisect done, this time with `-O1 -fdump-tree-all-all`.