Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Tue, 30 Jan 2018, Jakub Jelinek wrote: > On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote: > > The issue is that > > > > static bool > > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, > > bitmap father_bbs, struct loop *loop) > > { > > struct loop *loop_father; > > bool changed = false; > > struct loop *inner; > > enum unroll_level ul; > > > > /* Process inner loops first. */ > > for (inner = loop->inner; inner != NULL; inner = inner->next) > > changed |= tree_unroll_loops_completely_1 (may_increase_size, > >unroll_outer, father_bbs, > >inner); > > > > when recursing tree_unroll_loops_completely_1 appends unrolled > > bodies and loops contained therein in the loop->inner chain. We > > specifically do _not_ allow processing of outer loops here due > > to not up-to-date SSA form and then iterate the unrolling process > > instead: > > > > /* If we changed an inner loop we cannot process outer loops in this > > iteration because SSA form is not up-to-date. Continue with > > siblings of outer loops instead. */ > > if (changed) > > return true; > > > > so I think the proper fix is to avoid visiting loops that were added > > in the above iteration over loop->inner. For example by collecting > > the loop->inner chain in a vec and then iterating over that (much > > like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new > > loops will be >= number_of_loops () [number_of_loops () is really > > mis-named, would be a good time to add max_loop_num () or so] > > So like this if it passes bootstrap/regtest? Yes. > OT, why is loop->num signed rather than unsigned? History. Like so many others (block->index, etc.). Possibly to allow better optimization of loops over those indexes (relying on undefined overflow). Richard. > 2018-01-30 Richard Biener> Jakub Jelinek > > PR tree-optimization/84111 > * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip > inner loops added during recursion, as they don't have up-to-date > SSA form. > > * gcc.c-torture/compile/pr84111.c: New test. > > --- gcc/tree-ssa-loop-ivcanon.c.jj2018-01-29 14:05:46.689153783 +0100 > +++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100 > @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may >bool changed = false; >struct loop *inner; >enum unroll_level ul; > + unsigned num = number_of_loops (cfun); > > - /* Process inner loops first. */ > + /* Process inner loops first. Don't walk loops added by the recursive > + calls because SSA form is not up-to-date. They can be handled in the > + next iteration. */ >for (inner = loop->inner; inner != NULL; inner = inner->next) > -changed |= tree_unroll_loops_completely_1 (may_increase_size, > -unroll_outer, father_bbs, > -inner); > - > +if ((unsigned) inner->num < num) > + changed |= tree_unroll_loops_completely_1 (may_increase_size, > + unroll_outer, father_bbs, > + inner); > + >/* If we changed an inner loop we cannot process outer loops in this > iteration because SSA form is not up-to-date. Continue with > siblings of outer loops instead. */ > --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj 2018-01-29 > 20:38:10.236528914 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 > 20:37:53.227522763 +0100 > @@ -0,0 +1,31 @@ > +/* PR tree-optimization/84111 */ > + > +void > +foo (int x, int y, int z) > +{ > + int a = 0; > + int *b = > + > + while (a < 1) > +{ > + int c = y; > + *b = x; > + lab: > + for (a = 0; a < 36; ++a) > +{ > + *b = 0; > + if (x != 0) > +y = 0; > + while (c < 1) > + ; > +} > +} > + if (z < 33) > +{ > + b = (int *) 0; > + ++y; > + ++z; > + if (x / *b != 0) > +goto lab; > +} > +} > > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote: > The issue is that > > static bool > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, > bitmap father_bbs, struct loop *loop) > { > struct loop *loop_father; > bool changed = false; > struct loop *inner; > enum unroll_level ul; > > /* Process inner loops first. */ > for (inner = loop->inner; inner != NULL; inner = inner->next) > changed |= tree_unroll_loops_completely_1 (may_increase_size, >unroll_outer, father_bbs, >inner); > > when recursing tree_unroll_loops_completely_1 appends unrolled > bodies and loops contained therein in the loop->inner chain. We > specifically do _not_ allow processing of outer loops here due > to not up-to-date SSA form and then iterate the unrolling process > instead: > > /* If we changed an inner loop we cannot process outer loops in this > iteration because SSA form is not up-to-date. Continue with > siblings of outer loops instead. */ > if (changed) > return true; > > so I think the proper fix is to avoid visiting loops that were added > in the above iteration over loop->inner. For example by collecting > the loop->inner chain in a vec and then iterating over that (much > like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new > loops will be >= number_of_loops () [number_of_loops () is really > mis-named, would be a good time to add max_loop_num () or so] So like this if it passes bootstrap/regtest? OT, why is loop->num signed rather than unsigned? 2018-01-30 Richard BienerJakub Jelinek PR tree-optimization/84111 * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip inner loops added during recursion, as they don't have up-to-date SSA form. * gcc.c-torture/compile/pr84111.c: New test. --- gcc/tree-ssa-loop-ivcanon.c.jj 2018-01-29 14:05:46.689153783 +0100 +++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100 @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may bool changed = false; struct loop *inner; enum unroll_level ul; + unsigned num = number_of_loops (cfun); - /* Process inner loops first. */ + /* Process inner loops first. Don't walk loops added by the recursive + calls because SSA form is not up-to-date. They can be handled in the + next iteration. */ for (inner = loop->inner; inner != NULL; inner = inner->next) -changed |= tree_unroll_loops_completely_1 (may_increase_size, - unroll_outer, father_bbs, - inner); - +if ((unsigned) inner->num < num) + changed |= tree_unroll_loops_completely_1 (may_increase_size, +unroll_outer, father_bbs, +inner); + /* If we changed an inner loop we cannot process outer loops in this iteration because SSA form is not up-to-date. Continue with siblings of outer loops instead. */ --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj2018-01-29 20:38:10.236528914 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 20:37:53.227522763 +0100 @@ -0,0 +1,31 @@ +/* PR tree-optimization/84111 */ + +void +foo (int x, int y, int z) +{ + int a = 0; + int *b = + + while (a < 1) +{ + int c = y; + *b = x; + lab: + for (a = 0; a < 36; ++a) +{ + *b = 0; + if (x != 0) +y = 0; + while (c < 1) + ; +} +} + if (z < 33) +{ + b = (int *) 0; + ++y; + ++z; + if (x / *b != 0) +goto lab; +} +} Jakub
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Mon, Jan 29, 2018 at 11:58:45PM -0700, Jeff Law wrote: > On 01/29/2018 04:37 PM, Jakub Jelinek wrote: > > Hi! > > > > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for > > renaming and then invokes some SCEV using functions before finally updating > > the SSA form. On the testcase we end up with 3 degenerate PHIs pointing at > > each other, so follow_copies_to_constant loops forever. > I'm at a loss to understand how we could get in that situation. Was it > edge removal (if so for which PHI) or the duplicate process that created > the loop? The original loop before unrolling had following PHIs for the y parameter: [local count: 118111601]: # y_37 = PHI[local count: 3277506]: # y_29 = PHI [local count: 3498303]: # y_8 = PHI and actually no other PHIs, so effectively all uses of y in the function could be y_20(D), but we haven't cleaned it up yet; before pre we had: [local count: 118111601]: # y_29 = PHI <0(4), y_37(3)> which prevented optimizing everything into y_20(D). In the cunroll emergency dump I've done when it was looping I got: ;; basic block 15, loop depth 0 ;;pred: 2 # y_25 = PHI ;; basic block 17, loop depth 1 ;;pred: 16 ;;19 # y_10 = PHI ;; basic block 20, loop depth 0 ;;pred: 18 # y_5 = PHI ;; basic block 3, loop depth 2 ;;pred: 11 ;;14 # y_37 = PHI ;; basic block 13, loop depth 1 ;;pred: 5 # y_29 = PHI ;; basic block 6, loop depth 1 ;;pred: 20 ;;13 # y_8 = PHI where bb 3, 13 and 6 are the same PHIs as before which aren't going to be renamed, and bb20 is the new cunroll copy of bb13, bb17 copy of bb3. The PHI results of the PHIs in the new bbs already have a new SSA_NAME, but the PHI arguments are waiting to be renamed. Already before the cunroll, 2 of the PHIs are degenerate and only one is not. When SCEV is called, the only non-degenerate among those 3 previously looks like degenerate too, but we'll be replacing y_29(20) in there with y_5(20) during ssa update. Now, with the patch at the end of cunroll after ssa update and cfg cleanup we have: [local count: 18182121]: # y_10 = PHI [local count: 2063999]: # y_5 = PHI PHIs for y and nothing else and finally phicprop2 optimizes all y references to just y_20(D). > > The following patch has 2 different fixes for this, one is to avoid looking > > at SSA_NAMEs registered for update (in theory all we care are the old ssa > > names, but we don't have an API to query just those), the other is to put > > some upper bound on how many times we follow SSA_NAMEs (either single defs > > or degenerate phis). Either of those changes is sufficient to fix this. > Interestingly enough I was looking at a bug several weeks ago where the > ability to query one of the sets (I can't remember if it was old or new) > of names for rewriting would have been useful. tree-cfg.c has a comment like that: tree-cfg.c- /* Technically only new names matter. */ tree-cfg.c: if (name_registered_for_update_p (PHI_RESULT (phi))) tree-cfg.c- return false; Jakub
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Mon, 29 Jan 2018, Jeff Law wrote: > On 01/29/2018 04:37 PM, Jakub Jelinek wrote: > > Hi! > > > > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for > > renaming and then invokes some SCEV using functions before finally updating > > the SSA form. On the testcase we end up with 3 degenerate PHIs pointing at > > each other, so follow_copies_to_constant loops forever. > I'm at a loss to understand how we could get in that situation. Was it > edge removal (if so for which PHI) or the duplicate process that created > the loop? The issue is that static bool tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, bitmap father_bbs, struct loop *loop) { struct loop *loop_father; bool changed = false; struct loop *inner; enum unroll_level ul; /* Process inner loops first. */ for (inner = loop->inner; inner != NULL; inner = inner->next) changed |= tree_unroll_loops_completely_1 (may_increase_size, unroll_outer, father_bbs, inner); when recursing tree_unroll_loops_completely_1 appends unrolled bodies and loops contained therein in the loop->inner chain. We specifically do _not_ allow processing of outer loops here due to not up-to-date SSA form and then iterate the unrolling process instead: /* If we changed an inner loop we cannot process outer loops in this iteration because SSA form is not up-to-date. Continue with siblings of outer loops instead. */ if (changed) return true; so I think the proper fix is to avoid visiting loops that were added in the above iteration over loop->inner. For example by collecting the loop->inner chain in a vec and then iterating over that (much like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new loops will be >= number_of_loops () [number_of_loops () is really mis-named, would be a good time to add max_loop_num () or so] Richard. > We've seen one case where this happened inside DOM, but it was only > because we ignored an unexecutable edge which then caused a PHI to > become a degenerate.One or more of the PHI args in the degenerate > was marked as EDGE_DFS_BACK on its associated edge. > > Are any of the PHI args in pr84111 marked in a similar manner? > > > > > The following patch has 2 different fixes for this, one is to avoid looking > > at SSA_NAMEs registered for update (in theory all we care are the old ssa > > names, but we don't have an API to query just those), the other is to put > > some upper bound on how many times we follow SSA_NAMEs (either single defs > > or degenerate phis). Either of those changes is sufficient to fix this. > Interestingly enough I was looking at a bug several weeks ago where the > ability to query one of the sets (I can't remember if it was old or new) > of names for rewriting would have been useful. > > > > > > During x86_64-linux and i686-linux bootstraps/regtests > > follow_copies_to_constant > > actually never returned anything useful, so the patch doesn't regress at > > least on these targets anything, there are quite a few cases where SSA_NAME > > is name_registered_for_update_p but the function would never return anything > > but var in those cases. And the highest depth ever seen was 3, except for > > the newly added testcase if the && !name_registered_for_update_p (res) part > > is commented out. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > 2018-01-29 Jakub Jelinek> > > > PR tree-optimization/84111 > > * tree-scalar-evolution.c: Include tree-into-ssa.h. > > (follow_copies_to_constant): Stop on SSA_NAMEs registered for update, > > loop at most 128 times. > > > > * gcc.c-torture/compile/pr84111.c: New test. > Probably reasonable. Though I would like to better understand how we > got the degenerates forming a loop before final ack. > > jeff > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On 01/29/2018 04:37 PM, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for > renaming and then invokes some SCEV using functions before finally updating > the SSA form. On the testcase we end up with 3 degenerate PHIs pointing at > each other, so follow_copies_to_constant loops forever. I'm at a loss to understand how we could get in that situation. Was it edge removal (if so for which PHI) or the duplicate process that created the loop? We've seen one case where this happened inside DOM, but it was only because we ignored an unexecutable edge which then caused a PHI to become a degenerate.One or more of the PHI args in the degenerate was marked as EDGE_DFS_BACK on its associated edge. Are any of the PHI args in pr84111 marked in a similar manner? > > The following patch has 2 different fixes for this, one is to avoid looking > at SSA_NAMEs registered for update (in theory all we care are the old ssa > names, but we don't have an API to query just those), the other is to put > some upper bound on how many times we follow SSA_NAMEs (either single defs > or degenerate phis). Either of those changes is sufficient to fix this. Interestingly enough I was looking at a bug several weeks ago where the ability to query one of the sets (I can't remember if it was old or new) of names for rewriting would have been useful. > > During x86_64-linux and i686-linux bootstraps/regtests > follow_copies_to_constant > actually never returned anything useful, so the patch doesn't regress at > least on these targets anything, there are quite a few cases where SSA_NAME > is name_registered_for_update_p but the function would never return anything > but var in those cases. And the highest depth ever seen was 3, except for > the newly added testcase if the && !name_registered_for_update_p (res) part > is commented out. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2018-01-29 Jakub Jelinek> > PR tree-optimization/84111 > * tree-scalar-evolution.c: Include tree-into-ssa.h. > (follow_copies_to_constant): Stop on SSA_NAMEs registered for update, > loop at most 128 times. > > * gcc.c-torture/compile/pr84111.c: New test. Probably reasonable. Though I would like to better understand how we got the degenerates forming a loop before final ack. jeff