Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Richard Biener
On Tue, 30 Jan 2018, Jakub Jelinek wrote:

> On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote:
> > The issue is that
> > 
> > static bool
> > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
> > bitmap father_bbs, struct loop *loop)
> > {
> >   struct loop *loop_father;
> >   bool changed = false;
> >   struct loop *inner;
> >   enum unroll_level ul;
> > 
> >   /* Process inner loops first.  */
> >   for (inner = loop->inner; inner != NULL; inner = inner->next)
> > changed |= tree_unroll_loops_completely_1 (may_increase_size,
> >unroll_outer, father_bbs,
> >inner);
> > 
> > when recursing tree_unroll_loops_completely_1 appends unrolled
> > bodies and loops contained therein in the loop->inner chain.  We
> > specifically do _not_ allow processing of outer loops here due
> > to not up-to-date SSA form and then iterate the unrolling process
> > instead:
> > 
> >   /* If we changed an inner loop we cannot process outer loops in this
> >  iteration because SSA form is not up-to-date.  Continue with
> >  siblings of outer loops instead.  */
> >   if (changed)
> > return true;
> > 
> > so I think the proper fix is to avoid visiting loops that were added
> > in the above iteration over loop->inner.  For example by collecting
> > the loop->inner chain in a vec and then iterating over that (much
> > like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
> > loops will be >= number_of_loops () [number_of_loops () is really
> > mis-named, would be a good time to add max_loop_num () or so]
> 
> So like this if it passes bootstrap/regtest?

Yes.

> OT, why is loop->num signed rather than unsigned?

History.  Like so many others (block->index, etc.).  Possibly to
allow better optimization of loops over those indexes (relying
on undefined overflow).

Richard.

> 2018-01-30  Richard Biener  
>   Jakub Jelinek  
> 
>   PR tree-optimization/84111
>   * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip
>   inner loops added during recursion, as they don't have up-to-date
>   SSA form.
> 
>   * gcc.c-torture/compile/pr84111.c: New test.
> 
> --- gcc/tree-ssa-loop-ivcanon.c.jj2018-01-29 14:05:46.689153783 +0100
> +++ gcc/tree-ssa-loop-ivcanon.c   2018-01-30 09:30:21.932069737 +0100
> @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may
>bool changed = false;
>struct loop *inner;
>enum unroll_level ul;
> +  unsigned num = number_of_loops (cfun);
>  
> -  /* Process inner loops first.  */
> +  /* Process inner loops first.  Don't walk loops added by the recursive
> + calls because SSA form is not up-to-date.  They can be handled in the
> + next iteration.  */
>for (inner = loop->inner; inner != NULL; inner = inner->next)
> -changed |= tree_unroll_loops_completely_1 (may_increase_size,
> -unroll_outer, father_bbs,
> -inner);
> - 
> +if ((unsigned) inner->num < num)
> +  changed |= tree_unroll_loops_completely_1 (may_increase_size,
> +  unroll_outer, father_bbs,
> +  inner);
> +
>/* If we changed an inner loop we cannot process outer loops in this
>   iteration because SSA form is not up-to-date.  Continue with
>   siblings of outer loops instead.  */
> --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj  2018-01-29 
> 20:38:10.236528914 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 
> 20:37:53.227522763 +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/84111 */
> +
> +void
> +foo (int x, int y, int z)
> +{
> +  int a = 0;
> +  int *b = 
> +
> +  while (a < 1)
> +{
> +  int c = y;
> +  *b = x;
> + lab:
> +  for (a = 0; a < 36; ++a)
> +{
> +  *b = 0;
> +  if (x != 0)
> +y = 0;
> +  while (c < 1)
> + ;
> +}
> +}
> +  if (z < 33)
> +{
> +  b = (int *) 0;
> +  ++y;
> +  ++z;
> +  if (x / *b != 0)
> +goto lab;
> +}
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote:
> The issue is that
> 
> static bool
> tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
> bitmap father_bbs, struct loop *loop)
> {
>   struct loop *loop_father;
>   bool changed = false;
>   struct loop *inner;
>   enum unroll_level ul;
> 
>   /* Process inner loops first.  */
>   for (inner = loop->inner; inner != NULL; inner = inner->next)
> changed |= tree_unroll_loops_completely_1 (may_increase_size,
>unroll_outer, father_bbs,
>inner);
> 
> when recursing tree_unroll_loops_completely_1 appends unrolled
> bodies and loops contained therein in the loop->inner chain.  We
> specifically do _not_ allow processing of outer loops here due
> to not up-to-date SSA form and then iterate the unrolling process
> instead:
> 
>   /* If we changed an inner loop we cannot process outer loops in this
>  iteration because SSA form is not up-to-date.  Continue with
>  siblings of outer loops instead.  */
>   if (changed)
> return true;
> 
> so I think the proper fix is to avoid visiting loops that were added
> in the above iteration over loop->inner.  For example by collecting
> the loop->inner chain in a vec and then iterating over that (much
> like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
> loops will be >= number_of_loops () [number_of_loops () is really
> mis-named, would be a good time to add max_loop_num () or so]

So like this if it passes bootstrap/regtest?

OT, why is loop->num signed rather than unsigned?

2018-01-30  Richard Biener  
Jakub Jelinek  

PR tree-optimization/84111
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip
inner loops added during recursion, as they don't have up-to-date
SSA form.

* gcc.c-torture/compile/pr84111.c: New test.

--- gcc/tree-ssa-loop-ivcanon.c.jj  2018-01-29 14:05:46.689153783 +0100
+++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100
@@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may
   bool changed = false;
   struct loop *inner;
   enum unroll_level ul;
+  unsigned num = number_of_loops (cfun);
 
-  /* Process inner loops first.  */
+  /* Process inner loops first.  Don't walk loops added by the recursive
+ calls because SSA form is not up-to-date.  They can be handled in the
+ next iteration.  */
   for (inner = loop->inner; inner != NULL; inner = inner->next)
-changed |= tree_unroll_loops_completely_1 (may_increase_size,
-  unroll_outer, father_bbs,
-  inner);
- 
+if ((unsigned) inner->num < num)
+  changed |= tree_unroll_loops_completely_1 (may_increase_size,
+unroll_outer, father_bbs,
+inner);
+
   /* If we changed an inner loop we cannot process outer loops in this
  iteration because SSA form is not up-to-date.  Continue with
  siblings of outer loops instead.  */
--- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj2018-01-29 
20:38:10.236528914 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr84111.c   2018-01-29 
20:37:53.227522763 +0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/84111 */
+
+void
+foo (int x, int y, int z)
+{
+  int a = 0;
+  int *b = 
+
+  while (a < 1)
+{
+  int c = y;
+  *b = x;
+ lab:
+  for (a = 0; a < 36; ++a)
+{
+  *b = 0;
+  if (x != 0)
+y = 0;
+  while (c < 1)
+   ;
+}
+}
+  if (z < 33)
+{
+  b = (int *) 0;
+  ++y;
+  ++z;
+  if (x / *b != 0)
+goto lab;
+}
+}


Jakub


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Jakub Jelinek
On Mon, Jan 29, 2018 at 11:58:45PM -0700, Jeff Law wrote:
> On 01/29/2018 04:37 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for
> > renaming and then invokes some SCEV using functions before finally updating
> > the SSA form.  On the testcase we end up with 3 degenerate PHIs pointing at
> > each other, so follow_copies_to_constant loops forever.
> I'm at a loss to understand how we could get in that situation.  Was it
> edge removal (if so for which PHI) or the duplicate process that created
> the loop?

The original loop before unrolling had following PHIs for the y parameter:
   [local count: 118111601]:
  # y_37 = PHI 

   [local count: 3277506]:
  # y_29 = PHI 

   [local count: 3498303]:
  # y_8 = PHI 
and actually no other PHIs, so effectively all uses of y in the function
could be y_20(D), but we haven't cleaned it up yet; before pre we had:
   [local count: 118111601]:
  # y_29 = PHI <0(4), y_37(3)>
which prevented optimizing everything into y_20(D).
In the cunroll emergency dump I've done when it was looping I got:
;;   basic block 15, loop depth 0
;;pred:   2
  # y_25 = PHI 

;;   basic block 17, loop depth 1
;;pred:   16
;;19
  # y_10 = PHI 

;;   basic block 20, loop depth 0
;;pred:   18
  # y_5 = PHI 

;;   basic block 3, loop depth 2
;;pred:   11
;;14
  # y_37 = PHI 

;;   basic block 13, loop depth 1
;;pred:   5
  # y_29 = PHI 

;;   basic block 6, loop depth 1
;;pred:   20
;;13
  # y_8 = PHI 

where bb 3, 13 and 6 are the same PHIs as before which aren't going to be
renamed, and bb20 is the new cunroll copy of bb13, bb17 copy of bb3.
The PHI results of the PHIs in the new bbs already have a new SSA_NAME, but
the PHI arguments are waiting to be renamed.
Already before the cunroll, 2 of the PHIs are degenerate and only one is
not.  When SCEV is called, the only non-degenerate among those 3 previously
looks like degenerate too, but we'll be replacing y_29(20) in there with
y_5(20) during ssa update.

Now, with the patch at the end of cunroll after ssa update and cfg cleanup
we have:
   [local count: 18182121]:
  # y_10 = PHI 

   [local count: 2063999]:
  # y_5 = PHI 
PHIs for y and nothing else and finally phicprop2 optimizes all y references
to just y_20(D).

> > The following patch has 2 different fixes for this, one is to avoid looking
> > at SSA_NAMEs registered for update (in theory all we care are the old ssa
> > names, but we don't have an API to query just those), the other is to put
> > some upper bound on how many times we follow SSA_NAMEs (either single defs
> > or degenerate phis).  Either of those changes is sufficient to fix this.
> Interestingly enough I was looking at a bug several weeks ago where the
> ability to query one of the sets (I can't remember if it was old or new)
> of names for rewriting would have been useful.

tree-cfg.c has a comment like that:
tree-cfg.c-  /* Technically only new names matter.  */
tree-cfg.c:  if (name_registered_for_update_p (PHI_RESULT (phi)))
tree-cfg.c- return false;

Jakub


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Richard Biener
On Mon, 29 Jan 2018, Jeff Law wrote:

> On 01/29/2018 04:37 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for
> > renaming and then invokes some SCEV using functions before finally updating
> > the SSA form.  On the testcase we end up with 3 degenerate PHIs pointing at
> > each other, so follow_copies_to_constant loops forever.
> I'm at a loss to understand how we could get in that situation.  Was it
> edge removal (if so for which PHI) or the duplicate process that created
> the loop?

The issue is that

static bool
tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
bitmap father_bbs, struct loop *loop)
{
  struct loop *loop_father;
  bool changed = false;
  struct loop *inner;
  enum unroll_level ul;

  /* Process inner loops first.  */
  for (inner = loop->inner; inner != NULL; inner = inner->next)
changed |= tree_unroll_loops_completely_1 (may_increase_size,
   unroll_outer, father_bbs,
   inner);

when recursing tree_unroll_loops_completely_1 appends unrolled
bodies and loops contained therein in the loop->inner chain.  We
specifically do _not_ allow processing of outer loops here due
to not up-to-date SSA form and then iterate the unrolling process
instead:

  /* If we changed an inner loop we cannot process outer loops in this
 iteration because SSA form is not up-to-date.  Continue with
 siblings of outer loops instead.  */
  if (changed)
return true;

so I think the proper fix is to avoid visiting loops that were added
in the above iteration over loop->inner.  For example by collecting
the loop->inner chain in a vec and then iterating over that (much
like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
loops will be >= number_of_loops () [number_of_loops () is really
mis-named, would be a good time to add max_loop_num () or so]

Richard.

> We've seen one case where this happened inside DOM, but it was only
> because we ignored an unexecutable edge which then caused a PHI to
> become a degenerate.One or more of the PHI args in the degenerate
> was marked as EDGE_DFS_BACK on its associated edge.
> 
> Are any of the PHI args in pr84111 marked in a similar manner?
> 
> > 
> > The following patch has 2 different fixes for this, one is to avoid looking
> > at SSA_NAMEs registered for update (in theory all we care are the old ssa
> > names, but we don't have an API to query just those), the other is to put
> > some upper bound on how many times we follow SSA_NAMEs (either single defs
> > or degenerate phis).  Either of those changes is sufficient to fix this.
> Interestingly enough I was looking at a bug several weeks ago where the
> ability to query one of the sets (I can't remember if it was old or new)
> of names for rewriting would have been useful.
> 
> 
> > 
> > During x86_64-linux and i686-linux bootstraps/regtests 
> > follow_copies_to_constant
> > actually never returned anything useful, so the patch doesn't regress at
> > least on these targets anything, there are quite a few cases where SSA_NAME
> > is name_registered_for_update_p but the function would never return anything
> > but var in those cases.  And the highest depth ever seen was 3, except for
> > the newly added testcase if the && !name_registered_for_update_p (res) part
> > is commented out.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > 2018-01-29  Jakub Jelinek  
> > 
> > PR tree-optimization/84111
> > * tree-scalar-evolution.c: Include tree-into-ssa.h.
> > (follow_copies_to_constant): Stop on SSA_NAMEs registered for update,
> > loop at most 128 times.
> > 
> > * gcc.c-torture/compile/pr84111.c: New test.
> Probably reasonable.  Though I would like to better understand how we
> got the degenerates forming a loop before final ack.
> 
> jeff
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-29 Thread Jeff Law
On 01/29/2018 04:37 PM, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for
> renaming and then invokes some SCEV using functions before finally updating
> the SSA form.  On the testcase we end up with 3 degenerate PHIs pointing at
> each other, so follow_copies_to_constant loops forever.
I'm at a loss to understand how we could get in that situation.  Was it
edge removal (if so for which PHI) or the duplicate process that created
the loop?

We've seen one case where this happened inside DOM, but it was only
because we ignored an unexecutable edge which then caused a PHI to
become a degenerate.One or more of the PHI args in the degenerate
was marked as EDGE_DFS_BACK on its associated edge.

Are any of the PHI args in pr84111 marked in a similar manner?

> 
> The following patch has 2 different fixes for this, one is to avoid looking
> at SSA_NAMEs registered for update (in theory all we care are the old ssa
> names, but we don't have an API to query just those), the other is to put
> some upper bound on how many times we follow SSA_NAMEs (either single defs
> or degenerate phis).  Either of those changes is sufficient to fix this.
Interestingly enough I was looking at a bug several weeks ago where the
ability to query one of the sets (I can't remember if it was old or new)
of names for rewriting would have been useful.


> 
> During x86_64-linux and i686-linux bootstraps/regtests 
> follow_copies_to_constant
> actually never returned anything useful, so the patch doesn't regress at
> least on these targets anything, there are quite a few cases where SSA_NAME
> is name_registered_for_update_p but the function would never return anything
> but var in those cases.  And the highest depth ever seen was 3, except for
> the newly added testcase if the && !name_registered_for_update_p (res) part
> is commented out.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2018-01-29  Jakub Jelinek  
> 
>   PR tree-optimization/84111
>   * tree-scalar-evolution.c: Include tree-into-ssa.h.
>   (follow_copies_to_constant): Stop on SSA_NAMEs registered for update,
>   loop at most 128 times.
> 
>   * gcc.c-torture/compile/pr84111.c: New test.
Probably reasonable.  Though I would like to better understand how we
got the degenerates forming a loop before final ack.

jeff