Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa

2012-08-16 Thread Richard Guenther
On Wed, Aug 15, 2012 at 6:26 PM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 This patch rewrites the rewriting part of
 rewrite_into_loop_closed_ssa. This function took ~300s on the
 simplified test case for PR54146, walking around the many thousands of
 loops in the 400,000 basic blocks in the CFG, in
 compute_global_livein.

 For rewriting into LC-SSA, we can do better than that if we use the
 knowledge that we're actually computing livein for an SSA name DEF,
 therefore its uses must all be dominated by the basic block where DEF
 is assigned a value. But walking up the dominator tree doesn't work
 here because we want to traverse and mark the edges in the CFG where
 DEF is live, and we may not see those edges if we walk up the
 dominator tree from a USE to DEF. We do know, however, that when we
 walk up the CFG then:

 1. if we enter any loop nested in the same loop as the basic block
 containing DEF (let's call it DEF_LOOP) then we can stop walking
 because the only way in is through one of the edges we're interested
 in.

 2. if we enter a loop is not in the nesting stack of DEF_LOOP, then we
 can skin straight to the header of the loop and continue looking up
 from there.

 3. if we reach a basic block X as a predecessor of Y and Y dominates X
 then we don't have to look at X or any of its predecessors.

 Putting this all together, you end up with what looks like a poor
 man's form of structural analysis where the control tree only contains
 loop nodes, non-loop basic blocks, and irreducible regions. (Honza:
 maybe re-surrect
 http://gcc.gnu.org/ml/gcc-patches/2003-06/msg01669.html? Now that we
 can maintain the loop tree, perhaps it's also feasible to maintain the
 complete control tree and use it in places where we want to analyze
 only a sub-CFG?)

 The effect is that the time spent in rewrite_into_loop_closed_ssa
 drops to less than one second.

 Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk?

Ok!

It recently occured to me that the only reason we do not have virtual
operands in loop-closed-SSA form (and thus jump through hoops dealing
with them in cfgloopmainp and elsewhere) was that we'd have so many
of them.  Today we just have a single one, so maybe we can simplify
things here (the into-lc-SSA code just skips vops, not doing that should
provide lc-SSA form even for virtuals).

Many thanks for all these speed-ups!
Richard.


 Ciao!
 Steven


Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa

2012-08-16 Thread Michael Matz
Hi,

On Wed, 15 Aug 2012, Steven Bosscher wrote:

Please convince me that this :

+/* Return the outermost superloop LOOP of USE_LOOP that is a superloop of
+   both DEF_LOOP and USE_LOOP.  */
+static inline struct loop *
+find_sibling_superloop (struct loop *use_loop, struct loop *def_loop)
+{
+  unsigned ud = loop_depth (use_loop);
+  unsigned dd = loop_depth (def_loop);
+  gcc_assert (ud  0  dd  0);
+  if (ud  dd)
+use_loop = superloop_at_depth (use_loop, dd);
+  if (ud  dd)
+def_loop = superloop_at_depth (def_loop, ud);
+  while (loop_outer (use_loop) != loop_outer (def_loop))
+{
+  use_loop = loop_outer (use_loop);
+  def_loop = loop_outer (def_loop);
+  gcc_assert (use_loop  def_loop);
+}
+  return use_loop;

doesn't have the usual problem of advancing two iterators at the same time 
and expecting they'll eventually meet (which they generally don't have 
to).

Also the block comment indicates that could should just as well return the 
outermost loop of the function, because it's the outermost superloop LOOP 
of USE_LOOP that is a superloop of both DEF_LOOP and USE_LOOP.

The implementation seems to want to return the _innermost_ superloop that 
still covers both DEF_LOOP and USE_LOOP, and in that case my concern about 
them never meeting stands.


Ciao,
Michael.


Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa

2012-08-16 Thread Michael Matz
Hi,

On Thu, 16 Aug 2012, Steven Bosscher wrote:

 2. In find_sibling_superloop the first step is to align the loop depth:
... 
 3. Now that USE_LOOP and DEF_LOOP are at the same nesting depth,

Ah, that's the crucial point I missed, the while loops starts out with 
loops at the same depth; yes then the loop works.  Thanks for having a 
walk with me :)

 This must meet at some point, because the outermost loop of all loops 
 is current_loops-tree_root at depth 0, and we count down from the same 
 loop depth.

Btw, then the comment is still wrong.  You're returning the innermost 
common outer loop, not the outermost (which would be trivial).


Ciao,
Michael.


Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa

2012-08-16 Thread Steven Bosscher
On Thu, Aug 16, 2012 at 4:24 PM, Michael Matz m...@suse.de wrote:
 Btw, then the comment is still wrong.  You're returning the innermost
 common outer loop, not the outermost (which would be trivial).

I'll fix the comment.

Ciao!
Steven


[patch] PR54146 - rewrite rewrite_into_loop_closed_ssa

2012-08-15 Thread Steven Bosscher
Hello,

This patch rewrites the rewriting part of
rewrite_into_loop_closed_ssa. This function took ~300s on the
simplified test case for PR54146, walking around the many thousands of
loops in the 400,000 basic blocks in the CFG, in
compute_global_livein.

For rewriting into LC-SSA, we can do better than that if we use the
knowledge that we're actually computing livein for an SSA name DEF,
therefore its uses must all be dominated by the basic block where DEF
is assigned a value. But walking up the dominator tree doesn't work
here because we want to traverse and mark the edges in the CFG where
DEF is live, and we may not see those edges if we walk up the
dominator tree from a USE to DEF. We do know, however, that when we
walk up the CFG then:

1. if we enter any loop nested in the same loop as the basic block
containing DEF (let's call it DEF_LOOP) then we can stop walking
because the only way in is through one of the edges we're interested
in.

2. if we enter a loop is not in the nesting stack of DEF_LOOP, then we
can skin straight to the header of the loop and continue looking up
from there.

3. if we reach a basic block X as a predecessor of Y and Y dominates X
then we don't have to look at X or any of its predecessors.

Putting this all together, you end up with what looks like a poor
man's form of structural analysis where the control tree only contains
loop nodes, non-loop basic blocks, and irreducible regions. (Honza:
maybe re-surrect
http://gcc.gnu.org/ml/gcc-patches/2003-06/msg01669.html? Now that we
can maintain the loop tree, perhaps it's also feasible to maintain the
complete control tree and use it in places where we want to analyze
only a sub-CFG?)

The effect is that the time spent in rewrite_into_loop_closed_ssa
drops to less than one second.

Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven


PR54146_lcssa.diff
Description: Binary data