Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa
On Wed, Aug 15, 2012 at 6:26 PM, Steven Bosscher stevenb@gmail.com wrote: Hello, This patch rewrites the rewriting part of rewrite_into_loop_closed_ssa. This function took ~300s on the simplified test case for PR54146, walking around the many thousands of loops in the 400,000 basic blocks in the CFG, in compute_global_livein. For rewriting into LC-SSA, we can do better than that if we use the knowledge that we're actually computing livein for an SSA name DEF, therefore its uses must all be dominated by the basic block where DEF is assigned a value. But walking up the dominator tree doesn't work here because we want to traverse and mark the edges in the CFG where DEF is live, and we may not see those edges if we walk up the dominator tree from a USE to DEF. We do know, however, that when we walk up the CFG then: 1. if we enter any loop nested in the same loop as the basic block containing DEF (let's call it DEF_LOOP) then we can stop walking because the only way in is through one of the edges we're interested in. 2. if we enter a loop is not in the nesting stack of DEF_LOOP, then we can skin straight to the header of the loop and continue looking up from there. 3. if we reach a basic block X as a predecessor of Y and Y dominates X then we don't have to look at X or any of its predecessors. Putting this all together, you end up with what looks like a poor man's form of structural analysis where the control tree only contains loop nodes, non-loop basic blocks, and irreducible regions. (Honza: maybe re-surrect http://gcc.gnu.org/ml/gcc-patches/2003-06/msg01669.html? Now that we can maintain the loop tree, perhaps it's also feasible to maintain the complete control tree and use it in places where we want to analyze only a sub-CFG?) The effect is that the time spent in rewrite_into_loop_closed_ssa drops to less than one second. Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk? Ok! It recently occured to me that the only reason we do not have virtual operands in loop-closed-SSA form (and thus jump through hoops dealing with them in cfgloopmainp and elsewhere) was that we'd have so many of them. Today we just have a single one, so maybe we can simplify things here (the into-lc-SSA code just skips vops, not doing that should provide lc-SSA form even for virtuals). Many thanks for all these speed-ups! Richard. Ciao! Steven
Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa
Hi, On Wed, 15 Aug 2012, Steven Bosscher wrote: Please convince me that this : +/* Return the outermost superloop LOOP of USE_LOOP that is a superloop of + both DEF_LOOP and USE_LOOP. */ +static inline struct loop * +find_sibling_superloop (struct loop *use_loop, struct loop *def_loop) +{ + unsigned ud = loop_depth (use_loop); + unsigned dd = loop_depth (def_loop); + gcc_assert (ud 0 dd 0); + if (ud dd) +use_loop = superloop_at_depth (use_loop, dd); + if (ud dd) +def_loop = superloop_at_depth (def_loop, ud); + while (loop_outer (use_loop) != loop_outer (def_loop)) +{ + use_loop = loop_outer (use_loop); + def_loop = loop_outer (def_loop); + gcc_assert (use_loop def_loop); +} + return use_loop; doesn't have the usual problem of advancing two iterators at the same time and expecting they'll eventually meet (which they generally don't have to). Also the block comment indicates that could should just as well return the outermost loop of the function, because it's the outermost superloop LOOP of USE_LOOP that is a superloop of both DEF_LOOP and USE_LOOP. The implementation seems to want to return the _innermost_ superloop that still covers both DEF_LOOP and USE_LOOP, and in that case my concern about them never meeting stands. Ciao, Michael.
Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa
Hi, On Thu, 16 Aug 2012, Steven Bosscher wrote: 2. In find_sibling_superloop the first step is to align the loop depth: ... 3. Now that USE_LOOP and DEF_LOOP are at the same nesting depth, Ah, that's the crucial point I missed, the while loops starts out with loops at the same depth; yes then the loop works. Thanks for having a walk with me :) This must meet at some point, because the outermost loop of all loops is current_loops-tree_root at depth 0, and we count down from the same loop depth. Btw, then the comment is still wrong. You're returning the innermost common outer loop, not the outermost (which would be trivial). Ciao, Michael.
Re: [patch] PR54146 - rewrite rewrite_into_loop_closed_ssa
On Thu, Aug 16, 2012 at 4:24 PM, Michael Matz m...@suse.de wrote: Btw, then the comment is still wrong. You're returning the innermost common outer loop, not the outermost (which would be trivial). I'll fix the comment. Ciao! Steven
[patch] PR54146 - rewrite rewrite_into_loop_closed_ssa
Hello, This patch rewrites the rewriting part of rewrite_into_loop_closed_ssa. This function took ~300s on the simplified test case for PR54146, walking around the many thousands of loops in the 400,000 basic blocks in the CFG, in compute_global_livein. For rewriting into LC-SSA, we can do better than that if we use the knowledge that we're actually computing livein for an SSA name DEF, therefore its uses must all be dominated by the basic block where DEF is assigned a value. But walking up the dominator tree doesn't work here because we want to traverse and mark the edges in the CFG where DEF is live, and we may not see those edges if we walk up the dominator tree from a USE to DEF. We do know, however, that when we walk up the CFG then: 1. if we enter any loop nested in the same loop as the basic block containing DEF (let's call it DEF_LOOP) then we can stop walking because the only way in is through one of the edges we're interested in. 2. if we enter a loop is not in the nesting stack of DEF_LOOP, then we can skin straight to the header of the loop and continue looking up from there. 3. if we reach a basic block X as a predecessor of Y and Y dominates X then we don't have to look at X or any of its predecessors. Putting this all together, you end up with what looks like a poor man's form of structural analysis where the control tree only contains loop nodes, non-loop basic blocks, and irreducible regions. (Honza: maybe re-surrect http://gcc.gnu.org/ml/gcc-patches/2003-06/msg01669.html? Now that we can maintain the loop tree, perhaps it's also feasible to maintain the complete control tree and use it in places where we want to analyze only a sub-CFG?) The effect is that the time spent in rewrite_into_loop_closed_ssa drops to less than one second. Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven PR54146_lcssa.diff Description: Binary data