Re: [PATCH] Improve handling of threads which cross over the current loops header
On Tue, Nov 26, 2013 at 11:37 PM, Jeff Law l...@redhat.com wrote: On 11/26/13 02:26, Richard Biener wrote: But only necessary if this threading returned true, no? Correct. Fix for that spinning overnight. Also how likely did it scramble the loop? I see that thread_block_1 already cancels loops in some cases so I wonder what case it misses so that you need to cancel the loop unconditonally here. If you get into that code, effectively always. We know we're threading around the backedge through a multi-way branch at the top, then to some interior node within the loop. It would depend on the precise structure of the CFG within the loop, but basically if you get here, there is a good chance you have irreducible loops. For a good example of how badly things can get scrambled, look at what we do with 54742. Dump or draw CFGs before/after DOM1. It's painful to see what we do to the loop structure, but removing that switch is such a huge win that losing the loop structure has become a non-concern. Ick ;) Loop fixup isn't able to recover the original loop here, it discovers a new one, the one just covering looping through the default: case of the switch. The first pass that allows CFG manipulations during loop init is able to disambiguate the mess DOM left us with and discovers a loop nest of depth 4. Nice transform I'd say ;) Richard. Do you have a testcase that ICEs if I remove the cancelling above? I'd have to review the PRs to recall if they were ICEs and/or codegen failures. jeff
Re: [PATCH] Improve handling of threads which cross over the current loops header
On 11/27/13 04:28, Richard Biener wrote: Ick ;) Loop fixup isn't able to recover the original loop here, it discovers a new one, the one just covering looping through the default: case of the switch. I wouldn't expect it to. I don't have the transformed CFG in front of me right now, but as you probably saw, it's scrambled pretty badly. Sadly, we found out yesterday that the submitter of that BZ reduced coremark too far and the transformation doesn't actually apply to coremark. We're not going to try and extend things at this point in the game -- that'll have to be 4.10/5.0 material. jeff
Re: [PATCH] Improve handling of threads which cross over the current loops header
On Mon, Nov 25, 2013 at 7:25 PM, Jeff Law l...@redhat.com wrote: On 11/22/13 08:56, Richard Biener wrote: So the issue here is we can create irreducible regions new nested loops. Does just setting the header,latch fields for the current loop handle those cases? Yes. Fixed via the attached patch. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Also tested by removing the requirement that the loop header contain a multi-way branch, bootstrapping and testing that compiler as well. Installed on the trunk. Jeff * tree-ssa-threadupdate.c (thread_through_all_blocks): Selectively invalidate loop information. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index ee0c838..1a52e47 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1579,7 +1579,6 @@ thread_through_all_blocks (bool may_peel_loop_headers) bitmap_iterator bi; bitmap threaded_blocks; struct loop *loop; - bool totally_clobbered_loops = false; /* We must know about loops in order to preserve them. */ gcc_assert (current_loops != NULL); @@ -1675,9 +1674,15 @@ thread_through_all_blocks (bool may_peel_loop_headers) /* Our path is still valid, thread it. */ if (e-aux) { - totally_clobbered_loops - |= thread_block ((*path)[0]-e-dest, false); + struct loop *loop = (*path)[0]-e-dest-loop_father; + + retval |= thread_block ((*path)[0]-e-dest, false); e-aux = NULL; + + /* This jump thread likely totally scrambled this loop. + So arrange for it to be fixed up. */ + loop-header = NULL; + loop-latch = NULL; But only necessary if this threading returned true, no? Also how likely did it scramble the loop? I see that thread_block_1 already cancels loops in some cases so I wonder what case it misses so that you need to cancel the loop unconditonally here. Do you have a testcase that ICEs if I remove the cancelling above? Thanks, Richard. } } else @@ -1700,32 +1705,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) threaded_blocks = NULL; paths.release (); - /* If we made changes to the CFG that might have totally messed - up the loop structure, then drop the old loop structure and - rebuild. */ - if (totally_clobbered_loops) -{ - /* Release the current loop structures, they are totally -clobbered at this point. */ - loop_optimizer_finalize (); - current_loops = NULL; - - /* Similarly for dominance information. */ - free_dominance_info (CDI_DOMINATORS); - free_dominance_info (CDI_POST_DOMINATORS); - - /* Before we can rebuild the loop structures, we need dominators, -which requires no unreachable code. So remove unreachable code. */ - delete_unreachable_blocks (); - - /* Now rebuild the loop structures. */ - cfun-curr_properties = ~PROP_loops; - loop_optimizer_init (AVOID_CFG_MODIFICATIONS); - cfun-curr_properties |= PROP_loops; - retval = 1; -} - - if (retval current_loops) + if (retval) loops_state_set (LOOPS_NEED_FIXUP); return retval;
Re: [PATCH] Improve handling of threads which cross over the current loops header
On 11/26/13 02:26, Richard Biener wrote: But only necessary if this threading returned true, no? Correct. Fix for that spinning overnight. Also how likely did it scramble the loop? I see that thread_block_1 already cancels loops in some cases so I wonder what case it misses so that you need to cancel the loop unconditonally here. If you get into that code, effectively always. We know we're threading around the backedge through a multi-way branch at the top, then to some interior node within the loop. It would depend on the precise structure of the CFG within the loop, but basically if you get here, there is a good chance you have irreducible loops. For a good example of how badly things can get scrambled, look at what we do with 54742. Dump or draw CFGs before/after DOM1. It's painful to see what we do to the loop structure, but removing that switch is such a huge win that losing the loop structure has become a non-concern. Do you have a testcase that ICEs if I remove the cancelling above? I'd have to review the PRs to recall if they were ICEs and/or codegen failures. jeff
Re: [PATCH] Improve handling of threads which cross over the current loops header
On 11/22/13 08:56, Richard Biener wrote: So the issue here is we can create irreducible regions new nested loops. Does just setting the header,latch fields for the current loop handle those cases? Yes. Fixed via the attached patch. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Also tested by removing the requirement that the loop header contain a multi-way branch, bootstrapping and testing that compiler as well. Installed on the trunk. Jeff * tree-ssa-threadupdate.c (thread_through_all_blocks): Selectively invalidate loop information. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index ee0c838..1a52e47 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1579,7 +1579,6 @@ thread_through_all_blocks (bool may_peel_loop_headers) bitmap_iterator bi; bitmap threaded_blocks; struct loop *loop; - bool totally_clobbered_loops = false; /* We must know about loops in order to preserve them. */ gcc_assert (current_loops != NULL); @@ -1675,9 +1674,15 @@ thread_through_all_blocks (bool may_peel_loop_headers) /* Our path is still valid, thread it. */ if (e-aux) { - totally_clobbered_loops - |= thread_block ((*path)[0]-e-dest, false); + struct loop *loop = (*path)[0]-e-dest-loop_father; + + retval |= thread_block ((*path)[0]-e-dest, false); e-aux = NULL; + + /* This jump thread likely totally scrambled this loop. + So arrange for it to be fixed up. */ + loop-header = NULL; + loop-latch = NULL; } } else @@ -1700,32 +1705,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) threaded_blocks = NULL; paths.release (); - /* If we made changes to the CFG that might have totally messed - up the loop structure, then drop the old loop structure and - rebuild. */ - if (totally_clobbered_loops) -{ - /* Release the current loop structures, they are totally -clobbered at this point. */ - loop_optimizer_finalize (); - current_loops = NULL; - - /* Similarly for dominance information. */ - free_dominance_info (CDI_DOMINATORS); - free_dominance_info (CDI_POST_DOMINATORS); - - /* Before we can rebuild the loop structures, we need dominators, -which requires no unreachable code. So remove unreachable code. */ - delete_unreachable_blocks (); - - /* Now rebuild the loop structures. */ - cfun-curr_properties = ~PROP_loops; - loop_optimizer_init (AVOID_CFG_MODIFICATIONS); - cfun-curr_properties |= PROP_loops; - retval = 1; -} - - if (retval current_loops) + if (retval) loops_state_set (LOOPS_NEED_FIXUP); return retval;
Re: [PATCH] Improve handling of threads which cross over the current loops header
On Fri, Nov 22, 2013 at 8:48 AM, Jeff Law l...@redhat.com wrote: Right now jump threads from within a loop which cross the loop header, then terminate within the loop are ignored as this may create irreducible loops. This patch gets the CFG/SSA updating code in good enough shape to handle the case that the embedded guys care about. ie, the coremark FSA/FSM benchmark. Basically we still ignore most of those threads -- the exception is when we find the loop header has a multi-way branch. In that case we go ahead and thread the jump, throw away and rebuild the loop information. Again, we only do this when we're able to remove a multi-way jump at the top of a loop. During a GCC bootstrap I found two instances where this triggers, once in GCC itself, one in the java runtime libraries. I've manually verified that we do the right thing in those two instances as well as the FSA/FSM optimization. However, the reality is this code isn't being well exercised. So (for testing purposes only) I removed the test that the loop header has to be a multi-way jump and enabled a follow-on patch which allows us to identify many more threads through the loop header to points within the current loop. Not surprisingly the code triggered more often and uncovered a couple minor issues that I've addressed. The bootstrap/regression test for the patch ran fine, as did the regression and bootstrap test with various pass limitations removed (with the exception of tests which explicitly test that we don't thread in situations where it'll create irreducible loops). This is good. Installed on the trunk. One more patch to go before calling and end to my own development work for 4.9 :-) * tree-ssa-threadupdate.c: Include tree-cfg.h and tree-pass.h (thread_block_1): Do not cancel jump threads which go from inside a loop, through the header, then back inside the loop. (bb_ends_with_multiway_branch): New function. (thread_through_all_blocks): Handle threading cases which start in a loop through the loop header to a point in the loop. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 777fe41..38d4f69 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include cfgloop.h #include hash-table.h #include dbgcnt.h +#include tree-cfg.h +#include tree-pass.h /* Given a block B, update the CFG and SSA graph to reflect redirecting one or more in-edges to B to instead reach the destination of an @@ -815,29 +817,15 @@ thread_block_1 (basic_block bb, bool noloop_only, bool joiners) } /* Another case occurs when trying to thread through our -own loop header, possibly from inside the loop. - -If our loop header is buried in the path, then go ahead -and cancel the jump threading request here. This likely -will need updating for the FSA/FSM coremark case. - -Other cases (BB is the loop header) are handled elsewhere. */ +own loop header, possibly from inside the loop. We will +thread these later. */ unsigned int i; for (i = 1; i path-length (); i++) { if ((*path)[i]-e-src == bb-loop_father-header (!loop_exit_edge_p (bb-loop_father, e2) || (*path)[1]-type == EDGE_COPY_SRC_JOINER_BLOCK)) - { - /* If i != 1, then it's a buried header that will not -be handled elsehwere. */ - if (i != 1) - { - delete_jump_thread_path (path); - e-aux = NULL; - } - break; - } + break; } if (i != path-length ()) @@ -1554,6 +1542,20 @@ mark_threaded_blocks (bitmap threaded_blocks) } +/* Return TRUE if BB ends with a switch statement or a computed goto. + Otherwise return false. */ +static bool +bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED) +{ + gimple stmt = last_stmt (bb); + if (stmt gimple_code (stmt) == GIMPLE_SWITCH) +return true; + if (stmt gimple_code (stmt) == GIMPLE_GOTO + TREE_CODE (gimple_goto_dest (stmt)) == SSA_NAME) +return true; + return false; +} + /* Walk through all blocks and thread incoming edges to the appropriate outgoing edge for each edge pair recorded in THREADED_EDGES. @@ -1573,6 +1575,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) bitmap_iterator bi; bitmap threaded_blocks; struct loop *loop; + bool totally_clobbered_loops = false; /* We must know about loops in order to preserve them. */ gcc_assert (current_loops != NULL); @@ -1609,35 +1612,79 @@ thread_through_all_blocks (bool
Re: [PATCH] Improve handling of threads which cross over the current loops header
On 11/22/13 05:10, Richard Biener wrote: + if (totally_clobbered_loops) +{ + /* Release the current loop structures, they are totally +clobbered at this point. */ + loop_optimizer_finalize (); + current_loops = NULL; This is definitely a no-go and should be immediately reverted. If you wreck a particular loop simply mark it for removal by setting -header and -latch to NULL. Loop fixup will then re-discover it (or really remove it). With the code above you discard all loops in the function including meta-information on openmp loops, #pragma ivdeps info, etc. I realize it discards all that stuff, but that's still the safeest thing to do. If we can selectively discard, that seems like a follow-up item. The loop stuff is new and something I know little about. Thus I chose the safest route. So the issue here is we can create irreducible regions new nested loops. Does just setting the header,latch fields for the current loop handle those cases? jeff
Re: [PATCH] Improve handling of threads which cross over the current loops header
Jeff Law l...@redhat.com wrote: On 11/22/13 05:10, Richard Biener wrote: + if (totally_clobbered_loops) +{ + /* Release the current loop structures, they are totally +clobbered at this point. */ + loop_optimizer_finalize (); + current_loops = NULL; This is definitely a no-go and should be immediately reverted. If you wreck a particular loop simply mark it for removal by setting -header and -latch to NULL. Loop fixup will then re-discover it (or really remove it). With the code above you discard all loops in the function including meta-information on openmp loops, #pragma ivdeps info, etc. I realize it discards all that stuff, but that's still the safeest thing to do. If we can selectively discard, that seems like a follow-up item. The loop stuff is new and something I know little about. Thus I chose the safest route. Well. Safe here is arguably a bit optimistic ;-) So the issue here is we can create irreducible regions new nested loops. Does just setting the header,latch fields for the current loop handle those cases? Yes. Richard. jeff
Re: [PATCH] Improve handling of threads which cross over the current loops header
On 11/22/13 08:56, Richard Biener wrote: So the issue here is we can create irreducible regions new nested loops. Does just setting the header,latch fields for the current loop handle those cases? Yes. Good. I'll take care of it. Thanks, Jeff