Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-27 Thread Richard Biener
On Tue, Nov 26, 2013 at 11:37 PM, Jeff Law l...@redhat.com wrote:
 On 11/26/13 02:26, Richard Biener wrote:

 But only necessary if this threading returned true, no?

 Correct.  Fix for that spinning overnight.



 Also
 how likely did it scramble the loop?  I see that thread_block_1
 already cancels loops in some cases so I wonder what case
 it misses so that you need to cancel the loop unconditonally here.

 If you get into that code, effectively always.  We know we're threading
 around the backedge through a multi-way branch at the top, then to some
 interior node within the loop.  It would depend on the precise structure of
 the CFG within the loop, but basically if you get here, there is a good
 chance you have irreducible loops.

 For a good example of how badly things can get scrambled, look at what we do
 with 54742.  Dump or draw CFGs before/after DOM1.  It's painful to see what
 we do to the loop structure, but removing that switch is such a huge win
 that losing the loop structure has become a non-concern.

Ick ;)  Loop fixup isn't able to recover the original loop here, it
discovers a new one, the one just covering looping through the
default: case of the switch.

The first pass that allows CFG manipulations during loop init
is able to disambiguate the mess DOM left us with and discovers
a loop nest of depth 4.

Nice transform I'd say ;)

Richard.

 Do you have a testcase that ICEs if I remove the cancelling above?

 I'd have to review the PRs to recall if they were ICEs and/or codegen
 failures.

 jeff



Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-27 Thread Jeff Law

On 11/27/13 04:28, Richard Biener wrote:


Ick ;)  Loop fixup isn't able to recover the original loop here, it
discovers a new one, the one just covering looping through the
default: case of the switch.
I wouldn't expect it to.  I don't have the transformed CFG in front of 
me right now, but as you probably saw, it's scrambled pretty badly.


Sadly, we found out yesterday that the submitter of that BZ reduced 
coremark too far and the transformation doesn't actually apply to 
coremark.   We're not going to try and extend things at this point in 
the game -- that'll have to be 4.10/5.0 material.


jeff


Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-26 Thread Richard Biener
On Mon, Nov 25, 2013 at 7:25 PM, Jeff Law l...@redhat.com wrote:
 On 11/22/13 08:56, Richard Biener wrote:


 So the issue here is we can create irreducible regions  new nested
 loops.  Does just setting the header,latch fields for the current loop
 handle those cases?


 Yes.

 Fixed via the attached patch.

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Also tested
 by removing the requirement that the loop header contain a multi-way branch,
 bootstrapping and testing that compiler as well. Installed on the trunk.

 Jeff



 * tree-ssa-threadupdate.c (thread_through_all_blocks): Selectively
 invalidate loop information.



 diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
 index ee0c838..1a52e47 100644
 --- a/gcc/tree-ssa-threadupdate.c
 +++ b/gcc/tree-ssa-threadupdate.c
 @@ -1579,7 +1579,6 @@ thread_through_all_blocks (bool may_peel_loop_headers)
bitmap_iterator bi;
bitmap threaded_blocks;
struct loop *loop;
 -  bool totally_clobbered_loops = false;

/* We must know about loops in order to preserve them.  */
gcc_assert (current_loops != NULL);
 @@ -1675,9 +1674,15 @@ thread_through_all_blocks (bool
 may_peel_loop_headers)
 /* Our path is still valid, thread it.  */
 if (e-aux)
   {
 -   totally_clobbered_loops
 - |= thread_block ((*path)[0]-e-dest, false);
 +   struct loop *loop = (*path)[0]-e-dest-loop_father;
 +
 +   retval |= thread_block ((*path)[0]-e-dest, false);
 e-aux = NULL;
 +
 +   /* This jump thread likely totally scrambled this loop.
 +  So arrange for it to be fixed up.  */
 +   loop-header = NULL;
 +   loop-latch = NULL;

But only necessary if this threading returned true, no?  Also
how likely did it scramble the loop?  I see that thread_block_1
already cancels loops in some cases so I wonder what case
it misses so that you need to cancel the loop unconditonally here.

Do you have a testcase that ICEs if I remove the cancelling above?

Thanks,
Richard.

   }
   }
else
 @@ -1700,32 +1705,7 @@ thread_through_all_blocks (bool
 may_peel_loop_headers)
threaded_blocks = NULL;
paths.release ();

 -  /* If we made changes to the CFG that might have totally messed
 - up the loop structure, then drop the old loop structure and
 - rebuild.  */
 -  if (totally_clobbered_loops)
 -{
 -  /* Release the current loop structures, they are totally
 -clobbered at this point.  */
 -  loop_optimizer_finalize ();
 -  current_loops = NULL;
 -
 -  /* Similarly for dominance information.  */
 -  free_dominance_info (CDI_DOMINATORS);
 -  free_dominance_info (CDI_POST_DOMINATORS);
 -
 -  /* Before we can rebuild the loop structures, we need dominators,
 -which requires no unreachable code.  So remove unreachable code.
 */
 -  delete_unreachable_blocks ();
 -
 -  /* Now rebuild the loop structures.  */
 -  cfun-curr_properties = ~PROP_loops;
 -  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 -  cfun-curr_properties |= PROP_loops;
 -  retval = 1;
 -}
 -
 -  if (retval  current_loops)
 +  if (retval)
  loops_state_set (LOOPS_NEED_FIXUP);

return retval;



Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-26 Thread Jeff Law

On 11/26/13 02:26, Richard Biener wrote:


But only necessary if this threading returned true, no?

Correct.  Fix for that spinning overnight.



Also
how likely did it scramble the loop?  I see that thread_block_1
already cancels loops in some cases so I wonder what case
it misses so that you need to cancel the loop unconditonally here.
If you get into that code, effectively always.  We know we're threading 
around the backedge through a multi-way branch at the top, then to some 
interior node within the loop.  It would depend on the precise structure 
of the CFG within the loop, but basically if you get here, there is a 
good chance you have irreducible loops.


For a good example of how badly things can get scrambled, look at what 
we do with 54742.  Dump or draw CFGs before/after DOM1.  It's painful to 
see what we do to the loop structure, but removing that switch is such a 
huge win that losing the loop structure has become a non-concern.






Do you have a testcase that ICEs if I remove the cancelling above?
I'd have to review the PRs to recall if they were ICEs and/or codegen 
failures.


jeff



Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-25 Thread Jeff Law

On 11/22/13 08:56, Richard Biener wrote:



So the issue here is we can create irreducible regions  new nested
loops.  Does just setting the header,latch fields for the current loop
handle those cases?


Yes.

Fixed via the attached patch.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Also 
tested by removing the requirement that the loop header contain a 
multi-way branch, bootstrapping and testing that compiler as well. 
Installed on the trunk.


Jeff


* tree-ssa-threadupdate.c (thread_through_all_blocks): Selectively
invalidate loop information.



diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index ee0c838..1a52e47 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1579,7 +1579,6 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   bitmap_iterator bi;
   bitmap threaded_blocks;
   struct loop *loop;
-  bool totally_clobbered_loops = false;
 
   /* We must know about loops in order to preserve them.  */
   gcc_assert (current_loops != NULL);
@@ -1675,9 +1674,15 @@ thread_through_all_blocks (bool may_peel_loop_headers)
/* Our path is still valid, thread it.  */
if (e-aux)
  {
-   totally_clobbered_loops
- |= thread_block ((*path)[0]-e-dest, false);
+   struct loop *loop = (*path)[0]-e-dest-loop_father;
+
+   retval |= thread_block ((*path)[0]-e-dest, false);
e-aux = NULL;
+
+   /* This jump thread likely totally scrambled this loop.
+  So arrange for it to be fixed up.  */
+   loop-header = NULL;
+   loop-latch = NULL;
  }
  }
   else
@@ -1700,32 +1705,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   threaded_blocks = NULL;
   paths.release ();
 
-  /* If we made changes to the CFG that might have totally messed
- up the loop structure, then drop the old loop structure and
- rebuild.  */
-  if (totally_clobbered_loops)
-{
-  /* Release the current loop structures, they are totally
-clobbered at this point.  */
-  loop_optimizer_finalize ();
-  current_loops = NULL;
-
-  /* Similarly for dominance information.  */
-  free_dominance_info (CDI_DOMINATORS);
-  free_dominance_info (CDI_POST_DOMINATORS);
-
-  /* Before we can rebuild the loop structures, we need dominators,
-which requires no unreachable code.  So remove unreachable code.  */
-  delete_unreachable_blocks ();
-
-  /* Now rebuild the loop structures.  */
-  cfun-curr_properties = ~PROP_loops;
-  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
-  cfun-curr_properties |= PROP_loops;
-  retval = 1;
-}
-
-  if (retval  current_loops)
+  if (retval)
 loops_state_set (LOOPS_NEED_FIXUP);
 
   return retval;


Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-22 Thread Richard Biener
On Fri, Nov 22, 2013 at 8:48 AM, Jeff Law l...@redhat.com wrote:

 Right now jump threads from within a loop which cross the loop header, then
 terminate within the loop are ignored as this may create irreducible loops.

 This patch gets the CFG/SSA updating code in good enough shape to handle the
 case that the embedded guys care about.  ie, the coremark FSA/FSM benchmark.

 Basically we still ignore most of those threads -- the exception is when we
 find the loop header has a multi-way branch.  In that case we go ahead and
 thread the jump, throw away and rebuild the loop information.  Again, we
 only do this when we're able to remove a multi-way jump at the top of a
 loop.

 During a GCC bootstrap I found two instances where this triggers, once in
 GCC itself, one in the java runtime libraries.  I've manually verified that
 we do the right thing in those two instances as well as the FSA/FSM
 optimization.  However, the reality is this code isn't being well exercised.

 So (for testing purposes only) I removed the test that the loop header has
 to be a multi-way jump and enabled a follow-on patch which allows us to
 identify many more threads through the loop header to points within the
 current loop.  Not surprisingly the code triggered more often and uncovered
 a couple minor issues that I've addressed.

 The bootstrap/regression test for the patch ran fine, as did the regression
 and bootstrap test with various pass limitations removed (with the exception
 of tests which explicitly test that we don't thread in situations where
 it'll create irreducible loops).  This is good.

 Installed on the trunk.  One more patch to go before calling and end to my
 own development work for 4.9 :-)






 * tree-ssa-threadupdate.c: Include tree-cfg.h and tree-pass.h
 (thread_block_1): Do not cancel jump threads which go from
 inside a loop, through the header, then back inside the loop.
 (bb_ends_with_multiway_branch): New function.
 (thread_through_all_blocks): Handle threading cases which start
 in a loop through the loop header to a point in the loop.



 diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
 index 777fe41..38d4f69 100644
 --- a/gcc/tree-ssa-threadupdate.c
 +++ b/gcc/tree-ssa-threadupdate.c
 @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
  #include cfgloop.h
  #include hash-table.h
  #include dbgcnt.h
 +#include tree-cfg.h
 +#include tree-pass.h

  /* Given a block B, update the CFG and SSA graph to reflect redirecting
 one or more in-edges to B to instead reach the destination of an
 @@ -815,29 +817,15 @@ thread_block_1 (basic_block bb, bool noloop_only, bool
 joiners)
 }

   /* Another case occurs when trying to thread through our
 -own loop header, possibly from inside the loop.
 -
 -If our loop header is buried in the path, then go ahead
 -and cancel the jump threading request here.  This likely
 -will need updating for the FSA/FSM coremark case.
 -
 -Other cases (BB is the loop header) are handled elsewhere.  */
 +own loop header, possibly from inside the loop.  We will
 +thread these later.  */
   unsigned int i;
   for (i = 1; i  path-length (); i++)
 {
   if ((*path)[i]-e-src == bb-loop_father-header
(!loop_exit_edge_p (bb-loop_father, e2)
   || (*path)[1]-type == EDGE_COPY_SRC_JOINER_BLOCK))
 -   {
 - /* If i != 1, then it's a buried header that will not
 -be handled elsehwere.  */
 - if (i != 1)
 -   {
 - delete_jump_thread_path (path);
 - e-aux = NULL;
 -   }
 - break;
 -   }
 +   break;
 }

   if (i != path-length ())
 @@ -1554,6 +1542,20 @@ mark_threaded_blocks (bitmap threaded_blocks)
  }


 +/* Return TRUE if BB ends with a switch statement or a computed goto.
 +   Otherwise return false.  */
 +static bool
 +bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED)
 +{
 +  gimple stmt = last_stmt (bb);
 +  if (stmt  gimple_code (stmt) == GIMPLE_SWITCH)
 +return true;
 +  if (stmt  gimple_code (stmt) == GIMPLE_GOTO
 +   TREE_CODE (gimple_goto_dest (stmt)) == SSA_NAME)
 +return true;
 +  return false;
 +}
 +
  /* Walk through all blocks and thread incoming edges to the appropriate
 outgoing edge for each edge pair recorded in THREADED_EDGES.

 @@ -1573,6 +1575,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
bitmap_iterator bi;
bitmap threaded_blocks;
struct loop *loop;
 +  bool totally_clobbered_loops = false;

/* We must know about loops in order to preserve them.  */
gcc_assert (current_loops != NULL);
 @@ -1609,35 +1612,79 @@ thread_through_all_blocks (bool
 

Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-22 Thread Jeff Law

On 11/22/13 05:10, Richard Biener wrote:


+  if (totally_clobbered_loops)
+{
+  /* Release the current loop structures, they are totally
+clobbered at this point.  */
+  loop_optimizer_finalize ();
+  current_loops = NULL;


This is definitely a no-go and should be immediately reverted.  If you
wreck a particular loop simply mark it for removal by setting
-header and -latch to NULL.  Loop fixup will then re-discover
it (or really remove it).

With the code above you discard all loops in the function including
meta-information on openmp loops, #pragma ivdeps info, etc.
I realize it discards all that stuff, but that's still the safeest thing 
to do.


If we can selectively discard, that seems like a follow-up item.  The 
loop stuff is new and something I know little about.  Thus I chose the 
safest route.


So the issue here is we can create irreducible regions  new nested 
loops.  Does just setting the header,latch fields for the current loop 
handle those cases?


jeff


Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-22 Thread Richard Biener
Jeff Law l...@redhat.com wrote:
On 11/22/13 05:10, Richard Biener wrote:

 +  if (totally_clobbered_loops)
 +{
 +  /* Release the current loop structures, they are totally
 +clobbered at this point.  */
 +  loop_optimizer_finalize ();
 +  current_loops = NULL;

 This is definitely a no-go and should be immediately reverted.  If
you
 wreck a particular loop simply mark it for removal by setting
 -header and -latch to NULL.  Loop fixup will then re-discover
 it (or really remove it).

 With the code above you discard all loops in the function including
 meta-information on openmp loops, #pragma ivdeps info, etc.
I realize it discards all that stuff, but that's still the safeest
thing 
to do.

If we can selectively discard, that seems like a follow-up item.  The 
loop stuff is new and something I know little about.  Thus I chose the 
safest route.

Well.  Safe here is arguably a bit optimistic ;-)

So the issue here is we can create irreducible regions  new nested 
loops.  Does just setting the header,latch fields for the current loop 
handle those cases?

Yes.

Richard.

jeff




Re: [PATCH] Improve handling of threads which cross over the current loops header

2013-11-22 Thread Jeff Law

On 11/22/13 08:56, Richard Biener wrote:



So the issue here is we can create irreducible regions  new nested
loops.  Does just setting the header,latch fields for the current loop
handle those cases?


Yes.

Good.  I'll take care of it.

Thanks,
Jeff