Re: [PATCH] Fix PR77283

2017-01-13 Thread Richard Biener
On Thu, 12 Jan 2017, Jeff Law wrote:

> On 01/12/2017 07:55 AM, Richard Biener wrote:
> > 
> > The following fixes PR77283, path splitting being overly aggressive
> > and causing loop unrolling not to happen (because how it distorts the
> > CFG).
> > 
> > It is a aim at creating a cost model (there's none apart from
> > not duplicating too much stmts) by means of the observation that
> > we'd have to have PHI nodes in the joiner to have any possibility
> > of CSE opportunities being exposed by duplicating it or threading
> > opportunities being exposed across the new latch.  That includes
> > virtual PHIs for CSE (so any load/store) but not for the threading
> > (but IMHO threading should figure all this out on its own without
> > the requirement for somebody else duplicating the joiner).
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
> > libquantum regression is reportedly fixed by this.  I had to adjust
> > gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
> > I (and of course the cost model) can't see any reason to do it.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Richard.
> > 
> > 2017-01-12  Richard Biener  
> > 
> > PR tree-optimization/77283
> > * gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h
> > and ssa-iterators.h.
> > (is_feasible_trace): Implement a cost model based on joiner
> > PHI node uses.
> > 
> > * gcc.dg/tree-ssa/split-path-7.c: Adjust.
> > * gcc.dg/tree-ssa/split-path-8.c: New testcase.
> > * gcc.dg/tree-ssa/split-path-9.c: Likewise.
> So I think the only concern is split-path-7.  My memory is hazy, but I suspect
> split-path-7 shows the case where path splitting's CFG manipulations can
> result in fewer jumps for diamond sub-graphs.  You might see assembly code
> improvements due to path splitting on this test for the microblaze port.
> 
> Certainly the code in gimple-ssa-split-paths.c that you're adding is an
> improvement and brings gimple path splitting closer to its intended purpose.
> I don't think regressing split-path-7 should block this improvement, but we
> would want a PR to track the code quality regression.
> 
> So I think it's OK for the trunk and if it shows a code quality regression for
> split-path-7 on the microblaze port that we should have a distinct PR to track
> that issue (which is probably best solved in bb-reorder).

Committed.  With the wrong variant of the -9 test, fixed as below.

Richard.

2017-01-13  Richard Biener  

PR tree-optimization/77283
* gcc.dg/tree-ssa/split-path-9.c: Fix.

Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c(revision 244393)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c(working copy)
@@ -9,8 +9,8 @@ foo(unsigned int size, unsigned int *sta
 
   for(i = 0; i < size; i++)
 {
-  if(*state & 1)
-   *state ^= 1;
+  if(state[i] & 1)
+   state[i] ^= 1;
 }
 }
 


Re: [PATCH] Fix PR77283

2017-01-12 Thread Jeff Law

On 01/12/2017 07:55 AM, Richard Biener wrote:


The following fixes PR77283, path splitting being overly aggressive
and causing loop unrolling not to happen (because how it distorts the
CFG).

It is a aim at creating a cost model (there's none apart from
not duplicating too much stmts) by means of the observation that
we'd have to have PHI nodes in the joiner to have any possibility
of CSE opportunities being exposed by duplicating it or threading
opportunities being exposed across the new latch.  That includes
virtual PHIs for CSE (so any load/store) but not for the threading
(but IMHO threading should figure all this out on its own without
the requirement for somebody else duplicating the joiner).

Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
libquantum regression is reportedly fixed by this.  I had to adjust
gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
I (and of course the cost model) can't see any reason to do it.

Ok for trunk?

Thanks,
Richard.

2017-01-12  Richard Biener  

PR tree-optimization/77283
* gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h
and ssa-iterators.h.
(is_feasible_trace): Implement a cost model based on joiner
PHI node uses.

* gcc.dg/tree-ssa/split-path-7.c: Adjust.
* gcc.dg/tree-ssa/split-path-8.c: New testcase.
* gcc.dg/tree-ssa/split-path-9.c: Likewise.
So I think the only concern is split-path-7.  My memory is hazy, but I 
suspect split-path-7 shows the case where path splitting's CFG 
manipulations can result in fewer jumps for diamond sub-graphs.  You 
might see assembly code improvements due to path splitting on this test 
for the microblaze port.


Certainly the code in gimple-ssa-split-paths.c that you're adding is an 
improvement and brings gimple path splitting closer to its intended 
purpose.  I don't think regressing split-path-7 should block this 
improvement, but we would want a PR to track the code quality regression.


So I think it's OK for the trunk and if it shows a code quality 
regression for split-path-7 on the microblaze port that we should have a 
distinct PR to track that issue (which is probably best solved in 
bb-reorder).


Thanks,
Jeff


Re: [PATCH] Fix PR77283

2017-01-12 Thread Jeff Law

On 01/12/2017 07:55 AM, Richard Biener wrote:


The following fixes PR77283, path splitting being overly aggressive
and causing loop unrolling not to happen (because how it distorts the
CFG).

It is a aim at creating a cost model (there's none apart from
not duplicating too much stmts) by means of the observation that
we'd have to have PHI nodes in the joiner to have any possibility
of CSE opportunities being exposed by duplicating it or threading
opportunities being exposed across the new latch.  That includes
virtual PHIs for CSE (so any load/store) but not for the threading
(but IMHO threading should figure all this out on its own without
the requirement for somebody else duplicating the joiner).

Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
libquantum regression is reportedly fixed by this.  I had to adjust
gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
I (and of course the cost model) can't see any reason to do it.
I went back and reviewed the discussion from last year.  The conclusion 
for linit (split-path-7.c) was that there was a path we could split, but 
that there was no real benefit in doing so at the tree level.


The more general conclusion was that path splitting rarely exposes 
CSE/DCE opportunities, contrary to the original motivation (the adpcm 
encoder is the exception).  What path splitting does more often is 
remove an unconditional branch in diamond shaped sub-graphs.


In an ideal world, raw path splitting would move into the RTL pipeline 
since it's primary value is to eliminate jumps and we'd use some kind of 
PHI partitioning to handle cases where constant values from some paths 
of control allow simplification at use sites.  It's just path isolation.


I'll try to get to the rest of the review tomorrow tonight/tomorrow.

jeff



[PATCH] Fix PR77283

2017-01-12 Thread Richard Biener

The following fixes PR77283, path splitting being overly aggressive
and causing loop unrolling not to happen (because how it distorts the
CFG).

It is a aim at creating a cost model (there's none apart from
not duplicating too much stmts) by means of the observation that
we'd have to have PHI nodes in the joiner to have any possibility
of CSE opportunities being exposed by duplicating it or threading
opportunities being exposed across the new latch.  That includes
virtual PHIs for CSE (so any load/store) but not for the threading
(but IMHO threading should figure all this out on its own without
the requirement for somebody else duplicating the joiner).

Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x
libquantum regression is reportedly fixed by this.  I had to adjust
gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because
I (and of course the cost model) can't see any reason to do it.

Ok for trunk?

Thanks,
Richard.

2017-01-12  Richard Biener  

PR tree-optimization/77283
* gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h
and ssa-iterators.h.
(is_feasible_trace): Implement a cost model based on joiner
PHI node uses.

* gcc.dg/tree-ssa/split-path-7.c: Adjust.
* gcc.dg/tree-ssa/split-path-8.c: New testcase.
* gcc.dg/tree-ssa/split-path-9.c: Likewise.

Index: gcc/gimple-ssa-split-paths.c
===
--- gcc/gimple-ssa-split-paths.c(revision 244350)
+++ gcc/gimple-ssa-split-paths.c(working copy)
@@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.
 #include "tracer.h"
 #include "predict.h"
 #include "params.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -200,6 +203,58 @@ is_feasible_trace (basic_block bb)
}
 }
 
+  /* If the joiner has no PHIs with useful uses there is zero chance
+ of CSE/DCE/jump-threading possibilities exposed by duplicating it.  */
+  bool found_useful_phi = false;
+  for (gphi_iterator si = gsi_start_phis (bb); ! gsi_end_p (si);
+   gsi_next ())
+{
+  gphi *phi = si.phi ();
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, gimple_phi_result (phi))
+   {
+ gimple *stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+   continue;
+ /* If there's a use in the joiner this might be a CSE/DCE
+opportunity.  */
+ if (gimple_bb (stmt) == bb)
+   {
+ found_useful_phi = true;
+ break;
+   }
+ /* If the use is on a loop header PHI and on one path the
+value is unchanged this might expose a jump threading
+opportunity.  */
+ if (gimple_code (stmt) == GIMPLE_PHI
+ && gimple_bb (stmt) == bb->loop_father->header
+ /* But for memory the PHI alone isn't good enough.  */
+ && ! virtual_operand_p (gimple_phi_result (stmt)))
+   {
+ for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+   if (gimple_phi_arg_def (phi, i) == gimple_phi_result (stmt))
+ {
+   found_useful_phi = true;
+   break;
+ }
+ if (found_useful_phi)
+   break;
+   }
+   }
+  if (found_useful_phi)
+   break;
+}
+  if (! found_useful_phi)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Block %d is a join that does not expose CSE/DCE/jump-thread "
+"opportunities when duplicated.\n",
+bb->index);
+  return false;
+}
+
   /* We may want something here which looks at dataflow and tries
  to guess if duplication of BB is likely to result in simplification
  of instructions in BB in either the original or the duplicate.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c(revision 244350)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c(working copy)
@@ -91,4 +91,4 @@ linit ()
}
 }
 }
-/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" 
} } */
+/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" 
} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c(working copy)
@@ -0,0 +1,14 @@
+/* PR77283 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-split-paths-details" }