Re: [PATCH] Fix PR78343

2016-11-28 Thread Richard Biener
On Mon, 28 Nov 2016, Christophe Lyon wrote:

> Hi Richard,
> 
> 
> On 25 November 2016 at 11:20, Richard Biener  wrote:
> > On Thu, 24 Nov 2016, Richard Biener wrote:
> >
> >>
> >> I am testing the following patch for an optimization regression where
> >> a loop made dead by final value replacement was made used again by
> >> DOM 20 passes later.  The real issue here is that we do not get rid
> >> of dead loops until very late so this patch makes sure to do that.
> >> We could schedule it later (but better no later than unrolling
> >> as that might expose a pretty inefficient way of removing a dead loop).
> >>
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > As expected some testcases need adjustment, thus applied as follows.
> >
> > Richard.
> >
> > 2016-11-24  Richard Biener  
> >
> > PR tree-optimization/78343
> > * passes.def: Add CD-DCE pass after loop splitting.
> > * tree-ssa-dce.c (find_obviously_necessary_stmts): Move
> > SCEV init/finalize ...
> > (perform_tree_ssa_dce): ... here.  Deal with being
> > executed inside the loop pipeline in aggressive mode.
> >
> > * gcc.dg/tree-ssa/sccp-2.c: New testcase.
> > * gcc.dg/autopar/uns-outer-6.c: Adjust.
> > * gcc.dg/tree-ssa/20030808-1.c: Likewise.
> > * gcc.dg/tree-ssa/20040305-1.c: Likewise.
> > * gcc.dg/vect/pr38529.c: Likewise.
> >
> 
> But now, I am seeing failures on:
>   gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times cddce3 "->code" 0
>   gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times cddce3 "if " 0
>   gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times cddce3 "if " 2
> because the dump file does not exist.

Bah.  Fixed as follows.

2016-11-28  Richard Biener  

PR tree-optimization/78343
* gcc.dg/tree-ssa/20030808-1.c: Fix dump to generate.
* gcc.dg/tree-ssa/20040305-1.c: Likewise.
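Christophe's failures came from the dg-final scans naming cddce3 while dg-options still requested only -fdump-tree-cddce2, so the scanned dump file was never written. A check for that mismatch could look like the following minimal C sketch. dump_requested and check_testsuite_options are hypothetical helpers, not part of DejaGnu or the GCC testsuite. The sketch also assumes a dump is requested only by the literal -fdump-tree-PASS spelling, optionally followed by a suffix such as -details.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if OPTIONS, the text of a dg-options
   directive, contains "-fdump-tree-PASS" either on its own or followed by
   a "-suffix" such as "-details".  Other ways of enabling dumps (for
   example -fdump-tree-all) are deliberately not handled.  */
bool
dump_requested (const char *options, const char *pass)
{
  static const char prefix[] = "-fdump-tree-";
  const size_t prefix_len = sizeof prefix - 1;
  const size_t pass_len = strlen (pass);
  const char *p = options;

  while ((p = strstr (p, prefix)) != NULL)
    {
      const char *name = p + prefix_len;
      if (strncmp (name, pass, pass_len) == 0)
        {
          /* Require a delimiter so "cddce" does not match "cddce3".  */
          char next = name[pass_len];
          if (next == '\0' || next == ' ' || next == '-')
            return true;
        }
      p = name;  /* Keep searching after this option.  */
    }
  return false;
}

/* The dg-options lines of the two testcases before and after the fix,
   checked against the "cddce3" dump their dg-final scans expect.  */
void
check_testsuite_options (void)
{
  /* 20030808-1.c.  */
  assert (!dump_requested ("-O1 -fdump-tree-cddce2", "cddce3"));
  assert (dump_requested ("-O1 -fdump-tree-cddce3", "cddce3"));

  /* 20040305-1.c, which also requests a forwprop1 dump.  */
  assert (!dump_requested ("-O2 -fdump-tree-cddce2 "
                           "-fdump-tree-forwprop1-details", "cddce3"));
  assert (dump_requested ("-O2 -fdump-tree-cddce3 "
                          "-fdump-tree-forwprop1-details", "cddce3"));
}
```

The delimiter test matters because dump names for later pass instances only differ by a trailing digit, so a plain substring search would accept the wrong instance.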

Index: gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c  (revision 242908)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-cddce2" } */
+/* { dg-options "-O1 -fdump-tree-cddce3" } */
   
 extern void abort (void);
 
Index: gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c  (revision 242908)
+++ gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cddce2 -fdump-tree-forwprop1-details" } */
+/* { dg-options "-O2 -fdump-tree-cddce3 -fdump-tree-forwprop1-details" } */
   
 int abarney[2];
 int afred[1];

> (on aarch64 and arm targets)
> 
> Christophe
> 
> 

Re: [PATCH] Fix PR78343

2016-11-28 Thread Christophe Lyon
Hi Richard,


On 25 November 2016 at 11:20, Richard Biener  wrote:
> On Thu, 24 Nov 2016, Richard Biener wrote:
>
>>
>> I am testing the following patch for an optimization regression where
>> a loop made dead by final value replacement was made used again by
>> DOM 20 passes later.  The real issue here is that we do not get rid
>> of dead loops until very late so this patch makes sure to do that.
>> We could schedule it later (but better no later than unrolling
>> as that might expose a pretty inefficient way of removing a dead loop).
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> As expected some testcases need adjustment, thus applied as follows.
>
> Richard.
>
> 2016-11-24  Richard Biener  
>
> PR tree-optimization/78343
> * passes.def: Add CD-DCE pass after loop splitting.
> * tree-ssa-dce.c (find_obviously_necessary_stmts): Move
> SCEV init/finalize ...
> (perform_tree_ssa_dce): ... here.  Deal with being
> executed inside the loop pipeline in aggressive mode.
>
> * gcc.dg/tree-ssa/sccp-2.c: New testcase.
> * gcc.dg/autopar/uns-outer-6.c: Adjust.
> * gcc.dg/tree-ssa/20030808-1.c: Likewise.
> * gcc.dg/tree-ssa/20040305-1.c: Likewise.
> * gcc.dg/vect/pr38529.c: Likewise.
>

But now, I am seeing failures on:
  gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times cddce3 "->code" 0
  gcc.dg/tree-ssa/20030808-1.c scan-tree-dump-times cddce3 "if " 0
  gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times cddce3 "if " 2
because the dump file does not exist.

(on aarch64 and arm targets)

Christophe



Re: [PATCH] Fix PR78343

2016-11-25 Thread Richard Biener
On Thu, 24 Nov 2016, Richard Biener wrote:

> 
> I am testing the following patch for an optimization regression where
> a loop made dead by final value replacement was made used again by
> DOM 20 passes later.  The real issue here is that we do not get rid
> of dead loops until very late so this patch makes sure to do that.
> We could schedule it later (but better no later than unrolling
> as that might expose a pretty inefficient way of removing a dead loop).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

As expected some testcases need adjustment, thus applied as follows.

Richard.

2016-11-24  Richard Biener  

PR tree-optimization/78343
* passes.def: Add CD-DCE pass after loop splitting.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Move
SCEV init/finalize ...
(perform_tree_ssa_dce): ... here.  Deal with being
executed inside the loop pipeline in aggressive mode.

* gcc.dg/tree-ssa/sccp-2.c: New testcase.
* gcc.dg/autopar/uns-outer-6.c: Adjust.
* gcc.dg/tree-ssa/20030808-1.c: Likewise.
* gcc.dg/tree-ssa/20040305-1.c: Likewise.
* gcc.dg/vect/pr38529.c: Likewise.
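To make the "loop made dead by final value replacement" case concrete, here is a minimal source-level C sketch built from the new sccp-2.c testcase. test_loop is the loop as written in the testcase. test_closed_form is what the testcase comment says should remain, namely computing and returning quant * quant. The names test_closed_form and check_equivalence are illustrative only. This is not GCC code, and it does not show the GIMPLE the compiler actually works on.

```c
#include <assert.h>

/* Loop form, as in the new gcc.dg/tree-ssa/sccp-2.c testcase.  */
unsigned int
test_loop (unsigned int quant)
{
  unsigned int sum = 0;
  for (unsigned int i = 0; i < quant; ++i)
    sum += quant;
  return sum;
}

/* The exit value of SUM in closed form: QUANT added QUANT times.
   Unsigned arithmetic wraps identically in both versions, so they
   agree for every input.  Once the return value is computed this way,
   nothing uses the loop any more and it is dead code.  */
unsigned int
test_closed_form (unsigned int quant)
{
  return quant * quant;
}

/* Self-check: both forms compute the same value for QUANT.  */
void
check_equivalence (unsigned int quant)
{
  assert (test_loop (quant) == test_closed_form (quant));
}
```

The loop has no side effects besides producing sum. After final value replacement (pass_scev_cprop) rewrites the return value, the loop becomes the kind of empty loop that the extra pass_cd_dce after pass_loop_split removes, before a later pass such as DOM can make it used again.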

diff --git a/gcc/passes.def b/gcc/passes.def
index 2a470a7..2fa682b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -271,6 +271,9 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_split);
+ /* All unswitching, final value replacement and splitting can expose
+    empty loops.  Remove them now.  */
+ NEXT_PASS (pass_cd_dce);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
index dc2870b..5af60b0 100644
--- a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
@@ -25,7 +25,7 @@ parloop (int N)
   for (i = 0; i < N; i++)
     {
       for (j = 0; j < N; j++)
-        y[i]=x[i][j];
+        y[i] += x[i][j];
       sum += y[i];
     }
   g_sum = sum;
@@ -46,6 +46,10 @@ main (void)
 
 
 /* Check that outer loop is parallelized.  */
+/* This fails because we have
+ FAILED: data dependencies exist across iterations
+   
+
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops2" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops2" } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
index 7cc5404..cda86a7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
@@ -33,8 +33,8 @@ delete_dead_jumptables ()
 /* There should be no loads of ->code.  If any exist, then we failed to
optimize away all the IF statements and the statements feeding
their conditions.  */
-/* { dg-final { scan-tree-dump-times "->code" 0 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "->code" 0 "cddce3"} } */

 /* There should be no IF statements.  */
-/* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce3"} } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
index 501e28c..d1a9af8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
@@ -27,4 +27,4 @@ void foo(int edx, int eax)
 
 /* After cddce we should have two IF statements remaining as the other
two tests can be threaded.  */
-/* { dg-final { scan-tree-dump-times "if " 2 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "if " 2 "cddce3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c
new file mode 100644
index 000..099b281
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int
+test(unsigned int quant)
+{
+  unsigned int sum = 0;
+  for (unsigned int i = 0; i < quant; ++i)
+    sum += quant;
+  return sum;
+}
+
+/* A single basic-block should remain (computing and
+   returning quant * quant).  */
+/* { dg-final { scan-tree-dump-times "bb" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr38529.c b/gcc/testsuite/gcc.dg/vect/pr38529.c
index 171adeb..9b5919d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr38529.c
+++ b/gcc/testsuite/gcc.dg/vect/pr38529.c
@@ -11,7 +11,3 @@ void foo()
 for (j = 0; j < 17; ++j)
   a[i] = 0;
 }
-
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect"  } } */
-
-
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 7b9814e..50b5eef 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -400,7