Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:26 PM, Bin.Cheng  wrote:
> On Tue, Jun 20, 2017 at 12:36 PM, Richard Biener
>  wrote:
>> On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
 On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
  wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> For now, loop distribution handles variables used outside of loop as 
>> reduction.
>> This is inaccurate because all partitions contain statement defining 
>> induction
>> vars.
>
> But final induction values are usually not used outside of the loop...
 This is in actuality for induction variable which is used outside of the 
 loop.
>
> What is missing is loop distribution trying to change partition order.  
> In fact
> we somehow assume we can move a reduction across a detected builtin
> (I don't remember if we ever check for validity of that...).
 Hmm, I am not sure when we can't.  If there is any dependence between
 builtin/reduction partitions, it should be captured by RDG or PG,
 otherwise the partitions are independent and can be freely ordered as
 long as reduction partition is scheduled last?
>
>> Ideally we should factor out scev-propagation as a standalone interface
>> which can be called when necessary.  Before that, this patch simply 
>> workarounds
>> reduction issue by checking if the statement belongs to all partitions.  
>> If yes,
>> the reduction must be computed in the last partition no matter how the 
>> loop is
>> distributed.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> stmt_in_all_partitions is not kept up-to-date during partition merging 
> and if
> merging makes the reduction partition(s) pass the stmt_in_all_partitions
> test your simple workaround doesn't work ...
 I think it doesn't matter because:
   A) it's really workaround for induction variables.  In general,
 induction variables are included by all partition.
   B) After classify partition, we immediately fuses all reduction
 partitions.  More stmt_in_all_partitions means we are fusing
 non-reduction partition with reduction partition, so the newly
 generated (stmt_in_all_partitions) are actually not reduction
 statements.  The workaround won't work anyway even the bitmap is
 maintained.
>
> As written it's a valid optimization but can you please note it's 
> limitation in
> some comment please?
 Yeah, I will add comment explaining it.
>>> Comment added in new version patch.  It also computes bitmap outside
>>> now, is it OK?
>>
>> Ok.  Can you add a testcase for this as well please?  I think the
>> series up to this
>> is now fully reviewed, I defered 1/n (the new IFN) to the last one
>> containing the
>> runtime versioning.  Can you re-post that (you can merge with the IFN patch)
>> to apply after the series has been applied up to this?
> Test case added.

Ok.

> Thanks,
> bin
> 2017-06-20  Bin Cheng  
>
> * tree-loop-distribution.c (classify_partition): New parameter and
> better handle reduction statement.
> (rdg_build_partitions): Revise comment.
> (distribute_loop): Compute statements in all partitions and pass it
> to classify_partition.
>
> gcc/testsuite/ChangeLog
> 2017-06-20  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-26.c: New test.


Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-23 Thread Bin.Cheng
On Tue, Jun 20, 2017 at 12:36 PM, Richard Biener
 wrote:
> On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> For now, loop distribution handles variables used outside of loop as 
> reduction.
> This is inaccurate because all partitions contain statement defining 
> induction
> vars.

 But final induction values are usually not used outside of the loop...
>>> This is in actuality for induction variable which is used outside of the 
>>> loop.

 What is missing is loop distribution trying to change partition order.  In 
 fact
 we somehow assume we can move a reduction across a detected builtin
 (I don't remember if we ever check for validity of that...).
>>> Hmm, I am not sure when we can't.  If there is any dependence between
>>> builtin/reduction partitions, it should be captured by RDG or PG,
>>> otherwise the partitions are independent and can be freely ordered as
>>> long as reduction partition is scheduled last?

> Ideally we should factor out scev-propagation as a standalone interface
> which can be called when necessary.  Before that, this patch simply 
> workarounds
> reduction issue by checking if the statement belongs to all partitions.  
> If yes,
> the reduction must be computed in the last partition no matter how the 
> loop is
> distributed.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 stmt_in_all_partitions is not kept up-to-date during partition merging and 
 if
 merging makes the reduction partition(s) pass the stmt_in_all_partitions
 test your simple workaround doesn't work ...
>>> I think it doesn't matter because:
>>>   A) it's really workaround for induction variables.  In general,
>>> induction variables are included by all partition.
>>>   B) After classify partition, we immediately fuses all reduction
>>> partitions.  More stmt_in_all_partitions means we are fusing
>>> non-reduction partition with reduction partition, so the newly
>>> generated (stmt_in_all_partitions) are actually not reduction
>>> statements.  The workaround won't work anyway even the bitmap is
>>> maintained.

 As written it's a valid optimization but can you please note it's 
 limitation in
 some comment please?
>>> Yeah, I will add comment explaining it.
>> Comment added in new version patch.  It also computes bitmap outside
>> now, is it OK?
>
> Ok.  Can you add a testcase for this as well please?  I think the
> series up to this
> is now fully reviewed, I defered 1/n (the new IFN) to the last one
> containing the
> runtime versioning.  Can you re-post that (you can merge with the IFN patch)
> to apply after the series has been applied up to this?
Test case added.

Thanks,
bin
2017-06-20  Bin Cheng  

* tree-loop-distribution.c (classify_partition): New parameter and
better handle reduction statement.
(rdg_build_partitions): Revise comment.
(distribute_loop): Compute statements in all partitions and pass it
to classify_partition.

gcc/testsuite/ChangeLog
2017-06-20  Bin Cheng  

* gcc.dg/tree-ssa/ldist-26.c: New test.
From b16a4839f3211737dccc3ff92ab2c4f325907cd3 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 22 Jun 2017 17:16:58 +0100
Subject: [PATCH 11/13] reduction-workaround-20170607.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c | 36 ++
 gcc/tree-loop-distribution.c | 43 
 2 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
new file mode 100644
index 000..3a69884
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -ftree-loop-distribution -fdump-tree-ldist-details" } */
+
+extern void abort (void);
+
+int a[130], b[128], c[128];
+
+int __attribute__((noinline,noclone))
+foo (int len, int x)
+{
+  int i;
+  for (i = 1; i <= len; ++i)
+{
+  a[i] = a[i + 2] + 1;
+  b[i] = 0;
+  a[i + 1] = a[i] - 3;
+  if (i < x)
+	c[i] = a[i];
+}
+  return i;
+}
+
+int main()
+{
+  int i;
+  for (i = 0; i < 130; ++i)
+a[i] = i;
+  foo (127, 67);
+  if (a[0] != 0 || a[1] != 4 || a[127] != 130)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 2 loops and 0 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "distributed: split to 1 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump 

Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 For now, loop distribution handles variables used outside of loop as 
 reduction.
 This is inaccurate because all partitions contain statement defining 
 induction
 vars.
>>>
>>> But final induction values are usually not used outside of the loop...
>> This is in actuality for induction variable which is used outside of the 
>> loop.
>>>
>>> What is missing is loop distribution trying to change partition order.  In 
>>> fact
>>> we somehow assume we can move a reduction across a detected builtin
>>> (I don't remember if we ever check for validity of that...).
>> Hmm, I am not sure when we can't.  If there is any dependence between
>> builtin/reduction partitions, it should be captured by RDG or PG,
>> otherwise the partitions are independent and can be freely ordered as
>> long as reduction partition is scheduled last?
>>>
 Ideally we should factor out scev-propagation as a standalone interface
 which can be called when necessary.  Before that, this patch simply 
 workarounds
 reduction issue by checking if the statement belongs to all partitions.  
 If yes,
 the reduction must be computed in the last partition no matter how the 
 loop is
 distributed.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> stmt_in_all_partitions is not kept up-to-date during partition merging and 
>>> if
>>> merging makes the reduction partition(s) pass the stmt_in_all_partitions
>>> test your simple workaround doesn't work ...
>> I think it doesn't matter because:
>>   A) it's really workaround for induction variables.  In general,
>> induction variables are included by all partition.
>>   B) After classify partition, we immediately fuses all reduction
>> partitions.  More stmt_in_all_partitions means we are fusing
>> non-reduction partition with reduction partition, so the newly
>> generated (stmt_in_all_partitions) are actually not reduction
>> statements.  The workaround won't work anyway even the bitmap is
>> maintained.
>>>
>>> As written it's a valid optimization but can you please note it's 
>>> limitation in
>>> some comment please?
>> Yeah, I will add comment explaining it.
> Comment added in new version patch.  It also computes bitmap outside
> now, is it OK?

Ok.  Can you add a testcase for this as well please?  I think the
series up to this
is now fully reviewed, I defered 1/n (the new IFN) to the last one
containing the
runtime versioning.  Can you re-post that (you can merge with the IFN patch)
to apply after the series has been applied up to this?

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (classify_partition): New parameter and
> better handle reduction statement.
> (rdg_build_partitions): Revise comment.
> (distribute_loop): Compute statements in all partitions and pass it
> to classify_partition.


Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-20 Thread Bin.Cheng
On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> For now, loop distribution handles variables used outside of loop as 
>>> reduction.
>>> This is inaccurate because all partitions contain statement defining 
>>> induction
>>> vars.
>>
>> But final induction values are usually not used outside of the loop...
> This is in actuality for induction variable which is used outside of the loop.
>>
>> What is missing is loop distribution trying to change partition order.  In 
>> fact
>> we somehow assume we can move a reduction across a detected builtin
>> (I don't remember if we ever check for validity of that...).
> Hmm, I am not sure when we can't.  If there is any dependence between
> builtin/reduction partitions, it should be captured by RDG or PG,
> otherwise the partitions are independent and can be freely ordered as
> long as reduction partition is scheduled last?
>>
>>> Ideally we should factor out scev-propagation as a standalone interface
>>> which can be called when necessary.  Before that, this patch simply 
>>> workarounds
>>> reduction issue by checking if the statement belongs to all partitions.  If 
>>> yes,
>>> the reduction must be computed in the last partition no matter how the loop 
>>> is
>>> distributed.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> stmt_in_all_partitions is not kept up-to-date during partition merging and if
>> merging makes the reduction partition(s) pass the stmt_in_all_partitions
>> test your simple workaround doesn't work ...
> I think it doesn't matter because:
>   A) it's really workaround for induction variables.  In general,
> induction variables are included by all partition.
>   B) After classify partition, we immediately fuses all reduction
> partitions.  More stmt_in_all_partitions means we are fusing
> non-reduction partition with reduction partition, so the newly
> generated (stmt_in_all_partitions) are actually not reduction
> statements.  The workaround won't work anyway even the bitmap is
> maintained.
>>
>> As written it's a valid optimization but can you please note it's limitation 
>> in
>> some comment please?
> Yeah, I will add comment explaining it.
Comment added in new version patch.  It also computes bitmap outside
now, is it OK?

Thanks,
bin
2017-06-07  Bin Cheng  

* tree-loop-distribution.c (classify_partition): New parameter and
better handle reduction statement.
(rdg_build_partitions): Revise comment.
(distribute_loop): Compute statements in all partitions and pass it
to classify_partition.
From b3502d713309da08d93cd53e5fe8fbfdccf3557b Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 13:21:07 +0100
Subject: [PATCH 11/13] reduction-workaround-20170607.txt

---
 gcc/tree-loop-distribution.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index d741e9e..1b4e238 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1226,17 +1226,18 @@ build_rdg_partition_for_vertex (struct graph *rdg, int v)
 }
 
 /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
-   For the moment we detect only the memset zero pattern.  */
+   For the moment we detect memset, memcpy and memmove patterns.  Bitmap
+   STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.  */
 
 static void
-classify_partition (loop_p loop, struct graph *rdg, partition *partition)
+classify_partition (loop_p loop, struct graph *rdg, partition *partition,
+		bitmap stmt_in_all_partitions)
 {
   bitmap_iterator bi;
   unsigned i;
   tree nb_iter;
   data_reference_p single_load, single_store;
-  bool volatiles_p = false;
-  bool plus_one = false;
+  bool volatiles_p = false, plus_one = false, has_reduction = false;
 
   partition->kind = PKIND_NORMAL;
   partition->main_dr = NULL;
@@ -1251,16 +1252,31 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition)
   if (gimple_has_volatile_ops (stmt))
 	volatiles_p = true;
 
-  /* If the stmt has uses outside of the loop mark it as reduction.  */
+  /* If the stmt is not included by all partitions and there is uses
+	 outside of the loop, then mark the partition as reduction.  */
   if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
 	{
-	  partition->reduction_p = true;
-	  return;
+	  /* Due to limitation in the transform phase we have to fuse all
+	 reduction partitions.  As a result, this could cancel valid
+	 loop distribution especially for loop that induction variable
+	 is used outside of loop.  To workaround this issue, we skip
+	 marking partition as reudction if the reduction stmt 

Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-16 Thread Bin.Cheng
On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
 wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> For now, loop distribution handles variables used outside of loop as 
>> reduction.
>> This is inaccurate because all partitions contain statement defining 
>> induction
>> vars.
>
> But final induction values are usually not used outside of the loop...
This is in actuality for induction variable which is used outside of the loop.
>
> What is missing is loop distribution trying to change partition order.  In 
> fact
> we somehow assume we can move a reduction across a detected builtin
> (I don't remember if we ever check for validity of that...).
Hmm, I am not sure when we can't.  If there is any dependence between
builtin/reduction partitions, it should be captured by RDG or PG,
otherwise the partitions are independent and can be freely ordered as
long as reduction partition is scheduled last?
>
>> Ideally we should factor out scev-propagation as a standalone interface
>> which can be called when necessary.  Before that, this patch simply 
>> workarounds
>> reduction issue by checking if the statement belongs to all partitions.  If 
>> yes,
>> the reduction must be computed in the last partition no matter how the loop 
>> is
>> distributed.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> stmt_in_all_partitions is not kept up-to-date during partition merging and if
> merging makes the reduction partition(s) pass the stmt_in_all_partitions
> test your simple workaround doesn't work ...
I think it doesn't matter because:
  A) it's really workaround for induction variables.  In general,
induction variables are included by all partition.
  B) After classify partition, we immediately fuses all reduction
partitions.  More stmt_in_all_partitions means we are fusing
non-reduction partition with reduction partition, so the newly
generated (stmt_in_all_partitions) are actually not reduction
statements.  The workaround won't work anyway even the bitmap is
maintained.
>
> As written it's a valid optimization but can you please note it's limitation 
> in
> some comment please?
Yeah, I will add comment explaining it.

Thanks,
bin
>
> Also...
>
> +  bitmap_set_range (stmt_in_all_partitions, 0, rdg->n_vertices);
> +  rdg_build_partitions (rdg, stmts, , stmt_in_all_partitions);
>
> ick.  Please instead do
>
>bitmap_copy (smtt_in_all_partitions, partitions[0]->stmts);
>for (i = 1; i < ...)
>  bitmap_and_into (stmt_in_all_partitons, partitions[i]->stmts);
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-07  Bin Cheng  
>>
>> * tree-loop-distribution.c (classify_partition): New parameter and
>> better handle reduction statement.
>> (rdg_build_partitions): New parameter and record statements belonging
>> to all partitions.
>> (distribute_loop): Update use of above functions.


Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-16 Thread Richard Biener
On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> For now, loop distribution handles variables used outside of loop as 
> reduction.
> This is inaccurate because all partitions contain statement defining induction
> vars.

But final induction values are usually not used outside of the loop...

What is missing is loop distribution trying to change partition order.  In fact
we somehow assume we can move a reduction across a detected builtin
(I don't remember if we ever check for validity of that...).

> Ideally we should factor out scev-propagation as a standalone interface
> which can be called when necessary.  Before that, this patch simply 
> workarounds
> reduction issue by checking if the statement belongs to all partitions.  If 
> yes,
> the reduction must be computed in the last partition no matter how the loop is
> distributed.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

stmt_in_all_partitions is not kept up-to-date during partition merging and if
merging makes the reduction partition(s) pass the stmt_in_all_partitions
test your simple workaround doesn't work ...

As written it's a valid optimization but can you please note it's limitation in
some comment please?

Also...

+  bitmap_set_range (stmt_in_all_partitions, 0, rdg->n_vertices);
+  rdg_build_partitions (rdg, stmts, , stmt_in_all_partitions);

ick.  Please instead do

   bitmap_copy (smtt_in_all_partitions, partitions[0]->stmts);
   for (i = 1; i < ...)
 bitmap_and_into (stmt_in_all_partitons, partitions[i]->stmts);

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (classify_partition): New parameter and
> better handle reduction statement.
> (rdg_build_partitions): New parameter and record statements belonging
> to all partitions.
> (distribute_loop): Update use of above functions.


[PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-12 Thread Bin Cheng
Hi,
For now, loop distribution handles variables used outside of loop as reduction.
This is inaccurate because all partitions contain statement defining induction
vars.  Ideally we should factor out scev-propagation as a standalone interface
which can be called when necessary.  Before that, this patch simply workarounds
reduction issue by checking if the statement belongs to all partitions.  If yes,
the reduction must be computed in the last partition no matter how the loop is
distributed.
Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-06-07  Bin Cheng  

* tree-loop-distribution.c (classify_partition): New parameter and
better handle reduction statement.
(rdg_build_partitions): New parameter and record statements belonging
to all partitions.
(distribute_loop): Update use of above functions.From 51764e6a377cf21ef13ffc36928c9f2b8932aac2 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 13:21:07 +0100
Subject: [PATCH 12/14] reduction-workaround-20170607.txt

---
 gcc/tree-loop-distribution.c | 40 +++-
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 7e31fee8..167155e 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1276,17 +1276,18 @@ build_rdg_partition_for_vertex (struct graph *rdg, int 
v)
 }
 
 /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
-   For the moment we detect only the memset zero pattern.  */
+   For the moment we detect memset, memcpy and memmove patterns.  Bitmap
+   STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.  */
 
 static void
-classify_partition (loop_p loop, struct graph *rdg, partition *partition)
+classify_partition (loop_p loop, struct graph *rdg, partition *partition,
+   bitmap stmt_in_all_partitions)
 {
   bitmap_iterator bi;
   unsigned i;
   tree nb_iter;
   data_reference_p single_load, single_store;
-  bool volatiles_p = false;
-  bool plus_one = false;
+  bool volatiles_p = false, plus_one = false, has_reduction = false;
 
   partition->kind = PKIND_NORMAL;
   partition->main_dr = NULL;
@@ -1301,16 +1302,24 @@ classify_partition (loop_p loop, struct graph *rdg, 
partition *partition)
   if (gimple_has_volatile_ops (stmt))
volatiles_p = true;
 
-  /* If the stmt has uses outside of the loop mark it as reduction.  */
+  /* If the stmt is not included by all partitions and there is uses
+outside of the loop, then mark the partition as reduction.  */
   if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
{
- partition->reduction_p = true;
- return;
+ if (!bitmap_bit_p (stmt_in_all_partitions, i))
+   {
+ partition->reduction_p = true;
+ return;
+   }
+ has_reduction = true;
}
 }
 
   /* Perform general partition disqualification for builtins.  */
   if (volatiles_p
+  /* Simple workaround to prevent classifying the partition as builtin
+if it contains any use outside of loop.  */
+  || has_reduction
   || !flag_tree_loop_distribute_patterns)
 return;
 
@@ -1540,14 +1549,16 @@ share_memory_accesses (struct graph *rdg,
   return false;
 }
 
-/* Aggregate several components into a useful partition that is
-   registered in the PARTITIONS vector.  Partitions will be
-   distributed in different loops.  */
+/* For each seed statement in STARTING_STMTS, this function builds
+   partition for it by adding depended statements according to RDG.
+   All partitions are recorded in PARTITIONS.  Statements belongs
+   to all partitions are recorded in STMT_IN_ALL_PARTITIONS.  */
 
 static void
 rdg_build_partitions (struct graph *rdg,
  vec starting_stmts,
- vec *partitions)
+ vec *partitions,
+ bitmap stmt_in_all_partitions)
 {
   auto_bitmap processed;
   int i;
@@ -1568,6 +1579,7 @@ rdg_build_partitions (struct graph *rdg,
 
   partition *partition = build_rdg_partition_for_vertex (rdg, v);
   bitmap_ior_into (processed, partition->stmts);
+  bitmap_and_into (stmt_in_all_partitions, partition->stmts);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -1814,13 +1826,15 @@ distribute_loop (struct loop *loop, vec stmts,
   ddrs_vec = new vec ();
   ddrs_table = new hash_table (389);
 
+  auto_bitmap stmt_in_all_partitions;
   auto_vec partitions;
-  rdg_build_partitions (rdg, stmts, );
+  bitmap_set_range (stmt_in_all_partitions, 0, rdg->n_vertices);
+  rdg_build_partitions (rdg, stmts, , stmt_in_all_partitions);
 
   any_builtin = false;
   FOR_EACH_VEC_ELT (partitions, i, partition)
 {
-  classify_partition (loop, rdg, partition);
+  classify_partition (loop, rdg,