Re: [PATCH][gomp4] Optimize expand_omp_for_static_chunk for chunk_size one

2015-08-24 Thread Tom de Vries

On 24-08-15 16:28, Tom de Vries wrote:
> On 24-08-15 11:43, Jakub Jelinek wrote:
>> On Mon, Jul 28, 2014 at 11:21:53AM +0200, Tom de Vries wrote:
>>> Jakub,
>>>
>>> we're using expand_omp_for_static_chunk with a chunk_size of one to expand the
>>> openacc loop construct.
>>>
>>> This results in an inner and outer loop being generated, with the inner loop
>>> having a trip count of one, which means that the inner loop can be simplified to
>>> just the inner loop body. However, subsequent optimizations do not manage to do
>>> this simplification.
>>>
>>> This patch sets the loop exit condition to true if the chunk_size is one, to
>>> ensure that the compiler will optimize away the inner loop.
>>>
>>> OK for gomp4 branch?
>>>
>>> Thanks,
>>> - Tom
>>>
>>> 2014-07-25  Tom de Vries
>>>
>>>   * omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
>>>   chunk_size is one.
>>
>> If that is still the case on the trunk, the patch is ok for trunk after
>> retesting it.  Please mention the PR tree-optimization/65468 in the
>> ChangeLog entry and make sure there is some runtime testcase that tests
>> that code path (both OpenMP and OpenACC one).
>
> Committed attached patch to trunk.
>
> I'll look into openacc testcase for trunk.



Committed as attached.

Thanks,
- Tom


Add libgomp.oacc-c-c++-common/vector-loop.c

2015-08-24  Tom de Vries  

	PR tree-optimization/65468
	* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: New test.
---
 .../libgomp.oacc-c-c++-common/vector-loop.c | 33 ++
 1 file changed, 33 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
new file mode 100644
index 0000000..cc915a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define N 1024
+
+unsigned int a[N];
+unsigned int b[N];
+unsigned int c[N];
+unsigned int n = N;
+
+int
+main (void)
+{
+  for (unsigned int i = 0; i < n; ++i)
+    {
+      a[i] = i % 3;
+      b[i] = i % 5;
+    }
+
+#pragma acc parallel vector_length (32) copyin (a,b) copyout (c)
+  {
+#pragma acc loop /* vector clause is missing, since it's not yet supported.  */
+    for (unsigned int i = 0; i < n; i++)
+      c[i] = a[i] + b[i];
+  }
+
+  for (unsigned int i = 0; i < n; ++i)
+    if (c[i] != (i % 3) + (i % 5))
+      abort ();
+
+  return 0;
+}
-- 
1.9.1



Re: [PATCH][gomp4] Optimize expand_omp_for_static_chunk for chunk_size one

2015-08-24 Thread Tom de Vries

On 24-08-15 11:43, Jakub Jelinek wrote:
> On Mon, Jul 28, 2014 at 11:21:53AM +0200, Tom de Vries wrote:
>> Jakub,
>>
>> we're using expand_omp_for_static_chunk with a chunk_size of one to expand the
>> openacc loop construct.
>>
>> This results in an inner and outer loop being generated, with the inner loop
>> having a trip count of one, which means that the inner loop can be simplified to
>> just the inner loop body. However, subsequent optimizations do not manage to do
>> this simplification.
>>
>> This patch sets the loop exit condition to true if the chunk_size is one, to
>> ensure that the compiler will optimize away the inner loop.
>>
>> OK for gomp4 branch?
>>
>> Thanks,
>> - Tom
>>
>> 2014-07-25  Tom de Vries
>>
>>   * omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
>>   chunk_size is one.
>
> If that is still the case on the trunk, the patch is ok for trunk after
> retesting it.  Please mention the PR tree-optimization/65468 in the
> ChangeLog entry and make sure there is some runtime testcase that tests
> that code path (both OpenMP and OpenACC one).



Committed attached patch to trunk.

I'll look into openacc testcase for trunk.

Thanks,
- Tom

Optimize expand_omp_for_static_chunk for chunk_size one

2015-08-24  Tom de Vries  

	PR tree-optimization/65468
	* omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
	chunk_size is one.

	* gcc.dg/gomp/static-chunk-size-one.c: New test.

	* testsuite/libgomp.c/static-chunk-size-one.c: New test.
---
 gcc/omp-low.c                                      | 11 ---
 gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c  | 18 +
 .../testsuite/libgomp.c/static-chunk-size-one.c    | 23 ++
 3 files changed, 49 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
 create mode 100644 libgomp/testsuite/libgomp.c/static-chunk-size-one.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d181101..19f34ec 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7204,9 +7204,14 @@ expand_omp_for_static_chunk (struct omp_region *region,
 	  assign_stmt = gimple_build_assign (vback, t);
 	  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
 
-	  t = build2 (fd->loop.cond_code, boolean_type_node,
-		  DECL_P (vback) && TREE_ADDRESSABLE (vback)
-		  ? t : vback, e);
+	  if (tree_int_cst_equal (fd->chunk_size, integer_one_node))
+	    t = build2 (EQ_EXPR, boolean_type_node,
+			build_int_cst (itype, 0),
+			build_int_cst (itype, 1));
+	  else
+	    t = build2 (fd->loop.cond_code, boolean_type_node,
+			DECL_P (vback) && TREE_ADDRESSABLE (vback)
+			? t : vback, e);
 	  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
 	}
 
diff --git a/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c b/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
new file mode 100644
index 0000000..e82de77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-optimized -fno-tree-pre" } */
+
+int
+bar ()
+{
+  int a = 0, i;
+
+#pragma omp parallel for num_threads (3) reduction (+:a) schedule(static, 1)
+  for (i = 0; i < 10; i++)
+    a += i;
+
+  return a;
+}
+
+/* Two phis for reduction, one in loop header, one in loop exit.  One phi for iv
+   in loop header.  */
+/* { dg-final { scan-tree-dump-times "PHI" 3 "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/static-chunk-size-one.c b/libgomp/testsuite/libgomp.c/static-chunk-size-one.c
new file mode 100644
index 0000000..9ed7b83
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/static-chunk-size-one.c
@@ -0,0 +1,23 @@
+extern void abort ();
+
+int
+bar ()
+{
+  int a = 0, i;
+
+#pragma omp parallel for num_threads (3) reduction (+:a) schedule(static, 1)
+  for (i = 0; i < 10; i++)
+    a += i;
+
+  return a;
+}
+
+int
+main (void)
+{
+  int res;
+  res = bar ();
+  if (res != 45)
+    abort ();
+  return 0;
+}
-- 
1.9.1



Re: [PATCH][gomp4] Optimize expand_omp_for_static_chunk for chunk_size one

2015-08-24 Thread Jakub Jelinek
On Mon, Jul 28, 2014 at 11:21:53AM +0200, Tom de Vries wrote:
> Jakub,
> 
> we're using expand_omp_for_static_chunk with a chunk_size of one to expand the
> openacc loop construct.
> 
> This results in an inner and outer loop being generated, with the inner loop
> having a trip count of one, which means that the inner loop can be simplified 
> to
> just the inner loop body. However, subsequent optimizations do not manage to 
> do
> this simplification.
> 
> This patch sets the loop exit condition to true if the chunk_size is one, to
> ensure that the compiler will optimize away the inner loop.
> 
> OK for gomp4 branch?
> 
> Thanks,
> - Tom

> 2014-07-25  Tom de Vries  
> 
>   * omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
>   chunk_size is one.

If that is still the case on the trunk, the patch is ok for trunk after
retesting it.  Please mention the PR tree-optimization/65468 in the
ChangeLog entry and make sure there is some runtime testcase that tests
that code path (both OpenMP and OpenACC one).

> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index b188e2d..5a73986 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -7345,9 +7345,14 @@ expand_omp_for_static_chunk (struct omp_region *region,
>  	  stmt = gimple_build_assign (vback, t);
>  	  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>  
> -	  t = build2 (fd->loop.cond_code, boolean_type_node,
> -		  DECL_P (vback) && TREE_ADDRESSABLE (vback)
> -		  ? t : vback, e);
> +	  if (tree_int_cst_equal (fd->chunk_size, integer_one_node))
> +	    t = build2 (EQ_EXPR, boolean_type_node,
> +			build_int_cst (itype, 0),
> +			build_int_cst (itype, 1));
> +	  else
> +	    t = build2 (fd->loop.cond_code, boolean_type_node,
> +			DECL_P (vback) && TREE_ADDRESSABLE (vback)
> +			? t : vback, e);
>  	  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
>  	}
>  


Jakub


[PATCH][gomp4] Optimize expand_omp_for_static_chunk for chunk_size one

2014-07-28 Thread Tom de Vries
Jakub,

we're using expand_omp_for_static_chunk with a chunk_size of one to expand the
openacc loop construct.

This results in an inner and outer loop being generated, with the inner loop
having a trip count of one, which means that the inner loop can be simplified to
just the inner loop body. However, subsequent optimizations do not manage to do
this simplification.

This patch sets the loop exit condition to true if the chunk_size is one, to
ensure that the compiler will optimize away the inner loop.

OK for gomp4 branch?

Thanks,
- Tom
2014-07-25  Tom de Vries  

	* omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
	chunk_size is one.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b188e2d..5a73986 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7345,9 +7345,14 @@ expand_omp_for_static_chunk (struct omp_region *region,
 	  stmt = gimple_build_assign (vback, t);
 	  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
 
-	  t = build2 (fd->loop.cond_code, boolean_type_node,
-		  DECL_P (vback) && TREE_ADDRESSABLE (vback)
-		  ? t : vback, e);
+	  if (tree_int_cst_equal (fd->chunk_size, integer_one_node))
+	    t = build2 (EQ_EXPR, boolean_type_node,
+			build_int_cst (itype, 0),
+			build_int_cst (itype, 1));
+	  else
+	    t = build2 (fd->loop.cond_code, boolean_type_node,
+			DECL_P (vback) && TREE_ADDRESSABLE (vback)
+			? t : vback, e);
 	  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
 	}