Re: [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels

2014-11-25 Thread Tom de Vries

On 15-11-14 18:23, Tom de Vries wrote:

On 15-11-14 13:14, Tom de Vries wrote:

Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
 1  Expand oacc kernels after pass_build_ealias
 2  Add pass_oacc_kernels
 3  Add pass_ch_oacc_kernels to pass_oacc_kernels
 4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
 5  Add pass_loop_im to pass_oacc_kernels
 6  Add pass_ccp to pass_oacc_kernels
 7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
 8  Do simple omp lowering for no address taken var
...


This patch adds:
- a specialized version of pass_parallelize_loops called
 pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
- relevant test-cases.

The pass only handles loops that are in a kernels region, and skips over bits of
pass_parallelize_loops that are already done for oacc kernels.

The pass reintroduces the use of omp_expand_local, I haven't managed to make it
work yet using the external pass pass_expand_omp_ssa.

An obvious limitation of the patch is the fact that we copy over the clauses
from the kernels directive to the generated parallel directive. We'll need to do
something more intelligent here, f.i. setting vector_length based on the
parallelization factor.

Another limitation is that the pass still needs -ftree-parallelize-loops to
trigger.



Updated for using pass_copyprop instead of pass_ccp in pass_oacc_kernels.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom

[PATCH 7/7] Add pass_parloops_oacc_kernels to pass_oacc_kernels

2014-11-25  Tom de Vries  

	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
	pass_oacc_kernels.  Move pass_expand_omp_ssa into pass group
	pass_oacc_kernels.
	* tree-parloops.c (create_parallel_loop): Add function parameters
	region_entry and bool oacc_kernels_p.  Handle oacc_kernels_p.
	(gen_parallel_loop): Same.  Use omp_expand_local if oacc_kernels_p.
	Call create_parallel_loop with additional args.
	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
	dominance info.  Skip loops that are not in a kernels region. Call
	gen_parallel_loop with additional args.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.

	* testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test.
	* testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test.

	* gcc.dg/oacc-kernels-2.c: New test.
	* gcc.dg/oacc-kernels.c: New test.
---
 gcc/passes.def |   1 +
 gcc/testsuite/gcc.dg/oacc-kernels-2.c  |  79 +++
 gcc/testsuite/gcc.dg/oacc-kernels.c|  71 ++
 gcc/tree-parloops.c| 242 -
 gcc/tree-pass.h|   2 +
 .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c  |  65 ++
 .../testsuite/libgomp.oacc-c/oacc-kernels-run.c|  59 +
 7 files changed, 464 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c

diff --git a/gcc/passes.def b/gcc/passes.def
index fb0d331..d91283b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_copy_prop);
+  	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
new file mode 100644
index 000..1ff4bad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include 
+#include 
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));

[PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels

2014-11-15 Thread Tom de Vries

On 15-11-14 13:14, Tom de Vries wrote:

Hi,

I'm submitting a patch series with initial support for the oacc kernels 
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
 1  Expand oacc kernels after pass_build_ealias
 2  Add pass_oacc_kernels
 3  Add pass_ch_oacc_kernels to pass_oacc_kernels
 4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
 5  Add pass_loop_im to pass_oacc_kernels
 6  Add pass_ccp to pass_oacc_kernels
 7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
 8  Do simple omp lowering for no address taken var
...


This patch adds:
- a specialized version of pass_parallelize_loops called
pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
- relevant test-cases.

The pass only handles loops that are in a kernels region, and skips over bits of 
pass_parallelize_loops that are already done for oacc kernels.


The pass reintroduces the use of omp_expand_local, I haven't managed to make it 
work yet using the external pass pass_expand_omp_ssa.


An obvious limitation of the patch is the fact that we copy over the clauses 
from the kernels directive to the generated parallel directive. We'll need to do 
something more intelligent here, f.i. setting vector_length based on the 
parallelization factor.


Another limitation is that the pass still needs -ftree-parallelize-loops to 
trigger.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  

	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
	pass_oacc_kernels.  Move pass_expand_omp_ssa into pass group
	pass_oacc_kernels.
	* tree-parloops.c (create_parallel_loop): Add function parameters
	region_entry and bool oacc_kernels_p.  Handle oacc_kernels_p.
	(gen_parallel_loop): Same.  Use omp_expand_local if oacc_kernels_p.
	Call create_parallel_loop with additional args.
	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
	dominance info.  Skip loops that are not in a kernels region. Call
	gen_parallel_loop with additional args.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.

	* testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test.
	* testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test.

	* gcc.dg/oacc-kernels-2.c: New test.
	* gcc.dg/oacc-kernels.c: New test.
---
 gcc/passes.def |   3 +-
 gcc/testsuite/gcc.dg/oacc-kernels-2.c  |  79 +++
 gcc/testsuite/gcc.dg/oacc-kernels.c|  71 ++
 gcc/tree-parloops.c| 242 -
 gcc/tree-pass.h|   2 +
 .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c  |  65 ++
 .../testsuite/libgomp.oacc-c/oacc-kernels-run.c|  59 +
 7 files changed, 465 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c

diff --git a/gcc/passes.def b/gcc/passes.def
index cd9443c..cc09ba9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -80,9 +80,10 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_ccp);
+  	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
-	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
new file mode 100644
index 000..1ff4bad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include 
+#include 
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+#pragma acc kernels copyout (a[0:N])
+  {
+for (COUNTERTYPE i = 0; i < N; i++)
+