[PATCH] amdgcn: Add gfx90c target

2024-04-25 Thread Frederik Harwath

Hi Andrew,
this patch adds support for gfx90c GCN5 APU integrated graphics devices.
The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html)
lists those devices as unsupported by rocm-amdhsa.
As we have discussed elsewhere, I have tested the patch on an AMD Ryzen
5 5500U (also with different xnack settings) that I have and it passes
most libgomp offloading tests.
Although those APUs are very constrained compared to dGPUs, I think
they might be interesting for learning, experimentation, and testing.
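
For anyone who wants to try such a device, a minimal offloading smoke test
could look like the sketch below. It is not part of the patch; the function
name vec_add is made up for this example. Built without -fopenmp the pragma
is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Illustrative only: element-wise vector addition.  With offloading
   enabled the loop may run on the GPU; otherwise it runs on the host.  */
void
vec_add (const float *a, const float *b, float *c, size_t n)
{
#pragma omp target teams distribute parallel for \
  map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Assuming a GCC build that was configured with the amdgcn-amdhsa offload
target and that contains this patch, something like
"gcc -fopenmp -foffload=amdgcn-amdhsa -foffload-options=-march=gfx90c"
should select the new target.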


Can I commit the patch to the master branch?

Best regards,
Frederik
From 809e2a0248e6fad1e8336b4a883a729017cc62e5 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 24 Apr 2024 20:29:14 +0200
Subject: [PATCH] amdgcn: Add gfx90c target

Add support for gfx90c GCN5 APU integrated graphics devices.
The LLVM AMDGPU documentation does not list those devices as supported
by rocm-amdhsa, but it passes most libgomp offloading tests.
Although they are constrained compared to dGPUs, they might be
interesting for learning, experimentation, and testing.

gcc/ChangeLog:

	* config.gcc: Add gfx90c.
	* config/gcn/gcn-hsa.h (NO_SRAM_ECC): Likewise.
	* config/gcn/gcn-opts.h (enum processor_type): Likewise.
	(TARGET_GFX90c): New macro.
	* config/gcn/gcn.cc (gcn_option_override): Handle gfx90c.
	(gcn_omp_device_kind_arch_isa): Likewise.
	(output_file_start): Likewise.
	* config/gcn/gcn.h: Add gfx90c.
	* config/gcn/gcn.opt: Likewise.
	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90c): New macro.
	(get_arch): Handle gfx90c.
	(main): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c.
	* config/gcn/t-omp-device: Add gfx90c.
	* doc/install.texi: Likewise.
	* doc/invoke.texi: Likewise.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (isa_hsa_name): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c.
	(isa_code): Handle gfx90c.
	(max_isa_vgprs): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c.

Signed-off-by: Frederik Harwath 
---
 gcc/config.gcc  | 4 ++--
 gcc/config/gcn/gcn-hsa.h| 2 +-
 gcc/config/gcn/gcn-opts.h   | 2 ++
 gcc/config/gcn/gcn.cc   | 8 
 gcc/config/gcn/gcn.h| 2 ++
 gcc/config/gcn/gcn.opt  | 3 +++
 gcc/config/gcn/mkoffload.cc | 9 +
 gcc/config/gcn/t-omp-device | 2 +-
 gcc/doc/install.texi| 4 ++--
 gcc/doc/invoke.texi | 3 +++
 libgomp/plugin/plugin-gcn.c | 9 +
 11 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5df3c52f8e9..1bf07b6eece 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4569,7 +4569,7 @@ case "${target}" in
 		for which in arch tune; do
 			eval "val=\$with_$which"
 			case ${val} in
-			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1036 | gfx1100 | gfx1103)
+			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | gfx1030 | gfx1036 | gfx1100 | gfx1103)
 # OK
 ;;
 			*)
@@ -4585,7 +4585,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 7d6e3141cea..4611bc55392 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -93,7 +93,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1036:;march=gfx1100:;march=gfx1103:;" \
 /* These match the defaults set in gcn.cc.  */ \
 "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
-#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
+#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;march=gfx90c:;"
 
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
configuration.  The name of the attribute also changed.  */
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 49099bad7e7..1091035a69a 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -25,6 +25,7 @@ enum processor_type
   PROCESSOR_VEGA20,  // gfx906
   PROCESSOR_GFX908,
   PROCESSOR_GFX90a,
+  PROCESSOR_GFX90c,
   PROCESSOR_GFX1030,
   PROCESSOR_GFX1036,
   PROCESSOR_GFX1100,
@@ -36,6 +37,7 @@ enum processor_type
 #define TARGET_VEGA20 (gcn_arch == PROCESSOR_VEGA20)
 #define TARGET_GFX908 (gcn_arch == PROCESSOR_GFX908)
 #define TARGET_GFX90a (gcn_arch == PROCESSOR_GFX90a)
+#define TARGET_GFX90c (gcn_arch == PROCESSOR_GFX90c)
 #define TARGET_GFX1030 (gcn_arch == PROCESSOR_GFX1030)
 #define TARGET_GFX1036 (gcn_arch == PROCESSOR_GFX1036)
 #define TARGET_GFX1100 (gcn_arch == PROCESSOR_GFX1

Re: [PATCH] OpenMP: warn about iteration var modifications in loop body

2024-03-06 Thread Frederik Harwath

Ping.


The Linaro CI has kindly pointed me to two test regressions that I had
missed. I have adjusted the test expectations in the updated patch, which
I have attached.

Frederik


On 28.02.24 8:32 PM, Frederik Harwath wrote:

Hi,

this patch implements a warning about (some simple cases of direct)
modifications of iteration variables in OpenMP loops which are
forbidden according to the OpenMP specification. I think this can be
helpful, especially for new OpenMP users. I have implemented this
after I observed some confusion concerning this topic recently.
The check is implemented during gimplification. It reuses the
"loop_iter_var" vector in the "gimplify_omp_ctx" which was previously
only used for "doacross" handling to identify the loop iteration
variables during the gimplification of MODIFY_EXPRs in omp_for bodies.
I have only added a common C/C++ test because I don't see any special
C++ constructs for which a warning *should* be emitted and Fortran
rejects modifications of iteration variables in do loops in general.
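
For illustration, here is a minimal C sketch of the kind of code the
warning is aimed at, next to a conforming rewrite. The function names are
made up for this example and are not taken from the new test file; whether
a given statement form is diagnosed depends on the patch itself.

```c
/* Non-conforming: assigns to the iteration variable inside the loop
   body.  This is the direct form the warning is meant to catch.  */
int
sum_even_indices_bad (const int *a, int n)
{
  int sum = 0;
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      sum += a[i];
      i = i + 1;  /* direct modification of the iteration variable */
    }
  return sum;
}

/* Conforming rewrite: the stride is expressed in the loop header, so
   the body never touches the iteration variable.  */
int
sum_even_indices (const int *a, int n)
{
  int sum = 0;
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i += 2)
    sum += a[i];
  return sum;
}
```

Only the second function has well-defined behaviour under OpenMP; the
first is kept to show what would be flagged.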

I have run "make check" on x86_64-linux-gnu and not observed any
regressions.

Is it ok to commit this?

Best regards,
Frederik
From d4fb1710bfa1d5b66979db1f0aea2d5c68ab2264 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 27 Feb 2024 21:07:00 +
Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

OpenMP loop iteration variables may not be changed by user code in the
loop body according to the OpenMP specification.  In general, the
compiler cannot enforce this, but nevertheless simple cases in which
the user modifies the iteration variable directly in the loop body
(in contrast to, e.g., modifications through a pointer) can be recognized. A
warning should be useful, for instance, to new users of OpenMP.

This commit implements a warning about forbidden iteration var modifications
during gimplification. It reuses the "loop_iter_var" vector in the
"gimplify_omp_ctx" which was previously only used for "doacross" handling to
identify the loop iteration variables during the gimplification of MODIFY_EXPRs
in omp_for bodies.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to
	recognize the gimplification state during which the new warning should
	be emitted. Add field "is_doacross" to distinguish the original use of
	"loop_iter_var" from its new use.
	(new_omp_context): Initialize new gimplify_omp_ctx fields.
	(gimplify_modify_expr): Emit warning if iter var is modified.
	(gimplify_omp_for): Make initialization and filling of loop_iter_var
	vector unconditional and adjust new gimplify_omp_ctx fields before
	gimplifying the omp_for body.
	(gimplify_omp_ordered): Check for is_doacross field in addition to
	emptiness check on loop_iter_var vector since the vector is now always
	being filled.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr92347.c: Adjust.
	* gcc.target/aarch64/sve/pr96195.c: Adjust.
	* c-c++-common/gomp/iter-var-modification.c: New test.

Signed-off-by: Frederik Harwath  
---
 gcc/gimplify.cc   |  54 +++---
 .../c-c++-common/gomp/iter-var-modification.c | 100 ++
 gcc/testsuite/gcc.dg/vect/pr92347.c   |   2 +-
 .../gcc.target/aarch64/sve/pr96195.c  |   2 +-
 4 files changed, 140 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..a74ad987cf7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -235,6 +235,8 @@ struct gimplify_omp_ctx
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool in_omp_for_body;
+  bool is_doacross;
   int defaultmap[5];
 };
 
@@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
  c->privatized_types = new hash_set<tree>;
   c->location = input_location;
   c->region_type = region_type;
+  c->loop_iter_var.create (0);
+  c->in_omp_for_body = false;
+  c->is_doacross = false;
+
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
 
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
+{
+  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
+  for (size_t i = 0; i < num_vars; i++)
+	{
+	  if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
+	warning_at (input_location, OPT_Wopenmp,
+			"forbidden modification of iteration variable %qE in "
+			"OpenMP loop", *to_p);
+	}
+}
+
   /* Trying to simplify a clobber using normal logic doesn't work,
  so handle it here.  */
   if (TREE_CLOBBER_P (*from_p))
@@ -15334,6 +15352,

[PATCH] OpenMP: warn about iteration var modifications in loop body

2024-02-28 Thread Frederik Harwath

Hi,

this patch implements a warning about (some simple cases of direct)
modifications of iteration variables in OpenMP loops which are forbidden
according to the OpenMP specification. I think this can be helpful,
especially for new OpenMP users. I have implemented this after I
observed some confusion concerning this topic recently.
The check is implemented during gimplification. It reuses the
"loop_iter_var" vector in the "gimplify_omp_ctx" which was previously
only used for "doacross" handling to identify the loop iteration
variables during the gimplification of MODIFY_EXPRs in omp_for bodies.
I have only added a common C/C++ test because I don't see any special
C++ constructs for which a warning *should* be emitted and Fortran
rejects modifications of iteration variables in do loops in general.
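
The lookup itself can be sketched in plain C as follows. This is a
simplified model, not the GCC code: decl_t is a hypothetical stand-in for
"tree", and the array stands in for the "loop_iter_var" vector. As in the
patch, the vector is read as pairs and only the second entry of each pair
(index 2 * i + 1) is compared with the assignment target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's "tree": a declaration is identified
   by the address of its name.  */
typedef const char *decl_t;

/* Return true if TARGET is one of the iteration variables recorded in
   VARS.  VARS holds LEN entries laid out as pairs; only the second
   entry of each pair is compared, by identity.  */
bool
is_iter_var_modification (decl_t target, const decl_t *vars, size_t len)
{
  size_t num_vars = len / 2;
  for (size_t i = 0; i < num_vars; i++)
    if (target == vars[2 * i + 1])
      return true;
  return false;
}
```

A plain pointer comparison is enough in this model, which matches the
patch's restriction to direct modifications; a store through a pointer
never presents the variable itself as the assignment target.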

I have run "make check" on x86_64-linux-gnu and not observed any
regressions.

Is it ok to commit this?

Best regards,
Frederik
From 4944a9f94bcda9907e0118e71137ee7e192657c2 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 27 Feb 2024 21:07:00 +
Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

OpenMP loop iteration variables may not be changed by user code in the
loop body according to the OpenMP specification.  In general, the
compiler cannot enforce this, but nevertheless simple cases in which
the user modifies the iteration variable directly in the loop body
(in contrast to, e.g., modifications through a pointer) can be recognized. A
warning should be useful, for instance, to new users of OpenMP.

This commit implements a warning about forbidden iteration var modifications
during gimplification. It reuses the "loop_iter_var" vector in the
"gimplify_omp_ctx" which was previously only used for "doacross" handling to
identify the loop iteration variables during the gimplification of MODIFY_EXPRs
in omp_for bodies.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to
	recognize the gimplification state during which the new warning should
	be emitted. Add field "is_doacross" to distinguish the original use of
	"loop_iter_var" from its new use.
	(new_omp_context): Initialize new gimplify_omp_ctx fields.
	(gimplify_modify_expr): Emit warning if iter var is modified.
	(gimplify_omp_for): Make initialization and filling of loop_iter_var
	vector unconditional and adjust new gimplify_omp_ctx fields before
	gimplifying the omp_for body.
	(gimplify_omp_ordered): Check for is_doacross field in addition to
	emptiness check on loop_iter_var vector since the vector is now always
	being filled.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/iter-var-modification.c: New test.

Signed-off-by: Frederik Harwath  
---
 gcc/gimplify.cc   |  54 +++---
 .../c-c++-common/gomp/iter-var-modification.c | 100 ++
 2 files changed, 138 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..a74ad987cf7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -235,6 +235,8 @@ struct gimplify_omp_ctx
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool in_omp_for_body;
+  bool is_doacross;
   int defaultmap[5];
 };
 
@@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
  c->privatized_types = new hash_set<tree>;
   c->location = input_location;
   c->region_type = region_type;
+  c->loop_iter_var.create (0);
+  c->in_omp_for_body = false;
+  c->is_doacross = false;
+
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
 
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
+{
+  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
+  for (size_t i = 0; i < num_vars; i++)
+	{
+	  if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
+	warning_at (input_location, OPT_Wopenmp,
+			"forbidden modification of iteration variable %qE in "
+			"OpenMP loop", *to_p);
+	}
+}
+
   /* Trying to simplify a clobber using normal logic doesn't work,
  so handle it here.  */
   if (TREE_CLOBBER_P (*from_p))
@@ -15334,6 +15352,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	  == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
 	  == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
+  int len = TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt));
+  gimplify_omp_ctxp->loop_iter_var.create (len * 2);
 
   tree c = omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CL

[PATCH 2/4] openmp: Fix initialization for 'unroll full'

2023-07-28 Thread Frederik Harwath
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.

Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
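
To make the problem concrete, the following C sketch shows by hand what a
full unroll of a two-iteration loop amounts to. It is a conceptual model
only (the function names are invented and this is not the GIMPLE that
full_unroll emits); the point is the explicit assignment to the index
before the first copy of the body.

```c
/* Reference: the loop as a user would write it, e.g. under
   "#pragma omp unroll full".  It runs for i = 2 and i = 5.  */
int
rolled (const int *v)
{
  int acc = 0;
  for (int i = 2; i < 8; i += 3)
    acc += v[i];
  return acc;
}

/* Hand-written equivalent of the fully unrolled loop.  */
int
unrolled_by_hand (const int *v)
{
  int acc = 0;
  int i;
  i = 2;        /* index initialization; without it 'i' is read uninitialized */
  acc += v[i];  /* first copy of the body */
  i += 3;
  acc += v[i];  /* second copy of the body */
  i += 3;
  return acc;
}
```

Dropping the "i = 2" line is the kind of defect that -Wall can report as
an uninitialized use, which is why the tests now enable it.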

gcc/ChangeLog:

* omp-transform-loops.cc (full_unroll): Add initialization of index 
variable.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty
pragmas.  Use -O2.
* testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
Use -Wall and use -Wno-unknown-pragmas to disable warnings about empty
pragmas.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
---
 gcc/omp-transform-loops.cc  |  1 +
 .../matrix-no-directive-unroll-full-1.C | 13 +
 .../loop-transforms/matrix-no-directive-1.c |  2 +-
 .../matrix-no-directive-unroll-full-1.c |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c  |  2 ++
 .../loop-transforms/matrix-omp-for-1.c  |  2 +-
 .../loop-transforms/matrix-omp-parallel-for-1.c |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c |  2 ++
 .../matrix-omp-parallel-masked-taskloop-simd-1.c|  2 ++
 .../matrix-omp-target-parallel-for-1.c  |  2 +-
 ...rix-omp-target-teams-distribute-parallel-for-1.c |  2 ++
 .../loop-transforms/matrix-omp-taskloop-1.c |  2 ++
 .../matrix-omp-teams-distribute-parallel-for-1.c|  2 ++
 .../loop-transforms/matrix-simd-1.c |  2 ++
 .../libgomp.c-c++-common/loop-transforms/unroll-1.c |  8 +---
 .../loop-transforms/unroll-non-rect-1.c |  2 ++
 16 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 517faea537c..275a5260dae 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -548,6 +548,7 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx 
*ctx ATTRIBUTE_UNUSED)

   gimple_seq unrolled = NULL;
  gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for));
+  gimplify_assign (index, init, &unrolled);
   push_gimplify_context ();
   gimple_seq_add_seq (&unrolled,
  build_unroll_body (body, unroll_factor, index, incr));
diff --git 
a/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
new file mode 100644
index 000..3a684219627
--- /dev/null
+++ 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
@@ -0,0 +1,13 @@
+/* { dg-additional-options { -O0 -fdump-tree-original -Wall 
-Wno-unknown-pragmas } } */
+
+#define COMMON_DIRECTIVE
+#define COMMON_TOP_TRANSFORM omp unroll full
+#define COLLAPSE_1
+#define COLLAPSE_2
+#define COLLAPSE_3
+#define IMPLEMENTATION_FILE 
"../../libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h"
+
+#include 
"../../libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */
diff --git 

[PATCH 3/4] openmp: Fix diagnostic message for "omp unroll"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (print_optimized_unroll_partial_msg):
Output "omp unroll partial" instead of "omp unroll auto".
(optimize_transformation_clauses): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
---
 gcc/omp-transform-loops.cc| 4 ++--
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90   | 2 +-
 .../testsuite/libgomp.fortran/loop-transforms/unroll-6.f90| 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 275a5260dae..c8853bcee89 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -1423,7 +1423,7 @@ print_optimized_unroll_partial_msg (tree c)
   tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
   "replaced consecutive % directives by "
-  "%\n", tree_to_uhwi (unroll_factor));
 }

@@ -1483,7 +1483,7 @@ optimize_transformation_clauses (tree clauses)

  dump_printf_loc (
  MSG_OPTIMIZED_LOCATIONS, dump_loc,
- "removed useless %<omp unroll auto%> directives "
+ "removed useless %<omp unroll partial%> directives "
  "preceding 'omp unroll full'\n");
}
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
index fd687890ee6..dab3f0fb5cf 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -5,7 +5,7 @@ subroutine test1
   implicit none
   integer :: i
   !$omp parallel do collapse(1)
-  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll auto\(24\)'} }
+  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll partial\(24\)'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
index 928ca44e811..91e13ff1b37 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -4,7 +4,7 @@
 subroutine test1
   implicit none
   integer :: i
-  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' 
directives preceding 'omp unroll full'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
index 1df8ce8d5bb..b953ce31b5b 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
@@ -22,7 +22,7 @@ contains

 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
-!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll auto\(50\)'} }
+!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll partial\(50\)'} }
 !$omp unroll partial(10)
 do i = 1,n,step
sum = sum + 1
@@ -36,7 +36,7 @@ contains
 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
 do i = 1,n,step
-   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll 
partial' directives preceding 'omp unroll full'} }
!$omp unroll partial(10)
do j = 1, 1000
   sum = sum + 1
--
2.36.1

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (gomp_for_number_of_iterations):
Always compute "final - init" and do not take absolute value.
Identify non-iterating and infinite loops for constant init,
final, step values for better diagnostic messages, consistent
behaviour in those corner cases, and better testability.
(gomp_for_constant_iterations_p): Add new argument to pass
on information about infinite loops, and ...
(full_unroll): ... use from here to emit a warning and remove
unrolled, known infinite loops consistently.
(process_omp_for): Only print dump message if loop has not
been removed by transformation.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-8.c: New test.
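
The computation described above can be modelled on plain integers as in
the following C sketch. It mirrors the logic of the patch for constant
operands only; the names iteration_count, floor_div, ceil_div and the
enum are invented for this example, and STEP is assumed to be non-zero.

```c
#include <stdbool.h>

enum loop_cond { COND_LT, COND_GT };

/* Division rounding towards negative infinity.  */
static long long
floor_div (long long a, long long b)
{
  long long q = a / b;
  if (a % b != 0 && (a < 0) != (b < 0))
    q--;
  return q;
}

/* Division rounding towards positive infinity.  */
static long long
ceil_div (long long a, long long b)
{
  long long q = a / b;
  if (a % b != 0 && (a < 0) == (b < 0))
    q++;
  return q;
}

/* Number of iterations of "for (i = INIT; i COND FINAL; i += STEP)".
   Returns 0 for a loop that does not iterate and a negative value for
   a loop that never terminates.  */
long long
iteration_count (long long init, long long final, long long step,
                 enum loop_cond cond)
{
  /* Handle the non-iterating case first; the division below could not
     tell it apart from an infinite loop.  */
  if ((cond == COND_GT && init <= final)
      || (cond == COND_LT && final <= init))
    return 0;

  /* Always "final - init", without taking an absolute value.  */
  long long diff = final - init;

  /* Round towards negative infinity when the signs differ, so that a
     loop stepping the wrong way always yields a negative count.  */
  bool signs_differ = (diff < 0) != (step < 0);
  return signs_differ ? floor_div (diff, step) : ceil_div (diff, step);
}
```

The rounding choice matters for a case like init 0, final 1, step -3:
rounding up would give 0 and make the loop look non-iterating, whereas
rounding down gives -1 and identifies it as infinite.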
---
 gcc/omp-transform-loops.cc| 94 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 +++
 2 files changed, 146 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index c8853bcee89..b0645397641 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -153,20 +153,27 @@ subst_defs (tree expr, gimple_seq seq)
   return expr;
 }

-/* Return an expression for the number of iterations of the outermost loop of
-   OMP_FOR. */
+/* Return an expression for the number of iterations of the loop at
+   the given LEVEL of OMP_FOR.
+
+   If the expression is a negative constant, this means that the loop
+   is infinite. This can only be recognized for loops with constant
+   initial, final, and step values.  In general, according to the
+   OpenMP specification, the behaviour is unspecified if the number of
+   iterations does not fit the types used for their computation, and
+   hence in particular if the loop is infinite. */

 tree
 gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
 {
   gcc_assert (!non_rectangular_p (omp_for));
-
   tree init = gimple_omp_for_initial (omp_for, level);
   tree final = gimple_omp_for_final (omp_for, level);
   tree_code cond = gimple_omp_for_cond (omp_for, level);
   tree index = gimple_omp_for_index (omp_for, level);
   tree type = gomp_for_iter_count_type (index, final);
-  tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1);
+  tree incr = gimple_omp_for_incr (omp_for, level);
+  tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);

   init = subst_defs (init, gimple_omp_for_pre_body (omp_for));
   init = fold (init);
@@ -181,34 +188,64 @@ gomp_for_number_of_iterations (const gomp_for *omp_for, 
size_t level)
   diff_type = ptrdiff_type_node;
 }

-  tree diff;
-  if (cond == GT_EXPR)
-diff = fold_build2 (minus_code, diff_type, init, final);
-  else if (cond == LT_EXPR)
-diff = fold_build2 (minus_code, diff_type, final, init);
-  else
-gcc_unreachable ();

-  diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step);
-  diff = fold_build1 (ABS_EXPR, type, diff);
+  /* Identify a simple case in which the loop does not iterate. The
+ computation below could not tell this apart from an infinite
+ loop, hence we handle this separately for better diagnostic
+ messages. */
+  gcc_assert (cond == GT_EXPR || cond == LT_EXPR);
+  if (TREE_CONSTANT (init) && TREE_CONSTANT (final)
+  && ((cond == GT_EXPR && tree_int_cst_le (init, final))
+ || (cond == LT_EXPR && tree_int_cst_le (final, init))))
+return build_int_cst (diff_type, 0);
+
+  tree diff = fold_build2 (minus_code, diff_type, final, init);
+
+  /* Divide diff by the step.
+
+ We could always use CEIL_DIV_EXPR since only non-negative results
+ correspond to valid number of iterations and the behaviour is
+ unspecified by the spec otherwise. But we try to get the rounding
+ right for constant negative values to identify infinite loops
+ more precisely for better warnings. */
+  tree_code div_expr = CEIL_DIV_EXPR;
+  if (TREE_CONSTANT (diff) && TREE_CONSTANT (step))
+{
+  bool diff_is_neg = tree_int_cst_lt (diff, size_zero_node);
+  bool step_is_neg = tree_int_cst_lt (step, size_zero_node);
+  if ((diff_is_neg && !step_is_neg)
+ || (!diff_is_neg && step_is_neg))
+   div_expr = FLOOR_DIV_EXPR;
+}

+  diff = fold_build2 (div_expr, type, diff, step);
   return diff;
 }

-/* Return true if the expression representing the number of iterations for
-   OMP_FOR is a constant expression, false otherwise. */
+/* Return true if the expression representing the number of iterations
+   for OMP_FOR is a non-negative constant and set ITERATIONS to the
+   value of that expression. Otherwise, return false.  Set INFINITE to
+   true if the number of iterations was recognized to be infinite. */

 bool
 gomp_for_constant_iterations_p (gomp_for *omp_for,
-   unsigned HOST_WIDE_INT *iterations)
+ 

[PATCH 0/4] openmp: loop transformation fixes

2023-07-28 Thread Frederik Harwath
Hi,
the following patches contain some fixes from the devel/omp/gcc-13 branch
to the patches that implement the OpenMP 5.1 loop transformation directives
which I have posted in March 2023.

Frederik



Frederik Harwath (4):
  openmp: Fix loop transformation tests
  openmp: Fix initialization for 'unroll full'
  openmp: Fix diagnostic message for "omp unroll"
  openmp: Fix number of iterations computation for "omp unroll full"

 gcc/omp-transform-loops.cc| 99 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 ++
 .../gomp/loop-transforms/unroll-8.f90 |  2 +-
 .../gomp/loop-transforms/unroll-9.f90 |  2 +-
 .../matrix-no-directive-unroll-full-1.C   | 13 +++
 .../loop-transforms/matrix-no-directive-1.c   |  2 +-
 .../matrix-no-directive-unroll-full-1.c   |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c|  2 +
 .../loop-transforms/matrix-omp-for-1.c|  2 +-
 .../matrix-omp-parallel-for-1.c   |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c   |  2 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |  2 +
 .../matrix-omp-target-parallel-for-1.c|  2 +-
 ...p-target-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |  2 +
 ...trix-omp-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-simd-1.c   |  2 +
 .../loop-transforms/unroll-1.c|  8 +-
 .../loop-transforms/unroll-non-rect-1.c   |  2 +
 .../loop-transforms/tile-2.f90|  2 +-
 .../loop-transforms/unroll-1.f90  |  2 +
 .../loop-transforms/unroll-6.f90  |  4 +-
 .../loop-transforms/unroll-simd-1.f90 |  3 +-
 23 files changed, 197 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

--
2.36.1



[PATCH 1/4] openmp: Fix loop transformation tests

2023-07-28 Thread Frederik Harwath
libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction 
clause.
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize 
var.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add 
reduction
and initialization.
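
The affected tests are Fortran, but the two fixes translate directly to C.
The sketch below is an analogue written for illustration (count_pairs is
a made-up name and the tile directive of the original test is left out):
the accumulator is initialized before the loop and named in a reduction
clause, so concurrent updates do not race.

```c
/* Count the iterations of a collapsed loop nest.  Both loops visit
   1, 4, 7, 10, so the nest runs 4 * 4 = 16 times.  */
int
count_pairs (void)
{
  int sum = 0;  /* explicit initialization before the parallel loop */
#pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)
    for (int j = 1; j <= 10; j += 3)
      sum += 1;
  return sum;
}
```

Without the reduction clause "sum" would be shared and updated by several
threads at once, so the test result would depend on timing.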
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j

 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions

 integer :: i,j

-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
--
2.36.1



Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-17 Thread Frederik Harwath via Gcc-patches

Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:

On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two
compilers (another one and GCC with this patch series) transformed the
loops in the middle end after the handling of data sharing, one planned
to do so. Yet another vendor had not yet decided where it will be
implemented. Clang currently does everything in the front end, but it
was mentioned that this might change in the future e.g. for code sharing
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require
adjustments of the data sharing clauses, but this is known and
consequentially, no such

When already in the FE we determine how many canonical loops a particular
loop transformation creates, I think the primary changes I'd like to see is
really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and consider
where is the best spot to lower it. I believe for data sharing it is best
done during gimplification before the containing loops are handled, it is
already shared code among all the FEs, I think will make it easier to handle
data sharing right and gimplification is also where doacross processing is
done. While there is restriction that ordered clause is incompatible with
generated loops from tile construct, there isn't one for unroll (unless
"The ordered clause must not appear on a worksharing-loop directive if
the associated loops include the generated loops of a tile directive."
means unroll partial implicitly because partial unroll tiles the loop, but
it doesn't say it acts as if it was a tile construct), so we'd have to handle

  #pragma omp for ordered(2)
  for (int i = 0; i < 64; i++)
    #pragma omp unroll partial(4)
    for (int j = 0; j < 64; j++)
      {
        #pragma omp ordered depend (sink: i - 1, j - 2)
        #pragma omp ordered depend (source)
      }

and I think handling it after gimplification is going to be increasingly
harder. Of course another possibility is ask lang committee to clarify
unless it has been clarified already in 6.0 (but in TR11 it is not).
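
For orientation, here is a hand-written sketch of what a partial unroll by a
factor of 4 roughly amounts to for a single loop such as the inner "j" loop
above. It is only an illustration under stated assumptions: the function
names and the loop body (accumulating 2*j + 1) are hypothetical and not taken
from the patch series, and an OpenMP implementation is free to generate a
different but equivalent loop structure.

```c
#include <assert.h>

/* Reference version: one iteration of the body per loop trip. */
long sum_rolled(int n)
{
  long sum = 0;
  for (int j = 0; j < n; j++)
    sum += 2L * j + 1;
  return sum;
}

/* Hand-unrolled by a factor of 4.  The main loop executes four copies
   of the body per trip; the remainder loop handles the last n % 4
   iterations when n is not a multiple of 4.  */
long sum_unrolled_by_4(int n)
{
  assert(n >= 0);
  long sum = 0;
  int j = 0;
  for (; j + 3 < n; j += 4)
    {
      sum += 2L * (j + 0) + 1;
      sum += 2L * (j + 1) + 1;
      sum += 2L * (j + 2) + 1;
      sum += 2L * (j + 3) + 1;
    }
  for (; j < n; j++)   /* remainder iterations */
    sum += 2L * j + 1;
  return sum;
}
```

Both functions visit the same iterations in the same order; only the number
of trips of the remaining loop changes, which is why the loop is still
available to an enclosing construct after a partial unroll.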


I do not really expect that we will have to handle this. Questions concerning
the correctness of code after applying loop transformations came up several
times since I have been following the design meetings and the result was
always either that nothing will be changed, because the loop transformations
are not expected to ensure the correctness of enclosing directives, or that
the use of the problematic construct in conjunction with loop transformations
will be forbidden. Concerning the use of "ordered" on transformed loops, the
latter approach was suggested for all transformations, cf. issue #3494 in the
private OpenMP spec repository. I see that you have already asked for
clarification on unroll. I suppose this could also be fixed after
gimplification with reasonable effort. But let's just wait for the result of
that discussion before we continue worrying about this.


Also, I think creating temporaries is easier to be done during
gimplification than later.


This has not caused problems with the current approach.


Another option is as you implemented a separate pre-omp-lowering pass,
and another one would be do it in the omplower pass, which has actually
several subpasses internally, do it in the scan phase. Disadvantage of
a completely separate pass is that we have to walk the whole IL again,
while doing it in the scan phase means we avoid that cost. We already
do there similar transformations, scan_omp_simd transforms simd constructs
into if (...) simd else simt and then we process it with normal scan_omp_for
on what we've created. So, if you insist doing it after gimplification
perhaps for compatibility with other non-LLVM compilers, I'd prefer to
do it there rather than in a completely separate pass.


I see. This would be possible. My current approach is indeed rather
wasteful because the pass is not restricted to functions that actually
use loop transformations. I could add an attribute to such functions
that could be used to avoid the execution of the pass and hence
the gimple walk on functions that do not use transformations.


This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why. We try to represent what we see in the source
as OpenMP constructs as those constructs. We already have a precedent
with composite loop constructs, where for the combined constructs which
aren't innermost we temporari

Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-16 Thread Frederik Harwath via Gcc-patches

Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:

On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:

this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C,C++, and Fortran front end
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source language agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements and what is being discussed for further loop
transformations pretty much requires parts of it to be done as soon as possible.
My understanding is that that is where other implementations implement that
too and would also prefer GCC not to be the only implementation that takes
significantly different decision in that case from other implementations


The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two 
compilers (another one and GCC with this patch series) transformed the 
loops in the middle end after the handling of data sharing, one planned 
to do so. Yet another vendor had not yet decided where it will be 
implemented. Clang currently does everything in the front end, but it 
was mentioned that this might change in the future e.g. for code sharing 
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require
adjustments of the data sharing clauses, but this is known and
consequently, no such transformations are planned for OpenMP 6.0. In
particular, the "apply" clause therefore only permits loop-transforming
constructs to be applied to the loops generated from other loop
transformations in TR11.


The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
already need to know given their collapse/ordered how many loops they are
actually associated with and the loop transformation constructs can change
that.
So, I think we need to do the loop transformations in the FEs, that doesn't
mean we need to write everything 3 times, once for each frontend.
Already now, e.g. various stuff is shared between C and C++ FEs in c-family,
though how much can be shared between c-family and Fortran is to be
discovered.
Or at least partially, to the extent that we compute how many canonical
loops the loop transformations result in, what artificial iterators they
will use etc., so that during gimplification we can take all that into
account and then can do the actual transformations later.


The patches in this patch series already do compute how many canonical
loop nests result from the loop transformations in the front end.
This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the
loop transformations may also appear on inner loops of a collapsed
loop nest (i.e. within the collapsed depth), representing the
transformation by OMP_FOR-like constructs would imply that a collapsed
loop nest would have to be broken apart into single loops. Perhaps this
could be handled somehow, but the collapsed loop nest would have to be
re-assembled to meet the expectations of e.g. gimplification.
The clause representation is also much better suited for the upcoming
OpenMP "apply" clause where the transformations will not appear
as directives in front of actual loops but inside of other clauses.
In fact, the loop transformation clauses in the implementation already
specify the level of a loop nest to which they apply and it could
be possible to re-use this handling for "apply".

My initial reaction also was to implement the loop transformations
as OMP_FOR-like constructs and the patch actually introduces an
OMP_LOOP_TRANS construct which is used to represent loops that
are not going to be associated with another OpenMP directive after
the transformation, e.g.

void foo () {
  #pragma omp tile sizes (4, 8, 16)
  for (int i = 0; i < 64; ++i)
  {
...
  }

}
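
As an illustration of what tiling does to a loop nest, the following is a
hand-written sketch of a two-dimensional nest tiled by hand. It is not the
code generated by the patch: the function names, the two-level nest, and the
loop body are assumptions made for the example (the snippet above elides its
loop nest and uses three sizes).

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Untiled reference: accumulate a value that depends on (i, j). */
long visit_plain(int ni, int nj)
{
  long sum = 0;
  for (int i = 0; i < ni; i++)
    for (int j = 0; j < nj; j++)
      sum += (long) i * nj + j;
  return sum;
}

/* Hand-tiled version with tile sizes ti x tj.  The two outer loops step
   from tile to tile; the two inner loops walk the iterations inside one
   tile.  The min_int calls clip partial tiles at the borders.  */
long visit_tiled(int ni, int nj, int ti, int tj)
{
  assert(ti > 0 && tj > 0);
  long sum = 0;
  for (int ii = 0; ii < ni; ii += ti)
    for (int jj = 0; jj < nj; jj += tj)
      for (int i = ii; i < min_int(ii + ti, ni); i++)
        for (int j = jj; j < min_int(jj + tj, nj); j++)
          sum += (long) i * nj + j;
  return sum;
}
```

Tiling n loops yields 2n loops, so a loop transformation changes the depth of
the nest that an enclosing construct sees. This is the reason why the number
of resulting loops has to be known early.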

You suggest to implement the loop transformations during gimplification.
I am not sure if gimplification is actually well-suited to implement the 
depth-first evaluation of the loop transformations. I also believe that 
gimplification already handles too many things which conceptually are 
not related to the translation to GIMPLE. Having a separate pass seems 
to be the right move to achieve a better separation of concerns. I think 
this will be even more important in the future as the size of the loop 
transformation implementation keeps growing. As you mention below, 
several new constructs are already planned.



For C, I thi

[PATCH] Docs, OpenMP: Small fixes to internal OMP_FOR doc

2023-04-19 Thread Frederik Harwath via Gcc-patches

Hi Sandra,
the OMP_FOR documentation says that the loop index variable
must be signed and it does not list "!=" in the allowed conditional
expressions. But there is nothing that would automatically cast an
unsigned variable to signed or that converts the "!=", as you can see
from the dump for this program:

int main ()
{
#pragma omp for
for (unsigned i = 0; i != 10; i++) {}
}

The 005t.gimple dump is:

int __GIMPLE ()
{
  int D_2064;

  {
    {
  unsigned int i;

  #pragma omp for private(i)
  for (i = 0u; i != 10u; i = i + 1u)
    }
  }
  D_2064 = 0;
  return D_2064;
}

(Strictly speaking, the OMP_FOR is represented as a gomp_for at this point,
but this does not really matter.)
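
A slightly larger variant of the program above can be used to check the trip
count of such a loop at run time. This is only a sketch: the function name
and the counting body are made up for illustration, and the code behaves the
same with or without -fopenmp.

```c
#include <assert.h>

/* Count the iterations of a worksharing loop that uses an unsigned
   index variable and a "!=" condition.  */
unsigned count_iterations(unsigned n)
{
  unsigned count = 0;
  #pragma omp parallel for reduction(+:count)
  for (unsigned i = 0; i != n; i++)
    count++;
  /* The loop runs exactly n times; no conversion to a signed index
     or to a "<" condition is needed for this to hold.  */
  assert(count == n);
  return count;
}
```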

Can I commit the patch?

Best regards,
Frederik
From 8af01114c295086526a67f56f6256fc945b1ccb5 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 19 Apr 2023 13:18:55 +0200
Subject: [PATCH] Docs, OpenMP: Small fixes to internal OMP_FOR doc.

gcc/ChangeLog:

	* doc/generic.texi (OpenMP): Add != to allowed
	conditions and state that vars can be unsigned.

	* tree.def (OMP_FOR): Likewise.
---
 gcc/doc/generic.texi | 4 ++--
 gcc/tree.def | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 2c14b7abce2..8b2882da4fe 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -2323,7 +2323,7 @@ Operand @code{OMP_FOR_INIT} is a vector containing iteration
 variable initializations of the form @code{VAR = N1}.
 
 Operand @code{OMP_FOR_COND} is vector containing loop
-conditional expressions of the form @code{VAR @{<,>,<=,>=@} N2}.
+conditional expressions of the form @code{VAR @{<,>,<=,>=,!=@} N2}.
 
 Operand @code{OMP_FOR_INCR} is a vector containing loop index
 increment expressions of the form @code{VAR @{+=,-=@} INCR}.
@@ -2349,7 +2349,7 @@ adjust their data-sharing attributes and diagnose errors.
 @code{OMP_FOR_ORIG_DECLS} is a vector field, with each element holding
 a list of @code{VAR_DECLS} for the corresponding collapse level.
 
-The loop index variable @code{VAR} must be a signed integer variable,
+The loop index variable @code{VAR} must be an integer variable,
 which is implicitly private to each thread.  For rectangular loops,
 the bounds @code{N1} and @code{N2} and the increment expression
 @code{INCR} are required to be loop-invariant integer expressions
diff --git a/gcc/tree.def b/gcc/tree.def
index ee02754354f..90ceeec0b51 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1159,7 +1159,7 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_statement, 2)
variable initializations of the form VAR = N1.
 
Operand 3: OMP_FOR_COND is vector containing loop
-   conditional expressions of the form VAR {<,>,<=,>=} N2.
+   conditional expressions of the form VAR {<,>,<=,>=,!=} N2.
 
Operand 4: OMP_FOR_INCR is a vector containing loop index
increment expressions of the form VAR {+=,-=} INCR.
@@ -1185,7 +1185,7 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_statement, 2)
OMP_FOR_ORIG_DECLS is a vector field, with each element holding
a list of VAR_DECLS for the corresponding collapse level.
 
-   The loop index variable VAR must be a signed integer variable,
+   The loop index variable VAR must be an integer variable,
which is implicitly private to each thread.  For rectangular loops,
the bounds N1 and N2 and the increment expression
INCR are required to be loop-invariant integer expressions
-- 
2.36.1



Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive

2023-04-06 Thread Frederik Harwath via Gcc-patches

Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):
   

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test



Thank you for reporting the failures! They are caused by mistakes in the 
test code, not the implementation. I have attached a patch which fixes 
the failures.


I have been able to reproduce the failures with -m32. With the patch 
they went away, even with 100 repeated test executions ;-).
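
The kind of mistake fixed by the patch can be sketched in C as follows. The
Fortran tests themselves are not reproduced here: the function name and the
loop body (adding i + j) are assumptions for illustration, and only the loop
bounds mirror the "do i = 1,10,3" loops of the tests.

```c
#include <assert.h>

/* Sum i + j over a strided two-dimensional iteration space.  */
long strided_pair_sum(int lo, int hi, int step)
{
  assert(step > 0);
  /* The accumulator must be initialized before the loop; leaving it
     uninitialized gives a result that depends on stack contents.  */
  long sum = 0;
  /* Without reduction(+:sum) all threads would update the shared
     variable concurrently, which is a data race.  */
  #pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = lo; i <= hi; i += step)
    for (int j = lo; j <= hi; j += step)
      sum += i + j;
  return sum;
}
```

Both kinds of mistake can stay unnoticed for a long time because the result
is often correct by chance, which would be consistent with the failures
showing up only for '-m32' at '-O0'.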



Best regards,

Frederik
From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
	* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
	* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
	and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j
 
 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
 integer :: i,j
 
-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
-- 
2.36.1



[PATCH 7/7] openmp: Add C/C++ support for loop transformations on inner loops

2023-03-24 Thread Frederik Harwath
Add the parsing of loop transformations on inner loops of a loop-nest.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(c_parser_omp_tile): ... adjust use here,
(c_parser_omp_unroll): ... and here,
(c_parser_omp_for_loop): ... and here.  Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_is_pragma): New function.
(cp_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(cp_parser_omp_tile): ... adjust use here,
(cp_parser_omp_unroll): ... and here,
(cp_parser_omp_for_loop): ... and here.  Stop treating loop

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
matrix-* tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
New header file for new tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
New test.
---
 gcc/c/c-parser.cc |  35 +++-
 gcc/cp/parser.cc  |  88 ++--
 .../loop-transforms/imperfect-loop-nest.c |  12 ++
 .../gomp/loop-transforms/unroll-inner-1.c |  15 ++
 .../gomp/loop-transforms/unroll-inner-2.c |  31 +++
 .../gomp/loop-transforms/unroll-non-rect-1.c  |  37 
 .../gomp/loop-transforms/unroll-non-rect-2.c  |  22 ++
 .../libgomp.c++/loop-transforms/tile-1.C  |  52 -
 .../loop-transforms/matrix-1.h|  70 +++
 .../loop-transforms/matrix-constant-iter.h|  71 +++
 .../loop-transforms/matrix-helper.h   |  19 ++
 .../loop-transforms/matrix-no-directive-1.c   |  11 +
 .../matrix-no-directive-unroll-full-1.c   |  13 ++
 .../matrix-omp-distribute-parallel-for-1.c|   6 +
 .../loop-transforms/matrix-omp-for-1.c|  13 ++
 .../matrix-omp-parallel-for-1.c   |  13 ++
 .../matrix-omp-parallel-masked-taskloop-1.c   |   6 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |   6 +
 .../matrix-omp-target-parallel-for-1.c|  13 ++
 ...p-target-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |   6 +
 ...trix-omp-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-simd-1.c   |   6 +
 .../matrix-transform-variants-1.h | 191 ++
 .../loop-transforms/unroll-non-rect-1.c   | 129 
 25 files changed, 801 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
 delete mode 100644 

[PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops

2023-03-24 Thread Frederik Harwath
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops.  The transformation clauses are extended to
carry the information about the level of the loop nest at which a
transformation should be applied.  The middle end transformation pass
is adjusted to apply the transformations at the correct level of a
loop nest and to take their effect on the loop nest depth into
account.

gcc/fortran/ChangeLog:

* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_TRANSFORM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used ...
(gfc_trans_omp_do): ... here.

gcc/ChangeLog:

* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix type
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformation functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.

libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/cp/parser.cc  |  12 +-
 gcc/cp/pt.cc  |  12 +-
 gcc/fortran/openmp.cc | 173 --
 gcc/fortran/trans-openmp.cc   |  74 ++--
 gcc/omp-transform-loops.cc| 

[PATCH 5/7] openmp: Add C/C++ support for "omp tile"

2023-03-24 Thread Frederik Harwath
This commit adds the C and C++ front end support for the "omp tile"
directive.

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_directives): Add PRAGMA_OMP_TILE.
* c-pragma.cc (omp_pragmas_simd): Likewise.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_TILE

gcc/c/ChangeLog:

* c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
generalize ...
(c_parser_omp_nested_loop_transform_clauses): ... to this.
(c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(c_parser_omp_tile_sizes): Parse single "sizes" clause.
(c_parser_omp_loop_transform_clause): New function.
(c_parser_omp_tile): New function for parsing "omp tile"
(c_parser_omp_unroll): Adjust to renaming.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
(cp_parser_nested_omp_unroll_clauses): Rename ...
(cp_parser_omp_nested_loop_transform_clauses): ... to this.
(cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(cp_parser_omp_tile_sizes): New function, parses single "sizes" clause.
(cp_parser_omp_tile): New function for parsing "omp tile".
(cp_parser_omp_loop_transform_clause): New function.
(cp_parser_omp_unroll): Adjust to renaming.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
* semantics.cc (finish_omp_clauses): Likewise.

gcc/ChangeLog:

* gimplify.cc (omp_for_drop_tile_clauses): New function, ...
(gimplify_omp_for): ... used here.

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/tile-1.c: New test.
* c-c++-common/gomp/loop-transforms/tile-2.c: New test.
* c-c++-common/gomp/loop-transforms/tile-3.c: New test.
* c-c++-common/gomp/loop-transforms/tile-4.c: New test.
* c-c++-common/gomp/loop-transforms/tile-5.c: New test.
* c-c++-common/gomp/loop-transforms/tile-6.c: New test.
* c-c++-common/gomp/loop-transforms/tile-7.c: New test.
* c-c++-common/gomp/loop-transforms/tile-8.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt.
* g++.dg/gomp/loop-transforms/tile-1.h: New test.
* g++.dg/gomp/loop-transforms/tile-1a.C: New test.
* g++.dg/gomp/loop-transforms/tile-1b.C: New test.
---
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   2 +
 gcc/c/c-parser.cc | 277 ---
 gcc/cp/parser.cc  | 289 +---
 gcc/cp/pt.cc  |   1 +
 gcc/cp/semantics.cc   |  40 +++
 gcc/gimplify.cc   |  28 ++
 .../gomp/loop-transforms/tile-1.c | 164 +
 .../gomp/loop-transforms/tile-2.c | 183 ++
 .../gomp/loop-transforms/tile-3.c | 117 +++
 .../gomp/loop-transforms/tile-4.c | 322 ++
 .../gomp/loop-transforms/tile-5.c | 150 
 .../gomp/loop-transforms/tile-6.c |  34 ++
 .../gomp/loop-transforms/tile-7.c |  31 ++
 .../gomp/loop-transforms/tile-8.c |  40 +++
 .../gomp/loop-transforms/unroll-2.c   |  12 +-
 .../g++.dg/gomp/loop-transforms/tile-1.h  |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1a.C |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1b.C |  27 ++
 .../libgomp.c++/loop-transforms/tile-1.C  |  52 +++
 .../libgomp.c++/loop-transforms/tile-2.C  |  69 
 .../libgomp.c++/loop-transforms/tile-3.C  |  28 ++
 23 files changed, 1823 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
 create mode 100644 

[PATCH 4/7] openmp: Add Fortran support for "omp tile"

2023-03-24 Thread Frederik Harwath
This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
(enum gfc_exec_op): Add EXEC_OMP_TILE.
(loop_transform_p): New declaration.
(struct gfc_omp_clauses): Add "tile_sizes" field.
* dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
(show_omp_node): Handle EXEC_OMP_TILE.
(show_code_node): Likewise.
* match.h (gfc_match_omp_tile): New declaration.
* openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
(match_tile_sizes): New function.
(OMP_TILE_CLAUSES): New macro.
(gfc_match_omp_tile): New function.
(resolve_omp_do): Handle EXEC_OMP_TILE.
(resolve_omp_tile): New function.
(omp_code_to_statement): Handle EXEC_OMP_TILE.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
and ST_OMP_TILE.
(next_statement): Handle ST_OMP_TILE.
(gfc_ascii_statement): Likewise.
(parse_omp_do): Likewise.
(parse_executable): Likewise.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
(loop_transform_p): New function.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Handle EXEC_OMP_TILE.
(gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 +-
 gcc/fortran/gfortran.h|   7 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 373 +-
 gcc/fortran/parse.cc  |  15 +
 gcc/fortran/resolve.cc|   3 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-openmp.cc   |  86 ++--
 gcc/fortran/trans.cc  |   1 +
 gcc/gimplify.cc   |   3 +
 gcc/omp-general.cc|   2 +-
 gcc/omp-transform-loops.cc| 340 +++-
 .../gomp/loop-transforms/tile-1.f90   | 163 
 .../gomp/loop-transforms/tile-1a.f90  |  10 +
 .../gomp/loop-transforms/tile-2.f90   |  80 
 .../gomp/loop-transforms/tile-3.f90   |  18 +
 .../gomp/loop-transforms/tile-4.f90   |  95 +
 

[PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE

2023-03-24 Thread Frederik Harwath
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation
construct "omp tile".

gcc/ChangeLog:

* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
(OMP_CLAUSE_OACC_TILE_LIST): ... this.
(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
(OMP_CLAUSE_TILE_COUNT): Rename to ...
(OMP_CLAUSE_OACC_TILE_COUNT): ... this.
* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_for): Likewise.
* omp-general.cc (omp_extract_for_data): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_oacc_head_mark): Likewise.
* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.cc: Likewise.

gcc/c-family/ChangeLog:

* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
(c_parser_oacc_clause_tile): Likewise.
(c_parser_omp_for_loop): Likewise.
* c-typeck.cc (c_finish_omp_clauses): Likewise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
(cp_parser_omp_clause_collapse): Likewise.
(cp_parser_omp_for_loop): Likewise.
* pt.cc (tsubst_omp_clauses): Likewise.
* semantics.cc (finish_omp_clauses): Likewise.
(finish_omp_for): Likewise.

gcc/fortran/ChangeLog:

* openmp.cc (enum omp_mask2): Adjust to renamings.
(gfc_match_omp_clauses): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
---
 gcc/c-family/c-omp.cc   |  2 +-
 gcc/c/c-parser.cc   | 12 ++--
 gcc/c/c-typeck.cc   |  2 +-
 gcc/cp/parser.cc| 12 ++--
 gcc/cp/pt.cc|  2 +-
 gcc/cp/semantics.cc |  8 
 gcc/fortran/openmp.cc   |  6 +++---
 gcc/fortran/trans-openmp.cc |  4 ++--
 gcc/gimplify.cc |  8 
 gcc/omp-general.cc  |  8 
 gcc/omp-low.cc  |  6 +++---
 gcc/tree-core.h |  2 +-
 gcc/tree-nested.cc  |  4 ++--
 gcc/tree-pretty-print.cc|  4 ++--
 gcc/tree.cc |  2 +-
 gcc/tree.h  | 12 ++--
 16 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 85ba9c528c8..fec7f337772 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1749,7 +1749,7 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses,
 {
  /* Loop clauses.  */
case OMP_CLAUSE_COLLAPSE:
-   case OMP_CLAUSE_TILE:
+   case OMP_CLAUSE_OACC_TILE:
case OMP_CLAUSE_GANG:
case OMP_CLAUSE_WORKER:
case OMP_CLAUSE_VECTOR:
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9d875befccc..e7c9da99552 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14183,7 +14183,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tree list)
   location_t loc;

   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");

   loc = c_parser_peek_token (parser)->location;
   matching_parens parens;
@@ -15349,7 +15349,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   location_t loc;
   tree tile = NULL_TREE;

-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");

   loc = c_parser_peek_token (parser)->location;
@@ -15401,9 +15401,9 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   /* Consume the trailing ')'.  */
   c_parser_consume_token (parser);

-  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  c = build_omp_clause (loc, OMP_CLAUSE_OACC_TILE);
   tile = nreverse (tile);
-  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_OACC_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -20270,10 +20270,10 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
-else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
   {
tiling = true;
-   collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+   collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
   }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 && OMP_CLAUSE_ORDERED_EXPR (cl))
diff --git 

[PATCH 2/7] openmp: Add C/C++ support for "omp unroll" directive

2023-03-24 Thread Frederik Harwath
This commit implements the C and the C++ front end changes to support
the "omp unroll" directive.  The execution of the loop transformation
relies on the pass that has been added as a part of the earlier
Fortran patch.
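
For readers unfamiliar with the directive, the effect of a partial
unroll can be sketched by hand.  The following is only an illustration
of what "#pragma omp unroll partial(4)" asks for, not the code that the
transformation pass generates; the function names sum_plain and
sum_unrolled_by_4 are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop.  With the directive, the loop below would simply be
   preceded by "#pragma omp unroll partial(4)".  */
long
sum_plain (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Hand-written equivalent of a partial unroll by a factor of 4
   (hypothetical helper, for illustration only).  */
long
sum_unrolled_by_4 (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  size_t i = 0;

  /* Main loop: four copies of the original body per trip.  */
  for (; i + 4 <= n; i += 4)
    {
      sum += a[i];
      sum += a[i + 1];
      sum += a[i + 2];
      sum += a[i + 3];
    }

  /* Remainder loop: the at most three leftover iterations.  */
  for (; i < n; i++)
    sum += a[i];

  return sum;
}
```

Both functions compute the same result for any input; only the loop
structure differs.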

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_UNROLL.
* c-omp.cc: Add "unroll" to omp_directives[].
* c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
pragma_kind and adjust PRAGMA_OMP__LAST_.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_name): Handle "full" and
"partial" clauses.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(c_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(c_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
and PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(c_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.
* parser.cc (cp_parser_omp_clause_name): Handle "full" clause.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(cp_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(cp_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_UNROLL and
OMP_CLAUSE_FULL.
(cp_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(cp_parser_omp_for_loop): Handle "omp unroll" directives
between directive and loop.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(cp_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.

(cp_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Handle PRAGMA_OMP_UNROLL.
* pt.cc (tsubst_omp_clauses): Handle
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_NONE.
(tsubst_expr): Handle OMP_UNROLL.
* semantics.cc (finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
* g++.dg/gomp/loop-transforms/unroll-1.C: New test.
* g++.dg/gomp/loop-transforms/unroll-2.C: New test.
* g++.dg/gomp/loop-transforms/unroll-3.C: New test.
---
 gcc/c-family/c-gimplify.cc|   1 +
 gcc/c-family/c-omp.cc |   6 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   5 +-
 gcc/c/c-parser.cc | 161 -
 gcc/c/c-typeck.cc |   8 +
 gcc/cp/cp-gimplify.cc |   3 +
 gcc/cp/parser.cc  | 164 +-
 gcc/cp/pt.cc  |   4 +
 gcc/cp/semantics.cc   |  56 ++
 .../gomp/loop-transforms/unroll-1.c   | 133 ++
 .../gomp/loop-transforms/unroll-2.c   |  99 +++
 .../gomp/loop-transforms/unroll-3.c   |  18 ++
 .../gomp/loop-transforms/unroll-4.c   |  19 ++
 .../gomp/loop-transforms/unroll-5.c   |  19 ++
 .../gomp/loop-transforms/unroll-6.c   |  20 +++
 .../gomp/loop-transforms/unroll-7.c   | 

[PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-03-24 Thread Frederik Harwath
Hi,
this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C, C++, and Fortran front ends
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source-language-agnostic way.  The "unroll" and "tile" directives are
internally implemented as clauses.  This fits the representation of
collapsed loop nests by a single internal gomp_for construct.  Loop
transformations can be applied to loops at the different levels of
such a loop nest and this can be represented well with the clause
representation.  The transformations can also be applied to loops
which are not going to be associated with any OpenMP directive after
the transformation. This is represented by a new gomp_for kind.  Loops
of this kind are lowered in the transformation pass since they are not
subject to any further OpenMP-specific processing.
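
As a rough illustration of what the "tile" construct requests, a loop
nest annotated with "#pragma omp tile sizes(4, 4)" corresponds to two
outer "floor" loops that step over tiles and two inner "tile" loops
that run within one tile.  The sketch below writes this out by hand in
plain C.  It is not the output of the omp_transform_loops pass, and the
names TILE, min_int, transpose_plain and transpose_tiled are
assumptions made for this example.

```c
#include <assert.h>

#define TILE 4  /* Tile size, matching sizes(4, 4) in the directive.  */

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Untiled reference.  With the directive, this loop nest would be
   preceded by "#pragma omp tile sizes(4, 4)".  */
void
transpose_plain (const int *in, int *out, int rows, int cols)
{
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      out[j * rows + i] = in[i * cols + j];
}

/* Hand-written tiled version (illustration only).  */
void
transpose_tiled (const int *in, int *out, int rows, int cols)
{
  assert (rows >= 0 && cols >= 0);

  /* Floor loops: iterate over the origins of the 4x4 tiles.  */
  for (int i0 = 0; i0 < rows; i0 += TILE)
    for (int j0 = 0; j0 < cols; j0 += TILE)
      /* Tile loops: iterate within one tile, clamped at the edges.  */
      for (int i = i0; i < min_int (i0 + TILE, rows); i++)
        for (int j = j0; j < min_int (j0 + TILE, cols); j++)
          out[j * rows + i] = in[i * cols + j];
}
```

The clamping with min_int handles iteration spaces whose extents are
not multiples of the tile size.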

The patches are roughly presented in the order of their development:
Each construct is implemented in the Fortran front end first including
the middle-end additions/changes, followed by a patch that adds the C
and C++ front end changes.  This initial implementation supports the
loop transformation constructs on the outermost loop of a loop nest
only.  The support for applying the transformations to inner loops is
then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with
both nvptx-none and amdgcn-amdhsa offloading.

Best regards,
Frederik

Frederik Harwath (7):
  openmp: Add Fortran support for "omp unroll" directive
  openmp: Add C/C++ support for "omp unroll" directive
  openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  openmp: Add Fortran support for "omp tile"
  openmp: Add C/C++ support for "omp tile"
  openmp: Add Fortran support for loop transformations on inner loops
  openmp: Add C/C++ support for loop transformations on inner loops

 gcc/Makefile.in   |1 +
 gcc/c-family/c-gimplify.cc|1 +
 gcc/c-family/c-omp.cc |   12 +-
 gcc/c-family/c-pragma.cc  |2 +
 gcc/c-family/c-pragma.h   |7 +-
 gcc/c/c-parser.cc |  403 +++-
 gcc/c/c-typeck.cc |   10 +-
 gcc/cp/cp-gimplify.cc |3 +
 gcc/cp/parser.cc  |  453 -
 gcc/cp/pt.cc  |   15 +-
 gcc/cp/semantics.cc   |  104 +-
 gcc/fortran/dump-parse-tree.cc|   30 +
 gcc/fortran/gfortran.h|   12 +-
 gcc/fortran/match.h   |2 +
 gcc/fortran/openmp.cc |  460 -
 gcc/fortran/parse.cc  |   52 +-
 gcc/fortran/resolve.cc|6 +
 gcc/fortran/st.cc |2 +
 gcc/fortran/trans-openmp.cc   |  187 +-
 gcc/fortran/trans.cc  |2 +
 gcc/gimple-pretty-print.cc|6 +
 gcc/gimple.h  |1 +
 gcc/gimplify.cc   |   79 +-
 gcc/omp-general.cc|   22 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|6 +-
 gcc/omp-transform-loops.cc| 1773 +
 gcc/params.opt|9 +
 gcc/passes.def|1 +
 .../loop-transforms/imperfect-loop-nest.c |   12 +
 .../gomp/loop-transforms/tile-1.c |  164 ++
 .../gomp/loop-transforms/tile-2.c |  183 ++
 .../gomp/loop-transforms/tile-3.c |  117 ++
 .../gomp/loop-transforms/tile-4.c |  322 +++
 .../gomp/loop-transforms/tile-5.c |  150 ++
 .../gomp/loop-transforms/tile-6.c |   34 +
 .../gomp/loop-transforms/tile-7.c |   31 +
 .../gomp/loop-transforms/tile-8.c |   40 +
 .../gomp/loop-transforms/unroll-1.c   |  133 ++
 .../gomp/loop-transforms/unroll-2.c   |   95 +
 .../gomp/loop-transforms/unroll-3.c   |   18 +
 .../gomp/loop-transforms/unroll-4.c   |   19 +
 .../gomp/loop-transforms/unroll-5.c   |   19 +
 .../gomp/loop-transforms/unroll-6.c   |   20 +
 .../gomp/loop-transforms/unroll-7.c   |  144 ++
 .../gomp/loop-transforms/unroll-inner-1.c |   15 +
 .../gomp/loop-transforms/unroll-inner-2.c |   31 +
 .../gomp/loop-transforms/unroll-non-rect-1.c  |   37 +
 .../gomp/loop-transforms/unroll-non-rect-2.c  |   22 +
 .../gomp/loop-transforms/unroll-simd-1.c  |   84 +
 .../g++.dg/gomp/loop-transforms/tile-1.h  |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1a.C |   27 +
 .../g++.

Re: [PATCH 15/40] graphite: Extend SCoP detection dump output

2022-05-18 Thread Harwath, Frederik
Hi Richard,

On Tue, 2022-05-17 at 08:21 +, Richard Biener wrote:
> On Mon, 16 May 2022, Tobias Burnus wrote:
>
> > As requested by Richard: Rediffed patch.
> >
> > Changes: s/.c/.cc/ + some whitespace changes.
> > (At least in my email reader, some  were lost. I also fixed
> > too-long line
> > issues.)
> >
> > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > (macro was removed late in GCC 12 development ? r12-2605-
> > ge41ba804ba5f5c)
> >
> > Otherwise, it should be identical to Frederik's patch, earlier in
> > this thread.
> >
> > On 15.12.21 16:54, Frederik Harwath wrote:
> > > Extend dump output to make understanding why Graphite rejects to
> > > include a loop in a SCoP easier (for GCC developers).
> >
> > OK for mainline?
>
> +  if (printed)
> +fprintf (file, "\b\b");
>
> please find other means of omitting ", ", like by printing it
> _before_ the number but only for the second and following loop
> number.

Done.
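
The idea of emitting the separator before every element except the
first can be sketched outside of GCC as follows.  This is a standalone
illustration that writes into a buffer rather than a dump file;
format_numbers is a hypothetical helper and not the function from the
patch.

```c
#include <assert.h>
#include <stdio.h>

/* Write COUNT numbers from NUMS into BUF as "1, 2, 3".  The ", "
   separator is printed before the second and following numbers, so
   nothing has to be erased afterwards.  Returns the number of
   characters written, or -1 if BUF (of SIZE bytes) is too small.
   Hypothetical helper for illustration only.  */
int
format_numbers (char *buf, size_t size, const int *nums, size_t count)
{
  assert (buf != NULL && size > 0);
  size_t used = 0;
  buf[0] = '\0';

  for (size_t i = 0; i < count; i++)
    {
      int n = snprintf (buf + used, size - used, "%s%d",
                        i > 0 ? ", " : "", nums[i]);
      if (n < 0 || (size_t) n >= size - used)
        return -1;  /* Output error or buffer too small.  */
      used += (size_t) n;
    }

  return (int) used;
}
```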

>
> I'll also note that
>
> +static void
> +print_sese_loop_numbers (FILE *file, sese_l sese)
> +{
> +  bool printed = false;
> +  for (auto loop : loops_list (cfun, 0))
> +{
> +  if (loop_in_sese_p (loop, sese))
> +   fprintf (file, "%d, ", loop->num);
> +  printed = true;
> +}
>
> is hardly optimal.  Please instead iterate over
> sese.entry->dest->loop_father and children instead which you can do
> by passing that as extra argument to loops_list.

Done.

This had to be extended a little bit, because a SCoP
can consist of consecutive loop-nests and iterating
only over "loops_list (cfun, LI_INCLUDE_ROOT,
sese.entry->dest->loop_father)" would output only the loops from the
first loop-nest in the SCoP (cf. the test file scop-22a.c that I added).

>
> +
> +  if (dump_file && dump_flags & TDF_DETAILS)
> +{
> +  fprintf (dump_file, "Loops in SCoP: ");
> +  for (auto loop : loops_list (cfun, 0))
> +   if (loop_in_sese_p (loop, s))
> + fprintf (dump_file, "%d ", loop->num);
> +  fprintf (dump_file, "\n");
> +}
>
> you are duplicating functionality of the function you just added ...
>

Fixed.

> Otherwise looks OK to me.

Can I commit the revised patch?

Thanks for your review,
Frederik

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From fb268a37704b1598a84051c735514ff38adad038 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 18 May 2022 07:59:42 +0200
Subject: [PATCH] graphite: Extend SCoP detection dump output

Extend the dump output to make it easier for GCC developers to
understand why Graphite refuses to include a loop in a SCoP.

gcc/ChangeLog:

	* graphite-scop-detection.cc (scop_detection::can_represent_loop):
	Output reason for failure to dump file.
	(scop_detection::harmful_loop_in_region): Likewise.
	(scop_detection::graphite_can_represent_expr): Likewise.
	(scop_detection::stmt_has_simple_data_refs_p): Likewise.
	(scop_detection::stmt_simple_for_scop_p): Likewise.
	(print_sese_loop_numbers): New function.
	(scop_detection::add_scop): Use from here.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-22a.c: New test.
---
 gcc/graphite-scop-detection.cc   | 184 ---
 gcc/testsuite/gcc.dg/graphite/scop-22a.c |  56 +++
 2 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-22a.c

diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc
index 8c0ee9975579..9792d87ee0ae 100644
--- a/gcc/graphite-scop-detection.cc
+++ b/gcc/graphite-scop-detection.cc
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;
 
 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,27 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }
 
+/* Print the loop numbers of the loops contained in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  b

[PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling

2021-12-16 Thread Frederik Harwath

Adjust the testsuite to changed expectations with the new
Graphite-based "kernels" handling.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Adjust.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Adjust.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Adjust.
* 

[PATCH 39/40] openacc: Check type for references in reduction lowering

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* omp-low.c (lower_oacc_reductions): Only create a reference
if variable has pointer type.
---
 gcc/omp-low.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ae5cdfc5e260..2b8b848ec03a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7639,9 +7639,10 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,

if (omp_privatize_by_reference (orig))
  {
-   outgoing = build_simple_mem_ref (outgoing);
+if (POINTER_TYPE_P (TREE_TYPE (outgoing)))
+ outgoing = build_simple_mem_ref (outgoing);

-   if (!TREE_CONSTANT (incoming))
+if (POINTER_TYPE_P (TREE_TYPE (incoming)))
  incoming = build_simple_mem_ref (incoming);
  }

--
2.33.0



[PATCH 38/40] openacc: fix privatization of by-reference arrays

2021-12-15 Thread Frederik Harwath
From: Tobias Burnus 

Replacing a by-reference variable in a private clause by a local variable
makes sense; however, for arrays, the size is not directly known from the type.
This causes an ICE via create_tmp_var, which indirectly invokes
force_constant_size in this case, but the latter only handled Ada.

gcc/ChangeLog:

* gimplify.c (localize_reductions): Do not create local
variable for privatized arrays.
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a0137089496b..952bc449a7db 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11982,8 +11982,9 @@ localize_reductions (tree clauses, tree body)

if (!lang_hooks.decls.omp_privatize_by_reference (var))
  continue;
-
type = TREE_TYPE (TREE_TYPE (var));
+   if (TREE_CODE (type) == ARRAY_TYPE)
+ continue;
new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));

pr.ref_var = var;
--
2.33.0



[PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels'

2021-12-15 Thread Frederik Harwath
From: Tobias Burnus 

Nearly all variable mapping is moved from 'kernels' to a surrounding
'data kernels' and then 'force_present' mapped for the 'kernels'. However, as
libgomp.oacc-c-c++-common/declare-vla.c shows, moving 'int i, N' will fail as
there is a special case for is_gimple_reg in mapping and that fails badly if
outside a target region (e.g. offloading = false). As those are transferred by
value and not as a pointer, it makes more sense to only map them at
'kernels' and ignore them for 'data kernels'.
Additionally, as e.g. libgomp.oacc-c-c++-common/kernels-decompose-1.c shows,
one still has to handle 'kernels'-declared variables, which are now
declared in 'data kernels' and can be handled as is_gimple_reg.

gcc/
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
is_gimple_reg vars are not yet mapped, fall through to map them as
before the transformation.
(omp_oacc_kernels_decompose_1): Don't map is_gimple_reg vars.
(decompose_kernels_region_body): Use tofrom for is_gimple_reg vars.
(omp_oacc_kernels_decompose_1): Handle is_gimple_reg vars as without
data kernels.

gcc/testsuite/
* gfortran.dg/goacc/declare-3.f95: Update scan-tree-dump-times.
---
 gcc/omp-oacc-kernels-decompose.cc | 9 +++--
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c96207d96250..a6be1f1ed238 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -873,7 +873,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
  else
inner_bind_vars = next;
}
-  else
+  else if (!is_gimple_reg (v))
{
  /* Otherwise, build the map clause.  */
  tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
@@ -1222,7 +1222,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
   if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
{
  tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
- OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+ OMP_CLAUSE_SET_MAP_KIND (present_clause,
+  is_gimple_reg (var)
+  ? GOMP_MAP_TOFROM : GOMP_MAP_FORCE_PRESENT);
  OMP_CLAUSE_DECL (present_clause) = var;
  OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
  OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
@@ -1437,6 +1439,9 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
   region causes runtime errors.  */
break;

+ if (is_gimple_reg (decl))
+   break;
+
  /* For non-artificial variables, and for non-declaration
 expressions like A[0:n], copy the clause to the data
 region.  */
diff --git a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95 b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
index 9127cba6600d..2a1fe0a68465 100644
--- a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
@@ -39,7 +39,7 @@ program test
   use mod_d
   use mod_e

-  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(force_to:b\) map\(force_alloc:a\)$} original } }
+  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(to:b\) map\(alloc:a\)$} original } }
 end program test

 ! { dg-final { scan-tree-dump-times {#pragma acc data} 1 original } }
--
2.33.0



[PATCH 36/40] openacc: Enable reduction variable localization for "kernels"

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* gimplify.c (gimplify_omp_for): Enable localization on
"kernels" regions.
(gimplify_omp_workshare): Likewise.
---
 gcc/gimplify.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index bf37388f947c..a0137089496b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12229,11 +12229,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 && outer->region_type != ORT_ACC_KERNELS)
outer = outer->outer_context;

-  /* FIXME: Reductions only work in parallel regions at present.  We avoid
-doing the reduction localization transformation in kernels regions
-here, because the code to remove reductions in kernels regions cannot
-handle that.  */
-  if (outer && outer->region_type == ORT_ACC_PARALLEL)
+  if (outer && (outer->region_type == ORT_ACC_PARALLEL
+   || (outer->region_type == ORT_ACC_KERNELS
+   && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)))
localize_reductions (OMP_FOR_CLAUSES (for_stmt),
 OMP_FOR_BODY (for_stmt));
 }
@@ -13767,8 +13765,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
 {
   push_gimplify_context ();

-  /* FIXME: Reductions are not supported in kernels regions yet.  */
-  if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+  if (ort == ORT_ACC_PARALLEL
+  || (ort == ORT_ACC_KERNELS
+  && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE))
 localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
--
2.33.0



[PATCH 35/40] Handle references in OpenACC "private" clauses

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
---
 gcc/gimplify.c| 15 
 .../libgomp.oacc-c++/privatized-ref-2.C   | 64 +
 .../libgomp.oacc-c++/privatized-ref-3.C   | 64 +
 .../libgomp.oacc-fortran/privatized-ref-1.f95 | 71 +++
 4 files changed, 214 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index daa69ccf6202..bf37388f947c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11976,6 +11976,21 @@ localize_reductions (tree clauses, tree body)

OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
   }
+else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ continue;
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+  }
 }


diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
new file mode 100644
index 000000000000..3884f163132c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop worker
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 99;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 99)
+  abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang worker
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop vector
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 101;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 101)
+  abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
new file mode 100644
index 000000000000..c1a10cba31b3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+int tmpvar;
+int &tmpref = tmpvar;
+#pragma acc loop gang
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop worker private(tmpref)
+   for (j = 0; j < 256; j++)
+ {
+   tmpref = (i * 256 + j) * 99;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 99)
+  abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+int tmpvar;
+int &tmpref = tmpvar;
+#pragma acc loop gang worker
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop vector private(tmpref)
+   for (j = 0; j < 256; j++)
+ {
+   tmpref = (i * 256 + j) * 101;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 101)
+  abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
new file mode 100644
index 000000000000..fe1520a8078c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
@@ -0,0 +1,71 @@
+! { dg-do run }
+
+program main
+  implicit none
+  integer :: myint
+  integer :: i
+  real :: res(65536), tmp
+
+  res(:) = 0.0
+
+  myint = 5
+  call workers(myint, res)
+
+  do i=1,65536
+tmp = i * 99
+if (res(i) .ne. tmp) stop 1
+ 

[PATCH 34/40] Use more appropriate var in localize_reductions call

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (gimplify_omp_for): Use for_stmt in call to
localize_reductions.
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 04ffbc256442..daa69ccf6202 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12219,7 +12219,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 here, because the code to remove reductions in kernels regions cannot
 handle that.  */
   if (outer && outer->region_type == ORT_ACC_PARALLEL)
-   localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+   localize_reductions (OMP_FOR_CLAUSES (for_stmt),
+OMP_FOR_BODY (for_stmt));
 }

   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
--
2.33.0



[PATCH 33/40] Fix tree check failure with reduction localization

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY
instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY.
---
 gcc/gimplify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9a4331c70d6e..04ffbc256442 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -13753,8 +13753,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)

   /* FIXME: Reductions are not supported in kernels regions yet.  */
   if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
-localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
-OMP_TARGET_BODY (*expr_p));
+localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
   if (gimple_code (g) == GIMPLE_BIND)
--
2.33.0



[PATCH 32/40] Reference reduction localization

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (privatize_reduction): New struct.
(localize_reductions_r, localize_reductions): New functions.
(gimplify_omp_for): Call localize_reductions.
(gimplify_omp_workshare): Likewise.
* omp-low.c (lower_oacc_reductions): Handle localized reductions.
Create fewer temp vars.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
documentation.
* tree.c (omp_clause_num_ops): Bump number of ops for
OMP_CLAUSE_REDUCTION to 6.
(walk_tree_1): Adjust accordingly.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
---
 gcc/gimplify.c  | 102 +++
 gcc/omp-low.c   |  45 +---
 gcc/tree-core.h |   4 +-
 gcc/tree.c  | 137 +---
 gcc/tree.h  |   2 +
 5 files changed, 250 insertions(+), 40 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2ab96e7e182..9a4331c70d6e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -240,6 +240,11 @@ struct gimplify_omp_ctx
   int defaultmap[5];
 };

+struct privatize_reduction
+{
+  tree ref_var, local_var;
+};
+
 static struct gimplify_ctx *gimplify_ctxp;
 static struct gimplify_omp_ctx *gimplify_omp_ctxp;
 static bool in_omp_construct;
@@ -11900,6 +11905,80 @@ gimplify_omp_taskloop_expr (tree type, tree *tp, gimple_seq *pre_p,
   OMP_FOR_CLAUSES (orig_for_stmt) = c;
 }

+/* Helper function for localize_reductions.  Replace all uses of REF_VAR with
+   LOCAL_VAR.  */
+
+static tree
+localize_reductions_r (tree *tp, int *walk_subtrees, void *data)
+{
+  enum tree_code tc = TREE_CODE (*tp);
+  struct privatize_reduction *pr = (struct privatize_reduction *) data;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  switch (tc)
+{
+case INDIRECT_REF:
+case MEM_REF:
+  if (TREE_OPERAND (*tp, 0) == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  if (*tp == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+default:
+  break;
+}
+
+  return NULL_TREE;
+}
+
+/* OpenACC worker and vector loop state propagation requires reductions
+   to be inside local variables.  This function replaces all reference-type
+   reductions variables associated with the loop with a local copy.  It is
+   also used to create private copies of reduction variables for those
+   which are not associated with acc loops.  */
+
+static void
+localize_reductions (tree clauses, tree body)
+{
+  tree c, var, type, new_var;
+  struct privatize_reduction pr;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ {
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL;
+   continue;
+ }
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
+  }
+}
+
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */

 static enum gimplify_status
@@ -12126,6 +12205,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gcc_unreachable ();
 }

+  if (ort == ORT_ACC)
+{
+  gimplify_omp_ctx *outer = gimplify_omp_ctxp;
+
+  while (outer
+&& outer->region_type != ORT_ACC_PARALLEL
+&& outer->region_type != ORT_ACC_KERNELS)
+   outer = outer->outer_context;
+
+  /* FIXME: Reductions only work in parallel regions at present.  We avoid
+doing the reduction localization transformation in kernels regions
+here, because the code to remove reductions in kernels regions cannot
+handle that.  */
+  if (outer && outer->region_type == ORT_ACC_PARALLEL)
+   localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+}
+
   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
  clause for the IV.  */
   if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
@@ -13654,6 +13750,12 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
   || (ort & ORT_HOST_TEAMS) == ORT_HOST_TEAMS)
 {
   push_gimplify_context ();
+
+  /* FIXME: Reductions are not supported in kernels regions yet.  */
+  if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
+OMP_TARGET_BODY (*expr_p));
+
   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
   if (gimple_code (g) == GIMPLE_BIND)
pop_gimplify_context (g);
diff --git a/gcc/omp-low.c 

[PATCH 31/40] graphite: Accept loops without data references

2021-12-15 Thread Frederik Harwath
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index f173e6c4f890..2dcb85508a3d 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -849,19 +849,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}

-  /* Check if all loop nests have at least one data reference.
-???  This check is expensive and loops premature at this point.
-If important to retain we can pre-compute this for all innermost
-loops and reject those when we build a SESE region for a loop
-during SESE discovery.  */
-  if (! loop->inner
- && ! loop_nest_has_data_refs (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << " does not have any data reference.\n");
- return true;
-   }
-
   DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
 }

--
2.33.0



[PATCH 30/40] graphite: Adjust scop loop-nest choice

2021-12-15 Thread Frederik Harwath
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
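
The effect of skipping empty exit blocks can be sketched on a toy model.
This is only an illustration under stated assumptions: Loop, Block and
context_loop are hypothetical stand-ins for GCC's loop, basic_block and
the new scop_context_loop, and find_common_loop is re-implemented here
in simplified form rather than taken from GCC.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for GCC's loop and basic_block structures (hypothetical).
struct Loop {
  const Loop *outer;  // enclosing loop; nullptr for the outermost "fake" loop
  int depth;          // nesting depth; 0 for the fake loop
};

struct Block {
  const Loop *loop_father;           // innermost loop containing this block
  bool trivially_empty;              // true if the block has no real statements
  std::vector<const Block *> preds;  // predecessor blocks
};

// Simplified re-implementation: innermost loop containing both A and B.
// Assumes both loops hang off the same root loop.
const Loop *find_common_loop(const Loop *a, const Loop *b) {
  while (a->depth > b->depth) a = a->outer;
  while (b->depth > a->depth) b = b->outer;
  while (a != b) {
    a = a->outer;
    b = b->outer;
  }
  return a;
}

// Walk back over empty single-predecessor blocks before asking for the
// common loop, so that an empty exit block placed in the fake loop does
// not drag the result out to the fake loop.
const Loop *context_loop(const Block *entry_dest, const Block *exit_src) {
  const Block *bb = exit_src;
  while (bb->trivially_empty && bb->preds.size() == 1)
    bb = bb->preds[0];
  return find_common_loop(entry_dest->loop_father, bb->loop_father);
}
```

With an entry block inside a real loop and an empty exit block that
belongs to the fake loop, the plain find_common_loop call returns the
fake loop, whereas context_loop returns the real enclosing loop.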

gcc/ChangeLog:

* graphite-scop-detection.c (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c| 21 ++---
 gcc/graphite.h   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 010adaabb000..acadf544fadd 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
 conditional if aliasing can be ruled out at runtime and the original
 version of the SCoP, otherwise. */

-  loop_p loop
-  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-  scop->scop_info->region.exit->src->loop_father);
+  loop_p loop = scop_context_loop (scop);
   tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
   tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
   set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 9a5e43a5bfc6..f173e6c4f890 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1774,9 +1791,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-   scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec<sese_l> &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0



[PATCH 29/40] graphite: Tune parameters for OpenACC use

2021-12-15 Thread Frederik Harwath
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.
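
The selection of the effective limit can be sketched as follows.  This
is a minimal illustration, not the code from the patch: the helper name
effective_max_operations and the idea of passing the default and the
raised value as arguments are assumptions made for the example, and no
concrete parameter values are implied.

```cpp
#include <cassert>

// Hypothetical helper: pick the operation limit Graphite should use.
// CONFIGURED is the current parameter value, DEFAULT_VALUE its built-in
// default, and OPENACC_VALUE the raised limit for OpenACC functions.
int effective_max_operations(int configured, int default_value,
                             int openacc_value, bool is_openacc_function) {
  // Override only if the parameter still has its default value, so that
  // an explicit --param=max-isl-operations= setting takes precedence.
  if (configured == default_value && is_openacc_function)
    return openacc_value;
  return configured;
}
```

One consequence of comparing against the default is that a user who
explicitly passes the default value cannot be distinguished from a user
who passes nothing; in this sketch both get the raised limit.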

gcc/ChangeLog:

* graphite-optimize-isl.c (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.c (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c   | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c   | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c   | 23 
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+ by "kernels" loops in existing OpenACC codes.  Raise the values
+ significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 35 /* default value */
+  && oacc_function_p (cfun))
+max_operations = 200;
+
   if (max_operations)
 isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
  dump_user_location_t loc = find_loop_location
(scop->scop_info->region.entry->dest->loop_father);
  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-"loop nest not optimized, optimization timed out "
-"after %d operations [--param max-isl-operations]\n",
-max_operations);
- else
+   {
+  if (oacc_function_p (cfun))
+   {
+ /* Special casing for OpenACC to unify diagnostic messages
+here and in graphite-scop-detection.c. */
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+   "data-dependence analysis of OpenACC loop "
+   "nest "
+   "failed; try increasing the value of "
+   "--param="
+   "max-isl-operations=%d.\n",
+   max_operations);
+}
+  else
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed "
+ "out after %d operations [--param "
+ "max-isl-operations]\n",
+ max_operations);
+}
+  else
dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
 "loop nest not optimized, ISL signalled an error\n");
}
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 234dbe0ec729..9a5e43a5bfc6 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2053,6 +2053,9 @@ determine_openacc_reductions (scop_p scop)
 }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
them to SCOPS.  */

@@ -2106,6 +2109,11 @@ build_scops (vec 

[PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite

2021-12-15 Thread Frederik Harwath
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.

gcc/ChangeLog:

* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index dc55d868cc19..d61210fc2ee9 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "alias.h"
 #include "gimple-range.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
implement a bit more than just PRE here.  All of them piggy-back
@@ -3742,6 +3743,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+ can cause Graphite's dependence analysis to fail.  Without
+ special handling of those dependences in Graphite, it seems
+ better to skip this step if OpenACC loops that need to be handled
+ by Graphite are found.  Note that the full redundancy elimination
+ step of this pass is useful for the purpose of dependence
+ analysis, for instance, because it can remove definitions from
+ SCoPs that would otherwise prevent the creation of runtime alias
+ checks since those may only use definitions that are available
+ before the SCoP. */
+
+  if (oacc_function_p (cfun)
+  && ::graphite_analyze_oacc_function_p (cfun))
+return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0



[PATCH 27/40] openacc: Handle internal function calls in pass_lim

2021-12-15 Thread Frederik Harwath
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls.
This was not a problem until recently, when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
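
The kind of hoisting that is meant here can be illustrated in plain
C++.  This is only a sketch: OmpData is a hypothetical stand-in for the
".omp_data_i" struct, and the two functions show the loop before and
after the invariant struct accesses are moved out by hand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the ".omp_data_i" struct created by outlining.
struct OmpData {
  int n;      // was a scalar loop bound in the original source
  int scale;  // was a scalar used in the loop body
  int *out;   // output array
};

// Before hoisting: the bound and the scalar are re-read through the
// struct pointer in every iteration, which hides their invariance.
void body_unhoisted(OmpData *data) {
  for (int i = 0; i < data->n; ++i)
    data->out[i] = i * data->scale;
}

// After hoisting: the invariant loads happen once, in front of the loop,
// and the loop itself only uses plain scalars and one array.  This is
// valid here because OUT is assumed not to alias the struct itself.
void body_hoisted(OmpData *data) {
  const int n = data->n;
  const int scale = data->scale;
  int *out = data->out;
  for (int i = 0; i < n; ++i)
    out[i] = i * scale;
}
```

Both functions compute the same result; the second form is the one in
which the loop bound and the multiplier are visible as loop-invariant
scalars to a later analysis.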

gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.c (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust
declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def |  2 +-
 gcc/tree-ssa-loop-im.c | 57 --
 gcc/tree-ssa-loop-manip.h  |  2 +-
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index ccd5083145f8..7c9b7b2345fa 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2107,7 +2107,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
 {
   unsigned todo = TODO_update_ssa_only_virtuals;
-  todo |= loop_invariant_motion_in_fun (cfun, false);
+  todo |= loop_invariant_motion_in_fun (cfun, false, false);
   scev_reset ();
   return todo;
 }
diff --git a/gcc/passes.def b/gcc/passes.def
index 681392f8f79f..1da9382bac53 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -250,7 +250,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_laddress);
-  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 4b187c2cdafe..466dc494fb52 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -327,11 +329,23 @@ enum move_pos
Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+  && gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == ARRAY_REF)
+ return MOVE_IMPOSSIBLE;
+}
+
   if (flag_unswitch_loops
   && gimple_code (stmt) == GIMPLE_COND)
 {
@@ -981,7 +995,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1009,7 +1023,7 @@ compute_invariantness (basic_block bb)
   {
stmt = gsi_stmt (bsi);

-   pos = movement_possibility (stmt);
+   pos = movement_possibility 

[PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels

2021-12-15 Thread Frederik Harwath
From: Andrew Stubbs 

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
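
The idea of a runtime alias check that selects between parallel and
sequential execution can be sketched in plain C++.  This is a
conceptual illustration only; ranges_disjoint, choose_parallelism and
shift_copy are hypothetical names, and the real check is generated by
Graphite and consumed by the GOACC_LOOP expansion rather than written
by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the byte ranges [a, a+len_a) and [b, b+len_b) do not overlap.
bool ranges_disjoint(const void *a, std::size_t len_a,
                     const void *b, std::size_t len_b) {
  auto pa = reinterpret_cast<std::uintptr_t>(a);
  auto pb = reinterpret_cast<std::uintptr_t>(b);
  return pa + len_a <= pb || pb + len_b <= pa;
}

// Hypothetical stand-in for adjusting the execution dimensions: keep the
// requested parallelism if no aliasing was found, else fall back to 1.
int choose_parallelism(bool noalias, int requested) {
  return noalias ? requested : 1;
}

// Copies SRC to DST with an increment and returns the parallelism that
// would have been chosen.  The loop itself is sequential in this sketch;
// the return value only records the decision.
int shift_copy(int *dst, const int *src, std::size_t n, int requested) {
  bool noalias = ranges_disjoint(dst, n * sizeof(int), src, n * sizeof(int));
  int lanes = choose_parallelism(noalias, requested);
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] + 1;
  return lanes;
}
```

Calling shift_copy on two separate arrays keeps the requested
parallelism, while calling it with overlapping pointers into the same
array yields 1, which corresponds to sequential execution.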

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.c: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c  | 122 
 gcc/omp-expand.c  |  37 +--
 gcc/omp-offload.c | 271 ++
 .../runtime-alias-check-1.c   |  79 +
 .../runtime-alias-check-2.c   |  90 ++
 5 files changed, 457 insertions(+), 142 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index e820e2c32202..010adaabb000 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1697,6 +1698,127 @@ graphite_oacc_analyze_scop (scop_p scop)
   print_isl_schedule (dump_file, scop->original_schedule);
 }

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  sese_info_p region = scop->scop_info;
+
+  /* Usually there will be a chunking loop with the actual work loop
+inside it.  In some corner cases there may only be one loop.  */
+  loop_p top_loop = region->region.entry->dest->loop_father;
+  loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+  /* Walk back to GOACC_LOOP block.  */
+  basic_block goacc_loop_block = region->region.entry->src;
+
+  /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+OpenACC kernels loop and will need different handling.  */
+  gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+  while (!gsi_end_p (gsitop)
+&& (!is_gimple_call (gsi_stmt (gsitop))
+|| !gimple_call_internal_p (gsi_stmt (gsitop))
+|| (gimple_call_internal_fn (gsi_stmt (gsitop))
+!= IFN_GOACC_LOOP)))
+   gsi_next (&gsitop);
+
+  if (!gsi_end_p (gsitop))
+   {
+ /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+statements.  There ought not be any problematic dependencies because
+the chunk size and step are only computed for very specific purposes.
+They may not be at the very top of the block, but they should be
+found together (the asserts test this assumption). */
+ gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+ gsi_move_after (, );
+ gimple_stmt_iterator gsiinsert = gsibottom;
+ gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+  && gimple_call_internal_p (gsi_stmt (gsitop))
+  && (gimple_call_internal_fn (gsi_stmt (gsitop))
+  == IFN_GOACC_LOOP));
+ gsi_move_after (, );
+
+ /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+Note that these likely depend on some of the hoisted statements.  */
+ tree cond_val = force_gimple_operand_gsi (, cond, true, 
NULL,
+   true, GSI_NEW_STMT);
+
+ /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+ for (int n = -1; n < (int)region->bbs.length (); n++)
+   {
+ /* Cover the region plus goacc_loop_block.  */
+ basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ if (!is_gimple_call (stmt)
+ || 

[PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences

2021-12-15 Thread Frederik Harwath
This commit concerns loops in OpenACC "kernels" regions that the user has
marked with an explicit "independent" clause, but for which Graphite found
data dependences.  A discussion on the private internal OpenACC mailing list
suggested that warning the user about the dependences would be a more
acceptable solution than overriding the user's decision.  This commit
implements that behavior.
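
To illustrate the kind of loop this warning targets, here is a minimal
sketch.  It is not taken from the patch or its testsuite, and the function
name prefix_sum is made up.  Assuming the loop is analyzed by Graphite (that
is, with optimization enabled), a loop of this shape would be a candidate
for the new warning.  Without -fopenacc the pragmas are ignored and the code
runs sequentially.

```c
#include <assert.h>

/* Hypothetical example: an in-place prefix sum.  Iteration i reads
   a[i - 1], which iteration i - 1 has just written, so the loop carries
   a dependence and the "independent" clause is a false promise.  */
void
prefix_sum (int *a, int n)
{
  assert (n >= 0);
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

The compiler still honors the clause; the warning only points out that the
result may differ from sequential execution.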

gcc/ChangeLog:

* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt|  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index b6c46ab63e34..ec76a88f14e3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -850,6 +850,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3458a1acbceb..36dde11f5955 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1900,6 +1900,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+return;
+
+  if (loop->routine)
+return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+{
+  if (dump_enabled_p ())
+   {
+ const dump_user_location_t loc
+   = dump_user_location_t::from_location_t (loop->loc);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "'independent' loop in 'kernels' region has not been "
+  "analyzed (cf. 'graphite' "
+  "dumps for more information).\n");
+   }
+  return;
+}
+
+  if (!can_be_parallel)
+warning_at (loop->loc, 0,
+"loop has \"independent\" clause but data dependences were "
+"found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
programmer-specified partitionings.  OUTER_MASK is the partitioning
this loop is contained within.  Return mask of partitioning
@@ -1951,6 +1996,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
}
}

+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
{
  loop->flags |= OLF_AUTO;
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH 24/40] openacc: Add data optimization pass

2021-12-15 Thread Frederik Harwath
From: Andrew Stubbs 

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted to
"kernels" regions but could be extended to other types of regions.

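As a rough illustration of the scalar case described above, consider the
following sketch.  It is a made-up example (the function name scale and the
variable tmp are not from the patch), and whether the pass rewrites exactly
this clause depends on its analysis, which is not shown here.  The scalar
tmp is never read after the region, so copying it back is unnecessary
("copy" could become "copyin"), and it is always written before it is read
inside the region, so its incoming value is not needed either (it could
become "private").

```c
#include <assert.h>

/* Hypothetical example of a scalar that is mapped with "copy" although
   neither its incoming nor its outgoing value matters.  */
void
scale (const int *in, int *out, int n, int factor)
{
  int tmp = 0;
  assert (n >= 0);
#pragma acc kernels copy(tmp) copyin(in[0:n]) copyout(out[0:n])
  {
    for (int i = 0; i < n; i++)
      {
        tmp = in[i] * factor;   /* Written before any read in the region.  */
        out[i] = tmp;
      }
  }
  /* tmp is not used after the region.  */
}
```
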
gcc/ChangeLog:

* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/Makefile.in   |   1 +
 gcc/doc/gimple.texi   |   2 +
 gcc/gimple-walk.c |  15 +-
 gcc/gimple-walk.h |   6 +
 gcc/omp-data-optimize.cc  | 951 ++
 gcc/passes.def|   1 +
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C| 169 
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h   |   1 +
 .../kernels-decompose-1.c |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   4 +
 14 files changed, 2422 insertions(+), 3 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index debd8047cc85..e876e6ec993c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1515,6 +1515,7 @@ OBJS = \
omp-oacc-kernels-decompose.o \
omp-oacc-neuter-broadcast.o \
omp-simd-clone.o \
+   omp-data-optimize.o \
opt-problem.o \
optabs.o \
optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 5d89dbcc68d5..c8f0b8b2a826 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2770,4 +2770,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index e15fd4697ba1..b6add4394ab2 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
   tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
}

   if (!wi->removed_stmt)
-   gsi_next (&gsi);
+   {
+ if (forward)
+   gsi_next (&gsi);
+ else //TODO Correct?
+   gsi_prev (&gsi);
+ //TODO This could do with some unit testing (see other 'gcc/*-tests.c' files for inspiration), to make sure all the corner cases (removing first/last, for example) work correctly.
+   }
 }

   if (wi)
diff --git 

[PATCH 23/40] Add function for printing a single OMP_CLAUSE

2021-12-15 Thread Frederik Harwath
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
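
The difference between the two dumping behaviors can be pictured with a
plain linked list.  The sketch below is only an analogy: struct clause,
format_head and format_chain are hypothetical names, not GCC APIs.
format_head corresponds to the old behavior (print only the clause at the
head of the chain), format_chain to the behavior after commit 89f4f339130c
(print the whole chain).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a chain of OMP clauses.  */
struct clause
{
  const char *text;
  const struct clause *next;
};

/* Format only the clause at the head of the chain.  */
void
format_head (const struct clause *c, char *buf, size_t len)
{
  assert (c && len > 0);
  snprintf (buf, len, "%s", c->text);
}

/* Format every clause in the chain, separated by spaces.  Output is
   truncated if BUF is too small.  */
void
format_chain (const struct clause *c, char *buf, size_t len)
{
  size_t used = 0;
  assert (len > 0);
  buf[0] = '\0';
  for (; c && used < len; c = c->next)
    {
      int n = snprintf (buf + used, len - used, "%s%s",
                        used ? " " : "", c->text);
      if (n < 0)
        break;
      used += (size_t) n;
    }
}
```

A pass that rewrites one clause at a time wants the first form in its
diagnostics, which is what the new function provides.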

gcc/ChangeLog:

* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 275dc7d8af73..e85370cfe722 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1360,6 +1360,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index dacd256302b2..f9ff0ee1ce0b 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
  bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
  enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0



[PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions

2021-12-15 Thread Frederik Harwath
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
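
The core of the removal step can be sketched outside of GCC as follows.
This is a simplified, self-contained rendering of the loop in the patch:
the DIM_* names stand in for the GOMP_DIM enumeration, and the function
returns a mask of removed dimensions instead of emitting a warning.  Both
of these are assumptions made for the sake of a runnable example.

```c
#include <assert.h>

/* Stand-ins for the GOMP_DIM enumeration and GOMP_DIM_MASK.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(ix) (1u << (ix))

/* Reset every dimension at or below LEVEL that has an explicit size in
   DIMS but is not set in USED.  Return a mask of the dimensions that
   were reset.  */
unsigned
remove_unused_dims (int dims[DIM_MAX], int level, unsigned used)
{
  unsigned removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        dims[ix] = -1;
        removed |= DIM_MASK (ix);
      }
  return removed;
}
```

For a region with num_gangs(32) num_workers(4) in which only gang
parallelism is used, the worker dimension is reset to -1 and is later
defaulted by the existing code.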

gcc/ChangeLog:

* omp-offload.c (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
---
 gcc/omp-offload.c | 51 +--
 .../acc_prof-kernels-1.c  | 18 ---
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2743e90f79a3..392ca56b1f4f 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1097,6 +1097,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+  {
+if (host_compiler)
+  {
+strcat (removed_partitions, axes[ix]);
+strcat (removed_partitions, " ");
+  }
+dims[ix] = -1;
+  }
+  if (removed_partitions[0] != '\0')
+warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+"removed %spartitioning from % region",
+removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
raw attribute.  DIMS is an array of dimensions, which is filled in.
LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1117,6 +1150,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
 {
   purpose[ix] = TREE_PURPOSE (pos);
+
   tree val = TREE_VALUE (pos);
   dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
   pos = TREE_CHAIN (pos);
@@ -1126,14 +1160,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
   if (check
   && warn_openacc_parallelism
-  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-  && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
 {
-  static char const *const axes[] =
-  /* Must be kept in sync with GOMP_DIM enumeration.  */
-   { "gang", "worker", "vector" };
   for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
if (dims[ix] < 0)
  ; /* Defaulting axis.  */
@@ -1144,14 +1179,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
  "region contains %s partitioned code but"
  " is not %s partitioned", axes[ix], axes[ix]);
else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+ {
  /* The dimension is explicitly partitioned to non-unity, but
 no use is made within the region.  */
  warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
  "region is %s partitioned but"
  " does not contain %s partitioned code",
  axes[ix], axes[ix]);
+  }
 }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+ DECL_ATTRIBUTES (fn)))
+oacc_remove_unused_partitioning  (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index ad33f72e2fb6..65c83dce01c9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include 

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" 
} { "" } } 

[PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* graph.c (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
---
 gcc/graph.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index 9acd1d5b95e4..a34356e8a7ec 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -192,6 +192,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
 }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
order to get a good ranking of the nodes.  This function is recursive:
It first prints inner loops, then the body of LOOP itself.  */
@@ -206,17 +210,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
   && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-pp_printf (pp,
-  "\tsubgraph cluster_%d_%d {\n"
-  "\tstyle=\"filled\";\n"
-  "\tcolor=\"darkgreen\";\n"
-  "\tfillcolor=\"%s\";\n"
-  "\tlabel=\"loop %d\";\n"
-  "\tlabeljust=l;\n"
-  "\tpenwidth=2;\n",
-  funcdef_no, loop->num,
-  fillcolors[(loop_depth (loop) - 1) % 3],
-  loop->num);
+{
+  pp_printf (pp,
+ "\tsubgraph cluster_%d_%d {\n"
+ "\tstyle=\"filled\";\n"
+ "\tcolor=\"darkgreen\";\n"
+ "\tfillcolor=\"%s\";\n"
+ "\tlabel=\"loop %d %s\";\n"
+ "\tlabeljust=l;\n"
+ "\tpenwidth=2;\n",
+ funcdef_no, loop->num,
+ fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+ /* This is only meaningful for loops that have been processed
+by Graphite.
+
+TODO Use can_be_parallel_valid_p? */
+ !oacc_get_fn_attrib (cfun->decl)
+ ? ""
+ : loop->can_be_parallel ? "(can_be_parallel = true)"
+ : "(can_be_parallel = false)");
+}

   for (class loop *inner = loop->inner; inner; inner = inner->next)
 draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0



[PATCH 19/40] graphite: Add runtime alias checking

2021-12-15 Thread Frederik Harwath
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
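
Conceptually, the generated code has the shape of the hand-written sketch
below.  This is only an illustration under simplifying assumptions: the
names ranges_disjoint and add_one are made up, the check is a plain
address-range comparison, and a reversed loop stands in for whatever
reordering Graphite might apply.  The actual checks are built by GCC's
create_runtime_alias_checks from the recorded data dependence relations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the N-element int ranges at A and B do not overlap.  */
int
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  size_t bytes = n * sizeof (int);
  return pa + bytes <= pb || pb + bytes <= pa;
}

/* Store SRC[i] + 1 into DST[i].  Return 1 if the transformed version
   ran and 0 if the original version ran.  */
int
add_one (int *dst, const int *src, size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      /* Transformed version: the iteration order has been changed,
         which is only valid when DST and SRC do not overlap.  */
      for (size_t i = n; i-- > 0; )
        dst[i] = src[i] + 1;
      return 1;
    }

  /* Fallback: the original loop with its original iteration order.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Calling add_one (a + 1, a, 3) makes the ranges overlap, so the check fails
and the original loop runs, preserving the sequential result.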

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.c
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.c (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt  |   4 +
 gcc/graphite-isl-ast-to-gimple.c|  60 ++
 gcc/graphite-poly.c |   2 +
 gcc/graphite-scop-detection.c   | 241 +---
 gcc/graphite.h  |   4 +
 gcc/sese.h  |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 328 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 1a5b9bfcca91..b6c46ab63e34 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1673,6 +1673,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 0712d85b67a6..073b471775de 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
 }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+   && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Generated runtime alias check: ");
+  print_generic_expr (dump_file, alias_cond, dump_flags);
+  fprintf (dump_file, "\n");
+}
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  /* SCoP detection has failed to handle the aliasing between some data
+references of the SCoP statically. Generate an alias check that selects
+the newly generated version of the SCoP in the true-branch of the
+conditional if aliasing can be ruled out at runtime and the original
+version of the SCoP, otherwise. */
+
+  loop_p loop
+  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+  scop->scop_info->region.exit->src->loop_father);
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+  tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+  set_ifsese_condition (region->if_region, 

[PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c

2021-12-15 Thread Frederik Harwath
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.

gcc/ChangeLog:
* tree-loop-distribution.c (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.c (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c  | 87 
 gcc/tree-data-ref.h  |  3 ++
 gcc/tree-loop-distribution.c | 87 
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 46f4ffedb483..6a3659dc490c 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2636,6 +2636,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+  fold_convert (sizetype, niters),
+  size_one_node);
+  return size_binop (MULT_EXPR,
+fold_convert (sizetype, DR_STEP (dr)),
+fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+  vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+{
+  ddr_p ddr = (*alias_ddrs)[i];
+  struct data_reference *dr_a = DDR_A (ddr);
+  struct data_reference *dr_b = DDR_B (ddr);
+  tree seg_length_a, seg_length_b;
+
+  if (latch_dominated_by_data_ref (loop, dr_a))
+   seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+  else
+   seg_length_a = data_ref_segment_size (dr_a, niters);
+
+  if (latch_dominated_by_data_ref (loop, dr_b))
+   seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+  else
+   seg_length_b = data_ref_segment_size (dr_b, niters);
+
+  unsigned HOST_WIDE_INT access_size_a
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+  unsigned HOST_WIDE_INT access_size_b
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+  unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+  unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+  dr_with_seg_len_pair_t dr_with_seg_len_pair
+   (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+/* ??? Would WELL_ORDERED be safe?  */
+dr_with_seg_len_pair_t::REORDERED);
+
+  comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+}
+
+  if (tree_fits_uhwi_p (niters))
+factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+"Improved number of alias checks from %d to %d\n",
+alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data references
pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f2..4929b059ddea 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -582,6 +582,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec 

[PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite

2021-12-15 Thread Frederik Harwath
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performance-wise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.c (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-oacc-neuter-broadcast.cc:
Make pass_omp_oacc_neuter_broadcast clonable.
* omp-offload.c (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.c (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.
* tree-pass.h (make_pass_oacc_only): New declaration.
(make_pass_oacc_functions_only): New declaration.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* gcc.dg/tree-ssa/loopclosedphi.c: Likewise.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* 

[PATCH 17/40] graphite: Fix minor mistakes in comments

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 1ad68a1d4735..0712d85b67a6 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
- in strage ways when inserting the stmts from it into different basic
+ in strange ways when inserting the stmts from it into different basic
  blocks one at a time.  */
   auto_vec<gimple *> stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 204d382ed4cc..33d6a98327b8 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -649,14 +649,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
  the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
- data reference DR.  */
+ the reference */
   isl_constraint *c
 = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: Munich; Registration court: Munich,
HRB 106955


[PATCH 16/40] graphite: Rename isl_id_for_ssa_name

2021-12-15 Thread Frederik Harwath
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.

gcc/ChangeLog:

* graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 99ea0327b1a7..204d382ed4cc 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -898,15 +899,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
 space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
+  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0



[PATCH 13/40] Fortran: Delinearize array accesses

2021-12-15 Thread Frederik Harwath
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
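
To make the difference concrete, here is a small C sketch of the two
access forms for a Fortran array a(1:3, 1:4). It is only an
illustration under assumptions: the extents, the function names and
the column-major mapping onto a C array are made up for this example,
and it does not reproduce what trans-array.c emits.

```c
#include <stddef.h>

/* Illustrative extents for a(1:3, 1:4).  */
enum { EXTENT1 = 3, EXTENT2 = 4 };

/* Linearized form: a single flat offset built from explicit
   multiplies and adds (column-major, lower bounds of 1).  */
static int
load_linearized (int *base, int i, int j)
{
  ptrdiff_t offset = (ptrdiff_t) (i - 1) + (ptrdiff_t) (j - 1) * EXTENT1;
  return base[offset];
}

/* Delinearized form: nested array references with one subscript per
   dimension, so the per-dimension structure stays visible.  */
static int
load_delinearized (int (*a)[EXTENT1], int i, int j)
{
  return a[j - 1][i - 1];
}
```

Both functions read the same element. The second form keeps the
subscripts i and j separate, which is the shape that loop
optimizations such as Graphite can analyze more easily.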

Co-Authored-By: Tobias Burnus 

gcc/ChangeLog:

* expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

* lang.opt: Document -param=delinearize.
* trans-array.c (get_class_array_vptr): New function.
(get_array_lbound): New function.
(get_array_ubound): New function.
(gfc_conv_array_ref): Implement main delinearization logic.
(build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/assumed_type_2.f90: Adjust test expectations.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/graphite/block-2.f: Likewise.
* gfortran.dg/graphite/block-3.f90: Likewise.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/graphite/id-9.f: Likewise.
* gfortran.dg/inline_matmul_16.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Likewise.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90|   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   2 +-
 .../gfortran.dg/graphite/block-4.f90  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_16.f90  |   2 +
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 16 files changed, 270 insertions(+), 96 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index eb33643bd770..188905b4fe4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index a202c04c4a25..25c5a5a32c41 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5ceb261b6989..e84b4cb55f05 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank 

[PATCH 15/40] graphite: Extend SCoP detection dump output

2021-12-15 Thread Frederik Harwath
Extend the dump output to make it easier for GCC developers to
understand why Graphite refuses to include a loop in a SCoP.

ChangeLog:

* graphite-scop-detection.c (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
---
 gcc/graphite-scop-detection.c | 188 +-
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+  {
+if (loop_in_sese_p (loop, sese))
+  fprintf (file, "%d, ", loop->num);
+printed = true;
+  }
+  if (printed)
+fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
   if (! next
  || harmful_loop_in_region (next))
{
- if (s)
-   add_scop (s);
+  if (next)
+DEBUG_PRINT (
+dp << "[scop-detection] Discarding SCoP on loops ";
+print_sese_loop_numbers (dump_file, next);
+dp << " because of harmful loops\n";);
+  if (s)
+add_scop (s);
  build_scop_depth (loop);
  s = invalid_sese;
}
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
   || !single_pred_p (loop->latch)
   || exit->src != single_pred (loop->latch)
   || !empty_block_p (loop->latch))
-return false;
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+  return false;
+}
+
+  bool edge_irreducible
+  = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+  return false;
+}
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+  single_exit (loop),
+  &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-&& niter_desc.control.no_overflow
-&& (niter = number_of_latch_executions (loop))
-&& !chrec_contains_undetermined (niter)
-&& graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+ << "Condition: " << niter_desc.assumptions << "\n");
+  return false;
+}
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+  return false;
+}
+  if (!niter_desc.control.no_overflow)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+  return false;
+}
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter chrec contains undetermined coefficients.\n");
+  return false;
+}
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter expression cannot be represented: "
+  << niter << "\n");
+  return false;
+}
+
+  return true;
 }

 /* Return true 

[PATCH 12/40] Relax some restrictions on the loop bound in kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration.  Formerly the kernels loop
annotator permitted only constants and variables not modified in the
loop body in the loop bound expression.  This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.
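
The semantic difference can be seen in a small C sketch. The names
bound, sum_c_semantics and sum_hoisted_bound are invented for this
illustration, and the second function only mimics by hand what
precomputing the bound means; it is not the code the annotation
produces.

```c
/* Counts how often the loop bound expression is evaluated.  */
static int bound_evaluations;

static int
bound (int n)
{
  bound_evaluations++;
  return n;
}

/* C/C++ semantics: the end test, and with it the bound expression,
   is evaluated before every iteration.  */
static long
sum_c_semantics (int n)
{
  long sum = 0;
  for (int i = 0; i < bound (n); i++)
    sum += i;
  return sum;
}

/* OpenACC-style semantics, written out by hand: the bound is computed
   once before entering the loop.  */
static long
sum_hoisted_bound (int n)
{
  long sum = 0;
  const int limit = bound (n);
  for (int i = 0; i < limit; i++)
    sum += i;
  return sum;
}
```

The two functions agree only because bound has no effect on the loop
and is not affected by it. That is the property the relaxed check
tries to establish for the bound expression before a loop is
annotated.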

2020-08-30  Sandra Loosemore  

gcc/c-family/
* c-omp.c (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.
---
 gcc/c-family/c-omp.c  | 120 --
 .../goacc/kernels-loop-annotation-21.c|  42 ++
 .../goacc/kernels-loop-annotation-22.c|  41 ++
 3 files changed, 194 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e73fb5d01f7e..dc63d304ca67 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3165,6 +3165,116 @@ is_local_var (tree decl)
  && !TREE_ADDRESSABLE (decl));
 }

+/* EXP is a loop bound expression for a comparison against local
+   variable DECL.  Check whether this is potentially valid in an OpenACC loop
+   context, namely that it can be precomputed when entering the loop
+   construct per the OpenACC specification.  Local variables referenced
+   in both DECL and EXP that may not be modified in the body of the loop
+   are added to the list in INFO to be checked later.
+
+   FIXME: Ideally we would like to make this test permissive rather than
+   restrictive, and allow the later conversion of the "auto" attribute to
+   either "seq" or "independent" to make the determination using dataflow,
+   alias analysis, etc rather than a tree traversal.  But presently it does
+   not do that and always just hoists the loop bound expression.  So the
+   current implementation only considers expressions involving unmodified
+   local variables and constants, using a tree walk.  */
+
+static tree
+end_test_ok_for_annotation_r (tree *tp, int *walk_subtrees,
+ void *data)
+{
+  tree exp = *tp;
+  struct annotation_info *info = (struct annotation_info *) data;
+
+  switch (TREE_CODE_CLASS (TREE_CODE (exp)))
+{
+case tcc_constant:
+  /* Constants are trivially known to be invariant.  */
+  return NULL_TREE;
+
+case tcc_declaration:
+  if (is_local_var (exp))
+   {
+ tree t;
+ /* Add it to the list of variables that can't be modified in the
+loop, only if not already present.  */
+ for (t = info->vars; t && TREE_VALUE (t) != exp;
+  t = TREE_CHAIN (t))
+   ;
+ if (!t)
+   info->vars = tree_cons (NULL_TREE, exp, info->vars);
+ return NULL_TREE;
+   }
+  else if (TREE_CODE (exp) == VAR_DECL && TREE_READONLY (exp))
+   return NULL_TREE;
+  else if (TREE_CODE (exp) == FUNCTION_DECL)
+   return NULL_TREE;
+  break;
+
+case tcc_unary:
+case tcc_binary:
+case tcc_comparison:
+  /* Allow arithmetic expressions and comparisons provided
+that the operands are good.  */
+  return NULL_TREE;
+
+default:
+  /* Handle some special cases.  */
+  switch (TREE_CODE (exp))
+   {
+   case COND_EXPR:
+   case TRUTH_ANDIF_EXPR:
+   case TRUTH_ORIF_EXPR:
+   case TRUTH_AND_EXPR:
+   case TRUTH_OR_EXPR:
+   case TRUTH_XOR_EXPR:
+   case TRUTH_NOT_EXPR:
+ /* ?: and boolean operators are OK.  */
+ return NULL_TREE;
+
+   case CALL_EXPR:
+ /* Allow calls to constant functions with invariant operands.  */
+ {
+   tree fndecl = get_callee_fndecl (exp);
+   if (fndecl && TREE_READONLY (fndecl))
+ return NULL_TREE;
+ }
+ break;
+
+   case ADDR_EXPR:
+ /* We can expect addresses of things to be invariant.  */
+ return NULL_TREE;
+
+   default:
+ break;
+   }
+}
+
+  /* Reject anything else.  */
+  *walk_subtrees = 0;
+  return exp;
+}
+
+static bool
+end_test_ok_for_annotation (tree decl, tree exp,
+   struct annotation_info *info)
+{
+  /* Traversal returns NULL_TREE if all is well.  */
+  if (!walk_tree (&exp, end_test_ok_for_annotation_r, info, NULL))
+{
+  /* So far, so good.  Check the decl against any variables collected
+in the exp.  */
+  tree t;
+  for (t = info->vars; t; t = TREE_CHAIN (t))
+   if 

[PATCH 11/40] Clean up loop variable extraction in OpenACC kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test.  However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable.  In particular, it's ambiguous when both operands to the
end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer.  Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.
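
The ambiguity can be shown with a minimal C sketch. The function
names are made up for this example, and it only shows the source
forms involved, not the trees the front ends build.

```c
/* The loop variable is the left operand of the end test.  */
static int
count_var_on_left (int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    count++;
  return count;
}

/* The same loop with the sense of the comparison switched.  Both i
   and n are local variables, so the end test alone does not say which
   one is the loop variable; the initializer "i = 0" does.  */
static int
count_var_on_right (int n)
{
  int count = 0;
  for (int i = 0; n > i; i++)
    count++;
  return count;
}
```

Both functions run the same number of iterations, so an analysis that
starts from the initializer can treat them alike.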

2020-08-30  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here.  Allow the loop
variable as either operand to the condition.
---
 gcc/c-family/c-omp.c | 196 +--
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e7c27f45e888..e73fb5d01f7e 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3174,86 +3174,26 @@ static tree (*lang_specific_unwrap_initializer) (tree);

 /* Try to annotate the given NODE, which must be a FOR_STMT, with a
"#pragma acc loop auto" annotation.  In practice, this means
-   building an OMP_FOR node for it.  PREV_STMT is the statement
-   immediately before the loop, which may be used as the loop's
-   initialization statement.  Annotating the loop may fail, in which
-   case INFO is used to record the cause of the failure and the
-   original loop remains unchanged.  This function returns the
-   transformed loop if the transformation succeeded, the original node
-   otherwise.  */
+   building an OMP_FOR node for it.  DECL and INIT are the
+   previously-verified iteration variable and initializer.  Annotating
+   the loop may fail, in which case INFO is used to record the cause
+   of the failure and the original loop remains unchanged.  This
+   function returns the transformed loop if the transformation
+   succeeded, the original node otherwise.  */

 static tree
-annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi,
+annotate_for_loop (tree node, tree decl, tree init,
   struct annotation_info *info)
 {
   gcc_checking_assert (TREE_CODE (node) == FOR_STMT);

   location_t loc = EXPR_LOCATION (node);
   tree cond = FOR_COND (node);
+  tree incr = FOR_EXPR (node);
+
+  gcc_assert (decl);
   gcc_assert (cond);
-  tree decl = TREE_OPERAND (cond, 0);
   gcc_assert (decl && TREE_CODE (decl) == VAR_DECL);
-  tree init = FOR_INIT_STMT (node);
-  tree prev_stmt = NULL_TREE;
-  bool unlink_prev = false;
-  bool fix_decl = false;
-
-
-  /* Both the C and C++ front ends normally put the initializer in the
- statement list just before the FOR_STMT instead of in FOR_INIT_STMT.
- If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out
- because the code below won't handle it.  */
-  if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR)
-{
-  do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE);
-  return node;
-}
-
-  /* Examine the statement before the loop to see if it is a
- valid initializer.  It must be either a MODIFY_EXPR or VAR_DECL,
- possibly wrapped in language-specific structure.  */
-  if (init == NULL_TREE && prev_tsi != NULL)
-{
-  prev_stmt = tsi_stmt (*prev_tsi);
-
-  /* Call the language-specific hook to unwrap prev_stmt.  */
-  if (prev_stmt)
-   prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt);
-
-  /* See if we have a valid MODIFY_EXPR.  */
-  if (prev_stmt
- && TREE_CODE (prev_stmt) == MODIFY_EXPR
- && TREE_OPERAND (prev_stmt, 0) == decl
- && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1)))
-   {
- init = prev_stmt;
- unlink_prev = true;
-   }
-  else if (prev_stmt == decl
-  && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl)))
-   {
- /* If the preceding statement is the declaration of the loop
-variable with its initialization, build an assignment
-expression for the loop's initializer.  */
- init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl,
-DECL_INITIAL (decl));
- /* We need to remove the initializer from the decl if we
-end up using the init we just built instead.  */
- fix_decl = true;
-   }
-}
-
-  if (init == NULL_TREE)
-/* There is nothing we can do to find the correct init statement for
-   this loop, but c_finish_omp_for insists on having one and would fail
-   otherwise.  In that case, we would just return node.  Do 

[PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".

2020-08-23   Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
index 41f6307dbb17..42e751dbfb83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -30,4 +30,4 @@ subroutine f (a, b, c)
 !$acc end kernels
 end subroutine f

-! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
index d51482e4685d..6e2e2c41172b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
index 3c4956d70775..03c4234ce7cd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -36,4 +36,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
index 3ec459f0a8df..6aeb3f2fe4d0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -35,4 +35,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
index 91f431cca432..7d1cff64a3d9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -32,4 +32,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git 

[PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematic, we can add code here to recognize
them, or handle them in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Test for
calls to builtins.

gcc/fortran/
* openmp.c (check_expr_for_invalid_calls): Check for intrinsic
functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c  | 10 ---
 gcc/fortran/openmp.c  |  9 ---
 .../goacc/kernels-loop-annotation-20.c| 23 
 .../goacc/kernels-loop-annotation-20.f95  | 26 +++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 30757877eafe..e7c27f45e888 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
   break;

 case CALL_EXPR:
-  /* Direct function calls to functions marked as OpenACC routines are
-allowed.  Reject indirect calls or calls to non-routines.  */
+  /* Direct function calls to builtins and functions marked as
+OpenACC routines are allowed.  Reject indirect calls or calls
+to non-routines.  */
   if (info->state >= as_in_kernels_loop)
{
  tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
@@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
}
  if (fn_decl == NULL_TREE)
do_not_annotate_loop_nest (info, as_invalid_call, node);
- else if (!lookup_attribute ("oacc function",
- DECL_ATTRIBUTES (fn_decl)))
+ else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL)
+  && !lookup_attribute ("oacc function",
+DECL_ATTRIBUTES (fn_decl)))
do_not_annotate_loop_nest (info, as_invalid_call, node);
}
   break;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b0b68b494778..d5d996e378d7 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees,
   switch (expr->expr_type)
 {
 case EXPR_FUNCTION:
-  if (expr->value.function.esym
- && (expr->value.function.esym->attr.oacc_routine_lop
- != OACC_ROUTINE_LOP_NONE))
+  /* Permit calls to Fortran intrinsic functions and to routines
+with an explicitly declared parallelism level.  */
+  if (expr->value.function.isym
+ || (expr->value.function.esym
+ && (expr->value.function.esym->attr.oacc_routine_lop
+ != OACC_ROUTINE_LOP_NONE)))
return 0;
   /* Else fall through.  */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
new file mode 100644
index ..5e3f02845713
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that calls to built-in functions don't inhibit kernels loop
+   annotation.  */
+
+void foo (int n, int *input, int *out1, int *out2)
+{
+#pragma acc kernels
+  {
+int i;
+
+for (i = 0; i < n; i++)
+  {
+   out1[i] = __builtin_clz (input[i]);
+   out2[i] = __builtin_popcount (input[i]);
+  }
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
new file mode 100644
index ..5169a0a1676d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with calls to 

[PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran).

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.
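
Before it looks for inner loops to annotate, the Fortran front end has
to step over the loops already covered by a "collapse" or "tile" clause
on the combined directive.  A minimal sketch of that depth computation,
mirroring the logic in the openmp.c hunk of this patch; the function
name and the plain-integer parameters are hypothetical stand-ins for
the gfc_omp_clauses fields and are not part of the patch:

```c
/* Hypothetical helper, not part of the patch.  The parameters stand in
   for clauses->collapse, the length of clauses->tile_list, and
   clauses->orderedc.  Returns the number of loops in the nest that the
   directive itself covers.  */
int
covered_loop_depth (int collapse, int tile_len, int orderedc)
{
  /* "tile" takes precedence over "collapse".  */
  if (tile_len > 0)
    collapse = tile_len;
  /* An ordered count, if present, overrides both.  */
  if (orderedc)
    collapse = orderedc;
  /* With no clause at all, the directive covers exactly one loop.  */
  if (collapse <= 0)
    collapse = 1;
  return collapse;
}
```

Only loops nested deeper than this depth are handed to the annotation
walk.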

2020-08-19  Sandra Loosemore  

gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
---
 gcc/fortran/openmp.c  | 50 ++-
 .../gfortran.dg/goacc/combined-directives.f90 | 19 +--
 .../goacc/kernels-loop-annotation-18.f95  | 28 +++
 .../goacc/kernels-loop-annotation-19.f95  | 29 +++
 .../goacc/private-explicit-kernels-1.f95  |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95 |  7 ++-
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 243b5e0a9ac6..b0b68b494778 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code 
*parent,

case EXEC_OACC_PARALLEL_LOOP:
case EXEC_OACC_PARALLEL:
-   case EXEC_OACC_KERNELS_LOOP:
case EXEC_OACC_LOOP:
  /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
@@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code 
*parent,
}
  break;

+   case EXEC_OACC_KERNELS_LOOP:
+ /* This is a combined "acc kernels loop" directive.  We want to
+leave the outer loop alone but try to annotate any nested
+loops in the body.  The expected structure nesting here is
+  EXEC_OACC_KERNELS_LOOP
+EXEC_OACC_KERNELS_LOOP
+  EXEC_DO
+EXEC_DO
+  ...body...  */
+ if (code->block)
+   /* Might be empty?  */
+   {
+ gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+ gfc_omp_clauses *clauses = code->ext.omp_clauses;
+ int collapse = clauses->collapse;
+ gfc_expr_list *tile = clauses->tile_list;
+ gfc_code *inner = code->block->next;
+
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+
+ /* We need to skip over nested loops covered by "collapse" or
+"tile" clauses.  "Tile" takes precedence
+(see gfc_trans_omp_do).  */
+ if (tile)
+   {
+ collapse = 0;
+ for (gfc_expr_list *el = tile; el; el = el->next)
+   collapse++;
+   }
+ if (clauses->orderedc)
+   collapse = clauses->orderedc;
+ if (collapse <= 0)
+   collapse = 1;
+ for (int i = 1; i < collapse; i++)
+   {
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+ inner = inner->block->next;
+   }
+ if (inner)
+   /* Loop might have empty body?  */
+   annotate_do_loops_in_kernels (inner->block->next,
+ inner, goto_targets,
+ as_in_kernels_region);
+   }
+ walk_block = false;
+ break;
+
case EXEC_DO_WHILE:
case EXEC_DO_CONCURRENT:
  /* Traverse the body in a special state to allow EXIT statements
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 
b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 956349204f4d..562a4e40cd7d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -139,10 +139,21 @@ end subroutine test

 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. 
collapse.2." 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc 

[PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++).

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.
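
As an illustration, here is a small sketch of the kind of user code this
is meant to handle.  The function name and array sizes are made up for
this example and do not come from the testsuite:

```c
#define ROWS 20
#define COLS 30

/* Hypothetical example, not taken from the patch.  */
void
scale_rows (float a[ROWS][COLS], float b[ROWS][COLS], float factor)
{
  /* Combined directive: only the outer loop is explicitly annotated.  */
#pragma acc kernels loop
  for (int k = 0; k < ROWS; k++)
    /* The inner loop has no directive of its own.  With this patch it
       remains a candidate for an automatic "acc loop auto" annotation
       instead of being skipped.  */
    for (int l = 0; l < COLS; l++)
      a[k][l] = b[k][l] * factor;
}
```

Compiled without -fopenacc, the pragma is ignored and the function runs
sequentially with the same result.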

2020-08-19  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.
---
 gcc/c-family/c-omp.c  | 36 ---
 .../c-c++-common/goacc/combined-directives.c  |  2 +-
 .../goacc/kernels-loop-annotation-18.c| 18 ++
 .../goacc/kernels-loop-annotation-19.c| 19 ++
 4 files changed, 62 insertions(+), 13 deletions(-)
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index fad50da8fbc4..30757877eafe 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3477,18 +3477,30 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int 
*walk_subtrees,
   /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
 purpose; for example, all available levels of parallelism may
-have been used up.  */
-  {
-   struct annotation_info nested_info
- = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
- node, info };
-   if (info->state >= as_in_kernels_region)
- do_not_annotate_loop_nest (info, as_explicit_annotation,
-node);
-   walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
-  (void *) &nested_info, NULL);
-   *walk_subtrees = 0;
-  }
+have been used up.  However, assume that the combined construct
+"#pragma acc kernels loop" means to try to process the whole
+loop nest.
+Note that a single OACC_LOOP construct represents an entire set
+of collapsed loops so we do not have to deal explicitly with the
+collapse clause here, as the Fortran front end does.  */
+  if (info->state == as_in_kernels_region && OACC_LOOP_COMBINED (node))
+   {
+ walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+(void *) info, NULL);
+ *walk_subtrees = 0;
+   }
+  else
+   {
+ struct annotation_info nested_info
+   = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
+   node, info };
+ if (info->state >= as_in_kernels_region)
+   do_not_annotate_loop_nest (info, as_explicit_annotation,
+  node);
+ walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+(void *) &nested_info, NULL);
+ *walk_subtrees = 0;
+   }
   break;

 case FOR_STMT:
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c 
b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c2a3c57b48b8..2519f23d49f0 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -110,7 +110,7 @@ test ()
 // { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop auto" 6 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 
"gimple" } }
 // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c 
b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
new file mode 100644
index ..89ec6447625f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that "acc kernels loop" directive causes annotation of the entire
+   loop nest.  */
+
+void f (float *a, float *b)
+{
+#pragma acc kernels loop
+  for (int k = 0; k < 20; 

[PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-08-19  Sandra Loosemore  

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED.  Update all call sites.
---
 gcc/c/c-parser.c   |  3 +++
 gcc/cp/parser.c|  3 +++
 gcc/fortran/trans-openmp.c | 34 +-
 gcc/tree.h |  5 +
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599ef..1258b48693de 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, 
char *p_name,
omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, 
char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
 if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6a..c834d25b028f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token 
*pragma_tok, char *p_name,
 omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token 
*pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c5588c53c..618e106791e5 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4855,7 +4855,8 @@ typedef struct dovar_init_d {

 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
- gfc_omp_clauses *do_clauses, tree par_clauses)
+ gfc_omp_clauses *do_clauses, tree par_clauses,
+ bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, 
stmtblock_t *pblock,
 case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break;
 case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
 case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
-case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+case EXEC_OACC_LOOP:
+  stmt = make_node (OACC_LOOP);
+  OACC_LOOP_COMBINED (stmt) = combined;
+  break;
 default: gcc_unreachable ();
 }

@@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pblock = &block;
   else
 pushlevel ();
-  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL,
+  true);
   protected_set_expr_location (stmt, loc);
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t 
*pblock,
 omp_do_clauses
   = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO], code->loc);
   body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ? pblock : &block,
-  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses);
+  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (body) != BIND_EXPR)
@@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, 
stmtblock_t *pblock,
 }
   stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO,
   new_pblock, &clausesa[GFC_OMP_SPLIT_DO],
-  omp_clauses);
+  omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (stmt) != BIND_EXPR)
@@ -6496,7 +6501,8 @@ gfc_trans_omp_distribute (gfc_code *code, gfc_omp_clauses 

[PATCH 05/40] Fix bug in processing of array dimensions in data clauses.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was
tickling this bug and giving bogus errors in g++ because it was falling
through to dynamic array code instead of recognizing the constant bounds.
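
The effect can be sketched with a toy model.  The node type and all
names below are hypothetical and greatly simplified; they are not GCC's
tree API, and toy_strip_nops is only loosely analogous to STRIP_NOPS:

```c
#include <stddef.h>

/* Toy node kinds; GCC's real tree codes are far richer.  */
enum toy_code { TOY_INTEGER_CST, TOY_NON_LVALUE_EXPR, TOY_VAR };

struct toy_node
{
  enum toy_code code;
  struct toy_node *operand;	/* Wrapped node for TOY_NON_LVALUE_EXPR.  */
  long value;			/* Payload for TOY_INTEGER_CST.  */
};

/* The failing check: a wrapped constant is not recognized.  */
int
toy_is_constant_naive (const struct toy_node *n)
{
  return n != NULL && n->code == TOY_INTEGER_CST;
}

/* Peel off value-preserving wrappers.  */
static struct toy_node *
toy_strip_nops (struct toy_node *n)
{
  while (n != NULL && n->code == TOY_NON_LVALUE_EXPR)
    n = n->operand;
  return n;
}

/* The fixed check: strip wrappers first, then test for a constant.
   Returns 1 and stores the value in *OUT on success, 0 otherwise.  */
int
toy_constant_bound (struct toy_node *n, long *out)
{
  n = toy_strip_nops (n);
  if (n == NULL || n->code != TOY_INTEGER_CST)
    return 0;
  *out = n->value;
  return 1;
}
```

A constant bound wrapped in one or more wrapper nodes is rejected by the
naive check but accepted once the wrappers are stripped.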

This patch was posted upstream here
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but not yet committed.  It may be that some other fix for this problem
is implemented on mainline instead; check before merging this patch.

2020-03-31  Sandra Loosemore  

gcc/cp/
* semantics.c (handle_omp_array_sections_1): Call STRIP_NOPS
on length and low_bound;
(handle_omp_array_sections): Likewise.
---
 gcc/cp/semantics.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 2443d0327498..c2643d0a7a24 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5145,6 +5145,10 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> 
&types,
   if (length)
 length = mark_rvalue_use (length);
   /* We need to reduce to real constant-values for checks below.  */
+  if (length)
+STRIP_NOPS (length);
+  if (low_bound)
+STRIP_NOPS (low_bound);
   if (length)
 length = fold_simple (length);
   if (low_bound)
@@ -5457,6 +5461,11 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  tree low_bound = TREE_PURPOSE (t);
  tree length = TREE_VALUE (t);

+ if (length)
+   STRIP_NOPS (length);
+ if (low_bound)
+   STRIP_NOPS (low_bound);
+
  i--;
  if (low_bound
  && TREE_CODE (low_bound) == INTEGER_CST
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Commercial register: 
München, HRB 106955


[PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
line numbering.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Add
-fno-openacc-kernels-annotate-loops.
---
 .../gfortran.dg/goacc/classify-kernels-unparallelized.f95| 5 +++--
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95  | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 
b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 2ceae2088070..00aac9aa94ea 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -23,8 +23,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message 
"optimized: assigned OpenACC seq loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop 
parallelism" }
+  ! { dg-message "note: beginning .parloops. part in OpenACC 
.kernels. region" "" { target *-*-* } 24 }
  c(i) = a(f (i)) + b(f (i))
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 
b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index d061a241074b..ba815319abf2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -19,8 +19,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message 
"optimized: assigned OpenACC gang loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop 
parallelism" }
+  ! { dg-message "beginning .parloops. part in OpenACC 
.kernels. region" "" { target *-*-* } 20 }
  c(i) = a(i) + b(i)
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 238482b91a49..04c998d11dad 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,5 +1,6 @@
 ! Test OpenACC 'kernels' construct decomposition.

+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
--
2.33.0



[PATCH 03/40] Kernels loops annotation: Fortran.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.

Co-Authored-By: Gergö Barany 

gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.c: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
---
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/lang.opt  |   8 +
 gcc/fortran/openmp.c  | 364 ++
 gcc/fortran/parse.c   |   9 +
 .../goacc/classify-kernels-unparallelized.f95 |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95|   1 +
 .../gfortran.dg/goacc/common-block-3.f90  |   1 +
 .../gfortran.dg/goacc/kernels-loop-2.f95  |   1 +
 .../goacc/kernels-loop-annotation-1.f95   |  33 ++
 .../goacc/kernels-loop-annotation-10.f95  |  32 ++
 .../goacc/kernels-loop-annotation-11.f95  |  34 ++
 .../goacc/kernels-loop-annotation-12.f95  |  39 ++
 .../goacc/kernels-loop-annotation-13.f95  |  38 ++
 .../goacc/kernels-loop-annotation-14.f95  |  35 ++
 .../goacc/kernels-loop-annotation-15.f95  |  35 ++
 .../goacc/kernels-loop-annotation-16.f95  |  34 ++
 .../goacc/kernels-loop-annotation-2.f95   |  32 ++
 .../goacc/kernels-loop-annotation-3.f95   |  33 ++
 .../goacc/kernels-loop-annotation-4.f95   |  34 ++
 .../goacc/kernels-loop-annotation-5.f95   |  35 ++
 .../goacc/kernels-loop-annotation-6.f95   |  34 ++
 .../goacc/kernels-loop-annotation-7.f95   |  48 +++
 .../goacc/kernels-loop-annotation-8.f95   |  50 +++
 .../goacc/kernels-loop-annotation-9.f95   |  34 ++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |   1 +
 .../goacc/kernels-loop-data-enter-exit.f95|   1 +
 .../goacc/kernels-loop-data-update.f95|   1 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |   1 +
 .../gfortran.dg/goacc/kernels-loop-n.f95  |   1 +
 .../gfortran.dg/goacc/kernels-loop.f95|   1 +
 .../kernels-parallel-loop-data-enter-exit.f95 |   1 +
 32 files changed, 974 insertions(+)
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 

[PATCH 01/40] Kernels loops annotation: C and C++.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
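
A rough sketch of the intended effect, using a made-up example rather
than code from the testsuite: the first function shows a loop as a user
might write it, the second shows the explicit form that the automatic
annotation is meant to be equivalent to.  The function names are
hypothetical.

```c
/* As written by the user: no loop directive inside the kernels region.  */
void
add_plain (int n, const int *a, const int *b, int *c)
{
#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}

/* Roughly how the loop is treated when -fopenacc-kernels-annotate-loops
   is in effect and the loop is a valid candidate.  */
void
add_annotated (int n, const int *a, const int *b, int *c)
{
#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}
```

Both functions compute the same result; the annotation only affects how
the compiler is allowed to parallelize the loop.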

gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.c: Include tree-iterator.h
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): Split from c_finish_omp_for.  Use
annotation_error function.  Code refactoring to avoid destructive
changes that cannot be undone in case of error.
(is_local_var): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(check_and_annotate_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.

gcc/c/
* c-decl.c (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/cp/
* decl.c (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document -fno-openacc-kernels-annotate-loops.

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
---
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 799 --
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/cp/decl.c |  44 +
 gcc/doc/invoke.texi   |  32 +-
 .../goacc/classify-kernels-unparallelized.c   |   1 +
 .../c-c++-common/goacc/classify-kernels.c |   3 +-
 .../kernels-counter-var-redundant-load.c  |   1 +
 .../kernels-counter-vars-function-scope.c |   1 +
 .../goacc/kernels-double-reduction-n.c|   1 +
 

[PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Add
-fno-openacc-kernels-annotate-loops.
---
 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c 
b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafae..0f2d2f0a757b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,5 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
--
2.33.0



[PATCH 00/40] OpenACC "kernels" Improvements

2021-12-15 Thread Frederik Harwath
Hi,
this patch series implements the rework of the OpenACC "kernels"
implementation that was announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  Versions
of the patches have also been committed to the devel/omp/gcc-11 branch
recently.

The patch series contains middle-end changes that modify the "kernels"
loop handling to use Graphite for dependence analysis of loops in
"kernels" regions, as well as new optimizations and adjustments to
existing optimizations to support this analysis. A central step is
contained in the commit titled "openacc: Use Graphite for dependence
analysis in \"kernels\" regions" whose commit message also contains
further explanations. There are also front end changes (cf. the
patches by Sandra Loosemore) that prepare the loops in "kernels"
regions for the middle-end processing and that lift various
restrictions on "kernels" regions.  I have included some dependences
(the patches by Julian Brown) from the devel/omp/gcc-11 branch which
will be re-submitted independently for review.
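
To illustrate what the dependence analysis has to distinguish, here is
a minimal sketch with two made-up loops (the function names are
hypothetical and not part of the series).  The first loop has no
loop-carried dependence, provided the two arrays do not overlap; the
second reads a value written by the previous iteration and therefore
cannot simply be run in parallel.

```c
/* Each iteration touches only in[i] and out[i].  If "in" and "out" do
   not alias, the iterations are independent.  */
void
scale_by_two (int n, const float *in, float *out)
{
#pragma acc kernels copyin (in[0:n]) copyout (out[0:n])
  {
    for (int i = 0; i < n; i++)
      out[i] = 2.0f * in[i];
  }
}

/* Iteration i reads a[i - 1], which iteration i - 1 has just written:
   a loop-carried dependence.  */
void
running_sum (int n, float *a)
{
#pragma acc kernels copy (a[0:n])
  {
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

Whether pointers such as "in" and "out" overlap is in general not known
at compile time, which is the situation the runtime alias checking
patches in this series are concerned with.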

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target.  The patches
should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the
current development stage. I hope that we can discuss some of the
changes before they can be considered for inclusion in GCC during the
next stage 1.

Best regards,
Frederik


Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in   |   2 +
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 915 +++--
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/c/c-parser.c  |   3 +
 gcc/cfgloop.c |   1 +
 gcc/cfgloop.h |   6 +
 gcc/cfgloopmanip.c|   1 +
 gcc/common.opt|   9 +
 gcc/config/nvptx/nvptx.c  |   7 +
 gcc/cp/decl.c |  44 +
 gcc/cp/parser.c   |   3 +
 gcc/cp/semantics.c|   9 +
 gcc/doc/gimple.texi   |   2 +
 gcc/doc/invoke.texi   |  52 +-
 gcc/doc/passes.texi  

[OG11][committed][PATCH 21/22] graphite: Accept loops without data references

2021-11-17 Thread Frederik Harwath
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 99e906a5d120..9311a0e42a57 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -851,19 +851,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}

-  /* Check if all loop nests have at least one data reference.
-???  This check is expensive and loops premature at this point.
-If important to retain we can pre-compute this for all innermost
-loops and reject those when we build a SESE region for a loop
-during SESE discovery.  */
-  if (! loop->inner
- && ! loop_nest_has_data_refs (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << " does not have any data reference.\n");
- return true;
-   }
-
   DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
 }

--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Registration court: München, HRB 106955


[OG11][committed][PATCH 20/22] graphite: Adjust scop loop-nest choice

2021-11-17 Thread Frederik Harwath
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
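
The effect of skipping empty exit blocks can be sketched with a toy
loop tree.  This is only an illustration: Loop, Block and context_loop
are hypothetical stand-ins and not GCC's data structures, and
find_common_loop is re-implemented here in simplified form.

```cpp
#include <cassert>

// Toy stand-ins for GCC's loop tree and CFG; all names are hypothetical.
struct Loop {
  const Loop *outer;  // enclosing loop, nullptr for the fake root loop
  int depth;          // 0 for the fake root loop
};

struct Block {
  const Loop *loop_father;   // innermost loop containing the block
  bool empty;                // true if the block holds no statements
  const Block *single_pred;  // unique predecessor, or nullptr
};

// Innermost loop that contains both A and B.
const Loop *find_common_loop(const Loop *a, const Loop *b) {
  while (a->depth > b->depth) a = a->outer;
  while (b->depth > a->depth) b = b->outer;
  while (a != b) {
    a = a->outer;
    b = b->outer;
  }
  return a;
}

// Walk back over empty exit blocks before asking for the common loop,
// so that an empty block placed in the fake root loop does not decide
// the result.
const Loop *context_loop(const Block *entry_dest, const Block *exit_src) {
  while (exit_src->empty && exit_src->single_pred)
    exit_src = exit_src->single_pred;
  return find_common_loop(entry_dest->loop_father, exit_src->loop_father);
}
```

With an empty exit block in the root loop, the plain find_common_loop
call yields the root loop, whereas context_loop yields the real
enclosing loop.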

gcc/ChangeLog:

* graphite-scop-detection.c (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c| 21 ++---
 gcc/graphite.h   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bdabe588c3d8..ec055a358f39 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
 conditional if aliasing can be ruled out at runtime and the original
 version of the SCoP, otherwise. */

-  loop_p loop
-  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-  scop->scop_info->region.exit->src->loop_father);
+  loop_p loop = scop_context_loop (scop);
   tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
   tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
   set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index afc955cc97eb..99e906a5d120 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1776,9 +1793,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-   scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0



[OG11][committed][PATCH 19/22] graphite: Tune parameters for OpenACC use

2021-11-17 Thread Frederik Harwath
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.
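
The selection rule described above (raise the limit only if the user
left the parameter at its default) can be sketched as follows.  The
helper name and its parameters are hypothetical, and the concrete
values are deliberately passed in rather than hard-coded.

```cpp
#include <cassert>

// Hypothetical helper: pick the operation budget handed to isl.
// USER_VALUE is the value of --param=max-isl-operations, DEFAULT_VALUE
// its built-in default, and OACC_VALUE the raised budget that is used
// for OpenACC functions.
long effective_max_operations(long user_value, long default_value,
                              bool is_oacc_function, long oacc_value) {
  // Only override when the parameter still has its default value, so
  // that an explicit --param on the command line always wins.
  if (is_oacc_function && user_value == default_value)
    return oacc_value;
  return user_value;
}
```

A user who hits the new -fopt-info-missed message can therefore still
choose a larger value explicitly.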

gcc/ChangeLog:

* graphite-optimize-isl.c (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.c (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c   | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c   | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c   | 23 
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+ by "kernels" loops in existing OpenACC codes.  Raise the values
+ significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 35 /* default value */
+  && oacc_function_p (cfun))
+max_operations = 200;
+
   if (max_operations)
 isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
  dump_user_location_t loc = find_loop_location
(scop->scop_info->region.entry->dest->loop_father);
  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-"loop nest not optimized, optimization timed out "
-"after %d operations [--param max-isl-operations]\n",
-max_operations);
- else
+   {
+  if (oacc_function_p (cfun))
+   {
+ /* Special casing for OpenACC to unify diagnostic messages
+here and in graphite-scop-detection.c. */
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+   "data-dependence analysis of OpenACC loop "
+   "nest "
+   "failed; try increasing the value of "
+   "--param="
+   "max-isl-operations=%d.\n",
+   max_operations);
+}
+  else
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed "
+ "out after %d operations [--param "
+ "max-isl-operations]\n",
+ max_operations);
+}
+  else
dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
 "loop nest not optimized, ISL signalled an error\n");
}
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 8b41044bce5e..afc955cc97eb 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2056,6 +2056,9 @@ determine_openacc_reductions (scop_p scop)
   }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
them to SCOPS.  */

@@ -2109,6 +2112,11 @@ build_scops (vec *scops)

[OG11][committed][PATCH 18/22] openacc: Disable pass_pre on outlined functions analyzed by Graphite

2021-11-17 Thread Frederik Harwath
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
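
For readers unfamiliar with the transformation, the following
hand-written pair approximates what the insertion step of PRE does.
It is an illustration in source form only, not compiler output; the
function names are made up.

```cpp
#include <cassert>

// Before: "a[i] + b" is computed on the "flag" path and again after the
// join, so it is redundant on one path only (partially redundant).
int before_pre(const int *a, int i, int b, bool flag) {
  int x = 0;
  if (flag)
    x = a[i] + b;
  int y = a[i] + b;
  return x + y;
}

// After (approximation): the computation is inserted on the path that
// lacked it and a temporary is reused at the join.  The temporary "t"
// is a new definition that later analyses have to reason about.
int after_pre(const int *a, int i, int b, bool flag) {
  int x = 0;
  int t;
  if (flag) {
    t = a[i] + b;
    x = t;
  } else {
    t = a[i] + b;  // inserted computation
  }
  int y = t;
  return x + y;
}
```

Both functions compute the same value; the difference lies only in the
extra definition, which is the kind of additional dependence that the
commit message refers to.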

gcc/ChangeLog:

* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 2aedc31e1d73..b904354e4c78 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
implement a bit more than just PRE here.  All of them piggy-back
@@ -3736,6 +3737,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+ can cause Graphite's dependence analysis to fail.  Without
+ special handling of those dependences in Graphite, it seems
+ better to skip this step if OpenACC loops that need to be handled
+ by Graphite are found.  Note that the full redundancy elimination
+ step of this pass is useful for the purpose of dependence
+ analysis, for instance, because it can remove definitions from
+ SCoPs that would otherwise prevent the creation of runtime alias
+ checks since those may only use definitions that are available
+ before the SCoP. */
+
+  if (oacc_function_p (cfun)
+  && ::graphite_analyze_oacc_function_p (cfun))
+return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0



[OG11][committed][PATCH 17/22] openacc: Handle internal function calls in pass_lim

2021-11-17 Thread Frederik Harwath
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls.
This was not necessary until recently, when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
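
The hoisting of invariant struct accesses can be pictured in source
form as below.  The struct omp_data is a hypothetical stand-in for
".omp_data_i", and the sketch assumes that the array does not overlap
the struct itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the struct that outlining passes to the
// region; former scalars have become fields.
struct omp_data {
  int n;
  int scale;
  int *a;
};

// Not hoisted: the former scalars are re-read through the struct on
// every iteration, so the loop bound and the multiplier look like
// memory accesses.
void scale_unhoisted(omp_data *d) {
  for (int i = 0; i < d->n; ++i)
    d->a[i] = d->a[i] * d->scale;
}

// Hoisted: the invariant loads happen once before the loop, which
// leaves a loop with a plain scalar bound and a single array access.
// This is only valid because the array does not alias *d here.
void scale_hoisted(omp_data *d) {
  const int n = d->n;
  const int scale = d->scale;
  int *a = d->a;
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * scale;
}
```

The second form is the shape in which the loop bound is easy to
describe for scalar evolution analysis.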

gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.c (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def |  2 +-
 gcc/tree-ssa-loop-im.c | 58 --
 gcc/tree-ssa-loop-manip.h  |  2 +-
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 7b799eca805c..d617438910fd 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2096,7 +2096,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
 {
   unsigned todo = TODO_update_ssa_only_virtuals;
-  todo |= loop_invariant_motion_in_fun (cfun, false);
+  todo |= loop_invariant_motion_in_fun (cfun, false, false);
   scev_reset ();
   return todo;
 }
diff --git a/gcc/passes.def b/gcc/passes.def
index 48c9821011f0..d1dedbc287e2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,7 +247,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_laddress);
-  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7de47edbcb30..b392ae609aaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -320,11 +322,23 @@ enum move_pos
Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+  && gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == ARRAY_REF)
+ return MOVE_IMPOSSIBLE;
+}
+
   if (flag_unswitch_loops
   && gimple_code (stmt) == GIMPLE_COND)
 {
@@ -974,7 +988,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1002,7 +1016,7 @@ compute_invariantness (basic_block bb)
   {
stmt = gsi_stmt (bsi);

-   pos = movement_possibility (stmt);
+   pos = movement_possibility (stmt, 

[OG11][committed][PATCH 16/22] openacc: Warn about "independent" "kernels" loops with data-dependences

2021-11-17 Thread Frederik Harwath
This commit concerns loops in OpenACC "kernels" regions that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences would be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
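
A minimal example of the kind of loop this is about might look as
follows.  The function name is made up, and whether the warning
actually fires depends on the compiler build and on Graphite being
able to analyze the loop.  Compiled without -fopenacc, the pragmas are
ignored and the loop simply runs sequentially.

```cpp
#include <cassert>

// Each iteration reads the element written by the previous one, so the
// loop carries a dependence even though the clause claims otherwise.
void prefix_increment(int *a, int n) {
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; ++i)
      a[i] = a[i - 1] + 1;
  }
}
```

With this commit, the "independent" clause is still honored for such a
loop; the user is only told that the claim looks wrong.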

gcc/ChangeLog:

* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt|  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index aa695e56dc48..4c38ed5cf9ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -838,6 +838,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 94a975a88660..b806e36ef515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2043,6 +2043,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+return;
+
+  if (loop->routine)
+return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+{
+  if (dump_enabled_p ())
+   {
+ const dump_user_location_t loc
+   = dump_user_location_t::from_location_t (loop->loc);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "'independent' loop in 'kernels' region has not been "
+  "analyzed (cf. 'graphite' "
+  "dumps for more information).\n");
+   }
+  return;
+}
+
+  if (!can_be_parallel)
+warning_at (loop->loc, 0,
+"loop has \"independent\" clause but data dependences were "
+"found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
programmer-specified partitionings.  OUTER_MASK is the partitioning
this loop is contained within.  Return mask of partitioning
@@ -2094,6 +2139,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
}
}

+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
{
  loop->flags |= OLF_AUTO;
--
2.33.0



[OG11][committed][PATCH 14/22] openacc: Add data optimization pass

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted to
"kernels" regions but could be extended to other types of regions.
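
As an illustration of the scalars this targets, consider the sketch
below.  The function is made up, and the comments describe what the
pass could in principle conclude here, not verified compiler output.
Compiled without -fopenacc, the pragma is ignored.

```cpp
#include <cassert>

// "factor" is only read inside the region, so copying it back out is
// unnecessary (copy could become copyin).  "tmp" is written before it
// is read in every iteration and is not used after the region, so it
// would not need any transfer at all (candidate for "private").
void scale_and_shift(int *a, int n, int factor) {
  int tmp;
#pragma acc kernels copy(a[0:n]) copy(factor, tmp)
  {
    for (int i = 0; i < n; ++i) {
      tmp = a[i] * factor;
      a[i] = tmp + 1;
    }
  }
}
```

Only the array "a" carries results out of the region in this example.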

gcc/ChangeLog:

* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/Makefile.in   |   1 +
 gcc/doc/gimple.texi   |   2 +
 gcc/gimple-walk.c |  15 +-
 gcc/gimple-walk.h |   6 +
 gcc/omp-data-optimize.cc  | 951 ++
 gcc/passes.def|   1 +
 .../goacc/note-parallelism-1-kernels-loops.c  |   7 +-
 ...note-parallelism-1-kernels-straight-line.c |   9 +-
 .../goacc/note-parallelism-kernels-loops.c|  10 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C| 169 
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h   |   1 +
 .../kernels-decompose-1.c |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   4 +
 17 files changed, 2444 insertions(+), 7 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4ebdcdbc5f8c..8c02b85d2a96 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
omp-low.o \
omp-oacc-kernels-decompose.o \
omp-simd-clone.o \
+   omp-data-optimize.o \
opt-problem.o \
optabs.o \
optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e3..a83e17f71a40 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index cd287860994e..66fd491844d7 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
   tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
}

   if (!wi->removed_stmt)
-   gsi_next (&gsi);
+   {
+ if (forward)
+   gsi_next (&gsi);
+   

[OG11][committed][PATCH 15/22] openacc: Add runtime alias checking for OpenACC kernels

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
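
The idea can be modelled outside the compiler as below.  This is a
rough sketch under stated assumptions: no_alias and shift_copy are
hypothetical names, the "worker" count merely stands in for the
execution dimensions, and the real condition is generated by Graphite
from the data references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the n-element ranges starting at P and Q share no element.
bool no_alias(const int *p, const int *q, std::size_t n) {
  const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(q);
  const std::uintptr_t bytes = n * sizeof(int);
  return a + bytes <= b || b + bytes <= a;
}

// Hypothetical model of the dispatch: returns how many "workers" the
// loop would be allowed to use.  The loop is only safe to run in
// parallel when DST and SRC do not overlap; otherwise it falls back to
// a single worker, i.e. sequential execution.
int shift_copy(int *dst, const int *src, std::size_t n, int max_workers) {
  const int workers = no_alias(dst, src, n) ? max_workers : 1;
  for (std::size_t i = 0; i < n; ++i)  // run sequentially in this sketch
    dst[i] = src[i] + 1;
  return workers;
}
```

Calling shift_copy with two separate arrays keeps the full worker
count, while calling it with overlapping pointers into one array
selects the sequential fallback.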

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.c: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c  | 122 ++
 gcc/graphite-scop-detection.c |  18 +-
 gcc/omp-expand.c  |  37 +-
 gcc/omp-offload.c | 413 ++
 .../runtime-alias-check-1.c   |  79 
 .../runtime-alias-check-2.c   |  90 
 6 files changed, 550 insertions(+), 209 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c516170d9493..bdabe588c3d8 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1698,6 +1699,127 @@ graphite_oacc_analyze_scop (scop_p scop)
   print_isl_schedule (dump_file, scop->original_schedule);
 }

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  sese_info_p region = scop->scop_info;
+
+  /* Usually there will be a chunking loop with the actual work loop
+inside it.  In some corner cases there may only be one loop.  */
+  loop_p top_loop = region->region.entry->dest->loop_father;
+  loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+  /* Walk back to GOACC_LOOP block.  */
+  basic_block goacc_loop_block = region->region.entry->src;
+
+  /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+OpenACC kernels loop and will need different handling.  */
+  gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+  while (!gsi_end_p (gsitop)
+&& (!is_gimple_call (gsi_stmt (gsitop))
+|| !gimple_call_internal_p (gsi_stmt (gsitop))
+|| (gimple_call_internal_fn (gsi_stmt (gsitop))
+!= IFN_GOACC_LOOP)))
+   gsi_next (&gsitop);
+
+  if (!gsi_end_p (gsitop))
+   {
+ /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+statements.  There ought not be any problematic dependencies because
+the chunk size and step are only computed for very specific purposes.
+They may not be at the very top of the block, but they should be
+found together (the asserts test this assumption). */
+ gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+ gsi_move_after (, );
+ gimple_stmt_iterator gsiinsert = gsibottom;
+ gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+  && gimple_call_internal_p (gsi_stmt (gsitop))
+  && (gimple_call_internal_fn (gsi_stmt (gsitop))
+  == IFN_GOACC_LOOP));
+ gsi_move_after (, );
+
+ /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+Note that these likely depend on some of the hoisted statements.  */
+ tree cond_val = force_gimple_operand_gsi (, cond, true, NULL,
+   true, GSI_NEW_STMT);
+
+ /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+ for (int n = -1; n < (int)region->bbs.length (); n++)
+   {
+ /* Cover the region plus goacc_loop_block.  */
+ basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ if 

[OG11][committed][PATCH 13/22] Add function for printing a single OMP_CLAUSE

2021-11-17 Thread Frederik Harwath
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
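
To illustrate the difference between dumping the whole clause chain and
dumping only the clause at its head, here is a minimal sketch.  It does not
use GCC's tree or pretty-printer API; the Clause struct and both function
names are hypothetical stand-ins chosen for this example only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an OMP clause chain: a singly linked list.
struct Clause
{
  std::string text;    // printed form of this clause, e.g. "copyin(a)"
  const Clause *next;  // next clause in the chain, or nullptr
};

// Print every clause reachable from C, separated by single spaces.
// This corresponds to the "whole chain" dumping behavior.
std::string
print_chain (const Clause *c)
{
  std::string out;
  for (; c; c = c->next)
    {
      if (!out.empty ())
        out += ' ';
      out += c->text;
    }
  return out;
}

// Print only the clause at the top of the chain C.
std::string
print_single (const Clause *c)
{
  assert (c != nullptr);
  return c->text;
}
```

A pass that reports on one clause at a time would use the second form;
otherwise each message would repeat all clauses that follow in the chain.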

gcc/ChangeLog:

* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d769cd8f07c5..2e0255176c76 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1402,6 +1402,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index cafe9aa95989..3368cb9f1544 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
  bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
  enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Registration court: München,
HRB 106955


[OG11][committed][PATCH 11/22] openacc: Add further kernels tests

2021-11-17 Thread Frederik Harwath
Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.
---
 ...classify-kernels-unparallelized-graphite.c |  41 +
 ...classify-kernels-unparallelized-parloops.c |  47 ++
 .../goacc/kernels-decompose-1-parloops.c  | 125 ++
 .../goacc/kernels-reduction-parloops.c|  36 
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 +++
 ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 +++
 .../note-parallelism-kernels-loops-parloops.c |  53 ++
 ...assify-kernels-unparallelized-parloops.f95 |  44 +
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 ++
 .../goacc/kernels-decompose-1-parloops.f95| 121 ++
 .../goacc/kernels-decompose-parloops-2.f95| 154 ++
 .../goacc/kernels-loop-data-parloops-2.f95|  52 ++
 .../goacc/kernels-loop-parloops-2.f95 |  45 +
 .../goacc/kernels-loop-parloops.f95   |  39 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 .../parallel-loop-auto-reduction-2.f90|  98 +++
 17 files changed, 1155 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90

diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
new file mode 100644
index 000000000000..77f4524907a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
@@ -0,0 +1,41 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC 'kernels' with Graphite kernels handling (default).  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   { dg-additional-options "-fopt-info-optimized-omp" }
+   { dg-additional-options "-fopt-info-note-omp" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-graphite-details" }
+   { dg-additional-options "-fdump-tree-oaccloops1" }
+  

[OG11][committed][PATCH 12/22] openacc: Remove unused partitioning in "kernels" regions

2021-11-17 Thread Frederik Harwath
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
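
The effect of removing unused partitioning can be sketched in isolation as
follows.  This is a simplified model and not the code from the patch: the
names DIM_MAX and dim_mask are stand-ins for GOMP_DIM_MAX and GOMP_DIM_MASK
(assumed here to be 3 and a one-bit-per-axis mask), and the function returns
the removed axis names instead of emitting a warning.

```cpp
#include <cassert>
#include <string>

// Assumed stand-ins for GOMP_DIM_MAX and GOMP_DIM_MASK.
constexpr int DIM_MAX = 3;  // gang, worker, vector
constexpr unsigned dim_mask (int ix) { return 1u << ix; }

// Reset every explicitly set dimension from LEVEL onwards that no loop
// uses (its bit is clear in USED) to -1, which means "not specified".
// Return the names of the removed axes, each followed by a space.
std::string
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != nullptr);
  static const char *const axes[DIM_MAX] = { "gang", "worker", "vector" };

  std::string removed;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & dim_mask (ix)) && dims[ix] >= 0)
      {
        removed += axes[ix];
        removed += ' ';
        dims[ix] = -1;
      }
  return removed;
}
```

For a region with num_gangs(32) num_workers(8) in which only gang
parallelism is used, the worker dimension is reset, so that a later
defaulting step is free to assign it a size of 1.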

gcc/ChangeLog:

* omp-offload.c (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
---
 gcc/omp-offload.c | 51 +--
 .../acc_prof-kernels-1.c  | 19 ---
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index f5cb222efd8c..68cc5a9d9e5d 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1215,6 +1215,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+  {
+if (host_compiler)
+  {
+strcat (removed_partitions, axes[ix]);
+strcat (removed_partitions, " ");
+  }
+dims[ix] = -1;
+  }
+  if (removed_partitions[0] != '\0')
+warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+"removed %spartitioning from %<kernels%> region",
+removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
raw attribute.  DIMS is an array of dimensions, which is filled in.
LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1235,6 +1268,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
 {
   purpose[ix] = TREE_PURPOSE (pos);
+
   tree val = TREE_VALUE (pos);
   dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
   pos = TREE_CHAIN (pos);
@@ -1244,14 +1278,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
   if (check
   && warn_openacc_parallelism
-  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-  && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
 {
-  static char const *const axes[] =
-  /* Must be kept in sync with GOMP_DIM enumeration.  */
-   { "gang", "worker", "vector" };
   for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
if (dims[ix] < 0)
  ; /* Defaulting axis.  */
@@ -1262,14 +1297,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
  "region contains %s partitioned code but"
  " is not %s partitioned", axes[ix], axes[ix]);
else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+ {
  /* The dimension is explicitly partitioned to non-unity, but
 no use is made within the region.  */
  warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
  "region is %s partitioned but"
  " does not contain %s partitioned code",
  axes[ix], axes[ix]);
+  }
 }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+ DECL_ATTRIBUTES (fn)))
+oacc_remove_unused_partitioning  (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 4a9b11a3d3fe..d398b3463617 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include 

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } 

[OG11][committed][PATCH 10/22] openacc: Add "can_be_parallel" flag info to "graph" dumps

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graph.c (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
---
 gcc/graph.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index ce8de33ffe10..3ad07be3b309 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -191,6 +191,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
 }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
order to get a good ranking of the nodes.  This function is recursive:
It first prints inner loops, then the body of LOOP itself.  */
@@ -205,17 +209,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
   && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-pp_printf (pp,
-  "\tsubgraph cluster_%d_%d {\n"
-  "\tstyle=\"filled\";\n"
-  "\tcolor=\"darkgreen\";\n"
-  "\tfillcolor=\"%s\";\n"
-  "\tlabel=\"loop %d\";\n"
-  "\tlabeljust=l;\n"
-  "\tpenwidth=2;\n",
-  funcdef_no, loop->num,
-  fillcolors[(loop_depth (loop) - 1) % 3],
-  loop->num);
+{
+  pp_printf (pp,
+ "\tsubgraph cluster_%d_%d {\n"
+ "\tstyle=\"filled\";\n"
+ "\tcolor=\"darkgreen\";\n"
+ "\tfillcolor=\"%s\";\n"
+ "\tlabel=\"loop %d %s\";\n"
+ "\tlabeljust=l;\n"
+ "\tpenwidth=2;\n",
+ funcdef_no, loop->num,
+ fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+ /* This is only meaningful for loops that have been processed
+by Graphite.
+
+TODO Use can_be_parallel_valid_p? */
+ !oacc_get_fn_attrib (cfun->decl)
+ ? ""
+ : loop->can_be_parallel ? "(can_be_parallel = true)"
+ : "(can_be_parallel = false)");
+}

   for (class loop *inner = loop->inner; inner; inner = inner->next)
 draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0



[OG11][committed][PATCH 08/22] graphite: Add runtime alias checking

2021-11-17 Thread Frederik Harwath
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
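
The idea of guarding an optimized loop nest with a runtime alias check can
be sketched outside of GCC like this.  The example is only an illustration
under simplifying assumptions (two int arrays of equal length, a plain
address-range overlap test); the function names are hypothetical and the
checks Graphite generates are built from data dependence relations instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return true if the byte ranges [a, a + n) and [b, b + n) of int
// elements do not overlap.
bool
ranges_disjoint (const int *a, const int *b, std::size_t n)
{
  auto lo_a = reinterpret_cast<std::uintptr_t> (a);
  auto lo_b = reinterpret_cast<std::uintptr_t> (b);
  std::uintptr_t bytes = n * sizeof (int);
  return lo_a + bytes <= lo_b || lo_b + bytes <= lo_a;
}

// Compute dst[i] = src[i] + 1.  Return true if the version that relies
// on the absence of aliasing was selected at runtime.
bool
add_one (int *dst, const int *src, std::size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      // No aliasing: the iterations are independent, so they may run in
      // any order (here: backwards) or in parallel.
      for (std::size_t i = n; i-- > 0;)
        dst[i] = src[i] + 1;
      return true;
    }

  // Possible aliasing: fall back to the original sequential order.
  for (std::size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return false;
}
```

Both branches compute the same result whenever the check succeeds; the
fallback branch preserves the original semantics when the arrays overlap.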

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.c
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.c (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.
gcc/testsuite/ChangeLog:

* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt  |   4 +
 gcc/graphite-isl-ast-to-gimple.c|  60 ++
 gcc/graphite-poly.c |   2 +
 gcc/graphite-scop-detection.c   | 239 +---
 gcc/graphite.h  |   4 +
 gcc/sese.h  |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 326 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 771398bc03de..aa695e56dc48 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1636,6 +1636,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 44c06016f1a2..caa0160b9bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
 }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+   && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Generated runtime alias check: ");
+  print_generic_expr (dump_file, alias_cond, dump_flags);
+  fprintf (dump_file, "\n");
+}
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  /* SCoP detection has failed to handle the aliasing between some data
+references of the SCoP statically. Generate an alias check that selects
+the newly generated version of the SCoP in the true-branch of the
+conditional if aliasing can be ruled out at runtime and the original
+version of the SCoP, otherwise. */
+
+  loop_p loop
+  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+  scop->scop_info->region.exit->src->loop_father);
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+  tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+  set_ifsese_condition (region->if_region, 

[OG11][committed][PATCH 07/22] Move compute_alias_check_pairs to tree-data-ref.c

2021-11-17 Thread Frederik Harwath
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.
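
For readers unfamiliar with the moved helpers, the segment length that
feeds into each alias check pair can be modelled with plain integers.
This is a simplified sketch, not the GCC code: it assumes a positive byte
step and at least one access, and the parameter
executes_on_last_iteration is a hypothetical stand-in for the dominance
test performed by latch_dominated_by_data_ref.

```cpp
#include <cassert>
#include <cstddef>

// Distance in bytes between the first and the last access when a
// reference is executed ITERATIONS times with a stride of STEP bytes.
std::size_t
segment_size (std::size_t step, std::size_t iterations)
{
  assert (iterations >= 1);
  return step * (iterations - 1);
}

// A reference that also executes on the final trip through the loop is
// accessed once more than the number of latch executions.
std::size_t
segment_for_ref (std::size_t step, std::size_t latch_executions,
                 bool executes_on_last_iteration)
{
  std::size_t iterations
    = executes_on_last_iteration ? latch_executions + 1 : latch_executions;
  return segment_size (step, iterations);
}
```

With a 4-byte step and 9 latch executions this yields a segment of 36
bytes for a reference that runs on every trip and 32 bytes otherwise.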

gcc/ChangeLog:
* tree-loop-distribution.c (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.c (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c  | 87 
 gcc/tree-data-ref.h  |  3 ++
 gcc/tree-loop-distribution.c | 87 
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index d04e95f7c285..71f8d790e618 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2645,6 +2645,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+  fold_convert (sizetype, niters),
+  size_one_node);
+  return size_binop (MULT_EXPR,
+fold_convert (sizetype, DR_STEP (dr)),
+fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+  vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+{
+  ddr_p ddr = (*alias_ddrs)[i];
+  struct data_reference *dr_a = DDR_A (ddr);
+  struct data_reference *dr_b = DDR_B (ddr);
+  tree seg_length_a, seg_length_b;
+
+  if (latch_dominated_by_data_ref (loop, dr_a))
+   seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+  else
+   seg_length_a = data_ref_segment_size (dr_a, niters);
+
+  if (latch_dominated_by_data_ref (loop, dr_b))
+   seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+  else
+   seg_length_b = data_ref_segment_size (dr_b, niters);
+
+  unsigned HOST_WIDE_INT access_size_a
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+  unsigned HOST_WIDE_INT access_size_b
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+  unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+  unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+  dr_with_seg_len_pair_t dr_with_seg_len_pair
+   (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+/* ??? Would WELL_ORDERED be safe?  */
+dr_with_seg_len_pair_t::REORDERED);
+
+  comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+}
+
+  if (tree_fits_uhwi_p (niters))
+factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+"Improved number of alias checks from %d to %d\n",
+alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data references
pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518..5016ec926b1d 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -577,6 +577,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec 

[OG11][committed][PATCH 05/22] graphite: Fix minor mistakes in comments

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
  a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
  in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c202213f39b3..44c06016f1a2 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
- in strage ways when inserting the stmts from it into different basic
+ in strange ways when inserting the stmts from it into different basic
  blocks one at a time.  */
   auto_vec stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 195851cb540a..12fa2d669b3c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -644,14 +644,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
  the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
- data reference DR.  */
+ the reference */
   isl_constraint *c
 = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space 
(acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0



[OG11][committed][PATCH 04/22] graphite: Rename isl_id_for_ssa_name

2021-11-17 Thread Frederik Harwath
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.
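
The naming scheme itself is easy to reproduce in isolation.  The following
sketch only shows how a "P_<version>" name is formatted into a fixed-size
buffer; parameter_name is a hypothetical helper and the isl identifier
allocation done by the real function is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the name used for the SCoP parameter whose SSA name has the
// given version number, e.g. 42 -> "P_42".
std::string
parameter_name (int ssa_version)
{
  assert (ssa_version >= 0);
  // "P_" plus at most 10 decimal digits plus the terminating NUL
  // fits into 14 bytes.
  char name[14];
  std::snprintf (name, sizeof (name), "P_%d", ssa_version);
  return name;
}
```

Names of this form are what shows up for parameters when the isl
representation of a SCoP is printed in the Graphite dump files.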

gcc/ChangeLog:

* graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index eebf2e02cfca..195851cb540a 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -893,15 +894,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
 space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
+  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0



[OG11][committed][PATCH 02/22] openacc: Move pass_oacc_device_lower after pass_graphite

2021-11-17 Thread Frederik Harwath
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performance-wise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.c (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.c (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.c (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/omp-general.c |  8 +-
 gcc/omp-offload.c |  8 ++
 gcc/passes.c  | 42 
 gcc/passes.def  

[OG11][committed][PATCH 03/22] graphite: Extend SCoP detection dump output

2021-11-17 Thread Frederik Harwath
Extend the dump output to make it easier (for GCC developers) to
understand why Graphite refuses to include a loop in a SCoP.
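
As an illustration of the approach (a minimal sketch, not the code from this
patch): the patch streams a short reason into the dump file at every early
return of a detection predicate. The sketch below mimics that with a
hypothetical debug_printer that appends to a std::string instead of writing
to GCC's dump_file, and a hypothetical can_represent_loop that takes plain
flags instead of a loop_p. Both simplifications are assumptions made so that
the example is self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Graphite's debug_printer.  It appends to a
// std::string sink; a null sink means that dumping is disabled.
struct debug_printer
{
  std::string *sink = nullptr;

  friend debug_printer &
  operator<< (debug_printer &output, int i)
  {
    if (output.sink)
      *output.sink += std::to_string (i);
    return output;
  }

  friend debug_printer &
  operator<< (debug_printer &output, const char *s)
  {
    if (output.sink)
      *output.sink += s;
    return output;
  }
};

// Hypothetical predicate: instead of one large boolean expression, each
// failing condition returns early and records why it failed.
static bool
can_represent_loop (debug_printer &dp, int loop_num, bool niter_known)
{
  if (!niter_known)
    {
      dp << "[can_represent_loop-fail] Loop " << loop_num
	 << " niter unknown.\n";
      return false;
    }
  return true;
}
```

Splitting the conditions this way costs a few more lines, but every rejection
leaves exactly one line in the dump that names the failing check.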

ChangeLog:

* graphite-scop-detection.c (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
---
 gcc/graphite-scop-detection.c | 188 +-
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+  {
+if (loop_in_sese_p (loop, sese))
+  fprintf (file, "%d, ", loop->num);
+printed = true;
+  }
+  if (printed)
+fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
   if (! next
  || harmful_loop_in_region (next))
{
- if (s)
-   add_scop (s);
+  if (next)
+DEBUG_PRINT (
+dp << "[scop-detection] Discarding SCoP on loops ";
+print_sese_loop_numbers (dump_file, next);
+dp << " because of harmful loops\n";);
+  if (s)
+add_scop (s);
  build_scop_depth (loop);
  s = invalid_sese;
}
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
   || !single_pred_p (loop->latch)
   || exit->src != single_pred (loop->latch)
   || !empty_block_p (loop->latch))
-return false;
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+  return false;
+}
+
+  bool edge_irreducible
+  = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+  return false;
+}
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+  single_exit (loop),
+  &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-&& niter_desc.control.no_overflow
-&& (niter = number_of_latch_executions (loop))
-&& !chrec_contains_undetermined (niter)
-&& graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+ << "Condition: " << niter_desc.assumptions << "\n");
+  return false;
+}
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+  return false;
+}
+  if (!niter_desc.control.no_overflow)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+  return false;
+}
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter chrec contains undetermined coefficients.\n");
+  return false;
+}
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter expression cannot be represented: "
+  << niter << "\n");
+  return false;
+}
+
+  return true;
 }

 /* Return true 

[OG11][committed][PATCH 01/22] Fortran: delinearize multi-dimensional array accesses

2021-11-17 Thread Frederik Harwath
From: Sandra Loosemore 

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
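
To make the difference concrete, here is a small sketch in C++ rather than
in the front end's tree representation. It assumes a rank-2 array with lower
bounds of 1 and Fortran's column-major layout; the names N1, N2,
get_linearized and get_delinearized are made up for this example and do not
appear in the patch.

```cpp
#include <cassert>
#include <cstddef>

constexpr int N1 = 4; // extent of the first (fastest-varying) dimension
constexpr int N2 = 3; // extent of the second dimension

// Linearized form: the subscripts of a(i,j) are folded into one flat
// offset made of explicit multiplies and adds.
static int
get_linearized (const int *data, int i, int j)
{
  std::ptrdiff_t offset = (i - 1) + static_cast<std::ptrdiff_t> (j - 1) * N1;
  return data[offset];
}

// Delinearized form: one array reference per dimension, comparable to
// nested ARRAY_REFs.  The per-dimension subscripts stay visible.
static int
get_delinearized (int (*a)[N1], int i, int j)
{
  return a[j - 1][i - 1];
}
```

Both functions read the same element. The second form keeps i and j as
separate subscripts, which is the information a dependence analysis would
otherwise have to reconstruct from the flat offset.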

gcc/
* expr.c (get_inner_reference): Handle NOP_EXPR like
VIEW_CONVERT_EXPR.

gcc/fortran/
* lang.opt (-param=delinearize=): New.
* trans-array.c (get_class_array_vptr): New, split from...
(build_array_ref): ...here.
(get_array_lbound, get_array_ubound): New, split from...
(gfc_conv_array_ref): ...here.  Additional code refactoring
plus support for delinearization of the array access.

gcc/testsuite/
* gfortran.dg/assumed_type_2.f90: Adjust patterns.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/graphite/block-3.f90: Remove xfails.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Adjust patterns.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus  
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   1 -
 .../gfortran.dg/graphite/block-4.f90  |   1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 13 files changed, 264 insertions(+), 95 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 21b7e96ed62e..c7ee800c4d4f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7539,6 +7539,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index dba333448c11..1548d56278a4 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index b7d949929722..3eb9a1778173 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+   

[OG11][committed][PATCH 00/22] OpenACC "kernels" Improvements

2021-11-17 Thread Frederik Harwath
Hi,

this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  The
central step is contained in the commit titled "openacc: Use Graphite
for dependence analysis in \"kernels\" regions" whose commit message
also contains further explanations.

Best regards,
Frederik

PS: The commit series also includes a backport from master,
"00b98b6cac25 Add dg-final option-based target selectors", and two
trivial unrelated commits, "fa558c2a6664 Fix gimple_debug_cfg
declaration" and "35cdc94463fe Fix branch prediction dump message".



Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses

 gcc/Makefile.in   |2 +
 gcc/cfgloop.c |1 +
 gcc/cfgloop.h |6 +
 gcc/cfgloopmanip.c|1 +
 gcc/common.opt|9 +
 gcc/config/nvptx/nvptx.c  |7 +
 gcc/doc/gimple.texi   |2 +
 gcc/doc/invoke.texi   |   20 +-
 gcc/doc/passes.texi   |6 +-
 gcc/expr.c|1 +
 gcc/flag-types.h  |1 +
 gcc/fortran/lang.opt  |4 +
 gcc/fortran/trans-array.c |  321 --
 gcc/gimple-loop-interchange.cc|2 +-
 gcc/gimple-pretty-print.c |3 +
 gcc/gimple-walk.c |   15 +-
 gcc/gimple-walk.h |6 +
 gcc/gimple.h  |7 +-
 gcc/gimplify.c|   13 +-
 gcc/graph.c   |   35 +-
 gcc/graphite-dependences.c|  220 +++-
 gcc/graphite-isl-ast-to-gimple.c  |  271 -
 gcc/graphite-oacc.c   |  689 
 gcc/graphite-oacc.h   |   55 +
 gcc/graphite-optimize-isl.c   |   42 +-
 gcc/graphite-poly.c   |   41 +-
 gcc/graphite-scop-detection.c |  654 +--
 gcc/graphite-sese-to-poly.c   |   90 +-
 gcc/graphite.c|  120 +-
 gcc/graphite.h|   40 +-
 gcc/internal-fn.c |2 +
 gcc/internal-fn.h |4 +-
 gcc/omp-data-optimize.cc  |  951 
 gcc/omp-expand.c  |  110 +-
 gcc/omp-general.c |   23 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.c |  321 +-
 gcc/omp-oacc-kernels-decompose.cc |  145 ++-
 gcc/omp-offload.c | 1001 +
 gcc/omp-offload.h |2 +
 gcc/params.opt|5 +-
 gcc/passes.c  |   42 +
 gcc/passes.def|   47 +-
 gcc/predict.c |2 +-
 gcc/sese.c|   25 +-
 gcc/sese.h|   19 +
 gcc/testsuite/c-c++-common/goacc/acc-icf.c|4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |2 +-
 ...classify-kernels-unparallelized-graphite.c |   41 +
 ...lassify-kernels-unparallelized-parloops.c} |   12 +-
 .../c-c++-common/goacc/classify-kernels.c |   27 +-
 .../c-

Re: [PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops

2020-11-16 Thread Frederik Harwath


Hi Richard,

Richard Biener  writes:

> On Thu, Nov 12, 2020 at 11:11 AM Frederik Harwath
>  wrote:
>>
>> This patch enables the use of Graphite for the analysis of OpenACC
>> "auto" loops. [...]
>> Furthermore, Graphite is extended by functionality that extends
>> its applicability to real-world code (e.g. runtime alias checking).
>
> I wonder if this can be split into a refactoring of graphite and adding
> runtime alias capability and a part doing the OpenACC pieces.
>

Yes, I did not remove the runtime alias checking from this WIP-patch,
but I planned to submit it separately. I am going to do this soon.

Frederik


> Richard.
>
>> ---
>>  gcc/common.opt|   8 +
>>  gcc/graphite-dependences.c|  12 +-
>>  gcc/graphite-isl-ast-to-gimple.c  |  77 +-
>>  gcc/graphite-oacc.h   |  90 ++
>>  gcc/graphite-scop-detection.c | 828 ++
>>  gcc/graphite-sese-to-poly.c   |  26 +-
>>  gcc/graphite.c| 403 -
>>  gcc/graphite.h|  11 +-
>>  gcc/internal-fn.h |   7 +-
>>  gcc/omp-expand.c  |  26 +-
>>  gcc/omp-offload.c | 173 +++-
>>  gcc/predict.c |   2 +-
>>  .../graphite/alias-0-no-runtime-check.c   |  20 +
>>  .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
>>  gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
>>  gcc/tree-chrec-oacc.h |  45 +
>>  gcc/tree-chrec.c  |  16 +-
>>  gcc/tree-data-ref.c   | 112 ++-
>>  gcc/tree-data-ref.h   |   8 +-
>>  gcc/tree-loop-distribution.c  |  17 +-
>>  gcc/tree-scalar-evolution.c   | 257 +-
>>  gcc/tree-ssa-loop-ivcanon.c   |   9 +-
>>  gcc/tree-ssa-loop-niter.c |  13 +
>>  23 files changed, 1870 insertions(+), 333 deletions(-)
>>  create mode 100644 gcc/graphite-oacc.h
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
>>  create mode 100644 gcc/tree-chrec-oacc.h
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index dfed6ec76ba..caaeaa1aa6f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1600,6 +1600,14 @@ fgraphite-identity
>>  Common Report Var(flag_graphite_identity) Optimization
>>  Enable Graphite Identity transformation.
>>
>> +fgraphite-non-affine-accesses
>> +Common Report Var(flag_graphite_non_affine_accesses) Init(0)
>> +Allow Graphite to handle non-affine data accesses.
>> +
>> +fgraphite-runtime-alias-checks
>> +Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
>> +Allow Graphite to add runtime alias checks to loops if aliasing cannot be resolved statically.
>> +
>>  fhoist-adjacent-loads
>>  Common Report Var(flag_hoist_adjacent_loads) Optimization
>>  Enable hoisting adjacent loads to encourage generating conditional move
>> diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
>> index 7078c949800..76ba027cdf3 100644
>> --- a/gcc/graphite-dependences.c
>> +++ b/gcc/graphite-dependences.c
>> @@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
>>   {
>> if (dump_file)
>>   {
>> -   fprintf (dump_file, "Adding read to depedence graph: ");
>> +   fprintf (dump_file, "Adding read to dependence graph: ");
>> print_pdr (dump_file, pdr);
>>   }
>> isl_union_map *um
>> @@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
>> reads = isl_union_map_union (reads, um);
>> if (dump_file)
>>   {
>> -   fprintf (dump_file, "Reads depedence graph: ");
>> +   fprintf (dump_file, "Reads dependence graph: ");
>> print_isl_union_map (dump_file, reads);
>>   }
>>   }
>> @@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
>>   {
>> if (dump_file)
>>   {
>> -   fprin

[PATCH 2/2] OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

2020-11-12 Thread Frederik Harwath


This patch changes the "kernels" conversion to route loops in OpenACC
"kernels" regions through Graphite. This is done by converting the loops
in "kernels" regions which are not yet known to be "independent" to
"auto" loops, as in the current (OG10) "parloops"-based "kernels"
handling. Afterwards, the "kernels" regions are treated essentially
like "parallel" regions. A new internal target kind, however, still
makes it possible to distinguish between the two types of regions,
which is useful for diagnostic messages.

The old "parloops" based "kernels" handling will be deprecated, but is
still available through the command line options
"-fopenacc-kernels=split-parloops" and "-fopenacc-kernels=parloops".
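
At the source level, the effect of the conversion can be pictured roughly as
follows. This is only a conceptual sketch: the real transformation operates
on GCC's internal representation, and the function names add_kernels and
add_parallel_auto as well as the data clauses are assumptions made for the
example. Without -fopenacc the pragmas are ignored and both functions run
sequentially on the host.

```cpp
#include <cassert>

// What the user writes: a "kernels" region.  The compiler has to work
// out by itself whether the loop can be parallelized.
static void
add_kernels (float *c, const float *a, const float *b, int n)
{
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}

// Rough picture after the conversion: the region is handled like a
// "parallel" region and the loop is marked "auto", which leaves the
// decision to the compiler's dependence analysis (here: Graphite).
static void
add_parallel_auto (float *c, const float *a, const float *b, int n)
{
#pragma acc parallel copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}
```

Both variants must compute the same result; only the way the compiler
decides about parallelization differs.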
---
 gcc/c-family/c.opt|  5 +-
 gcc/doc/invoke.texi   | 10 ++-
 gcc/doc/passes.texi   |  6 +-
 gcc/flag-types.h  |  1 +
 gcc/gimple-pretty-print.c |  3 +
 gcc/gimple.h  |  9 ++-
 gcc/gimplify.c|  1 +
 gcc/omp-expand.c  | 63 +--
 gcc/omp-general.c | 19 -
 gcc/omp-general.h |  1 +
 gcc/omp-low.c | 76 +++
 gcc/omp-oacc-kernels.c| 59 --
 gcc/omp-offload.c | 50 +++-
 .../goacc/kernels-conversion-parloops.c   | 61 +++
 .../c-c++-common/goacc/kernels-conversion.c   | 12 +--
 .../gfortran.dg/goacc/kernels-reductions.f90  | 37 +
 gcc/tree-parloops.c   | 16 +++-
 gcc/tree-ssa-loop.c   | 10 +++
 18 files changed, 395 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4ef7ea76aa1..255ff84ca4b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1747,7 +1747,7 @@ Specify default OpenACC compute dimensions.

 fopenacc-kernels=
 C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
--fopenacc-kernels=[split|parloops] Configure OpenACC 'kernels' constructs handling.
+-fopenacc-kernels=[split|split-parloops|parloops]  Configure OpenACC 'kernels' constructs handling.

 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -1755,6 +1755,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)

+EnumValue
+Enum(openacc_kernels) String(split-parloops) Value(OPENACC_KERNELS_SPLIT_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe04b4d8e6a..d713d6ae8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2266,12 +2266,20 @@ permitted.
 @opindex fopenacc-kernels
 @cindex OpenACC accelerator programming
 Configure OpenACC 'kernels' constructs handling.
+
 With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
 are split into a sequence of compute constructs, each then handled
-individually.
+individually. The data dependence analysis that is necessary to
+determine if loops can be parallelized is performed by the Graphite
+pass.
 This is the default.
+With @option{-fopenacc-kernels=split-parloops}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+This is deprecated.
 With @option{-fopenacc-kernels=parloops}, the whole OpenACC
 'kernels' constructs is handled by the @samp{parloops} pass.
+This is deprecated.

 @item -fopenmp
 @opindex fopenmp
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 7424690dac3..5dda056a2bb 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets.  It is located in

 This is a pass group for processing OpenACC kernels regions.  It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops.  It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops} based handling of
+kernels regions is used. It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.

 @item Target clone

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e2255a56745..058c4e214af 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -376,6 +376,7 @@ enum cf_protection_level
 enum openacc_kernels
 {
   OPENACC_KERNELS_SPLIT,
+  OPENACC_KERNELS_SPLIT_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 

[PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops

2020-11-12 Thread Frederik Harwath


This patch enables the use of Graphite for the analysis of OpenACC
"auto" loops. The goal is to decide if a loop may be parallelized
(i.e. converted to an "independent" loop) or not.  Graphite and the
functionality on which it relies (scalar evolution, data references) are
extended to interpret the internal representation of OpenACC loop
constructs that is encoded (e.g. through calls to OpenACC-specific
internal functions) in the OpenACC outlined functions (".omp_fn") and to
ignore some artifacts of the outlining process that are not relevant for
the analysis of the original loops (e.g. pointers introduced for the
purpose of offloading are irrelevant to the question whether the
original loops can be parallelized or not). This is done in a way that
does not impact code which does not use OpenACC.  Furthermore, Graphite
gains functionality that widens its applicability to real-world code
(e.g. runtime alias checking).  The OpenACC lowering is
extended to use the result of Graphite's analysis to assign
"independent" clauses to loops.
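
The idea behind runtime alias checking can be sketched as loop versioning:
when the compiler cannot prove statically that two pointers refer to
disjoint memory, it emits a cheap overlap test and only takes the optimized
path if the test succeeds. The code below is a hand-written illustration of
that idea, not what Graphite generates; ranges_disjoint and scale are
hypothetical names, and the boolean return value exists only to make the
chosen path observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return true if the n-element ranges starting at A and B do not overlap.
static bool
ranges_disjoint (const float *a, const float *b, std::size_t n)
{
  std::uintptr_t pa = reinterpret_cast<std::uintptr_t> (a);
  std::uintptr_t pb = reinterpret_cast<std::uintptr_t> (b);
  std::size_t bytes = n * sizeof (float);
  return pa + bytes <= pb || pb + bytes <= pa;
}

// Compute dst[i] = 2 * src[i].  Returns true if the version without
// cross-iteration dependences was selected at run time.
static bool
scale (float *dst, const float *src, std::size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      // No aliasing: iterations are independent, so a compiler would be
      // free to parallelize or reorder this copy of the loop.
      for (std::size_t i = 0; i < n; i++)
	dst[i] = 2.0f * src[i];
      return true;
    }

  // Possible aliasing: keep the original sequential order.
  for (std::size_t i = 0; i < n; i++)
    dst[i] = 2.0f * src[i];
  return false;
}
```

The check costs a couple of comparisons per loop nest, which is usually
negligible compared to the work that becomes parallelizable.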
---
 gcc/common.opt|   8 +
 gcc/graphite-dependences.c|  12 +-
 gcc/graphite-isl-ast-to-gimple.c  |  77 +-
 gcc/graphite-oacc.h   |  90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c   |  26 +-
 gcc/graphite.c| 403 -
 gcc/graphite.h|  11 +-
 gcc/internal-fn.h |   7 +-
 gcc/omp-expand.c  |  26 +-
 gcc/omp-offload.c | 173 +++-
 gcc/predict.c |   2 +-
 .../graphite/alias-0-no-runtime-check.c   |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
 gcc/tree-chrec-oacc.h |  45 +
 gcc/tree-chrec.c  |  16 +-
 gcc/tree-data-ref.c   | 112 ++-
 gcc/tree-data-ref.h   |   8 +-
 gcc/tree-loop-distribution.c  |  17 +-
 gcc/tree-scalar-evolution.c   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c   |   9 +-
 gcc/tree-ssa-loop-niter.c |  13 +
 23 files changed, 1870 insertions(+), 333 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/tree-chrec-oacc.h

diff --git a/gcc/common.opt b/gcc/common.opt
index dfed6ec76ba..caaeaa1aa6f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1600,6 +1600,14 @@ fgraphite-identity
 Common Report Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-non-affine-accesses
+Common Report Var(flag_graphite_non_affine_accesses) Init(0)
+Allow Graphite to handle non-affine data accesses.
+
+fgraphite-runtime-alias-checks
+Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loops if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 7078c949800..76ba027cdf3 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding read to depedence graph: ");
+   fprintf (dump_file, "Adding read to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
reads = isl_union_map_union (reads, um);
if (dump_file)
  {
-   fprintf (dump_file, "Reads depedence graph: ");
+   fprintf (dump_file, "Reads dependence graph: ");
print_isl_union_map (dump_file, reads);
  }
  }
@@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding must write to depedence graph: ");
+   fprintf (dump_file, "Adding must write to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -106,7 +106,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
must_writes = isl_union_map_union (must_writes, um);
if (dump_file)
  {
-

[PATCH 0/2] Use Graphite for OpenACC "kernels" regions

2020-11-12 Thread Frederik Harwath


Hi,
the two following patches implement a new handling of the loops in
OpenACC "kernels" regions which is based on Graphite and which is meant
to replace the current handling based on the "parloops" pass.  This
considerably extends the class of OpenACC codes using "kernels" regions
that can be analysed by GCC's OpenACC implementation.

We would like to incorporate this work into master soon, but further
work will be necessary in the next weeks to resolve some open questions,
clean up the code etc. In particular, the patches cannot be applied on
master currently because they rely on other patches which have not been
committed to master yet, e.g. the re-ordering of the OpenACC passes to
run device lowering after Graphite which has recently been submitted
(subject "Move pass_oacc_device_lower after pass_graphite") and the
transformation pass which converts OpenACC kernels regions to parallel
regions from OG10 (commit 809ea59722263eb6c2d48402e1eed80727134038).

Best regards,
Frederik


Frederik Harwath (2):
  [WIP] OpenACC: Add Graphite-based handling of "auto" loops
  OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

 gcc/c-family/c.opt|   5 +-
 gcc/common.opt|   8 +
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/passes.texi   |   6 +-
 gcc/flag-types.h  |   1 +
 gcc/gimple-pretty-print.c |   3 +
 gcc/gimple.h  |   9 +-
 gcc/gimplify.c|   1 +
 gcc/graphite-dependences.c|  12 +-
 gcc/graphite-isl-ast-to-gimple.c  |  77 +-
 gcc/graphite-oacc.h   |  90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c   |  26 +-
 gcc/graphite.c| 403 -
 gcc/graphite.h|  11 +-
 gcc/internal-fn.h |   7 +-
 gcc/omp-expand.c  |  89 +-
 gcc/omp-general.c |  19 +-
 gcc/omp-general.h |   1 +
 gcc/omp-low.c |  76 +-
 gcc/omp-oacc-kernels.c|  59 +-
 gcc/omp-offload.c | 223 -
 gcc/predict.c |   2 +-
 .../goacc/kernels-conversion-parloops.c   |  61 ++
 .../c-c++-common/goacc/kernels-conversion.c   |  12 +-
 .../graphite/alias-0-no-runtime-check.c   |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 gcc/tree-chrec-oacc.h |  45 +
 gcc/tree-chrec.c  |  16 +-
 gcc/tree-data-ref.c   | 112 ++-
 gcc/tree-data-ref.h   |   8 +-
 gcc/tree-loop-distribution.c  |  17 +-
 gcc/tree-parloops.c   |  16 +-
 gcc/tree-scalar-evolution.c   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c   |   9 +-
 gcc/tree-ssa-loop-niter.c |  13 +
 gcc/tree-ssa-loop.c   |  10 +
 39 files changed, 2265 insertions(+), 377 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/tree-chrec-oacc.h

--
2.17.1
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: Move pass_oacc_device_lower after pass_graphite

2020-11-06 Thread Frederik Harwath

Hi Richard,

Richard Biener  writes:

> On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath

> What's on my TODO list (or on the list of things to explore) is to make
> the dump file names/suffixes explicit in passes.def like via
>
>   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
>
> and we'd get a dump named .ccp_oacc or so.

That would be very helpful for avoiding the drudgery of adapting those
pass numbers!

> Now, what does oacc_device_lower actually do that you need to
> re-run complex lowering?  What does cunrolli do at this point that
> the complete_unroll pass later does not do?
>

Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
necessary to handle the code that gets created by the OpenACC reduction
lowering during oaccdevlow.  I have attached a test case (a reduced
version of
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
shows that the complex instructions are created by
pass_oacc_device_lower and which leads to an ICE if compiled without the
new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
The problem is an unlowered addition. This is from a diff of the dump of
the pass following oaccdevlow1 (ccp4) with disabled and with enabled
tree-cplxlower2:

<   _91 = VIEW_CONVERT_EXPR(_1);
<   _92 = reduction_var_2 + _91;
---
>   _104 = REALPART_EXPR (_1)>;
>   _105 = IMAGPART_EXPR (_1)>;
>   _91 = COMPLEX_EXPR <_104, _105>;
>   _106 = reduction_var$real_100 + _104;
>   _107 = reduction_var$imag_101 + _105;
>   _92 = COMPLEX_EXPR <_106, _107>;
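
For readers unfamiliar with complex lowering, the change visible in the dump
above corresponds to the following source-level picture (a sketch using
std::complex, not GCC internals; the function names add_unlowered and
add_lowered are made up): a single complex addition is replaced by two
scalar additions on the real and imaginary parts, followed by rebuilding the
complex value.

```cpp
#include <cassert>
#include <complex>

// Unlowered: one addition on a complex value, comparable to
// "_92 = reduction_var_2 + _91".
static std::complex<float>
add_unlowered (std::complex<float> acc, std::complex<float> x)
{
  return acc + x;
}

// Lowered: separate additions on the real and imaginary parts, then a
// COMPLEX_EXPR-like recombination of the two scalar results.
static std::complex<float>
add_lowered (std::complex<float> acc, std::complex<float> x)
{
  float re = acc.real () + x.real ();
  float im = acc.imag () + x.imag ();
  return std::complex<float> (re, im);
}
```

A target without native complex arithmetic can only handle the second form,
which is why the additional lowering pass has to run after oaccdevlow.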

> What's special about oacc_device lower that doesn't also apply
> to omp_device_lower?

The passes do different things. The goal is to optimize OpenACC
loops using Graphite. The relevant lowering of the internal OpenACC
function calls happens in pass_oacc_device_lower.

> Is all this targeted at code compiled exclusively for the offload
> target?  Thus we're in lto1 here?

The OpenACC outlined functions also get compiled for the host.

> Does it make eventually more sense to have a completely custom pass
> pipeline for the  offload compilation?  Maybe even per offload target?
> See how we have a custom pipeline for -Og (pass_all_optimizations_g).

What would be the main benefits of a separate pipeline? Avoiding
(re-)running passes unnecessarily, fewer unwanted interactions
in the test suite (but your suggestion above regarding the fixed
pass names would also solve this)?

>> Ok to include the patch in master?

Best regards,
Frederik

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
new file mode 100644
index 000..6879e5aaf25
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-foffload=-fdump-tree-cplxlower2" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-do link } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } {""} } */
+
+#include <stdio.h>
+#if !defined(__hppa__) || !defined(__hpux__)
+#include <complex.h>
+#endif
+
+#define N 100
+
+static float _Complex __attribute__ ((noinline))
+sum (float _Complex ary[N])
+{
+  float _Complex reduction_var = 0;
+#pragma acc parallel loop gang reduction(+:reduction_var)
+  for (int ix = 0; ix < N; ix++)
+reduction_var += ary[ix];
+
+ return reduction_var;
+}
+
+int main (void)
+{
+  float _Complex ary[N];
+  float _Complex result;
+
+  for (int ix = 0; ix < N;  ix++)
+{
+  float frac = ix * (1.0f / 1024) + 1.0f;
+  ary[ix] = frac + frac * 2.0j - 1.0j;
+}
+
+  result = sum (ary);
+  printf("%.1f%+.1fi\n", creal(result), cimag(result));
+  return 0;
+}
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR" 1 "oaccdevlow1" } }
+
+ There is just one COMPLEX_EXPR right before oaccdevlow1 ...*/
+
+/* { dg-final { scan-offload-tree-dump-times "GOACC_REDUCTION .*?reduction_var.*?;" 4 "oaccdevlow1" } }
+
+  ... but several IFN_GOACC_REDUCTION calls for the reduction variable which are subsequently lowered ... */
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR " 4  "cplxlower2" } }
+
+ ... which introduces new COMPLEX_EXPRs. */


[PATCH] testsuite: Clean up lto and offload dump files

2020-11-04 Thread Frederik Harwath

Hi,

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects leftover
dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is used
for matching the "ltrans" files is also changed since the existing
pattern failed to remove some LTO ("ltrans0.ltrans.") dump files.
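
The pattern construction can be sketched in Python as follows. This is
only a model of the logic in the patch, not a translation of the Tcl
code: the helper names are made up, the dump file names are hypothetical,
and the part of the pattern that matches the dump number and pass name
is replaced by a plain trailing "*".

```python
import os
from fnmatch import fnmatchcase


def cleanup_patterns(source, lto=False, offload=False):
    """Build one glob pattern per file-name extension to clean up."""
    stem, ext = os.path.splitext(os.path.basename(source))
    extensions = [ext.lstrip(".")]
    if lto:
        extensions.append("ltrans[0-9]*.ltrans")
    if offload:
        # The * matches the offloading target's name, e.g. "xnvptx-none".
        extensions.append("*.mkoffload")
    return [f"{stem}.{e}" for e in extensions]


def is_covered(file_name, patterns):
    """True if file_name starts with something one of the patterns matches."""
    return any(fnmatchcase(file_name, p + "*") for p in patterns)


patterns = cleanup_patterns("reduction-cplx-lowering.c", lto=True, offload=True)
# Hypothetical dump file names, for illustration only.
print(is_covered("reduction-cplx-lowering.ltrans0.ltrans.235t.optimized", patterns))
print(is_covered("reduction-cplx-lowering.xnvptx-none.mkoffload.101t.oaccdevlow1", patterns))
```

Keeping the extensions in a list and joining them only at the end is
what allows "-flto" and "-foffload" to be handled in the same way.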


This patch gets rid of at least one unresolved libgomp test result that
would otherwise be introduced by the patch discussed in this thread:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557889.html


diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {

[...]

-   lappend tfiles "$stem.{$basename_ext,exe}"

I do not understand why "exe" should be included here. I have removed it
and I did not notice any files matching the resulting pattern being left
behind by "make check-gcc".


Best regards,
Frederik

From 9eb5da60e8822e1f6fa90b32bff6123ed62c146c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 4 Nov 2020 14:09:46 +0100
Subject: [PATCH] testsuite: Clean up lto and offload dump files

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects
leftover dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is
used for matching the "ltrans" files is also changed since the
existing pattern failed to match some dump files.

2020-11-04  Frederik Harwath  

gcc/testsuite/ChangeLog:

	* lib/gcc-dg.exp (proc schedule-cleanups): Adapt "-flto" handling,
	add "-foffload" handling.
---
 gcc/testsuite/lib/gcc-dg.exp | 50 
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {
 # stem.ext..
 # (tree)passes can have multiple instances, thus optional trailing *
 set ptn "\[0-9\]\[0-9\]\[0-9\]$ptn.*"
+set ltrans no
+set mkoffload no
+
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
 	verbose "Cleanup -flto seen" 4
-	set ltrans "{ltrans\[0-9\]*.,}"
-} else {
-	set ltrans ""
+	set ltrans yes
+}
+
+if [regexp -- {(^|\s+)-foffload=} $opts] {
+	verbose "Cleanup -foffload seen" 4
+	set mkoffload yes
 }
-set ptn "$ltrans$ptn"
+
 verbose "Cleanup final ptn: $ptn" 4
 set tfiles {}
 foreach src $testcases {
-	set basename [file tail $src]
-	if { $ltrans != "" } {
-	# ??? should we use upvar 1 output_file instead of this (dup ?)
-	set stem [file rootname $basename]
-	set basename_ext [file extension $basename]
-	if {$basename_ext != ""} {
-		regsub -- {^.*\.} $basename_ext {} basename_ext
-	}
-	lappend tfiles "$stem.{$basename_ext,exe}"
-	unset basename_ext
-	} else {
-	lappend tfiles $basename
-	}
+set basename [file tail $src]
+set stem [file rootname $basename]
+set basename_ext [file extension $basename]
+if {$basename_ext != ""} {
+regsub -- {^.*\.} $basename_ext {} basename_ext
+}
+set extensions [list $basename_ext]
+
+if { $ltrans == yes } {
+lappend extensions "ltrans\[0-9\]*.ltrans"
+}
+if { $mkoffload == yes} {
+# The * matches the offloading target's name, e.g. "xnvptx-none".
+lappend extensions "*.mkoffload"
+}
+
+set extensions_ptn [join $extensions ","]
+if { [llength $extensions] > 1 } {
+set extensions_ptn "{$extensions_ptn}"
+}
+
+  	lappend tfiles "$stem.$extensions_ptn"
 }
+
 if { [llength $tfiles] > 1 } {
 	set tfiles [join $tfiles ","]
 	set tfiles "{$tfiles}"
-- 
2.17.1



Move pass_oacc_device_lower after pass_graphite

2020-11-03 Thread Frederik Harwath

Hi,

as a first step towards enabling the use of Graphite for optimizing
OpenACC loops this patch moves the OpenACC device lowering after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which ensures
that they run on OpenACC functions only. The choice of the new position
for pass_oacc_device_lower is further constrained by the need to
execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes if
optimizations are disabled is included inside of a new
pass_no_optimizations.

The patch has been bootstrapped on x86_64-linux-gnu and tested with the
GCC testsuite and with the libgomp testsuite with nvptx and gcn
offloading.

The patch should have no impact on non-OpenACC user code. However the
new pass instances have changed the pass instance numbering and hence
the dump scanning commands in several tests had to be adjusted. I hope
that I found all that needed adjustment, but it is well possible that I
missed some tests that execute for particular targets or non-default
languages only. The resulting UNRESOLVED tests are usually easily fixed
by appending a pass number to the name of a pass that previously had no
number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
number (e.g. "dce6" becomes "dce7") in a dump scanning command.
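
The two kinds of adjustment described above can be sketched as follows.
This is only an illustration in Python with a made-up helper name;
whether a number has to be appended or incremented, and by how much,
depends on where the new pass instances sit in passes.def, so the helper
covers just the two examples given.

```python
import re


def bump_pass_instance(name):
    """Append an instance number to a pass name or increment the existing one."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        return name + "1"  # no number so far, e.g. "cunrolli"
    base, number = match.groups()
    return f"{base}{int(number) + 1}"  # existing number, e.g. "dce6"


for old in ("cunrolli", "dce6"):
    print(old, "->", bump_pass_instance(old))
```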

The patch leads to several new unresolved tests in the libgomp testsuite
which are caused by the combination of torture testing, missing cleanup
of the offload dump files, and the new pass numbering.  If a test that
uses, for instance, "-foffload=-fdump-tree-oaccdevlow" gets compiled with
"-O0" and afterwards with "-O2", each run of the test executes different
instances of pass_oacc_device_lower and produces dumps whose names
differ only in the pass instance number.  The dump scanning command in
the second run fails, because the dump files do not get removed after
the first run and the command consequently matches two different dump
files.  This seems to be a known issue.  I am going to submit a patch
that implements the cleanup of the offload dumps soon.
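
The failure mode can be reproduced with a small Python model of a dump
lookup that insists on exactly one match. The function name, the dump
file names, and the pass instance numbers below are invented for
illustration; only the "exactly one file" rule is taken from the
behaviour described above.

```python
from fnmatch import fnmatchcase


def glob_one_dump(files, pattern):
    """Return the only file matching pattern; None if none or several match."""
    matches = [name for name in files if fnmatchcase(name, pattern)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical dump names left in the build directory by two torture runs.
after_o0 = ["t.xnvptx-none.mkoffload.020t.oaccdevlow3"]
after_o2 = after_o0 + ["t.xnvptx-none.mkoffload.170t.oaccdevlow1"]
pattern = "t.*.mkoffload.[0-9][0-9][0-9]t.oaccdevlow*"

print(glob_one_dump(after_o0, pattern))  # one match: the scan can proceed
print(glob_one_dump(after_o2, pattern))  # two matches: the test is unresolved
```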

I have tried to rule out performance regressions by running different
benchmark suites with nvptx and gcn offloading. Nevertheless, I think
that it makes sense to keep an eye on OpenACC performance in the near
future and revisit the optimizations that run on the device-lowered
functions if necessary.

Ok to include the patch in master?

Best regards,
Frederik


From 93fb166876a0540416e19c9428316d1370dd1e1b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 3 Nov 2020 12:58:37 +0100
Subject: [PATCH] Move pass_oacc_device_lower after pass_graphite

As a first step towards enabling the use of Graphite for optimizing
OpenACC loops, the OpenACC device lowering must be moved after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which
ensures that they execute on OpenACC functions only. The choice of the
new position for pass_oacc_device_lower is further constrained by the
need to execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes
if optimizations are disabled is included inside of a new
pass_no_optimizations.

2020-11-03  Frederik Harwath  
	Thomas Schwinge  

gcc/ChangeLog:

	* omp-general.c (oacc_get_fn_dim_size): Adapt.
	* omp-offload.c (pass_oacc_device_lower::clone) : New method.
	* passes.c (class pass_no_optimizations): New pass.
	(make_pass_no_optimizations): New static function.
	* passes.def: Move pass_oacc_device_lower into pass_tree_loop
	and add further instances to pass_tree_no_loop and to new pass
	pass_no_optimizations. Add new instances of
	pass_lower_complex, pass_ccp, pass_sink_code,
	pass_complete_unrolli, pass_backprop, pass_phiprop,
	pass_forwprop, pass_vrp, pass_dce, pass_loop_done,
	pass_loop_init, pass_fix_loops supporting the
	pass_oacc_device_lower instance in pass_tree_loop.
	* tree-pass.h (make_pass_oacc_functions): New static function.
	(make_pass_oacc_functions): New static function.
	* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New method.
	(pass_complete_unrolli::clone): New method.
	* tree-ssa-loop.c (pass

Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-20 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

>> Can I include the patch in OG10?
>
> Unless Julian/Kwok speak up soon: OK, thanks.

This has been delayed a bit by my vacation, but I have now committed
the patch.

> May want to remove "libgomp" from the first line of the commit log --
> this commit doesn't relate to libgomp specifically.
>
> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
> 'parallel', but we can add that later.  I anyway have a WIP patch
> waiting, adding more 'serial' construct testing, for a different reason,
> so I'll include it there.)

I forgot to remove "libgomp" from the commit message, sorry, but
I have included the test cases for the "serial" construct.

Best regards,
Frederik

From 7c10ae450b95495dda362cb66770bb78b546592e Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jul 2020 11:24:21 +0200
Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan
 loop" error message

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.

2020-07-20  Frederik Harwath  

gcc/fortran/

	* openmp.c (oacc_is_parallel_or_serial): Removed function.
	(oacc_is_kernels): New function.
	(oacc_is_compute_construct): New function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.

gcc/testsuite/

	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
	verifying that the "gang reduction on an orphan loop" error message
	is not emitted for non-orphaned loops.

	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
---
 gcc/fortran/ChangeLog |   9 ++
 gcc/fortran/openmp.c  |  13 ++-
 gcc/testsuite/ChangeLog   |   7 ++
 .../c-c++-common/goacc/orphan-reductions-2.c  | 103 ++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 |  87 +++
 5 files changed, 216 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index e86279cb647..5a1f81c286e 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,12 @@
+2020-07-20  Frederik Harwath  
+
+	* openmp.c (oacc_is_parallel_or_serial): Removed function.
+	(oacc_is_kernels): New function.
+	(oacc_is_compute_construct): New function.
+	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
+	instead of "oacc_is_parallel_or_serial" for checking that a
+	loop is not orphaned.
+
 2020-07-08  Harald Anlauf  
 
 	Backported from master:
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ab68e9f2173..706933c869a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5927,9 +5927,16 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_kernels (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
+
+static bool
+oacc_is_compute_construct (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code)
+|| oacc_is_kernels (code);
 }
 
 static gfc_statement
@@ -6223,7 +6230,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !oacc_is_parallel_or_serial (c->code))
+  if (c == NULL || !oacc_is_compute_construct (c->code))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 59e6c93b07a..fa1937a4ea2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2020-07-20  Frederik Harwath  
+
+	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
+	verifying that the "gang reduction on an orphan loop" error message
+	is not emitted for non-orphaned loops.
+	* c-c++-comm

Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-07 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

> (CC  added, for everything touching gfortran.)

Thanks!

> On 2020-07-07T10:52:08+0200, Frederik Harwath  
> wrote:
>> This patch fixes the check for reductions on orphaned gang loops
>
> This is the "Make OpenACC orphan gang reductions errors" functionality
> originally added in gomp-4_0-branch r247461.
>
>> the Fortran frontend which (in contrast to the C, C++ frontends)
>> erroneously rejects reductions on gang loops that are contained in
>> "kernels" constructs and which hence are not orphaned.
>>
>> According to the OpenACC standard version 2.5 and later, reductions on
>> orphaned gang loops are explicitly disallowed (cf.  section "Changes
>> from Version 2.0 to 2.5").  Remember that a loop is "orphaned" if it is
>> not lexically contained in a compute construct (cf. section "Loop
>> construct" of the OpenACC standard), i.e. in either a "parallel", a
>> "serial", or a "kernels" construct.
>
> Or the other way round: a 'loop' construct is orphaned if it appears
> inside a 'routine' region, right?

The "not lexically contained in a compute construct" definition is
from the standard. Assuming that the frontend's parser rejects "loop"
directives if they do not occur inside of either the "serial",
"parallel", "kernels" compute constructs or in a function with a
"routine" directive, both definitions should indeed be equivalent ;-).

> Unless Julian/Kwok speak up soon: OK, thanks.
>
> Reviewed-by: Thomas Schwinge 
>
> May want to remove "libgomp" from the first line of the commit log --
> this commit doesn't relate to libgomp specifically.

Right.

> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
> 'parallel', but we can add that later.  I anyway have a WIP patch
> waiting, adding more 'serial' construct testing, for a different reason,
> so I'll include it there.)

I had left this out intentionally, because having the gang reduction in
the serial construct leads to a "region contains gang partitioned
code but is not gang partitioned"
error. Of course, we might still add a test case with that expectation.

Thanks for the review!

Frederik


[PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-07 Thread Frederik Harwath

Hi,
This patch fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs and which hence are not orphaned.

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  Remember that a loop is "orphaned" if it is
not lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.
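
The check can be pictured with the following Python sketch. It is a
simplified model, not the frontend code: the list of directive names and
the function name are assumptions made for illustration, and combined
constructs such as "kernels loop" are not modelled.

```python
COMPUTE_CONSTRUCTS = {"parallel", "serial", "kernels"}


def is_orphaned(enclosing):
    """Decide whether a loop is orphaned.

    enclosing lists the names of the enclosing directives, innermost first.
    """
    for directive in enclosing:
        if directive == "loop":
            continue  # skip enclosing loop constructs
        # The first non-loop context decides.
        return directive not in COMPUTE_CONSTRUCTS
    return True  # no enclosing construct at all


print(is_orphaned(["loop", "kernels"]))  # gang loop nested in "kernels"
print(is_orphaned(["routine"]))          # loop in a "routine" function
```

With only "parallel" and "serial" in the set, the first call would
report an orphaned loop, which corresponds to the erroneous rejection
that the patch removes.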

The patch has been tested by running the GCC and libgomp testsuites.
The latter tests ran with offloading to nvptx although that should not
be important here unless there was some very subtle reason for
forbidding the gang reductions on kernels loops. As expected, there seems
to be no such reason, i.e. I observed no regressions with the patch.

Can I include the patch in OG10?

Best regards,
Frederik

From 7320635211fff3a773beb0de1914dbfcc317ab37 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 7 Jul 2020 10:41:21 +0200
Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan
 loop" error message

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.

2020-07-07  Frederik Harwath  

gcc/fortran/

	* openmp.c (oacc_is_parallel_or_serial): Removed function.
	(oacc_is_kernels): New function.
	(oacc_is_compute_construct): New function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.

gcc/testsuite/

	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
	verifying that the error message is not emitted for
	non-orphaned loops.

	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
---
 gcc/fortran/openmp.c  | 13 +++-
 .../c-c++-common/goacc/orphan-reductions-2.c  | 69 +++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 | 58 
 3 files changed, 137 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 28408c4c99a..83c498112a8 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5926,9 +5926,16 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_kernels (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
+
+static bool
+oacc_is_compute_construct (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code)
+|| oacc_is_kernels (code);
 }
 
 static gfc_statement
@@ -6222,7 +6229,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !oacc_is_parallel_or_serial (c->code))
+  if (c == NULL || !oacc_is_compute_construct (c->code))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
new file mode 100644
index 000..2b651fd2b9f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
@@ -0,0 +1,69 @@
+/* Verify that the error message for gang reduction on orphaned OpenACC loops
+   is not reported for non-orphaned loops. */
+
+#include 
+
+int
+kernels (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc kernels
+  {
+#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+  }
+  return

PING Re: testsuite: clarify scan-dump file globbing behavior

2020-06-02 Thread Frederik Harwath
Frederik Harwath  writes:

ping :-)

> Frederik Harwath  writes:
>
> Hi Rainer, hi Mike,
> ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html
>
> Best regards,
> Frederik
>
>> Hi Thomas,
>>
>> Thomas Schwinge  writes:
>>
>>> I can't formally approve testsuite patches, but did a review anyway:
>>
>> Thanks for the review!
>>
>>> On 2020-05-15T12:31:54+0200, Frederik Harwath  
>>> wrote:
>>
>>>> The dump
>>>> scanning procedures are changed to make the test unresolved
>>>> if globbing matches more than one file.
>>>
>>> (The code changes look good, but I have not tested that specific aspect.)
>>
>> We do not have automated tests for the testsuite commands :-), but I
>> have of course tested this manually.
>>
>>> As I said, not an approval, and minor comments (see below), but still:
>>>
>>> Reviewed-by: Thomas Schwinge 
>>>
>>> Do we have to similarly also audit/alter other testsuite infrastructure
>>> files, anything that uses '[glob [...]]'?  (..., and then generalize
>>> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
>>> incrementally, as far as I'm concerned.
>>
>> I also think it would make sense to adapt similar test commands as well.
>>
>>> May also make this more useful/explicit:
>>>
>>> This is useful if, for example, if a pass has several static
>>> instances [correct terminology?], and depending on torture testing
>>> command-line flags, a different instance executes and produces a dump
>>> file, and so in the test case you can use a generic [put example
>>> here] to scan the varying dump files names.
>>>
>>> (Or similar.)
>>
>> I have moved the explanation below the description of the individual
>> commands and added an example. See the attached revised patch.
>>
>> Best regards,
>> Frederik
>>
>> From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
>> From: Frederik Harwath 
>> Date: Fri, 15 May 2020 10:35:48 +0200
>> Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior
>>
>> The test commands for scanning optimization dump files
>> perform globbing on the argument that specifies the suffix
>> of the dump files to be scanned.  This behavior is currently
>> undocumented.  Furthermore, the current implementation of
>> "scan-dump" and similar procedures yields an error whenever
>> the file name globbing matches more than one file (due to an
>> attempt to call "open" on multiple files) while a failure to
>> match any file results in an unresolved test.
>>
>> This commit documents the globbing behavior.  The dump
>> scanning procedures are changed to make the test unresolved
>> if globbing matches more than one file.
>>
>> gcc/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * doc/sourcebuild.texi: Describe globbing of the
>>  dump file scanning commands "suffix" argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * lib/scandump.exp (glob-dump-file): New proc.
>>  (scan-dump): Use glob-dump-file for file name expansion.
>>  (scan-dump-times): Likewise.
>>  (scan-dump-dem): Likewise.
>>  (scan-dump-dem-not): Likewise.
>>
>> Reviewed-by: Thomas Schwinge 
>> ---
>>  gcc/doc/sourcebuild.texi   | 13 
>>  gcc/testsuite/lib/scandump.exp | 54 +++---
>>  2 files changed, 56 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 240d6e4b08e..9df4b06d460 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text 
>> in the dump file with
>>  suffix @var{suffix}.
>>  @end table
>>
>> +The @var{suffix} argument which describes the dump file to be scanned
>> +may contain a glob pattern that must expand to exactly one file
>> +name. This is useful if, e.g., different pass instances are executed
>> +depending on torture testing command-line flags, producing dump files
>> +whose names differ only in their pass instance number suffix.  For
>> +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
>> +occurrences of the string ``code has been optimized'', use:
>> +@smallexample
>> +/* @{ dg-options

Re: testsuite: clarify scan-dump file globbing behavior

2020-05-25 Thread Frederik Harwath
Frederik Harwath  writes:

Hi Rainer, hi Mike,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html

Best regards,
Frederik

> Hi Thomas,
>
> Thomas Schwinge  writes:
>
>> I can't formally approve testsuite patches, but did a review anyway:
>
> Thanks for the review!
>
>> On 2020-05-15T12:31:54+0200, Frederik Harwath  
>> wrote:
>
>>> The dump
>>> scanning procedures are changed to make the test unresolved
>>> if globbing matches more than one file.
>>
>> (The code changes look good, but I have not tested that specific aspect.)
>
> We do not have automated tests for the testsuite commands :-), but I
> have of course tested this manually.
>
>> As I said, not an approval, and minor comments (see below), but still:
>>
>> Reviewed-by: Thomas Schwinge 
>>
>> Do we have to similarly also audit/alter other testsuite infrastructure
>> files, anything that uses '[glob [...]]'?  (..., and then generalize
>> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
>> incrementally, as far as I'm concerned.
>
> I also think it would make sense to adapt similar test commands as well.
>
>> May also make this more useful/explicit:
>>
>> This is useful if, for example, if a pass has several static
>> instances [correct terminology?], and depending on torture testing
>> command-line flags, a different instance executes and produces a dump
>> file, and so in the test case you can use a generic [put example
>> here] to scan the varying dump files names.
>>
>> (Or similar.)
>
> I have moved the explanation below the description of the individual
> commands and added an example. See the attached revised patch.
>
> Best regards,
> Frederik
>
> From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
> From: Frederik Harwath 
> Date: Fri, 15 May 2020 10:35:48 +0200
> Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior
>
> The test commands for scanning optimization dump files
> perform globbing on the argument that specifies the suffix
> of the dump files to be scanned.  This behavior is currently
> undocumented.  Furthermore, the current implementation of
> "scan-dump" and similar procedures yields an error whenever
> the file name globbing matches more than one file (due to an
> attempt to call "open" on multiple files) while a failure to
> match any file results in an unresolved test.
>
> This commit documents the globbing behavior.  The dump
> scanning procedures are changed to make the test unresolved
> if globbing matches more than one file.
>
> gcc/ChangeLog:
>
> 2020-05-19  Frederik Harwath  
>
>   * doc/sourcebuild.texi: Describe globbing of the
>   dump file scanning commands "suffix" argument.
>
> gcc/testsuite/ChangeLog:
>
> 2020-05-19  Frederik Harwath  
>
>   * lib/scandump.exp (glob-dump-file): New proc.
>   (scan-dump): Use glob-dump-file for file name expansion.
>   (scan-dump-times): Likewise.
>   (scan-dump-dem): Likewise.
>   (scan-dump-dem-not): Likewise.
>
> Reviewed-by: Thomas Schwinge 
> ---
>  gcc/doc/sourcebuild.texi   | 13 
>  gcc/testsuite/lib/scandump.exp | 54 +++---
>  2 files changed, 56 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 240d6e4b08e..9df4b06d460 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in 
> the dump file with
>  suffix @var{suffix}.
>  @end table
>
> +The @var{suffix} argument which describes the dump file to be scanned
> +may contain a glob pattern that must expand to exactly one file
> +name. This is useful if, e.g., different pass instances are executed
> +depending on torture testing command-line flags, producing dump files
> +whose names differ only in their pass instance number suffix.  For
> +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
> +occurrences of the string ``code has been optimized'', use:
> +@smallexample
> +/* @{ dg-options "-fdump-tree-mypass" @} */
> +/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" 
> @} @} */
> +@end smallexample
> +
> +
>  @subsubsection Check for output files
>
>  @table @code
> diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
> index d6ba350acc8..f3a991b590a 100644
> --- a/gcc/testsuite/lib/scandump.exp
> +++ b/gcc/testsuite/lib/scandump.exp
>

Re: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

2020-05-19 Thread Frederik Harwath
Martin Liška  writes:

Hi Martin,

> On 5/19/20 11:45 AM, Frederik Harwath wrote:
> Thank you Frederick for the patch.
>
> Looking at what I grepped:
> https://github.com/marxin/gcc-changelog/issues/1#issuecomment-621910248

I get a 404 error when I try to access this URL. The repository also
does not seem to be in your list of public repositories.


> Can you also add 'Signed-off-by'? And please create a list with these
> exceptions at the beginning of the script.

Yes, I will add it.

> Fine with that.

Best regards,
Frederik
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

2020-05-19 Thread Frederik Harwath
Hi,
the new contrib/gcc-changelog/git_check_commit.py script
(which, by the way, is very useful!) does not handle "Reviewed-by" and
"Reviewed-on" lines yet and hence it expects those lines to be indented
by a tab although those lines are usually not indented. The script
already knows about "Co-Authored-By" lines and I have extended it to
handle the "Reviewed-{by,on}" lines in a similar way. The information
from those lines is not processed further since the review information
apparently does not get included in the ChangeLogs.
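
To illustrate the idea outside of the script, here is a minimal standalone
sketch. It is not the code from git_commit.py: the two prefix constants
mirror the ones added by the patch, while is_review_line,
strip_review_lines and the sample lines are made up for this example.

```python
# Standalone sketch; only the two prefix constants come from the patch.
REVIEWED_BY_PREFIX = 'reviewed-by: '
REVIEWED_ON_PREFIX = 'reviewed-on: '


def is_review_line(line):
    """Return True if line is a Reviewed-by/Reviewed-on trailer."""
    # str.startswith accepts a tuple and matches any of its elements.
    return line.lower().startswith((REVIEWED_BY_PREFIX, REVIEWED_ON_PREFIX))


def strip_review_lines(lines):
    """Drop review trailers so that later checks never see them."""
    return [line for line in lines if not is_review_line(line)]


# Hypothetical commit message body.
sample = [
    '\t* gcc-changelog/git_commit.py: Skip review lines.',
    'Reviewed-by: Jane Doe <jane@example.com>',
    'Reviewed-on: https://example.com/review/1',
]
kept = strip_review_lines(sample)
```

Lowering the line once and testing it against a tuple of prefixes keeps the
check case-insensitive without needing a regular expression.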

Ok to commit the patch?

Best regards,
Frederik
From 0dc9b201bc1607de36cb9b3604a87cc3646292e3 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 19 May 2020 11:15:28 +0200
Subject: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

git_check_commit.py does not know about "Reviewed-by" and
"Reviewed-on" lines and hence it expects those lines which
follow the ChangeLog entries to be indented by a tab.

This commit makes the script skip those lines.  No further
processing is attempted because the review information
is not part of the ChangeLogs.

contrib/

2020-05-19  Frederik Harwath  

	* gcc-changelog/git_commit.py: Skip over lines starting
	with "Reviewed-by: " or "Reviewed-on: ".
---
 contrib/gcc-changelog/git_commit.py | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 5214cc36538..ebcf853f02f 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -150,6 +150,8 @@ star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)')
 LINE_LIMIT = 100
 TAB_WIDTH = 8
 CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
+REVIEWED_BY_PREFIX = 'reviewed-by: '
+REVIEWED_ON_PREFIX = 'reviewed-on: '
 
 
 class Error:
@@ -344,12 +346,19 @@ class GitCommit:
 else:
 pr_line = line.lstrip()
 
-if line.lower().startswith(CO_AUTHORED_BY_PREFIX):
+lowered_line = line.lower()
+if lowered_line.startswith(CO_AUTHORED_BY_PREFIX):
 name = line[len(CO_AUTHORED_BY_PREFIX):]
 author = self.format_git_author(name)
 self.co_authors.append(author)
 continue
 
+# Skip over review information for now.
+# This avoids errors due to missing tabs on these lines below.
+if lowered_line.startswith((REVIEWED_BY_PREFIX,\
+REVIEWED_ON_PREFIX)):
+continue
+
 # ChangeLog name will be deduced later
 if not last_entry:
 if author_tuple:
-- 
2.17.1



Re: testsuite: clarify scan-dump file globbing behavior

2020-05-19 Thread Frederik Harwath
Hi Thomas,

Thomas Schwinge  writes:

> I can't formally approve testsuite patches, but did a review anyway:

Thanks for the review!

> On 2020-05-15T12:31:54+0200, Frederik Harwath wrote:

>> The dump
>> scanning procedures are changed to make the test unresolved
>> if globbing matches more than one file.
>
> (The code changes look good, but I have not tested that specific aspect.)

We do not have automated tests for the testsuite commands :-), but I
have of course tested this manually.

> As I said, not an approval, and minor comments (see below), but still:
>
> Reviewed-by: Thomas Schwinge 
>
> Do we have to similarly also audit/alter other testsuite infrastructure
> files, anything that uses '[glob [...]]'?  (..., and then generalize
> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
> incrementally, as far as I'm concerned.

I also think it would make sense to adapt similar test commands as well.

> May also make this more useful/explicit:
>
> This is useful if, for example, a pass has several static
> instances [correct terminology?], and depending on torture testing
> command-line flags, a different instance executes and produces a dump
> file, and so in the test case you can use a generic [put example
> here] to scan the varying dump file names.
>
> (Or similar.)

I have moved the explanation below the description of the individual
commands and added an example. See the attached revised patch.

Best regards,
Frederik

From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 15 May 2020 10:35:48 +0200
Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned.  This behavior is currently
undocumented.  Furthermore, the current implementation of
"scan-dump" and similar procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file results in an unresolved test.

This commit documents the globbing behavior.  The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.

gcc/ChangeLog:

2020-05-19  Frederik Harwath  

	* doc/sourcebuild.texi: Describe globbing of the
	dump file scanning commands "suffix" argument.

gcc/testsuite/ChangeLog:

2020-05-19  Frederik Harwath  

	* lib/scandump.exp (glob-dump-file): New proc.
	(scan-dump): Use glob-dump-file for file name expansion.
	(scan-dump-times): Likewise.
	(scan-dump-dem): Likewise.
	(scan-dump-dem-not): Likewise.

Reviewed-by: Thomas Schwinge 
---
 gcc/doc/sourcebuild.texi   | 13 
 gcc/testsuite/lib/scandump.exp | 54 +++---
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 240d6e4b08e..9df4b06d460 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in the dump file with
 suffix @var{suffix}.
 @end table
 
+The @var{suffix} argument which describes the dump file to be scanned
+may contain a glob pattern that must expand to exactly one file
+name. This is useful if, e.g., different pass instances are executed
+depending on torture testing command-line flags, producing dump files
+whose names differ only in their pass instance number suffix.  For
+example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
+occurrences of the string ``code has been optimized'', use:
+@smallexample
+/* @{ dg-options "-fdump-tree-mypass" @} */
+/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" @} @} */
+@end smallexample
+
+
 @subsubsection Check for output files
 
 @table @code
diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index d6ba350acc8..f3a991b590a 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -39,6 +39,34 @@ proc dump-base { args } {
 return $dumpbase
 }
 
+# Expand dump file name pattern to exactly one file.
+# Return a single dump file name or an empty string
+# if the pattern matches no file or more than one file.
+#
+# Argument 0 is the testcase name
+# Argument 1 is the dump file glob pattern
+proc glob-dump-file { args } {
+
+set pattern [lindex $args 1]
+set dump_file "[glob -nocomplain $pattern]"
+set num_files [llength $dump_file]
+
+if { $num_files != 1 } {
+	set testcase [lindex $args 0]
+	i

testsuite: clarify scan-dump file globbing behavior

2020-05-15 Thread Frederik Harwath
Hi,

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned. This behavior is currently
undocumented. Furthermore, the current implementation of
"scan-dump" and related procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file at all results in an unresolved test.

This patch documents the globbing behavior. The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.
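
The intended rule ("exactly one match, otherwise nothing") can be sketched
independently of the testsuite. The following is a Python illustration
only, not the Tcl code from the patch; glob_one_file and the dump file
names are hypothetical.

```python
import glob
import os
import tempfile


def glob_one_file(pattern):
    """Return the single file matching pattern, or '' for zero or several matches."""
    matches = glob.glob(pattern)
    return matches[0] if len(matches) == 1 else ''


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical dump files: two instances of one pass, one of another.
    for name in ('t.c.041t.dce1', 't.c.105t.dce2', 't.c.020t.ssa'):
        open(os.path.join(tmp, name), 'w').close()

    base = glob.escape(tmp)  # keep the directory name itself literal
    one = os.path.basename(glob_one_file(os.path.join(base, 't.c.*.ssa')))
    many = glob_one_file(os.path.join(base, 't.c.*.dce*'))
    none = glob_one_file(os.path.join(base, 't.c.*.vrp*'))
```

Returning an empty string both for "no file" and for "several files" lets
the callers keep a single check and report the test as unresolved in
either case.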

The procedures in scandump.exp all perform the file name expansion in
essentially the same way and I have extracted this into a new
procedure. But there is one very minor exception:

> @@ -67,10 +95,10 @@ proc scan-dump { args } {
>  set dumpbase [dump-base $src [lindex $args 3]]
> -set output_file "[glob -nocomplain $dumpbase.[lindex $args 2]]"
> +
> +set pattern "$dumpbase.[lindex $args 2]"
> +set output_file "[glob-dump-file $testcase $pattern]"
>  if { $output_file == "" } {
> - verbose -log "$testcase: dump file does not exist"
> - verbose -log "dump file: $dumpbase.$suf"

"scan-dump" is the only procedure that prints the "dump file: ..." line.
Should this be kept or is it ok to remove this as I have done in the
patch? $dumpbase.$suf does not emit the correct file name anyway
(a random example from my testing: "dump file: stdatomic-init.c.dce*")
and the name of the files can be inferred from the test name easily.

I have tested the changes by running "make check" (with a
--enable-languages=C only build, but this covers lots of uses
of the affected test procedures) and observed no regressions.

Ok to commit this to master?

Best regards,
Frederik

From 6912e03d51d360dbbcf7eb1dc8d77d08c2a6e54c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 15 May 2020 10:35:48 +0200
Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned.  This behavior is currently
undocumented.  Furthermore, the current implementation of
"scan-dump" and similar procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file results in an unresolved test.

This commit documents the globbing behavior.  The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.

gcc/ChangeLog:

2020-05-15  Frederik Harwath  

	* doc/sourcebuild.texi: Describe globbing of the
	dump file scanning commands "suffix" argument.

gcc/testsuite/ChangeLog:

2020-05-15  Frederik Harwath  

	* lib/scandump.exp (glob-dump-file): New proc.
	(scan-dump): Use glob-dump-file for file name expansion.
	(scan-dump-times): Likewise.
	(scan-dump-dem): Likewise.
	(scan-dump-dem-not): Likewise.
---
 gcc/doc/sourcebuild.texi   |  4 ++-
 gcc/testsuite/lib/scandump.exp | 54 +++---
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 240d6e4b08e..b6c5a21cb71 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2888,7 +2888,9 @@ stands for zero or more unmatched lines; the whitespace after
 
 These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree},
 @code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa}, and
-@code{wpa-ipa}.
+@code{wpa-ipa}.  The @var{suffix} argument which describes the dump file
+to be scanned may contain a glob pattern that must expand to exactly one
+file name.
 
 @table @code
 @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail @var{selector} @}]
diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index d6ba350acc8..f3a991b590a 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -39,6 +39,34 @@ proc dump-base { args } {
 return $dumpbase
 }
 
+# Expand dump file name pattern to exactly one file.
+# Return a single dump file name or an empty string
+# if the pattern matches no file or more than one file.
+#
+# Argument 0 is the testcase name
+# Argument 1 is the dump file glob pattern
+proc glob-dump-file { args } {
+
+set pattern [lindex $args 1]
+set dump_file "[glob -nocomplain $pattern]"
+set num_files [llength $dump_file]
+
+if { $num_files != 1 } {
+	set testcase

Re: [og8] Report errors on missing OpenACC reduction clauses in nested reductions

2020-04-21 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

> Via <https://gcc.gnu.org/PR94629> "10 issues located by the PVS-studio
> static analyzer" (so please reference that one on any patch submission),
> on <https://habr.com/en/company/pvs-studio/blog/497640/> in "Fragment N3,
> Assigning a variable to itself", we find this latter assignment qualified
> as "very strange to assign a variable to itself".
>
> Probably that should've been 'outer_ctx' instead of 'ctx'?

I agree that the original intention must have been to assign the
outer_ctx's "outer_reduction_clauses" to the corresponding field of the
inner "ctx". This would make sense, semantically. But this field is
meant to be used by the function "scan_omp_for" only and ...

> then does the current algorithm still work despite this error?

... this function never requires the struct field to be initialized in
that way.  Before the field is used, it always copies the clauses from
the outer context's outer_reduction_clauses to ctx->outer_reduction_clauses:

>> +  if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL)
>> +ctx->outer_reduction_clauses
>> +  = chainon (unshare_expr (ctx->outer->local_reduction_clauses),
>> + ctx->outer->outer_reduction_clauses);

Hence I found it preferable to remove the assignment to the
"outer_reduction_clauses" field and the "local_reduction_clauses" field
from "new_omp_context" completely. (The fields are still zero-initialized
by the allocation of the struct which uses XCNEW.) That way the whole
logic regarding the fields is now contained in "scan_omp_for".
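
To make the argument concrete, here is a small Python model of the lazy
initialization quoted above. It is only a sketch: Context and scan_for are
hypothetical stand-ins for omp_context and scan_omp_for, and plain lists
replace the tree chains.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    # Hypothetical stand-in for omp_context.
    outer: Optional['Context'] = None
    local_reduction_clauses: List[str] = field(default_factory=list)
    # None plays the role of the zero-initialized (NULL) field.
    outer_reduction_clauses: Optional[List[str]] = None


def scan_for(ctx):
    """Fill ctx.outer_reduction_clauses on first use, like the quoted snippet."""
    if ctx.outer_reduction_clauses is None and ctx.outer is not None:
        inherited = ctx.outer.outer_reduction_clauses or []
        # Copy the outer context's local clauses, then chain what it inherited.
        ctx.outer_reduction_clauses = (
            list(ctx.outer.local_reduction_clauses) + inherited)
    return ctx.outer_reduction_clauses


top = Context(local_reduction_clauses=['+:a'])
mid = Context(outer=top, local_reduction_clauses=['*:b'])
inner = Context(outer=mid)
scan_for(top)
scan_for(mid)
result = scan_for(inner)
```

No constructor has to touch the field in this model: it starts out empty
and the scanning function derives it from the outer context when needed,
which is why the assignments in new_omp_context are redundant.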

I have executed "make check" (on x86_64-linux-gnu) to verify that the
change causes no regressions. Ok to push the commit to master?

Best regards,
Frederik
From 2d60b374a44b212ff97c8b1fd6f8c39e478dc70f Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 21 Apr 2020 12:36:14 +0200
Subject: [PATCH] Remove fishy self-assignment in omp-low.c [PR94629]

The PR noticed that omp-low.c contains a self-assignment in the 
function new_omp_context:

if (outer_ctx) {
...
ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;

This is obviously useless.  The original intention might have been
to copy the field from the outer_ctx to ctx.  Since this is done
(properly) in the only function where this field is actually used
(in function scan_omp_for) and the field is being initialized to zero
during the struct allocation, there is no need to attempt to do
anything to this field in new_omp_context. Thus this commit
removes any assignment to the field from new_omp_context.

2020-04-21  Frederik Harwath  

	PR other/94629
	* gcc/omp-low.c (new_omp_context): Remove assignments to
	ctx->outer_reduction_clauses and ctx->local_reduction_clauses.
---
 gcc/omp-low.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 67565d61400..88f23e60d34 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -128,10 +128,16 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map<tree, tree> *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
+  /* A tree_list of the reduction clauses in this context. This is
+only used for checking the consistency of OpenACC reduction
+clauses in scan_omp_for and is not guaranteed to contain a valid
+value outside of this function. */
   tree local_reduction_clauses;
 
-  /* A tree_list of the reduction clauses in outer contexts.  */
+  /* A tree_list of the reduction clauses in outer contexts. This is
+only used for checking the consistency of OpenACC reduction
+clauses in scan_omp_for and is not guaranteed to contain a valid
+value outside of this function. */
   tree outer_reduction_clauses;
 
   /* Nesting depth of this context.  Used to beautify error messages re
@@ -931,8 +937,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->outer = outer_ctx;
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
-  ctx->local_reduction_clauses = NULL;
-  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
   ctx->depth = outer_ctx->depth + 1;
 }
   else
@@ -948,8 +952,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
-  ctx->local_reduction_clauses = NULL;
-  ctx->outer_reduction_clauses = NULL;
   ctx->depth = 1;
 }
 
-- 
2.17.1



Re: [og9] Really fix og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof"

2020-03-27 Thread Frederik Harwath


Hi Thomas,

Thomas Schwinge  writes:

> On 2020-03-25T18:09:25+0100, I wrote:
>> On 2018-02-22T12:23:25+0100, Tom de Vries  wrote:
>>> when using cuda 9 nvprof with an openacc executable, the executable hangs.
>
>> What Frederik has discovered today in the hard way... [...]
>> -- the hang was back. [...]
> ..., and now the attached patch to devel/omp/gcc-9 in commit
> 775f1686a3df68bd20370f1fabc6273883e2c5d2 'Really fix og9 "Fix hang when
> running oacc exec with CUDA 9.0 nvprof"'.

Thanks for fixing this issue! I can confirm that nvprof now works on
code compiled from devel/omp/gcc-9. I have used nvprof 9.1.85 on Ubuntu
18.04 for testing.

Best regards,
Frederik


Re: [C/C++, OpenACC] Reject vars of different scope in acc declare (PR94120)

2020-03-12 Thread Frederik Harwath
Tobias Burnus  writes:

Hi Tobias,

> Fortran patch: https://gcc.gnu.org/pipermail/gcc-patches/current/541774.html
>
> "A declare directive must be in the same scope
>   as the declaration of any var that appears in
>   the data clauses of the directive."
>
> ("A declare directive is used […] following a variable
>declaration in C or C++".)
>
> NOTE for C++: This patch assumes that variables in a namespace
> are handled in the same way as those which are at
> global (namespace) scope; however, the OpenACC specification's
> wording currently is "In C or C++ global scope, only …".
> Hence, one can argue about this part of the patch; but as
> it fixes an ICE and is a very sensible extension – the other
> option is to reject it – I believe it is fine.
> (On the OpenACC side, this is now Issue 288.)

Sounds reasonable to me.

> +bool
> +c_check_oacc_same_scope (tree decl)
> +{
> +  struct c_binding *b = I_SYMBOL_BINDING (DECL_NAME (decl));
> +  return b != NULL && B_IN_CURRENT_SCOPE (b);
> +}

Is the function really specific to OpenACC? If not, then "_oacc"
could be dropped from its name. How about "c_check_current_scope"?

> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 24f71671469..8f09eb0d375 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> [...]
> -   if (global_bindings_p ())
> +   if (current_binding_level->kind == sk_namespace)
> [...]
> -  if (error || global_bindings_p ())
> +  if (error || current_binding_level->kind == sk_namespace)
>  return NULL_TREE;

So - just to be sure - the new namespace condition subsumes the old
"global_bindings_p" condition because the global scope is also a namespace,
right? Yes, now I see that you have a test case that demonstrates that
the declare directive still works for global variables with those changes.

> diff --git a/gcc/testsuite/g++.dg/declare-pr94120.C b/gcc/testsuite/g++.dg/declare-pr94120.C
> new file mode 100644
> index 000..8515c4ff875
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/declare-pr94120.C
> @@ -0,0 +1,30 @@
> +/* { dg-do compile }  */
> +
> +/* PR middle-end/94120  */
> +
> +int b[8];
> +#pragma acc declare create (b)

Looks good to me.

Frederik


[PATCH, committed][OpenACC] Adapt libgomp acc_get_property.f90 test

2020-02-21 Thread Harwath, Frederik
Hi,
The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

This obvious patch fixes that problem. Committed as 
r10-6782-g83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a.

Best regards,
Frederik
From 83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 21 Feb 2020 15:26:02 +0100
Subject: [PATCH] Adapt libgomp acc_get_property.f90 test

The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

2020-02-21  Frederik Harwath  

	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
	integer(acc_device_property) for the type of the return value of
	acc_get_property.
---
 libgomp/ChangeLog  | 7 +++
 .../testsuite/libgomp.oacc-fortran/acc_get_property.f90| 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 3c640c7350b..bff3ae58c9a 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-21  Frederik Harwath  
+
+	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
+	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
+	integer(acc_device_property) for the type of the return value of
+	acc_get_property.
+
 2020-02-19  Tobias Burnus  
 
 	* .gitattributes: New; whitespace handling for Fortran's openacc_lib.h.
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
index 80ae292f41f..1af7cc3b988 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
@@ -26,13 +26,14 @@ end program test
 ! and do basic device independent validation.
 subroutine print_device_properties (device_type)
   use openacc
+  use iso_c_binding, only: c_size_t
   implicit none
 
   integer, intent(in) :: device_type
 
   integer :: device_count
   integer :: device
-  integer(acc_device_property) :: v
+  integer(c_size_t) :: v
   character*256 :: s
 
   device_count = acc_get_num_devices(device_type)
-- 
2.17.1


