[PATCH 2/6] [og9] OpenACC middle-end worker-partitioning support

2019-09-04 Julian Brown
This patch implements worker-partitioning support in the middle end,
by rewriting gimple. The OpenACC execution model requires that code can
run in either "worker single" mode where only a single worker per gang
is active, or "worker partitioned" mode, where multiple workers per gang
are active. This means we need to do something equivalent to
spawning additional workers when transitioning from worker-single to
worker-partitioned mode. However, GPUs typically fix the number of threads
of invoked kernels at launch time, so we need to do something with the
"extra" threads when they are not wanted.

The scheme used is -- very briefly! -- to make each basic block that
executes in "worker single" mode conditional, so that only worker 0 runs
it. Conditional branches are handled specially so that "idle" (non-0)
workers follow along with
worker 0. On transitioning to "worker partitioned" mode, any variables
modified by worker 0 are propagated to the other workers via GPU shared
memory. Special care is taken for routine calls, writes through pointers,
and so forth.
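
The scheme described above can be illustrated with a small host-side
simulation. This is only a sketch under stated assumptions, not the code
in omp-sese.c: the names (propagation_record, worker_state, simulate_gang,
partitioned_sum) are hypothetical, workers are modelled by sequential
loops rather than GPU threads, and a plain struct stands in for GPU shared
memory.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WORKERS 4

/* Hypothetical propagation record: stands in for the block of GPU shared
   memory used to pass worker 0's values to the other workers.  */
struct propagation_record
{
  int x;
  int take_branch;
};

/* Per-worker private copies of the same variables.  */
struct worker_state
{
  int x;
  int take_branch;
};

/* A "worker single" basic block: only worker 0 runs the real body, the
   other workers are idle and leave their state untouched.  */
static void
worker_single_block (struct worker_state *w, int worker_id, int input)
{
  if (worker_id == 0)
    {
      w->x = input * 2 + 1;
      w->take_branch = w->x > 10;
    }
}

/* Simulate one gang.  Each loop over worker IDs stands in for threads
   running concurrently; the end of a loop stands in for a barrier.  */
static void
simulate_gang (struct worker_state workers[NUM_WORKERS], int input)
{
  struct propagation_record shared;

  assert (workers != NULL);

  /* Worker-single phase.  */
  for (int id = 0; id < NUM_WORKERS; id++)
    worker_single_block (&workers[id], id, input);

  /* Transition to worker-partitioned mode: worker 0 publishes the
     variables it modified...  */
  shared.x = workers[0].x;
  shared.take_branch = workers[0].take_branch;

  /* ...and, after a barrier, every other worker copies them back.  */
  for (int id = 1; id < NUM_WORKERS; id++)
    {
      workers[id].x = shared.x;
      workers[id].take_branch = shared.take_branch;
    }
}

/* Worker-partitioned phase: every worker is active and contributes a
   value derived from the broadcast X and its own ID.  */
static int
partitioned_sum (const struct worker_state workers[NUM_WORKERS])
{
  int sum = 0;
  for (int id = 0; id < NUM_WORKERS; id++)
    sum += workers[id].x + id;
  return sum;
}
```

After simulate_gang returns, all workers hold the same x and take_branch
values, which is what lets the idle workers follow worker 0 through
conditional branches.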

Much of omp-sese.c originates from code written for NVPTX by Nathan
Sidwell (adapted to work on gimple instead of RTL) -- though at present,
only the per-basic-block scheme is implemented, and the SESE-finding
algorithm isn't yet used.

Julian

ChangeLog

gcc/
* Makefile.in (OBJS): Add omp-sese.o.
* omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START,
BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New
builtins.
* omp-offload.c (omp-sese.h): Include header.
(oacc_loop_xform_head_tail): Call update_stmt for modified builtin
calls.
(oacc_loop_process): Likewise.
(default_goacc_create_propagation_record): New default implementation
for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook.
(execute_oacc_loop_designation): New.  Split out of oacc_device_lower.
(execute_oacc_gimple_workers): New.  Likewise.
(execute_oacc_device_lower): Recreate dims array.
(pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New.
(pass_oacc_loop_designation, pass_oacc_gimple_workers): New.
(make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New.
* omp-offload.h (oacc_fn_attrib_level): Add prototype.
* omp-sese.c: New file.
* omp-sese.h: New file.
* passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers):
Add passes.
* target.def (worker_partitioning, create_propagation_record): Add
target hooks.
* targhooks.h (default_goacc_create_propagation_record): Add prototype.
* tree-pass.h (make_pass_oacc_loop_designation,
make_pass_oacc_gimple_workers): Add prototypes.
* doc/tm.texi.in (TARGET_GOACC_WORKER_PARTITIONING,
TARGET_GOACC_CREATE_PROPAGATION_RECORD): Add documentation hooks.
* doc/tm.texi: Regenerate.
---
 gcc/ChangeLog.openacc |   32 +
 gcc/Makefile.in   |1 +
 gcc/doc/tm.texi   |   10 +
 gcc/doc/tm.texi.in|4 +
 gcc/omp-builtins.def  |8 +
 gcc/omp-offload.c |  159 +++-
 gcc/omp-offload.h |1 +
 gcc/omp-sese.c| 2036 +
 gcc/omp-sese.h|   26 +
 gcc/passes.def|2 +
 gcc/target.def|   13 +
 gcc/targhooks.h   |1 +
 gcc/tree-pass.h   |2 +
 13 files changed, 2276 insertions(+), 19 deletions(-)
 create mode 100644 gcc/omp-sese.c
 create mode 100644 gcc/omp-sese.h

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index b1c627b394c..a2b2dcfcf26 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,35 @@
+2019-09-05  Julian Brown  
+
+   * Makefile.in (OBJS): Add omp-sese.o.
+   * omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START,
+BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New
+   builtins.
+   * omp-offload.c (omp-sese.h): Include header.
+   (oacc_loop_xform_head_tail): Call update_stmt for modified builtin
+   calls.
+   (oacc_loop_process): Likewise.
+   (default_goacc_create_propagation_record): New default implementation
+   for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook.
+   (execute_oacc_loop_designation): New.  Split out of oacc_device_lower.
+   (execute_oacc_gimple_workers): New.  Likewise.
+   (execute_oacc_device_lower): Recreate dims array.
+   (pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New.
+   (pass_oacc_loop_designation, pass_oacc_gimple_workers): New.
+   (make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New.
+   * omp-offload.h (oacc_fn_attrib_level): Add prototype.
+   * omp-sese.c: New file.
+   * omp-sese.h: New file.
+   * passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers):
+

[PATCH 4/6] [og9] Fix up tests for oaccdevlow pass splitting

2019-09-04 Julian Brown
This patch adjusts some tests after the splitting of the oaccdevlow pass
into three passes.

Julian

ChangeLog

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c,
c-c++-common/goacc/classify-kernels.c,
c-c++-common/goacc/classify-parallel.c,
c-c++-common/goacc/classify-routine.c,
gfortran.dg/goacc/classify-kernels-unparallelized.f95,
gfortran.dg/goacc/classify-kernels.f95,
gfortran.dg/goacc/classify-parallel.f95,
gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of
oaccdevlow pass.
---
 gcc/testsuite/ChangeLog.openacc  | 12 
 .../goacc/classify-kernels-unparallelized.c  |  8 
 gcc/testsuite/c-c++-common/goacc/classify-kernels.c  |  8 
 gcc/testsuite/c-c++-common/goacc/classify-parallel.c |  8 
 gcc/testsuite/c-c++-common/goacc/classify-routine.c  |  8 
 .../goacc/classify-kernels-unparallelized.f95|  8 
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 |  8 
 .../gfortran.dg/goacc/classify-parallel.f95  |  8 
 gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 |  8 
 9 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 8295fe61ba7..899b9cf1783 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,15 @@
+2019-09-05  Julian Brown  
+
+   * c-c++-common/goacc/classify-kernels-unparallelized.c,
+   c-c++-common/goacc/classify-kernels.c,
+   c-c++-common/goacc/classify-parallel.c,
+   c-c++-common/goacc/classify-routine.c,
+   gfortran.dg/goacc/classify-kernels-unparallelized.f95,
+   gfortran.dg/goacc/classify-kernels.f95,
+   gfortran.dg/goacc/classify-parallel.f95,
+   gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of
+   oaccdevlow pass.
+
 2019-07-10  Julian Brown  
 
* c-c++-common/goacc/mdc-1.c: Update clause matching patterns.
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 9dad2de504c..f05fba9d31b 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -5,7 +5,7 @@
{ dg-additional-options "-fopt-info-optimized-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+   { dg-additional-options "-fdump-tree-oaccloops" } */
 
 #define N 1024
 
@@ -36,6 +36,6 @@ void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index f1d46130685..009db79b018 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -5,7 +5,7 @@
{ dg-additional-options "-fopt-info-optimized-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+   { dg-additional-options "-fdump-tree-oaccloops" } */
 
 #define N 1024
 
@@ -31,6 +31,6 @@ void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc ker

[PATCH 1/6] [og9] Target-dependent gang-private variable decl rewriting

2019-09-04 Julian Brown
This patch adds support for rewriting variables marked up with the "oacc
gangprivate" attribute in a target-dependent way in the oaccdevlow pass
of the offload compiler.

This behaviour is controlled by a new target hook,
TARGET_GOACC_ADJUST_GANGPRIVATE_DECL. This is conceptually similar to
the existing TARGET_GOACC_EXPAND_ACCEL_VAR hook, but that one works too
late in the compilation process for AMD GCN.

The patch to set the "oacc gangprivate" attribute was posted upstream here:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

A version of that is already present on the og9 branch.

Julian

ChangeLog

gcc/
* omp-offload.c (convert.h): Include.
(struct addr_expr_rewrite_info): Add struct.
(rewrite_addr_expr): New function.
(is_sync_builtin_call): New function.
(execute_oacc_device_lower): Support rewriting gang-private variables
using target hook, and fix up addr_expr nodes afterwards.
* target.def (adjust_gangprivate_decl): New target hook.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new
target hook.
* doc/tm.texi: Regenerate.
---
 gcc/ChangeLog.openacc |  13 +
 gcc/doc/tm.texi   |   4 ++
 gcc/doc/tm.texi.in|   2 +
 gcc/omp-offload.c | 133 ++
 gcc/target.def|   6 ++
 5 files changed, 158 insertions(+)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index a22f07c817c..b1c627b394c 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,16 @@
+2019-09-05  Julian Brown  
+
+   * omp-offload.c (convert.h): Include.
+   (struct addr_expr_rewrite_info): Add struct.
+   (rewrite_addr_expr): New function.
+   (is_sync_builtin_call): New function.
+   (execute_oacc_device_lower): Support rewriting gang-private variables
+   using target hook, and fix up addr_expr nodes afterwards.
+   * target.def (adjust_gangprivate_decl): New target hook.
+   * doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new
+   target hook.
+   * doc/tm.texi: Regenerate.
+
 2019-08-13  Julian Brown  
 
* omp-oacc-kernels.c (add_wait): New function, split out of...
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9b88498eb95..f3707c6abe3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6162,6 +6162,10 @@ memories.  A return value of NULL indicates that the target does not
 handle this VAR_DECL, and normal RTL expanding is resumed.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_GANGPRIVATE_DECL (tree @var{var})
+Tweak variable declaration for a gang-private variable.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_EXPLODE_ARGS (void)
 Define this hook to TRUE if arguments to offload regions should be
 exploded, i.e. passed as true arguments rather than in an argument array.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index c9c4341a35f..cebadf4a502 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,6 +4210,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_GOACC_EXPAND_ACCEL_VAR
 
+@hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL
+
 @hook TARGET_GOACC_EXPLODE_ARGS
 
 @node Anchored Addresses
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 1129b00511e..c94dc956d7e 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "cfgloop.h"
+#include "convert.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
function is held in a 'NULL' loop.  */
@@ -1570,6 +1571,78 @@ maybe_discard_oacc_function (tree decl)
   return false;
 }
 
+struct addr_expr_rewrite_info
+{
+  gimple *stmt;
+  hash_set<tree> *adjusted_vars;
+  bool avoid_pointer_conversion;
+  bool modified;
+};
+
+static tree
+rewrite_addr_expr (tree *tp, int *walk_subtrees, void *data)
+{
+  walk_stmt_info *wi = (walk_stmt_info *) data;
+  addr_expr_rewrite_info *info = (addr_expr_rewrite_info *) wi->info;
+
+  if (TREE_CODE (*tp) == ADDR_EXPR)
+{
+  tree arg = TREE_OPERAND (*tp, 0);
+
+  if (info->adjusted_vars->contains (arg))
+   {
+ if (info->avoid_pointer_conversion)
+   {
+ *tp = build_fold_addr_expr (arg);
+ info->modified = true;
+ *walk_subtrees = 0;
+   }
+ else
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
+ tree repl = build_fold_addr_expr (arg);
+ gimple *stmt1
+   = gimple_build_assign (make_ssa_name (TREE_TYPE (repl)), repl);
+ tree conv = convert_to_pointer (TREE_TYPE (*tp),
+ gimple_assign_lhs (stmt1));
+ gimple *stmt

[PATCH 3/6] [og9] AMD GCN adjustments for middle-end worker partitioning

2019-09-04 Julian Brown
This patch renames the TARGET_GOACC_ADJUST_PROPAGATION_RECORD
hook introduced in the GCN backend by a previous merge to
TARGET_GOACC_CREATE_PROPAGATION_RECORD, and removes a FIXME relating to
missing worker-partitioning support.

Julian

ChangeLog

gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
prototype to...
(gcn_goacc_create_propagation_record): This.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
function to...
(gcn_goacc_create_propagation_record): This.  Adjust comment.
* config/gcn/gcn.c (gcn_init_builtins): Override decls for
BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
(gcn_fork_join): Remove inaccurate comment.
(TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to...
(TARGET_GOACC_CREATE_PROPAGATION_RECORD): This.
---
 gcc/ChangeLog.openacc   | 15 +++
 gcc/config/gcn/gcn-protos.h |  2 +-
 gcc/config/gcn/gcn-tree.c   |  6 +++---
 gcc/config/gcn/gcn.c| 11 +++
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index a2b2dcfcf26..0d068ac8ae2 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,18 @@
+2019-09-05  Julian Brown  
+
+   * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
+   prototype to...
+   (gcn_goacc_create_propagation_record): This.
+   * config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
+   function to...
+   (gcn_goacc_create_propagation_record): This.  Adjust comment.
+   * config/gcn/gcn.c (gcn_init_builtins): Override decls for
+BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
+BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
+   (gcn_fork_join): Remove inaccurate comment.
+   (TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to...
+   (TARGET_GOACC_CREATE_PROPAGATION_RECORD): This.
+
 2019-09-05  Julian Brown  
 
* Makefile.in (OBJS): Add omp-sese.o.
diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index da7faf29c70..1711862c6a2 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -37,7 +37,7 @@ extern rtx gcn_full_exec ();
 extern rtx gcn_full_exec_reg ();
 extern rtx gcn_gen_undef (machine_mode);
 extern bool gcn_global_address_p (rtx);
-extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
+extern tree gcn_goacc_create_propagation_record (tree record_type, bool sender,
 const char *name);
 extern void gcn_goacc_adjust_gangprivate_decl (tree var);
 extern void gcn_goacc_reduction (gcall *call);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index c6b6302e9ed..04902a39b29 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -667,12 +667,12 @@ gcn_goacc_reduction (gcall *call)
 }
 }
 
-/* Implement TARGET_GOACC_ADJUST_PROPAGATION_RECORD.
+/* Implement TARGET_GOACC_CREATE_PROPAGATION_RECORD.
  
-   Tweak (worker) propagation record, e.g. to put it in shared memory.  */
+   Create (worker) propagation record in shared memory.  */
 
 tree
-gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
+gcn_goacc_create_propagation_record (tree record_type, bool sender,
 const char *name)
 {
   tree type = record_type;
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index f3f112d95a9..ca9321b5f25 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3468,8 +3468,6 @@ gcn_init_builtins (void)
   TREE_NOTHROW (gcn_builtin_decls[i]) = 1;
 }
 
-/* FIXME: remove the ifdef once OpenACC support is merged upstream.  */
-#ifdef BUILT_IN_GOACC_SINGLE_START
   /* These builtins need to take/return an LDS pointer: override the generic
  versions here.  */
 
@@ -3486,7 +3484,6 @@ gcn_init_builtins (void)
 
   set_builtin_decl (BUILT_IN_GOACC_BARRIER,
gcn_builtin_decls[GCN_BUILTIN_ACC_BARRIER], false);
-#endif
 }
 
 /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
@@ -4765,8 +4762,6 @@ static bool
 gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims),
   bool ARG_UNUSED (is_fork))
 {
-  /* GCN does not use the fork/join concept invented for NVPTX.
- Instead we use standard autovectorization.  */
   return false;
 }
 
@@ -6029,9 +6024,9 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_FUNCTION_VALUE_REGNO_P gcn_function_value_regno_p
 #undef  TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR gcn_gimplify_va_arg_expr
-#undef  TARGET_GOACC_ADJUST_PROPAGATION_RECORD
-#define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \
-  gcn_goacc_adjust_propagation_record
+#undef

[PATCH 0/6] [og9] OpenACC worker partitioning in middle end (AMD GCN)

2019-09-04 Julian Brown
This patch series provides support for worker partitioning in the middle
end. The OpenACC device-lowering pass (oaccdevlow) is split into three
passes: the first assigns parallelism levels to loops, the second (new)
part rewrites basic blocks to implement a neutering/broadcasting scheme
for the OpenACC worker-partitioned execution mode, and the third part
performs the rest of the previous device-lowering pass.

Also included are patches to add support for placing gang-private
variables in special memory (e.g. LDS, "local-data share", on AMD GCN),
and to rewrite reductions targeting reference variables to use temporary
local scalar variables instead.

Further commentary is provided alongside individual patches.

Tested with offloading to AMD GCN. I will apply to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

Julian Brown (6):
  [og9] Target-dependent gang-private variable decl rewriting
  [og9] OpenACC middle-end worker-partitioning support
  [og9] AMD GCN adjustments for middle-end worker partitioning
  [og9] Fix up tests for oaccdevlow pass splitting
  [og9] Reference reduction localization
  [og9] Enable worker partitioning for AMD GCN

 gcc/ChangeLog.openacc |   83 +
 gcc/Makefile.in   |1 +
 gcc/config/gcn/gcn-protos.h   |2 +-
 gcc/config/gcn/gcn-tree.c |6 +-
 gcc/config/gcn/gcn.c  |   15 +-
 gcc/config/gcn/gcn.opt|2 +-
 gcc/doc/tm.texi   |   14 +
 gcc/doc/tm.texi.in|6 +
 gcc/gimplify.c|  102 +
 gcc/omp-builtins.def  |8 +
 gcc/omp-low.c |   47 +-
 gcc/omp-offload.c |  290 ++-
 gcc/omp-offload.h |1 +
 gcc/omp-sese.c| 2036 +
 gcc/omp-sese.h|   26 +
 gcc/passes.def|2 +
 gcc/target.def|   19 +
 gcc/targhooks.h   |1 +
 gcc/testsuite/ChangeLog.openacc   |   12 +
 .../goacc/classify-kernels-unparallelized.c   |8 +-
 .../c-c++-common/goacc/classify-kernels.c |8 +-
 .../c-c++-common/goacc/classify-parallel.c|8 +-
 .../c-c++-common/goacc/classify-routine.c |8 +-
 .../goacc/classify-kernels-unparallelized.f95 |8 +-
 .../gfortran.dg/goacc/classify-kernels.f95|8 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |8 +-
 .../gfortran.dg/goacc/classify-routine.f95|8 +-
 gcc/tree-core.h   |4 +-
 gcc/tree-pass.h   |2 +
 gcc/tree.c|   11 +-
 gcc/tree.h|2 +
 libgomp/ChangeLog.openacc |5 +
 libgomp/plugin/plugin-gcn.c   |4 +-
 33 files changed, 2660 insertions(+), 105 deletions(-)
 create mode 100644 gcc/omp-sese.c
 create mode 100644 gcc/omp-sese.h

-- 
2.22.0



[PATCH] [og9] Fix libgomp.oacc-fortran/lib-13.f90 async bug

2019-09-04 Julian Brown
This patch fixes a bug in the lib-13.f90 test -- an asynchronous
compute region inside a synchronous data region leads to a data race
when copying out or unmapping target data.

This test failed intermittently for AMD GCN. I will apply to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
wait API calls.
---
 libgomp/ChangeLog.openacc | 5 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index f9d8e6ecd39..c7ef40e922c 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-05  Julian Brown  
+
+   * testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
+   wait API calls.
+
 2019-08-13  Julian Brown  
 
* plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
index da944c35de9..ea35d71b789 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -19,11 +19,10 @@ program main
 end do
   !$acc end parallel
 end do
-  !$acc end data
 
   call acc_wait_all_async (nprocs + 1)
-
   call acc_wait (nprocs + 1)
+  !$acc end data
 
   if (acc_async_test (1) .neqv. .TRUE.) call abort
   if (acc_async_test (2) .neqv. .TRUE.) call abort
-- 
2.22.0



[PATCH 3/3] [og9] Wait on queue-full condition in AMD GCN libgomp offloading plugin

2019-08-13 Julian Brown
This patch lets the AMD GCN libgomp plugin wait for a full asynchronous
queue to have space for a new operation, rather than raising a fatal
error as soon as the queue fills up. This fixes the
libgomp.oacc-c-c++-common/da-4.c test.
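
The general pattern of waiting on a queue-full condition can be sketched
as follows. This is a generic bounded queue, not the plugin's code: the
names (bounded_queue, queue_push, queue_pop, QUEUE_SIZE) are hypothetical,
and unlike the patch below this sketch tests the full condition only while
holding the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SIZE 2  /* Hypothetical capacity, kept tiny for illustration.  */

struct bounded_queue
{
  int entries[QUEUE_SIZE];
  int first;                 /* Index of the oldest entry.  */
  int count;                 /* Number of entries currently queued.  */
  pthread_mutex_t mutex;
  pthread_cond_t not_full;   /* Signalled whenever an entry is removed.  */
};

static void
queue_init (struct bounded_queue *q)
{
  q->first = 0;
  q->count = 0;
  pthread_mutex_init (&q->mutex, NULL);
  pthread_cond_init (&q->not_full, NULL);
}

/* Push VALUE, blocking (instead of failing) while the queue is full.  */
static void
queue_push (struct bounded_queue *q, int value)
{
  pthread_mutex_lock (&q->mutex);

  /* Re-check in a loop: condition waits may wake up spuriously.  */
  while (q->count == QUEUE_SIZE)
    pthread_cond_wait (&q->not_full, &q->mutex);

  q->entries[(q->first + q->count) % QUEUE_SIZE] = value;
  q->count++;
  pthread_mutex_unlock (&q->mutex);
}

/* Pop the oldest entry and wake one producer waiting for space.  The
   caller must ensure the queue is not empty.  */
static int
queue_pop (struct bounded_queue *q)
{
  pthread_mutex_lock (&q->mutex);
  assert (q->count > 0);
  int value = q->entries[q->first];
  q->first = (q->first + 1) % QUEUE_SIZE;
  q->count--;
  pthread_cond_signal (&q->not_full);
  pthread_mutex_unlock (&q->mutex);
  return value;
}
```

In a real producer/consumer setup queue_pop would run on a separate
thread, so a producer blocked in queue_push is released as soon as the
consumer drains an entry.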

Julian

ChangeLog

libgomp/
* plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
condition.
---
 libgomp/ChangeLog.openacc   |  5 +
 libgomp/plugin/plugin-gcn.c | 11 +--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 2a9a7f18ca2..f9d8e6ecd39 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-08-13  Julian Brown  
+
+   * plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
+   condition.
+
 2019-08-13  Julian Brown  
 
* plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 65690e643ed..099f70b647c 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1416,8 +1416,15 @@ queue_push_callback (struct goacc_asyncqueue *aq, void (*fn)(void *),
 void *data)
 {
   if (aq->queue_n == ASYNC_QUEUE_SIZE)
-GOMP_PLUGIN_fatal ("Async thread %d:%d: error: queue overflowed",
-  aq->agent->device_id, aq->id);
+{
+  pthread_mutex_lock (&aq->mutex);
+
+  /* Queue is full.  Wait for it to not be full.  */
+  while (aq->queue_n == ASYNC_QUEUE_SIZE)
+   pthread_cond_wait (&aq->queue_cond_out, &aq->mutex);
+
+  pthread_mutex_unlock (&aq->mutex);
+}
 
   pthread_mutex_lock (&aq->mutex);
 
-- 
2.22.0



[PATCH 2/3] [og9] Use temporary buffers for async host2dev copies

2019-08-13 Julian Brown
In libgomp, host-to-device transfers are initiated in several places
where the source data is either on the stack or in an unstable heap
location (i.e. one that is freed immediately after performing the
host-to-device transfer).

When the transfer is asynchronous, this means that taking the address
of source data and attempting the copy from that at some later point
is extremely likely to fail. A previous fix for this problem (from our
internal branch, and included with the AMD GCN offloading patches)
attempted to separate transfers from the stack (performing them
immediately) from transfers from the heap (which can safely be done some
time later).

Unfortunately that doesn't work well with more recent changes to libgomp
and the GCN plugin. So instead, this patch copies the source data for
asynchronous host-to-device copies immediately to a temporary buffer, so
that the transfer to the device can safely take place asynchronously
some time later.
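
The snapshot-then-defer idea can be sketched in plain C as below. This is
an illustration only, not the plugin code: the names (deferred_copy,
defer_copy, run_deferred_copy, owns_src) are hypothetical, and memcpy
stands in for the real transfer to the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for one copy that will run some time later.  */
struct deferred_copy
{
  void *dst;
  const void *src;
  size_t len;
  bool owns_src;   /* True if SRC is a private snapshot we must free.  */
};

/* Snapshot LEN bytes at SRC now, so the caller may reuse or free SRC
   before the copy actually runs.  Returns NULL on allocation failure.  */
static struct deferred_copy *
defer_copy (void *dst, const void *src, size_t len)
{
  struct deferred_copy *c = malloc (sizeof *c);
  void *snapshot = malloc (len ? len : 1);

  if (c == NULL || snapshot == NULL)
    {
      free (c);
      free (snapshot);
      return NULL;
    }

  memcpy (snapshot, src, len);
  c->dst = dst;
  c->src = snapshot;
  c->len = len;
  c->owns_src = true;
  return c;
}

/* Perform the deferred copy, then release the snapshot and descriptor.  */
static void
run_deferred_copy (struct deferred_copy *c)
{
  assert (c != NULL);
  memcpy (c->dst, c->src, c->len);
  if (c->owns_src)
    free ((void *) c->src);
  free (c);
}
```

The cost of this approach is that the data is copied twice, in exchange
for not depending on the lifetime of the caller's buffer.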

Julian

ChangeLog

libgomp/
* plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
(copy_data): Free temporary buffer if using.
(queue_push_copy): Add using_src_copy parameter.
(GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to
queue_push_copy.
(GOMP_OFFLOAD_async_host2dev): Likewise.  Allocate temporary buffer and
copy source data to it immediately.
* target.c (gomp_copy_host2dev): Update function comment.
(copy_host2dev_immediate): Remove.
(gomp_map_pointer, gomp_map_vars_internal): Replace calls to
copy_host2dev_immediate with calls to gomp_copy_host2dev.
---
 libgomp/ChangeLog.openacc   | 14 ++
 libgomp/plugin/plugin-gcn.c | 20 ++---
 libgomp/target.c| 56 +++--
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 2279545f361..2a9a7f18ca2 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,17 @@
+2019-08-13  Julian Brown  
+
+   * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field.
+   (copy_data): Free temporary buffer if using.
+   (queue_push_copy): Add using_src_copy parameter.
+   (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to
+   queue_push_copy.
+   (GOMP_OFFLOAD_async_host2dev): Likewise.  Allocate temporary buffer and
+   copy source data to it immediately.
+   * target.c (gomp_copy_host2dev): Update function comment.
+   (copy_host2dev_immediate): Remove.
+   (gomp_map_pointer, gomp_map_vars_internal): Replace calls to
+   copy_host2dev_immediate with calls to gomp_copy_host2dev.
+
 2019-08-08  Julian Brown  
 
* plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index a41568b3306..65690e643ed 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3063,6 +3063,7 @@ struct copy_data
   const void *src;
   size_t len;
   bool use_hsa_memory_copy;
+  bool using_src_copy;
   struct goacc_asyncqueue *aq;
 };
 
@@ -3077,12 +3078,14 @@ copy_data (void *data_)
 hsa_fns.hsa_memory_copy_fn (data->dst, data->src, data->len);
   else
 memcpy (data->dst, data->src, data->len);
+  if (data->using_src_copy)
+free ((void *) data->src);
   free (data);
 }
 
 static void
 queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src,
-size_t len, bool use_hsa_memory_copy)
+size_t len, bool use_hsa_memory_copy, bool using_src_copy)
 {
   if (DEBUG_QUEUES)
 HSA_DEBUG ("queue_push_copy %d:%d: %zu bytes from (%p) to (%p)\n",
@@ -3093,6 +3096,7 @@ queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src,
   data->src = src;
   data->len = len;
   data->use_hsa_memory_copy = use_hsa_memory_copy;
+  data->using_src_copy = using_src_copy;
   data->aq = aq;
   queue_push_callback (aq, copy_data, data);
 }
@@ -3137,7 +3141,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
 {
   struct agent_info *agent = get_agent_info (device);
   maybe_init_omp_async (agent);
-  queue_push_copy (agent->omp_async_queue, dst, src, n, false);
+  queue_push_copy (agent->omp_async_queue, dst, src, n, false, false);
   return true;
 }
 
@@ -3469,7 +3473,15 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src,
 {
   struct agent_info *agent = get_agent_info (device);
   assert (agent == aq->agent);
-  queue_push_copy (aq, dst, src, n, image_address_p (agent, dst));
+  /* The source data does not necessarily remain live until the deferred
+ copy happens.  Taking a snapshot of the data here avoids reading
+ uninitialised data later, but means that (a) data is copied twice and
+ (b) mod

[PATCH 1/3] [og9] Wait at end of OpenACC asynchronous kernels regions

2019-08-13 Julian Brown
This patch provides a workaround for unreliable operation of asynchronous
kernels regions on AMD GCN. At present, kernels regions are decomposed
into a series of parallel regions surrounded by a data region capturing
the data-movement clauses needed by the region as a whole:

  #pragma acc kernels async(n)
  { ... }

is translated to:

  #pragma acc data copyin(...) copyout(...)
  {
#pragma acc parallel async(n) present(...)
{ ... }
#pragma acc parallel async(n) present(...)
{ ... }
  }

This is, however, problematic for two reasons:

 - Variables mapped by the data clause will be unmapped immediately at the end
   of the data region, regardless of whether the inner asynchronous
   parallels have completed. (This causes crashes for GCN.)

 - Even if the "present" clause caused the reference count to stay above zero
   at the end of the data region -- which it doesn't -- the "present"
   clauses on the inner parallel regions would not cause "copyout"
   variables to be transferred back to the host at the appropriate time,
   i.e. when the async parallel region had completed.

There is no "async" data construct in OpenACC, so the correct solution
(which I am deferring for now) is probably to use asynchronous
"enter data" and "exit data" directives when translating asynchronous
kernels regions instead.

The attached patch just adds a "wait" operation before the end of
the enclosing data region. This works, but introduces undesirable
synchronisation with the host.
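
Written by hand at the source level, the effect of the workaround looks
roughly like the sketch below. This is an assumption-laden illustration,
not compiler output: the function scale_then_offset and the choice of
async queue 1 are hypothetical, and a compiler without OpenACC support
simply ignores the pragmas and runs the loops on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Double every element of A, then add one, using two asynchronous
   parallel regions inside a synchronous data region.  */
void
scale_then_offset (float *a, int n)
{
  assert (a != NULL || n == 0);

#pragma acc data copy(a[0:n])
  {
#pragma acc parallel loop async(1) present(a[0:n])
    for (int i = 0; i < n; i++)
      a[i] *= 2.0f;

#pragma acc parallel loop async(1) present(a[0:n])
    for (int i = 0; i < n; i++)
      a[i] += 1.0f;

    /* Wait for the asynchronous regions before the data region ends,
       so that A is not copied out and unmapped while they still run.  */
#pragma acc wait
  }
}
```

Without the wait, the end of the data region could unmap the array while
the asynchronous regions are still in flight, which is the failure mode
described above.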

Julian

ChangeLog

gcc/
* omp-oacc-kernels.c (add_wait): New function, split out of...
(add_async_clauses_and_wait): ...here. Call new outlined function.
(decompose_kernels_region_body): Add wait at the end of
explicitly-asynchronous kernels regions.
---
 gcc/ChangeLog.openacc  |  7 +++
 gcc/omp-oacc-kernels.c | 28 +---
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 84d80511603..a22f07c817c 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-08-13  Julian Brown  
+
+   * omp-oacc-kernels.c (add_wait): New function, split out of...
+   (add_async_clauses_and_wait): ...here. Call new outlined function.
+   (decompose_kernels_region_body): Add wait at the end of
+   explicitly-asynchronous kernels regions.
+
 2019-08-08  Julian Brown  
 
* config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 20913859c12..a6c4220f472 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -900,6 +900,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
   return body;
 }
 
+static void
+add_wait (location_t loc, gimple_seq *region_body)
+{
+  /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0).  */
+  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
+  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
+  gimple *wait_call = gimple_build_call (wait_fn, 2,
+ sync_arg, integer_zero_node);
+  gimple_set_location (wait_call, loc);
+  gimple_seq_add_stmt (region_body, wait_call);
+}
+
 /* Helper function of decompose_kernels_region_body.  The statements in
REGION_BODY are expected to be decomposed parallel regions; add an
"async" clause to each.  Also add a "wait" pragma at the end of the
@@ -923,13 +935,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body)
   gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt),
  target_clauses);
 }
-  /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0).  */
-  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
-  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
-  gimple *wait_call = gimple_build_call (wait_fn, 2,
- sync_arg, integer_zero_node);
-  gimple_set_location (wait_call, loc);
-  gimple_seq_add_stmt (region_body, wait_call);
+  add_wait (loc, region_body);
 }
 
 /* Auxiliary analysis of the body of a kernels region, to determine for each
@@ -1378,6 +1384,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
  a wait directive at the end.  */
   if (async_clause == NULL)
 add_async_clauses_and_wait (loc, &region_body);
+  else
+/* !!! If we have asynchronous parallel blocks inside a (synchronous) data
+   region, then target memory will get unmapped at the point the data
+   region ends, even if the inner asynchronous parallels have not yet
+   completed.  For kernels marked "async", we might want to use "enter data
+   async(...)" and "exit data async(...)" instead.
+   For now, insert a (synchronous) wait 

[PATCH 0/3] [og9] OpenACC async fixes for AMD GCN

2019-08-13 Thread Julian Brown
These patches stabilise async support for AMD GCN. Several tests that
previously failed (some intermittently) now work.

Further commentary is provided alongside each patch. Tested with
offloading to AMD GCN.

I will apply shortly to the openacc-gcc-9-branch.

Thanks,

Julian

Julian Brown (3):
  [og9] Wait at end of OpenACC asynchronous kernels regions
  [og9] Use temporary buffers for async host2dev copies
  [og9] Wait on queue-full condition in AMD GCN libgomp offloading
plugin

 gcc/ChangeLog.openacc   |  7 +
 gcc/omp-oacc-kernels.c  | 28 ++-
 libgomp/ChangeLog.openacc   | 19 +
 libgomp/plugin/plugin-gcn.c | 31 
 libgomp/target.c| 56 +++--
 5 files changed, 94 insertions(+), 47 deletions(-)

-- 
2.22.0



[PATCH 2/3] [og9] Fix configury for AMD GCN testing

2019-08-08 Thread Julian Brown
This patch updates the configury for AMD GCN for the version
of the patch "Forward -foffload=[...] from the driver (compile-time)
to libgomp (run-time)" currently applied to the og9 branch. This is
necessary for OpenACC testing on AMD GCN to work properly, at least in
our test environment.

Julian

ChangeLog

libgomp/
* plugin/configfrag.ac (amdgcn): Set tgt_plugin.
* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
Add AMD GCN support.
(check_effective_target_openacc_amdgcn_accel_selected): Test
offload_target instead of offload_target_openacc.
* testsuite/libgomp.oacc-c++/c++.exp (amdgcn*): Rename stanza to...
(gcn): ...this. Don't set tagopt redundantly here.
* testsuite/libgomp.oacc-c/c.exp (amdgcn*, gcn): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (amdgcn*, gcn): Likewise.
* configure: Regenerated.
---
 libgomp/ChangeLog.openacc  | 13 +
 libgomp/configure  |  1 +
 libgomp/plugin/configfrag.ac   |  1 +
 libgomp/testsuite/lib/libgomp.exp  |  7 +--
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |  3 +--
 libgomp/testsuite/libgomp.oacc-c/c.exp |  3 +--
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  3 +--
 7 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 80d089f49e2..62c56e3bf92 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,16 @@
+2019-08-08  Julian Brown  
+
+	* plugin/configfrag.ac (amdgcn): Set tgt_plugin.
+	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
+	Add AMD GCN support.
+	(check_effective_target_openacc_amdgcn_accel_selected): Test
+	offload_target instead of offload_target_openacc.
+	* testsuite/libgomp.oacc-c++/c++.exp (amdgcn*): Rename stanza to...
+	(gcn): ...this. Don't set tagopt redundantly here.
+	* testsuite/libgomp.oacc-c/c.exp (amdgcn*, gcn): Likewise.
+	* testsuite/libgomp.oacc-fortran/fortran.exp (amdgcn*, gcn): Likewise.
+	* configure: Regenerated.
+
 2019-08-08  Julian Brown  
 
 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_exec_params,
diff --git a/libgomp/configure b/libgomp/configure
index 39da8af4546..85a29c5b5e1 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15781,6 +15781,7 @@ rm -f core conftest.err conftest.$ac_objext \
 		;;
 	  *)
 		tgt_name=gcn
+		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
 		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
 		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 6fedd28eccc..1ea67c913ba 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -256,6 +256,7 @@ if test x"$enable_offload_targets" != x; then
 		;;
 	  *)
 		tgt_name=gcn
+		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
 		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
 		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 19bee806fb0..9644176da2a 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -316,6 +316,9 @@ proc offload_target_to_openacc_device_type { offload_target } {
 	nvptx* {
 	return "nvidia"
 	}
+	amdgcn* {
+	return "gcn"
+	}
 	default {
 	error "Unknown offload target: $offload_target"
 	}
@@ -463,8 +466,8 @@ proc check_effective_target_openacc_amdgcn_accel_selected { } {
 if { ![check_effective_target_openacc_amdgcn_accel_present] } {
 	return 0;
 }
-global offload_target_openacc
-if { [string match "amdgcn*" $offload_target_openacc] } {
+global offload_target
+if { [string match "amdgcn*" $offload_target] } {
 return 1;
 }
 return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 1285a6a1c6d..86aacff0c37 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -111,9 +111,8 @@ if { $lang_test_file_found } {
 
 		set acc_mem_shared 0
 	}
-	amdgcn* {
+	gcn {
 		set acc_mem_shared 0
-		set tagopt "-DACC_DEVICE_TYPE_gcn=\"$offload_target_openacc\""
 	}
 	default {
 		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index f7005ebba48..9ab68bbef14 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -72,9 +72,8 @@ foreach offload_target [concat [split $offload_targets ":"] "disable"] {
 
 	set acc_mem_shared 0
 	}
-	amdgcn* {
+	gcn {
 	set acc_mem_shared 0
-	set tagopt "

[PATCH 3/3] [og9] Use a single worker for OpenACC on AMD GCN

2019-08-08 Thread Julian Brown
This patch sets the number of workers (per-gang) to 1 for AMD GCN,
as a stop-gap measure until the middle-end transformations to enable
multiple workers have been applied.
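
As a rough sketch of the run-time side of this change, the defaulting
logic amounts to the following. The function and enumerator names are
hypothetical; only the index meanings (0 for gangs, 1 for workers) and the
default values 64 and 1 come from the plugin-gcn.c hunk below.

```c
/* Hypothetical names for the launch-dimension indices.  */
enum { SKETCH_DIM_GANG = 0, SKETCH_DIM_WORKER = 1, SKETCH_DIM_VECTOR = 2 };

/* Fill in defaults for any dimension the caller left as zero.  The
   vector dimension is deliberately left untouched in this sketch.  */
void
sketch_apply_default_dims (unsigned dims[3])
{
  if (dims[SKETCH_DIM_GANG] == 0)
    dims[SKETCH_DIM_GANG] = 64;   /* Gangs.  */
  if (dims[SKETCH_DIM_WORKER] == 0)
    dims[SKETCH_DIM_WORKER] = 1;  /* Workers: single worker per gang.  */
}
```

Dimensions requested explicitly are passed through unchanged; only the
unspecified (zero) case now falls back to one worker.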

Julian

ChangeLog

gcc/
* config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure
flag_worker_partitioning is not set.
(TARGET_GOACC_WORKER_PARTITIONING): Remove target hook definition.
* config/gcn/gcn.opt (macc-experimental-workers): Default to off.

libgomp/
* plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of
workers.
---
 gcc/ChangeLog.openacc   | 7 +++
 gcc/config/gcn/gcn.c| 4 ++--
 gcc/config/gcn/gcn.opt  | 2 +-
 libgomp/ChangeLog.openacc   | 5 +
 libgomp/plugin/plugin-gcn.c | 4 +++-
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 0caa1cd1401..84d80511603 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-08-08  Julian Brown  
+
+	* config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure
+	flag_worker_partitioning is not set.
+	(TARGET_GOACC_WORKER_PARTITIONING): Remove target hook definition.
+	* config/gcn/gcn.opt (macc-experimental-workers): Default to off.
+
 2019-07-31  Julian Brown  
 
 	* builtin-types.def (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 9f73fc8161a..f3f112d95a9 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4662,6 +4662,8 @@ gcn_goacc_validate_dims (tree decl, int dims[], int fn_level,
   /* FIXME: remove -facc-experimental-workers when they're ready.  */
   int max_workers = flag_worker_partitioning ? 16 : 1;
 
+  gcc_assert (!flag_worker_partitioning);
+
   /* The vector size must appear to be 64, to the user, unless this is a
  SEQ routine.  The real, internal value is always 1, which means use
  autovectorization, but the user should not see that.  */
@@ -6038,8 +6040,6 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_GOACC_REDUCTION gcn_goacc_reduction
 #undef  TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS gcn_goacc_validate_dims
-#undef  TARGET_GOACC_WORKER_PARTITIONING
-#define TARGET_GOACC_WORKER_PARTITIONING true
 #undef  TARGET_HARD_REGNO_MODE_OK
 #define TARGET_HARD_REGNO_MODE_OK gcn_hard_regno_mode_ok
 #undef  TARGET_HARD_REGNO_NREGS
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 2fd3996edba..90d35f42e57 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -62,7 +62,7 @@ Target Report RejectNegative Var(flag_bypass_init_error)
 bool flag_worker_partitioning = false
 
 macc-experimental-workers
-Target Report Var(flag_worker_partitioning) Init(1)
+Target Report Var(flag_worker_partitioning) Init(0)
 
 int stack_size_opt = -1
 
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 62c56e3bf92..2279545f361 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-08-08  Julian Brown  
+
+	* plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of
+	workers.
+
 2019-08-08  Julian Brown  
 
 	* plugin/configfrag.ac (amdgcn): Set tgt_plugin.
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 6eaae66c1a9..a41568b3306 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3233,8 +3233,10 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
  problem size, so let's do a reasonable number of single-worker gangs.
  64 gangs matches a typical Fiji device.  */
 
+  /* NOTE: Until support for middle-end worker partitioning is merged, use 1
+ for the default number of workers.  */
   if (dims[0] == 0) dims[0] = 64; /* Gangs.  */
-  if (dims[1] == 0) dims[1] = 16; /* Workers.  */
+  if (dims[1] == 0) dims[1] = 1;  /* Workers.  */
 
   /* The incoming dimensions are expressed in terms of gangs, workers, and
  vectors.  The HSA dimensions are expressed in terms of "work-items",


[PATCH 1/3] [og9] Add missing exec_params libgomp plugin entry points

2019-08-08 Thread Julian Brown
This patch adds two missing (dummy) entry points to the GCN libgomp
plugin. These are not used at present, because we have not enabled the
function parameter flattening transformation that uses these entry points
on GCN.

Julian

ChangeLog

libgomp/
* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_exec_params,
GOMP_OFFLOAD_openacc_async_exec_params): New functions.
---
 libgomp/ChangeLog.openacc   |  5 +
 libgomp/plugin/plugin-gcn.c | 17 +
 2 files changed, 22 insertions(+)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index a187ebb7295..80d089f49e2 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-08-08  Julian Brown  
+
+	* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_exec_params,
+	GOMP_OFFLOAD_openacc_async_exec_params): New functions.
+
 2019-07-31  Julian Brown  
 
 	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Use relative
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index b059348c7bf..6eaae66c1a9 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3291,6 +3291,14 @@ GOMP_OFFLOAD_openacc_exec (void (*fn_ptr) (void *), size_t mapnum,
 	NULL);
 }
 
+void
+GOMP_OFFLOAD_openacc_exec_params (void (*fn_ptr) (void *), size_t mapnum,
+  void **hostaddrs, void **devaddrs,
+  unsigned *dims, void *targ_mem_desc)
+{
+  GOMP_PLUGIN_fatal ("OpenACC exec params unimplemented.");
+}
+
 void
 GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), size_t mapnum,
  void **hostaddrs, void **devaddrs,
@@ -3303,6 +3311,15 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), size_t mapnum,
 	aq);
 }
 
+void
+GOMP_OFFLOAD_openacc_async_exec_params (void (*fn) (void *), size_t mapnum,
+	void **hostaddrs, void **devaddrs,
+	unsigned *dims, void *targ_mem_desc,
+	struct goacc_asyncqueue *aq)
+{
+  GOMP_PLUGIN_fatal ("OpenACC async exec params unimplemented.");
+}
+
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (int device)
 {


[PATCH 0/3] [og9] Initial OpenACC fixes for AMD GCN

2019-08-08 Thread Julian Brown
Hi,

This patch series provides basic support for OpenACC on AMD GCN,
using a single worker (per-gang) only. This is enough to improve test
results significantly over the previous state on the og9 branch, but
bugs still remain.

Further commentary attached to individual patches. Tested with offloading
to AMD GCN. I will apply shortly.

Thanks,

Julian

Julian Brown (3):
  [og9] Add missing exec_params libgomp plugin entry points
  [og9] Fix configury for AMD GCN testing
  [og9] Use a single worker for OpenACC on AMD GCN

 gcc/ChangeLog.openacc |  7 ++
 gcc/config/gcn/gcn.c  |  4 ++--
 gcc/config/gcn/gcn.opt|  2 +-
 libgomp/ChangeLog.openacc | 23 +++
 libgomp/configure |  1 +
 libgomp/plugin/configfrag.ac  |  1 +
 libgomp/plugin/plugin-gcn.c   | 21 -
 libgomp/testsuite/lib/libgomp.exp |  7 --
 libgomp/testsuite/libgomp.oacc-c++/c++.exp|  3 +--
 libgomp/testsuite/libgomp.oacc-c/c.exp|  3 +--
 .../libgomp.oacc-fortran/fortran.exp  |  3 +--
 11 files changed, 63 insertions(+), 12 deletions(-)

-- 
2.22.0



[PATCH 8/8] [og9] Update parallel-dims.c and serial-dims.c warning line numbering.

2019-08-02 Thread Julian Brown
This patch adjusts the parallel-dims.c and serial-dims.c tests to
use relative, rather than absolute, line numbers for expected warning
emission.
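
As a small illustration of the relative form, the sketch below attaches a
directive to the pragma one line above it using ".-1". The function is
hypothetical and the expected warning is assumed by analogy with
serial-dims.c; it would only be emitted when compiling with -fopenacc.

```c
/* Hypothetical test body.  The last field of the directive names the
   line the warning is expected on; ".-1" means the line directly above
   the directive, so the test keeps working if lines are added earlier
   in the file.  */
int
sketch_serial_sum (int n)
{
  int sum = 0;
#pragma acc serial copy (sum)
  /* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } .-1 } */
  {
#pragma acc loop gang reduction (+:sum)
    for (int i = 0; i < n; i++)
      sum += i;
  }
  return sum;
}
```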

ChangeLog

2019-07-31  Julian Brown  

* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Use relative
line numbers for warning.
* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Likewise.
---
 libgomp/ChangeLog.openacc | 6 ++
 .../testsuite/libgomp.oacc-c-c++-common/parallel-dims.c   | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c | 8 
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index c850203e145..a187ebb7295 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,9 @@
+2019-07-31  Julian Brown  
+
+   * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Use relative
+   line numbers for warning.
+   * testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Likewise.
+
 2019-07-31  Julian Brown  
 
* config/nvptx/gomp_print.c (gomp_print_string, gomp_print_integer,
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index ec63e3fe2c9..d9f2c75e868 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -158,7 +158,7 @@ int main ()
 gangs_min = workers_min = vectors_min = INT_MAX;
 gangs_max = workers_max = vectors_max = INT_MIN;
 #pragma acc parallel copy (vectors_actual) /* { dg-warning "region contains vector partitioned code but is not vector partitioned" } */ \
-  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 157 } */ \
+  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-1 } */ \
   vector_length (VECTORS) /* { dg-warning "'vector_length' value must be positive" "" { target c++ } } */
 {
   /* We're actually executing with vector_length (1), just the GCC nvptx
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
index d4692091b84..fd4b17c40c2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
@@ -59,10 +59,10 @@ int main ()
 gangs_max = workers_max = vectors_max = INT_MIN;
 gangs_actual = workers_actual = vectors_actual = 1;
 #pragma acc serial
-/* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } 58 } */
-/* { dg-warning "region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } 58 } */
-/* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } 58 } */
-/* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 58 } */
+/* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } .-1 } */
+/* { dg-warning "region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 } */
+/* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } .-3 } */
+/* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-4 } */
 {
   if (acc_on_device (acc_device_nvidia))
{
-- 
2.22.0



[PATCH 7/8] [og9] NVPTX GOMP_OFFLOAD_openacc_async_construct arg fix and gomp_print_* support

2019-08-02 Thread Julian Brown
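This patch introduces versions of the gomp_print_{string,integer,double}
low-level printing functions that work for NVPTX. It also adds a dummy
device parameter to the NVPTX GOMP_OFFLOAD_openacc_async_construct plugin
entry point.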

ChangeLog

2019-07-31  Julian Brown  

libgomp/
* config/nvptx/gomp_print.c (gomp_print_string, gomp_print_integer,
gomp_print_double): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add
dummy device parameter.
---
 libgomp/ChangeLog.openacc |  7 +++
 libgomp/config/nvptx/gomp_print.c | 20 
 libgomp/plugin/plugin-nvptx.c |  2 +-
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/config/nvptx/gomp_print.c

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index c03f8714408..c850203e145 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-07-31  Julian Brown  
+
+   * config/nvptx/gomp_print.c (gomp_print_string, gomp_print_integer,
+   gomp_print_double): New.
+   * plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add
+   dummy device parameter.
+
 2019-07-31  Julian Brown  
 
* libgomp.map (GOMP_2.0.GOMP_4_BRANCH): Remove GOACC_parallel_keyed_v2.
diff --git a/libgomp/config/nvptx/gomp_print.c b/libgomp/config/nvptx/gomp_print.c
new file mode 100644
index 000..811bdd6e9a9
--- /dev/null
+++ b/libgomp/config/nvptx/gomp_print.c
@@ -0,0 +1,20 @@
+#include <stdio.h>
+#include <stdint.h>
+
+void
+gomp_print_string (const char *msg, const char *value)
+{
+  printf ("%s%s\n", msg, value);
+}
+
+void
+gomp_print_integer (const char *msg, int64_t value)
+{
+  printf ("%s%ld\n", msg, value);
+}
+
+void
+gomp_print_double (const char *msg, double value)
+{
+  printf ("%s%f\n", msg, value);
+}
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 09567ce852c..4beb3222e8f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1732,7 +1732,7 @@ GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *aq, void *stream)
 }
 
 struct goacc_asyncqueue *
-GOMP_OFFLOAD_openacc_async_construct (void)
+GOMP_OFFLOAD_openacc_async_construct (int device __attribute__((unused)))
 {
   CUstream stream = NULL;
   CUDA_CALL_ERET (NULL, cuStreamCreate, &stream, CU_STREAM_DEFAULT);
-- 
2.22.0



[PATCH 6/8] [og9] Make OpenACC function-parameter explosion optional

2019-08-02 Thread Julian Brown
This patch adjusts Cesar's implementation of function-argument flattening,
posted (for the og7 branch) here, so that it only affects NVPTX:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01456.html

Changes made are as follows (briefly):

  * The GOACC_parallel_keyed_v2 libgomp entry point has been removed, in
favour of using a launch tag (GOMP_LAUNCH_ARGS_EXPLODED) to indicate
that an offload function should be launched with flattened-out
arguments (rather than passing all arguments in an array).

  * A new target hook (TARGET_GOACC_EXPLODE_ARGS) has been introduced.  This
must be implemented in the *host* (not offload) compiler, and returns
TRUE if offload kernels should be called with flattened-out arguments.
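
The difference between the two calling conventions can be sketched as
follows. All names here are hypothetical stand-ins: in particular
SKETCH_ARGS_EXPLODED only plays the role of the GOMP_LAUNCH_ARGS_EXPLODED
launch tag and says nothing about how the real tag is encoded or how
libgomp actually dispatches kernels.

```c
#include <stddef.h>

/* Hypothetical stand-in for the launch tag.  */
enum sketch_arg_style { SKETCH_ARGS_IN_ARRAY, SKETCH_ARGS_EXPLODED };

/* Array convention: a single parameter pointing at an array that holds
   the address of each argument.  */
void
sketch_kernel_array (void **args)
{
  int *data = (int *) args[0];
  size_t n = *(size_t *) args[1];
  for (size_t i = 0; i < n; i++)
    data[i] += 1;
}

/* Flattened ("exploded") convention: each argument is its own
   parameter.  */
void
sketch_kernel_exploded (int *data, size_t n)
{
  for (size_t i = 0; i < n; i++)
    data[i] += 1;
}

/* The launcher picks the convention based on the tag.  */
void
sketch_launch (enum sketch_arg_style style, int *data, size_t n)
{
  if (style == SKETCH_ARGS_EXPLODED)
    sketch_kernel_exploded (data, n);
  else
    {
      void *args[2] = { data, &n };
      sketch_kernel_array (args);
    }
}
```

Carrying the choice in a tag rather than in a separate entry point means a
single launch function can serve both kinds of target.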

The patch also contains the configury bits to disable building of libffi
for the AMD GCN target, as is required for the build to complete.

Julian

ChangeLog

2019-07-31  Julian Brown  

* configure.ac (amdgcn*-*-*): Add target-libffi to noconfigdirs for AMD
GCN.
* configure: Regenerated.

gcc/
* builtin-types.def (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
Remove.
* config/i386/i386.c (ix86_goacc_explode_args): New.
(TARGET_GOACC_EXPLODE_ARGS): Define, using above function.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add TARGET_GOACC_EXPLODE_ARGS hook.
* fortran/types.def (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
Remove.
* omp-builtins.def (GOACC_parallel_keyed_v2): Remove.
* omp-expand.c (expand_omp_target): Use explode_args target hook.
Use GOMP_LAUNCH_ARGS_EXPLODED launch tag.
* omp-low.c (build_receiver_ref, build_sender_ref,
create_omp_child_function, scan_omp_target, lower_omp_target): Use
explode_args target hook.
* target.def (explode_args): New target hook.
* tree-ssa-structalias.c (target.h): Include.
(find_func_aliases_for_builtin_call): Conditionalise disabling of pass
for OpenACC parallel regions based on explode_args target hook.  Remove
'params' from BUILT_IN_GOACC_PARALLEL arguments.
(find_func_clobbers): Likewise.
(ipa_pta_execute): Update for removed 'params' argument.

include/
* gomp-constants.h (GOMP_LAUNCH_ARGS_EXPLODED): Define.

libgomp/
* libgomp.map (GOMP_2.0.GOMP_4_BRANCH): Remove GOACC_parallel_keyed_v2.
* libgomp_g.h (GOACC_parallel_keyed_v2): Remove prototype.
* oacc-parallel.c (GOACC_parallel_keyed_internal): Rename to...
(GOACC_parallel_keyed): ...this.  Handle GOMP_LAUNCH_ARGS_EXPLODED
launch tag.  Remove previous wrapper functions.
(GOACC_parallel_keyed_v2): Remove.
---
 ChangeLog.openacc  |  6 +++
 configure  |  3 ++
 configure.ac   |  3 ++
 gcc/ChangeLog.openacc  | 24 ++
 gcc/builtin-types.def  |  4 --
 gcc/config/i386/i386.c | 32 +
 gcc/doc/tm.texi|  5 ++
 gcc/doc/tm.texi.in |  2 +
 gcc/fortran/types.def  |  4 --
 gcc/omp-builtins.def   |  4 +-
 gcc/omp-expand.c   | 18 
 gcc/omp-low.c  | 28 +++-
 gcc/target.def |  7 +++
 gcc/tree-ssa-structalias.c | 52 +++--
 include/ChangeLog.openacc  |  4 ++
 include/gomp-constants.h   |  1 +
 libgomp/ChangeLog.openacc  |  9 
 libgomp/libgomp.map|  1 -
 libgomp/libgomp_g.h|  2 -
 libgomp/oacc-parallel.c| 93 --
 20 files changed, 201 insertions(+), 101 deletions(-)

diff --git a/ChangeLog.openacc b/ChangeLog.openacc
index 1b54affbe80..156a9b9a798 100644
--- a/ChangeLog.openacc
+++ b/ChangeLog.openacc
@@ -1,3 +1,9 @@
+2019-07-31  Julian Brown  
+
+   * configure.ac (amdgcn*-*-*): Add target-libffi to noconfigdirs for AMD
+   GCN.
+   * configure: Regenerated.
+
 2018-12-20  Maciej W. Rozycki  
 
* Makefile.def (lang_env_dependencies): Disable `cxx' dependency
diff --git a/configure b/configure
index 033929b0ab8..ef00d1f5249 100755
--- a/configure
+++ b/configure
@@ -3466,6 +3466,9 @@ case "${target}" in
   alpha*-*-*vms*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
+  amdgcn*-*-*)
+noconfigdirs="$noconfigdirs target-libffi"
+;;
   arm*-*-freebsd*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
diff --git a/configure.ac b/configure.ac
index de361880ba7..5184b82f300 100644
--- a/configure.ac
+++ b/configure.ac
@@ -748,6 +748,9 @@ case "${target}" in
   alpha*-*-*vms*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
+  amdgcn*-*-*)
+noconfigdirs="$noconfigdirs target-libffi"
+;;
   arm*-*-freebsd*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 4a806549d50..0caa1cd1401 100644
--- a/gcc/ChangeLo

[PATCH 3/8] [og9] Stub implementation of unwinding for AMD GCN

2019-08-02 Thread Julian Brown
This is a backport to the og9 branch of the patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00443.html

2019-06-25  Andrew Stubbs  

Backport from mainline:

libgcc/
* config/gcn/t-amdgcn (LIB2ADD): Add unwind-gcn.c.
* config/gcn/unwind-gcn.c: New file.
---
 libgcc/ChangeLog.openacc   |  7 +++
 libgcc/config/gcn/t-amdgcn |  3 ++-
 libgcc/config/gcn/unwind-gcn.c | 37 ++
 3 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/gcn/unwind-gcn.c

diff --git a/libgcc/ChangeLog.openacc b/libgcc/ChangeLog.openacc
index 1aaa7178df9..45fc6aaf530 100644
--- a/libgcc/ChangeLog.openacc
+++ b/libgcc/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-06-25  Andrew Stubbs  
+
+   Backport from mainline:
+
+   * config/gcn/t-amdgcn (LIB2ADD): Add unwind-gcn.c.
+   * config/gcn/unwind-gcn.c: New file.
+
 2019-06-25  Kwok Cheung Yeung  
 Andrew Stubbs  
 
diff --git a/libgcc/config/gcn/t-amdgcn b/libgcc/config/gcn/t-amdgcn
index 8687c9f3d9f..adbd866a1d9 100644
--- a/libgcc/config/gcn/t-amdgcn
+++ b/libgcc/config/gcn/t-amdgcn
@@ -1,5 +1,6 @@
 LIB2ADD += $(srcdir)/config/gcn/lib2-divmod.c \
-  $(srcdir)/config/gcn/lib2-divmod-hi.c
+  $(srcdir)/config/gcn/lib2-divmod-hi.c \
+  $(srcdir)/config/gcn/unwind-gcn.c
 
 LIB2ADDEH=
 LIB2FUNCS_EXCLUDE=__main
diff --git a/libgcc/config/gcn/unwind-gcn.c b/libgcc/config/gcn/unwind-gcn.c
new file mode 100644
index 000..bfb7028fed6
--- /dev/null
+++ b/libgcc/config/gcn/unwind-gcn.c
@@ -0,0 +1,37 @@
+/* Stub unwinding implementation.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "unwind.h"
+
+_Unwind_Reason_Code
+_Unwind_Backtrace (_Unwind_Trace_Fn trace, void *trace_argument)
+{
+  return 0;
+}
+
+_Unwind_Ptr
+_Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
+{
+  return 0;
+}
-- 
2.22.0



[PATCH 4/8] [og9] Enable full GFortran library for AMD GCN

2019-08-02 Thread Julian Brown
This is a backport to the og9 branch of the patch posted for mainline here:

https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00444.html

2019-06-25  Kwok Cheung Yeung  
Andrew Stubbs  

Backport from mainline:

libgfortran/
* configure: Regenerate.
* configure.ac (LIBGFOR_MINIMAL): Do not use on AMD GCN.
---
 libgfortran/ChangeLog.openacc | 9 +
 libgfortran/configure | 3 +--
 libgfortran/configure.ac  | 3 +--
 3 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 libgfortran/ChangeLog.openacc

diff --git a/libgfortran/ChangeLog.openacc b/libgfortran/ChangeLog.openacc
new file mode 100644
index 000..98361640e8d
--- /dev/null
+++ b/libgfortran/ChangeLog.openacc
@@ -0,0 +1,9 @@
+2019-06-25  Kwok Cheung Yeung  
+   Andrew Stubbs  
+
+   Backport from mainline:
+
+   libgfortran/
+   * configure: Regenerate.
+   * configure.ac (LIBGFOR_MINIMAL): Do not use on AMD GCN.
+
diff --git a/libgfortran/configure b/libgfortran/configure
index 487d8c090e2..8b58cdf1c6a 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6164,8 +6164,7 @@ fi
 # * C library support for other features such as signal, environment
 #   variables, time functions
 
- if test "x${target_cpu}" = xnvptx \
-|| test "x${target_cpu}" = xamdgcn; then
+ if test "x${target_cpu}" = xnvptx; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index c06db7b1a78..30ff8734760 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -205,8 +205,7 @@ AM_CONDITIONAL(LIBGFOR_USE_SYMVER_SUN, [test "x$gfortran_use_symver" = xsun])
 # * C library support for other features such as signal, environment
 #   variables, time functions
 
-AM_CONDITIONAL(LIBGFOR_MINIMAL, [test "x${target_cpu}" = xnvptx \
-|| test "x${target_cpu}" = xamdgcn])
+AM_CONDITIONAL(LIBGFOR_MINIMAL, [test "x${target_cpu}" = xnvptx])
 
 # Figure out whether the compiler supports "-ffunction-sections -fdata-sections",
 # similarly to how libstdc++ does it
-- 
2.22.0



[PATCH 2/8] [og9] Create GCN-specific gthreads

2019-08-02 Thread Julian Brown
This is a backport to the og9 branch of the patch posted to mainline here:

https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00442.html

2019-06-25  Kwok Cheung Yeung  
Andrew Stubbs  

Backport from mainline:

gcc/
* config.gcc (thread_file): Set to gcn for AMD GCN.
* config/gcn/gcn.c (gcn_emutls_var_init): New function.
(TARGET_EMUTLS_VAR_INIT): New hook.

config/
* gthr.m4 (GCC_AC_THREAD_HEADER): Add case for gcn.

libgcc/
* configure: Regenerate.
* config/gcn/gthr-gcn.h: New.
---
 config/ChangeLog.openacc |   7 ++
 config/gthr.m4   |   1 +
 gcc/ChangeLog.openacc|   9 ++
 gcc/config.gcc   |   1 +
 gcc/config/gcn/gcn.c |  12 +++
 libgcc/ChangeLog.openacc |   8 ++
 libgcc/config/gcn/gthr-gcn.h | 163 +++
 libgcc/configure |   1 +
 8 files changed, 202 insertions(+)
 create mode 100644 config/ChangeLog.openacc
 create mode 100644 libgcc/ChangeLog.openacc
 create mode 100644 libgcc/config/gcn/gthr-gcn.h

diff --git a/config/ChangeLog.openacc b/config/ChangeLog.openacc
new file mode 100644
index 000..0a4142d747c
--- /dev/null
+++ b/config/ChangeLog.openacc
@@ -0,0 +1,7 @@
+2019-06-25  Kwok Cheung Yeung  
+Andrew Stubbs  
+
+   Backport from mainline:
+
+   * gthr.m4 (GCC_AC_THREAD_HEADER): Add case for gcn.
+
diff --git a/config/gthr.m4 b/config/gthr.m4
index 7b29f1f3327..4b937306ad0 100644
--- a/config/gthr.m4
+++ b/config/gthr.m4
@@ -13,6 +13,7 @@ AC_DEFUN([GCC_AC_THREAD_HEADER],
 case $1 in
 aix)   thread_header=config/rs6000/gthr-aix.h ;;
 dce)   thread_header=config/pa/gthr-dce.h ;;
+gcn)   thread_header=config/gcn/gthr-gcn.h ;;
 lynx)  thread_header=config/gthr-lynx.h ;;
 mipssde)   thread_header=config/mips/gthr-mipssde.h ;;
 posix) thread_header=gthr-posix.h ;;
diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index e573f621fdd..9e1e9315923 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,12 @@
+2019-06-25  Kwok Cheung Yeung  
+Andrew Stubbs  
+
+   Backport from mainline:
+
+   * config.gcc (thread_file): Set to gcn for AMD GCN.
+   * config/gcn/gcn.c (gcn_emutls_var_init): New function.
+   (TARGET_EMUTLS_VAR_INIT): New hook.
+
 2019-05-22  Kwok Cheung Yeung  
Andrew Stubbs  
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6dc016cab51..aff3bfad3d1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1431,6 +1431,7 @@ amdgcn-*-amdhsa)
fi
# Force .init_array support.
gcc_cv_initfini_array=yes
+   thread_file=gcn
;;
 moxie-*-elf)
gas=yes
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 54c37990d9c..9f73fc8161a 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3159,6 +3159,16 @@ gcn_valid_cvt_p (machine_mode from, machine_mode to, 
enum gcn_cvt_t op)
  || (to == DFmode && (from == SImode || from == SFmode)));
 }
 
+/* Implement TARGET_EMUTLS_VAR_INIT.
+
+   Disable emutls (gthr-gcn.h does not support it, yet).  */
+
+tree
+gcn_emutls_var_init (tree, tree decl, tree)
+{
+  sorry_at (DECL_SOURCE_LOCATION (decl), "TLS is not implemented for GCN.");
+}
+
 /* }}}  */
 /* {{{ Costs.  */
 
@@ -6003,6 +6013,8 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_CONSTANT_ALIGNMENT gcn_constant_alignment
 #undef  TARGET_DEBUG_UNWIND_INFO
 #define TARGET_DEBUG_UNWIND_INFO gcn_debug_unwind_info
+#undef  TARGET_EMUTLS_VAR_INIT
+#define TARGET_EMUTLS_VAR_INIT gcn_emutls_var_init
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN gcn_expand_builtin
 #undef  TARGET_FUNCTION_ARG
diff --git a/libgcc/ChangeLog.openacc b/libgcc/ChangeLog.openacc
new file mode 100644
index 000..1aaa7178df9
--- /dev/null
+++ b/libgcc/ChangeLog.openacc
@@ -0,0 +1,8 @@
+2019-06-25  Kwok Cheung Yeung  
+Andrew Stubbs  
+
+   Backport from mainline:
+
+   * configure: Regenerate.
+   * config/gcn/gthr-gcn.h: New.
+
diff --git a/libgcc/config/gcn/gthr-gcn.h b/libgcc/config/gcn/gthr-gcn.h
new file mode 100644
index 000..4227b515f01
--- /dev/null
+++ b/libgcc/config/gcn/gthr-gcn.h
@@ -0,0 +1,163 @@
+/* Threads compatibility routines for libgcc2 and libobjc.  */
+/* Compile this one with gcc.  */
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted 

[PATCH 1/8] [og9] Add support for constructors and destructors on GCN

2019-08-02 Thread Julian Brown
This is a backport to the og9 branch of the mainline patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01539.html

2019-05-22  Kwok Cheung Yeung  
Andrew Stubbs  

Backport from mainline:

* config.gcc (gcc_cv_initfini_array): Set for AMD GCN.
* config/gcn/gcn-run.c (init_array_kernel, fini_array_kernel): New.
(kernel): Rename to...
(main_kernel): ... this.
(load_image): Load _init_array and _fini_array kernels.
(run): Add argument for kernel to run.
(main): Run init_array_kernel before main_kernel, and
fini_array_kernel after.
* config/gcn/gcn.c (gcn_handle_amdgpu_hsa_kernel_attribute): Allow
amdgpu_hsa_kernel attribute on functions.
(gcn_disable_constructors): Delete.
(TARGET_ASM_CONSTRUCTOR, TARGET_ASM_DESTRUCTOR): Delete.
* config/gcn/crt0.c (size_t): Define.
(_init_array, _fini_array): New.
(__preinit_array_start, __preinit_array_end,
__init_array_start, __init_array_end,
__fini_array_start, __fini_array_end): Declare weak references.
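For readers unfamiliar with the .init_array mechanism that the ChangeLog above refers to, here is a minimal host-side sketch in C. It only illustrates the general idea of walking a table of function pointers between a start and an end symbol; the names run_array_forward, run_array_backward, ctor_a and ctor_b are hypothetical, and the reverse order used for destructors is the usual ELF convention rather than something taken from crt0.c in this patch.

```c
#include <stddef.h>

typedef void (*init_fn) (void);

/* Call each entry of [start, end) in order, as an .init_array walk
   conventionally does.  In a real runtime, start and end would be
   linker-provided symbols such as __init_array_start/__init_array_end.  */
static void
run_array_forward (init_fn *start, init_fn *end)
{
  for (init_fn *f = start; f != end; f++)
    (*f) ();
}

/* Call each entry of [start, end) in reverse order, which is the usual
   convention for .fini_array (an assumption here, not read from the patch).  */
static void
run_array_backward (init_fn *start, init_fn *end)
{
  for (init_fn *f = end; f != start; )
    (*--f) ();
}

/* Hypothetical constructors that record the order in which they ran.  */
static int trace[4];
static int trace_len;

static void ctor_a (void) { trace[trace_len++] = 1; }
static void ctor_b (void) { trace[trace_len++] = 2; }
```

In the patch itself the two walks are separate kernels, and gcn-run launches the _init_array kernel before main and the _fini_array kernel after it.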
---
 gcc/ChangeLog.openacc| 23 +
 gcc/config.gcc   |  2 ++
 gcc/config/gcn/gcn-run.c | 36 +-
 gcc/config/gcn/gcn.c | 22 +---
 libgcc/config/gcn/crt0.c | 56 
 5 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index aa6db4f6344..e573f621fdd 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,26 @@
+2019-05-22  Kwok Cheung Yeung  
+   Andrew Stubbs  
+
+   Backport from mainline:
+
+   * config.gcc (gcc_cv_initfini_array): Set for AMD GCN.
+   * config/gcn/gcn-run.c (init_array_kernel, fini_array_kernel): New.
+   (kernel): Rename to...
+   (main_kernel): ... this.
+   (load_image): Load _init_array and _fini_array kernels.
+   (run): Add argument for kernel to run.
+   (main): Run init_array_kernel before main_kernel, and
+   fini_array_kernel after.
+   * config/gcn/gcn.c (gcn_handle_amdgpu_hsa_kernel_attribute): Allow
+   amdgpu_hsa_kernel attribute on functions.
+   (gcn_disable_constructors): Delete.
+   (TARGET_ASM_CONSTRUCTOR, TARGET_ASM_DESTRUCTOR): Delete.
+   * config/gcn/crt0.c (size_t): Define.
+   (_init_array, _fini_array): New.
+   (__preinit_array_start, __preinit_array_end,
+   __init_array_start, __init_array_end,
+   __fini_array_start, __fini_array_end): Declare weak references.
+
 2019-07-10  Cesar Philippidis  
Julian Brown  
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 09fb9ecd2cd..6dc016cab51 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1429,6 +1429,8 @@ amdgcn-*-amdhsa)
extra_programs="${extra_programs} mkoffload\$(exeext)"
tm_file="${tm_file} gcn/offload.h"
fi
+   # Force .init_array support.
+   gcc_cv_initfini_array=yes
;;
 moxie-*-elf)
gas=yes
diff --git a/gcc/config/gcn/gcn-run.c b/gcc/config/gcn/gcn-run.c
index 00a71014c20..84718f42846 100644
--- a/gcc/config/gcn/gcn-run.c
+++ b/gcc/config/gcn/gcn-run.c
@@ -66,7 +66,9 @@ bool debug = false;
 
 hsa_agent_t device = { 0 };
 hsa_queue_t *queue = NULL;
-uint64_t kernel = 0;
+uint64_t init_array_kernel = 0;
+uint64_t fini_array_kernel = 0;
+uint64_t main_kernel = 0;
 hsa_executable_t executable = { 0 };
 
 hsa_region_t kernargs_region = { 0 };
@@ -427,14 +429,30 @@ load_image (const char *filename)
   XHSA (hsa_fns.hsa_executable_freeze_fn (executable, ""),
"Freeze GCN executable");
 
-  /* Locate the "main" function, and read the kernel's properties.  */
+  /* Locate the "_init_array" function, and read the kernel's properties.  */
   hsa_executable_symbol_t symbol;
+  XHSA (hsa_fns.hsa_executable_get_symbol_fn (executable, NULL, "_init_array",
+ device, 0, &symbol),
+   "Find '_init_array' function");
+  XHSA (hsa_fns.hsa_executable_symbol_get_info_fn
+   (symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
+    &init_array_kernel),
+   "Extract '_init_array' kernel object kernel object");
+
+  /* Locate the "_fini_array" function, and read the kernel's properties.  */
+  XHSA (hsa_fns.hsa_executable_get_symbol_fn (executable, NULL, "_fini_array",
+ device, 0, &symbol),
+   "Find '_fini_array' function");
+  XHSA (hsa_fns.hsa_executable_symbol_get_info_fn
+   (symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
+    &fini_array_kernel),
+   "Extract '_fini_array' kernel object kernel object");
+
+  /* Locate the "main" function, and read the kernel's properties.  */
   XHSA (hsa_fns.hsa_executable_get_symbol_fn (executable, NULL, "main&

[PATCH 0/8] [og9] AMD GCN offloading support

2019-08-02 Thread Julian Brown
This patch series provides basic offloading support for OpenMP and
OpenACC on openacc-gcc-9-branch, although OpenACC in particular is likely
to be buggy until we post follow-up fixes from our internal branch.

Further commentary is provided alongside each patch.

The series as a whole has been tested with offloading to NVPTX (no
regressions) and also with offloading to AMD GCN (results reasonable,
but not fully checked).

I will apply shortly.

Thanks,

Julian

Julian Brown (8):
  [og9] Add support for constructors and destructors on GCN
  [og9] Create GCN-specific gthreads
  [og9] Stub implementation of unwinding for AMD GCN
  [og9] Enable full GFortran library for AMD GCN
  [og9] AMD GCN offloading support
  [og9] Make OpenACC function-parameter explosion optional
  [og9] NVPTX GOMP_OFFLOAD_openacc_async_construct arg fix and
gomp_print_* support
  [og9] Update parallel-dims.c and serial-dims.c warning line numbering.

 ChangeLog.openacc |6 +
 config/ChangeLog.openacc  |7 +
 config/gthr.m4|1 +
 configure |3 +
 configure.ac  |3 +
 gcc/ChangeLog.openacc |   63 +
 gcc/builtin-types.def |4 -
 gcc/config.gcc|5 +-
 gcc/config/gcn/gcn-run.c  |   36 +-
 gcc/config/gcn/gcn.c  |   24 +-
 gcc/config/gcn/mkoffload.c|  702 
 gcc/config/gcn/offload.h  |   35 +
 gcc/config/i386/i386.c|   32 +
 gcc/doc/tm.texi   |5 +
 gcc/doc/tm.texi.in|2 +
 gcc/fortran/types.def |4 -
 gcc/omp-builtins.def  |4 +-
 gcc/omp-expand.c  |   18 +-
 gcc/omp-low.c |   28 +-
 gcc/target.def|7 +
 gcc/tree-ssa-structalias.c|   52 +-
 include/ChangeLog.openacc |4 +
 include/gomp-constants.h  |5 +-
 libgcc/ChangeLog.openacc  |   25 +
 libgcc/Makefile.in|2 +
 libgcc/config/gcn/crt0.c  |   56 +
 libgcc/config/gcn/gomp_print.c|  101 +
 libgcc/config/gcn/gthr-gcn.h  |  163 +
 libgcc/config/gcn/reduction.c |   30 +
 libgcc/config/gcn/t-amdgcn|   14 +-
 libgcc/config/gcn/t-gcn-hsa   |   52 +
 libgcc/config/gcn/unwind-gcn.c|   37 +
 libgcc/configure  |1 +
 libgfortran/ChangeLog.openacc |9 +
 libgfortran/configure |3 +-
 libgfortran/configure.ac  |3 +-
 libgomp/ChangeLog.openacc |  179 +
 libgomp/Makefile.am   |2 +-
 libgomp/Makefile.in   |   63 +-
 libgomp/affinity-fmt.c|   10 +-
 libgomp/config.h.in   |3 +
 .../config/{nvptx => accel}/libgomp-plugin.c  |0
 libgomp/config/{nvptx => accel}/lock.c|0
 libgomp/config/{nvptx => accel}/mutex.c   |0
 libgomp/config/{nvptx => accel}/mutex.h   |0
 libgomp/config/{nvptx => accel}/oacc-async.c  |0
 libgomp/config/{nvptx => accel}/oacc-cuda.c   |0
 libgomp/config/{nvptx => accel}/oacc-host.c   |0
 libgomp/config/{nvptx => accel}/oacc-init.c   |0
 libgomp/config/{nvptx => accel}/oacc-mem.c|0
 libgomp/config/{nvptx => accel}/oacc-plugin.c |0
 libgomp/config/{nvptx => accel}/omp-lock.h|0
 libgomp/config/{nvptx => accel}/openacc.f90   |2 +
 libgomp/config/{nvptx => accel}/pool.h|0
 libgomp/config/{nvptx => accel}/proc.c|1 +
 libgomp/config/{nvptx => accel}/ptrlock.c |0
 libgomp/config/{nvptx => accel}/ptrlock.h |0
 libgomp/config/{nvptx => accel}/sem.c |0
 libgomp/config/{nvptx => accel}/sem.h |0
 .../{nvptx => accel}/thread-stacksize.h   |0
 libgomp/config/gcn/affinity-fmt.c |   51 +
 libgomp/config/gcn/bar.c  |  230 ++
 libgomp/config/gcn/bar.h  |  168 +
 libgomp/config/gcn/doacross.h |   58 +
 libgomp/config/gcn/gomp_print.c   |2 +
 libgomp/config/gcn/icv-device.c   |   72 +
 libgomp/config/gcn/simple-bar.h   |   61 +
 libgomp/config/gcn/target.c   |   49 +
 libgomp/config/gcn/task.c |   39 +
 libgomp/config/gcn/team.c |  202 +
 libgomp/config/gcn/time.c |   52 +
 libgomp/config/linux/gomp_print.

[og9] OpenACC assumed-size arrays with non-lexical data mappings

2019-07-09 Thread Julian Brown
Hi,

This patch provides support for implicit mapping of assumed-size
arrays for OpenACC, in cases where those arrays have previously been
mapped using non-lexical data mappings (e.g. "#pragma acc enter data").

Previously posted here:

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02090.html

and then revised:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00069.html

It's not clear if this is required behaviour for OpenACC, but at least
one test program we are using relies on the semantics introduced by this
patch.

Tested with offloading to nvptx. I will apply to the
openacc-gcc-9-branch shortly.

Julian

ChangeLog

2019-07-10  Cesar Philippidis  
Thomas Schwinge  
        Julian Brown  

gcc/
* gimplify.c (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.c (lower_omp_target): Set the size of assumed-size
Fortran arrays to one to allow use of data already mapped on
the offload device.

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.

From 2c5a7e445ebadc920730c732279732d2f9b40598 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Thu, 4 Jul 2019 18:14:41 -0700
Subject: [PATCH 2/3] Assumed-size arrays with non-lexical data mappings

	gcc/fortran/
	* trans-openmp.c (gfc_omp_finish_clause): Change clauses mapping
	assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
	* gimplify.c (gimplify_adjust_omp_clauses_1): Raise error for
	assumed-size arrays in map clauses for Fortran/OpenMP.
	* omp-low.c (lower_omp_target): Set the size of assumed-size Fortran
	arrays to one to allow use of data already mapped on the offload device.
---
 gcc/fortran/ChangeLog.openacc |  9 +
 gcc/fortran/trans-openmp.c| 22 +-
 gcc/gimplify.c| 14 ++
 gcc/omp-low.c |  5 +
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/ChangeLog.openacc b/gcc/fortran/ChangeLog.openacc
index c44a5ebdb3b..beba7d94ad2 100644
--- a/gcc/fortran/ChangeLog.openacc
+++ b/gcc/fortran/ChangeLog.openacc
@@ -1,3 +1,12 @@
+2019-07-10  Julian Brown  
+
+	* trans-openmp.c (gfc_omp_finish_clause): Change clauses mapping
+	assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
+	* gimplify.c (gimplify_adjust_omp_clauses_1): Raise error for
+	assumed-size arrays in map clauses for Fortran/OpenMP.
+	* omp-low.c (lower_omp_target): Set the size of assumed-size Fortran
+	arrays to one to allow use of data already mapped on the offload device.
+
 2019-07-10  Julian Brown  
 
 	* openmp.c (resolve_oacc_data_clauses): Allow polymorphic allocatable
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d5ae0b717df..db009130c85 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1137,10 +1137,18 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   tree decl = OMP_CLAUSE_DECL (c);
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  An exception is if the array is
- mapped explicitly in an enclosing data construct for OpenACC, in which
- case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
- error.  */
+ explicitly using array sections.  For OpenACC this restriction is lifted
+ if the array has already been mapped:
+
+   - Using a lexically-enclosing data region: in that case we see the
+ GOMP_MAP_FORCE_PRESENT mapping kind here.
+
+   - Using a non-lexical data mapping ("acc enter data").
+
+ In the latter case we change the mapping type to GOMP_MAP_FORCE_PRESENT.
+ This raises an error for OpenMP in our caller
+ (gimplify.c:gimplify_adjust_omp_clauses_1).  OpenACC will raise a runtime
+ error if the assumed-size array is not mapped.  */
   if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
   && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
@@ -1148,11 +1156,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
 GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 	 == NULL)
-{
-  error_at (OMP_CLAUSE_LOCATION (c),
-		"implicit mapping of assumed size array %qD", decl);
-  return;
-}
+OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
 
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 60e04ff8353..58142c9eb90 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -10088,7 +10088,21 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
   *list_p = clause;
   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
   gimplify_omp_ctxp = ctx->ou

[og9] Allow the accelerator to have more offloaded functions than the host

2019-07-09 Thread Julian Brown
Hi,

This patch was previously posted here by Cesar:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00668.html

This patch is necessary when not all objects containing offload code
are linked into the final executable, including those in static
libraries. Re-tested with offloading to nvptx. I will apply to the
openacc-gcc-9-branch shortly.
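As a rough illustration of the relaxed check, the sketch below contrasts the old and new acceptance conditions. The helper names are hypothetical and the real logic lives inline in gomp_load_image_to_device; the comparison operators are the ones shown in the diff in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old behaviour: the device image must expose exactly as many entries
   as the host table lists (functions plus variables).  */
static bool
table_ok_strict (size_t num_target_entries, size_t num_funcs, size_t num_vars)
{
  return num_target_entries == num_funcs + num_vars;
}

/* New behaviour: the device image may expose more entries than the host
   knows about, for instance when an object containing offload code was
   not linked into the final executable.  Too few entries is still an
   error.  */
static bool
table_ok_relaxed (size_t num_target_entries, size_t num_funcs, size_t num_vars)
{
  return num_target_entries >= num_funcs + num_vars;
}
```

With 3 host functions and 2 host variables, a device table of 7 entries is rejected by the strict check but accepted by the relaxed one, while a table of 4 entries is rejected by both.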

Thanks,

Julian

ChangeLog

2019-07-10  Cesar Philippidis  

libgomp/
* target.c (gomp_load_image_to_device): Allow the accelerator to
possess more offloaded functions than the host.
From 8fa310efa11254ed430d7e5dca80333a612b699e Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Sun, 7 Jul 2019 11:25:51 -0700
Subject: [PATCH 3/3] Allow the accelerator to have more offloaded functions
 than the host

	libgomp/
	* target.c (gomp_load_image_to_device): Allow the accelerator to
	possess more offloaded functions than the host.
---
 libgomp/ChangeLog.openacc | 5 +
 libgomp/target.c  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 1d88bd54cd2..00c58601336 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-07-10  Cesar Philippidis  
+
+	* target.c (gomp_load_image_to_device): Allow the accelerator to
+	possess more offloaded functions than the host.
+
 2019-07-10  Julian Brown  
 
 	* oacc-parallel.c (GOACC_enter_exit_data): Fix optional arguments for
diff --git a/libgomp/target.c b/libgomp/target.c
index a4ed763d507..c81e5ababb7 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2131,7 +2131,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 = devicep->load_image_func (devicep->target_id, version,
 target_data, &target_table);
 
-  if (num_target_entries != num_funcs + num_vars)
+  if (num_target_entries < num_funcs + num_vars)
 {
   gomp_mutex_unlock (&devicep->lock);
   if (is_register_lock)
-- 
2.22.0



[og9] Support Fortran 2003 class pointers in OpenACC

2019-07-09 Thread Julian Brown
This patch provides initial support for Fortran 2003 polymorphic class
pointers in OpenACC. This necessitated some rewriting of the lowering
code in gfc_trans_omp_clauses, partly reverting some of the changes
made by the earlier manual deep copy support. In the new code, I've
tried to reuse existing lowering code in the Fortran front-end where
appropriate.

The main changes can be summarised thus:

1. Polymorphic class pointers can be used in OpenACC data-mapping
clauses. Class descriptors (comprising a _data pointer and a _vptr
virtual-table pointer) are mapped using GOMP_MAP_TO_PSET, in a similar
way to the existing support for array descriptors.

2. For OpenACC, a new internal-only gomp_map_kind has been introduced
when mapping derived-type pointer components, GOMP_MAP_ATTACH_DETACH,
instead of hijacking GOMP_MAP_ALWAYS_POINTER for attach/detach
operations then rewriting it in gimplify.c. This cleans up some code
paths and hopefully self-documents better.

3. OpenACC "enter data" and "exit data" now have GOMP_MAP_POINTER and
GOMP_MAP_PSET mappings removed during gimplification. In some
circumstances, passing an array to a function/subroutine and then doing
an "enter data" on it could leave dangling references to the function's
stack, although the actual array data is defined outside the function.
In any case, the pointer/pointer-set mappings don't seem to be
necessary for OpenACC "enter data".
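The descriptor-plus-attach scheme in points 1 and 2 can be pictured with a small host-only simulation in C. This is a sketch under stated assumptions, not the libgomp implementation: struct class_desc is a hypothetical stand-in for a Fortran class descriptor, the "device" is just a second copy in host memory, and the vptr field is copied verbatim without any translation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class descriptor: a data pointer and a
   virtual-table pointer, playing the roles of _data and _vptr.  */
struct class_desc
{
  void *data;
  const void *vptr;
};

/* Copy the descriptor itself to the "device", in the spirit of a
   pointer-set mapping.  The embedded data pointer still holds a host
   address after this step.  */
static void
map_descriptor (struct class_desc *dev, const struct class_desc *host)
{
  memcpy (dev, host, sizeof *dev);
}

/* Attach: make the device copy's pointer refer to the device copy of
   the pointed-to data.  */
static void
attach_data (struct class_desc *dev, void *dev_data)
{
  dev->data = dev_data;
}

/* Detach: restore the device copy's pointer to the host pointer value.  */
static void
detach_data (struct class_desc *dev, const struct class_desc *host)
{
  dev->data = host->data;
}
```

The point of a dedicated attach/detach map kind is that this pointer rewrite is a distinct operation from copying either the descriptor or the data it points to.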

Tested with offloading to nvptx. I will apply shortly (to the
openacc-gcc-9-branch).

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (insert_struct_comp_map): Handle GOMP_MAP_ATTACH_DETACH.
(gimplify_scan_omp_clauses): Separate out handling of OACC_ENTER_DATA
and OACC_EXIT_DATA. Remove GOMP_MAP_POINTER and GOMP_MAP_TO_PSET
mappings, apart from those following GOMP_MAP_DECLARE_{,DE}ALLOCATE.
Handle GOMP_MAP_ATTACH_DETACH.
* tree-pretty-print.c (dump_omp_clause): Support GOMP_MAP_ATTACH_DETACH.
Print "bias" not "len" for attach/detach clause types.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_ATTACH_DETACH.

gcc/c/
* c-typeck.c (handle_omp_array_sections): Use GOMP_MAP_ATTACH_DETACH
for OpenACC attach/detach operations.

gcc/cp/
* semantics.c (handle_omp_array_sections): Likewise.
(finish_omp_clauses): Handle GOMP_MAP_ATTACH_DETACH.

gcc/fortran/
* openmp.c (resolve_oacc_data_clauses): Allow polymorphic allocatable
variables.
* trans-expr.c (gfc_conv_component_ref,
conv_parent_component_reference): Make global.
(gfc_auto_dereference_var): New function, broken out of...
(gfc_conv_variable): ...here. Call outlined function instead.
* trans-openmp.c (gfc_trans_omp_array_section): New function, broken out
of...
(gfc_trans_omp_clauses): ...here. Separate out OpenACC derived
type/polymorphic class pointer handling. Call above outlined function.
* trans.h (gfc_conv_component_ref, conv_parent_component_references,
gfc_auto_dereference_var): Add prototypes.

gcc/testsuite/
* c-c++-common/goacc/mdc-1.c: Update clause matching patterns.

libgomp/
* oacc-parallel.c (GOACC_enter_exit_data): Fix optional arguments for
changes to clause stripping in enter data/exit data directives.
* testsuite/libgomp.oacc-fortran/class-ptr-param.f95: New test.
* testsuite/libgomp.oacc-fortran/classtypes-1.f95: New test.
* testsuite/libgomp.oacc-fortran/classtypes-2.f95: New test.
* testsuite/libgomp.oacc-fortran/derivedtype-1.f95: New test.
* testsuite/libgomp.oacc-fortran/derivedtype-2.f95: New test.
* testsuite/libgomp.oacc-fortran/multidim-slice.f95: New test.
From 3c260613f2e74d6639c4dbd43b018b6640ae8454 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Wed, 20 Feb 2019 05:21:15 -0800
Subject: [PATCH 1/3] Support Fortran 2003 class pointers in OpenACC

	gcc/
	* gimplify.c (insert_struct_comp_map): Handle GOMP_MAP_ATTACH_DETACH.
	(gimplify_scan_omp_clauses): Separate out handling of OACC_ENTER_DATA
	and OACC_EXIT_DATA. Remove GOMP_MAP_POINTER and GOMP_MAP_TO_PSET
	mappings, apart from those following GOMP_MAP_DECLARE_{,DE}ALLOCATE.
	Handle GOMP_MAP_ATTACH_DETACH.
	* tree-pretty-print.c (dump_omp_clause): Support GOMP_MAP_ATTACH_DETACH.
	Print "bias" not "len" for attach/detach clause types.

	include/
	* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_ATTACH_DETACH.

	gcc/c/
	* c-typeck.c (handle_omp_array_sections): Use GOMP_MAP_ATTACH_DETACH
	for OpenACC attach/detach operations.

	gcc/cp/
	* semantics.c (handle_omp_array_sections): Likewise.
	(finish_omp_clauses): Handle GOMP_MAP_ATTACH_DETACH.

	gcc/fortran/
	* openmp.c (resolve_oacc_data_clauses): Allow polymorphic allocatable
	variables.
	* trans-expr.c (gfc_conv_component_re

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-12 Thread Julian Brown
On Wed, 12 Jun 2019 13:57:22 +0200
Thomas Schwinge  wrote:

> Hi!
> 
> First, thanks for picking this up, and improving the patch you
> inherited.

Thanks for review!

> I understand right that this will address some aspects of PR90115
> "OpenACC: predetermined private levels for variables declared in
> blocks" (so please mention that one in the ChangeLog updates, and
> commit log), but it doesn't address all of these aspects (and see
> also Cesar's list in
> <http://mid.mail-archive.com/70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com>),
> and also not yet PR90114 "Predetermined private levels for variables
> declared in OpenACC accelerator routines"?

There are two possible reasons for placing gang-private variables in
shared memory: correct implementation of OpenACC semantics, or
optimisation, since shared memory is faster than local memory (on NVIDIA
devices). Handling of private variables is intimately tied with the
execution model for gangs/workers/vectors implemented by a particular
target: for PTX, that's handled in the backend using a
broadcasting/neutering scheme.

That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something
like an atomic operation in worker-partitioned mode and expects a
worker-single variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work
properly.
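A small C/OpenACC sketch of the pattern described above may help. It is illustrative only: count_positive is a hypothetical function, and real code would more likely use a reduction clause. The variable hits is declared in the gang loop body, so each gang has its own copy, and every worker in that gang updates the same copy atomically in worker-partitioned mode. Compiled without OpenACC support the pragmas are ignored and the code runs serially with the same result.

```c
#include <stddef.h>

/* Count the positive entries in each row of a rows x cols matrix,
   writing one count per row to out.  */
static void
count_positive (const int *m, int rows, int cols, int *out)
{
#pragma acc parallel loop gang copyin(m[0:rows*cols]) copyout(out[0:rows])
  for (int r = 0; r < rows; r++)
    {
      /* One copy per gang, set up in worker-single mode.  */
      int hits = 0;

#pragma acc loop worker
      for (int c = 0; c < cols; c++)
        if (m[r * cols + c] > 0)
          {
            /* All workers of the gang must see the same 'hits'.  */
#pragma acc atomic update
            hits++;
          }

      out[r] = hits;
    }
}
```

If hits lived in per-thread local memory, each worker would increment its own copy and the atomic would have nothing shared to act on, which is the semantic problem that placing such variables in shared memory addresses.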

It is *not* sufficient for the next level down, though -- expecting to
perform atomic operations in vector-partitioned mode on a variable
that is declared in vector-single mode, i.e. so that it is supposed to
be shared across all vector elements. AFAIK, that's not
straightforward, and we haven't attempted to implement it.

I think the original motivation for this patch was optimisation, though
-- typical code won't try to use atomics in this way. Cesar's list of
caveats that you linked to seems to support that notion.

> On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown
>  wrote:
> > --- a/gcc/config/nvptx/nvptx.c
> > +++ b/gcc/config/nvptx/nvptx.c  
> 
> > @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
> >  write_shared_buffer (asm_out_file, vector_red_sym,
> >  vector_red_align, vector_red_size);
> >  
> > +  if (gangprivate_shared_size)
> > +write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> > +gangprivate_shared_align,
> > gangprivate_shared_size);  
> 
> Curious, what is the reason that we maintain this
> '__gangprivate_shared' variable on a per-file basis instead of on a
> per-function basis (with names '__gangprivate_shared_[function]', or
> similar), which should make it more obvious where each block of
> '.shared' memory belongs to?

I can't comment on that, I'm afraid: that was a part of the patch that I
inherited and didn't alter much...

> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi  
> 
> > +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree
> > @var{var}) +This hook, if defined, is used by accelerator target
> > back-ends to expand +specially handled kinds of VAR_DECL
> > expressions.  A particular use is to +place variables with specific
> > attributes inside special accelerator +memories.  A return value of
> > NULL indicates that the target does not +handle this VAR_DECL, and
> > normal RTL expanding is resumed. +@end deftypefn  
> 
> I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
> Using different "memories" for specially tagged DECLs seems to be a
> pretty generic concept (address spaces?), and...

This is partly another NVPTX weirdness -- the target uses address
spaces, but only within the backend, and without using the generic
middle-end address space machinery. The other reason for using an
attribute instead of assigning an address space is that the former can
be detected by the target compiler, but will be ignored by the host
compiler. Forcing use of an address space this early would mean that
the same non-standard address space would have to make sense for both
host and offloaded code.

For AMD GCN, we do use the generic address space support, and I found
that I could re-use the "oacc gangprivate" attribute -- but not the
expand_accel_var hook (expand time is too late for that target).
Instead, another new hook "TARGET_GOACC_ADJUST_GANGPRIVATE_DECL" is
called from omp-offload.c:execute_oacc_device_lower for variables that
have the "oacc gangprivate" attribute set. Those bits haven't been
posted upstream yet, though.

> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target,
> &

Re: [PATCH, OpenACC] (1/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2019-06-09 Thread Julian Brown
On Wed, 5 Dec 2018 21:10:45 +
Julian Brown  wrote:

> Thanks for review! How's this version?
> 
> I took the liberty of fixing the patch for Fortran array-descriptor
> mappings that use a PSET, also, and adding another test for that
> functionality.

This is a ping/new version of this patch, incorporating previous review
comments and also fixing the inheritance behaviour for references (e.g.
for array slices of Fortran function arguments). I've also merged the
two patches sent below into one:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01790.html
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01791.html

The second part had already been conditionally approved, subject to
approval of the first. The merged patch has also been rebased.
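For context, the kind of code this patch targets looks roughly like the C sketch below. It is not the PR's test case; the function name and the constants are made up for illustration. The array slice is mapped on a lexically-enclosing data construct, and the parallel construct inside it names no data clauses, so the implicit mapping of the array should inherit that slice instead of trying to map the whole array again.

```c
#include <stddef.h>

enum { N = 100, LO = 10, LEN = 80 };

/* Zero an array, set a[LO .. LO+LEN-1] to 1 inside a data region that
   maps only that slice, and return the sum of all elements.  */
static int
sum_after_slice_update (void)
{
  int a[N];
  int sum = 0;

  for (int i = 0; i < N; i++)
    a[i] = 0;

#pragma acc data copy(a[LO:LEN])
  {
    /* No explicit data clause here: 'a' is mapped implicitly.  */
#pragma acc parallel loop
    for (int i = LO; i < LO + LEN; i++)
      a[i] = 1;
  }

  for (int i = 0; i < N; i++)
    sum += a[i];

  return sum;
}
```

Without OpenACC support the pragmas are ignored and the function simply returns LEN.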

Re-tested with offloading to NVPTX. OK?

Thanks,

Julian

2019-06-09  Julian Brown  
Cesar Philippidis  

gcc/ 
* gimplify.c (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data
constructs, and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g.
array slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be
present in OpenACC context. 

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
assumed-size array if present in a lexically-enclosing data
construct.
(gfc_omp_finish_clause): Guard addition of clauses for pointers
with DECL_P.

gcc/testsuite/ 
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.

libgomp/ 
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: New test.
commit db3025e82a47ee8ca9fa8c87daa4eb9a7007fe75
Author: Julian Brown 
Date:   Thu Aug 16 20:02:10 2018 -0700

Inheritance of array sections on data constructs

    2018-08-28  Julian Brown  
Cesar Philippidis  

gcc/
* gimplify.c (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
assumed-size array if present in a lexically-enclosing data construct.
(gfc_omp_finish_clause): Guard addition of clauses for pointers with
DECL_P.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: New test.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 0eb5956cc53..56d56151a00 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1076,9 +1076,13 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be
- mapped explicitly using array sections.  */
-  if (TREE_CODE (decl) == PARM_DECL
+  /* Assu

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-07 Thread Julian Brown
Hi Jakub,

Thanks for the review! I believe I've addressed all your comments in
the attached version of the patch.

On Mon, 3 Jun 2019 18:23:00 +0200
Jakub Jelinek  wrote:

> Why vec<tree> * rather than vec<tree>?

> > @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context
> > *outer_ctx) }
> >  
> >ctx->cb.decl_map = new hash_map<tree, tree>;
> > +  ctx->oacc_addressable_var_decls = new vec<tree> ();
> 
> You then don't have to new it here and delete below.  As the context
> is cleared with XCNEW, you don't need to do anything here, and just
> release when deleting.  Note, even if using a pointer for some reason
> was needed (not in this case), using unconditional new for something
> only used for small subset of contexts is unacceptable, it would be
> then desirable to only create when needed.

Fixed.

> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> > +   is used to mark up variables that should be made private per-gang.  */
> > +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +  {
> > +   tree decl = OMP_CLAUSE_DECL (c);
> > +   if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > + ctx->oacc_addressable_var_decls->safe_push (decl);
> > +  }
> > +}  
> 
> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
> I'd hope you only want to do that for OpenACC contexts.  Perhaps it
> is ok to bail out early if the context isn't OpenACC one.  On the
> other side, the if (!ctx) condition makes no sense, the callers of
> course guarantee that ctx is non-NULL.

I'm not sure where that came from -- ctx can be NULL at the top-level
of lower_omp as called from execute_lower_omp. Maybe that was left over
from an earlier version of the patch. Anyway, I've removed that bit
and fixed the patch to only call oacc_record_private_var_clauses in
OpenACC contexts.

> > @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> >  ctx);
> >break;
> >  case GIMPLE_BIND:
> > +  oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
> 
> Again, why is this done unconditionally?  It should be relevant to
> gather it only in some subset of context, so guard that and don't do
> it otherwise.

And here (where ctx *can* be NULL).

> >lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
> >maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
> >break;
> > @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
> >  
> >if (all_contexts)
> >  {
> > +  splay_tree_foreach (all_contexts,
> > process_oacc_gangprivate_1, NULL);  
> 
> Similarly.  Either guard with if (flag_openacc), or have some flag
> cleared at the start of the pass and set only if you find something
> interesting so that the splay_tree_foreach does something.

I've introduced maybe_oacc_gangprivate_vars, and the splay tree walk is
only called if that's true. It's set whenever something's put in
oacc_addressable_var_decls in some omp context.
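
A minimal sketch of that guard in plain C, for readers who don't have the
patch to hand. Every name here (struct ctx, record_var, finish_pass and so
on) is a hypothetical stand-in rather than the real omp-low.c code; the
only point is that the walk over all contexts is skipped unless some
context recorded a variable during the pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an OMP context; not GCC's omp_context.  */
struct ctx
{
  int n_addressable;	/* Count of recorded addressable private vars.  */
  bool processed;	/* Set once the end-of-pass walk has visited us.  */
};

/* Plays the role of maybe_oacc_gangprivate_vars: cleared when the pass
   starts, set as soon as any context records a variable.  */
static bool maybe_gangprivate_vars;

void
pass_start (void)
{
  maybe_gangprivate_vars = false;
}

/* Record one addressable private variable in C and note that the
   end-of-pass walk now has something to do.  */
void
record_var (struct ctx *c)
{
  c->n_addressable++;
  maybe_gangprivate_vars = true;
}

/* Walk all N contexts only if something was recorded.  Returns the
   number of contexts visited.  */
size_t
finish_pass (struct ctx *ctxs, size_t n)
{
  size_t visited = 0;

  if (!maybe_gangprivate_vars)
    return 0;

  for (size_t i = 0; i < n; i++)
    {
      ctxs[i].processed = true;
      visited++;
    }
  return visited;
}
```

With no call to record_var, finish_pass returns 0 without touching any
context, which is the cheap path for functions that contain no OpenACC
private variables.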

Re-tested with offloading to NVPTX. OK?

Thanks,

Julian

commit 6c2a018b940d0b132395048b0600f7d897319ee2
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2019-06-03  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
function comment.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(maybe_oacc_gangprivate_vars): New global variable.
 

Re: [wwwdocs] Document existence of openacc-gcc-9-branch

2019-06-05 Thread Julian Brown
On Wed, 5 Jun 2019 10:30:41 +0200
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On Tue, 4 Jun 2019 23:05:53 +0100, Julian Brown
>  wrote:
> > I've pushed a new branch "openacc-gcc-9-branch" to the Git
> > mirror (i.e. as a Git-only branch), for development of OpenACC and
> > related functionality on top of the GCC 9 branch. It's currently
> > based off the gcc-9_1_0-release tag, and contains a number of
> > patches mainly merged from either the openacc-gcc-8-branch, or from
> > further-developed versions of those patches that have been
> > submitted for upstream review.
> > 
> > This patch updates the svn.html page to point to the new branch
> > rather than the old openacc-gcc-8-branch, which is retired now.
> > 
> > OK to commit?  
> 
> As obvious, but please also add an "openacc-gcc-8-branch" stanza next
> to "openacc-gcc-7-branch" in the "Merged Development Branches"
> section, and update the "gomp-4_0-branch" and "openacc-gcc-7-branch"
> stanzas accordingly.
> 
> Well, actually please move "gomp-4_0-branch", "openacc-gcc-7-branch",
> and "openacc-gcc-8-branch" into the "Inactive Development Branches"
> section, for all "These branches are inactive and contain work that
> might not been merged": they all contain some changes that have not
> been forward-ported to their later instances.

I've committed this version.

Thanks,

Julian
Index: htdocs/svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.225
diff -u -p -r1.225 svn.html
--- htdocs/svn.html	30 Sep 2018 14:38:47 -0000	1.225
+++ htdocs/svn.html	5 Jun 2019 18:39:19 -0000
@@ -291,19 +291,19 @@ the command svn log --stop-on-copy
   Patches should be marked with the tag [no-undefined-overflow]
   in the subject line.  The branch is maintained by Richard Biener.
 
-  https://gcc.gnu.org/wiki/OpenACC;>openacc-gcc-8-branch
+  https://gcc.gnu.org/wiki/OpenACC;>openacc-gcc-9-branch
   This https://gcc.gnu.org/wiki/GitMirror;>Git-only branch is
   used for collaborative development
   of https://gcc.gnu.org/wiki/OpenACC;>OpenACC support and related
   functionality, such
   as https://gcc.gnu.org/wiki/Offloading;>offloading support.  The
-  branch is based on gcc-8-branch.  Find it
+  branch is based on gcc-9-branch.  Find it
   at git://gcc.gnu.org/git/gcc.git,
-  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-8-branch;>https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-8-branch,
+  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch;>https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch,
   or
-  https://github.com/gcc-mirror/gcc/tree/openacc-gcc-8-branch;>https://github.com/gcc-mirror/gcc/tree/openacc-gcc-8-branch.
-  Please send email with a short-hand [og8] tag in the subject
-  line, and use ChangeLog.openacc files.
+  https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch;>https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch.
+  Please send patch emails with a short-hand [og9] tag in the
+  subject line, and use ChangeLog.openacc files.
 
   https://gcc.gnu.org/wiki/plugins;>plugins
   This branch adds plugin functionality to GCC.  See the 
   https://gcc.gnu.org/wiki/tuples/;>gimple-tuples-branch
   gomp-20050608-branch
   gomp-3_0-branch
-  gomp-4_0-branch
-  This branch was used to update
-  the https://gcc.gnu.org/wiki/openmp;>OpenMP support to version
-  4.0, including development
-  of https://gcc.gnu.org/wiki/Offloading;>offloading support in
-  GCC as well as support
-  for https://gcc.gnu.org/wiki/OpenACC;>OpenACC.  These features
-  got merged into trunk.  Based on gcc-6-branch then, this branch was used for
-  on-going development of OpenACC support and related functionality, which then
-  moved to openacc-gcc-7-branch, and now openacc-gcc-8-branch.
 
   java-gui-20050128-branch
   This was a temporary branch for development of java GUI libraries
@@ -820,11 +810,6 @@ inactive.
   ea...@eagercon.com.
   All changes have been merged into mainline.
 
-  openacc-gcc-7-branch
-  Based on gcc-7-branch, this branch was used for on-going development
-  of https://gcc.gnu.org/wiki/OpenACC;>OpenACC support and related
-  functionality, which now moved openacc-gcc-8-branch.
-
   pch-branch
   tree-ssa-20020619-branch
   https://gcc.gnu.org/wiki/Var_Tracking_Assignments;>var-tracking-assignments*-branch
@@ -978,6 +963,25 @@ merged.
   OpenMP support in GCC.  They were never properly maintained and
   have now been superseded by gomp-20050608-branch.
 
+  gomp-4_0-branch
+  This branch was based on gcc-6-branch, and was used to update
+  the https://gcc.gnu.org/wiki/openmp;>

[wwwdocs] Document existence of openacc-gcc-9-branch

2019-06-04 Thread Julian Brown
Hi,

I've pushed a new branch "openacc-gcc-9-branch" to the Git
mirror (i.e. as a Git-only branch), for development of OpenACC and
related functionality on top of the GCC 9 branch. It's currently based
off the gcc-9_1_0-release tag, and contains a number of patches mainly
merged from either the openacc-gcc-8-branch, or from further-developed
versions of those patches that have been submitted for upstream review.

This patch updates the svn.html page to point to the new branch rather
than the old openacc-gcc-8-branch, which is retired now.

OK to commit?

Thanks,

Julian
Index: htdocs/svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.225
diff -u -p -r1.225 svn.html
--- htdocs/svn.html	30 Sep 2018 14:38:47 -0000	1.225
+++ htdocs/svn.html	4 Jun 2019 17:10:37 -0000
@@ -291,18 +291,18 @@ the command svn log --stop-on-copy
   Patches should be marked with the tag [no-undefined-overflow]
   in the subject line.  The branch is maintained by Richard Biener.
 
-  https://gcc.gnu.org/wiki/OpenACC;>openacc-gcc-8-branch
+  https://gcc.gnu.org/wiki/OpenACC;>openacc-gcc-9-branch
   This https://gcc.gnu.org/wiki/GitMirror;>Git-only branch is
   used for collaborative development
   of https://gcc.gnu.org/wiki/OpenACC;>OpenACC support and related
   functionality, such
   as https://gcc.gnu.org/wiki/Offloading;>offloading support.  The
-  branch is based on gcc-8-branch.  Find it
+  branch is based on gcc-9-branch.  Find it
   at git://gcc.gnu.org/git/gcc.git,
-  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-8-branch;>https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-8-branch,
+  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch;>https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch,
   or
-  https://github.com/gcc-mirror/gcc/tree/openacc-gcc-8-branch;>https://github.com/gcc-mirror/gcc/tree/openacc-gcc-8-branch.
-  Please send email with a short-hand [og8] tag in the subject
+  https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch;>https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch.
+  Please send email with a short-hand [og9] tag in the subject
   line, and use ChangeLog.openacc files.
 
   https://gcc.gnu.org/wiki/plugins;>plugins


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-03 Thread Julian Brown
On Tue, 11 Dec 2018 15:08:11 +0000
Julian Brown  wrote:

> Is this version OK? Re-tested with offloading to NVPTX.

This is a ping for the patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

This is a new version of the patch, rebased and with a couple of
additional bugfixes, as follows:

Firstly, in mark_oacc_gangprivate, each decl is looked up (using
maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
innermost-nested copy of the decl.

Secondly, I'd misunderstood when the maximum parallelism level was
calculated for each nested omp_context, meaning that the code to
trigger adding the "oacc gangprivate" attribute could trigger in the
wrong circumstances. I've fixed this by moving the attribute-setting to
execute_lower_omp.

I've also added a new testcase (gangprivate-attrib-2.f90). Re-tested
with offloading to nvptx.

OK for trunk?

Thank you,

Julian

2019-06-03  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
function comment.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.
(lower_omp_target): Likewise, for "target" clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(process_oacc_gangprivate_1): New function.
(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
regions.
(execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
context.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.
From 917189cd07fcb68ba289c5fbcd768b7d4dff785f Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Thu, 9 Aug 2018 20:27:04 -0700
Subject: [PATCH] [OpenACC] Add support for gang local storage allocation in
 shared memory

2019-06-03  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
	function comment.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
	mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.
	(lower_omp_target): Likewise, for "target" clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(process_oacc_gangprivate_1): New funct

[PATCH, og8] Don't rescan "attach" node for dereferenced struct member

2019-02-15 Thread Julian Brown
Hi,

The following (og8 branch) patch added support for
attaching/detaching from dereferenced struct members:

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01778.html

Unfortunately I made a mistake in the portion of that patch that
inserts new alloc and firstprivate_pointer nodes for the struct base,
meaning that the node rewritten to an attach operation would be
scanned again. This is both unnecessary, and can cause problems in some
circumstances.

Tested with offloading to nvptx, no regressions and the new test passes.
I will apply (to the og8 branch) shortly.

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Avoid scanning
'c' again after creating base-pointer nodes for
dereferenced struct.

gcc/testsuite/
* gfortran.dg/goacc/derived-types-2.f90: New.
commit e374d415801588435d62ac214e0313ffd3ef2198
Author: Julian Brown 
Date:   Thu Feb 14 16:40:21 2019 -0800

[og8] Don't rescan "attach" node for dereferenced struct member

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Avoid scanning 'c' again
after creating base-pointer nodes for dereferenced struct.

gcc/testsuite/
* gfortran.dg/goacc/derived-types-2.f90: New.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8bf11eb659e..2ff5b68e0cc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8289,8 +8289,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		  *list_p = c2;
 		  OMP_CLAUSE_CHAIN (c2) = c3;
 		  OMP_CLAUSE_CHAIN (c3) = c;
-		  c = c3;
-		  list_p = &OMP_CLAUSE_CHAIN (c3);
 
 		  struct_deref_set->add (decl);
 		}
diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90 b/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90
new file mode 100644
index 00000000000..d01583fac89
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90
@@ -0,0 +1,14 @@
+module bar
+  type :: type1
+ real(8), pointer, public :: p(:) => null()
+  end type
+  type :: type2
+ class(type1), pointer :: p => null()
+  end type
+end module
+
+subroutine foo (var)
+   use bar
+   type(type2), intent(inout) :: var
+   !$acc enter data create(var%p%p)
+end subroutine


[PATCH, og8] OpenACC: attach/detach array slices on dereferenced struct members

2019-01-31 Thread Julian Brown
Hi,

This patch adds support for array slices on dereferenced struct
members, e.g.:

#pragma acc parallel copy(mystruct->a[0:n])

This works by making a new mapping pair for each struct pointer used in
the directive ("alloc(mystruct) firstprivate_pointer(mystruct)").

The C/C++ parsers permit chained dereferences
("mystruct->anotherstruct->bla[0:n]"). In this case, the current
implementation performs an attach/detach operation on the
final/innermost dereference only (so, "bla[0:n]" attaches to
the appropriate offset in "anotherstruct"). Other options might be to
explicitly disallow chained dereferences, or attach the whole chain
sequentially. The standard isn't helpful here (as of 2.6), but I think
that the chosen behaviour is reasonably consistent.

Arrays of structures aren't (yet?) supported
(either "copy(structarr[i].a[0:n])" or "copy(structarr[i]->a[0:n])"). I
added a basic test case for that.
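
For illustration, here is a small sketch of the single-dereference case
described above. The struct and function names are made up for this
example, and the code is not taken from the new tests. Built without
-fopenacc the pragma is simply ignored and the loop runs on the host, so
the result is the same either way; whether a given compiler accepts the
clause when OpenACC is enabled depends on it having this support.

```c
#include <stddef.h>

/* Hypothetical struct with a pointer member, as in "mystruct->a".  */
struct vec
{
  int *a;
  size_t n;
};

/* Double each element of s->a.  The array section names a member that
   is reached through a struct pointer, which is the case the patch
   handles by attaching the member's data to the mapped struct.  */
void
double_member (struct vec *s)
{
  int n = (int) s->n;

#pragma acc parallel loop copy(s->a[0:n])
  for (int i = 0; i < n; i++)
    s->a[i] *= 2;
}
```

Calling double_member on a struct whose array holds {1, 2, 3, 4} should
leave {2, 4, 6, 8} in the host copy once the region completes.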

Tested with offloading to nvptx, no regressions and the new tests pass.

I will apply shortly (to the og8 branch).

Thanks,

Julian

ChangeLog

gcc/c/
* c-typeck.c (handle_omp_array_sections_1): Handle chained dereferences.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* semantics.c (handle_omp_array_sections_1): Handle array section on
dereferenced struct member.
(finish_omp_clauses): Don't error on multiple dereferenced struct
elements with the same base.

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle array sections on
dereferenced struct members.

gcc/testsuite/
* c-c++-common/goacc/deep-copy-arrayofstruct.c: New test.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-12.C: New test.
* testsuite/libgomp.oacc-c++/deep-copy-13.C: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-9.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-11.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-14.c: New test.
commit 56101feb78bc2e3344159f96b7d0ab9eaf4bd529
Author: Julian Brown 
Date:   Wed Jan 30 04:54:24 2019 -0800

[og8] Attach/detach array slices on dereferenced struct members

gcc/c/
* c-typeck.c (handle_omp_array_sections_1): Handle chained dereferences.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* semantics.c (handle_omp_array_sections_1): Handle array section on
dereferenced struct member.
(finish_omp_clauses): Don't error on multiple dereferenced struct
elements with the same base.

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle array sections on
dereferenced struct members.

gcc/testsuite/
* c-c++-common/goacc/deep-copy-arrayofstruct.c: New test.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-12.C: New test.
* testsuite/libgomp.oacc-c++/deep-copy-13.C: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-9.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-11.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-14.c: New test.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index d25e2d8c14c..7f021649216 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12445,9 +12445,16 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 		  return error_mark_node;
 		}
 	  t = TREE_OPERAND (t, 0);
+	  if (ort == C_ORT_ACC && TREE_CODE (t) == MEM_REF)
+		{
+		  if (maybe_ne (mem_ref_offset (t), 0))
+	error_at (OMP_CLAUSE_LOCATION (c),
+			  "cannot dereference %qE in %qs clause", t,
+			  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+		  else
+		t = TREE_OPERAND (t, 0);
+		}
 	}
-	  if (TREE_CODE (t) == MEM_REF)
-	t = TREE_OPERAND (t, 0);
 	}
   if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
 	{
@@ -13750,11 +13757,18 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		  break;
 		}
 		  t = TREE_OPERAND (t, 0);
+		  if (ort == C_ORT_ACC && TREE_CODE (t) == MEM_REF)
+	{
+		  if (maybe_ne (mem_ref_offset (t), 0))
+			error_at (OMP_CLAUSE_LOCATION (c),
+  "cannot dereference %qE in %qs clause", t,
+  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+		  else
+			t = TREE_OPERAND (t, 0);
+		}
 		}
 	  if (remove)
 		break;
-	  if (TREE_CODE (t) == MEM_REF)
-		t = TREE_OPERAND (t, 0);
 	  if (VAR_P (t) || TREE_CODE (t) == PARM_DECL)
 		{
 		  if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 497fd39b10c..72c4dcec2b3 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -4557,6 +4557,8 @@ handle_omp_array_

[PATCH] Better distinguish OpenACC and OpenMP sections in libgomp.texi

2019-01-10 Thread Julian Brown
Hi,

This patch appears to be the one that should have been attached to the
following email:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01173.html

but the wrong patch (and ChangeLog!) was attached instead. For
convenience, I'll copy Cesar's blurb (mildly corrected) from the
previous message:

"This patch updates the libgomp documentation to more clearly identify
OpenMP-specific sections. Specifically, the sections "Runtime Library
Routine" and "Environment Variables" are now prefixed by OpenMP, because
those sections are not applicable to OpenACC."

I've re-checked that the generated libgomp.pdf looks ok.

OK? (Documentation, so should be OK for stage 4, IIUC.)

Thanks,

Julian

ChangeLog

2019-xx-xx  Thomas Schwinge  
James Norris  

* libgomp.texi: Better distinguish OpenACC and OpenMP "Runtime
Library Routines", and "Environment Variables".
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4991271..e2e384a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -95,10 +95,12 @@ changed to GNU Offloading and Multi Processing Runtime Library.
 @comment
 @menu
 * Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+   The OpenMP runtime application programming
interface.
-* Environment Variables::  Influencing runtime behavior with environment 
-   variables.
+* OpenMP Environment Variables: Environment Variables.
+   Influencing OpenMP runtime behavior with
+   environment variables.
 * Enabling OpenACC::   How to enable OpenACC for your
applications.
 * OpenACC Runtime Library Routines:: The OpenACC runtime application
@@ -144,11 +146,11 @@ version 4.5.
 
 
 @c -
-@c Runtime Library Routines
+@c OpenMP Runtime Library Routines
 @c -
 
 @node Runtime Library Routines
-@chapter Runtime Library Routines
+@chapter OpenMP Runtime Library Routines
 
 The runtime routines described here are defined by Section 3 of the OpenMP
 specification in version 4.5.  The routines are structured in following
@@ -1327,11 +1329,11 @@ guaranteed not to change during the execution of the program.
 
 
 @c -
-@c Environment Variables
+@c OpenMP Environment Variables
 @c -
 
 @node Environment Variables
-@chapter Environment Variables
+@chapter OpenMP Environment Variables
 
 The environment variables which beginning with @env{OMP_} are defined by
 section 4 of the OpenMP specification in version 4.5, while those


[PATCH, og8] Update _OPENACC macro, etc. for OpenACC 2.6

2019-01-09 Thread Julian Brown
Hi,

This patch updates the _OPENACC macro to 201711, indicating OpenACC 2.6
support, on the openacc-gcc-8-branch. Tested with offloading to NVPTX.
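
As a rough illustration of how user code might consume the new value, the
sketch below maps the macro to a spec version string. The helper names are
hypothetical, and only the two values mentioned in this patch (201510 for
2.5, 201711 for 2.6) are recognised.

```c
/* Return the value of _OPENACC, or 0 when the translation unit was
   built without OpenACC support (e.g. without -fopenacc).  */
long
openacc_macro_value (void)
{
#ifdef _OPENACC
  return _OPENACC;
#else
  return 0;
#endif
}

/* Map a _OPENACC value to the spec version it denotes.  Only the values
   discussed in this patch are handled; anything else is "unknown".  */
const char *
openacc_spec_name (long value)
{
  switch (value)
    {
    case 0:
      return "disabled";
    case 201510:
      return "2.5";
    case 201711:
      return "2.6";
    default:
      return "unknown";
    }
}
```

With this patch applied and -fopenacc given,
openacc_spec_name (openacc_macro_value ()) would be expected to yield
"2.6".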

I will apply shortly.

Cheers,

Julian
commit e2ff11ceee7f1294313773d013a9d68f0d4e3c02
Author: Julian Brown 
Date:   Wed Jan 9 03:41:04 2019 -0800

[og8] Update OpenACC version to 2.6

	gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Update _OPENACC define to 201711.

	gcc/doc/
	* invoke.texi: Update mention of OpenACC version to 2.6.

	gcc/fortran/
	* cpp.c (cpp_define_builtins): Update _OPENACC define to 201711.
	* gfortran.texi: Update mentions of OpenACC version to 2.6.
	* intrinsic.texi: Likewise.

	gcc/testsuite/
	* c-c++-common/cpp/openacc-define-3.c: Update expected value for
	_OPENACC define.
	* gfortran.dg/openacc-define-3.f90: Likewise.

	libgomp/
	* acc_prof.h (_ACC_PROF_INFO_VERSION): Update to 201711.
	* libgomp.texi: Update mentions of OpenACC version to 2.6.  Update
	section numbers to match version 2.6 of the spec.
	* openacc.f90 (openacc_version): Update to 201711.
	* openacc_lib.h (openacc_version): Update to 201711.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c
	(cb_any_event): Update expected profiling info version to 201711.
	* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update expected
	openacc_version to 201711.
	* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index fe83981..949cc9a 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1387,7 +1387,7 @@ c_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__SSP__=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201510");
+cpp_define (pfile, "_OPENACC=201711");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eab399d..085a871 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2138,7 +2138,7 @@ freestanding and hosted environments.
 Enable handling of OpenACC directives @code{#pragma acc} in C/C++ and
 @code{!$acc} in Fortran.  When @option{-fopenacc} is specified, the
 compiler generates accelerated code according to the OpenACC Application
-Programming Interface v2.5 @w{@uref{https://www.openacc.org}}.  This option
+Programming Interface v2.6 @w{@uref{https://www.openacc.org}}.  This option
 implies @option{-pthread}, and thus is only supported on targets that
 have support for @option{-pthread}.
 
diff --git a/gcc/fortran/cpp.c b/gcc/fortran/cpp.c
index 0d2af2e..11f97ab 100644
--- a/gcc/fortran/cpp.c
+++ b/gcc/fortran/cpp.c
@@ -165,7 +165,7 @@ cpp_define_builtins (cpp_reader *pfile)
   cpp_define (pfile, "_LANGUAGE_FORTRAN=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201510");
+cpp_define (pfile, "_OPENACC=201711");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 6dd6682..8cfe716 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -538,7 +538,7 @@ Additionally, the GNU Fortran compilers supports the OpenMP specification
 (version 4.0 and most of the features of the 4.5 version,
 @url{http://openmp.org/@/wp/@/openmp-specifications/}).
 There also is support for the OpenACC specification (targeting
-version 2.5, @uref{http://www.openacc.org/}).  See
+version 2.6, @uref{http://www.openacc.org/}).  See
 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
 
 @node Varying Length Character Strings
@@ -2132,7 +2132,7 @@ influence run-time behavior.
 
 GNU Fortran strives to be compatible to the
 @uref{http://www.openacc.org/, OpenACC Application Programming
-Interface v2.5}.
+Interface v2.6}.
 
 To enable the processing of the OpenACC directive @code{!$acc} in
 free-form source code; the @code{c$acc}, @code{*$acc} and @code{!$acc}
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 761b575..df94873 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -14862,7 +14862,7 @@ kind @code{omp_proc_bind_kind}:
 @section OpenACC Module @code{OPENACC}
 @table @asis
 @item @emph{Standard}:
-OpenACC Application Programming Interface v2.5
+OpenACC Application Programming Interface v2.6
 @end table
 
 
@@ -14876,9 +14876,9 @@ are listed below.
 
 For details refer to the actual
 @uref{http://www.openacc.org/,
-OpenACC Application Programming Interface v2.5}.
+OpenACC Application Programming Interface v2.6}.
 
 @code{OPENACC} provides the scalar default-integer
 named constant @code{openacc_version} with a value of the form
 @var{yyyymm}, where @code{yyyy} is the year and @var{mm} the month
-of the OpenACC version; for OpenACC v2.5 the value is @code{201510}.
+of the OpenACC vers

Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2019-01-08 Thread Julian Brown
On Sat, 22 Dec 2018 15:09:34 +0000
Iain Sandoe  wrote:

> Hi Julian,
> 
> > On 21 Dec 2018, at 16:47, Julian Brown 
> > wrote: 
> 
> > On Fri, 21 Dec 2018 14:31:19 +0100
> > Jakub Jelinek  wrote:
> >   
> >> On Fri, Dec 21, 2018 at 01:23:03PM +0000, Julian Brown wrote:
> >>> 2018-xx-yy  Nathan Sidwell
> >   
> 
> >>>   * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> >>>   * testsuite/libgomp.oacc-c++/pr71959.C: New.
> >>   
> >>> +void apply (int (*fn)(), Iter out) asm
> >>> ("_ZN5Apply5applyEPFivE4Iter");
> >> 
> >> Will this work even on targets that use _ or other symbol
> >> prefixes?  
> > 
> > I'd guess so, else there would be no portable way of using "asm" to
> > write pre-mangled C++ names. The only existing similar uses I could
> > find in the testsuite are for the ifunc attribute, not asm, though
> > (e.g. g++.dg/ext/attr-ifunc-*.C).  
> 
> It won’t work on such targets (e.g. Darwin)
> … but it’s not too hard to make it happen (see, for example,
> gcc.dg/memcmp-1.c)
> 
> One just has to remember that __USER_LABEL_PREFIX__ is a token, not a
> string.
> 
> so .. in the example above…
> 
> #define STR1(X) #X
> #define STR2(X) STR1(X)
> 
> ….
> 
>  asm(STR2(__USER_LABEL_PREFIX__) "_ZN5Apply5applyEPFivE4Iter");

Thanks! I've amended the test to use this technique (though I can't
easily test on Darwin, so this is "best effort").
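
To spell out why the two-level macro is needed, here is a standalone
sketch (plain C, separate from the test itself). The # operator does not
macro-expand its argument, so STR1 applied directly to
__USER_LABEL_PREFIX__ yields the macro's name; going through STR2 expands
it first. The fallback #define is an assumption for compilers that don't
predefine the macro, and the function names are made up for the example.

```c
/* Fallback for compilers that do not predefine the macro (assumption:
   treat the prefix as empty there).  */
#ifndef __USER_LABEL_PREFIX__
#define __USER_LABEL_PREFIX__
#endif

#define STR1(X) #X
#define STR2(X) STR1(X)
#define LABEL(X) STR2(__USER_LABEL_PREFIX__) X

/* One level of stringification: the argument is not expanded, so this
   is always the literal text "__USER_LABEL_PREFIX__".  */
const char *
unexpanded_prefix (void)
{
  return STR1(__USER_LABEL_PREFIX__);
}

/* Two levels: the prefix token is expanded and then stringified, and
   adjacent string literals are concatenated.  This gives
   "_ZN5Apply5applyEPFivE4Iter" where the prefix is empty and
   "__ZN5Apply5applyEPFivE4Iter" where it is an underscore.  */
const char *
apply_label (void)
{
  return LABEL ("_ZN5Apply5applyEPFivE4Iter");
}
```

The string returned by apply_label is what the asm label in the test ends
up seeing, so the symbol name picks up the target's prefix automatically.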

> > Anyway, OpenACC is only useful for a handful of targets at present,
> > neither of which use special symbol prefixes AFAIK.  
> 
> I have hopes of one day getting offloading to work on Darwin (the
> only limitation is developer time, not technical feasibility) .. 

Is this OK now (for stage 4)?

Thanks,

Julian
commit 2ee3f8d09a7b2af6c9ba29cdd8e8587db1946c0b
Author: Julian Brown 
Date:   Wed Dec 19 05:01:58 2018 -0800

Add testcase from PR71959

	libgomp/

	PR lto/71959
	* testsuite/libgomp.oacc-c++/pr71959-aux.cc: New.
	* testsuite/libgomp.oacc-c++/pr71959.C: New.

diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc b/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc
new file mode 100644
index 0000000..10a6eeb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc
@@ -0,0 +1,35 @@
+// { dg-do compile }
+
+#define STR1(X) #X
+#define STR2(X) STR1(X)
+#define LABEL(X) STR2(__USER_LABEL_PREFIX__) X
+
+struct Iter
+{
+  int *cursor;
+
+  void ctor (int *cursor_) asm (LABEL ("_ZN4IterC1EPi"));
+  int *point () const asm (LABEL ("_ZNK4Iter5pointEv"));
+};
+
+#pragma acc routine
+void Iter::ctor (int *cursor_)
+{
+  cursor = cursor_;
+}
+
+#pragma acc routine
+int *Iter::point () const
+{
+  return cursor;
+}
+
+void apply (int (*fn)(), Iter out) asm (LABEL ("_ZN5Apply5applyEPFivE4Iter"));
+
+#pragma acc routine
+void apply (int (*fn)(), struct Iter out)
+{ *out.point() = fn (); }
+
+extern "C" void __gxx_personality_v0 ()
+{
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
new file mode 100644
index 000..bf27a75
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
@@ -0,0 +1,31 @@
+// { dg-additional-sources "pr71959-aux.cc" }
+
+// PR lto/71959 ICEd LTO due to mismatch between writing & reading behaviour
+
+struct Iter
+{
+  int *cursor;
+
+  Iter(int *cursor_) : cursor(cursor_) {}
+
+  int *point() const { return cursor; }
+};
+
+#pragma acc routine seq
+int one () { return 1; }
+
+struct Apply
+{
+  static void apply (int (*fn)(), Iter out)
+  { *out.point() = fn (); }
+};
+
+int main ()
+{
+  int x;
+
+#pragma acc parallel copyout(x)
+  Apply::apply (one, Iter (&x));
+
+  return x != 1;
+}


Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-22 Thread Julian Brown
On Tue, 18 Dec 2018 13:47:34 +0100
Jakub Jelinek  wrote:

> On Thu, Dec 13, 2018 at 03:44:25PM +0000, Julian Brown wrote:
> > +static tree
> > +convert_to_firstprivate_int (tree var, gimple_seq *gs)
> > +{
> > +  tree type = TREE_TYPE (var), new_type = NULL_TREE;
> > +  tree tmp = NULL_TREE;
> > +
> > +  if (omp_is_reference (var))
> > +type = TREE_TYPE (type);
> > +
> > +  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
> > +{
> > +  if (omp_is_reference (var))
> > +   {
> > + tmp = create_tmp_var (type);
> > + gimplify_assign (tmp, build_simple_mem_ref (var), gs);
> > + var = tmp;
> > +   }
> > +
> > +  return fold_convert (pointer_sized_int_node, var);
> > +}
> > +
> > +  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
> > +
> > +  new_type = lang_hooks.types.type_for_size (tree_to_uhwi
> > (TYPE_SIZE (type)),
> > +true);
> > +
> > +  if (omp_is_reference (var))
> > +{
> > +  tmp = create_tmp_var (type);
> > +  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
> > +  var = tmp;
> > +}  
> 
> Why are you duplicating this if?  Can't you just do it before the
>   if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
> test once, even better in the same if as you do type = TREE_TYPE
> (type); ?
> 
> Otherwise ok from me, but please check with Thomas if he is ok with
> it too.

Thanks! This version tidies up the code duplication. Re-tested with
offloading to nvptx.
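
As a rough host-side illustration of the idea behind the conversion helpers (not the GCC implementation): a value no wider than a pointer can travel in the pointer-sized slot itself instead of needing its own data mapping. The function names below are hypothetical, and unlike the patch (which uses a value conversion for integral and pointer types and VIEW_CONVERT_EXPR otherwise) this sketch copies bits for every type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical analogue of convert_to_firstprivate_int: pack a small value
// into a pointer-sized integer.
template <typename T>
std::uintptr_t to_firstprivate_int (const T &value)
{
  static_assert (sizeof (T) <= sizeof (std::uintptr_t),
                 "value must fit in a pointer-sized integer");
  static_assert (std::is_trivially_copyable<T>::value,
                 "value must be bitwise copyable");
  std::uintptr_t bits = 0;
  // Bitwise reinterpretation, playing the role of VIEW_CONVERT_EXPR.
  std::memcpy (&bits, &value, sizeof (T));
  return bits;
}

// Hypothetical analogue of convert_from_firstprivate_int: recover the
// original value from the pointer-sized integer.
template <typename T>
T from_firstprivate_int (std::uintptr_t bits)
{
  static_assert (sizeof (T) <= sizeof (std::uintptr_t),
                 "value must fit in a pointer-sized integer");
  T value;
  std::memcpy (&value, &bits, sizeof (T));
  return value;
}
```

A round trip such as from_firstprivate_int<float> (to_firstprivate_int (1.5f)) gives back the original value, which is the property the receiving side relies on.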

Thomas - OK with you?

Julian
commit 5861e3529ed799715bbd2ea40d5b08a9ddae49bb
Author: Julian Brown 
Date:   Thu Dec 6 04:38:59 2018 -0800

Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b406ce7..1fc2538 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3497,6 +3497,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+if (maybe_lookup_field (decl, up))
+  return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -9052,6 +9065,74 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something that can
+   be represented by a POINTER_SIZED_INT_NODE.  Any new instructions are
+   appended to GS.  This is used to optimize firstprivate variables, so that
+   small types (less precision than POINTER_SIZE) do not require additional
+   data mappings.  */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+
+  if (omp_is_reference (var))
+{
+  type = TREE_TYPE (type);
+  tree tmp = create_tmp_var (type);
+  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+  var = tmp;
+}
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+return fold_convert (pointer_sized_int_node, var);
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
+
+  new_type = lang_hooks.types.type_for_size (tree_to_uhwi (TYPE_SIZE (type)),
+	 true);
+  tree tmp = create_tmp_var (new_type);
+  var = fold_build1 (VIEW_CONVERT_EXPR, new_type, var);
+  gimplify_assign (tmp, var, gs);
+
+  return fold_convert (pointer_sized_int_node, tmp);
+}
+
+/* Like convert_to_firstprivate_int, but restore the original type.  */
+
+static tree
+convert_from_firstprivate_int (tree var, bool is_ref, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var);
+  tree new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  gcc_assert (TREE_CODE (var) == MEM_REF);
+  var = TREE_OPERAND (var, 0);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+return fold_convert (type, var);
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
+
+  new_t

Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-21 Thread Julian Brown
Hi Jakub,

Thanks for review!

On Fri, 21 Dec 2018 14:31:19 +0100
Jakub Jelinek  wrote:

> On Fri, Dec 21, 2018 at 01:23:03PM +0000, Julian Brown wrote:
> > 2018-xx-yy  Nathan Sidwell  
> > 
> > PR lto/71959
> > libgomp/
> > * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> > * testsuite/libgomp.oacc-c++/pr71959.C: New.  
> 
> Just nits, better use pr71959-aux.cc (*.cc files aren't considered as
> testcases by *.exp:
> set tests [lsort [concat \
>   [find $srcdir/$subdir *.C] \
>   [find
> $srcdir/$subdir/../libgomp.oacc-c-c++-common *.c]]] ) and just a is
> weird.

Fixed.

> > commit c69dce8ba0ecd7ff620f4f1b8dacc94c61984107
> > Author: Julian Brown 
> > Date:   Wed Dec 19 05:01:58 2018 -0800
> > 
> > Add testcase from PR71959
> > 
> > libgomp/  
> 
> Please mention
>   PR lto/71959
> here in the ChangeLog.

Fixed.

> > * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> > * testsuite/libgomp.oacc-c++/pr71959.C: New.  
> 
> > +void apply (int (*fn)(), Iter out) asm
> > ("_ZN5Apply5applyEPFivE4Iter");  
> 
> Will this work even on targets that use _ or other symbol prefixes?

I'd guess so, else there would be no portable way of using "asm" to
write pre-mangled C++ names. The only existing similar uses I could find
in the testsuite are for the ifunc attribute, not asm, though (e.g.
g++.dg/ext/attr-ifunc-*.C).

Anyway, OpenACC is only useful for a handful of targets at present,
none of which uses special symbol prefixes AFAIK.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
> > @@ -0,0 +1,31 @@
> > +// { dg-additional-sources "pr71959-a.C" }
> > +
> > +// pr lto/71959 ICEd LTO due to mismatch between writing & reading
> > behaviour  
> 
> Capital PR instead of pr .

Fixed. OK now?

Thanks,

Julian


Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-21 Thread Julian Brown
On Fri, 21 Dec 2018 02:56:36 +
Julian Brown  wrote:

> On Tue, 25 Sep 2018 14:59:18 +0200
> Martin Jambor  wrote:
> 
> > Hi,
> > 
> > I have noticed a few things...
> > 
> > On Thu, Sep 20 2018, Cesar Philippidis wrote:  
> > > This is another old gomp4 patch that demotes an ICE in PR71959 to
> > > a linker warning. One problem here is that it is not clear if
> > > OpenACC allows individual member functions in C++ classes to be
> > > marked as acc routines. There's another issue accessing member
> > > data inside offloaded regions. We'll add some support for member
> > > > data in OpenACC 2.6, but some of the OpenACC C++ semantics are still
> > > unclear.
> > >
> > > Is this OK for trunk? I bootstrapped and regtested it for x86_64
> > > Linux with nvptx offloading.  
> > [...]  
> 
> The testcase associated with this bug appears to be fixed by the
> following patch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01167.html
> 
> So, it's unclear if there's anything left to do here, and this patch
> can probably be withdrawn.

...or actually, maybe we should keep the new testcase in case of future
regressions. This patch contains just that.

OK to apply?

Thanks,

Julian

ChangeLog

2018-xx-yy  Nathan Sidwell  

PR lto/71959
libgomp/
    * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
* testsuite/libgomp.oacc-c++/pr71959.C: New.
commit c69dce8ba0ecd7ff620f4f1b8dacc94c61984107
Author: Julian Brown 
Date:   Wed Dec 19 05:01:58 2018 -0800

Add testcase from PR71959

	libgomp/
	* testsuite/libgomp.oacc-c++/pr71959-a.C: New.
	* testsuite/libgomp.oacc-c++/pr71959.C: New.

diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
new file mode 100644
index 000..ec4b14a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
@@ -0,0 +1,31 @@
+// { dg-do compile }
+
+struct Iter
+{
+  int *cursor;
+
+  void ctor (int *cursor_) asm("_ZN4IterC1EPi");
+  int *point () const asm("_ZNK4Iter5pointEv");
+};
+
+#pragma acc routine
+void  Iter::ctor (int *cursor_)
+{
+  cursor = cursor_;
+}
+
+#pragma acc routine
+int *Iter::point () const
+{
+  return cursor;
+}
+
+void apply (int (*fn)(), Iter out) asm ("_ZN5Apply5applyEPFivE4Iter");
+
+#pragma acc routine
+void apply (int (*fn)(), struct Iter out)
+{ *out.point() = fn (); }
+
+extern "C" void __gxx_personality_v0 ()
+{
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
new file mode 100644
index 000..8508c17
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
@@ -0,0 +1,31 @@
+// { dg-additional-sources "pr71959-a.C" }
+
+// pr lto/71959 ICEd LTO due to mismatch between writing & reading behaviour
+
+struct Iter
+{
+  int *cursor;
+
+  Iter(int *cursor_) : cursor(cursor_) {}
+
+  int *point() const { return cursor; }
+};
+
+#pragma acc routine seq
+int one () { return 1; }
+
+struct Apply
+{
+  static void apply (int (*fn)(), Iter out)
+  { *out.point() = fn (); }
+};
+
+int main ()
+{
+  int x;
+
+#pragma acc parallel copyout(x)
+  Apply::apply (one, Iter (&x));
+
+  return x != 1;
+}


Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-20 Thread Julian Brown
On Tue, 25 Sep 2018 14:59:18 +0200
Martin Jambor  wrote:

> Hi,
> 
> I have noticed a few things...
> 
> On Thu, Sep 20 2018, Cesar Philippidis wrote:
> > This is another old gomp4 patch that demotes an ICE in PR71959 to a
> > linker warning. One problem here is that it is not clear if OpenACC
> > allows individual member functions in C++ classes to be marked as
> > acc routines. There's another issue accessing member data inside
> > offloaded regions. We'll add some support for member data in OpenACC
> > 2.6, but some of the OpenACC C++ semantics are still unclear.
> >
> > Is this OK for trunk? I bootstrapped and regtested it for x86_64
> > Linux with nvptx offloading.
> [...]

The testcase associated with this bug appears to be fixed by the
following patch:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01167.html

So, it's unclear if there's anything left to do here, and this patch
can probably be withdrawn.

Thanks,

Julian


Re: [PATCH 2/3] Factor out duplicate code in gimplify_scan_omp_clauses

2018-12-18 Thread Julian Brown
On Sat, 10 Nov 2018 09:11:19 -0800
Julian Brown  wrote:

> This patch, created while trying to figure out the open-coded
> linked-list handling in gimplify_scan_omp_clauses, factors out four
> somewhat repetitive portions of that function into two new outlined
> functions. This was done largely mechanically; the actual lines of
> executed code are more-or-less the same.  That means the interface
> to the new functions is somewhat eccentric though, and could no doubt
> be improved.  I've tried to add commentary to the best of my
> understanding, but suggestions for improvements are welcome!
> 
> As a bonus, one apparent bug introduced during an earlier refactoring
> to use the polynomial types has been fixed (I think!): "known_eq (o1,
> 2)" should have been "known_eq (o1, o2)".
> 
> Tested alongside other patches in this series and the async patches.
> OK?

Now the main part of the attach/detach support has been conditionally
accepted pending Thomas's approval (thanks!), is this prerequisite part
OK too?

Thanks,

Julian


Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes

2018-12-13 Thread Julian Brown
On Tue, 4 Dec 2018 16:55:04 +0100
Tom de Vries  wrote:

> On 04-12-18 13:29, Jakub Jelinek wrote:
> > On Fri, Jun 29, 2018 at 11:19:53AM -0700, Cesar Philippidis wrote:  
> >> The attached patch includes the nvptx and GCC ME reductions
> >> enhancements.
> >>
> >> Is this patch OK for trunk? It bootstrapped / regression tested
> >> cleanly for x86_64 with nvptx offloading.  
> > This is all OpenACC specific code not really shareable with OpenMP,
> > if Thomas (for middle-end) and Tom (for NVPTX backend) are ok with
> > it, it is ok for trunk.
> >   
> 
> Formatting needs to be fixed:
> ...
> There should be exactly one space between function name and
> parenthesis. 160:+  unsigned old_shift = DIM_SIZE(VECTOR);
> ...
> 
> Also, the updated patch does not address my comment about
> probabilities here
> ( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00325.html ): ...
> > +  /* Create the loop.  */
> > +  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;  
> 
> Edges need probabilities, as in nvptx_lockless_update,
> nvptx_lockfull_update and nvptx_goacc_reduction_init.
> ...

Something like the attached?

Tested alongside other revised patches in the series:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00931.html

(except the lines adding edge probabilities, which I've
smoke-tested but haven't yet gone through a full test cycle).

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
(nvptx_split_blocks): Call it for cond_uni insn.
(nvptx_expand_cond_uni): New.
(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
(nvptx_init_builtins): Initialize it.
(nvptx_expand_builtin):
(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
tree BITS operand.
(nvptx_vector_reduction): New.
(nvptx_adjust_reduction_type): New.
(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_teardown):
* config/nvptx/nvptx.md (cond_uni): New pattern.

commit 401876d422c4fa7f02c1b899e81568eea6ad7531
Author: Julian Brown 
Date:   Tue Dec 11 13:35:52 2018 -0800

Various OpenACC reduction enhancements - ME and nvptx changes

	gcc/
	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
	(nvptx_split_blocks): Call it for cond_uni insn.
	(nvptx_expand_cond_uni): New.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
	(nvptx_init_builtins): Initialize it.
	(nvptx_expand_builtin):
	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
	tree BITS operand.
	(nvptx_vector_reduction): New.
	(nvptx_adjust_reduction_type): New.
	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
	Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_teardown):
	* config/nvptx/nvptx.md (cond_uni): New pattern.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..0023dad 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
 }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to the
+ block's terminating branch, if we didn't care for self
+ checking.)  */
+  for (;;)
+{
+  probe = next_real_insn (probe);
+  if (!probe)
+	break;
+  pat = PATTERN (probe);
+
+  if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+  gcc_assert (NONJUMP_INSN_P (probe));
+}
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			   UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of

Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-13 Thread Julian Brown
On Fri, 7 Dec 2018 15:05:46 +0100
Jakub Jelinek  wrote:

> On Thu, Dec 06, 2018 at 10:40:41PM +0000, Julian Brown wrote:
> > +   && (TREE_CODE (inner_type) == REAL_TYPE
> > +   || (!omp_is_reference (var)
> > +   && INTEGRAL_TYPE_P (inner_type))
> > +   || TREE_CODE (inner_type) == INTEGER_TYPE)  
> 
> Not sure I understand the above.  INTEGRAL_TYPE_P is INTEGER_TYPE,
> BOOLEAN_TYPE and ENUMERAL_TYPE, so you want to handle INTEGER_TYPE
> no matter whether var should be passed by reference or not, but
> BOOLEAN_TYPE or ENUMERAL_TYPE only if it is not a reference?
> That is just weird.  Any test to back that up?

I couldn't figure out any reason for the test being written like that
-- specifically, what it was meant to exclude -- but the attached
simplifies it to ANY_INTEGRAL_TYPE_P or FLOAT_TYPE_P, and that seems to
work fine.
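
To make the difference concrete, here is a small model of the two conditions (a sketch only: TypeKind and the function names are hypothetical stand-ins, and it covers scalar kinds only, whereas the real ANY_INTEGRAL_TYPE_P and FLOAT_TYPE_P also accept vector and complex types).

```cpp
#include <cassert>

// Hypothetical stand-ins for a few GCC tree codes.
enum class TypeKind { Integer, Boolean, Enumeral, Real, Record };

// Mirrors INTEGRAL_TYPE_P within this reduced model.
inline bool integral_p (TypeKind k)
{
  return k == TypeKind::Integer || k == TypeKind::Boolean
         || k == TypeKind::Enumeral;
}

// The condition as originally posted: boolean and enumeral types are
// accepted only when the variable is not a reference.
inline bool original_test (TypeKind k, bool is_reference)
{
  return k == TypeKind::Real
         || (!is_reference && integral_p (k))
         || k == TypeKind::Integer;
}

// The simplified condition: reference-ness no longer matters.
inline bool simplified_test (TypeKind k, bool /* is_reference */)
{
  return integral_p (k) || k == TypeKind::Real;
}
```

In this model the two tests disagree only for boolean and enumeral types passed by reference, which is the case questioned in the review.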

> > +   if ((TREE_CODE (inner_type) == REAL_TYPE
> > +|| (!omp_is_reference (var)
> > +&& INTEGRAL_TYPE_P (inner_type))
> > +|| TREE_CODE (inner_type) ==
> > INTEGER_TYPE)  
> 
> Ditto here.

Likewise. Re-tested with offloading to NVPTX. OK?

Thanks for review,

Julian

ChangeLog

gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
commit 15114e33ecb6cb687dbdfb30d69d7dcbeeb87fca
Author: Julian Brown 
Date:   Thu Dec 6 04:38:59 2018 -0800

Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b406ce7..adc686c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3497,6 +3497,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+if (maybe_lookup_field (decl, up))
+  return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -9052,6 +9065,87 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something that can
+   be represented by a POINTER_SIZED_INT_NODE.  Any new instructions are
+   appended to GS.  This is used to optimize firstprivate variables, so that
+   small types (less precision than POINTER_SIZE) do not require additional
+   data mappings.  */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  if (omp_is_reference (var))
+type = TREE_TYPE (type);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+{
+  if (omp_is_reference (var))
+	{
+	  tmp = create_tmp_var (type);
+	  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+	  var = tmp;
+	}
+
+  return fold_convert (pointer_sized_int_node, var);
+}
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
+
+  new_type = lang_hooks.types.type_for_size (tree_to_uhwi (TYPE_SIZE (type)),
+	 true);
+
+  if (omp_is_reference (var))
+{
+  tmp = create_tmp_var (type);
+  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+  var = tmp;
+}
+
+  tmp = create_tmp_var (new_type);
+  var = fold_build1 (VIEW_CONVERT_EXPR, new_type, var);
+  gimplify_assign (tmp, var, gs);
+
+  return fold_convert (pointer_sized_int_node, tmp);
+}
+
+/* Like convert_to_firstprivate_int, b

Re: [patch] various OpenACC reduction enhancements - test cases

2018-12-13 Thread Julian Brown
On Tue, 4 Dec 2018 13:59:33 +0100
Jakub Jelinek  wrote:

> On Fri, Jun 29, 2018 at 11:23:21AM -0700, Cesar Philippidis wrote:
> > Attached are the updated reductions tests cases. Again, these have
> > been bootstrapped and regression tested cleanly for x86_64 with
> > nvptx offloading. Is it OK for trunk?  
> 
> If Thomas is ok with this, it is ok for trunk.

Here's a new version to go with the FE patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html

Thanks,

Julian

ChangeLog

2018-xx-xx  Cesar Philippidis  
Nathan Sidwell  
Julian Brown  

gcc/testsuite/
* c-c++-common/goacc/orphan-reductions-1.c: New test.
* c-c++-common/goacc/reduction-7.c: New test.
* c-c++-common/goacc/routine-4.c: Update.
* g++.dg/goacc/reductions-1.C: New test.
* gcc.dg/goacc/loop-processing-1.c: Update.
* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

libgomp/
* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
* libgomp.oacc-fortran/reduction-9.f90: New test.
commit 7d445a56d6db96696cec8359e58258d47fa7c9ae
Author: Julian Brown 
Date:   Wed Dec 12 11:11:03 2018 -0800

Various OpenACC reduction enhancements - test cases

2018-xx-xx  Cesar Philippidis  
	    Nathan Sidwell  
	Julian Brown  

	gcc/testsuite/
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/reduction-7.c: New test.
	* c-c++-common/goacc/routine-4.c: Update.
	* g++.dg/goacc/reductions-1.C: New test.
	* gcc.dg/goacc/loop-processing-1.c: Update.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

	libgomp/
	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
	* libgomp.oacc-fortran/reduction-9.f90: New test.

diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 000..b0bd4a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+#include 
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 000..eba1d02
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,111 @@
+/* Exercise invalid reductions on array and struct members.  */
+
+void
+test_parallel ()
+{
+  struct {
+int a;
+float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.b[3] += 1;
+
+#pragma acc parallel reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+s2[2].a += 1;
+
+#pragma acc parallel reduction(+:s2[3].b[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+s2[3].b[4] += 1;
+
+#pragma acc parallel reduction(+:z[5]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+z[5] += 1;
+}
+
+void
+test_combined ()
+{
+  struct {
+int a;
+float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel loop reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.b[3] += 1;
+
+#pragma acc parallel loop reduction(

Re: [patch] various OpenACC reduction enhancements - FE changes

2018-12-13 Thread Julian Brown
_omp_all_clauses (cp_parser
> > *parser, omp_clause_mask mask, c_name = "private";
> >   break;
> > case PRAGMA_OMP_CLAUSE_REDUCTION:
> > - clauses = cp_parser_omp_clause_reduction (parser,
> > clauses);
> > + clauses = cp_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_OMP); c_name = "reduction";
> >   break;
> > case PRAGMA_OMP_CLAUSE_SCHEDULE:  
> 
> > Again, needs adjustment for IN_REDUCTION/TASK_REDUCTION.

Done.

> > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> > index c779137da45..177acdd9cc4 100644
> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> > @@ -5875,6 +5875,14 @@ finish_omp_clauses (tree clauses, enum
> > c_omp_region_type ort) field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD)
> > == C_ORT_OMP); goto check_dup_generic;
> > case OMP_CLAUSE_REDUCTION:
> > + if (ort == C_ORT_ACC && oacc_get_fn_attrib
> > (current_function_decl)
> > + && omp_find_clause (clauses, OMP_CLAUSE_GANG))
> > +   {
> > + error_at (OMP_CLAUSE_LOCATION (c),
> > +   "gang reduction on an orphan loop");
> > + remove = true;
> > + break;
> > +   }
> >   field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
> >   t = OMP_CLAUSE_DECL (c);
> >   if (TREE_CODE (t) == TREE_LIST)  
> 
> In C++ finish_omp_clauses there are 2 loops, so you can easily just
> remember if OMP_CLAUSE_GANG has been seen in the first loop and
> diagnose this in the second loop only.

Done.
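
The shape of that change, as a simplified standalone sketch (this is not the finish_omp_clauses code; ClauseKind, Result and finish_clauses are hypothetical names): the first pass only records whether a gang clause is present, and the second pass diagnoses and drops reductions when it is and the loop is in a routine.

```cpp
#include <cassert>
#include <vector>

// Hypothetical clause kinds for the sketch.
enum class ClauseKind { Gang, Worker, Vector, Reduction, Private };

struct Result
{
  std::vector<ClauseKind> kept;  // Clauses that survive checking.
  int errors;                    // Number of diagnostics issued.
};

inline Result finish_clauses (const std::vector<ClauseKind> &clauses,
                              bool in_acc_routine)
{
  // First pass: remember whether a gang clause appears anywhere, so that
  // clause order does not matter.
  bool gang_seen = false;
  for (ClauseKind c : clauses)
    if (c == ClauseKind::Gang)
      gang_seen = true;

  // Second pass: reject reductions on an orphan gang loop.
  Result result { {}, 0 };
  for (ClauseKind c : clauses)
    {
      if (c == ClauseKind::Reduction && in_acc_routine && gang_seen)
        {
          ++result.errors;  // "gang reduction on an orphan loop"
          continue;         // Equivalent of setting remove = true.
        }
      result.kept.push_back (c);
    }
  return result;
}
```

Doing the check in the second pass means a reduction clause written before the gang clause is still diagnosed, without a separate search of the clause list for each reduction.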

Re-tested with offloading to nvptx, and with updates to the new
testcases (to be posted). OK?

Thanks,

Julian

ChangeLog

2018-xx-xx  Cesar Philippidis  
    Nathan Sidwell  
Julian Brown  

gcc/c/
* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(c_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type.  Update call to c_parser_omp_variable_list.
(c_parser_oacc_all_clauses): Update calls to
c_parser_omp_clause_reduction.
(c_parser_omp_all_clauses): Likewise.
* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
gang reductions.

gcc/cp/
* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type.  Update call to cp_parser_omp_var_list_no_open.
(cp_parser_oacc_all_clauses): Update call to
cp_parser_omp_clause_reduction.
(cp_parser_omp_all_clauses): Likewise.
* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
gang reductions.

gcc/fortran/
* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
gang reductions.
* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.
commit 0fcaa69b46d2661c3b133c42e0ce73693088b04e
Author: Julian Brown 
Date:   Wed Dec 12 11:09:29 2018 -0800

Various OpenACC reduction enhancements - FE changes

2018-xx-xx  Cesar Philippidis  
	Nathan Sidwell  
	Julian Brown  

	gcc/c/
	* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(c_parser_omp_clause_reduction): Change is_omp boolean parameter to
	c_omp_region_type.  Update call to c_parser_omp_variable_list.
	(c_parser_oacc_all_clauses): Update calls to
	c_parser_omp_clause_reduction.
	(c_parser_omp_all_clauses): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/cp/
	* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
	c_omp_region_type.  Update call to cp_parser_omp_var_list_no_open.
	(cp_parser_oacc_all_clauses): Update call to
	cp_parser_omp_clause_reduction.
	(cp_parser_omp_all_clauses): Likewise.
	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
	gang reductions.
	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b875c4f..59a461b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11869,7 +11869,8 @@ c_parser_oacc_wa

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-12-11 Thread Julian Brown
On Fri, 17 Aug 2018 18:39:00 +0200
Bernhard Reutner-Fischer  wrote:

> On 16 August 2018 17:46:43 CEST, Julian Brown
>  wrote:
> >On Wed, 15 Aug 2018 21:56:54 +0200
> >Bernhard Reutner-Fischer  wrote:
> >  
> >> On 15 August 2018 18:46:37 CEST, Julian Brown
> >>  wrote:  
> >> >On Mon, 13 Aug 2018 12:06:21 -0700
> >> >Cesar Philippidis  wrote:
> >> 
> >> atttribute has more t than strictly necessary. 
> >> Don't like signed integer levels where they should be some
> >> unsigned. Also don't like single switch cases instead of if.
> >> And omitting function comments even if the hook way above is
> >> documented may be ok ish but is a bit lazy ;)  
> >
> >Here's a new version with those comments addressed. I also changed
> >the logic around a little to avoid adding decls to the vec in
> >omp_context which would never be given the gang-private attribute.
> >
> >Re-tested with offloading to NVPTX.
> >
> >OK?  
> 
> (TREE_CODE (var) == VAR_DECL
> Is nowadays known as VAR_P (decl), FWIW.

Fixed. (And also Tom's formatting nit mentioned in another email.)

> ISTM that global variables are not JIT-friendly.
> No further comments from me.

Probably true, but AFAIK nobody's trying to use the (GCC) JIT with the
PTX backend, and the backend already uses global variables for several
other purposes. Of course PTX code is JIT'ted itself by the NVidia
runtime, but I guess that's not what you were referring to!

Is this version OK? Re-tested with offloading to NVPTX.

Thanks,

Julian
commit 3335ddfa72944be5359280116e8eb4febd4ed3c7
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..02c2847 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 

Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-06 Thread Julian Brown
On Tue, 4 Dec 2018 15:27:12 +0100
Jakub Jelinek  wrote:

> On Thu, Sep 20, 2018 at 07:38:04PM -0400, Julian Brown wrote:
> > 2018-09-20  Cesar Philippidis  
> >     Julian Brown  
> > 
> > gcc/
> > * omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
> > (convert_to_firstprivate_int): New function.
> > (convert_from_firstprivate_int): New function.
> > (lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in
> > OpenACC.
> > 
> > libgomp/
> > * oacc-parallel.c (GOACC_parallel_keyed): Handle
> > GOMP_MAP_FIRSTPRIVATE_INT host addresses.
> > * plugin/plugin-nvptx.c (nvptx_exec): Handle
> > GOMP_MAP_FIRSTPRIVATE_INT host addresses.
> > * testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
> > * testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c:
> > New test.
> > * testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New
> > test.  
> 
> > @@ -8039,7 +8182,7 @@ lower_omp_target (gimple_stmt_iterator
> > *gsi_p, omp_context *ctx) if (omp_is_reference (ovar))
> >   type = TREE_TYPE (type);
> > if ((INTEGRAL_TYPE_P (type)
> > -&& TYPE_PRECISION (type) <= POINTER_SIZE)
> > +&& tree_to_uhwi (TYPE_SIZE (type)) <=
> > POINTER_SIZE) || TREE_CODE (type) == POINTER_TYPE)
> >   {
> > tkind = GOMP_MAP_FIRSTPRIVATE_INT;
> > @@ -8194,7 +8337,7 @@ lower_omp_target (gimple_stmt_iterator
> > *gsi_p, omp_context *ctx) if (omp_is_reference (var))
> >   type = TREE_TYPE (type);
> > if ((INTEGRAL_TYPE_P (type)
> > -&& TYPE_PRECISION (type) <= POINTER_SIZE)
> > +&& tree_to_uhwi (TYPE_SIZE (type)) <=
> > POINTER_SIZE) || TREE_CODE (type) == POINTER_TYPE)
> >   {
> > x = build_receiver_ref (var, false, ctx);  
> 
> Why this?

My *guess* is that it was an attempt to handle cases where the type
precision is less than the type size, and maybe it was feared that
type-punning to an int would then copy the wrong bits. Those changes
appear to not have been necessary though, at least with respect to
testsuite coverage. I also fixed the Fortran test to use "STOP n"
instead of "call abort".
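
To make the precision-versus-size point concrete, here is a minimal C
sketch (not GCC or libgomp code) of the idea behind
GOMP_MAP_FIRSTPRIVATE_INT: a small integer is converted by value to a
pointer-sized integer, carried in the slot that would otherwise hold a
host address, and converted back on the receiving side. The helper names
are hypothetical. The sketch assumes the usual two's-complement behaviour
for the uintptr_t to intptr_t conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; these names do not exist in GCC or libgomp.  */

/* Convert a small integer by value to a pointer-sized integer, so that it
   can travel in the slot that would otherwise hold a host address.  */
static uintptr_t
pack_firstprivate_int (short value)
{
  /* The scheme only makes sense for types no wider than a pointer.  */
  assert (sizeof (short) <= sizeof (uintptr_t));
  return (uintptr_t) (intptr_t) value;
}

/* Recover the original value on the receiving side.  */
static short
unpack_firstprivate_int (uintptr_t slot)
{
  return (short) (intptr_t) slot;
}
```

Because this is a value conversion rather than type-punning through
memory, it does not matter which bits of the original storage are
padding, which may be why the TYPE_SIZE change turned out to be
unnecessary.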

I re-tested the attached with offloading to nvptx. OK?

Thanks,

Julian
commit 5c5d0e7ca29413ba8ec0c38b616a7c59f36f56cd
Author: Julian Brown 
Date:   Mon Sep 17 19:38:21 2018 -0700

Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b406ce7..4718a65 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3497,6 +3497,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+if (maybe_lookup_field (decl, up))
+  return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -9052,6 +9065,88 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something
+   that can be represented by a POINTER_SIZED_INT_NODE.  Any new
+   instructions are appended to GS.  This is primarily used to
+   optimize firstprivate variables, so that small types (less
+   precision than POINTER_SIZE) do not require additional data
+   mappings. */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  if (omp_is_reference (var))
+type = TREE_TYPE (type);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+{
+  if (omp_is_reference (var))
+	{
+	  tmp = create_tmp_var (type);
+	  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+	  var = tmp;
+	}
+
+  return fold_convert (pointer_sized_int_node, var);
+}
+
+  gcc_a

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-06 Thread Julian Brown
On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:

> On Thu, 6 Dec 2018 21:42:14 +0100
> Thomas Schwinge  wrote:
> 
> > [...]
> > ..., where the "Invalid read of size 8" happens, and which
> > eventually would try to "free (tgt)" again, via
> > libgomp/target.c:gomp_unmap_tgt:
> > 
> > attribute_hidden void
> > gomp_unmap_tgt (struct target_mem_desc *tgt)
> > {
> >   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> > region.  */ if (tgt->tgt_end)
> > gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> > 
> >   free (tgt->array);
> >   free (tgt);
> > }
> > 
> > Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> > or something else?  
> 
> It might be worth trying this with the refcounting changes in the
> attach/detach patch.

...oh, also make sure you have this patch in the series you're testing
with:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html

else your "wait" will be ignored, IIUC.

Julian


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-06 Thread Julian Brown
On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:

> [...]
> ..., where the "Invalid read of size 8" happens, and which eventually
> would try to "free (tgt)" again, via libgomp/target.c:gomp_unmap_tgt:
> 
> attribute_hidden void
> gomp_unmap_tgt (struct target_mem_desc *tgt)
> {
>   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> region.  */ if (tgt->tgt_end)
> gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> 
>   free (tgt->array);
>   free (tgt);
> }
> 
> Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> or something else?

It might be worth trying this with the refcounting changes in the
attach/detach patch.
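
For illustration only, the way reference counting avoids a second
"free (tgt)" can be sketched as below. This is not libgomp code: the
struct and function names are hypothetical, and the sketch only assumes
that the descriptor has several owners, each of which drops its
reference exactly once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared mapping descriptor.  */
struct desc
{
  unsigned refcount;
  int *array;
};

/* Number of descriptors actually released, kept for demonstration.  */
static int release_count;

/* Allocate a descriptor holding a single reference.  */
static struct desc *
desc_new (void)
{
  struct desc *d = malloc (sizeof *d);
  if (!d)
    return NULL;
  d->refcount = 1;
  d->array = NULL;
  return d;
}

/* Take an additional reference.  */
static void
desc_ref (struct desc *d)
{
  d->refcount++;
}

/* Drop one reference.  The memory is freed only when the last reference
   goes away.  Returns 1 if the descriptor was freed, 0 otherwise.  */
static int
desc_unref (struct desc *d)
{
  assert (d->refcount > 0);
  if (--d->refcount > 0)
    return 0;
  free (d->array);
  free (d);
  release_count++;
  return 1;
}
```

With two owners, the first desc_unref call only decrements the count and
the second performs the single free, so neither path can free the
descriptor behind the other's back.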

Julian


Re: [patch,openacc] Propagate independent clause for OpenACC kernels pass

2018-12-05 Thread Julian Brown
On Tue, 4 Dec 2018 14:55:03 +0100
Richard Biener  wrote:

> On Tue, 4 Dec 2018, Jakub Jelinek wrote:
> 
> > On Mon, Dec 03, 2018 at 11:40:39PM +0000, Julian Brown wrote:  
> > > Jakub asked in the following email at the time of the patch
> > > submission for the gomp4 branch what the difference was between
> > > the new marked_independent flag and safelen == INT_MAX:
> > > 
> > >   https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01100.html
> > > 
> > > If I understand the followup correctly,
> > > 
> > >   https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01117.html
> > > 
> > > a setting of safelen > 1 means that up to that number of loop
> > > iterations can run together in lockstep (as if each insn in the
> > > loop was blindly rewritten to a safelen-width SIMD equivalent) --
> > > but anything that happens in iteration N + 1 cannot happen before
> > > something that happens in iteration N. Chung-Lin pointed out that
> > > OpenACC's semantics are even less strict (allowing iterations to
> > > proceed fully independently in an arbitrary order), so the
> > > marked_independent flag does carry non-redundant information --
> > > even with safelen set to INT_MAX.  
> > 
> > OpenMP 5 (not implemented in GCC 9 though) has order(concurrent)
> > clause for this (no cross-iteration dependencies at all, iterations
> > can be run in any order, in parallel etc.).
> > 
> > I believe it matches the can_be_parallel flag we now have, but I
> > remember there were some issues with that flag for use in DO
> > CONCURRENT.
> > 
> > Or do we want to have some other flag for really independent
> > iterations? What passes could use that?  Would the vectorizer
> > appreciate the stronger assertion in some cases?  
> 
> The vectorizer doesn't really care.  It would be autopar that should.
> The issue with using can_be_parallel for DO CONCURRENT was that the
> middle-end introduces non-trivial sharing between iterations,
> introducing dependences that then make the loop no longer
> can_be_parallel.  I believe similar things could happen with
> ->safelen (consider loop reversal and existing forward dependences).
> I guess we're simply lucky in that area ;)

I wondered if I should try modifying the patch to set the
can_be_parallel flag for kernels loops with an "independent" clause
instead (and try to address Jakub's other comments). Do I understand
right that the issue with the can_be_parallel flag is that it does not
necessarily guarantee safety of optimisations for loops which are
supposed to have fully-independent iterations, rather than that it has
different semantics from the proposed marked_independent flag?

However, it turns out that this patch has a dependency on this one:

  https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01179.html

and, according to Cesar, that in turn has a dependency on another patch:

  https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01189.html

so, it might take me a little time to untangle all that. Does the rough
idea sound plausible, though? Or is modifying this patch to use
can_be_parallel likely to just cause problems at present?

Thanks,

Julian


Re: [PATCH, OpenACC] (1/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-12-05 Thread Julian Brown
On Tue, 4 Dec 2018 15:02:15 +0100
Jakub Jelinek  wrote:

> On Tue, Aug 28, 2018 at 03:19:19PM -0400, Julian Brown wrote:
> > 2018-08-28  Julian Brown  
> > Cesar Philippidis  
> > 
> > PR middle-end/70828
> > 
> > gcc/
> > * gimplify.c (gimplify_omp_ctx): Add decl_data_clause hash
> > map. (new_omp_context): Initialise above.
> > (delete_omp_context): Delete above.
> > (gimplify_scan_omp_clauses): Scan for array mappings on
> > data constructs, and record in above map.
> > (gomp_needs_data_present): New function.
> > (gimplify_adjust_omp_clauses_1): Handle data mappings (e.g.
> > array slices) declared in lexically-enclosing data constructs.
> > * omp-low.c (lower_omp_target): Allow decl for bias not to
> > be present in omp context.
> > 
> > gcc/testsuite/
> > * c-c++-common/goacc/acc-data-chain.c: New test.
> > * gfortran.dg/goacc/pr70828.f90: New test.
> > * gfortran.dg/goacc/pr70828-2.f90: New test.
> > 
> > libgomp/
> > * testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
> > * testsuite/libgomp.oacc-fortran/implicit_copy.f90: New
> > test.
> > * testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
> > * testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
> > * testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
> > * testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.  
> 
> > --- a/gcc/gimplify.c
> > +++ b/gcc/gimplify.c
> > @@ -191,6 +191,7 @@ struct gimplify_omp_ctx
> >bool target_map_scalars_firstprivate;
> >bool target_map_pointers_as_0len_arrays;
> >bool target_firstprivatize_array_bases;
> > +  hash_map > *decl_data_clause;
> >  };
> >  
> >  static struct gimplify_ctx *gimplify_ctxp;
> > @@ -413,6 +414,7 @@ new_omp_context (enum omp_region_type
> > region_type) c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
> >else
> >  c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
> > +  c->decl_data_clause = new hash_map
> > >;  
> 
> Not really happy about creating this unconditionally.  Can you leave
> it NULL by default and only initialize for contexts where it will be
> needed?
> 
> > @@ -7793,8 +7796,21 @@ gimplify_scan_omp_clauses (tree *list_p,
> > gimple_seq *pre_p, case OMP_TARGET:
> >   break;
> > case OACC_DATA:
> > - if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
> > -   break;
> > + {
> > +   tree nextc = OMP_CLAUSE_CHAIN (c);
> > +   if (nextc
> > +   && OMP_CLAUSE_CODE (nextc) == OMP_CLAUSE_MAP
> > +   && (OMP_CLAUSE_MAP_KIND (nextc)
> > + == GOMP_MAP_FIRSTPRIVATE_POINTER
> > +   || OMP_CLAUSE_MAP_KIND (nextc) ==
> > GOMP_MAP_POINTER))
> > + {
> > +   tree base_addr = OMP_CLAUSE_DECL (nextc);
> > +   ctx->decl_data_clause->put (base_addr,
> > + std::make_pair (unshare_expr (c),
> > unshare_expr (nextc)));  
> 
> Don't like the wrapping here, can you just split it up:
>   std::pair p
> = std::make_pair (unshare_expr (c),
>   unshare_expr (nextc));
>   ctx->decl_data_clause->put (base_addr, p);
> or similar?
> 
> > +
> > +static std::pair *
> > +gomp_needs_data_present (tree decl)  
> 
> Would be helpful to have acc/oacc in the function name.
> > +{
> > +  gimplify_omp_ctx *ctx = NULL;
> > +
> > +  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
> > +  && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
> > +  && (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
> > + || TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) !=
> > ARRAY_TYPE))
> > +return NULL;
> > +
> > +  if (gimplify_omp_ctxp->region_type != ORT_ACC_PARALLEL
> > +  && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
> > +return NULL;  
> 
> And move this test to the top.
> 
> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c
> > @@ -8411,9 +8411,10 @@ lower_omp_target (gimple_stmt_iterator
> > *gsi_p, omp_context *ctx) x = fold_convert_loc (clause_loc, type,
> > x); if (!integer_zerop (OMP_CLAUSE_SIZE (c)))
> >   {
> > -   tree bias = OMP_CLAUSE_SIZE (c);
> > -   if (DECL_P (bias))
> > - bias = lookup_decl (bias, ctx);
> > +   

Re: [PATCH, OpenACC] Support Fortran derived type members in "acc update" directives

2018-12-04 Thread Julian Brown
On Tue, 4 Dec 2018 20:12:58 +0100
Jakub Jelinek  wrote:

> On Tue, Dec 04, 2018 at 07:06:43PM +0000, Julian Brown wrote:
> > Thanks for the review! As it happened though, I had to rewrite a
> > lot of the code in this patch for the attach/detach patch, and I
> > had meant to withdraw this one. Many apologies about the wasted
> > time! I mentioned the superseding in the first submission of the
> > attach/detach patch:
> > 
> >   https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00826.html  
> 
> I haven't looked at the dynamic array series because I haven't heard
> back on https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00946.html

Those patches are independent of the attach/detach ones (though the
latter do depend on Chung-Lin's async support patches).

Thanks,

Julian


Re: [PATCH, OpenACC] Support Fortran derived type members in "acc update" directives

2018-12-04 Thread Julian Brown
Hi Jakub,

On Tue, 4 Dec 2018 15:17:08 +0100
Jakub Jelinek  wrote:

> On Mon, Sep 03, 2018 at 08:46:54PM -0400, Julian Brown wrote:
> > 2018-09-03  Cesar Philippidis  
> > 
> > gcc/fortran/
> > * openmp.c (gfc_match_omp_variable_list): New allow_derived
> > argument. (gfc_match_omp_map_clause): Update call to
> > gfc_match_omp_variable_list. (gfc_match_omp_clauses): Update
> > calls to gfc_match_omp_map_clause. (gfc_match_oacc_update):
> > Update call to gfc_match_omp_clauses. (resolve_omp_clauses):
> > Permit derived type variables in ACC UPDATE clauses.
> > * trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived
> > type members.
> > 
> > gcc/
> > * gimplify.c (gimplify_scan_omp_clauses): Update handling
> > of ACC UPDATE variables.
> > 
> > gcc/testsuite/
> > * gfortran.dg/goacc/derived-types.f90: New test.
> > 
> > libgomp/
> > * testsuite/libgomp.oacc-fortran/update-2.f90: New test.
> > * testsuite/libgomp.oacc-fortran/derived-type-1.f90: New
> > test.  
> 
> Note, already OpenMP 4.5 allows the %s in map/to/from clauses, I just
> didn't get to that yet.
> And OpenMP 5.0 allows arbitrary expressions there.
> 
> > @@ -4336,9 +4342,12 @@ resolve_omp_clauses (gfc_code *code,
> > gfc_omp_clauses *omp_clauses, || n->expr->ref == NULL
> > || n->expr->ref->next
> > || n->expr->ref->type != REF_ARRAY)
> > - gfc_error ("%qs in %s clause at %L is not a
> > proper "
> > -"array section", n->sym->name,
> > name,
> > -&n->where);
> > + {
> > +   if (n->sym->ts.type != BT_DERIVED)
> > + gfc_error ("%qs in %s clause at %L is
> > not a proper "
> > +"array section",
> > n->sym->name, name,
> > +&n->where);
> > + }
> > else if (n->expr->ref->u.ar.codimen)
> >   gfc_error ("Coarrays not supported in %s
> > clause at %L", name, &n->where);  
> 
> I'm worried about this change a little bit.  It isn't guarded for
> OpenACC only and I wonder if you actually resolve properly the
> derived expressions (look inside of those).
> 
> > diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> > index f038f4c..95b15e5 100644
> > --- a/gcc/fortran/trans-openmp.c
> > +++ b/gcc/fortran/trans-openmp.c
> > @@ -2108,7 +2108,68 @@ gfc_trans_omp_clauses (stmtblock_t *block,
> > gfc_omp_clauses *clauses, tree decl = gfc_get_symbol_decl (n->sym);
> >   if (DECL_P (decl))
> > TREE_ADDRESSABLE (decl) = 1;
> > - if (n->expr == NULL || n->expr->ref->u.ar.type ==
> > AR_FULL)
> > + /* Handle derived-typed members for OpenACC Update.
> > */
> > + if (n->sym->ts.type == BT_DERIVED
> > + && n->expr != NULL && n->expr->ref != NULL
> > + && (n->expr->ref->next == NULL
> > + || (n->expr->ref->next != NULL
> > + && n->expr->ref->next->type == REF_ARRAY
> > + && n->expr->ref->next->u.ar.type ==
> > AR_FULL))
> > + && (n->expr->ref->type == REF_ARRAY
> > + && n->expr->ref->u.ar.type != AR_SECTION))  
> 
> Like here you have all kinds of conditions, but has resolving made
> sure all the needed diagnostics are emitted?
> Perhaps at least for now this also should be guarded on OpenACC only,
> once OpenMP allows %s in map/to/from, part of this will be usable for
> it, but e.g.
> 
> > + if (context != type)
> > +   {
> > + tree f2 = c->norestrict_decl;
> > + if (!f2 || DECL_FIELD_CONTEXT (f2) != type)
> > +   for (f2 = TYPE_FIELDS (TREE_TYPE (decl));
> > f2;
> > +f2 = DECL_CHAIN (f2))
> > + if (TREE_CODE (f2) == FIELD_DECL
> > + && DECL_NAME (f2) == DECL_NAME
> > (field))
> > +   break;
> > + gcc_assert (f2);
> > + c->norestrict_decl = f2;
> > +  

Re: [patch,opencc] Don't mark OpenACC auto loops as independent inside acc parallel regions

2018-12-03 Thread Julian Brown
On Thu, 20 Sep 2018 09:49:43 -0700
Cesar Philippidis  wrote:

> OpenACC has a concept of loop independence, in which independent loops
> may be executed in parallel across gangs, workers and vectors. Inside
> acc parallel regions, if a loop isn't explicitly marked seq or auto,
> it is predetermined to be independent.
> 
> This patch corrects a bug where acc loops marked as auto were being
> mistakenly promoted to independent. That's bad because it can generate
> bogus results if a dependency exists.
> 
> Note that this patch depends on the following patches for
> -fnote-info-omp-optimized which is used in a test case.
> 
>   * Add user-friendly OpenACC diagnostics regarding detected
> parallelism.
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01652.html
> 
>   * Correct the reported line number in fortran combined OpenACC
> directives
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01554.html
> 
>   * Correct the reported line number in c++ combined OpenACC
> directives https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01552.html
> 
> Is this OK for trunk? I bootstrapped and regtested on x86_64 Linux
> with nvptx offloading.

LGTM, FWIW.
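
As an illustration of why promoting "auto" to "independent" is unsafe,
consider the sketch below. It assumes a compiler with OpenACC support
(the pragmas are simply ignored otherwise), and the function name is made
up for this example. Each iteration reads the value written by the
previous one, so with "auto" the compiler has to analyse the loop and is
expected to fall back to sequential execution; treating the loop as
"independent" would let iterations run in parallel and could give wrong
results.

```c
#include <assert.h>

/* In-place running sum.  Iteration i reads what iteration i - 1 wrote,
   so the iterations are not independent.  */
static void
prefix_sum (int *a, int n)
{
  assert (n >= 0);
#pragma acc parallel copy (a[0:n])
  {
#pragma acc loop auto
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

Run sequentially, the array {1, 2, 3, 4, 5} becomes {1, 3, 6, 10, 15}.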

Thanks,

Julian


Re: [patch,openacc] Set safelen to INT_MAX for oacc independent pragma

2018-12-03 Thread Julian Brown
On Thu, 20 Sep 2018 11:21:28 -0700
Cesar Philippidis  wrote:

> This is another old gomp4 OpenACC patch which impacts targets that use
> simd vectorization, such as the host and AMD GCN, rather than nvptx.
> Basically, as the subject states, it sets safelen to INT_MAX for
> independent acc loops, which I believe is already being done for
> OpenMP in certain situations.
> 
> The original discussion for this patch can be found here
> .
> 
> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
> Linux with nvptx offloading.

I believe this is conservatively safe, although I understand that a
safelen of INT_MAX does not correspond strictly to the way a GPU will
execute greater-than-warp-size numbers of independent loop iterations.
This isn't a problem for NVPTX (which IIUC does not use the information
carried by the safelen setting at present) or the host, but may need
attention for e.g. AMD GCN or other GPUs that use a similar execution
scheme.

This may need merging with the non-marked_independent parts of:

  https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00140.html

Julian


Re: [patch,openacc] Propagate independent clause for OpenACC kernels pass

2018-12-03 Thread Julian Brown
On Thu, 20 Sep 2018 11:06:40 -0700
Cesar Philippidis  wrote:

> This is another old patch that teaches the omp expansion pass how to
> propagate the acc loop independent clause to the later stages
> throughout compilation. Unfortunately, it didn't include any test
> cases. I'm not sure how effective this will be with the existing
> kernel parloops pass. But as I noted in my Cauldron talk, we would
> like to convert acc kernels regions to acc parallel regions, and this
> patch could help in that regard.
> 
> Chung-Lin, do you have any more state on this patch?
> 
> Anyway, I bootstrapped and regtested it for x86_64 Linux with nvptx
> offloading and it didn't introduce any regressions. We do have a
> couple of other standalone kernels patches in og8, but those depend
> on other patches.

It's not surprising that there are no new tests and no regressions: the
new "marked_independent" field is not used anywhere, either within this
patch, or on the gomp4 branch where it originated, nor currently on the
og8 branch! It looks like the planned use by the parloops pass (etc.)
has not materialised so far.

Jakub asked in the following email at the time of the patch submission
for the gomp4 branch what the difference was between the new
marked_independent flag and safelen == INT_MAX:

  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01100.html

If I understand the followup correctly,

  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01117.html

a setting of safelen > 1 means that up to that number of loop
iterations can run together in lockstep (as if each insn in the loop
was blindly rewritten to a safelen-width SIMD equivalent) -- but
anything that happens in iteration N + 1 cannot happen before something
that happens in iteration N. Chung-Lin pointed out that OpenACC's
semantics are even less strict (allowing iterations to proceed fully
independently in an arbitrary order), so the marked_independent flag
does carry non-redundant information -- even with safelen set to
INT_MAX.
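
The difference can be shown with a small, self-contained C model. This
is only a sketch of my reading of the two semantics, not of anything GCC
does: the loop "a[i] = a[i + 1]" gives the sequential result when run in
lockstep chunks (all loads of a chunk, then all stores), but not when the
iterations run in some other order, which fully independent iterations
would permit. All function names here are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define N 8

/* Sequential reference: iterations run in order 0, 1, 2, ...  */
static void
run_sequential (int *a)
{
  for (int i = 0; i < N - 1; i++)
    a[i] = a[i + 1];
}

/* Lockstep model: WIDTH iterations at a time, each statement executed for
   all lanes before the next one starts (all loads, then all stores).  */
static void
run_lockstep (int *a, int width)
{
  int tmp[N];
  assert (width > 0 && width <= N);
  for (int base = 0; base < N - 1; base += width)
    {
      int end = base + width < N - 1 ? base + width : N - 1;
      for (int i = base; i < end; i++)
	tmp[i - base] = a[i + 1];
      for (int i = base; i < end; i++)
	a[i] = tmp[i - base];
    }
}

/* One of the orders allowed if iterations were fully independent.  */
static void
run_reversed (int *a)
{
  for (int i = N - 2; i >= 0; i--)
    a[i] = a[i + 1];
}

/* Return nonzero if the two N-element arrays are equal.  */
static int
same (const int *x, const int *y)
{
  return memcmp (x, y, N * sizeof (int)) == 0;
}
```

Starting from {0, 1, ..., 7}, run_sequential and run_lockstep both give
{1, 2, 3, 4, 5, 6, 7, 7}, whereas run_reversed gives all sevens. So a
loop can be valid for a large safelen without being independent in the
OpenACC sense.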

Actually I think that given the above, setting safelen to a value
greater than 32 (the warp size) may not be safe for NVPTX on OpenACC,
depending on the vagaries of warp scheduling. But that's not the
subject of this patch.

Anyway: given that the information recorded by this patch is not used
at present, and further work on the kernels pass may head in a
different direction, I'm not sure that it makes sense to commit it
at this point.

Also, it occurs to me that if the independent flag is set on loops
within kernels regions with an explicit "independent" clause, it should
also be set by default on loops in parallel regions without clauses
that disable the independent-iteration semantics.

Julian


Re: [patch,openacc] C, C++ OpenACC wait diagnostic change

2018-12-03 Thread Julian Brown
On Fri, 30 Nov 2018 16:25:42 +0100
Thomas Schwinge  wrote:

> In addition to your "(1" token sequence (and similar ones), I suppose
> what these code paths in C and C++ are supposed to catch is the
> "wait ()" case (see line 149 of
> gcc/testsuite/c-c++-common/goacc/asyncwait-1.c).
> 
> I suppose in C, we do diagnose an "error: expected expression before
> ')' token" in "c_parser_expr_list"/"c_parser_expr_no_commas", and
> then return a list with an "error_mark_node", right?  (I have not
> verified that.)
> 
> > So, we can elide the
> > diagnostic with no change to compiler behaviour.  
> 
> In that case, yes.

[...]

> Right, one single error diagnostic is enough.
> 
> But please make sure that the "wait ()" case continues to be diagnosed
> correctly -- similarly to C, I suggest "expected expression before ')'
> token" (or whatever is natural to the C++ parser), and then
> accordingly tidy up that "dg-error" regular expression on line 149 of
> gcc/testsuite/c-c++-common/goacc/asyncwait-1.c.
> 
> In C++, this is the case that: "args != NULL && args->length () ==
> 0", I suppose?  (I have not verified that.)
> 
> Oh, and next to "wait ()" please also add test coverage for "wait (".

I've made those changes in the attached, thank you. OK?

Julian

ChangeLog

2018-XX-YY  James Norris  
Cesar Philippidis  
Julian Brown  

gcc/c/
* c-parser.c (c_parser_oacc_wait_list): Remove dead diagnostic
code.

gcc/cp/
* parser.c (cp_parser_oacc_wait_list): Fix error message and avoid
duplicate diagnostic.

gcc/testsuite/
    * c-c++-common/goacc/asyncwait-1.c: Update expected errors and add a
test for "wait (".

Reviewed-by: Thomas Schwinge  
Reviewed-by: Joseph Myers  
commit e3f9a5935e9ec3062017602a580139a0bccf1f4c
Author: Julian Brown 
Date:   Fri Sep 28 05:52:55 2018 -0700

OpenACC wait list diagnostic change

2018-XX-YY  James Norris  
	Cesar Philippidis  
	Julian Brown  

	gcc/c/
	* c-parser.c (c_parser_oacc_wait_list): Remove dead diagnostic
	code.

	gcc/cp/
	* parser.c (cp_parser_oacc_wait_list): Fix error message and avoid
	duplicate diagnostic.

	gcc/testsuite/
	* c-c++-common/goacc/asyncwait-1.c: Update expected errors and add a
	test for "wait (".

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index afc4071..0d7fcc0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11801,14 +11801,6 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 return list;
 
   args = c_parser_expr_list (parser, false, true, NULL, NULL, NULL, NULL);
-
-  if (args->length () == 0)
-{
-  c_parser_error (parser, "expected integer expression before ')'");
-  release_tree_vector (args);
-  return list;
-}
-
   args_tree = build_tree_list_vec (args);
 
   for (t = args_tree; t; t = TREE_CHAIN (t))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ab6d237..ac19cb4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32605,9 +32605,11 @@ cp_parser_oacc_wait_list (cp_parser *parser, location_t clause_loc, tree list)
 
   if (args == NULL || args->length () == 0)
 {
-  cp_parser_error (parser, "expected integer expression before ')'");
   if (args != NULL)
-	release_tree_vector (args);
+	{
+	  cp_parser_error (parser, "expected integer expression list");
+	  release_tree_vector (args);
+	}
   return list;
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index e1840af..2f5d476 100644
--- a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -116,7 +116,6 @@ f (int N, float *a, float *b)
 }
 
 #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-/* { dg-error "expected integer expression before '\\\)'" "" { target c++ } .-1 } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
@@ -152,6 +151,12 @@ f (int N, float *a, float *b)
 b[ii] = a[ii];
 }
 
+#pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait ( /* { dg-error "expected (primary-|)expression before" } */
+{
+for (ii = 0; ii < N; ii++)
+b[ii] = a[ii];
+}
+
 #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait
 {
 for (ii = 0; ii < N; ii++)
@@ -171,7 +176,6 @@ f (int N, float *a, float *b)
 #pragma acc wait (1,2,,) /* { dg-error "expected (primary-|)expression before" } */
 
 #pragma acc wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-/* { dg-error "expected integer expression before '\\\)'" "" { target c++ } .-1 } */
 
 #pragma acc wait (1,*) /* { dg-error "expected (primary-|)expression before" } */
 


Re: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach)

2018-12-03 Thread Julian Brown
On Fri, 30 Nov 2018 03:41:09 -0800
Julian Brown  wrote:

> This is a new version of the patch incorporating
> several improvements/bugfixes made on the og8 branch:

I realised I forgot (again!) to incorporate the changes suggested by
Bernhard in:

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00861.html

I've folded those into my copy of the local patch now, but I'll hold
off resubmitting until the rest of the patch is reviewed.

Thanks,

Julian


Re: [committed] Clean up Fortran OpenACC wait clause handling

2018-12-03 Thread Julian Brown
On Fri, 30 Nov 2018 21:48:20 +0100
Thomas Schwinge  wrote:

> Hi!
> 
> commit 3e3de40a5ab21d72f08071a7a40120dd05608cc1
> Author: tschwinge 
> Date:   Fri Nov 30 20:39:18 2018 +
> 
> Clean up Fortran OpenACC wait clause handling
> 
> "wait" can be deduced from "wait_list".
> 
> gcc/fortran/
> * gfortran.h (struct gfc_omp_clauses): Remove "wait".
> Adjust all users.

This appears to conflict with Chung-Lin's uncommitted patch ("Properly
handle wait clause with no arguments"):

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html

I'm not sure if such waits have a "wait_list" or not -- I guess not
though? If so this patch might need to be reverted.

Thanks,

Julian


Re: [patch,openacc] Fix infinite recursion in OMP clause pretty-printing, default label

2018-11-30 Thread Julian Brown
On Thu, 29 Nov 2018 21:25:33 +
Joseph Myers  wrote:

> On Thu, 29 Nov 2018, Julian Brown wrote:
> 
> > On Thu, 20 Sep 2018 10:08:51 -0700
> > Cesar Philippidis  wrote:
> >   
> > > Apparently, Tom ran into an ICE when we were adding support for
> > > new clauses back in the gomp-4_0-branch days.  This patch
> > > shouldn't be necessary because all of the clauses are fully
> > > implemented now, but it may prevent similar bugs from occurring
> > > in the future at least during development.
> > > 
> > > Is this patch OK for trunk? I bootstrapped and regtested it for
> > > x86_64 Linux with nvptx offloading.  
> > 
> > Joseph, could you take a look at this please?  
> 
> Lots of other places in the same function use gcc_unreachable ().  I
> think using gcc_unreachable () here as well would be more appropriate
> than special-casing this one place in this function to use "unknown".

How's this? (Obvious, but re-tested anyway.)

Thanks,

Julian

ChangeLog

gcc/
* tree-pretty-print.c (dump_omp_clause): Make default case
gcc_unreachable.
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 99eca4a..0861cc9 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1180,9 +1180,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
   break;
 
 default:
-  /* Should never happen.  */
-  dump_generic_node (pp, clause, spc, flags, false);
-  break;
+  gcc_unreachable ();
 }
 }
 


Re: [patch,openacc] Fix acc_shutdown issue

2018-11-30 Thread Julian Brown
On Thu, 20 Sep 2018 10:05:30 -0700
Cesar Philippidis  wrote:

> Attached is an old gomp4 patch that allegedly fixes a shutdown
> runtime issue involving OpenACC accelerators. Unfortunately, the
> original patch didn't include a test case, nor did it generate any
> regressions in the libgomp testsuite when I reverted it in og8.
> 
> With that said, I like how this patch eliminates the redundant use of
> gomp_mutex_lock to unmap variables (because gomp_unmap_vars already
> acquires a lock). However, the trade-off is that it does increase
> tgt->list_count to num_funcs + num_vars.
> 
> Does anyone have any strong opinion on this patch and is it OK for
> trunk? I bootstrapped and regtested it for x86_64 Linux with nvptx
> offloading and I didn't encounter any regressions.

I'd like to withdraw this patch (on behalf of Cesar, who has left
Mentor). It's been superseded by:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02556.html

Thanks,

Julian


[PATCH] OpenACC reference count consistency checking

2018-11-30 Thread Julian Brown

This is a trunk-compatible version of the patch posted here:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02365.html

I understand it may not be suitable for committing (especially not
outside stage 1 -- though it's "obviously harmless" in its dormant state),
but it might be helpful for review purposes for the main attach/detach
patch, i.e.:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02556.html

For convenience, I will copy the blurb from the og8 submission of the
patch here.

[...] The model used for checking is as follows.

 1. Each splay tree key that references a target memory descriptor
increases that descriptor's refcount by 1.

 2. Each variable listed in a target memory descriptor that links back to a
splay tree key increases that key's refcount by 1. Each target memory
descriptor's variable list is counted only once, even if multiple
splay tree keys point to it (via their "tgt" field).

 3. Additional ("real") target memory descriptors may be present
representing data mapped through "acc data" or "acc parallel/kernels"
blocks.  These descriptors have their refcount bumped, and the
variables linked through such blocks have their refcounts bumped also
(again, with "once only" semantics).

 4. Asynchronous operations "artificially" bump the reference counts for
referenced target memory descriptors (but *not* for linked
variables/splay tree keys), in order to delay freeing mapped device
memory until the asynchronous operation has completed.  We model this,
for checking purposes only, using an off-side linked list.

 5. "Virtual" reference counts ("virtual_refcount") cannot be checked
purely statically, so we add the incoming value to each key's
statically-determined reference count ("refcount_chk"), and make
sure that the total matches the incoming reference count ("refcount").
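
As a rough illustration of points 1, 2 and 5 above, here is a minimal,
self-contained sketch of the clear/count/verify idea. It is not the code in
this patch: the "sketch_" types and sketch_rc_check are hypothetical
stand-ins, and the sketch assumes that every key reachable from a
descriptor's variable list is also present in the array being checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_key;

/* Hypothetical, simplified stand-in for a target memory descriptor.  */
struct sketch_tgt
{
  uintptr_t refcount;        /* Count maintained by the runtime.  */
  uintptr_t refcount_chk;    /* Count recomputed by the checker.  */
  bool mark;                 /* Set once the variable list was walked.  */
  size_t list_count;
  struct sketch_key **list;  /* Entries may be NULL (no linked key).  */
};

/* Hypothetical, simplified stand-in for a splay tree key.  */
struct sketch_key
{
  struct sketch_tgt *tgt;
  uintptr_t refcount;
  uintptr_t virtual_refcount;
  uintptr_t refcount_chk;
};

/* Recompute reference counts for NKEYS keys and return how many keys or
   descriptors hold a stored count that disagrees with the recomputed one.  */
static size_t
sketch_rc_check (struct sketch_key **keys, size_t nkeys)
{
  size_t i, j, bad = 0;

  /* Clear pass: reset all recomputed counts and marks.  */
  for (i = 0; i < nkeys; i++)
    {
      keys[i]->refcount_chk = 0;
      keys[i]->tgt->refcount_chk = 0;
      keys[i]->tgt->mark = false;
    }

  /* Count pass.  */
  for (i = 0; i < nkeys; i++)
    {
      struct sketch_tgt *tgt = keys[i]->tgt;

      /* Point 1: each key referencing a descriptor counts once.  */
      tgt->refcount_chk++;

      /* Point 2: each descriptor's variable list is counted only once.  */
      if (tgt->mark)
        continue;
      tgt->mark = true;
      for (j = 0; j < tgt->list_count; j++)
        if (tgt->list[j])
          tgt->list[j]->refcount_chk++;
    }

  /* Verify pass.  */
  for (i = 0; i < nkeys; i++)
    {
      struct sketch_key *k = keys[i];

      /* Point 5: recomputed count plus virtual count must equal refcount.  */
      if (k->refcount_chk + k->virtual_refcount != k->refcount)
        bad++;

      /* Check each descriptor once, reusing the mark as a "pending" flag.  */
      if (k->tgt->mark)
        {
          k->tgt->mark = false;
          if (k->tgt->refcount_chk != k->tgt->refcount)
            bad++;
        }
    }

  return bad;
}
```

With one descriptor shared by two keys, where the second key carries one
extra virtual reference, the check reports no mismatches; corrupting a stored
count makes it report one.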

Thanks,

Julian

ChangeLog

libgomp/
* libgomp.h (RC_CHECKING): New macro, disabled by default, guarding all
hunks in this patch.
(target_mem_desc): Add forward declaration.
(async_tgt_use): New struct.
(target_mem_desc): Add refcount_chk, mark fields.
(acc_dispatch_t): Add tgt_uses, au_lock fields.
(dump_tgt, gomp_rc_check): Add prototypes.
* oacc-async.c (goacc_async_unmap_tgt): Add refcount self-check code.
(goacc_async_copyout_unmap_vars): Likewise.
(goacc_remove_var_async): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed_internal): Add refcount
self-check code.
(GOACC_data_start, GOACC_data_end, GOACC_enter_exit_data): Likewise.
* target.c (stdio.h): Include.
(dump_tgt, rc_check_clear, rc_check_count, rc_check_verify)
(gomp_rc_check): New functions to consistency-check reference counts.
(gomp_target_init): Initialise self-check-related device fields.
---
 libgomp/libgomp.h   |   31 +++
 libgomp/oacc-async.c|   46 +++
 libgomp/oacc-parallel.c |   33 
 libgomp/target.c|  199 +++
 4 files changed, 309 insertions(+), 0 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index df49c1b..24cbddd 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -874,9 +874,26 @@ struct target_var_desc {
   uintptr_t length;
 };
 
+/* Uncomment to enable reference-count consistency checking (for development
+   use only).  */
+/*#define RC_CHECKING 1*/
+
+#ifdef RC_CHECKING
+struct target_mem_desc;
+
+struct async_tgt_use {
+  struct target_mem_desc *tgt;
+  struct async_tgt_use *next;
+};
+#endif
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
+#ifdef RC_CHECKING
+  uintptr_t refcount_chk;
+  bool mark;
+#endif
   /* All the splay nodes allocated together.  */
   splay_tree_node array;
   /* Start of the target region.  */
@@ -925,6 +942,10 @@ struct splay_tree_key_s {
  "present increment" operations (via "acc enter data") refering to the same
  host-memory block.  */
   uintptr_t virtual_refcount;
+#ifdef RC_CHECKING
+  /* The recalculated reference count, for verification.  */
+  uintptr_t refcount_chk;
+#endif
   /* For a block with attached pointers, the attachment counters for each.  */
   unsigned short *attach_count;
   /* Pointer to the original mapping of "omp declare target link" object.  */
@@ -958,6 +979,10 @@ typedef struct acc_dispatch_t
 int nasyncqueue;
 struct goacc_asyncqueue **asyncqueue;
 struct goacc_asyncqueue_list *active;
+#ifdef RC_CHECKING
+struct async_tgt_use *tgt_uses;
+gomp_mutex_t au_lock;
+#endif
 
 __typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
 __typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
@@ -1085,6 +1110,12 @@ extern void gomp_detach_pointer (struct gomp_device_descr *,
  struct goacc_asyncqueue *, splay_tree_key,
  uintptr_t, bool, struct gomp_coalesce_buf *);
 
+#ifdef RC_CHECKING
+extern 

Re: [PATCH] Adjust offsets for present data clauses

2018-11-30 Thread Julian Brown
On Fri, 30 Nov 2018 17:55:17 +0800
Chung-Lin Tang  wrote:

> On 2018/7/21 6:07 AM, Cesar Philippidis wrote:
> > This is another old gomp4 patch that corrects a bug where the
> > runtime was passing the wrong offset for subarray data to the
> > accelerator. The original description of this patch can be found
> > here 
> > 
> > I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?
> > 
> > Thanks,
> > Cesar
> >   
> 
> Hi Thomas, this patch should be within your maintainership area now.
> 
> I think this patch is pretty obvious; this is what the 'offset' field
> of struct target_var_desc is supposed to be used for, and is in line
> with other sites throughout target.c.
> 
> I do think it might be better to use a more succinct form, as
> attached; you may consider which form better suits your taste when
> you apply it.

This one will be superseded by the attach/detach patch, I think (where
the additional offset is added also, via calling "gomp_map_val").

HTH,

Julian


Re: [patch,openacc] use existing local variable in cp_parser_oacc_enter_exit_data

2018-11-29 Thread Julian Brown
On Wed, 26 Sep 2018 11:21:33 -0700
Cesar Philippidis  wrote:

> This is an old gomp4 patch that updates the location of the clause for
> acc enter/exit data. Apparently, it didn't impact any test cases. Is
> this OK for trunk or should we drop it from OG8?
> 
> I bootstrapped and regtested it for x86_64 Linux with nvptx
> offloading.

At least at a glance, there is no actual change to behaviour given in
this patch, it is just an extremely minor cleanup. I.e. in:

  location_t loc = pragma_tok->location;
  [...]
  SET_EXPR_LOCATION (stmt, pragma_tok->location);

the variable "loc" is used in the SET_EXPR_LOCATION instead. It doesn't
look like anything could mutate either variable in the interim.

So, OK, or shall we just drop this? (Joseph?)

Thanks,

Julian


Re: [patch,openacc] C, C++ OpenACC wait diagnostic change

2018-11-29 Thread Julian Brown
On Fri, 28 Sep 2018 14:17:42 +0100
Julian Brown  wrote:

> On Wed, 26 Sep 2018 14:08:37 -0700
> Cesar Philippidis  wrote:
> 
> > On 09/26/2018 12:50 PM, Joseph Myers wrote:  
> > > On Wed, 26 Sep 2018, Cesar Philippidis wrote:
> > > 
> > >> Attached is an old patch which updated the C and C++ FEs to use
> > >> %<)%> for the right ')' symbol. It's mostly a cosmetic change.
> > >> All of the changes are self-contained to the OpenACC code
> > >> path.
> > > 
> > > Why is the "before ')'" included in the call to c_parser_error at
> > > all? c_parser_error calls c_parse_error which adds its own "
> > > before " and token description or expansion, so I'd expect the
> > > current error to result in a message ending in something of the
> > > form "before X before Y".
> 
> > Julian, I need to start working on deep copy in OpenACC. Can you
> > take over this patch? The error handling code in the C FE needs to
> > be removed because it's dead.  
> 
> I agree that the error-handling path in question in the C FE is dead.
> The difference is that in C, c_parser_oacc_wait_list parses the open
> parenthesis, the list and then the close parenthesis separately, and
> so a token sequence like:
> 
>(1
> 
> will return an expression list of length 1. In the C++ FE, by contrast,
> a cp_parser_parenthesized_expression_list is parsed all in one go, and
> if the input is not a well-formed sequence then NULL is returned
> (or a zero-length vector for an empty list).
> 
> But for C, it does not appear that c_parser_expr_list has a code path
> that can return a zero-length list at all. So, we can elide the
> diagnostic with no change to compiler behaviour. This patch does that,
> and also changes the C++ diagnostic, leading to errors being reported
> like:
> 
> diag.c: In function 'int main(int, char*)':
> diag.c:6:59: error: expected ')' before end of line
> 6 | #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1
>   | ~ ^
>   |   )
> diag.c:6:59: error: expected integer expression list before end of
> line 
> 
> Actually I'm not too sure how useful the second error line is. Maybe
> we should just remove it to improve consistency between C & C++?
> 
> The attached has been tested with offloading to nvptx and
> bootstrapped. OK?

Ping?

Thanks,

Julian


Re: [patch,openacc] Fix infinite recursion in OMP clause pretty-printing, default label

2018-11-29 Thread Julian Brown
On Thu, 20 Sep 2018 10:08:51 -0700
Cesar Philippidis  wrote:

> Apparently, Tom ran into an ICE when we were adding support for new
> clauses back in the gomp-4_0-branch days.  This patch shouldn't be
> necessary because all of the clauses are fully implemented now, but
> it may prevent similar bugs from occurring in the future at least
> during development.
> 
> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
> Linux with nvptx offloading.

Joseph, could you take a look at this please?

Thanks,

Julian


[PATCH 1/2] [og8] Further OpenACC reference-counting improvements

2018-11-28 Thread Julian Brown

This is the main set of improvements to reference-counting behaviour
(see parent email for further details).

ChangeLog

libgomp/
* libgomp.h (splay_tree_key_s): Substitute dynamic_refcount field for
virtual_refcount.
(acc_dispatch_t): Remove data_environ field.
(gomp_acc_insert_pointer, gomp_acc_data_env_remove_tgt): Remove
prototypes.
(gomp_acc_remove_pointer): Update prototype.
* oacc-async.c (goacc_remove_var_async): New function.
* oacc-host.c (host_dispatch): Don't initialise removed data_environ
field.
* oacc-init.c (acc_shutdown_1): Use gomp_remove_var instead of
gomp_unmap_vars to remove mappings by splay tree key instead of target
memory descriptor.
* oacc-int.h (splay_tree_key_s): Add forward declaration.
(goacc_remove_var_async): Add prototype.
* oacc-mem.c (gomp_acc_data_env_remove, gomp_acc_data_env_remove_tgt):
Remove functions.
(present_create_copy): Use virtual_refcount instead of dynamic_refcount,
and don't modify after calling gomp_map_vars_async.  Don't create dummy
target_mem_desc.  Fix target pointer return value.
(delete_copyout): Update for virtual_refcount semantics.  Use
goacc_remove_var_async for asynchronous delete/copyouts.
(gomp_acc_insert_pointer): Remove function.
(gomp_acc_remove_pointer): Use virtual_refcount semantics.
* oacc-parallel.c (find_pointer): Add missing GOMP_MAP_FORCE_DETACH
case.
(GOACC_enter_exit_data): Fix struct mapping/unmapping for
virtual_refcount semantics.  Fix attach/detach behaviour.  Don't call
gomp_acc_insert_pointer.
* target.c (gomp_map_vars_existing): Fix initialisation of do_detach
field.
(gomp_map_vars_async): Handle GOMP_MAP_VARS_OPENACC_ENTER_DATA.  Update
for virtual_refcount semantics.  Add some missing initialisations in
dynamic array code paths.
(gomp_unmap_tgt): Don't call gomp_acc_data_env_remove_tgt.
(gomp_remove_var): Fix use-after-free.
(gomp_unmap_vars_async): Update for virtual_refcount semantics.
(gomp_load_image_to_device): Don't use tgt's variable list to store
static function and variable mappings. Initialise virtual refcount.
(gomp_target_init): Don't initialise removed data_environ field.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c: Update test for
fixed refcount behaviour.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Likewise.
---
 libgomp/libgomp.h  |   22 +--
 libgomp/oacc-async.c   |   18 ++
 libgomp/oacc-host.c|2 -
 libgomp/oacc-init.c|6 +-
 libgomp/oacc-int.h |5 +
 libgomp/oacc-mem.c |  206 +---
 libgomp/oacc-parallel.c|  127 ++---
 libgomp/target.c   |   63 ---
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   11 +-
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|1 +
 10 files changed, 189 insertions(+), 272 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 568e260..ea44afc 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -860,8 +860,11 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Dynamic reference count.  */
-  uintptr_t dynamic_refcount;
+  /* Reference counts beyond those that represent genuine references in the
+ linked splay tree key/target memory structures, e.g. for multiple OpenACC
+ "present increment" operations (via "acc enter data") refering to the same
+ host-memory block.  */
+  uintptr_t virtual_refcount;
   /* For a block with attached pointers, the attachment counters for each.  */
   unsigned short *attach_count;
   /* Pointer to the original mapping of "omp declare target link" object.  */
@@ -887,13 +890,6 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 typedef struct acc_dispatch_t
 {
-  /* This is a linked list of data mapped using the
- acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
- Unlike mapped_data in the goacc_thread struct, unmapping can
- happen out-of-order with respect to mapping.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  struct target_mem_desc *data_environ;
-
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
   __typeof (GOMP_OFFLOAD_openacc_exec_params) *exec_params_func;
@@ -1010,9 +1006,9 @@ enum gomp_map_vars_kind
 
 struct gomp_coalesce_buf;
 
-extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
-extern void gomp_acc_remove_pointer (void **, size_t *, unsigned short *,
- int, 

[PATCH 0/2] [og8] Further OpenACC/libgomp refcounting fixes

2018-11-28 Thread Julian Brown

As mentioned in:

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01773.html

I had a few more changes planned for the reference-counting implementation
for OpenACC in libgomp.  These are embodied in this patch series.
The highlights are:

 - reference counts in the linked memory-mapping splay tree structure can be
   self-checked for consistency using optional (i.e. development-only)
   code.  This survives a libgomp test run (with offloading to nvptx),
   so I'm reasonably confident it's good.

 - the "data_environ" field in the device descriptor -- a linear linked
   list containing a target memory descriptor for each "acc enter data"
   mapping -- has been removed.  This brings OpenACC closer to the OpenMP
   implementation for non-lexically-scoped data mapping
   (GOMP_target_enter_exit_data), and is potentially a performance win
   if lots of data is mapped in this way.

 - the semantics of the "dynamic_refcount" field in the splay_tree_key
   structure have shifted slightly, so I've renamed the field.  It now
   represents references that are excess to those represented by actual
   pointers in the linked splay tree/target-memory descriptor structure.
   That might have been the intention before in fact, but the
   implementation was inconsistent.
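
To make the last point concrete, here is a small sketch of one way the
invariant could play out for repeated "acc enter data" / "acc exit data" on
the same host block. It is an illustration only, not the libgomp
implementation: the "sketch_" names are hypothetical, and the exact
increment/decrement policy shown is an assumption. The only property taken
from the description above is that virtual_refcount counts references in
excess of those backed by real links in the splay tree/descriptor structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two counters on a mapped block.  */
struct sketch_mapping
{
  uintptr_t refcount;          /* Total references; 0 means unmapped.  */
  uintptr_t virtual_refcount;  /* References not backed by a real link.  */
};

/* Model an "acc enter data" on the block.  The first one creates the
   mapping (one genuine reference); later ones only bump the counters, so
   they are recorded as virtual references (assumed policy).  */
static void
sketch_enter_data (struct sketch_mapping *m)
{
  if (m->refcount == 0)
    m->refcount = 1;
  else
    {
      m->refcount++;
      m->virtual_refcount++;
    }
}

/* Model an "acc exit data" on the block.  Virtual references are dropped
   first.  Returns true when the block should now be unmapped.  */
static bool
sketch_exit_data (struct sketch_mapping *m)
{
  if (m->refcount == 0)
    return false;  /* Not mapped; nothing to do.  */
  if (m->virtual_refcount > 0)
    m->virtual_refcount--;
  m->refcount--;
  return m->refcount == 0;
}

/* References that must be backed by actual links in the data structure.  */
static uintptr_t
sketch_structural_refs (const struct sketch_mapping *m)
{
  return m->refcount - m->virtual_refcount;
}
```

Under this assumed policy, three enters leave refcount at 3 and
virtual_refcount at 2, so exactly one reference has to be accounted for by
the linked structure, and only the third exit reports that the block can be
unmapped.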

I will apply to the og8 branch shortly.

Julian Brown (2):
  [og8] Further OpenACC reference-counting improvements
  [og8] OpenACC reference count consistency checking

 libgomp/libgomp.h  |   55 +++--
 libgomp/oacc-async.c   |   64 +
 libgomp/oacc-host.c|2 -
 libgomp/oacc-init.c|6 +-
 libgomp/oacc-int.h |5 +
 libgomp/oacc-mem.c |  206 
 libgomp/oacc-parallel.c|  160 +++-
 libgomp/target.c   |  262 ++--
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   11 +-
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|1 +
 10 files changed, 499 insertions(+), 273 deletions(-)



[PATCH 2/2] [og8] OpenACC reference count consistency checking

2018-11-28 Thread Julian Brown

This is the reference count consistency-checking code.  The model used
for checking is as follows.

 1. Each splay tree key that references a target memory descriptor
increases that descriptor's refcount by 1.

 2. Each variable listed in a target memory descriptor that links back to a
splay tree key increases that key's refcount by 1. Each target memory
descriptor's variable list is counted only once, even if multiple
splay tree keys point to it (via their "tgt" field).

 3. Additional ("real") target memory descriptors may be present
representing data mapped through "acc data" or "acc parallel/kernels"
blocks.  These descriptors have their refcount bumped, and the
variables linked through such blocks have their refcounts bumped also
(again, with "once only" semantics).

 4. Asynchronous operations "artificially" bump the reference counts for
referenced target memory descriptors (but *not* for linked
variables/splay tree keys), in order to delay freeing mapped device
memory until the asynchronous operation has completed.  We model this,
for checking purposes only, using an off-side linked list.

 5. "Virtual" reference counts ("virtual_refcount") cannot be checked
purely statically, so we add the incoming value to each key's
statically-determined reference count ("refcount_chk"), and make
sure that the total matches the incoming reference count ("refcount").

With the previous patch, as noted in the parent email, this allows a
libgomp test run to complete successfully (with checking enabled).
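
Point 4 above (the off-side list for asynchronous operations) can be pictured
with the following sketch. The struct async_tgt_use layout matches the one
added to libgomp.h by this patch, but everything else is hypothetical:
sketch_tgt stands in for target_mem_desc, the three helper functions are
invented for illustration, and the locking that the real code does with
"au_lock" is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a target memory descriptor.  */
struct sketch_tgt
{
  uintptr_t refcount;      /* Count maintained by the runtime.  */
  uintptr_t refcount_chk;  /* Count recomputed by the checker.  */
};

/* Same shape as the struct added by the patch, using the stand-in type.  */
struct async_tgt_use
{
  struct sketch_tgt *tgt;
  struct async_tgt_use *next;
};

/* Record that an asynchronous operation keeps TGT alive: bump the real
   refcount and remember the use on the off-side list.  Returns 0 on
   success, -1 if allocation fails.  */
static int
sketch_async_use_add (struct async_tgt_use **head, struct sketch_tgt *tgt)
{
  struct async_tgt_use *use = malloc (sizeof *use);
  if (use == NULL)
    return -1;
  use->tgt = tgt;
  use->next = *head;
  *head = use;
  tgt->refcount++;
  return 0;
}

/* The asynchronous operation has completed: drop one recorded use of TGT.
   Returns 0 on success, -1 if no such use was recorded.  */
static int
sketch_async_use_remove (struct async_tgt_use **head, struct sketch_tgt *tgt)
{
  struct async_tgt_use **p;

  for (p = head; *p != NULL; p = &(*p)->next)
    if ((*p)->tgt == tgt)
      {
        struct async_tgt_use *dead = *p;
        *p = dead->next;
        free (dead);
        tgt->refcount--;
        return 0;
      }
  return -1;
}

/* Checker side: add every outstanding asynchronous use to the recomputed
   count of its descriptor (but not to any key's count).  */
static void
sketch_async_use_count (const struct async_tgt_use *head)
{
  for (; head != NULL; head = head->next)
    head->tgt->refcount_chk++;
}
```

The point of the side list is that the checker can account for the
"artificial" bump without guessing: while the operation is in flight the
recomputed count includes it, and once it completes both counts drop back
together.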

Julian

ChangeLog

libgomp/
* libgomp.h (RC_CHECKING): New macro, disabled by default, guarding all
hunks in this patch.
(target_mem_desc): Add forward declaration.
(async_tgt_use): New struct.
(target_mem_desc): Add refcount_chk, mark fields.
(acc_dispatch_t): Add tgt_uses, au_lock fields.
(dump_tgt, gomp_rc_check): Add prototypes.
* oacc-async.c (goacc_async_unmap_tgt): Add refcount self-check code.
(goacc_async_copyout_unmap_vars): Likewise.
(goacc_remove_var_async): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed_internal): Add refcount
self-check code.
(GOACC_data_start, GOACC_data_end, GOACC_enter_exit_data): Likewise.
* target.c (stdio.h): Include.
(dump_tgt, rc_check_clear, rc_check_count, rc_check_verify)
(gomp_rc_check): New functions to consistency-check reference counts.
(gomp_target_init): Initialise self-check-related device fields.
---
 libgomp/libgomp.h   |   33 -
 libgomp/oacc-async.c|   46 +++
 libgomp/oacc-parallel.c |   33 
 libgomp/target.c|  199 +++
 4 files changed, 310 insertions(+), 1 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index ea44afc..77cc923 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -814,9 +814,26 @@ struct target_var_desc {
   uintptr_t length;
 };
 
+/* Uncomment to enable reference-count consistency checking (for development
+   use only).  */
+/*#define RC_CHECKING 1*/
+
+#ifdef RC_CHECKING
+struct target_mem_desc;
+
+struct async_tgt_use {
+  struct target_mem_desc *tgt;
+  struct async_tgt_use *next;
+};
+#endif
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
+#ifdef RC_CHECKING
+  uintptr_t refcount_chk;
+  bool mark;
+#endif
   /* All the splay nodes allocated together.  */
   splay_tree_node array;
   /* Start of the target region.  */
@@ -865,6 +882,10 @@ struct splay_tree_key_s {
  "present increment" operations (via "acc enter data") refering to the same
  host-memory block.  */
   uintptr_t virtual_refcount;
+#ifdef RC_CHECKING
+  /* The recalculated reference count, for verification.  */
+  uintptr_t refcount_chk;
+#endif
   /* For a block with attached pointers, the attachment counters for each.  */
   unsigned short *attach_count;
   /* Pointer to the original mapping of "omp declare target link" object.  */
@@ -899,7 +920,11 @@ typedef struct acc_dispatch_t
 int nasyncqueue;
 struct goacc_asyncqueue **asyncqueue;
 struct goacc_asyncqueue_list *active;
-
+#ifdef RC_CHECKING
+struct async_tgt_use *tgt_uses;
+gomp_mutex_t au_lock;
+#endif
+
 __typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
 __typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
 __typeof (GOMP_OFFLOAD_openacc_async_test) *test_func;
@@ -1028,6 +1053,12 @@ extern void gomp_detach_pointer (struct gomp_device_descr *,
  struct goacc_asyncqueue *, splay_tree_key,
  uintptr_t, bool, struct gomp_coalesce_buf *);
 
+#ifdef RC_CHECKING
+extern void dump_tgt (const char *, struct target_mem_desc *);
+extern void gomp_rc_check (struct gomp_device_descr *,
+			   struct target_mem_desc *);
+#endif
+
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 	  

[PATCH 5/6] [og8] Backport parts of upstream declare-allocate patch

2018-11-20 Thread Julian Brown

This patch adjusts mappings used for some special cases in Fortran
(e.g. allocatable scalars) on og8 to match code that is already upstream,
or that has been submitted but not yet reviewed. Parts taken from
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01205.html and parts
reverted from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02188.html.

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't use
GOMP_MAP_FIRSTPRIVATE_POINTER.
(gfc_trans_omp_clauses_1): Adjust handling of allocatable scalars.

gcc/
* gimplify.c (demote_firstprivate_pointer): Remove.
(gimplify_scan_omp_clauses): Remove special handling for OpenACC. Don't
call demote_firstprivate_pointer.
(gimplify_adjust_omp_clauses): Adjust promotion of reduction clauses.
* omp-low.c (lower_omp_target): Remove special handling for Fortran.

gcc/testsuite/
* gfortran.dg/goacc/kernels-alias-3.f95: Revert comment changes and
XFAIL.

libgomp/
* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Remove XFAIL for
-O2 and -O3 and explanatory comment.
---
 gcc/fortran/trans-openmp.c |   22 -
 gcc/gimplify.c |   49 ++-
 gcc/omp-low.c  |3 +-
 .../gfortran.dg/goacc/kernels-alias-3.f95  |4 +-
 .../libgomp.oacc-fortran/non-scalar-data.f90   |6 +--
 5 files changed, 20 insertions(+), 64 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 98f40d1..71a3ebb 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1084,7 +1084,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 	return;
   tree orig_decl = decl;
   c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
   OMP_CLAUSE_DECL (c4) = decl;
   OMP_CLAUSE_SIZE (c4) = size_int (0);
   decl = build_fold_indirect_ref (decl);
@@ -1100,10 +1100,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 	  OMP_CLAUSE_SIZE (c3) = size_int (0);
 	  decl = build_fold_indirect_ref (decl);
 	  OMP_CLAUSE_DECL (c) = decl;
-	  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
 	}
-  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
-	OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
 }
   if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
 {
@@ -2168,11 +2165,15 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 	(TREE_TYPE (TREE_TYPE (field)
 		{
 		  tree orig_decl = decl;
-		  enum gomp_map_kind gmk = GOMP_MAP_FIRSTPRIVATE_POINTER;
-		  if (GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
-			  && (n->sym->attr.oacc_declare_create)
-			  && clauses->update_allocatable)
-			gmk = ptr_map_kind;
+		  enum gomp_map_kind gmk = GOMP_MAP_POINTER;
+		  if (GFC_DECL_GET_SCALAR_ALLOCATABLE (field)
+			  && n->sym->attr.oacc_declare_create)
+			{
+			  if (clauses->update_allocatable)
+			gmk = GOMP_MAP_ALWAYS_POINTER;
+			  else
+			gmk = GOMP_MAP_FIRSTPRIVATE_POINTER;
+			}
 		  node4 = build_omp_clause (input_location,
 		OMP_CLAUSE_MAP);
 		  OMP_CLAUSE_SET_MAP_KIND (node4, gmk);
@@ -2189,10 +2190,7 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 			  OMP_CLAUSE_DECL (node3) = decl;
 			  OMP_CLAUSE_SIZE (node3) = size_int (0);
 			  decl = build_fold_indirect_ref (decl);
-			  OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
 			}
-		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
-			OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
 		}
 		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))
 		  && n->u.map_op != OMP_MAP_ATTACH
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 40bf586..7f55cfd 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7634,37 +7634,6 @@ find_decl_expr (tree *tp, int *walk_subtrees, void *data)
   return NULL_TREE;
 }
 
-static void
-demote_firstprivate_pointer (tree decl, gimplify_omp_ctx *ctx)
-{
-  if (!lang_GNU_Fortran ())
-return;
-
-  while (ctx)
-{
-  if (ctx->region_type == ORT_ACC_PARALLEL
-	  || ctx->region_type == ORT_ACC_KERNELS)
-	break;
-  ctx = ctx->outer_context;
-}
-
-  if (ctx == NULL)
-return;
-
-  tree clauses = ctx->clauses;
-
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-	  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
-	  && OMP_CLAUSE_DECL (c) == decl)
-	{
-	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_POINTER);
-	  return;
-	}
-}
-}
-
 /* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
the struct node to insert the new mapping after (when the struct node is
@@ -7843,7 +7812,7 @@ 

[PATCH 6/6] [og8] OpenACC refcounting refresh

2018-11-20 Thread Julian Brown

This patch represents a mild overhaul of reference counting for OpenACC
in libgomp.  It's been partly automatically checked (using code not yet
quite finished nor submitted upstream), but it's already more precise
than the pre-patch implementation (as demonstrated by adjustments to
previously-erroneous tests, included).

I have a few more changes planned, but those are still tbd.

libgomp/
* libgomp.h (gomp_device_descr): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
(gomp_acc_remove_pointer): Update prototype.
(gomp_acc_data_env_remove_tgt): Add prototype.
(gomp_unmap_vars, gomp_map_vars_async): Update prototype.
* oacc-int.h (goacc_async_copyout_unmap_vars): Update prototype.
* oacc-async.c (goacc_async_copyout_unmap_vars): Remove finalize
parameter.
* oacc-init.c (acc_shutdown_1): Remove finalize argument to
gomp_unmap_vars call.
* oacc-mem.c (lookup_dev_1): New helper function.
(lookup_dev): Rewrite in terms of above.
(acc_free): Update calls to lookup_dev.
(acc_map_data): Likewise.  Don't add data mapped this way to OpenACC
data environment list.
(gomp_acc_data_env_remove, gomp_acc_data_env_remove_tgt): New functions.
(acc_unmap_data): Rewrite using splay tree functions directly.  Don't
call gomp_unmap_vars.  Fix refcount handling.
(present_create_copy): Use GOMP_MAP_VARS_OPENACC_ENTER_DATA in
gomp_map_vars_async call.  Adjust refcount handling.
(delete_copyout): Remove dubious handling of target_mem_desc refcount.
(gomp_acc_insert_pointer): Use GOMP_MAP_VARS_OPENACC_ENTER_DATA in
gomp_map_vars_async call.  Update refcount handling.
(gomp_acc_remove_pointer): Reimplement.  Fix detach and refcount
handling.
* oacc-parallel.c (find_pointer): Handle more mapping types.  Update
calls to gomp_unmap_vars and goacc_async_copyout_unmap_vars.
(GOACC_enter_exit_data): Update refcount handling.

libgomp/
* target.c (gomp_detach_pointer): Unlock device on error path.
(gomp_map_vars_async): Support GOMP_MAP_VARS_OPENACC_ENTER_DATA and
fix mapping size for GOMP_MAP_ATTACH.
(gomp_unmap_tgt): Call gomp_acc_data_env_remove_tgt.
(gomp_unmap_vars): Remove finalize parameter.
(gomp_unmap_vars_async): Likewise.  Adjust detach handling.
(GOMP_target, GOMP_target_ext, GOMP_target_end_data)
(gomp_target_task_fn): Update calls to gomp_unmap_vars.
* testsuite/libgomp.oacc-c-c++-common/context-2.c: Use correct API to
unmap data.
* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: New test.
* testsuite/libgomp.oacc-fortran/data-2.f90: Fix for unmap semantics.
---
 libgomp/libgomp.h  |   10 +-
 libgomp/oacc-async.c   |4 +-
 libgomp/oacc-init.c|2 +-
 libgomp/oacc-int.h |2 +-
 libgomp/oacc-mem.c |  387 ++--
 libgomp/oacc-parallel.c|   76 +++--
 libgomp/target.c   |   35 ++-
 .../libgomp.oacc-c-c++-common/context-2.c  |6 +-
 .../libgomp.oacc-c-c++-common/context-4.c  |6 +-
 .../libgomp.oacc-c-c++-common/deep-copy-6.c|   59 +++
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   42 +++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|   53 +++
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   20 +-
 13 files changed, 445 insertions(+), 257 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 17fe0d3..568e260 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1002,6 +1002,7 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
+  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
@@ -1010,7 +1011,8 @@ enum gomp_map_vars_kind
 struct gomp_coalesce_buf;
 
 extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
-extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
+extern void gomp_acc_remove_pointer (void **, size_t *, unsigned short *,
+ int, void *, bool, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
    unsigned short *);
 struct gomp_coalesce_buf;
@@ -1039,10 +1041,12 @@ extern struct 

[PATCH 3/6] [og8] OpenACC 2.6 manual deep copy support (attach/detach)

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00826.html

gcc/c/
* c-parser.c (c_parser_omp_variable_list): Allow deref (->) in
variable lists.
(c_parser_oacc_all_clauses): Re-alphabetize cases.
* c-typeck.c (handle_omp_array_sections_1): Support deref.

gcc/cp/
* parser.c (cp_parser_omp_var_list_no_open): Support deref.
(cp_parser_oacc_all_clauses): Re-alphabetize cases.
* semantics.c (finish_omp_clauses): Allow "this" for OpenACC data
clauses.  Support deref.

gcc/fortran/
* gfortran.h (gfc_omp_map_op): Add OMP_MAP_ATTACH, OMP_MAP_DETACH.
* openmp.c (omp_mask2): Add OMP_CLAUSE_ATTACH, OMP_CLAUSE_DETACH.
(gfc_match_omp_clauses): Remove allow_derived parameter, infer from
clause mask.  Support attach and detach.  Slight reformatting.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES)
(OACC_ENTER_DATA_CLAUSES): Add OMP_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSES): Add OMP_CLAUSE_DETACH.
(match_acc): Remove derived_types parameter, and don't pass to
gfc_match_omp_clauses.
(gfc_match_oacc_update): Don't pass allow_derived argument.
(gfc_match_oacc_enter_data): Likewise.
(gfc_match_oacc_exit_data): Likewise.
(check_symbol_not_pointer): Don't disallow pointer objects of derived
type.
(resolve_oacc_data_clauses): Don't disallow allocatable derived types.
(resolve_omp_clauses): Perform duplicate checking only for non-derived
type component accesses (plain variables and arrays or array sections).
Support component refs.
* trans-openmp.c (gfc_omp_privatize_by_reference): Support component
refs.
(gfc_trans_omp_clauses_1): Support component refs, attach and detach
clauses.

gcc/
* gimplify.c (gimplify_omp_var_data): Add GOVD_MAP_HAS_ATTACHMENTS.
(insert_struct_component_mapping): Support derived-type member mappings
for arrays with descriptors which use GOMP_MAP_TO_PSET.
(gimplify_scan_omp_clauses): Rewrite GOMP_MAP_ALWAYS_POINTER to
GOMP_MAP_ATTACH for OpenACC struct/derived-type component pointers.
Handle pointer mappings that use GOMP_MAP_TO_PSET.  Handle attach/detach
clauses.
(gimplify_adjust_omp_clauses_1): Skip adjustments for explicit
attach/detach clauses.
(gimplify_omp_target_update): Handle finalize for detach.

gcc/testsuite/
* c-c++-common/goacc/mdc-1.c: Update scan tests.
* gfortran.dg/goacc/data-clauses.f95: Remove expected errors.
* gfortran.dg/goacc/derived-types.f90: Likewise.
* gfortran.dg/goacc/enter-exit-data.f95: Likewise.

libgomp/
* libgomp.h (struct target_var_desc): Add do_detach flag.
(struct splay_tree_key_s): Add attach_count field.
(struct gomp_coalesce_buf): Add forward declaration.
(gomp_map_val, gomp_attach_pointer, gomp_detach_pointer): Add
prototypes.
(gomp_unmap_vars): Add finalize parameter.
* libgomp.map (OACC_2.6): New section. Add acc_attach, acc_attach_async,
acc_detach, acc_detach_async, acc_detach_finalize,
acc_detach_finalize_async.
* oacc-async.c (goacc_async_copyout_unmap_vars): Add finalize parameter.
Pass to gomp_unmap_vars_async.
* oacc-init.c (acc_shutdown_1): Update call to gomp_unmap_vars.
* oacc-int.h (goacc_async_copyout_unmap_vars): Add finalize parameter.
* oacc-mem.c (acc_unmap_data): Update call to gomp_unmap_vars.
(present_create_copy): Initialise attach_count.
(delete_copyout): Likewise.
(gomp_acc_insert_pointer): Likewise.
(gomp_acc_remove_pointer): Update calls to gomp_unmap_vars,
goacc_async_copyout_unmap_vars.
(acc_attach_async, acc_attach, goacc_detach_internal, acc_detach)
(acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): New
functions.
* oacc-parallel.c (find_pointer): Support attach/detach.  Make a little
more strict.
(GOACC_parallel_keyed_internal): Use gomp_map_val to calculate device
addresses.  Update calls to gomp_unmap_vars,
goacc_async_copyout_unmap_vars.
(GOACC_data_end): Update call to gomp_unmap_vars.
(GOACC_enter_exit_data): Support attach/detach and GOMP_MAP_STRUCT.
* openacc.h (acc_attach, acc_attach_async, acc_detach)
(acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): Add
prototypes.
* target.c (limits.h): Include.
(gomp_map_vars_existing): Initialise do_detach field of tgt_var_desc.
(gomp_attach_pointer, gomp_detach_pointer): New functions.
(gomp_map_val): Make global.
(gomp_map_vars_async): Support attach and detach.
(gomp_remove_var): Free attach count array if present.
 

[PATCH 0/6] [og8] OpenACC attach/detach

2018-11-20 Thread Julian Brown

This patch series is a backport of the OpenACC attach/detach support to
the openacc-gcc-8-branch branch. It was previously posted upstream here:

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00823.html

This version of the series has been adjusted to account for features on
the branch that are not yet upstream. It also contains improvements to
the reference counting behaviour, partially verified using self-checking
code (not quite complete, and not yet submitted).

Tested (as a series) with offloading to nvptx. I will apply to the
openacc-gcc-8-branch shortly.

Julian Brown (6):
  [og8] Host-to-device transfer coalescing & magic offset value
self-documentation
  [og8] Factor out duplicate code in gimplify_scan_omp_clauses
  [og8] OpenACC 2.6 manual deep copy support (attach/detach)
  [og8] Interaction of dynamic/multidimensional arrays with
attach/detach.
  [og8] Backport parts of upstream declare-allocate patch
  [og8] OpenACC refcounting refresh

 gcc/c/c-parser.c   |   15 +-
 gcc/c/c-typeck.c   |4 +
 gcc/cp/parser.c|   16 +-
 gcc/cp/semantics.c |6 +-
 gcc/fortran/gfortran.h |2 +
 gcc/fortran/openmp.c   |  126 --
 gcc/fortran/trans-openmp.c |  163 +++-
 gcc/gimplify.c |  414 ++
 gcc/omp-low.c  |   13 +-
 .../c-c++-common/goacc/deep-copy-multidim.c|   32 ++
 gcc/testsuite/c-c++-common/goacc/mdc-1.c   |   10 +-
 gcc/testsuite/gfortran.dg/goacc/data-clauses.f95   |   38 +-
 gcc/testsuite/gfortran.dg/goacc/derived-types.f90  |   23 +-
 .../gfortran.dg/goacc/enter-exit-data.f95  |   24 +-
 .../gfortran.dg/goacc/kernels-alias-3.f95  |4 +-
 libgomp/libgomp.h  |   30 ++-
 libgomp/libgomp.map|   10 +
 libgomp/oacc-mem.c |  459 
 libgomp/oacc-parallel.c|  212 --
 libgomp/openacc.h  |6 +
 libgomp/target.c   |  291 +++--
 .../libgomp.oacc-c-c++-common/context-2.c  |6 +-
 .../libgomp.oacc-c-c++-common/context-4.c  |6 +-
 .../libgomp.oacc-c-c++-common/deep-copy-1.c|   24 +
 .../libgomp.oacc-c-c++-common/deep-copy-2.c|   29 ++
 .../libgomp.oacc-c-c++-common/deep-copy-3.c|   34 ++
 .../libgomp.oacc-c-c++-common/deep-copy-4.c|   87 
 .../libgomp.oacc-c-c++-common/deep-copy-5.c|   81 
 .../libgomp.oacc-c-c++-common/deep-copy-6.c|   59 +++
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   42 ++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|   53 +++
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   20 +-
 .../testsuite/libgomp.oacc-fortran/deep-copy-1.f90 |   35 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-2.f90 |   33 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-3.f90 |   34 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-4.f90 |   49 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-5.f90 |   57 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-6.f90 |   61 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-7.f90 |   89 
 .../testsuite/libgomp.oacc-fortran/deep-copy-8.f90 |   41 ++
 .../libgomp.oacc-fortran/derived-type-1.f90|6 +-
 .../libgomp.oacc-fortran/non-scalar-data.f90   |6 +-
 .../testsuite/libgomp.oacc-fortran/update-2.f90|   44 +-
 43 files changed, 2079 insertions(+), 715 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-3.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-4.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-5.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-7.f90
 cr

[PATCH 4/6] [og8] Interaction of dynamic/multidimensional arrays with attach/detach.

2018-11-20 Thread Julian Brown

OpenACC multidimensional (or "dynamic") arrays do not fit neatly into
the attach/detach mechanism described for OpenACC 2.6 when the user
tries to use such an array as a field in a struct.  This patch
disallows that combination, for now at least.  Multidimensional array
support in general has been submitted upstream, but has not yet been
accepted:

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00937.html

gcc/
* omp-low.c (scan_sharing_clauses): Disallow dynamic (multidimensional)
arrays within structs.

gcc/testsuite/
* c-c++-common/goacc/deep-copy-multidim.c: Add test.

libgomp/
* target.c (gomp_map_vars_async, gomp_load_image_to_device):
Zero-initialise do_detach, dynamic_refcount and attach_count in more
places.
---
 gcc/omp-low.c  |   10 +-
 .../c-c++-common/goacc/deep-copy-multidim.c|   32 
 libgomp/target.c   |6 
 3 files changed, 47 insertions(+), 1 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e559211..1726451 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1481,7 +1481,15 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  t = TREE_TYPE (t);
 		}
 
-	  install_var_field (da_decl, by_ref, 3, ctx);
+	  if (DECL_P (decl))
+		install_var_field (da_decl, by_ref, 3, ctx);
+	  else
+	{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			"dynamic arrays cannot be used within structs");
+		  break;
+		}
+
 	  tree new_var = install_var_local (da_decl, ctx);
 
 	  bool existed = ctx->dynamic_arrays->put (new_var, da_dimensions);
diff --git a/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c b/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
new file mode 100644
index 000..1696f0c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+
+#include <stdlib.h>
+#include <assert.h>
+
+struct dc
+{
+  int a;
+  int **b;
+};
+
+int
+main ()
+{
+  int n = 100, i, j;
+  struct dc v = { .a = 3 };
+
+  v.b = (int **) malloc (sizeof (int *) * n);
+  for (i = 0; i < n; i++)
+v.b[i] = (int *) malloc (sizeof (int) * n);
+
+#pragma acc parallel loop copy(v.a, v.b[:n][:n]) /* { dg-error "dynamic arrays cannot be used within structs" } */
+  for (i = 0; i < n; i++)
+for (j = 0; j < n; j++)
+  v.b[i][j] = v.a + i + j;
+
+  for (i = 0; i < n; i++)
+for (j = 0; j < n; j++)
+  assert (v.b[i][j] == v.a + i + j);
+
+  return 0;
+}
diff --git a/libgomp/target.c b/libgomp/target.c
index d9d42eb..da51291 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1484,6 +1484,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 	 set to false here.  */
 	  tgt->list[i].copy_from = false;
 	  tgt->list[i].always_copy_from = false;
+	  tgt->list[i].do_detach = false;
 
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
@@ -1521,6 +1522,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 
 		  k->tgt = tgt;
 		  k->refcount = 1;
+		  k->dynamic_refcount = 0;
+		  k->attach_count = NULL;
 		  k->link_key = NULL;
 		  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		  target_row_addr = tgt->tgt_start + tgt_size;
@@ -1532,6 +1535,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 		= GOMP_MAP_COPY_FROM_P (kind & typemask);
 		  row_desc->always_copy_from
 		= GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+		  row_desc->do_detach = false;
 		  row_desc->offset = 0;
 		  row_desc->length = da->data_row_size;
 
@@ -1839,6 +1843,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
   k->tgt_offset = target_table[i].start;
   k->refcount = REFCOUNT_INFINITY;
+  k->attach_count = NULL;
   k->link_key = NULL;
   tgt->list[i].key = k;
   tgt->refcount++;
@@ -1873,6 +1878,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
   k->tgt_offset = target_var->start;
   k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
+  k->attach_count = NULL;
   k->link_key = NULL;
   tgt->list[i].key = k;
   tgt->refcount++;


[PATCH 1/6] [og8] Host-to-device transfer coalescing & magic offset value self-documentation

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00825.html

libgomp/
* libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define.
* target.c (FIELD_TGT_EMPTY): Define.
(gomp_coalesce_chunk): New.
(gomp_coalesce_buf): Use above instead of flat array of size_t pairs.
(gomp_coalesce_buf_add): Adjust for above change.
(gomp_copy_host2dev): Likewise.
(gomp_map_val): Use OFFSET_* macros instead of magic constants.  Write
as switch instead of list of ifs.
(gomp_map_vars_async): Adjust for gomp_coalesce_chunk change.  Use
OFFSET_* macros.
---
 libgomp/libgomp.h |5 +++
 libgomp/target.c  |  101 +++-
 2 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 607f4c2..acf7f8f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -842,6 +842,11 @@ struct target_mem_desc {
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
 
+/* Special offset values.  */
+#define OFFSET_INLINED (~(uintptr_t) 0)
+#define OFFSET_POINTER (~(uintptr_t) 1)
+#define OFFSET_STRUCT (~(uintptr_t) 2)
+
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
diff --git a/libgomp/target.c b/libgomp/target.c
index ab17650..7220ac6 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,8 @@
 #include "plugin-suffix.h"
 #endif
 
+#define FIELD_TGT_EMPTY (~(size_t) 0)
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -206,8 +208,14 @@ goacc_device_copy_async (struct gomp_device_descr *devicep,
 }
 }
 
-/* Infrastructure for coalescing adjacent or nearly adjacent (in device addresses)
-   host to device memory transfers.  */
+/* Infrastructure for coalescing adjacent or nearly adjacent (in device
+   addresses) host to device memory transfers.  */
+
+struct gomp_coalesce_chunk
+{
+  /* The starting and ending point of a coalesced chunk of memory.  */
+  size_t start, end;
+};
 
 struct gomp_coalesce_buf
 {
@@ -215,10 +223,10 @@ struct gomp_coalesce_buf
  it will be copied to the device.  */
   void *buf;
   struct target_mem_desc *tgt;
-  /* Array with offsets, chunks[2 * i] is the starting offset and
- chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address
+  /* Array with offsets, chunks[i].start is the starting offset and
+ chunks[i].end ending offset relative to tgt->tgt_start device address
  of chunks which are to be copied to buf and later copied to device.  */
-  size_t *chunks;
+  struct gomp_coalesce_chunk *chunks;
   /* Number of chunks in chunks array, or -1 if coalesce buffering should not
  be performed.  */
   long chunk_cnt;
@@ -251,14 +259,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
 {
   if (cbuf->chunk_cnt < 0)
 	return;
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  cbuf->chunk_cnt = -1;
 	  return;
 	}
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP)
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end + MAX_COALESCE_BUF_GAP)
 	{
-	  cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len;
+	  cbuf->chunks[cbuf->chunk_cnt-1].end = start + len;
 	  cbuf->use_cnt++;
 	  return;
 	}
@@ -268,8 +276,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
   if (cbuf->use_cnt == 1)
 	cbuf->chunk_cnt--;
 }
-  cbuf->chunks[2 * cbuf->chunk_cnt] = start;
-  cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len;
+  cbuf->chunks[cbuf->chunk_cnt].start = start;
+  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
   cbuf->chunk_cnt++;
   cbuf->use_cnt = 1;
 }
@@ -301,20 +309,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
   if (cbuf)
 {
   uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
-  if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (doff < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  long first = 0;
 	  long last = cbuf->chunk_cnt - 1;
 	  while (first <= last)
 	{
 	  long middle = (first + last) >> 1;
-	  if (cbuf->chunks[2 * middle + 1] <= doff)
+	  if (cbuf->chunks[middle].end <= doff)
 		first = middle + 1;
-	  else if (cbuf->chunks[2 * middle] <= doff)
+	  else if (cbuf->chunks[middle].start <= doff)
 		{
-		  if (doff + sz > cbuf->chunks[2 * middle + 1])
+		  if (doff + sz > cbuf->chunks[middle].end)
 		gomp_fatal ("internal libgomp cbuf error");
-		  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0]),
+		  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
 			  h, sz);
 		  return;
 		}
@@ -538,17 +546,25 @@ gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
 return tgt->list[i].key->tgt->tgt_start

[PATCH 2/6] [og8] Factor out duplicate code in gimplify_scan_omp_clauses

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00824.html

gcc/
* gimplify.c (insert_struct_component_mapping)
(check_base_and_compare_lt): New.
(gimplify_scan_omp_clauses): Outline duplicated code into calls to
above two functions.
---
 gcc/gimplify.c |  307 
 1 files changed, 174 insertions(+), 133 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9be0b70..824e020 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7661,6 +7661,160 @@ demote_firstprivate_pointer (tree decl, gimplify_omp_ctx *ctx)
 }
 }
 
+/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
+   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
+   the struct node to insert the new mapping after (when the struct node is
+   initially created).  PREV_NODE is the first of two or three mappings for a
+   pointer, and is either:
+ - the node before C, when a pair of mappings is used, e.g. for a C/C++
+   array section.
+ - not the node before C.  This is true when we have a reference-to-pointer
+   type (with a mapping for the reference and for the pointer), or for
+   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
+   If SCP is non-null, the new node is inserted before *SCP.
+   if SCP is null, the new node is inserted before PREV_NODE.
+   The return type is:
+ - PREV_NODE, if SCP is non-null.
+ - The newly-created ALLOC or RELEASE node, if SCP is null.
+ - The second newly-created ALLOC or RELEASE node, if we are mapping a
+   reference to a pointer.  */
+
+static tree
+insert_struct_component_mapping (enum tree_code code, tree c, tree struct_node,
+ tree prev_node, tree *scp)
+{
+  enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA
+			  || code == OACC_EXIT_DATA)
+			 ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
+
+  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  tree cl = scp ? prev_node : c2;
+  OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
+  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
+  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
+  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
+  if (struct_node)
+OMP_CLAUSE_CHAIN (struct_node) = c2;
+
+  /* We might need to create an additional mapping if we have a reference to a
+ pointer (in C++).  Don't do this if we have something other than a
+ GOMP_MAP_ALWAYS_POINTER though, i.e. a GOMP_MAP_TO_PSET.  */
+  if (OMP_CLAUSE_CHAIN (prev_node) != c
+  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
+  && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+	  == GOMP_MAP_ALWAYS_POINTER))
+{
+  tree c4 = OMP_CLAUSE_CHAIN (prev_node);
+  tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  OMP_CLAUSE_SET_MAP_KIND (c3, mkind);
+  OMP_CLAUSE_DECL (c3) = unshare_expr (OMP_CLAUSE_DECL (c4));
+  OMP_CLAUSE_SIZE (c3) = TYPE_SIZE_UNIT (ptr_type_node);
+  OMP_CLAUSE_CHAIN (c3) = prev_node;
+  if (!scp)
+	OMP_CLAUSE_CHAIN (c2) = c3;
+  else
+	cl = c3;
+}
+
+  if (scp)
+*scp = c2;
+
+  return cl;
+}
+
+/* Called initially with ORIG_BASE non-null, sets PREV_BITPOS and PREV_POFFSET
+   to the offset of the field given in BASE.  Return type is 1 if BASE is equal
+   to *ORIG_BASE after stripping off ARRAY_REF and INDIRECT_REF nodes and
+   calling get_inner_reference, else 0.
+
+   Called subsequently with ORIG_BASE null, compares the offset of the field
+   given in BASE to PREV_BITPOS, PREV_POFFSET. Returns -1 if the base object
+   has changed, 0 if the new value has a higher bit position than that
+   described by the aforementioned arguments, or 1 if the new value is less
+   than them.  Used for (insertion) sorting components after a GOMP_MAP_STRUCT
+   mapping.  */
+
+static int
+check_base_and_compare_lt (tree base, tree *orig_base, tree decl,
+			   poly_int64 *prev_bitpos,
+			   poly_offset_int *prev_poffset)
+{
+  tree offset;
+  poly_int64 bitsize, bitpos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep = 0;
+  poly_offset_int poffset;
+
+  if (orig_base)
+{
+  while (TREE_CODE (base) == ARRAY_REF)
+	base = TREE_OPERAND (base, 0);
+
+  if (TREE_CODE (base) == INDIRECT_REF)
+	base = TREE_OPERAND (base, 0);
+}
+  else
+{
+  if (TREE_CODE (base) == ARRAY_REF)
+	{
+	  while (TREE_CODE (base) == ARRAY_REF)
+	base = TREE_OPERAND (base, 0);
+	  if (TREE_CODE (base) != COMPONENT_REF
+	  || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
+	return -1;
+	}
+  else if (TREE_CODE (base) == INDIRECT_REF
+	   && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
+	   && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
+		   == REFERENCE_TYPE))
+	base = TREE_OPERAND (base, 0);
+}
+
+  base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &reversep, &volatilep);
+
+  if (orig_base)
+*orig_base = 

[PATCH 3/3] OpenACC 2.6 manual deep copy support (attach/detach)

2018-11-10 Thread Julian Brown

This patch implements the bulk of support for OpenACC 2.6 manual deep
copy for the C, C++ and Fortran front-ends, the middle end and the
libgomp runtime.  I've incorporated parts of the patches previously
posted by Cesar:

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01941.html
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01942.html
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01943.html
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01946.html

The patch also supersedes the patch posted earlier to support OpenACC 2.5
"update" directives with Fortran derived types:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00153.html

Some brief notes:

 * Struct members mapped with a tuple of map(to/from), optional pset and
   an always_pointer are rewritten in gimplify_scan_omp_clauses to use
   a new GOMP_MAP_ATTACH mapping type instead of the final
   GOMP_MAP_ALWAYS_POINTER. Explicit "attach" clauses also use the
   GOMP_MAP_ATTACH mapping, and explicit "detach" uses GOMP_MAP_DETACH.

   This means that the new "attach operation" takes place when, and only
   when, the GOMP_MAP_ATTACH appears explicitly in the list of clauses
   (as rewritten by gimplify.c).  Similarly for GOMP_MAP_DETACH.

 * The runtime needs to keep track of potentially multiple "attachment
   counters" for each mapped struct/derived type.  The way I've
   implemented this is as a simple array of shorts, where each element
   maps 1-to-1 onto logical "slots" in the mapped struct.  The attachment
   counters are associated with the block of memory containing the
   structure in the host's address space, hence the array is allocated
   on-demand in the splay_tree_key_s structure.  This does unfortunately
   grow that structure a little in all cases.
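
To make the second note concrete, here is a minimal sketch of the "array of shorts, one per slot, allocated on demand" idea.  It is not the libgomp code: struct mapped_block, slot_attach and slot_detach are hypothetical names, a slot is assumed to be one pointer-sized word of the host block, and the rule that only the first attach and the last detach touch the device pointer is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-mapping bookkeeping.  Only the
   attach_count idea is taken from the notes above.  */
struct mapped_block
{
  uintptr_t host_start;          /* Address of the host object.  */
  uintptr_t host_end;            /* One past the end of the host object.  */
  unsigned short *attach_count;  /* NULL until the first attach.  */
};

/* Number of pointer-sized slots covered by the block (assumed layout).  */
static size_t
slot_count (const struct mapped_block *b)
{
  return (b->host_end - b->host_start) / sizeof (void *);
}

/* Return the counter for the pointer stored at host address ADDR,
   allocating the zero-filled counter array on first use.  */
static unsigned short *
counter_for (struct mapped_block *b, uintptr_t addr)
{
  assert (addr >= b->host_start && addr < b->host_end);
  size_t idx = (addr - b->host_start) / sizeof (void *);
  if (!b->attach_count)
    {
      b->attach_count = calloc (slot_count (b), sizeof (unsigned short));
      if (!b->attach_count)
	abort ();
    }
  return &b->attach_count[idx];
}

/* Count an attach.  Returns 1 if this is the first attach for the slot,
   i.e. the point where the device pointer would be rewritten.  */
static int
slot_attach (struct mapped_block *b, uintptr_t addr)
{
  unsigned short *c = counter_for (b, addr);
  return (*c)++ == 0;
}

/* Count a detach.  Returns 1 if the counter dropped to zero, i.e. the
   point where the device pointer would be restored.  */
static int
slot_detach (struct mapped_block *b, uintptr_t addr)
{
  unsigned short *c = counter_for (b, addr);
  if (*c == 0)
    return 0;
  return --(*c) == 0;
}
```

A block that is never attached to keeps attach_count == NULL, so the only cost in that case is the extra pointer field in the structure.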

Tested alongside the other patches in the series and bootstrapped. OK?

Julian

ChangeLog

gcc/c-family/
* c-pragma.h (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_ATTACH,
PRAGMA_OACC_CLAUSE_DETACH.

gcc/c/
* c-parser.c (c_parser_omp_clause_name): Add parsing of attach and
detach clauses.
(c_parser_omp_variable_list): Allow deref (->) in variable lists.
(c_parser_oacc_data_clause): Support attach and detach clauses.
(c_parser_oacc_all_clauses): Likewise.
(OACC_DATA_CLAUSE_MASK, OACC_ENTER_DATA_CLAUSE_MASK)
(OACC_KERNELS_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK): Add
PRAGMA_OACC_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DETACH.
* c-typeck.c (handle_omp_array_sections_1): Reject subarrays for attach
and detach.  Support deref.
(c_oacc_check_attachments): New function.
(c_finish_omp_clauses): Check attach/detach arguments for being
pointers using above.  Support deref.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Support attach and detach
clauses.
(cp_parser_omp_var_list_no_open): Support deref.
(cp_parser_oacc_data_clause): Support attach and detach clauses.
(cp_parser_oacc_all_clauses): Likewise.
(OACC_DATA_CLAUSE_MASK, OACC_ENTER_DATA_CLAUSE_MASK)
(OACC_KERNELS_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK): Add
PRAGMA_OACC_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DETACH.
* semantics.c (handle_omp_array_sections_1): Reject subarrays for
attach and detach.
(cp_oacc_check_attachments): New function.
(finish_omp_clauses): Use above function.  Allow structure fields and
class members to appear in OpenACC data clauses.  Support deref.

gcc/fortran/
* gfortran.h (gfc_omp_map_op): Add OMP_MAP_ATTACH, OMP_MAP_DETACH.
* openmp.c (gfc_match_omp_variable_list): Add allow_derived parameter.
Parse derived-type member accesses if true.
(omp_mask2): Add OMP_CLAUSE_ATTACH, OMP_CLAUSE_DETACH.
(gfc_match_omp_map_clause): Add allow_derived parameter.  Pass to
gfc_match_omp_variable_list.
(gfc_match_omp_clauses): Support attach and detach.  Support derived
types for appropriate OpenACC directives.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES)
(OACC_ENTER_DATA_CLAUSES): Add OMP_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSES): Add OMP_CLAUSE_DETACH.
(check_symbol_not_pointer): Don't disallow pointer objects of derived
type.
(resolve_oacc_data_clauses): Don't disallow allocatable derived types.
(resolve_omp_clauses): Perform duplicate checking only for non-derived
type component accesses (plain variables and arrays or array sections).
Support component refs.
* trans-openmp.c (gfc_omp_privatize_by_reference): Support component
refs.
(gfc_trans_omp_clauses): Support component refs, attach and detach
clauses.

gcc/
* gimplify.c (gimplify_omp_var_data): Add GOVD_MAP_HAS_ATTACHMENTS.

[PATCH 2/3] Factor out duplicate code in gimplify_scan_omp_clauses

2018-11-10 Thread Julian Brown

This patch, created while trying to figure out the open-coded linked-list
handling in gimplify_scan_omp_clauses, factors out four somewhat
repetitive portions of that function into two new outlined functions.
This was done largely mechanically; the actual lines of executed code are
more-or-less the same.  That means the interfaces to the new functions
are somewhat eccentric though, and could no doubt be improved.  I've tried
to add commentary to the best of my understanding, but suggestions for
improvements are welcome!
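
For readers unfamiliar with the idiom, the linked-list handling in question boils down to inserting a node in front of whatever a "pointer to the link" currently refers to.  The sketch below shows that idiom on a plain singly-linked list.  It is an illustration only: struct node, insert_before and insert_sorted are hypothetical names, and an integer key stands in for the field offsets that the real code compares on OMP clause chains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the real code chains OMP clauses instead.  */
struct node
{
  int key;
  struct node *next;
};

/* Insert N in front of the node that *LINK currently points to.  LINK may
   be the address of the list head or of some node's next field.  */
static void
insert_before (struct node **link, struct node *n)
{
  n->next = *link;
  *link = n;
}

/* Insertion sort step: place N so that keys stay in ascending order.  */
static void
insert_sorted (struct node **head, struct node *n)
{
  struct node **link = head;
  while (*link && (*link)->key <= n->key)
    link = &(*link)->next;
  insert_before (link, n);
}
```

Walking the list with a pointer to the link, rather than a pointer to the node, avoids a special case for insertion at the head.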

As a bonus, one apparent bug introduced during an earlier refactoring
to use the polynomial types has been fixed (I think!): "known_eq (o1,
2)" should have been "known_eq (o1, o2)".

Tested alongside other patches in this series and the async patches. OK?

ChangeLog

gcc/
* gimplify.c (insert_struct_component_mapping)
(check_base_and_compare_lt): New.
(gimplify_scan_omp_clauses): Outline duplicated code into calls to
above two functions.
---
 gcc/gimplify.c | 307 -
 1 file changed, 174 insertions(+), 133 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 61dca24..274edc0 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7967,6 +7967,160 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p)
   return 1;
 }
 
+/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
+   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
+   the struct node to insert the new mapping after (when the struct node is
+   initially created).  PREV_NODE is the first of two or three mappings for a
+   pointer, and is either:
+ - the node before C, when a pair of mappings is used, e.g. for a C/C++
+   array section.
+ - not the node before C.  This is true when we have a reference-to-pointer
+   type (with a mapping for the reference and for the pointer), or for
+   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
+   If SCP is non-null, the new node is inserted before *SCP.
+   if SCP is null, the new node is inserted before PREV_NODE.
+   The return type is:
+ - PREV_NODE, if SCP is non-null.
+ - The newly-created ALLOC or RELEASE node, if SCP is null.
+ - The second newly-created ALLOC or RELEASE node, if we are mapping a
+   reference to a pointer.  */
+
+static tree
+insert_struct_component_mapping (enum tree_code code, tree c, tree struct_node,
+                                 tree prev_node, tree *scp)
+{
+  enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA
+                              || code == OACC_EXIT_DATA)
+                             ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
+
+  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  tree cl = scp ? prev_node : c2;
+  OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
+  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
+  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
+  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
+  if (struct_node)
+    OMP_CLAUSE_CHAIN (struct_node) = c2;
+
+  /* We might need to create an additional mapping if we have a reference to a
+     pointer (in C++).  Don't do this if we have something other than a
+     GOMP_MAP_ALWAYS_POINTER though, i.e. a GOMP_MAP_TO_PSET.  */
+  if (OMP_CLAUSE_CHAIN (prev_node) != c
+      && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
+      && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+          == GOMP_MAP_ALWAYS_POINTER))
+    {
+      tree c4 = OMP_CLAUSE_CHAIN (prev_node);
+      tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+      OMP_CLAUSE_SET_MAP_KIND (c3, mkind);
+      OMP_CLAUSE_DECL (c3) = unshare_expr (OMP_CLAUSE_DECL (c4));
+      OMP_CLAUSE_SIZE (c3) = TYPE_SIZE_UNIT (ptr_type_node);
+      OMP_CLAUSE_CHAIN (c3) = prev_node;
+      if (!scp)
+        OMP_CLAUSE_CHAIN (c2) = c3;
+      else
+        cl = c3;
+    }
+
+  if (scp)
+    *scp = c2;
+
+  return cl;
+}
+
+/* Called initially with ORIG_BASE non-null, sets PREV_BITPOS and PREV_POFFSET
+   to the offset of the field given in BASE.  Returns 1 if BASE is equal
+   to *ORIG_BASE after stripping off ARRAY_REF and INDIRECT_REF nodes and
+   calling get_inner_reference, else 0.
+
+   Called subsequently with ORIG_BASE null, compares the offset of the field
+   given in BASE to PREV_BITPOS, PREV_POFFSET. Returns -1 if the base object
+   has changed, 0 if the new value has a higher bit position than that
+   described by the aforementioned arguments, or 1 if the new value is less
+   than them.  Used for (insertion) sorting components after a GOMP_MAP_STRUCT
+   mapping.  */
+
+static int
+check_base_and_compare_lt (tree base, tree *orig_base, tree decl,
+			   poly_int64 *prev_bitpos,
+			   poly_offset_int *prev_poffset)
+{
+  tree offset;
+  poly_int64 bitsize, bitpos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep = 0;
+  poly_offset_int poffset;
+
+  if (orig_base)
+{
+  while (TREE_CODE (base) == ARRAY_REF)
+	base = 

[PATCH 1/3] Host-to-device transfer coalescing & magic offset value self-documentation

2018-11-10 Thread Julian Brown

This patch (by Cesar, with some minor additional changes) replaces usage
of several magic constants in target.c with named macros, and replaces
the flat array of size_t pairs used for coalescing host-to-device copies
with an array of a new struct with start/end fields instead.
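
To make the data-structure change concrete, here is a minimal sketch of
coalescing with start/end chunks. It is an illustration only, not the
libgomp code: chunk_add, chunk_find, demo_lookup and the MAX_GAP value are
hypothetical names and numbers chosen for the example, and the use-count
bookkeeping of the real gomp_coalesce_buf_add is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the struct introduced by the patch.  */
struct chunk { size_t start, end; };

/* Hypothetical gap threshold; stands in for MAX_COALESCE_BUF_GAP.  */
#define MAX_GAP 64

/* Append [start, start + len) to a sorted chunk array, merging it into the
   last chunk when the gap is small.  Returns the new chunk count.  The
   caller must pass regions in increasing address order.  */
size_t chunk_add(struct chunk *chunks, size_t count, size_t start, size_t len)
{
  if (count > 0)
    {
      assert(start >= chunks[count - 1].end);
      if (start < chunks[count - 1].end + MAX_GAP)
        {
          chunks[count - 1].end = start + len;
          return count;
        }
    }
  chunks[count].start = start;
  chunks[count].end = start + len;
  return count + 1;
}

/* Binary search for the chunk containing offset OFF (end is exclusive).
   Returns the chunk index, or -1 if OFF lies in no chunk.  */
long chunk_find(const struct chunk *chunks, long count, size_t off)
{
  long first = 0, last = count - 1;
  while (first <= last)
    {
      long middle = first + (last - first) / 2;
      if (chunks[middle].end <= off)
        first = middle + 1;
      else if (chunks[middle].start <= off)
        return middle;
      else
        last = middle - 1;
    }
  return -1;
}

/* Build chunks from three fixed requests and look up OFF.  */
long demo_lookup(size_t off)
{
  struct chunk chunks[3];
  size_t count = 0;
  count = chunk_add(chunks, count, 0, 16);   /* new chunk [0, 16) */
  count = chunk_add(chunks, count, 32, 16);  /* gap of 16: merged, [0, 48) */
  count = chunk_add(chunks, count, 200, 8);  /* gap too large: [200, 208) */
  return chunk_find(chunks, (long) count, off);
}
```

With named fields, the index arithmetic of the flat array (2 * i and
2 * i + 1) disappears from both the append and the search paths.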

Tested and bootstrapped alongside the other patches in this series
(plus the async patches). OK?

Julian


ChangeLog

libgomp/
* libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define.
* target.c (FIELD_TGT_EMPTY): Define.
(gomp_coalesce_chunk): New.
(gomp_coalesce_buf): Use above instead of flat array of size_t pairs.
(gomp_coalesce_buf_add): Adjust for above change.
(gomp_copy_host2dev): Likewise.
(gomp_map_val): Use OFFSET_* macros instead of magic constants.  Write
as switch instead of list of ifs.
(gomp_map_vars_async): Adjust for gomp_coalesce_chunk change.  Use
OFFSET_* macros.
---
 libgomp/libgomp.h |   5 +++
 libgomp/target.c  | 101 --
 2 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index dac8dc4..cb25e86 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -902,6 +902,11 @@ struct target_mem_desc {
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
 
+/* Special offset values.  */
+#define OFFSET_INLINED (~(uintptr_t) 0)
+#define OFFSET_POINTER (~(uintptr_t) 1)
+#define OFFSET_STRUCT (~(uintptr_t) 2)
+
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
diff --git a/libgomp/target.c b/libgomp/target.c
index f3e2332..2bfc7e2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,8 @@
 #include "plugin-suffix.h"
 #endif
 
+#define FIELD_TGT_EMPTY (~(size_t) 0)
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -205,8 +207,14 @@ goacc_device_copy_async (struct gomp_device_descr *devicep,
 }
 }
 
-/* Infrastructure for coalescing adjacent or nearly adjacent (in device addresses)
-   host to device memory transfers.  */
+/* Infrastructure for coalescing adjacent or nearly adjacent (in device
+   addresses) host to device memory transfers.  */
+
+struct gomp_coalesce_chunk
+{
+  /* The starting and ending point of a coalesced chunk of memory.  */
+  size_t start, end;
+};
 
 struct gomp_coalesce_buf
 {
@@ -214,10 +222,10 @@ struct gomp_coalesce_buf
  it will be copied to the device.  */
   void *buf;
   struct target_mem_desc *tgt;
-  /* Array with offsets, chunks[2 * i] is the starting offset and
- chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address
+  /* Array with offsets, chunks[i].start is the starting offset and
+ chunks[i].end ending offset relative to tgt->tgt_start device address
  of chunks which are to be copied to buf and later copied to device.  */
-  size_t *chunks;
+  struct gomp_coalesce_chunk *chunks;
   /* Number of chunks in chunks array, or -1 if coalesce buffering should not
  be performed.  */
   long chunk_cnt;
@@ -250,14 +258,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
 {
   if (cbuf->chunk_cnt < 0)
 	return;
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  cbuf->chunk_cnt = -1;
 	  return;
 	}
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP)
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end + MAX_COALESCE_BUF_GAP)
 	{
-	  cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len;
+	  cbuf->chunks[cbuf->chunk_cnt-1].end = start + len;
 	  cbuf->use_cnt++;
 	  return;
 	}
@@ -267,8 +275,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
   if (cbuf->use_cnt == 1)
 	cbuf->chunk_cnt--;
 }
-  cbuf->chunks[2 * cbuf->chunk_cnt] = start;
-  cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len;
+  cbuf->chunks[cbuf->chunk_cnt].start = start;
+  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
   cbuf->chunk_cnt++;
   cbuf->use_cnt = 1;
 }
@@ -300,20 +308,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
   if (cbuf)
 {
   uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
-  if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (doff < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  long first = 0;
 	  long last = cbuf->chunk_cnt - 1;
 	  while (first <= last)
 	{
 	  long middle = (first + last) >> 1;
-	  if (cbuf->chunks[2 * middle + 1] <= doff)
+	  if (cbuf->chunks[middle].end <= doff)
 		first = middle + 1;
-	  else if (cbuf->chunks[2 * middle] <= doff)
+	  else if (cbuf->chunks[middle].start <= doff)
 		{
-		  if (doff + sz > cbuf->chunks[2 * middle + 1])
+		  if (doff + sz > cbuf->chunks[middle].end)
 		gomp_fatal 

[PATCH 0/3] OpenACC 2.6 manual deep copy support (attach/detach)

2018-11-10 Thread Julian Brown

Hi,

This patch series adds support for OpenACC 2.6's "manual deep copy"
feature.  This consists of three main parts:

 * Variable lists in data clauses can specify members of structs (in
   C/C++) or derived types (in Fortran). In C/C++ we allow either "." or
   "->" to be used to select members of structs or pointers-to-structs
   respectively. Fortran uses "%", as in the base language.

 * Struct and derived type members that are pointers trigger new
   "attach" and "detach" operations. The typical supported case is for a
   struct to be copied verbatim to the target initially. Subsequently,
   attach operations can be used to rewrite pointers to host memory
   contained in the struct to point to device memory instead. In this
   way, structs (and derived types) can be used fairly naturally in
   offloaded code. The detach operation restores pointers to point to
   host memory, e.g. before the whole struct is copied verbatim back to
   the host again.

 * There are new explicit attach/detach clauses in all three supported
   languages, as well as new behaviour for existing clauses
   (copy/copyin/copyout, etc.) implementing the semantics described above.
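
As an illustration of the typical case described above, here is a small
sketch in C. The struct and the function name scaled_sum are made up for
the example, and the exact attach/detach behaviour of the copyin and delete
clauses is as I read the OpenACC 2.6 spec, so treat it as an assumption
rather than a statement of what this series guarantees.

```c
#include <assert.h>
#include <stdlib.h>

struct vec { int n; float *data; };

/* Sum K * data[i] over a struct with a pointer member.  When built without
   OpenACC support the pragmas are ignored and everything runs on the
   host.  */
float scaled_sum(int n, float k)
{
  struct vec s;
  float total = 0.0f;

  assert(n > 0);
  s.n = n;
  s.data = malloc((size_t) n * sizeof *s.data);
  if (s.data == NULL)
    return -1.0f;
  for (int i = 0; i < n; i++)
    s.data[i] = (float) i;

  /* Copy the struct verbatim first; its 'data' member still holds a host
     address at this point.  */
#pragma acc enter data copyin(s)
  /* Copy the pointed-to array.  Because 's' is already present, this is
     expected to attach, i.e. rewrite the device copy of s.data to point to
     device memory.  */
#pragma acc enter data copyin(s.data[0:n])

#pragma acc parallel loop present(s) reduction(+:total)
  for (int i = 0; i < n; i++)
    total += s.data[i] * k;

  /* Release in reverse order: the member (detaching the pointer), then the
     struct itself.  */
#pragma acc exit data delete(s.data[0:n])
#pragma acc exit data delete(s)

  free(s.data);
  return total;
}
```

For example, scaled_sum (8, 2.0f) sums 0 through 7 scaled by two.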

For more details, see the OpenACC 2.6 spec, or the Deep Copy Attach and
Detach Technical Report (TR-16-1) on the OpenACC site.

The patches are split into three parts, the first two of which are
tangentially-related cleanups, and the third of which contains the bulk
of the changes. I'll write more about those in their respective emails.

This patch series relies on the libgomp async implementation rework done
by Chung-Lin, posted previously:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01424.html

Julian Brown (3):
  Host-to-device transfer coalescing & magic offset value
self-documentation
  Factor out duplicate code in gimplify_scan_omp_clauses
  OpenACC 2.6 manual deep copy support (attach/detach)

 gcc/c-family/c-pragma.h|   2 +
 gcc/c/c-parser.c   |  34 +-
 gcc/c/c-typeck.c   |  59 +++-
 gcc/cp/parser.c|  38 +-
 gcc/cp/semantics.c |  75 +++-
 gcc/fortran/gfortran.h |   2 +
 gcc/fortran/openmp.c   | 145 +---
 gcc/fortran/trans-openmp.c |  78 -
 gcc/gimplify.c | 390 +
 gcc/omp-low.c  |   3 +
 gcc/testsuite/c-c++-common/goacc/mdc-1.c   |  54 +++
 gcc/testsuite/c-c++-common/goacc/mdc-2.c   |  62 
 gcc/testsuite/g++.dg/goacc/mdc.C   |  68 
 gcc/testsuite/gfortran.dg/goacc/data-clauses.f95   |  38 +-
 gcc/testsuite/gfortran.dg/goacc/derived-types.f90  |  77 
 .../gfortran.dg/goacc/enter-exit-data.f95  |  24 +-
 gcc/tree-pretty-print.c|   9 +
 include/gomp-constants.h   |   8 +
 libgomp/libgomp.h  |  23 +-
 libgomp/libgomp.map|  10 +
 libgomp/oacc-async.c   |   4 +-
 libgomp/oacc-int.h |   2 +-
 libgomp/oacc-mem.c |  86 -
 libgomp/oacc-parallel.c| 220 +---
 libgomp/openacc.h  |   6 +
 libgomp/target.c   | 292 ---
 .../libgomp.oacc-c-c++-common/deep-copy-1.c|  24 ++
 .../libgomp.oacc-c-c++-common/deep-copy-2.c|  29 ++
 .../libgomp.oacc-c-c++-common/deep-copy-3.c|  34 ++
 .../libgomp.oacc-c-c++-common/deep-copy-4.c|  87 +
 .../libgomp.oacc-c-c++-common/deep-copy-5.c|  81 +
 .../testsuite/libgomp.oacc-fortran/deep-copy-1.f90 |  35 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-2.f90 |  33 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-3.f90 |  34 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-4.f90 |  49 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-5.f90 |  57 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-6.f90 |  61 
 .../testsuite/libgomp.oacc-fortran/deep-copy-7.f90 |  89 +
 .../testsuite/libgomp.oacc-fortran/deep-copy-8.f90 |  41 +++
 .../libgomp.oacc-fortran/derived-type-1.f90|  28 ++
 .../testsuite/libgomp.oacc-fortran/update-2.f90| 284 +++
 41 files changed, 2406 insertions(+), 369 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/mdc-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/mdc-2.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/mdc.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-types.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/d

Re: Default compute dimensions (runtime)

2018-10-05 Thread Julian Brown
Hi,

Continuing the thread from here:

https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00198.html

On Wed, 3 Feb 2016 19:52:09 +0300
Alexander Monakov  wrote:

> On Wed, 3 Feb 2016, Nathan Sidwell wrote:
> > You can only override at runtime those dimensions that you said
> > you'd override at runtime when you compiled your program.  
> 
> Ah, I see.  That's not obvious to me, so perhaps added documentation
> can be expanded to explain that?  (I now see that the plugin silently
> drops user-provided dimensions where a value recorded at compile time
> is present; not sure if that'd be worth a runtime diagnostic, could
> be very noisy) 

This version of the patch has slightly-expanded documentation.

> > > I don't see why you say that because cuDeviceGetAttribute provides
> > > CU_DEVICE_ATTRIBUTE_WARP_SIZE,
> > > CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
> > > CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X (which is not too useful for
> > > this case) and cuFuncGetAttribute that allows to get a
> > > per-function thread limit. There's a patch on gomp-nvptx branch
> > > that adds querying some of those to the plugin.  
> > 
> > thanks.  There doesn't appear to be one for number of physical CTAs
> > though, right?  
> 
> Sorry, I don't understand the question: CTA is a logical entity.  One
> could derive limit of possible concurrent CTAs from number of SMs
> (CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT) multiplied by how many
> CTAs fit on one multiprocessor.  The latter figure can be taken as a
> rough worst-case value, or semi-intelligent per-kernel estimate based
> on register limits (there's code on gomp-nvptx branch that does
> this), or one can use the cuOcc* API to ask the driver for a precise
> per-kernel figure.

The runtime part of the patch already appears to have been committed as
part of the following patch:

https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01589.html

The compile-time part of the patch has not made it upstream yet, though. Thus,
this rebased and retested patch consists of the parsing changes (for
-fopenacc-dim=X:Y:Z, allowing '-') and warning changes (for strange
partitioning choices), plus associated testsuite adjustments.
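
To show the shape of the option syntax, here is a rough sketch of a parser
for an X:Y:Z triple. It is not oacc_parse_default_dims: the names
parse_dims and dim_of, the DIM_* constants, and the reading of '-' as
"leave this dimension to be chosen at runtime" and of an omitted field as
"use the default" are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical encodings for the special cases.  */
enum { DIM_DEFAULT = 0, DIM_RUNTIME = -1, DIM_ERROR = -2 };

/* Parse "X:Y:Z" into DIMS.  Each field is a positive integer, '-', or
   empty; trailing fields may be left out.  Returns 0 on success and -1 on
   malformed input.  */
int parse_dims(const char *text, int dims[3])
{
  const char *pos = text;

  for (int i = 0; i < 3; i++)
    {
      if (i > 0)
        {
          if (*pos == '\0')
            {
              dims[i] = DIM_DEFAULT;  /* trailing field omitted */
              continue;
            }
          if (*pos != ':')
            return -1;
          pos++;
        }

      if (*pos == '-')
        {
          dims[i] = DIM_RUNTIME;
          pos++;
        }
      else if (*pos == ':' || *pos == '\0')
        dims[i] = DIM_DEFAULT;
      else
        {
          char *end;
          errno = 0;
          long val = strtol(pos, &end, 10);
          if (end == pos || errno != 0 || val <= 0 || val > INT_MAX)
            return -1;
          dims[i] = (int) val;
          pos = end;
        }
    }

  /* Anything left over (e.g. a fourth field) is an error.  */
  return *pos == '\0' ? 0 : -1;
}

/* Convenience wrapper: the parsed value of one dimension, or DIM_ERROR.  */
int dim_of(const char *text, int index)
{
  int dims[3];
  assert(index >= 0 && index < 3);
  return parse_dims(text, dims) == 0 ? dims[index] : DIM_ERROR;
}
```

With this sketch, "32:-:" gives 32 for the first dimension, marks the second
for runtime selection and leaves the third at its default, while "1:2:3:4"
is rejected.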

Tested with offloading to NVPTX and bootstrapped.

OK for trunk?

Thanks,

Julian

20xx-xx-xx  Nathan Sidwell  
Tom de Vries  
Thomas Schwinge  
Julian Brown  

gcc/
* doc/invoke.texi (fopenacc-dim): Update.
* omp-offload.c (oacc_parse_default_dims): Update.
(oacc_validate_dims): Emit warnings about strange partitioning choices.

gcc/testsuite/
* c-c++-common/goacc/acc-icf.c: Update.
* c-c++-common/goacc/parallel-dims-1.c: Likewise.
* c-c++-common/goacc/parallel-reduction.c: Likewise.
* c-c++-common/goacc/pr70688.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/routine-4.f90: Likewise.
* gfortran.dg/goacc/routine-level-of-parallelism-1.f90: Likewise.
* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add -w.
* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-warn-1.c: New.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/private-variables.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/pr84028.f90: Likewise.
* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New.

commit a918a8739ae7652250c978b0ececa181a587b0c0
Author: Julian Brown 
Date:   Fri Oct 5 11:11:47 2018 -0700

    OpenACC default compute dimensions

20xx-xx-xx  Nathan Sidw

Re: [PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-10-04 Thread Julian Brown
On Sun, 23 Sep 2018 10:48:52 +0200
Bernhard Reutner-Fischer  wrote:

> On Sat, 22 Sep 2018 at 00:32, Julian Brown 
> wrote:
> 
> @@ -6218,13 +6221,20 @@ add_clause (gfc_symbol *sym, gfc_omp_map_op
> map_op) {
>gfc_omp_namelist *n;
> 
> +  if (!module_oacc_clauses)
> +module_oacc_clauses = gfc_get_omp_clauses ();
> +
> +  if (sym->backend_decl == NULL)
> +gfc_get_symbol_decl (sym);
> +
> +  for (n = module_oacc_clauses->lists[OMP_LIST_MAP]; n != NULL; n =
> n->next)
> +if (n->sym->backend_decl == sym->backend_decl)
> +  return;
> +
> 
> Didn't look too close, but should this throw an error instead of
> silently returning, or was the error emitted earlier?

This fragment doesn't seem to be about error reporting at all: rather, it
de-duplicates symbols that are listed (once) in clauses of "declare"
directives in module blocks. Variables that are listed twice are diagnosed
elsewhere.

As for why the de-duplication is necessary, it seems to be because of
the way that modules are instantiated in programs and in subroutines.
E.g. in declare-allocatable-1.f90, we have something along the lines of:

  module vars
implicit none
integer, parameter :: n = 100
real*8, allocatable :: b(:)
   !$acc declare create (b)
  end module vars

  program test
use vars
...
  end program test

  subroutine sub1
use vars
...
  end subroutine sub1

  subroutine sub2
use vars
...
  end subroutine sub2

The function find_module_oacc_declare_clauses is called for each of
'test', 'sub1' and 'sub2'. But in trans-decl.c:finish_oacc_declare, the
new declare clauses are only attached to the namespace for a FL_PROGRAM
(i.e. 'test'), not for the subroutines. The module_oacc_clauses global
variable is reset only after moving the clauses to a FL_PROGRAM's
namespace, otherwise it accumulates.

Hence, with the above code, we'd scan 'test', find declare clauses, and
attach them to the namespace for 'test'. We'd then reset
module_oacc_clauses.

Then, we'd scan 'sub1', and accumulate declare clauses from 'vars' into
a fresh module_oacc_clauses.

Then we'd scan 'sub2', and accumulate declare clauses from 'vars'
again: this is why the de-duplication in the patch seemed to be
necessary.

This seems wrong to me though, and admits the possibility of clauses
instantiated in a subroutine "leaking" into a subsequent program block.
As a tentative fix, I've tried resetting module_oacc_clauses before
each time the find_module_oacc_declare_clauses traversal takes place,
and removing the de-duplication code.
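
To illustrate the sequence described above, here is a toy model in C (not
the gfortran code). The 'pending' array stands in for module_oacc_clauses,
and scan_unit, add_clause and clauses_seen_by_sub2 are hypothetical names;
the only thing modelled is when the list is reset.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLAUSES 16

enum unit_kind { UNIT_PROGRAM, UNIT_SUBROUTINE };

/* Stand-in for the module_oacc_clauses global.  */
static const char *pending[MAX_CLAUSES];
static int pending_count;

/* Add SYM to the pending list, optionally skipping duplicates.  */
void add_clause(const char *sym, int dedup)
{
  if (dedup)
    for (int i = 0; i < pending_count; i++)
      if (strcmp(pending[i], sym) == 0)
        return;
  assert(pending_count < MAX_CLAUSES);
  pending[pending_count++] = sym;
}

/* Scan one program unit that uses module 'vars', which declares 'b'.
   Returns the number of pending clauses seen for this unit.  */
int scan_unit(enum unit_kind kind, int reset_first, int dedup)
{
  if (reset_first)
    pending_count = 0;  /* the tentative fix: reset before every scan */
  add_clause("b", dedup);
  int seen = pending_count;
  if (kind == UNIT_PROGRAM)
    pending_count = 0;  /* clauses moved to the program's namespace */
  return seen;
}

/* Scan 'test', 'sub1' and 'sub2' in order and report what 'sub2' sees.  */
int clauses_seen_by_sub2(int reset_first, int dedup)
{
  pending_count = 0;
  scan_unit(UNIT_PROGRAM, reset_first, dedup);
  scan_unit(UNIT_SUBROUTINE, reset_first, dedup);
  return scan_unit(UNIT_SUBROUTINE, reset_first, dedup);
}
```

In this model, with neither reset nor de-duplication 'sub2' sees the clause
for 'b' twice; either de-duplicating or resetting before each scan brings
that back to one.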

This seems to work fine for the current tests in the testsuite, but I
wonder why things weren't done like that to start with? The code dates
back to 2015 (by James Norris):

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02367.html

> Furthermore the testcase uses "call abort" which is non-standard.
> We recently moved to "STOP n" in the testsuite, please adjust the new
> testcases accordingly.

Fixed. Re-tested with offloading to NVPTX and bootstrapped. OK?

Thank you,

Julian

ChangeLog

gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise.  Create local storage for
the values being pointed to.  Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.c (add_attributes_to_decl): Enable lowering of OpenACC
declare create, declare copyin and declare deviceptr clauses.
(find_module_oacc_declare_clauses): Relax oacc_declare_create to
OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to
match OpenACC 2.5 semantics.
(finish_oacc_declare): Reset module_oacc_clauses before scanning each
namespace.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_tran

Re: [patch,openacc] C, C++ OpenACC wait diagnostic change

2018-09-28 Thread Julian Brown
On Wed, 26 Sep 2018 14:08:37 -0700
Cesar Philippidis  wrote:

> On 09/26/2018 12:50 PM, Joseph Myers wrote:
> > On Wed, 26 Sep 2018, Cesar Philippidis wrote:
> >   
> >> Attached is an old patch which updated the C and C++ FEs to use
> >> %<)%> for the right ')' symbol. It's mostly a cosmetic change. All
> >> of the changes are self-contained to the OpenACC code path.  
> > 
> > Why is the "before ')'" included in the call to c_parser_error at
> > all? c_parser_error calls c_parse_error which adds its own " before
> > " and token description or expansion, so I'd expect the current
> > error to result in a message ending in something of the form
> > "before X before Y".  

> Julian, I need to start working on deep copy in OpenACC. Can you take
> over this patch? The error handling code in the C FE needs to be
> removed because it's dead.

I agree that the error-handling path in question in the C FE is dead.
The difference is that in C, c_parser_oacc_wait_list parses the open
parenthesis, the list and then the close parenthesis separately, and so
a token sequence like:

   (1

will return an expression list of length 1. In the C++ FE, by contrast, a
cp_parser_parenthesized_expression_list is parsed all in one go, and if
the input is not a well-formed sequence then NULL is returned (or a
zero-length vector for an empty list).

But for C, it does not appear that c_parser_expr_list has a code path
that can return a zero-length list at all. So, we can elide the
diagnostic with no change to compiler behaviour. This patch does that,
and also changes the C++ diagnostic, leading to errors being reported
like:

diag.c: In function 'int main(int, char*)':
diag.c:6:59: error: expected ')' before end of line
6 | #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1
  | ~ ^
  |   )
diag.c:6:59: error: expected integer expression list before end of line 

Actually I'm not too sure how useful the second error line is. Maybe we
should just remove it to improve consistency between C & C++?

The attached has been tested with offloading to nvptx and bootstrapped.
OK?

Thanks,

Julian

2018-XX-YY  James Norris  
Cesar Philippidis  
Julian Brown  

gcc/c/
* c-parser.c (c_parser_oacc_wait_list): Remove dead diagnostic
code.

gcc/cp/
* parser.c (cp_parser_oacc_wait_list): Change error message.

gcc/testsuite/
* c-c++-common/goacc/asyncwait-1.c: Update expected errors.

commit 3a59bdbccc3c2383c0056c74797d698c7d81dce2
Author: Julian Brown 
Date:   Fri Sep 28 05:52:55 2018 -0700

OpenACC wait list diagnostic change

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1f173fc..92a8089 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11597,14 +11597,6 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 return list;
 
   args = c_parser_expr_list (parser, false, true, NULL, NULL, NULL, NULL);
-
-  if (args->length () == 0)
-{
-  c_parser_error (parser, "expected integer expression before ')'");
-  release_tree_vector (args);
-  return list;
-}
-
   args_tree = build_tree_list_vec (args);
 
   for (t = args_tree; t; t = TREE_CHAIN (t))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6696f17..43128e0 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32086,7 +32086,7 @@ cp_parser_oacc_wait_list (cp_parser *parser, location_t clause_loc, tree list)
 
   if (args == NULL || args->length () == 0)
 {
-  cp_parser_error (parser, "expected integer expression before ')'");
+  cp_parser_error (parser, "expected integer expression list");
   if (args != NULL)
 	release_tree_vector (args);
   return list;
diff --git a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index e1840af..2fc8948 100644
--- a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -116,7 +116,7 @@ f (int N, float *a, float *b)
 }
 
 #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-/* { dg-error "expected integer expression before '\\\)'" "" { target c++ } .-1 } */
+/* { dg-error "expected integer expression list before" "" { target c++ } .-1 } */
 {
 for (ii = 0; ii < N; ii++)
 b[ii] = a[ii];
@@ -171,7 +171,7 @@ f (int N, float *a, float *b)
 #pragma acc wait (1,2,,) /* { dg-error "expected (primary-|)expression before" } */
 
 #pragma acc wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-/* { dg-error "expected integer expression before '\\\)'" "" { target c++ } .-1 } */
+/* { dg-error "expected integer expression list before" "" { target c++ } .-1 } */
 
 #pragma acc wait (1,*) /* { dg-error "expected (primary-|)expression before" } */
 


Re: [PATCH][OpenACC] Update deviceptr handling during gimplification

2018-09-25 Thread Julian Brown
On Tue, 7 Aug 2018 15:09:38 -0700
Cesar Philippidis  wrote:

> I had previously posted this patch as part of a monster deviceptr
> patch here
> . This
> patch breaks out the generic gimplifier changes. Essentially, with
> this patch, the gimplifier will now transfer deviceptr data clauses
> using GOMP_MAP_FORCE_DEVICEPTR.
> 
> Is this patch OK for trunk? It bootstrapped / regression tested
> cleanly for x86_64 with nvptx offloading.

This patch also appears to fix the attached test case, which had been
associated with a different deviceptr-related patch on the og8 branch
(the other parts of which are upstream already). Perhaps you'd like to
incorporate this test into your patch? It was by James Norris
originally, IIUC.

Thanks,

Julian

ChangeLog

2018-xx-xx  James Norris  

libgomp/
* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test.
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
@@ -0,0 +1,197 @@
+! { dg-do run }
+
+! Test the deviceptr clause with various directives
+! and in combination with other directives where
+! the deviceptr variable is implied.
+
+subroutine subr1 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc data deviceptr (a)
+
+  !$acc parallel copy (b)
+do i = 1, N
+  a(i) = i * 2
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc end data
+
+end subroutine
+
+subroutine subr2 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  !$acc declare deviceptr (a)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc parallel copy (b)
+do i = 1, N
+  a(i) = i * 4
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr3 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  !$acc declare deviceptr (a)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc kernels copy (b)
+do i = 1, N
+  a(i) = i * 8
+  b(i) = a(i)
+end do
+  !$acc end kernels
+
+end subroutine
+
+subroutine subr4 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc parallel deviceptr (a) copy (b)
+do i = 1, N
+  a(i) = i * 16
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr5 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc kernels deviceptr (a) copy (b)
+do i = 1, N
+  a(i) = i * 32
+  b(i) = a(i)
+end do
+  !$acc end kernels
+
+end subroutine
+
+subroutine subr6 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc parallel deviceptr (a) copy (b)
+do i = 1, N
+  b(i) = i
+end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr7 (a, b)
+  implicit none
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: i = 0
+
+  !$acc data deviceptr (a)
+
+  !$acc parallel copy (b)
+do i = 1, N
+  a(i) = i * 2
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc parallel copy (b)
+do i = 1, N
+  a(i) = b(i) * 2
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc end data
+
+end subroutine
+
+program main
+  use iso_c_binding, only: c_ptr, c_f_pointer
+  implicit none
+  type (c_ptr) :: cp
+  integer, parameter :: N = 8
+  integer, pointer :: fp(:)
+  integer :: i = 0
+  integer :: b(N)
+
+  interface
+function acc_malloc (s) bind (C)
+  use iso_c_binding, only: c_ptr, c_size_t
+  integer (c_size_t), value :: s
+  type (c_ptr) :: acc_malloc
+end function
+  end interface
+
+  cp = acc_malloc (N * sizeof (fp(N)))
+  call c_f_pointer (cp, fp, [N])
+
+  call subr1 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 2) call abort
+  end do
+
+  call subr2 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 4) call abort
+  end do
+
+  call subr3 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 8) call abort
+  end do
+
+  call subr4 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 16) call abort
+  end do
+
+  call subr5 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 32) call abort
+  end do
+
+  call subr6 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i) call abort
+  end do
+
+  call subr7 (fp, b)
+
+  do i = 1, N
+if (b(i) .ne. i * 4) call abort
+  end do
+
+end program main


Re: [PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-09-21 Thread Julian Brown
On Fri, 21 Sep 2018 03:14:22 +0200
Bernhard Reutner-Fischer  wrote:

> > diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
> > index 95ea615..2ac5908 100644
> > --- a/gcc/fortran/trans-array.c
> > +++ b/gcc/fortran/trans-array.c
> > @@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "trans-types.h"
> >  #include "trans-array.h"
> >  #include "trans-const.h"
> > +#include "trans-stmt.h"
> >  #include "dependency.h"  
> 
> please don't mix declarations and definitions, i.e. please put
> gfc_trans_oacc_declare_allocate() into trans-openmp.c, and add the
> declaration to trans.h, in the corresponding /* In trans-openmp.c */
> block there.

Do you mean like this?

Thanks,

Julian

ChangeLog

2018-09-20  Cesar Philippidis  
Julian Brown  

gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise.  Create local storage for
the values being pointed to.  Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.c (add_attributes_to_decl): Enable lowering of OpenACC
declare create, declare copyin and declare deviceptr clauses.
(add_clause): Don't duplicate OpenACC declare clauses.  Populate
sym->backend_decl so that it can be used to determine if two symbols are
unique.
(find_module_oacc_declare_clauses): Relax oacc_declare_create to
OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to 
match OpenACC 2.5 semantics.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.c (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans.h (gfc_trans_oacc_declare_allocate): Declare.

gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.

include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.

libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-array.f90: New test.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test. 
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
commit 2601a2c2c6222026baf0e73cd2d9694c64356e77
Author: Julian Brown 
Date:   Wed Sep 12 20:15:08 2018 -0700

Fortran "declare create"/allocate support for OpenACC

	gcc/
	* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
	create, declare copyin and declare deviceptr to have local lifetimes.
	(convert_to_firstprivate_int): Handle pointer types.
	(convert_from_firstprivate_int): Likewise.  Create local storage for
	the values being pointed to.  Add new orig_type argument.
	(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
	Add orig_type argument to convert_from_firstprivate_int call.
	Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
	firstprivate VLAs.
	* tree-pretty-print.c (dump_omp_clause): Handle
	GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

	gcc/fo

[PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-09-20 Thread Julian Brown
This patch (a combination of several previous patches by Cesar) adds
support for OpenACC 2.5's "declare create" directive with Fortran
allocatable variables (2.13.2. create clause). Allocate and deallocate
statements now allocate/deallocate memory on the target device as well
as on the host.

This works by triggering expansion of executable OpenACC
directives ("enter data" or "exit data") with new
GOMP_MAP_DECLARE_ALLOCATE or GOMP_MAP_DECLARE_DEALLOCATE clauses when
those statements are seen. Unlike other OpenACC functionality, no
additional explicit markup is required in the user's code.

This patch depends on the patch implementing GOMP_MAP_FIRSTPRIVATE_INT
for OpenACC, posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01202.html

Tested alongside that patch with offloading to NVPTX, and bootstrapped.
OK for trunk?

Thanks,

Julian

ChangeLog

2018-09-20  Cesar Philippidis  
Julian Brown  

gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise.  Create local storage for
the values being pointed to.  Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (trans-stmt.h): Include.
(gfc_array_allocate): Call gfc_trans_oacc_declare_allocate for decls
that have oacc_declare_create attribute set.
* trans-decl.c (add_attributes_to_decl): Enable lowering of OpenACC
declare create, declare copyin and declare deviceptr clauses.
(add_clause): Don't duplicate OpenACC declare clauses.  Populate
sym->backend_decl so that it can be used to determine if two symbols are
unique.
(find_module_oacc_declare_clauses): Relax oacc_declare_create to
OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to 
match OpenACC 2.5 semantics.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.c (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans-stmt.h (gfc_trans_oacc_declare_allocate): Declare.

gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.

include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.

libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-array.f90: New test.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test. 
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
From b63d0329fb73679b07f6318b8dd092113d5c8505 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Wed, 12 Sep 2018 20:15:08 -0700
Subject: [PATCH 2/2] Fortran "declare create"/allocate support for OpenACC

	gcc/
	* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
	create, declare copyin and declare deviceptr to have local lifetimes.
	(convert_to_firstprivate_int): Handle pointer types.
	(convert_from_firstprivate_int): Likewise.  Create local storage for
	the values being pointed to.  Add new orig_type argument.
	(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
	Add orig_type argument to convert_from_firstprivate_int call.
	Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
	firstprivate VLAs.
	* tree-pretty-print.c (dump_omp_clause): Handle
	GOMP_MA

[PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-09-20 Thread Julian Brown
This patch (by Cesar) changes the way that mapping of firstprivate
scalars works for OpenACC. For scalars whose type has a size equal to or
smaller than the size of a pointer, rather than copying the value of
the scalar to the target device and having a separate mapping for a
pointer to the copied value, a single "pointer" is mapped whose bits
are a type-punned representation of the value itself.
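
To illustrate the idea of carrying a small scalar in the bits of a pointer-sized slot, here is a minimal sketch in C. The helper names (pack_scalar, unpack_scalar, roundtrip_int, roundtrip_float) are hypothetical and are not part of GCC or libgomp; the sketch assumes, as GCC does, that converting between uintptr_t and void * preserves the bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the GCC/libgomp API: copy a scalar of at
   most pointer size into the bits of a "pointer" slot.  The slot is
   never dereferenced.  */
static void *
pack_scalar (const void *value, size_t size)
{
  uintptr_t bits = 0;
  assert (size <= sizeof bits);
  memcpy (&bits, value, size);
  return (void *) bits;
}

/* Hypothetical helper: recover the scalar from the slot's bits.  */
static void
unpack_scalar (void *slot, void *value, size_t size)
{
  uintptr_t bits = (uintptr_t) slot;
  assert (size <= sizeof bits);
  memcpy (value, &bits, size);
}

/* Round-trip an int through a pointer-sized slot.  */
int
roundtrip_int (int x)
{
  int out;
  unpack_scalar (pack_scalar (&x, sizeof x), &out, sizeof out);
  return out;
}

/* Round-trip a float; the bits are copied, not converted.  */
float
roundtrip_float (float x)
{
  float out;
  unpack_scalar (pack_scalar (&x, sizeof x), &out, sizeof out);
  return out;
}
```

Because the value travels in the slot itself, the receiving side needs no second mapping for a pointed-to cell, which is the saving described above.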

This is a performance optimisation: the idea, IIUC, is that it is a
good idea to avoid having all launched compute resources contend for a
single memory location -- the pointed-to cell containing the scalar on
the device, in this case. Cesar talks about speedups obtained here
(for an earlier version of the patch):

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02171.html

The patch implies an API change for the libgomp plugin, in that it must
now understand that NULL device pointers correspond to host pointers
that are actually type-punned scalars.

Tested with offloading to NVPTX and bootstrapped. OK for mainline?

Julian

ChangeLog

2018-09-20  Cesar Philippidis  
        Julian Brown  

gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New
test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
From 1263a1bef1780fd015f9ee937c2b2df2717f1603 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 17 Sep 2018 19:38:21 -0700
Subject: [PATCH 1/2] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
---
 gcc/omp-low.c  | 171 +++--
 libgomp/oacc-parallel.c|   7 +-
 libgomp/plugin/plugin-nvptx.c  |   2 +-
 .../testsuite/libgomp.oacc-c++/firstprivate-int.C  |  83 +
 .../libgomp.oacc-c-c++-common/firstprivate-int.c   |  67 +++
 .../libgomp.oacc-fortran/firstprivate-int.f90  | 205 +
 6 files changed, 518 insertions(+), 17 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/firstprivate-int.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fdabf67..5fc4a66 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3264,6 +3264,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+    if (maybe_lookup_field (decl, up))
+      return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -7470,6 +7483,88 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something
+   that can be represented by a POINTER_SIZED_INT_NODE.  Any new
+   instructions are appended to GS.  This is primarily used to
+   optimize firstprivate variables, so that small types (less
+   precision than POINTER_SIZE) do not require additional data
+   mappings. */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  if (omp_is_reference (var))
+    type = TREE_TYPE (type);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+    {
+      if (omp_is_reference (var))
+	{
+	  tmp = create_tmp_var (type);
+	  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+	  var = tmp;
+	}
+
+      return fold_convert (pointer_sized_int_node, var);
+    }
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTE

Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-19 Thread Julian Brown
On Fri, 14 Sep 2018 22:49:35 -0400
Julian Brown  wrote:

> > > On 12/09/18 16:16, Richard Biener wrote:
> > > It may well be that there's a better way to solve the problem, or
> > > at least to do the lookups.
> > > 
> > > It may also be that there are some unintended consequences, such
> > > as false name matches, but I don't know of any at present.

> > Possibly, this was an abuse of these hooks, but it's arguably wrong
> > that e.g. handle_alias_pairs has the "assembler name" leak
> > through into the user's source code -- if it's expected that the
> > hook could make arbitrary transformations to the string. (The
> > latter hook is only used by PE code for x86 at present, by the look
> > of it, and the default handles only special-purpose mangling
> > indicated by placing a '*' at the front of the symbol.)  

Two places I've found that currently expose the underlying symbol name
in the user's source code: one (documented!) is C++, where one must
write the mangled symbol name as the alias target:

int foo (int c) { ... }
int bar (int) __attribute__((alias("_Z3fooi")));

another (perhaps obscure) is x86/PE with "fastcall":

__attribute__((fastcall)) void foo(void) { ... }
void bar(void) __attribute__((alias("@foo@0")));

both of which probably suggest that using the decl name, rather than
demangling the assembler name (or using some completely different
solution) was the wrong thing to do.
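
For contrast, here is a minimal C sketch of the same attribute. It assumes GCC (or a compatible compiler) and an object format that supports aliases, such as ELF; on such targets the assembler name of a C function is normally just its identifier, so the alias target happens to coincide with the source-level name and the leak goes unnoticed. The function bodies are made up for illustration.

```c
/* Assumption: GCC-compatible compiler, ELF-like target with alias
   support.  The body of foo is illustrative only.  */
int
foo (int c)
{
  return c + 1;
}

/* The string names foo's assembler-level symbol.  In C on ELF that is
   usually "foo"; in C++ it would have to be the mangled name.  */
int bar (int) __attribute__ ((alias ("foo")));
```

Calling bar (41) here runs the code of foo and returns 42. A target hook that rewrites assembler names arbitrarily would break exactly this coincidence between the two names.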

I'll keep thinking about this...

Julian


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-14 Thread Julian Brown
On Wed, 12 Sep 2018 13:34:06 -0400
Julian Brown  wrote:

> On Wed, 12 Sep 2018 17:31:58 +0100
> Andrew Stubbs  wrote:
> 
> > On 12/09/18 16:16, Richard Biener wrote:  
> > > I think the symptom GCN sees needs to be better understood - like
> > > whether it is generally OK to mangle things arbitrarily.
> > 
> > The name mangling is a horrible workaround for a bug in the HSA
> > runtime code (which we do not own, cannot fix, and would want to
> > support old versions of anyway).  Basically it refuses to load any
> > binary that has the same symbol as another, already loaded, binary,
> > regardless of the symbol's linkage.  Worse, it also rejects any
> > binary that has duplicate symbols within it, despite the fact that
> > it already linked just fine.
> > 
> > Adding the extra lookups is enough to build GCN binaries, with
> > mangled names, whereas the existing name mangling support was either
> > more specialized or bit rotten (I don't know which).
> > 
> > It may well be that there's a better way to solve the problem, or
> > at least to do the lookups.
> > 
> > It may also be that there are some unintended consequences, such as 
> > false name matches, but I don't know of any at present.
> > 
> > Julian, can you comment, please?  
> 
> I did the local-symbol name mangling in two places:
> 
> - The ASM_FORMAT_PRIVATE_NAME macro (good for local statics)
> - The TARGET_MANGLE_DECL_ASSEMBLER_NAME hook (for file-scope
>   local/statics)
> 
> Possibly, this was an abuse of these hooks, but it's arguably wrong
> that e.g. handle_alias_pairs has the "assembler name" leak
> through into the user's source code -- if it's expected that the hook
> could make arbitrary transformations to the string. (The latter hook
> is only used by PE code for x86 at present, by the look of it, and
> the default handles only special-purpose mangling indicated by
> placing a '*' at the front of the symbol.)

One possibility might be to allow
symbol_table::decl_assembler_name_hash and
symbol_table::assembler_names_equal_p to be overridden by a target
hook, and define them for GCN to ignore the symbol "localisation"
magic. At the moment they will just ignore "*" at the start of a
symbol, and a (fixed) user label prefix, no matter what
TARGET_MANGLE_DECL_ASSEMBLER_NAME does.
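
As a rough illustration of that kind of comparison, here is a simplified sketch in C. It is loosely modelled on the behaviour described above and is not the GCC source; the names undecorate and names_equal, and the prefix argument, are hypothetical. A name starting with "*" is treated as a literal assembler name that already carries the user label prefix, while any other name is taken as-is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce NAME to its undecorated form.  A name
   starting with '*' is literal, so it must begin with USER_PREFIX
   after the '*'; return NULL if it does not.  */
static const char *
undecorate (const char *name, const char *user_prefix)
{
  size_t len = strlen (user_prefix);

  if (name[0] != '*')
    return name;
  name++;
  if (len == 0)
    return name;
  if (strncmp (name, user_prefix, len) != 0)
    return NULL;
  return name + len;
}

/* Hypothetical helper: compare two names, ignoring only the leading
   '*' and the fixed user label prefix.  */
bool
names_equal (const char *a, const char *b, const char *user_prefix)
{
  const char *ua = undecorate (a, user_prefix);
  const char *ub = undecorate (b, user_prefix);

  return ua != NULL && ub != NULL && strcmp (ua, ub) == 0;
}
```

With a prefix of "_", names_equal ("foo", "*_foo", "_") holds, but any other rewriting of the name (a made-up "foo.mangled", say) compares unequal. That is why a target-specific override would be needed for lookups to see through GCN's mangling.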

Another way would be to do some appropriate mangling for local symbols
in the assembler, rather than the compiler (though we're using the LLVM
assembler, and so far have got away with not making any invasive
changes to that).

> If we had a symtab_node::get_for_name () using a suitable hash table,
> I think it'd probably be right to use that. Can that be done
> (easily), or is there some equivalent way? Introducing a new hash
> table everywhere for a bug workaround for a relatively obscure
> feature on a single target seems unfortunate.

An "obvious" solution of calling targetm.mangle_decl_assembler_name
before looking up in symtab_node::get_for_asmname, something like:

static void
handle_alias_pairs (void)
{
  alias_pair *p;
  unsigned i;

  for (i = 0; alias_pairs && alias_pairs->iterate (i, &p);)
    {
      tree asmname = targetm.mangle_decl_assembler_name (p->decl, p->target);
      symtab_node *target_node = symtab_node::get_for_asmname (asmname);
      [...]

seems like it could possibly work for handle_alias_pairs, but not so
much for c-pragma.c:maybe_apply_pending_pragma_weaks, where there is no
decl available to pass as the first argument to the target hook.

Julian


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Julian Brown
On Wed, 12 Sep 2018 17:31:58 +0100
Andrew Stubbs  wrote:

> On 12/09/18 16:16, Richard Biener wrote:
> > I think the symptom GCN sees needs to be better understood - like
> > whether it is generally OK to mangle things arbitrarily.
> 
> The name mangling is a horrible workaround for a bug in the HSA
> runtime code (which we do not own, cannot fix, and would want to
> support old versions of anyway).  Basically it refuses to load any
> binary that has the same symbol as another, already loaded, binary,
> regardless of the symbol's linkage.  Worse, it also rejects any
> binary that has duplicate symbols within it, despite the fact that it
> already linked just fine.
> 
> Adding the extra lookups is enough to build GCN binaries, with
> mangled names, whereas the existing name mangling support was either
> more specialized or bit rotten (I don't know which).
> 
> It may well be that there's a better way to solve the problem, or at 
> least to do the lookups.
> 
> It may also be that there are some unintended consequences, such as 
> false name matches, but I don't know of any at present.
> 
> Julian, can you comment, please?

I did the local-symbol name mangling in two places:

- The ASM_FORMAT_PRIVATE_NAME macro (good for local statics)
- The TARGET_MANGLE_DECL_ASSEMBLER_NAME hook (for file-scope
  local/statics)

Possibly, this was an abuse of these hooks, but it's arguably wrong
that e.g. handle_alias_pairs has the "assembler name" leak through into
the user's source code -- if it's expected that the hook could make
arbitrary transformations to the string. (The latter hook is only used
by PE code for x86 at present, by the look of it, and the default
handles only special-purpose mangling indicated by placing a '*' at the
front of the symbol.)

I couldn't find an existing place where the DECL_NAMEs for symbols were
indexed in a hash table, equivalent to the table for assembler names.
Aliases are made via pragmas, so it's not 100% clear to me what the
scoping/lookup rules are supposed to be for those anyway, nor what the
possibility or consequences might be of false matches.

(The "!target" case in maybe_apply_pending_pragma_weaks, if it doesn't
somehow make a false match, just slows down reporting of an error a
little, I think. Similarly in handle_alias_pairs.)

If we had a symtab_node::get_for_name () using a suitable hash table, I
think it'd probably be right to use that. Can that be done (easily), or
is there some equivalent way? Introducing a new hash table everywhere
for a bug workaround for a relatively obscure feature on a single
target seems unfortunate.

Thanks,

Julian


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-11 Thread Julian Brown
On Mon, 10 Sep 2018 20:31:49 -0400
Julian Brown  wrote:

> [...] I think the handling of references can and should match between
> the two APIs (though implementation details of the patch to make that
> so need a little work still).

Here's a new version of the patch, somewhat simplified and slightly more
obviously making the treatment of references between OpenMP and OpenACC
the same. I worried a little about the potential side-effects of making
ctx->target_firstprivatize_array_bases true for parallel and kernels
regions, but test results revealed no problems with doing that and I
think generated code may even be a little better (and more consistent)
in some cases.

For example, one case that is handled differently now is as follows:

#include <stdlib.h>

__attribute__((noinline)) int
bar (int c)
{
  int arr[c];

#pragma acc parallel loop copy(arr)
  for (int i = 0; i < c; i++)
    arr[i] = i;

  for (int i = 0; i < c; i++)
    if (arr[i] != i)
      abort ();

  return arr[c - 1];
}

int main (int argc, char *argv[])
{
  return bar (100);
}

The VLA was previously mapped as:

#pragma omp target oacc_parallel map(tofrom:*arr.1 [len: D.2607]) \
map(alloc:arr [pointer assign, bias: 0]) firstprivate(c)

and is now mapped as:

#pragma omp target oacc_parallel map(tofrom:*arr.1 [len: D.2607]) \
map(firstprivate:arr [pointer assign, bias: 0]) firstprivate(c)

Either works, but IIUC using firstprivate_pointer can be more efficient
if the pointer is dereferenced multiple times in a kernel, since a local
copy of the incoming mapped pointer is made per-thread/workitem.
Generally, array sections are already using firstprivate pointers for
their bases with OpenACC.

Re-tested with offloading to NVPTX and bootstrapped. OK, or any other
comments?

Thanks,

Julian

ChangeLog

2018-09-09  Cesar Philippidis  
    Julian Brown  

PR middle-end/86336

gcc/cp/
* semantics.c (finish_omp_clauses): Treat C++ references the same in
OpenACC as OpenMP.

* gimplify.c (gimplify_scan_omp_clauses): Set
target_firstprivatize_array_bases in OpenACC parallel and kernels
region contexts.  Remove GOMP_MAP_FIRSTPRIVATE_REFERENCE clauses from
OpenACC data regions.

libgomp/
* testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.
commit 6f3d5b86b4413722c3e7ab3ca9a678d7c35b68fe
Author: Julian Brown 
Date:   Thu Sep 6 15:32:50 2018 -0700

[OpenACC] C++ reference mapping

2018-09-09  Cesar Philippidis  
    	Julian Brown  

	PR middle-end/86336

	gcc/cp/
	* semantics.c (finish_omp_clauses): Treat C++ references the same in
	OpenACC as OpenMP.

	* gimplify.c (gimplify_scan_omp_clauses): Set
	target_firstprivatize_array_bases in OpenACC parallel and kernels
	region contexts.  Remove GOMP_MAP_FIRSTPRIVATE_REFERENCE clauses from
	OpenACC data regions.

	libgomp/
	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index f3e5d83..bf3c63a 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6878,7 +6878,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	handle_map_references:
 	  if (!remove
 	  && !processing_template_decl
-	  && (ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
+	  && ort != C_ORT_DECLARE_SIMD
 	  && TYPE_REF_P (TREE_TYPE (OMP_CLAUSE_DECL (c))))
 	{
 	  t = OMP_CLAUSE_DECL (c);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index dbd0f0e..f0eb04a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7513,6 +7513,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   case OMP_TARGET_EXIT_DATA:
   case OACC_DECLARE:
   case OACC_HOST_DATA:
+  case OACC_PARALLEL:
+  case OACC_KERNELS:
 	ctx->target_firstprivatize_array_bases = true;
   default:
 	break;
@@ -8556,7 +8558,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
   if (code == OACC_DATA
 	  && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-	  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER)
+	  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
+	  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 	remove = true;
   if (remove)
 	*list_p = OMP_CLAUSE_CHAIN (c);
diff --git a/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C b/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
index 8e4b296..e5f8707 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
@@ -1,8 +1,7 @@
 // Ensure that a non-scalar dummy arguments which are implicitly used inside
 // offloaded regions are properly mapped using present_or_copy semantics.
 
-// { dg-xfail-if "TODO" { *-*-* } }
-// { dg-excess-errors "ICE" }
+// { dg-do run }
 
 #include 
 


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-10 Thread Julian Brown
On Mon, 10 Sep 2018 22:22:15 +0100
Jason Merrill  wrote:

> On Mon, Sep 10, 2018 at 7:07 PM, Julian Brown
>  wrote:
> > I think it's more accurate to say that OpenACC says nothing about
> > C++ references at all, nor about how unadorned pointers are mapped
> > in copy/copyin/copyout clauses. So arguably we get to choose
> > whatever we want, preferably based on the principle of least
> > surprise. (ICE'ing definitely counts as a surprise!)
> >
> > As noted in a previous email, PGI seems to treat pointers to
> > aggregates specially, mapping them as ptr[0:1], but it's unclear if
> > the same is true for pointers to scalars with their compiler.
> > Neither behaviour seems to be standard-mandated, but this patch
> > extends the idea to references to scalars nonetheless.  
> 
> That certainly seems like the most sensible way of handling references
> to non-arrays.  [...]

To try to clarify things for myself a bit, I tried to figure out better
what the current OpenMP behaviour in GCC is, and what the equivalent
OpenACC behaviour should be. I think the handling of references can and
should match between the two APIs (though implementation details of the
patch to make that so need a little work still).

Pointers (without array sections) are a little more awkward: going by
what OpenMP 4.5 and OpenACC 2.5 say, there does seem to be a deliberate
difference in mapping behaviour, at least for cases that are specified.

Previously, I was confusing the cases marked (*) and (**) below a
little. So, we have:

== OpenMP 4.5 ==

#include <stdio.h>

int
main (int argc, char* argv[])
{
  int arr[32];
  int &myref = arr[16];
  int *myptr = &arr[18];
  const char *sep = "";

  for (int i = 0; i < 32; i++)
    arr[i] = i;

//#pragma omp target // mapped as firstprivate: no effect on host
//#pragma omp target defaultmap(tofrom:scalar) // works
#pragma omp target map(tofrom:myref) // works
  {
    myref = 1000;
  }

#pragma omp target enter data map(to:arr[0:32])

//#pragma omp target // works, mapped as zero-length array section (*)
//#pragma omp target map(tofrom:myptr) // crashes (**)
#pragma omp target map(tofrom:myptr[0:1]) // works
  {
    *myptr = 2000;
  }

#pragma omp target exit data map(from:arr[0:32])

  for (int i = 0; i < 32; i++, sep = ", ")
    printf ("%s%d", sep, arr[i]);

  printf ("\n");

  return 0;
}


== OpenACC 2.5 ==

#include <stdio.h>

int
main (int argc, char* argv[])
{
  int arr[32];
  int &myref = arr[16];
  int *myptr = &arr[18];
  const char *sep = "";

  for (int i = 0; i < 32; i++)
    arr[i] = i;

//#pragma acc parallel // mapped as firstprivate: no effect on host
#pragma acc parallel copy(myref) // works
  {
    myref = 1000;
  }

#pragma acc enter data copyin(arr[0:32])

//#pragma acc parallel // crashes (*)
//#pragma acc parallel copy(myptr) // crashes (**)
//#pragma acc parallel copy(myptr[0:1]) // works
//#pragma acc parallel present(myptr) // runtime error, not present
#pragma acc parallel present(myptr[0:1]) // works
  {
    *myptr = 2000;
  }

#pragma acc exit data copyout(arr[0:32])

  for (int i = 0; i < 32; i++, sep = ", ")
    printf ("%s%d", sep, arr[i]);

  printf ("\n");

  return 0;
}

===

The pointer-mapping cases marked (*), implicit mapping, are the ones
specified in OpenMP 4.5 to map as zero-length array sections. For
OpenACC the pointer is considered a scalar so is mapped as bits (so the
host pointer causes the target to crash on dereference).

The cases marked (**) -- also maybe applicable to C++ "this" --
currently copy as bits on OpenMP and on OpenACC, but could be changed
to map like length-one array sections. Or, they could raise a warning.
There's no apparent difference between OpenMP and OpenACC there though
(in specified behaviour and/or implementation? Despite what I thought
previously) so that's probably a decision for another day.

Cheers,

Julian


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-10 Thread Julian Brown
On Mon, 10 Sep 2018 10:52:47 -0700
Cesar Philippidis  wrote:

> On 09/10/2018 10:37 AM, Jason Merrill wrote:
> > On Mon, Sep 10, 2018 at 4:05 AM, Julian Brown
> >  wrote:  
> >> This patch (by Cesar) changes the way C++ references are mapped in
> >> OpenACC regions, fixing an ICE in the non-scalar-data.C testcase.
> >>
> >> Post-patch, references are mapped like this (from the omplower
> >> dump):
> >>
> >> map(force_present:*x [len: 4]) map(firstprivate ref:x [pointer
> >> assign, bias: 0])
> >>
> >> Tested with offloading to NVPTX and bootstrapped. OK for trunk?
> >>
> >> Thanks,
> >>
> >> Julian
> >>
> >> ChangeLog
> >>
> >> 2018-09-09  Cesar Philippidis  
> >> Julian Brown  
> >>
> >> PR middle-end/86336
> >>
> >> (gimplify_adjust_omp_clauses_1): Update handling of
> >> mapping of C++ references.  
> > 
> > How is reference handling specified differently between OpenMP and
> > OpenACC?  It seems strange for them to differ.  
> 
> Both OpenACC and OpenMP privatize mapped array pointers on the
> accelerator for subarrays in the same way. However, for pointers
> without subarrays, OpenMP treats them as zero-length arrays, whereas
> OpenACC treats them as ordinary scalars so that the pointer target
> will not get remapped on the accelerator (which is odd because
> there's a deviceptr clause for that). Scalars in C++ are special,
> because references must be treated like an array of length one, for lack
> of a better terminology.

I think it's more accurate to say that OpenACC says nothing about C++
references at all, nor about how unadorned pointers are mapped in
copy/copyin/copyout clauses. So arguably we get to choose whatever we
want, preferably based on the principle of least surprise. (ICE'ing
definitely counts as a surprise!)

As noted in a previous email, PGI seems to treat pointers to
aggregates specially, mapping them as ptr[0:1], but it's unclear if the
same is true for pointers to scalars with their compiler. Neither
behaviour seems to be standard-mandated, but this patch extends the
idea to references to scalars nonetheless.

> > In any case, you shouldn't need to check lang_GNU_CXX since we're
> > already calling the langhook.  
> 
> Julian, can you look into this? I'm traveling tomorrow.

Yes, I'll continue to look at this patch.

Thanks,

Julian


[PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-09 Thread Julian Brown
This patch (by Cesar) changes the way C++ references are mapped in
OpenACC regions, fixing an ICE in the non-scalar-data.C testcase.

Post-patch, references are mapped like this (from the omplower dump):

map(force_present:*x [len: 4]) map(firstprivate ref:x [pointer assign, bias: 0])

Tested with offloading to NVPTX and bootstrapped. OK for trunk?

Thanks,

Julian

ChangeLog

2018-09-09  Cesar Philippidis  
Julian Brown  

PR middle-end/86336

gcc/cp/
* semantics.c (finish_omp_clauses): Map C++ references by value and
FIRSTPRIVATE_REFERENCE.

* gimplify.c (gimplify_scan_omp_clauses): Remove FIRSTPRIVATE_REFERENCE
mappings in OpenACC data regions.
(gimplify_adjust_omp_clauses_1): Update handling of mapping of C++
references.

libgomp/
* testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.
commit fed5f1044b3d7add83065b3bbe2ba2a95a1e95ce
Author: Julian Brown 
Date:   Thu Sep 6 15:32:50 2018 -0700

[OpenACC] C++ reference mapping

2018-09-09  Cesar Philippidis  
	Julian Brown  

	gcc/cp/
	* semantics.c (finish_omp_clauses): Map C++ references by value and
	FIRSTPRIVATE_REFERENCE.

	* gimplify.c (gimplify_scan_omp_clauses): Remove FIRSTPRIVATE_REFERENCE
	mappings in OpenACC data regions.
	(gimplify_adjust_omp_clauses_1): Update handling of mapping of C++
	references.

	libgomp/
	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 676de01..707f054 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6877,7 +6877,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	handle_map_references:
 	  if (!remove
 	  && !processing_template_decl
-	  && (ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
+	  && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
+		  || ort == C_ORT_ACC)
 	  && TYPE_REF_P (TREE_TYPE (OMP_CLAUSE_DECL (c))))
 	{
 	  t = OMP_CLAUSE_DECL (c);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index dbd0f0e..4011cb2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8556,7 +8556,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
   if (code == OACC_DATA
 	  && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-	  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER)
+	  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
+	  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 	remove = true;
   if (remove)
 	*list_p = OMP_CLAUSE_CHAIN (c);
@@ -8872,7 +8873,9 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  OMP_CLAUSE_CHAIN (nc) = OMP_CLAUSE_CHAIN (clause);
 	  OMP_CLAUSE_CHAIN (clause) = nc;
 	}
-  else if (gimplify_omp_ctxp->target_firstprivatize_array_bases
+  else if ((((gimplify_omp_ctxp->region_type & ORT_ACC)
+		 && lang_GNU_CXX ())
+		|| gimplify_omp_ctxp->target_firstprivatize_array_bases)
 	   && lang_hooks.decls.omp_privatize_by_reference (decl))
 	{
 	  OMP_CLAUSE_DECL (clause) = build_simple_mem_ref (decl);
diff --git a/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C b/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
index 8e4b296..e5f8707 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
@@ -1,8 +1,7 @@
 // Ensure that a non-scalar dummy arguments which are implicitly used inside
 // offloaded regions are properly mapped using present_or_copy semantics.
 
-// { dg-xfail-if "TODO" { *-*-* } }
-// { dg-excess-errors "ICE" }
+// { dg-do run }
 
 #include 
 


[PATCH, OpenACC] Support Fortran derived type members in "acc update" directives

2018-09-03 Thread Julian Brown
Hi,

This patch (by Cesar) adds support for Fortran derived type members in
"acc update" directives (as specified in OpenACC 2.5, section 2.14.4,
"Update Directive"). As of OpenACC 2.5, it seems that only "update"
directives may specify derived type members in this way.

Tested with offloading to NVPTX and bootstrapped.

OK to apply?

Thanks,

Julian

2018-09-03  Cesar Philippidis  

gcc/fortran/
* openmp.c (gfc_match_omp_variable_list): New allow_derived
argument. (gfc_match_omp_map_clause): Update call to
gfc_match_omp_variable_list. (gfc_match_omp_clauses): Update
calls to gfc_match_omp_map_clause. (gfc_match_oacc_update):
Update call to gfc_match_omp_clauses. (resolve_omp_clauses):
Permit derived type variables in ACC UPDATE clauses.
* trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived type
members.

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Update handling of ACC
UPDATE variables.

gcc/testsuite/
* gfortran.dg/goacc/derived-types.f90: New test.

libgomp/
* testsuite/libgomp.oacc-fortran/update-2.f90: New test.
* testsuite/libgomp.oacc-fortran/derived-type-1.f90: New test.
commit a7e1f0958d38bfda7474fbaf6bb31951351ab66d
Author: Julian Brown 
Date:   Thu Aug 30 17:00:58 2018 -0700

Derived types for acc update.

2018-09-03  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_variable_list): New allow_derived argument.
	(gfc_match_omp_map_clause): Update call to gfc_match_omp_variable_list.
	(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clause.
	(gfc_match_oacc_update): Update call to gfc_match_omp_clauses.
	(resolve_omp_clauses): Permit derived type variables in ACC UPDATE
	clauses.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived type members.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Update handling of ACC
	UPDATE variables.

	gcc/testsuite/
	* gfortran.dg/goacc/derived-types.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/update-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/derived-type-1.f90: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 94a7f7e..80a4c05 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -222,7 +222,8 @@ static match
 gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 			 bool allow_common, bool *end_colon = NULL,
 			 gfc_omp_namelist ***headp = NULL,
-			 bool allow_sections = false)
+			 bool allow_sections = false,
+			 bool allow_derived = false)
 {
   gfc_omp_namelist *head, *tail, *p;
   locus old_loc, cur_loc;
@@ -248,7 +249,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	case MATCH_YES:
 	  gfc_expr *expr;
 	  expr = NULL;
-	  if (allow_sections && gfc_peek_ascii_char () == '(')
+	  if ((allow_sections && gfc_peek_ascii_char () == '(')
+	  || (allow_derived && gfc_peek_ascii_char () == '%'))
 	{
 	  gfc_current_locus = cur_loc;
 	  m = gfc_match_variable (, 0);
@@ -914,10 +916,12 @@ omp_inv_mask::omp_inv_mask (const omp_mask ) : omp_mask (m)
mapping.  */
 
 static bool
-gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
+gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
+			  bool allow_derived)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, false, NULL, , true)
+  if (gfc_match_omp_variable_list ("", list, false, NULL, , true,
+   allow_derived)
   == MATCH_YES)
 {
   gfc_omp_namelist *n;
@@ -935,7 +939,7 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		   bool first = true, bool needs_space = true,
-		   bool openacc = false)
+		   bool openacc = false, bool allow_derived = false)
 {
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
   locus old_loc;
@@ -1039,7 +1043,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (>lists[OMP_LIST_MAP],
-	   OMP_MAP_TOFROM))
+	   OMP_MAP_TOFROM, allow_derived))
 	continue;
 	  if (mask & OMP_CLAUSE_COPYIN)
 	{
@@ -1047,7 +1051,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		{
 		  if (gfc_match ("copyin ( ") == MATCH_YES
 		  && gfc_match_omp_map_clause (>lists[OMP_LIST_MAP],
-		   OMP_MAP_TO))
+		   OMP_MAP_TO, allow_derived))
 		continue;
 		}
 	  else if (gfc_match_omp_variable_list ("copyin (",
@@ -1058,7 +1062,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask ma

Re: [PATCH, OpenACC] Support C++ "this" in OpenACC directives (PR66053)

2018-09-03 Thread Julian Brown
On Fri, 31 Aug 2018 16:20:08 +0200
Jakub Jelinek  wrote:

> On Fri, Aug 31, 2018 at 10:04:07AM -0400, Nathan Sidwell wrote:
> > On 08/30/2018 04:27 PM, Jason Merrill wrote:
> >   
> > > On Thu, Aug 30, 2018 at 3:31 PM, Julian Brown
> > >  wrote: 
> > > > "Apart from parsing, it's necessary to prevent the "cannot take
> > > > the address of 'this', which is an rvalue expression" error
> > > > from appearing  
> > 
> > Breaking a rather fundamental language attribute does not seem wise.
> >   
> > > Why does referring to this[0:1] require making 'this' addressable?
> > > Surely what we're interested in is the value of 'this', not the
> > > address.  
> > Yes, transferring the this pointer is very unlikely to be what the
> > user wants -- the object being referred to contains the data.  It
> > might be wise to look at the DR's and changes relating to lambdas
> > and this capture.  Those changes now make it much harder to simply
> > capture the pointer unintentionally.  
> 
> Yeah, I agree we shouldn't try to make this addressable.
> Does OpenACC try to map the base of the array section (rather than
> what e.g. OpenMP does, privatize the pointer base instead and assign
> the pointer the new value inside of the region)?
> Even if it is mapped, can't it be mapped by taking an address of a
> temporary initialized from this?

For OpenACC, two mappings are created for an array section: one for the
data (to, from, tofrom, etc.) and a firstprivate pointer with a bias to
locate the (possibly virtual) zeroth element of the array. I think
that's the same as OpenMP.
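
To illustrate the role of the bias, here is a minimal host-only sketch.
It is a simulation under stated assumptions, not libgomp's actual data
structures: SectionMapping, map_section and device_read are hypothetical
names invented for this example, and the "device" copy is just a
std::vector holding the mapped section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical record of one mapped array section a[lo:len].
struct SectionMapping {
  std::size_t lo;           // first mapped element index
  std::size_t bias;         // bytes from the base pointer to the section start
  std::vector<int> device;  // simulated device copy holding only the section
};

// Copy a[lo:len] to the simulated device and record the bias.
SectionMapping map_section(const int *base, std::size_t lo, std::size_t len) {
  SectionMapping m;
  m.lo = lo;
  m.bias = lo * sizeof(int);
  m.device.assign(base + lo, base + lo + len);
  return m;
}

// Read element i as device code would: indices stay relative to the
// (virtual) zeroth element, so the bias is subtracted from the byte offset.
int device_read(const SectionMapping &m, std::size_t i) {
  assert(i >= m.lo && i - m.lo < m.device.size());
  std::size_t byte_offset = i * sizeof(int) - m.bias;
  int value;
  std::memcpy(&value,
              reinterpret_cast<const char *>(m.device.data()) + byte_offset,
              sizeof value);
  return value;
}
```

The point of the sketch is that device code can keep using the original
indices (for example a[3] for a section a[3:4]) even though only the
section itself was copied.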

For the test case given, it's sufficient to merely allow "this" to be
used as the base pointer for an array section. That usage doesn't
require "this" to be made addressable.

The this[0:1] syntax is accepted by PGI
(https://www.pgroup.com/resources/docs/18.4/x86/openacc-gs/index.htm,
2.4 C++ Classes in OpenACC) -- in order to copy "the class itself" to
the accelerator.

Referring to class member variables in OpenACC clauses (as the example
in that section does also) is still problematic in GCC, though.

PGI also allows the user to specify just "this" in OpenACC clauses,
which presumably does the same thing as specifying this[0:1]. For PGI,
but not for OpenACC <= 2.5, that seems to follow a general case for
pointers to structs (2.3. C Structs in OpenACC), "A pointer to a scalar
struct is treated as a one-element array, and should be shaped as
r[0:1]". That's notably different from OpenMP 4.5, which treats plain
mapped pointers as zero-length array sections, and also differs from
the current behaviour of GCC (which bizarrely, IIUC, is to copy the
bits of the host pointer verbatim to the target). OpenACC 2.5 arguably
leaves the behaviour unspecified for pointers without an explicit array
section.
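
As a minimal sketch of the r[0:1] shaping described above: the struct
"point" and the function "scale" are hypothetical names made up for
illustration and come from neither the PGI guide nor the patch. Built
without -fopenacc, the pragma is ignored and the code simply runs on the
host.

```cpp
#include <cassert>

struct point {
  int x;
  int y;
};

// Doubles both fields of *r. The copy clause shapes the pointer as a
// one-element array, so the whole struct is copied to the accelerator
// and back.
void scale(point *r) {
  assert(r != nullptr);
#pragma acc parallel num_gangs(1) copy(r[0:1])
  {
    r->x *= 2;
    r->y *= 2;
  }
}
```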

The attached patch allows basic class-wrapping-an-array kinds of
usages, anyway. Re-tested with offloading to nvptx and bootstrapped. OK
to apply?

Thanks,

Julian

2018-09-03  Joseph Myers  
Julian Brown  

PR C++/66053

* semantics.c (handle_omp_array_sections_1): Allow array
sections with "this" pointer for OpenACC.
commit 355411e5415f65e09a06f42d400761fff065f7c7
Author: Julian Brown 
Date:   Fri Aug 31 17:30:20 2018 -0700

Allow this[:] array slices for OpenACC

2018-09-03  Joseph Myers  
	Julian Brown  

	* semantics.c (handle_omp_array_sections_1): Allow array sections with
	"this" pointer for OpenACC.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 676de01..98511ed 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -4598,7 +4598,8 @@ handle_omp_array_sections_1 (tree c, tree t, vec ,
 		  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  return error_mark_node;
 	}
-  else if (TREE_CODE (t) == PARM_DECL
+  else if (ort == C_ORT_OMP
+	   && TREE_CODE (t) == PARM_DECL
 	   && DECL_ARTIFICIAL (t)
 	   && DECL_NAME (t) == this_identifier)
 	{
diff --git a/libgomp/testsuite/libgomp.oacc-c++/this.C b/libgomp/testsuite/libgomp.oacc-c++/this.C
new file mode 100644
index 000..510c690
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/this.C
@@ -0,0 +1,43 @@
+#include 
+#include 
+using namespace std;
+
+class test {
+  public:
+  int a;
+
+  test ()
+  {
+a = -1;
+#pragma acc enter data copyin (this[0:1])
+  }
+
+  ~test ()
+  {
+#pragma acc exit data delete (this[0:1])
+  }
+
+  void set (int i)
+  {
+a = i;
+#pragma acc update device (this[0:1])
+  }
+
+  int get ()
+  {
+#pragma acc update host (this[0:1])
+return a;
+  }
+};
+
+int
+main ()
+{
+  test t;
+
+  t.set (4);
+  if (t.get () != 4)
+abort ();
+
+  return 0;
+}


[PATCH, OpenACC] Support C++ "this" in OpenACC directives (PR66053)

2018-08-30 Thread Julian Brown
This patch (by Joseph) allows "this" to be used in OpenACC directives,
following -- IIUC -- the behaviour of other compilers. The standard (as
of OpenACC 2.5) does not appear to have explicit language either
permitting or forbidding such usage.

Joseph's original commentary is in the bug report (ca. 2015,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66053):

"This patch, for gomp-4_0-branch, adds support for C++ "this" in
OpenACC directives.  (This patch does not do anything to handle OpenMP
differently from OpenACC; that - bug 66053 - will need to be resolved
for mainline, either deciding these cases should be accepted for
OpenMP or making the parsing only accept them in OpenACC directives
and not OpenMP ones.)

"Apart from parsing, it's necessary to prevent the "cannot take the
address of 'this', which is an rvalue expression" error from appearing
when "this" is used in such contexts.  This patch duly adds a new
argument to cxx_mark_addressable (default false so callers don't all
need to change) to allow disabling that error, passing that argument
in all calls that seem relevant to OpenACC directives."

AFAICT though, this attached version of the patch does still forbid
"this" in OpenMP directives (apart from "declare simd"). I couldn't see
that there was any change to the OpenMP spec language in 4.5.

Tested with offloading to NVPTX and bootstrapped.

OK to apply?

Julian

ChangeLog

20xx-xx-xx  Joseph Myers  

PR C++/66053

gcc/cp/
* cp-tree.h (enum cxx_mark_addressable_flags): New.
(cxx_mark_addressable): Use it.  Adjust users.
* parser.c (cp_parser_omp_var_list_no_open): Handle RID_THIS.
* semantics.c (handle_omp_array_sections_1)
(handle_omp_array_sections, finish_omp_reduction_clause)
(finish_omp_clauses): Pass CXX_MARK_ADDRESSABLE_FLAGS_ALLOW_THIS
to cxx_mark_addressable.  Enforce "this" usage limitation only
for OpenMP.
* typeck.c (cp_build_array_ref): Adjust cxx_mark_addressable
call. (cxx_mark_addressable): Handle
CXX_MARK_ADDRESSABLE_FLAGS_ALLOW_THIS.

libgomp/
* testsuite/libgomp.oacc-c++/this.C: New test.
commit 12294a1345d981b72ef61d285057fb4c7e378fd7
Author: Julian Brown 
Date:   Wed Aug 29 18:19:44 2018 -0700

Support C++ "this" in OpenACC directives

20xx-xx-xx  Joseph Myers  

	PR C++/66053

	gcc/cp/
	* cp-tree.h (enum cxx_mark_addressable_flags): New.
	(cxx_mark_addressable): Use it.  Adjust users.
	* parser.c (cp_parser_omp_var_list_no_open): Handle RID_THIS.
	* semantics.c (handle_omp_array_sections_1)
	(handle_omp_array_sections, finish_omp_reduction_clause)
	(finish_omp_clauses): Pass CXX_MARK_ADDRESSABLE_FLAGS_ALLOW_THIS
	to cxx_mark_addressable.  Enforce "this" usage limitation only for
	OpenMP.
	* typeck.c (cp_build_array_ref): Adjust cxx_mark_addressable call.
	(cxx_mark_addressable): Handle CXX_MARK_ADDRESSABLE_FLAGS_ALLOW_THIS.

	libgomp/
	* testsuite/libgomp.oacc-c++/this.C: New test.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 43e452c..127e15a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7214,7 +7214,24 @@ extern void cxx_print_error_function		(diagnostic_context *,
 		 struct diagnostic_info *);
 
 /* in typeck.c */
-extern bool cxx_mark_addressable		(tree, bool = false);
+
+/* Flags for cxx_mark_addressable.  */
+
+enum cxx_mark_addressable_flags
+{
+  CXX_MARK_ADDRESSABLE_FLAGS_NONE = 0,
+  /* This is for ARRAY_REF construction - in that case we don't want
+ to look through VIEW_CONVERT_EXPR from VECTOR_TYPE to ARRAY_TYPE,
+ it is fine to use ARRAY_REFs for vector subscripts on vector
+ register variables.  */
+  CXX_MARK_ADDRESSABLE_FLAGS_ARRAY_REF = 1 << 0,
+  /* Allow `current_class_ptr' to be addressable.  */
+  CXX_MARK_ADDRESSABLE_FLAGS_ALLOW_THIS = 1 << 1
+};
+
+extern bool cxx_mark_addressable		(tree,
+		 enum cxx_mark_addressable_flags flags
+		 = CXX_MARK_ADDRESSABLE_FLAGS_NONE);
 extern int string_conv_p			(const_tree, const_tree, int);
 extern tree cp_truthvalue_conversion		(tree);
 extern tree condition_conversion		(tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 92e6b40..0cf2526 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -31726,7 +31726,7 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	  OMP_CLAUSE_CHAIN (u) = list;
 	  list = u;
 	}
-  else
+  else if (decl != error_mark_node)
 	list = tree_cons (decl, NULL_TREE, list);
 
 get_comma:
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 676de01..9a722df 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -4598,7 +4598,8 @@ handle_omp_array_sections_1 (tree c, tree t, vec ,
 		  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  return error_mark_node;
 

[PATCH, OpenACC] OpenACC subarray data alignment in Fortran

2018-08-29 Thread Julian Brown
This patch (by Cesar) removes several pointer-to-character casts
emitted during OpenMP/OpenACC clause processing in the Fortran front
end.

It's not quite clear to me why these casts were needed to start with,
but they are problematic in that an offload target will create a copy
of the array data with natural alignment for the character type, rather
than for the true element type of the array. That leads to alignment
violations on NVPTX, e.g. with the included new test.

Simply removing the casts appears to work. The data mapping then uses
the natural alignment of the array's element type.
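
A minimal sketch of why the pointee type matters for placement follows.
It assumes a simple bump-style placement of mapped data; place_offset is
a hypothetical helper for this example and does not model how libgomp
or the NVPTX plugin actually allocates memory.

```cpp
#include <cassert>
#include <cstddef>

// Round cursor up to the next multiple of alignment (alignment > 0).
// Models placing a mapped object at the first suitably aligned offset.
std::size_t place_offset(std::size_t cursor, std::size_t alignment) {
  assert(alignment > 0);
  return (cursor + alignment - 1) / alignment * alignment;
}

// Offset chosen when the data is described as character data.
std::size_t offset_as_char(std::size_t cursor) {
  return place_offset(cursor, alignof(char));
}

// Offset chosen when the data keeps its true element type (here double).
std::size_t offset_as_double(std::size_t cursor) {
  return place_offset(cursor, alignof(double));
}
```

With one byte already in use, the char-typed view may place the data at
offset 1, which is not a valid address for a double, while the
element-typed view rounds up to a multiple of alignof(double).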

Tested with offloading to NVPTX and bootstrapped. Posted previously as
part of the patch:

https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01896.html

OK?

Julian

ChangeLog

20xx-xx-xx  Cesar Philippidis  
Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't cast ptr into a
character pointer.
(gfc_trans_omp_clauses_1): Likewise.

libgomp/
* testsuite/libgomp.oacc-fortran/data-alignment.f90: New test.

gcc/testsuite/
* gfortran.dg/goacc/pr70828.f90: Adjust expected output.
commit 1e4c518992560dec161a2d1f65aad560d7b12518
Author: Julian Brown 
Date:   Wed Aug 29 12:42:27 2018 -0700

OpenACC subarray data alignment in fortran

20xx-xx-xx  Cesar Philippidis  
	Julian Brown  

	gcc/fortran/
	* trans-openmp.c (gfc_omp_finish_clause): Don't cast ptr into a
	character pointer.
	(gfc_trans_omp_clauses_1): Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/data-alignment.f90: New test.

	gcc/testsuite/
	* gfortran.dg/goacc/pr70828.f90: Adjust expected output.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 86be407..9c7b74b 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1098,7 +1098,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   gfc_start_block ();
   tree type = TREE_TYPE (decl);
   tree ptr = gfc_conv_descriptor_data_get (decl);
-  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
   ptr = build_fold_indirect_ref (ptr);
   OMP_CLAUSE_DECL (c) = ptr;
   c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
@@ -2145,8 +2144,6 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		{
 		  tree type = TREE_TYPE (decl);
 		  tree ptr = gfc_conv_descriptor_data_get (decl);
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-	  ptr);
 		  ptr = build_fold_indirect_ref (ptr);
 		  OMP_CLAUSE_DECL (node) = ptr;
 		  node2 = build_omp_clause (input_location,
@@ -2239,8 +2236,6 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
    OMP_CLAUSE_SIZE (node), elemsz);
 		}
 		  gfc_add_block_to_block (block, );
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-  ptr);
 		  OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr);
 
 		  if (POINTER_TYPE_P (TREE_TYPE (decl))
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr70828.f90 b/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
index 2e58120..6604fb3 100644
--- a/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
@@ -18,5 +18,5 @@ program test
   !$acc end data
 end program test
 
-! { dg-final { scan-tree-dump-times "omp target oacc_data map\\(tofrom:MEM\\\[\\(c_char \\*\\)\_\[0-9\]+\\\] \\\[len: _\[0-9\]+\\\]\\) map\\(alloc:data \\\[pointer assign, bias: _\[0-9\]+\\\]\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_parallel map\\(force_present:MEM\\\[\\(c_char \\*\\)D\\.\[0-9\]+\\\] \\\[len: D\\.\[0-9\]+\\\]\\) map\\(alloc:data \\\[pointer assign, bias: D\\.\[0-9\]+\\\]\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data map\\(tofrom:MEM\\\[\\(integer\\(kind=\[48\]\\)\\\[0:\\\] \\*\\)\_\[0-9\]+\\\] \\\[len: _\[0-9\]+\\\]\\) map\\(alloc:data \\\[pointer assign, bias: _\[0-9\]+\\\]\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel map\\(force_present:MEM\\\[\\(integer\\(kind=\[48\]\\)\\\[0:\\\] \\*\\)D\\.\[0-9\]+\\\] \\\[len: D\\.\[0-9\]+\\\]\\) map\\(alloc:data \\\[pointer assign, bias: D\\.\[0-9\]+\\\]\\)" 1 "gimple" } }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90 b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
new file mode 100644
index 000..38c9005
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
@@ -0,0 +1,35 @@
+! Test if the array data associated with c is properly aligned
+! on the accelerator.  If it is not, this program will crash.
+
+! { dg-do run }
+
+integer function routine_align()
+  implicit none
+  integer, parameter :: n = 1
+  real*8, dimension(:), allocatable :: c
+  integer :: i, idx
+
+  allocate (c(n))
+  

Re: [PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
On Tue, 28 Aug 2018 12:23:22 -0700
Cesar Philippidis  wrote:

> On 08/28/2018 12:19 PM, Julian Brown wrote:
> 
> > diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> > index f038f4c..86be407 100644
> > --- a/gcc/fortran/trans-openmp.c
> > +++ b/gcc/fortran/trans-openmp.c
> > @@ -1045,9 +1045,13 @@ gfc_omp_finish_clause (tree c, gimple_seq
> > *pre_p) 
> >tree decl = OMP_CLAUSE_DECL (c);
> >  
> > -  /* Assumed-size arrays can't be mapped implicitly, they have to
> > be
> > - mapped explicitly using array sections.  */
> > -  if (TREE_CODE (decl) == PARM_DECL
> > +  /* Assumed-size arrays can't be mapped implicitly, they have to
> > be mapped
> > + explicitly using array sections.  An exception is if the
> > array is
> > + mapped explicitly in an enclosing data construct for OpenACC,
> > in which
> > + case we see GOMP_MAP_FORCE_PRESENT here and do not need to
> > raise an
> > + error.  */
> > +  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
> > +  && TREE_CODE (decl) == PARM_DECL
> >&& GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
> >&& GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) ==
> > GFC_ARRAY_UNKNOWN && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),  
> 
> This is specific to OpenACC, and needs to be guarded as such.

Are you sure that condition can be true for OpenMP? I'd assumed not...

Julian


[PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
This follow-up patch lets the "inheritance" of mappings from OpenACC
data constructs also work for Fortran assumed-size arrays. Without it,
the Fortran front end rejects such arrays with an error (arguably
prematurely).

Tested alongside the previous patch with offloading to nvptx.

OK to apply?

Thanks,

Julian

2018-08-28  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
assumed-size array if present in a lexically-enclosing data construct.

libgomp/
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
From 9214ffc6bb2ac7cf023f4e62ca324b1a47123ffc Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Tue, 28 Aug 2018 09:01:15 -0700
Subject: [PATCH 2/2] Assumed-size array fix

2018-08-28  Julian Brown  

	gcc/fortran/
	* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
	assumed-size array if present in a lexically-enclosing data construct.

	libgomp/
	* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
---
 gcc/fortran/trans-openmp.c | 10 ---
 .../testsuite/libgomp.oacc-fortran/pr70828-4.f90   | 31 ++
 2 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c..86be407 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1045,9 +1045,13 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be
- mapped explicitly using array sections.  */
-  if (TREE_CODE (decl) == PARM_DECL
+  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
+ explicitly using array sections.  An exception is if the array is
+ mapped explicitly in an enclosing data construct for OpenACC, in which
+ case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
+ error.  */
+  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
+  && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90
new file mode 100644
index 000..01da999
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90
@@ -0,0 +1,31 @@
+! Subarrays declared on data construct: assumed-size array.
+
+subroutine s1(n, arr)
+  integer :: n
+  integer :: arr(*)
+
+  !$acc data copy(arr(5:n-10))
+  !$acc parallel loop
+  do i = 10, n - 10
+ arr(i) = i
+  end do
+  !$acc end parallel loop
+  !$acc end data
+end subroutine s1
+
+program test
+  integer, parameter :: n = 100
+  integer i, data(n)
+
+  data(:) = 0
+
+  call s1(n, data)
+
+  do i = 1, n
+ if ((i < 10 .or. i > n-10)) then
+if ((data(i) .ne. 0)) call abort
+ else if (data(i) .ne. i) then
+call abort
+ end if
+  end do
+end program test
-- 
1.8.1.1


