OpenACC ICV acc-default-async-var (was: [gomp4] Async related additions to OpenACC runtime library)

2018-11-17 Thread Thomas Schwinge
Hi Chung-Lin!

On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang wrote:
> This patch adds:
> 
> // New functions to set/get the current default async queue
> void acc_set_default_async (int);
> int acc_get_default_async (void);
> 
> and _async versions of a few existing API functions:

(Please, separate patches for separate features/changes.)


Reviewing the OpenACC ICV acc-default-async-var changes here.

> --- include/gomp-constants.h  (revision 245382)
> +++ include/gomp-constants.h  (working copy)

>  /* Asynchronous behavior.  Keep in sync with
> libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
>  
> +#define GOMP_ASYNC_DEFAULT   0
>  #define GOMP_ASYNC_NOVAL -1
>  #define GOMP_ASYNC_SYNC  -2

This means that "acc_set_default_async(acc_async_default)" will set
acc-default-async-var to "0", that is, the same as
"acc_set_default_async(0)".  It thus follows that
"async"/"async(acc_async_noval)" is the same as "async(0)".  Is that
intentional?

It is in line with the OpenACC 2.5 specification: "initial value [...] is
implementation defined", but I wonder why map it to "async(0)", and not
to its own, unspecified, but separate queue.  In the latter case,
"acc_async_default" etc. would then map to a negative value to denote
this unspecified, but separate queue (and your changes would need to be
adapted for that).
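
To make the aliasing concern concrete, here is a minimal, self-contained C model (not libgomp code).  The three GOMP_ASYNC_* values are taken from the quoted gomp-constants.h hunk.  Everything prefixed "model_" or "MODEL_" is a hypothetical name introduced only for illustration, including the value -3 for a separate default queue.  The model assumes that an "async" clause without an argument resolves to the current value of acc-default-async-var.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the quoted include/gomp-constants.h hunk.  */
#define GOMP_ASYNC_DEFAULT 0
#define GOMP_ASYNC_NOVAL -1
#define GOMP_ASYNC_SYNC -2

/* Hypothetical: a negative value naming a separate, unspecified queue.
   Nothing in the patch defines this; it only illustrates the alternative.  */
#define MODEL_SEPARATE_DEFAULT -3

/* Hypothetical stand-in for the per-thread acc-default-async-var ICV.  */
static int model_default_async_var = GOMP_ASYNC_DEFAULT;

/* Model of acc_set_default_async, without validation or device checks.  */
void
model_set_default_async (int async)
{
  model_default_async_var = async;
}

/* Model of queue selection: "async" without an argument (NOVAL) resolves
   to the ICV; any other clause argument is used as given.  */
int
model_resolve_queue (int async)
{
  /* Clause arguments below acc_async_sync are invalid in the patch.  */
  assert (async >= GOMP_ASYNC_SYNC);
  return async == GOMP_ASYNC_NOVAL ? model_default_async_var : async;
}

/* True if two clause arguments end up on the same queue in this model.  */
bool
model_same_queue (int a, int b)
{
  return model_resolve_queue (a) == model_resolve_queue (b);
}
```

In this model, after model_set_default_async (GOMP_ASYNC_DEFAULT) the call model_same_queue (GOMP_ASYNC_NOVAL, 0) is true, which is the aliasing described above; after model_set_default_async (MODEL_SEPARATE_DEFAULT) it is false.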

I have not verified whether we're currently already having (on trunk
and/or openacc-gcc-8-branch) the semantics of the queue of
"async(acc_async_noval)" mapping to the same queue as "async(0)"?

I'm fine to accept your changes as proposed (basically, everything from
your patch posted that has a "default_async" in it), for that's an
incremental improvement anyway.  But -- unless you tell me I've
misunderstood something -- I'll get the issue I raised clarified with the
OpenACC technical committee, and we will then later improve this further.

No matter what the outcome, the implementation-defined behavior should be
documented.  (Can do that once we get the intentions clarified.)

> --- libgomp/oacc-async.c  (revision 245382)
> +++ libgomp/oacc-async.c  (working copy)

> +int
> +acc_get_default_async (void)
> +{
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  if (!thr || !thr->dev)
> +    gomp_fatal ("no device active");

I suppose that instead, this might also either just "return
acc_async_sync", or in fact "goacc_lazy_initialize", and then return the
correct value?  As far as I remember now, I have an issue open with the
OpenACC technical committee to clarify which constructs/API calls are
expected to implicitly initialize.  I'll fold this question in.

So, OK to leave 'gomp_fatal ("no device active")', as that's what all
other async routines also seem to be doing at the moment.
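
The three behaviours discussed here (fail, return acc_async_sync, or initialize lazily) can be compared in a small stand-alone sketch.  This is not libgomp code: "struct model_thread", "enum model_policy" and "model_get_default_async" are hypothetical, the has_device flag folds the "!thr || !thr->dev" test into one condition, and the lazy branch only imitates what goacc_lazy_initialize is assumed to achieve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GOMP_ASYNC_DEFAULT 0
#define GOMP_ASYNC_SYNC -2

/* Hypothetical, reduced stand-in for struct goacc_thread.  */
struct model_thread
{
  bool has_device;      /* Models "thr && thr->dev".  */
  int default_async;    /* Models thr->default_async.  */
};

/* The three behaviours discussed above.  */
enum model_policy
{
  MODEL_FATAL,          /* As posted: gomp_fatal ("no device active").  */
  MODEL_RETURN_SYNC,    /* Just return acc_async_sync.  */
  MODEL_LAZY_INIT       /* Initialize first, then return the real value.  */
};

/* Store the result in *OUT and return true, or return false where the
   posted patch would call gomp_fatal.  */
bool
model_get_default_async (struct model_thread *thr, enum model_policy policy,
                         int *out)
{
  assert (thr != NULL && out != NULL);

  if (!thr->has_device)
    switch (policy)
      {
      case MODEL_FATAL:
        return false;
      case MODEL_RETURN_SYNC:
        *out = GOMP_ASYNC_SYNC;
        return true;
      case MODEL_LAZY_INIT:
        /* Stand-in for lazy initialization: attach a device and give the
           ICV its initial value, as the oacc-init.c hunk does.  */
        thr->has_device = true;
        thr->default_async = GOMP_ASYNC_DEFAULT;
        break;
      }

  *out = thr->default_async;
  return true;
}
```

Only the MODEL_LAZY_INIT branch changes the thread state; the other two leave an uninitialized thread as it was.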

> +
> +  return thr->default_async;
> +}
> +

> +void
> +acc_set_default_async (int async)
> +{
> +  if (async < acc_async_sync)
> +    gomp_fatal ("invalid async argument: %d", async);

(This will nowadays use "async_valid_stream_id_p" or some such.)

> +
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  if (!thr || !thr->dev)
> +    gomp_fatal ("no device active");

As above.

> +  thr->default_async = async;
> +}
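
As a rough illustration of the order of checks in the quoted acc_set_default_async, the following stand-alone sketch validates the argument first and only then looks at the thread state.  All names here are hypothetical ("model_async_valid_p" is not the real helper, whose name and rules may differ), and failures are reported through the return value instead of gomp_fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GOMP_ASYNC_SYNC -2

/* Hypothetical predicate mirroring the posted check
   "async < acc_async_sync" (inverted: true means acceptable).  */
bool
model_async_valid_p (int async)
{
  return async >= GOMP_ASYNC_SYNC;
}

/* Hypothetical, reduced stand-in for struct goacc_thread.  */
struct model_setter_thread
{
  bool has_device;      /* Models "thr && thr->dev".  */
  int default_async;    /* Models thr->default_async.  */
};

/* Return false where the posted patch would call gomp_fatal; otherwise
   store ASYNC as the thread's default and return true.  */
bool
model_set_default_async_checked (struct model_setter_thread *thr, int async)
{
  assert (thr != NULL);

  /* Same order as the patch: argument check, then device check.  */
  if (!model_async_valid_p (async))
    return false;
  if (!thr->has_device)
    return false;

  thr->default_async = async;
  return true;
}
```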

> --- libgomp/oacc-plugin.c (revision 245382)
> +++ libgomp/oacc-plugin.c (working copy)

> +/* Return the default async number from the TLS data for the current thread.  */
> +
> +int
> +GOMP_PLUGIN_acc_thread_default_async (void)
> +{
> +  struct goacc_thread *thr = goacc_thread ();
> +  return thr ? thr->default_async : acc_async_default;
> +}

As I understand, the need for this function will disappear with your
later "async re-work" changes, so OK as posted, but I wondered in which
cases we would not have a valid "goacc_thread" when coming here?  (Might
again be related to the "goacc_lazy_initialize" issue mentioned above.)

> --- libgomp/plugin/plugin-nvptx.c (revision 245382)
> +++ libgomp/plugin/plugin-nvptx.c (working copy)
> @@ -414,13 +414,10 @@ select_stream_for_async (int async, pthread_t thre
>    struct ptx_stream *stream = NULL;
>    int orig_async = async;
>  
> -  /* The special value acc_async_noval (-1) maps (for now) to an
> -     implicitly-created stream, which is then handled the same as any other
> -     numbered async stream.  Other options are available, e.g. using the null
> -     stream for anonymous async operations, or choosing an idle stream from an
> -     active set.  But, stick with this for now.  */
> -  if (async > acc_async_sync)
> -    async++;

Is that actually a separate change from the acc-default-async-var
changes?

Is this one relevant in the question raised above, whether
"async(acc_async_noval)" maps to the same queue as "async(0)"?

> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> +     default async stream.  */
> +  if (async == acc_async_noval)
> +    async = GOMP_PLUGIN_acc_thread_default_async ();
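
The removed and the added lines of this hunk can be compared side by side in a small stand-alone model.  It covers only the quoted lines and assumes that the rest of select_stream_for_async, which is not quoted here, uses the resulting integer to pick the stream.  The function names and the default_async parameter (standing in for the value returned by GOMP_PLUGIN_acc_thread_default_async) are hypothetical.

```c
#include <assert.h>

#define GOMP_ASYNC_DEFAULT 0
#define GOMP_ASYNC_NOVAL -1
#define GOMP_ASYNC_SYNC -2

/* Removed lines: every value above acc_async_sync is shifted up by one,
   so acc_async_noval (-1) becomes 0 and async(0) becomes 1.  */
int
model_old_stream_key (int async)
{
  assert (async >= GOMP_ASYNC_SYNC);
  if (async > GOMP_ASYNC_SYNC)
    async++;
  return async;
}

/* Added lines: acc_async_noval is replaced by the thread's default async
   value; all other values pass through unchanged.  */
int
model_new_stream_key (int async, int default_async)
{
  assert (async >= GOMP_ASYNC_SYNC);
  if (async == GOMP_ASYNC_NOVAL)
    async = default_async;
  return async;
}
```

In this model the old mapping keeps acc_async_noval and async(0) apart (keys 0 and 1), whereas the new mapping gives both the same key whenever the thread's default is GOMP_ASYNC_DEFAULT.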

> --- 

Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-15 Thread Thomas Schwinge
Hi!

On Tue, 14 Feb 2017 19:58:11 +0800, Chung-Lin Tang wrote:
> On 2017/2/14 07:25 PM, Thomas Schwinge wrote:
> > Testing [...], I saw a lot of regressions,
> > and in r245427 just committed [...] to address
> > these.  Did you simply forget to commit your changes to
> > libgomp/libgomp.map, or why did this work for you?  Please verify:
> 
> Weird, I did not see any regressions

We figured it out; I just filed 
'Inappropriate "ld --version" checking'.


Grüße
 Thomas


Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-14 Thread Chung-Lin Tang
On 2017/2/14 07:25 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
> 
> On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang wrote:
>> Tested and committed to gomp-4_0-branch.
> 
> Thanks!  (Not yet reviewed.)  Testing this, I saw a lot of regressions,
> and in r245427 just committed the following to gomp-4_0-branch to address
> these.  Did you simply forget to commit your changes to
> libgomp/libgomp.map, or why did this work for you?  Please verify:

Weird, I did not see any regressions, but thanks for adding those.
I overlooked updating the map file.

Thanks,
Chung-Lin


Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang wrote:
> Tested and committed to gomp-4_0-branch.

Thanks!  (Not yet reviewed.)  Testing this, I saw a lot of regressions,
and in r245427 just committed the following to gomp-4_0-branch to address
these.  Did you simply forget to commit your changes to
libgomp/libgomp.map, or why did this work for you?  Please verify:

commit bd5613600754bd7a1fe85990eb3b7b6b5f2e1543
Author: tschwinge 
Date:   Tue Feb 14 11:20:31 2017 +

Update libgomp/libgomp.map for OpenACC async functions

libgomp/
* libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
acc_copyin_async_32_h_, acc_copyin_async_64_h_,
acc_copyin_async_array_h_, acc_copyout_async,
acc_copyout_async_32_h_, acc_copyout_async_64_h_,
acc_copyout_async_array_h_, acc_create_async,
acc_create_async_32_h_, acc_create_async_64_h_,
acc_create_async_array_h_, acc_delete_async,
acc_delete_async_32_h_, acc_delete_async_64_h_,
acc_delete_async_array_h_, acc_get_default_async,
acc_get_default_async_h_, acc_memcpy_from_device_async,
acc_memcpy_to_device_async, acc_set_default_async,
acc_set_default_async_h_, acc_update_device_async,
acc_update_device_async_32_h_, acc_update_device_async_64_h_,
acc_update_device_async_array_h_, acc_update_self_async,
acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
add GOMP_PLUGIN_acc_thread_default_async.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245427 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 20 
 libgomp/libgomp.map| 39 +++
 2 files changed, 59 insertions(+)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 0a5f601..b811c28 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,23 @@
+2017-02-14  Thomas Schwinge  
+
+   * libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
+   acc_copyin_async_32_h_, acc_copyin_async_64_h_,
+   acc_copyin_async_array_h_, acc_copyout_async,
+   acc_copyout_async_32_h_, acc_copyout_async_64_h_,
+   acc_copyout_async_array_h_, acc_create_async,
+   acc_create_async_32_h_, acc_create_async_64_h_,
+   acc_create_async_array_h_, acc_delete_async,
+   acc_delete_async_32_h_, acc_delete_async_64_h_,
+   acc_delete_async_array_h_, acc_get_default_async,
+   acc_get_default_async_h_, acc_memcpy_from_device_async,
+   acc_memcpy_to_device_async, acc_set_default_async,
+   acc_set_default_async_h_, acc_update_device_async,
+   acc_update_device_async_32_h_, acc_update_device_async_64_h_,
+   acc_update_device_async_array_h_, acc_update_self_async,
+   acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
+   acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
+   add GOMP_PLUGIN_acc_thread_default_async.
+
 2017-02-13  Cesar Philippidis  
 
    * plugin/plugin-nvptx.c (nvptx_exec): Adjust the default num_gangs.
diff --git libgomp/libgomp.map libgomp/libgomp.map
index b047ad9..2c9a13d 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -378,6 +378,40 @@ OACC_2.0 {
    acc_set_cuda_stream;
 };
 
+OACC_2.5 {
+  global:
+   acc_copyin_async;
+   acc_copyin_async_32_h_;
+   acc_copyin_async_64_h_;
+   acc_copyin_async_array_h_;
+   acc_copyout_async;
+   acc_copyout_async_32_h_;
+   acc_copyout_async_64_h_;
+   acc_copyout_async_array_h_;
+   acc_create_async;
+   acc_create_async_32_h_;
+   acc_create_async_64_h_;
+   acc_create_async_array_h_;
+   acc_delete_async;
+   acc_delete_async_32_h_;
+   acc_delete_async_64_h_;
+   acc_delete_async_array_h_;
+   acc_get_default_async;
+   acc_get_default_async_h_;
+   acc_memcpy_from_device_async;
+   acc_memcpy_to_device_async;
+   acc_set_default_async;
+   acc_set_default_async_h_;
+   acc_update_device_async;
+   acc_update_device_async_32_h_;
+   acc_update_device_async_64_h_;
+   acc_update_device_async_array_h_;
+   acc_update_self_async;
+   acc_update_self_async_32_h_;
+   acc_update_self_async_64_h_;
+   acc_update_self_async_array_h_;
+} OACC_2.0;
+
 GOACC_2.0 {
   global:
    GOACC_data_end;
@@ -417,3 +451,8 @@ GOMP_PLUGIN_1.1 {
   global:
    GOMP_PLUGIN_target_task_completion;
 } GOMP_PLUGIN_1.0;
+
+GOMP_PLUGIN_1.2 {
+  global:
+   GOMP_PLUGIN_acc_thread_default_async;
+} GOMP_PLUGIN_1.1;


Grüße
 Thomas


[gomp4] Async related additions to OpenACC runtime library

2017-02-13 Thread Chung-Lin Tang
This patch adds:

// New functions to set/get the current default async queue
void acc_set_default_async (int);
int acc_get_default_async (void);

and _async versions of a few existing API functions:

void acc_copyin_async (void *, size_t, int);
void acc_create_async (void *, size_t, int);
void acc_copyout_async (void *, size_t, int);
void acc_delete_async (void *, size_t, int);
void acc_update_device_async (void *, size_t, int);
void acc_update_self_async (void *, size_t, int);
void acc_memcpy_to_device_async (void *, void *, size_t, int);
void acc_memcpy_from_device_async (void *, void *, size_t, int);

These implement part of the additional requirements for OpenACC 2.5.
Tested and committed to gomp-4_0-branch.

Chung-Lin

2017-02-13  Chung-Lin Tang  

libgomp/
* oacc-async.c (acc_get_default_async): New API function.
(acc_set_default_async): Likewise.
* oacc-init.c ():
* oacc-int.h (struct goacc_thread): Add default_async field.
* oacc-mem.c (memcpy_tofrom_device): New function, combined from
acc_memcpy_to/from_device functions, now with async parameter.
(acc_memcpy_to_device): Modify to use memcpy_tofrom_device.
(acc_memcpy_from_device): Likewise.
(acc_memcpy_to_device_async): New API function.
(acc_memcpy_from_device_async): Likewise.
(present_create_copy): Add async parameter.
(acc_create): Adjust present_create_copy call.
(acc_copyin): Likewise.
(acc_present_or_create): Likewise.
(acc_present_or_copyin): Likewise.
(acc_create_async): New API function.
(acc_copyin_async): New API function.
(delete_copyout): Add async parameter.
(acc_delete): Adjust delete_copyout call.
(acc_copyout): Likewise.
(acc_delete_async): New API function.
(acc_copyout_async): Likewise.
(update_dev_host): Add async parameter.
(acc_update_device): Adjust update_dev_host call.
(acc_update_self): Likewise.
(acc_update_device_async): New API function.
(acc_update_self_async): Likewise.
* oacc-plugin.c (GOMP_PLUGIN_acc_thread_default_async): New function.
* oacc-plugin.h (GOMP_PLUGIN_acc_thread_default_async): Declare.
* openacc.f90 (acc_async_default): Declare.
(acc_set_default_async): Likewise.
(acc_get_default_async): Likewise.
* openacc_lib.h (acc_async_default): Declare.
(acc_set_default_async): Likewise.
(acc_get_default_async): Likewise.
* testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
* testsuite/libgomp.oacc-fortran/lib-16.f90: New test.

include/
* gomp-constants.h (GOMP_ASYNC_DEFAULT): Define.
Index: libgomp/oacc-async.c
===
--- libgomp/oacc-async.c(revision 245382)
+++ libgomp/oacc-async.c(working copy)
@@ -105,3 +105,28 @@ acc_wait_all_async (int async)
 
   thr->dev->openacc.async_wait_all_async_func (async);
 }
+
+int
+acc_get_default_async (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->default_async;
+}
+
+void
+acc_set_default_async (int async)
+{
+  if (async < acc_async_sync)
+    gomp_fatal ("invalid async argument: %d", async);
+
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  thr->default_async = async;
+}
Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c (revision 245382)
+++ libgomp/oacc-init.c (working copy)
@@ -437,6 +437,8 @@ goacc_attach_host_thread_to_device (int ord)
   
   thr->target_tls
     = acc_dev->openacc.create_thread_data_func (ord);
+
+  thr->default_async = acc_async_default;
   
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
Index: libgomp/oacc-int.h
===
--- libgomp/oacc-int.h  (revision 245382)
+++ libgomp/oacc-int.h  (working copy)
@@ -73,6 +73,9 @@ struct goacc_thread
 
   /* Target-specific data (used by plugin).  */
   void *target_tls;
+
+  /* Default OpenACC async queue for current thread, exported to plugin.  */
+  int default_async;
 };
 
 #if defined HAVE_TLS || defined USE_EMUTLS
Index: libgomp/oacc-mem.c
===
--- libgomp/oacc-mem.c  (revision 245382)
+++ libgomp/oacc-mem.c  (working copy)
@@ -153,8 +153,9 @@ acc_free (void *d)
     gomp_fatal ("error in freeing device memory in %s", __FUNCTION__);
 }
 
-void
-acc_memcpy_to_device (void *d, void *h, size_t s)
+static void
+memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
+