Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-18 Thread Jakub Jelinek
On Wed, May 18, 2016 at 05:40:33PM +0800, Chung-Lin Tang wrote:
> On 2016/5/17 5:00 PM, Jakub Jelinek wrote:
> > On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
> >> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be 
> >> enough?
> > 
> > I think so.
> > 
> > Jakub
> > 
> 
> How is this patch? I have added a comment to remind to adjust the version 
> number
> when plugin interface changes are made.

Ok.

> And, if this approach to solve the plugin interface problem is okay, can I 
> regard the
> device lock deadlock patches and async patch approved for trunk?

Yes.

>   include/
>   * gomp-constants.h (GOMP_VERSION): Increment to 1, add comment describe 
> the
>   need for increment this macro whenever the plugin interface is modified.
> 

> Index: gomp-constants.h
> ===
> --- gomp-constants.h  (revision 236363)
> +++ gomp-constants.h  (working copy)
> @@ -196,8 +196,10 @@ enum gomp_map_kind
>  /* Internal to libgomp.  */
>  #define GOMP_TARGET_FLAG_UPDATE  (1U << 31)
>  
> -/* Versions of libgomp and device-specific plugins.  */
> -#define GOMP_VERSION 0
> +/* Versions of libgomp and device-specific plugins.  GOMP_VERSION
> +   should be incremented whenever an ABI-incompatible change is introduced
> +   to the plugin interface defined in libgomp/libgomp.h.  */
> +#define GOMP_VERSION 1
>  #define GOMP_VERSION_NVIDIA_PTX 1
>  #define GOMP_VERSION_INTEL_MIC 0
>  #define GOMP_VERSION_HSA 0


Jakub


Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-18 Thread Chung-Lin Tang
On 2016/5/17 5:00 PM, Jakub Jelinek wrote:
> On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
>> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be 
>> enough?
> 
> I think so.
> 
>   Jakub
> 

How is this patch? I have added a comment to remind to adjust the version number
when plugin interface changes are made.

And, if this approach to solve the plugin interface problem is okay, can I 
regard the
device lock deadlock patches and async patch approved for trunk?

Thanks,
Chung-Lin

include/
* gomp-constants.h (GOMP_VERSION): Increment to 1, add comment describe 
the
need for increment this macro whenever the plugin interface is modified.

Index: gomp-constants.h
===
--- gomp-constants.h(revision 236363)
+++ gomp-constants.h(working copy)
@@ -196,8 +196,10 @@ enum gomp_map_kind
 /* Internal to libgomp.  */
 #define GOMP_TARGET_FLAG_UPDATE(1U << 31)
 
-/* Versions of libgomp and device-specific plugins.  */
-#define GOMP_VERSION   0
+/* Versions of libgomp and device-specific plugins.  GOMP_VERSION
+   should be incremented whenever an ABI-incompatible change is introduced
+   to the plugin interface defined in libgomp/libgomp.h.  */
+#define GOMP_VERSION   1
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 #define GOMP_VERSION_HSA 0


Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-17 Thread Jakub Jelinek
On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be 
> enough?

I think so.

Jakub


Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-17 Thread Chung-Lin Tang
On 2016/5/12 6:56 PM, Jakub Jelinek wrote:
> On Thu, May 12, 2016 at 12:47:18PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek  wrote:
>>> ABI incompatible change for the plugin
>>> interface (affecting OpenACC capable plugins only), I think you just should
>>> rename the plugin callback you add the argument to, so that
>>>   || !DLSYM_OPT (openacc.register_async_cleanup,
>>>  openacc_register_async_cleanup)
>>> would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
>>> or vice versa.
>>
>> Hmm, as far as I remember, we had previously agreed that libgomp plugin
>> ABI changes are not of any concern, given that libgomp and its plugins
>> will always be built from the same sources, at the same time, and so
>> their ABIs will always correspond?  Discussed before in
>> ,
>> for example.
> 
> I thought the agreement was that it is ok not to support mixing of
> different libgomp and plugin versions, but we should make sure that we
> refuse to load the plugin in case of mismatch, instead of silently crashing.
> Of course, changes in unreleased compiler versions are fine.
> 
> So, I'm not asking for compatibility in that 6.x nvptx plugin should still
> work with 7.x libgomp and vice versa, but that it would be ignored or
> diagnosed if somebody mixes it.
> 
>   Jakub
> 

I assume that bumping GOMP_VERSION in include/gomp-constants.h would be enough?

Renaming is just really ugly, probably okay for the async patch which touches 
just one plugin hook,
but the device lock deadlock fix audits several of them in a minor way; doesn't 
look like
the right solution to rename them all.

Chung-Lin



Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-12 Thread Jakub Jelinek
On Thu, May 12, 2016 at 12:47:18PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek  wrote:
> > ABI incompatible change for the plugin
> > interface (affecting OpenACC capable plugins only), I think you just should
> > rename the plugin callback you add the argument to, so that
> >   || !DLSYM_OPT (openacc.register_async_cleanup,
> >  openacc_register_async_cleanup)
> > would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
> > or vice versa.
> 
> Hmm, as far as I remember, we had previously agreed that libgomp plugin
> ABI changes are not of any concern, given that libgomp and its plugins
> will always be built from the same sources, at the same time, and so
> their ABIs will always correspond?  Discussed before in
> ,
> for example.

I thought the agreement was that it is ok not to support mixing of
different libgomp and plugin versions, but we should make sure that we
refuse to load the plugin in case of mismatch, instead of silently crashing.
Of course, changes in unreleased compiler versions are fine.

So, I'm not asking for compatibility in that 6.x nvptx plugin should still
work with 7.x libgomp and vice versa, but that it would be ignored or
diagnosed if somebody mixes it.

Jakub


Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-12 Thread Thomas Schwinge
Hi!

On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek  wrote:
> ABI incompatible change for the plugin
> interface (affecting OpenACC capable plugins only), I think you just should
> rename the plugin callback you add the argument to, so that
>   || !DLSYM_OPT (openacc.register_async_cleanup,
>  openacc_register_async_cleanup)
> would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
> or vice versa.

Hmm, as far as I remember, we had previously agreed that libgomp plugin
ABI changes are not of any concern, given that libgomp and its plugins
will always be built from the same sources, at the same time, and so
their ABIs will always correspond?  Discussed before in
,
for example.


Grüße
 Thomas


Re: [PATCH, libgomp] Rewire OpenACC async

2016-05-12 Thread Jakub Jelinek
On Tue, Mar 29, 2016 at 05:48:25PM +0800, Chung-Lin Tang wrote:
> I've updated this patch for trunk (as attached), and re-tested without
> regressions. This patch is still a fix for 
> libgomp.oacc-c-c++-common/asyncwait-1.c,
> which FAILs right now.
> 
> ChangeLog is still as before. Is this okay for trunk?

Mostly ok for trunk, but as it is an ABI incompatible change for the plugin
interface (affecting OpenACC capable plugins only), I think you just should
rename the plugin callback you add the argument to, so that
  || !DLSYM_OPT (openacc.register_async_cleanup,
 openacc_register_async_cleanup)
would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
or vice versa.

Jakub


Re: [PATCH, libgomp] Rewire OpenACC async (Ping x3)

2016-05-11 Thread Chung-Lin Tang
Ping x3

On 2016/4/16 3:40 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>>> I've updated this patch for trunk (as attached), and re-tested without
>>> regressions. This patch is still a fix for 
>>> libgomp.oacc-c-c++-common/asyncwait-1.c,
>>> which FAILs right now.
>>>
>>> ChangeLog is still as before. Is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
 Ping.

 On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
>
> Before this patch, we had a somewhat confusing way of implementing this
> by having two refcounts for each mapping: refcount and async_refcount,
> which I never got working again after the last wave of async regressions
> showed up.
>
> So this patch implements what I believe to be a simplification: 
> async_refcount
> is removed, and instead of trying to queue the async copyouts during 
> unmapping
> we actually do that during the plugin event handling. This requires a 
> addition
> of the async stream integer as an argument to the register_async_cleanup
> plugin hook, but overall I think this should be more elegant than before.
>
> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
> It also fixed data-[23].c regressions before, but some other recent 
> check-in
> happened to already fixed those.
>
> Tested without regressions, is this okay for trunk?
>
> Thanks,
> Chung-Lin
>
> 2015-11-24  Chung-Lin Tang  
>
> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
> parameter, use to set async stream around call to gomp_unmap_vars,
> call gomp_unmap_vars() with 'do_copyfrom' set to true.
> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
> (event_gc): Adjust event handling loop, collect 
> PTX_EVT_ASYNC_CLEANUP
> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
> (event_add): Add int parameter, initialize 'val' field when
> adding new ptx_event struct.
> (nvptx_evec): Adjust event_add() call arguments.
> (nvptx_host2dev): Likewise.
> (nvptx_dev2host): Likewise.
> (nvptx_wait_async): Likewise.
> (nvptx_wait_all_async): Likewise.
> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async 
> parameter,
> pass to event_add() call.
> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int 
> async'
> parameter.
> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
> call openacc.register_async_cleanup_func() hook.
> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
> * target.c (gomp_copy_from_async): Delete function.
> (gomp_map_vars): Remove async_refcount.
> (gomp_unmap_vars): Likewise.
> (gomp_load_image_to_device): Likewise.
> (omp_target_associate_ptr): Likewise.
> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
> (gomp_copy_from_async): Remove.
>

>>>
>>
> 



Re: [PATCH, libgomp] Rewire OpenACC async

2016-04-16 Thread Chung-Lin Tang
Ping.

On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>> I've updated this patch for trunk (as attached), and re-tested without
>> regressions. This patch is still a fix for 
>> libgomp.oacc-c-c++-common/asyncwait-1.c,
>> which FAILs right now.
>>
>> ChangeLog is still as before. Is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>>> Ping.
>>>
>>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
 Hi, this patch reworks some of the way that asynchronous copyouts are
 implemented for OpenACC in libgomp.

 Before this patch, we had a somewhat confusing way of implementing this
 by having two refcounts for each mapping: refcount and async_refcount,
 which I never got working again after the last wave of async regressions
 showed up.

 So this patch implements what I believe to be a simplification: 
 async_refcount
 is removed, and instead of trying to queue the async copyouts during 
 unmapping
 we actually do that during the plugin event handling. This requires a 
 addition
 of the async stream integer as an argument to the register_async_cleanup
 plugin hook, but overall I think this should be more elegant than before.

 This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
 It also fixed data-[23].c regressions before, but some other recent 
 check-in
 happened to already fixed those.

 Tested without regressions, is this okay for trunk?

 Thanks,
 Chung-Lin

 2015-11-24  Chung-Lin Tang  

 * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
 * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
 parameter, use to set async stream around call to gomp_unmap_vars,
 call gomp_unmap_vars() with 'do_copyfrom' set to true.
 * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
 (event_gc): Adjust event handling loop, collect 
 PTX_EVT_ASYNC_CLEANUP
 events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
 (event_add): Add int parameter, initialize 'val' field when
 adding new ptx_event struct.
 (nvptx_evec): Adjust event_add() call arguments.
 (nvptx_host2dev): Likewise.
 (nvptx_dev2host): Likewise.
 (nvptx_wait_async): Likewise.
 (nvptx_wait_all_async): Likewise.
 (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
 pass to event_add() call.
 * oacc-host.c (host_openacc_register_async_cleanup): Add 'int 
 async'
 parameter.
 * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
 call openacc.register_async_cleanup_func() hook.
 * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
 * target.c (gomp_copy_from_async): Delete function.
 (gomp_map_vars): Remove async_refcount.
 (gomp_unmap_vars): Likewise.
 (gomp_load_image_to_device): Likewise.
 (omp_target_associate_ptr): Likewise.
 * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
 (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
 (gomp_copy_from_async): Remove.

>>>
>>
> 



Re: [PATCH, libgomp] Rewire OpenACC async

2016-04-08 Thread Chung-Lin Tang
Ping.

On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
> I've updated this patch for trunk (as attached), and re-tested without
> regressions. This patch is still a fix for 
> libgomp.oacc-c-c++-common/asyncwait-1.c,
> which FAILs right now.
> 
> ChangeLog is still as before. Is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>>> Hi, this patch reworks some of the way that asynchronous copyouts are
>>> implemented for OpenACC in libgomp.
>>>
>>> Before this patch, we had a somewhat confusing way of implementing this
>>> by having two refcounts for each mapping: refcount and async_refcount,
>>> which I never got working again after the last wave of async regressions
>>> showed up.
>>>
>>> So this patch implements what I believe to be a simplification: 
>>> async_refcount
>>> is removed, and instead of trying to queue the async copyouts during 
>>> unmapping
>>> we actually do that during the plugin event handling. This requires a 
>>> addition
>>> of the async stream integer as an argument to the register_async_cleanup
>>> plugin hook, but overall I think this should be more elegant than before.
>>>
>>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>>> It also fixed data-[23].c regressions before, but some other recent check-in
>>> happened to already fixed those.
>>>
>>> Tested without regressions, is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> 2015-11-24  Chung-Lin Tang  
>>>
>>> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>> parameter, use to set async stream around call to gomp_unmap_vars,
>>> call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>> (event_gc): Adjust event handling loop, collect 
>>> PTX_EVT_ASYNC_CLEANUP
>>> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>> (event_add): Add int parameter, initialize 'val' field when
>>> adding new ptx_event struct.
>>> (nvptx_evec): Adjust event_add() call arguments.
>>> (nvptx_host2dev): Likewise.
>>> (nvptx_dev2host): Likewise.
>>> (nvptx_wait_async): Likewise.
>>> (nvptx_wait_all_async): Likewise.
>>> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>> pass to event_add() call.
>>> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>> parameter.
>>> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>> call openacc.register_async_cleanup_func() hook.
>>> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>> * target.c (gomp_copy_from_async): Delete function.
>>> (gomp_map_vars): Remove async_refcount.
>>> (gomp_unmap_vars): Likewise.
>>> (gomp_load_image_to_device): Likewise.
>>> (omp_target_associate_ptr): Likewise.
>>> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>> (gomp_copy_from_async): Remove.
>>>
>>
> 



Re: [PATCH, libgomp] Rewire OpenACC async

2016-03-29 Thread Chung-Lin Tang
I've updated this patch for trunk (as attached), and re-tested without
regressions. This patch is still a fix for 
libgomp.oacc-c-c++-common/asyncwait-1.c,
which FAILs right now.

ChangeLog is still as before. Is this okay for trunk?

Thanks,
Chung-Lin

On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>> Hi, this patch reworks some of the way that asynchronous copyouts are
>> implemented for OpenACC in libgomp.
>>
>> Before this patch, we had a somewhat confusing way of implementing this
>> by having two refcounts for each mapping: refcount and async_refcount,
>> which I never got working again after the last wave of async regressions
>> showed up.
>>
>> So this patch implements what I believe to be a simplification: 
>> async_refcount
>> is removed, and instead of trying to queue the async copyouts during 
>> unmapping
>> we actually do that during the plugin event handling. This requires a 
>> addition
>> of the async stream integer as an argument to the register_async_cleanup
>> plugin hook, but overall I think this should be more elegant than before.
>>
>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>> It also fixed data-[23].c regressions before, but some other recent check-in
>> happened to already fixed those.
>>
>> Tested without regressions, is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-11-24  Chung-Lin Tang  
>>
>> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>> parameter, use to set async stream around call to gomp_unmap_vars,
>> call gomp_unmap_vars() with 'do_copyfrom' set to true.
>> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>> (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>> (event_add): Add int parameter, initialize 'val' field when
>> adding new ptx_event struct.
>> (nvptx_evec): Adjust event_add() call arguments.
>> (nvptx_host2dev): Likewise.
>> (nvptx_dev2host): Likewise.
>> (nvptx_wait_async): Likewise.
>> (nvptx_wait_all_async): Likewise.
>> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>> pass to event_add() call.
>> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>> parameter.
>> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>> call openacc.register_async_cleanup_func() hook.
>> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>> * target.c (gomp_copy_from_async): Delete function.
>> (gomp_map_vars): Remove async_refcount.
>> (gomp_unmap_vars): Likewise.
>> (gomp_load_image_to_device): Likewise.
>> (omp_target_associate_ptr): Likewise.
>> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>> (gomp_copy_from_async): Remove.
>>
> 

Index: oacc-host.c
===
--- oacc-host.c	(revision 234516)
+++ oacc-host.c	(working copy)
@@ -144,7 +144,8 @@ host_openacc_exec (void (*fn) (void *),
 }
 
 static void
-host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)))
+host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)),
+ int async __attribute__ ((unused)))
 {
 }
 
Index: oacc-mem.c
===
--- oacc-mem.c	(revision 234516)
+++ oacc-mem.c	(working copy)
@@ -661,10 +661,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyf
   if (async < acc_async_noval)
 gomp_unmap_vars (t, true);
   else
-{
-  gomp_copy_from_async (t);
-  acc_dev->openacc.register_async_cleanup_func (t);
-}
+t->device_descr->openacc.register_async_cleanup_func (t, async);
 
   gomp_debug (0, "  %s: mappings restored\n", __FUNCTION__);
 }
Index: oacc-parallel.c
===
--- oacc-parallel.c	(revision 234516)
+++ oacc-parallel.c	(working copy)
@@ -186,10 +186,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void
   if (async < acc_async_noval)
 gomp_unmap_vars (tgt, true);
   else
-{
-  gomp_copy_from_async (tgt);
-  acc_dev->openacc.register_async_cleanup_func (tgt);
-}
+tgt->device_descr->openacc.register_async_cleanup_func (tgt, async);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
Index: target.c
===
--- target.c	(revision 234516)
+++ target.c	(working copy)
@@ -663,7 +663,6 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 		tgt->list[i].offset = 0;
 		

Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-22 Thread Chung-Lin Tang
Ping.

On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing this
> by having two refcounts for each mapping: refcount and async_refcount,
> which I never got working again after the last wave of async regressions
> showed up.
> 
> So this patch implements what I believe to be a simplification: async_refcount
> is removed, and instead of trying to queue the async copyouts during unmapping
> we actually do that during the plugin event handling. This requires a addition
> of the async stream integer as an argument to the register_async_cleanup
> plugin hook, but overall I think this should be more elegant than before.
> 
> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
> It also fixed data-[23].c regressions before, but some other recent check-in
> happened to already fixed those.
> 
> Tested without regressions, is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-11-24  Chung-Lin Tang  
> 
> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
> parameter, use to set async stream around call to gomp_unmap_vars,
> call gomp_unmap_vars() with 'do_copyfrom' set to true.
> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
> (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
> (event_add): Add int parameter, initialize 'val' field when
> adding new ptx_event struct.
> (nvptx_evec): Adjust event_add() call arguments.
> (nvptx_host2dev): Likewise.
> (nvptx_dev2host): Likewise.
> (nvptx_wait_async): Likewise.
> (nvptx_wait_all_async): Likewise.
> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
> pass to event_add() call.
> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
> parameter.
> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
> call openacc.register_async_cleanup_func() hook.
> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
> * target.c (gomp_copy_from_async): Delete function.
> (gomp_map_vars): Remove async_refcount.
> (gomp_unmap_vars): Likewise.
> (gomp_load_image_to_device): Likewise.
> (omp_target_associate_ptr): Likewise.
> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
> (gomp_copy_from_async): Remove.
> 



Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-05 Thread Chung-Lin Tang
On 2015/12/1 08:01 PM, Julian Brown wrote:
> On Tue, 24 Nov 2015 18:27:24 +0800
> Chung-Lin Tang  wrote:
> 
>> Hi, this patch reworks some of the way that asynchronous copyouts are
>> implemented for OpenACC in libgomp.
>>
>> Before this patch, we had a somewhat confusing way of implementing
>> this by having two refcounts for each mapping: refcount and
>> async_refcount, which I never got working again after the last wave
>> of async regressions showed up.
>>
>> So this patch implements what I believe to be a simplification:
>> async_refcount is removed, and instead of trying to queue the async
>> copyouts during unmapping we actually do that during the plugin event
>> handling. This requires a addition of the async stream integer as an
>> argument to the register_async_cleanup plugin hook, but overall I
>> think this should be more elegant than before.
> 
> This looks OK to me I think (I've only looked fairly briefly). I vaguely
> remember trying something along these lines in an earlier iteration of
> the async support -- maybe hitting problems with locking (I see you
> have code to mitigate problems with that, and locking generally has
> probably evolved a bit since I last looked at the code in detail
> anyway).
> 
> Can event_gc ever be called when the *device* lock is held?

It only matters when the memmap_lockable argument is true, and for those
cases, no the device lock is never held.

> I'm slightly concerned that pushing async unmapping into event_gc means
> that program-level semantics are deferred to the backend, which is
> arguably the wrong place. But then I don't understand what went wrong
> with the dual-refcount implementation, so maybe it's unavoidable for
> some reason.

I got the dual-refcounting to work again (after the regressions first showed up)
in some cases briefly, but regressed in other testcases, which I don't recall
the full details now.
Indeed the copyout is now triggered inside the plugin, but it is still wrapped
inside GOMP_PLUGIN_async_unmap_vars(), so it's probably not too ugly.

Per our earlier internal discussion, I'm committing this to the gomp4 branch 
first.
Trunk will need to wait for Jakub's approval.

Thanks,
Chung-Lin



Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-01 Thread Julian Brown
On Tue, 24 Nov 2015 18:27:24 +0800
Chung-Lin Tang  wrote:

> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing
> this by having two refcounts for each mapping: refcount and
> async_refcount, which I never got working again after the last wave
> of async regressions showed up.
> 
> So this patch implements what I believe to be a simplification:
> async_refcount is removed, and instead of trying to queue the async
> copyouts during unmapping we actually do that during the plugin event
> handling. This requires a addition of the async stream integer as an
> argument to the register_async_cleanup plugin hook, but overall I
> think this should be more elegant than before.

This looks OK to me I think (I've only looked fairly briefly). I vaguely
remember trying something along these lines in an earlier iteration of
the async support -- maybe hitting problems with locking (I see you
have code to mitigate problems with that, and locking generally has
probably evolved a bit since I last looked at the code in detail
anyway).

Can event_gc ever be called when the *device* lock is held?

I'm slightly concerned that pushing async unmapping into event_gc means
that program-level semantics are deferred to the backend, which is
arguably the wrong place. But then I don't understand what went wrong
with the dual-refcount implementation, so maybe it's unavoidable for
some reason.

HTH,

Julian