Re: [PATCH] OpenACC reference count overhaul

2019-12-18 Thread Thomas Schwinge
Hi!

On 2019-12-11T18:22:00+0100, I wrote:
> On 2019-10-29T12:15:01+, Julian Brown  wrote:
>> I've removed the special-case handling
>> of pointers in the enter/exit data code, and combined the
>> gomp_acc_remove_pointer code (which now iterated over mappings
>> one-at-a-time anyway) with the loop iterating over mappings in the
>> new goacc_exit_data_internal function. It was a bit nonsensical to have
>> the "exit data" code split over two files, as before.
>
> Yes, I like that very much, and we shall tackle that next intermediate
> step

> One thing:
>
>> libgomp/
>
>> * oacc-parallel.c (find_pointer): Remove function.
>> (find_group_last, goacc_enter_data_internal,
>> goacc_exit_data_internal): New functions.
>> (GOACC_enter_exit_data): Use goacc_enter_data_internal and
>> goacc_exit_data_internal helper functions.
>
> It makes much sense to move all that into 'libgomp/oacc-mem.c', and as a
> preparational step, see attached "[OpenACC] Consolidate
> 'GOACC_enter_exit_data' and its helper functions in
> 'libgomp/oacc-mem.c'", committed to trunk in r279233.

Working incrementally towards the goal of unifying all that mapping
handling code, I did some refactoring ("No functional changes"): see the
attached "[OpenACC] Refactor 'present_create_copy' into
'goacc_enter_data'", "[OpenACC] Refactor 'delete_copyout' into
'goacc_exit_data'", "[OpenACC] Refactor 'GOACC_enter_exit_data' to call
'goacc_enter_data', 'goacc_exit_data'", "[OpenACC] Refactor
'goacc_remove_pointer' interface", "[OpenACC] Refactor 'goacc_enter_data'
so that it can be called from 'goacc_insert_pointer', "not present"
case", "[OpenACC] Refactor 'goacc_enter_data' so that it can be called
from 'goacc_insert_pointer', "present" case, and simplify"; committed to
trunk in r279535, r279536, r279537, r279538, r279539, r279540.


Grüße
 Thomas


From ab6f9acf81772264a8564f834b9c5d1b5b70213e Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Wed, 18 Dec 2019 17:01:51 +
Subject: [PATCH 1/6] [OpenACC] Refactor 'present_create_copy' into
 'goacc_enter_data'

Every caller passes in 'FLAG_PRESENT', 'FLAG_CREATE'.  Change the remaining
'FLAG_COPY' into the usual map kind.

No functional changes.

	libgomp/
	* oacc-mem.c (present_create_copy): Refactor into...
	(goacc_enter_data): ... this.  Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279535 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |  3 +++
 libgomp/oacc-mem.c | 37 ++---
 2 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 541a2c7610c..a5d6b51df5f 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,8 @@
 2019-12-18  Thomas Schwinge  
 
+	* oacc-mem.c (present_create_copy): Refactor into...
+	(goacc_enter_data): ... this.  Adjust all users.
+
 	* target.c (gomp_unmap_vars_internal): Add a safeguard to
 	'gomp_remove_var'.
 
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 32bf3656029..68b78b3f42f 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -492,12 +492,13 @@ acc_unmap_data (void *h)
 }
 }
 
-#define FLAG_PRESENT (1 << 0)
-#define FLAG_CREATE (1 << 1)
-#define FLAG_COPY (1 << 2)
+
+/* Enter a dynamic mapping.
+
+   Return the device pointer.  */
 
 static void *
-present_create_copy (unsigned f, void *h, size_t s, int async)
+goacc_enter_data (void *h, size_t s, unsigned short kind, int async)
 {
   void *d;
   splay_tree_key n;
@@ -530,12 +531,6 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
   /* Present. */
   d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
 
-  if (!(f & FLAG_PRESENT))
-{
-	  gomp_mutex_unlock (_dev->lock);
-  gomp_fatal ("[%p,+%d] already mapped to [%p,+%d]",
-	  (void *)h, (int)s, (void *)d, (int)s);
-	}
   if ((h + s) > (void *)n->host_end)
 	{
 	  gomp_mutex_unlock (_dev->lock);
@@ -549,29 +544,18 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
 
   gomp_mutex_unlock (_dev->lock);
 }
-  else if (!(f & FLAG_CREATE))
-{
-  gomp_mutex_unlock (_dev->lock);
-  gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
-}
   else
 {
   struct target_mem_desc *tgt;
   size_t mapnum = 1;
-  unsigned short kinds;
   void *hostaddrs = h;
 
-  if (f & FLAG_COPY)
-	kinds = GOMP_MAP_TO;
-  else
-	kinds = GOMP_MAP_ALLOC;
-
   gomp_mutex_unlock (_dev->lock);
 
   goacc_aq aq = get_goacc_asyncqueue (async);
 
   tgt = gomp_map_vars_async (acc_dev, aq, mapnum, , NULL, ,
- , true, GOMP_MAP_VARS_ENTER_DATA);
+ , true, GOMP_MAP_VARS_ENTER_DATA);
   assert (tgt);
   n = tgt->list[0].key;
   assert (n->refcount == 1);
@@ -593,13 +577,13 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
 void *
 acc_create (void *h, size_t s)
 {
-  return present_create_copy 

Re: [PATCH] OpenACC reference count overhaul

2019-12-11 Thread Thomas Schwinge
Hi!

On 2019-10-29T12:15:01+, Julian Brown  wrote:
> On Mon, 21 Oct 2019 16:14:11 +0200
> Thomas Schwinge  wrote:
>> On 2019-10-03T09:35:04-0700, Julian Brown 
>> wrote:
>> >  void
>> > -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom,
>> > int async,
>> > -   int finalize, int mapnum)
>> > +gomp_acc_remove_pointer (struct gomp_device_descr *acc_dev, void
>> > **hostaddrs,
>> > +   size_t *sizes, unsigned short *kinds, int
>> > async,
>> > +   bool finalize, int mapnum)
>> >  {  
>> > [...]

> That code's all gone with this version...

\o/ Yay!

>> > --- a/libgomp/oacc-parallel.c
>> > +++ b/libgomp/oacc-parallel.c
>> > @@ -56,12 +56,29 @@ find_pointer (int pos, size_t mapnum, unsigned
>> > short *kinds)  
>> 
>> I've always been confused by this function (before known as
>> 'find_pset'); this feels wrong, but I've never gotten to the bottom
>> of it.
>
> This version removes that function in favour of a function that finds
> groups of consecutive mappings that should be kept together for a
> single gomp_map_vars invocation. I think that fits better with my
> findings as written up on the wiki page
> https://gcc.gnu.org/wiki/LibgompPointerMappingKinds.

\o/ Yay!

>> > [...]
>> 
>> ;-) Yuck.  As requested before: "Can we get a comment added to such
>> 'magic', please?"
>
> That magic is gone now. 

\o/ Yay!

>> I just wish that eventually we'll be able to can get rid of that
>> stuff, and just let 'gomp_map_vars' do its thing.  Similar to
>>  "'GOACC_parallel_keyed' should use
>> 'GOMP_MAP_VARS_TARGET'".
>> 
>> (For avoidance of doubt, that's not your task right now.)

> I've removed the special-case handling
> of pointers in the enter/exit data code, and combined the
> gomp_acc_remove_pointer code (which now iterated over mappings
> one-at-a-time anyway) with the loop iterating over mappings in the
> new goacc_exit_data_internal function. It was a bit nonsensical to have
> the "exit data" code split over two files, as before.

Yes, I like that very much, and we shall tackle that next intermediate
step once your patch for  "[OpenACC] In
async context, need to use 'gomp_remove_var_async' instead of
'gomp_remove_var'" is done,
<87tv681tb3.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87tv681tb3.fsf@euler.schwinge.homeip.net>.

One thing:

> libgomp/

> * oacc-parallel.c (find_pointer): Remove function.
> (find_group_last, goacc_enter_data_internal,
> goacc_exit_data_internal): New functions.
> (GOACC_enter_exit_data): Use goacc_enter_data_internal and
> goacc_exit_data_internal helper functions.

It makes much sense to move all that into 'libgomp/oacc-mem.c', and as a
preparational step, see attached "[OpenACC] Consolidate
'GOACC_enter_exit_data' and its helper functions in
'libgomp/oacc-mem.c'", committed to trunk in r279233.


Grüße
 Thomas


From ca9b2739279e3ef0080164a68f082bcfb5169095 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Wed, 11 Dec 2019 16:49:17 +
Subject: [PATCH] [OpenACC] Consolidate 'GOACC_enter_exit_data' and its helper
 functions in 'libgomp/oacc-mem.c'

	libgomp/
	* oacc-parallel.c (find_pointer, GOACC_enter_exit_data): Move...
	* oacc-mem.c: ... here.
	(gomp_acc_insert_pointer, gomp_acc_remove_pointer): Rename to
	'goacc_insert_pointer', 'goacc_remove_pointer', and make 'static'.
	* libgomp.h (gomp_acc_insert_pointer, gomp_acc_remove_pointer):
	Remove.
	* libgomp_g.h: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279233 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog   |   8 ++
 libgomp/libgomp.h   |   2 -
 libgomp/libgomp_g.h |   7 +-
 libgomp/oacc-mem.c  | 274 +++-
 libgomp/oacc-parallel.c | 253 -
 5 files changed, 281 insertions(+), 263 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index f7d9ae98616..0a5650ed438 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,13 @@
 2019-12-11  Thomas Schwinge  
 
+	* oacc-parallel.c (find_pointer, GOACC_enter_exit_data): Move...
+	* oacc-mem.c: ... here.
+	(gomp_acc_insert_pointer, gomp_acc_remove_pointer): Rename to
+	'goacc_insert_pointer', 'goacc_remove_pointer', and make 'static'.
+	* libgomp.h (gomp_acc_insert_pointer, gomp_acc_remove_pointer):
+	Remove.
+	* libgomp_g.h: Update.
+
 	* oacc-parallel.c (GOACC_wait, goacc_wait): Move...
 	* oacc-async.c: ... here.
 	* oacc-int.h (goacc_wait): Declare.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a35aa07c80b..9f4d0428871 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1138,8 +1138,6 @@ enum gomp_map_vars_kind
   GOMP_MAP_VARS_ENTER_DATA
 };
 
-extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
-extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
 extern void 

Re: [PATCH] OpenACC reference count overhaul

2019-12-10 Thread Chung-Lin Tang

On 2019/12/10 12:04 AM, Julian Brown wrote:

I'm citing below the changes introducing 'gomp_remove_var_async',
modelled similar to the existing 'gomp_unmap_vars_async'.


Also for both these, do I understand correctly, that it's actually not
the 'gomp_unref_tgt' that needs to be "delayed" via
'goacc_asyncqueue', but rather really only the
'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
target_mem_desc' alive?  Per my understanding, that one is one
component of the mapping table, and not relevant anymore (thus can be
'free'd) as soon as it has been determined that 'tgt->refcount ==
0'?  Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying
freeing of the target_mem_desc isn't really a huge problem, in practice.


I don't clearly remember all the details. It could be possible that not
asyncqueue-ifying gomp_remove_var was simply an overlook.

The 'target_mem_desc' is supposed to represent the piece of device memory
inside libgomp, so unref/freeing it only after all dev-to-host copying is
done seems logical.

Chung-Lin


Re: [PATCH] OpenACC reference count overhaul

2019-12-09 Thread Julian Brown
On Mon, 9 Dec 2019 15:44:25 +0100
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2019-10-03T09:35:04-0700, Julian Brown 
> wrote:
> > --- a/libgomp/oacc-mem.c
> > +++ b/libgomp/oacc-mem.c  
> 
> > @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t
> > s, int async, const char *libfnname)  
> 
> >if (f & FLAG_COPYOUT)
> > [...]
> >   gomp_copy_dev2host (acc_dev, aq, h, d, s);
> > }
> > -  gomp_remove_var (acc_dev, n);
> > +  gomp_remove_var_async (acc_dev, n, aq);  
> 
> Conceptually, I understand correctly that we need to use this (new)
> 'gomp_remove_var_async' to make sure that we don't
> 'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above
> is still in process?

Yep.

> I'm curious why this isn't causing any problems for nvptx offloading
> already, any thoughts on that?  Or, is this just missing test
> coverage? (Always difficult for 'async' stuff, of course.)  By
> chance, is this right now already causing problems with AMD GCN
> offloading?  (I really need to set up AMD GCN offloading testing...)

In a few cases, async stuff on nvidia seems to "just work" even in
cases where we wouldn't expect it to via inspection (either because the
driver/hardware is doing something "magic", or because we're
somehow driving async operations in such a way that they run
synchronously in practice). One such case is with the "ephemeral"
asynchronous host-to-device memory copy patch.

The AMD side seems much more sensitive to improper async behaviour --
but I don't actually remember if I hit problems with this code in
particular.

> I'm citing below the changes introducing 'gomp_remove_var_async',
> modelled similar to the existing 'gomp_unmap_vars_async'.
> 
> 
> Also for both these, do I understand correctly, that it's actually not
> the 'gomp_unref_tgt' that needs to be "delayed" via
> 'goacc_asyncqueue', but rather really only the
> 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
> 'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
> target_mem_desc' alive?  Per my understanding, that one is one
> component of the mapping table, and not relevant anymore (thus can be
> 'free'd) as soon as it has been determined that 'tgt->refcount ==
> 0'?  Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying
freeing of the target_mem_desc isn't really a huge problem, in practice.

> It will be OK to clean that up later, but I'd like to understand this
> now.  Well, or, stating that you just blindly copied that from the
> existing 'gomp_unmap_vars_async' is fine, too!  ;-P

Some changes arose via the porting to AMD GCN, and some may have been
drive-by fixes (e.g. where a synchronous call was used in a context
where it is obvious that an asynchronous call is really needed). Like
you mentioned, test coverage could probably be better, and writing
reliable tests for async behaviour is challenging.

Julian


Re: [PATCH] OpenACC reference count overhaul

2019-12-09 Thread Thomas Schwinge
Hi Julian!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t s, int 
> async, const char *libfnname)

>if (f & FLAG_COPYOUT)
> [...]
> gomp_copy_dev2host (acc_dev, aq, h, d, s);
>   }
> -  gomp_remove_var (acc_dev, n);
> +  gomp_remove_var_async (acc_dev, n, aq);

Conceptually, I understand correctly that we need to use this (new)
'gomp_remove_var_async' to make sure that we don't
'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above is
still in process?

I'm curious why this isn't causing any problems for nvptx offloading
already, any thoughts on that?  Or, is this just missing test coverage?
(Always difficult for 'async' stuff, of course.)  By chance, is this
right now already causing problems with AMD GCN offloading?  (I really
need to set up AMD GCN offloading testing...)


I'm citing below the changes introducing 'gomp_remove_var_async',
modelled similar to the existing 'gomp_unmap_vars_async'.


Also for both these, do I understand correctly, that it's actually not
the 'gomp_unref_tgt' that needs to be "delayed" via 'goacc_asyncqueue',
but rather really only the 'gomp_free_device_memory', called via
'gomp_unmap_tgt', called via 'gomp_unref_tgt'?  In other words: why do we
need to keep the 'struct target_mem_desc' alive?  Per my understanding,
that one is one component of the mapping table, and not relevant anymore
(thus can be 'free'd) as soon as it has been determined that
'tgt->refcount == 0'?  Am I missing something there?

It will be OK to clean that up later, but I'd like to understand this
now.  Well, or, stating that you just blindly copied that from the
existing 'gomp_unmap_vars_async' is fine, too!  ;-P


Grüße
 Thomas


> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -1092,32 +1106,66 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
>free (tgt);
>  }
>  
> -attribute_hidden bool
> -gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
> +static bool
> +gomp_unref_tgt (void *ptr)
>  {
>bool is_tgt_unmapped = false;
> -  splay_tree_remove (>mem_map, k);
> -  if (k->link_key)
> -splay_tree_insert (>mem_map, (splay_tree_node) k->link_key);
> -  if (k->tgt->refcount > 1)
> -k->tgt->refcount--;
> +
> +  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> +
> +  if (tgt->refcount > 1)
> +tgt->refcount--;
>else
>  {
> +  gomp_unmap_tgt (tgt);
>is_tgt_unmapped = true;
> -  gomp_unmap_tgt (k->tgt);
>  }
> +
>return is_tgt_unmapped;
>  }
>  
>  static void
> -gomp_unref_tgt (void *ptr)
> +gomp_unref_tgt_void (void *ptr)
>  {
> -  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> +  (void) gomp_unref_tgt (ptr);
> +}
>  
> -  if (tgt->refcount > 1)
> -tgt->refcount--;
> +static inline __attribute__((always_inline)) bool
> +gomp_remove_var_internal (struct gomp_device_descr *devicep, splay_tree_key 
> k,
> +   struct goacc_asyncqueue *aq)
> +{
> +  bool is_tgt_unmapped = false;
> +  splay_tree_remove (>mem_map, k);
> +  if (k->virtual_refcount == VREFCOUNT_LINK_KEY)
> +{
> +  if (k->u.link_key)
> + splay_tree_insert (>mem_map, (splay_tree_node) k->u.link_key);
> +}
> +  if (aq)
> +devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void,
> + (void *) k->tgt);
>else
> -gomp_unmap_tgt (tgt);
> +is_tgt_unmapped = gomp_unref_tgt ((void *) k->tgt);
> +  return is_tgt_unmapped;
> +}
> +
> +attribute_hidden bool
> +gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
> +{
> +  return gomp_remove_var_internal (devicep, k, NULL);
> +}
> +
> +/* Remove a variable asynchronously.  This actually removes the variable
> +   mapping immediately, but retains the linked target_mem_desc until the
> +   asynchronous operation has completed (as it may still refer to target
> +   memory).  The device lock must be held before entry, and remains locked on
> +   exit.  */
> +
> +attribute_hidden void
> +gomp_remove_var_async (struct gomp_device_descr *devicep, splay_tree_key k,
> +struct goacc_asyncqueue *aq)
> +{
> +  (void) gomp_remove_var_internal (devicep, k, aq);
>  }


signature.asc
Description: PGP signature


Re: [PATCH] OpenACC reference count overhaul

2019-11-22 Thread Julian Brown
On Sat, 9 Nov 2019 01:28:51 +
Julian Brown  wrote:

> On Thu, 31 Oct 2019 19:11:57 +0100
> Thomas Schwinge  wrote:
> 
> > So that's not related to reference counting, needs to be discussed
> > separately.
> > 
> > ..., and while I do agree that the current code is a bit "strange"
> > (returning 'tgt->to_free'), I couldn't quickly find or come up with
> > a test cases where this would actually do the wrong thing.  After
> > all, this is the code path taken for "not present", and 'tgt' is
> > built anew for one single mapping, with no alignment set (which
> > would cause 'to_free' to differ from 'tgt_start'); 'tgt_offset'
> > should always be zero, and 'h' always the same as 'host_start'.
> > What am I missing? That is, given the current set of libgomp test
> > cases, the attached never triggeres.  
> 
> The code can't stay exactly as it is with this patch, because the tgt
> return value from gomp_map_vars_async with
> GOMP_MAP_VARS_OPENACC_ENTER_DATA is a null pointer.
> 
> So, the device pointer calculation needed to be re-done -- although
> it's not quite a bug fix, as you point out, and some of the offsets
> will always be zero or cancel out in practice.
> 
> *However*, it looks like the device pointer calculation for the
> "present" case is wrong in the preceding code. I've addressed that in
> the patch posted here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00661.html
> 
> The patch attached here applies on top of that one, and attempts to
> keep the device pointer calculation "the same" for the non-present
> case, modulo an extra lookup_host -- and also adds some assertions to
> make sure the assumptions about zero/cancelled-out offsets stay true.

Here's another iteration that applies over the version of the
present/subarray patch committed, and also addresses the use of
REFCOUNT_INFINITY on target blocks as queried in the following message:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01146.html

Most uses of REFCOUNT_INFINITY indeed appear to be unreachable (as in,
a target_mem_desc with refcount == REFCOUNT_INFINITY will most of the
time be linked from a splay tree key with refcount ==
REFCOUNT_INFINITY, and the code to decrement the former's refcount
and/or free the block will never be called).

I found one case (for OpenACC) where a runtime check/error can be
added -- attempting to free a mapped target block corresponding to a
device_resident global variable using an API routine. I don't think
there's a code path using directives (for either OpenACC or OpenMP) that
exhibits any problematic behaviour in that regard. I've added a couple
of test cases, and a couple of assertions.

OK now? (Or perhaps the REFCOUNT_INFINITY bits want splitting out? It
all still arguably comes under the "refcount overhaul" umbrella!).

Thanks,

Julian
commit d027f9ab63fdf871076f40e2cf42f672ed3f0e69
Author: Julian Brown 
Date:   Mon Nov 5 15:51:46 2018 -0800

OpenACC reference count overhaul

2019-11-22  Julian Brown  
Thomas Schwinge  

libgomp/
* libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount
field for virtual_refcount.
(struct acc_dispatch_t): Remove data_environ field.
(struct gomp_device_descr): Update comment on openacc field.
(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
(gomp_acc_insert_pointer, gomp_acc_remove_pointer, gomp_free_memmap):
Remove prototypes.
(gomp_remove_var_async): Add prototype.
* oacc-host.c (host_dispatch): Don't initialise removed data_environ
field.
* oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var
instead of calling gomp_free_memmap.
* oacc-mem.c (lookup_dev_1): New function.
(lookup_dev): Reimplement using above.
(acc_free, acc_hostptr): Update calls to lookup_dev.
(acc_map_data): Likewise.  Don't add to data_environ list.
(acc_unmap_data): Remove call to gomp_unmap_vars.  Fix semantics to
remove mapping, but not mapped data.  Handle REFCOUNT_INFINITY on
target blocks.
(present_create_copy): Use virtual_refcount instead of
dynamic_refcount.  Don't manipulate data_environ.  Re-do lookup for
target pointer return value.
(delete_copyout): Update for virtual_refcount semantics.  Use
goacc_remove_var_async for asynchronous delete/copyouts.
(gomp_acc_insert_pointer, gomp_acc_remove_pointer): Remove functions.
* oacc-parallel.c (find_pointer): Remove function.
(find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal): New functions.
(GOACC_enter_exit_data): Use goacc_enter_data_internal and
goacc_exit_data_internal helper functions.
* target.c (gomp_map_vars_internal): Handle

Re: [PATCH] OpenACC reference count overhaul

2019-11-08 Thread Julian Brown
On Thu, 31 Oct 2019 19:11:57 +0100
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2019-10-29T12:15:01+, Julian Brown 
> wrote:
> > This is a new version of the patch which hopefully addresses all
> > review comments. Further commentary below.  
> 
> Thanks, great, looking into that one -- I see you're removing more and
> more special-case, strange code, replacing it with generic and/or
> well-explained code.
> 
> 
> Question, for my understanding:
> 
> > On Mon, 21 Oct 2019 16:14:11 +0200
> > Thomas Schwinge  wrote:  
> >> On 2019-10-03T09:35:04-0700, Julian Brown 
> >> wrote:  
> 
> >> > @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h,
> >> > size_t s, int async)
> >>   
> >> > -  d = tgt->to_free;
> >>   
> >> > +  n = lookup_host (acc_dev, h, s);
> >> > +  assert (n != NULL);
> >> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset +
> >> > (uintptr_t) h
> >> > +- n->host_start);
> >> 
> >> |   return d;
> >> 
> >> Again, it's not obvious to me how that is semantically equivalent
> >> to what we've returned before?  
> >
> > This is a bug fix (it's mentioned in the ChangeLog).  
> 
> Eh, well hidden.  Indeed that mentions:
> 
>   (present_create_copy): [...] Fix target pointer
>   return value.
> 
> So that's not related to reference counting, needs to be discussed
> separately.
> 
> ..., and while I do agree that the current code is a bit "strange"
> (returning 'tgt->to_free'), I couldn't quickly find or come up with a
> test cases where this would actually do the wrong thing.  After all,
> this is the code path taken for "not present", and 'tgt' is built
> anew for one single mapping, with no alignment set (which would cause
> 'to_free' to differ from 'tgt_start'); 'tgt_offset' should always be
> zero, and 'h' always the same as 'host_start'.  What am I missing?
> That is, given the current set of libgomp test cases, the attached
> never triggeres.

The code can't stay exactly as it is with this patch, because the tgt
return value from gomp_map_vars_async with
GOMP_MAP_VARS_OPENACC_ENTER_DATA is a null pointer.

So, the device pointer calculation needed to be re-done -- although it's
not quite a bug fix, as you point out, and some of the offsets will
always be zero or cancel out in practice.

*However*, it looks like the device pointer calculation for the
"present" case is wrong in the preceding code. I've addressed that in
the patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00661.html

The patch attached here applies on top of that one, and attempts to
keep the device pointer calculation "the same" for the non-present
case, modulo an extra lookup_host -- and also adds some assertions to
make sure the assumptions about zero/cancelled-out offsets stay true.

OK for trunk? Re-tested with offloading to nvptx.

Thanks,

Julian
commit f486944db79d4c9b8f3d96453d72f34c4a9f0aab
Author: Julian Brown 
Date:   Mon Nov 5 15:51:46 2018 -0800

OpenACC reference count overhaul

2019-10-28  Julian Brown  
Thomas Schwinge  

libgomp/
* libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount
field for virtual_refcount.
(struct acc_dispatch_t): Remove data_environ field.
(struct gomp_device_descr): Update comment on openacc field.
(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
(gomp_acc_insert_pointer, gomp_acc_remove_pointer, gomp_free_memmap):
Remove prototypes.
(gomp_remove_var_async): Add prototype.
* oacc-host.c (host_dispatch): Don't initialise removed data_environ
field.
* oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var
instead of calling gomp_free_memmap.
* oacc-mem.c (lookup_dev_1): New function.
(lookup_dev): Reimplement using above.
(acc_free, acc_hostptr): Update calls to lookup_dev.
(acc_map_data): Likewise.  Don't add to data_environ list.
(acc_unmap_data): Remove call to gomp_unmap_vars.  Fix semantics to
remove mapping, but not mapped data.
(present_create_copy): Use virtual_refcount instead of
dynamic_refcount.  Don't manipulate data_environ.  Fix target pointer
return value.
(delete_copyout): Update for virtual_refcount semantics.  Use
goacc_remove_var_async for asynchronous delete/copyouts.
(gomp_acc_insert_pointer, gomp_acc_remove_pointer): Remove functions.
* oacc-parallel.c (find_pointer): Remove function.
(find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal): New functions.
(GOACC_enter_exit_data): Use goacc_enter_data_internal and
goacc_exit_data_internal helper functions.
* target.c (gomp_map_vars_internal): Handle

Re: [PATCH] OpenACC reference count overhaul

2019-10-31 Thread Thomas Schwinge
Hi Julian!

On 2019-10-29T12:15:01+, Julian Brown  wrote:
> This is a new version of the patch which hopefully addresses all review
> comments. Further commentary below.

Thanks, great, looking into that one -- I see you're removing more and
more special-case, strange code, replacing it with generic and/or
well-explained code.


Question, for my understanding:

> On Mon, 21 Oct 2019 16:14:11 +0200
> Thomas Schwinge  wrote:
>> On 2019-10-03T09:35:04-0700, Julian Brown 
>> wrote:

>> > @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h, size_t s, 
>> > int async)  
>> 
>> > -  d = tgt->to_free;  
>> 
>> > +  n = lookup_host (acc_dev, h, s);
>> > +  assert (n != NULL);
>> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h
>> > +  - n->host_start);  
>> 
>> |   return d;
>> 
>> Again, it's not obvious to me how that is semantically equivalent to
>> what we've returned before?
>
> This is a bug fix (it's mentioned in the ChangeLog).

Eh, well hidden.  Indeed that mentions:

(present_create_copy): [...] Fix target pointer
return value.

So that's not related to reference counting, needs to be discussed
separately.

..., and while I do agree that the current code is a bit "strange"
(returning 'tgt->to_free'), I couldn't quickly find or come up with a
test cases where this would actually do the wrong thing.  After all, this
is the code path taken for "not present", and 'tgt' is built anew for one
single mapping, with no alignment set (which would cause 'to_free' to
differ from 'tgt_start'); 'tgt_offset' should always be zero, and 'h'
always the same as 'host_start'.  What am I missing?  That is, given the
current set of libgomp test cases, the attached never triggeres.


Grüße
 Thomas


>From 2751a55293620a777e6a02587f36ab645b64ef0f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 31 Oct 2019 19:09:47 +0100
Subject: [PATCH] libgomp/oacc-mem.c:present_create_copy: "not present"
 assertions

---
 libgomp/oacc-mem.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 3773cdab85d..51a89e31ae5 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1,3 +1,5 @@
+#pragma GCC optimize "-O0"
+
 /* OpenACC Runtime initialization routines
 
Copyright (C) 2013-2019 Free Software Foundation, Inc.
@@ -589,6 +591,12 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
   gomp_mutex_lock (_dev->lock);
 
   d = tgt->to_free;
+  n = lookup_host (acc_dev, h, s);
+  assert (n);
+  assert (n->tgt_offset == 0);
+  assert (d == (void *) n->tgt->tgt_start);
+  assert (n->host_start == (uintptr_t) h);
+
   tgt->prev = acc_dev->openacc.data_environ;
   acc_dev->openacc.data_environ = tgt;
 
-- 
2.17.1



Re: [PATCH] OpenACC reference count overhaul

2019-10-29 Thread Julian Brown
Hi!

This is a new version of the patch which hopefully addresses all review
comments. Further commentary below.

On Mon, 21 Oct 2019 16:14:11 +0200
Thomas Schwinge  wrote:

> On 2019-10-03T09:35:04-0700, Julian Brown 
> wrote:
> > This patch has been broken out of the patch supporting OpenACC 2.6
> > manual deep copy last posted here:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html  
> 
> Thanks.
> 
> Remeber to look into  "Potential null
> pointer dereference in 'gomp_acc_remove_pointer'", which may be
> relevant here.

I've deleted the whole function (see below) so nothing to do there, I
don't think, even if that code had still been live in the last version
of the patch.

> I see you've merged in the relevant parts of my incremental patch
> '[WIP] OpenACC 2.6 manual deep copy support (attach/detach): adjust
> for "goacc_async_unmap_tgt" removal', that I included in
> ,
> which tells me that I supposedly understood that part alright.  ;-D

Yes I think so -- I'll add you as co-author to the ChangeLog. Apologies
for the omission!

> > As part of developing that patch, libgomp's OpenACC reference
> > counting implementation proved to be somewhat inconsistent,
> > especially when used in combination with the deep copy support
> > which exercises it more thoroughly.
> >
> > So, this patch contains just the changes to reference-counting
> > behaviour, for ease of (re-)review.  The other parts of OpenACC 2.6
> > manual deep copy support are forthcoming, but some changes in this
> > patch anticipate that support.  
> 
> As we're discussing these separately, please for now remove the
> changes related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving
> 'link_key' into an union (to later be shared with 'attach_count');
> <87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>.

Done (I have a plan for the link_key/attach_count fields, but it's not
in this patch, and I'm not sure how well it'll work out yet).

> > Tested with offloading to NVPTX, with good results (though a couple
> > of tests need fixing also).  
> 
> The testsuite changes we're discussing separately, and need to go in
> before this one, obviously.

Those tests no longer regress, so no testsuite changes are strictly
necessary for this patch.

> > OK for trunk?  
> 
> I haven't understood all the changes related to replacing
> 'dynamic_refcount' with 'virtual_refcount', getting rid of
> 'data_environ', the 'lookup_dev' rework, but I trust you got that
> right. In particular, these seem to remove special-case OpenACC code
> in favor of generic OMP code, which is good.

Yep -- the previous email you dug up included the following rationale:

 - reference counts in the linked memory-mapping splay tree structure
   can be self-checked for consistency using optional (i.e.
   development-only) code.  This survives a libgomp test run (with
   offloading to nvptx), so I'm reasonably confident it's good.

 - the "data_environ" field in the device descriptor -- a linear linked
   list containing a target memory descriptor for each "acc enter data"
   mapping -- has been removed.  This brings OpenACC closer to the
   OpenMP implementation for non-lexically-scoped data mapping
   (GOMP_target_enter_exit_data), and is potentially a performance win
   if lots of data is mapped in this way.

 - the semantics of the "dynamic_refcount" field in the splay_tree_key
   structure have shifted slightly, so I've renamed the field.  It now
   represents references that are excess to those represented by actual
   pointers in the linked splay tree/target-memory descriptor structure.
   That might have been the intention before in fact, but the
   implementation was inconsistent.

The big thing here is the auto-checking of refcounting behaviour. There
were quite a few corner cases that were broken before.

> A few more comments:
> 
> > --- a/libgomp/libgomp.h
> > +++ b/libgomp/libgomp.h  
> 
> >  typedef struct acc_dispatch_t
> >  {
> > -  /* This is a linked list of data mapped using the
> > - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit
> > data" pragmas.
> > - Unlike mapped_data in the goacc_thread struct, unmapping can
> > - happen out-of-order with respect to mapping.  */
> > -  /* This is guarded by the lock in the "outer" struct
> > gomp_device_descr.  */
> > -  struct target_mem_desc *data_environ;  
> 
> As mentioned before, please also accordingly update the comment
> attached to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'.

Done.

> That code:
> 
> > -/* Free address mapping tables.  MM must be locked on entry, and
> > remains locked
> > -   on return.  */
> > -
> > -attribute_hidden void
> > -gomp_free_memmap (struct splay_tree_s *mem_map)
> > -{
> > -  while (mem_map->root)
> > -{
> > -  struct target_mem_desc *tgt = mem_map->root->key.tgt;
> 

Re: [PATCH] OpenACC reference count overhaul

2019-10-23 Thread Thomas Schwinge
Hi Julian!

On 2019-10-21T16:14:11+0200, I wrote:
> On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
>> This patch has been broken out of the patch supporting OpenACC 2.6 manual
>> deep copy last posted here:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html
>
> Thanks.

I meanwhile re-discovered that an earlier submission,
,
had included some documentation/rationale for:

> I haven't understood all the changes related to replacing
> 'dynamic_refcount' with 'virtual_refcount', getting rid of
> 'data_environ', the 'lookup_dev' rework, but I trust you got that right.
> In particular, these seem to remove special-case OpenACC code in favor of
> generic OMP code, which is good.

... these changes.  Please in the future remember to refer to such
existing documentation/rationale, or again include in any re-submissions,
thanks.


>> Tested with offloading to NVPTX, with good results

I noticed that when testing with
'-foffload=x86_64-intelmicemul-linux-gnu', the x86_64-pc-linux-gnu '-m32'
multilib (but not default '-m64', huh) then reproducibly regresses:

PASS: libgomp.c/target-link-1.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/target-link-1.c execution test

..., with an un-helpful message: "offload error: process on the device 0
unexpectedly exited with code 0".

So non-OpenACC code paths seem to be negatively affected in some way?

Hopefully that'll go away when backing out the 'VREFCOUNT_LINK_KEY'
etc. changes, as discussed elsewhere.  (I can easily test patches for
you, no need for you to set up Intel MIC (emulated) offloading testing.)


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH] OpenACC reference count overhaul

2019-10-21 Thread Thomas Schwinge
Hi!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> This patch has been broken out of the patch supporting OpenACC 2.6 manual
> deep copy last posted here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html

Thanks.

Remeber to look into  "Potential null
pointer dereference in 'gomp_acc_remove_pointer'", which may be relevant
here.

I see you've merged in the relevant parts of my incremental patch '[WIP]
OpenACC 2.6 manual deep copy support (attach/detach): adjust for
"goacc_async_unmap_tgt" removal', that I included in
,
which tells me that I supposedly understood that part alright.  ;-D

> As part of developing that patch, libgomp's OpenACC reference counting
> implementation proved to be somewhat inconsistent, especially when
> used in combination with the deep copy support which exercises it
> more thoroughly.
>
> So, this patch contains just the changes to reference-counting behaviour,
> for ease of (re-)review.  The other parts of OpenACC 2.6 manual deep
> copy support are forthcoming, but some changes in this patch anticipate
> that support.

As we're discussing these separately, please for now remove the changes
related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving 'link_key'
into an union (to later be shared with 'attach_count');
<87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>.

> Tested with offloading to NVPTX, with good results (though a couple of
> tests need fixing also).

The testsuite changes we're discussing separately, and need to go in
before this one, obviously.

> OK for trunk?

I haven't understood all the changes related to replacing
'dynamic_refcount' with 'virtual_refcount', getting rid of
'data_environ', the 'lookup_dev' rework, but I trust you got that right.
In particular, these seem to remove special-case OpenACC code in favor of
generic OMP code, which is good.

A few more comments:

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h

>  typedef struct acc_dispatch_t
>  {
> -  /* This is a linked list of data mapped using the
> - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
> - Unlike mapped_data in the goacc_thread struct, unmapping can
> - happen out-of-order with respect to mapping.  */
> -  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
> -  struct target_mem_desc *data_environ;

As mentioned before, please also accordingly update the comment attached
to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'.

That code:

> -/* Free address mapping tables.  MM must be locked on entry, and remains 
> locked
> -   on return.  */
> -
> -attribute_hidden void
> -gomp_free_memmap (struct splay_tree_s *mem_map)
> -{
> -  while (mem_map->root)
> -{
> -  struct target_mem_desc *tgt = mem_map->root->key.tgt;
> -
> -  splay_tree_remove (mem_map, _map->root->key);
> -  free (tgt->array);
> -  free (tgt);
> -}
> -}

... kind-of gets inlined here:

> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d)
>  
>if (walk->dev)
>   {
> -   gomp_mutex_lock (>dev->lock);
> -   gomp_free_memmap (>dev->mem_map);
> -   gomp_mutex_unlock (>dev->lock);
> +   while (walk->dev->mem_map.root)
> + {
> +   splay_tree_key k = >dev->mem_map.root->key;
> +   gomp_remove_var (walk->dev, k);
> + }
>  
> walk->dev = NULL;
> walk->base_dev = NULL;

It's not obvious to me why it's OK to remove the locking?  Don't all
operations on the 'mem_map' have to have the device locked?

Does that code now still have the previous (and expected?) "finalize"
semantics (don't consider 'refcount', always unmap)?  (Should we assert
here that 'gomp_remove_var' always returns 'true'?  And/or, if it
doesn't, what does that mean then?)  Or am I confused?  ;-)

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -427,6 +418,7 @@ acc_unmap_data (void *h)
>  {
>struct goacc_thread *thr = goacc_thread ();
>struct gomp_device_descr *acc_dev = thr->dev;
> +  struct splay_tree_key_s cur_node;

I know it's often not the case in existing code, but when adding new
code, please move definitions next to their first use.

> @@ -438,12 +430,11 @@ acc_unmap_data (void *h)
>acc_api_info api_info;
>bool profiling_p = GOACC_PROFILING_SETUP_P (thr, _info, _info);
>  
>gomp_mutex_lock (_dev->lock);
>  
> -  splay_tree_key n = lookup_host (acc_dev, h, 1);
> -  struct target_mem_desc *t;
> +  cur_node.host_start = (uintptr_t) h;
> +  cur_node.host_end = cur_node.host_start + 1;
> +  splay_tree_key n = splay_tree_lookup (_dev->mem_map, _node);
>  
>if (!n)
>  {

Isn't this just inlining 'lookup_host'?  There may be a good reason to do
that, but what is it?

> @@ -451,47 +442,28 @@ acc_unmap_data (void *h)

> - 

Re: [PATCH] OpenACC reference count overhaul

2019-10-21 Thread Julian Brown
On Tue, 15 Oct 2019 17:30:06 +0200
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2019-10-03T09:35:04-0700, Julian Brown 
> wrote:
> > This patch has been broken out of the patch supporting OpenACC 2.6
> > manual deep copy last posted here:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html  
> 
> Thanks.
> 
> 
> > a couple of
> > tests need fixing also  
> 
> Let's look at these first, and independently.
> 
> The overall goal not being to bend test cases until they (again) work,
> but rather to verify what they're testing, so that they're valid
> OpenACC code, or if not that, then they're testing specifics of the
> GCC implementation (for example, the 'dg-shouldfail' test cases).

Indeed, the tests looked "obviously wrong", but actually none of them
should have failed with the reference-count overhaul patch. As far as I
can tell, only the context-2.c test now fails with the current og9
branch, intermittently, with the last version of the patch sent. Turns
out that was a real bug! So, good catch.

> ACK -- but do we understand why the same shouldn't be applied to the
> very similar 'libgomp.oacc-c-c++-common/context-1.c' and
> 'libgomp.oacc-c-c++-common/context-3.c', too?
> 
> I suppose your testing of the "OpenACC reference count overhaul"
> tripped over these constructs?  (Why just some, then?)

Yeah. Just blind luck, I think.

> The same pattern ('acc_copyin', 'acc_free') also appears in
> 'libgomp.oacc-c-c++-common/clauses-1.c', does that also need to be
> corrected?  Same in 'libgomp.oacc-c-c++-common/lib-13.c' (... where
> that test case actually is titled "Check acc_is_present and
> acc_delete" instead of "[...] acc_free", huh),
> 'libgomp.oacc-c-c++-common/lib-14.c',
> 'libgomp.oacc-c-c++-common/lib-18.c'.
> 
> Then, the 'acc_deviceptr', 'acc_unmap_data', 'acc_free' usage in
> 'libgomp.oacc-c-c++-common/clauses-1.c' also seems strange, as the
> respective 'acc_free' argument certainly is not (at least not
> directly) a "pointer value that was returned by a call to
> 'acc_malloc'".  Does it make sense to (continue to) support that,
> assuming that's how it's implemented internally, or should these be
> corrected to valid OpenACC, too?  Same in
> 'libgomp.oacc-c-c++-common/present-1.c'.
> 
> Same in 'libgomp.oacc-c-c++-common/clauses-2.c' (we 'dg-shouldfail'
> earlier, but the later code should otherwise be made correct anyway).
> 
> Several of these things again in
> 'libgomp.oacc-c-c++-common/nested-1.c'.

I'm not sure if *all* of those are wrong. I have a patch (forthcoming)
that fixes some of the pedantically-wrong OpenACC usage, but none of
the tests now regress with this version of the patch, so the urgency
is gone.

This version of the patch fixes the lookup_dev_1 helper function --
previously I had:

static splay_tree_key
lookup_dev_1 (splay_tree_node node, uintptr_t d, size_t s)
{
  splay_tree_key k = >key;
  struct target_mem_desc *t = k->tgt;

  if (d >= t->tgt_start && d + s <= t->tgt_end)
return k;

  if (node->left)
return lookup_dev_1 (node->left, d, s);

  if (node->right)
return lookup_dev_1 (node->right, d, s);

  return NULL;
}

which would never recurse into a right-hand branch if there was a
left-hand node! Oops. So, device-address lookups would sometimes fail
when there was a valid mapping, depending on the balance of the splay
tree. (As an aside, I think calling lookup_dev unconditionally in
several of the OpenACC API calls as we do is a bad idea -- it takes time
linear to the number of mappings, with no way to avoid that overhead.
But that's another matter.)

Re-testing shows that the previously-regressing tests no longer
regress, but I haven't yet made any changes to VREFCOUNT_LINK_KEY, etc.
as suggested in the review of the attach/detach patch:

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01374.html

OK? (ChangeLog as before.)

Julian
>From e7b21dfe343f1ca556652cc54793e0656eb95685 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 5 Nov 2018 15:51:46 -0800
Subject: [PATCH] OpenACC reference count overhaul

2019-10-02  Julian Brown  

	libgomp/
	* libgomp.h (VREFCOUNT_LINK_KEY): New macro.
	(struct splay_tree_key_s): Put link_key field into a new union.
	Substitute dynamic_refcount field for virtual_refcount.
	(struct acc_dispatch_t): Remove data_environ field.
	(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
	(gomp_acc_insert_pointer): Remove prototype.
	(gomp_acc_remove_pointer): Update prototype.
	(gomp_free_memmap): Remove prototype.
	(gomp_remove_var_async): Add prototype.
	* oacc-host.c (host_dispatch): Don't initialise removed data_environ
	field.
	* oacc-init.c (acc_shutdown_1): Use gomp_remove_var instead of
	gomp_free_memmap.
	* oacc-mem.c (lookup_dev_1): New function.
	(lookup_dev): Reimplement using above.
	(acc_free, acc_hostptr): Update calls to lookup_dev.
	(acc_map_data): Likewise.  Don't add to data_environ list.
	(acc_unmap_data): Remove call to gomp_unmap_vars.  Fix semantics to
	remove mapping, but not 

Re: [PATCH] OpenACC reference count overhaul

2019-10-15 Thread Thomas Schwinge
Hi Julian!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> This patch has been broken out of the patch supporting OpenACC 2.6 manual
> deep copy last posted here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html

Thanks.


> a couple of
> tests need fixing also

Let's look at these first, and independently.

The overall goal not being to bend test cases until they (again) work,
but rather to verify what they're testing, so that they're valid OpenACC
code, or if not that, then they're testing specifics of the GCC
implementation (for example, the 'dg-shouldfail' test cases).

>   * testsuite/libgomp.oacc-c-c++-common/context-2.c: Use correct API to
>   deallocate acc_copyin'd data.
>   * testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c

> +acc_delete (_X[0], N * sizeof (float));
> +acc_delete (_Y1[0], N * sizeof (float));
> +
>  free (h_X);
>  free (h_Y1);
>  free (h_Y2);
>  
> -acc_free (d_X);
> -acc_free (d_Y);

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c

> +acc_delete (_X[0], N * sizeof (float));
> +acc_delete (_Y1[0], N * sizeof (float));
> +
>  free (h_X);
>  free (h_Y1);
>  free (h_Y2);
>  
> -acc_free (d_X);
> -acc_free (d_Y);

ACK -- but do we understand why the same shouldn't be applied to the very
similar 'libgomp.oacc-c-c++-common/context-1.c' and
'libgomp.oacc-c-c++-common/context-3.c', too?

I suppose your testing of the "OpenACC reference count overhaul" tripped
over these constructs?  (Why just some, then?)

The same pattern ('acc_copyin', 'acc_free') also appears in
'libgomp.oacc-c-c++-common/clauses-1.c', does that also need to be
corrected?  Same in 'libgomp.oacc-c-c++-common/lib-13.c' (... where that
test case actually is titled "Check acc_is_present and acc_delete"
instead of "[...] acc_free", huh), 'libgomp.oacc-c-c++-common/lib-14.c',
'libgomp.oacc-c-c++-common/lib-18.c'.

Then, the 'acc_deviceptr', 'acc_unmap_data', 'acc_free' usage in
'libgomp.oacc-c-c++-common/clauses-1.c' also seems strange, as the
respective 'acc_free' argument certainly is not (at least not directly) a
"pointer value that was returned by a call to 'acc_malloc'".  Does it
make sense to (continue to) support that, assuming that's how it's
implemented internally, or should these be corrected to valid OpenACC,
too?  Same in 'libgomp.oacc-c-c++-common/present-1.c'.

Same in 'libgomp.oacc-c-c++-common/clauses-2.c' (we 'dg-shouldfail'
earlier, but the later code should otherwise be made correct anyway).

Several of these things again in 'libgomp.oacc-c-c++-common/nested-1.c'.

(The other 'libgomp.oacc-c-c++-common/lib-*.c' ones are correctly pairing
'acc_malloc', 'acc_free', as far as I can tell.)


> --- a/libgomp/testsuite/libgomp.oacc-fortran/data-2.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/data-2.f90

> @@ -70,10 +71,14 @@ program test
>  end do
>!$acc end parallel
>
> -  !$acc exit data copyout (d(1:N)) async
> +  !$acc exit data delete (c(1:N)) copyout (d(1:N)) async
>!$acc exit data async
>!$acc wait

ACK, but also it seems to me as if the '!$acc exit data async' (currently
"clause-less") was meant to carry the 'delete (c(1:N))' clause?

> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-additional-options "-cpp" }

> [...]

> +#if !ACC_MEM_SHARED
> +  if (acc_is_present (c) .eqv. .TRUE.) call abort
> +#endif

;-) Should be able to simplify that one to 'if (acc_is_present (c))', no?

But is that a really useful test here: don't we elsewhere have enough of
such 'acc_is_present' testing?  (That is, OK to keep that, but likewise
OK to drop that.)

And, just for background information: per PR84381, it has been suggested
to use the Fortran standard 'stop' (or was it 'error stop'?) instead of
'call abort'.  But no need to change that here individually; the libgomp
testsuite still (or, again?)  contains a lot of 'call abort'.

> +
>do i = 1, N
>  if (d(i) .ne. 4.0) call abort
>end do

..., for example, here.  ;-) (For avoidance of doubt, I'm not asking you
to change these now.)


So, please address these items first, as separate "Fix OpenACC test cases
regarding 'acc_malloc', 'acc_free' pairing", and "Fix OpenACC test case
for unstructured data regions" (or similar) commits.  If you're confident
you're doing "the obvious", feel free to commit without further review.


Grüße
 Thomas


signature.asc
Description: PGP signature