Re: [gomp4.5] Handle #pragma omp declare target link

2019-06-26 Thread Thomas Schwinge
Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin  wrote:
> Here is an updated patch [for "#pragma omp declare target link"]

..., that got committed long ago (trunk r231655), with additional changes
later on.

As has later been filed in PR81689, the test case added
"libgomp.c/target-link-1.c fails for nvptx: #pragma omp target link not
implemented".  Curious, has anybody ever looked into what's going
on/wrong?


Grüße
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-link-1.c
> @@ -0,0 +1,63 @@
> +struct S { int s, t; };
> +
> +int a = 1, b = 1;
> +double c[27];
> +struct S d = { ,  };
> +#pragma omp declare target link (a) to (b) link (c, d)
> +
> +int
> +foo (void)
> +{
> +  return a++ + b++;
> +}
> +
> +int
> +bar (int n)
> +{
> +  int *p1 = 
> +  int *p2 = 
> +  c[n] += 2.0;
> +  d.s -= 2;
> +  d.t -= 2;
> +  return *p1 + *p2 + d.s + d.t;
> +}
> +
> +#pragma omp declare target (foo, bar)
> +
> +int
> +main ()
> +{
> +  a = b = 2;
> +  d.s = 17;
> +  d.t = 18;
> +
> +  int res, n = 10;
> +  #pragma omp target map (to: a, b, c, d) map (from: res)
> +  {
> +res = foo () + foo ();
> +c[n] = 3.0;
> +res += bar (n);
> +  }
> +
> +  int shared_mem = 0;
> +  #pragma omp target map (alloc: shared_mem)
> +shared_mem = 1;
> +
> +  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
> +  || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
> +__builtin_abort ();
> +
> +  #pragma omp target enter data map (to: c)
> +  #pragma omp target update from (c)
> +  res = (int) (c[n] + 0.5);
> +  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
> +__builtin_abort ();
> +
> +  #pragma omp target map (to: a, b) map (from: res)
> +res = foo ();
> +
> +  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
> +__builtin_abort ();
> +
> +  return 0;
> +}


signature.asc
Description: PGP signature


gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link)

2015-12-16 Thread Thomas Schwinge
Hi!

On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin  wrote:
> On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > +/* This function finalizes all initialized devices.  */
> > > +
> > > +static void
> > > +gomp_target_fini (void)
> > > +{
> > > +  [...]
> > 
> > The question is what will this do if there are async target tasks still
> > running on some of the devices at this point (forgotten #pragma omp taskwait
> > or similar if target nowait regions are started outside of parallel region,
> > or exit inside of parallel, etc.  But perhaps it can be handled 
> > incrementally.
> > Also there is the question that the 
> > So I think the patch is ok with the above mentioned changes.
> 
> Here is what I've committed to trunk.

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
>} cuda;
>  } acc_dispatch_t;
>  
> +/* Various state of the accelerator device.  */
> +enum gomp_device_state
> +{
> +  GOMP_DEVICE_UNINITIALIZED,
> +  GOMP_DEVICE_INITIALIZED,
> +  GOMP_DEVICE_FINALIZED
> +};
> +
>  /* This structure describes accelerator device.
> It contains name of the corresponding libgomp plugin, function handlers 
> for
> interaction with the device, ID-number of the device, and information 
> about
> @@ -933,8 +941,10 @@ struct gomp_device_descr
>/* Mutex for the mutable data.  */
>gomp_mutex_t lock;
>  
> -  /* Set to true when device is initialized.  */
> -  bool is_initialized;
> +  /* Current state of the device.  OpenACC allows to move from INITIALIZED 
> state
> + back to UNINITIALIZED state.  OpenMP allows only to move from 
> INITIALIZED
> + to FINALIZED state (at program shutdown).  */
> +  enum gomp_device_state state;

(ACK, but I assume we'll want to make sure that an OpenACC device is
never re-initialized if we're in/after the libgomp finalization phase.)


The issue mentioned above: "exit inside of parallel" is actually a
problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
test cases now run into annoying "WARNING: program timed out".  Here is
what's happening, as I understand it: in
libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +{
> +  struct gomp_device_descr *devicep = [i];
> +  gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> + {
> +   devicep->fini_device_func (devicep->target_id);
> +   devicep->state = GOMP_DEVICE_FINALIZED;
> + }
> +  gomp_mutex_unlock (>lock);
> +}
> +}

> @@ -2387,6 +2433,9 @@ gomp_target_init (void)
>if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
>   goacc_register ([i]);
>  }
> +
> +  if (atexit (gomp_target_fini) != 0)
> +gomp_fatal ("atexit failed");
>  }

Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
atexit handler, gomp_target_fini, which, with the device lock held, will
call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
clean up.

Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
context is now in an inconsistent state, see
:

CUDA_ERROR_LAUNCH_FAILED = 719
An exception occurred on the device while executing a
kernel. Common causes include dereferencing an invalid device
pointer and accessing out of bounds shared memory. The context
cannot be used, so it must be destroyed (and a new one should be
created). All existing device memory allocations from this
context are invalid and must be reconstructed if the program is
to continue using CUDA.

Thus, any cuMemFreeHost invocations that are run during clean-up will now
also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
GOMP_PLUGIN_fatal, which again will trigger the same or another
(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
trying to lock the device again, which is still locked.

(Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
errors, should we destroy the context right away, or toggle a boolean
flag to mark it as unusable, and use that as an indication to avoid the
follow-on failures of cuMemFreeHost just described above, for example?)


tells us:

Since the behavior is undefined if the exit() function is called more
than once, portable applications calling atexit() must ensure that the
exit() 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-16 Thread Thomas Schwinge
Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin  wrote:
> [updated patch]

This regresses libgomp.oacc-c-c++-common/declare-4.c compilation for
nvptx offloading:

spawn [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ 
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/declare-4.c
 -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/ 
-B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs 
-I[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp 
-I[...]/source-gcc/libgomp/testsuite/../../include 
-I[...]/source-gcc/libgomp/testsuite/.. -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never 
-B/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B/bin 
-B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/libexec/gcc/x86_64-pc-linux-gnu/6.0.0
 -B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/bin 
-fopenacc -I[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 
-L[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -lm -o ./declare-4.exe
ptxas /tmp/ccLXqNjE.o, line 50; error   : State space mismatch between 
instruction and address in instruction 'ld'
ptxas /tmp/ccLXqNjE.o, line 50; error   : Unknown symbol 'b_linkptr'
ptxas /tmp/ccLXqNjE.o, line 50; error   : Label expected for forward 
reference of 'b_linkptr'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: 
[...]/build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit 
status
compilation terminated.

That "b_linkptr" symbol is not declared/referenced in the test case
itself, libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c:

/* { dg-do run  { target openacc_nvidia_accel_selected } } */

#include 
#include 

float b;
#pragma acc declare link (b)

#pragma acc routine
int
func (int a)
{
  b = a + 1;

  return b;
}

int
main (int argc, char **argv)
{
  float a;

  a = 2.0;

#pragma acc parallel copy (a)
  {
b = a;
a = 1.0;
a = a + b;
  }

  if (a != 3.0)
abort ();

  a = func (a);

  if (a != 4.0)
abort ();

  return 0;
}

..., but I see that the "b_linkptr" identifier is generated for "b" in
the new gcc/lto/lto.c:offload_handle_link_vars based on whether attribute
"omp declare target link" is set, so maybe we fail to set that one as
appropriate?  Jim, as the main author of the OpenACC declare
implementation, would you please have a look?  I have not yet studied in
detail the thread, starting at
,
that resulted in the trunk r231655 commit:

> gcc/c-family/
>   * c-common.c (c_common_attribute_table): Handle "omp declare target
>   link" attribute.
> gcc/
>   * cgraphunit.c (output_in_order): Do not assemble "omp declare target
>   link" variables in ACCEL_COMPILER.
>   * gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
>   "omp declare target link" variables.
>   * lto/lto.c: Include stringpool.h and fold-const.h.
>   (offload_handle_link_vars): New static function.
>   (lto_main): Call offload_handle_link_vars.
>   * omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
>   declare target link" variables.
>   (add_decls_addresses_to_decl_constructor): For "omp declare target link"
>   variables output address of the artificial pointer instead of address of
>   the variable.  Set most significant bit of the size to mark them.
>   (pass_data_omp_target_link): New pass_data.
>   (pass_omp_target_link): New class.
>   (find_link_var_op): New static function.
>   (make_pass_omp_target_link): New function.
>   * passes.def: Add pass_omp_target_link.
>   * tree-pass.h (make_pass_omp_target_link): Declare.
>   * varpool.c (symbol_table::output_variables): Do not assemble "omp
>   declare target link" variables in ACCEL_COMPILER.
> libgomp/
>   * libgomp.h (REFCOUNT_LINK): Define.
>   (struct splay_tree_key_s): Add link_key.
>   * target.c (gomp_map_vars): Treat REFCOUNT_LINK objects as not mapped.
>   Replace target address of the pointer with target address of newly
>   mapped object in the splay tree.  Set link pointer on target to the
>   device address of the mapped object.
>   (gomp_unmap_vars): Restore target address of the pointer in the splay
>   tree for REFCOUNT_LINK objects after unmapping.
>   (gomp_load_image_to_device): Set refcount to REFCOUNT_LINK for "omp
>   declare target link" objects.
>   (gomp_unload_image_from_device): Replace j with i.  Force unmap of all
>   "omp declare target link" objects, which were mapped for the image.

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-15 Thread Jakub Jelinek
On Mon, Dec 14, 2015 at 08:17:33PM +0300, Ilya Verbin wrote:
> Here is an updated patch.  Now MSB is set in both tables, and
> gomp_unload_image_from_device is changed.  I've verified using simple DSO
> testcase, that memory on target is freed after dlclose.
> bootstrap and make check on x86_64-linux passed.
> 
> gcc/c-family/
>   * c-common.c (c_common_attribute_table): Handle "omp declare target
>   link" attribute.
> gcc/
>   * cgraphunit.c (output_in_order): Do not assemble "omp declare target
>   link" variables in ACCEL_COMPILER.
>   * gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
>   "omp declare target link" variables.
>   * lto/lto.c: Include stringpool.h and fold-const.h.
>   (offload_handle_link_vars): New static function.
>   (lto_main): Call offload_handle_link_vars.

lto/ has its own ChangeLog file, so please move the entry there and remove
the lto/ prefix.

Ok with that change, thanks.

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-14 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > > This looks wrong, both of these clearly could affect anything with
> > > DECL_HAS_VALUE_EXPR_P, not just the link vars.
> > > So, if you need to handle the "omp declare target link" vars specially,
> > > you should only handle those specially and nothing else.  And please try 
> > > to
> > > explain why.
> > 
> > Actually these ifndefs are not needed, because assemble_decl never will be
> > called by accel compiler for original link vars.  I've added a check into
> > output_in_order, but missed a second place where assemble_decl is called -
> > symbol_table::output_variables.  So, fixed now.
> 
> Great.
> 
> > > Do we need to do anything in gomp_unload_image_from_device ?
> > > I mean at least in questionable programs that for link vars don't 
> > > decrement
> > > the refcount of the var that replaced the link var to 0 first before
> > > dlclosing the library.
> > > At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> > > handle it differently.  Perhaps for that case perform a lookup, and if we
> > > get something which has link_map non-NULL, first perform as if there is
> > > target exit data delete (var) on it first?
> > 
> > You're right, it doesn't deallocate memory on the device if DSO leaves 
> > nonzero
> > refcount.  And currently host compiler doesn't set MSB in host_var_table, 
> > it's
> > set only by accel compiler.  But it's possible to do splay_tree_lookup for 
> > each
> > var to determine whether is it linked or not, like in the patch bellow.
> > Or do you prefer to set the bit in host compiler too?  It requires
> > lookup_attribute ("omp declare target link") for all vars in the table 
> > during
> > compilation, but allows to do splay_tree_lookup at run-time only for vars 
> > with
> > MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device 
> > works
> > only for DSO, but it crashed when an executable leaves nonzero refcount, 
> > because
> > target device may be already uninitialized from plugin's __run_exit_handlers
> > (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> > Is it possible do add some atexit (...) to libgomp, which will set 
> > shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
> you perform the same action as would delete(var) do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in host table too is useful, so if you could
> handle that, it would be nice.  And splay_tree_lookup only if the MSB is
> set.
> So,
> if (!host_data_has_msb_set)
>   splay_tree_remove (>mem_map, );
> else
>   {
> splay_tree_key n = splay_tree_lookup (>mem_map, );
> if (n->link_key)
> {
>   n->refcount = 0;
>   n->link_key = NULL;
>   splay_tree_remove (>mem_map, n);
>   if (n->tgt->refcount > 1)
> n->tgt->refcount--;
>   else
> gomp_unmap_tgt (n->tgt);
> }
>   else
> splay_tree_remove (>mem_map, n);
>   }
> or so.

Here is an updated patch.  Now MSB is set in both tables, and
gomp_unload_image_from_device is changed.  I've verified using simple DSO
testcase, that memory on target is freed after dlclose.
bootstrap and make check on x86_64-linux passed.


gcc/c-family/
* c-common.c (c_common_attribute_table): Handle "omp declare target
link" attribute.
gcc/
* cgraphunit.c (output_in_order): Do not assemble "omp declare target
link" variables in ACCEL_COMPILER.
* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
"omp declare target link" variables.
* lto/lto.c: Include stringpool.h and fold-const.h.
(offload_handle_link_vars): New static function.
(lto_main): Call offload_handle_link_vars.
* omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
declare target link" variables.
(add_decls_addresses_to_decl_constructor): For "omp declare target link"
variables output address of the artificial pointer instead of address of
the variable.  Set most significant bit of the size to mark them.
(pass_data_omp_target_link): New pass_data.
(pass_omp_target_link): New class.
(find_link_var_op): New static function.
(make_pass_omp_target_link): New function.
* passes.def: Add pass_omp_target_link.
* tree-pass.h (make_pass_omp_target_link): Declare.
* varpool.c (symbol_table::output_variables): Do not assemble "omp
declare target link" variables in ACCEL_COMPILER.
libgomp/
* libgomp.h (REFCOUNT_LINK): Define.
(struct splay_tree_key_s): Add link_key.
* target.c (gomp_map_vars): Treat 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-14 Thread Ilya Verbin
On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, 
> > size_t mapnum,
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.

Done.

> > +  return NULL;
> > +}
> >  
> >for (i = 0; i < mapnum; i++)
> >  {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
> > do_copyfrom)
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> > +  return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.

Done.

> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.

This code can be executed only at program shutdown, so I think that removing
from the splay tree isn't necessary here, it will only consume time.
Besides, we do not remove at shutdown those vars, which have non-zero refcount.

> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = [i];
> > +  gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +   {
> > + devicep->fini_device_func (devicep->target_id);
> > + devicep->state = GOMP_DEVICE_FINALIZED;
> > +   }
> > +  gomp_mutex_unlock (>lock);
> > +}
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.

Here is what I've committed to trunk.


libgomp/
* libgomp.h (gomp_device_state): New enum.
(struct gomp_device_descr): Replace is_initialized with state.
(gomp_fini_device): Remove declaration.
* oacc-host.c (host_dispatch): Use state instead of is_initialized.
* oacc-init.c (acc_init_1): Use state instead of is_initialized.
(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
(acc_set_device_type): Use state instead of is_initialized.
(acc_set_device_num): Likewise.
* target.c (resolve_device): Use state instead of is_initialized.
Do not initialize finalized device.
(gomp_map_vars): Do nothing if device is finalized.
(gomp_unmap_vars): Likewise.
(gomp_update): Likewise.
(GOMP_offload_register_ver): Use state instead of is_initialized.
(GOMP_offload_unregister_ver): Likewise.
(gomp_init_device): Likewise.
(gomp_unload_device): Likewise.
(gomp_fini_device): Remove.
(gomp_get_target_fn_addr): Do nothing if device is finalized.
(GOMP_target): Go to host fallback if device is finalized.
(GOMP_target_ext): Likewise.
(gomp_exit_data): Do nothing if device is finalized.
(gomp_target_task_fn): Go to host fallback if device is finalized.
(gomp_target_fini): New static function.
(gomp_target_init): Use state instead of is_initialized.
Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED 
state
+ back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+ to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-11 Thread Ilya Verbin
On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> > @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
> >  {
> >struct gomp_device_descr *acc_dev = _dev[i];
> >gomp_mutex_lock (_dev->lock);
> > -  if (acc_dev->is_initialized)
> > +  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
> >  {
> >   devices_active = true;
> > - gomp_fini_device (acc_dev);
> > + acc_dev->fini_device_func (acc_dev->target_id);
> > + acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
> > }
> >gomp_mutex_unlock (_dev->lock);
> >  }
> 
> I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
> but I'd leave that to the OpenACC folks to do that incrementally
> once they test it and/or decide what to do.

libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c contains a call to acc_init,
next acc_shutdown, and acc_init again, so I guess that OpenACC allows to
initialize the device again after acc_shutdown, but GOMP_DEVICE_FINALIZED means
that it's terminally finalized.

> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, 
> > size_t mapnum,
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.
> 
> > +  return NULL;
> > +}
> >  
> >for (i = 0; i < mapnum; i++)
> >  {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
> > do_copyfrom)
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> > +  return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.
> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.
> 
> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = [i];
> > +  gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +   {
> > + devicep->fini_device_func (devicep->target_id);
> > + devicep->state = GOMP_DEVICE_FINALIZED;
> > +   }
> > +  gomp_mutex_unlock (>lock);
> > +}
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.
> 
> What is the state of the link clause implementation patch?  Does it depend
> on this?

It's ready, but it depends on this.  I will retest and resend "link" patch after
checking-in "init/fini" patch.

  -- Ilya


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-11 Thread Jakub Jelinek
On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
>  {
>struct gomp_device_descr *acc_dev = _dev[i];
>gomp_mutex_lock (_dev->lock);
> -  if (acc_dev->is_initialized)
> +  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
>  {
> devices_active = true;
> -   gomp_fini_device (acc_dev);
> +   acc_dev->fini_device_func (acc_dev->target_id);
> +   acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
>   }
>gomp_mutex_unlock (_dev->lock);
>  }

I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
but I'd leave that to the OpenACC folks to do that incrementally
once they test it and/or decide what to do.

> @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
> mapnum,
>  }
>  
>gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +{
> +  gomp_mutex_unlock (>lock);

You need to free (tgt); here I think to avoid leaking memory.

> +  return NULL;
> +}
>  
>for (i = 0; i < mapnum; i++)
>  {
> @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
> do_copyfrom)
>  }
>  
>gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +{
> +  gomp_mutex_unlock (>lock);
> +  return;

Supposedly you want at least free (tgt->array); free (tgt); here.
Plus the question is if the mappings shouldn't be removed from the splay tree
before that.

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +{
> +  struct gomp_device_descr *devicep = [i];
> +  gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> + {
> +   devicep->fini_device_func (devicep->target_id);
> +   devicep->state = GOMP_DEVICE_FINALIZED;
> + }
> +  gomp_mutex_unlock (>lock);
> +}
> +}

The question is what will this do if there are async target tasks still
running on some of the devices at this point (forgotten #pragma omp taskwait
or similar if target nowait regions are started outside of parallel region,
or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
Also there is the question that the 
So I think the patch is ok with the above mentioned changes.

What is the state of the link clause implementation patch?  Does it depend
on this?

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-08 Thread Ilya Verbin
On Tue, Dec 01, 2015 at 20:05:04 +0100, Jakub Jelinek wrote:
> This is racy, tsan would tell you so.
> Instead of a global var, I'd just change the devicep->is_initialized 
> field from bool into a 3 state field (perhaps enum), with states
> uninitialized, initialized, finalized, and then say in resolve_device,
> 
>   gomp_mutex_lock ([device_id].lock);
>   if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
> gomp_init_device ([device_id]);
>   else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
> {
>   gomp_mutex_unlock ([device_id].lock);
>   return NULL;
> }
>   gomp_mutex_unlock ([device_id].lock);
> 
> Though, of course, that is incomplete, because resolve_device takes one
> lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
> So I think either we want to rewrite the locking, such that say
> resolve_device returns a locked device and then you perform stuff on the
> locked device (disadvantage is that gomp_map_vars will call gomp_malloc
> with the lock held, which can take some time to allocate the memory),
> or there needs to be the possibility that gomp_map_vars rechecks if the
> device has not been finalized after taking the lock and returns to the
> caller if the device has been finalized in between resolve_device and
> gomp_map_vars.

This patch implements the second approach.  Is it OK?
Bootstrap and make check-target-libgomp passed.


libgomp/
* libgomp.h (gomp_device_state): New enum.
(struct gomp_device_descr): Replace is_initialized with state.
(gomp_fini_device): Remove declaration.
* oacc-host.c (host_dispatch): Use state instead of is_initialized.
* oacc-init.c (acc_init_1): Use state instead of is_initialized.
(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
(acc_set_device_type): Use state instead of is_initialized.
(acc_set_device_num): Likewise.
* target.c (resolve_device): Use state instead of is_initialized.
Do not initialize finalized device.
(gomp_map_vars): Do nothing if device is finalized.
(gomp_unmap_vars): Likewise.
(gomp_update): Likewise.
(GOMP_offload_register_ver): Use state instead of is_initialized.
(GOMP_offload_unregister_ver): Likewise.
(gomp_init_device): Likewise.
(gomp_unload_device): Likewise.
(gomp_fini_device): Remove.
(gomp_get_target_fn_addr): Do nothing if device is finalized.
(GOMP_target): Go to host fallback if device is finalized.
(GOMP_target_ext): Likewise.
(gomp_exit_data): Do nothing if device is finalized.
(gomp_target_task_fn): Go to host fallback if device is finalized.
(gomp_target_fini): New static function.
(gomp_target_init): Use state instead of is_initialized.
Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED 
state
+ back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+ to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
@@ -962,7 +972,6 @@ extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
-extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 9874804..d289b38 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -222,7 +222,7 @@ static struct gomp_device_descr host_dispatch =
 
 .mem_map = { NULL },
 /* .lock initilized in goacc_host_init.  */
-.is_initialized = false,
+.state = 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin
On Tue, Dec 01, 2015 at 14:15:59 +0100, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> > > On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> > >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> > >> Ok, but it doesn't solve the issue with doing it for the executable, 
> > >> because
> > >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized 
> > >> device.
> > > 
> > > ?? You mean that the
> > > devicep->unload_image_func (devicep->target_id, version, target_data);
> > > call deinitializes the device or something else (I mean, if there is some
> > > other tgt, then it had to be initialized)?
> > 
> > No, I mean that it can be deinitialized from plugin's __run_exit_handlers 
> > (see my last mail with the patch).
> 
> Then the bug is that you have too many atexit registered handlers that
> perform some finalization, better would be to have a single one that
> performs everything in order.
> 
> Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
> intelmic plugin) to set some flag and ignore free_func calls when the flag
> is set or something like that.
> 
> Note library destructors can also use OpenMP code in them, similarly C++
> dtors etc., so when you at some point finalize certain device, you should
> arrange for newer events on the device to be ignored and new offloadings to
> go to host fallback.

So I guess the decision to do host fallback should be made in resolve_device,
rather than in plugins (in free_func and all others).  Is this patch OK?
make check-target-libgomp pass both using emul and hw, offloading from dlopened
libs also works fine.


libgomp/
* target.c (finalized): New static variable.
(resolve_device): Do nothing when finalized is true.
(GOMP_offload_register_ver): Likewise.
(GOMP_offload_unregister_ver): Likewise.
(gomp_target_fini): New static function.
(gomp_target_init): Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..320178e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -78,6 +78,10 @@ static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* True when offloading runtime is finalized.  */
+static bool finalized;
+
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -108,6 +112,9 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device_id)
 {
+  if (finalized)
+return NULL;
+
   if (device_id == GOMP_DEVICE_ICV)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -1095,6 +1102,9 @@ GOMP_offload_register_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
 gomp_fatal ("Library too old for offload (version %u < %u)",
GOMP_VERSION, GOMP_VERSION_LIB (version));
@@ -1143,6 +1153,9 @@ GOMP_offload_unregister_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   gomp_mutex_lock (_lock);
 
   /* Unload image from all initialized devices.  */
@@ -2282,6 +2295,24 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
*device,
   return 0;
 }
 
+/* This function finalizes the runtime needed for offloading and all 
initialized
+   devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  finalized = true;
+
+  int i;
+  for (i = 0; i < num_devices; i++)
+{
+  struct gomp_device_descr *devicep = [i];
+  gomp_mutex_lock (>lock);
+  gomp_fini_device (devicep);
+  gomp_mutex_unlock (>lock);
+}
+}
+
 /* This function initializes the runtime needed for offloading.
It parses the list of offload targets and tries to load the plugins for
these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2387,6 +2418,9 @@ gomp_target_init (void)
   if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
goacc_register ([i]);
 }
+
+  if (atexit (gomp_target_fini) != 0)
+gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint64_t line, int device, 
const char *name,
 }
 
 static void
-unregister_main_image ()
-{
-  __offload_unregister_image (_target_image);
-}
-
-static void
 register_main_image ()
 {
   /* Do not check the return value, because old 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 08:29:27PM +0300, Ilya Verbin wrote:
> libgomp/
>   * target.c (finalized): New static variable.
>   (resolve_device): Do nothing when finalized is true.
>   (GOMP_offload_register_ver): Likewise.
>   (GOMP_offload_unregister_ver): Likewise.
>   (gomp_target_fini): New static function.
>   (gomp_target_init): Call gomp_target_fini at exit.
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
>   (register_main_image): Do not call unregister_main_image at exit.
>   (GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.
> 
> diff --git a/libgomp/target.c b/libgomp/target.c
> index cf9d0e6..320178e 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -78,6 +78,10 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* True when offloading runtime is finalized.  */
> +static bool finalized;


> +
> +
>  /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
>  
>  static void *
> @@ -108,6 +112,9 @@ gomp_get_num_devices (void)
>  static struct gomp_device_descr *
>  resolve_device (int device_id)
>  {
> +  if (finalized)
> +return NULL;
> +

This is racy, tsan would tell you so.
Instead of a global var, I'd just change the devicep->is_initialized 
field from bool into a 3 state field (perhaps enum), with states
uninitialized, initialized, finalized, and then say in resolve_device,

  gomp_mutex_lock ([device_id].lock);
  if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
gomp_init_device ([device_id]);
  else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
{
  gomp_mutex_unlock ([device_id].lock);
  return NULL;
}
  gomp_mutex_unlock ([device_id].lock);

Though, of course, that is incomplete, because resolve_device takes one
lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
So I think either we want to rewrite the locking, such that say
resolve_device returns a locked device and then you perform stuff on the
locked device (disadvantage is that gomp_map_vars will call gomp_malloc
with the lock held, which can take some time to allocate the memory),
or there needs to be the possibility that gomp_map_vars rechecks if the
device has not been finalized after taking the lock and returns to the
caller if the device has been finalized in between resolve_device and
gomp_map_vars.

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin

> On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> 
>> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
>> Ok, but it doesn't solve the issue with doing it for the executable, because
>> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> 
> ?? You mean that the
> devicep->unload_image_func (devicep->target_id, version, target_data);
> call deinitializes the device or something else (I mean, if there is some
> other tgt, then it had to be initialized)?

No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see 
my last mail with the patch).

  -- Ilya

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> Ok, but it doesn't solve the issue with doing it for the executable, because
> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

?? You mean that the
devicep->unload_image_func (devicep->target_id, version, target_data);
call deinitializes the device or something else (I mean, if there is some
other tgt, then it had to be initialized)?
If it is just that order, I wonder if you can't just move the
unload_image_func call after the splay_tree_remove loops (or even after the
node freeing call).

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> 
> > On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> > 
> >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> >> Ok, but it doesn't solve the issue with doing it for the executable, 
> >> because
> >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> > 
> > ?? You mean that the
> > devicep->unload_image_func (devicep->target_id, version, target_data);
> > call deinitializes the device or something else (I mean, if there is some
> > other tgt, then it had to be initialized)?
> 
> No, I mean that it can be deinitialized from plugin's __run_exit_handlers 
> (see my last mail with the patch).

Then the bug is that you have too many atexit registered handlers that
perform some finalization, better would be to have a single one that
performs everything in order.

Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
intelmic plugin) to set some flag and ignore free_func calls when the flag
is set or something like that.

Note library destructors can also use OpenMP code in them, similarly C++
dtors etc., so when you at some point finalize certain device, you should
arrange for newer events on the device to be ignored and new offloadings to
go to host fallback.

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 13:04:59 +0100, Jakub Jelinek wrote:
> On Fri, Nov 27, 2015 at 07:50:09PM +0300, Ilya Verbin wrote:
> > + /* Most significant bit of the size marks such vars.  */
> > + unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > + isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> 
> That supposedly should be BITS_PER_UNIT instead of 8.

Fixed.

> > diff --git a/gcc/varpool.c b/gcc/varpool.c
> > index 36f19a6..cbd1e05 100644
> > --- a/gcc/varpool.c
> > +++ b/gcc/varpool.c
> > @@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
> >   are not real variables, but just info for debugging and codegen.
> >   Unfortunately at the moment emutls is not updating varpool correctly
> >   after turning real vars into value_expr vars.  */
> > +#ifndef ACCEL_COMPILER
> >if (DECL_HAS_VALUE_EXPR_P (decl)
> >&& !targetm.have_tls)
> >  return false;
> > +#endif
> >  
> >/* Hard register vars do not need to be output.  */
> >if (DECL_HARD_REGISTER (decl))
> >  return false;
> >  
> > +#ifndef ACCEL_COMPILER
> >gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
> >&& TREE_CODE (decl) == VAR_DECL
> >&& !DECL_HAS_VALUE_EXPR_P (decl));
> > +#endif
> 
> This looks wrong, both of these clearly could affect anything with
> DECL_HAS_VALUE_EXPR_P, not just the link vars.
> So, if you need to handle the "omp declare target link" vars specially,
> you should only handle those specially and nothing else.  And please try to
> explain why.

Actually these ifndefs are not needed, because assemble_decl never will be
called by accel compiler for original link vars.  I've added a check into
output_in_order, but missed a second place where assemble_decl is called -
symbol_table::output_variables.  So, fixed now.

> > @@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr 
> > *devicep, unsigned version,
> >for (i = 0; i < num_vars; i++)
> >  {
> >struct addr_pair *target_var = _table[num_funcs + i];
> > -  if (target_var->end - target_var->start
> > - != (uintptr_t) host_var_table[i * 2 + 1])
> > +  uintptr_t target_size = target_var->end - target_var->start;
> > +
> > +  /* Most significant bit of the size marks "omp declare target link"
> > +variables.  */
> > +  bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));
> 
> __CHAR_BIT__ here instead of 8?

Fixed.

> > @@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
> > *devicep, unsigned version,
> >k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
> >k->tgt = tgt;
> >k->tgt_offset = target_var->start;
> > -  k->refcount = REFCOUNT_INFINITY;
> > +  k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
> >k->async_refcount = 0;
> >array->left = NULL;
> >array->right = NULL;
> 
> Do we need to do anything in gomp_unload_image_from_device ?
> I mean at least in questionable programs that for link vars don't decrement
> the refcount of the var that replaced the link var to 0 first before
> dlclosing the library.
> At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> handle it differently.  Perhaps for that case perform a lookup, and if we
> get something which has link_map non-NULL, first perform as if there is
> target exit data delete (var) on it first?

You're right, it doesn't deallocate memory on the device if DSO leaves nonzero
refcount.  And currently host compiler doesn't set MSB in host_var_table, it's
set only by accel compiler.  But it's possible to do splay_tree_lookup for each
var to determine whether is it linked or not, like in the patch bellow.
Or do you prefer to set the bit in host compiler too?  It requires
lookup_attribute ("omp declare target link") for all vars in the table during
compilation, but allows to do splay_tree_lookup at run-time only for vars with
MSB set in host_var_table.
Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
only for DSO, but it crashed when an executable leaves nonzero refcount, because
target device may be already uninitialized from plugin's __run_exit_handlers
(and it is in case of intelmic), so gomp_exit_data cannot run free_func.
Is it possible do add some atexit (...) to libgomp, which will set shutting_down
flag, and just do nothing in gomp_unload_image_from_device if it is set?


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 369574f..b73caa1 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -822,6 +822,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_simd_attribute, false },
   { "omp declare target", 0, 0, true, false, false,
  handle_omp_declare_target_attribute, false },
+  { "omp declare target link", 0, 0, true, false, false,
+ 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > This looks wrong, both of these clearly could affect anything with
> > DECL_HAS_VALUE_EXPR_P, not just the link vars.
> > So, if you need to handle the "omp declare target link" vars specially,
> > you should only handle those specially and nothing else.  And please try to
> > explain why.
> 
> Actually these ifndefs are not needed, because assemble_decl never will be
> called by accel compiler for original link vars.  I've added a check into
> output_in_order, but missed a second place where assemble_decl is called -
> symbol_table::output_variables.  So, fixed now.

Great.

> > Do we need to do anything in gomp_unload_image_from_device ?
> > I mean at least in questionable programs that for link vars don't decrement
> > the refcount of the var that replaced the link var to 0 first before
> > dlclosing the library.
> > At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> > handle it differently.  Perhaps for that case perform a lookup, and if we
> > get something which has link_map non-NULL, first perform as if there is
> > target exit data delete (var) on it first?
> 
> You're right, it doesn't deallocate memory on the device if DSO leaves nonzero
> refcount.  And currently host compiler doesn't set MSB in host_var_table, it's
> set only by accel compiler.  But it's possible to do splay_tree_lookup for 
> each
> var to determine whether is it linked or not, like in the patch bellow.
> Or do you prefer to set the bit in host compiler too?  It requires
> lookup_attribute ("omp declare target link") for all vars in the table during
> compilation, but allows to do splay_tree_lookup at run-time only for vars with
> MSB set in host_var_table.
> Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
> only for DSO, but it crashed when an executable leaves nonzero refcount, 
> because
> target device may be already uninitialized from plugin's __run_exit_handlers
> (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> Is it possible do add some atexit (...) to libgomp, which will set 
> shutting_down
> flag, and just do nothing in gomp_unload_image_from_device if it is set?

Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
you perform the same action as would delete(var) do in that case.
Calling gomp_exit_data e.g. looks it up again etc.
Supposedly having the MSB in host table too is useful, so if you could
handle that, it would be nice.  And splay_tree_lookup only if the MSB is
set.
So,
if (!host_data_has_msb_set)
  splay_tree_remove (>mem_map, );
else
  {
splay_tree_key n = splay_tree_lookup (>mem_map, );
if (n->link_key)
  {
n->refcount = 0;
n->link_key = NULL;
splay_tree_remove (>mem_map, n);
if (n->tgt->refcount > 1)
  n->tgt->refcount--;
else
  gomp_unmap_tgt (n->tgt);
  }
else
  splay_tree_remove (>mem_map, n);
  }
or so.

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > You're right, it doesn't deallocate memory on the device if DSO leaves 
> > nonzero
> > refcount.  And currently host compiler doesn't set MSB in host_var_table, 
> > it's
> > set only by accel compiler.  But it's possible to do splay_tree_lookup for 
> > each
> > var to determine whether is it linked or not, like in the patch bellow.
> > Or do you prefer to set the bit in host compiler too?  It requires
> > lookup_attribute ("omp declare target link") for all vars in the table 
> > during
> > compilation, but allows to do splay_tree_lookup at run-time only for vars 
> > with
> > MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device 
> > works
> > only for DSO, but it crashed when an executable leaves nonzero refcount, 
> > because
> > target device may be already uninitialized from plugin's __run_exit_handlers
> > (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> > Is it possible do add some atexit (...) to libgomp, which will set 
> > shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
> you perform the same action as would delete(var) do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in host table too is useful, so if you could
> handle that, it would be nice.  And splay_tree_lookup only if the MSB is
> set.
> So,
> if (!host_data_has_msb_set)
>   splay_tree_remove (>mem_map, );
> else
>   {
> splay_tree_key n = splay_tree_lookup (>mem_map, );
> if (n->link_key)
> {
>   n->refcount = 0;
>   n->link_key = NULL;
>   splay_tree_remove (>mem_map, n);
>   if (n->tgt->refcount > 1)
> n->tgt->refcount--;
>   else
> gomp_unmap_tgt (n->tgt);
> }
>   else
> splay_tree_remove (>mem_map, n);
>   }
> or so.

Ok, but it doesn't solve the issue with doing it for the executable, because
gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

  -- Ilya


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Jakub Jelinek
On Fri, Nov 27, 2015 at 07:50:09PM +0300, Ilya Verbin wrote:
> On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> > On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context 
> > > *ctx)
> > > decl = OMP_CLAUSE_DECL (c);
> > > /* Global variables with "omp declare target" attribute
> > >don't need to be copied, the receiver side will use them
> > > -  directly.  */
> > > +  directly.  However, global variables with "omp declare target link"
> > > +  attribute need to be copied.  */
> > > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> > > && DECL_P (decl)
> > > && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context 
> > > *ctx)
> > >  != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> > > || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> > > && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > > -   && varpool_node::get_create (decl)->offloadable)
> > > +   && varpool_node::get_create (decl)->offloadable
> > > +   && !lookup_attribute ("omp declare target link",
> > > + DECL_ATTRIBUTES (decl)))
> > 
> > I wonder if Honza/Richi wouldn't prefer to have this info also
> > in cgraph, instead of looking up the attribute in each case.
> 
> So should I add a new flag into cgraph?
> Also it is used in gimplify_adjust_omp_clauses.

Richi said on IRC that lookup_attribute is ok, so let's keep it that way for
now.

> +   /* Most significant bit of the size marks such vars.  */
> +   unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> +   isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);

That supposedly should be BITS_PER_UNIT instead of 8.

> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 36f19a6..cbd1e05 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
>   are not real variables, but just info for debugging and codegen.
>   Unfortunately at the moment emutls is not updating varpool correctly
>   after turning real vars into value_expr vars.  */
> +#ifndef ACCEL_COMPILER
>if (DECL_HAS_VALUE_EXPR_P (decl)
>&& !targetm.have_tls)
>  return false;
> +#endif
>  
>/* Hard register vars do not need to be output.  */
>if (DECL_HARD_REGISTER (decl))
>  return false;
>  
> +#ifndef ACCEL_COMPILER
>gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
>  && TREE_CODE (decl) == VAR_DECL
>  && !DECL_HAS_VALUE_EXPR_P (decl));
> +#endif

This looks wrong, both of these clearly could affect anything with
DECL_HAS_VALUE_EXPR_P, not just the link vars.
So, if you need to handle the "omp declare target link" vars specially,
you should only handle those specially and nothing else.  And please try to
explain why.

> @@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr 
> *devicep, unsigned version,
>for (i = 0; i < num_vars; i++)
>  {
>struct addr_pair *target_var = _table[num_funcs + i];
> -  if (target_var->end - target_var->start
> -   != (uintptr_t) host_var_table[i * 2 + 1])
> +  uintptr_t target_size = target_var->end - target_var->start;
> +
> +  /* Most significant bit of the size marks "omp declare target link"
> +  variables.  */
> +  bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));

__CHAR_BIT__ here instead of 8?

> @@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
> *devicep, unsigned version,
>k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
>k->tgt = tgt;
>k->tgt_offset = target_var->start;
> -  k->refcount = REFCOUNT_INFINITY;
> +  k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
>k->async_refcount = 0;
>array->left = NULL;
>array->right = NULL;

Do we need to do anything in gomp_unload_image_from_device ?
I mean at least in questionable programs that for link vars don't decrement
the refcount of the var that replaced the link var to 0 first before
dlclosing the library.
At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
handle it differently.  Perhaps for that case perform a lookup, and if we
get something which has link_map non-NULL, first perform as if there is
target exit data delete (var) on it first?

Jakub


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-27 Thread Ilya Verbin
On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >   decl = OMP_CLAUSE_DECL (c);
> >   /* Global variables with "omp declare target" attribute
> >  don't need to be copied, the receiver side will use them
> > -directly.  */
> > +directly.  However, global variables with "omp declare target link"
> > +attribute need to be copied.  */
> >   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> >   && DECL_P (decl)
> >   && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >!= GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> >   || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> >   && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > - && varpool_node::get_create (decl)->offloadable)
> > + && varpool_node::get_create (decl)->offloadable
> > + && !lookup_attribute ("omp declare target link",
> > +   DECL_ATTRIBUTES (decl)))
> 
> I wonder if Honza/Richi wouldn't prefer to have this info also
> in cgraph, instead of looking up the attribute in each case.

So should I add a new flag into cgraph?
Also it is used in gimplify_adjust_omp_clauses.

> > +  if (var.link_ptr_decl == NULL_TREE)
> > +   addr = build_fold_addr_expr (var.decl);
> > +  else
> > +   {
> > + /* For "omp declare target link" var use address of the pointer
> > +instead of address of the var.  */
> > + addr = build_fold_addr_expr (var.link_ptr_decl);
> > + /* Most significant bit of the size marks such vars.  */
> > + unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > + isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> > + size = wide_int_to_tree (const_ptr_type_node, isize);
> > +
> > + /* FIXME: Remove varpool node of var?  */
> 
> There is varpool_node::remove (), but not sure if at this point all the
> references are already gone.

Actually removing varpool node here will not remove var from the target code, so
I've added a check in cgraphunit.c before assemble_decl ().

> > +class pass_omp_target_link : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_omp_target_link (gcc::context *ctxt)
> > +: gimple_opt_pass (pass_data_omp_target_link, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *fun)
> > +{
> > +#ifdef ACCEL_COMPILER
> > +  /* FIXME: Replace globals in target regions too or not?  */
> > +  return lookup_attribute ("omp declare target",
> > +  DECL_ATTRIBUTES (fun->decl));
> 
> Certainly in "omp declare target entrypoint" regions too.

Done.

> > +unsigned
> > +pass_omp_target_link::execute (function *fun)
> > +{
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, fun)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> > +   {
> > + unsigned i;
> > + gimple *stmt = gsi_stmt (gsi);
> > + for (i = 0; i < gimple_num_ops (stmt); i++)
> > +   {
> > + tree op = gimple_op (stmt, i);
> > + tree var = NULL_TREE;
> > +
> > + if (!op)
> > +   continue;
> > + if (TREE_CODE (op) == VAR_DECL)
> > +   var = op;
> > + else if (TREE_CODE (op) == ADDR_EXPR)
> > +   {
> > + tree op1 = TREE_OPERAND (op, 0);
> > + if (TREE_CODE (op1) == VAR_DECL)
> > +   var = op1;
> > +   }
> > + /* FIXME: Support arrays.  What else?  */
> 
> We need to support all the references to the variables.
> So, I think this approach is not right.
> 
> > +
> > + if (var && lookup_attribute ("omp declare target link",
> > +  DECL_ATTRIBUTES (var)))
> > +   {
> > + tree type = TREE_TYPE (var);
> > + tree ptype = build_pointer_type (type);
> > +
> > + /* Find var in offload table.  */
> > + omp_offload_var *table_entry = NULL;
> > + for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
> > +   if ((*offload_vars)[j].decl == var)
> > + {
> > +   table_entry = &(*offload_vars)[j];
> > +   break;
> > + }
> 
> Plus this would be terribly expensive if there are many variables in
> offload_vars.
> So, what I think should be done instead is that you first somewhere, perhaps
> when streaming in the decls from LTO in ACCEL_COMPILER or so, create
> the artificial link ptr variables for the "omp declare target link"
> global vars and
>   SET_DECL_VALUE_EXPR (var, build_simple_mem_ref (link_ptr_var));
>   DECL_HAS_VALUE_EXPR_P (var) = 1;
> and then in this pass just 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-19 Thread Jakub Jelinek
On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> Here is WIP patch, not for check-in.  There are still many FIXMEs, which I am
> going to resolve, however target-link-1.c testcase pass.
> Is this approach correct?  Any comments on FIXMEs?
> 
> 
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 23d0107..58771c0 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
> g->have_offload = true;
> if (is_a  (node))
>   {
> -   vec_safe_push (offload_vars, t);
> +   omp_offload_var var;
> +   var.decl = t;
> +   var.link_ptr_decl = NULL_TREE;
> +   vec_safe_push (offload_vars, var);
> node->force_output = 1;
>   }

Another possible approach would be to keep offload_vars as
vector of trees, and simply push 2 trees in each case.
Or not to change this at all, see below.

> @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> decl = OMP_CLAUSE_DECL (c);
> /* Global variables with "omp declare target" attribute
>don't need to be copied, the receiver side will use them
> -  directly.  */
> +  directly.  However, global variables with "omp declare target link"
> +  attribute need to be copied.  */
> if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> && DECL_P (decl)
> && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -   && varpool_node::get_create (decl)->offloadable)
> +   && varpool_node::get_create (decl)->offloadable
> +   && !lookup_attribute ("omp declare target link",
> + DECL_ATTRIBUTES (decl)))

I wonder if Honza/Richi wouldn't prefer to have this info also
in cgraph, instead of looking up the attribute in each case.

> +  if (var.link_ptr_decl == NULL_TREE)
> + addr = build_fold_addr_expr (var.decl);
> +  else
> + {
> +   /* For "omp declare target link" var use address of the pointer
> +  instead of address of the var.  */
> +   addr = build_fold_addr_expr (var.link_ptr_decl);
> +   /* Most significant bit of the size marks such vars.  */
> +   unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> +   isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> +   size = wide_int_to_tree (const_ptr_type_node, isize);
> +
> +   /* FIXME: Remove varpool node of var?  */

There is varpool_node::remove (), but not sure if at this point all the
references are already gone.

> +class pass_omp_target_link : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_target_link (gcc::context *ctxt)
> +: gimple_opt_pass (pass_data_omp_target_link, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +{
> +#ifdef ACCEL_COMPILER
> +  /* FIXME: Replace globals in target regions too or not?  */
> +  return lookup_attribute ("omp declare target",
> +DECL_ATTRIBUTES (fun->decl));

Certainly in "omp declare target entrypoint" regions too.

> +unsigned
> +pass_omp_target_link::execute (function *fun)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +{
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> + {
> +   unsigned i;
> +   gimple *stmt = gsi_stmt (gsi);
> +   for (i = 0; i < gimple_num_ops (stmt); i++)
> + {
> +   tree op = gimple_op (stmt, i);
> +   tree var = NULL_TREE;
> +
> +   if (!op)
> + continue;
> +   if (TREE_CODE (op) == VAR_DECL)
> + var = op;
> +   else if (TREE_CODE (op) == ADDR_EXPR)
> + {
> +   tree op1 = TREE_OPERAND (op, 0);
> +   if (TREE_CODE (op1) == VAR_DECL)
> + var = op1;
> + }
> +   /* FIXME: Support arrays.  What else?  */

We need to support all the references to the variables.
So, I think this approach is not right.

> +
> +   if (var && lookup_attribute ("omp declare target link",
> +DECL_ATTRIBUTES (var)))
> + {
> +   tree type = TREE_TYPE (var);
> +   tree ptype = build_pointer_type (type);
> +
> +   /* Find var in offload table.  */
> +   omp_offload_var *table_entry = NULL;
> +   for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
> + if ((*offload_vars)[j].decl == var)
> +   {
> + 

[gomp4.5] Handle #pragma omp declare target link

2015-11-16 Thread Ilya Verbin
Hi!

On Mon, Oct 26, 2015 at 20:49:40 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > > Without declare target link or to, you can't use the global variables
> > > in orphaned accelerated routines (unless you e.g. take the address of the
> > > mapped variable in the region and pass it around).
> > > The to variables (non-deferred) are always mapped and are initialized with
> > > the original initializer, refcount is infinity.  link (deferred) work more
> > > like the normal mapping, referencing those vars when they aren't 
> > > explicitly
> > > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped 
> > > freshly
> > > with to kind, it gets the current value of the host var rather than the
> > > original one.  But, beyond the mapping the compiler needs to ensure that
> > > all uses of the link global var (or perhaps just all uses of the link 
> > > global
> > > var outside of the target construct body where it is mapped, because you
> > > could use there the pointer you got from GOMP_target) are replaced by
> > > dereference of some artificial pointer, so a becomes *a_tmp and  becomes
> > > &*a_tmp, and that the runtime library during registration of the tables is
> > > told about the address of this artificial pointer.  During registration,
> > > I'd expect it would stick an entry for this range into the table, with 
> > > some
> > > special flag or something similar, indicating that it is deferred mapping
> > > and where the offloading device pointer is.  During mapping, it would map 
> > > it
> > > as any other not yet mapped object, but additionally would also set this
> > > device pointer to the device address of the mapped object.  We also need 
> > > to
> > > ensure that when we drop the refcount of that mapping back to 0, we get it
> > > back to the state where it is described as a range with registered 
> > > deferred
> > > mapping and where the device pointer is.
> > 
> > Ok, got it, I'll try implement this...
> 
> Thanks.
> 
> > > > > we actually replace the variables with pointers to variables, then 
> > > > > need
> > > > > to somehow also mark those in the offloading tables, so that the 
> > > > > library
> > > > 
> > > > I see 2 possible options: use the MSB of the size, or introduce the 
> > > > third field
> > > > for flags.
> > > 
> > > Well, it can be either recorded in the host variable tables (which contain
> > > address and size pair, right), or in corresponding offloading device table
> > > (which contains the pointer, something else?).
> > 
> > It contains a size too, which is checked in libgomp:
> >   gomp_fatal ("Can't map target variables (size mismatch)");
> > Yes, we can remove this check, and use second field in device table for 
> > flags.
> 
> Yeah, or e.g. just use MSB of that size (so check that either the size is
> the same (then it is target to) or it is MSB | size (then it is target link).
> Objects larger than half of the address space aren't really supportable
> anyway.

Here is WIP patch, not for check-in.  There are still many FIXMEs, which I am
going to resolve, however target-link-1.c testcase pass.
Is this approach correct?  Any comments on FIXMEs?


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 23d0107..58771c0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d1f4970..b890f6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34999,7 +34999,10 @@ cp_parser_omp_declare_target (cp_parser *parser, 
cp_token *pragma_tok)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 67a9024..878a9c5 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1106,7 +1106,7 @@ output_offload_tables (void)
   streamer_write_enum (ob->main_stream, LTO_symtab_tags,
   LTO_symtab_last_tag, LTO_symtab_variable);
   lto_output_var_decl_index (ob->decl_state, ob->main_stream,
-(*offload_vars)[i]);
+(*offload_vars)[i].decl);
 }
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1902,7 +1902,10 @@ input_offload_tables (void)