[og7] Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2018-05-20 Thread Thomas Schwinge
Hi!

(This whole idea/patch still needs an overall re-work, as discussed, but
here is a small incremental improvement/bug fix.)

On Thu, 20 Aug 2015 22:52:58 +, Joseph Myers  
wrote:
> On Tue, 18 Aug 2015, Thomas Schwinge wrote:
> > [...] here is my current messy WIP patch [...]

> +/* List of offload targets, separated by colon.  Defaults to the list
> +   determined when configuring libgomp.  */
> +static const char *gomp_offload_targets = OFFLOAD_TARGETS;
> +static bool gomp_offload_targets_init = false;
> +
> +/* Override the list of offload targets.  This must be called early, and only
> +   once.  */
> +
> +void
> +GOMP_set_offload_targets (const char *offload_targets)
> +{
> +  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
> +
> +  /* Make sure this gets called early.  */
> +  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
> +  /* Make sure this only gets called once.  */
> +  assert (!gomp_offload_targets_init);
> +  gomp_offload_targets_init = true;
> +  gomp_offload_targets = offload_targets;
> +}

This will obviously fail as soon as there are shared libraries involved,
compiled for offloading, which contain additional
GOMP_set_offload_targets constructor calls.  Thus pushed to
openacc-gcc-7-branch:

commit 917e247055a37f912129ed545719182de0046adb
Author: Thomas Schwinge 
Date:   Sun May 20 21:31:01 2018 +0200

[PR81886] Avoid "GOMP_set_offload_targets: Assertion 
`!gomp_offload_targets_init' failed"

PR libgomp/81886
* openacc.h (enum acc_device_t): Add _acc_device_intel_mic,
_acc_device_hsa.
* oacc-init.c (get_openacc_name): Handle these.
(resolve_device): Debugging output.
* target.c (resolve_device, gomp_init_device)
(gomp_offload_target_available_p): Likewise.
(GOMP_set_offload_targets): Rewrite.
* testsuite/libgomp.oacc-c++/c++.exp: Provide offload target in
"-DACC_DEVICE_TYPE_host", and "-DACC_DEVICE_TYPE_nvidia".
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
* testsuite/libgomp.oacc-c/offload-targets-1.c: New file.
* testsuite/libgomp.oacc-c/offload-targets-2.c: Likewise.
* testsuite/libgomp.oacc-c/offload-targets-3.c: Likewise.
* testsuite/libgomp.oacc-c/offload-targets-4.c: Likewise.
* testsuite/libgomp.oacc-c/offload-targets-5.c: Likewise.
* testsuite/libgomp.oacc-c/offload-targets-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
---
 libgomp/ChangeLog.openacc  |  34 
 libgomp/oacc-init.c|   7 +
 libgomp/openacc.h  |   2 +
 libgomp/target.c   | 178 +++--
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |   4 +-
 .../libgomp.oacc-c-c++-common/acc-on-device-2.c|   2 +-
 .../libgomp.oacc-c-c++-common/acc_on_device-1.c|   4 +-
 .../libgomp.oacc-c-c++-common/pr85381-2.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-4.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-5.c  |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/pr85381.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c  |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/pr85486.c  |   3 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp |   4 +-
 .../testsuite/libgomp.oacc-c/offload-targets-1.c   | 119 ++
 .../testsuite/libgomp.oacc-c/offload-targets-2.c   |   2 +
 .../testsuite/libgomp.oacc-c/offload-targets-3.c   |  10 ++
 .../testsuite/libgomp.oacc-c/offload-targets-4.c   |  11 ++
 .../testsuite/libgomp.oacc-c/offload-targets-5.c   |  10 ++
 .../testsuite/libgomp.oacc-c/offload-targets-6.c   |  11 ++
 

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Bernd Schmidt

On 10/20/2015 12:02 PM, Jakub Jelinek wrote:

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?


Last time around I think I asked for some minor changes, like updated 
documentation for give_switch. Other than that, I'm ok with the patch 
iff you are happy with the overall approach.



Bernd




Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > How's the following (complete patch instead of incremental patch; the
> > driver changes are still the same as before)?  The changes are:
> > 
> >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> >   * libgomp/target.c:resolve_device and
> > libgomp/oacc-init.c:resolve_device verify that a default device
> > (OpenMP device-var ICV, and acc_device_default, respectively) is
> > actually enabled, or resort to host fallback if not.
> >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > to enable devices specified by -foffload.  Can be called multiple
> > times (executable, any shared libraries); the set of enabled devices
> > is the union of all those ever requested.
> >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > changed to enable all devices.  This is to maintain compatibility
> > with old executables and shared libraries built without the -foffload
> > constructor support.

Any reason not to pass the bitmask of the enabled targets to
GOMP_offload_register_ver instead, to decrease the amount of ctors and
the times you lock the various locks during initialization, or just enable
automatically the devices you load data for during GOMP_offload_register_ver?
I mean, GOMP_offload_register would enable for compatibility all devices,
GOMP_offload_register_ver would enable the device it is registered for.
For -foffload=disable on all shared libraries/binaries, naturally you would
not register anything, thus would not enable any devices (only host fallback
would work).

Or are you worried about the case where one shared library is compiled
with say -foffload=intelmic,ptx but doesn't actually contain any
#pragma omp target/#pragma omp declare target (or OpenACC similar
#directives), but only contains #pragma omp target data and/or the device
query/copying routines, then dlopens some other shared library that actually
has the offloading device code?
That could be solved by adding the call you are talking about, but
if we really should care about that unlikely case, it would be better to
only arrange for it if really needed by the shared library (i.e. if it calls
one of the OpenMP or OpenACC library routines that talk to the devices, or
has #pragma omp target data or similar constructs;
I'd strongly prefer not to have constructors in code that just got compiled
with -fopenmp, even in configuration where some offloading is configured by
default, when nothing in the code really cares about offloading.

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -401,6 +401,8 @@ static const char 
> *compare_debug_auxbase_opt_spec_function (int, const char **);
>  static const char *pass_through_libs_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (int, const char **);
>  static const char *greater_than_spec_func (int, const char **);
> +static const char *add_omp_infile_spec_func (int, const char **);
> +
>  static char *convert_white_space (char *);
>  
>  /* The Specs Language

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?

> diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
> index 24fbb94..5da4fa7 100644
> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -48,7 +48,8 @@ enum offload_target_type
>OFFLOAD_TARGET_TYPE_HOST = 2,
>/* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
>OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> +  OFFLOAD_TARGET_TYPE_HWM

What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

> diff --git a/libgomp/target.c b/libgomp/target.c
> index b767410..df51bfb 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -72,6 +72,9 @@ static int num_offload_images;
>  /* Array of descriptors for all available devices.  */
>  static struct gomp_device_descr *devices;
>  
> +/* Set of enabled devices.  */
> +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

I must say I don't like the locking for this.
If all you ever change on this is that you change it from 0 to 1,
then supposedly just storing it with __atomic_store, perhaps with
rel semantics, and reading it as __atomic_load, with acquire semantics,
would be good enough?  And perhaps change it into int array,
so that it is actually atomic even on the old Alphas (if there are any
around).

Jakub


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Thomas Schwinge
Hi Jakub!

Thanks for the review.

On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinek  wrote:
> On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > > How's the following (complete patch instead of incremental patch; the
> > > driver changes are still the same as before)?  The changes are:
> > > 
> > >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> > >   * libgomp/target.c:resolve_device and
> > > libgomp/oacc-init.c:resolve_device verify that a default device
> > > (OpenMP device-var ICV, and acc_device_default, respectively) is
> > > actually enabled, or resort to host fallback if not.
> > >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > > to enable devices specified by -foffload.  Can be called multiple
> > > times (executable, any shared libraries); the set of enabled devices
> > > is the union of all those ever requested.
> > >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > > changed to enable all devices.  This is to maintain compatibility
> > > with old executables and shared libraries built without the -foffload
> > > constructor support.
> 
> Any reason not to pass the bitmask of the enabled targets to
> GOMP_offload_register_ver instead, to decrease the amount of ctors and
> the times you lock the various locks during initialization, or just enable
> automatically the devices you load data for during GOMP_offload_register_ver?
> I mean, GOMP_offload_register would enable for compatibility all devices,
> GOMP_offload_register_ver would enable the device it is registered for.
> For -foffload=disable on all shared libraries/binaries, naturally you would
> not register anything, thus would not enable any devices (only host fallback
> would work).

As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

#include 
int main()
{
  __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
  return 0;
}

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?

Always creating (dummy) GOMP_offload_register_ver constructors has been
another suggestion that I had voiced much earlier in this thread (months
ago), but everyone (including me) taking part in the discussion agreed
that it'd cause even higher compile-time overhead.

> Or are you worried about the case where one shared library is compiled
> with say -foffload=intelmic,ptx but doesn't actually contain any
> #pragma omp target/#pragma omp declare target (or OpenACC similar
> #directives), but only contains #pragma omp target data and/or the device
> query/copying routines, then dlopens some other shared library that actually
> has the offloading device code?

That's another example, yes.

> That could be solved by adding the call you are talking about, but
> if we really should care about that unlikely case, it would be better to
> only arrange for it if really needed by the shared library (i.e. if it calls
> one of the OpenMP or OpenACC library routines that talk to the devices, or
> has #pragma omp target data or similar constructs;
> I'd strongly prefer not to have constructors in code that just got compiled
> with -fopenmp, even in configuration where some offloading is configured by
> default, when nothing in the code really cares about offloading.

So, how to resolve our different opinions?  I mean, for any serious
program code, there will be constructor calls into libgomp already; are
you expecting that adding one more really will cause any noticeable
overhead?

I agree that enabling devices for GOMP_offload_register_ver calls makes
sense.  (I indeed had considered this earlier, but it didn't lead to
solving the problem complete -- see above.)  Can we come up with a scheme
to do it this way, and only generate the GOMP_enable_offload_targets
constructor of no GOMP_offload_register_ver constructors have been
generated?  But I have no idea how to implement that in a non-convoluted
way.  (And, it sounds excessive to me in terms of implementation overhead
on our side, in contrast to execution overhead of one libgomp constructor
call.)

> > --- a/gcc/gcc.c
> > +++ b/gcc/gcc.c
> > @@ -401,6 +401,8 @@ static const char 
> > *compare_debug_auxbase_opt_spec_function (int, const char **);
> >  static const char *pass_through_libs_spec_func (int, const char **);
> >  static const char *replace_extension_spec_func (int, const char **);
> >  static 

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote:
> Always creating (dummy) GOMP_offload_register_ver constructors has been
> another suggestion that I had voiced much earlier in this thread (months
> ago), but everyone (including me) taking part in the discussion agreed
> that it'd cause even higher compile-time overhead.

I'd prefer to just set a flag like "force creation of the GOMP offloading
sections" whenever you see one of the APIs or constructs used in the TU,
and if that flag is set, even when there are no offloaded vars or
functions/kernels, force creation of the corresponding data sections.
Either it can be stardard offloading LTO sections, just not containing
anything, or, if you want to improve compile-time, it could be special too,
so that the linker plugin can quickly identify those that only need
offloading support, but don't have any offloaded vars or code.
But that can certainly be done as an incremental optimization.

For OpenMP that would be whenever
#pragma omp target{, data, enter data, exit data} construct is seen
(e.g. during gimplification or OMP region nesting checking even better),
or for

omp_set_default_device
omp_get_default_device
omp_get_num_devices
omp_is_initial_device
omp_get_initial_device
omp_target_alloc
omp_target_free
omp_target_is_present
omp_target_memcpy
omp_target_memcpy_rect
omp_target_associate_ptr
omp_target_disassociate_ptr

calls.  Guess for OpenACC you have similar set of calls.
The thing is, while OpenACC is standard is pretty much solely about offloading,
OpenMP is not, and in many cases programs just use host OpenMP parallelization
(at least right now, I bet such programs are significantly larger set
than programs that use OpenACC or OpenMP offloading together).
Distributions and others will eventually configure the compilers they are
shipping to enable the offloading, and if that forces a constructor to every
TU or even every shared library just because it has been compiled with
-fopenmp, it is unacceptable overhead.

For the vendor shipped binary compilers, I'm envisioning ideal would be to
be able to configure gcc for many offloading targets, then build such main
compiler and offloading target compilers, but package them separately (one
package (or set of packages) the base compiler, and then another package (or
set of them) for each offloading target.  What the -foffload= actually will
be in the end from the linked shared library or binary POV would depend both
on the configured offloading target, but also on whether the mkoffload
binaries are found (or whatever else is needed first from the offloading
target).  That would mean that we'd not issue hard error or any kind of
diagnostics if mkoffload is missing.  Is that acceptable, or should that
e.g. be limited just to the compiled in configure default (i.e. explicit
-foffload= would error if the requested mkoffload is missing, default
-foffload= would silently skip unavailable ones; I guess this would be my
preference), or should we have two ways of configuring the offloading
targets, as hard requirements and as optional support?

> So, how to resolve our different opinions?  I mean, for any serious
> program code, there will be constructor calls into libgomp already; are
> you expecting that adding one more really will cause any noticeable
> overhead?

See above, that is really not the case.  Most of OpenMP code doesn't have
any constructor calls into libgomp at all, the only exception is
GOMP_offload_register{,_ver} at this point.

> > What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?
> 
> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

*_LAST or *_last is actually what we use pretty much everywhere, see e.g.
lots of places in tree-core.h.

> Are you worried about the performance issues of a very short locking
> cycle that in the majority of all cases should happen without blocking,
> in comparison to performance issues related to host/device memory
> transfers or kernel launches that will follow after the call to
> gomp_offload_target_enabled_p?  I don't really think that is reasonable
> to worry about.

Yes, I'm worried about that.  The lock could be contended, and if you take
the lock many times for each construct, it can show up, I'm worried about
cache effects etc.  It is already bad enough that we take/release the locks
for the same device e.g. in each of:
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);

  struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, 

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Bernd Schmidt

On 10/20/2015 01:17 PM, Thomas Schwinge wrote:


As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

 #include 
 int main()
 {
   __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
   return 0;
 }

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?


Both of you can ignore me if you feel I'm not making sense, but what 
exactly is the use case for -foffload=disable? Isn't it slightly 
redundant with -fno-openacc? IMO it's not an option that alters the 
available devices, that's a question that is answered at run-time and 
doesn't (or shouldn't) really depend on compiler switches. As a user I'd 
expect -foffload=disable to just prevent generation of offloaded code 
for the things I'm compiling. As Jakub pointed out, shared libraries may 
still contain other pieces that are offloadable.


I guess I don't fully understand why you want to go to great lengths to 
disable devices at run-time based on a compile-time switch. What's the 
reasoning here?



Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
told me this means "High Water Mark".  I have no strong opinion on the
name to use, just want to mention that "*_LAST" sounds to me like that
one still is part of the accepted set, whereas in this case it'd be the
first enumerator outside of the accepted ones.  (And I guess, we agree
that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
"OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
ugly?)


Nah, just rename HWM to LAST, that's fairly common usage I think.


Bernd


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote:
> Both of you can ignore me if you feel I'm not making sense, but what exactly
> is the use case for -foffload=disable? Isn't it slightly redundant with
> -fno-openacc? IMO it's not an option that alters the available devices,
> that's a question that is answered at run-time and doesn't (or shouldn't)
> really depend on compiler switches. As a user I'd expect -foffload=disable
> to just prevent generation of offloaded code for the things I'm compiling.
> As Jakub pointed out, shared libraries may still contain other pieces that
> are offloadable.
> 
> I guess I don't fully understand why you want to go to great lengths to
> disable devices at run-time based on a compile-time switch. What's the
> reasoning here?

At least for OpenMP, I'm also happy with what we do now (except for the
ability to configure offloading targets as optional, i.e. dynamically
configure the default based on what packages user install rather than
just on how it has been configured, so that e.g. just because it has been
configured for PTX offloading the host GCC itself doesn't have to have a
dependency on the proprietary CUDA stuff in any way).
I believe in OpenMP nobody says that if the device HW is available, but user
chose to not compile offloading code/variables for that particular device
that it can't show up among omp_get_num_devices ().  And I think it is
entirely fine if say target data map succeeds to that device, but then
target is offloaded, if that is caused by users configure or command line
choice.  Maybe OpenACC has different requirements, is it required to
terminate the program if it can't fulfill the requested offloading?

In any case, I'm fine with something I've noted in the last mail, or with
the status quo, but not with running constructors in TUs or even shared
libraries just because they have been compiled with -fopenmp (and either
haven't used any OpenMP code at all, or just the non-*target* directives).

Jakub


Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-19 Thread Thomas Schwinge
Hi!

Ping...

On Wed, 30 Sep 2015 17:54:07 +0200, I wrote:
> On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek  wrote:
> > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  
> > > wrote:
> > > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > > construct[ors] of all shared libraries (and the binary) that contain 
> > > > offloaded
> > > > code?  If yes, that is surely going to fail the assertions in there.
> > > 
> > > Indeed.  My original plan has been to generate/invoke this constructor
> > > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implemented this correctly.
> > 
> > How would you mean to implement it?
> 
> I have come to realize that we need to generate/invoke this constructor
> From everything that links against libgomp (which is what I implemented),
> that is, executables as well as shared libraries.
> 
> > -fopenmp or -fopenacc code with
> > offloading bits might not be in the final executable at all, nor in shared
> > libraries it is linked against; such libraries could be only dlopened,
> > consider say python plugin.  And this is not just made up, perhaps not with
> > offloading yet, but people regularly use OpenMP code in plugins and then we
> > get complains that fork child of the main program is not allowed to do
> > anything but async-signal-safe functions.
> 
> I'm not sure I'm completely understanding that paragraph?  Are you saying
> that offloaded code can be in libraries that are not linked against
> libgomp?  How would these register (GOMP_offload_register) their
> offloaded code?  I think it's a reasonable to expect that every shared
> library that contains offloaded code must link against libgomp, which
> will happen automatically given that it is built with -fopenmp/-fopenacc?
> 
> > > > You can dlopen such libraries etc.  What if you link one library with
> > > > -fopenmp=nvptx-none and another one with 
> > > > -fopenmp=x86_64-intelmicemul-linux?
> > > 
> > > So, the first question to answer is: what do we expect to happen in this
> > > case, or similarly, if the executable and any shared libraries are
> > > compiled with different/incompatible -foffload options?
> > 
> > As the device numbers are per-process, the only possibility I see is that
> > all the physically available devices are always available, and just if you
> > try to offload from some code to a device that doesn't support it, you get
> > host fallback.  Because, one shared library could carefully use device(xyz)
> > to offload to say XeonPhi it is compiled for and supports, and another
> > library device(abc) to offload to PTX it is compiled for and supports.
> 
> OK, I think I get that, and it makes sense.  Even though, I don't know
> how you'd do that today: as far as I can tell, there is no specification
> covering the OpenMP 4 target device IDs, so I have no idea how a user
> program/library could realiably use them in practice?  For example, in
> the current GCC implementation, the OpenMP 4 target device IDs depend on
> the number of individual devices availble in the system, and the order in
> which libgomp loads the plugins, which is defined (arbitrarily) by the
> GCC configuration?
> 
> > > For this, I propose that the only mode of operation that we currently can
> > > support is that all of the executable and any shared libraries agree on
> > > the offload targets specified by -foffload, and I thus propose the
> > > following patch on top of what Joseph has posted before (passes the
> > > testsuite, but not yet tested otherwise):
> > 
> > See above, no.
> 
> OK.
> 
> How's the following (complete patch instead of incremental patch; the
> driver changes are still the same as before)?  The changes are:
> 
>   * libgomp/target.c:gomp_target_init again loads all the plugins.
>   * libgomp/target.c:resolve_device and
> libgomp/oacc-init.c:resolve_device verify that a default device
> (OpenMP device-var ICV, and acc_device_default, respectively) is
> actually enabled, or resort to host fallback if not.
>   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> to enable devices specified by -foffload.  Can be called multiple
> times (executable, any shared libraries); the set of enabled devices
> is the union of all those ever requested.
>   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> changed to enable all devices.  This is to maintain compatibility
> with old executables and shared libraries built without the -foffload
> constructor support.
>   * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
> of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
> GOMP_offload_unregister.  To avoid enabling all devices
> (GOMP_offload_register).
>   * New test cases to verify this (-foffload=disable, host 

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-25 Thread Joseph Myers
On reviewing in more detail the changes to pass offloading targets
from the driver to libgomp at link time to identify the minimal
self-contained pieces that can go to trunk, I found that the use of
fnmatch to match against target names was completely unnecessary; the
ISO C90 functions strstr and strncmp could be used instead, so
avoiding the need to add configure tests for fnmatch.  This patch duly
removes the use of and configure tests for fnmatch.

Will commit to gomp-4_0-branch subject to test results.

2015-08-25  Joseph Myers  jos...@codesourcery.com

* plugin/configfrag.ac: Don't test for fnmatch.h or fnmatch.
* configure, config.h.in: Regenerate.
* target.c [PLUGIN_SUPPORT]: Don't include fnmatch.h.
(offload_target_to_plugin_name): Use strstr and strncmp instead of
fnmatch.

Index: libgomp/config.h.in
===
--- libgomp/config.h.in (revision 227169)
+++ libgomp/config.h.in (working copy)
@@ -24,12 +24,6 @@
 /* Define to 1 if you have the dlfcn.h header file. */
 #undef HAVE_DLFCN_H
 
-/* Define to 1 if you have the `fnmatch' function. */
-#undef HAVE_FNMATCH
-
-/* Define to 1 if you have the fnmatch.h header file. */
-#undef HAVE_FNMATCH_H
-
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
Index: libgomp/target.c
===
--- libgomp/target.c(revision 227169)
+++ libgomp/target.c(working copy)
@@ -41,7 +41,6 @@
 
 #ifdef PLUGIN_SUPPORT
 #include dlfcn.h
-#include fnmatch.h
 #include plugin-suffix.h
 #endif
 
@@ -1271,9 +1270,9 @@
 static const char *
 offload_target_to_plugin_name (const char *offload_target)
 {
-  if (fnmatch (*-intelmic*, offload_target, 0) == 0)
+  if (strstr (offload_target, -intelmic) != NULL)
 return intelmic;
-  if (fnmatch (nvptx*, offload_target, 0) == 0)
+  if (strncmp (offload_target, nvptx, 5) == 0)
 return nvptx;
   gomp_fatal (Unknown offload target: %s, offload_target);
 }
Index: libgomp/configure
===
--- libgomp/configure   (revision 227169)
+++ libgomp/configure   (working copy)
@@ -15119,33 +15119,6 @@
 offload_targets=
 
 plugin_support=yes
-for ac_header in fnmatch.h
-do :
-  ac_fn_c_check_header_mongrel $LINENO fnmatch.h ac_cv_header_fnmatch_h 
$ac_includes_default
-if test x$ac_cv_header_fnmatch_h = xyes; then :
-  cat confdefs.h _ACEOF
-#define HAVE_FNMATCH_H 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-
-done
-
-for ac_func in fnmatch
-do :
-  ac_fn_c_check_func $LINENO fnmatch ac_cv_func_fnmatch
-if test x$ac_cv_func_fnmatch = xyes; then :
-  cat confdefs.h _ACEOF
-#define HAVE_FNMATCH 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-done
-
 { $as_echo $as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl 5
 $as_echo_n checking for dlsym in -ldl...  6; }
 if test ${ac_cv_lib_dl_dlsym+set} = set; then :
Index: libgomp/plugin/configfrag.ac
===
--- libgomp/plugin/configfrag.ac(revision 227169)
+++ libgomp/plugin/configfrag.ac(working copy)
@@ -29,8 +29,6 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
-AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
-AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x$plugin_support = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-24 Thread Joseph Myers
On Fri, 21 Aug 2015, Joseph Myers wrote:

 On Fri, 21 Aug 2015, Nathan Sidwell wrote:
 
  this appears to cause an ICE in add_omp_infile_spec_func at;
gcc_assert (offload_targets != NULL);
  
  when you use something like -foffload='-save-temps -v -fdump-rtl-all
  -fdump-tree-all -fno-verbose-asm'
  
  Is that use ill-formed?
 
 I'll need to reverse-engineer the question of what's a well-formed 
 -foffload= option (bug 67300 filed yesterday for the lack of any 
 documentation of that option).

Although there is no documentation for the -foffload options in the
manual, I found something at https://gcc.gnu.org/wiki/Offloading
that I hope is current.

It turns out the problem wasn't in the assertion, but in how a default
-foffload option was generated.  Generating it via specs meant that if
the only -foffload option specified options without specifying a
target (i.e., options applicable to all the configured offload
targets), then the offload_targets variable was never set and so the
assertion failure resulted (as well as OFFLOAD_TARGET_NAMES not being
exported).  Rather than trying to make the specs produce something if
no -foffload=* options other than -foffload=-* options were passed,
I'm testing this patch to default the offload targets after the
original command line is processed (and before extra options from
these specs are processed, so before the assertion is executed), and
will commit it if tests are OK.

2015-08-24  Joseph Myers  jos...@codesourcery.com

* gcc.c (driver_self_specs) [ENABLE_OFFLOADING]: Don't generate a
-foffload option.
(process_command): Call handle_foffload_option (OFFLOAD_TARGETS)
if no offload target specified.

Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 227045)
+++ gcc/gcc.c   (working copy)
@@ -1064,9 +1064,6 @@ static const char *const multilib_defaults_raw[] =
 static const char *const driver_self_specs[] = {
   %{fdump-final-insns:-fdump-final-insns=.} %fdump-final-insns,
 #ifdef ENABLE_OFFLOADING
-  /* If the user didn't specify any, default to all configured offload
- targets.  */
-  %{!foffload=*:-foffload= OFFLOAD_TARGETS },
   /* If linking against libgomp, add a setup file.  */
   %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1): \
   %:add-omp-infile()},
@@ -4291,6 +4288,11 @@ process_command (unsigned int decoded_options_coun
   CL_DRIVER, handlers, global_dc);
 }
 
+  /* If the user didn't specify any, default to all configured offload
+ targets.  */
+  if (offload_targets == NULL)
+handle_foffload_option (OFFLOAD_TARGETS);
+
   if (output_file
strcmp (output_file, -) != 0
strcmp (output_file, HOST_BIT_BUCKET) != 0)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-24 Thread Joseph Myers
On Mon, 24 Aug 2015, Joseph Myers wrote:

 I'm testing this patch to default the offload targets after the
 original command line is processed (and before extra options from
 these specs are processed, so before the assertion is executed), and
 will commit it if tests are OK.

Now committed to gomp-4_0-branch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-24 Thread Nathan Sidwell

On 08/24/15 18:22, Joseph Myers wrote:

On Mon, 24 Aug 2015, Joseph Myers wrote:


I'm testing this patch to default the offload targets after the
original command line is processed (and before extra options from
these specs are processed, so before the assertion is executed), and
will commit it if tests are OK.


Now committed to gomp-4_0-branch.


thanks!



Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-21 Thread Nathan Sidwell

On 08/20/15 18:52, Joseph Myers wrote:

On Tue, 18 Aug 2015, Thomas Schwinge wrote:



This is what I've committed to gomp-4_0-branch, with the driver changes
substantially cleaned up and smaller changes to the other bits of the
patch.

gcc:
2015-08-20  Thomas Schwinge  tho...@codesourcery.com
Joseph Myers  jos...@codesourcery.com

* doc/invoke.texi (-ffixed-@var{reg}): Document conflict with
Fortran options.
* gcc.c (offload_targets): Update comment.
(add_omp_infile_spec_func, spec_lang_mask_accept): New.
(driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use
%:add-omp-infile().
(static_spec_functions): Add add-omp-infile.
(struct switchstr): Add lang_mask field.  Expand comment.
(struct infile): Add lang_mask field.
(add_infile, save_switch, do_spec): Add lang_mask argument.
(driver_unknown_option_callback, driver_wrong_lang_callback)
(driver_handle_option, process_command, do_self_spec)
(driver::do_spec_on_infiles): All callers changed.
(give_switch): Check languages of switch against
spec_lang_mask_accept.
(driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate
targets variable.
* gcc.h (do_spec): Update prototype.


this appears to cause an ICE in add_omp_infile_spec_func at;
  gcc_assert (offload_targets != NULL);

when you use something like -foffload='-save-temps -v -fdump-rtl-all 
-fdump-tree-all -fno-verbose-asm'


Is that use ill-formed?

nathan


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-21 Thread Joseph Myers
On Fri, 21 Aug 2015, Nathan Sidwell wrote:

 this appears to cause an ICE in add_omp_infile_spec_func at;
   gcc_assert (offload_targets != NULL);
 
 when you use something like -foffload='-save-temps -v -fdump-rtl-all
 -fdump-tree-all -fno-verbose-asm'
 
 Is that use ill-formed?

I'll need to reverse-engineer the question of what's a well-formed 
-foffload= option (bug 67300 filed yesterday for the lack of any 
documentation of that option).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-20 Thread Joseph Myers
On Tue, 18 Aug 2015, Thomas Schwinge wrote:

 So, back to modifying the driver; here is my current messy WIP patch with
 still a lot of TODOs in it -- but it appears to work at last.  :-)
 
 Maybe somebody else is able to continue with that task while I'm out of
 office.  This has been developed on top of gomp-4_0-branch r226832.  I'm
 also attaching a tarball of the even more messy indivdual patches,
 foffload.tar.bz2, in case there's anything to salvage in there, or if
 that helps to understand the development options/history.  Earlier
 messages in this thread should give enough context what this is about,
 http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E.

This is what I've committed to gomp-4_0-branch, with the driver changes 
substantially cleaned up and smaller changes to the other bits of the 
patch.

gcc:
2015-08-20  Thomas Schwinge  tho...@codesourcery.com
Joseph Myers  jos...@codesourcery.com

* doc/invoke.texi (-ffixed-@var{reg}): Document conflict with
Fortran options.
* gcc.c (offload_targets): Update comment.
(add_omp_infile_spec_func, spec_lang_mask_accept): New.
(driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use
%:add-omp-infile().
(static_spec_functions): Add add-omp-infile.
(struct switchstr): Add lang_mask field.  Expand comment.
(struct infile): Add lang_mask field.
(add_infile, save_switch, do_spec): Add lang_mask argument.
(driver_unknown_option_callback, driver_wrong_lang_callback)
(driver_handle_option, process_command, do_self_spec)
(driver::do_spec_on_infiles): All callers changed.
(give_switch): Check languages of switch against
spec_lang_mask_accept.
(driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate
targets variable.
* gcc.h (do_spec): Update prototype.

fortran:
2015-08-20  Joseph Myers  jos...@codesourcery.com

* gfortranspec.c (lang_specific_pre_link): Update call to do_spec.

java:
2015-08-20  Joseph Myers  jos...@codesourcery.com

* jvspec.c (lang_specific_pre_link): Update call to do_spec.

libgomp:
2015-08-20  Thomas Schwinge  tho...@codesourcery.com
Joseph Myers  jos...@codesourcery.com

* plugin/configfrag.ac (fnmatch.h): Check for header.
(fnmatch): Check for function.
(tgt_name): Do not set.
(offload_targets): Separate with colons not commas.
* config.h.in, configure: Regenerate.
* env.c (initialize_env): Make static.  Remove TODO.
* libgomp.h (gomp_offload_target_available_p): New prototype.
* libgomp.map (GOACC_2.0.GOMP_4_BRANCH): Add
GOMP_set_offload_targets.
(INTERNAL): Remove.
* libgomp_g.h (GOMP_set_offload_targets): New prototype.
* oacc-init.c (resolve_device): Do not handle acc_device_host.
Add comments.
* target.c: Include fnmatch.h.
(resolve_device): Use host fallback when offload data not
available.
(gomp_offload_target_available_p, offload_target_to_plugin_name)
(gomp_offload_targets, gomp_offload_targets_init)
(GOMP_set_offload_targets, gomp_plugin_prefix)
(gomp_plugin_suffix): New.
(gomp_load_plugin_for_device): Add gomp_debug call.
(gomp_target_init): Usegomp_offload_targets instead of
OFFLOAD_TARGETS.  Handle and rewrie colon-separated string.
* testsuite/lib/libgomp.exp: Expect offload targets to be
colon-separated.  Adjust matching of offload targets.  Don't
generate constructor here.
(libgomp_target_compile): Use GCC_UNDER_TEST.
(check_effective_target_openacc_nvidia_accel_supported)
(check_effective_target_openacc_host_selected): Adjust checks of
offload target names.
* testsuite/libgomp.c++/c++.exp: Do not set
HAVE_SET_GXX_UNDER_TEST or GXX_UNDER_TEST.
* testsuite/libgomp.c/c.exp: Do not append to
libgomp_compile_options,
* testsuite/libgomp.fortran/fortran.exp: Do not set
GFORTRAN_UNDER_TEST or libgomp_compile_options.
* testsuite/libgomp.graphite/graphite.exp: Do not append to
libgomp_compile_options.
* testsuite/libgomp.oacc-c++/c++.exp: Set SAVE_GCC_UNDER_TEST and
GCC_UNDER_TEST.  Do not set HAVE_SET_GXX_UNDER_TEST and
GXX_UNDER_TEST.  Do not append to ALWAYS_CFLAGS.  Adjust set of
offload targets.  Use -foffload=.
* testsuite/libgomp.oacc-c/c.exp: Do not append to
libgomp_compile_options or ALWAYS_CFLAGS.  Adjust set of offload
targets.  Use -foffload=.
* testsuite/libgomp.oacc-fortran/fortran.exp: Do not set
GFORTRAN_UNDER_TEST or append to libgomp_compile_options.  Do not
append to ALWAYS_CFLAGS.  Adjust set of offload targets.  Use
-foffload=.

Index: libgomp/plugin/configfrag.ac

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-18 Thread Thomas Schwinge
Hi!

On Fri, 14 Aug 2015 22:56:30 +, Joseph Myers jos...@codesourcery.com 
wrote:
 On Fri, 14 Aug 2015, Thomas Schwinge wrote:
 
  Can you suggest off-hand where you'd expect this option filtering to
  happen?  Should this be during specs parsing in the driver; something
  like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
  gcc/gcc.c:give_switch ignore any switches that don't match the expected
  CL_*?  I seem to have difficulties to properly populate/deduce that
  lang_mask at the call sites of gcc/gcc.c:save_switch.

(I figured that out.)

  Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
  to silently ignore options that don't apply instead of emitting a »is
  valid for [...] but not for [...]« diagnostic, if a (new) flag
  (-f[something]?) has been set, which would be active only during the
  add-omp-infile compilation?
 
 That would be a possibility, yes.

..., and that even looked like a sensible thing to do, also given that I
found where you added the internal -lang-asm flag five years ago,
gcc/c-family/c-opts.c:accept_all_c_family_options »Whether options from
all C-family languages should be accepted quietly«, which does a rather
similar thing.

Unfortunately, going that route turned out to not work correctly:
consider the Fortran -ffixed-form option, and likewise the
-ffixed-line-length-[...] options.  If not compiling for Fortran, these
will be passed to C-family front ends, and be recognized there by means
of the Common option -ffixed-[...], resulting in »cc1: warning: unknown
register name: form«, for example.  (Yay!)  ;-)

So, back to modifying the driver; here is my current messy WIP patch with
still a lot of TODOs in it -- but it appears to work at last.  :-)

Maybe somebody else is able to continue with that task while I'm out of
office.  This has been developed on top of gomp-4_0-branch r226832.  I'm
also attaching a tarball of the even more messy indivdual patches,
foffload.tar.bz2, in case there's anything to salvage in there, or if
that helps to understand the development options/history.  Earlier
messages in this thread should give enough context what this is about,
http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E.

 gcc/doc/invoke.texi|   4 +
 gcc/gcc.c  | 200 ++---
 libgomp/config.h.in|   8 +-
 libgomp/configure  |  33 +++-
 libgomp/env.c  |   6 +-
 libgomp/libgomp.h  |   1 +
 libgomp/libgomp.map|   7 +-
 libgomp/libgomp_g.h|   1 +
 libgomp/oacc-init.c|  18 +-
 libgomp/plugin/configfrag.ac   |  10 +-
 libgomp/target.c   | 172 ++
 libgomp/testsuite/lib/libgomp.exp  |  75 ++--
 libgomp/testsuite/libgomp.c++/c++.exp  |  13 --
 libgomp/testsuite/libgomp.c/c.exp  |   2 -
 libgomp/testsuite/libgomp.fortran/fortran.exp  |   5 -
 libgomp/testsuite/libgomp.graphite/graphite.exp|   2 -
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |  33 ++--
 libgomp/testsuite/libgomp.oacc-c/c.exp |  17 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  23 +--
 19 files changed, 408 insertions(+), 222 deletions(-)

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 8c96ca5..80bc639 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -24036,6 +24036,10 @@ macro in the machine description macro file.
 This flag does not have a negative form, because it specifies a
 three-way choice.
 
+Note that this flag may conflict with the @option{-ffixed-form} as
+well as @option{-ffixed-line-length-none} and
+@option{-ffixed-line-length-n} options of the Fortran front end.
+
 @item -fcall-used-@var{reg}
 @opindex fcall-used
 Treat the register named @var{reg} as an allocable register that is
diff --git gcc/gcc.c gcc/gcc.c
index 0642be1..5c7c462 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1,3 +1,5 @@
+#define FPRINTF if (getenv(DEBUG)) fprintf
+
 /* Compiler driver program that can handle many languages.
Copyright (C) 1987-2015 Free Software Foundation, Inc.
 
@@ -158,7 +160,7 @@ static const char *const spec_version = 
DEFAULT_TARGET_VERSION;
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE;
 
-/* List of offload targets.  */
+/* List of offload targets.  Empty string for -foffload=disable.  */
 
 static char *offload_targets = NULL;
 
@@ -275,6 +277,8 @@ static const char *compare_debug_auxbase_opt_spec_function 
(int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static 

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)

2015-08-17 Thread Martin Jambor
Hi,

On Fri, Aug 14, 2015 at 03:19:26PM +0200, Ilya Verbin wrote:
 2015-08-14 11:47 GMT+02:00 Thomas Schwinge tho...@codesourcery.com:
  On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote:
@@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), 
const void *unused,
 return gomp_target_fallback (fn, hostaddrs);
   
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+return gomp_target_fallback (fn, hostaddrs);
 
  Is that reliable?  Consider the following scenario, with f1 and f2
  implemented in separate TUs:
 
  #pragma omp target data [map clauses]
  {
f1([...]);
f2([...]);
  }
 
  Consider that in f1 we have a OpenMP target region with offloading data
  available, and in f2 we have a OpenMP target region without offloading
  data available.  In this case, the GOMP_target in f1 will execute on the
  offloading target, but the GOMP_target in f2 will resort to host fallback
  -- and we then likely have data inconsistencies, as the data specified by
  the map clauses is not synchronized between host and device.
 
  Admittedly, this is user error (inconsistent set of offloading functions
  available -- need either all, or none), but in such a scenario probably
  we should be doing a better job (at detecting this).  (Note, I'm not sure
  whether my current patch actually does any better.)  ;-)
 
 You're right. That's why I didn't send this patch for review yet.
 My current plan is as follows:
 * Use this approach for architectures with shared memory, since it
 allows mixing host and target functions.

Great, please keep me posted on these changes.

Thanks!

Martin

 * For non-shared memory, at the first splay tree lookup:
 ** If target fn is not found, run the whole program in host-fallback mode.
 ** If it's found, then all target fns must exist. I.e. if some
 tgt_addr (not first) is NULL, then libgomp will issue an error as it
 does now.
 
   -- Ilya


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)

2015-08-14 Thread Ilya Verbin
2015-08-14 11:47 GMT+02:00 Thomas Schwinge tho...@codesourcery.com:
 On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote:
   @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), 
   const void *unused,
return gomp_target_fallback (fn, hostaddrs);
  
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
   +  if (fn_addr == NULL)
   +return gomp_target_fallback (fn, hostaddrs);

 Is that reliable?  Consider the following scenario, with f1 and f2
 implemented in separate TUs:

 #pragma omp target data [map clauses]
 {
   f1([...]);
   f2([...]);
 }

 Consider that in f1 we have a OpenMP target region with offloading data
 available, and in f2 we have a OpenMP target region without offloading
 data available.  In this case, the GOMP_target in f1 will execute on the
 offloading target, but the GOMP_target in f2 will resort to host fallback
 -- and we then likely have data inconsistencies, as the data specified by
 the map clauses is not synchronized between host and device.

 Admittedly, this is user error (inconsistent set of offloading functions
 available -- need either all, or none), but in such a scenario probably
 we should be doing a better job (at detecting this).  (Note, I'm not sure
 whether my current patch actually does any better.)  ;-)

You're right. That's why I didn't send this patch for review yet.
My current plan is as follows:
* Use this approach for architectures with shared memory, since it
allows mixing host and target functions.
* For non-shared memory, at the first splay tree lookup:
** If target fn is not found, run the whole program in host-fallback mode.
** If it's found, then all target fns must exist. I.e. if some
tgt_addr (not first) is NULL, then libgomp will issue an error as it
does now.

  -- Ilya


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)

2015-08-14 Thread Joseph Myers
On Fri, 14 Aug 2015, Thomas Schwinge wrote:

 This function »generate[s] a C source file containing a constructor call
 to GOMP_set_offload_targets [...], and adds that as an infile«.  This
 basically works ;-) -- but really only for C source code, and for C++
 and Fortran it fails if there are command-line options used that conflict
 with the C compilation of add-omp-infile, such as (from a libgomp
 testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for
 Fortran: -fcray-pointer, -fintrinsic-modules-path.  Any suggestion about
 how to overcome that?

I suppose you need to use the option-handling information about which 
options are for which languages to filter out any options that aren't 
valid for C or Common.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-14 Thread Joseph Myers
On Fri, 14 Aug 2015, Thomas Schwinge wrote:

 Can you suggest off-hand where you'd expect this option filtering to
 happen?  Should this be during specs parsing in the driver; something
 like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
 gcc/gcc.c:give_switch ignore any switches that don't match the expected
 CL_*?  I seem to have difficulties to properly populate/deduce that
 lang_mask at the call sites of gcc/gcc.c:save_switch.  Or, did you
 imagine that to be done differently?

I don't have a particular design in mind; I was simply noting that the 
relevant information is available to the driver through the option 
handling data.

 Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
 to silently ignore options that don't apply instead of emitting a »is
 valid for [...] but not for [...]« diagnostic, if a (new) flag
 (-f[something]?) has been set, which would be active only during the
 add-omp-infile compilation?

That would be a possibility, yes.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-14 Thread Thomas Schwinge
Hi!

On Fri, 14 Aug 2015 16:56:25 +, Joseph Myers jos...@codesourcery.com 
wrote:
 On Fri, 14 Aug 2015, Thomas Schwinge wrote:
 
  This function »generate[s] a C source file containing a constructor call
  to GOMP_set_offload_targets [...], and adds that as an infile«.  This
  basically works ;-) -- but really only for C source code, and for C++
  and Fortran it fails if there are command-line options used that conflict
  with the C compilation of add-omp-infile, such as (from a libgomp
  testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for
  Fortran: -fcray-pointer, -fintrinsic-modules-path.  Any suggestion about
  how to overcome that?

The problem, as (I hope) I understand it, is that gcc/gcc.c:cc1_options
includes %{std*[...]} and %{f*}, which will match/accept the
C++/Fortran-specific command-line arguments (as cited above) even if
actually operating in C language mode for the add-omp-infile compilation.

 I suppose you need to use the option-handling information about which 
 options are for which languages to filter out any options that aren't 
 valid for C or Common.

OK, that sounds simple enough, conceptually.  So, you are invalidating my
worry that the driver might in fact not be able to do this kind of thing
(mixed language compilation).

I'm currently trying to understand how all that command-line option
parsing code works, and the handoff from the driver to the frontends;
processing of the specs language.

Can you suggest off-hand where you'd expect this option filtering to
happen?  Should this be during specs parsing in the driver; something
like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
gcc/gcc.c:give_switch ignore any switches that don't match the expected
CL_*?  I seem to have difficulties to properly populate/deduce that
lang_mask at the call sites of gcc/gcc.c:save_switch.  Or, did you
imagine that to be done differently?

Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
to silently ignore options that don't apply instead of emitting a »is
valid for [...] but not for [...]« diagnostic, if a (new) flag
(-f[something]?) has been set, which would be active only during the
add-omp-infile compilation?


Grüße,
 Thomas


pgp8sVpxQ9Zgf.pgp
Description: PGP signature


Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)

2015-08-14 Thread Thomas Schwinge
Hi!

Assuming that the overall approach (my option a) is fine, this is now
primarily a question about how to teach the driver to the right thing.
(Joseph CCed as driver reviewer.)

On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote:
 On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
  On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin iver...@gmail.com wrote:
   On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
   We had established the use of a boolean flag have_offload in gcc::context
   to indicate whether during compilation, we've actually seen any code to
   be offloaded (see cited below the relevant parts of the patch by Ilya et
   al.).  This means that currently, the whole offload machinery will not be
   run unless we actually have any offloaded data.  This means that the
   configured mkoffload programs (-foffload=[...], defaulting to
   configure-time --enable-offload-targets=[...]) will not be invoked unless
   we actually have any offloaded data.  This means that we will not
   actually generate constructor code to call libgomp's
   GOMP_offload_register unless we actually have any offloaded data.
  
   Yes, that was the plan.
  
   runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
   targets have been specified during compilation.
  
   But: at runtime, I'd like to know which -foffload=[...] targets have been
   specified during compilation, so that we can, for example, reliably
   resort to host fallback execution for -foffload=disable instead of
   getting error message that an offloaded function is missing.
  
   It's easy to fix:
  
   diff --git a/libgomp/target.c b/libgomp/target.c
   index a5fb164..f81d570 100644
   --- a/libgomp/target.c
   +++ b/libgomp/target.c
   @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
   *devicep,
  k.host_end = k.host_start + 1;
  splay_tree_key tgt_fn = splay_tree_lookup (devicep-mem_map, k);
  gomp_mutex_unlock (devicep-lock);
   -  if (tgt_fn == NULL)
   -   gomp_fatal (Target function wasn't mapped);
   -
  return (void *) tgt_fn-tgt_offset;
}

Won't that possibly result in a NULL pointer dereference (tgt_fn) --
instead return NULL, I think?

   @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const 
   void *unused,
return gomp_target_fallback (fn, hostaddrs);
  
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
   +  if (fn_addr == NULL)
   +return gomp_target_fallback (fn, hostaddrs);

Is that reliable?  Consider the following scenario, with f1 and f2
implemented in separate TUs:

#pragma omp target data [map clauses]
{
  f1([...]);
  f2([...]);
}

Consider that in f1 we have a OpenMP target region with offloading data
available, and in f2 we have a OpenMP target region without offloading
data available.  In this case, the GOMP_target in f1 will execute on the
offloading target, but the GOMP_target in f2 will resort to host fallback
-- and we then likely have data inconsistencies, as the data specified by
the map clauses is not synchronized between host and device.

Admittedly, this is user error (inconsistent set of offloading functions
available -- need either all, or none), but in such a scenario probably
we should be doing a better job (at detecting this).  (Note, I'm not sure
whether my current patch actually does any better.)  ;-)

   other hand, for example, for -foffload=nvptx-none, even if user program
   code doesn't contain any offloaded data (and thus the offload machinery
   has not been run), the user program might still contain any executable
   directives or OpenACC runtime library calls, so we'd still like to use
   the libgomp nvptx plugin.  However, we currently cannot detect this
   situation.
  
   I see two ways to resolve this: a) embed the compile-time -foffload=[...]
   configuration in the executable (as a string, for example) for libgomp to
   look that up, or b) make it a requirement that (if configured via
   -foffload=[...]), the offload machinery is run even if there is not
   actually any data to be offloaded, so we then reliably get the respective
   constructor call to libgomp's GOMP_offload_register.  I once began to
   implement a), but this to get a big ugly, so then looked into b) instead.
   Compared to the status quo, always running the whole offloading machinery
   for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
   are active, certainly does introduce some overhead when there isn't
   actually any code to be offloaded, so I'm not sure whether that is
   acceptable?
  
   I vote for (a).

OK.  Any other opinions?

  What happens for conflicting -fofffload=[...] options in different TUs?
 
 If you're asking about what happens now, only the list of offload targets from
 link-time -foffload=tgt1,tgt2 option matters.

I'm fine with that -- require the user to specify a consistent set of
-foffload