[og7] Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi! (This whole idea/patch still needs an overall re-work, as discussed, but here is a small incremental improvement/bug fix.) On Thu, 20 Aug 2015 22:52:58 +, Joseph Myerswrote: > On Tue, 18 Aug 2015, Thomas Schwinge wrote: > > [...] here is my current messy WIP patch [...] > +/* List of offload targets, separated by colon. Defaults to the list > + determined when configuring libgomp. */ > +static const char *gomp_offload_targets = OFFLOAD_TARGETS; > +static bool gomp_offload_targets_init = false; > + > +/* Override the list of offload targets. This must be called early, and only > + once. */ > + > +void > +GOMP_set_offload_targets (const char *offload_targets) > +{ > + gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets); > + > + /* Make sure this gets called early. */ > + assert (gomp_is_initialized == PTHREAD_ONCE_INIT); > + /* Make sure this only gets called once. */ > + assert (!gomp_offload_targets_init); > + gomp_offload_targets_init = true; > + gomp_offload_targets = offload_targets; > +} This will obviously fail as soon as there are shared libraries involved, compiled for offloading, which contain additional GOMP_set_offload_targets constructor calls. Thus pushed to openacc-gcc-7-branch: commit 917e247055a37f912129ed545719182de0046adb Author: Thomas Schwinge Date: Sun May 20 21:31:01 2018 +0200 [PR81886] Avoid "GOMP_set_offload_targets: Assertion `!gomp_offload_targets_init' failed" PR libgomp/81886 * openacc.h (enum acc_device_t): Add _acc_device_intel_mic, _acc_device_hsa. * oacc-init.c (get_openacc_name): Handle these. (resolve_device): Debugging output. * target.c (resolve_device, gomp_init_device) (gomp_offload_target_available_p): Likewise. (GOMP_set_offload_targets): Rewrite. * testsuite/libgomp.oacc-c++/c++.exp: Provide offload target in "-DACC_DEVICE_TYPE_host", and "-DACC_DEVICE_TYPE_nvidia". * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * testsuite/libgomp.oacc-c/offload-targets-1.c: New file. * testsuite/libgomp.oacc-c/offload-targets-2.c: Likewise. * testsuite/libgomp.oacc-c/offload-targets-3.c: Likewise. * testsuite/libgomp.oacc-c/offload-targets-4.c: Likewise. * testsuite/libgomp.oacc-c/offload-targets-5.c: Likewise. * testsuite/libgomp.oacc-c/offload-targets-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise. --- libgomp/ChangeLog.openacc | 34 libgomp/oacc-init.c| 7 + libgomp/openacc.h | 2 + libgomp/target.c | 178 +++-- libgomp/testsuite/libgomp.oacc-c++/c++.exp | 4 +- .../libgomp.oacc-c-c++-common/acc-on-device-2.c| 2 +- .../libgomp.oacc-c-c++-common/acc_on_device-1.c| 4 +- .../libgomp.oacc-c-c++-common/pr85381-2.c | 3 +- .../libgomp.oacc-c-c++-common/pr85381-3.c | 3 +- .../libgomp.oacc-c-c++-common/pr85381-4.c | 3 +- .../libgomp.oacc-c-c++-common/pr85381-5.c | 3 +- .../testsuite/libgomp.oacc-c-c++-common/pr85381.c | 3 +- .../libgomp.oacc-c-c++-common/pr85486-2.c | 3 +- .../libgomp.oacc-c-c++-common/pr85486-3.c | 3 +- .../testsuite/libgomp.oacc-c-c++-common/pr85486.c | 3 +- libgomp/testsuite/libgomp.oacc-c/c.exp | 4 +- .../testsuite/libgomp.oacc-c/offload-targets-1.c | 119 ++ .../testsuite/libgomp.oacc-c/offload-targets-2.c | 2 + .../testsuite/libgomp.oacc-c/offload-targets-3.c | 10 ++ .../testsuite/libgomp.oacc-c/offload-targets-4.c | 11 ++ .../testsuite/libgomp.oacc-c/offload-targets-5.c | 10 ++ .../testsuite/libgomp.oacc-c/offload-targets-6.c | 11 ++
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 10/20/2015 12:02 PM, Jakub Jelinek wrote: I'd like to defer review of the driver bits, can Joseph or Bernd please have a look at those? Last time around I think I asked for some minor changes, like updated documentation for give_switch. Other than that, I'm ok with the patch iff you are happy with the overall approach. Bernd
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote: > > How's the following (complete patch instead of incremental patch; the > > driver changes are still the same as before)? The changes are: > > > > * libgomp/target.c:gomp_target_init again loads all the plugins. > > * libgomp/target.c:resolve_device and > > libgomp/oacc-init.c:resolve_device verify that a default device > > (OpenMP device-var ICV, and acc_device_default, respectively) is > > actually enabled, or resort to host fallback if not. > > * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used > > to enable devices specified by -foffload. Can be called multiple > > times (executable, any shared libraries); the set of enabled devices > > is the union of all those ever requested. > > * GOMP_offload_register (but not the new GOMP_offload_register_ver) > > changed to enable all devices. This is to maintain compatibility > > with old executables and shared libraries built without the -foffload > > constructor support. Any reason not to pass the bitmask of the enabled targets to GOMP_offload_register_ver instead, to decrease the amount of ctors and the times you lock the various locks during initialization, or just enable automatically the devices you load data for during GOMP_offload_register_ver? I mean, GOMP_offload_register would enable for compatibility all devices, GOMP_offload_register_ver would enable the device it is registered for. For -foffload=disable on all shared libraries/binaries, naturally you would not register anything, thus would not enable any devices (only host fallback would work). Or are you worried about the case where one shared library is compiled with say -foffload=intelmic,ptx but doesn't actually contain any #pragma omp target/#pragma omp declare target (or OpenACC similar #directives), but only contains #pragma omp target data and/or the device query/copying routines, then dlopens some other shared library that actually has the offloading device code? That could be solved by adding the call you are talking about, but if we really should care about that unlikely case, it would be better to only arrange for it if really needed by the shared library (i.e. if it calls one of the OpenMP or OpenACC library routines that talk to the devices, or has #pragma omp target data or similar constructs; I'd strongly prefer not to have constructors in code that just got compiled with -fopenmp, even in configuration where some offloading is configured by default, when nothing in the code really cares about offloading. > --- a/gcc/gcc.c > +++ b/gcc/gcc.c > @@ -401,6 +401,8 @@ static const char > *compare_debug_auxbase_opt_spec_function (int, const char **); > static const char *pass_through_libs_spec_func (int, const char **); > static const char *replace_extension_spec_func (int, const char **); > static const char *greater_than_spec_func (int, const char **); > +static const char *add_omp_infile_spec_func (int, const char **); > + > static char *convert_white_space (char *); > > /* The Specs Language I'd like to defer review of the driver bits, can Joseph or Bernd please have a look at those? > diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h > index 24fbb94..5da4fa7 100644 > --- a/libgomp/libgomp-plugin.h > +++ b/libgomp/libgomp-plugin.h > @@ -48,7 +48,8 @@ enum offload_target_type >OFFLOAD_TARGET_TYPE_HOST = 2, >/* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed. */ >OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5, > - OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 > + OFFLOAD_TARGET_TYPE_INTEL_MIC = 6, > + OFFLOAD_TARGET_TYPE_HWM What is HWM? Is that OFFLOAD_TARGET_TYPE_LAST what you mean? > diff --git a/libgomp/target.c b/libgomp/target.c > index b767410..df51bfb 100644 > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -72,6 +72,9 @@ static int num_offload_images; > /* Array of descriptors for all available devices. */ > static struct gomp_device_descr *devices; > > +/* Set of enabled devices. */ > +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM]; I must say I don't like the locking for this. If all you ever change on this is that you change it from 0 to 1, then supposedly just storing it with __atomic_store, perhaps with rel semantics, and reading it as __atomic_load, with acquire semantics, would be good enough? And perhaps change it into int array, so that it is actually atomic even on the old Alphas (if there are any around). Jakub
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi Jakub! Thanks for the review. On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinekwrote: > On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote: > > > How's the following (complete patch instead of incremental patch; the > > > driver changes are still the same as before)? The changes are: > > > > > > * libgomp/target.c:gomp_target_init again loads all the plugins. > > > * libgomp/target.c:resolve_device and > > > libgomp/oacc-init.c:resolve_device verify that a default device > > > (OpenMP device-var ICV, and acc_device_default, respectively) is > > > actually enabled, or resort to host fallback if not. > > > * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used > > > to enable devices specified by -foffload. Can be called multiple > > > times (executable, any shared libraries); the set of enabled devices > > > is the union of all those ever requested. > > > * GOMP_offload_register (but not the new GOMP_offload_register_ver) > > > changed to enable all devices. This is to maintain compatibility > > > with old executables and shared libraries built without the -foffload > > > constructor support. > > Any reason not to pass the bitmask of the enabled targets to > GOMP_offload_register_ver instead, to decrease the amount of ctors and > the times you lock the various locks during initialization, or just enable > automatically the devices you load data for during GOMP_offload_register_ver? > I mean, GOMP_offload_register would enable for compatibility all devices, > GOMP_offload_register_ver would enable the device it is registered for. > For -foffload=disable on all shared libraries/binaries, naturally you would > not register anything, thus would not enable any devices (only host fallback > would work). As explained a few times already: GOMP_offload_register_ver constructors will only be generated if there actually are offloaded code regions, but for example: #include int main() { __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia)); return 0; } ... is a valid OpenACC program (untested), which doesn't contain any offloaded code regions. As a user I'd expect it to return different answers if compiled with -foffload=nvptx-none in contrast to -foffload=disable. Actually, I can foresee exactly such code to be used to probe for offloading being available, for example in testsuites. And, I guess we agree that under -foffload=disable we'd like the compilation/runtime system to be configured in a way that no offloading will happen? Always creating (dummy) GOMP_offload_register_ver constructors has been another suggestion that I had voiced much earlier in this thread (months ago), but everyone (including me) taking part in the discussion agreed that it'd cause even higher compile-time overhead. > Or are you worried about the case where one shared library is compiled > with say -foffload=intelmic,ptx but doesn't actually contain any > #pragma omp target/#pragma omp declare target (or OpenACC similar > #directives), but only contains #pragma omp target data and/or the device > query/copying routines, then dlopens some other shared library that actually > has the offloading device code? That's another example, yes. > That could be solved by adding the call you are talking about, but > if we really should care about that unlikely case, it would be better to > only arrange for it if really needed by the shared library (i.e. if it calls > one of the OpenMP or OpenACC library routines that talk to the devices, or > has #pragma omp target data or similar constructs; > I'd strongly prefer not to have constructors in code that just got compiled > with -fopenmp, even in configuration where some offloading is configured by > default, when nothing in the code really cares about offloading. So, how to resolve our different opinions? I mean, for any serious program code, there will be constructor calls into libgomp already; are you expecting that adding one more really will cause any noticeable overhead? I agree that enabling devices for GOMP_offload_register_ver calls makes sense. (I indeed had considered this earlier, but it didn't lead to solving the problem complete -- see above.) Can we come up with a scheme to do it this way, and only generate the GOMP_enable_offload_targets constructor of no GOMP_offload_register_ver constructors have been generated? But I have no idea how to implement that in a non-convoluted way. (And, it sounds excessive to me in terms of implementation overhead on our side, in contrast to execution overhead of one libgomp constructor call.) > > --- a/gcc/gcc.c > > +++ b/gcc/gcc.c > > @@ -401,6 +401,8 @@ static const char > > *compare_debug_auxbase_opt_spec_function (int, const char **); > > static const char *pass_through_libs_spec_func (int, const char **); > > static const char *replace_extension_spec_func (int, const char **); > > static
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote: > Always creating (dummy) GOMP_offload_register_ver constructors has been > another suggestion that I had voiced much earlier in this thread (months > ago), but everyone (including me) taking part in the discussion agreed > that it'd cause even higher compile-time overhead. I'd prefer to just set a flag like "force creation of the GOMP offloading sections" whenever you see one of the APIs or constructs used in the TU, and if that flag is set, even when there are no offloaded vars or functions/kernels, force creation of the corresponding data sections. Either it can be stardard offloading LTO sections, just not containing anything, or, if you want to improve compile-time, it could be special too, so that the linker plugin can quickly identify those that only need offloading support, but don't have any offloaded vars or code. But that can certainly be done as an incremental optimization. For OpenMP that would be whenever #pragma omp target{, data, enter data, exit data} construct is seen (e.g. during gimplification or OMP region nesting checking even better), or for omp_set_default_device omp_get_default_device omp_get_num_devices omp_is_initial_device omp_get_initial_device omp_target_alloc omp_target_free omp_target_is_present omp_target_memcpy omp_target_memcpy_rect omp_target_associate_ptr omp_target_disassociate_ptr calls. Guess for OpenACC you have similar set of calls. The thing is, while OpenACC is standard is pretty much solely about offloading, OpenMP is not, and in many cases programs just use host OpenMP parallelization (at least right now, I bet such programs are significantly larger set than programs that use OpenACC or OpenMP offloading together). Distributions and others will eventually configure the compilers they are shipping to enable the offloading, and if that forces a constructor to every TU or even every shared library just because it has been compiled with -fopenmp, it is unacceptable overhead. For the vendor shipped binary compilers, I'm envisioning ideal would be to be able to configure gcc for many offloading targets, then build such main compiler and offloading target compilers, but package them separately (one package (or set of packages) the base compiler, and then another package (or set of them) for each offloading target. What the -foffload= actually will be in the end from the linked shared library or binary POV would depend both on the configured offloading target, but also on whether the mkoffload binaries are found (or whatever else is needed first from the offloading target). That would mean that we'd not issue hard error or any kind of diagnostics if mkoffload is missing. Is that acceptable, or should that e.g. be limited just to the compiled in configure default (i.e. explicit -foffload= would error if the requested mkoffload is missing, default -foffload= would silently skip unavailable ones; I guess this would be my preference), or should we have two ways of configuring the offloading targets, as hard requirements and as optional support? > So, how to resolve our different opinions? I mean, for any serious > program code, there will be constructor calls into libgomp already; are > you expecting that adding one more really will cause any noticeable > overhead? See above, that is really not the case. Most of OpenMP code doesn't have any constructor calls into libgomp at all, the only exception is GOMP_offload_register{,_ver} at this point. > > What is HWM? Is that OFFLOAD_TARGET_TYPE_LAST what you mean? > > Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he > told me this means "High Water Mark". I have no strong opinion on the > name to use, just want to mention that "*_LAST" sounds to me like that > one still is part of the accepted set, whereas in this case it'd be the > first enumerator outside of the accepted ones. (And I guess, we agree > that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by > "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is > ugly?) *_LAST or *_last is actually what we use pretty much everywhere, see e.g. lots of places in tree-core.h. > Are you worried about the performance issues of a very short locking > cycle that in the majority of all cases should happen without blocking, > in comparison to performance issues related to host/device memory > transfers or kernel launches that will follow after the call to > gomp_offload_target_enabled_p? I don't really think that is reasonable > to worry about. Yes, I'm worried about that. The lock could be contended, and if you take the lock many times for each construct, it can show up, I'm worried about cache effects etc. It is already bad enough that we take/release the locks for the same device e.g. in each of: void *fn_addr = gomp_get_target_fn_addr (devicep, fn); struct target_mem_desc *tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds,
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 10/20/2015 01:17 PM, Thomas Schwinge wrote: As explained a few times already: GOMP_offload_register_ver constructors will only be generated if there actually are offloaded code regions, but for example: #include int main() { __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia)); return 0; } ... is a valid OpenACC program (untested), which doesn't contain any offloaded code regions. As a user I'd expect it to return different answers if compiled with -foffload=nvptx-none in contrast to -foffload=disable. Actually, I can foresee exactly such code to be used to probe for offloading being available, for example in testsuites. And, I guess we agree that under -foffload=disable we'd like the compilation/runtime system to be configured in a way that no offloading will happen? Both of you can ignore me if you feel I'm not making sense, but what exactly is the use case for -foffload=disable? Isn't it slightly redundant with -fno-openacc? IMO it's not an option that alters the available devices, that's a question that is answered at run-time and doesn't (or shouldn't) really depend on compiler switches. As a user I'd expect -foffload=disable to just prevent generation of offloaded code for the things I'm compiling. As Jakub pointed out, shared libraries may still contain other pieces that are offloadable. I guess I don't fully understand why you want to go to great lengths to disable devices at run-time based on a compile-time switch. What's the reasoning here? Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he told me this means "High Water Mark". I have no strong opinion on the name to use, just want to mention that "*_LAST" sounds to me like that one still is part of the accepted set, whereas in this case it'd be the first enumerator outside of the accepted ones. (And I guess, we agree that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is ugly?) Nah, just rename HWM to LAST, that's fairly common usage I think. Bernd
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote: > Both of you can ignore me if you feel I'm not making sense, but what exactly > is the use case for -foffload=disable? Isn't it slightly redundant with > -fno-openacc? IMO it's not an option that alters the available devices, > that's a question that is answered at run-time and doesn't (or shouldn't) > really depend on compiler switches. As a user I'd expect -foffload=disable > to just prevent generation of offloaded code for the things I'm compiling. > As Jakub pointed out, shared libraries may still contain other pieces that > are offloadable. > > I guess I don't fully understand why you want to go to great lengths to > disable devices at run-time based on a compile-time switch. What's the > reasoning here? At least for OpenMP, I'm also happy with what we do now (except for the ability to configure offloading targets as optional, i.e. dynamically configure the default based on what packages user install rather than just on how it has been configured, so that e.g. just because it has been configured for PTX offloading the host GCC itself doesn't have to have a dependency on the proprietary CUDA stuff in any way). I believe in OpenMP nobody says that if the device HW is available, but user chose to not compile offloading code/variables for that particular device that it can't show up among omp_get_num_devices (). And I think it is entirely fine if say target data map succeeds to that device, but then target is offloaded, if that is caused by users configure or command line choice. Maybe OpenACC has different requirements, is it required to terminate the program if it can't fulfill the requested offloading? In any case, I'm fine with something I've noted in the last mail, or with the status quo, but not with running constructors in TUs or even shared libraries just because they have been compiled with -fopenmp (and either haven't used any OpenMP code at all, or just the non-*target* directives). Jakub
Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi! Ping... On Wed, 30 Sep 2015 17:54:07 +0200, I wrote: > On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinekwrote: > > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote: > > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek > > > wrote: > > > > So, do I understand well that you'll call GOMP_set_offload_targets from > > > > construct[ors] of all shared libraries (and the binary) that contain > > > > offloaded > > > > code? If yes, that is surely going to fail the assertions in there. > > > > > > Indeed. My original plan has been to generate/invoke this constructor > > > only for/from the final executable and not for any shared libraries, but > > > it seems I didn't implemented this correctly. > > > > How would you mean to implement it? > > I have come to realize that we need to generate/invoke this constructor > From everything that links against libgomp (which is what I implemented), > that is, executables as well as shared libraries. > > > -fopenmp or -fopenacc code with > > offloading bits might not be in the final executable at all, nor in shared > > libraries it is linked against; such libraries could be only dlopened, > > consider say python plugin. And this is not just made up, perhaps not with > > offloading yet, but people regularly use OpenMP code in plugins and then we > > get complains that fork child of the main program is not allowed to do > > anything but async-signal-safe functions. > > I'm not sure I'm completely understanding that paragraph? Are you saying > that offloaded code can be in libraries that are not linked against > libgomp? How would these register (GOMP_offload_register) their > offloaded code? I think it's a reasonable to expect that every shared > library that contains offloaded code must link against libgomp, which > will happen automatically given that it is built with -fopenmp/-fopenacc? > > > > > You can dlopen such libraries etc. What if you link one library with > > > > -fopenmp=nvptx-none and another one with > > > > -fopenmp=x86_64-intelmicemul-linux? > > > > > > So, the first question to answer is: what do we expect to happen in this > > > case, or similarly, if the executable and any shared libraries are > > > compiled with different/incompatible -foffload options? > > > > As the device numbers are per-process, the only possibility I see is that > > all the physically available devices are always available, and just if you > > try to offload from some code to a device that doesn't support it, you get > > host fallback. Because, one shared library could carefully use device(xyz) > > to offload to say XeonPhi it is compiled for and supports, and another > > library device(abc) to offload to PTX it is compiled for and supports. > > OK, I think I get that, and it makes sense. Even though, I don't know > how you'd do that today: as far as I can tell, there is no specification > covering the OpenMP 4 target device IDs, so I have no idea how a user > program/library could realiably use them in practice? For example, in > the current GCC implementation, the OpenMP 4 target device IDs depend on > the number of individual devices availble in the system, and the order in > which libgomp loads the plugins, which is defined (arbitrarily) by the > GCC configuration? > > > > For this, I propose that the only mode of operation that we currently can > > > support is that all of the executable and any shared libraries agree on > > > the offload targets specified by -foffload, and I thus propose the > > > following patch on top of what Joseph has posted before (passes the > > > testsuite, but not yet tested otherwise): > > > > See above, no. > > OK. > > How's the following (complete patch instead of incremental patch; the > driver changes are still the same as before)? The changes are: > > * libgomp/target.c:gomp_target_init again loads all the plugins. > * libgomp/target.c:resolve_device and > libgomp/oacc-init.c:resolve_device verify that a default device > (OpenMP device-var ICV, and acc_device_default, respectively) is > actually enabled, or resort to host fallback if not. > * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used > to enable devices specified by -foffload. Can be called multiple > times (executable, any shared libraries); the set of enabled devices > is the union of all those ever requested. > * GOMP_offload_register (but not the new GOMP_offload_register_ver) > changed to enable all devices. This is to maintain compatibility > with old executables and shared libraries built without the -foffload > constructor support. > * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead > of GOMP_offload_register, and GOMP_offload_unregister_ver instead of > GOMP_offload_unregister. To avoid enabling all devices > (GOMP_offload_register). > * New test cases to verify this (-foffload=disable, host
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On reviewing in more detail the changes to pass offloading targets from the driver to libgomp at link time to identify the minimal self-contained pieces that can go to trunk, I found that the use of fnmatch to match against target names was completely unnecessary; the ISO C90 functions strstr and strncmp could be used instead, so avoiding the need to add configure tests for fnmatch. This patch duly removes the use of and configure tests for fnmatch. Will commit to gomp-4_0-branch subject to test results. 2015-08-25 Joseph Myers jos...@codesourcery.com * plugin/configfrag.ac: Don't test for fnmatch.h or fnmatch. * configure, config.h.in: Regenerate. * target.c [PLUGIN_SUPPORT]: Don't include fnmatch.h. (offload_target_to_plugin_name): Use strstr and strncmp instead of fnmatch. Index: libgomp/config.h.in === --- libgomp/config.h.in (revision 227169) +++ libgomp/config.h.in (working copy) @@ -24,12 +24,6 @@ /* Define to 1 if you have the dlfcn.h header file. */ #undef HAVE_DLFCN_H -/* Define to 1 if you have the `fnmatch' function. */ -#undef HAVE_FNMATCH - -/* Define to 1 if you have the fnmatch.h header file. */ -#undef HAVE_FNMATCH_H - /* Define to 1 if you have the `getloadavg' function. */ #undef HAVE_GETLOADAVG Index: libgomp/target.c === --- libgomp/target.c(revision 227169) +++ libgomp/target.c(working copy) @@ -41,7 +41,6 @@ #ifdef PLUGIN_SUPPORT #include dlfcn.h -#include fnmatch.h #include plugin-suffix.h #endif @@ -1271,9 +1270,9 @@ static const char * offload_target_to_plugin_name (const char *offload_target) { - if (fnmatch (*-intelmic*, offload_target, 0) == 0) + if (strstr (offload_target, -intelmic) != NULL) return intelmic; - if (fnmatch (nvptx*, offload_target, 0) == 0) + if (strncmp (offload_target, nvptx, 5) == 0) return nvptx; gomp_fatal (Unknown offload target: %s, offload_target); } Index: libgomp/configure === --- libgomp/configure (revision 227169) +++ libgomp/configure (working copy) @@ -15119,33 +15119,6 @@ offload_targets= plugin_support=yes -for ac_header in fnmatch.h -do : - ac_fn_c_check_header_mongrel $LINENO fnmatch.h ac_cv_header_fnmatch_h $ac_includes_default -if test x$ac_cv_header_fnmatch_h = xyes; then : - cat confdefs.h _ACEOF -#define HAVE_FNMATCH_H 1 -_ACEOF - -else - plugin_support=no -fi - -done - -for ac_func in fnmatch -do : - ac_fn_c_check_func $LINENO fnmatch ac_cv_func_fnmatch -if test x$ac_cv_func_fnmatch = xyes; then : - cat confdefs.h _ACEOF -#define HAVE_FNMATCH 1 -_ACEOF - -else - plugin_support=no -fi -done - { $as_echo $as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl 5 $as_echo_n checking for dlsym in -ldl... 6; } if test ${ac_cv_lib_dl_dlsym+set} = set; then : Index: libgomp/plugin/configfrag.ac === --- libgomp/plugin/configfrag.ac(revision 227169) +++ libgomp/plugin/configfrag.ac(working copy) @@ -29,8 +29,6 @@ offload_targets= AC_SUBST(offload_targets) plugin_support=yes -AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no]) -AC_CHECK_FUNCS([fnmatch], , [plugin_support=no]) AC_CHECK_LIB(dl, dlsym, , [plugin_support=no]) if test x$plugin_support = xyes; then AC_DEFINE(PLUGIN_SUPPORT, 1, -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Fri, 21 Aug 2015, Joseph Myers wrote: On Fri, 21 Aug 2015, Nathan Sidwell wrote: this appears to cause an ICE in add_omp_infile_spec_func at; gcc_assert (offload_targets != NULL); when you use something like -foffload='-save-temps -v -fdump-rtl-all -fdump-tree-all -fno-verbose-asm' Is that use ill-formed? I'll need to reverse-engineer the question of what's a well-formed -foffload= option (bug 67300 filed yesterday for the lack of any documentation of that option). Although there is no documentation for the -foffload options in the manual, I found something at https://gcc.gnu.org/wiki/Offloading that I hope is current. It turns out the problem wasn't in the assertion, but in how a default -foffload option was generated. Generating it via specs meant that if the only -foffload option specified options without specifying a target (i.e., options applicable to all the configured offload targets), then the offload_targets variable was never set and so the assertion failure resulted (as well as OFFLOAD_TARGET_NAMES not being exported). Rather than trying to make the specs produce something if no -foffload=* options other than -foffload=-* options were passed, I'm testing this patch to default the offload targets after the original command line is processed (and before extra options from these specs are processed, so before the assertion is executed), and will commit it if tests are OK. 2015-08-24 Joseph Myers jos...@codesourcery.com * gcc.c (driver_self_specs) [ENABLE_OFFLOADING]: Don't generate a -foffload option. (process_command): Call handle_foffload_option (OFFLOAD_TARGETS) if no offload target specified. Index: gcc/gcc.c === --- gcc/gcc.c (revision 227045) +++ gcc/gcc.c (working copy) @@ -1064,9 +1064,6 @@ static const char *const multilib_defaults_raw[] = static const char *const driver_self_specs[] = { %{fdump-final-insns:-fdump-final-insns=.} %fdump-final-insns, #ifdef ENABLE_OFFLOADING - /* If the user didn't specify any, default to all configured offload - targets. */ - %{!foffload=*:-foffload= OFFLOAD_TARGETS }, /* If linking against libgomp, add a setup file. */ %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1): \ %:add-omp-infile()}, @@ -4291,6 +4288,11 @@ process_command (unsigned int decoded_options_coun CL_DRIVER, handlers, global_dc); } + /* If the user didn't specify any, default to all configured offload + targets. */ + if (offload_targets == NULL) +handle_foffload_option (OFFLOAD_TARGETS); + if (output_file strcmp (output_file, -) != 0 strcmp (output_file, HOST_BIT_BUCKET) != 0) -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Mon, 24 Aug 2015, Joseph Myers wrote: I'm testing this patch to default the offload targets after the original command line is processed (and before extra options from these specs are processed, so before the assertion is executed), and will commit it if tests are OK. Now committed to gomp-4_0-branch. -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 08/24/15 18:22, Joseph Myers wrote: On Mon, 24 Aug 2015, Joseph Myers wrote: I'm testing this patch to default the offload targets after the original command line is processed (and before extra options from these specs are processed, so before the assertion is executed), and will commit it if tests are OK. Now committed to gomp-4_0-branch. thanks!
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 08/20/15 18:52, Joseph Myers wrote: On Tue, 18 Aug 2015, Thomas Schwinge wrote: This is what I've committed to gomp-4_0-branch, with the driver changes substantially cleaned up and smaller changes to the other bits of the patch. gcc: 2015-08-20 Thomas Schwinge tho...@codesourcery.com Joseph Myers jos...@codesourcery.com * doc/invoke.texi (-ffixed-@var{reg}): Document conflict with Fortran options. * gcc.c (offload_targets): Update comment. (add_omp_infile_spec_func, spec_lang_mask_accept): New. (driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use %:add-omp-infile(). (static_spec_functions): Add add-omp-infile. (struct switchstr): Add lang_mask field. Expand comment. (struct infile): Add lang_mask field. (add_infile, save_switch, do_spec): Add lang_mask argument. (driver_unknown_option_callback, driver_wrong_lang_callback) (driver_handle_option, process_command, do_self_spec) (driver::do_spec_on_infiles): All callers changed. (give_switch): Check languages of switch against spec_lang_mask_accept. (driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate targets variable. * gcc.h (do_spec): Update prototype. this appears to cause an ICE in add_omp_infile_spec_func at; gcc_assert (offload_targets != NULL); when you use something like -foffload='-save-temps -v -fdump-rtl-all -fdump-tree-all -fno-verbose-asm' Is that use ill-formed? nathan
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Fri, 21 Aug 2015, Nathan Sidwell wrote: this appears to cause an ICE in add_omp_infile_spec_func at; gcc_assert (offload_targets != NULL); when you use something like -foffload='-save-temps -v -fdump-rtl-all -fdump-tree-all -fno-verbose-asm' Is that use ill-formed? I'll need to reverse-engineer the question of what's a well-formed -foffload= option (bug 67300 filed yesterday for the lack of any documentation of that option). -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Tue, 18 Aug 2015, Thomas Schwinge wrote: So, back to modifying the driver; here is my current messy WIP patch with still a lot of TODOs in it -- but it appears to work at last. :-) Maybe somebody else is able to continue with that task while I'm out of office. This has been developed on top of gomp-4_0-branch r226832. I'm also attaching a tarball of the even more messy indivdual patches, foffload.tar.bz2, in case there's anything to salvage in there, or if that helps to understand the development options/history. Earlier messages in this thread should give enough context what this is about, http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E. This is what I've committed to gomp-4_0-branch, with the driver changes substantially cleaned up and smaller changes to the other bits of the patch. gcc: 2015-08-20 Thomas Schwinge tho...@codesourcery.com Joseph Myers jos...@codesourcery.com * doc/invoke.texi (-ffixed-@var{reg}): Document conflict with Fortran options. * gcc.c (offload_targets): Update comment. (add_omp_infile_spec_func, spec_lang_mask_accept): New. (driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use %:add-omp-infile(). (static_spec_functions): Add add-omp-infile. (struct switchstr): Add lang_mask field. Expand comment. (struct infile): Add lang_mask field. (add_infile, save_switch, do_spec): Add lang_mask argument. (driver_unknown_option_callback, driver_wrong_lang_callback) (driver_handle_option, process_command, do_self_spec) (driver::do_spec_on_infiles): All callers changed. (give_switch): Check languages of switch against spec_lang_mask_accept. (driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate targets variable. * gcc.h (do_spec): Update prototype. fortran: 2015-08-20 Joseph Myers jos...@codesourcery.com * gfortranspec.c (lang_specific_pre_link): Update call to do_spec. java: 2015-08-20 Joseph Myers jos...@codesourcery.com * jvspec.c (lang_specific_pre_link): Update call to do_spec. libgomp: 2015-08-20 Thomas Schwinge tho...@codesourcery.com Joseph Myers jos...@codesourcery.com * plugin/configfrag.ac (fnmatch.h): Check for header. (fnmatch): Check for function. (tgt_name): Do not set. (offload_targets): Separate with colons not commas. * config.h.in, configure: Regenerate. * env.c (initialize_env): Make static. Remove TODO. * libgomp.h (gomp_offload_target_available_p): New prototype. * libgomp.map (GOACC_2.0.GOMP_4_BRANCH): Add GOMP_set_offload_targets. (INTERNAL): Remove. * libgomp_g.h (GOMP_set_offload_targets): New prototype. * oacc-init.c (resolve_device): Do not handle acc_device_host. Add comments. * target.c: Include fnmatch.h. (resolve_device): Use host fallback when offload data not available. (gomp_offload_target_available_p, offload_target_to_plugin_name) (gomp_offload_targets, gomp_offload_targets_init) (GOMP_set_offload_targets, gomp_plugin_prefix) (gomp_plugin_suffix): New. (gomp_load_plugin_for_device): Add gomp_debug call. (gomp_target_init): Usegomp_offload_targets instead of OFFLOAD_TARGETS. Handle and rewrie colon-separated string. * testsuite/lib/libgomp.exp: Expect offload targets to be colon-separated. Adjust matching of offload targets. Don't generate constructor here. (libgomp_target_compile): Use GCC_UNDER_TEST. (check_effective_target_openacc_nvidia_accel_supported) (check_effective_target_openacc_host_selected): Adjust checks of offload target names. * testsuite/libgomp.c++/c++.exp: Do not set HAVE_SET_GXX_UNDER_TEST or GXX_UNDER_TEST. * testsuite/libgomp.c/c.exp: Do not append to libgomp_compile_options, * testsuite/libgomp.fortran/fortran.exp: Do not set GFORTRAN_UNDER_TEST or libgomp_compile_options. * testsuite/libgomp.graphite/graphite.exp: Do not append to libgomp_compile_options. * testsuite/libgomp.oacc-c++/c++.exp: Set SAVE_GCC_UNDER_TEST and GCC_UNDER_TEST. Do not set HAVE_SET_GXX_UNDER_TEST and GXX_UNDER_TEST. Do not append to ALWAYS_CFLAGS. Adjust set of offload targets. Use -foffload=. * testsuite/libgomp.oacc-c/c.exp: Do not append to libgomp_compile_options or ALWAYS_CFLAGS. Adjust set of offload targets. Use -foffload=. * testsuite/libgomp.oacc-fortran/fortran.exp: Do not set GFORTRAN_UNDER_TEST or append to libgomp_compile_options. Do not append to ALWAYS_CFLAGS. Adjust set of offload targets. Use -foffload=. Index: libgomp/plugin/configfrag.ac
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi! On Fri, 14 Aug 2015 22:56:30 +, Joseph Myers jos...@codesourcery.com wrote: On Fri, 14 Aug 2015, Thomas Schwinge wrote: Can you suggest off-hand where you'd expect this option filtering to happen? Should this be during specs parsing in the driver; something like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in gcc/gcc.c:give_switch ignore any switches that don't match the expected CL_*? I seem to have difficulties to properly populate/deduce that lang_mask at the call sites of gcc/gcc.c:save_switch. (I figured that out.) Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang to silently ignore options that don't apply instead of emitting a »is valid for [...] but not for [...]« diagnostic, if a (new) flag (-f[something]?) has been set, which would be active only during the add-omp-infile compilation? That would be a possibility, yes. ..., and that even looked like a sensible thing to do, also given that I found where you added the internal -lang-asm flag five years ago, gcc/c-family/c-opts.c:accept_all_c_family_options »Whether options from all C-family languages should be accepted quietly«, which does a rather similar thing. Unfortunately, going that route turned out to not work correctly: consider the Fortran -ffixed-form option, and likewise the -ffixed-line-length-[...] options. If not compiling for Fortran, these will be passed to C-family front ends, and be recognized there by means of the Common option -ffixed-[...], resulting in »cc1: warning: unknown register name: form«, for example. (Yay!) ;-) So, back to modifying the driver; here is my current messy WIP patch with still a lot of TODOs in it -- but it appears to work at last. :-) Maybe somebody else is able to continue with that task while I'm out of office. This has been developed on top of gomp-4_0-branch r226832. I'm also attaching a tarball of the even more messy indivdual patches, foffload.tar.bz2, in case there's anything to salvage in there, or if that helps to understand the development options/history. Earlier messages in this thread should give enough context what this is about, http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E. gcc/doc/invoke.texi| 4 + gcc/gcc.c | 200 ++--- libgomp/config.h.in| 8 +- libgomp/configure | 33 +++- libgomp/env.c | 6 +- libgomp/libgomp.h | 1 + libgomp/libgomp.map| 7 +- libgomp/libgomp_g.h| 1 + libgomp/oacc-init.c| 18 +- libgomp/plugin/configfrag.ac | 10 +- libgomp/target.c | 172 ++ libgomp/testsuite/lib/libgomp.exp | 75 ++-- libgomp/testsuite/libgomp.c++/c++.exp | 13 -- libgomp/testsuite/libgomp.c/c.exp | 2 - libgomp/testsuite/libgomp.fortran/fortran.exp | 5 - libgomp/testsuite/libgomp.graphite/graphite.exp| 2 - libgomp/testsuite/libgomp.oacc-c++/c++.exp | 33 ++-- libgomp/testsuite/libgomp.oacc-c/c.exp | 17 +- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 23 +-- 19 files changed, 408 insertions(+), 222 deletions(-) diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index 8c96ca5..80bc639 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -24036,6 +24036,10 @@ macro in the machine description macro file. This flag does not have a negative form, because it specifies a three-way choice. +Note that this flag may conflict with the @option{-ffixed-form} as +well as @option{-ffixed-line-length-none} and +@option{-ffixed-line-length-n} options of the Fortran front end. + @item -fcall-used-@var{reg} @opindex fcall-used Treat the register named @var{reg} as an allocable register that is diff --git gcc/gcc.c gcc/gcc.c index 0642be1..5c7c462 100644 --- gcc/gcc.c +++ gcc/gcc.c @@ -1,3 +1,5 @@ +#define FPRINTF if (getenv(DEBUG)) fprintf + /* Compiler driver program that can handle many languages. Copyright (C) 1987-2015 Free Software Foundation, Inc. @@ -158,7 +160,7 @@ static const char *const spec_version = DEFAULT_TARGET_VERSION; static const char *spec_machine = DEFAULT_TARGET_MACHINE; static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE; -/* List of offload targets. */ +/* List of offload targets. Empty string for -foffload=disable. */ static char *offload_targets = NULL; @@ -275,6 +277,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **); static const char *pass_through_libs_spec_func (int, const char **); static const char *replace_extension_spec_func (int, const char **); static
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
Hi, On Fri, Aug 14, 2015 at 03:19:26PM +0200, Ilya Verbin wrote: 2015-08-14 11:47 GMT+02:00 Thomas Schwinge tho...@codesourcery.com: On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote: @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, return gomp_target_fallback (fn, hostaddrs); void *fn_addr = gomp_get_target_fn_addr (devicep, fn); + if (fn_addr == NULL) +return gomp_target_fallback (fn, hostaddrs); Is that reliable? Consider the following scenario, with f1 and f2 implemented in separate TUs: #pragma omp target data [map clauses] { f1([...]); f2([...]); } Consider that in f1 we have a OpenMP target region with offloading data available, and in f2 we have a OpenMP target region without offloading data available. In this case, the GOMP_target in f1 will execute on the offloading target, but the GOMP_target in f2 will resort to host fallback -- and we then likely have data inconsistencies, as the data specified by the map clauses is not synchronized between host and device. Admittedly, this is user error (inconsistent set of offloading functions available -- need either all, or none), but in such a scenario probably we should be doing a better job (at detecting this). (Note, I'm not sure whether my current patch actually does any better.) ;-) You're right. That's why I didn't send this patch for review yet. My current plan is as follows: * Use this approach for architectures with shared memory, since it allows mixing host and target functions. Great, please keep me posted on these changes. Thanks! Martin * For non-shared memory, at the first splay tree lookup: ** If target fn is not found, run the whole program in host-fallback mode. ** If it's found, then all target fns must exist. I.e. if some tgt_addr (not first) is NULL, then libgomp will issue an error as it does now. -- Ilya
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
2015-08-14 11:47 GMT+02:00 Thomas Schwinge tho...@codesourcery.com: On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote: @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, return gomp_target_fallback (fn, hostaddrs); void *fn_addr = gomp_get_target_fn_addr (devicep, fn); + if (fn_addr == NULL) +return gomp_target_fallback (fn, hostaddrs); Is that reliable? Consider the following scenario, with f1 and f2 implemented in separate TUs: #pragma omp target data [map clauses] { f1([...]); f2([...]); } Consider that in f1 we have a OpenMP target region with offloading data available, and in f2 we have a OpenMP target region without offloading data available. In this case, the GOMP_target in f1 will execute on the offloading target, but the GOMP_target in f2 will resort to host fallback -- and we then likely have data inconsistencies, as the data specified by the map clauses is not synchronized between host and device. Admittedly, this is user error (inconsistent set of offloading functions available -- need either all, or none), but in such a scenario probably we should be doing a better job (at detecting this). (Note, I'm not sure whether my current patch actually does any better.) ;-) You're right. That's why I didn't send this patch for review yet. My current plan is as follows: * Use this approach for architectures with shared memory, since it allows mixing host and target functions. * For non-shared memory, at the first splay tree lookup: ** If target fn is not found, run the whole program in host-fallback mode. ** If it's found, then all target fns must exist. I.e. if some tgt_addr (not first) is NULL, then libgomp will issue an error as it does now. -- Ilya
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
On Fri, 14 Aug 2015, Thomas Schwinge wrote: This function »generate[s] a C source file containing a constructor call to GOMP_set_offload_targets [...], and adds that as an infile«. This basically works ;-) -- but really only for C source code, and for C++ and Fortran it fails if there are command-line options used that conflict with the C compilation of add-omp-infile, such as (from a libgomp testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for Fortran: -fcray-pointer, -fintrinsic-modules-path. Any suggestion about how to overcome that? I suppose you need to use the option-handling information about which options are for which languages to filter out any options that aren't valid for C or Common. -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Fri, 14 Aug 2015, Thomas Schwinge wrote: Can you suggest off-hand where you'd expect this option filtering to happen? Should this be during specs parsing in the driver; something like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in gcc/gcc.c:give_switch ignore any switches that don't match the expected CL_*? I seem to have difficulties to properly populate/deduce that lang_mask at the call sites of gcc/gcc.c:save_switch. Or, did you imagine that to be done differently? I don't have a particular design in mind; I was simply noting that the relevant information is available to the driver through the option handling data. Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang to silently ignore options that don't apply instead of emitting a »is valid for [...] but not for [...]« diagnostic, if a (new) flag (-f[something]?) has been set, which would be active only during the add-omp-infile compilation? That would be a possibility, yes. -- Joseph S. Myers jos...@codesourcery.com
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi! On Fri, 14 Aug 2015 16:56:25 +, Joseph Myers jos...@codesourcery.com wrote: On Fri, 14 Aug 2015, Thomas Schwinge wrote: This function »generate[s] a C source file containing a constructor call to GOMP_set_offload_targets [...], and adds that as an infile«. This basically works ;-) -- but really only for C source code, and for C++ and Fortran it fails if there are command-line options used that conflict with the C compilation of add-omp-infile, such as (from a libgomp testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for Fortran: -fcray-pointer, -fintrinsic-modules-path. Any suggestion about how to overcome that? The problem, as (I hope) I understand it, is that gcc/gcc.c:cc1_options includes %{std*[...]} and %{f*}, which will match/accept the C++/Fortran-specific command-line arguments (as cited above) even if actually operating in C language mode for the add-omp-infile compilation. I suppose you need to use the option-handling information about which options are for which languages to filter out any options that aren't valid for C or Common. OK, that sounds simple enough, conceptually. So, you are invalidating my worry that the driver might in fact not be able to do this kind of thing (mixed language compilation). I'm currently trying to understand how all that command-line option parsing code works, and the handoff from the driver to the frontends; processing of the specs language. Can you suggest off-hand where you'd expect this option filtering to happen? Should this be during specs parsing in the driver; something like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in gcc/gcc.c:give_switch ignore any switches that don't match the expected CL_*? I seem to have difficulties to properly populate/deduce that lang_mask at the call sites of gcc/gcc.c:save_switch. Or, did you imagine that to be done differently? Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang to silently ignore options that don't apply instead of emitting a »is valid for [...] but not for [...]« diagnostic, if a (new) flag (-f[something]?) has been set, which would be active only during the add-omp-infile compilation? Grüße, Thomas pgp8sVpxQ9Zgf.pgp Description: PGP signature
Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
Hi! Assuming that the overall approach (my option a) is fine, this is now primarily a question about how to teach the driver to the right thing. (Joseph CCed as driver reviewer.) On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin iver...@gmail.com wrote: On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote: On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin iver...@gmail.com wrote: On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote: We had established the use of a boolean flag have_offload in gcc::context to indicate whether during compilation, we've actually seen any code to be offloaded (see cited below the relevant parts of the patch by Ilya et al.). This means that currently, the whole offload machinery will not be run unless we actually have any offloaded data. This means that the configured mkoffload programs (-foffload=[...], defaulting to configure-time --enable-offload-targets=[...]) will not be invoked unless we actually have any offloaded data. This means that we will not actually generate constructor code to call libgomp's GOMP_offload_register unless we actually have any offloaded data. Yes, that was the plan. runtime, in libgomp, we then cannot reliably tell which -foffload=[...] targets have been specified during compilation. But: at runtime, I'd like to know which -foffload=[...] targets have been specified during compilation, so that we can, for example, reliably resort to host fallback execution for -foffload=disable instead of getting error message that an offloaded function is missing. It's easy to fix: diff --git a/libgomp/target.c b/libgomp/target.c index a5fb164..f81d570 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep, k.host_end = k.host_start + 1; splay_tree_key tgt_fn = splay_tree_lookup (devicep-mem_map, k); gomp_mutex_unlock (devicep-lock); - if (tgt_fn == NULL) - gomp_fatal (Target function wasn't mapped); - return (void *) tgt_fn-tgt_offset; } Won't that possibly result in a NULL pointer dereference (tgt_fn) -- instead return NULL, I think? @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, return gomp_target_fallback (fn, hostaddrs); void *fn_addr = gomp_get_target_fn_addr (devicep, fn); + if (fn_addr == NULL) +return gomp_target_fallback (fn, hostaddrs); Is that reliable? Consider the following scenario, with f1 and f2 implemented in separate TUs: #pragma omp target data [map clauses] { f1([...]); f2([...]); } Consider that in f1 we have a OpenMP target region with offloading data available, and in f2 we have a OpenMP target region without offloading data available. In this case, the GOMP_target in f1 will execute on the offloading target, but the GOMP_target in f2 will resort to host fallback -- and we then likely have data inconsistencies, as the data specified by the map clauses is not synchronized between host and device. Admittedly, this is user error (inconsistent set of offloading functions available -- need either all, or none), but in such a scenario probably we should be doing a better job (at detecting this). (Note, I'm not sure whether my current patch actually does any better.) ;-) other hand, for example, for -foffload=nvptx-none, even if user program code doesn't contain any offloaded data (and thus the offload machinery has not been run), the user program might still contain any executable directives or OpenACC runtime library calls, so we'd still like to use the libgomp nvptx plugin. However, we currently cannot detect this situation. I see two ways to resolve this: a) embed the compile-time -foffload=[...] configuration in the executable (as a string, for example) for libgomp to look that up, or b) make it a requirement that (if configured via -foffload=[...]), the offload machinery is run even if there is not actually any data to be offloaded, so we then reliably get the respective constructor call to libgomp's GOMP_offload_register. I once began to implement a), but this to get a big ugly, so then looked into b) instead. Compared to the status quo, always running the whole offloading machinery for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp are active, certainly does introduce some overhead when there isn't actually any code to be offloaded, so I'm not sure whether that is acceptable? I vote for (a). OK. Any other opinions? What happens for conflicting -fofffload=[...] options in different TUs? If you're asking about what happens now, only the list of offload targets from link-time -foffload=tgt1,tgt2 option matters. I'm fine with that -- require the user to specify a consistent set of -foffload