Re: OpenACC Profiling Interface: 'acc_register_library' (was: OpenACC 2.5 Profiling Interface)
On Thu, May 16, 2019 at 05:21:56PM +0200, Thomas Schwinge wrote:
> > Jakub, would you please especially review the non-OpenACC-specific
> > changes here, including the libgomp ABI changes?
>
> Given a baseline that I've not yet posted ;-) would you please anyway
> have a look at the following changes?  Is it OK to add/handle the
> 'acc_register_library' symbol in this way?  The idea behind that one is
> that you dynamically (including via 'LD_PRELOAD') link your code against
> a "library" providing an implementation of 'acc_register_library', or
> even define it in your user code (see the test case below), and then upon
> initialization, "The OpenACC runtime will invoke 'acc_register_library',
> passing [...]".

Ugh, it is a mess (but then, it seems OMPT has the same mess with the
ompt_start_tool symbol).

It is nasty to call acc_register_library from the initialization of the
OpenMP library, just as it is nasty to call ompt_start_tool from the
initialization of the OpenACC library; neither of those symbols is
reserved to the implementation generally.

Can't we do nothing for -fopenacc or -fopenmp, and instead have
-fopenacc-profile or -fopenmpt options that would link in another shared
library which just provides that symbol and calls it from its
initialization?  The dummy implementation would be __attribute__((weak))
and would dlsym (RTLD_NEXT, "...") and call the result if it is non-NULL,
so that this works even if that library happens to be linked before
whatever library implements the user symbol.

Looking at what libomp does for ompt_start_tool: for Darwin they don't use
a weak symbol and instead just dlsym (RTLD_DEFAULT, "...") in the library
ctor; for Linux they have a weak definition that does
dlsym (RTLD_NEXT, "..."); and for Windows they use something different
again.
> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -469,6 +469,7 @@ OACC_2.5 {
>      acc_prof_lookup;
>      acc_prof_register;
>      acc_prof_unregister;
> +    acc_register_library;
>      acc_update_device_async;
>      acc_update_device_async_32_h_;
>      acc_update_device_async_64_h_;

You certainly never want to add something to a symbol version that has
been shipped in a release compiler already.

	Jakub
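
For readers following the thread, the weak-definition-plus-dlsym idea described above can be sketched roughly as follows. This is only an illustration, not libgomp or libomp code: the symbol name 'demo_tool_hook' and its int return value are hypothetical stand-ins (the real hooks are 'acc_register_library' and 'ompt_start_tool', with different signatures), and the sketch assumes glibc, where RTLD_NEXT needs _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  /* Assumption: glibc, where RTLD_NEXT is a GNU extension.  */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical hook type; the real hooks take arguments.  */
typedef int (*demo_hook_fn) (void);

/* Weak default definition.  A strong definition of the same symbol in the
   application or in a tool library takes precedence at link time.  If this
   weak copy is nevertheless the one that gets called, it looks for another
   definition later in the library search order and forwards to it.
   Returns whatever the forwarded hook returns, or 0 if there is none.  */
__attribute__ ((weak)) int
demo_tool_hook (void)
{
  demo_hook_fn next = (demo_hook_fn) dlsym (RTLD_NEXT, "demo_tool_hook");
  if (next != NULL)
    return next ();
  return 0;  /* No tool library present: do nothing.  */
}
```

On older glibc versions this may additionally need '-ldl' at link time. When no other object defines the symbol, the dlsym call returns NULL and the default simply does nothing.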
OpenACC Profiling Interface: 'acc_register_library' (was: OpenACC 2.5 Profiling Interface)
Hi Jakub!

On Sun, 11 Nov 2018 22:31:42 -0600, I wrote:
> On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> > The 2.5 versions of the OpenACC standard added a new chapter "Profiling
> > Interface".
>
> I'd like to get that into trunk.  It's not yet complete (that is, doesn't
> provide all the information specified), but it's very useful already, and
> the missing pieces can later be added incrementally.
>
> Jakub, would you please especially review the non-OpenACC-specific
> changes here, including the libgomp ABI changes?

Given a baseline that I've not yet posted ;-) would you please anyway
have a look at the following changes?  Is it OK to add/handle the
'acc_register_library' symbol in this way?  The idea behind that one is
that you dynamically (including via 'LD_PRELOAD') link your code against
a "library" providing an implementation of 'acc_register_library', or
even define it in your user code (see the test case below), and then upon
initialization, "The OpenACC runtime will invoke 'acc_register_library',
passing [...]".

As far as I can tell, it was never a concern (neither for us internally,
nor did anybody external ever complain) that 'acc_*' and 'GOACC_*'
symbols are visible when building with '-fopenmp' but (default)
'-fno-openacc', and vice versa, that 'omp_*' and 'GOMP_*' symbols are
visible when building with '-fopenacc' but (default) '-fno-openmp'.

But 'acc_register_library' is special in that the runtime (libgomp) will
unconditionally call it, also for '-fopenmp' but (default)
'-fno-openacc'.  So, when OpenMP user code happens to contain an
(unrelated) 'acc_register_library' symbol, strange things will happen.

OpenACC states that "Typically, the OpenACC runtime will include a _weak_
definition of 'acc_register_library', which does nothing and which will
be called when there is no tools library".  I'm not sure if that's "weak"
specifically in the ELF linking sense, or just generally "weak" semantics.
But it seemed easy enough to just provide a regular symbol in its own
'*.o' file, to be overridden in both the dynamic and static linking
cases, so that's what I've done.

Any comments to that aspect?

--- libgomp/Makefile.am
+++ libgomp/Makefile.am
@@ -66,7 +66,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
        splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c oacc-init.c \
        oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
        affinity-fmt.c teams.c \
-       oacc-profiling.c
+       oacc-profiling.c oacc-profiling-acc_register_library.c
 include $(top_srcdir)/plugin/Makefrag.am
--- libgomp/acc_prof.h
+++ libgomp/acc_prof.h
@@ -235,6 +235,9 @@ extern void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t)
 typedef void (*acc_query_fn) ();
 typedef acc_query_fn (*acc_prof_lookup_func) (const char *);
 extern acc_query_fn acc_prof_lookup (const char *) __GOACC_NOTHROW;
+/* Don't tag 'acc_register_library' as '__GOACC_NOTHROW': this function can be
+   overridden by the application, and must be expected to do anything.  */
+extern void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);
 #ifdef __cplusplus
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -469,6 +469,7 @@ OACC_2.5 {
     acc_prof_lookup;
     acc_prof_register;
     acc_prof_unregister;
+    acc_register_library;
     acc_update_device_async;
     acc_update_device_async_32_h_;
     acc_update_device_async_64_h_;
--- /dev/null
+++ libgomp/oacc-profiling-acc_register_library.c
@@ -0,0 +1,40 @@
+/* OpenACC Profiling Interface: stub 'acc_register_library' function
+[...]
+
+#include "libgomp.h"
+#include "acc_prof.h"
+
+/* This is in its own file so that this function definition can be overridden
+   when linking statically.  */
+
+void
+acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
+                      acc_prof_lookup_func lookup)
+{
+  gomp_debug (0, "dummy %s\n", __FUNCTION__);
+}
--- libgomp/oacc-profiling.c
+++ libgomp/oacc-profiling.c
@@ -107,6 +107,12 @@ goacc_profiling_initialize (void)
   /* ..., but profiling is still disabled.  */
   __atomic_store_n (_prof_enabled, false, MEMMODEL_RELAXED);
+  /* We are to invoke an external acc_register_library routine, defaulting to
+     our stub oacc-profiling-acc_register_library.c:acc_register_library
+     implementation.  */
+  gomp_debug (0, "%s: calling acc_register_library\n", __FUNCTION__);
+  acc_register_library (acc_prof_register, acc_prof_unregister,
+                        acc_prof_lookup);
 #ifdef PLUGIN_SUPPORT
   char *acc_proflibs = secure_getenv ("ACC_PROFLIB");
   while (acc_proflibs != NULL && acc_proflibs[0] != '\0')
@@ -139,16 +145,24 @@ goacc_profiling_initialize (void)
 void
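
To make the registration flow in the patch above easier to picture, here is a small self-contained mock-up. It does not use libgomp's 'acc_prof.h': all 'demo_*' names and the simplified types are hypothetical stand-ins chosen for illustration, and the real callback and registration signatures carry more arguments. The shape is the same, though: the runtime calls the hook once at initialization and hands it the register/unregister entry points, and the tool uses them to subscribe to events.

```c
#include <stddef.h>

/* Simplified stand-ins for the types declared in 'acc_prof.h'.  */
typedef enum { DEMO_EV_NONE, DEMO_EV_LAUNCH_START, DEMO_EV_LAST } demo_event_t;
typedef void (*demo_prof_callback) (demo_event_t);
typedef void (*demo_prof_reg) (demo_event_t, demo_prof_callback);

/* "Runtime" side: one callback slot per event.  */
static demo_prof_callback demo_slots[DEMO_EV_LAST];

static void
demo_prof_register (demo_event_t ev, demo_prof_callback cb)
{
  demo_slots[ev] = cb;
}

static void
demo_prof_unregister (demo_event_t ev, demo_prof_callback cb)
{
  if (demo_slots[ev] == cb)
    demo_slots[ev] = NULL;
}

/* "Tool" side: count how often a launch event is seen.  */
static int demo_launch_count;

static void
demo_on_launch (demo_event_t ev)
{
  (void) ev;
  ++demo_launch_count;
}

/* Plays the role of a user-provided 'acc_register_library': it receives
   the registration entry points and subscribes to one event.  */
void
demo_register_library (demo_prof_reg reg, demo_prof_reg unreg)
{
  (void) unreg;
  reg (DEMO_EV_LAUNCH_START, demo_on_launch);
}

/* "Runtime" side: initialization invokes the hook exactly once.  */
static void
demo_runtime_initialize (void)
{
  demo_register_library (demo_prof_register, demo_prof_unregister);
}

/* "Runtime" side: deliver an event to its callback, if any.  */
static void
demo_dispatch (demo_event_t ev)
{
  if (demo_slots[ev] != NULL)
    demo_slots[ev] (ev);
}
```

In this mock-up, replacing 'demo_register_library' with an empty function gives the "no tools library" case: initialization still calls the hook, but no callback is ever registered and dispatching an event does nothing.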
Re: OpenACC 2.5 Profiling Interface
Hi Jakub!

On Tue, 4 Dec 2018 14:13:49 +0100, Jakub Jelinek wrote:
> On Sun, Nov 11, 2018 at 10:31:42PM -0600, Thomas Schwinge wrote:
> > On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> > > The 2.5 versions of the OpenACC standard added a new chapter "Profiling
> > > Interface".
> >
> > I'd like to get that into trunk.  It's not yet complete (that is, doesn't
> > provide all the information specified), but it's very useful already, and
> > the missing pieces can later be added incrementally.
> >
> > Jakub, would you please especially review the non-OpenACC-specific
> > changes here, including the libgomp ABI changes?
> >
> > (Note that this patch doesn't apply on top of trunk.  I extracted it out
> > of openacc-gcc-8-branch, plus additional changes, and it depends on a
> > number of other pending patches.  Due to the many regions of code
> > touched, there are a lot of "textual" conflicts when porting it to
> > current trunk, but the "structure" will be the same.)
>
> Seems rather expensive to me, especially with the dependence on
> libbacktrace and the unconditional initialization of the profiling from the
> library constructor.  Could e.g. libbacktrace or some libgomp plugin that is
> linked against libbacktrace be dlopened only when apps ask for this stuff?

Thanks, that seems plausible, and I'm looking into that.

> OpenMP 5 has a profiling API too (OMPT)

(... which I'm not familiar with...)

> there the rough plan for when it
> will be implemented is that libgomp as the library will implement only the
> absolute required minimum and perhaps have a variant library that is a
> replacement for libgomp if more detailed instrumentation is needed.

The "problem" with the OpenACC Profiling Interface is that the user can
enable the callbacks etc. anytime dynamically at run time.  So, as I
understand, that rules out the "variant library" approach?


Grüße
 Thomas
Re: OpenACC 2.5 Profiling Interface
On Sun, Nov 11, 2018 at 10:31:42PM -0600, Thomas Schwinge wrote:
> On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> > The 2.5 versions of the OpenACC standard added a new chapter "Profiling
> > Interface".
>
> I'd like to get that into trunk.  It's not yet complete (that is, doesn't
> provide all the information specified), but it's very useful already, and
> the missing pieces can later be added incrementally.
>
> Jakub, would you please especially review the non-OpenACC-specific
> changes here, including the libgomp ABI changes?
>
> (Note that this patch doesn't apply on top of trunk.  I extracted it out
> of openacc-gcc-8-branch, plus additional changes, and it depends on a
> number of other pending patches.  Due to the many regions of code
> touched, there are a lot of "textual" conflicts when porting it to
> current trunk, but the "structure" will be the same.)

Seems rather expensive to me, especially with the dependence on
libbacktrace and the unconditional initialization of the profiling from the
library constructor.  Could e.g. libbacktrace or some libgomp plugin that is
linked against libbacktrace be dlopened only when apps ask for this stuff?

OpenMP 5 has a profiling API too (OMPT), there the rough plan for when it
will be implemented is that libgomp as the library will implement only the
absolute required minimum and perhaps have a variant library that is a
replacement for libgomp if more detailed instrumentation is needed.

	Jakub
Documentation changes for OpenACC 2.5 Profiling Interface (was: More OpenACC 2.5 Profiling Interface)
Hi!

On Mon, 15 May 2017 08:52:39 +0200, I wrote:
> On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> > The 2.5 versions of the OpenACC standard added a new chapter "Profiling
> > Interface".  In r245784, I committed incomplete support to
> > gomp-4_0-branch.  I plan to continue working on this, but wanted to
> > synchronize at this point.
> >
> > commit b22a85fe7f3daeb48460e7aa28606d0cdb799f69
> > Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > Date:   Tue Feb 28 17:36:03 2017 +0000
> >
> >     OpenACC 2.5 Profiling Interface (incomplete)
>
> Committed to gomp-4_0-branch in r248042:
>
> commit e3720963a1f494b2a0a1b6c28d5eb8bfb7c0d546
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Mon May 15 06:50:17 2017 +
>
>     More OpenACC 2.5 Profiling Interface

Committed to gomp-4_0-branch in r248058:

commit b58008024048f960eed9fd709cbe5d5ea96c
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon May 15 11:45:45 2017 +

    Documentation changes for OpenACC 2.5 Profiling Interface

    libgomp/
    * libgomp.texi (OpenACC Environment Variables): Mention
    "ACC_PROFLIB".
    (OpenACC Profiling Interface): Update.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@248058 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |  4
 libgomp/libgomp.texi   | 21 ++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index f36cbfc..3125c99 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2017-05-15  Thomas Schwinge  <tho...@codesourcery.com>
+	* libgomp.texi (OpenACC Environment Variables): Mention
+	"ACC_PROFLIB".
+	(OpenACC Profiling Interface): Update.
+
 	* libgomp.texi: Update for OpenACC 2.5.
 	* openacc.f90 (openacc_version): Update to "201510".
 	* openacc_lib.h (openacc_version): Likewise.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 74b98c7..7a3c491 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -2839,13 +2839,15 @@ A.2.1.4.
 @node OpenACC Environment Variables
 @chapter OpenACC Environment Variables
-The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+The variables @env{ACC_DEVICE_TYPE}, @env{ACC_DEVICE_NUM},
+and @code{ACC_PROFLIB}
 are defined by section 4 of the OpenACC specification in version 2.5.
 The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
 @menu
 * ACC_DEVICE_TYPE::
 * ACC_DEVICE_NUM::
+* ACC_PROFLIB::
 * GCC_ACC_NOTIFY::
 @end menu
@@ -2871,6 +2873,19 @@ The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+@node ACC_PROFLIB
+@section @code{ACC_PROFLIB}
+@table @asis
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.5}, section
+4.3.
+@end table
+
+
+
 @node GCC_ACC_NOTIFY
 @section @code{GCC_ACC_NOTIFY}
 @table @asis
@@ -3095,8 +3110,8 @@ Application Programming Interface}, version 2.5.}
 @section Implementation Status and Implementation-Defined Behavior
-We're not yet implementing the whole Profiling Interface as defined by
-the OpenACC 2.5 specification.  Also, the specification doesn't
+We're implementing most of the Profiling Interface as defined by
+the OpenACC 2.5 specification.  The specification doesn't
 clearly define some aspects of its Profiling Interface, so we're
 clarifying these as @emph{implementation-defined behavior} here.  We
 already have reported to the OpenACC Technical Committee some issues,


Grüße
 Thomas
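
As an illustration of how a runtime might walk the ACC_PROFLIB value documented above, here is a minimal sketch of splitting it into library names. It is not the libgomp implementation: the function name 'demo_split_proflibs', the fixed-size buffers, and in particular the choice of ';' as the separator are assumptions made for this example. A real runtime would go on to load each name (for instance with dlopen) and call its 'acc_register_library'.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAME 256

/* Split VALUE (for example the contents of ACC_PROFLIB) at ';' and copy up
   to MAX non-empty entries into OUT.  Entries that do not fit into
   DEMO_MAX_NAME bytes are skipped.  Returns the number of entries stored.
   The ';' separator is an assumption of this sketch.  */
size_t
demo_split_proflibs (const char *value, char out[][DEMO_MAX_NAME], size_t max)
{
  size_t n = 0;
  while (value != NULL && value[0] != '\0' && n < max)
    {
      const char *sep = strchr (value, ';');
      size_t len = sep != NULL ? (size_t) (sep - value) : strlen (value);
      if (len > 0 && len < DEMO_MAX_NAME)
        {
          memcpy (out[n], value, len);
          out[n][len] = '\0';
          ++n;
        }
      /* Continue after the separator, or stop if this was the last entry.  */
      value = sep != NULL ? sep + 1 : NULL;
    }
  return n;
}
```

A caller would typically pass the result of getenv ("ACC_PROFLIB") (or secure_getenv on glibc) as VALUE; a NULL or empty value yields zero entries.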
More OpenACC 2.5 Profiling Interface (was: OpenACC 2.5 Profiling Interface (incomplete))
Hi!

On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> The 2.5 versions of the OpenACC standard added a new chapter "Profiling
> Interface".  In r245784, I committed incomplete support to
> gomp-4_0-branch.  I plan to continue working on this, but wanted to
> synchronize at this point.
>
> commit b22a85fe7f3daeb48460e7aa28606d0cdb799f69
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Feb 28 17:36:03 2017 +0000
>
>     OpenACC 2.5 Profiling Interface (incomplete)

Committed to gomp-4_0-branch in r248042:

commit e3720963a1f494b2a0a1b6c28d5eb8bfb7c0d546
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon May 15 06:50:17 2017 +

    More OpenACC 2.5 Profiling Interface

    libgomp/
    * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait)
    (acc_wait_async, acc_wait_all, acc_wait_all_async): Set up
    profiling.
    * oacc-cuda.c (acc_get_current_cuda_device)
    (acc_get_current_cuda_context, acc_get_cuda_stream)
    (acc_set_cuda_stream): Likewise.
    * oacc-init.c (acc_set_device_type, acc_get_device_type)
    (acc_get_device_num): Likewise.
    * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device)
    (acc_map_data, acc_unmap_data, present_create_copy)
    (delete_copyout, update_dev_host): Likewise.
    * oacc-parallel.c (GOACC_data_start, GOACC_data_end)
    (GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise.
    * oacc-profiling.c (goacc_profiling_setup_p): New function.
    (goacc_profiling_dispatch_p): Add a "bool" formal parameter.
    Adjust all users.
    * oacc-int.h (goacc_profiling_setup_p)
    (goacc_profiling_dispatch_p): Update.
    * plugin/plugin-nvptx.c (nvptx_exec, nvptx_wait, nvptx_wait_all):
    Generate more profiling events.
    * libgomp.texi (OpenACC Profiling Interface): Update.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@248042 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        |  24 +++
 libgomp/libgomp.texi          |  74 +++--
 libgomp/oacc-async.c          | 110 -
 libgomp/oacc-cuda.c           |  82 --
 libgomp/oacc-init.c           | 102 +++-
 libgomp/oacc-int.h            |   4 +-
 libgomp/oacc-mem.c            | 154 +-
 libgomp/oacc-parallel.c       | 357 +++---
 libgomp/oacc-profiling.c      | 100 +++-
 libgomp/plugin/plugin-nvptx.c | 113 -
 10 files changed, 1056 insertions(+), 64 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 5dc0889..23882cf 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,27 @@
+2017-05-15  Thomas Schwinge  <tho...@codesourcery.com>
+
+	* oacc-async.c (acc_async_test, acc_async_test_all, acc_wait)
+	(acc_wait_async, acc_wait_all, acc_wait_all_async): Set up
+	profiling.
+	* oacc-cuda.c (acc_get_current_cuda_device)
+	(acc_get_current_cuda_context, acc_get_cuda_stream)
+	(acc_set_cuda_stream): Likewise.
+	* oacc-init.c (acc_set_device_type, acc_get_device_type)
+	(acc_get_device_num): Likewise.
+	* oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device)
+	(acc_map_data, acc_unmap_data, present_create_copy)
+	(delete_copyout, update_dev_host): Likewise.
+	* oacc-parallel.c (GOACC_data_start, GOACC_data_end)
+	(GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise.
+	* oacc-profiling.c (goacc_profiling_setup_p): New function.
+	(goacc_profiling_dispatch_p): Add a "bool" formal parameter.
+	Adjust all users.
+	* oacc-int.h (goacc_profiling_setup_p)
+	(goacc_profiling_dispatch_p): Update.
+	* plugin/plugin-nvptx.c (nvptx_exec, nvptx_wait, nvptx_wait_all):
+	Generate more profiling events.
+	* libgomp.texi (OpenACC Profiling Interface): Update.
+
 2017-05-14  Thomas Schwinge  <tho...@codesourcery.com>
 	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: New
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 93365cd..b3fa139 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -3207,12 +3207,19 @@ Will be @code{acc_construct_parallel} for OpenACC kernels constructs;
 should be @code{acc_construct_kernels}.
 @item
+Will be @code{acc_construct_enter_data} or
+@code{acc_construct_exit_data} when processing variable mappings
+specified in OpenACC declare directives; should be
+@code{acc_construct_declare}.
+
+@item
 For implicit @code{acc_ev_device_init_start},
 @code{acc_ev_device_init_end}, and explicit as well as implicit
 @code{acc_ev_alloc}, @code{acc_ev_free},
 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload
OpenACC 2.5 Profiling Interface (incomplete)
Hi!

The 2.5 versions of the OpenACC standard added a new chapter "Profiling
Interface".  In r245784, I committed incomplete support to
gomp-4_0-branch.  I plan to continue working on this, but wanted to
synchronize at this point.

commit b22a85fe7f3daeb48460e7aa28606d0cdb799f69
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 28 17:36:03 2017 +0000

    OpenACC 2.5 Profiling Interface (incomplete)

    libgomp/
    * acc_prof.h: New file.
    * oacc-profiling-acc_register_library.c: Likewise.
    * oacc-profiling.c: Likewise.
    * Makefile.am (nodist_libsubinclude_HEADERS, libgomp_la_SOURCES):
    Add these, respectively.
    * Makefile.in: Regenerate.
    * libgomp/config/nvptx/oacc-profiling-acc_register_library.c: New
    empty file.
    * libgomp/config/nvptx/oacc-profiling.c: Likewise.
    * env.c (initialize_env): Call goacc_profiling_initialize.
    * libgomp-plugin.c: New function
    GOMP_PLUGIN_goacc_profiling_dispatch.
    * libgomp-plugin.h: Declare function
    GOMP_PLUGIN_goacc_profiling_dispatch.
    * oacc-plugin.c: New function GOMP_PLUGIN_goacc_thread.
    * oacc-plugin.h: Declare function GOMP_PLUGIN_goacc_thread.
    * libgomp.map (OACC_2.5): Add acc_prof_lookup, acc_prof_register,
    acc_prof_unregister, and acc_register_library.
    Add GOMP_PLUGIN_goacc_profiling_dispatch, and
    GOMP_PLUGIN_goacc_thread with new GOMP_PLUGIN_1.3 symbol version.
    * oacc-int.h (struct goacc_thread): Add "acc_prof_info *prof_info",
    "acc_api_info *api_info", and "bool prof_callbacks_enabled"
    members.
    Declare functions goacc_profiling_initialize,
    goacc_profiling_dispatch_p, and goacc_profiling_dispatch.
    * oacc-init.c (acc_init_1): Add "acc_construct_t", and "int"
    formal parameters.  Adjust all users.
    (acc_init_1, goacc_attach_host_thread_to_device, acc_init)
    (goacc_lazy_initialize): Update for OpenACC Profiling Interface.
    * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
    * plugin/plugin-nvptx.c (cuda_map_create, cuda_map_destroy)
    (map_init, map_fini, map_pop, map_push): Add "struct goacc_thread
    *" formal parameter.  Adjust all users.
    (select_stream_for_async, event_gc, nvptx_exec, nvptx_host2dev)
    (nvptx_dev2host, nvptx_set_cuda_stream): Call
    GOMP_PLUGIN_goacc_thread instead of nvptx_thread.
    (cuda_map_create, cuda_map_destroy, nvptx_exec, nvptx_alloc)
    (nvptx_free, nvptx_host2dev, nvptx_dev2host): Update for OpenACC
    Profiling Interface.
    * libgomp.texi: New chapter "OpenACC Profiling Interface".
    * testsuite/libgomp.oacc-c-c++-common/acc_prof-dispatch-1.c: New
    file.
    * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise.
    * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
    Likewise.
    * testsuite/libgomp.oacc-c-c++-common/acc_prof-valid_bytes-1.c:
    Likewise.
    * testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c:
    Likewise.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245784 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp                             |  50 ++
 libgomp/Makefile.am                                |   5 +-
 libgomp/Makefile.in                                |  10 +-
 libgomp/acc_prof.h                                 | 237 +++
 .../nvptx/oacc-profiling-acc_register_library.c    |   0
 libgomp/config/nvptx/oacc-profiling.c              |   0
 libgomp/env.c                                      |   3 +-
 libgomp/libgomp-plugin.c                           |   9 +
 libgomp/libgomp-plugin.h                           |   6 +
 libgomp/libgomp.map                                |  11 +
 libgomp/libgomp.texi                               | 246 +++
 libgomp/oacc-init.c                                |  68 +-
 libgomp/oacc-int.h                                 |  12 +
 libgomp/oacc-parallel.c                            | 126 +++-
 libgomp/oacc-plugin.c                              |  13 +
 libgomp/oacc-plugin.h                              |   3 +
 ...gin.h => oacc-profiling-acc_register_library.c} |  19 +-
 libgomp/oacc-profiling.c                           | 576 +
 libgomp/plugin/plugin-nvptx.c                      | 315 -
 .../acc_prof-dispatch-1.c                          | 344 ++
 .../libgomp.oacc-c-c++-common/acc_prof-init-1.c    | 306 +
 .../acc_prof-parallel-1.c                          | 703 +
 .../acc_prof-valid_bytes-1.c