Re: [gomp4] Add tables generation
Hi, On 05 Sep 19:09, Ilya Verbin wrote: I hope that it will not affect the regular LTO compilation. At least, the 'make check' did not reveal any regression on the system with linker plugin. I found that relying on -flto option in lto-wrapper was a bad idea. E.g., this simple case is not working: $ gcc -c -flto test.c $ gcc test.o # Here -flto is absent, but lto-wrapper must recompile IR from test.o So I reverted back all conditions with lto_mode in lto-wrapper, and instead introduced a new flag have_lto. The following patch looks like working now. Thanks, -- Ilya --- gcc/cgraphunit.c| 38 +--- gcc/lto-wrapper.c | 66 +++-- gcc/omp-low.c | 6 + gcc/passes.c| 2 +- lto-plugin/lto-plugin.c | 25 ++- 5 files changed, 97 insertions(+), 40 deletions(-) diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index f0c9f5c..b87e6dd 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2040,13 +2040,25 @@ output_in_order (void) free (nodes); } -/* Collect all global variables with omp declare target attribute into - OFFLOAD_VARS. It will be streamed out in ipa_write_summaries. */ +/* Check whether there is at least one function or global variable to offload. + Also collect all such global variables into OFFLOAD_VARS, the functions were + already collected in omp-low.c. They will be streamed out in + ipa_write_summaries. */ -static void -init_offload_var_table (void) +static bool +initialize_offload (void) { + bool have_offload = false; + struct cgraph_node *node; struct varpool_node *vnode; + + FOR_EACH_DEFINED_FUNCTION (node) +if (lookup_attribute (omp declare target, DECL_ATTRIBUTES (node-decl))) + { + have_offload = true; + break; + } + FOR_EACH_DEFINED_VARIABLE (vnode) { if (!lookup_attribute (omp declare target, @@ -2054,13 +2066,17 @@ init_offload_var_table (void) || TREE_CODE (vnode-decl) != VAR_DECL || DECL_SIZE (vnode-decl) == 0) continue; + have_offload = true; vec_safe_push (offload_vars, vnode-decl); } + + return have_offload; } static void ipa_passes (void) { + bool have_offload = false; gcc::pass_manager *passes = g-get_passes (); set_cfun (NULL); @@ -2068,6 +2084,14 @@ ipa_passes (void) gimple_register_cfg_hooks (); bitmap_obstack_initialize (NULL); + if (!in_lto_p (flag_openacc || flag_openmp)) +{ + have_offload = initialize_offload (); + /* OpenACC / OpenMP offloading requires LTO infrastructure. */ + if (have_offload) + flag_generate_lto = 1; +} + invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); if (!in_lto_p) @@ -2108,11 +2132,7 @@ ipa_passes (void) if (!in_lto_p) { - init_offload_var_table (); - - if ((flag_openacc || flag_openmp) - !(vec_safe_is_empty (offload_funcs) - vec_safe_is_empty (offload_vars))) + if (have_offload) { section_name_prefix = OMP_SECTION_NAME_PREFIX; ipa_write_summaries (true); diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 80d10f3..86669cf 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -619,6 +619,7 @@ run_gcc (unsigned argc, char *argv[]) unsigned int decoded_options_count; struct obstack argv_obstack; int new_head_argc; + bool have_lto = false; bool have_offload = false; /* Get the driver and options. */ @@ -668,6 +669,11 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } + /* We may choose not to write out this .opts section in the future. In +that case we'll have to use something else to look for. */ + if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, + offset, length, errmsg, err)) + have_offload = true; if (!simple_object_find_section (sobj, LTO_SECTION_NAME_PREFIX . opts, offset, length, errmsg, err)) { @@ -675,11 +681,7 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } - /* We may choose not to write out this .opts section in the future. In -that case we'll have to use something else to look for. */ - if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, - offset, length, errmsg, err)) - have_offload = true; + have_lto = true; lseek (fd, file_offset + offset, SEEK_SET); data = (char *)xmalloc (length); read (fd, data, length); @@ -871,6 +873,32 @@ run_gcc (unsigned argc, char *argv[]) /* Remember at which point we can scrub args to re-use the commons. */ new_head_argc = obstack_object_size (argv_obstack) / sizeof (void *); + if (have_offload) +{ + compile_images_for_openmp_targets (argc,
Re: [gomp4] Add tables generation
Hi, On 04 Sep 15:54, Thomas Schwinge wrote: Aha, it's gcc/gcc.c:LINK_PLUGIN_SPEC that is unconditionally adding the -fresolution option. Here is a hack that seems to make it work, but that most certainly should be done differently: Here is updated patch, which solves the problem with linker plugin. I hope that it will not affect the regular LTO compilation. At least, the 'make check' did not reveal any regression on the system with linker plugin. Bootstrap also passed. Ok for gomp-4_0-branch? -- Ilya --- gcc/cgraphunit.c| 38 --- gcc/lto-wrapper.c | 69 ++--- gcc/omp-low.c | 6 + gcc/passes.c| 2 +- lto-plugin/lto-plugin.c | 25 +- 5 files changed, 97 insertions(+), 43 deletions(-) diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index f0c9f5c..b87e6dd 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2040,13 +2040,25 @@ output_in_order (void) free (nodes); } -/* Collect all global variables with omp declare target attribute into - OFFLOAD_VARS. It will be streamed out in ipa_write_summaries. */ +/* Check whether there is at least one function or global variable to offload. + Also collect all such global variables into OFFLOAD_VARS, the functions were + already collected in omp-low.c. They will be streamed out in + ipa_write_summaries. */ -static void -init_offload_var_table (void) +static bool +initialize_offload (void) { + bool have_offload = false; + struct cgraph_node *node; struct varpool_node *vnode; + + FOR_EACH_DEFINED_FUNCTION (node) +if (lookup_attribute (omp declare target, DECL_ATTRIBUTES (node-decl))) + { + have_offload = true; + break; + } + FOR_EACH_DEFINED_VARIABLE (vnode) { if (!lookup_attribute (omp declare target, @@ -2054,13 +2066,17 @@ init_offload_var_table (void) || TREE_CODE (vnode-decl) != VAR_DECL || DECL_SIZE (vnode-decl) == 0) continue; + have_offload = true; vec_safe_push (offload_vars, vnode-decl); } + + return have_offload; } static void ipa_passes (void) { + bool have_offload = false; gcc::pass_manager *passes = g-get_passes (); set_cfun (NULL); @@ -2068,6 +2084,14 @@ ipa_passes (void) gimple_register_cfg_hooks (); bitmap_obstack_initialize (NULL); + if (!in_lto_p (flag_openacc || flag_openmp)) +{ + have_offload = initialize_offload (); + /* OpenACC / OpenMP offloading requires LTO infrastructure. */ + if (have_offload) + flag_generate_lto = 1; +} + invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); if (!in_lto_p) @@ -2108,11 +2132,7 @@ ipa_passes (void) if (!in_lto_p) { - init_offload_var_table (); - - if ((flag_openacc || flag_openmp) - !(vec_safe_is_empty (offload_funcs) - vec_safe_is_empty (offload_vars))) + if (have_offload) { section_name_prefix = OMP_SECTION_NAME_PREFIX; ipa_write_summaries (true); diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 80d10f3..01e8f06 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -668,6 +668,11 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } + /* We may choose not to write out this .opts section in the future. In +that case we'll have to use something else to look for. */ + if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, + offset, length, errmsg, err)) + have_offload = true; if (!simple_object_find_section (sobj, LTO_SECTION_NAME_PREFIX . opts, offset, length, errmsg, err)) { @@ -675,11 +680,6 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } - /* We may choose not to write out this .opts section in the future. In -that case we'll have to use something else to look for. */ - if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, - offset, length, errmsg, err)) - have_offload = true; lseek (fd, file_offset + offset, SEEK_SET); data = (char *)xmalloc (length); read (fd, data, length); @@ -871,7 +871,32 @@ run_gcc (unsigned argc, char *argv[]) /* Remember at which point we can scrub args to re-use the commons. */ new_head_argc = obstack_object_size (argv_obstack) / sizeof (void *); - if (lto_mode == LTO_MODE_LTO) + if (have_offload) +{ + compile_images_for_openmp_targets (argc, argv); + if (offload_names) + { + find_ompbeginend (); + for (i = 0; offload_names[i]; i++) + printf (%s\n, offload_names[i]); + free_array_of_ptrs ((void **) offload_names, i); + } +} + + if (ompbegin) +printf (%s\n, ompbegin); + + if (lto_mode
Re: [gomp4] Add tables generation
Hi! On Tue, 2 Sep 2014 21:49:46 +0400, Ilya Verbin iver...@gmail.com wrote: This patch allows to compile binaries with offloading without passing -flto option, and w/o performing link-time optimizations of the host code. Thanks for working on this! How it works: 1. If there is at least one function or global variable to offload, gcc sets flag_generate_lto. This enables writing the bytecode produced by ipa_write_summaries into .gnu.target_lto_* sections (.gnu.lto_* sections are not created). Also this flag emits LTO marker (__gnu_lto_v1). 2. This step is not changed: collect2 scans object files for the LTO marker and fills the list of LTO objects. If the list is not empty, it runs lto-wrapper to perform link-time recompilation. 3. lto-wrapper compiles images for targets. And if -flto option is absent (lto_mode == LTO_MODE_NONE), then it just returns the list of input objects without recompilation. That seems sane to me. (But you guys have looked into this design/code in much more detail than I have.) I'm facing one problem; I guess the crucial detail is that in my scenario I'm using the linker plugin. The lto-wrapper is not being executed (and thus no mkoffload being run), because »num_claimed_files == 0«. In lto-plugin/lto-plugin.c:process_symtab, only LTO_SECTION_PREFIX (.gnu.lto_.symtab) is considered, which (correctly so) is not generated anymore by GCC in the new scenario, but .gnu.target_lto_ is not considered there. (Should this maybe look only for the LTO marker __gnu_lto_v1, or am I misunderstanding what this is doing?) If I make that also accept the offloading section, the compilation process proceeds further, but still fails, because no resolution file is available: »[...]/ld: cannot find -fresolution=/tmp/cc7xeiW0.res: No such file or directory«. Is this enough information for someone who is more familiar with the design/code to already see what needs to be done? Grüße, Thomas pgpFNyou7QLiL.pgp Description: PGP signature
Re: [gomp4] Add tables generation
On 04 Sep 15:08, Thomas Schwinge wrote: I'm facing one problem; I guess the crucial detail is that in my scenario I'm using the linker plugin. The lto-wrapper is not being executed (and thus no mkoffload being run), because »num_claimed_files == 0«. Yeah I missed that. I tested this patch on the system with old ld, therefore the scenario with linker plugin wasn't tested. I'm going to fix my patch. -- Ilya
Re: [gomp4] Add tables generation
Hi! On Thu, 04 Sep 2014 15:08:06 +0200, I wrote: On Tue, 2 Sep 2014 21:49:46 +0400, Ilya Verbin iver...@gmail.com wrote: This patch allows to compile binaries with offloading without passing -flto option, and w/o performing link-time optimizations of the host code. Thanks for working on this! How it works: 1. If there is at least one function or global variable to offload, gcc sets flag_generate_lto. This enables writing the bytecode produced by ipa_write_summaries into .gnu.target_lto_* sections (.gnu.lto_* sections are not created). Also this flag emits LTO marker (__gnu_lto_v1). 2. This step is not changed: collect2 scans object files for the LTO marker and fills the list of LTO objects. If the list is not empty, it runs lto-wrapper to perform link-time recompilation. 3. lto-wrapper compiles images for targets. And if -flto option is absent (lto_mode == LTO_MODE_NONE), then it just returns the list of input objects without recompilation. That seems sane to me. (But you guys have looked into this design/code in much more detail than I have.) I'm facing one problem; I guess the crucial detail is that in my scenario I'm using the linker plugin. The lto-wrapper is not being executed (and thus no mkoffload being run), because »num_claimed_files == 0«. In lto-plugin/lto-plugin.c:process_symtab, only LTO_SECTION_PREFIX (.gnu.lto_.symtab) is considered, which (correctly so) is not generated anymore by GCC in the new scenario, but .gnu.target_lto_ is not considered there. (Should this maybe look only for the LTO marker __gnu_lto_v1, or am I misunderstanding what this is doing?) If I make that also accept the offloading section, the compilation process proceeds further, but still fails, because no resolution file is available: »[...]/ld: cannot find -fresolution=/tmp/cc7xeiW0.res: No such file or directory«. Is this enough information for someone who is more familiar with the design/code to already see what needs to be done? Aha, it's gcc/gcc.c:LINK_PLUGIN_SPEC that is unconditionally adding the -fresolution option. Here is a hack that seems to make it work, but that most certainly should be done differently: commit 9de71e209f5a75454ddb6922009425eb1f6bec1c Author: Thomas Schwinge tho...@codesourcery.com Date: Thu Sep 4 15:44:37 2014 +0200 Hack for offloading without -flto, with linker plugin. diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c index d40f8ae..9556cdc 100644 --- gcc/lto-wrapper.c +++ gcc/lto-wrapper.c @@ -910,7 +910,8 @@ run_gcc (unsigned argc, char *argv[]) there is no need to perform a link-time recompilation, i.e. lto-wrapper is used only for compiling offload images. */ for (i = 1; i argc; ++i) - printf (%s\n, argv[i]); + if (strncmp (argv[i], -fresolution=, sizeof (-fresolution=) - 1)) + printf (%s\n, argv[i]); goto finish; } else if (lto_mode == LTO_MODE_LTO) diff --git lto-plugin/lto-plugin.c lto-plugin/lto-plugin.c index 910e23c..a397276 100644 --- lto-plugin/lto-plugin.c +++ lto-plugin/lto-plugin.c @@ -84,8 +84,8 @@ along with this program; see the file COPYING3. If not see /* LTO magic section name. */ -#define LTO_SECTION_PREFIX .gnu.lto_.symtab -#define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) +#define LTO_SECTION_SYMTAB .gnu.lto_.symtab +#define OMP_SECTION_SYMTAB .gnu.target_lto_.symtab /* The part of the symbol table the plugin has to keep track of. Note that we must keep SYMS until all_symbols_read is called to give the linker time to @@ -820,7 +820,8 @@ process_symtab (void *data, const char *name, off_t offset, off_t length) char *s; char *secdatastart, *secdata; - if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0) + if (strncmp (name, LTO_SECTION_SYMTAB, strlen (LTO_SECTION_SYMTAB)) != 0 + strncmp (name, OMP_SECTION_SYMTAB, strlen (OMP_SECTION_SYMTAB)) != 0) return 1; s = strrchr (name, '.'); Grüße, Thomas pgpS9TCoy7rDh.pgp Description: PGP signature
Re: [gomp4] Add tables generation
Hi! On Mon, 18 Aug 2014 20:07:59 +0400, Ilya Verbin iver...@gmail.com wrote: I discovered an issue in the LTO streaming out for target - currently any file (even without any pragma) compiled with -fopenmp/-fopenacc contains .gnu.target_lto_* sections. This increases the size of an object file and makes lto-wrapper to run mkoffload. Therefore, I propose to replace the condition before ipa_write_summaries: - if (flag_openacc || flag_openmp) + if ((flag_openacc || flag_openmp) !(vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars))) But to do this, the offload_vars must be filled before the check (offload_funcs is already filled in expand_omp_target). Here is the updated patch. Bootstrap passed. OK for gomp-4_0-branch? On 13 Aug 20:19, Ilya Verbin wrote: Here is the updated patch. offload_funcs/vars are now declared in omp-low.h, the functions have a comment. Also it fixes the issue of offload_funcs/vars corruption by the garbage collector. OK for gomp-4_0-branch? --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -8906,6 +8909,9 @@ expand_omp_target (struct omp_region *region) DECL_STRUCT_FUNCTION (child_fn)-curr_properties = cfun-curr_properties; cgraph_add_new_function (child_fn, true); + /* Add the new function to the offload table. */ + vec_safe_push (offload_funcs, child_fn); + /* Fix the callgraph edges for child_cfun. Those for cfun will be fixed in a following pass. */ push_cfun (child_cfun); The same change needs to be done for OpenACC offloading; addressed in r214892: commit 9fb900482bd3bca9bfa89301e417174caabd7176 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Sep 3 19:10:43 2014 + Restore OpenACC offloading. gcc/ * omp-low.c (expand_oacc_offload): Add child_fn to offload_funcs. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@214892 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 4 gcc/omp-low.c | 3 +++ 2 files changed, 7 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 40688df..0c55814 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,7 @@ +2014-09-03 Thomas Schwinge tho...@codesourcery.com + + * omp-low.c (expand_oacc_offload): Add child_fn to offload_funcs. + 2014-08-19 Ilya Verbin ilya.ver...@intel.com * Makefile.in (GTFILES): Add omp-low.h. diff --git gcc/omp-low.c gcc/omp-low.c index 1ad98ab..6ed8239 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -5351,6 +5351,9 @@ expand_oacc_offload (struct omp_region *region) DECL_STRUCT_FUNCTION (child_fn)-curr_properties = cfun-curr_properties; cgraph_add_new_function (child_fn, true); + /* Add the new function to the offload table. */ + vec_safe_push (offload_funcs, child_fn); + /* Fix the callgraph edges for child_cfun. Those for cfun will be fixed in a following pass. */ push_cfun (child_cfun); Grüße, Thomas pgpr85p9Z8mFN.pgp Description: PGP signature
Re: [gomp4] Add tables generation
Hi Bernd, This patch allows to compile binaries with offloading without passing -flto option, and w/o performing link-time optimizations of the host code. How it works: 1. If there is at least one function or global variable to offload, gcc sets flag_generate_lto. This enables writing the bytecode produced by ipa_write_summaries into .gnu.target_lto_* sections (.gnu.lto_* sections are not created). Also this flag emits LTO marker (__gnu_lto_v1). 2. This step is not changed: collect2 scans object files for the LTO marker and fills the list of LTO objects. If the list is not empty, it runs lto-wrapper to perform link-time recompilation. 3. lto-wrapper compiles images for targets. And if -flto option is absent (lto_mode == LTO_MODE_NONE), then it just returns the list of input objects without recompilation. One known issue -- the final binary contains temporary .gnu.target_lto_* sections. This can be solved by adding the following linker script to the list of input files: SECTIONS { /DISCARD/ : { *(.gnu.target_lto_*) } } But I'm sure what is the best way to this automatically. Bootstrap and make check passed, tests with '#pragma omp target' without -flto passed. What do you think? Thanks, -- Ilya --- gcc/cgraphunit.c | 39 +++ gcc/lto-wrapper.c | 68 +-- gcc/omp-low.c | 6 + gcc/passes.c | 2 +- 4 files changed, 73 insertions(+), 42 deletions(-) diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index f0c9f5c..32b35f3 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2040,13 +2040,26 @@ output_in_order (void) free (nodes); } -/* Collect all global variables with omp declare target attribute into - OFFLOAD_VARS. It will be streamed out in ipa_write_summaries. */ +/* Check whether there is at least one function or global variable to offload. + Also collect all such global variables into OFFLOAD_VARS, the functions were + already collected in omp-low.c. They will be streamed out in + ipa_write_summaries. */ -static void -init_offload_var_table (void) +static bool +initialize_offload (void) { + bool have_offload = false; + struct cgraph_node *node; struct varpool_node *vnode; + + FOR_EACH_DEFINED_FUNCTION (node) +if (lookup_attribute (omp declare target, DECL_ATTRIBUTES (node-decl))) + { + have_offload = true; + break; + } + FOR_EACH_DEFINED_VARIABLE (vnode) { if (!lookup_attribute (omp declare target, @@ -2054,13 +2067,17 @@ init_offload_var_table (void) || TREE_CODE (vnode-decl) != VAR_DECL || DECL_SIZE (vnode-decl) == 0) continue; + have_offload = true; vec_safe_push (offload_vars, vnode-decl); } + + return have_offload; } static void ipa_passes (void) { + bool have_offload = false; gcc::pass_manager *passes = g-get_passes (); set_cfun (NULL); @@ -2068,6 +2085,14 @@ ipa_passes (void) gimple_register_cfg_hooks (); bitmap_obstack_initialize (NULL); + if (!in_lto_p (flag_openacc || flag_openmp)) +{ + have_offload = initialize_offload (); + /* OpenACC / OpenMP offloading requires LTO infrastructure. */ + if (have_offload) + flag_generate_lto = 1; +} + invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); if (!in_lto_p) @@ -2108,11 +2133,7 @@ ipa_passes (void) if (!in_lto_p) { - init_offload_var_table (); - - if ((flag_openacc || flag_openmp) - !(vec_safe_is_empty (offload_funcs) - vec_safe_is_empty (offload_vars))) + if (have_offload) { section_name_prefix = OMP_SECTION_NAME_PREFIX; ipa_write_summaries (true); diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 80d10f3..e9245f1 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -668,6 +668,11 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } + /* We may choose not to write out this .opts section in the future. In +that case we'll have to use something else to look for. */ + if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, + offset, length, errmsg, err)) + have_offload = true; if (!simple_object_find_section (sobj, LTO_SECTION_NAME_PREFIX . opts, offset, length, errmsg, err)) { @@ -675,11 +680,6 @@ run_gcc (unsigned argc, char *argv[]) close (fd); continue; } - /* We may choose not to write out this .opts section in the future. In -that case we'll have to use something else to look for. */ - if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX . opts, - offset, length, errmsg, err)) - have_offload = true; lseek (fd, file_offset + offset, SEEK_SET); data = (char *)xmalloc (length);
Re: [gomp4] Add tables generation
On 18 Aug 18:24, Bernd Schmidt wrote: I think I'd be happier if the function was called init_offload_table and also collected functions, rather than leaving that to expand_omp_target. I think the patch would be ok with that change. For the functions it's not so easy to identify which of them to add into the table, e.g.: #pragma omp target #pragma omp parallel x++; Here 2 functions with omp declare target attribute are created. But only the outer must be added to the table. So I believe that expand_omp_target is better place for the functions. -- Ilya
Re: [gomp4] Add tables generation
On 08/19/2014 12:41 PM, Ilya Verbin wrote: On 18 Aug 18:24, Bernd Schmidt wrote: I think I'd be happier if the function was called init_offload_table and also collected functions, rather than leaving that to expand_omp_target. I think the patch would be ok with that change. For the functions it's not so easy to identify which of them to add into the table, e.g.: #pragma omp target #pragma omp parallel x++; Here 2 functions with omp declare target attribute are created. But only the outer must be added to the table. So I believe that expand_omp_target is better place for the functions. Hmm, ok. Can you elaborate how this happens and why only one must be added to the table? In any case, I think let's get this checked in for now and iterate later. Bernd
Re: [gomp4] Add tables generation
On 19 Aug 13:55, Bernd Schmidt wrote: On 08/19/2014 12:41 PM, Ilya Verbin wrote: For the functions it's not so easy to identify which of them to add into the table, e.g.: #pragma omp target #pragma omp parallel x++; Here 2 functions with omp declare target attribute are created. But only the outer must be added to the table. So I believe that expand_omp_target is better place for the functions. Hmm, ok. Can you elaborate how this happens and why only one must be added to the table? Here is gimple for this testcase: foo () { /* prepare data */ __builtin_GOMP_target (-1, foo._omp_fn.0, /* data */); } foo._omp_fn.0 (struct .omp_data_t.0 * .omp_data_i) { /* prepare data */ __builtin_GOMP_parallel (foo._omp_fn.1, /* data */); } foo._omp_fn.1 (struct .omp_data_s.1 * .omp_data_i) { _3 = .omp_data_i_2(D)-x; _4 = _3 + 1; .omp_data_i_2(D)-x = _4; } Both fn.0 and fn.1 can be executed on host and on target, therefore they have omp declare target attribute. And there are 2 alternatives during runtime: 1. GOMP_target calls fn.0 on host, which calls fn.1 on host. 2. GOMP_target offloads fn.0 and fn.1, and runs fn.0 on target, which calls fn.1. So, there is only one entry point for GOMP_target - fn.0, and GOMP_target can't run fn.1 on target directly, that's why only fn.0 must be added to the table. -- Ilya
Re: [gomp4] Add tables generation
On 19 Aug 13:55, Bernd Schmidt wrote: In any case, I think let's get this checked in for now and iterate later. Committed revision 214148. -- Ilya
Re: [gomp4] Add tables generation
Hi Bernd, I discovered an issue in the LTO streaming out for target - currently any file (even without any pragma) compiled with -fopenmp/-fopenacc contains .gnu.target_lto_* sections. This increases the size of an object file and makes lto-wrapper to run mkoffload. Therefore, I propose to replace the condition before ipa_write_summaries: - if (flag_openacc || flag_openmp) + if ((flag_openacc || flag_openmp) !(vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars))) But to do this, the offload_vars must be filled before the check (offload_funcs is already filled in expand_omp_target). Here is the updated patch. Bootstrap passed. OK for gomp-4_0-branch? On 13 Aug 20:19, Ilya Verbin wrote: Here is the updated patch. offload_funcs/vars are now declared in omp-low.h, the functions have a comment. Also it fixes the issue of offload_funcs/vars corruption by the garbage collector. OK for gomp-4_0-branch? --- gcc/Makefile.in|1 + gcc/cgraphunit.c | 25 - gcc/gengtype.c |2 +- gcc/lto-cgraph.c | 93 gcc/lto-section-in.c |3 +- gcc/lto-streamer-out.c |2 + gcc/lto-streamer.h |3 ++ gcc/lto/lto.c |2 + gcc/omp-low.c | 74 -- gcc/omp-low.h |3 ++ 10 files changed, 147 insertions(+), 61 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index bfa5f32..372f586 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2290,6 +2290,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \ $(srcdir)/tree-parloops.c \ $(srcdir)/omp-low.c \ + $(srcdir)/omp-low.h \ $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \ $(srcdir)/cgraphclones.c \ $(srcdir)/tree-phinodes.c \ diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 308c534..f0c9f5c 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -211,6 +211,7 @@ along with GCC; see the file COPYING3. If not see #include tree-nested.h #include gimplify.h #include dbgcnt.h +#include omp-low.h #include lto-section-names.h /* Queue of cgraph nodes scheduled to be added into cgraph. This is a @@ -2039,6 +2040,24 @@ output_in_order (void) free (nodes); } +/* Collect all global variables with omp declare target attribute into + OFFLOAD_VARS. It will be streamed out in ipa_write_summaries. */ + +static void +init_offload_var_table (void) +{ + struct varpool_node *vnode; + FOR_EACH_DEFINED_VARIABLE (vnode) +{ + if (!lookup_attribute (omp declare target, +DECL_ATTRIBUTES (vnode-decl)) + || TREE_CODE (vnode-decl) != VAR_DECL + || DECL_SIZE (vnode-decl) == 0) + continue; + vec_safe_push (offload_vars, vnode-decl); +} +} + static void ipa_passes (void) { @@ -2089,7 +2108,11 @@ ipa_passes (void) if (!in_lto_p) { - if (flag_openacc || flag_openmp) + init_offload_var_table (); + + if ((flag_openacc || flag_openmp) + !(vec_safe_is_empty (offload_funcs) + vec_safe_is_empty (offload_vars))) { section_name_prefix = OMP_SECTION_NAME_PREFIX; ipa_write_summaries (true); diff --git a/gcc/gengtype.c b/gcc/gengtype.c index ffe3f94..5bcbbe2 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -1800,7 +1800,7 @@ open_base_files (void) tree-ssa.h, reload.h, cpp-id-data.h, tree-chrec.h, except.h, output.h, cfgloop.h, target.h, ipa-prop.h, lto-streamer.h, target-globals.h, - ipa-inline.h, dwarf2out.h, NULL + ipa-inline.h, dwarf2out.h, omp-low.h, NULL }; const char *const *ifp; outf_p gtype_desc_c; diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index bc05400..8fb7078 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include context.h #include pass_manager.h #include ipa-utils.h +#include omp-low.h /* True when asm nodes has been output. */ bool asm_nodes_output = false; @@ -1044,6 +1045,49 @@ read_string (struct lto_input_block *ib) return str; } +/* Output function/variable tables that will allow libgomp to look up offload + target code. OFFLOAD_FUNCS is filled in expand_omp_target, OFFLOAD_VARS is + filled in ipa_passes. In WHOPR (partitioned) mode during the WPA stage both + OFFLOAD_FUNCS and OFFLOAD_VARS are filled by input_offload_tables. */ + +void +output_offload_tables (void) +{ + if (vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars)) +return; + + struct lto_simple_output_block *ob += lto_create_simple_output_block (LTO_section_offload_table); + + for (unsigned i = 0; i vec_safe_length (offload_funcs); i++) +{ + streamer_write_enum (ob-main_stream, LTO_symtab_tags, + LTO_symtab_last_tag,
Re: [gomp4] Add tables generation
On 08/18/2014 06:07 PM, Ilya Verbin wrote: I discovered an issue in the LTO streaming out for target - currently any file (even without any pragma) compiled with -fopenmp/-fopenacc contains .gnu.target_lto_* sections. This increases the size of an object file and makes lto-wrapper to run mkoffload. Therefore, I propose to replace the condition before ipa_write_summaries: - if (flag_openacc || flag_openmp) + if ((flag_openacc || flag_openmp) !(vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars))) But to do this, the offload_vars must be filled before the check (offload_funcs is already filled in expand_omp_target). Here is the updated patch. Bootstrap passed. OK for gomp-4_0-branch? I think I'd be happier if the function was called init_offload_table and also collected functions, rather than leaving that to expand_omp_target. I think the patch would be ok with that change. Bernd
Re: [gomp4] Add tables generation
Hi, Here is the updated patch. offload_funcs/vars are now declared in omp-low.h, the functions have a comment. Also it fixes the issue of offload_funcs/vars corruption by the garbage collector. OK for gomp-4_0-branch? -- Ilya --- gcc/Makefile.in|1 + gcc/gengtype.c |2 +- gcc/lto-cgraph.c | 110 gcc/lto-section-in.c |3 +- gcc/lto-streamer-out.c |2 + gcc/lto-streamer.h |3 + gcc/lto/lto.c |2 + gcc/omp-low.c | 68 ++ gcc/omp-low.h |3 + 9 files changed, 137 insertions(+), 57 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index bfa5f32..372f586 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2290,6 +2290,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \ $(srcdir)/tree-parloops.c \ $(srcdir)/omp-low.c \ + $(srcdir)/omp-low.h \ $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \ $(srcdir)/cgraphclones.c \ $(srcdir)/tree-phinodes.c \ diff --git a/gcc/gengtype.c b/gcc/gengtype.c index ffe3f94..5bcbbe2 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -1800,7 +1800,7 @@ open_base_files (void) tree-ssa.h, reload.h, cpp-id-data.h, tree-chrec.h, except.h, output.h, cfgloop.h, target.h, ipa-prop.h, lto-streamer.h, target-globals.h, - ipa-inline.h, dwarf2out.h, NULL + ipa-inline.h, dwarf2out.h, omp-low.h, NULL }; const char *const *ifp; outf_p gtype_desc_c; diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index bc05400..64ad599 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include context.h #include pass_manager.h #include ipa-utils.h +#include omp-low.h /* True when asm nodes has been output. */ bool asm_nodes_output = false; @@ -1044,6 +1045,66 @@ read_string (struct lto_input_block *ib) return str; } +/* Output function/variable tables that will allow libgomp to look up offload + target code. OFFLOAD_FUNCS is filled in expand_omp_target, OFFLOAD_VARS is + filled here just before streaming. In WHOPR (partitioned) mode during the + WPA stage both OFFLOAD_FUNCS and OFFLOAD_VARS are filled by + input_offload_tables. */ + +void +output_offload_tables (void) +{ + /* Collect all omp-target global variables to offload_vars, if they have not + been gathered earlier by input_offload_tables on the WPA stage. */ + if (!flag_wpa vec_safe_is_empty (offload_vars)) +{ + struct varpool_node *vnode; + FOR_EACH_DEFINED_VARIABLE (vnode) + { + if (!lookup_attribute (omp declare target, +DECL_ATTRIBUTES (vnode-decl)) + || TREE_CODE (vnode-decl) != VAR_DECL + || DECL_SIZE (vnode-decl) == 0) + continue; + vec_safe_push (offload_vars, vnode-decl); + } +} + + if (vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars)) +return; + + struct lto_simple_output_block *ob += lto_create_simple_output_block (LTO_section_offload_table); + + for (unsigned i = 0; i vec_safe_length (offload_funcs); i++) +{ + streamer_write_enum (ob-main_stream, LTO_symtab_tags, + LTO_symtab_last_tag, LTO_symtab_unavail_node); + lto_output_fn_decl_index (ob-decl_state, ob-main_stream, + (*offload_funcs)[i]); +} + + for (unsigned i = 0; i vec_safe_length (offload_vars); i++) +{ + streamer_write_enum (ob-main_stream, LTO_symtab_tags, + LTO_symtab_last_tag, LTO_symtab_variable); + lto_output_var_decl_index (ob-decl_state, ob-main_stream, +(*offload_vars)[i]); +} + + streamer_write_uhwi_stream (ob-main_stream, 0); + lto_destroy_simple_output_block (ob); + + /* In WHOPR mode during the WPA stage the joint offload tables need to be + streamed to one partition only. That's why we free offload_funcs and + offload_vars after the first call of output_offload_tables. */ + if (flag_wpa) +{ + vec_free (offload_funcs); + vec_free (offload_vars); +} +} + /* Overwrite the information in NODE based on FILE_DATA, TAG, FLAGS, STACK_SIZE, SELF_TIME and SELF_SIZE. This is called either to initialize NODE or to replace the values in it, for instance because the first @@ -1739,6 +1800,55 @@ input_symtab (void) } } +/* Input function/variable tables that will allow libgomp to look up offload + target code, and store them into OFFLOAD_FUNCS and OFFLOAD_VARS. */ + +void +input_offload_tables (void) +{ + struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data (); + struct lto_file_decl_data *file_data; + unsigned int j = 0; + + while ((file_data = file_data_vec[j++])) +{ +
Re: [gomp4] Add tables generation
On 04/17/2014 08:33 PM, Ilya Verbin wrote: Could you please take a look at this patch? It fixes the ordering issue in the tables stated above, and passes all the tests that I have. But I'm not sure about its correctness from the architectural point of view. I'm still skeptical relying on ordering is going to work in the long run, but in the meantime this looks better than what we have at the moment. So I think this should probably go in for now, but first it needs a few small changes: --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -82,6 +82,8 @@ enum LTO_symtab_tags LTO_symtab_last_tag }; +extern vectree, va_gc *offload_funcs, *offload_vars; Declarations go into header files. +void +output_offload_tables (void) All functions should have a comment. +{ + /* Collect all omp-target global variables to offload_vars, if they have not + been gathered earlier by input_offload_tables. */ + if (vec_safe_is_empty (offload_vars)) What if a variable was entered into the table by something other than input_offload_tables? We'll skip this code entirely, which doesn't seem right. Can we even get here after input_offload_tables has been called, and if so, maybe this step of collecting variables belongs elsewhere? Also, the previous code did the same for functions, and I can't find anything corresponding to that after the patch. Is this intentional? Bernd
Re: [gomp4] Add tables generation
On 10 Jun 15:52, Bernd Schmidt wrote: On 04/17/2014 08:33 PM, Ilya Verbin wrote: +{ + /* Collect all omp-target global variables to offload_vars, if they have not + been gathered earlier by input_offload_tables. */ + if (vec_safe_is_empty (offload_vars)) What if a variable was entered into the table by something other than input_offload_tables? We'll skip this code entirely, which doesn't seem right. Can we even get here after input_offload_tables has been called, and if so, maybe this step of collecting variables belongs elsewhere? Also, the previous code did the same for functions, and I can't find anything corresponding to that after the patch. Is this intentional? I'll try to explain with an example bellow: Suppose there are 2 source files: test1.c and test2.c. 1. During the compilation of test1.c: 1.1. In expand_omp_target gcc adds new target functions into offload_funcs; 1.2. In output_offload_tables gcc adds all target variables into offload_vars; 1.3. In output_offload_tables gcc streams offload_funcs/vars into TARGET LTO_section_offload_table. And if there is -flto, it also streams them into the HOST LTO_section_offload_table; 1.4. In omp_finish_file gcc writes addresses from offload_funcs/vars into test1.o. 2. The same steps happen for test2.c. 3a. If there is no -flto, ld will join raw tables from test1.o and test2.o. And accel compiler will join tables from target LTO_section_offload_table. For now this mode isn't implemented, to run accel compiler we need -flto. 3b. If there is -flto (let's consider WHOPR mode, since LTO mode is simpler), there are 2 stages: 3.1. WPA: 3.1.1. In input_offload_tables gcc reads host LTO_section_offload_table from test1.o and test2.o; 3.1.2. In output_offload_tables gcc streams the joined tables into LTO_section_offload_table in the new partition xxx.ltrans0.o; 3.2. LTRANS: 3.2.1. In input_offload_tables gcc reads host LTO_section_offload_table from xxx.ltrans0.o; 3.2.2. In omp_finish_file gcc writes addresses from offload_funcs/vars into the final xxx.ltrans0.ltrans.o. So, the question is what is the right place for collecting decls into offload_funcs/vars? I collect offload_funcs in expand_omp_target where they're created. But for offload_vars I couldn't find a place better than output_offload_tables. That's why I added if (vec_safe_is_empty (offload_vars)). If the var decls have been read by input_offload_tables on the step 3.1.1, there is no need to collect them from FOR_EACH_DEFINED_VARIABLE on the step 3.1.2, because that order might be incorrect. Thanks, -- Ilya
Re: [gomp4] Add tables generation
On 05/08/2014 12:11 PM, Bernd Schmidt wrote: On 05/06/2014 05:32 PM, Ilya Verbin wrote: On 05 Apr 17:22, Bernd Schmidt wrote: Things seemed to work over here, but now I'm not certain whether the __start_/__stop_ functionality is GNU ld specific? Maybe we should just go back to the previous version of this patch which didn't try to use this. Bernd This approach does not work with shared libraries. The automatically inserted symbols have GLOBAL binding, therefore the __start_/__stop_ from the executable overwrite the respective symbols in DSO. Ok, I guess we should just go back to what we had previously. Here's what I intend to commit if there are no objections. ... plus the following to make it work with the changed section names. Bernd Index: libgcc/ompstuff.c === --- libgcc/ompstuff.c (revision 432834) +++ libgcc/ompstuff.c (working copy) @@ -44,10 +44,10 @@ see the files COPYING3 and COPYING.RUNTI #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) void *_omp_func_table[0] __attribute__ ((__used__, visibility (hidden), - section (.offload_func_table_section))) = { }; + section (__gnu_offload_funcs))) = { }; void *_omp_var_table[0] __attribute__ ((__used__, visibility (hidden), - section (.offload_var_table_section))) = { }; + section (__gnu_offload_vars))) = { }; #endif #elif defined CRT_END @@ -55,10 +55,10 @@ void *_omp_var_table[0] #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) void *_omp_funcs_end[0] __attribute__ ((__used__, visibility (hidden), - section (.offload_func_table_section))) = { }; + section (__gnu_offload_funcs))) = { }; void *_omp_vars_end[0] __attribute__ ((__used__, visibility (hidden), - section (.offload_var_table_section))) = { }; + section (__gnu_offload_vars))) = { }; extern void *_omp_func_table[]; extern void *_omp_var_table[]; void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (hidden))) =
Re: [gomp4] Add tables generation
On 05/06/2014 05:32 PM, Ilya Verbin wrote: On 05 Apr 17:22, Bernd Schmidt wrote: Things seemed to work over here, but now I'm not certain whether the __start_/__stop_ functionality is GNU ld specific? Maybe we should just go back to the previous version of this patch which didn't try to use this. Bernd This approach does not work with shared libraries. The automatically inserted symbols have GLOBAL binding, therefore the __start_/__stop_ from the executable overwrite the respective symbols in DSO. Ok, I guess we should just go back to what we had previously. Here's what I intend to commit if there are no objections. Bernd Index: gcc/lto-wrapper.c === --- gcc/lto-wrapper.c (revision 210170) +++ gcc/lto-wrapper.c (working copy) @@ -66,7 +66,7 @@ static unsigned int nr; static char **input_names; static char **output_names; static char **offload_names; -static const char *ompend; +static const char *ompbegin, *ompend; static char *makefile; const char tool_name[] = lto-wrapper; @@ -554,30 +554,40 @@ copy_file (const char *dest, const char } } -/* Find the crtompend.o file in LIBRARY_PATH, make a copy and store - the name of the copy in ompend. */ +/* Find the omp_begin.o and omp_end.o files in LIBRARY_PATH, make copies + and store the names of the copies in ompbegin and ompend. */ static void -find_ompend (void) +find_ompbeginend (void) { char **paths; const char *library_path = getenv (LIBRARY_PATH); if (library_path == NULL) return; - int n_paths = parse_env_var (library_path, paths, /crtompend.o); + int n_paths = parse_env_var (library_path, paths, /crtompbegin.o); - for (int i = 0; i n_paths; i++) + int i; + for (i = 0; i n_paths; i++) if (access_check (paths[i], R_OK) == 0) { + size_t len = strlen (paths[i]); + char *tmp = xstrdup (paths[i]); + strcpy (paths[i] + len - 7, end.o); + if (access_check (paths[i], R_OK) != 0) + fatal (installation error, can't find crtompend.o); /* The linker will delete the filenames we give it, so make copies. */ - const char *omptmp = make_temp_file (.o); - copy_file (omptmp, paths[i]); - ompend = omptmp; + const char *omptmp1 = make_temp_file (.o); + const char *omptmp2 = make_temp_file (.o); + copy_file (omptmp1, tmp); + ompbegin = omptmp1; + copy_file (omptmp2, paths[i]); + ompend = omptmp2; + free (tmp); break; } - if (ompend == 0) -fatal (installation error, can't find crtompend.o); + if (i == n_paths) +fatal (installation error, can't find crtompbegin.o); free_array_of_ptrs ((void**) paths, n_paths); } @@ -1073,7 +1083,7 @@ cont: compile_images_for_openmp_targets (argc, argv); if (offload_names) { - find_ompend (); + find_ompbeginend (); for (i = 0; offload_names[i]; i++) { fputs (offload_names[i], stdout); @@ -1082,6 +1092,11 @@ cont: free_array_of_ptrs ((void **)offload_names, i); } } + if (ompbegin) + { + fputs (ompbegin, stdout); + putc ('\n', stdout); + } for (i = 0; i nr; ++i) { Index: libgcc/Makefile.in === --- libgcc/Makefile.in (revision 210170) +++ libgcc/Makefile.in (working copy) @@ -975,6 +975,9 @@ crtbegin$(objext): $(srcdir)/crtstuff.c crtend$(objext): $(srcdir)/crtstuff.c $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $ -DCRT_END +crtompbegin$(objext): $(srcdir)/ompstuff.c + $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $ -DCRT_BEGIN + crtompend$(objext): $(srcdir)/ompstuff.c $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $ -DCRT_END Index: libgcc/configure === --- libgcc/configure (revision 210170) +++ libgcc/configure (working copy) @@ -4397,7 +4397,7 @@ fi if test x$offload_targets != x; then - extra_parts=${extra_parts} crtompend.o + extra_parts=${extra_parts} crtompbegin.o crtompend.o fi # Check if Solaris/x86 linker supports ZERO terminator unwind entries. Index: libgcc/configure.ac === --- libgcc/configure.ac (revision 210170) +++ libgcc/configure.ac (working copy) @@ -336,7 +336,7 @@ AC_ARG_ENABLE(offload-targets, ], [enable_accelerator=no]) AC_SUBST(enable_accelerator) if test x$offload_targets != x; then - extra_parts=${extra_parts} crtompend.o + extra_parts=${extra_parts} crtompbegin.o crtompend.o fi # Check if Solaris/x86 linker supports ZERO terminator unwind entries. Index: libgcc/ompstuff.c === --- libgcc/ompstuff.c (revision 210170) +++ libgcc/ompstuff.c (working copy) @@ -39,14 +39,35 @@ see the files COPYING3 and COPYING.RUNTI #include tm.h #include libgcc_tm.h +#ifdef CRT_BEGIN + #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) -extern void __start___gnu_offload_funcs; -extern void __stop___gnu_offload_funcs;
Re: [gomp4] Add tables generation
On 05 Apr 17:22, Bernd Schmidt wrote: Things seemed to work over here, but now I'm not certain whether the __start_/__stop_ functionality is GNU ld specific? Maybe we should just go back to the previous version of this patch which didn't try to use this. Bernd This approach does not work with shared libraries. The automatically inserted symbols have GLOBAL binding, therefore the __start_/__stop_ from the executable overwrite the respective symbols in DSO. Here is a simple example with 2 DSOs and one executable. The function GOMP_offload_register is called with the following pointers in HOST_TABLE: 1. (funcs 0x604880:0x604898, vars 0x604840:0x604880) 2. (funcs 0x604880:0x604898, vars 0x604840:0x604880) 3. (funcs 0x604880:0x604898, vars 0x604840:0x604880) But with manually added start/stop and LOCAL binding everything works fine: 1. (funcs 0x7f286b425530:0x7f286b425540, vars 0x7f286b425540:0x7f286b425540) 2. (funcs 0x7f286b8624a0:0x7f286b8624b0, vars 0x7f286b8624b0:0x7f286b8624b0) 3. (funcs 0x604760:0x604778, vars 0x604780:0x6047c0) -- Ilya
Re: [gomp4] Add tables generation
On 17 Apr 22:33, Ilya Verbin wrote: Hi Jakub, Could you please take a look at this patch? It fixes the ordering issue in the tables stated above, and passes all the tests that I have. But I'm not sure about its correctness from the architectural point of view. Thanks, -- Ilya Ping.
Re: [gomp4] Add tables generation
On 27 Mar 17:16, Jakub Jelinek wrote: On Thu, Mar 27, 2014 at 08:13:00PM +0400, Ilya Verbin wrote: On 27 Mar 15:02, Jakub Jelinek wrote: The tables need to be created before IPA, that way it really shouldn't matter in what order you emit them. E.g. the outlined target functions could be added to the table during ompexp pass which actually creates the outlined functions, the vars need to be added before target lto or host lto is streamed. For host tables it's ok, but when target compiler will create tables with functions? It reads bytecode from target_lto sections, so it never executes ompexp pass. Which is why the table created for host by the ompexp pass should be streamed into the target_lto sections (marked specially somehow, special attribute or whatever), and then corresponding target table created from that, rather then created from some possibly different ordering there. Jakub Hi Jakub, Could you please take a look at this patch? It fixes the ordering issue in the tables stated above, and passes all the tests that I have. But I'm not sure about its correctness from the architectural point of view. --- gcc/lto-cgraph.c | 93 ++ gcc/lto-section-in.c | 3 +- gcc/lto-streamer-out.c | 2 ++ gcc/lto-streamer.h | 3 ++ gcc/lto/lto.c | 2 ++ gcc/omp-low.c | 68 +++- 6 files changed, 115 insertions(+), 56 deletions(-) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 544f04b..3d6637e 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -82,6 +82,8 @@ enum LTO_symtab_tags LTO_symtab_last_tag }; +extern vectree, va_gc *offload_funcs, *offload_vars; + /* Create a new symtab encoder. if FOR_INPUT, the encoder allocate only datastructures needed to read the symtab. */ @@ -958,6 +960,51 @@ output_symtab (void) output_refs (encoder); } +void +output_offload_tables (void) +{ + /* Collect all omp-target global variables to offload_vars, if they have not + been gathered earlier by input_offload_tables. */ + if (vec_safe_is_empty (offload_vars)) +{ + struct varpool_node *vnode; + FOR_EACH_DEFINED_VARIABLE (vnode) + { + if (!lookup_attribute (omp declare target, +DECL_ATTRIBUTES (vnode-decl)) + || TREE_CODE (vnode-decl) != VAR_DECL + || DECL_SIZE (vnode-decl) == 0) + continue; + vec_safe_push (offload_vars, vnode-decl); + } +} + + if (vec_safe_is_empty (offload_funcs) vec_safe_is_empty (offload_vars)) +return; + + struct lto_simple_output_block *ob += lto_create_simple_output_block (LTO_section_offload_table); + + for (unsigned i = 0; i vec_safe_length (offload_funcs); i++) +{ + streamer_write_enum (ob-main_stream, LTO_symtab_tags, + LTO_symtab_last_tag, LTO_symtab_unavail_node); + lto_output_fn_decl_index (ob-decl_state, ob-main_stream, + (*offload_funcs)[i]); +} + + for (unsigned i = 0; i vec_safe_length (offload_vars); i++) +{ + streamer_write_enum (ob-main_stream, LTO_symtab_tags, + LTO_symtab_last_tag, LTO_symtab_variable); + lto_output_var_decl_index (ob-decl_state, ob-main_stream, +(*offload_vars)[i]); +} + + streamer_write_uhwi_stream (ob-main_stream, 0); + lto_destroy_simple_output_block (ob); +} + /* Overwrite the information in NODE based on FILE_DATA, TAG, FLAGS, STACK_SIZE, SELF_TIME and SELF_SIZE. This is called either to initialize NODE or to replace the values in it, for instance because the first @@ -1611,6 +1658,52 @@ input_symtab (void) } } +void +input_offload_tables (void) +{ + struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data (); + struct lto_file_decl_data *file_data; + unsigned int j = 0; + + while ((file_data = file_data_vec[j++])) +{ + const char *data; + size_t len; + struct lto_input_block *ib + = lto_create_simple_input_block (file_data, LTO_section_offload_table, +data, len); + if (!ib) + continue; + + enum LTO_symtab_tags tag + = streamer_read_enum (ib, LTO_symtab_tags, LTO_symtab_last_tag); + while (tag) + { + if (tag == LTO_symtab_unavail_node) + { + int decl_index = streamer_read_uhwi (ib); + tree fn_decl + = lto_file_decl_data_get_fn_decl (file_data, decl_index); + vec_safe_push (offload_funcs, fn_decl); + } + else if (tag == LTO_symtab_variable) + { + int decl_index = streamer_read_uhwi (ib); + tree var_decl + = lto_file_decl_data_get_var_decl (file_data, decl_index); + vec_safe_push (offload_vars, var_decl); + } +
Re: [gomp4] Add tables generation
On Sat, Apr 05, 2014 at 05:22:09PM +0200, Bernd Schmidt wrote: On 04/05/2014 05:04 PM, Thomas Schwinge wrote: Is it a linker bug that I need to add something like the following? --- libgcc/ompstuff.c +++ libgcc/ompstuff.c @@ -40,6 +40,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #include libgcc_tm.h #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +# if 1 +/* TODO: Without the following, will get missing symbols for __start_* and + __stop_*. Linker bug? */ +static void *_funcs[0] __attribute__ ((section (__gnu_offload_funcs))) = { }; +static void *_vars[0] __attribute__ ((section (__gnu_offload_vars))) = { }; +# endif Things seemed to work over here, but now I'm not certain whether the __start_/__stop_ functionality is GNU ld specific? Maybe we should just go back to the previous version of this patch which didn't try to use this. Somebody needs to try it with gold, I think it should support the same. As for other linkers, don't we need linker plugin support anyway, which is not available for other linkers? Jakub
Re: [gomp4] Add tables generation
Hi! On Fri, 4 Apr 2014 11:30:49 +0200, Bernd Schmidt ber...@codesourcery.com wrote: On 03/21/2014 04:20 PM, Jakub Jelinek wrote: On Fri, Mar 21, 2014 at 04:13:45PM +0100, Bernd Schmidt wrote: On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. I'd call the files crtompbegin.o/crtompend.o instead. I'd go with crtoffload* (or similar). ;-) Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. I've checked in the following which should address all this. Is it a linker bug that I need to add something like the following? --- libgcc/ompstuff.c +++ libgcc/ompstuff.c @@ -40,6 +40,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #include libgcc_tm.h #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +# if 1 +/* TODO: Without the following, will get missing symbols for __start_* and + __stop_*. Linker bug? */ +static void *_funcs[0] __attribute__ ((section (__gnu_offload_funcs))) = { }; +static void *_vars[0] __attribute__ ((section (__gnu_offload_vars))) = { }; +# endif extern void __start___gnu_offload_funcs; extern void __stop___gnu_offload_funcs; extern void __start___gnu_offload_vars; $ ld --version GNU ld (Sourcery CodeBench 2013.11-17) 2.23.52.20130912 [...] --- libgcc/ompstuff.c (revision 0) +++ libgcc/ompstuff.c (working copy) +extern void __start___gnu_offload_funcs; +extern void __stop___gnu_offload_funcs; +extern void __start___gnu_offload_vars; +extern void __stop___gnu_offload_vars; +void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (hidden))) = +{ + __start___gnu_offload_funcs, __stop___gnu_offload_funcs, + __start___gnu_offload_vars, __stop___gnu_offload_vars +}; ../../../source/libgcc/ompstuff.c:49:3: warning: taking address of expression of type 'void' __start___gnu_offload_funcs, __stop___gnu_offload_funcs, ^ ../../../source/libgcc/ompstuff.c:49:33: warning: taking address of expression of type 'void' __start___gnu_offload_funcs, __stop___gnu_offload_funcs, ^ ../../../source/libgcc/ompstuff.c:50:3: warning: taking address of expression of type 'void' __start___gnu_offload_vars, __stop___gnu_offload_vars ^ ../../../source/libgcc/ompstuff.c:50:32: warning: taking address of expression of type 'void' __start___gnu_offload_vars, __stop___gnu_offload_vars ^ s%void%char makes this go away. Grüße, Thomas pgpPQxXD1zF2u.pgp Description: PGP signature
Re: [gomp4] Add tables generation
On 04/05/2014 05:04 PM, Thomas Schwinge wrote: Is it a linker bug that I need to add something like the following? --- libgcc/ompstuff.c +++ libgcc/ompstuff.c @@ -40,6 +40,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #include libgcc_tm.h #if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +# if 1 +/* TODO: Without the following, will get missing symbols for __start_* and + __stop_*. Linker bug? */ +static void *_funcs[0] __attribute__ ((section (__gnu_offload_funcs))) = { }; +static void *_vars[0] __attribute__ ((section (__gnu_offload_vars))) = { }; +# endif Things seemed to work over here, but now I'm not certain whether the __start_/__stop_ functionality is GNU ld specific? Maybe we should just go back to the previous version of this patch which didn't try to use this. Bernd
Re: [gomp4] Add tables generation
On 04/04/2014 07:55 AM, Thomas Schwinge wrote: Hi! On Thu, 3 Apr 2014 18:13:08 +0200, Bernd Schmidt ber...@codesourcery.com wrote: The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Yes, it does, thanks! Please revert my patch when committing yours. Oh, and please use ChangeLog.gomp files on gomp-4_0-branch; also please move the entries for your recent commits from the ChangeLog file(s) to the respective ChangeLog.gomp one(s). All done. Bernd
Re: [gomp4] Add tables generation
On 03/21/2014 04:20 PM, Jakub Jelinek wrote: On Fri, Mar 21, 2014 at 04:13:45PM +0100, Bernd Schmidt wrote: On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. I'd call the files crtompbegin.o/crtompend.o instead. And, what is the exact reason why you are using protected visibility rather than hidden? Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. I've checked in the following which should address all this. Bernd Index: gcc/ChangeLog.gomp === --- gcc/ChangeLog.gomp (revision 209074) +++ gcc/ChangeLog.gomp (working copy) @@ -1,5 +1,15 @@ 2014-04-04 Bernd Schmidt ber...@codesourcery.com + * lto-section-names.h (OFFLOAD_VAR_TABLE_SECTION_NAME, + OFFLOAD_FUNC_TABLE_SECTION_NAME): Define. + * lto-wrapper.c (OFFLOAD_FUNC_TABLE_SECTION_NAME): Don't define. + (ompend): New static variable. + (copy_file, find_ompend): New static functions. + (run_gcc): Call find_ompend if we have offload images. Add its + return value to the output. + * omp-low.c: Include lto-section-names.h. + (omp_finish_file): Initialize section names from macros defined there. + * omp-low.c (offload_symbol_decl): New static variable. (get_offload_symbol_decl): New static function. (expand_oacc_offload, expand_omp_target): Use it. Index: gcc/lto-section-names.h === --- gcc/lto-section-names.h (revision 209072) +++ gcc/lto-section-names.h (working copy) @@ -31,3 +31,6 @@ along with GCC; see the file COPYING3. /* Can be either OMP_SECTION_NAME_PREFIX when we stream pragma omp target stuff, or LTO_SECTION_NAME_PREFIX for lto case. */ extern const char *section_name_prefix; + +#define OFFLOAD_VAR_TABLE_SECTION_NAME __gnu_offload_vars +#define OFFLOAD_FUNC_TABLE_SECTION_NAME __gnu_offload_funcs Index: gcc/lto-wrapper.c === --- gcc/lto-wrapper.c (revision 209072) +++ gcc/lto-wrapper.c (working copy) @@ -49,7 +49,6 @@ along with GCC; see the file COPYING3. #include lto-section-names.h #include collect-utils.h -#define OFFLOAD_FUNC_TABLE_SECTION_NAME .offload_func_table_section #define OFFLOAD_TARGET_NAMES_ENV OFFLOAD_TARGET_NAMES enum lto_mode_d { @@ -67,6 +66,7 @@ static unsigned int nr; static char **input_names; static char **output_names; static char **offload_names; +static const char *ompend; static char *makefile; const char tool_name[] = lto-wrapper; @@ -479,6 +479,54 @@ compile_images_for_openmp_targets (unsig free_array_of_ptrs ((void**) names, num_targets); } +/* Copy a file from SRC to DEST. */ +static void +copy_file (const char *dest, const char *src) +{ + FILE *d = fopen (dest, wb); + FILE *s = fopen (src, rb); + char buffer[512]; + while (!feof (s)) +{ + size_t len = fread (buffer, 1, 512, s); + if (ferror (s) != 0) + fatal (reading input file); + if (len 0) + { + fwrite (buffer, 1, len, d); + if (ferror (d) != 0) + fatal (writing output file); + } +} +} + +/* Find the crtompend.o file in LIBRARY_PATH, make a copy and store + the name of the copy in ompend. */ + +static void +find_ompend (void) +{ + char **paths; + const char *library_path = getenv (LIBRARY_PATH); + if (library_path == NULL) +return; + int n_paths = parse_env_var (library_path, paths, /crtompend.o); + + for (int i = 0; i n_paths; i++) +if (access_check (paths[i], R_OK) == 0) + { + /* The linker will delete the filenames we give it, so make + copies. */ + const char *omptmp = make_temp_file (.o); + copy_file (omptmp, paths[i]); + ompend = omptmp; + break; + } + if (ompend == 0) +fatal (installation error, can't find crtompend.o); + + free_array_of_ptrs ((void**) paths, n_paths); +} /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */ @@ -964,6 +1012,7 @@ cont: compile_images_for_openmp_targets (argc, argv); if (offload_names) { + find_ompend (); for (i = 0; offload_names[i]; i++) { fputs (offload_names[i], stdout); @@ -972,12 +1021,18 @@ cont: free_array_of_ptrs ((void **)offload_names, i); } } + for (i = 0; i nr; ++i) { fputs (output_names[i], stdout); putc ('\n',
Re: [gomp4] Add tables generation
On 04/02/2014 10:36 AM, Thomas Schwinge wrote: I see regressions in the libgomp testsuite for configurations where offloading is not enabled: spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe /tmp/ccGnT0ei.o: In function `main': for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__' collect2: error: ld returned 1 exit status I suppose that's because [...] Workaround committed in r209015: libgcc/ * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to NULL. The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Bernd Index: gcc/omp-low.c === --- gcc/omp-low.c (revision 429741) +++ gcc/omp-low.c (working copy) @@ -221,6 +221,28 @@ static tree scan_omp_1_op (tree *, int * *handled_ops_p = false; \ break; +static GTY(()) tree offload_symbol_decl; + +/* Get the __OPENMP_TARGET__ symbol. */ +static tree +get_offload_symbol_decl (void) +{ + if (!offload_symbol_decl) +{ + tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier (__OPENMP_TARGET__), + ptr_type_node); + TREE_PUBLIC (decl) = 1; + DECL_EXTERNAL (decl) = 1; + DECL_WEAK (decl) = 1; + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier (weak), + NULL_TREE, DECL_ATTRIBUTES (decl)); + offload_symbol_decl = decl; +} + return offload_symbol_decl; +} + /* Convenience function for calling scan_omp_1_op on tree operands. */ static inline tree @@ -5148,11 +5170,7 @@ expand_oacc_offload (struct omp_region * } gimple g; - tree openmp_target -= build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier (__OPENMP_TARGET__), ptr_type_node); - TREE_PUBLIC (openmp_target) = 1; - DECL_EXTERNAL (openmp_target) = 1; + tree openmp_target = get_offload_symbol_decl (); tree fnaddr = build_fold_addr_expr (child_fn); g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device, fnaddr, build_fold_addr_expr (openmp_target), @@ -8686,11 +8704,7 @@ expand_omp_target (struct omp_region *re } gimple g; - tree openmp_target -= build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier (__OPENMP_TARGET__), ptr_type_node); - TREE_PUBLIC (openmp_target) = 1; - DECL_EXTERNAL (openmp_target) = 1; + tree openmp_target = get_offload_symbol_decl (); if (kind == GF_OMP_TARGET_KIND_REGION) { tree fnaddr = build_fold_addr_expr (child_fn);
Re: [gomp4] Add tables generation
2014-04-03 20:13 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Shouldn't we just remove __OPENMP_TARGET__ argument from GOMP_target, since we decided to pass it to GOMP_offload_register? -- Ilya
Re: [gomp4] Add tables generation
On 04/03/2014 06:53 PM, Ilya Verbin wrote: 2014-04-03 20:13 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Shouldn't we just remove __OPENMP_TARGET__ argument from GOMP_target, since we decided to pass it to GOMP_offload_register? I thought it was used to look up the right function? With shared libraries you'd get multiple __OPENMP_TARGET__ tables. Bernd
Re: [gomp4] Add tables generation
2014-04-03 21:06 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: On 04/03/2014 06:53 PM, Ilya Verbin wrote: 2014-04-03 20:13 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Shouldn't we just remove __OPENMP_TARGET__ argument from GOMP_target, since we decided to pass it to GOMP_offload_register? I thought it was used to look up the right function? With shared libraries you'd get multiple __OPENMP_TARGET__ tables. Bernd Yes, initially the idea was to use it for look up the right function. But now each DSO will call GOMP_offload_register, and pass unique pointer to __OPENMP_TARGET__ (host_table) for this DSO. Then gomp_register_images_for_device registers all this host tables in the plugin. And when libgomp calls device_get_table_func, the plugin returns the joint table for all DSO's. -- Ilya
Re: [gomp4] Add tables generation
On 04/03/2014 07:25 PM, Ilya Verbin wrote: Yes, initially the idea was to use it for look up the right function. But now each DSO will call GOMP_offload_register, and pass unique pointer to __OPENMP_TARGET__ (host_table) for this DSO. Then gomp_register_images_for_device registers all this host tables in the plugin. And when libgomp calls device_get_table_func, the plugin returns the joint table for all DSO's. Why make a joint table? It seems better to use the __OPENMP_TARGET__ symbol to restrict lookups to the subset of symbols that could actually be found. BTW, I still expect that the lookup by ordering will turn out to be fundamentally unreliable and we'll need to use the unique id patch I posted a while ago. In that case using __OPENMP_TARGET__ as a first order key for the lookups eliminates any problem with duplicate names across multiple libraries. Bernd
Re: [gomp4] Add tables generation
2014-04-03 21:28 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: On 04/03/2014 07:25 PM, Ilya Verbin wrote: Yes, initially the idea was to use it for look up the right function. But now each DSO will call GOMP_offload_register, and pass unique pointer to __OPENMP_TARGET__ (host_table) for this DSO. Then gomp_register_images_for_device registers all this host tables in the plugin. And when libgomp calls device_get_table_func, the plugin returns the joint table for all DSO's. Why make a joint table? It seems better to use the __OPENMP_TARGET__ symbol to restrict lookups to the subset of symbols that could actually be found. BTW, I still expect that the lookup by ordering will turn out to be fundamentally unreliable and we'll need to use the unique id patch I posted a while ago. In that case using __OPENMP_TARGET__ as a first order key for the lookups eliminates any problem with duplicate names across multiple libraries. Bernd In current implementation each gomp_device_descr contains one dev_splay_tree. And all addresses are inserted into this splay tree. There is no need to restrict lookup, because the addresses from multiple DSO's can't overlap. -- Ilya
Re: [gomp4] Add tables generation
Hi! On Thu, 3 Apr 2014 18:13:08 +0200, Bernd Schmidt ber...@codesourcery.com wrote: On 04/02/2014 10:36 AM, Thomas Schwinge wrote: I see regressions in the libgomp testsuite for configurations where offloading is not enabled: spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe /tmp/ccGnT0ei.o: In function `main': for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__' collect2: error: ld returned 1 exit status I suppose that's because [...] Workaround committed in r209015: libgcc/ * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to NULL. The patch below should be a better fix, making the references to __OPENMP_TARGET__ weak. Does this work for you? Yes, it does, thanks! Please revert my patch when committing yours. Oh, and please use ChangeLog.gomp files on gomp-4_0-branch; also please move the entries for your recent commits from the ChangeLog file(s) to the respective ChangeLog.gomp one(s). Grüße, Thomas pgp9LEYYQa4tJ.pgp Description: PGP signature
Re: [gomp4] Add tables generation
Hi! On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote: This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx). Committed on gomp-4_0-branch. I see regressions in the libgomp testsuite for configurations where offloading is not enabled: spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe /tmp/ccGnT0ei.o: In function `main': for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__' collect2: error: ld returned 1 exit status I suppose that's because even if... --- gcc/configure.ac (revision 208715) +++ gcc/configure.ac (working copy) @@ -887,6 +887,10 @@ AC_SUBST(enable_accelerator) offload_targets=`echo $offload_targets | sed -e 's#,#:#'` AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, $offload_targets, [Define to hold the list of target names suitable for offloading.]) +if test x$offload_targets != x; then + AC_DEFINE(ENABLE_OFFLOADING, 1, +[Define this to enable support for offloading.]) +fi ... offloading is not enabled, this... --- gcc/omp-low.c (revision 208706) +++ gcc/omp-low.c (working copy) @@ -8671,19 +8672,22 @@ expand_omp_target (struct omp_region *re } gimple g; - /* FIXME: This will be address of - extern char __OPENMP_TARGET__[] __attribute__((visibility (hidden))) - symbol, as soon as the linker plugin is able to create it for us. */ - tree openmp_target = build_zero_cst (ptr_type_node); + tree openmp_target += build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier (__OPENMP_TARGET__), ptr_type_node); + TREE_PUBLIC (openmp_target) = 1; + DECL_EXTERNAL (openmp_target) = 1; if (kind == GF_OMP_TARGET_KIND_REGION) { tree fnaddr = build_fold_addr_expr (child_fn); - g = gimple_build_call (builtin_decl_explicit (start_ix), 7, - device, fnaddr, openmp_target, t1, t2, t3, t4); + g = gimple_build_call (builtin_decl_explicit (start_ix), 7, device, + fnaddr, build_fold_addr_expr (openmp_target), + t1, t2, t3, t4); } else -g = gimple_build_call (builtin_decl_explicit (start_ix), 6, -device, openmp_target, t1, t2, t3, t4); +g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device, +build_fold_addr_expr (openmp_target), +t1, t2, t3, t4); ... will now cause a reference to __OPENMP_TARGET__, but... --- libgcc/crtstuff.c (revision 208706) +++ libgcc/crtstuff.c (working copy) @@ -311,6 +311,15 @@ register_tm_clones (void) } #endif /* USE_TM_CLONE_REGISTRY */ +#if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +void *_omp_func_table[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_func_table_section))) = { }; +void *_omp_var_table[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_var_table_section))) = { }; +#endif + #if defined(INIT_SECTION_ASM_OP) || defined(INIT_ARRAY_SECTION_ASM_OP) #ifdef OBJECT_FORMAT_ELF @@ -752,6 +761,23 @@ __do_global_ctors (void) #error What are you doing with crtstuff.c, then? #endif +#if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +void *_omp_funcs_end[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_func_table_section))) = { }; +void *_omp_vars_end[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_var_table_section))) = { }; +extern void *_omp_func_table[]; +extern void *_omp_var_table[]; +void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (protected))) = +{ + _omp_func_table, _omp_funcs_end, + _omp_var_table, _omp_vars_end +}; +#endif ... __OPENMP_TARGET__ is not being defined here for the !ENABLE_OFFLOADING case. In http://news.gmane.org/find-root.php?message_id=%3C20130905082455.GH23437%40tucnak.redhat.com%3E, Jakub had suggested this to be a weak symbol, so we'd get NULL in this case, which would be what's needed here, I think? Also, I'd suggest to rename __OPENMP_TARGET__ (and similar ones) to __GNU_OFFLOAD__ (or similar). As we're using this offloading stuff for
Re: [gomp4] Add tables generation
Hi! On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote: This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx). Committed on gomp-4_0-branch. --- gcc/omp-low.c (revision 208706) +++ gcc/omp-low.c (working copy) @@ -8671,19 +8672,22 @@ expand_omp_target (struct omp_region *re } gimple g; - /* FIXME: This will be address of - extern char __OPENMP_TARGET__[] __attribute__((visibility (hidden))) - symbol, as soon as the linker plugin is able to create it for us. */ - tree openmp_target = build_zero_cst (ptr_type_node); + tree openmp_target += build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier (__OPENMP_TARGET__), ptr_type_node); + TREE_PUBLIC (openmp_target) = 1; + DECL_EXTERNAL (openmp_target) = 1; if (kind == GF_OMP_TARGET_KIND_REGION) { tree fnaddr = build_fold_addr_expr (child_fn); - g = gimple_build_call (builtin_decl_explicit (start_ix), 7, - device, fnaddr, openmp_target, t1, t2, t3, t4); + g = gimple_build_call (builtin_decl_explicit (start_ix), 7, device, + fnaddr, build_fold_addr_expr (openmp_target), + t1, t2, t3, t4); } else -g = gimple_build_call (builtin_decl_explicit (start_ix), 6, -device, openmp_target, t1, t2, t3, t4); +g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device, +build_fold_addr_expr (openmp_target), +t1, t2, t3, t4); Committed in r209013: commit 1f54e08135bd8be59438977b4edbc102e7cef2d7 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Apr 2 08:28:54 2014 + Handle __OPENMP_TARGET__ symbol for OpenACC offloading functions, too. gcc/ * omp-low.c (expand_oacc_offload): Handle __OPENMP_TARGET__ symbol. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@209013 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 5 + gcc/omp-low.c | 14 -- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 1d35b58..8983632 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-04-02 Thomas Schwinge tho...@codesourcery.com + + * omp-low.c (expand_oacc_offload): Handle __OPENMP_TARGET__ + symbol. + 2014-03-20 Thomas Schwinge tho...@codesourcery.com * gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_OACC_LOOP. diff --git gcc/omp-low.c gcc/omp-low.c index a7b93bc..01eda9d 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -5138,13 +5138,15 @@ expand_oacc_offload (struct omp_region *region) } gimple g; - /* FIXME: This will be address of - extern char __OPENMP_TARGET__[] __attribute__((visibility (hidden))) - symbol, as soon as the linker plugin is able to create it for us. */ - tree openmp_target = build_zero_cst (ptr_type_node); + tree openmp_target += build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier (__OPENMP_TARGET__), ptr_type_node); + TREE_PUBLIC (openmp_target) = 1; + DECL_EXTERNAL (openmp_target) = 1; tree fnaddr = build_fold_addr_expr (child_fn); - g = gimple_build_call (builtin_decl_explicit (start_ix), -10, device, fnaddr, openmp_target, t1, t2, t3, t4, + g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device, +fnaddr, build_fold_addr_expr (openmp_target), +t1, t2, t3, t4, t_num_gangs, t_num_workers, t_vector_length); gimple_set_location (g, gimple_location (entry_stmt)); gsi_insert_before (gsi, g, GSI_SAME_STMT); +/* Create new symbol containing (address, size) pairs for omp-marked + functions and global variables. */ +void +omp_finish_file (void) +{ + struct cgraph_node *node; + struct varpool_node *vnode; + const char *funcs_section_name = .offload_func_table_section; + const char *vars_section_name = .offload_var_table_section; + vectree, va_gc *v_funcs, *v_vars; + + vec_alloc (v_vars, 0); + vec_alloc (v_funcs, 0); + + [...] + unsigned num_vars = vec_safe_length (v_vars); + unsigned num_funcs = vec_safe_length (v_funcs); + [...] + if (targetm_common.have_named_sections) +{ + [...] + } + else +{ + for (unsigned i = 0; i num_funcs; i++) + { + tree it = (*v_funcs)[i]; + targetm.record_offload_symbol (it); + } + for (unsigned i = 0; i num_funcs; i++) + { + tree it =
Re: [gomp4] Add tables generation
Hi! On Wed, 02 Apr 2014 09:34:29 +0200, I wrote: On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote: This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx). Committed on gomp-4_0-branch. I see regressions in the libgomp testsuite for configurations where offloading is not enabled: spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe /tmp/ccGnT0ei.o: In function `main': for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__' collect2: error: ld returned 1 exit status I suppose that's because [...] Workaround committed in r209015: commit 6a015f81a5fafe32cf45656e3de121f4088dbf41 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Apr 2 08:29:17 2014 + Work around __OPENMP_TARGET__ not being defined for !ENABLE_OFFLOADING. libgcc/ * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to NULL. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@209015 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgcc/ChangeLog.gomp | 10 ++ libgcc/crtstuff.c | 2 ++ 2 files changed, 12 insertions(+) diff --git libgcc/ChangeLog.gomp libgcc/ChangeLog.gomp new file mode 100644 index 000..7d08efa --- /dev/null +++ libgcc/ChangeLog.gomp @@ -0,0 +1,10 @@ +2014-04-02 Thomas Schwinge tho...@codesourcery.com + + * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to + NULL. + +Copyright (C) 2014 Free Software Foundation, Inc. + +Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. diff --git libgcc/crtstuff.c libgcc/crtstuff.c index cda0bae..79af7f0 100644 --- libgcc/crtstuff.c +++ libgcc/crtstuff.c @@ -775,6 +775,8 @@ void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (protected))) = _omp_func_table, _omp_funcs_end, _omp_var_table, _omp_vars_end }; +#else +void **__OPENMP_TARGET__ __attribute__ ((__visibility__ (protected))) = NULL; #endif Also, I'd suggest to rename __OPENMP_TARGET__ (and similar ones) to __GNU_OFFLOAD__ (or similar). As we're using this offloading stuff for both OpenACC and OpenMP target, it makes sense to me to use a generic name; we still have the chance to do so now while this stuff is not yet in trunk. Grüße, Thomas pgpMH12KYLnx1.pgp Description: PGP signature
Re: [gomp4] Add tables generation
+#ifdef ACCEL_COMPILER + /* Decls are placed in reversed order in fat-objects, so we need to + revert them back if we compile target. */ ... Actually this change is incorrect. If host binary is built with -flto, then both host gcc and target gcc read decls from lto and target_lto sections in the same order, and resulting tables are identical. So, in this case there is no need to change the order. But what if one wants to link non-lto host object files with a target image, produced from target_lto sections? In this case the order of host table, produced during ordinary compilation will differ from the order of target table, produced during lto compilation. Jakub, what do you think? Here is a simple example with 4 functions and 4 global variables: #define N 100 #pragma omp declare target int arr1[N]; int arr2[N]; int arr3[N]; int arr4[N]; #pragma omp end declare target void foo () { #pragma omp target for (int i = 0; i N; i++) arr1[i] = 41 + i; #pragma omp target for (int i = 0; i N; i++) arr2[i] = 42 + i; #pragma omp target for (int i = 0; i N; i++) arr3[i] = 43 + i; #pragma omp target for (int i = 0; i N; i++) arr4[i] = 44 + i; } I print DECL_NAME ((*v_funcs)[i]) and DECL_NAME ((*v_vars)[i]) in omp_finish_file: Host compilation: $ gcc -std=c99 -fopenmp -flto -c test.c -o test.o host func 0: foo._omp_fn.0 host func 1: foo._omp_fn.1 host func 2: foo._omp_fn.2 host func 3: foo._omp_fn.3 host var 0: arr4 host var 1: arr3 host var 2: arr2 host var 3: arr1 Host lto and target lto: $ gcc -std=c99 -fopenmp -flto test.o -o test host func 0: foo._omp_fn.3 host func 1: foo._omp_fn.2 host func 2: foo._omp_fn.1 host func 3: foo._omp_fn.0 host var 0: arr4 host var 1: arr3 host var 2: arr2 host var 3: arr1 target func 0: foo._omp_fn.3 target func 1: foo._omp_fn.2 target func 2: foo._omp_fn.1 target func 3: foo._omp_fn.0 target var 0: arr4 target var 1: arr3 target var 2: arr2 target var 3: arr1 The func tables produced during ordinary compilation and lto are different. -- Ilya
Re: [gomp4] Add tables generation
On 03/27/2014 02:31 PM, Ilya Verbin wrote: +#ifdef ACCEL_COMPILER + /* Decls are placed in reversed order in fat-objects, so we need to + revert them back if we compile target. */ ... Actually this change is incorrect. If host binary is built with -flto, then both host gcc and target gcc read decls from lto and target_lto sections in the same order, and resulting tables are identical. So, in this case there is no need to change the order. But what if one wants to link non-lto host object files with a target image, produced from target_lto sections? In this case the order of host table, produced during ordinary compilation will differ from the order of target table, produced during lto compilation. I haven't looked into the ordering issue here (the reversing of the order is from Michael's original patch), because I still think the whole scheme can't work and I was intending to produce a testcase to demonstrate that. Looks like you saved me some time here :) My suggestion would be to augment the tables with the unique-name scheme I posted previously. I think the objections against it were a little exaggerated, and it would ensure reliability. Bernd
Re: [gomp4] Add tables generation
On 03/27/2014 02:31 PM, Ilya Verbin wrote: +#ifdef ACCEL_COMPILER + /* Decls are placed in reversed order in fat-objects, so we need to + revert them back if we compile target. */ ... Actually this change is incorrect. If host binary is built with -flto, then both host gcc and target gcc read decls from lto and target_lto sections in the same order, and resulting tables are identical. So, in this case there is no need to change the order. But what if one wants to link non-lto host object files with a target image, produced from target_lto sections? In this case the order of host table, produced during ordinary compilation will differ from the order of target table, produced during lto compilation. I haven't looked into the ordering issue here (the reversing of the order is from Michael's original patch), because I still think the whole scheme can't work and I was intending to produce a testcase to demonstrate that. Looks like you saved me some time here :) My suggestion would be to augment the tables with the unique-name scheme I posted previously. I think the objections against it were a little exaggerated, and it would ensure reliability. Bernd
Re: [gomp4] Add tables generation
On Thu, Mar 27, 2014 at 05:31:29PM +0400, Ilya Verbin wrote: +#ifdef ACCEL_COMPILER + /* Decls are placed in reversed order in fat-objects, so we need to + revert them back if we compile target. */ ... Actually this change is incorrect. If host binary is built with -flto, then both host gcc and target gcc read decls from lto and target_lto sections in the same order, and resulting tables are identical. So, in this case there is no need to change the order. But what if one wants to link non-lto host object files with a target image, produced from target_lto sections? In this case the order of host table, produced during ordinary compilation will differ from the order of target table, produced during lto compilation. Jakub, what do you think? The tables need to be created before IPA, that way it really shouldn't matter in what order you emit them. E.g. the outlined target functions could be added to the table during ompexp pass which actually creates the outlined functions, the vars need to be added before target lto or host lto is streamed. Jakub
Re: [gomp4] Add tables generation
On 27 Mar 15:02, Jakub Jelinek wrote: The tables need to be created before IPA, that way it really shouldn't matter in what order you emit them. E.g. the outlined target functions could be added to the table during ompexp pass which actually creates the outlined functions, the vars need to be added before target lto or host lto is streamed. For host tables it's ok, but when target compiler will create tables with functions? It reads bytecode from target_lto sections, so it never executes ompexp pass. -- Ilya
Re: [gomp4] Add tables generation
On 27 Mar 17:16, Jakub Jelinek wrote: Which is why the table created for host by the ompexp pass should be streamed into the target_lto sections (marked specially somehow, special attribute or whatever), and then corresponding target table created from that, rather then created from some possibly different ordering there. Ok, this should work. I'll rewrite tables generation. -- Ilya
Re: [gomp4] Add tables generation
On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. Bernd Index: gomp-4_0-branch/gcc/lto-wrapper.c === --- gomp-4_0-branch.orig/gcc/lto-wrapper.c +++ gomp-4_0-branch/gcc/lto-wrapper.c @@ -67,6 +67,7 @@ static unsigned int nr; static char **input_names; static char **output_names; static char **offload_names; +static const char *ompbegin, *ompend; static char *makefile; const char tool_name[] = lto-wrapper; @@ -479,6 +480,61 @@ compile_images_for_openmp_targets (unsig free_array_of_ptrs ((void**) names, num_targets); } +/* Copy a file from SRC to DEST. */ +static void +copy_file (const char *dest, const char *src) +{ + FILE *d = fopen (dest, wb); + FILE *s = fopen (src, rb); + char buffer[512]; + while (!feof (s)) +{ + size_t len = fread (buffer, 1, 512, s); + if (ferror (s) != 0) + fatal (reading input file); + if (len 0) + { + fwrite (buffer, 1, len, d); + if (ferror (d) != 0) + fatal (writing output file); + } +} +} + +/* Find the omp_begin.o and omp_end.o files in LIBRARY_PATH, make copies + and store the names of the copies in ompbegin and ompend. */ + +static void +find_ompbeginend (void) +{ + char **paths; + const char *library_path = getenv (LIBRARY_PATH); + if (library_path == NULL) +return; + int n_paths = parse_env_var (library_path, paths, /ompbegin.o); + + for (int i = 0; i n_paths; i++) +if (access_check (paths[i], R_OK) == 0) + { + size_t len = strlen (paths[i]); + char *tmp = xstrdup (paths[i]); + strcpy (paths[i] + len - 7, end.o); + if (access_check (paths[i], R_OK) != 0) + fatal (installation error, can't find ompend.o); + /* The linker will delete the filenames we give it, so make + copies. */ + const char *omptmp1 = make_temp_file (.o); + const char *omptmp2 = make_temp_file (.o); + copy_file (omptmp1, tmp); + ompbegin = omptmp1; + copy_file (omptmp2, paths[i]); + ompend = oindmptmp2; + free (tmp); + break; + } + + free_array_of_ptrs ((void**) paths, n_paths); +} /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */ @@ -964,6 +1020,7 @@ cont: compile_images_for_openmp_targets (argc, argv); if (offload_names) { + find_ompbeginend (); for (i = 0; offload_names[i]; i++) { fputs (offload_names[i], stdout); @@ -972,12 +1029,23 @@ cont: free_array_of_ptrs ((void **)offload_names, i); } } + if (ompbegin) + { + fputs (ompbegin, stdout); + putc ('\n', stdout); + } + for (i = 0; i nr; ++i) { fputs (output_names[i], stdout); putc ('\n', stdout); free (input_names[i]); } + if (ompend) + { + fputs (ompend, stdout); + putc ('\n', stdout); + } nr = 0; free (output_names); free (input_names); Index: gomp-4_0-branch/libgcc/configure === --- gomp-4_0-branch.orig/libgcc/configure +++ gomp-4_0-branch/libgcc/configure @@ -566,6 +566,7 @@ sfp_machine_header set_use_emutls set_have_cc_tls vis_hide +enable_accelerator fixed_point enable_decimal_float decimal_float @@ -664,6 +665,8 @@ with_build_libsubdir enable_decimal_float with_system_libunwind enable_sjlj_exceptions +enable_accelerator +enable_offload_targets enable_tls ' ac_precious_vars='build_alias @@ -1301,6 +1304,9 @@ Optional Features: to use --enable-sjlj-exceptions force use of builtin_setjmp for exceptions + --enable-acceleratorbuild accelerator [ARG={no,device-triplet}] + --enable-offload-targets=LIST + enable offloading to devices from LIST --enable-tlsUse thread-local storage [default=yes] Optional Packages: @@ -4357,6 +4363,43 @@ esac # Collect host-machine-specific information. . ${srcdir}/config.host +offload_targets= +# Check whether --enable-accelerator was given. +if test ${enable_accelerator+set} = set; then : + enableval=$enable_accelerator; + case $enable_accelerator in + no) ;; + *) +offload_targets=$enable_accelerator +;; + esac + +fi + + + +# Check whether --enable-offload-targets was given. +if test ${enable_offload_targets+set} = set; then : + enableval=$enable_offload_targets; + if test x$enable_offload_targets = x; then +as_fn_error no offload targets specified $LINENO 5 + else +if test x$offload_targets = x; then + offload_targets=$enable_offload_targets +else +
Re: [gomp4] Add tables generation
On Fri, Mar 21, 2014 at 04:13:45PM +0100, Bernd Schmidt wrote: On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. I'd call the files crtompbegin.o/crtompend.o instead. And, what is the exact reason why you are using protected visibility rather than hidden? Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. As for the __OPENMP_TARGET__ header format, that can be certainly resolved later on. Jakub
Re: [gomp4] Add tables generation
On 03/21/2014 04:20 PM, Jakub Jelinek wrote: And, what is the exact reason why you are using protected visibility rather than hidden? Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. Hmm, okay. No real reason for any of these except things were set up like this in Michael Zolotukhin's original patch. I'll tweak it some more. Bernd
[gomp4] Add tables generation
This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx). Committed on gomp-4_0-branch. Bernd Index: libgcc/ChangeLog === --- libgcc/ChangeLog (revision 208706) +++ libgcc/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2014-03-20 Bernd Schmidt ber...@codesourcery.com + + * crtstuff.c (_omp_func_table, _omp_var_table, _omp_funcs_end, + _omp_vars_end): New array fragments. + (__OPENMP_TARGET__): New variable. + 2014-02-28 Joey Ye joey...@arm.com PR libgcc/60166 Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 208720) +++ gcc/ChangeLog (working copy) @@ -1,5 +1,20 @@ 2014-03-20 Bernd Schmidt ber...@codesourcery.com + Mostly by Michael Zolotukhin: + * omp-low.c: Include common/common-target.h. + (expand_omp_target): Pass in address of __OPENMP_TARGET__. + (add_decls_addresses_to_decl_constructor, omp_finish_file): New + functions. + * omp-low.h (omp_finish_file): Declare. + * toplev.c: Include omp-low.h. + (compile_file): Call omp_finish_file. + * target.def (record_offload_symbol): New hook. + * doc/tm.texi.in (TARGET_RECORD_OFFLOAD_SYMBOL): Add. + * doc/tm.texi: Regenerate. + * configure.ac (ENABLE_OFFLOADING): Define if we have offload_targets. + * configure: Regenerate. + * config.in: Regenerate. + * config/darwin.c: Include lto-section-names.h. (LTO_SEGMENT_NAME): Don't define. * config/i386/winnt.c: Include lto-section-names.h. Index: gcc/config.in === --- gcc/config.in (revision 208715) +++ gcc/config.in (working copy) @@ -139,6 +139,12 @@ #endif +/* Define this to enable support for offloading. */ +#ifndef USED_FOR_TARGET +#undef ENABLE_OFFLOADING +#endif + + /* Define to enable plugin support. */ #ifndef USED_FOR_TARGET #undef ENABLE_PLUGIN Index: gcc/configure === --- gcc/configure (revision 208715) +++ gcc/configure (working copy) @@ -7363,6 +7363,11 @@ cat confdefs.h _ACEOF #define OFFLOAD_TARGETS $offload_targets _ACEOF +if test x$offload_targets != x; then + +$as_echo #define ENABLE_OFFLOADING 1 confdefs.h + +fi # Check whether --with-multilib-list was given. @@ -18008,7 +18013,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18011 configure +#line 18016 configure #include confdefs.h #if HAVE_DLFCN_H @@ -18114,7 +18119,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18117 configure +#line 18122 configure #include confdefs.h #if HAVE_DLFCN_H Index: gcc/configure.ac === --- gcc/configure.ac (revision 208715) +++ gcc/configure.ac (working copy) @@ -887,6 +887,10 @@ AC_SUBST(enable_accelerator) offload_targets=`echo $offload_targets | sed -e 's#,#:#'` AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, $offload_targets, [Define to hold the list of target names suitable for offloading.]) +if test x$offload_targets != x; then + AC_DEFINE(ENABLE_OFFLOADING, 1, +[Define this to enable support for offloading.]) +fi AC_ARG_WITH(multilib-list, [AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])], Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 208706) +++ gcc/doc/tm.texi (working copy) @@ -11418,3 +11418,9 @@ If defined, this function returns an app @deftypefn {Target Hook} void TARGET_ATOMIC_ASSIGN_EXPAND_FENV (tree *@var{hold}, tree *@var{clear}, tree *@var{update}) ISO C11 requires atomic compound assignments that may raise floating-point exceptions to raise exceptions corresponding to the arithmetic operation whose result was successfully stored in a compare-and-exchange sequence. This requires code equivalent to calls to @code{feholdexcept}, @code{feclearexcept} and @code{feupdateenv} to be generated at appropriate points in the compare-and-exchange sequence. This hook should set @code{*@var{hold}} to an expression equivalent to the call to @code{feholdexcept}, @code{*@var{clear}} to an expression equivalent to the call to @code{feclearexcept} and @code{*@var{update}} to an expression equivalent to the call to @code{feupdateenv}. The three expressions are @code{NULL_TREE} on entry to the hook and may be left as @code{NULL_TREE} if no code is required in a particular place. The default implementation leaves all three expressions as @code{NULL_TREE}.
Re: [gomp4] Add tables generation
On Thu, Mar 20, 2014 at 05:50:13PM +0100, Bernd Schmidt wrote: --- libgcc/crtstuff.c (revision 208706) +++ libgcc/crtstuff.c (working copy) @@ -311,6 +311,15 @@ register_tm_clones (void) } #endif /* USE_TM_CLONE_REGISTRY */ +#if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +void *_omp_func_table[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_func_table_section))) = { }; +void *_omp_var_table[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_var_table_section))) = { }; +#endif + #if defined(INIT_SECTION_ASM_OP) || defined(INIT_ARRAY_SECTION_ASM_OP) #ifdef OBJECT_FORMAT_ELF @@ -752,6 +761,23 @@ __do_global_ctors (void) #error What are you doing with crtstuff.c, then? #endif +#if defined(HAVE_GAS_HIDDEN) defined(ENABLE_OFFLOADING) +void *_omp_funcs_end[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_func_table_section))) = { }; +void *_omp_vars_end[0] + __attribute__ ((__used__, visibility (protected), + section (.offload_var_table_section))) = { }; +extern void *_omp_func_table[]; +extern void *_omp_var_table[]; +void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (protected))) = +{ + _omp_func_table, _omp_funcs_end, + _omp_var_table, _omp_vars_end +}; +#endif + + #else /* ! CRT_BEGIN ! CRT_END */ #error One of CRT_BEGIN or CRT_END must be defined. #endif I don't like these libgcc changes at all. First of all, we should find a way which has no runtime costs for at least programs not compiled with -fopenmp/-fopenacc at all, preferrably no runtime cost for any program or shared library that actually doesn't contain any offloading code. The above costs every single binary/shared library 5 exported symbols (with the worst ever visibility, protected should basically never be used, it is even more costly than normal symbol visibility, why it isn't just hidden?) and 4 * sizeof (void *) bytes in data section and 4 runtime relocations (with the protected visibility costly ones). When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. Also, I'd prefer if __OPENMP_TARGET__ header was as compact as possible for the case when there is nothing to offload (ideally, if __OPENMP_TARGET__ symbol is never referenced, not create it at all, if it is referenced, but there is nothing to offload, say just a single 0 byte, otherwise say an uleb128 number how many different kinds of offload data there are and then for each one some identification which offload it is for, the tables, where to find it. Jakub