Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/27/20 3:21 PM, Jeff Law wrote: On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote: On 3/26/20 5:54 PM, Jeff Law wrote: On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote: On 3/9/20 4:36 PM, H.J. Lu wrote: We nee to support different variables, like TLS, data and bss variables. Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? About BSS and DATA I agree that it would be handy. I can theoretically covered with code in get_variable_section/bss_initializer_p. But it's quite logic and I'm not sure we should simulate it. @Honza/Richi: Do you have any opinion about that? Or could we just fix the damn broken configure scripts? Isn't that what's driving all this nonsense? Hello. Yes. Apparently libtool people were unable to come up with a reasonable workaround for that. One can read technical details in this PR: https://sourceware.org/bugzilla/show_bug.cgi?id=25355 What's funny is that discussion doesn't mention the HP-UX bits which handle this correctly. However, I think my tests were all done without the common symbol handling change we made late last year. So that may complicate things. Note that the switching to -fno-common was actually the trigger of the issues. Before the change, one can easily disambiguate global vars and functions based on the LDPK_COMMON symbol_kind. The HP-UX bits for this in configure work correctly.I verified that months ago and even have a sed script which will take a broken configure script and fix it. How many packages will you have to fix? IIRC it's less than a half-dozen. I found them by capturing the generated config.h files with and without LTO enabled and comparing them. I don't offhand know if the differences caused the packages to mis-behave or not -- we just failed any build where those config.h files differed. The current LTO plug-in extension will work fine with gcc10 + latest bintuils release branch. That's a viable solution for us openSUSE. If it's working, then great. But I worry about having to keep binutils/gcc consistent. In fact, what brought me into the conversation again was them getting out of sync in the last few days and binutils tests started failing. It was probably caused by H.J.'s attempt to use lto-wrapper for nm (for LTO bytecode) which had various issues. Martin jeff
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote: > On 3/26/20 5:54 PM, Jeff Law wrote: > > On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote: > > > On 3/9/20 4:36 PM, H.J. Lu wrote: > > > > We nee to support different variables, like TLS, data and bss variables. > > > > > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? > > > > > > About BSS and DATA I agree that it would be handy. I can theoretically > > > covered with code in get_variable_section/bss_initializer_p. But it's > > > quite logic and I'm not sure we should simulate it. > > > > > > @Honza/Richi: Do you have any opinion about that? > > Or could we just fix the damn broken configure scripts? Isn't that what's > > driving all this nonsense? > > Hello. > > Yes. Apparently libtool people were unable to come up with a reasonable > workaround for that. One can read technical details in this PR: > https://sourceware.org/bugzilla/show_bug.cgi?id=25355 What's funny is that discussion doesn't mention the HP-UX bits which handle this correctly. However, I think my tests were all done without the common symbol handling change we made late last year. So that may complicate things. > > > The HP-UX bits for this in configure work correctly.I verified that > > months > > ago and even have a sed script which will take a broken configure script and > > fix > > it. > > How many packages will you have to fix? IIRC it's less than a half-dozen. I found them by capturing the generated config.h files with and without LTO enabled and comparing them. I don't offhand know if the differences caused the packages to mis-behave or not -- we just failed any build where those config.h files differed. > > The current LTO plug-in extension will work fine with gcc10 + latest bintuils > release > branch. That's a viable solution for us openSUSE. If it's working, then great. But I worry about having to keep binutils/gcc consistent. In fact, what brought me into the conversation again was them getting out of sync in the last few days and binutils tests started failing. jeff
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/26/20 5:54 PM, Jeff Law wrote: On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote: On 3/9/20 4:36 PM, H.J. Lu wrote: We nee to support different variables, like TLS, data and bss variables. Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? About BSS and DATA I agree that it would be handy. I can theoretically covered with code in get_variable_section/bss_initializer_p. But it's quite logic and I'm not sure we should simulate it. @Honza/Richi: Do you have any opinion about that? Or could we just fix the damn broken configure scripts? Isn't that what's driving all this nonsense? Hello. Yes. Apparently libtool people were unable to come up with a reasonable workaround for that. One can read technical details in this PR: https://sourceware.org/bugzilla/show_bug.cgi?id=25355 The HP-UX bits for this in configure work correctly.I verified that months ago and even have a sed script which will take a broken configure script and fix it. How many packages will you have to fix? The current LTO plug-in extension will work fine with gcc10 + latest bintuils release branch. That's a viable solution for us openSUSE. Martin jeff
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote: > On 3/9/20 4:36 PM, H.J. Lu wrote: > > We nee to support different variables, like TLS, data and bss variables. > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? > > About BSS and DATA I agree that it would be handy. I can theoretically > covered with code in get_variable_section/bss_initializer_p. But it's > quite logic and I'm not sure we should simulate it. > > @Honza/Richi: Do you have any opinion about that? Or could we just fix the damn broken configure scripts? Isn't that what's driving all this nonsense? The HP-UX bits for this in configure work correctly.I verified that months ago and even have a sed script which will take a broken configure script and fix it. jeff
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 19, 2020 at 12:56 PM Richard Biener wrote: > > On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu" wrote: > >On Thu, Mar 19, 2020 at 9:00 AM Martin Liška wrote: > >> > >> On 3/19/20 4:50 PM, H.J. Lu wrote: > >> > I like it and I will take case of binutils side. > >> > > >> > Thanks. > >> > >> Great. I've just installed the 2 patches to master. > >> > > > >Here is the binutils patch: > > > >https://sourceware.org/pipermail/binutils/2020-March/110295.html > > If the plugin advertises v2 support but the LTO IL files lack symtab_ext data > what happens? Did you mean symbol_type == LDST_UNKNOWN? In that case, you get the text symbol for everything. -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu" wrote: >On Thu, Mar 19, 2020 at 9:00 AM Martin Liška wrote: >> >> On 3/19/20 4:50 PM, H.J. Lu wrote: >> > I like it and I will take case of binutils side. >> > >> > Thanks. >> >> Great. I've just installed the 2 patches to master. >> > >Here is the binutils patch: > >https://sourceware.org/pipermail/binutils/2020-March/110295.html If the plugin advertises v2 support but the LTO IL files lack symtab_ext data what happens? Richard.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 19, 2020 at 9:00 AM Martin Liška wrote: > > On 3/19/20 4:50 PM, H.J. Lu wrote: > > I like it and I will take case of binutils side. > > > > Thanks. > > Great. I've just installed the 2 patches to master. > Here is the binutils patch: https://sourceware.org/pipermail/binutils/2020-March/110295.html -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/19/20 4:50 PM, H.J. Lu wrote: I like it and I will take case of binutils side. Thanks. Great. I've just installed the 2 patches to master. Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 19, 2020 at 8:46 AM Richard Biener wrote: > > On Thu, Mar 19, 2020 at 4:00 PM Martin Liška wrote: > > > > On 3/19/20 10:12 AM, Richard Biener wrote: > > > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška wrote: > > >> > > >> On 3/18/20 12:27 AM, Jan Hubicka wrote: > > Hi. > > > > There's updated version of the patch. > > Changes from the previous version: > > - comment added to ld_plugin_symbol > > - new section renamed to ext_symtab > > - assert added for loop iterations in produce_symtab and > > produce_symtab_extension > > >>> Hi, > > >>> I hope this is last version of the patch. > > >> > > >> Hello. > > >> > > >> Yes. > > >> > > > > 2020-03-12 Martin Liska > > > > * lto-section-in.c: Add extension_symtab. > > >>> ext_symtab :) > > >> > > >> Fixed. > > >> > > diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > > index c17dd69dbdd..78b015be696 100644 > > --- a/gcc/lto-section-in.c > > +++ b/gcc/lto-section-in.c > > @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = > > "mode_table", > > "hsa", > > "lto", > > - "ipa_sra" > > + "ipa_sra", > > + "ext_symtab" > > >>> I would move ext_symtab next to symtab so the sections remains at least > > >>> bit reasonably ordered. > > >> > > >> Ok, I'll adjust it and I will send a separate patch where we bump > > >> LTO_major_version. > > >> > > > > +/* Write extension information for symbols (symbol type, section > > flags). */ > > + > > +static void > > +write_symbol_extension_info (tree t) > > +{ > > + unsigned char c; > > >>> Do we still use vertical whitespace after decls per GNU coding style? > > >> > > >> Dunno. This seems to me like a nit. > > >> > > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h > > index 25bf6c468f7..4f82b439360 100644 > > --- a/gcc/lto-streamer.h > > +++ b/gcc/lto-streamer.h > > @@ -236,6 +236,7 @@ enum lto_section_type > > LTO_section_ipa_hsa, > > LTO_section_lto, > > LTO_section_ipa_sra, > > + LTO_section_symtab_extension, > > >>> I guess symtab_ext to match the actual section name? > > >> > > >> No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have > > >> more descriptive > > >> enum names. > > >> > > LTO_N_SECTION_TYPES /* Must be last. */ > > }; > > > > diff --git a/include/lto-symtab.h b/include/lto-symtab.h > > index 0ce0de10121..47f0ff27df8 100644 > > --- a/include/lto-symtab.h > > +++ b/include/lto-symtab.h > > @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility > > GCCPV_HIDDEN > > }; > > > > +enum gcc_plugin_symbol_type > > +{ > > + GCCST_UNKNOWN, > > + GCCST_FUNCTION, > > + GCCST_VARIABLE, > > +}; > > + > > +enum gcc_plugin_symbol_section_flags > > +{ > > + GCCSSS_BSS = 1 > > +}; > > >>> > > >>> Probably comments here? > > >> > > >> No. There are just shadow copy of enum types from plugin-api.h which > > >> are documented. > > >> > > + > > #endif /* GCC_LTO_SYMTAB_H */ > > +/* Parse an entry of the IL symbol table. The data to be parsed is > > pointed > > + by P and the result is written in ENTRY. The slot number is stored > > in SLOT. > > + Returns the address of the next entry. */ > > + > > +static char * > > +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) > > +{ > > + unsigned char t; > > + enum ld_plugin_symbol_type symbol_types[] = > > +{ > > + LDST_UNKNOWN, > > + LDST_FUNCTION, > > + LDST_VARIABLE, > > +}; > > + > > + t = *p; > > + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); > > + entry->symbol_type = symbol_types[t]; > > + p++; > > + entry->section_flags = *p; > > + p++; > > + > > + return p; > > +} > > >>> > > >>> I think we have chance to make some plan for future extensions without > > >>> introducing too many additional sections. > > >>> > > >>> Currently there are 2 bytes per entry, while only 3 bits are actively > > >>> used of them. If we invent next flag to pass we can use unused bits > > >>> however we need a way to indicate to plugin that the bit is defined. > > >>> This could be done by a simple version byte at the beggining of > > >>> ext_symtab section which will be 0 now and once we define extra bits we > > >>> bump it up to 1. > > >> > > >> I like the suggested change, it can help us in the future. > > >> > > >>> > > >>> It is not that important given that even empty file results in 2k LTO > > >>> object file, but I think it would be nicer in longer run. > > + /* This is for compatibility with older ABIs. */ > > >>> Perhaps say here tha
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 19, 2020 at 4:00 PM Martin Liška wrote: > > On 3/19/20 10:12 AM, Richard Biener wrote: > > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška wrote: > >> > >> On 3/18/20 12:27 AM, Jan Hubicka wrote: > Hi. > > There's updated version of the patch. > Changes from the previous version: > - comment added to ld_plugin_symbol > - new section renamed to ext_symtab > - assert added for loop iterations in produce_symtab and > produce_symtab_extension > >>> Hi, > >>> I hope this is last version of the patch. > >> > >> Hello. > >> > >> Yes. > >> > > 2020-03-12 Martin Liska > > * lto-section-in.c: Add extension_symtab. > >>> ext_symtab :) > >> > >> Fixed. > >> > diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > index c17dd69dbdd..78b015be696 100644 > --- a/gcc/lto-section-in.c > +++ b/gcc/lto-section-in.c > @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = > "mode_table", > "hsa", > "lto", > - "ipa_sra" > + "ipa_sra", > + "ext_symtab" > >>> I would move ext_symtab next to symtab so the sections remains at least > >>> bit reasonably ordered. > >> > >> Ok, I'll adjust it and I will send a separate patch where we bump > >> LTO_major_version. > >> > > +/* Write extension information for symbols (symbol type, section > flags). */ > + > +static void > +write_symbol_extension_info (tree t) > +{ > + unsigned char c; > >>> Do we still use vertical whitespace after decls per GNU coding style? > >> > >> Dunno. This seems to me like a nit. > >> > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h > index 25bf6c468f7..4f82b439360 100644 > --- a/gcc/lto-streamer.h > +++ b/gcc/lto-streamer.h > @@ -236,6 +236,7 @@ enum lto_section_type > LTO_section_ipa_hsa, > LTO_section_lto, > LTO_section_ipa_sra, > + LTO_section_symtab_extension, > >>> I guess symtab_ext to match the actual section name? > >> > >> No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have > >> more descriptive > >> enum names. > >> > LTO_N_SECTION_TYPES /* Must be last. */ > }; > > diff --git a/include/lto-symtab.h b/include/lto-symtab.h > index 0ce0de10121..47f0ff27df8 100644 > --- a/include/lto-symtab.h > +++ b/include/lto-symtab.h > @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility > GCCPV_HIDDEN > }; > > +enum gcc_plugin_symbol_type > +{ > + GCCST_UNKNOWN, > + GCCST_FUNCTION, > + GCCST_VARIABLE, > +}; > + > +enum gcc_plugin_symbol_section_flags > +{ > + GCCSSS_BSS = 1 > +}; > >>> > >>> Probably comments here? > >> > >> No. There are just shadow copy of enum types from plugin-api.h which > >> are documented. > >> > + > #endif /* GCC_LTO_SYMTAB_H */ > +/* Parse an entry of the IL symbol table. The data to be parsed is > pointed > + by P and the result is written in ENTRY. The slot number is stored > in SLOT. > + Returns the address of the next entry. */ > + > +static char * > +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) > +{ > + unsigned char t; > + enum ld_plugin_symbol_type symbol_types[] = > +{ > + LDST_UNKNOWN, > + LDST_FUNCTION, > + LDST_VARIABLE, > +}; > + > + t = *p; > + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); > + entry->symbol_type = symbol_types[t]; > + p++; > + entry->section_flags = *p; > + p++; > + > + return p; > +} > >>> > >>> I think we have chance to make some plan for future extensions without > >>> introducing too many additional sections. > >>> > >>> Currently there are 2 bytes per entry, while only 3 bits are actively > >>> used of them. If we invent next flag to pass we can use unused bits > >>> however we need a way to indicate to plugin that the bit is defined. > >>> This could be done by a simple version byte at the beggining of > >>> ext_symtab section which will be 0 now and once we define extra bits we > >>> bump it up to 1. > >> > >> I like the suggested change, it can help us in the future. > >> > >>> > >>> It is not that important given that even empty file results in 2k LTO > >>> object file, but I think it would be nicer in longer run. > + /* This is for compatibility with older ABIs. */ > >>> Perhaps say here that this ABI defined only "int def;" > >> > >> Good point. > >> > >>> > >>> The patch look good to me. Thanks for the work! > >> > >> Thanks. I'm sending updated patch that I've just tested on lto.exp and > >> both binutils master and HJ's branch that utilizes the new API. > > > > @@ -495,10 +560,16 @@ write_resolution (void) > > > > /* Versio
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/19/20 10:12 AM, Richard Biener wrote: On Wed, Mar 18, 2020 at 9:52 AM Martin Liška wrote: On 3/18/20 12:27 AM, Jan Hubicka wrote: Hi. There's updated version of the patch. Changes from the previous version: - comment added to ld_plugin_symbol - new section renamed to ext_symtab - assert added for loop iterations in produce_symtab and produce_symtab_extension Hi, I hope this is last version of the patch. Hello. Yes. 2020-03-12 Martin Liska * lto-section-in.c: Add extension_symtab. ext_symtab :) Fixed. diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c index c17dd69dbdd..78b015be696 100644 --- a/gcc/lto-section-in.c +++ b/gcc/lto-section-in.c @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = "mode_table", "hsa", "lto", - "ipa_sra" + "ipa_sra", + "ext_symtab" I would move ext_symtab next to symtab so the sections remains at least bit reasonably ordered. Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version. +/* Write extension information for symbols (symbol type, section flags). */ + +static void +write_symbol_extension_info (tree t) +{ + unsigned char c; Do we still use vertical whitespace after decls per GNU coding style? Dunno. This seems to me like a nit. diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 25bf6c468f7..4f82b439360 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -236,6 +236,7 @@ enum lto_section_type LTO_section_ipa_hsa, LTO_section_lto, LTO_section_ipa_sra, + LTO_section_symtab_extension, I guess symtab_ext to match the actual section name? No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive enum names. LTO_N_SECTION_TYPES /* Must be last. */ }; diff --git a/include/lto-symtab.h b/include/lto-symtab.h index 0ce0de10121..47f0ff27df8 100644 --- a/include/lto-symtab.h +++ b/include/lto-symtab.h @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility GCCPV_HIDDEN }; +enum gcc_plugin_symbol_type +{ + GCCST_UNKNOWN, + GCCST_FUNCTION, + GCCST_VARIABLE, +}; + +enum gcc_plugin_symbol_section_flags +{ + GCCSSS_BSS = 1 +}; Probably comments here? No. There are just shadow copy of enum types from plugin-api.h which are documented. + #endif /* GCC_LTO_SYMTAB_H */ +/* Parse an entry of the IL symbol table. The data to be parsed is pointed + by P and the result is written in ENTRY. The slot number is stored in SLOT. + Returns the address of the next entry. */ + +static char * +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) +{ + unsigned char t; + enum ld_plugin_symbol_type symbol_types[] = +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE, +}; + + t = *p; + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); + entry->symbol_type = symbol_types[t]; + p++; + entry->section_flags = *p; + p++; + + return p; +} I think we have chance to make some plan for future extensions without introducing too many additional sections. Currently there are 2 bytes per entry, while only 3 bits are actively used of them. If we invent next flag to pass we can use unused bits however we need a way to indicate to plugin that the bit is defined. This could be done by a simple version byte at the beggining of ext_symtab section which will be 0 now and once we define extra bits we bump it up to 1. I like the suggested change, it can help us in the future. It is not that important given that even empty file results in 2k LTO object file, but I think it would be nicer in longer run. + /* This is for compatibility with older ABIs. */ Perhaps say here that this ABI defined only "int def;" Good point. The patch look good to me. Thanks for the work! Thanks. I'm sending updated patch that I've just tested on lto.exp and both binutils master and HJ's branch that utilizes the new API. @@ -495,10 +560,16 @@ write_resolution (void) /* Version 2 of API supports IRONLY_EXP resolution that is accepted by GCC-4.7 and newer. */ - if (get_symbols_v2) -get_symbols_v2 (info->handle, symtab->nsyms, syms); + if (get_symbols_v4) + get_symbols_v4 (info->handle, symtab->nsyms, syms); else -get_symbols (info->handle, symtab->nsyms, syms); + { + clear_new_symtab_flags (symtab); can you instead just avoid parsing the ext symtab? Yes, I simplified the changes and I bet we'll only need one new hook get_symbols_v2. Then we can base parsing of the external symtab on that. + if (get_symbols_v2) + get_symbols_v2 (info->handle, symtab->nsyms, syms); + else + get_symbols (info->handle, symtab->nsyms, syms); + } I guess this also points to the fact that LDs symbol resolution can't tell GCC it chose "BSS" (from a non-IL object) or that it chose a variable or function. @@ -296,6 +300,8 @@ parse_table_entry (char
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, Mar 18, 2020 at 9:52 AM Martin Liška wrote: > > On 3/18/20 12:27 AM, Jan Hubicka wrote: > >> Hi. > >> > >> There's updated version of the patch. > >> Changes from the previous version: > >> - comment added to ld_plugin_symbol > >> - new section renamed to ext_symtab > >> - assert added for loop iterations in produce_symtab and > >> produce_symtab_extension > > Hi, > > I hope this is last version of the patch. > > Hello. > > Yes. > > >> > >> 2020-03-12 Martin Liska > >> > >> * lto-section-in.c: Add extension_symtab. > > ext_symtab :) > > Fixed. > > >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > >> index c17dd69dbdd..78b015be696 100644 > >> --- a/gcc/lto-section-in.c > >> +++ b/gcc/lto-section-in.c > >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = > >> "mode_table", > >> "hsa", > >> "lto", > >> - "ipa_sra" > >> + "ipa_sra", > >> + "ext_symtab" > > I would move ext_symtab next to symtab so the sections remains at least > > bit reasonably ordered. > > Ok, I'll adjust it and I will send a separate patch where we bump > LTO_major_version. > > >> > >> +/* Write extension information for symbols (symbol type, section flags). > >> */ > >> + > >> +static void > >> +write_symbol_extension_info (tree t) > >> +{ > >> + unsigned char c; > > Do we still use vertical whitespace after decls per GNU coding style? > > Dunno. This seems to me like a nit. > > >> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h > >> index 25bf6c468f7..4f82b439360 100644 > >> --- a/gcc/lto-streamer.h > >> +++ b/gcc/lto-streamer.h > >> @@ -236,6 +236,7 @@ enum lto_section_type > >> LTO_section_ipa_hsa, > >> LTO_section_lto, > >> LTO_section_ipa_sra, > >> + LTO_section_symtab_extension, > > I guess symtab_ext to match the actual section name? > > No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have more > descriptive > enum names. > > >> LTO_N_SECTION_TYPES /* Must be last. */ > >> }; > >> > >> diff --git a/include/lto-symtab.h b/include/lto-symtab.h > >> index 0ce0de10121..47f0ff27df8 100644 > >> --- a/include/lto-symtab.h > >> +++ b/include/lto-symtab.h > >> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility > >> GCCPV_HIDDEN > >> }; > >> > >> +enum gcc_plugin_symbol_type > >> +{ > >> + GCCST_UNKNOWN, > >> + GCCST_FUNCTION, > >> + GCCST_VARIABLE, > >> +}; > >> + > >> +enum gcc_plugin_symbol_section_flags > >> +{ > >> + GCCSSS_BSS = 1 > >> +}; > > > > Probably comments here? > > No. There are just shadow copy of enum types from plugin-api.h which > are documented. > > >> + > >> #endif /* GCC_LTO_SYMTAB_H */ > >> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed > >> + by P and the result is written in ENTRY. The slot number is stored in > >> SLOT. > >> + Returns the address of the next entry. */ > >> + > >> +static char * > >> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) > >> +{ > >> + unsigned char t; > >> + enum ld_plugin_symbol_type symbol_types[] = > >> +{ > >> + LDST_UNKNOWN, > >> + LDST_FUNCTION, > >> + LDST_VARIABLE, > >> +}; > >> + > >> + t = *p; > >> + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); > >> + entry->symbol_type = symbol_types[t]; > >> + p++; > >> + entry->section_flags = *p; > >> + p++; > >> + > >> + return p; > >> +} > > > > I think we have chance to make some plan for future extensions without > > introducing too many additional sections. > > > > Currently there are 2 bytes per entry, while only 3 bits are actively > > used of them. If we invent next flag to pass we can use unused bits > > however we need a way to indicate to plugin that the bit is defined. > > This could be done by a simple version byte at the beggining of > > ext_symtab section which will be 0 now and once we define extra bits we > > bump it up to 1. > > I like the suggested change, it can help us in the future. > > > > > It is not that important given that even empty file results in 2k LTO > > object file, but I think it would be nicer in longer run. > >> + /* This is for compatibility with older ABIs. */ > > Perhaps say here that this ABI defined only "int def;" > > Good point. > > > > > The patch look good to me. Thanks for the work! > > Thanks. I'm sending updated patch that I've just tested on lto.exp and > both binutils master and HJ's branch that utilizes the new API. @@ -495,10 +560,16 @@ write_resolution (void) /* Version 2 of API supports IRONLY_EXP resolution that is accepted by GCC-4.7 and newer. */ - if (get_symbols_v2) -get_symbols_v2 (info->handle, symtab->nsyms, syms); + if (get_symbols_v4) + get_symbols_v4 (info->handle, symtab->nsyms, syms); else -get_symbols (info->handle, symtab->nsyms, syms); + { + clear_new_symtab_flags (symtab); can you instead just avoid parsing the ext symtab? + if (get_symbols_v2
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/18/20 12:27 AM, Jan Hubicka wrote: Hi. There's updated version of the patch. Changes from the previous version: - comment added to ld_plugin_symbol - new section renamed to ext_symtab - assert added for loop iterations in produce_symtab and produce_symtab_extension Hi, I hope this is last version of the patch. Hello. Yes. 2020-03-12 Martin Liska * lto-section-in.c: Add extension_symtab. ext_symtab :) Fixed. diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c index c17dd69dbdd..78b015be696 100644 --- a/gcc/lto-section-in.c +++ b/gcc/lto-section-in.c @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = "mode_table", "hsa", "lto", - "ipa_sra" + "ipa_sra", + "ext_symtab" I would move ext_symtab next to symtab so the sections remains at least bit reasonably ordered. Ok, I'll adjust it and I will send a separate patch where we bump LTO_major_version. +/* Write extension information for symbols (symbol type, section flags). */ + +static void +write_symbol_extension_info (tree t) +{ + unsigned char c; Do we still use vertical whitespace after decls per GNU coding style? Dunno. This seems to me like a nit. diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 25bf6c468f7..4f82b439360 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -236,6 +236,7 @@ enum lto_section_type LTO_section_ipa_hsa, LTO_section_lto, LTO_section_ipa_sra, + LTO_section_symtab_extension, I guess symtab_ext to match the actual section name? No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have more descriptive enum names. LTO_N_SECTION_TYPES /* Must be last. */ }; diff --git a/include/lto-symtab.h b/include/lto-symtab.h index 0ce0de10121..47f0ff27df8 100644 --- a/include/lto-symtab.h +++ b/include/lto-symtab.h @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility GCCPV_HIDDEN }; +enum gcc_plugin_symbol_type +{ + GCCST_UNKNOWN, + GCCST_FUNCTION, + GCCST_VARIABLE, +}; + +enum gcc_plugin_symbol_section_flags +{ + GCCSSS_BSS = 1 +}; Probably comments here? No. There are just shadow copy of enum types from plugin-api.h which are documented. + #endif /* GCC_LTO_SYMTAB_H */ +/* Parse an entry of the IL symbol table. The data to be parsed is pointed + by P and the result is written in ENTRY. The slot number is stored in SLOT. + Returns the address of the next entry. */ + +static char * +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) +{ + unsigned char t; + enum ld_plugin_symbol_type symbol_types[] = +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE, +}; + + t = *p; + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); + entry->symbol_type = symbol_types[t]; + p++; + entry->section_flags = *p; + p++; + + return p; +} I think we have chance to make some plan for future extensions without introducing too many additional sections. Currently there are 2 bytes per entry, while only 3 bits are actively used of them. If we invent next flag to pass we can use unused bits however we need a way to indicate to plugin that the bit is defined. This could be done by a simple version byte at the beggining of ext_symtab section which will be 0 now and once we define extra bits we bump it up to 1. I like the suggested change, it can help us in the future. It is not that important given that even empty file results in 2k LTO object file, but I think it would be nicer in longer run. + /* This is for compatibility with older ABIs. */ Perhaps say here that this ABI defined only "int def;" Good point. The patch look good to me. Thanks for the work! Thanks. I'm sending updated patch that I've just tested on lto.exp and both binutils master and HJ's branch that utilizes the new API. Martin Honza +#ifdef __BIG_ENDIAN__ + char unused; + char section_flags; + char symbol_type; + char def; +#else + char def; + char symbol_type; + char section_flags; + char unused; +#endif int visibility; uint64_t size; char *comdat_key; @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility LDPV_HIDDEN }; +/* The type of the symbol. */ + +enum ld_plugin_symbol_type +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE, +}; + +enum ld_plugin_symbol_section_flags +{ + LDSSS_BSS = 1 +}; + /* How a symbol is resolved. */ enum ld_plugin_symbol_resolution @@ -431,7 +456,9 @@ enum ld_plugin_tag LDPT_GET_INPUT_SECTION_ALIGNMENT = 29, LDPT_GET_INPUT_SECTION_SIZE = 30, LDPT_REGISTER_NEW_INPUT_HOOK = 31, - LDPT_GET_WRAP_SYMBOLS = 32 + LDPT_GET_WRAP_SYMBOLS = 32, + LDPT_ADD_SYMBOLS_V2 = 33, + LDPT_GET_SYMBOLS_V4 = 34, }; /* The plugin transfer vector. */ -- 2.25.1 >From 492e7dc5b5f792b2e9f92b5fc77e47fe9ee98da7 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH 2/3] API extension for binutils (type of symbols). gcc/ChangeLo
Re: [PATCH][RFC] API extension for binutils (type of symbols).
> Hi. > > There's updated version of the patch. > Changes from the previous version: > - comment added to ld_plugin_symbol > - new section renamed to ext_symtab > - assert added for loop iterations in produce_symtab and > produce_symtab_extension Hi, I hope this is last version of the patch. > > 2020-03-12 Martin Liska > > * lto-section-in.c: Add extension_symtab. ext_symtab :) > diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > index c17dd69dbdd..78b015be696 100644 > --- a/gcc/lto-section-in.c > +++ b/gcc/lto-section-in.c > @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = >"mode_table", >"hsa", >"lto", > - "ipa_sra" > + "ipa_sra", > + "ext_symtab" I would move ext_symtab next to symtab so the sections remains at least bit reasonably ordered. > > +/* Write extension information for symbols (symbol type, section flags). */ > + > +static void > +write_symbol_extension_info (tree t) > +{ > + unsigned char c; Do we still use vertical whitespace after decls per GNU coding style? > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h > index 25bf6c468f7..4f82b439360 100644 > --- a/gcc/lto-streamer.h > +++ b/gcc/lto-streamer.h > @@ -236,6 +236,7 @@ enum lto_section_type >LTO_section_ipa_hsa, >LTO_section_lto, >LTO_section_ipa_sra, > + LTO_section_symtab_extension, I guess symtab_ext to match the actual section name? >LTO_N_SECTION_TYPES/* Must be last. */ > }; > > diff --git a/include/lto-symtab.h b/include/lto-symtab.h > index 0ce0de10121..47f0ff27df8 100644 > --- a/include/lto-symtab.h > +++ b/include/lto-symtab.h > @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility > GCCPV_HIDDEN >}; > > +enum gcc_plugin_symbol_type > +{ > + GCCST_UNKNOWN, > + GCCST_FUNCTION, > + GCCST_VARIABLE, > +}; > + > +enum gcc_plugin_symbol_section_flags > +{ > + GCCSSS_BSS = 1 > +}; Probably comments here? > + > #endif /* GCC_LTO_SYMTAB_H */ > +/* Parse an entry of the IL symbol table. The data to be parsed is pointed > + by P and the result is written in ENTRY. The slot number is stored in > SLOT. > + Returns the address of the next entry. */ > + > +static char * > +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry) > +{ > + unsigned char t; > + enum ld_plugin_symbol_type symbol_types[] = > +{ > + LDST_UNKNOWN, > + LDST_FUNCTION, > + LDST_VARIABLE, > +}; > + > + t = *p; > + check (t <= 3, LDPL_FATAL, "invalid symbol type found"); > + entry->symbol_type = symbol_types[t]; > + p++; > + entry->section_flags = *p; > + p++; > + > + return p; > +} I think we have chance to make some plan for future extensions without introducing too many additional sections. Currently there are 2 bytes per entry, while only 3 bits are actively used of them. If we invent next flag to pass we can use unused bits however we need a way to indicate to plugin that the bit is defined. This could be done by a simple version byte at the beggining of ext_symtab section which will be 0 now and once we define extra bits we bump it up to 1. It is not that important given that even empty file results in 2k LTO object file, but I think it would be nicer in longer run. > + /* This is for compatibility with older ABIs. */ Perhaps say here that this ABI defined only "int def;" The patch look good to me. Thanks for the work! Honza > +#ifdef __BIG_ENDIAN__ > + char unused; > + char section_flags; > + char symbol_type; > + char def; > +#else > + char def; > + char symbol_type; > + char section_flags; > + char unused; > +#endif >int visibility; >uint64_t size; >char *comdat_key; > @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility >LDPV_HIDDEN > }; > > +/* The type of the symbol. */ > + > +enum ld_plugin_symbol_type > +{ > + LDST_UNKNOWN, > + LDST_FUNCTION, > + LDST_VARIABLE, > +}; > + > +enum ld_plugin_symbol_section_flags > +{ > + LDSSS_BSS = 1 > +}; > + > /* How a symbol is resolved. */ > > enum ld_plugin_symbol_resolution > @@ -431,7 +456,9 @@ enum ld_plugin_tag >LDPT_GET_INPUT_SECTION_ALIGNMENT = 29, >LDPT_GET_INPUT_SECTION_SIZE = 30, >LDPT_REGISTER_NEW_INPUT_HOOK = 31, > - LDPT_GET_WRAP_SYMBOLS = 32 > + LDPT_GET_WRAP_SYMBOLS = 32, > + LDPT_ADD_SYMBOLS_V2 = 33, > + LDPT_GET_SYMBOLS_V4 = 34, > }; > > /* The plugin transfer vector. */ > -- > 2.25.1 >
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/16/20 3:34 PM, H.J. Lu wrote: It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2. Sorry. It's fixed now. Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Mon, Mar 16, 2020 at 6:50 AM Martin Liška wrote: > > On 3/16/20 12:12 PM, H.J. Lu wrote: > > (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and > > ^^^ Typo, > > Thank you. I fixed that in my development branch. Still not right. It should be 2020-03-12 Martin Liska PR binutils/25640 * plugin-api.h (struct ld_plugin_symbol): Split int def into 4 char fields. (enum ld_plugin_symbol_type): New. (enum ld_plugin_symbol_section_flags): New. (enum ld_plugin_tag): Add LDPT_ADD_SYMBOLS_V2 and LDPT_GET_SYMBOLS_V4. It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2. -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/16/20 12:12 PM, H.J. Lu wrote: (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and ^^^ Typo, Thank you. I fixed that in my development branch. Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 12, 2020 at 7:54 AM Martin Liška wrote: > > Hi. > > There's updated version of the patch. > Changes from the previous version: > - comment added to ld_plugin_symbol > - new section renamed to ext_symtab > - assert added for loop iterations in produce_symtab and > produce_symtab_extension > > Martin 2020-03-12 Martin Liska * plugin-api.h (struct ld_plugin_symbol): Split int def into 4 char fields. (enum ld_plugin_symbol_type): New. (enum ld_plugin_symbol_section_flags): New. (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and ^^^ Typo, should be LDPT_ADD_SYMBOLS_V2. LDPT_GET_SYMBOLS_V4. -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
Hi. There's updated version of the patch. Changes from the previous version: - comment added to ld_plugin_symbol - new section renamed to ext_symtab - assert added for loop iterations in produce_symtab and produce_symtab_extension Martin >From ad24565cfb3166fdd9995381b25c1f558c7bcf8c Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH 2/2] API extension for binutils (type of symbols). gcc/ChangeLog: 2020-03-12 Martin Liska * lto-section-in.c: Add extension_symtab. * lto-streamer-out.c (write_symbol_extension_info): New. (produce_symtab_extension): New. (produce_asm_for_decls): Stream also produce_symtab_extension. * lto-streamer.h (enum lto_section_type): New section. include/ChangeLog: 2020-03-12 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. (enum gcc_plugin_symbol_section_flags): Likewise. lto-plugin/ChangeLog: 2020-03-12 Martin Liska * lto-plugin.c (LTO_SECTION_PREFIX): Rename to ... (LTO_SYMTAB_PREFIX): ... this. (LTO_SECTION_PREFIX_LEN): Rename to ... (LTO_SYMTAB_PREFIX_LEN): ... this. (LTO_SYMTAB_EXT_PREFIX): New. (LTO_SYMTAB_EXT_PREFIX_LEN): New. (LTO_LTO_PREFIX): New. (LTO_LTO_PREFIX_LEN): New. (parse_table_entry): Fill up unused to zero. (parse_table_entry_extension): New. (parse_symtab_extension): New. (finish_conflict_resolution): Change type for resolution. (clear_new_symtab_flags): New. (write_resolution): Support new get_symbols_v4. (process_symtab): Use new macro name. (process_symtab_extension): New. (claim_file_handler): Parse also process_symtab_extension. (onload): Call new add_symbols_v2. --- gcc/lto-section-in.c| 3 +- gcc/lto-streamer-out.c | 76 +- gcc/lto-streamer.h | 1 + include/lto-symtab.h| 12 +++ lto-plugin/lto-plugin.c | 165 +++- 5 files changed, 237 insertions(+), 20 deletions(-) diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c index c17dd69dbdd..78b015be696 100644 --- a/gcc/lto-section-in.c +++ b/gcc/lto-section-in.c @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = "mode_table", "hsa", "lto", - "ipa_sra" + "ipa_sra", + "ext_symtab" }; /* Hooks so that the ipa passes can call into the lto front end to get diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index cea5e71cffb..e459ec91856 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "print-tree.h" #include "tree-dfa.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "output.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache, lto_write_data (&slot_num, 4); } +/* Write extension information for symbols (symbol type, section flags). */ + +static void +write_symbol_extension_info (tree t) +{ + unsigned char c; + c = ((unsigned char) TREE_CODE (t) == VAR_DECL + ? GCCST_VARIABLE : GCCST_FUNCTION); + lto_write_data (&c, 1); + unsigned char section_flags = 0; + if (TREE_CODE (t) == VAR_DECL) +{ + section *s = get_variable_section (t, false); + if (s->common.flags & SECTION_BSS) + section_flags |= GCCSSS_BSS; +} + lto_write_data (§ion_flags, 1); +} + /* Write an IL symbol table to OB. SET and VSET are cgraph/varpool node sets we are outputting. */ -static void +static unsigned int produce_symtab (struct output_block *ob) { + unsigned int streamed_symbols = 0; struct streamer_tree_cache_d *cache = ob->writer_cache; char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL); lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; @@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob) if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ()) continue; write_symbol (cache, node->decl, &seen, false); + ++streamed_symbols; } for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei)) @@ -2813,8 +2835,55 @@ produce_symtab (struct output_block *ob) if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ()) continue; write_symbol (cache, node->decl, &seen, false); + ++streamed_symbols; +} + + lto_end_section (); + + return streamed_symbols; +} + +/* Write an IL symbol table extension to OB. + SET and VSET are cgraph/varpool node sets we are outputting. */ + +static void +produce_symtab_extension (struct output_block *ob, + unsigned int previous_streamed_symbols) +{ + unsigned int streamed_symbols = 0; + char *section_name = lto_get_section_name (LTO_section_symtab_extension, + NULL, 0, NULL); + lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; + lto_symtab_encoder_iterator lsei; + + lto_begin_section (section_name, false); + free (section_name); + + /*
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/12/20 2:15 PM, Richard Biener wrote: #elif defined __LITTLE_ENDIAN__ Hmm, the macro is not defined. Even a simple test-case shows that: $ cat le.c #ifdef __LITTLE_ENDIAN__ asdfa #endif $ gcc le.c -c [no error]
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Thu, Mar 12, 2020 at 2:11 PM Martin Liška wrote: > > On 3/12/20 1:55 PM, Jan Hubicka wrote: > >> gcc/ChangeLog: > >> > >> 2020-03-12 Martin Liska > >> > >> * lto-section-in.c: Add extension_symtab. > >> * lto-streamer-out.c (write_symbol_extension_info): > >> New. > >> (produce_symtab_extension): New. > >> (produce_asm_for_decls): Stream also produce_symtab_extension. > >> * lto-streamer.h (enum lto_section_type): New section. > >> > >> include/ChangeLog: > >> > >> 2020-03-12 Martin Liska > >> > >> * lto-symtab.h (enum gcc_plugin_symbol_type): New. > >> (enum gcc_plugin_symbol_section_flags): Likewise. > >> > >> lto-plugin/ChangeLog: > >> > >> 2020-03-12 Martin Liska > >> > >> * lto-plugin.c (LTO_SECTION_PREFIX): Rename to > >> ... > >> (LTO_SYMTAB_PREFIX): ... this. > >> (LTO_SECTION_PREFIX_LEN): Rename to ... > >> (LTO_SYMTAB_PREFIX_LEN): ... this. > >> (LTO_SYMTAB_EXT_PREFIX): New. > >> (LTO_SYMTAB_EXT_PREFIX_LEN): New. > >> (LTO_LTO_PREFIX): New. > >> (LTO_LTO_PREFIX_LEN): New. > >> (parse_table_entry): Fill up unused to zero. > >> (parse_table_entry_extension): New. > >> (parse_symtab_extension): New. > >> (finish_conflict_resolution): Change type > >> for resolution. > >> (clear_new_symtab_flags): New. > >> (write_resolution): Support new get_symbols_v4. > >> (process_symtab): Use new macro name. > >> (process_symtab_extension): New. > >> (claim_file_handler): Parse also process_symtab_extension. > >> (onload): Call new add_symbols_v2. > > Hi, > > thanks for updating the patch. Two minor nits > >> --- > >> gcc/lto-section-in.c| 3 +- > >> gcc/lto-streamer-out.c | 64 +++- > >> gcc/lto-streamer.h | 1 + > >> include/lto-symtab.h| 12 +++ > >> lto-plugin/lto-plugin.c | 165 +++- > >> 5 files changed, 226 insertions(+), 19 deletions(-) > >> > >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > >> index c17dd69dbdd..7bf59c513fc 100644 > >> --- a/gcc/lto-section-in.c > >> +++ b/gcc/lto-section-in.c > >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = > >> "mode_table", > >> "hsa", > >> "lto", > >> - "ipa_sra" > >> + "ipa_sra", > >> + "extension_symtab" > > I would call it symtab_ext so alphabetically it is coupled with symtab. Maybe ext_symtab then, proper enlish for the long form would be extended_symtab I guess. > It's problematic because of this collision where we search for prefix: > > if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0) > > Note that the section names have suffix: > .gnu.lto_.symtab.48f1e54b1c048b0f > > > I wonder if moving it up in the enum would make it physically appear > > next to .symtab in object file so we save a bit of i/o? > > It's not about names order, but streaming order. Which should be fine: > > $ readelf -SW /tmp/bss.o > ... >[10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS > 0001dd 0002de 00 E 0 0 1 >[11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS > 0004bb 49 00 E 0 0 1 >[12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS > 000504 06 00 E 0 0 1 >[13] .gnu.lto_.optsPROGBITS 00050a 51 00 > E 0 0 1 > ... > > > >> +static void > >> +produce_symtab_extension (struct output_block *ob) > >> +{ > >> + char *section_name = lto_get_section_name (LTO_section_symtab_extension, > >> + NULL, 0, NULL); > >> + lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; > >> + lto_symtab_encoder_iterator lsei; > >> + > >> + lto_begin_section (section_name, false); > >> + free (section_name); > >> + > >> + /* Write the symbol table. > >> + First write everything defined and then all declarations. > >> + This is necessary to handle cases where we have duplicated symbols. > >> */ > >> + for (lsei = lsei_start (encoder); > >> + !lsei_end_p (lsei); lsei_next (&lsei)) > >> +{ > >> + symtab_node *node = lsei_node (lsei); > >> + > >> + if (DECL_EXTERNAL (node->decl) || > >> !node->output_to_lto_symbol_table_p ()) > >> +continue; > >> + write_symbol_extension_info (node->decl); > >> +} > >> + for (lsei = lsei_start (encoder); > >> + !lsei_end_p (lsei); lsei_next (&lsei)) > >> +{ > >> + symtab_node *node = lsei_node (lsei); > >> + > >> + if (!DECL_EXTERNAL (node->decl) || > >> !node->output_to_lto_symbol_table_p ()) > >> +continue; > >> + write_symbol_extension_info (node->decl); > >> +} > >> + > >> + lto_end_section (); > >> +} > > > > It seems fragile to me to duplicate the loops that needs to be kept in > > sync because breaking that will likely get bit suprising outcome (i.e. > > bootstrap will work since we do not
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/12/20 1:55 PM, Jan Hubicka wrote: gcc/ChangeLog: 2020-03-12 Martin Liska * lto-section-in.c: Add extension_symtab. * lto-streamer-out.c (write_symbol_extension_info): New. (produce_symtab_extension): New. (produce_asm_for_decls): Stream also produce_symtab_extension. * lto-streamer.h (enum lto_section_type): New section. include/ChangeLog: 2020-03-12 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. (enum gcc_plugin_symbol_section_flags): Likewise. lto-plugin/ChangeLog: 2020-03-12 Martin Liska * lto-plugin.c (LTO_SECTION_PREFIX): Rename to ... (LTO_SYMTAB_PREFIX): ... this. (LTO_SECTION_PREFIX_LEN): Rename to ... (LTO_SYMTAB_PREFIX_LEN): ... this. (LTO_SYMTAB_EXT_PREFIX): New. (LTO_SYMTAB_EXT_PREFIX_LEN): New. (LTO_LTO_PREFIX): New. (LTO_LTO_PREFIX_LEN): New. (parse_table_entry): Fill up unused to zero. (parse_table_entry_extension): New. (parse_symtab_extension): New. (finish_conflict_resolution): Change type for resolution. (clear_new_symtab_flags): New. (write_resolution): Support new get_symbols_v4. (process_symtab): Use new macro name. (process_symtab_extension): New. (claim_file_handler): Parse also process_symtab_extension. (onload): Call new add_symbols_v2. Hi, thanks for updating the patch. Two minor nits --- gcc/lto-section-in.c| 3 +- gcc/lto-streamer-out.c | 64 +++- gcc/lto-streamer.h | 1 + include/lto-symtab.h| 12 +++ lto-plugin/lto-plugin.c | 165 +++- 5 files changed, 226 insertions(+), 19 deletions(-) diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c index c17dd69dbdd..7bf59c513fc 100644 --- a/gcc/lto-section-in.c +++ b/gcc/lto-section-in.c @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = "mode_table", "hsa", "lto", - "ipa_sra" + "ipa_sra", + "extension_symtab" I would call it symtab_ext so alphabetically it is coupled with symtab. It's problematic because of this collision where we search for prefix: if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0) Note that the section names have suffix: .gnu.lto_.symtab.48f1e54b1c048b0f I wonder if moving it up in the enum would make it physically appear next to .symtab in object file so we save a bit of i/o? It's not about names order, but streaming order. Which should be fine: $ readelf -SW /tmp/bss.o ... [10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS 0001dd 0002de 00 E 0 0 1 [11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS 0004bb 49 00 E 0 0 1 [12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS 000504 06 00 E 0 0 1 [13] .gnu.lto_.optsPROGBITS 00050a 51 00 E 0 0 1 ... +static void +produce_symtab_extension (struct output_block *ob) +{ + char *section_name = lto_get_section_name (LTO_section_symtab_extension, +NULL, 0, NULL); + lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; + lto_symtab_encoder_iterator lsei; + + lto_begin_section (section_name, false); + free (section_name); + + /* Write the symbol table. + First write everything defined and then all declarations. + This is necessary to handle cases where we have duplicated symbols. */ + for (lsei = lsei_start (encoder); + !lsei_end_p (lsei); lsei_next (&lsei)) +{ + symtab_node *node = lsei_node (lsei); + + if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ()) + continue; + write_symbol_extension_info (node->decl); +} + for (lsei = lsei_start (encoder); + !lsei_end_p (lsei); lsei_next (&lsei)) +{ + symtab_node *node = lsei_node (lsei); + + if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ()) + continue; + write_symbol_extension_info (node->decl); +} + + lto_end_section (); +} It seems fragile to me to duplicate the loops that needs to be kept in sync because breaking that will likely get bit suprising outcome (i.e. bootstrap will work since we do not care about symbol types). Perhaps it would be more robust to simply push the extension bits into a vector in write_symbol for later streaming. Or at least add comment that loops needs to be kept in sync. I'll add comment. diff --git a/include/plugin-api.h b/include/plugin-api.h index 09e1202df07..804684449cf 100644 --- a/include/plugin-api.h +++ b/include/plugin-api.h @@ -87,7 +87,17 @@ struct ld_plugin_symbol { char *name; char *version; - int def; +#ifdef __BIG_ENDIAN__ + char unused; + char section_flags; + char symbol_type; + char def; +#else + char def; + char symbol_type; +
Re: [PATCH][RFC] API extension for binutils (type of symbols).
> gcc/ChangeLog: > > 2020-03-12 Martin Liska > > * lto-section-in.c: Add extension_symtab. > * lto-streamer-out.c (write_symbol_extension_info): > New. > (produce_symtab_extension): New. > (produce_asm_for_decls): Stream also produce_symtab_extension. > * lto-streamer.h (enum lto_section_type): New section. > > include/ChangeLog: > > 2020-03-12 Martin Liska > > * lto-symtab.h (enum gcc_plugin_symbol_type): New. > (enum gcc_plugin_symbol_section_flags): Likewise. > > lto-plugin/ChangeLog: > > 2020-03-12 Martin Liska > > * lto-plugin.c (LTO_SECTION_PREFIX): Rename to > ... > (LTO_SYMTAB_PREFIX): ... this. > (LTO_SECTION_PREFIX_LEN): Rename to ... > (LTO_SYMTAB_PREFIX_LEN): ... this. > (LTO_SYMTAB_EXT_PREFIX): New. > (LTO_SYMTAB_EXT_PREFIX_LEN): New. > (LTO_LTO_PREFIX): New. > (LTO_LTO_PREFIX_LEN): New. > (parse_table_entry): Fill up unused to zero. > (parse_table_entry_extension): New. > (parse_symtab_extension): New. > (finish_conflict_resolution): Change type > for resolution. > (clear_new_symtab_flags): New. > (write_resolution): Support new get_symbols_v4. > (process_symtab): Use new macro name. > (process_symtab_extension): New. > (claim_file_handler): Parse also process_symtab_extension. > (onload): Call new add_symbols_v2. Hi, thanks for updating the patch. Two minor nits > --- > gcc/lto-section-in.c| 3 +- > gcc/lto-streamer-out.c | 64 +++- > gcc/lto-streamer.h | 1 + > include/lto-symtab.h| 12 +++ > lto-plugin/lto-plugin.c | 165 +++- > 5 files changed, 226 insertions(+), 19 deletions(-) > > diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c > index c17dd69dbdd..7bf59c513fc 100644 > --- a/gcc/lto-section-in.c > +++ b/gcc/lto-section-in.c > @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = >"mode_table", >"hsa", >"lto", > - "ipa_sra" > + "ipa_sra", > + "extension_symtab" I would call it symtab_ext so alphabetically it is coupled with symtab. I wonder if moving it up in the enum would make it physically appear next to .symtab in object file so we save a bit of i/o? > +static void > +produce_symtab_extension (struct output_block *ob) > +{ > + char *section_name = lto_get_section_name (LTO_section_symtab_extension, > + NULL, 0, NULL); > + lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; > + lto_symtab_encoder_iterator lsei; > + > + lto_begin_section (section_name, false); > + free (section_name); > + > + /* Write the symbol table. > + First write everything defined and then all declarations. > + This is necessary to handle cases where we have duplicated symbols. */ > + for (lsei = lsei_start (encoder); > + !lsei_end_p (lsei); lsei_next (&lsei)) > +{ > + symtab_node *node = lsei_node (lsei); > + > + if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p > ()) > + continue; > + write_symbol_extension_info (node->decl); > +} > + for (lsei = lsei_start (encoder); > + !lsei_end_p (lsei); lsei_next (&lsei)) > +{ > + symtab_node *node = lsei_node (lsei); > + > + if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p > ()) > + continue; > + write_symbol_extension_info (node->decl); > +} > + > + lto_end_section (); > +} It seems fragile to me to duplicate the loops that needs to be kept in sync because breaking that will likely get bit suprising outcome (i.e. bootstrap will work since we do not care about symbol types). Perhaps it would be more robust to simply push the extension bits into a vector in write_symbol for later streaming. Or at least add comment that loops needs to be kept in sync. > diff --git a/include/plugin-api.h b/include/plugin-api.h > index 09e1202df07..804684449cf 100644 > --- a/include/plugin-api.h > +++ b/include/plugin-api.h > @@ -87,7 +87,17 @@ struct ld_plugin_symbol > { >char *name; >char *version; > - int def; > +#ifdef __BIG_ENDIAN__ > + char unused; > + char section_flags; > + char symbol_type; > + char def; > +#else > + char def; > + char symbol_type; > + char section_flags; > + char unused; > +#endif Perhaps at least drop a comment why this is done this way (i.e. compatibility with V3 version of API) so in 10 years we know? Bit more systematic way would be to add a new hook to query extended properties of a given symbol. I.e. something like void *get_symbol_property (symbol, enum property) Honza
Re: [PATCH][RFC] API extension for binutils (type of symbols).
Hi. I'm sending extended version of the patch where I address Honza's note about backward compatibility. So I add a next section _lto.extension_symtab where I put new bits. With that, new LTO bytecode can be opened with older LTO plugin. Moreover, I've also tested lto.exp tests and binutils master. And I can confirm that patches work with H.J.'s counterpart: https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/plugin/master $ cat bss.c int global_zero; int global_one = 1; int main() { return 0; } $ gcc -c -flto bss.c Old bintuils (with new plugin): $ ./binutils/nm-new /tmp/bss.o --plugin /home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0 T global_one T global_zero T main H.J.'s binutils (with new plugin): $ ./binutils/nm-new /tmp/bss.o --plugin /home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0 D global_one B global_zero T main System binutils (with old plugin): $ nm /tmp/bss.o T global_one T global_zero T main Hope I'm done now. Thoughts? Thanks, Martin >From 8968bbdf627dc26056f1cb344e253eedae0efca1 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH 2/2] API extension for binutils (type of symbols). gcc/ChangeLog: 2020-03-12 Martin Liska * lto-section-in.c: Add extension_symtab. * lto-streamer-out.c (write_symbol_extension_info): New. (produce_symtab_extension): New. (produce_asm_for_decls): Stream also produce_symtab_extension. * lto-streamer.h (enum lto_section_type): New section. include/ChangeLog: 2020-03-12 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. (enum gcc_plugin_symbol_section_flags): Likewise. lto-plugin/ChangeLog: 2020-03-12 Martin Liska * lto-plugin.c (LTO_SECTION_PREFIX): Rename to ... (LTO_SYMTAB_PREFIX): ... this. (LTO_SECTION_PREFIX_LEN): Rename to ... (LTO_SYMTAB_PREFIX_LEN): ... this. (LTO_SYMTAB_EXT_PREFIX): New. (LTO_SYMTAB_EXT_PREFIX_LEN): New. (LTO_LTO_PREFIX): New. (LTO_LTO_PREFIX_LEN): New. (parse_table_entry): Fill up unused to zero. (parse_table_entry_extension): New. (parse_symtab_extension): New. (finish_conflict_resolution): Change type for resolution. (clear_new_symtab_flags): New. (write_resolution): Support new get_symbols_v4. (process_symtab): Use new macro name. (process_symtab_extension): New. (claim_file_handler): Parse also process_symtab_extension. (onload): Call new add_symbols_v2. --- gcc/lto-section-in.c| 3 +- gcc/lto-streamer-out.c | 64 +++- gcc/lto-streamer.h | 1 + include/lto-symtab.h| 12 +++ lto-plugin/lto-plugin.c | 165 +++- 5 files changed, 226 insertions(+), 19 deletions(-) diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c index c17dd69dbdd..7bf59c513fc 100644 --- a/gcc/lto-section-in.c +++ b/gcc/lto-section-in.c @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] = "mode_table", "hsa", "lto", - "ipa_sra" + "ipa_sra", + "extension_symtab" }; /* Hooks so that the ipa passes can call into the lto front end to get diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index cea5e71cffb..2fd20252ba5 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "print-tree.h" #include "tree-dfa.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "output.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -2777,6 +2778,25 @@ write_symbol (struct streamer_tree_cache_d *cache, lto_write_data (&slot_num, 4); } +/* Write extension information for symbols (symbol type, section flags). */ + +static void +write_symbol_extension_info (tree t) +{ + unsigned char c; + c = ((unsigned char) TREE_CODE (t) == VAR_DECL + ? GCCST_VARIABLE : GCCST_FUNCTION); + lto_write_data (&c, 1); + unsigned char section_flags = 0; + if (TREE_CODE (t) == VAR_DECL) +{ + section *s = get_variable_section (t, false); + if (s->common.flags & SECTION_BSS) + section_flags |= GCCSSS_BSS; +} + lto_write_data (§ion_flags, 1); +} + /* Write an IL symbol table to OB. SET and VSET are cgraph/varpool node sets we are outputting. */ @@ -2818,6 +2838,45 @@ produce_symtab (struct output_block *ob) lto_end_section (); } +/* Write an IL symbol table extension to OB. + SET and VSET are cgraph/varpool node sets we are outputting. */ + +static void +produce_symtab_extension (struct output_block *ob) +{ + char *section_name = lto_get_section_name (LTO_section_symtab_extension, + NULL, 0, NULL); + lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder; + lto_symtab_encoder_iterator lsei; + + lto_begin_section (section_name, false); + free (section_name); + + /* Write the symbol table. + First write everything defined and then all declarations. + This is necessa
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, 11 Mar 2020, Martin Liška wrote: > > Is there a comprehensive list of plugins out in the wild using the LD > > plugin API? > > I know only about: > $ ls /usr/lib/bfd-plugins > liblto_plugin.so.0.0.0 LLVMgold.so > > and I know about Alexander Monakov (some dead code elimination plug-in). I don't recall seeing any other plugin when working on our "debloating" stuff, but of course I suppose people could have used the plugin API for experimentation. I don't think a comprehensive list could exist. > > Note we also have to bring in gold folks (not sure if lld also > > implements the same plugin API) > > I don't see how can they be affected? As discussed downthread the other linkers could freely ignore the new callback, but fwiw LLD has no plans to implement the plugin-api.h interface: https://bugs.llvm.org/show_bug.cgi?id=42446 Alexander
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/11/20 1:51 PM, Richard Biener wrote: Splitting an existing field isn't hackish IMHO. I guess even explicitely changing it to one short and two char fields would be OK. Good, I'm doing that in the updated version of that patch. H.J. is right now working on the corresponding binutils counterpart. Thoughts? Martin >From 8a2e73edb53f26a1b6a543e2e2027b2661a48f94 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH] API extension for binutils (type of symbols). --- gcc/lto-streamer-out.c | 12 + include/lto-symtab.h| 12 + include/plugin-api.h| 30 - lto-plugin/lto-plugin.c | 99 + 4 files changed, 142 insertions(+), 11 deletions(-) diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index cea5e71cffb..b212be35b3d 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "print-tree.h" #include "tree-dfa.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "output.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -2773,6 +2774,17 @@ write_symbol (struct streamer_tree_cache_d *cache, lto_write_data (&c, 1); c = (unsigned char) visibility; lto_write_data (&c, 1); + c = ((unsigned char) TREE_CODE (t) == VAR_DECL + ? GCCST_VARIABLE : GCCST_FUNCTION); + lto_write_data (&c, 1); + unsigned char section_flags = 0; + if (TREE_CODE (t) == VAR_DECL) +{ + section *s = get_variable_section (t, false); + if (s->common.flags & SECTION_BSS) + section_flags |= GCCSSS_BSS; +} + lto_write_data (§ion_flags, 1); lto_write_data (&size, 8); lto_write_data (&slot_num, 4); } diff --git a/include/lto-symtab.h b/include/lto-symtab.h index 0ce0de10121..47f0ff27df8 100644 --- a/include/lto-symtab.h +++ b/include/lto-symtab.h @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility GCCPV_HIDDEN }; +enum gcc_plugin_symbol_type +{ + GCCST_UNKNOWN, + GCCST_FUNCTION, + GCCST_VARIABLE, +}; + +enum gcc_plugin_symbol_section_flags +{ + GCCSSS_BSS = 1 +}; + #endif /* GCC_LTO_SYMTAB_H */ diff --git a/include/plugin-api.h b/include/plugin-api.h index 09e1202df07..804684449cf 100644 --- a/include/plugin-api.h +++ b/include/plugin-api.h @@ -87,7 +87,17 @@ struct ld_plugin_symbol { char *name; char *version; - int def; +#ifdef __BIG_ENDIAN__ + char unused; + char section_flags; + char symbol_type; + char def; +#else + char def; + char symbol_type; + char section_flags; + char unused; +#endif int visibility; uint64_t size; char *comdat_key; @@ -123,6 +133,20 @@ enum ld_plugin_symbol_visibility LDPV_HIDDEN }; +/* The type of the symbol. */ + +enum ld_plugin_symbol_type +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE, +}; + +enum ld_plugin_symbol_section_flags +{ + LDSSS_BSS = 1 +}; + /* How a symbol is resolved. */ enum ld_plugin_symbol_resolution @@ -431,7 +455,9 @@ enum ld_plugin_tag LDPT_GET_INPUT_SECTION_ALIGNMENT = 29, LDPT_GET_INPUT_SECTION_SIZE = 30, LDPT_REGISTER_NEW_INPUT_HOOK = 31, - LDPT_GET_WRAP_SYMBOLS = 32 + LDPT_GET_WRAP_SYMBOLS = 32, + LDPT_ADD_SYMBOLS_V2 = 33, + LDPT_GET_SYMBOLS_V4 = 34, }; /* The plugin transfer vector. */ diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index c307fc871bf..bce9f183049 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -90,6 +90,8 @@ along with this program; see the file COPYING3. If not see #define LTO_SECTION_PREFIX ".gnu.lto_.symtab" #define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) +#define LTO_LTO_PREFIX ".gnu.lto_.lto" +#define LTO_LTO_PREFIX_LEN (sizeof (LTO_LTO_PREFIX) - 1) #define OFFLOAD_SECTION ".gnu.offload_lto_.opts" #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1) @@ -154,12 +156,12 @@ enum symbol_style static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; -static ld_plugin_get_symbols get_symbols, get_symbols_v2; +static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; static ld_plugin_message message; -static ld_plugin_add_symbols add_symbols; +static ld_plugin_add_symbols add_symbols, add_symbols_v2; static struct plugin_file_info *claimed_files = NULL; static unsigned int num_claimed_files = 0; @@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text) } } +/* Structure that represents LTO ELF section with information + about the format. */ + +struct lto_section + { + int16_t major_version; + int16_t minor_version; + unsigned char slim_object: 1; + unsigned char compression: 4; + int32_t reserved0: 27; +}; +
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/11/20 2:24 PM, H.J. Lu wrote: ld-new don't have to use the new interface since it isn't needed. Yeah. We talk about nm that should utilize the new API, right? Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, Mar 11, 2020 at 6:09 AM Martin Liška wrote: > > On 3/11/20 1:51 PM, Richard Biener wrote: > > I'm not sure I understand the versioning, we should aim at something where > > an updated plugin can talk to old and new ld and where a new ld can also > > talk > > to an old plugin. That requires an arbitration which I don't see > > implemented? > > So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, > these contain ld-new don't have to use the new interface since it isn't needed. -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/11/20 1:51 PM, Richard Biener wrote: I'm not sure I understand the versioning, we should aim at something where an updated plugin can talk to old and new ld and where a new ld can also talk to an old plugin. That requires an arbitration which I don't see implemented? So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, these contain ld_plugin_symbol_v2 with filled ::symbol_type. ld-new will check for he new API with LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2 and will tv_add_symbols and tv_get_symbols callbacks. ld-old will use the old tv_add_symbols and tv_get_symbols callback. One another level of compatibility is the LTO bytecode, where new plugin should handle also old LTO bytecode. That's done with parsing of LTO bytecode version (lto_section). Splitting an existing field isn't hackish IMHO. I guess even explicitely changing it to one short and two char fields would be OK. Then I'm fine with that as well. Am I right that the split order will depend of the endianess? Is there a comprehensive list of plugins out in the wild using the LD plugin API? I know only about: $ ls /usr/lib/bfd-plugins liblto_plugin.so.0.0.0 LLVMgold.so and I know about Alexander Monakov (some dead code elimination plug-in). Note we also have to bring in gold folks (not sure if lld also implements the same plugin API) I don't see how can they be affected? Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, Mar 11, 2020 at 1:22 PM Martin Liška wrote: > > On 3/11/20 11:22 AM, Richard Biener wrote: > > On Wed, Mar 11, 2020 at 10:19 AM Martin Liška wrote: > >> > >> On 3/10/20 1:07 PM, Martin Liška wrote: > >>> On 3/10/20 12:24 PM, Richard Biener wrote: > Not sure how symtab is encoded right now but we also could have > >>> > >>> Ok, right now I don't see symtab entry much extensible. > >>> > >>> But what am I suggesting is to parse LTO bytecode version and then > >>> process conditional parsing of lto_symtab section. > >>> > >>> Thoughts? > >>> Martin > >> > >> So as H.J. correctly pointed I can't add new symbol_type into > >> ld_plugin_symbol struct. > >> It would make ABI change as correctly identified by abidiff: > >> > >> abidiff /tmp/before.o /tmp/after.o > >> Functions changes summary: 0 Removed, 1 Changed, 0 Added function > >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable > >> Function symbols changes summary: 0 Removed, 0 Added function symbol not > >> referenced by debug info > >> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not > >> referenced by debug info > >> > >> 1 function with some indirect sub-type change: > >> > >> [C]'function ld_plugin_status onload(ld_plugin_tv*)' at > >> lto-plugin.c:1275:1 has some indirect sub-type changes: > >> parameter 1 of type 'ld_plugin_tv*' has sub-type changes: > >> in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1: > >> type size hasn't changed > >> 1 data member changes (1 filtered): > >>type of 'union {int tv_val; const char* tv_string; > >> ld_plugin_register_claim_file tv_register_claim_file; > >> ld_plugin_register_all_symbols_read tv_register_all_symbols_read; > >> ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols > >> tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; > >> ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; > >> ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view > >> tv_get_view; ld_plugin_release_input_file tv_release_input_file; > >> ld_plugin_add_input_library tv_add_input_library; > >> ld_plugin_set_extra_library_path tv_set_extra_library_path; > >> ld_plugin_get_input_section_count tv_get_input_section_count; > >> ld_plugin_get_input_section_type tv_get_input_section_type; > >> ld_plugin_get_input_section_name tv_get_input_section_name; > >> ld_plugin_get_input_section_contents tv_get_input_section_contents; > >> ld_plugin_update_section_order tv_update_section_order; > >> ld_plugin_allow_section_ordering tv_allow_section_ordering; > >> ld_plugin_allow_unique_segment_for_sections > >> tv_allow_unique_segment_for_sections; > >> ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; > >> ld_plugin_get_input_section_alignment tv_get_input_section_alignment; > >> ld_plugin_get_input_section_size tv_get_input_section_size; > >> ld_plugin_register_new_input tv_register_new_input; > >> ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' > >> changed: > >> type size hasn't changed > >> 1 data member changes (1 filtered): > >> type of 'ld_plugin_add_symbols tv_add_symbols' changed: > >> underlying type 'enum ld_plugin_status (void*, int, const > >> ld_plugin_symbol*)*' changed: > >> in pointed to type 'function type enum ld_plugin_status > >> (void*, int, const ld_plugin_symbol*)': > >> parameter 3 of type 'const ld_plugin_symbol*' has > >> sub-type changes: > >> in pointed to type 'const ld_plugin_symbol': > >> in unqualified underlying type 'struct > >> ld_plugin_symbol' at plugin-api.h:86:1: > >> type size changed from 256 to 288 (in bits) > >> 1 data member insertion: > >> 'int ld_plugin_symbol::symbol_type', at offset > >> 256 (in bits) at plugin-api.h:95:1 > >> > >> So that I need to come up with ld_plugin_symbol_v2. It brings more > >> challenges: one has 2 parallel symbol > >> tables: > >> > >> struct plugin_symtab > >> { > >> ... > >> struct ld_plugin_symbol_v2 *syms; > >> struct ld_plugin_symbol *syms_v1; > >> ... > >> }; > >> > >> and the information of these should by aligned. > >> > >> The patch can survive lto.exp and I would like to ask H.J. to write > >> bintuils counterpart that will > >> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2. > >> > >> Thoughts? > > > > Can't we simply have _V4/V2 use the upper half of > > ld_plugin_symbol::def? If the linker > > then requests _V4 but the plugin cannot cope it could still "use" the > > data but get > > LDST_UNKNOWN (zero) there. > > Can be possible, but it's hack a bit. The plugin has a mechanisms for > versioning > and this change does not align with the idea. I'm not sure I understand the versioning, we
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/11/20 11:22 AM, Richard Biener wrote: On Wed, Mar 11, 2020 at 10:19 AM Martin Liška wrote: On 3/10/20 1:07 PM, Martin Liška wrote: On 3/10/20 12:24 PM, Richard Biener wrote: Not sure how symtab is encoded right now but we also could have Ok, right now I don't see symtab entry much extensible. But what am I suggesting is to parse LTO bytecode version and then process conditional parsing of lto_symtab section. Thoughts? Martin So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct. It would make ABI change as correctly identified by abidiff: abidiff /tmp/before.o /tmp/after.o Functions changes summary: 0 Removed, 1 Changed, 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info 1 function with some indirect sub-type change: [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes: parameter 1 of type 'ld_plugin_tv*' has sub-type changes: in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1: type size hasn't changed 1 data member changes (1 filtered): type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed: type size hasn't changed 1 data member changes (1 filtered): type of 'ld_plugin_add_symbols tv_add_symbols' changed: underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed: in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)': parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes: in pointed to type 'const ld_plugin_symbol': in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1: type size changed from 256 to 288 (in bits) 1 data member insertion: 'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1 So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol tables: struct plugin_symtab { ... struct ld_plugin_symbol_v2 *syms; struct ld_plugin_symbol *syms_v1; ... }; and the information of these should by aligned. The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2. Thoughts? Can't we simply have _V4/V2 use the upper half of ld_plugin_symbol::def? If the linker then requests _V4 but the plugin cannot cope it could still "use" the data but get LDST_UNKNOWN (zero) there. Can be possible, but it's hack a bit. The plugin has a mechanisms for versioning and this change does not align with the idea. IMHO LDST_VARIABLE_BSS is "misplaced"? "BSS" is the section of the variable. Yes. If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC. Note there's also rodata vs data info that would be missing in case we'd want tools like readelf -s dump the symbol table of the IL part of an object. It looks like nm can also distinguish rodata from data ("R", "r" vs "d") and "small object" data sections (not sure what's that about). It seems nm cannot distinguish symbols in mergeable string sections (it dumps "R" for me there). So intead of mangling everything into enum ld
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, Mar 11, 2020 at 11:22 AM Richard Biener wrote: > > On Wed, Mar 11, 2020 at 10:19 AM Martin Liška wrote: > > > > On 3/10/20 1:07 PM, Martin Liška wrote: > > > On 3/10/20 12:24 PM, Richard Biener wrote: > > >> Not sure how symtab is encoded right now but we also could have > > > > > > Ok, right now I don't see symtab entry much extensible. > > > > > > But what am I suggesting is to parse LTO bytecode version and then > > > process conditional parsing of lto_symtab section. > > > > > > Thoughts? > > > Martin > > > > So as H.J. correctly pointed I can't add new symbol_type into > > ld_plugin_symbol struct. > > It would make ABI change as correctly identified by abidiff: > > > > abidiff /tmp/before.o /tmp/after.o > > Functions changes summary: 0 Removed, 1 Changed, 0 Added function > > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable > > Function symbols changes summary: 0 Removed, 0 Added function symbol not > > referenced by debug info > > Variable symbols changes summary: 0 Removed, 1 Added variable symbol not > > referenced by debug info > > > > 1 function with some indirect sub-type change: > > > >[C]'function ld_plugin_status onload(ld_plugin_tv*)' at > > lto-plugin.c:1275:1 has some indirect sub-type changes: > > parameter 1 of type 'ld_plugin_tv*' has sub-type changes: > >in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1: > > type size hasn't changed > > 1 data member changes (1 filtered): > > type of 'union {int tv_val; const char* tv_string; > > ld_plugin_register_claim_file tv_register_claim_file; > > ld_plugin_register_all_symbols_read tv_register_all_symbols_read; > > ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols > > tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; > > ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; > > ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; > > ld_plugin_release_input_file tv_release_input_file; > > ld_plugin_add_input_library tv_add_input_library; > > ld_plugin_set_extra_library_path tv_set_extra_library_path; > > ld_plugin_get_input_section_count tv_get_input_section_count; > > ld_plugin_get_input_section_type tv_get_input_section_type; > > ld_plugin_get_input_section_name tv_get_input_section_name; > > ld_plugin_get_input_section_contents tv_get_input_section_contents; > > ld_plugin_update_section_order tv_update_section_order; > > ld_plugin_allow_section_ordering tv_allow_section_ordering; > > ld_plugin_allow_unique_segment_for_sections > > tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections > > tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment > > tv_get_input_section_alignment; ld_plugin_get_input_section_size > > tv_get_input_section_size; ld_plugin_register_new_input > > tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} > > ld_plugin_tv::tv_u' changed: > > type size hasn't changed > > 1 data member changes (1 filtered): > > type of 'ld_plugin_add_symbols tv_add_symbols' changed: > >underlying type 'enum ld_plugin_status (void*, int, const > > ld_plugin_symbol*)*' changed: > > in pointed to type 'function type enum ld_plugin_status > > (void*, int, const ld_plugin_symbol*)': > >parameter 3 of type 'const ld_plugin_symbol*' has > > sub-type changes: > > in pointed to type 'const ld_plugin_symbol': > >in unqualified underlying type 'struct > > ld_plugin_symbol' at plugin-api.h:86:1: > > type size changed from 256 to 288 (in bits) > > 1 data member insertion: > >'int ld_plugin_symbol::symbol_type', at offset > > 256 (in bits) at plugin-api.h:95:1 > > > > So that I need to come up with ld_plugin_symbol_v2. It brings more > > challenges: one has 2 parallel symbol > > tables: > > > > struct plugin_symtab > > { > > ... > >struct ld_plugin_symbol_v2 *syms; > >struct ld_plugin_symbol *syms_v1; > > ... > > }; > > > > and the information of these should by aligned. > > > > The patch can survive lto.exp and I would like to ask H.J. to write > > bintuils counterpart that will > > utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2. > > > > Thoughts? > > Can't we simply have _V4/V2 use the upper half of > ld_plugin_symbol::def? If the linker > then requests _V4 but the plugin cannot cope it could still "use" the > data but get > LDST_UNKNOWN (zero) there. > > IMHO LDST_VARIABLE_BSS is "misplaced"? "BSS" is the section of the variable. > If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC. > Note there's also rodata vs data info that would be missing in case > we'd want tools > like readelf -s dump the symbol table of the IL part of an object. It > looks like > nm can also
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Wed, Mar 11, 2020 at 10:19 AM Martin Liška wrote: > > On 3/10/20 1:07 PM, Martin Liška wrote: > > On 3/10/20 12:24 PM, Richard Biener wrote: > >> Not sure how symtab is encoded right now but we also could have > > > > Ok, right now I don't see symtab entry much extensible. > > > > But what am I suggesting is to parse LTO bytecode version and then > > process conditional parsing of lto_symtab section. > > > > Thoughts? > > Martin > > So as H.J. correctly pointed I can't add new symbol_type into > ld_plugin_symbol struct. > It would make ABI change as correctly identified by abidiff: > > abidiff /tmp/before.o /tmp/after.o > Functions changes summary: 0 Removed, 1 Changed, 0 Added function > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable > Function symbols changes summary: 0 Removed, 0 Added function symbol not > referenced by debug info > Variable symbols changes summary: 0 Removed, 1 Added variable symbol not > referenced by debug info > > 1 function with some indirect sub-type change: > >[C]'function ld_plugin_status onload(ld_plugin_tv*)' at > lto-plugin.c:1275:1 has some indirect sub-type changes: > parameter 1 of type 'ld_plugin_tv*' has sub-type changes: >in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1: > type size hasn't changed > 1 data member changes (1 filtered): > type of 'union {int tv_val; const char* tv_string; > ld_plugin_register_claim_file tv_register_claim_file; > ld_plugin_register_all_symbols_read tv_register_all_symbols_read; > ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols > tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; > ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; > ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; > ld_plugin_release_input_file tv_release_input_file; > ld_plugin_add_input_library tv_add_input_library; > ld_plugin_set_extra_library_path tv_set_extra_library_path; > ld_plugin_get_input_section_count tv_get_input_section_count; > ld_plugin_get_input_section_type tv_get_input_section_type; > ld_plugin_get_input_section_name tv_get_input_section_name; > ld_plugin_get_input_section_contents tv_get_input_section_contents; > ld_plugin_update_section_order tv_update_section_order; > ld_plugin_allow_section_ordering tv_allow_section_ordering; > ld_plugin_allow_unique_segment_for_sections > tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections > tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment > tv_get_input_section_alignment; ld_plugin_get_input_section_size > tv_get_input_section_size; ld_plugin_register_new_input > tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} > ld_plugin_tv::tv_u' changed: > type size hasn't changed > 1 data member changes (1 filtered): > type of 'ld_plugin_add_symbols tv_add_symbols' changed: >underlying type 'enum ld_plugin_status (void*, int, const > ld_plugin_symbol*)*' changed: > in pointed to type 'function type enum ld_plugin_status > (void*, int, const ld_plugin_symbol*)': >parameter 3 of type 'const ld_plugin_symbol*' has sub-type > changes: > in pointed to type 'const ld_plugin_symbol': >in unqualified underlying type 'struct > ld_plugin_symbol' at plugin-api.h:86:1: > type size changed from 256 to 288 (in bits) > 1 data member insertion: >'int ld_plugin_symbol::symbol_type', at offset 256 > (in bits) at plugin-api.h:95:1 > > So that I need to come up with ld_plugin_symbol_v2. It brings more > challenges: one has 2 parallel symbol > tables: > > struct plugin_symtab > { > ... >struct ld_plugin_symbol_v2 *syms; >struct ld_plugin_symbol *syms_v1; > ... > }; > > and the information of these should by aligned. > > The patch can survive lto.exp and I would like to ask H.J. to write bintuils > counterpart that will > utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2. > > Thoughts? Can't we simply have _V4/V2 use the upper half of ld_plugin_symbol::def? If the linker then requests _V4 but the plugin cannot cope it could still "use" the data but get LDST_UNKNOWN (zero) there. IMHO LDST_VARIABLE_BSS is "misplaced"? "BSS" is the section of the variable. If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC. Note there's also rodata vs data info that would be missing in case we'd want tools like readelf -s dump the symbol table of the IL part of an object. It looks like nm can also distinguish rodata from data ("R", "r" vs "d") and "small object" data sections (not sure what's that about). It seems nm cannot distinguish symbols in mergeable string sections (it dumps "R" for me there). So intead of mangling everything into enum ld_plugin_symbol_t
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/10/20 1:07 PM, Martin Liška wrote: On 3/10/20 12:24 PM, Richard Biener wrote: Not sure how symtab is encoded right now but we also could have Ok, right now I don't see symtab entry much extensible. But what am I suggesting is to parse LTO bytecode version and then process conditional parsing of lto_symtab section. Thoughts? Martin So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol struct. It would make ABI change as correctly identified by abidiff: abidiff /tmp/before.o /tmp/after.o Functions changes summary: 0 Removed, 1 Changed, 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info 1 function with some indirect sub-type change: [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes: parameter 1 of type 'ld_plugin_tv*' has sub-type changes: in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1: type size hasn't changed 1 data member changes (1 filtered): type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed: type size hasn't changed 1 data member changes (1 filtered): type of 'ld_plugin_add_symbols tv_add_symbols' changed: underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed: in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)': parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes: in pointed to type 'const ld_plugin_symbol': in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1: type size changed from 256 to 288 (in bits) 1 data member insertion: 'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1 So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol tables: struct plugin_symtab { ... struct ld_plugin_symbol_v2 *syms; struct ld_plugin_symbol *syms_v1; ... }; and the information of these should by aligned. The patch can survive lto.exp and I would like to ask H.J. to write bintuils counterpart that will utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2. Thoughts? Martin >From 1271e9cc487847c1aa07e5bc0470e8c5b71900e9 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH] API extension for binutils (type of symbols). gcc/ChangeLog: 2020-03-11 Martin Liska * lto-streamer-out.c (write_symbol): Stream type of symbol (function, variable - bss/data). include/ChangeLog: 2020-03-11 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. * plugin-api.h (struct ld_plugin_symbol_v2): New ld_plugin_symbol structure with added symbol_type. (enum ld_plugin_symbol_type): New. (ld_plugin_add_symbols_v2): New. (ld_plugin_get_symbols_v2): New. (ld_plugin_tag): Add LDPT_GET_SYMBOLS_V4 and LDPT_ADD_SYMBOLS_V2. (struct ld_plugin_tv): Add tv_add_symbols_v2 and tv_get_symbols_v4. lto-plugin/ChangeLog: 2020-03-11 Martin Liska * lto-plugin.c (LTO_LTO_PREFIX): New. (LTO_LTO_PREFIX_LEN): Likewise. (struct plugin_symtab): Make syms of new type ld_plugin_symbol_v2 and add syms_v1 for older API hooks. (struct lto_section): New. (parse_table_entr
Re: [PATCH][RFC] API extension for binutils (type of symbols).
Hello, On Tue, 10 Mar 2020, Martin Liška wrote: > >>> We nee to support different variables, like TLS, data and bss variables. > >> > >> Why do we need TLS? Right now, it's not supported by nm. > > > > Of course it does. It's the 'T' (or 't') character. > > Thank you reply! > > Are you sure about it? Err, I bungled in between writing emails, you are right, nm(1) in BSD mode doesn't make a difference for TLS symbols. (And of course T/t are for text, aka code, symbols). Doesn't invalidate the rest of my email, though. Ciao, Michael.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/10/20 10:39 AM, Martin Liška wrote: Are you sure about it? $ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o ... D s 0010 D t ? A nicer example: $ cat tls.c __thread struct S { long a, b; } s = { 0, 0 }; __thread char t[16] = { 7 }; $ gcc tls.c -c && nm -B tls.o B s D t Well, TLS seems to me an orthogonal problem. Similarly other special symbols like .func, .symver, alias attribute, ... Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/10/20 12:24 PM, Richard Biener wrote: Not sure how symtab is encoded right now but we also could have Ok, right now I don't see symtab entry much extensible. But what am I suggesting is to parse LTO bytecode version and then process conditional parsing of lto_symtab section. Thoughts? Martin >From 017dd9cba9a0222104682bef094d9f5057d2c9ae Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH] API extension for binutils (type of symbols). gcc/ChangeLog: 2020-03-09 Martin Liska * lto-streamer-out.c (write_symbol): Stream symbol type. include/ChangeLog: 2020-03-09 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. * plugin-api.h (struct ld_plugin_symbol): New member symbols_type. (enum ld_plugin_symbol_type): New. (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4. lto-plugin/ChangeLog: 2020-03-09 Martin Liska * lto-plugin.c (parse_table_entry): Parse symbol type. (LTO_LTO_PREFIX): New. (LTO_LTO_PREFIX_LEN): New. (struct lto_section): New. --- gcc/lto-streamer-out.c | 14 include/lto-symtab.h| 8 +++ include/plugin-api.h| 14 +++- lto-plugin/lto-plugin.c | 48 - 4 files changed, 82 insertions(+), 2 deletions(-) diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index cea5e71cffb..ead606eb665 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "print-tree.h" #include "tree-dfa.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "output.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache, lto_write_data (&c, 1); c = (unsigned char) visibility; lto_write_data (&c, 1); + + gcc_plugin_symbol_type st; + if (TREE_CODE (t) == VAR_DECL) +{ + section *s = get_variable_section (t, false); + st = (s->common.flags & SECTION_BSS + ? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA); +} + else +st = GCCST_FUNCTION; + + c = (unsigned char) st; + lto_write_data (&c, 1); lto_write_data (&size, 8); lto_write_data (&slot_num, 4); } diff --git a/include/lto-symtab.h b/include/lto-symtab.h index 0ce0de10121..901bc3585c2 100644 --- a/include/lto-symtab.h +++ b/include/lto-symtab.h @@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility GCCPV_HIDDEN }; +enum gcc_plugin_symbol_type +{ + GCCST_UNKNOWN, + GCCST_FUNCTION, + GCCST_VARIABLE_DATA, + GCCST_VARIABLE_BSS +}; + #endif /* GCC_LTO_SYMTAB_H */ diff --git a/include/plugin-api.h b/include/plugin-api.h index 09e1202df07..794a2dcc4ee 100644 --- a/include/plugin-api.h +++ b/include/plugin-api.h @@ -92,6 +92,7 @@ struct ld_plugin_symbol uint64_t size; char *comdat_key; int resolution; + int symbol_type; }; /* An object's section. */ @@ -123,6 +124,16 @@ enum ld_plugin_symbol_visibility LDPV_HIDDEN }; +/* The type of the symbol. */ + +enum ld_plugin_symbol_type +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE_DATA, + LDST_VARIABLE_BSS +}; + /* How a symbol is resolved. */ enum ld_plugin_symbol_resolution @@ -431,7 +442,8 @@ enum ld_plugin_tag LDPT_GET_INPUT_SECTION_ALIGNMENT = 29, LDPT_GET_INPUT_SECTION_SIZE = 30, LDPT_REGISTER_NEW_INPUT_HOOK = 31, - LDPT_GET_WRAP_SYMBOLS = 32 + LDPT_GET_WRAP_SYMBOLS = 32, + LDPT_GET_SYMBOLS_V4 = 33, }; /* The plugin transfer vector. */ diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index c307fc871bf..33afae9afb6 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -90,6 +90,8 @@ along with this program; see the file COPYING3. If not see #define LTO_SECTION_PREFIX ".gnu.lto_.symtab" #define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) +#define LTO_LTO_PREFIX ".gnu.lto_.lto" +#define LTO_LTO_PREFIX_LEN (sizeof (LTO_LTO_PREFIX) - 1) #define OFFLOAD_SECTION ".gnu.offload_lto_.opts" #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1) @@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text) } } +/* Structure that represents LTO ELF section with information + about the format. */ + +struct lto_section + { + int16_t major_version; + int16_t minor_version; + unsigned char slim_object: 1; + unsigned char compression: 4; + int32_t reserved0: 27; +}; + +struct lto_section lto_header; + /* This little wrapper allows check to be called with a non-integer first argument, such as a pointer that must be non-NULL. We can't use c99 bool type to coerce it into range, so we explicitly test. */ @@ -252,6 +268,14 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry, LDPV_HIDDEN }; + enum ld_plugin_symbol_type symbol_types[] = +{ + LDST_UNKNOWN, + LDST_FUNCTION, + LDST_VARIABLE_DATA, + LDST_VARIABLE_BSS +}; + switc
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Tue, Mar 10, 2020 at 12:09 PM Jan Hubicka wrote: > > > > >>> @Honza/Richi: Do you have any opinion about that? > > > > > > > > I guess we indeed want to get as close to non-LTO nm behaviour as > > > > possible. So we want to support them and perhaps think of .symtab > > > > section file format that can be made backward compatible (such as having > > > > attribute string for symbols where we can add new info in future in a > > > > way that old plugins will still get info they want). > > > > > > I like the idea. But it's probably next stage1 material. Or can you > > > prepare > > > a patch? > > > > I think what's important is that the LTO plugin needs to understand > > the old and the new version since there's only one for auto-loading. > > > > The other missing feature of the linker plugin API is file claiming > > which should be a on a section basis instead - but that's a different > > part of the API and not related to symbol tables. Enhancing that > > part of the API would allow to elide the LTO debug copying ... > > Thinking of it, it seems to me that we do not need to break > compatibility with existing plugins. We could keep existing .symtab > section the way it is implemented right now > and add additional data to new .symtab_ext section so existing plugins > will work as expeted. > > We could tag symtab_ext by a version string and keep adding extensions > of extensions in the future being compatible with old plugins. Not sure how symtab is encoded right now but we also could have entry1: symbol entry2: ^ext-ver-n entry3: ^ext-ver-m entry4: symbol and ^ext-ver-X being escape byte(s) denoting we're providing additional data for the preceeding symbol. Not sure if there's something conveniently available in the current encoding that would make older plugins skip an entry ;) > Indeed that would help situation user already has distro provided plugin > in the search path but compiles its own gcc10. > > Honza
Re: [PATCH][RFC] API extension for binutils (type of symbols).
> > >>> @Honza/Richi: Do you have any opinion about that? > > > > > > I guess we indeed want to get as close to non-LTO nm behaviour as > > > possible. So we want to support them and perhaps think of .symtab > > > section file format that can be made backward compatible (such as having > > > attribute string for symbols where we can add new info in future in a > > > way that old plugins will still get info they want). > > > > I like the idea. But it's probably next stage1 material. Or can you prepare > > a patch? > > I think what's important is that the LTO plugin needs to understand > the old and the new version since there's only one for auto-loading. > > The other missing feature of the linker plugin API is file claiming > which should be a on a section basis instead - but that's a different > part of the API and not related to symbol tables. Enhancing that > part of the API would allow to elide the LTO debug copying ... Thinking of it, it seems to me that we do not need to break compatibility with existing plugins. We could keep existing .symtab section the way it is implemented right now and add additional data to new .symtab_ext section so existing plugins will work as expeted. We could tag symtab_ext by a version string and keep adding extensions of extensions in the future being compatible with old plugins. Indeed that would help situation user already has distro provided plugin in the search path but compiles its own gcc10. Honza
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Tue, Mar 10, 2020 at 11:05 AM Martin Liška wrote: > > On 3/9/20 9:19 PM, Jan Hubicka wrote: > >> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška wrote: > >>> > >>> On 3/9/20 4:36 PM, H.J. Lu wrote: > We nee to support different variables, like TLS, data and bss variables. > >>> > >>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? > >> > >> Since you are introducing symbol types, why not support TLS? > >> > >>> About BSS and DATA I agree that it would be handy. I can theoretically > >>> covered with code in get_variable_section/bss_initializer_p. But it's > >>> quite logic and I'm not sure we should simulate it. > > > > I think it should not be that hard to factor out the logic from > > get_variable_section to return enum of what we want to do and then > > have get_variale_section as a wrapper parsing this enum to actual > > section. > > So it was easier that I expected and I'm sending updated version > of the patch. > > >>> > >>> @Honza/Richi: Do you have any opinion about that? > > > > I guess we indeed want to get as close to non-LTO nm behaviour as > > possible. So we want to support them and perhaps think of .symtab > > section file format that can be made backward compatible (such as having > > attribute string for symbols where we can add new info in future in a > > way that old plugins will still get info they want). > > I like the idea. But it's probably next stage1 material. Or can you prepare > a patch? I think what's important is that the LTO plugin needs to understand the old and the new version since there's only one for auto-loading. The other missing feature of the linker plugin API is file claiming which should be a on a section basis instead - but that's a different part of the API and not related to symbol tables. Enhancing that part of the API would allow to elide the LTO debug copying ... > > > > Of course IPA optimizations may migrate symbols around (say from data to > > bss)/take them away/rename them, but with that we need to live. I would > > expect most tools inspecting nm are interested in what will enter > > linking not what will be in final output. > > Yes, there are mostly used during configure script run where they commonly > do not take final linked executables/shared libs. > > > > > Since we discuss plugin extensions (and I do not want this to complicate > > finishing Martin's patch). > > Please suggest that in another patch. > > The current situation with binutils is bad because we can't build distribution > with -fno-common and LTO. > > Martin > > > Are we aware of other plugin limitations? > > One thing that I consider unsafe is the way we produce local names when > > we need to promote symbol to hidden due to partitining. We add > > .lto_priv, but that is not safe if we link with .o file that was > > incrementally lto-optimized to target object file (this is reason why I > > did not enabled WHOPR path for it). > > > > We may also want to inform lld and llvm's gold plugin maintainers about > > intended changes. > > Honza > >>> > >>> Thanks, > >>> Martin > >> > >> > >> > >> -- > >> H.J. >
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/9/20 9:19 PM, Jan Hubicka wrote: On Mon, Mar 9, 2020 at 9:56 AM Martin Liška wrote: On 3/9/20 4:36 PM, H.J. Lu wrote: We nee to support different variables, like TLS, data and bss variables. Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? Since you are introducing symbol types, why not support TLS? About BSS and DATA I agree that it would be handy. I can theoretically covered with code in get_variable_section/bss_initializer_p. But it's quite logic and I'm not sure we should simulate it. I think it should not be that hard to factor out the logic from get_variable_section to return enum of what we want to do and then have get_variale_section as a wrapper parsing this enum to actual section. So it was easier that I expected and I'm sending updated version of the patch. @Honza/Richi: Do you have any opinion about that? I guess we indeed want to get as close to non-LTO nm behaviour as possible. So we want to support them and perhaps think of .symtab section file format that can be made backward compatible (such as having attribute string for symbols where we can add new info in future in a way that old plugins will still get info they want). I like the idea. But it's probably next stage1 material. Or can you prepare a patch? Of course IPA optimizations may migrate symbols around (say from data to bss)/take them away/rename them, but with that we need to live. I would expect most tools inspecting nm are interested in what will enter linking not what will be in final output. Yes, there are mostly used during configure script run where they commonly do not take final linked executables/shared libs. Since we discuss plugin extensions (and I do not want this to complicate finishing Martin's patch). Please suggest that in another patch. The current situation with binutils is bad because we can't build distribution with -fno-common and LTO. Martin Are we aware of other plugin limitations? One thing that I consider unsafe is the way we produce local names when we need to promote symbol to hidden due to partitining. We add .lto_priv, but that is not safe if we link with .o file that was incrementally lto-optimized to target object file (this is reason why I did not enabled WHOPR path for it). We may also want to inform lld and llvm's gold plugin maintainers about intended changes. Honza Thanks, Martin -- H.J. >From 4214743fc011fd8900a89166759a5511d8da5da2 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 6 Mar 2020 18:09:35 +0100 Subject: [PATCH] API extension for binutils (type of symbols). gcc/ChangeLog: 2020-03-09 Martin Liska * lto-streamer-out.c (write_symbol): Stream symbol type. include/ChangeLog: 2020-03-09 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. * plugin-api.h (struct ld_plugin_symbol): New member symbols_type. (enum ld_plugin_symbol_type): New. (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4. lto-plugin/ChangeLog: 2020-03-09 Martin Liska * lto-plugin.c (parse_table_entry): Parse symbol type. --- gcc/lto-streamer-out.c | 14 ++ include/lto-symtab.h| 8 include/plugin-api.h| 14 +- lto-plugin/lto-plugin.c | 13 + 4 files changed, 48 insertions(+), 1 deletion(-) diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index cea5e71cffb..ead606eb665 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "print-tree.h" #include "tree-dfa.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "output.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache, lto_write_data (&c, 1); c = (unsigned char) visibility; lto_write_data (&c, 1); + + gcc_plugin_symbol_type st; + if (TREE_CODE (t) == VAR_DECL) +{ + section *s = get_variable_section (t, false); + st = (s->common.flags & SECTION_BSS + ? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA); +} + else +st = GCCST_FUNCTION; + + c = (unsigned char) st; + lto_write_data (&c, 1); lto_write_data (&size, 8); lto_write_data (&slot_num, 4); } diff --git a/include/lto-symtab.h b/include/lto-symtab.h index 0ce0de10121..901bc3585c2 100644 --- a/include/lto-symtab.h +++ b/include/lto-symtab.h @@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility GCCPV_HIDDEN }; +enum gcc_plugin_symbol_type +{ + GCCST_UNKNOWN, + GCCST_FUNCTION, + GCCST_VARIABLE_DATA, + GCCST_VARIABLE_BSS +}; + #endif /* GCC_LTO_SYMTAB_H */ diff --git a/include/plugin-api.h b/include/plugin-api.h index 09e1202df07..794a2dcc4ee 100644 --- a/include/plugin-api.h +++ b/include/plugin-api.h @@ -92,6 +92,7 @@ struct ld_plugin_symbol uint64_t size; char *comdat_key; int resolution; + int symbol_type; }; /* An object's section. */ @@ -123,6 +124,16 @@ enum ld_plugin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/9/20 8:45 PM, Michael Matz wrote: Hello, On Mon, 9 Mar 2020, Martin Liška wrote: On 3/9/20 4:36 PM, H.J. Lu wrote: We nee to support different variables, like TLS, data and bss variables. Why do we need TLS? Right now, it's not supported by nm. Of course it does. It's the 'T' (or 't') character. Thank you reply! Are you sure about it? $ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o ... D s 0010 D t ? When you introduce symbol categories into the plugin system it would be advisable to include all we usually care about, and as the ELF categories are (roughly) a superset of everything we support, I'd say that should be the list to look at. I.e. a mixture of visibility, locality (aka binding) and type: I agree with that. {object,function,common,tls} Here LDPK_COMMON is already handled. x {local,global,weak,unique} x {default,internal,hidden,protected} That doesn't include symbols types section,file,ifunc or os or arch specific types or visibilities or bindings. But it would probably not be the worst idea to simply encode what we need with ELF constants and names. While not all the world is ELF, all concepts we have can be mapped onto ELF. Ciao, Michael.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška wrote: > > > > On 3/9/20 4:36 PM, H.J. Lu wrote: > > > We nee to support different variables, like TLS, data and bss variables. > > > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? > > Since you are introducing symbol types, why not support TLS? > > > About BSS and DATA I agree that it would be handy. I can theoretically > > covered with code in get_variable_section/bss_initializer_p. But it's > > quite logic and I'm not sure we should simulate it. I think it should not be that hard to factor out the logic from get_variable_section to return enum of what we want to do and then have get_variale_section as a wrapper parsing this enum to actual section. > > > > @Honza/Richi: Do you have any opinion about that? I guess we indeed want to get as close to non-LTO nm behaviour as possible. So we want to support them and perhaps think of .symtab section file format that can be made backward compatible (such as having attribute string for symbols where we can add new info in future in a way that old plugins will still get info they want). Of course IPA optimizations may migrate symbols around (say from data to bss)/take them away/rename them, but with that we need to live. I would expect most tools inspecting nm are interested in what will enter linking not what will be in final output. Since we discuss plugin extensions (and I do not want this to complicate finishing Martin's patch). Are we aware of other plugin limitations? One thing that I consider unsafe is the way we produce local names when we need to promote symbol to hidden due to partitining. We add .lto_priv, but that is not safe if we link with .o file that was incrementally lto-optimized to target object file (this is reason why I did not enabled WHOPR path for it). We may also want to inform lld and llvm's gold plugin maintainers about intended changes. Honza > > > > Thanks, > > Martin > > > > -- > H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
Hello, On Mon, 9 Mar 2020, Martin Liška wrote: > On 3/9/20 4:36 PM, H.J. Lu wrote: > > We nee to support different variables, like TLS, data and bss variables. > > Why do we need TLS? Right now, it's not supported by nm. Of course it does. It's the 'T' (or 't') character. When you introduce symbol categories into the plugin system it would be advisable to include all we usually care about, and as the ELF categories are (roughly) a superset of everything we support, I'd say that should be the list to look at. I.e. a mixture of visibility, locality (aka binding) and type: {object,function,common,tls} x {local,global,weak,unique} x {default,internal,hidden,protected} That doesn't include symbols types section,file,ifunc or os or arch specific types or visibilities or bindings. But it would probably not be the worst idea to simply encode what we need with ELF constants and names. While not all the world is ELF, all concepts we have can be mapped onto ELF. Ciao, Michael.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Mon, Mar 9, 2020 at 9:56 AM Martin Liška wrote: > > On 3/9/20 4:36 PM, H.J. Lu wrote: > > We nee to support different variables, like TLS, data and bss variables. > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? Since you are introducing symbol types, why not support TLS? > About BSS and DATA I agree that it would be handy. I can theoretically > covered with code in get_variable_section/bss_initializer_p. But it's > quite logic and I'm not sure we should simulate it. > > @Honza/Richi: Do you have any opinion about that? > > Thanks, > Martin -- H.J.
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On 3/9/20 4:36 PM, H.J. Lu wrote: We nee to support different variables, like TLS, data and bss variables. Why do we need TLS? Right now, it's not supported by nm. Or am I wrong? About BSS and DATA I agree that it would be handy. I can theoretically covered with code in get_variable_section/bss_initializer_p. But it's quite logic and I'm not sure we should simulate it. @Honza/Richi: Do you have any opinion about that? Thanks, Martin
Re: [PATCH][RFC] API extension for binutils (type of symbols).
On Mon, Mar 9, 2020 at 2:30 AM Martin Liška wrote: > > Hi. > > With change of -fno-common default for GCC 10 we're facing serious > problem with LTO where one can't distinguish in between global symbols > and variables based on the output of nm: > https://sourceware.org/bugzilla/show_bug.cgi?id=25355 > > $ cat nm.c > char nm_test_var; > > $ gcc-9 nm.c -c -fno-common > $ nm nm.o > B nm_test_var > > $ gcc-9 nm.c -c -fno-common -flto > $ nm nm.o > T nm_test_var > > H.J. decided to implement quite heavy solution which is about usage of > lto-wrapper > that takes a LTO object file and makes a final assembly file. The file is then > utilized with nm. That has some disadvantages: > > - it's slow - using nm x.a can take very long time > - way of finding lto-wrapper is quite hard-coded to location of LTO plugin > - we face issues with multiple final object files: >https://sourceware.org/bugzilla/show_bug.cgi?id=25640 > > That said, I'm suggesting to expect LTO plugin API to tell binutils whether > a symbol is variable or function. That should help us to mark global variables > with "D" in nm output. > > I would like to note that even with -fcommon, the nm output for LTO bytecode > is far > from perfect: > > $ cat bss.c > int global_zero; > int global_one = 1; > > $ gcc-9 bss.c -c -flto > $ nm bss.o > T global_one > C global_zero > > I believe in this case we can mark both symbols with D as a reasonable guess. > > Thoughts? > Martin > > gcc/ChangeLog: > > 2020-03-09 Martin Liska > > * lto-streamer-out.c (write_symbol): Stream > symbol type. > > include/ChangeLog: > > 2020-03-09 Martin Liska > > * lto-symtab.h (enum gcc_plugin_symbol_type): New. > * plugin-api.h (struct ld_plugin_symbol): New member > symbols_type. > (enum ld_plugin_symbol_type): New. > (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4. > > lto-plugin/ChangeLog: > > 2020-03-09 Martin Liska > > * lto-plugin.c (parse_table_entry): Parse symbol type. > --- > gcc/lto-streamer-out.c | 2 ++ > include/lto-symtab.h| 7 +++ > include/plugin-api.h| 13 - > lto-plugin/lto-plugin.c | 12 > 4 files changed, 33 insertions(+), 1 deletion(-) > > We nee to support different variables, like TLS, data and bss variables. -- H.J.
[PATCH][RFC] API extension for binutils (type of symbols).
Hi. With change of -fno-common default for GCC 10 we're facing serious problem with LTO where one can't distinguish in between global symbols and variables based on the output of nm: https://sourceware.org/bugzilla/show_bug.cgi?id=25355 $ cat nm.c char nm_test_var; $ gcc-9 nm.c -c -fno-common $ nm nm.o B nm_test_var $ gcc-9 nm.c -c -fno-common -flto $ nm nm.o T nm_test_var H.J. decided to implement quite heavy solution which is about usage of lto-wrapper that takes a LTO object file and makes a final assembly file. The file is then utilized with nm. That has some disadvantages: - it's slow - using nm x.a can take very long time - way of finding lto-wrapper is quite hard-coded to location of LTO plugin - we face issues with multiple final object files: https://sourceware.org/bugzilla/show_bug.cgi?id=25640 That said, I'm suggesting to expect LTO plugin API to tell binutils whether a symbol is variable or function. That should help us to mark global variables with "D" in nm output. I would like to note that even with -fcommon, the nm output for LTO bytecode is far from perfect: $ cat bss.c int global_zero; int global_one = 1; $ gcc-9 bss.c -c -flto $ nm bss.o T global_one C global_zero I believe in this case we can mark both symbols with D as a reasonable guess. Thoughts? Martin gcc/ChangeLog: 2020-03-09 Martin Liska * lto-streamer-out.c (write_symbol): Stream symbol type. include/ChangeLog: 2020-03-09 Martin Liska * lto-symtab.h (enum gcc_plugin_symbol_type): New. * plugin-api.h (struct ld_plugin_symbol): New member symbols_type. (enum ld_plugin_symbol_type): New. (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4. lto-plugin/ChangeLog: 2020-03-09 Martin Liska * lto-plugin.c (parse_table_entry): Parse symbol type. --- gcc/lto-streamer-out.c | 2 ++ include/lto-symtab.h| 7 +++ include/plugin-api.h| 13 - lto-plugin/lto-plugin.c | 12 4 files changed, 33 insertions(+), 1 deletion(-)