Re: [Dwarf-Discuss] string reduction techniques

2021-11-11 Thread Greg Clayton via Dwarf-Discuss



> On Nov 7, 2021, at 12:36 PM, Todd Allen  wrote:
> 
> Just spitballing an idea here, but would there be value in a new DW_FORM (or
> two) that referenced the names from .strtab or .dynstr, instead of .debug_str?
> It would only work if the symbols already were there, but I would expect that
> for many/most/all(?) functions defined in the compilation unit.  It does
> somewhat relegate this to being Someone Else's Problem, but given that the
> .strtab already has the problem of zillions of these huge symbols, maybe that's
> not so bad?
> 
> Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
> reference them indirectly through a .debug_strtab_offsets section similar to
> .debug_str_offsets.

Interesting idea! One issue: if someone strips the binary, that could remove
local symbols whose mangled names the DWARF refers to, leaving the DW_FORM
values pointing at invalid offsets.
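
To make the concern concrete, here is a minimal sketch, assuming a strip-like tool rebuilds .strtab without the local symbols. The symbol names and the helpers (build_strtab, read_cstr) are made up for illustration; this is not how any particular strip implementation lays out the table.

```python
def build_strtab(names):
    """Lay out names the way .strtab does: a leading NUL, then NUL-terminated strings."""
    blob = bytearray(b"\0")
    offsets = {}
    for name in names:
        offsets[name] = len(blob)
        blob += name.encode("ascii") + b"\0"
    return bytes(blob), offsets


def read_cstr(blob, offset):
    """Return the NUL-terminated string at offset, or None if offset is out of range."""
    if offset < 0 or offset >= len(blob):
        return None
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("ascii")


# Hypothetical symbols: one local (internal linkage) and one global.
LOCAL, GLOBAL = "_ZL6helperv", "_ZN3foo3barEi"

full, offsets = build_strtab([LOCAL, GLOBAL])
dwarf_ref = offsets[LOCAL]  # the offset a DW_FORM value would have recorded

# Assumption: stripping rebuilds the table with only the global symbol.
stripped, _ = build_strtab([GLOBAL])

before = read_cstr(full, dwarf_ref)     # the name the DWARF meant
after = read_cstr(stripped, dwarf_ref)  # whatever now lives at that offset
```

In this toy layout the stale offset does not fail loudly; it quietly resolves to a different symbol's name, which is arguably worse than an out-of-range offset.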


Re: [Dwarf-Discuss] string reduction techniques

2021-11-08 Thread David Blaikie via Dwarf-Discuss
On Sun, Nov 7, 2021 at 12:36 PM Todd Allen  wrote:
>
> Just spitballing an idea here, but would there be value in a new DW_FORM (or
> two) that referenced the names from .strtab or .dynstr, instead of .debug_str?

Yeah, something along those lines has crossed my mind too - I haven't
looked into it enough to know whether there are nice, generic
relocations to use for that which linkers respect (i.e. preserve the
name if there's a relocation to it, even if the real symbol doesn't
make it into the linked file due to linker GC, etc.). There are a couple
of wrinkles:

1) Not all the symbols are already there - a fully inlined function
might still have a linkage name (it does for the way we use DWARF at
Google, with clang's -fdebug-info-for-profiling, so that functions can be
identified build-over-build even if they're inlined into all call
sites), and there's some discussion of adding type information for heap
allocations, so we might want linkage names on types too.

2) Split DWARF would, ideally, keep any linkage names that aren't
needed in the ELF file (fully inlined functions, types, etc.) only in the
Split DWARF, not in the .o/executable.

But yeah - maybe there's something down that direction, but there are
some hurdles to overcome.

> It would only work if the symbols already were there, but I would expect that
> for many/most/all(?) functions defined in the compilation unit.  It does
> somewhat relegate this to being Someone Else's Problem, but given that the
> .strtab already has the problem of zillions of these huge symbols, maybe that's
> not so bad?
>
> Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
> reference them indirectly through a .debug_strtab_offsets section similar to
> .debug_str_offsets.

Re: [Dwarf-Discuss] string reduction techniques

2021-11-07 Thread Todd Allen via Dwarf-Discuss
Just spitballing an idea here, but would there be value in a new DW_FORM (or
two) that referenced the names from .strtab or .dynstr, instead of .debug_str?
It would only work if the symbols already were there, but I would expect that
for many/most/all(?) functions defined in the compilation unit.  It does
somewhat relegate this to being Someone Else's Problem, but given that the
.strtab already has the problem of zillions of these huge symbols, maybe that's
not so bad?

Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
reference them indirectly through a .debug_strtab_offsets section similar to
.debug_str_offsets.
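
A rough sketch of what that indirection could look like, in Python for brevity. Everything here is an assumption for illustration: the 4-byte little-endian entries, the absence of a section header (the real .debug_str_offsets has one), and the helper names. The point is only that the DWARF would hold a stable index while tools that rewrite .strtab patch the offsets table.

```python
import struct


def read_cstr(blob, offset):
    """Return the NUL-terminated string that starts at offset in blob."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("ascii")


def resolve(index, strtab_offsets, strtab):
    """Hypothetical form: index -> entry in the offsets table -> name in .strtab."""
    (offset,) = struct.unpack_from("<I", strtab_offsets, index * 4)
    return read_cstr(strtab, offset)


# Original layout: two made-up symbols at offsets 1 and 15.
strtab = b"\0_ZN3foo3barEi\0_ZN3foo3bazEv\0"
table = struct.pack("<II", 1, 15)

# A tool reorders .strtab; only the offsets table has to be rewritten.
new_strtab = b"\0_ZN3foo3bazEv\0_ZN3foo3barEi\0"
new_table = struct.pack("<II", 15, 1)

name_before = resolve(0, table, strtab)
name_after = resolve(0, new_table, new_strtab)
```

Index 0 names the same symbol before and after the rewrite, so the DWARF itself would not need to be touched.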


Re: [Dwarf-Discuss] string reduction techniques

2021-11-02 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> LLDB also uses mangled names. The clang compiler is our expression parser
> and it always tries to resolve symbols during compilation/JIT and it
> supplies mangled names when looking for functions to resolve when it JITs
> code up. It is nice to be able to do quick name lookups using these mangled
> names to find the address of the function. That being said, we could work
> around it. Not sure how easy that would be though as mangled names can end
> up demangling to the same name with some loss of information and it would
> be important to be able to find the right in charge or out of charge
> constructor when the compiler asks for a specific symbol using the mangled
> name. We have more uses of mangled names but most of them relate to parsing
> the symbol tables, so removing them from DWARF wouldn’t affect those areas.
>
> I wonder if there is a way to have a DW_AT_partial_linkage_name that
> relies on the decl context of a DIE. Like if you have a class "foo" in the
> global namespace it could have a DW_AT_partial_linkage_name with the
> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
> could have another partial linkage name "3bari" that
> could be put together with the parent "_Z3foo" for a function like:
>
> void foo::bar(int);
>
> Since many mangled names often start with the same prefix it might help
> reduce the string table size.
>

It's a thought - though I'm not sure how well that would generalize
across different mangling schemes that use different mechanisms for
backreferences, etc. Or whether the return type should be included (it's
included for function templates in Itanium mangling, for instance -
presumably also in MSVC mangling, but maybe some manglings include it even
in non-templates? I'm not sure). Since the partial linkage name for a type
would be context-insensitive (it'd be attached to the type rather
than to any use of the type), it'd be up to the consumer to fix that up, e.g.:

https://godbolt.org/z/TqYjeevqx
Itanium:
  f1<>(): _Z2f1IJEEvv
  f1<t1, t1, t1>(): _Z2f1IJ2t1S0_S0_EEvv
MSVC:
  f1<>(): ??$f1@$$V@@YAXXZ
  f1<t1, t1, t1>(): ??$f1@Ut1@@U1@U1@@@YAXXZ

I'm not sure how much less a consumer would know about mangling if it had
to know about how to assemble these things, insert backrefs, insert empty
list markers, etc - without having to know how to mangle a specific user
defined type or name, like "3foo" versus "@Ut1@"?
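
As a toy illustration of the backreference problem, the sketch below reproduces just the two Itanium strings above. It assumes the second instantiation is f1 with three t1 arguments (which is how that mangled string decodes), and it hard-codes everything else about f1; mangle_f1 and seq_id are illustrative names, not part of any real mangler.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def seq_id(index):
    """Itanium substitution reference: S_ for the first candidate, then S0_, S1_, ..."""
    if index == 0:
        return "S_"
    n = index - 1
    digits = ""
    while True:
        digits = DIGITS[n % 36] + digits
        n //= 36
        if n == 0:
            break
    return "S" + digits + "_"


def mangle_f1(arg_types):
    """Toy mangler for `template <typename...> void f1()` with simple class-type arguments."""
    subs = ["f1"]  # the template name is the first substitution candidate
    parts = []
    for t in arg_types:
        if t in subs:
            parts.append(seq_id(subs.index(t)))  # later use: backreference
        else:
            parts.append(f"{len(t)}{t}")         # first use: <length><name>
            subs.append(t)
    # I J ... E E: template args holding one argument pack; v v: void return, no parameters
    return "_Z2f1IJ" + "".join(parts) + "EEvv"


empty = mangle_f1([])
three = mangle_f1(["t1", "t1", "t1"])
```

The same type t1 comes out as "2t1" on its first use and as "S0_" afterwards, so a context-free fragment stored on the type cannot simply be pasted in at each use; the consumer has to track substitution state itself.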



Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Daniel Berlin via Dwarf-Discuss
On Mon, Nov 1, 2021 at 10:14 PM Greg Clayton  wrote:

> LLDB also uses mangled names. The clang compiler is our expression parser
> and it always tries to resolve symbols during compilation/JIT and it
> supplies mangled names when looking for functions to resolve when it JITs
> code up.
>

GDB was nearly the same

> It is nice to be able to do quick name lookups using these mangled names
> to find the address of the function.
>

Yep - GDB also required them to be able to do binary search for the name of
the function -> address mapping (the "minimal symbol" table).
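
For readers who have not seen that kind of table, a minimal sketch of a sorted name -> address lookup follows. It is not GDB's actual minimal-symbol code; the addresses are invented, and the two names are the Itanium complete-object (C1) and base-object (C2) constructors of a class foo, which both demangle to foo::foo().

```python
import bisect


def build_table(symbols):
    """Sort (mangled_name, address) pairs so names can be binary-searched."""
    return sorted(symbols)


def lookup(table, name):
    """Return the address recorded for an exact mangled name, or None."""
    # (name,) sorts before every (name, address) pair with the same name.
    i = bisect.bisect_left(table, (name,))
    if i < len(table) and table[i][0] == name:
        return table[i][1]
    return None


# Hypothetical addresses for illustration only.
table = build_table([
    ("_ZN3fooC2Ev", 0x401200),  # base object constructor
    ("_ZN3fooC1Ev", 0x401100),  # complete object constructor
    ("_ZN3foo3barEi", 0x401300),
])

complete_ctor = lookup(table, "_ZN3fooC1Ev")
base_ctor = lookup(table, "_ZN3fooC2Ev")
missing = lookup(table, "_ZN3foo3bazEv")
```

Keyed on the mangled name, the two constructor variants stay distinct; keyed on the demangled name they would collide.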


> That being said, we could work around it. Not sure how easy that would be
> though as mangled names can end up demangling to the same name with some
> loss of information and it would be important to be able to find the right
> in charge or out of charge constructor when the compiler asks for a
> specific symbol using the mangled name.
>

Yes - we felt the same way at the time. We could resolve the symbol table
speed issue in a variety of ways if we had to, but you'd end up with quite
an interface to get all the info being extracted from the linkage names
passed along to the right places to find the right symbols without them.


> We have more uses of mangled names but most of them relate to parsing the
> symbol tables, so removing them from DWARF wouldn’t affect those areas.
>
> I wonder if there is a way to have a DW_AT_partial_linkage_name that
> relies on the decl context of a DIE. Like if you have a class "foo" in the
> global namespace it could have a DW_AT_partial_linkage_name with the
> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
> could have another partial linkage name "3bari" that
> could be put together with the parent "_Z3foo" for a function like:
>
> void foo::bar(int);
>
> Since many mangled names often start with the same prefix it might help
> reduce the string table size.
>

This is similar to how GDB constructed mangled names for certain things, so
it is certainly doable.



Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Greg Clayton via Dwarf-Discuss
LLDB also uses mangled names. The clang compiler is our expression parser, and
it always tries to resolve symbols during compilation/JIT; it supplies
mangled names when looking for functions to resolve when it JITs code up. It is
nice to be able to do quick name lookups using these mangled names to find the
address of the function. That being said, we could work around it. I'm not sure
how easy that would be, though: different mangled names can demangle to the
same name, with some loss of information, and it is important to be able to
find the right in-charge or out-of-charge constructor when the compiler asks
for a specific symbol using the mangled name. We have more uses of mangled
names, but most of them relate to parsing the symbol tables, so removing them
from DWARF wouldn't affect those areas.

I wonder if there is a way to have a DW_AT_partial_linkage_name that relies on
the decl context of a DIE. Like if you have a class "foo" in the global
namespace it could have a DW_AT_partial_linkage_name with the value "_Z3foo". A
DW_TAG_subprogram that is a child of this "foo" class could
have another partial linkage name "3bari" that could be put together with the
parent "_Z3foo" for a function like:

void foo::bar(int);

Since many mangled names often start with the same prefix it might help reduce 
the string table size.
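
A minimal sketch of how a consumer might put the pieces together, assuming each DIE simply stores its fragment and the consumer concatenates from the outermost context inwards. The Die class and assemble() are invented for illustration; DW_AT_partial_linkage_name is only the attribute proposed above, not an existing one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    tag: str
    partial_linkage_name: str      # value of the proposed attribute
    parent: Optional["Die"] = None


def assemble(die):
    """Concatenate partial linkage names from the outermost parent down to die."""
    parts = []
    while die is not None:
        parts.append(die.partial_linkage_name)
        die = die.parent
    return "".join(reversed(parts))


foo = Die("DW_TAG_class_type", "_Z3foo")
bar = Die("DW_TAG_subprogram", "3bari", parent=foo)

assembled = assemble(bar)
```

With the fragments exactly as written above this yields "_Z3foo3bari". Note that the Itanium mangling of void foo::bar(int) is actually _ZN3foo3barEi (nested names are wrapped in N...E), so a real scheme would need per-ABI assembly rules on top of plain concatenation.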


Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Daniel Berlin via Dwarf-Discuss
Finally, a question I know the answer to!

It brings us all the way back to when I was the C++ maintainer for GDB,
which is the most ancient of history.
Unfortunately, this is a trip to a horrible place.
I actually spent a lot of time trying to make it so we didn't need linkage
names, because, even then, they took up a *lot* of space.

On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> >> I can't be sure about this exponential growth.  I don't have the data
> >> to back it up.  But I will say, when we created DWARF64, I was skeptical
> >> that it would be needed during my career.  And yet here we are...
> >
> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
> > we're seeing with the need for some solutions for mixed DWARF32/64, etc,
> > makes it feel like maybe it's not got a bit of "settling in" to do. And I'm
> > still rather hopeful we might be able to reduce the overheads enough to
> > avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>
> Agreed. I'd like to explore as many avenues as we can to eliminate the
> need for DWARF64.
>
> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
> >> debugger almost never uses it.  (There is one use to detect "GNU indirect"
> >> functions.)  I wonder if it would be possible to avoid them if you provided
> >> enough info about the template parameters, if the debugger had its own name
> >> mangler.  I had to write one for our debugger a couple years ago, and it
> >> definitely was a persnickety beast.  But doable with enough information.
> >> Mind you, I'm not sure there is enough information to do it perfectly with
> >> the state of DWARF & gcc right now.
> >
> > Yeah, that was/is certainly my first pass - the way I've done the
> > DW_AT_name one is to have a feature in clang that produces the short name
> > "t1" but then also embeds the template argument list in the name (like
> > this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
> > up the name, rebuild the original name as it would if it'd been given only
> > the simple name ("t1") and compare it to the one from clang. Then I can run
> > this over large programs and check everything round-trips correctly & in
> > clang, classify any names we can't roundtrip so they get emitted in full
> > rather than shortened.
> > We could do something similar with linkage names - since to know there's
> > some prior art in your work there.
> >
> > I wouldn't be averse to considering what'd take to make DWARF robust
> > enough to always roundtrip simple and linkage names in this way - I don't
> > think it'd take a /lot/ of extra DWARF content.
>
> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
> (or only did in some very specific cases) until the LTO folks
> convinced us they needed it in order to relate profile data back to
> the source. Perhaps if we came up with a better way of doing that, we
> could eliminate the linkage names.
>

No, see, that's a mildly reasonable answer.
If you go far enough back, the linkage names exist for a few reasons:
1. Because the debug info wasn't always good enough, and so GDB used to
demangle the linkage names and parse them using a hacked up C++-ish parser
for type info.
2. Even when it didn't, it decoded linkage names to detect things like
destructors/constructors, etc.
3. Because it used them to do remangling properly and to try to generate
method signatures to look up (and for #1).
4. Because they were used to do symbol lookup in the ELF/etc. symbol tables
for static things/etc.
5. Because it saved space in STABS to do #1 (they predate DWARF by far).

If you check out the gdb source code, circa 2001, and search for things like
check_stub_method, and follow all the things it calls (like
gdb_mangle_name), you can learn the history of linkage names (and probably
throw up in your mouth a little).
If you do a case-insensitive search for things like "physname" and
"phys_name", you'll see all the places it used to use the linkage names.
I spent a lot of time abstracting out things like the
constructor/destructor name testing, vptr name finding, etc, so that
someone later might have a chance to get rid of linkage names (it was also
necessary because of the gcc 2.95->3.0 ABI change).



>
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Cary Coutant via Dwarf-Discuss
>> > I wouldn't be averse to considering what'd take to make DWARF robust 
>> > enough to always roundtrip simple and linkage names in this way - I don't 
>> > think it'd take a /lot/ of extra DWARF content.
>>
>> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
>> (or only did in some very specific cases) until the LTO folks
>> convinced us they needed it in order to relate profile data back to
>> the source. Perhaps if we came up with a better way of doing that, we
>> could eliminate the linkage names.
>
> Yeah, fair - it's certainly what we use it for still, authoritative names for 
> functions - including some amount of semantic information (so, for instance, 
> I believe a hash is inadequate) which allows limited rewriting (when we 
> changed standard libraries we were able to remap previous profile samples to 
> line up with the new names (different inline namespaces, implementation 
> names, etc) so as not to take a temporary perf hit as profiles were 
> regenerated, etc).

Just dug up this little bit I wrote up for a "DWARF Best Practices"
wiki article:

The producer may also generate a DW_AT_linkage_name attribute for
program objects, but the presence of this attribute should never be
required to distinguish one program object from another. The DIE
hierarchy is able to provide qualifiers for the name, and the
DW_AT_name attribute itself provides template parameters. In the case
of overloaded functions, the DW_TAG_formal_parameter DIEs belonging to
the function DIE can provide the necessary information to distinguish
one overload from another. In many cases, however, it is expensive for
a consumer to parse the hierarchy, and the presence of the mangled
name may be beneficial to performance. In other cases, the producer
may choose to generate a limited subset of debug information, and the
mangled name may substitute for the missing information.
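
As a rough illustration of the point above (not taken from any real
consumer), here is a minimal Python sketch of telling two overloads apart
using only the DIE hierarchy and the DW_TAG_formal_parameter children,
with no linkage name involved. The Die class, its type_name field (a
stand-in for following DW_AT_type to a named type), and the example
namespace "ns" are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Die:
    """Hypothetical stand-in for a DWARF DIE; not a real library type."""
    tag: str
    name: str = ""
    type_name: str = ""  # assumption: DW_AT_type already resolved to a name
    parent: Optional["Die"] = field(default=None, repr=False, compare=False)
    children: List["Die"] = field(default_factory=list)

    def add(self, child: "Die") -> "Die":
        child.parent = self
        self.children.append(child)
        return child


def qualified_signature(func: Die) -> str:
    # Walk up the parent chain to collect namespace/class qualifiers.
    scopes = []
    scope = func.parent
    while scope is not None and scope.tag != "DW_TAG_compile_unit":
        scopes.append(scope.name)
        scope = scope.parent
    # The formal parameter children distinguish one overload from another.
    params = [c.type_name for c in func.children
              if c.tag == "DW_TAG_formal_parameter"]
    qualified = "::".join(list(reversed(scopes)) + [func.name])
    return qualified + "(" + ", ".join(params) + ")"


cu = Die("DW_TAG_compile_unit", "a.cpp")
ns = cu.add(Die("DW_TAG_namespace", "ns"))
f_int = ns.add(Die("DW_TAG_subprogram", "f"))
f_int.add(Die("DW_TAG_formal_parameter", "x", "int"))
f_double = ns.add(Die("DW_TAG_subprogram", "f"))
f_double.add(Die("DW_TAG_formal_parameter", "x", "double"))

print(qualified_signature(f_int))
print(qualified_signature(f_double))
```

The parent walk is the part that can get expensive for a real consumer,
which is the performance argument for keeping the mangled name around.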

-cary


Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 5:35 PM Cary Coutant wrote:

> >> I can't be sure about this exponential growth.  I don't have the data
> to back it
> >> up.  But I will say, when we created DWARF64, I was skeptical that it
> would be
> >> needed during my career.  And yet here we are...
> >
> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
> we're seeing with the need for some solutions for mixed DWARF32/64, etc,
> makes it feel like maybe it's not got a bit of "settling in" to do. And I'm
> still rather hopeful we might be able to reduce the overheads enough to
> avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>
> Agreed. I'd like to explore as many avenues as we can to eliminate the
> need for DWARF64.
>
>
> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.
> Our
> >> debugger almost never uses it.  (There is one use to detect "GNU
> indirect"
> >> functions.)  I wonder if it would be possible to avoid them if you
> provided
> >> enough info about the template parameters, if the debugger had its own
> name
> >> mangler.  I had to write one for our debugger a couple years ago, and it
> >> definitely was a persnickety beast.  But doable with enough
> information.  Mind
> >> you, I'm not sure there is enough information to do it perfectly with
> the state
> >> of DWARF & gcc right now.
> >
> > Yeah, that was/is certainly my first pass - the way I've done the
> DW_AT_name one is to have a feature in clang that produces the short name
> "t1" but then also embeds the template argument list in the name (like
> this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
> up the name, rebuild the original name as it would if it'd been given only
> the simple name ("t1") and compare it to the one from clang. Then I can run
> this over large programs and check everything round-trips correctly & in
> clang, classify any names we can't roundtrip so they get emitted in full
> rather than shortened.
> > We could do something similar with linkage names - since to know there's
> some prior art in your work there.
> >
> > I wouldn't be averse to considering what'd take to make DWARF robust
> enough to always roundtrip simple and linkage names in this way - I don't
> think it'd take a /lot/ of extra DWARF content.
>
> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
> (or only did in some very specific cases) until the LTO folks
> convinced us they needed it in order to relate profile data back to
> the source. Perhaps if we came up with a better way of doing that, we
> could eliminate the linkage names.
>

Yeah, fair - it's certainly what we use it for still, authoritative names
for functions - including some amount of semantic information (so, for
instance, I believe a hash is inadequate) which allows limited rewriting
(when we changed standard libraries we were able to remap previous profile
samples to line up with the new names (different inline namespaces,
implementation names, etc) so as not to take a temporary perf hit as
profiles were regenerated, etc).


Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Cary Coutant via Dwarf-Discuss
>> I can't be sure about this exponential growth.  I don't have the data to 
>> back it
>> up.  But I will say, when we created DWARF64, I was skeptical that it would 
>> be
>> needed during my career.  And yet here we are...
>
> Yep, still got mixed feelings about DWARF64 - partly the pieces that we're 
> seeing with the need for some solutions for mixed DWARF32/64, etc, makes it 
> feel like maybe it's not got a bit of "settling in" to do. And I'm still 
> rather hopeful we might be able to reduce the overheads enough to avoid 
> widespread use of DWARF64 - but it's not a sure thing by any means.

Agreed. I'd like to explore as many avenues as we can to eliminate the
need for DWARF64.


>> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
>> debugger almost never uses it.  (There is one use to detect "GNU indirect"
>> functions.)  I wonder if it would be possible to avoid them if you provided
>> enough info about the template parameters, if the debugger had its own name
>> mangler.  I had to write one for our debugger a couple years ago, and it
>> definitely was a persnickety beast.  But doable with enough information.  
>> Mind
>> you, I'm not sure there is enough information to do it perfectly with the 
>> state
>> of DWARF & gcc right now.
>
> Yeah, that was/is certainly my first pass - the way I've done the DW_AT_name 
> one is to have a feature in clang that produces the short name "t1" but then 
> also embeds the template argument list in the name (like this: 
> "_STNt1|") - then llvm-dwarfdump will detect this prefix, split up the 
> name, rebuild the original name as it would if it'd been given only the 
> simple name ("t1") and compare it to the one from clang. Then I can run this 
> over large programs and check everything round-trips correctly & in clang, 
> classify any names we can't roundtrip so they get emitted in full rather than 
> shortened.
> We could do something similar with linkage names - since to know there's some 
> prior art in your work there.
>
> I wouldn't be averse to considering what'd take to make DWARF robust enough 
> to always roundtrip simple and linkage names in this way - I don't think it'd 
> take a /lot/ of extra DWARF content.

Fuzzy memory here, but as I recall, GCC didn't generate linkage names
(or only did in some very specific cases) until the LTO folks
convinced us they needed it in order to relate profile data back to
the source. Perhaps if we came up with a better way of doing that, we
could eliminate the linkage names.

-cary



Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 1:52 PM Todd Allen wrote:

> Dave,
>
> If I understand right: The space saving you're expecting is the
> near-elimination
> of DW_AT_name strings.  If they are only simple names like "T" and "int",
> they
> can be placed into the string table once each, and it should be very
> small.  But
> you're expecting the DW_AT_linkage_name attributes still to have lots of
> replication because of the large composed names.  So I gather that was
> where
> your estimate of 1/2 reduction came from.
>

Yep!


> I was trying to figure out how we came to opposite conclusions, and I
> think it's
> that I have this (implicit) assumption of a sort of "DWARF Moore's Law",
> that
> the size of debug info/strings/etc. would double periodically, just based
> on the
> tendency of software systems to grooow.  I'm likening it to Moore's
> Law,
> because I expect it's the same sort of vague, rough estimate that somehow
> still
> applies to the real world.
>
> Assuming it does apply, your halving of the string table amounts to buying
> yourself one doubling period, and then you're back to requiring DWARF64
> string
> tables.  (Meanwhile, DWARF64 gives us 32 doubling periods over DWARF32.  So
> hopefully that will last us for a while...)
>

I think there are a few things at work:

1) These seem to be particularly extreme cases of template metaprogramming
- they may actually be growing faster than the Moore's Law-esque
situation (eg: we might've had some natural growth rate A, but then maybe a
few years back we got this particular use case, and that use case
(TensorFlow in particular) gained significant adoption, growing at rate B
(such that B > A); for a while that didn't come up, and then eventually it
started "hockey-sticking" and we see the B growth dominating the A growth).
2) Yeah, I think I agree with you that if we don't solve the linkage name
problem, we might not have much runway.


> I can't be sure about this exponential growth.  I don't have the data to
> back it
> up.  But I will say, when we created DWARF64, I was skeptical that it
> would be
> needed during my career.  And yet here we are...
>

Yep, still got mixed feelings about DWARF64 - partly the pieces that we're
seeing with the need for some solutions for mixed DWARF32/64, etc, make it
feel like maybe it's still got a bit of "settling in" to do. And I'm still
rather hopeful we might be able to reduce the overheads enough to avoid
widespread use of DWARF64 - but it's not a sure thing by any means.


> ...
>
> The reduction for DW_AT_linkage_name does seem like a tougher nut to
> crack.  As
> you mentioned, there is a tendency to eliminate *some* of the replication
> because of the mangler's use of substitution strings (S_, S0_, S1_, etc.)
> But
> that same feature probably would make it a lot harder to do anything clever
> about chopping up the linkage names into substrings.
>

Yeah, somewhat - actually fully rebuilding it (having a fully
mangling-aware tool that can go look at the DWARF and build a linkage name
from it) would be possible for at least most/many names, but is in tension
with some of the point of DWARF to remove the need for consumers to have
such complicated knowledge... but costs/benefits/etc.


> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
> debugger almost never uses it.  (There is one use to detect "GNU indirect"
> functions.)  I wonder if it would be possible to avoid them if you provided
> enough info about the template parameters, if the debugger had its own name
> mangler.  I had to write one for our debugger a couple years ago, and it
> definitely was a persnickety beast.  But doable with enough information.
> Mind
> you, I'm not sure there is enough information to do it perfectly with the
> state
> of DWARF & gcc right now.
>

Yeah, that was/is certainly my first pass - the way I've done the
DW_AT_name one is to have a feature in clang that produces the short name
"t1" but then also embeds the template argument list in the name (like
this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
up the name, rebuild the original name as it would if it'd been given only
the simple name ("t1") and compare it to the one from clang. Then I can run
this over large programs and check everything round-trips correctly & in
clang, classify any names we can't roundtrip so they get emitted in full
rather than shortened.
We could do something similar with linkage names - since to know there's
some prior art in your work there.

I wouldn't be averse to considering what it'd take to make DWARF robust
enough to always roundtrip simple and linkage names in this way - I don't
think it'd take a /lot/ of extra DWARF content.
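
To make the round-trip check described above a bit more concrete, here is
a small Python sketch. It is only an illustration under assumptions: the
tuple layout, the rebuild() rule, and the "t2<3u>" mismatch are invented
for the example and are not what clang or llvm-dwarfdump actually do.

```python
def rebuild(simple_name, template_args):
    """Rebuild a full name from the simple name plus template arguments."""
    if not template_args:
        return simple_name
    return simple_name + "<" + ", ".join(template_args) + ">"


def classify(entries):
    """Split names into those that round-trip and those that do not.

    entries: iterable of (full_name, simple_name, template_args), where
    full_name is what the producer would have emitted without shortening.
    """
    shortenable, emit_in_full = [], []
    for full_name, simple_name, template_args in entries:
        if rebuild(simple_name, template_args) == full_name:
            shortenable.append(full_name)
        else:
            # Cannot be reconstructed exactly, so keep the full string.
            emit_in_full.append(full_name)
    return shortenable, emit_in_full


names = [
    ("t1<int>", "t1", ["int"]),  # rebuilds exactly
    ("t2<3u>", "t2", ["3"]),     # hypothetical mismatch: literal suffix lost
]
shortenable, emit_in_full = classify(names)
print(shortenable, emit_in_full)
```

Anything that lands in the second list is a candidate for either opting
out of shortening or for extra DWARF content that would let it round-trip.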

- Dave

> Todd
>
> On Mon, Nov 01, 2021 at 01:06:33PM -0700, David Blaikie wrote:
> >Hey Todd,
> >
> >Just some details regarding the string reduction strategies I'm
> pursuing
> >to address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo
> >sections in some large 

Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Todd Allen via Dwarf-Discuss
Dave,

If I understand right: The space saving you're expecting is the near-elimination
of DW_AT_name strings.  If they are only simple names like "T" and "int", they
can be placed into the string table once each, and it should be very small.  But
you're expecting the DW_AT_linkage_name attributes still to have lots of
replication because of the large composed names.  So I gather that was where
your estimate of 1/2 reduction came from.

I was trying to figure out how we came to opposite conclusions, and I think it's
that I have this (implicit) assumption of a sort of "DWARF Moore's Law", that
the size of debug info/strings/etc. would double periodically, just based on the
tendency of software systems to grooow.  I'm likening it to Moore's Law,
because I expect it's the same sort of vague, rough estimate that somehow still
applies to the real world.

Assuming it does apply, your halving of the string table amounts to buying
yourself one doubling period, and then you're back to requiring DWARF64 string
tables.  (Meanwhile, DWARF64 gives us 32 doubling periods over DWARF32.  So
hopefully that will last us for a while...)
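
The arithmetic behind those two numbers, as a small Python sketch. It
assumes the limits are simply 2**32 and 2**64 bytes (ignoring the few
reserved initial-length values), and the starting sizes are illustrative
rather than measured.

```python
DWARF32_LIMIT = 2**32  # assumption: section offsets are plain 32-bit values
DWARF64_LIMIT = 2**64  # assumption: section offsets are plain 64-bit values


def doublings_until_overflow(size, limit):
    """Count how many times size can double while still fitting in limit."""
    count = 0
    while size * 2 <= limit:
        size *= 2
        count += 1
    return count


# A string table sitting right at the DWARF32 limit, then halved:
halved = DWARF32_LIMIT // 2
print(doublings_until_overflow(halved, DWARF32_LIMIT))

# The same full-size table under DWARF64:
print(doublings_until_overflow(DWARF32_LIMIT, DWARF64_LIMIT))
```

So a one-time halving buys exactly one period, whatever the period turns
out to be, while the wider offset buys 32 of them.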

I can't be sure about this exponential growth.  I don't have the data to back it
up.  But I will say, when we created DWARF64, I was skeptical that it would be
needed during my career.  And yet here we are...

...

The reduction for DW_AT_linkage_name does seem like a tougher nut to crack.  As
you mentioned, there is a tendency to eliminate *some* of the replication
because of the mangler's use of substitution strings (S_, S0_, S1_, etc.)  But
that same feature probably would make it a lot harder to do anything clever
about chopping up the linkage names into substrings.
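
For anyone who hasn't stared at these: in the Itanium C++ ABI the
substitution tokens are just positional indices into the list of
components already seen in that one mangled name, written in base 36.
A minimal Python sketch of the numbering (the function name is mine):

```python
import string

BASE36_DIGITS = string.digits + string.ascii_uppercase


def substitution_token(index):
    """Token for the index-th (0-based) substitution candidate in a name."""
    if index == 0:
        return "S_"
    # Later candidates are numbered from 0 again, in base 36.
    n = index - 1
    digits = ""
    while True:
        digits = BASE36_DIGITS[n % 36] + digits
        n //= 36
        if n == 0:
            break
    return "S" + digits + "_"


print([substitution_token(i) for i in range(4)])
```

Because the index depends on everything that came earlier in the same
name, an identical token can mean different things in two linkage names,
which is why pulling out shared substrings looks hard.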

Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
debugger almost never uses it.  (There is one use to detect "GNU indirect"
functions.)  I wonder if it would be possible to avoid them if you provided
enough info about the template parameters, if the debugger had its own name
mangler.  I had to write one for our debugger a couple years ago, and it
definitely was a persnickety beast.  But doable with enough information.  Mind
you, I'm not sure there is enough information to do it perfectly with the state
of DWARF & gcc right now.

Todd

On Mon, Nov 01, 2021 at 01:06:33PM -0700, David Blaikie wrote:
>Hey Todd,
> 
>Just some details regarding the string reduction strategies I'm pursuing
>to address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo
>sections in some large binaries at Google.
> 
>So the extreme cases I'm dealing with are predominantly C++ Expression
>templates (in TensorFlow and Eigen) - these produce types with very large
>DW_AT_names ("f1<int>") and DW_AT_linkage_names (eg: "_Z2f1IiEvv") (but
>with many more template parameters, none of which are ever user-written
>but deduced).
> 
>So the main fix I'm pursuing (roughly called "simplified template names")
>is to omit template parameter lists from DW_AT_names of templates in most
>cases, allowing the consumer to reconstruct the name from
>DW_AT_template_*_parameters itself, recursively. Further discussion and
>details
>here: [1]https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ
>- in terms of how this affects scaling factors, it means that adding an
>additional template instantiation of existing types would add no new data
>to .debug_str (eg: going from a program with "t1<int>" to "t1<t1<int>>"
>would add no new entries to .debug_str). Not all names can be readily
>reconstructed - so I'm opting the feature out on those, but we could have
>a more deeper discussion about how to handle them if we wanted to make
>this a full-fledged/robust feature (maybe one the DWARF spec
>suggests/encourages).
> 
>GDB seems to handle this sort of debug info OK - I guess someone did real
>work to support that at some point (so maybe some other debugger already
>generates DWARF like this).
> 
>The other half, though, is DW_AT_linkage_names - and in theory similar
>rebuilding could be done, but that'd require baking a lot fo
>implementation knowledge into the DWARF Consumer that DWARF is meant to
>help avoid... so I'm unsure what the right solution is there just now, but
>there's a few ideas I'm still kicking around. At least linkage names have
>less redundancy (within a single name they avoid redundancy - "t1<t1<int>,
>t1<int>>" only ends up with a single description of "t1<int>" instead of
>two of them like you get with the DW_AT_name) than DW_AT_names, so they do
>scale a bit better already.
> 
>Happy to discuss these ideas in specific, or their impact on debug_str
>growth in more detail any time (here, video chat, discords, etc).
> 
>- Dave
> 
> References
> 
>Visible links
>1. https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ

[Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
Hey Todd,

Just some details regarding the string reduction strategies I'm pursuing to
address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo sections
in some large binaries at Google.

So the extreme cases I'm dealing with are predominantly C++ Expression
templates (in TensorFlow and Eigen) - these produce types with very large
DW_AT_names ("f1<int>") and DW_AT_linkage_names (eg: "_Z2f1IiEvv") (but
with many more template parameters, none of which are ever user-written but
deduced).

So the main fix I'm pursuing (roughly called "simplified template names")
is to omit template parameter lists from DW_AT_names of templates in most
cases, allowing the consumer to reconstruct the name from
DW_AT_template_*_parameters itself, recursively. Further discussion and
details here:
https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ - in
terms of how this affects scaling factors, it means that adding an
additional template instantiation of existing types would add no new data
to .debug_str (eg: going from a program with "t1<int>" to "t1<t1<int>>"
would add no new entries to .debug_str). Not all names can be readily
reconstructed - so I'm opting the feature out on those, but we could have a
deeper discussion about how to handle them if we wanted to make this a
full-fledged/robust feature (maybe one the DWARF spec suggests/encourages).
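
A rough Python sketch of the consumer side of this, assuming a heavily
simplified DIE model: TypeDie and its template_args list stand in for a
type DIE and its DW_TAG_template_*_parameter children, and the "<...>"
spelling with ", " separators is just one plausible rendering. None of
this is the actual LLDB/GDB/llvm-dwarfdump code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeDie:
    """Hypothetical, simplified model of a type DIE."""
    name: str  # DW_AT_name, without any template argument list
    template_args: List["TypeDie"] = field(default_factory=list)


def rebuild_name(die):
    """Recursively rebuild the full name from the template parameters."""
    if not die.template_args:
        return die.name
    args = ", ".join(rebuild_name(arg) for arg in die.template_args)
    return f"{die.name}<{args}>"


def string_table(dies):
    """Collect the unique DW_AT_name strings reachable from the given DIEs."""
    seen = set()

    def visit(die):
        seen.add(die.name)
        for arg in die.template_args:
            visit(arg)

    for die in dies:
        visit(die)
    return seen


int_die = TypeDie("int")
t1_int = TypeDie("t1", [int_die])
t1_t1_int = TypeDie("t1", [t1_int])

print(rebuild_name(t1_t1_int))
print(sorted(string_table([t1_int])))
print(sorted(string_table([t1_int, t1_t1_int])))
```

The last two lines print the same set of strings: nesting another
instantiation of an existing template adds DIEs but no new names.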

GDB seems to handle this sort of debug info OK - I guess someone did real
work to support that at some point (so maybe some other debugger already
generates DWARF like this).


The other half, though, is DW_AT_linkage_names - and in theory similar
rebuilding could be done, but that'd require baking a lot of
implementation knowledge into the DWARF Consumer that DWARF is meant to
help avoid... so I'm unsure what the right solution is there just now, but
there's a few ideas I'm still kicking around. At least linkage names have
less redundancy (within a single name they avoid redundancy - "t1<t1<int>,
t1<int>>" only ends up with a single description of "t1<int>" instead of
two of them like you get with the DW_AT_name) than DW_AT_names, so they do
scale a bit better already.

Happy to discuss these ideas specifically, or their impact on .debug_str
growth in more detail, any time (here, video chat, discords, etc).

- Dave