Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-10-16 Thread Greg Clayton via Dwarf-discuss

> On Oct 16, 2023, at 9:12 AM, David Blaikie via Dwarf-discuss wrote:
>
> On Mon, Oct 16, 2023 at 8:57 AM Alexander Yermolovich <ayerm...@meta.com> wrote:
>> For background llvm discussion on how to implement it: 
>> https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445
>> 
>> Thanks for explaining the issue, and proposing spec change. 
>> The question I have. Is non-bit identical TUs with the same hash a 
>> fundamental issue that needs to be addressed somehow in the next version of 
>> the spec? If we could have such guarantee that should simplify things quite 
>> a bit. The linker can just follow the same path as for functions. Compiler 
>> can generate symbol name unique for the type unit hash. So, when linker 
>> comdats TU sections entries in TU list will point to correct address and no 
>> special logic is needed for tombstone. I guess there is a hashing mechanism 
>> in DWARF spec, but LLVM is not using it. Should we go back to it, is it 
>> enough?
> 
> The hashing mechanism in the spec doesn't guarantee bit-identicality, I 
> believe. It's structural equivalence (eg: if you produce the main type DIE 
> followed by an int DIE that the main type needs, or you emit the int DIE 
> first, followed by the main type DIE - these hash to the same value (because 
> you start from the type DIE and hash outwards/to what it can reach, and has 
> structural equivalence - int is int, no matter what offset it's at)) not bit 
> identical. For a bunch of reasons this is preferable.
> 
> (yes, clang takes this further and hashes based on the C++ ODR - which is 
> off-spec, but workable in our experience)
> 
> I was thinking another direction we could go is that, I think, the only 
> things in a type unit that can be referenced is the type (I think?) then 
> perhaps we could modify how types defined in type units are referenced.
> 
> If only the type can be referenced in a type unit, we could emit a 
> .debug_names entry without a DW_IDX_die_offset - just the DW_IDX_type_unit - 
> and the consumer can use the header of the type unit to find the exact type 
> unit DIE.
> 
> Are there any other things that could be referenced within a type unit?

LLDB will want access to any types contained within the type units. Many 
classes contain type definitions within the class itself. CUs that want to 
access these types can't reference them in the type unit, so they have to 
duplicate the entire declaration context for the type (the containing 
namespaces and the class itself with a DW_AT_declaration(true) attribute) and 
then create a copy of the contained type if it is simple. 

For example every STL class defines all sorts of "iterator", "const_iterator", 
"reverse_iterator", "size_type", "pointer_type", "reference_type", etc inside 
of the class. If no variables from a CU references these types, then we won't 
have access to them if we only add the main type unit type to the .debug_names 
table. 

So it is correct that the only thing that can be referenced in a type unit is 
the main type itself from a DWARF perspective, but it would be a shame if no 
debugger clients can use any of the extra types in the type units unless they 
are directly referenced (and duplicated) in a CU. 

LLDB notes which CUs and TUs have an entry in the .debug_names table and it 
will manually index any that didn't have entries. If the .debug_names tables 
end up only emitting the main type unit type, we will need to manually index 
each TU to make sure we have access to contained types. 

So I would vote to completely index each TU if possible.
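
As a rough illustration of the fallback described above, here is a minimal Python sketch of how a consumer might decide which units still need manual indexing. It is not LLDB's actual code: `Unit`, `units_needing_manual_index`, and the `tu_fully_indexed` flag are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical stand-in for a CU or TU found in .debug_info."""
    offset: int          # offset of the unit header in .debug_info
    is_type_unit: bool


def units_needing_manual_index(all_units, indexed_offsets, tu_fully_indexed):
    """Return the units a consumer would still have to scan by hand.

    indexed_offsets: offsets of units that some .debug_names table covers.
    tu_fully_indexed: assumption flag; False models tables that only name
    the main type of each TU, so nested types are missing from the index.
    """
    pending = []
    for unit in all_units:
        if unit.offset not in indexed_offsets:
            # No table mentions this unit at all.
            pending.append(unit)
        elif unit.is_type_unit and not tu_fully_indexed:
            # Covered, but only for the main type: nested types such as
            # "iterator" or "size_type" would still be invisible.
            pending.append(unit)
    return pending
```

With fully indexed TUs only the uncovered units are rescanned; with main-type-only entries every TU ends up on the manual list, which is the cost argued against above.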

>  
>> 
>> Alex
>>
>> From: David Blaikie <dblai...@gmail.com>
>> Sent: Monday, September 25, 2023 9:02 AM
>> To: Alexander Yermolovich <ayerm...@meta.com>
>> Cc: dwarf-discuss@lists.dwarfstd.org
>> Subject: Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections
>>  
>> 
>> 
>> On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
>> Dwarf-discuss <dwarf-discuss@lists.dwarfstd.org> wrote:
>> >
>> > Hello
>> >
>> > I am trying to enable debug names acceleration table with 
>> > fdebug-types-sections in LLVM. One part I am not sure about is the local 
>> > TU list. It contains an offset into .debug_info section. All the entries 
>> > have an index entry that points to the local TU list. DIEs within entry 
>> > offsets are relative to the TU entry.

Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-10-16 Thread David Blaikie via Dwarf-discuss
On Mon, Oct 16, 2023 at 9:12 AM David Blaikie wrote:

> On Mon, Oct 16, 2023 at 8:57 AM Alexander Yermolovich wrote:
>
>> For background llvm discussion on how to implement it:
>>
>> https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445
>>
>> Thanks for explaining the issue, and proposing spec change. 
>> The question I have. Is non-bit identical TUs with the same hash a
>> fundamental issue that needs to be addressed somehow in the next version of
>> the spec? If we could have such guarantee that should simplify things quite
>> a bit. The linker can just follow the same path as for functions. Compiler
>> can generate symbol name unique for the type unit hash. So, when linker
>> comdats TU sections entries in TU list will point to correct address and no
>> special logic is needed for tombstone. I guess there is a hashing mechanism
>> in DWARF spec, but LLVM is not using it. Should we go back to it, is it
>> enough?
>>
>
> The hashing mechanism in the spec doesn't guarantee bit-identicality, I
> believe. It's structural equivalence (eg: if you produce the main type DIE
> followed by an int DIE that the main type needs, or you emit the int DIE
> first, followed by the main type DIE - these hash to the same value
> (because you start from the type DIE and hash outwards/to what it can
> reach, and has structural equivalence - int is int, no matter what offset
> it's at)) not bit identical. For a bunch of reasons this is preferable.
>
> (yes, clang takes this further and hashes based on the C++ ODR - which is
> off-spec, but workable in our experience)
>
> I was thinking another direction we could go is that, I think, the only
> things in a type unit that can be referenced is the type (I think?) then
> perhaps we could modify how types defined in type units are referenced.
>
> If only the type can be referenced in a type unit, we could emit a
> .debug_names entry without a DW_IDX_die_offset - just the DW_IDX_type_unit
> - and the consumer can use the header of the type unit to find the exact
> type unit DIE.
>
> Are there any other things that could be referenced within a type unit?
>

Hmm - this doesn't really relate to the local TU tombstoning issue (since
that's tombstoning the reference to the unit), but it would help address the
foreign TU situation: in the DWP case, a consumer could ignore the
DW_IDX_compile_unit alongside the DW_IDX_type_unit, look up that type in
the DWP, and then use the type unit's header to find the type DIE.
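
To make "use the type unit's header to find the type DIE" concrete, here is a minimal Python sketch. It assumes 32-bit DWARF v5 and little-endian data, and the function name `type_die_offset` is made up for the example. The field layout (unit_length, version, unit_type, address_size, debug_abbrev_offset, type_signature, type_offset) follows the DWARF v5 type unit header, where type_offset is relative to the start of that header.

```python
import struct

# Unit type codes from DWARF v5.
DW_UT_type = 0x02
DW_UT_split_type = 0x06

# 32-bit DWARF v5 type unit header, little-endian, no padding:
# unit_length(4) version(2) unit_type(1) address_size(1)
# debug_abbrev_offset(4) type_signature(8) type_offset(4) = 24 bytes
_TU_HEADER = struct.Struct("<IHBBIQI")


def type_die_offset(debug_info: bytes, tu_offset: int) -> int:
    """Return the section offset of the type DIE named by a TU header."""
    (unit_length, version, unit_type, _address_size,
     _abbrev_offset, _signature, type_offset) = _TU_HEADER.unpack_from(
        debug_info, tu_offset)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    if version != 5 or unit_type not in (DW_UT_type, DW_UT_split_type):
        raise ValueError("not a DWARF v5 type unit header")
    # type_offset is relative to the beginning of the type unit header.
    return tu_offset + type_offset
```

A consumer following this idea would need only the unit reference from the index entry; the DIE offset comes from the header itself.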
-- 
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss


Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-10-16 Thread Alexander Yermolovich via Dwarf-discuss
For background, here is the LLVM discussion on how to implement it:
https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445

Thanks for explaining the issue and proposing a spec change. 
My question is: are non-bit-identical TUs with the same hash a fundamental 
issue that needs to be addressed somehow in the next version of the spec? If we 
could guarantee that TUs with the same hash are bit-identical, that should 
simplify things quite a bit. The linker could just follow the same path as for 
functions. The compiler can generate a symbol name unique to the type unit 
hash, so when the linker COMDATs the TU sections, the entries in the TU list 
will point to the correct address and no special tombstone logic is needed. I 
guess there is a hashing mechanism in the DWARF spec, but LLVM is not using it. 
Should we go back to it, and is it enough?

Alex

From: David Blaikie 
Sent: Monday, September 25, 2023 9:02 AM
To: Alexander Yermolovich 
Cc: dwarf-discuss@lists.dwarfstd.org 
Subject: Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
Dwarf-discuss wrote:
>
> Hello
>
> I am trying to enable debug names acceleration table with 
> fdebug-types-sections in LLVM. One part I am not sure about is the local TU 
> list. It contains an offset into .debug_info section. All the entries have an 
> index entry that points to the local TU list. DIEs within entry offsets are 
> relative to the TU entry.
>
> Linker de-duplicates Type Units using COMDAT. So, the final result will have 
> less type units. As the result Local Type Unit List will be invalid, and all 
> the Entries that point to that TU will not be valid either. Even if we Linker 
> is modified so that somehow when it de-duplicates type sections Local Type 
> Units will get the right offset, that still leaves all the duplicate entries.
> Am I missing something in that linker, specifically LLD, will need to be 
> aware of context of .debug_names sections when it de-duplicates type sections?
>
>
> It seems to me that to fully support it .debug_names need to be created by 
> post build tool (or by linker).
>
> Thanks.

While DWARF consumers will benefit from a content-aware linking of
.debug_names (using one hash table is more efficient than probing
hundreds/thousands of small hash tables), I don't believe the spec
as-is requires that for correctness.

In the case of type units, I'd expect behavior somewhat similar to how
linkers behave with inline functions - if the two copies of the
function are identical, it's possible that the linker will resolve all
relocations to the function to the single copy that remains after
linking (so two CUs would both describe the inline function "f1" and
both descriptions would have the same start address/length, the two
CUs' CU-level DW_AT_ranges would overlap/both contain that function's
addresses - and neither would use the tombstone address). So in that
case, all the duplicate index entries would remain valid (their
TU-relative offsets would be correct - since the TUs were bit-wise
identical, so the offsets still point to the same things).

In the case where a producer produces equivalent but not
bitwise-identical TUs, the linker will choose one, drop the rest, and
use the tombstone value to resolve the relocation used in the local
TUs offset list. A consumer should ignore any entries that reference a
tombstone offset in the local TU list (& probably wouldn't hurt to use
the same code and ignore any tombstoned CUs too - I can't immediately
think of a situation/reason that'd happen, but seems like a good
general idea)

If a consumer does a semantically aware merge of the indexes, then it
should discard (rather than tombstoning) the index entries that
reference dead TUs and the dead TUs in the local TU list itself, and
also discard any duplicate index entries and duplicate elements in the
local TU list.

We could document the use of the tombstone in this context.


Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-09-25 Thread David Blaikie via Dwarf-discuss
On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
Dwarf-discuss wrote:
>
> Hello
>
> I am trying to enable debug names acceleration table with 
> fdebug-types-sections in LLVM. One part I am not sure about is the local TU 
> list. It contains an offset into .debug_info section. All the entries have an 
> index entry that points to the local TU list. DIEs within entry offsets are 
> relative to the TU entry.
>
> Linker de-duplicates Type Units using COMDAT. So, the final result will have 
> less type units. As the result Local Type Unit List will be invalid, and all 
> the Entries that point to that TU will not be valid either. Even if we Linker 
> is modified so that somehow when it de-duplicates type sections Local Type 
> Units will get the right offset, that still leaves all the duplicate entries.
> Am I missing something in that linker, specifically LLD, will need to be 
> aware of context of .debug_names sections when it de-duplicates type sections?
>
>
> It seems to me that to fully support it .debug_names need to be created by 
> post build tool (or by linker).
>
> Thanks.

While DWARF consumers will benefit from a content-aware linking of
.debug_names (using one hash table is more efficient than probing
hundreds/thousands of small hash tables), I don't believe the spec
as-is requires that for correctness.

In the case of type units, I'd expect behavior somewhat similar to how
linkers behave with inline functions - if the two copies of the
function are identical, it's possible that the linker will resolve all
relocations to the function to the single copy that remains after
linking (so two CUs would both describe the inline function "f1" and
both descriptions would have the same start address/length, the two
CUs' CU-level DW_AT_ranges would overlap/both contain that function's
addresses - and neither would use the tombstone address). So in that
case, all the duplicate index entries would remain valid (their
TU-relative offsets would be correct - since the TUs were bit-wise
identical, so the offsets still point to the same things).

In the case where a producer produces equivalent but not
bitwise-identical TUs, the linker will choose one, drop the rest, and
use the tombstone value to resolve the relocation used in the local
TUs offset list. A consumer should ignore any entries that reference a
tombstone offset in the local TU list (& probably wouldn't hurt to use
the same code and ignore any tombstoned CUs too - I can't immediately
think of a situation/reason that'd happen, but seems like a good
general idea)
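
A minimal Python sketch of the consumer-side rule in the paragraph above: skip any index entry whose local TU list slot holds the tombstone. The tombstone value (all ones for 32-bit DWARF) and the flattened `(name, tu_index, die_offset)` entry shape are assumptions for illustration; nothing in this thread fixes either.

```python
# Assumed tombstone for a 32-bit offset; the actual value depends on the
# linker and is not specified in this discussion.
TOMBSTONE = 0xFFFFFFFF


def live_entries(local_tu_offsets, entries, tombstone=TOMBSTONE):
    """Yield (name, absolute_die_offset) for entries whose TU survived.

    local_tu_offsets: .debug_info offsets from the local TU list.
    entries: iterable of (name, tu_index, die_offset), where die_offset
    is relative to the start of the referenced TU.
    """
    for name, tu_index, die_offset in entries:
        tu_offset = local_tu_offsets[tu_index]
        if tu_offset == tombstone:
            # The linker dropped this copy of the TU; ignore the entry.
            continue
        yield name, tu_offset + die_offset
```

The same check could be reused for the CU list, as suggested above, by passing CU offsets and CU-indexed entries instead.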

If a consumer does a semantically aware merge of the indexes, then it
should discard (rather than tombstoning) the index entries that
reference dead TUs and the dead TUs in the local TU list itself, and
also discard any duplicate index entries and duplicate elements in the
local TU list.
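
Here is a minimal Python sketch of such a merge, under the same assumptions as before (an all-ones tombstone and simplified `(name, tu_index, die_offset)` entries). `merge_indexes` is a hypothetical helper, not an existing LLD or LLVM API. It drops entries that reference dead TUs, keeps one slot per surviving TU, and removes duplicate entries.

```python
def merge_indexes(indexes, tombstone=0xFFFFFFFF):
    """Merge per-object name indexes into one TU list and one entry list.

    indexes: list of (local_tu_offsets, entries) pairs, one per input
    index; entries are (name, tu_index, die_offset) tuples.
    Returns (merged_tu_offsets, merged_entries), with tu_index remapped
    into the merged TU list.
    """
    merged_tus = []        # surviving TU offsets, first-seen order
    tu_slot = {}           # TU offset -> index in merged_tus
    merged_entries = []
    seen = set()
    for local_tus, entries in indexes:
        for name, tu_index, die_offset in entries:
            tu_offset = local_tus[tu_index]
            if tu_offset == tombstone:
                continue   # discard entries that reference dead TUs
            if tu_offset not in tu_slot:
                tu_slot[tu_offset] = len(merged_tus)
                merged_tus.append(tu_offset)
            key = (name, tu_slot[tu_offset], die_offset)
            if key not in seen:    # discard duplicate index entries
                seen.add(key)
                merged_entries.append(key)
    return merged_tus, merged_entries
```

In this sketch a TU that no entry references never reaches the merged list; a real tool would have to decide whether to keep such units.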

We could document the use of the tombstone in this context.


[Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-09-15 Thread Alexander Yermolovich via Dwarf-discuss
Hello

I am trying to enable the .debug_names acceleration table with 
-fdebug-types-section in LLVM. One part I am not sure about is the local TU 
list. It contains offsets into the .debug_info section. Every entry has an 
index that points into the local TU list, and the DIE offset within an entry 
is relative to that TU.

The linker de-duplicates type units using COMDAT, so the final result will 
have fewer type units. As a result, the local type unit list will be invalid, 
and all the entries that point to a removed TU will not be valid either. Even 
if the linker is modified so that the local type unit list somehow gets the 
right offsets when type sections are de-duplicated, that still leaves all the 
duplicate entries.
Am I missing something, or will the linker, specifically LLD, need to be aware 
of the contents of the .debug_names sections when it de-duplicates type 
sections?

It seems to me that, to fully support this, .debug_names needs to be created 
by a post-build tool (or by the linker).


Thanks.
