Re: [Dwarf-discuss] Question on "pascal property"

2024-05-01 Thread David Blaikie via Dwarf-discuss
I'd be curious what compilers have been doing for Pascal for the last few
decades - not exactly a new problem? But, yeah, might've all been getting
by with extensions.

On Wed, May 1, 2024 at 2:47 PM Adrian Prantl via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Just a quick very general remark: For Objective-C, Clang (and presumably
> also GCC) are producing a couple of property-related attributes in the
> DW_AT_APPLE extension space
>
>
> https://github.com/llvm/llvm-project/blob/505f6da1961ab55c601d7239648c53ce863b5d70/llvm/include/llvm/BinaryFormat/Dwarf.def#L630
>
> maybe these are useful for you. I also wouldn't be opposed to
> standardizing a more generally useful mechanism for properties; we could
> potentially also find use for them in the Swift compiler.
>
> -- adrian
>
> On May 1, 2024, at 4:24 AM, Martin via Dwarf-discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> Hello,
>
> I am writing the below to see if any of this might be of interest to add
> new tags/attributes to the Dwarf spec.
>
> If not, or for the parts for which it is a no, then I am happy to use
> vendor extensions.
>
>
> In Pascal there is a construct called property.
> https://www.freepascal.org/docs-html/ref/refse27.html
>
> To the user (and to any code accessing it), it behaves like a variable in
> that it can be read or assigned to. But it cannot have its address taken.
> (The last bit may be ignorable for the rest of this email.)
>
> In the most basic form it provides a setter and getter, which can either
> be a field or method. Setter and getter are always references to existing
> fields/methods.
>
> In order to describe this to a debugger, I can see 2 features that could
> be added to dwarf
>
>
> 1) Probably a new DW_TAG_PROPERTY
>
> Albeit, it might be possible to sub-divide existing DW_TAG_Variable /
> DW_TAG_Member into sub-sections, but I don't think that is advisable.
>
> The tag could then be sub-divided, as it needs the same info for different
> accessors. DW_TAG_PROPERTY_ where  is
>
> - "reader" => description on how to get the value
> - "setter" => description on how to write the value
> - "default" => how to get the default value, if one exists
> - "stored" => information if this value should be serialized, if the
> object is serialized (may depend on its value, can be a function)
> - user/other
>
> Alternatively to dividing it into accessor tags, there could be just
> attributes for each method of access. Then however the below idea to allow
> direct variable/member tags will not work.
>
>
> The property would also be able to have attributes directly (not in sub
> sections), like virtuality.
>
> And it could be without sub-sections, if it just changed virtuality. This
> is common in Pascal, that a sub-class makes an inherited property more
> visible.
>
>
> Maybe there would be the question of whether existing variable/member could
> benefit from any of that information. (I don't know a case where it would be
> needed). If there was, then the variable/member would be encoded as it
> currently is, but a DW_TAG_PROPERTY_WRITE[R] could be added, to override
> the handling of how it is written. Just a thought.
>
>
> 2) Forwarding to the existing member/method/function
>
> As indicated in Pascal the getter/setter are always a reference to an
> existing field/getter. Though defaults/stored can be a constant too.
> And to make it more flexible, it might be considered to allow embedding
> DW_TAG_Variable / DW_TAG_Member directly?
>
> In Pascal the reference can be
> - reference to a field or method
> - references to a field or method in a different object (needs
> specification of the target obj)
> - reference to a function (not part of an object/structure)
>
> So the accessors could have a simple tag
>
> For forwarding they would need a new DW_AT_FORWARD or DW_AT_ALIAS (better
> names may exist) attribute.
>
> References already exist. But they only reference the TAG describing the
> value. They can't hold an object (e.g. for DW_OP_push_object_address).
>
> So there would also be the need for DW_AT_FORWARD_OBJECT.
>
> I don't know if there are cases where such an ref-with-object would be
> interesting to existing encoding (such as when a ref is used to get the
> bounds of an array)
>
>
>
> If the forwarder is to a function, then there is a need to know how to
> call it. For a getter a function (or method, only taking the _this
> instance), the "how to call" is implicit.
>
> For a setter, there needs to be an extra argument, to pass the value. This
> could be implicit, if it is the only argument (other than maybe _this).
>
>
> Pascal has properties, that can share a getter and share a setter. The
> getter/setter then takes an index (of any type).
> For this a description would be needed, how to call the function.
>
> DW_TAG_CALL_ARGUMENT_LIST with a list of DW_AT_CALL_ARGUMENT. (that could
> be const, ref, expr), or Special values for _this and for "the value".
>

Re: [Dwarf-discuss] Coroutines

2024-02-20 Thread David Blaikie via Dwarf-discuss
Not that I'm aware of - there might be some existing features that are used
by languages that've had coroutines longer than C++? Not sure.

I think LLVM does produce some DWARF for coroutines, but I haven't looked
at it/seen how viable it is/how much consumers would need to know really
specific details of the producer's strategy (if it's pretty generic, maybe
we don't have to standardize anything - if the consumer has to be really
aware, then maybe we can standardize some things to help ensure that
producer/consumer contract is generally usable/more portable).

On Tue, Feb 20, 2024 at 11:48 AM Kyle Huey via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Has anyone proposed or discussed DWARF structures to represent coroutines
> or similar constructs? I skimmed the open issues list and didn't see
> anything relevant.
>
> - Kyle
> --
> Dwarf-discuss mailing list
> Dwarf-discuss@lists.dwarfstd.org
> https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
>


Re: [Dwarf-discuss] Proposal: Allow padding in all tables

2024-01-31 Thread David Blaikie via Dwarf-discuss
Fair enough - alignment is mentioned in several cases, perhaps that could
be omitted? If that's not usually the need/intent?

On Wed, Jan 31, 2024 at 6:54 AM Robinson, Paul 
wrote:

> This proposal is guidance for the producer, not the linker. The producer
> needs this guidance specifically because linkers don’t pad/align
> contributions.
>
>
>
> I believe padding is rarely a functional requirement, and when it is, it’s
> not for alignment IME. This is where the line-table padding came from,
> allowing elbow room to replace a function’s line table without having to
> update references to other contributions. (Motivating examples include JIT
> (re-)compilation and incremental linking.)
>
>
>
> Padding for alignment, which is generally for performance or convenience
> and which I have run into in past years (pre-LLVM), must not confuse
> dumpers (which would be inclined to interpret padding bytes as the next
> header); therefore the padding bytes have to be interpretable.
>
>
>
> I think if we’re going to mention padding (which we already do in six of
> the ten non-string-section cases described below) we should be complete
> about it, hence this proposal. I’m not especially excited about the
> .debug_macro case, but as we failed to give that section a header with a
> length, we have to live with the consequences.
>
>
>
> If you think padding should never be mentioned (and so anyone who feels
> moved to provide padding has to re-invent the wheel), feel free to write a
> counter-proposal removing the existing mentions.
>
> --paulr
>
>
>
> *From:* David Blaikie 
> *Sent:* Tuesday, January 30, 2024 6:01 PM
> *To:* Robinson, Paul 
> *Cc:* dwarf-discuss@lists.dwarfstd.org
> *Subject:* Re: [Dwarf-discuss] Proposal: Allow padding in all tables
>
>
>
> Is anyone actually using this? In my experience linkers are generally
> concatenating these sections together with no extra padding/alignment.
>
> I'd rather not spec something that's not used/needed. I'm happy for
> consumers to be improved in the face of degenerate entries that might be
> created for padding if developers of such consumers feel so inclined
> (though I'd probably push back a bit on it in the consumers I work on - in
> the absence of any evidence of particular need/use case).
>
>
>
> On Thu, Jan 18, 2024 at 11:08 AM Robinson, Paul via Dwarf-discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> # Allow padding in all tables
>
> Enhancement; multiple sections.
>
> ## Background
>
> Issue 230329.1 requires all tables to be contiguous. During the discussion
> of that issue, the question came up of whether all tables allowed padding,
> so that contiguous concatenated contributions could be aligned reasonably.
> This is the result of my research.
>
> ## Overview
>
> The set of tables (merging the two tables from 230329.1) is as follows:
>
> - .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
> - .debug_aranges (Section 6.1.2)
> - .debug_addr (Section 7.27)
> - .debug_frame (Section 6.4.1)
> - .debug_info / .debug_info.dwo (Section 7.5.1)
> - .debug_line / .debug_line.dwo  (Section 6.2.4)
> - .debug_line_str
> - .debug_loclists / .debug_loclists.dwo (Section 7.29)
> - .debug_macro / .debug_macro.dwo (Section 6.3.1)
> - .debug_names (Section 6.1.1)
> - .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
> - .debug_str / .debug_str.dwo
> - .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)
>
> ### .debug_abbrev
>
> Entries have arbitrary size. Can be padded by adding an unused abbrev
> entry. Proposing a non-normative paragraph describing this.
>
> ### .debug_aranges
>
> Removed by 220724.1.
>
> ### .debug_addr
>
> Entries have a size of (segment_selector_size + address_size) and don't
> explicitly provide a padding mechanism. Adding unused entries at the end of
> the table should suffice. Proposing a non-normative paragraph describing
> this.
>
> ### .debug_frame
>
> Already permits padding by use of DW_CFA_nop.
>
> ### .debug_info
>
> Already permits padding by use of the abbreviation code 0 (see Section
> 7.5.2).
>
> ### .debug_line
>
> Already has DW_LNE_padding.
>
> ### .debug_line_str
>
> This is a string section and does not need padding (typically would be
> merged, not concatenated).
>
> ### .debug_loclists
>
> Already permits padding by use of repeated DW_LLE_end_of_list, with a
> non-normative comment to that effect.
>
> ### .debug_macro
>
> This has no unit_length and no explicit provision for padding. One could
> insert unused opcodes into the opcode_operands_table but this seems like
> quite a hack. In keeping with other sections, I'm proposing a
> DW_MACRO_padding opcode.
>
> ### .debug_names
>
> Components are mostly 4- or 8-byte multiples, except for the abbreviation
> table. The abbreviation table explicitly permits padding (Section
> 6.1.1.4.7).
>
> ### .debug_rnglists
>
> Already permits padding by use of repeated DW_RLE_end_of_list, with a
> non-normative comment to that effect.
>
> ### .debug_str
>
> This is a string section 

Re: [Dwarf-discuss] Proposal: Allow padding in all tables

2024-01-30 Thread David Blaikie via Dwarf-discuss
Is anyone actually using this? In my experience linkers are generally
concatenating these sections together with no extra padding/alignment.

I'd rather not spec something that's not used/needed. I'm happy for
consumers to be improved in the face of degenerate entries that might be
created for padding if developers of such consumers feel so inclined
(though I'd probably push back a bit on it in the consumers I work on - in
the absence of any evidence of particular need/use case).

On Thu, Jan 18, 2024 at 11:08 AM Robinson, Paul via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> # Allow padding in all tables
>
> Enhancement; multiple sections.
>
> ## Background
>
> Issue 230329.1 requires all tables to be contiguous. During the discussion
> of that issue, the question came up of whether all tables allowed padding,
> so that contiguous concatenated contributions could be aligned reasonably.
> This is the result of my research.
>
> ## Overview
>
> The set of tables (merging the two tables from 230329.1) is as follows:
>
> - .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
> - .debug_aranges (Section 6.1.2)
> - .debug_addr (Section 7.27)
> - .debug_frame (Section 6.4.1)
> - .debug_info / .debug_info.dwo (Section 7.5.1)
> - .debug_line / .debug_line.dwo  (Section 6.2.4)
> - .debug_line_str
> - .debug_loclists / .debug_loclists.dwo (Section 7.29)
> - .debug_macro / .debug_macro.dwo (Section 6.3.1)
> - .debug_names (Section 6.1.1)
> - .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
> - .debug_str / .debug_str.dwo
> - .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)
>
> ### .debug_abbrev
>
> Entries have arbitrary size. Can be padded by adding an unused abbrev
> entry. Proposing a non-normative paragraph describing this.
>
> ### .debug_aranges
>
> Removed by 220724.1.
>
> ### .debug_addr
>
> Entries have a size of (segment_selector_size + address_size) and don't
> explicitly provide a padding mechanism. Adding unused entries at the end of
> the table should suffice. Proposing a non-normative paragraph describing
> this.
>
> ### .debug_frame
>
> Already permits padding by use of DW_CFA_nop.
>
> ### .debug_info
>
> Already permits padding by use of the abbreviation code 0 (see Section
> 7.5.2).
>
> ### .debug_line
>
> Already has DW_LNE_padding.
>
> ### .debug_line_str
>
> This is a string section and does not need padding (typically would be
> merged, not concatenated).
>
> ### .debug_loclists
>
> Already permits padding by use of repeated DW_LLE_end_of_list, with a
> non-normative comment to that effect.
>
> ### .debug_macro
>
> This has no unit_length and no explicit provision for padding. One could
> insert unused opcodes into the opcode_operands_table but this seems like
> quite a hack. In keeping with other sections, I'm proposing a
> DW_MACRO_padding opcode.
>
> ### .debug_names
>
> Components are mostly 4- or 8-byte multiples, except for the abbreviation
> table. The abbreviation table explicitly permits padding (Section
> 6.1.1.4.7).
>
> ### .debug_rnglists
>
> Already permits padding by use of repeated DW_RLE_end_of_list, with a
> non-normative comment to that effect.
>
> ### .debug_str
>
> This is a string section and does not need padding (typically would be
> merged, not concatenated).
>
> ### .debug_str_offsets
>
> This has a header of 8 or 16 bytes, and entries of 4 or 8 bytes. This can
> still require padding if you want alignment greater than 4 bytes, and there
> is no explicit provision. Proposing a non-normative paragraph describing
> this.
>
> ### Conclusion
>
> Everything is already covered except .debug_abbrev, .debug_addr,
> .debug_str_offsets, and .debug_macro. The first three need non-normative
> notes describing how to pad the sections, and .debug_macro requires a new
> opcode to introduce padding cleanly.
>
> ## Proposed Changes
>
> I sorted these by affected section. In addition to the section-specific
> changes there is one general note.
>
> ### .debug_abbrev
>
> In Section 7.5.3 "Abbreviations Tables" (p.207), at the end of the
> section, add a new non-normative paragraph:
>
> *This table may be padded by adding an unused abbreviation entry. The
> minimum number of bytes in an abbreviation entry is four (abbreviation
> number, child flag, and two 0 bytes indicating the end of the
> attribute/form pairs). This can be expanded by choosing a large
> abbreviation number with a longer LEB128 encoding, or adding non-zero
> attribute/form pairs.*
>
> ### .debug_macro
>
> Add new Section 6.3.4 "Other Entries" (~ p.170) as follows:
>
> 1. DW_MACRO_padding
>    The DW_MACRO_padding opcode takes two operands, a byte count and a
>    sequence of arbitrary bytes. The byte count is an unsigned LEB128
>    encoded number and does not include the size of the opcode or the
>    byte count operand. The opcode and operands have no effect on the
>    macro information.
>
>    *This permits a producer to pad the macro information with a minimum
>    of two bytes.*
>
> ### 

Re: [Dwarf-discuss] Enhancement: DWARF Extension Registry

2023-12-01 Thread David Blaikie via Dwarf-discuss
On Fri, Dec 1, 2023 at 1:43 PM David Anderson via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> On 12/1/23 05:24, Ben Woodard via Dwarf-discuss wrote:
> > My reasoning is that the reason why we are running out of vendor defined
> > space is that within the various vendor spaces the encoding space is
> > consumed by legacy extensions that:
> > 1) were never implemented publicly
> > 2) were implemented but are no longer in use because the compilers that
> > generated them have been abandoned
> > 3) were in use but have been incorporated into the standard version of
> > DWARF.
> >
> > I feel like clearing those out by drawing a line in the sand and saying
> > that extensions which existed in previous versions of DWARF do not
> > necessarily mean the same thing once the new version of DWARF is
> > released, should clear out the legacy cruft such that there should be
> > sufficient encoding space for new producer extensions.
> >
>
> While clearing-out of attributes etc that were never implemented
> makes sense,  I think the rest of this goes way too far in
> re-using things.   There is a distinct danger of making
> it impossible for a consumer to read DWARF3 once DWARF6 is complete.
> That seems to me to be a bad idea. Unappealing.
>

Not sure I follow this - you could still read DWARF3 as DWARF3 no matter
what changes in DWARF6, I think? Could you flesh out what you're thinking
here/how DWARF6 completion could (if we took some of these suggestions)
cause DWARF3 to be impossible to consume?


Re: [Dwarf-discuss] Enhancement: DWARF Extension Registry

2023-12-01 Thread David Blaikie via Dwarf-discuss
>
> 5. Re 1.3.13: Blaming "reduced tool compatibility" on the skipability of
> unknown constructs seems a huge and unjustified claim. I don't buy it.
>
> In most cases, I can see your point but in this case I cannot. To me this
> seems obvious.
>
> Of course a consumer should be able to skip over something that it doesn't
> understand. However, if a consumer wants to achieve full compatibility with
> the producer, it needs to be able to interpret those constructs. It can't
> skip over them. That seems unquestionably obvious to me. So that the
> consumer's author can write the code that allows the consumer to interpret
> these producer specific constructs, the text refers the consumer to the
> registry.
>
I think the point might be, or a related point I would make is that DWARF
Consumer compatibility is complicated (think of DWARF as somewhere between
XML and HTML - there's a lot of "here's some tags and attributes, use them
as you see fit, here's some things we think they might be useful for") -
it's not tightly specified enough to constrain every consumer and producer
to interoperate identically, unlike something like HTML. The diversity of
languages, new language features, etc, is just wider than the DWARF
committee can effectively sign up to precisely describe.

So, while a convenient mechanism to skip unknown constructs certainly
lowers the cost of ignoring them, and thus lowers the cost of
incompatibility, the greater cost of actually implementing the
functionality once a producer understands these constructs seems so much
higher that I don't think skippability significantly changes the costs of
compatibility. If none of these constructs were skippable, we'd need the
registry more than we do, but I think most consumers would just make a
table from the registry so they could parse everything, then keep ignoring
the bits that aren't relevant to their use case.
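
To illustrate the "parse everything, then ignore what isn't relevant" point, here is a minimal sketch, assuming a consumer that only needs to know the size of each form to step over attributes it does not interpret. The form codes are the standard DWARF values for a handful of forms; the function names and the reduced form table are illustrative assumptions, not any real consumer's API.

```python
# Sketch: step over attribute values using only their form, without
# understanding the attribute itself. Only a few forms are covered.

FIXED_SIZE_FORMS = {
    0x0b: 1,  # DW_FORM_data1
    0x05: 2,  # DW_FORM_data2
    0x06: 4,  # DW_FORM_data4
    0x07: 8,  # DW_FORM_data8
    0x19: 0,  # DW_FORM_flag_present
}
DW_FORM_udata = 0x0f
DW_FORM_string = 0x08


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7f) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def skip_form(data, pos, form):
    """Return the offset just past one attribute value of the given form."""
    if form in FIXED_SIZE_FORMS:
        return pos + FIXED_SIZE_FORMS[form]
    if form == DW_FORM_udata:
        _, pos = read_uleb128(data, pos)
        return pos
    if form == DW_FORM_string:
        return data.index(0, pos) + 1  # skip the NUL terminator as well
    raise ValueError(f"form {form:#x} not handled in this sketch")


def collect_known(data, attr_specs, known_attrs):
    """Walk (attr, form) pairs; keep raw bytes of known attrs, skip the rest."""
    pos = 0
    found = {}
    for attr, form in attr_specs:
        end = skip_form(data, pos, form)
        if attr in known_attrs:
            found[attr] = bytes(data[pos:end])
        pos = end
    return found, pos
```

The skipping itself is cheap once the form table exists; the expensive part, as argued above, is doing something useful with an extension once it is recognized.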


> 6. Re 6.1.1.2 and 6.2.4.2: Why single these out? This is no different
> than any other producer-defined extension and should be covered adequately
> by the general discussion in 7.1
>
> There was specific text in those sections which seemed to need to be
> changed.
>
>
> 7. Re 7.1: There seems to be text missing between "where there is a
> vendor" and "but it can also".
>
> By tying a producer-defined extension to a particular version of DWARF,
> does that mean that when a new version of DWARF comes out that
> producer must re-register the extension to keep "ownership" of the code?
>
> I would say that, as part of the new DWARF version release process, we
> would ask the producer developers who have registered extensions whether
> those extensions are still meaningful and useful in the context of the new
> DWARF version.
>
> If so, then it gets carried over. However, if some particular extension
> continues to be useful across many versions of DWARF and producers are
> still actively using it, then that extension should be considered for
> standardization.
>
> If the extension has been standardized as part of the new DWARF version
> then that producer specific encoding can be reused in the context of the
> new DWARF version. In other words, producers should prefer the standardized
> way of expressing a concept over a producer specific way.
>
So that's certainly something we haven't done in the past (reusing the
existing extension encoding as a standard encoding) perhaps partly due to
the lack of such a registry - but also the point of the user extension
space is to be unconstrained. While we might've documented the extensions
we know about - this extension encoding space might be used by someone else
in some other way we don't know about (maybe that's a bit theoretical).

Another reason we haven't reused any encodings, even standard ones, is it
complicates consumers - they're probably going to end up needing an
encoding space that covers all the versions of DWARF they support, so
they'd need to remap all the DWARF version-specific encodings into one
wider encoding space that covers all the DWARF versions and encodes each
entity uniquely (alternately: having to check the DWARF version each time
when trying to lookup any attribute would add an extra indirection to every
attribute test, for instance... which would be a bit rough - I expect it
wouldn't outright/catastrophically harm performance, but would probably be
pretty difficult to work with), so I think that's generally why we haven't
reused any encoding space - either in the standard, or in extensions,
because consumers would struggle with it.
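
A minimal sketch of the remapping described above, assuming (purely hypothetically) that one on-disk code were reused with different meanings in two DWARF versions. The attribute names, the code `0x2345`, and the version pairing are invented for illustration and do not come from the DWARF specification.

```python
# Hypothetical: the same on-disk code means different things per version.
# None of these names or codes come from the DWARF specification.
from enum import Enum, auto


class Attr(Enum):
    """Consumer-internal identifiers, unique across all supported versions."""
    VENDOR_FOO = auto()
    STANDARD_BAR = auto()


# (DWARF version, on-disk attribute code) -> internal identifier
REMAP = {
    (5, 0x2345): Attr.VENDOR_FOO,
    (6, 0x2345): Attr.STANDARD_BAR,
}


def normalize(version, code):
    """Translate a version-specific code; None means 'unknown, skip it'."""
    return REMAP.get((version, code))
```

A consumer would either normalize once while parsing each unit, or carry the version along and consult a table like this on every attribute test; the latter is the extra indirection mentioned above.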

But worth looking at that kind of decision and re-evaluating, perhaps. I
think that's probably the core issue being proposed/that'll need to be
discussed here, is "can we reuse encoding space/what's the cost/impact of
that".


Re: [Dwarf-discuss] Question: ETA?

2023-11-13 Thread David Blaikie via Dwarf-discuss
Yeah - I mostly don't mind it taking as long as it takes.

Eleanor - is there something you're particularly interested in/waiting
for/etc? At least for myself, speaking as a clang/llvm debug info
maintainer, I'd be happy to see DWARFv6 prototyped there if you really
want to experiment with new features, for instance. That would come with a
"no stability guarantee" caveat, because the spec isn't finalized - so it'd
only be useful/applicable if you use clang/lldb at matched revisions (or
some arbitrary "safe revisions" even - if clang got out ahead of lldb in
some incompatible ways).

In terms of moving the spec forward faster - if we think it's worth it, I'd
consider more proactively discouraging/sidelining proposals that can be
implemented using existing extension mechanisms & ask that those come back
only with implementation experience as extensions (though I realize this
comes at the cost of consuming more of the extension space, which is a bit
limited in some areas). Prioritize things that create extension spaces and
stuff that's fundamental shifts in the way DWARF is encoded. But I'm not
sure those sort of changes are necessary/a major need right now.

- Dave

On Mon, Nov 13, 2023 at 7:10 AM Robinson, Paul via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Speaking only for myself: Questions about ETA seem reasonable, as the
> interval between v4 and v5 was 6 years 8 months, and it has already been 6
> years 9 months since v5 was published. That said, the committee has never
> worked to a specific timeline.
>
>
>
> There is indeed a fair amount of work left to be done by the committee,
> some of which has had side discussions but not yet been formally proposed.
> My impression (I haven’t tried to verify this) is that the committee took
> longer than usual to get started on this round. Also we spent a fair amount
> of time on organizational issues, which obviously would detract from time
> spent on technical issues. The “change of administration” didn’t help
> either. But I think we are back in the groove.
>
>
>
> Regarding time commitment, we meet one hour every other week, which is not
> significantly different from the two hours per month that we met during
> consideration of the previous two versions. On the other hand, the
> committee is noticeably larger than it used to be, which can mean that
> discussions take longer. Perhaps we should increase the meeting time to get
> through the backlog more efficiently, and make up for lost time.
>
>
>
> --paulr
>
>
>
> *From:* Dwarf-discuss *On Behalf Of *Ben Woodard via Dwarf-discuss
> *Sent:* Sunday, November 12, 2023 10:31 AM
> *To:* Eleanor Bartle 
> *Cc:* dwarf-discuss@lists.dwarfstd.org
> *Subject:* Re: [Dwarf-discuss] Question: ETA?
>
>
>
> I’ve asked this question personally many times directly to members of the
> executive committee. The overall answer seems to be “when we are done”. The
> thing is, there are quite a few proposals sitting in the DWARF issue queue
> that have yet to be discussed AT ALL in the official DWARF committee
> meeting and the current meeting is only one hour every other week. Plus,
> there are a rather large number of additional proposals which are quite
> extensive which are still being discussed outside of the DWARF committee
> meeting and haven’t yet made it to the DWARF issue queue. E.g.
> https://github.com/ccoutant/dwarf-locations which is the
> standardization effort for
> https://www.llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html
> I also have 3 more that I’m incubating which haven’t seen the light of day
> yet because of some of this other work.
>
>
>
> Anyway, before the change in administration, it seemed like we were
> rushing to get DWARF6 out the door with just minor corrections and
> revisions. Now, it seems like DWARF6 is going to have many more significant
> changes in it and it is going to take a while. Personally, I’m quite glad
> for this because I feel as though a lot more work needs to be done. Before
> the change in administration, I felt DWARF6 was being rushed. I would say
> check in again in 6 months and see where we are then.
>
>
>
> In the meantime, the current working draft is available at
> https://snapshots.sourceware.org/dwarfstd/dwarf-spec/ and you can keep an
> eye on the issue queue.
>
>
>
> -ben
>
>
>
> On Nov 12, 2023, at 1:49 AM, Eleanor Bartle via Dwarf-discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> 
>
> Is there any plan for a time to release version 6? If not a time, then a
> condition? Say "2025" or "some time in the next year" or "when no new
> proposals are accepted for three months" or "when two independent
> implementations are fully compliant".
>

Re: [Dwarf-discuss] Question about section .debug_aranges

2023-10-24 Thread David Blaikie via Dwarf-discuss
On Tue, Oct 24, 2023 at 6:09 AM Claudio Eterno via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Hi, I'm taking a look at docs.
> On dwarf-2.0.0.pdf I see at "7.20 Address Range Table"
>

For what it's worth, I'd encourage you to consider other options for
address lookup, as the plan at the moment is to remove this feature in
DWARFv6 ( https://dwarfstd.org/issues/220724.1.html ) - and clang, for
instance, hasn't produced these by default for 10+ years.


Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-10-16 Thread David Blaikie via Dwarf-discuss
On Mon, Oct 16, 2023 at 9:12 AM David Blaikie  wrote:

>
>
> On Mon, Oct 16, 2023 at 8:57 AM Alexander Yermolovich 
> wrote:
>
>> For background llvm discussion on how to implement it:
>>
>> https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445
>>
>> Thanks for explaining the issue, and proposing spec change.
>> The question I have: are non-bit-identical TUs with the same hash a
>> fundamental issue that needs to be addressed somehow in the next version of
>> the spec? If we could have such a guarantee, that should simplify things
>> quite a bit. The linker can just follow the same path as for functions. The
>> compiler can generate a symbol name unique to the type unit hash. So, when
>> the linker comdats TU sections, entries in the TU list will point to the
>> correct address and no special logic is needed for tombstones. I guess
>> there is a hashing mechanism in the DWARF spec, but LLVM is not using it.
>> Should we go back to it, is it enough?
>
> The hashing mechanism in the spec doesn't guarantee bit-identicality, I
> believe. It's structural equivalence (eg: if you produce the main type DIE
> followed by an int DIE that the main type needs, or you emit the int DIE
> first, followed by the main type DIE - these hash to the same value
> (because you start from the type DIE and hash outwards/to what it can
> reach, and has structural equivalence - int is int, no matter what offset
> it's at)) not bit identical. For a bunch of reasons this is preferable.
>
> (yes, clang takes this further and hashes based on the C++ ODR - which is
> off-spec, but workable in our experience)
>
> I was thinking another direction we could go: since, I think, the only
> thing in a type unit that can be referenced is the type (I think?),
> perhaps we could modify how types defined in type units are referenced.
>
> If only the type can be referenced in a type unit, we could emit a
> .debug_names entry without a DW_IDX_die_offset - just the DW_IDX_type_unit
> - and the consumer can use the header of the type unit to find the exact
> type unit DIE.
>
> Are there any other things that could be referenced within a type unit?
>

Hmm - this doesn't really relate to the local TU tombstoning issue (since
that's tombstoning the reference to the unit) but would help address the
foreign TU situation - in the DWP case, a consumer could ignore the
DW_IDX_compile_unit alongside the DW_IDX_type_unit and look up that type in
the DWP - then use the type unit's header to find the type DIE.


[Dwarf-discuss] New Issue: Tombstoning TU entries in .debug_names

2023-10-13 Thread David Blaikie via Dwarf-discuss
(derived from:
https://lists.dwarfstd.org/mailman/private/dwarf-workgroup/2023-October/002444.html
)

# Tombstoning TU entries in `.debug_names`

## Background

Local type unit entries in `.debug_names` reference type units via their
offset in the `.debug_info` section.

Assuming a non-DWARF-aware linker, per 6.1.1.3: "When linking object files
containing per-CU indexes, the linker may choose to concatenate the indexes
as ordinary sections ..."

If a linker does this, it will need to choose what value to resolve a
relocation for the TU offset to, if that TU was discarded. (If the TUs are
byte-identical, the linker might resolve all relocations against the
discarded copies to refer to the remaining copy - but TUs aren't guaranteed
to be byte-identical, and since the offsets in the `.debug_names` table
refer to a specific byte offset in the TU, these entries can't be used if a
different-but-equivalent copy of the TU is the one that's preserved by the
linker.)

To deal with this, the linker would use a tombstone value of some kind to
indicate that the offset couldn't be resolved. By default, this value
might be zero - which is a valid offset within the `.debug_info` section,
so a DWARF consumer could not differentiate an "ignore this index entry"
marker from a valid index entry pointing to a TU at offset zero.

With https://dwarfstd.org/issues/200609.1.html we chose a tombstone value
for other cases (.debug_info referencing into code - in the case of
discarded functions) of "the largest representable value". This seems like
a good foundation for addressing this new, similar case.

## Proposed Changes

Add a paragraph between the first and second/last in 6.1.1.4.3 that reads:

"Any local TU entry with a maximum representable value is considered not
present. Any index entry referencing such a local TU entry should be
ignored."
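
As a rough illustration of what a consumer would do with this rule (a
sketch only: `NameIndex` and the function names are hypothetical, and it
assumes local TU offsets are 4 bytes in a DWARF32 index and 8 bytes in a
DWARF64 index, and that `tu_index` falls in the local part of the TU
list):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest value representable in an offset of the given size.
inline std::uint64_t tombstone_for(unsigned offset_size) {
  assert(offset_size == 4 || offset_size == 8);
  return offset_size == 4 ? 0xffffffffULL : 0xffffffffffffffffULL;
}

// Hypothetical, already-decoded view of one name index.
struct NameIndex {
  unsigned offset_size;                  // 4 (DWARF32) or 8 (DWARF64)
  std::vector<std::uint64_t> local_tus;  // local TU list: .debug_info offsets
};

// True if an index entry whose DW_IDX_type_unit is `tu_index` should be
// ignored because the local TU it names was tombstoned by the linker.
inline bool refers_to_tombstoned_tu(const NameIndex &idx, std::size_t tu_index) {
  assert(tu_index < idx.local_tus.size());
  return idx.local_tus[tu_index] == tombstone_for(idx.offset_size);
}
```

Note that an entry pointing at offset zero stays valid under this rule,
which is the case a zero tombstone could not express.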


Re: [Dwarf-discuss] [DWARF5] .debug_names + fdebug-types-sections

2023-09-25 Thread David Blaikie via Dwarf-discuss
On Fri, Sep 15, 2023 at 2:45 PM Alexander Yermolovich via
Dwarf-discuss  wrote:
>
> Hello
>
> I am trying to enable debug names acceleration table with 
> fdebug-types-sections in LLVM. One part I am not sure about is the local TU 
> list. It contains an offset into .debug_info section. All the entries have an 
> index entry that points to the local TU list. DIEs within entry offsets are 
> relative to the TU entry.
>
> The linker de-duplicates type units using COMDAT, so the final result will
> have fewer type units. As a result, the local type unit list will be
> invalid, and all the entries that point to those TUs will not be valid
> either. Even if the linker is modified so that the local type unit list
> somehow gets the right offsets when it de-duplicates type sections, that
> still leaves all the duplicate entries.
> Am I missing something, or will the linker (specifically LLD) need to be
> aware of the .debug_names sections when it de-duplicates type sections?
>
>
> It seems to me that to fully support it .debug_names need to be created by 
> post build tool (or by linker).
>
> Thanks.

While DWARF consumers will benefit from a content-aware linking of
.debug_names (using one hash table is more efficient than probing
hundreds/thousands of small hash tables), I don't believe the spec
as-is requires that for correctness.

In the case of type units, I'd expect behavior somewhat similar to how
linkers behave with inline functions - if the two copies of the
function are identical, it's possible that the linker will resolve all
relocations to the function to the single copy that remains after
linking (so two CUs would both describe the inline function "f1" and
both descriptions would have the same start address/length, the two
CUs' CU-level DW_AT_ranges would overlap/both contain that function's
addresses - and neither would use the tombstone address). So in that
case, all the duplicate index entries would remain valid (their
TU-relative offsets would be correct - since the TUs were bit-wise
identical, so the offsets still point to the same things).

In the case where a producer produces equivalent but not
bitwise-identical TUs, the linker will choose one, drop the rest, and
use the tombstone value to resolve the relocation used in the local
TUs offset list. A consumer should ignore any entries that reference a
tombstone offset in the local TU list (& probably wouldn't hurt to use
the same code and ignore any tombstoned CUs too - I can't immediately
think of a situation/reason that'd happen, but seems like a good
general idea)

If a consumer does a semantic aware merge of the indexes, then it
should discard (rather than tombstoning) the index entries that
reference dead TUs and the dead TUs in the local TU list itself, and
also discard any duplicate index entries and duplicate elements in the
local TU list.
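
A minimal sketch of that merge for the local TU list, under some
assumptions: DWARF32 tables (so the tombstone is 0xffffffff), and the
input is simply the per-CU local TU lists concatenated after relocation.
`MergedTUs`, `merge_local_tus` and `kDropped` are made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

const std::uint64_t kTombstone32 = 0xffffffffULL;  // assumes DWARF32 tables
const std::size_t kDropped = static_cast<std::size_t>(-1);

struct MergedTUs {
  std::vector<std::uint64_t> offsets;  // surviving TUs, duplicates removed
  std::vector<std::size_t> remap;      // old TU index -> new index, or kDropped
};

inline MergedTUs merge_local_tus(const std::vector<std::uint64_t> &concatenated) {
  MergedTUs out;
  std::map<std::uint64_t, std::size_t> seen;  // offset -> index in out.offsets
  for (std::size_t i = 0; i < concatenated.size(); ++i) {
    const std::uint64_t off = concatenated[i];
    if (off == kTombstone32) {  // dead TU: drop it, and entries that use it
      out.remap.push_back(kDropped);
      continue;
    }
    std::map<std::uint64_t, std::size_t>::iterator it = seen.find(off);
    if (it == seen.end()) {  // first time this TU is seen: keep it
      it = seen.insert(std::make_pair(off, out.offsets.size())).first;
      out.offsets.push_back(off);
    }
    out.remap.push_back(it->second);
  }
  assert(out.remap.size() == concatenated.size());
  return out;
}
```

Index entries would then be rewritten through `remap`: anything mapped to
`kDropped` is discarded, and entries that become identical after
remapping are the duplicates to drop.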

We could document the use of the tombstone in this context.


Re: [Dwarf-discuss] Ranges for DW_TAG_namespace

2023-09-20 Thread David Blaikie via Dwarf-discuss
If what you're searching for could be encoded in the fast lookup
tables (.debug_names) that could save you some parsing effort.

Otherwise, if you're searching for an address and you have a lot of
non-addressable DWARF, or really large CUs or something - I can see
how namespace address ranges could help. You could always experiment
with it - implement support in your compiler of choice (assuming it's
open source/you have access to its source) for ranges on namespaces -
the encoding might not be too bad since we have a way to reference an
address range list from another range list, so the namespace's range
list could reference each subprogram's range list, if that's the best
encoding. And if it seems to be useful, could propose it as a DWARF
feature (pretty low cost feature in terms of wording, just saying
"hey, if you like, you can put ranges on namespaces" - the DWARF spec
doesn't need to tell you you can do it, you're allowed to do it
without that explicit allowance, but having the suggestion in the spec
might be nice - if it's a feature worth having)
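
To make the experiment concrete, here's a sketch of the lookup with
optional pruning on namespaces. It works on a toy in-memory DIE tree, not
a real DWARF parser; `Die`, `Tag`, `Range` and `find_subprogram` are all
hypothetical names, and nested scopes inside subprograms are ignored:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Tag { CompileUnit, Namespace, Subprogram, Other };

struct Range { std::uint64_t low, high; };  // half-open: [low, high)

struct Die {
  Tag tag;
  std::vector<Range> ranges;  // empty if the DIE carries no address ranges
  std::vector<Die> children;
};

inline bool contains(const std::vector<Range> &rs, std::uint64_t addr) {
  for (const Range &r : rs) {
    assert(r.low <= r.high);
    if (r.low <= addr && addr < r.high) return true;
  }
  return false;
}

// Returns the subprogram DIE covering `addr`, or nullptr. `visited` counts
// the DIEs examined, to make the effect of pruning visible.
inline const Die *find_subprogram(const Die &die, std::uint64_t addr,
                                  std::size_t &visited) {
  ++visited;
  if (die.tag == Tag::Subprogram)
    return contains(die.ranges, addr) ? &die : nullptr;
  // Prune only when a namespace actually carries ranges; without them the
  // search has to descend unconditionally, as consumers do today.
  if (die.tag == Tag::Namespace && !die.ranges.empty() &&
      !contains(die.ranges, addr))
    return nullptr;
  for (const Die &child : die.children)
    if (const Die *hit = find_subprogram(child, addr, visited)) return hit;
  return nullptr;
}
```

Comparing `visited` with and without ranges on the namespace DIEs would
be one cheap way to see whether the extra encoding pays for itself.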

On Thu, Sep 14, 2023 at 9:51 PM rifkin.jer--- via Dwarf-discuss
 wrote:
>
> Hello,
>
> Thank you both so much for the quick replies. Thank you for clarifying that 
> the expectation is to unconditionally traverse DIEs within a CU when looking 
> for a subprogram. I am currently working with some executables which have 
> large namespace sections that would be nice to skip over completely while 
> querying symbols. Preprocessing this information may be the next best thing, 
> but I will have to benchmark to see if it is beneficial. Despite namespaces 
> not being program entities I had incorrectly assumed there would be some 
> encoding of this information. The space vs time tradeoff is perhaps not 
> entirely clear though.
>
>
>
> Thank you again,
>
> Jeremy
>
>
>
> From: Greg Clayton 
> Sent: Thursday, September 14, 2023 9:22 PM
> To: rifkin@gmail.com
> Cc: DWARF Discuss 
> Subject: Re: [Dwarf-discuss] Ranges for DW_TAG_namespace
>
>
>
> When searching for addresses we first see if the compile unit's DW_AT_ranges 
> (or low/high pc) attribute contains the address we are looking for. Any CU 
> that doesn't contain the address doesn't need to have its child DIEs parsed, 
> just the top level DW_TAG_compile_unit DIE. Then we iterate over all the DIEs 
> always descending into all of the children looking for DW_TAG_subprogram 
> entries that contain the address we are looking for. So if we see a 
> DW_TAG_namespace we just call recursively to parse its children.
>
>
>
> From: Robinson, Paul paul.robin...@sony.com
> Sent: Thursday, September 14, 2023 7:51 PM
> To: rifkin@gmail.com; dwarf-discuss@lists.dwarfstd.org
> Subject: RE: [Dwarf-discuss] Ranges for DW_TAG_namespace
>
>
>
> I suppose it didn’t seem useful to provide ranges on namespaces. A C++ 
> namespace isn’t a program entity of its own, it’s a way of managing names of 
> entities. It doesn’t even restrict the scope of those names; you can refer to 
> them anywhere if you use the fully qualified version of the name. (With the 
> obvious caveat about names defined in anonymous namespaces.)
>
>
>
> Did you have a reason for considering a namespace to be a program entity? 
> What would that entity do?
>
> --paulr
>
>
>
>
>
> On Sep 14, 2023, at 3:50 PM, rifkin.jer--- via Dwarf-discuss 
>  wrote:
>
>
>
> Hello,
>
> What is the reasoning for not including range information on DW_TAG_namespace 
> DIEs? Is there a canonical way to check if a DW_TAG_namespace DIE contains a 
> given address?
>
>
>
> Thank you,
>
> Jeremy
>


Re: [Dwarf-discuss] Sourceware infrastructure updates for Q3 2023

2023-08-30 Thread David Blaikie via Dwarf-discuss
> But please don't send HTML email. It will make DKIM verification of
> your email impossible.

Hmm - how does HTML email relate to/interfere with DKIM verification?


Re: [Dwarf-discuss] Seeking a test program with a >4GB .debug_info section

2023-04-24 Thread David Blaikie via Dwarf-discuss
On Fri, Apr 21, 2023 at 2:16 PM John DelSignore 
wrote:

>
> On 4/21/23 16:36, David Blaikie wrote:
>
> On Fri, Apr 21, 2023 at 12:44 PM John DelSignore 
> wrote:
>
>> Well, it took a long time to compile 5 CUs that contained your test code,
>> and things were looking promising, but the link failed:
>>
>> rocm2 42 04/21 15:14 /build/jdelsign/fatty % make
>> g++ -g -c fatty4.cxx -o fatty4.o
>> g++ -g -c fatty5.cxx -o fatty5.o
>> g++ -g -o fatty fatty.o fatty2.o fatty3.o fatty4.o fatty5.o
>> fatty5.o:(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32
>> against `.debug_info'
>> collect2: error: ld returned 1 exit status
>> make: *** [Makefile:5: fatty] Error 1
>> rocm2 43 04/21 15:39 /build/jdelsign/fatty %
>>
>> I guess I'm now in favor of the proposal to get rid of .debug_aranges. :-)
>>
> I guess, backing up - what's your goal/what're you trying to do with DWARF
> over 4GB?
>
> We have a TotalView user that has a gigantic executable (~9GB) and the
> .debug_info section alone is 4.9GB. A few of the other sections are about
> 1GB, but under the 32-bit limit. That's about all I know, because the user
> is at a secure location and cannot share the executable or other details.
>
> TotalView had a built-in limit of 4GB for DWARF sections. Recently, I
> increased TV's DWARF section sizes to 64-bits to handle this user's code,
> but I'm sure there are other changes that are needed to properly handle a
> .debug_info section that is that big. So, I'm looking for a test case to
> feed to TotalView.
>
Makes sense

> I changed "fatty" to compile with clang++ (instead of g++), and that
> worked. I have the following section sizes:
>
> rocm2 55 04/21 16:40 /build/jdelsign/fatty % readelf -SW fatty|grep debug
>   [30] .debug_abbrev      PROGBITS 002948    000670    00     0   0  1
>   [31] .debug_info        PROGBITS 002fb8    15e82990e 00     0   0  1
>   [32] .debug_str_offsets PROGBITS 15e82c8c6 c0db3e8   00     0   0  1
>   [33] .debug_str         PROGBITS 16a907cae 217ee2aa  01  MS 0   0  1
>   [34] .debug_addr        PROGBITS 18c0f5f58 50        00     0   0  1
>   [35] .debug_line        PROGBITS 18c0f5fa8 0001dd    00     0   0  1
>   [36] .debug_line_str    PROGBITS 18c0f6185 4c        01  MS 0   0  1
> rocm2 56 04/21 16:41 /build/jdelsign/fatty %
>
> That looks like a decent test case (I haven't tried TotalView on it yet),
> so thanks for the suggestion.
>
>
> You do have to use DWARF64 for a .debug_info section over 4GB - for any
> section-relative reference into that section, such as cross-CU references
> (DW_FORM_ref_addr), or aranges or debug_names I think, etc. Because with
> DWARF32 the section references are 32 bit, so can't exceed 4GB.
>
> I suspected that the application might contain 64-bit DWARF, so I asked
> the user dump/grep the .debug_info section and oddly enough all of the CUs
> are 32-bit DWARF. I have no idea if they are damaged. The CUs must not
> reference any DIEs outside their own CU.
>
Yeah, I guess if you don't use DW_FORM_ref_addr (usually only appears with
LTO, at least that's the case with clang) and don't use aranges - there's no
need to refer to the absolute offset of a thing in .debug_info, so you can
exceed the 32 bit limit in total .debug_info size without needing DWARF64.

>
> (also, with a different example, you'd get string data over 4GB, which
> you'd also need DWARF64 for, or in DWARFv5 (with some dispensations from
> DWARFv6) you could use DWARF64 for .debug_str_offsets (assuming all string
> references were strx forms) without using DWARF64 for everything else)
>
> Understood. Luckily, my test program's string table made it with some room
> to spare.
>
>
> In Split DWARF if each contribution is <4GB you only need a 64 bit
> cu/tu_index and 64 bit str_offsets (& you could even do that selectively,
> only using DWARF64 str_offsets for contributions that need 64 bit offsets),
> since nothing else references across whole sections - which makes it much
> more scalable/easier to solve.
>
> Yes, I asked if they'd be willing to consider Split DWARF, but I haven't
> received an answer yet. I think it would save considerable link time and
> disk space, but they may have reasons why Split DWARF would not work for
> them.
>
> BTW, given that "gold" is deprecated, is there a linker that can build a
> decent ".gdb_index" to use with the Split DWARF? Without a decent index,
> it's hard to get same-day service out of the debugger with these giant
> executables.
>
lld has support for .gdb_index from .debug_gnu_pubnames/pubtypes - that's
what we use at Google. (& for Split DWARF, as far as I recall/understand,
but don't have a concrete example these days - gdb /requires/ an index for
correctness, it'll assume that a lookup failure is authoritative - it won't
actually load in all the .dwo/dwp content in the case of a missing index -

Re: [Dwarf-discuss] Seeking a test program with a >4GB .debug_info section

2023-04-21 Thread David Blaikie via Dwarf-discuss
On Fri, Apr 21, 2023 at 12:44 PM John DelSignore 
wrote:

> Well, it took a long time to compile 5 CUs that contained your test code,
> and things were looking promising, but the link failed:
>
> rocm2 42 04/21 15:14 /build/jdelsign/fatty % make
> g++ -g -c fatty4.cxx -o fatty4.o
> g++ -g -c fatty5.cxx -o fatty5.o
> g++ -g -o fatty fatty.o fatty2.o fatty3.o fatty4.o fatty5.o
> fatty5.o:(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32
> against `.debug_info'
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:5: fatty] Error 1
> rocm2 43 04/21 15:39 /build/jdelsign/fatty %
>
> I guess I'm now in favor of the proposal to get rid of .debug_aranges. :-)
>
I guess, backing up - what's your goal/what're you trying to do with DWARF
over 4GB?

You do have to use DWARF64 for a .debug_info section over 4GB - for any
section-relative reference into that section, such as cross-CU references
(DW_FORM_ref_addr), or aranges or debug_names I think, etc. Because with
DWARF32 the section references are 32 bit, so can't exceed 4GB.

(also, with a different example, you'd get string data over 4GB, which
you'd also need DWARF64 for, or in DWARFv5 (with some dispensations from
DWARFv6) you could use DWARF64 for .debug_str_offsets (assuming all string
references were strx forms) without using DWARF64 for everything else)

In Split DWARF if each contribution is <4GB you only need a 64 bit
cu/tu_index and 64 bit str_offsets (& you could even do that selectively,
only using DWARF64 str_offsets for contributions that need 64 bit offsets),
since nothing else references across whole sections - which makes it much
more scalable/easier to solve.

Also, another issue is that even if you have simple/small bits of DWARF32
(in some precompiled library, etc), if your total program exceeds the 32
bit limit, you may not be able to link the program because that DWARF32
might end up being put at the end of the section, and so the
debug_aranges, for instance, needs to record the offset of the CU in 32
bits but can't. So there have been various discussions about linkers being
able to sort the DWARF32 contributions earlier/first before the DWARF64
contributions. Then you could still link DWARF32 precompiled libraries into
huge programs that exceed the DWARF32 limits. I can go find some links to
those threads/discussions if you need them, I think some happened in the
LLVM open source community.
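
The ordering idea, as a sketch only (this is not how any particular
linker does it; `Contribution` and both function names are made up, and
it conservatively requires each DWARF32 contribution to sit entirely
below 4GB):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Contribution {
  const char *name;    // illustrative label, e.g. the input object file
  std::uint64_t size;  // bytes this input adds to .debug_info
  bool is_dwarf64;     // format of the units inside it
};

inline bool is_dwarf32(const Contribution &c) { return !c.is_dwarf64; }

// Puts all DWARF32 contributions first, keeping relative order otherwise.
inline void place_dwarf32_first(std::vector<Contribution> &inputs) {
  std::stable_partition(inputs.begin(), inputs.end(), is_dwarf32);
  assert(std::is_partitioned(inputs.begin(), inputs.end(), is_dwarf32));
}

// True if every DWARF32 contribution lies entirely below 4GB, so any
// section-relative offset into it (e.g. the CU offset recorded in
// .debug_aranges) still fits in 32 bits.
inline bool dwarf32_offsets_fit(const std::vector<Contribution> &inputs) {
  std::uint64_t offset = 0;
  for (const Contribution &c : inputs) {
    if (!c.is_dwarf64 && offset + c.size > (1ULL << 32)) return false;
    offset += c.size;
  }
  return true;
}
```

With a 5GB DWARF64 input ahead of a small DWARF32 library the check
fails; after `place_dwarf32_first` the same inputs pass.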

- Dave


> Cheers, John D.
> On 4/21/23 13:28, John DelSignore wrote:
>
> Thanks David, this is useful. I'll see what I can cobble together.
>
> Cheers, John D.
> On 4/20/23 21:58, David Blaikie wrote:
>
> Oh, and I guess you could always make something even more artificial by
> hand - if you compile some random code with -g to assembly, you could then
> just pad out a .debug_info contribution with lots of zeros (there are some
> assembly directives for that, I think, but don't know assembly that well
> off hand) - would make it arbitrarily large without the need to tax the
> compiler creating novel/real DWARF, etc.
>
> On Thu, Apr 20, 2023 at 6:54 PM David Blaikie  wrote:
>
>> I /believe/ that Chromium (maybe specifically on ARM? not sure) may have
>> hit/had problems with the 4GB limit - probably trivially if you build with
>> clang but pass `-fstandalone-debug` which disables many type
>> reduction/deduplication strategies.
>>
>> If you want something more standalone... this:
>>
>>
>> #define MEMBERS(BASE) \
>>   int BASE##0 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##1 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##2 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##3 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##4 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##5 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##6 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##7 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##8 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int); \
>>   int BASE##9 (int, int, int, int, int, int, int, int, int, int, int, int, \
>>                int, int, int, int, int, int, int, int);
>> template
>> struct t1 {
>>   MEMBERS(f0)
>>   MEMBERS(f1)
>>   MEMBERS(f2)
>>   MEMBERS(f3)
>>   MEMBERS(f4)
>>   MEMBERS(f5)
>>   MEMBERS(f6)
>>   MEMBERS(f7)
>>   MEMBERS(f8)
>>   MEMBERS(f9)
>> };
>> #define ITER(A, B)\
>>   template  \
>>   struct A { \
>> B v0;   \
>> B v1;   \
>> B v2;   \
>> B v3;   

Re: [Dwarf-discuss] Seeking a test program with a >4GB .debug_info section

2023-04-20 Thread David Blaikie via Dwarf-discuss
Oh, and I guess you could always make something even more artificial by
hand - if you compile some random code with -g to assembly, you could then
just pad out a .debug_info contribution with lots of zeros (there are some
assembly directives for that, I think, but don't know assembly that well
off hand) - would make it arbitrarily large without the need to tax the
compiler creating novel/real DWARF, etc.

On Thu, Apr 20, 2023 at 6:54 PM David Blaikie  wrote:

> I /believe/ that Chromium (maybe specifically on ARM? not sure) may have
> hit/had problems with the 4GB limit - probably trivially if you build with
> clang but pass `-fstandalone-debug` which disables many type
> reduction/deduplication strategies.
>
> If you want something more standalone... this:
>
>
> #define MEMBERS(BASE) \
>   int BASE##0 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##1 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##2 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##3 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##4 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##5 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##6 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##7 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##8 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int); \
>   int BASE##9 (int, int, int, int, int, int, int, int, int, int, int, int, \
>                int, int, int, int, int, int, int, int);
> template
> struct t1 {
>   MEMBERS(f0)
>   MEMBERS(f1)
>   MEMBERS(f2)
>   MEMBERS(f3)
>   MEMBERS(f4)
>   MEMBERS(f5)
>   MEMBERS(f6)
>   MEMBERS(f7)
>   MEMBERS(f8)
>   MEMBERS(f9)
> };
> #define ITER(A, B)\
>   template  \
>   struct A { \
> B v0;   \
> B v1;   \
> B v2;   \
> B v3;   \
> B v4;   \
> B v5;   \
> B v6;   \
> B v7;   \
> B v8;   \
> B v9;   \
>   };
> ITER(t2, t1);
> ITER(t3, t2);
> ITER(t4, t3);
> ITER(t5, t4);
> ITER(t6, t5);
> ITER(t7, t6);
> ITER(top, t7);
> int main() {
>   t6<> v;
> }
>
> Doesn't quite hit 4GB on its own - it's about 1.2GB in .debug_info (&
> takes 2.5 minutes to compile with clang) - but 5 of these linked together
> should get past it (could stamp them out by including this file into a few
> other source files & just changing the `main` function to some other name
> in each)
>
> This specifically doesn't push the .debug_str section as hard - it's about
> half the size of the .debug_info in this program.
>
>
>
> On Thu, Apr 20, 2023 at 7:08 AM John DelSignore via Dwarf-discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
>> Is anyone aware of an open-source program or test program that when
>> compiled and built on Linux x86_64, results in a .debug_info section that
>> is greater than 4GB? I'm looking for a test program (realistic or not) that
>> contains 32-bit DWARF CUs in a .debug_info section that is about 5GB long,
>> or longer.
>>
>> Thanks, John D.


Re: [Dwarf-discuss] Seeking a test program with a >4GB .debug_info section

2023-04-20 Thread David Blaikie via Dwarf-discuss
I /believe/ that Chromium (maybe specifically on ARM? not sure) may have
hit/had problems with the 4GB limit - probably trivially if you build with
clang but pass `-fstandalone-debug` which disables many type
reduction/deduplication strategies.

If you want something more standalone... this:


#define MEMBERS(BASE) \
  int BASE##0 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##1 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##2 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##3 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##4 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##5 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##6 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##7 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##8 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int); \
  int BASE##9 (int, int, int, int, int, int, int, int, int, int, int, int, \
               int, int, int, int, int, int, int, int);
template
struct t1 {
  MEMBERS(f0)
  MEMBERS(f1)
  MEMBERS(f2)
  MEMBERS(f3)
  MEMBERS(f4)
  MEMBERS(f5)
  MEMBERS(f6)
  MEMBERS(f7)
  MEMBERS(f8)
  MEMBERS(f9)
};
#define ITER(A, B)\
  template  \
  struct A { \
B v0;   \
B v1;   \
B v2;   \
B v3;   \
B v4;   \
B v5;   \
B v6;   \
B v7;   \
B v8;   \
B v9;   \
  };
ITER(t2, t1);
ITER(t3, t2);
ITER(t4, t3);
ITER(t5, t4);
ITER(t6, t5);
ITER(t7, t6);
ITER(top, t7);
int main() {
  t6<> v;
}

Doesn't quite hit 4GB on its own - it's about 1.2GB in .debug_info (& takes
2.5 minutes to compile with clang) - but 5 of these linked together should
get past it (could stamp them out by including this file into a few other
source files & just changing the `main` function to some other name in
each)

This specifically doesn't push the .debug_str section as hard - it's about
half the size of the .debug_info in this program.



On Thu, Apr 20, 2023 at 7:08 AM John DelSignore via Dwarf-discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Is anyone aware of an open-source program or test program that when
> compiled and built on Linux x86_64, results in a .debug_info section that
> is greater than 4GB? I'm looking for a test program (realistic or not) that
> contains 32-bit DWARF CUs in a .debug_info section that is about 5GB long,
> or longer.
>
> Thanks, John D.


Re: [Dwarf-discuss] Tables which have a unit_length header field must be contiguous.

2023-03-29 Thread David Blaikie via Dwarf-discuss
Yeah - agreed with this whole description & I'd feel comfortable with
either of the proposed additions.
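
For what it's worth, the serial-reading case in Keith's description
(quoted below) comes down to something like this sketch. It assumes
little-endian data and made-up helper names (`read_le`, `walk_tables`),
and only looks at the unit_length field, nothing else in the header:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian unsigned integer of `n` bytes at `pos`.
inline std::uint64_t read_le(const std::vector<std::uint8_t> &s,
                             std::size_t pos, unsigned n) {
  assert(pos + n <= s.size());
  std::uint64_t v = 0;
  for (unsigned i = 0; i < n; ++i)
    v |= static_cast<std::uint64_t>(s[pos + i]) << (8 * i);
  return v;
}

// Walks a section as back-to-back tables, recording where each one starts.
// Returns false if a length field or table runs past the end of the section.
inline bool walk_tables(const std::vector<std::uint8_t> &section,
                        std::vector<std::size_t> &starts) {
  std::size_t pos = 0;
  while (pos < section.size()) {
    if (section.size() - pos < 4) return false;
    std::uint64_t len = read_le(section, pos, 4);
    std::size_t header = 4;
    if (len == 0xffffffffULL) {  // DWARF64 escape: real length follows
      if (section.size() - pos < 12) return false;
      len = read_le(section, pos + 4, 8);
      header = 12;
    } else if (len >= 0xfffffff0ULL) {  // reserved initial length values
      return false;
    }
    if (len > section.size() - pos - header) return false;
    starts.push_back(pos);
    pos += header + static_cast<std::size_t>(len);
  }
  return true;
}
```

With contiguous tables this finds every table start. Put a couple of
padding bytes between two tables and the walker reads them as the start
of the next unit_length and goes off the rails, which is the failure mode
the proposal is trying to rule out.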

On Wed, Mar 29, 2023 at 2:15 AM Keith Walker via Dwarf-discuss
 wrote:
>
> # Problem
>
> There is no statement if tables must be contiguous or if
> there can be padding between the tables.
>
> # Background
>
> Some sections have an implicit assumption that the tables in a section
> are contiguous so the section can be processed by serially reading the
> section. Sections in this category are:
>
>   .debug_info (Unit Headers, Section 7.5.1)
>   .debug_aranges (Address Lookup Tables, Section 6.1.2)
>   .debug_names (Name Index Section Header, Section 6.1.1)
>   .debug_frame (Section 6.4.1)
>
> All other tables may be accessed indirectly via an offset into a section,
> so in theory there is no need to ensure the tables are contiguous provided
> the tables are only accessed via these offsets.
>
> However there are use cases when this can be a problem:
>
> - The file is "stripped" to just contain line information (.debug_line /
>   .debug_line_str). The .debug_line_str was added for exactly this use case.
>   There is now the assumption that the .debug_line section can be processed
>   serially.
>
> - File dump utilities which list the contents of the sections serially.
>
> Sections with tables which have headers with a unit_length field:
>
>   .debug_aranges (Section 6.1.2)
>   .debug_addr (Section 7.27)
>   .debug_info / .debug_info.dwo (Section 7.5.1)
>   .debug_line / .debug_line.dwo  (Section 6.2.4)
>   .debug_loclists / .debug_loclists.dwo (Section 7.29)
>   .debug_names (Section 6.1.1)
>   .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
>   .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)
>
> Sections with tables/contributions without headers:
>
>   .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
>   .debug_frame (Section 6.4.1)
>   .debug_line_str
>   .debug_macro / .debug_macro.dwo (Section 6.3.1)
>   .debug_str / .debug_str.dwo
>
> It is a point for discussion on whether to only require the tables
> with a unit_length header field be contiguous, or should all tables be
> made contiguous.
>
> # Proposed Addition:
>
> 7.34 Contiguous Tables
>
> Tables which start with a unit_length field must be contiguous with the
> preceding table in the section or start of the section if there is no
> preceding table.
>
> # Alternative Proposed Addition:
>
> 7.34 Contiguous Tables
>
> Tables must be contiguous with the preceding table in the section or
> start of the section if there is no preceding table.


Re: [Dwarf-discuss] ISSUE: CPU vector types.

2023-03-28 Thread David Blaikie via Dwarf-discuss
> DW_AT[_GNU]_vector is best understood not as "a hardware vector register" but 
> rather as a marker that "this type is eligible to be passed in hardware 
> vector registers at function boundaries according to the platform ABI".

My 2c would be not to describe these in terms of
hardware/implementations (that gets confusing/blurs the line between
variables/types and locations - as you say, these things can be stored
in memory, so they aren't uniquely in registers - you might have a
member of this type in a struct passed in memory and need to know the
ABI/struct layout for that, etc), but at the source level - which is
also the level the ABI is defined at. Overloading, for instance, still
applies if these are different types - so other debugger features need
to work based on this type information.

So it seems like a simpler question is:

How should DWARF producers/consumers expect to encode the source
example Ben provided (well, simplified a bit):

#include 

void f( __m128 a){
}

What DWARF should be used to describe the type of 'a'? And how does
this encoding scale to all the other similar intrinsic types?


Re: [Dwarf-discuss] OTHER or arguably ENHANCEMENT: Logo

2023-03-22 Thread David Blaikie via Dwarf-discuss
FWIW, I like the idea/reckon it's worth a revisit.

I'd shy away from equipping the dwarves with weapons/combat imagery -
maybe a smithy's hammer (or pickaxe) would be more suitable for
tooling? (this sort of thing:
https://www.kctool.com/picard-1c-blacksmiths-hammer-with-ash-handle-1800g/
- has a simple/clear outline, square on one end, tapered on the other)

As for version-specific logos, as fun as it might be, practically I'd
be happy to settle on a consistent way to render versions - putting a
number in the upper right corner, perhaps (have the dwarf hold the
tool in their right hand, so would appear on the left of the image -
or swap version/tool around as folks see fit).

In general my personal font choice is more around example (4), but the
thick/sturdy-ness of (2) is probably more in keeping with the theme.

On Tue, Mar 21, 2023 at 2:44 PM Ben Woodard via Dwarf-discuss wrote:
>
> It has been kind of tense around here for a while; let's have some fun.
>
> The DWARF logo is quite old. There are many problems with it as a logo.
>
> - It is a png, and though there appear to be several versions of it at
>   different sizes, it is a raster image and so it doesn't scale well.
> - The image itself looks scanned and then colored. This leads to some
>   built-in aliasing at the pixel level, which shows up badly when printed.
> - The color choices are not really ideal for printing, and if we were going
>   to use it on anything other than a web page, like a shirt or a sticker,
>   there would be problems.
> - If you look at it up close, it is not really clear.
>
> I could probably go on and if people would like additional justification I 
> will but with the new administration and the push to DWARF6, I think that we 
> should consider a new logo. My wife happens to be a graphic designer 
> https://cyansamone.com/ and one of the things that she does is logo design. 
> When she came into my office while the DWARF5 standard was up on my screen
> and cringed for about the 50th time at the logo, I asked her to put together
> some ideas. In a few minutes, she came up with some options.
> http://www.bencoyote.net/~ben/DWARF_logo_1.png
>
> The fonts (all free as in beer) and images are mix and match and any other 
> ideas are more than welcome. They are vector art and so they can be scaled. 
> This was just a quick mockup.
>
> I also thought it might be fun to have a DWARF6 specific logo that we could 
> put on tools as they become capable of handling the new version of the 
> standard. Another idea that I had but she hasn't drawn yet is the 6 dwarves 
> chasing a bug. She said that drawing that would take more than the 5 minutes 
> she had at the moment because she would have to figure out what the dwarves 
> look like in profile. http://www.bencoyote.net/~ben/DWARF_6_draft.png
>
> Yeah I know that it isn't the most important thing in the world of DWARF but 
> in relation to all the other miscellany that needed to change, I thought that 
> we should toss this on the list.
>
> -ben


Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2023-02-28 Thread David Blaikie via Dwarf-Discuss
Hmm - I guess one complication of only putting the mangling number on
the type, is that you need the scope of the lambda too... which is
tricky in this case:

extern int i;
int i = []{ return 3; }();

In this case, the lambda is mangled in the scope of the global
variable `i`: i::{lambda()#1}::operator()() const
(https://godbolt.org/z/15Eqa8ajT)

Oh, and I guess you can use a lambda without ever instantiating its
operator(), and for a generic lambda there's nothing to describe...

eg:
template <typename T>
void f1(const T&){}
inline void f2() {
  f1([](auto){});
}
void f3() {
  f2();
}

Clang's DWARF for the anonymous type is:
0x0043: DW_TAG_class_type
  DW_AT_calling_convention  (DW_CC_pass_by_value)
  DW_AT_byte_size   (0x01)
  DW_AT_decl_file   ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
  DW_AT_decl_line   (4)

GCC's includes a dtor (called "~") but the type just has size,
file, line, and column.

So we could avoid using the whole mangled name of the anonymous type
in some cases - maybe it's worth having features (like being able to
provide the mangling number in an attribute, maybe being able to scope
the type inside a variable DIE? though that sounds a bit frightening)
to help in those cases, even if in some of the worst cases we'd have
to use the mangled name to reassociate anonymous types?

- Dave

On Mon, Aug 22, 2022 at 12:44 PM David Blaikie  wrote:
>
> Ping - any thoughts here?
>
> On Sun, Jul 24, 2022 at 9:08 PM David Blaikie  wrote:
> >
> > Ping on this thread - would love to hear what ideas folks have for
> > addressing the naming of anonymous types (enums, structs/classes, and
> > lambdas) - especially if it'd make it easier to go back/forth between
> > the DW_AT_name of a template with an unnamed type as a parameter and
> > the actual DIEs describing the same parameter type.
> >
> > On Tue, Jun 14, 2022 at 1:02 PM David Blaikie  wrote:
> > >
> > > Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might 
> > > solve my immediate issues in clang, but I think we should still consider 
> > > moving to a more canonical naming of lambdas that, necessarily, doesn't 
> > > include the file name (unfortunately). Probably has to include the lambda 
> > > numbering/something roughly equivalent to the mangled lambda name - it 
> > > could include type information (it'd be superfluous to a unique 
> > > identifier, but I don't think it would break consistently naming the same 
> > > type across CUs either).
> > >
> > > Anyone got ideas/preferences/thoughts on this?
> > >
> > > On Mon, Jan 24, 2022 at 5:51 PM David Blaikie  wrote:
> > >>
> > >> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl  wrote:
> > >>>
> > >>>
> > >>>
> > >>> On Jan 23, 2022, at 2:53 PM, David Blaikie  wrote:
> > >>>
> > >>> A rather common "quality of implementation" issue seems to be lambda 
> > >>> naming.
> > >>>
> > >>> I came across this due to non-canonicalization of lambda names in 
> > >>> template parameters depending on how a source file is named in Clang, 
> > >>> and GCC's seem to be very ambiguous:
> > >>>
> > >>> $ cat tmp/lambda.h
> > >>> template <typename T>
> > >>> void f1(T) { }
> > >>> static int i = (f1([]{}), 1);
> > >>> static int j = (f1([]{}), 2);
> > >>> void f1() {
> > >>>   f1([]{});
> > >>>   f1([]{});
> > >>> }
> > >>> $ cat tmp/lambda.cpp
> > >>> #ifdef I_PATH
> > >>> #include 
> > >>> #else
> > >>> #include "lambda.h"
> > >>> #endif
> > >>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot 
> > >>> lambda.o | grep "f1<"
> > >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:3:20)>")
> > >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:4:20)>")
> > >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:6:6)>")
> > >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> > >>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | 
> > >>> grep "f1<"
> > >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:3:20)>")
> > >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:4:20)>")
> > >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:6:6)>")
> > >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:7:6)>")
> > >>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | 
> > >>> grep "f1<"
> > >>> DW_AT_name  ("f1 >")
> > >>> DW_AT_name  ("f1 >")
> > >>> DW_AT_name  ("f1< >")
> > >>>
> > >>> DW_AT_name  ("f1< >")
> > >>>
> > >>> (I came across this in the context of my simplified template names work 
> > >>> - rebuilding names from the DW_TAG description of the template 
> > >>> parameters - and while I'm not rebuilding names that have lambda 
> > >>> parameters (keep encoding the full string instead). The issue is if 
> > >>> some other type depending on a type with a lambda parameter 

Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2022-08-22 Thread David Blaikie via Dwarf-Discuss
Ping - any thoughts here?

On Sun, Jul 24, 2022 at 9:08 PM David Blaikie  wrote:
>
> Ping on this thread - would love to hear what ideas folks have for
> addressing the naming of anonymous types (enums, structs/classes, and
> lambdas) - especially if it'd make it easier to go back/forth between
> the DW_AT_name of a template with an unnamed type as a parameter and
> the actual DIEs describing the same parameter type.
>
> On Tue, Jun 14, 2022 at 1:02 PM David Blaikie  wrote:
> >
> > Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might 
> > solve my immediate issues in clang, but I think we should still consider 
> > moving to a more canonical naming of lambdas that, necessarily, doesn't 
> > include the file name (unfortunately). Probably has to include the lambda 
> > numbering/something roughly equivalent to the mangled lambda name - it 
> > could include type information (it'd be superfluous to a unique identifier, 
> > but I don't think it would break consistently naming the same type across 
> > CUs either).
> >
> > Anyone got ideas/preferences/thoughts on this?
> >
> > On Mon, Jan 24, 2022 at 5:51 PM David Blaikie  wrote:
> >>
> >> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl  wrote:
> >>>
> >>>
> >>>
> >>> On Jan 23, 2022, at 2:53 PM, David Blaikie  wrote:
> >>>
> >>> A rather common "quality of implementation" issue seems to be lambda 
> >>> naming.
> >>>
> >>> I came across this due to non-canonicalization of lambda names in 
> >>> template parameters depending on how a source file is named in Clang, and 
> >>> GCC's seem to be very ambiguous:
> >>>
> >>> $ cat tmp/lambda.h
> >>> template <typename T>
> >>> void f1(T) { }
> >>> static int i = (f1([]{}), 1);
> >>> static int j = (f1([]{}), 2);
> >>> void f1() {
> >>>   f1([]{});
> >>>   f1([]{});
> >>> }
> >>> $ cat tmp/lambda.cpp
> >>> #ifdef I_PATH
> >>> #include 
> >>> #else
> >>> #include "lambda.h"
> >>> #endif
> >>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot 
> >>> lambda.o | grep "f1<"
> >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:3:20)>")
> >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:4:20)>")
> >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:6:6)>")
> >>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> >>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep 
> >>> "f1<"
> >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:3:20)>")
> >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:4:20)>")
> >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:6:6)>")
> >>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:7:6)>")
> >>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep 
> >>> "f1<"
> >>> DW_AT_name  ("f1 >")
> >>> DW_AT_name  ("f1 >")
> >>> DW_AT_name  ("f1< >")
> >>>
> >>> DW_AT_name  ("f1< >")
> >>>
> >>> (I came across this in the context of my simplified template names work - 
> >>> rebuilding names from the DW_TAG description of the template parameters - 
> >>> and while I'm not rebuilding names that have lambda parameters (keep 
> >>> encoding the full string instead). The issue is if some other type 
> >>> depending on a type with a lambda parameter - but then multiple uses of 
> >>> that inner type exist, from different translation units (using type 
> >>> units) with different ways of naming the same file - so then the expected 
> >>> name has one spelling, but the actual spelling is different due to the 
> >>> "./")
> >>>
> >>> But all this said - it'd be good to figure out a reliable naming - the 
> >>> naming we have here, while usable for humans (pointing to source files,
> >>> etc) - they don't reliably give unique names for each lambda/template 
> >>> instantiation which would make it difficult for a consumer to know if two 
> >>> entities are the same (important for types - is some function parameter 
> >>> the same type as another type?)
> >>>
> >>> While it's expected cross-producer (eg: trying to be compatible with GCC 
> >>> and Clang debug info) you have to do some fuzzy matching (eg: "f1" 
> >>> or "f1" at the most basic - there are more complicated cases) - 
> >>> this one's not possible with the data available.
> >>>
> >>> The source file/line/column is insufficient to uniquely identify a lambda 
> >>> (multiple lambdas stamped out by a macro would get all the same 
> >>> file/line/col) and valid code (albeit unlikely) that writes the same 
> >>> definition in multiple places could make the same lambda have different 
> >>> names.
> >>>
> >>> We should probably use something more like the way various ABI manglings 
> >>> do to identify these entities.
> >>>
> >>> But we should probably also do this for other unnamed types that have 
> >>> linkage (need to/would benefit from being matched up between two CUs), 
> >>> even 

Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-07-25 Thread David Blaikie via Dwarf-Discuss
Here's the posted issue: https://dwarfstd.org/ShowIssue.php?issue=220724.1

On Sun, Jul 24, 2022 at 10:56 PM David Blaikie  wrote:
>
> Posted an issue to the dwarfstd.org to propose removing
> .debug_aranges, will follow up with a link here once it's
> accepted/posted publicly.
>
> On Tue, Jun 14, 2022 at 2:02 PM Greg Clayton  wrote:
> >
> > As long as there is a DW_AT_ranges on the CU that is complete, that is good
> > enough for LLDB. No one seems to consistently emit .debug_aranges these 
> > days so we definitely don't rely on it.
> >
> > Greg
> >
> > > On Jun 14, 2022, at 1:10 PM, David Blaikie via Dwarf-Discuss 
> > >  wrote:
> > >
> > > Given the discussion previously in this thread - does anyone have
> > > particular objections to removing .debug_aranges? (in favor of/perhaps
> > > with specific wording that /requires/ CU level ranges to be specified
> > > (ie: it's not acceptable to have a subprogram with non-empty range in
> > > a CU which doesn't cover that range) - so a consumer can look at the
> > > CU and, if it has no ranges, conclude that it has no addresses covered
> > > and skip the CU for address-related computation purposes)


Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-07-24 Thread David Blaikie via Dwarf-Discuss
Posted an issue to the dwarfstd.org to propose removing
.debug_aranges, will follow up with a link here once it's
accepted/posted publicly.
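
For reference, a minimal sketch of the consumer-side lookup that the quoted
proposal below would allow, assuming the rule that every address a CU covers
appears in its CU-level ranges. `AddressRange`, `CompileUnit`, and `find_unit`
are hypothetical names, not any real library's API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a CU; not any real library's types.
struct AddressRange {
  std::uint64_t low;   // inclusive
  std::uint64_t high;  // exclusive
};

struct CompileUnit {
  std::uint64_t offset;              // offset of the unit in .debug_info
  std::vector<AddressRange> ranges;  // CU-level ranges (empty if none)
};

// Returns the unit whose CU-level ranges cover `address`, or nullptr.
// A unit with no ranges is skipped without looking at its child DIEs,
// which is only safe if producers are required to emit complete ranges.
const CompileUnit* find_unit(const std::vector<CompileUnit>& units,
                             std::uint64_t address) {
  for (const CompileUnit& unit : units) {
    for (const AddressRange& range : unit.ranges) {
      if (range.low <= address && address < range.high) return &unit;
    }
  }
  return nullptr;
}

// Small usage example with made-up offsets and addresses.
void example() {
  std::vector<CompileUnit> units = {
      {0x0, {}},  // e.g. a CU with only type definitions
      {0x40, {{0x1000, 0x1080}, {0x2000, 0x2010}}},
  };
  const CompileUnit* hit = find_unit(units, 0x2004);
  assert(hit != nullptr && hit->offset == 0x40);
  assert(find_unit(units, 0x3000) == nullptr);
}
```

A real consumer would likely build a sorted interval index rather than scan
linearly; the point is only that no separate .debug_aranges table is consulted.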

On Tue, Jun 14, 2022 at 2:02 PM Greg Clayton  wrote:
>
> As long as there is a DW_AT_ranges on the CU that is complete, that is good
> enough for LLDB. No one seems to consistently emit .debug_aranges these days 
> so we definitely don't rely on it.
>
> Greg
>
> > On Jun 14, 2022, at 1:10 PM, David Blaikie via Dwarf-Discuss 
> >  wrote:
> >
> > Given the discussion previously in this thread - does anyone have
> > particular objections to removing .debug_aranges? (in favor of/perhaps
> > with specific wording that /requires/ CU level ranges to be specified
> > (ie: it's not acceptable to have a subprogram with non-empty range in
> > a CU which doesn't cover that range) - so a consumer can look at the
> > CU and, if it has no ranges, conclude that it has no addresses covered
> > and skip the CU for address-related computation purposes)


Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2022-07-24 Thread David Blaikie via Dwarf-Discuss
Ping on this thread - would love to hear what ideas folks have for
addressing the naming of anonymous types (enums, structs/classes, and
lambdas) - especially if it'd make it easier to go back/forth between
the DW_AT_name of a template with an unnamed type as a parameter and
the actual DIEs describing the same parameter type.
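
As a small illustration of why source position alone can't name these types
(the macro case raised in the original message quoted below), here is a
sketch assuming C++17. `TWO_LAMBDAS` and `stamped_lambda_lines` are made-up
names. Both lambdas come from one macro expansion, so they report the same
line, yet they are distinct closure types:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical macro: one expansion stamps out two lambdas, so both are
// attributed to the source line of the macro use.
#define TWO_LAMBDAS(a, b)           \
  auto a = [] { return __LINE__; }; \
  auto b = [] { return __LINE__; }

// Returns the line each lambda reports for itself.
std::pair<int, int> stamped_lambda_lines() {
  TWO_LAMBDAS(first, second);
  // Same source position, but two different types.
  static_assert(!std::is_same_v<decltype(first), decltype(second)>,
                "each lambda expression has its own closure type");
  return {first(), second()};
}

// Small usage example: the two lines are indistinguishable.
void example() {
  std::pair<int, int> lines = stamped_lambda_lines();
  assert(lines.first == lines.second);
}
```

Any naming scheme keyed on position would give these two types the same name,
which is why something like a per-scope numbering seems necessary.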

On Tue, Jun 14, 2022 at 1:02 PM David Blaikie  wrote:
>
> Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might solve 
> my immediate issues in clang, but I think we should still consider moving to 
> a more canonical naming of lambdas that, necessarily, doesn't include the 
> file name (unfortunately). Probably has to include the lambda 
> numbering/something roughly equivalent to the mangled lambda name - it could 
> include type information (it'd be superfluous to a unique identifier, but I 
> don't think it would break consistently naming the same type across CUs 
> either).
>
> Anyone got ideas/preferences/thoughts on this?
>
> On Mon, Jan 24, 2022 at 5:51 PM David Blaikie  wrote:
>>
>> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl  wrote:
>>>
>>>
>>>
>>> On Jan 23, 2022, at 2:53 PM, David Blaikie  wrote:
>>>
>>> A rather common "quality of implementation" issue seems to be lambda naming.
>>>
>>> I came across this due to non-canonicalization of lambda names in template 
>>> parameters depending on how a source file is named in Clang, and GCC's seem 
>>> to be very ambiguous:
>>>
>>> $ cat tmp/lambda.h
>>> template <typename T>
>>> void f1(T) { }
>>> static int i = (f1([]{}), 1);
>>> static int j = (f1([]{}), 2);
>>> void f1() {
>>>   f1([]{});
>>>   f1([]{});
>>> }
>>> $ cat tmp/lambda.cpp
>>> #ifdef I_PATH
>>> #include 
>>> #else
>>> #include "lambda.h"
>>> #endif
>>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot 
>>> lambda.o | grep "f1<"
>>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:7:6)>")
>>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep 
>>> "f1<"
>>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:3:20)>")
>>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:4:20)>")
>>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:6:6)>")
>>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:7:6)>")
>>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep 
>>> "f1<"
>>> DW_AT_name  ("f1 >")
>>> DW_AT_name  ("f1 >")
>>> DW_AT_name  ("f1< >")
>>>
>>> DW_AT_name  ("f1< >")
>>>
>>> (I came across this in the context of my simplified template names work - 
>>> rebuilding names from the DW_TAG description of the template parameters - 
>>> and while I'm not rebuilding names that have lambda parameters (keep 
>>> encoding the full string instead). The issue is if some other type 
>>> depending on a type with a lambda parameter - but then multiple uses of 
>>> that inner type exist, from different translation units (using type units) 
>>> with different ways of naming the same file - so then the expected name has 
>>> one spelling, but the actual spelling is different due to the "./")
>>>
>>> But all this said - it'd be good to figure out a reliable naming - the 
>>> naming we have here, while usable for humans (pointing to source files, etc)
>>> - they don't reliably give unique names for each lambda/template 
>>> instantiation which would make it difficult for a consumer to know if two 
>>> entities are the same (important for types - is some function parameter the 
>>> same type as another type?)
>>>
>>> While it's expected cross-producer (eg: trying to be compatible with GCC 
>>> and Clang debug info) you have to do some fuzzy matching (eg: "f1" or 
>>> "f1" at the most basic - there are more complicated cases) - this 
>>> one's not possible with the data available.
>>>
>>> The source file/line/column is insufficient to uniquely identify a lambda 
>>> (multiple lambdas stamped out by a macro would get all the same 
>>> file/line/col) and valid code (albeit unlikely) that writes the same 
>>> definition in multiple places could make the same lambda have different 
>>> names.
>>>
>>> We should probably use something more like the way various ABI manglings do 
>>> to identify these entities.
>>>
>>> But we should probably also do this for other unnamed types that have 
>>> linkage (need to/would benefit from being matched up between two CUs), even 
>>> not lambdas.
>>>
>>> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these 
>>> symbols is:
>>>
>>>  void f1<$_0>($_0)
>>>  f1<$_1>($_1)
>>>  void f1(f1()::$_2)
>>>  void f1(f1()::$_3)
>>>
>>> Should we use that instead?
>>>
>>>
>>> The only other information that the current 

Re: [Dwarf-Discuss] DWARF bitness in loclists, etc

2022-06-26 Thread David Blaikie via Dwarf-Discuss
On Sun, Jun 26, 2022 at 2:24 PM Vsevolod Alekseyev via Dwarf-Discuss wrote:
>
> Makes sense, thank you. It's enough for me to go with as far as parsing is 
> concerned.
>
> That said, why bother with the bitness indicator in the ...lists sections at 
> all? I can't imagine parsing them from top to bottom; debugger-type 
> applications normally look up loclists based on DIE data, and a DIE 
> implicitly carries a CU context in it already, bitness and all. Even "dump 
> all loclists" applications (e. g. readelf) don't go by scrolling through the 
> loclists section; it may contain gaps and objects other than loclists (e. g. 
> locview pairs; it's a GNU extension), and is not generally parseable without 
> looking at the DIEs anyway.
>
> As for rangelists, they may overlap. I have binaries to show.
>
> Bottom line, for dumping loc/rnglists one starts by enumerating DIEs, and 
> once you do that, you have enough context to tell the bitness without looking 
> at the CU in loclists header at all.

I can say that llvm-dwarfdump, for instance, does support/work by
dumping sections directly, not only the parts of them referenced from
CUs. Nice to have the headers so you can look at them in isolation -
but, yeah, it does fall down if there are gaps/garbage in between
valid contributions to a section, or those locview GNU extensions.

Before DWARFv5, generally what you're saying is how it works - most
sections didn't have headers, or not adequate headers (eg: debug_line
was missing some things like the bitness, I think? or maybe some other
properties - address size, perhaps) but in DWARFv5 they're mostly
complete (a few gaps remain: I think .debug_macro, the newer of the two
macro sections, doesn't start with a length, but does encode the 32/64
choice as a flag - it'd be nicer if it used a length like everything
else, which would make it easier to skip unknown version data,
etc)
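
For anyone following along, here is a rough sketch of how a reader can tell
32-bit from 64-bit DWARF from the first field of such a header, per DWARF5
Section 7.4: a 4-byte value of 0xffffffff is an escape followed by an 8-byte
length, and 0xfffffff0 through 0xfffffffe are reserved. `InitialLength` and
`parse_initial_length` are made-up names, and the sketch assumes the buffer is
in host byte order (a real parser would honor the object file's endianness):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Result of decoding the initial length field of a unit/contribution header.
struct InitialLength {
  bool is_64bit;           // true for 64-bit DWARF
  std::uint64_t length;    // length of the contribution after this field
  std::size_t field_size;  // bytes consumed by the field: 4 or 12
};

// Decodes an initial length from `data`. Assumes host byte order.
// Returns std::nullopt for truncated input or a reserved escape value.
std::optional<InitialLength> parse_initial_length(const unsigned char* data,
                                                  std::size_t size) {
  if (size < 4) return std::nullopt;
  std::uint32_t first = 0;
  std::memcpy(&first, data, sizeof first);
  if (first < 0xfffffff0u) return InitialLength{false, first, 4};
  if (first != 0xffffffffu) return std::nullopt;  // reserved values
  if (size < 12) return std::nullopt;
  std::uint64_t length = 0;
  std::memcpy(&length, data + 4, sizeof length);
  return InitialLength{true, length, 12};
}

// Small usage example with hand-built headers.
void example() {
  unsigned char small[4];
  const std::uint32_t length32 = 0x1c;
  std::memcpy(small, &length32, sizeof length32);
  std::optional<InitialLength> a = parse_initial_length(small, sizeof small);
  assert(a && !a->is_64bit && a->length == 0x1c && a->field_size == 4);

  unsigned char big[12];
  const std::uint32_t escape = 0xffffffffu;
  const std::uint64_t length64 = 0x100000000ull;
  std::memcpy(big, &escape, sizeof escape);
  std::memcpy(big + 4, &length64, sizeof length64);
  std::optional<InitialLength> b = parse_initial_length(big, sizeof big);
  assert(b && b->is_64bit && b->length == 0x100000000ull && b->field_size == 12);
}
```

With a length in every header, a tool can step from one contribution to the
next without consulting the CUs, as long as there are no gaps in between.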

> -Original Message-
> From: Dwarf-Discuss  On Behalf Of 
> David Anderson via Dwarf-Discuss
> Sent: Sunday, June 26, 2022 11:39 AM
> To: dwarf-discuss@lists.dwarfstd.org
> Subject: Re: [Dwarf-Discuss] DWARF bitness in loclists, etc
>
> On 6/26/22 05:52, Vsevolod Alekseyev via Dwarf-Discuss wrote:
> > Greetings,
> >
> > I’m involved with a Python DWARF parser, Pyelftools
> > (https://github.com/eliben/pyelftools/). I have a question about
> > DWARF5 and the newly indexed loclists/rnglists sections, please.
> >
> > In those sections, each CU gets a block. The block starts with a
> > header, which starts with a 4/12 byte unit_length field, which also
> > serves as a bitness indicator (32/64) – right? So the size of the
> > offset values in the offset table below the header is driven by the
> > structure of unit_length. The DWARF5 standard, section 7.29 talks
> > about “32-bit DWARF” and “64-bit DWARF” without making clear which of
> > the bitness indicators should be used – the one from the original
> > DIE’s CU, or the one from the loclists header that the DIE points to.
> > I was presuming all along that it’s the latter; can someone please confirm?
> > Thank you.
>
> The 32/64 indicator is what the standard calls lengths and offset sizes
> (DWARF5 Section 7.4).
>
> The intent was always that all content related to a single CU (whether in one
> section or more than one, as in your question) have the SAME
> offset size. Meaning that wherever one looks at the 32/64 offset size
> related to a CU, it must match the CU header offset size.
>
> Unfortunately DWARF3-DWARF5 do not clearly say this.
> I'm likely not saying it clearly...sigh.
> (DWARF2 did not allow for a 64bit offset/length size).
>
> If the offset sizes related to a single CU in sections like
> loclists/rnglists do not match the CU offset size the DWARF is corrupt.
>
> An ELF file could have one CU with 32-bit and another CU with 64-bit offset
> size mixed into a single object file, each with its associated
> loclists/rnglists (etc) with the offset size of its CU.
> This possibility too was always intended (starting with DWARF3).
>
> Corrections/clarifications are welcome.
>
> Hope this makes sense.
> David Anderson


Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2022-06-14 Thread David Blaikie via Dwarf-Discuss
Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might
solve my immediate issues in clang, but I think we should still consider
moving to a more canonical naming of lambdas that, necessarily, doesn't
include the file name (unfortunately). Probably has to include the lambda
numbering/something roughly equivalent to the mangled lambda name - it
could include type information (it'd be superfluous to a unique identifier,
but I don't think it would break consistently naming the same type across
CUs either).
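
To make the spelling mismatch concrete, here is a sketch (C++17,
`normalized_spelling` is a made-up helper) showing that the two file names in
the quoted output below differ only lexically. Note this only addresses the
"./" discrepancy; it does nothing for the uniqueness problem, which is why a
name that avoids the file path altogether still seems preferable:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: a purely lexical normalization of a path spelling.
// No file system access is performed.
std::string normalized_spelling(const std::string& path) {
  return std::filesystem::path(path).lexically_normal().generic_string();
}

// Small usage example: the -I. spelling and the plain spelling agree
// once the leading "./" is normalized away.
void example() {
  assert(normalized_spelling("./tmp/lambda.h") == "tmp/lambda.h");
  assert(normalized_spelling("tmp/lambda.h") == "tmp/lambda.h");
}
```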

Anyone got ideas/preferences/thoughts on this?

On Mon, Jan 24, 2022 at 5:51 PM David Blaikie  wrote:

> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl  wrote:
>
>>
>>
>> On Jan 23, 2022, at 2:53 PM, David Blaikie  wrote:
>>
>> A rather common "quality of implementation" issue seems to be lambda
>> naming.
>>
>> I came across this due to non-canonicalization of lambda names in
>> template parameters depending on how a source file is named in Clang, and
>> GCC's seem to be very ambiguous:
>>
>> $ cat tmp/lambda.h
>> template <typename T>
>> void f1(T) { }
>> static int i = (f1([]{}), 1);
>> static int j = (f1([]{}), 2);
>> void f1() {
>>   f1([]{});
>>   f1([]{});
>> }
>> $ cat tmp/lambda.cpp
>> #ifdef I_PATH
>> #include 
>> #else
>> #include "lambda.h"
>> #endif
>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot
>> lambda.o | grep "f1<"
>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>> DW_AT_name  ("f1<(lambda at ./tmp/lambda.h:7:6)>")
>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep
>> "f1<"
>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:3:20)>")
>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:4:20)>")
>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:6:6)>")
>> DW_AT_name  ("f1<(lambda at tmp/lambda.h:7:6)>")
>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep
>> "f1<"
>> DW_AT_name  ("*f1<*f1():: >")
>> DW_AT_name  ("*f1<*f1():: >")
>> DW_AT_name  ("*f1<* >")
>>
>> DW_AT_name  ("*f1<* >")
>>
>> (I came across this in the context of my simplified template names work -
>> rebuilding names from the DW_TAG description of the template parameters -
>> and while I'm not rebuilding names that have lambda parameters (keep
>> encoding the full string instead). The issue is if some other type
>> depending on a type with a lambda parameter - but then multiple uses of
>> that inner type exist, from different translation units (using type units)
>> with different ways of naming the same file - so then the expected name has
>> one spelling, but the actual spelling is different due to the "./")
>>
>> But all this said - it'd be good to figure out a reliable naming - the
>> naming we have here, while usable for humans (pointing to source files, etc)
>> - they don't reliably give unique names for each lambda/template
>> instantiation which would make it difficult for a consumer to know if two
>> entities are the same (important for types - is some function parameter the
>> same type as another type?)
>>
>> While it's expected cross-producer (eg: trying to be compatible with GCC
>> and Clang debug info) you have to do some fuzzy matching (eg: "f1" or
>> "f1" at the most basic - there are more complicated cases) - this
>> one's not possible with the data available.
>>
>> The source file/line/column is insufficient to uniquely identify a lambda
>> (multiple lambdas stamped out by a macro would get all the same
>> file/line/col) and valid code (albeit unlikely) that writes the same
>> definition in multiple places could make the same lambda have different
>> names.
>>
>> We should probably use something more like the way various ABI manglings
>> do to identify these entities.
>>
>> But we should probably also do this for other unnamed types that have
>> linkage (need to/would benefit from being matched up between two CUs), even
>> not lambdas.
>>
>> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for
>> these symbols is:
>>
>>  void f1<$_0>($_0)
>>  f1<$_1>($_1)
>>  void f1(f1()::$_2)
>>  void f1(f1()::$_3)
>>
>> Should we use that instead?
>>
>>
>> The only other information that the current human-readable DWARF name
>> carries is the file+line and that is fully redundant with DW_AT_file/line,
>> so the above scheme seem reasonable to me. Poorly symbolicated backtraces
>> would be worse in this scheme, so I'm expecting most pushback from users
>> who rely on a tool that just prints the human readable name with no source
>> info.
>>
>
> Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway,
> so encoding it in the type name does seem redundant and inefficient 

Re: [Dwarf-Discuss] CU-local types

2022-06-14 Thread David Blaikie via Dwarf-Discuss
On Tue, Jun 14, 2022 at 2:01 PM Greg Clayton  wrote:
>
> Template types are emitted for C++ in DWARF as specialized instances only, 
> there is no generic definition of the type. One of the issues that impedes 
> LLDB from functioning correctly in the expression parser for C++ with 
> templates is how the accelerator table entries are emitted. If you have a 
> "std::vector", the only accelerator table entry, if there is one at all, 
> contains this full name ("std::vector"). LLDB uses clang as the 
> expression parser, so if someone types this into their expression, they end
> up with one entry that points this full name to the matching entry. LLDB acts
> as a precompiled header for clang when evaluating expressions, so first we
> will try and find "std" in the accelerator tables and we will find the
> namespace, then clang will ask for the name "vector" to be found within the
> "std" namespace and we will never find it since the name of a class is always
> fully specified. With functions we end up with the base name of the function
> as the DW_AT_name and we have the DW_AT_linkage_name for the mangled name,
> both of which will appear in the accelerator tables. But for classes we don't
> have the base name of the class as the DW_AT_name, so we never will find this
> template class unless we again ignore all accelerator tables and generate
> them ourselves each time we debug. Granted this can be fixed in LLDB at great
> cost of having to parse every DIE in all units each time we start debugging
> so we can make an index that works for these lookups, but that cost is
> prohibitive.

This is certainly an issue (though might be a bit different from what
you've described - at least at a quick glance it looks like for
"ns::t" we get separate entries for "ns" and for the unqualified
"t" (but not for "t") in the accelerator tables) - and with
simplified template names we may get "t" in the accelerator table
rather than "t" (& so then you'll get "t" entries for "t"
and "t", etc... and have to disambiguate them)

But that's, I think, a different topic from the one this thread is
about - how to identify which types are CU-local and which types are
not. Maybe the template accelerated access issue would be worth
another/separate thread.
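
To make the lookup problem concrete, here is a minimal C++ sketch. It is not how LLDB or any real accelerator table works: `NameIndex`, `base_name`, `add_type`, `lookup` and the DIE offsets are all hypothetical, and "vector<int>" merely stands in for whatever full template name a producer emits. It shows that an index keyed only by the full name misses a lookup by base name, while also indexing the base name finds the type but leaves the consumer to disambiguate between instantiations.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an accelerator table: name -> DIE offsets (hypothetical).
using NameIndex = std::multimap<std::string, std::uint64_t>;

// Strip a template argument list: "vector<int>" -> "vector".
// Assumes the argument list, if any, starts at the first '<'
// (so names such as "operator<" are out of scope for this sketch).
std::string base_name(const std::string &full) {
  return full.substr(0, full.find('<'));
}

// Index a type under its full name and, optionally, under its base name too.
void add_type(NameIndex &index, const std::string &full, std::uint64_t die,
              bool also_index_base_name) {
  index.emplace(full, die);
  if (also_index_base_name && base_name(full) != full)
    index.emplace(base_name(full), die);
}

// Collect every DIE offset recorded under `name`.
std::vector<std::uint64_t> lookup(const NameIndex &index,
                                  const std::string &name) {
  std::vector<std::uint64_t> out;
  auto range = index.equal_range(name);
  for (auto it = range.first; it != range.second; ++it)
    out.push_back(it->second);
  return out;
}
```

With only full names indexed, `lookup(index, "vector")` returns nothing; with base names indexed as well, it returns one entry per instantiation, which is the disambiguation cost mentioned above.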

>
> On Jun 14, 2022, at 1:04 PM, David Blaikie via Dwarf-Discuss 
>  wrote:
>
> On Wed, May 18, 2022 at 9:53 AM David Blaikie  wrote:
>
>
> On Wed, May 18, 2022 at 4:16 AM Robinson, Paul  wrote:
>
>
> Looks like gdb and lldb both have issues with C++ local types (either
> types defined in anonymous namespaces, or otherwise localized - eg: a
> non-local template with a local type or variable in one of its
> parameters).
> ...
> So... what could/should we do about this?
>
>
> Do you have a strong argument for why these are not debugger bugs?
> It sounds to me like gdb/lldb are handling anonymous namespaces
> incorrectly, in effect treating their contents as global rather than
> CU-local.
>
>
> Oh, right, sorry forgot to include the trickier examples.
>
> So for a non-template this isn't especially burdensome (check for an
> anonymous namespace in the parent scopes - it's language specific, but
> not a ton of weird work to do) - for templates it's a bit harder (you
> have to check every template parameter, and potentially arbitrarily
> deep - eg: you might have a template parameter that's a function type
> and one of the parameters is a pointer type and the type the pointer
> points to is local - thus the template is local. That seems a bit more
> of a stretch to ask the consumer to do totally reliably) - but the
> worst case, that at the moment there's potentially no way to
> disambiguate whether the type is local or not: A non-type template
> parameter that points to a local variable.
>
> static int x = 3;
> template struct t1 { };
> t1<> v;
>
> Currently both LLVM and GCC name this type "t1<>" and LLVM at least
> puts a DW_AT_location on the DW_TAG_template_value_parameter which
> points to the global variable (not the DW_TAG_variable, but to the
> actual ELF symbol in the file) - though this choice has some negative
> effects (causes the symbol to be "used" and linked in - which means
> that enabling debug info can affect the behavior of the program
> (global ctors in that file execute when they wouldn't've otherwise,
> etc)).
>
> If the location is provided, arguably the consumer could look up the
> symbol and check its linkage (not always accurate - LTO might've
> internalized a variable that wasn't actually internal in the original
> source, for

Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-06-14 Thread David Blaikie via Dwarf-Discuss
Given the discussion previously in this thread - does anyone have
particular objections to removing .debug_aranges? (in favor of/perhaps
with specific wording that /requires/ CU level ranges to be specified
(ie: it's not acceptable to have a subprogram with non-empty range in
a CU which doesn't cover that range) - so a consumer can look at the
CU and, if it has no ranges, conclude that it has no addresses covered
and skip the CU for address-related computation purposes)
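
As a rough illustration of what that guarantee would buy a consumer, here is a small C++ sketch of address-to-CU lookup that uses only CU-level ranges. The `Range`, `CompileUnit` and `find_cu` names are hypothetical and the ranges are assumed to have been decoded already; this is not taken from any existing debugger.

```cpp
#include <cstdint>
#include <vector>

// Half-open address range [low, high).
struct Range {
  std::uint64_t low;
  std::uint64_t high;
};

struct CompileUnit {
  std::uint64_t offset;       // offset of the unit in .debug_info
  std::vector<Range> ranges;  // CU-level ranges; empty means "covers no addresses"
};

// Return the CU whose CU-level ranges cover `addr`, or nullptr if none does.
// This relies on the proposed rule: a CU without ranges covers no code,
// so it can be skipped without looking at any of its children.
const CompileUnit *find_cu(const std::vector<CompileUnit> &cus,
                           std::uint64_t addr) {
  for (const CompileUnit &cu : cus) {
    if (cu.ranges.empty())
      continue;
    for (const Range &r : cu.ranges)
      if (r.low <= addr && addr < r.high)
        return &cu;
  }
  return nullptr;
}
```

Without the guarantee, the `continue` is not safe: a CU with no ranges might still contain a subprogram with addresses, and the consumer would have to walk its DIEs.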
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] CU-local types

2022-06-14 Thread David Blaikie via Dwarf-Discuss
On Wed, May 18, 2022 at 9:53 AM David Blaikie  wrote:
>
> On Wed, May 18, 2022 at 4:16 AM Robinson, Paul  wrote:
> >
> > > Looks like gdb and lldb both have issues with C++ local types (either
> > > types defined in anonymous namespaces, or otherwise localized - eg: a
> > > non-local template with a local type or variable in one of its
> > > parameters).
> > > ...
> > > So... what could/should we do about this?
> >
> > Do you have a strong argument for why these are not debugger bugs?
> > It sounds to me like gdb/lldb are handling anonymous namespaces
> > incorrectly, in effect treating their contents as global rather than
> > CU-local.
>
> Oh, right, sorry forgot to include the trickier examples.
>
> So for a non-template this isn't especially burdensome (check for an
> anonymous namespace in the parent scopes - it's language specific, but
> not a ton of weird work to do) - for templates it's a bit harder (you
> have to check every template parameter, and potentially arbitrarily
> deep - eg: you might have a template parameter that's a function type
> and one of the parameters is a pointer type and the type the pointer
> points to is local - thus the template is local. That seems a bit more
> of a stretch to ask the consumer to do totally reliably) - but the
> worst case, that at the moment there's potentially no way to
> disambiguate whether the type is local or not: A non-type template
> parameter that points to a local variable.
>
> static int x = 3;
> template struct t1 { };
> t1<> v;
>
> Currently both LLVM and GCC name this type "t1<>" and LLVM at least
> puts a DW_AT_location on the DW_TAG_template_value_parameter which
> points to the global variable (not the DW_TAG_variable, but to the
> actual ELF symbol in the file) - though this choice has some negative
> effects (causes the symbol to be "used" and linked in - which means
> that enabling debug info can affect the behavior of the program
> (global ctors in that file execute when they wouldn't've otherwise,
> etc)).
>
> If the location is provided, arguably the consumer could look up the
> symbol and check its linkage (not always accurate - LTO might've
> internalized a variable that wasn't actually internal in the original
> source, for instance) - but when the location is not provided there's
> no way to know whether "t1<>" is local or not.

Ping - anyone got further ideas about how to address this issue/encode
this information?


Re: [Dwarf-Discuss] CU-local types

2022-05-18 Thread David Blaikie via Dwarf-Discuss
On Wed, May 18, 2022 at 4:16 AM Robinson, Paul  wrote:
>
> > Looks like gdb and lldb both have issues with C++ local types (either
> > types defined in anonymous namespaces, or otherwise localized - eg: a
> > non-local template with a local type or variable in one of its
> > parameters).
> > ...
> > So... what could/should we do about this?
>
> Do you have a strong argument for why these are not debugger bugs?
> It sounds to me like gdb/lldb are handling anonymous namespaces
> incorrectly, in effect treating their contents as global rather than
> CU-local.

Oh, right, sorry forgot to include the trickier examples.

So for a non-template this isn't especially burdensome (check for an
anonymous namespace in the parent scopes - it's language specific, but
not a ton of weird work to do) - for templates it's a bit harder (you
have to check every template parameter, and potentially arbitrarily
deep - eg: you might have a template parameter that's a function type
and one of the parameters is a pointer type and the type the pointer
points to is local - thus the template is local. That seems a bit more
of a stretch to ask the consumer to do totally reliably) - but the
worst case is that at the moment there's potentially no way to
disambiguate whether the type is local or not: A non-type template
parameter that points to a local variable.

static int x = 3;
template struct t1 { };
t1<> v;
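
As written, the three lines above do not compile. A minimal compilable sketch of the same situation follows, assuming the template takes an `int *` non-type parameter and is instantiated with `&x`; both are assumptions, as is the `param()` helper, which exists only so the argument can be inspected.

```cpp
// Assumption: `int *P` and `&x` are guesses, not taken from the message.
static int x = 3;  // internal linkage: local to this translation unit

template <int *P>
struct t1 {
  static int *param() { return P; }  // hypothetical helper exposing the argument
};

// Each translation unit with its own `static int x` gets its own, distinct
// instantiation here, even though the spelling of the type is the same.
t1<&x> v;
```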

Currently both LLVM and GCC name this type "t1<>" and LLVM at least
puts a DW_AT_location on the DW_TAG_template_value_parameter which
points to the global variable (not the DW_TAG_variable, but to the
actual ELF symbol in the file) - though this choice has some negative
effects (causes the symbol to be "used" and linked in - which means
that enabling debug info can affect the behavior of the program
(global ctors in that file execute when they wouldn't've otherwise,
etc)).

If the location is provided, arguably the consumer could look up the
symbol and check its linkage (not always accurate - LTO might've
internalized a variable that wasn't actually internal in the original
source, for instance) - but when the location is not provided there's
no way to know whether "t1<>" is local or not.
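
The "arbitrarily deep" check a consumer would need can be sketched as a recursive walk. This is a toy model, not a DWARF reader: `Type`, its fields and `is_cu_local` are hypothetical, and the type graph is assumed to be acyclic. Note that `has_internal_linkage_value_param` is exactly the bit the consumer may be unable to compute when no location is emitted.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy type model (hypothetical).
struct Type {
  std::string name;
  // Directly local, e.g. declared in an anonymous namespace.
  bool in_anonymous_namespace = false;
  // Has a non-type template parameter referring to an internal-linkage
  // entity. DWARF may not let a consumer determine this today.
  bool has_internal_linkage_value_param = false;
  // Types this type is built from: template type parameters, pointee,
  // function parameter types, and so on.
  std::vector<std::shared_ptr<Type>> components;
};

// A type is CU-local if it is local itself or if anything it is built
// from is local, at any depth.
bool is_cu_local(const Type &t) {
  if (t.in_anonymous_namespace || t.has_internal_linkage_value_param)
    return true;
  for (const auto &c : t.components)
    if (c && is_cu_local(*c))
      return true;
  return false;
}
```

For a template whose parameter is a function type taking a pointer to a local type, the walk goes template -> function type -> pointer -> local type before it can answer.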

- Dave


[Dwarf-Discuss] CU-local types

2022-05-17 Thread David Blaikie via Dwarf-Discuss
Looks like gdb and lldb both have issues with C++ local types (either
types defined in anonymous namespaces, or otherwise localized - eg: a
non-local template with a local type or variable in one of its
parameters). GDB correctly associates directly referenced types (eg:
the type of a variable doesn't get confused just because there's a
same-named-but-distinct type in another CU) where LLDB does not (gets
that correct for a type in an anonymous namespace, but not a template
that's made local via a local-typed parameter). Neither debugger then
handles overload resolution and can correctly identify that a function
taking that parameter type from another CU is not a valid overload
candidate for this type.

So... what could/should we do about this?

In theory, using DW_AT_external for non-local types would be
consistent with other DWARF usage, but then a consumer would have to
assume that all non-external types are distinct, which probably isn't
the right default given current deployments? Mandating this in DWARFv6
might be possible?


Re: [Dwarf-Discuss] How to generate DWARF info for a template alias to a raw pointer

2022-05-09 Thread David Blaikie via Dwarf-Discuss
On Fri, May 6, 2022 at 10:08 AM Robinson, Paul via Dwarf-Discuss
 wrote:
>
> > Could someone help to point out what kind of DWARF info should
> > be generated for below c++ source? Thanks
> >
> > ```
> > template
> > using ptr = T*;
> >
> > ptr  abc;
> > ```
> >
> > We declare a template alias here, so we may generate
> > `DW_TAG_template_type_parameter` like:
> >
> > ```
> > 0x0057:   DW_TAG_base_type
> > DW_AT_name  ("int")
> > DW_AT_byte_size  (0x04)
> > DW_AT_encoding   (DW_ATE_signed)
> >
> > 0x005e:   DW_TAG_pointer_type
> > DW_AT_type(0x0057 "int")
> >
> > 0x0064:   DW_TAG_template_alias
> > DW_AT_name  ("ptr")
> > DW_AT_type(0x005e "int *")
> >
> > 0x0076: DW_TAG_template_type_parameter
> >   DW_AT_name   ("T")
> >   DW_AT_type  (0x0057 "int")
> >
> > 0x007e:   DW_TAG_variable
> > DW_AT_name  ("abc")
> > DW_AT_type(0x0064 "ptr")
> > ```
>
> This all looks okay to me, with DW_TAG_template_type_parameter
> being a child of DW_TAG_template_alias.  There's an alias
> named `ptr`, its formal parameter is `T`, its actual parameter
> is `int`, and so the alias is a typedef of `int *`.

One quirk here is that this encoding is using something like
"simplified template names" - clang and GCC currently straddle both
unsimplified names (class and function templates would have DW_AT_name
with template parameters, eg: "base_name", but
variable templates currently get simplified names (just "base_name")
along with DW_TAG_template_*_parameter DIEs)

I'm working on changes to Clang to allow opting into the simplified,
basename-only, form for everything, including function and class
templates, but that's not fully supported/tested at the moment.

I suspect for now, lldb probably won't have a perfect time with a
simplified name for a type like above - and it's probably good to at
least be consistent with the other type templates and include the
parameters in the name. But it'd also be worth wiring it up (if you're
working with clang) to the simplified template names support, to allow
this to be simplified where possible/when that option is enabled. (There
are some cases where names can't be simplified, such as pointer non-type
template parameters, because the name can't be rebuilt readily from the
debug info - and some other cases are truly lossy/not possible to
rebuild.)
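
For the cases where the name can be rebuilt, the consumer-side work is roughly the following. This is a sketch only: `rebuild_template_name` is a hypothetical helper, and it assumes the consumer has already printed each DW_TAG_template_*_parameter child as a string, which is the hard part for the lossy cases.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rebuild "base<arg1, arg2>" from a simplified (base-only) DW_AT_name and
// the already-printed template arguments, in declaration order.
std::string rebuild_template_name(const std::string &base,
                                  const std::vector<std::string> &args) {
  std::string out = base + "<";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += args[i];
  }
  out += ">";
  return out;
}
```

For the alias discussed in this thread, `rebuild_template_name("ptr", {"int"})` gives `ptr<int>`.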

>
> > ` DW_TAG_template_type_parameter ` should be for a notation to
> > create a template base on another template, but as you can see
> > the referred type 0x005e is not a template. What kind of
> > DWARF info should we generate here? We should use
> > ` DW_TAG_typedef` instead of ` DW_TAG_template_type_parameter`
> > for this special case?
>
> DW_TAG_template_type_parameter is correctly describing the
> parameter to the template.
>
> There's no need for a DW_TAG_typedef, because DW_TAG_template_alias
> is implicitly a typedef.

Yeah, that looks/sounds good to me.
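
For reference, a compilable form of the source being described, assuming the parameter is declared as `typename T` and the variable as `ptr<int>` (the DWARF above gives `T` the type `int`; the exact spelling of the declaration is an assumption). The `static_assert` restates the conclusion that the alias is just another name for `int *`.

```cpp
#include <type_traits>

template <typename T>
using ptr = T *;  // alias template with formal parameter T

ptr<int> abc;  // actual parameter int, so abc has type int *

static_assert(std::is_same<ptr<int>, int *>::value,
              "ptr<int> is another name for int *");
```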


Re: [Dwarf-Discuss] How to interpret DW_AT_artificial tag?

2022-03-03 Thread David Blaikie via Dwarf-Discuss
On Mon, Feb 28, 2022 at 2:58 PM Michael Eager  wrote:

> On 2/28/22 13:11, David Blaikie via Dwarf-Discuss wrote:
> > On Mon, Feb 28, 2022 at 12:55 PM Greg Clayton via Dwarf-Discuss
> > You could choose to not show this, but I find it is often easier to
> > show this information in case some compiler change in the future
> > marks something that you might want to see as artificial. For
> > example the "this" parameter to C++ methods is marked as artificial,
> > and people do want to see the "this" in the variables view. Granted
> > that is a variable vs a member variable.
> >
> >
> > Probably the important distinction there is that "this", while
> > artificial, is named/usable in the source whereas the vtable isn't.
> >
> > Either having some DWARF attribute to communicate that distinction, or a
> > workaround might be to detect that the vtable name uses a reserved
> > identifier could be good - I guess the reality is that DWARF consumers
> > probably hardcode some sort of "this is the vtable pointer" - so maybe
> > we should have some DWARF attribute that communicates that, instead of
> > relying on the string name of the member? Not sure.
>
> This is a peculiarity of C++ where an artificial variable has a user-
> accessible name.  Rather than have a special attribute to distinguish
> this oddity, and since "this" is a reserved word in C++, it seems
> easiest to simply check for this special case if the language is C++.
> You can reasonably hard-code "this" knowing it can't be used for any
> other purpose.  You cannot say the same about vtable pointers which can
> have any (or no) name.
>

Other OO languages have this feature under various names (Java also uses
"this", Swift (& Objective C?) uses "self", for instance - guess there are
probably others), such that it'd probably be nice if consumers didn't have
to hardcode these special cases. I would guess some languages might also
have implicit names for loop variables or other constructs?

Not suggesting this is the highest priority to improve in DWARF, but that
it could be improved if someone felt strongly about it.
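
The hardcoding that consumers end up with today looks roughly like this. It is only a sketch: the `Language` enum and `is_source_visible_artificial` are hypothetical, not any debugger's API, and the table covers just the languages named above.

```cpp
#include <string>

// Hypothetical language tag; a real consumer would map DW_AT_language.
enum class Language { CPlusPlus, Java, Swift, ObjectiveC, Other };

// True if an artificial parameter with this name is still something the
// user can spell in source, and so is worth showing in a variables view.
bool is_source_visible_artificial(Language lang, const std::string &name) {
  switch (lang) {
    case Language::CPlusPlus:
    case Language::Java:
      return name == "this";
    case Language::Swift:
    case Language::ObjectiveC:
      return name == "self";
    case Language::Other:
      return false;
  }
  return false;
}
```

Every new language needs another case in the table, which is the maintenance cost a dedicated attribute would avoid.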


Re: [Dwarf-Discuss] How to interpret DW_AT_artificial tag?

2022-02-28 Thread David Blaikie via Dwarf-Discuss
On Mon, Feb 28, 2022 at 12:55 PM Greg Clayton via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

>
>
> On Feb 28, 2022, at 5:49 AM, Ron Louzon via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> I have an application which uses DwarfLib to extract type information from
> debug executable images.  I have found in the DWARF data that some
> structure types have a "virtual" pointer added as their first member and
> this pointer's DIE contains the tag DW_AT_artificial=true.  How does that
> pointer member affect the offsets of the members that follow it in the
> structure.
>
>
> This will cause all other members to be pushed out by a pointer size.
>
> Should this 4-byte pointer be ignored or will its size cause the other
> structure members to be pushed out in memory by 4 bytes?
>
>
> All offsets in the DWARF should be correct for all members, including
> artificial members and any members that follow it in memory. So yes, if
> there is a vtable pointer added as the first member, its offset and all
> other offsets will be correct, so there is no need to adjust anything when
> reading this data.
>
> You could choose to not show this, but I find it is often easier to show
> this information in case some compiler change in the future marks something
> that you might want to see as artificial. For example the "this" parameter
> to C++ methods is marked as artificial, and people do want to see the
> "this" in the variables view. Granted that is a variable vs a member
> variable.
>

Probably the important distinction there is that "this", while artificial,
is named/usable in the source whereas the vtable isn't.

Either having some DWARF attribute to communicate that distinction, or a
workaround might be to detect that the vtable name uses a reserved
identifier could be good - I guess the reality is that DWARF consumers
probably hardcode some sort of "this is the vtable pointer" - so maybe we
should have some DWARF attribute that communicates that, instead of relying
on the string name of the member? Not sure.
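
On the layout question quoted above, the effect of the hidden pointer can be seen directly in C++. A small sketch, assuming a common ABI (for example the Itanium C++ ABI) that places the vtable pointer at the start of the object; `Plain`, `Poly` and `offset_of_a` are made-up names.

```cpp
#include <cstddef>

struct Plain {
  int a;
};

struct Poly {
  virtual ~Poly() {}  // makes the compiler add a hidden vtable pointer
  int a;
};

// Byte offset of Poly::a within the object, computed from addresses.
std::size_t offset_of_a(const Poly &p) {
  return static_cast<std::size_t>(reinterpret_cast<const char *>(&p.a) -
                                  reinterpret_cast<const char *>(&p));
}
```

On such an ABI `offset_of_a` is at least `sizeof(void *)`, and that is the value the member's DWARF offset already reports, so a reader does not need to add anything for the artificial member.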


Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-02-25 Thread David Blaikie via Dwarf-Discuss
On Fri, Feb 25, 2022 at 1:23 AM tom.russ...@sony.com 
wrote:

> Hi David,
>
>
>
> We don’t use .debug_aranges in our debugger (and, to my knowledge, never
> have). Our strategy is to up front load all the debug information and
> convert it to our internal format. For that reason, the sections relating
> to accelerated access are not useful for us as we’ll be visiting & indexing
> all CU DIEs ourselves.
>

Thanks for the details!


>
>
> Tom
>
>
>
> *From:* David Blaikie 
> *Sent:* 24 February 2022 22:39
> *To:* Russell, Tom 
> *Cc:* Cary Coutant ; Robinson, Paul <
> paul.robin...@sony.com>; DWARF Discuss 
> *Subject:* Re: debug_aranges use and overhead
>
>
>
> Tom - any chance you've had/could take a brief look at this issue?
>
>
>
> On Thu, Mar 11, 2021 at 1:12 PM  wrote:
>
> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>
>
>
> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>
>
>
> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
> Thanks,
>
> --paulr
>
>
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
>
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is quite different though - it collects information from
> across the DIE tree - information that is expensive to otherwise gather
> (walking the whole DIE tree).
>
> .debug_aranges is not like that for most producers (producers that do
> include the address ranges on the CU DIE) - the data is readily available
> immediately on the CU. That does involve reading some of .debug_abbrev, and
> interpreting a handful of attributes - but at least for the use cases I'm
> aware of, that overhead isn't worth the size increase.
>
> Do you have numbers on the benefits of .debug_aranges compared to parsing
> the ranges from CU DIEs?
>
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
> its children DIEs contain addresses - having that as a guarantee, or some
> preferred way of encoding zero length (high/low of 0 would be acceptable, I
> guess) would be nice & make it cheap to skip over CUs that don't have any
> address ranges)
>
> Roughly, a modern debug_aranges to me would look something like:
>
> 
> 
> 
> 
> 
>
> So it could fully re-use the rnglist encoding. If this was going to be as
> compact as possible, it'd need to be configurable which encodings it uses -
> ranges V high/low, addrx V addr - at which point it'd probably look like a
> small DIE with an inline abbrev (similar to the way DWARFv5 encodes the
> file and directory entries now, and how debug_names is self-describing) -
> at which point it looks to me a lot like parsing the CU DIEs.
>
>
>
>
>

Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-02-24 Thread David Blaikie via Dwarf-Discuss
Tom - any chance you've had/could take a brief look at this issue?

On Thu, Mar 11, 2021 at 1:12 PM  wrote:

> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>
>
>
> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>
>
>
> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
> Thanks,
>
> --paulr
>
>
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
>
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is quite different though - it collects information from
> across the DIE tree - information that is expensive to otherwise gather
> (walking the whole DIE tree).
>
> .debug_aranges is not like that for most producers (producers that do
> include the address ranges on the CU DIE) - the data is readily available
> immediately on the CU. That does involve reading some of .debug_abbrev, and
> interpreting a handful of attributes - but at least for the use cases I'm
> aware of, that overhead isn't worth the size increase.
>
> Do you have numbers on the benefits of .debug_aranges compared to parsing
> the ranges from CU DIEs?
>
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
> its children DIEs contain addresses - having that as a guarantee, or some
> preferred way of encoding zero length (high/low of 0 would be acceptable, I
> guess) would be nice & make it cheap to skip over CUs that don't have any
> address ranges)
>
> Roughly, a modern debug_aranges to me would look something like:
>
> 
> 
> 
> 
> 
>
> So it could fully re-use the rnglist encoding. If this was going to be as
> compact as possible, it'd need to be configurable which encodings it uses -
> ranges V high/low, addrx V addr - at which point it'd probably look like a
> small DIE with an inline abbrev (similar to the way DWARFv5 encodes the
> file and directory entries now, and how debug_names is self-describing) -
> at which point it looks to me a lot like parsing the CU DIEs.
>
>


Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-02-24 Thread David Blaikie via Dwarf-Discuss
On Thu, Feb 24, 2022 at 2:24 PM Samy Al Bahra  wrote:

> Hi David
>
> I implemented some optimizations in the form of a specialized parser for
> fast AT_ranges scanning and performance is now comparable to lazy
> evaluation through .debug_aranges (only marginally worse assuming buffer
> cache warmed up). We've since shipped with these optimizations. I have to
> do some work in the same code base in March and will run a comparison then
> / share numbers here including after dropping buffers. If you would benefit
> from having them sooner, let me know and I'll make it happen.
>

No rush - just came across the thread and was curious if there were any
updates/closure/lessons to factor in, etc. I'm glad to hear you ended up
with fairly similar performance - that matches my expectation, that there
aren't some hidden scalability issues here. But yeah, curious to hear more
if/when you happen to have more to share.
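
For anyone following along, the "skip to the next CU" step of such a scan is cheap because the unit header carries its own length. A minimal sketch, assuming little-endian data and ignoring the reserved initial-length values; `read_le` and `next_unit_offset` are hypothetical helpers, not Samy's parser or any library's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an n-byte little-endian unsigned integer starting at `off`.
std::uint64_t read_le(const std::vector<std::uint8_t> &data, std::size_t off,
                      std::size_t n) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint64_t>(data.at(off + i)) << (8 * i);
  return v;
}

// Offset of the unit that follows the one whose header starts at `off`.
// 32-bit DWARF: a 4-byte unit_length.
// 64-bit DWARF: the escape 0xffffffff, then an 8-byte unit_length.
// In both cases unit_length does not count the length field itself.
std::size_t next_unit_offset(const std::vector<std::uint8_t> &debug_info,
                             std::size_t off) {
  std::uint64_t len = read_le(debug_info, off, 4);
  if (len == 0xffffffffu)
    return off + 12 +
           static_cast<std::size_t>(read_le(debug_info, off + 4, 8));
  return off + 4 + static_cast<std::size_t>(len);
}
```

A scan then reads the header at `off`, decodes only the CU DIE's low/high_pc or DW_AT_ranges, and jumps to `next_unit_offset(debug_info, off)` without touching the unit's children.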

- Dave


>
> On Thu, Feb 24, 2022 at 3:44 PM David Blaikie  wrote:
>
>> Hey Samy - curious if you ever happened to end up getting further details
>> here.
>>
>> On Fri, Apr 9, 2021 at 1:05 PM Samy Al Bahra  wrote:
>>
>>> Thanks for the detailed response David.
>>>
>>> On Fri, Apr 9, 2021 at 2:52 PM David Blaikie  wrote:
>>>
>>>> I'm not suggesting scanning all of .debug_info - only the CU DIE for
>>>> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
>>>> unit header's next unit offset).
>>>>
>>>
>>>> It sounded like CU ranges couldn't be used to build such an index at
>>>> all/that your code used quite a different strategy in the absence of
>>>> aranges? (rather than building the index from the CU ranges - somewhat
>>>> slower I'm sure, but I wouldn't've thought (& am trying to understand
>>>> if it is/why) so fundamentally slower that it wouldn't be the next
>>>> fallback rather than skipping the index entirely or employing some
>>>> more fundamentally different approach)
>>>
>>>
>>> This is still significantly less dense than aranges, involves more disk
>>> I/O and memory pressure. Let me see what optimizations I can implement here
>>> and get back to you with the results / what I came up with. This would be a
>>> better basis for apples to apples comparison.
>>>
>>>
>>>>
>>>> If you mean building ranges from all the DIEs deep inside a CU - yeah,
>>>> that's going to be fundamentally slower in a bunch of ways that maybe
>>>> I could see that would necessitate a totally different approach/that
>>>> the index wouldn't make sense anymore (though I'd still like to
>>>> understand it) - but I'm especially curious about the case where the
>>>> CU DIE itself does have comprehensive address range information.
>>>>
>>>
>>> Will report back on this.
>>>
>>>
>>>>
>>>> - Dave
>>>>
>>>> >
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> (+ complexities Greg mentions later in the thread). In cases where
>>>> we lack this, we use our own persistent cache which introduces unnecessary
>>>> complexity. Now I am considering going as far as adding a multi-threaded
>>>> indexer for cases where a persistent cache / build system modifications
>>>> aren't an option (work to begin in the next week or two).
>>>> >>>
>>>> >>> .debug_aranges would provide a lot of value to our users.
>>>> >>>
>>>> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss <
>>>> dwarf-discuss@lists.dwarfstd.org> wrote:
>>>> >>>>
>>>> >>>> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>>>> >>>>>
>>>> >>>>> Hopefully not to side-track things too much... maybe wants its own
>>>> >>>>> thread, if there's more to debate here.
>>>> >>>>
>>>> >>>>
>>>> >>>> Yeah, how about we spin it off into another thread (done here)
>>>> >>>>
>>>> >>>>>
>>>> >>>>> >> For the case you suggested where it would be useful to keep
>>>> the range
>>>> >>>>> >> list for the CU in the .o file, I think .debug_aranges is what
>>>> you're
>>>> >>>>> >> looking for.

Re: [Dwarf-Discuss] debug_aranges use and overhead

2022-02-24 Thread David Blaikie via Dwarf-Discuss
Hey Samy - curious if you ever happened to end up getting further details
here.

On Fri, Apr 9, 2021 at 1:05 PM Samy Al Bahra  wrote:

> Thanks for the detailed response David.
>
> On Fri, Apr 9, 2021 at 2:52 PM David Blaikie  wrote:
>
>> I'm not suggesting scanning all of .debug_info - only the CU DIE for
>> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
>> unit header's next unit offset).
>>
>
>> It sounded like CU ranges couldn't be used to build such an index at
>> all/that your code used quite a different strategy in the absence of
>> aranges? (rather than building the index from the CU ranges - somewhat
>> slower I'm sure, but I wouldn't've thought (& am trying to understand
>> if it is/why) so fundamentally slower that it wouldn't be the next
>> fallback rather than skipping the index entirely or employing some
>> more fundamentally different approach)
>
>
> This is still significantly less dense than aranges, involves more disk
> I/O and memory pressure. Let me see what optimizations I can implement here
> and get back to you with the results / what I came up with. This would be a
> better basis for apples to apples comparison.
>
>
>>
>> If you mean building ranges from all the DIEs deep inside a CU - yeah,
>> that's going to be fundamentally slower in a bunch of ways that maybe
>> I could see that would necessitate a totally different approach/that
>> the index wouldn't make sense anymore (though I'd still like to
>> understand it) - but I'm especially curious about the case where the
>> CU DIE itself does have comprehensive address range information.
>>
>
> Will report back on this.
>
>
>>
>> - Dave
>>
>> >
>> >>
>> >>
>> >>>
>> >>> (+ complexities Greg mentions later in the thread). In cases where we
>> lack this, we use our own persistent cache which introduces unnecessary
>> complexity. Now I am considering going as far as adding a multi-threaded
>> indexer for cases where a persistent cache / build system modifications
>> aren't an option (work to begin in the next week or two).
>> >>>
>> >>> .debug_aranges would provide a lot of value to our users.
>> >>>
>> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss <
>> dwarf-discuss@lists.dwarfstd.org> wrote:
>> >>>>
>> >>>> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>> >>>>>
>> >>>>> Hopefully not to side-track things too much... maybe wants its own
>> >>>>> thread, if there's more to debate here.
>> >>>>
>> >>>>
>> >>>> Yeah, how about we spin it off into another thread (done here)
>> >>>>
>> >>>>>
>> >>>>> >> For the case you suggested where it would be useful to keep the
>> range
>> >>>>> >> list for the CU in the .o file, I think .debug_aranges is what
>> you're
>> >>>>> >> looking for.
>> >>>>> >
>> >>>>> > aranges has been off by default in LLVM for a while - it adds a
>> lot of
>> >>>>> > overhead (doesn't have all the nice rnglist encodings for
>> instance -
>> >>>>> > nor can it use debug_addr, and if it did it'd still be duplicate
>> with
>> >>>>> > the CU ranges wherever they were).
>> >>>>>
>> >>>>> Did you want to file an issue to improve how .debug_aranges works?
>> >>>>
>> >>>>
>> >>>> I don't currently understand the value it provides, and I at least
>> don't have a use case for it, so I'm not sure I'd be the best person to
>> advocate/drive that work.
>> >>>>
>> >>>>> Complaining that it duplicates CU ranges is missing the point,
>> though;
>> >>>>> it's an index, like .debug_names, of course it duplicates other
>> info.
>> >>>>> If you want to suggest an improved index, like we did with
>> .debug_names,
>> >>>>> that would be great too.
>> >>>>
>> >>>>
>> >>>> .debug_names is quite different though - it collects information
>> from across the DIE tree - information that is expensive to otherwise
>> gather (walking the whole DIE tree).
>> >>>>
>> >>>> .debug_aranges is not like that for most producers (produc

Re: [Dwarf-Discuss] clang -flto and LLVMgold.so

2022-02-22 Thread David Blaikie via Dwarf-Discuss
(might be easier to pick up from the thread where this came up or at least
CC/to the folks you're referring to (though perhaps you did and that got
stripped by the mailing list or something, not sure) - at least for me, I
have highlights for emails addressed to me that helps prioritizing/knowing
if there's a response expected from me & handy for context - I forget what
particular thing we were talking about with regard to LTO & would have to go
search up the original thread for the context)

Yeah, I don't know too much about the LLVM release process, to be honest,
it surprises me that libLLVMgold.so is no longer part of the release.

But you should be able to use LTO with lld instead: -flto -fuse-ld=lld

Give that a go? Happy to help further.

On Tue, Feb 22, 2022 at 10:41 AM David Anderson via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> A followon To Jan K and David B Feb 1 2022:
>
> Binary release on github
> clang+llvm-13.0.0-x86_64-linux-gnu-ubuntu-20.04.tar.xz
> does not contain LLVMgold.so
>
> So adding -flto just results in the inability to
> compile anything.
>
> Suggestions?
> Thanks.
> David Anderson
>
> --
> Overflow on /dev/null, please empty the bit bucket.
> -- seen on slashdot.org
> ___
> Dwarf-Discuss mailing list
> Dwarf-Discuss@lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>


Re: [Dwarf-Discuss] Producing .debug_names and questions about this lookup format

2022-02-01 Thread David Blaikie via Dwarf-Discuss
On Tue, Feb 1, 2022 at 3:45 PM Greg Clayton via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> I am in the process of writing a tool that can add .debug_names to a file
> that contains DWARF but doesn’t have an accelerator table that is adequate
> for debuggers to use. I was trying to see some example implementations of
> the .debug_names section to see exactly what I should emit from this tool.
>
> I have a question on the following section from the DWARF 5
> specification: 6.1.1.4.7 Abbreviations Table. It contains "Table 6.1: Index
> attribute encodings” which shows the available attributes for the
> abbreviations that can be used for each index entry:
>
> DW_IDX_compile_unit // Index of CU
> DW_IDX_type_unit  // Index of TU
> DW_IDX_die_offset // Offset of DIE within CU or TU
> DW_IDX_parent // Index of name table entry for parent
> DW_IDX_type_hash // Hash of type declaration
>
>
> My main question is what are the best practices for how producers should
> emit an entry for a specific DIE in this lookup table. One option is to
> specify both a CU index and a relative DIE offset:
>
> DW_IDX_compile_unit + DW_FORM_dataX form
> DW_IDX_die_offset + DW_FORM_ref4
>
> “DW_FORM_ref4” is known to be a CU relative offset.
>
> Or we can simply emit a single attribute for the die_offset using an
> absolute DIE offset reference?
>
> DW_IDX_die_offset + DW_FORM_ref_addr
>
> There isn’t much written up in the DWARF spec about these attributes
> except that it seems to imply that DW_IDX_die_offset must also have a CU or
> TU since the comment says "Offset of DIE within CU or TU”
>
> I was looking at the output of dsymutil, a smart DWARF linker maintained
> by Apple, and it can emit .debug_names with an option, but it emits
> the following abbreviation, as seen by using “llvm-dwarfdump --debug-names”:
>
> Abbreviation 0x16 {
>   Tag: DW_TAG_typedef
>   DW_IDX_die_offset: DW_FORM_ref4
> }
>
> And then in the Entry for each type, dsymutil only emits a DIE offset as
> a DW_FORM_ref4, which seems to imply it is an absolute offset:
>
>   Bucket 0 [
> Name 1 {
>   Hash: 0x4D51C8E0
>   String: 0x10c4 "pthread_t"
>   Entry @ 0x5a7 {
> Abbrev: 0x16
> Tag: DW_TAG_typedef
> DW_IDX_die_offset: 0x2162
>   }
> }
> Name 2 {
>   Hash: 0x8FEC1B20
>   String: 0x0323 "long int"
>   Entry @ 0x5ad {
> Abbrev: 0x24
> Tag: DW_TAG_base_type
> DW_IDX_die_offset: 0x0920
>   }
> }
>   ]
>
>
> Should dsymutil be emitting both a CU index and a CU relative offset, or
> should it just switch over to using a DW_FORM_ref_addr as the encoding for
> the DIE?
>
> Do any current compilers support emitting .debug_names when “-gdwarf-5” is
> produced?
>

Clang does (if you pass -gpubnames in addition to -gdwarf-5). Looks like it
only uses DW_FORM_ref4 as though they were absolute (not unit-relative)
offsets. Looks like @Pavel Labath added this DWARFv5
.debug_names support in https://reviews.llvm.org/D43286

Ah, looks like the implementation only adds IDX_compile_unit if there's
more than one unit being produced:
https://github.com/llvm/llvm-project/blob/d7dd7ad827a0a78314f3c9b55f4778a6059840f3/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp#L410

Ah, this is consistent with DWARFv5, page 141 (6.1.1.2 Structure of the
Name Index):

"Compilation Unit (CU), a reference to an entry in the list of CUs. In a
per-CU index, index entries without this attribute implicitly refer to the
single CU."

So the ref4 is CU relative, even though there's no IDX_compile_unit
attribute on these entries - because there's only one CU in the table, so
there's an implicit value of IDX_compile_unit == 0.
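As a rough illustration of that rule (not code from LLVM or dsymutil; the function name and argument layout are made up for this sketch), a consumer could turn a name-index entry into an absolute .debug_info offset like this, treating a missing DW_IDX_compile_unit as index 0 only when the index lists exactly one CU:

```python
from typing import Optional, Sequence


def resolve_die_offset(cu_offsets: Sequence[int],
                       die_offset: int,
                       cu_index: Optional[int] = None) -> int:
    """Return the absolute .debug_info offset for a .debug_names entry.

    cu_offsets is the list of CU offsets from the name index header,
    die_offset is the unit-relative DW_IDX_die_offset value, and
    cu_index is the DW_IDX_compile_unit value, or None if the entry's
    abbreviation does not carry that attribute.
    """
    if cu_index is None:
        if len(cu_offsets) != 1:
            raise ValueError(
                "entry has no DW_IDX_compile_unit but the index lists "
                "%d CUs" % len(cu_offsets))
        cu_index = 0  # per-CU index: the single CU is implied
    return cu_offsets[cu_index] + die_offset


# Single-CU index, as in the dsymutil dump quoted above.
print(hex(resolve_die_offset([0x0], 0x2162)))
```

Raising on the ambiguous case is a choice of this sketch; a real consumer might prefer to skip the entry or fall back to a scan.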


> Any clarification or pointers to other producers would be appreciated!
>
> Greg Clayton


Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2022-01-25 Thread David Blaikie via Dwarf-Discuss
On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl  wrote:

>
>
> On Jan 23, 2022, at 2:53 PM, David Blaikie  wrote:
>
> A rather common "quality of implementation" issue seems to be lambda
> naming.
>
> I came across this due to non-canonicalization of lambda names in template
> parameters depending on how a source file is named in Clang, and GCC's seem
> to be very ambiguous:
>
> $ cat tmp/lambda.h
> template
> void f1(T) { }
> static int i = (f1([]{}), 1);
> static int j = (f1([]{}), 2);
> void f1() {
>   f1([]{});
>   f1([]{});
> }
> $ cat tmp/lambda.cpp
> #ifdef I_PATH
> #include 
> #else
> #include "lambda.h"
> #endif
> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot
> lambda.o | grep "f1<"
> DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:3:20)>")
> DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:4:20)>")
> DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:6:6)>")
> DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:7:6)>")
> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep
> "f1<"
> DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:3:20)>")
> DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:4:20)>")
> DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:6:6)>")
> DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:7:6)>")
> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep
> "f1<"
> DW_AT_name  ("*f1<*f1():: >")
> DW_AT_name  ("*f1<*f1():: >")
> DW_AT_name  ("*f1<* >")
>
> DW_AT_name  ("*f1<* >")
>
> (I came across this in the context of my simplified template names work -
> rebuilding names from the DW_TAG description of the template parameters -
> and while I'm not rebuilding names that have lambda parameters (keep
> encoding the full string instead). The issue is if some other type
> depends on a type with a lambda parameter - but then multiple uses of
> that inner type exist, from different translation units (using type units)
> with different ways of naming the same file - so then the expected name has
> one spelling, but the actual spelling is different due to the "./")
>
> But all this said - it'd be good to figure out a reliable naming - the
> naming we have here, while usable for humans (pointing to source files, etc)
> - they don't reliably give unique names for each lambda/template
> instantiation which would make it difficult for a consumer to know if two
> entities are the same (important for types - is some function parameter the
> same type as another type?)
>
> While it's expected cross-producer (eg: trying to be compatible with GCC
> and Clang debug info) you have to do some fuzzy matching (eg: "f1" or
> "f1" at the most basic - there are more complicated cases) - this
> one's not possible with the data available.
>
> The source file/line/column is insufficient to uniquely identify a lambda
> (multiple lambdas stamped out by a macro would get all the same
> file/line/col) and valid code (albeit unlikely) that writes the same
> definition in multiple places could make the same lambda have different
> names.
>
> We should probably use something more like the way various ABI manglings
> do to identify these entities.
>
> But we should probably also do this for other unnamed types that have
> linkage (need to/would benefit from being matched up between two CUs), even
> not lambdas.
>
> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these
> symbols is:
>
>  void f1<$_0>($_0)
>  f1<$_1>($_1)
>  void f1(f1()::$_2)
>  void f1(f1()::$_3)
>
> Should we use that instead?
>
>
> The only other information that the current human-readable DWARF name
> carries is the file+line and that is fully redundant with DW_AT_file/line,
> so the above scheme seem reasonable to me. Poorly symbolicated backtraces
> would be worse in this scheme, so I'm expecting most pushback from users
> who rely on a tool that just prints the human readable name with no source
> info.
>

Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway,
so encoding it in the type name does seem redundant and inefficient indeed
(beyond/independent of the correctness issues).

> GCC's mangling's different (in these examples that's OK, since they're all
> internal linkage):
>
>  void f1(f1()::'lambda0'())
>  void f1(f1()::'lambda'())
>
> If I add an example like this:
>
> inline auto f2() { return []{}; }
>
> and instantiate the template with the result of f2:
>
>  void f1(f2()::'lambda'())
>
> GCC:
>
>  void f1(f2()::'lambda'())
>
> So they consistently use the same mangling - we could use the same naming
> for template parameters?
>
> How should we communicate this sort of identity for unnamed types in the
> DIEs describing the types themselves (not just the string of a template
> name of a type instantiated with the unnamed type) so the 

[Dwarf-Discuss] lambda (& other anonymous type) identification/naming

2022-01-23 Thread David Blaikie via Dwarf-Discuss
A rather common "quality of implementation" issue seems to be lambda naming.

I came across this due to non-canonicalization of lambda names in template
parameters depending on how a source file is named in Clang, and GCC's seem
to be very ambiguous:

$ cat tmp/lambda.h
template
void f1(T) { }
static int i = (f1([]{}), 1);
static int j = (f1([]{}), 2);
void f1() {
  f1([]{});
  f1([]{});
}
$ cat tmp/lambda.cpp
#ifdef I_PATH
#include 
#else
#include "lambda.h"
#endif
$ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:3:20)>")
DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:4:20)>")
DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:6:6)>")
DW_AT_name  ("*f1<*(lambda at ./tmp/lambda.h:7:6)>")
$ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:3:20)>")
DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:4:20)>")
DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:6:6)>")
DW_AT_name  ("*f1<*(lambda at tmp/lambda.h:7:6)>")
$ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
DW_AT_name  ("*f1<*f1():: >")
DW_AT_name  ("*f1<*f1():: >")
DW_AT_name  ("*f1<* >")
DW_AT_name  ("*f1<* >")


(I came across this in the context of my simplified template names work -
rebuilding names from the DW_TAG description of the template parameters -
and while I'm not rebuilding names that have lambda parameters (keep
encoding the full string instead). The issue is if some other type
depends on a type with a lambda parameter - but then multiple uses of
that inner type exist, from different translation units (using type units)
with different ways of naming the same file - so then the expected name has
one spelling, but the actual spelling is different due to the "./")
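To make the "./" mismatch concrete, here is a small sketch of how a consumer might canonicalize the path inside Clang's "(lambda at file:line:col)" spelling before comparing names. This is only a workaround I'm assuming for illustration (the helper name is made up, and it is not what Clang or llvm-dwarfdump do); it does nothing for the deeper uniqueness problem discussed below.

```python
import posixpath
import re

# Matches the "(lambda at <path>:<line>:<col>)" spelling shown in the dumps above.
_LAMBDA_AT = re.compile(r"\(lambda at ([^:()]+):(\d+):(\d+)\)")


def canonical_lambda_name(name: str) -> str:
    """Normalize the file path embedded in a lambda's DW_AT_name."""
    def fix(match):
        path = posixpath.normpath(match.group(1))
        return "(lambda at %s:%s:%s)" % (path, match.group(2), match.group(3))
    return _LAMBDA_AT.sub(fix, name)


with_dot = "f1<(lambda at ./tmp/lambda.h:3:20)>"
without_dot = "f1<(lambda at tmp/lambda.h:3:20)>"
print(canonical_lambda_name(with_dot) == canonical_lambda_name(without_dot))
```

Normalizing only removes spelling differences such as "./"; two builds that reach the same header through genuinely different relative paths would still disagree.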

But all this said - it'd be good to figure out a reliable naming - the
naming we have here, while usable for humans (pointing to source files, etc)
- they don't reliably give unique names for each lambda/template
instantiation which would make it difficult for a consumer to know if two
entities are the same (important for types - is some function parameter the
same type as another type?)

While it's expected cross-producer (eg: trying to be compatible with GCC
and Clang debug info) you have to do some fuzzy matching (eg: "f1" or
"f1" at the most basic - there are more complicated cases) - this
one's not possible with the data available.

The source file/line/column is insufficient to uniquely identify a lambda
(multiple lambdas stamped out by a macro would get all the same
file/line/col) and valid code (albeit unlikely) that writes the same
definition in multiple places could make the same lambda have different
names.

We should probably use something more like the way various ABI manglings do
to identify these entities.

But we should probably also do this for other unnamed types that have
linkage (need to/would benefit from being matched up between two CUs), even
not lambdas.

FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these
symbols is:

 void f1<$_0>($_0)
 f1<$_1>($_1)
 void f1(f1()::$_2)
 void f1(f1()::$_3)

Should we use that instead?

GCC's mangling's different (in these examples that's OK, since they're all
internal linkage):

 void f1(f1()::'lambda0'())
 void f1(f1()::'lambda'())

If I add an example like this:

inline auto f2() { return []{}; }

and instantiate the template with the result of f2:

 void f1(f2()::'lambda'())

GCC:

 void f1(f2()::'lambda'())

So they consistently use the same mangling - we could use the same naming
for template parameters?

How should we communicate this sort of identity for unnamed types in the
DIEs describing the types themselves (not just the string of a template
name of a type instantiated with the unnamed type) so the unnamed type can
be matched up between translation units.

eg, if I have these two translation units:
// header
inline auto f1() { struct { } local; return local; }
// unit 1:
#include "header"
auto f2(decltype(f1())) { }
// unit 2:
#include "header"
decltype(f1()) v1;

Currently the DWARF produced for this unnamed type is:

0x003f:   DW_TAG_structure_type
            DW_AT_calling_convention (DW_CC_pass_by_value)
            DW_AT_byte_size (0x01)
            DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
            DW_AT_decl_line (1)

So there's no way to know if you see that structure type definition in two
different translation units whether they refer to the same type because
there may be multiple types that have the same DWARF description. (so no
way to know if the DWARF consumer should allow the 

Re: [Dwarf-Discuss] string reduction techniques

2021-11-08 Thread David Blaikie via Dwarf-Discuss
On Sun, Nov 7, 2021 at 12:36 PM Todd Allen  wrote:
>
> Just spitballing an idea here, but would there be value in a new DW_FORM (or
> two) that referenced the names from .strtab or .dynstr, instead of .debug_str?

Yeah, something along those lines has crossed my mind too - I haven't
looked into it enough to understand if there's nice/generic
relocations to use for that that linkers respect (ie: preserve the
names if there's a relocation to it, even if the real symbol doesn't
make it into the linked file due to linker GC, etc). There's a couple
of wrinkles:

1) Not all the symbols are already there - a fully inlined function
might have a linkage name (does for the way we use DWARF at Google -
(clang's -fdebug-info-for-profiling) so that functions can be
identified build-over-build, even if they're inlined into all call
sites) and there's some discussion of adding type information for heap
allocations so we might want linkage_names on types too.

2) Split DWARF would, ideally, keep any linkage names that aren't
needed in the ELF file (fully inlined, types, etc) only in the Split
DWARF, not in the .o/executable

But yeah - maybe there's something down that direction, but there's
some hurdles to overcome.

> It would only work if the symbols already were there, but I would expect that
> for many/most/all(?) functions defined in the compilation unit.  It does
> somewhat relegate this to being Someone Else's Problem, but given that the
> .strtab already has the problem of zillions of these huge symbols, maybe that's
> not so bad?
>
> Maybe, if that's too onerous for tools that need to manipulate .strtab, it could
> reference them indirectly through a .debug_strtab_offsets section similar to
> .debug_str_offsets.
>
> On Tue, Nov 02, 2021 at 10:09:16AM -0700, Dwarf Discussion wrote:
> >On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss
> ><[1]dwarf-discuss@lists.dwarfstd.org> wrote:
> >
> >  LLDB also uses mangled names. The clang compiler is our expression
> >  parser and it always tries to resolve symbols during compilation/JIT 
> > and
> >  it supplies mangled names when looking for functions to resolve when it
> >  JITs code up. It is nice to be able to do quick name lookups using 
> > these
> >  mangled names to find the address of the function. That being said, we
> >  could work around it. Not sure how easy that would be though as mangled
> >  names can end up demangling to the same name with some loss of
> >  information and it would be important to be able to find the right in
> >  charge or out of charge constructor when the compiler asks for a
> >  specific symbol using the mangled name. We have more uses of mangled
> >  names but most of them relate to parsing the symbol tables, so removing
> >  them from DWARF wouldn't affect those areas.
> >  I wonder if there is a way to have a DW_AT_partial_linkage_name that
> >  relies on the decl context of a DIE. Like if you have a class "foo" in
> >  the global namespace it could have a DW_AT_partial_linkage_name with 
> > the
> >  value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
> >   inside this class could have another partial linkage name "3bari" that
> >  could be put together with the parent "_Z3foo" for a function like:
> >  void foo::bar(int);
> >  Since many mangled names often start with the same prefix it might help
> >  reduce the string table size.
> >
> >It's a thought - though I'm not sure how much that would really 
> > generalize
> >across different mangling schemes that use different mechanisms for
> >backreferences, etc. Or whether the return type should be included (it's
> >included for function templates in itanium mangling, for instance -
> >presumably also in MSVC mangling, but maybe some manglings include it 
> > even
> >in non-templates? I'm not sure) - since the partial linkage name for a
> >type would be context-insensitive (since it'd be attached to the type
> >rather than any use of the type) it'd be up to the consumer to fix that
> >up, eg:
> >
> >[2]https://godbolt.org/z/TqYjeevqx
> >Itanium:
> >  f1<>(): _Z2f1IJEEvv
> >  f1(): _Z2f1IJ2t1S0_S0_EEvv
> >MSVC:
> >  f1<>(): ??$f1@$$V@@YAXXZ
> >  f1(): ??$f1@Ut1@@U1@U1@@@YAXXZ
> >I'm not sure how much less a consumer would know about mangling if it had
> >to know about how to assemble these things, insert backrefs, insert empty
> >list markers, etc - without having to know how to mangle a specific user
> >defined type or name, like "3foo" versus "@Ut1@"?
> >
> >On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss
> ><[3]dwarf-discuss@lists.dwarfstd.org> wrote:
> >Finally, a question I know the answer to!
> >It brings us all the way back to when I was the C++ maintainer for
> >GDB, which is the most ancient of history.
> >

Re: [Dwarf-Discuss] string reduction techniques

2021-11-02 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 7:14 PM Greg Clayton via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> LLDB also uses mangled names. The clang compiler is our expression parser
> and it always tries to resolve symbols during compilation/JIT and it
> supplies mangled names when looking for functions to resolve when it JITs
> code up. It is nice to be able to do quick name lookups using these mangled
> names to find the address of the function. That being said, we could work
> around it. Not sure how easy that would be though as mangled names can end
> up demangling to the same name with some loss of information and it would
> be important to be able to find the right in charge or out of charge
> constructor when the compiler asks for a specific symbol using the mangled
> name. We have more uses of mangled names but most of them relate to parsing
> the symbol tables, so removing them from DWARF wouldn’t affect those areas.
>
> I wonder if there is a way to have a DW_AT_partial_linkage_name that
> relies on the decl context of a DIE. Like if you have a class "foo" in the
> global namespace it could have a DW_AT_partial_linkage_name with the
> value "_Z3foo". A DW_TAG_subprogram that is a child of this "foo" class
>  inside this class could have another partial linkage name "3bari" that
> could be put together with the parent "_Z3foo" for a function like:
>
> void foo::bar(int);
>
> Since many mangled names often start with the same prefix it might help
> reduce the string table size.
>

It's a thought - though I'm not sure how much that would really generalize
across different mangling schemes that use different mechanisms for
backreferences, etc. Or whether the return type should be included (it's
included for function templates in itanium mangling, for instance -
presumably also in MSVC mangling, but maybe some manglings include it even
in non-templates? I'm not sure) - since the partial linkage name for a type
would be context-insensitive (since it'd be attached to the type rather
than any use of the type) it'd be up to the consumer to fix that up, eg:

https://godbolt.org/z/TqYjeevqx
Itanium:
  f1<>(): _Z2f1IJEEvv
  f1(): _Z2f1IJ*2t1S0_S0_*EEvv
MSVC:
  f1<>(): ??$f1@$$V@@YAXXZ
  f1(): ??$f1*@Ut1@@U1@U1@*@@YAXXZ

I'm not sure how much less a consumer would know about mangling if it had
to know about how to assemble these things, insert backrefs, insert empty
list markers, etc - without having to know how to mangle a specific user
defined type or name, like "3foo" versus "@Ut1@"?
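A small sketch of why plain concatenation of partial names isn't enough, using only the simplest slice of the Itanium scheme (source names and the N...E nested-name wrapper). The helper names are hypothetical, the parameter encoding is passed in pre-mangled, and substitutions, templates and qualifiers are ignored entirely, which is exactly the knowledge a consumer would still need:

```python
from typing import Sequence


def source_name(identifier: str) -> str:
    """Itanium <source-name>: decimal length followed by the identifier."""
    return "%d%s" % (len(identifier), identifier)


def mangle_function(scopes: Sequence[str], function: str,
                    param_codes: str) -> str:
    """Mangle a plain function from its enclosing scopes.

    param_codes is the already-encoded parameter list ("i" for int,
    "v" for an empty list). Substitutions, templates and qualifiers
    are deliberately not handled in this sketch.
    """
    components = "".join(source_name(part) for part in [*scopes, function])
    if scopes:
        # Nested names are wrapped in N ... E, which concatenation misses.
        return "_ZN" + components + "E" + param_codes
    return "_Z" + components + param_codes


naive = "_Z3foo" + "3bari"  # parent partial name + child partial name
real = mangle_function(["foo"], "bar", "i")
print(naive, real)  # the two strings differ
```

So even for foo::bar(int) the consumer has to know where the "N" and "E" go; the partial names alone don't carry that.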


>
> On Nov 1, 2021, at 6:52 PM, Daniel Berlin via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> Finally, a question I know the answer to!
>
> It brings us all the way back to when I was the C++ maintainer for GDB,
> which is the most ancient of history.
> Unfortunately, this is a trip to a horrible place.
> I actually spent a lot of time trying to make it so we didn't need linkage
> names, because, even then, they took up a *lot* of space.
>
> On Mon, Nov 1, 2021 at 8:35 PM Cary Coutant via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
>> >> I can't be sure about this exponential growth.  I don't have the data
>> to back it
>> >> up.  But I will say, when we created DWARF64, I was skeptical that it
>> would be
>> >> needed during my career.  And yet here we are...
>> >
>> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
>> we're seeing with the need for some solutions for mixed DWARF32/64, etc,
>> makes it feel like maybe it's not got a bit of "settling in" to do. And I'm
>> still rather hopeful we might be able to reduce the overheads enough to
>> avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>>
>> Agreed. I'd like to explore as many avenues as we can to eliminate the
>> need for DWARF64.
>>
>>
>> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.
>> Our
>> >> debugger almost never uses it.  (There is one use to detect "GNU
>> indirect"
>> >> functions.)  I wonder if it would be possible to avoid them if you
>> provided
>> >> enough info about the template parameters, if the debugger had its own
>> name
>> >> mangler.  I had to write one for our debugger a couple years ago, and
>> it
>> >> definitely was a persnickety beast.  But doable with enough
>> information.  Mind
>> >> you, I'm not sure there is enough information to do it perfectly with
>> the state
>> >> of DWARF & gcc right now.
>> >
>> > Yeah, that was/is certainly my first pass - the way I've done the
>> DW_AT_name one is to have a feature in clang that produces the short name
>> "t1" but then also embeds the template argument list in the name (like
>> this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
>> up the name, rebuild the original name as it would if it'd been given only
>> the simple name ("t1") and compare it to the one from clang. Then I can run
>> this over large programs and check everything round-trips correctly & in
>> clang, 

Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 5:35 PM Cary Coutant  wrote:

> >> I can't be sure about this exponential growth.  I don't have the data
> to back it
> >> up.  But I will say, when we created DWARF64, I was skeptical that it
> would be
> >> needed during my career.  And yet here we are...
> >
> > Yep, still got mixed feelings about DWARF64 - partly the pieces that
> we're seeing with the need for some solutions for mixed DWARF32/64, etc,
> makes it feel like maybe it's not got a bit of "settling in" to do. And I'm
> still rather hopeful we might be able to reduce the overheads enough to
> avoid widespread use of DWARF64 - but it's not a sure thing by any means.
>
> Agreed. I'd like to explore as many avenues as we can to eliminate the
> need for DWARF64.
>
>
> >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.
> Our
> >> debugger almost never uses it.  (There is one use to detect "GNU
> indirect"
> >> functions.)  I wonder if it would be possible to avoid them if you
> provided
> >> enough info about the template parameters, if the debugger had its own
> name
> >> mangler.  I had to write one for our debugger a couple years ago, and it
> >> definitely was a persnickety beast.  But doable with enough
> information.  Mind
> >> you, I'm not sure there is enough information to do it perfectly with
> the state
> >> of DWARF & gcc right now.
> >
> > Yeah, that was/is certainly my first pass - the way I've done the
> DW_AT_name one is to have a feature in clang that produces the short name
> "t1" but then also embeds the template argument list in the name (like
> this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
> up the name, rebuild the original name as it would if it'd been given only
> the simple name ("t1") and compare it to the one from clang. Then I can run
> this over large programs and check everything round-trips correctly & in
> clang, classify any names we can't roundtrip so they get emitted in full
> rather than shortened.
> > We could do something similar with linkage names - since to know there's
> some prior art in your work there.
> >
> > I wouldn't be averse to considering what it'd take to make DWARF robust
> enough to always roundtrip simple and linkage names in this way - I don't
> think it'd take a /lot/ of extra DWARF content.
>
> Fuzzy memory here, but as I recall, GCC didn't generate linkage names
> (or only did in some very specific cases) until the LTO folks
> convinced us they needed it in order to relate profile data back to
> the source. Perhaps if we came up with a better way of doing that, we
> could eliminate the linkage names.
>

Yeah, fair - it's certainly what we use it for still, authoritative names
for functions - including some amount of semantic information (so, for
instance, I believe a hash is inadequate) which allows limited rewriting
(when we changed standard libraries we were able to remap previous profile
samples to line up with the new names (different inline namespaces,
implementation names, etc) so as not to take a temporary perf hit as
profiles were regenerated, etc).


Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
On Mon, Nov 1, 2021 at 1:52 PM Todd Allen 
wrote:

> Dave,
>
> If I understand right: The space saving you're expecting is the
> near-elimination
> of DW_AT_name strings.  If they are only simple names like "T" and "int",
> they
> can be placed into the string table once each, and it should be very
> small.  But
> you're expecting the DW_AT_linkage_name attributes still to have lots of
> replication because of the large composed names.  So I gather that was
> where
> your estimate of 1/2 reduction came from.
>

Yep!


> I was trying to figure out how we came to opposite conclusions, and I
> think it's
> that I have this (implicit) assumption of a sort of "DWARF Moore's Law",
> that
> the size of debug info/strings/etc. would double periodically, just based
> on the
> tendency of software systems to grooow.  I'm likening it to Moore's
> Law,
> because I expect it's the same sort of vague, rough estimate that somehow
> still
> applies to the real world.
>
> Assuming it does apply, your halving of the string table amounts to buying
> yourself one doubling period, and then you're back to requiring DWARF64
> string
> tables.  (Meanwhile, DWARF64 gives us 32 doubling periods over DWARF32.  So
> hopefully that will last us for a while...)
>
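To make the "doubling period" arithmetic above concrete, here is a minimal sketch. The helper name is made up for illustration, and the limits are simply the largest values a 32-bit and a 64-bit offset can hold.

```cpp
#include <cstdint>

// Hypothetical helper: how many times can a section of `size` bytes double
// before it no longer fits under `limit` bytes?
unsigned doublings_before_overflow(std::uint64_t size, std::uint64_t limit) {
    if (size == 0) return 0;  // a zero-sized section never grows by doubling
    unsigned n = 0;
    while (size <= limit / 2) {  // doubling again would still fit
        size *= 2;
        ++n;
    }
    return n;
}
```

Under these assumptions, a string table sitting at the DWARF32 limit gets 32 doublings from DWARF64 offsets, while halving a table that sits at the DWARF32 limit buys exactly one doubling back.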

I think there's a few things at work

1) these seem to be particularly extreme cases of template metaprogramming
- they may actually be growing greater than the Moore's Law-esque
situation (eg: we might've had some natural growth rate A, but then maybe a
few years back we get this particular use case and that use case
(TensorFlow in particular) gains significant adoption growing at rate B (st
B > A) and for a while that didn't come up and then eventually it starts
"Hockey-sticking" and we see the B growth dominating the A growth)
2) Yeah, I think I agree with you that if we don't solve the linkage name
problem, we might not have much runway.


> I can't be sure about this exponential growth.  I don't have the data to
> back it
> up.  But I will say, when we created DWARF64, I was skeptical that it
> would be
> needed during my career.  And yet here we are...
>

Yep, still got mixed feelings about DWARF64 - partly the pieces that we're
seeing with the need for some solutions for mixed DWARF32/64, etc, makes it
feel like maybe it's got a bit of "settling in" to do. And I'm still
rather hopeful we might be able to reduce the overheads enough to avoid
widespread use of DWARF64 - but it's not a sure thing by any means.


> ...
>
> The reduction for DW_AT_linkage_name does seem like a tougher nut to
> crack.  As
> you mentioned, there is a tendency to eliminate *some* of the replication
> because of the mangler's use of substitution strings (S_, S0_, S1_, etc.)
> But
> that same feature probably would make it a lot harder to do anything clever
> about chopping up the linkage names into substrings.
>

Yeah, somewhat - actually fully rebuilding it (having a fully
mangling-aware tool that can go look at the DWARF and build a linkage name
from it) would be possible for at least most/many names, but is in tension
with some of the point of DWARF to remove the need for consumers to have
such complicated knowledge... but costs/benefits/etc.


> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
> debugger almost never uses it.  (There is one use to detect "GNU indirect"
> functions.)  I wonder if it would be possible to avoid them if you provided
> enough info about the template parameters, if the debugger had its own name
> mangler.  I had to write one for our debugger a couple years ago, and it
> definitely was a persnickety beast.  But doable with enough information.
> Mind
> you, I'm not sure there is enough information to do it perfectly with the
> state
> of DWARF & gcc right now.
>

Yeah, that was/is certainly my first pass - the way I've done the
DW_AT_name one is to have a feature in clang that produces the short name
"t1" but then also embeds the template argument list in the name (like
this: "_STNt1|") - then llvm-dwarfdump will detect this prefix, split
up the name, rebuild the original name as it would if it'd been given only
the simple name ("t1") and compare it to the one from clang. Then I can run
this over large programs and check everything round-trips correctly & in
clang, classify any names we can't roundtrip so they get emitted in full
rather than shortened.
We could do something similar with linkage names - nice to know there's
some prior art in your work there.

I wouldn't be averse to considering what it'd take to make DWARF robust enough
to always roundtrip simple and linkage names in this way - I don't think
it'd take a /lot/ of extra DWARF content.

- Dave

Todd
>
> On Mon, Nov 01, 2021 at 01:06:33PM -0700, David Blaikie wrote:
> >Hey Todd,
> >
> >Just some details regarding the string reduction strategies I'm
> pursuing
> >to address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo
> >sections in some large 

[Dwarf-Discuss] string reduction techniques

2021-11-01 Thread David Blaikie via Dwarf-Discuss
Hey Todd,

Just some details regarding the string reduction strategies I'm pursuing to
address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo sections
in some large binaries at Google.

So the extreme cases I'm dealing with are predominantly C++ Expression
templates (in TensorFlow and Eigen) - these produce types with very large
DW_AT_names ("f1<int>") and DW_AT_linkage_names (eg: "_Z2f1IiEvv") (but
with many more template parameters, none of which are ever user-written but
deduced).

So the main fix I'm pursuing (roughly called "simplified template names")
is to omit template parameter lists from DW_AT_names of templates in most
cases, allowing the consumer to reconstruct the name from
DW_AT_template_*_parameters itself, recursively. Further discussion and
details here:
https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ - in
terms of how this affects scaling factors, it means that adding an
additional template instantiation of existing types would add no new data
to .debug_str (eg: going from a program with "t1<int>" to "t1<t1<int>>"
would add no new entries to .debug_str). Not all names can be readily
reconstructed - so I'm opting the feature out on those, but we could have a
deeper discussion about how to handle them if we wanted to make this a
full-fledged/robust feature (maybe one the DWARF spec suggests/encourages).
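
As a rough illustration of the consumer-side reconstruction described above, here is a minimal sketch. The TypeDie struct is a hypothetical stand-in for a parsed type DIE (simple DW_AT_name plus the types its template type parameters refer to); it is not LLVM's or GDB's actual data model, and it ignores non-type parameters, parameter packs and empty argument lists.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a type DIE: the simple name and the types
// referenced by its template type parameter children.
struct TypeDie {
    std::string simple_name;
    std::vector<const TypeDie*> template_args;
};

// Rebuild the full name recursively from simple names only.
std::string full_name(const TypeDie& die) {
    if (die.template_args.empty()) return die.simple_name;
    std::string out = die.simple_name + "<";
    for (std::size_t i = 0; i < die.template_args.size(); ++i) {
        if (i != 0) out += ", ";
        out += full_name(*die.template_args[i]);
    }
    out += ">";
    return out;
}
```

With this shape, nesting an existing instantiation inside another one only adds DIE references; the string table still holds just the simple names.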

GDB seems to handle this sort of debug info OK - I guess someone did real
work to support that at some point (so maybe some other debugger already
generates DWARF like this).


The other half, though, is DW_AT_linkage_names - and in theory similar
rebuilding could be done, but that'd require baking a lot of
implementation knowledge into the DWARF Consumer that DWARF is meant to
help avoid... so I'm unsure what the right solution is there just now, but
there's a few ideas I'm still kicking around. At least linkage names have
less redundancy (within a single name they avoid redundancy - "t1<t1<int>,
t1<int>>" only ends up with a single description of "t1<int>" instead of
two of them like you get with the DW_AT_name) than DW_AT_names, so they do
scale a bit better already.

Happy to discuss these ideas in specific, or their impact on debug_str
growth in more detail any time (here, video chat, discords, etc).

- Dave


Re: [Dwarf-Discuss] Inconsistency of C++ member function qualifiers

2021-10-05 Thread David Blaikie via Dwarf-Discuss
On Tue, Oct 5, 2021 at 1:13 PM  wrote:

> According to https://en.cppreference.com/w/cpp/language/function the
> cv-qualifier is allowed only on non-static member functions, which are
> exactly the ones that have an implicit this-pointer parameter.
>
> *cv*
>
> -
>
> const/volatile qualification, only allowed in non-static member function
> declarations
>
> Are cv-qualified free functions or static member functions a GCC
> extension?  If so, then doing what GCC does seems like exactly the right
> thing to do.  It falls within the “permissive” nature of DWARF to do that.
> I don’t know that the DWARF standard should say anything special about it,
> though.
>

Ah, sorry, should've clarified earlier - this is only a property of
function types, not of the types of functions... because C++ is fun like
that.

You can't create a free function like "void f() const { }" but you can
create a function type, used for instance in a template type argument which
has a const qualifier: template<typename T> void f1(); ...
f1<void() const>(); - you can't even create a function pointer of this const
qualified function type: "void (*)() const" is not a valid type, for
instance.
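
A small compile-time sketch of the distinction, for anyone who wants to try it. The alias and helper names (ConstFn, PlainFn, const_qualifier_changes_type) are made up for illustration; t1 and some_type mirror the names used in this thread.

```cpp
#include <type_traits>

// A template that accepts any type argument, like t1 in this thread.
template <typename T>
struct t1 {};

// A function type carrying a const qualifier. This is valid as an alias or
// as a template type argument, even though no free function can have it.
using ConstFn = void() const;
using PlainFn = void();

struct some_type {
    void m() const {}
};

// The qualified type is still a function type, distinct from the plain one.
static_assert(std::is_function<ConstFn>::value, "still a function type");
static_assert(!std::is_same<ConstFn, PlainFn>::value, "qualifier is part of the type");

// It is also the pointee type of a pointer to const member function.
static_assert(std::is_same<ConstFn some_type::*, void (some_type::*)() const>::value,
              "pointer to const-qualified member function");

// Instantiating a template with it is fine; this is the case that has no
// "this" parameter to hang the const on.
t1<ConstFn> v1;

// Run-time checkable form of the same fact.
constexpr bool const_qualifier_changes_type() {
    return !std::is_same<t1<ConstFn>, t1<PlainFn>>::value;
}
```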

See for instance this C++ standards proposal:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0288r4.html#Specification
- which uses a std::function-like template, but with the const qualifier on
the function type parameter to determine whether the functor is mutable or
not. (so "std::any_invocable<void()>" is only callable when the object is
non-const, but "std::any_invocable<void() const>" is callable even if
it's a const object)


> --paulr
>
>
>
> *From:* Dwarf-Discuss  *On
> Behalf Of *David Blaikie via Dwarf-Discuss
> *Sent:* Tuesday, October 5, 2021 3:12 PM
> *To:* DWARF Discuss 
> *Subject:* [Dwarf-Discuss] Inconsistency of C++ member function qualifiers
>
>
>
> C++ member functions can be qualified in a number of ways - classic CV
> (const and volatile) qualifiers, and since C++11, lvalue (&) and rvalue
> (&&) reference qualifiers. Details here:
> https://en.cppreference.com/w/cpp/language/member_functions
>
> A note on 5.10, page 127 says:
>
> "C++ const-volatile qualifiers are encoded as part of the type of the
> “this”-pointer. C++11 reference and rvalue-reference qualifiers are encoded
> using the DW_AT_reference and DW_AT_rvalue_reference attributes,
> respectively. See also Section 5.7.8 on page 120."
>
> Though this appears to be inadequate, because C++ allows these qualifiers
> on any function type - even one without a first parameter necessary to
> carry the const/volatile qualifiers.
>
> eg:
> template<typename T>
> struct t1 { };
>
> t1<void() const> v1;
>
> GCC implements this type by using DW_TAG_const_type around a
> DW_TAG_subroutine_type. I've implemented the same behavior in Clang
> recently.
>
> For actual member functions (eg: void (some_type::*)() const) both Clang
> and GCC put the const type on the artificial first parameter rather than by
> wrapping the type in DW_TAG_const_type.
>
>
> Does this seem acceptable, should we do something different to unify the
> representation between these two cases? Should we add some more
> non-normative text in 5.10/p127?
>
>


[Dwarf-Discuss] Inconsistency of C++ member function qualifiers

2021-10-05 Thread David Blaikie via Dwarf-Discuss
C++ member functions can be qualified in a number of ways - classic CV
(const and volatile) qualifiers, and since C++11, lvalue (&) and rvalue
(&&) reference qualifiers. Details here:
https://en.cppreference.com/w/cpp/language/member_functions

A note on 5.10, page 127 says:

"C++ const-volatile qualifiers are encoded as part of the type of the
“this”-pointer. C++11 reference and rvalue-reference qualifiers are encoded
using the DW_AT_reference and DW_AT_rvalue_reference attributes,
respectively. See also Section 5.7.8 on page 120."

Though this appears to be inadequate, because C++ allows these qualifiers
on any function type - even one without a first parameter necessary to
carry the const/volatile qualifiers.

eg:
template<typename T>
struct t1 { };
t1<void() const> v1;

GCC implements this type by using DW_TAG_const_type around a
DW_TAG_subroutine_type. I've implemented the same behavior in Clang
recently.

For actual member functions (eg: void (some_type::*)() const) both Clang
and GCC put the const type on the artificial first parameter rather than by
wrapping the type in DW_TAG_const_type.


Does this seem acceptable, should we do something different to unify the
representation between these two cases? Should we add some more
non-normative text in 5.10/p127?


Re: [Dwarf-Discuss] 170427.3 Extending loclists with common sublists

2021-06-30 Thread David Blaikie via Dwarf-Discuss
On Wed, Jun 30, 2021 at 5:37 AM Mark Wielaard via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Hi,
>
> We discussed 170427.3 Extending loclists with common sublists in the
> last meeting. http://dwarfstd.org/ShowIssue.php?issue=170427.3
>
> This issue was originally part of a group of proposals to introduce
> Location Views. Location views allow the user to observe multiple
> program states at the same program counter. It would allow a user to
> see that one instruction does various state changes expressed in the
> source program that the compiler optimized into one instruction (for
> example increment a variable in two steps in the code, which the
> compiler would optimize into one step). Having multiple views makes
> having similar location lists more common and/or makes it necessary to
> mark those location lists as part of a particular view. But having a
> generic mechanism for having common sublists seems useful in general.
>
> This particular issue was split in two because it originally described
> two mechanisms, one to extend loclists with user defined operations,
> which became http://dwarfstd.org/ShowIssue.php?issue=170427.2
>
> This proposal deals just with introducing two new operands
> DW_LLE_extend_loclistx and DW_LLE_extend_loclist. One extends the
> location list with the content of the loclist from the given index, the
> other uses an offset into the loclists section.
>
> There were various comments on this proposal:
>
> - The original proposal imagined unknown loclist operations would
>   end the current location list. Having such an implicit action on
>   any unknown operation seems unwanted. So it was proposed to delete
>   the part of the proposal.
>
> - We like to keep the operands of loclists and rangelists the same.
>   And if it is useful to construct loclists from common sublists, it
>   seems it would be useful to construct ranges from common subranges.
>   Add the same operands to rangelists.
>

For what it's worth - I can see at least a plausible motivation for range
list extension: It's not uncommon for the ranges of a parent DIE to consist
of the union of the ranges of the children where there may be no ability to
form larger more contiguous ranges in the parent (that would be a more
compact representation), for instance with LLVM's "basic block sections"
each function/subprogram may have DW_AT_ranges, and the CU's DW_AT_ranges
would be a matter of listing all the subprogram ranges. So including the
subprogram ranges into the CU ranges rather than repeating them could
provide some savings - I'd like to see data on how much of a difference
that makes (so at least a prototype implementation) before deciding whether
it's a worthwhile feature to add.
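
To illustrate the duplication in question, here is a minimal sketch of how a producer might derive the CU ranges from the subprogram ranges. The Range alias and cu_ranges function are hypothetical names, not anything from LLVM; ranges are half-open [low, high) address pairs.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::uint64_t, std::uint64_t>;  // [low, high)

// Union of all subprogram ranges, coalescing ranges that touch or overlap.
std::vector<Range> cu_ranges(const std::vector<std::vector<Range>>& subprograms) {
    std::vector<Range> all;
    for (const auto& sp : subprograms)
        all.insert(all.end(), sp.begin(), sp.end());
    std::sort(all.begin(), all.end());

    std::vector<Range> merged;
    for (const Range& r : all) {
        if (!merged.empty() && r.first <= merged.back().second)
            merged.back().second = std::max(merged.back().second, r.second);
        else
            merged.push_back(r);
    }
    return merged;
}
```

When blocks from different functions are scattered and nothing coalesces, the result is just a second copy of every subprogram entry, which is the case an "include" style rnglist entry would target.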

But I don't know of a similar use case for loclists - so I'd want to see
some more motivation (as discussed below, maybe it'll become apparent with
more view numbering functionality) before going ahead with this. (though if
the rnglist case is motivating enough, I wouldn't object to equivalent
loclist functionality just for symmetry even without a solid use case)

Wording wise: The indexed version probably needs wording that loclists_base
must be specified on the CU that references any list that uses the indexed
version. (Maybe it'd be preferable to have some wording that the indexed
version can only be used when referenced via an indexed form? That way you
couldn't end up with weird sharing cases - where two CUs both reference the
same list? (Hmm, actually I'm not sure that's true - maybe two CUs can
technically have the same loclists_base?)). The sec_offset based version -
seems maybe OK that that could lead to sharing directly or indirectly (eg:
two CUs using two separate rnglists that each include some common rnglist).
Though maybe best to disallow this if it isn't already - now that rnglist
contributions have headers, perhaps we could/intend to restrict things such
that each contribution is unique to a single CU and a CU can only reference
rnglists within that contribution? (so there's no intent to support
sharing rnglists between CUs)


>
> - It seems more natural to call this include instead of extend.
>   So rename DW_LLE_extend_loclistx and DW_LLE_extend_loclist to
>   DW_LLE_include_loclistx and DW_LLE_include_loclist.
>
> - Maybe add a note that creating loops with sublists including each
>   other is not allowed? Seems obvious, similar to some other constructs
>   in DWARF that could create a loop. Not sure if it needs to be
>   explicitly spelled out here.
>
> - It isn't really clear if this is needed or is actually an
>   optimization without dealing with location view numbering first.
>   So it was decided to revisit this issue (with the above changes)
>   after we reviewed that proposal:
>   http://dwarfstd.org/ShowIssue.php?issue=170427.1
>
> Attached is the updated proposal with the above comments incorporated.
>
> Cheers,
>
> Mark

Re: [Dwarf-Discuss] How to map [[no_unique_address]] into DWARF

2021-06-07 Thread David Blaikie via Dwarf-Discuss
Ah, in the sense that you want to be able to derive new types based on
the DWARF?
Fair enough.

Raphael's suggestion seems reasonable to me.

On Mon, Jun 7, 2021 at 11:20 AM Jan Kratochvil
 wrote:
>
> On Mon, 07 Jun 2021 20:11:16 +0200, David Blaikie via Dwarf-Discuss wrote:
> > On Mon, Jun 7, 2021 at 10:58 AM Jan Kratochvil via Dwarf-Discuss 
> >  wrote:
> > >
> > > clang-12 will create the same DWARF for class B with [[no_unique_address]]
> > > either present or not. Despite that class C derived from B has different
> > > layout depending on from which class B it gets derived
> >
> > Why is this ^ a problem? The layout seems accurate - in that 'a'
> > shares its address with 'c' - so both members having the same offset
> > seems like an accurate representation of the layout of the struct?
>
> $ echo 'struct A {}; struct B { [[no_unique_address]] A a; } b;'|clang -Wall 
> -g -c -o a.o -x c++ -;lldb ./a.o
> (lldb) expr -- struct C:B{char c;};&((C *)nullptr)->c);
> (lldb) expr -- struct C:B{char c;};sizeof(C);
>
> Actual:
> (char *) $0 = 0x0001 ""
> (unsigned long) $1 = 2
>
> Expected:
> (char *) $0 = 0x
> (unsigned long) $1 = 1
>
>
> Jan
>


Re: [Dwarf-Discuss] How to map [[no_unique_address]] into DWARF

2021-06-07 Thread David Blaikie via Dwarf-Discuss
On Mon, Jun 7, 2021 at 10:58 AM Jan Kratochvil via Dwarf-Discuss
 wrote:
>
> Hi,
>
> clang-12 will create the same DWARF for class B with [[no_unique_address]]
> either present or not. Despite that class C derived from B has different
> layout depending on from which class B it gets derived

Why is this ^ a problem? The layout seems accurate - in that 'a'
shares its address with 'c' - so both members having the same offset
seems like an accurate representation of the layout of the struct?

> :
>
> --
> struct A {};
> struct By { [[no_unique_address]] A a; };
> struct Cy : By { char c; } cy;
> struct Bn {   A a; };
> struct Cn : Bn { char c; } cn;
> #include <cstddef>
> #include <iostream>
> int main() {
>   std::cout << "sizeof(Cy) = " << sizeof(Cy) << " offsetof(Cy, c) = " << 
> offsetof(Cy, c) << "\n";
>   std::cout << "sizeof(Cn) = " << sizeof(Cn) << " offsetof(Cn, c) = " << 
> offsetof(Cn, c) << "\n";
> }
> // sizeof(Cy) = 1 offsetof(Cy, c) = 0
> // sizeof(Cn) = 2 offsetof(Cn, c) = 1
> --
>
> gcc-11 creates a different DWARF for the two variants of B but that DWARF does
> not look as compliant to me:
>
> --
>
> DW_TAG_structure_type
>   DW_AT_name   ("B")
>   DW_AT_byte_size  (1)
>
>   DW_TAG_member
> DW_AT_name ("a")
> DW_AT_type (0x001e "A")
> By:  DW_AT_data_member_location (-1)
> Bn:  DW_AT_data_member_location (0x00)
>
>   NULL
> --
>
> From a discussion at https://reviews.llvm.org/D101237 :
>
> (1) Raphael Isemann (teemperor) notes:
> FWIW, I took a look at the DWARF standard and I think that is actually
> something we should already emit in the form of a
> "DW_AT_byte_size 0" attribute at the field? Quote:
> If the size of a data member is not the same as the size of the type
> given for the data member, the data member has either
> a DW_AT_byte_size or a DW_AT_bit_size attribute whose integer constant
> value (see Section 2.19) is the amount of storage needed to hold the
> value of the data member.
>
> (2) David Blaikie maybe meant in the beginning new DW_AT_no_unique_address
> but IIUC that may have been probably later deprecated as not really 
> needed.
>
> (3) One could also omit such member completely but that would break template
> metaprogramming based on such missing types from DWARF.
>
> I do not see many other options. That (1) would look like:
>
> --
> DW_TAG_structure_type
>   DW_AT_calling_convention   (DW_CC_pass_by_value)
>   DW_AT_name ("B")
>   DW_AT_byte_size(0x01)
>
>   DW_TAG_member
> DW_AT_name   ("a")
> DW_AT_type   (0x0065 "A")
> Bn:
> By: DW_AT_byte_size  (0x00)
> DW_AT_data_member_location   (0x00)
>
>   NULL
> --
>
>
> Jan
>


Re: [Dwarf-Discuss] What to do with Pascal properties?

2021-06-05 Thread David Blaikie via Dwarf-Discuss
On Sat, Jun 5, 2021 at 5:59 AM Joost van der Sluis via Dwarf-Discuss
 wrote:
>
> Op 03-06-2021 om 00:50 schreef David Blaikie via Dwarf-Discuss:
> > On Fri, May 28, 2021 at 8:29 AM Joost van der Sluis via Dwarf-Discuss
> >  wrote:
> >> Now in Pascal there are 'properties'. Maybe you know these from c# which
> >> has something alike. Basically a property is an alias in a structure
> >> that links to other members of the same structure for reading, writing
> >> and/or storage-information.
> >
> > Hmm - so this isn't like C# where you can write arbitrary code for the
> > property? In this case it's only direct access - but you get to choose
> > whether it's read only, read-write, etc, compared to having the member
> > directly be public?
> >
> > That might 'just work' with existing DWARF consumers/debuggers if you
> > had the alias as another member (as you say, with different access
> > level) that happens to have the same data_member_location as the
> > underlying member - you could add an extra/extension attribute to
> > describe it as being read only, etc.
> >
> > I doubt the duplication of the data_member_location would be super
> > expensive - but I could be wrong. Might be worth measuring, especially
> > if this representational choice does get you "free" support in
> > existing DWARF consumers, compared to having to teach them about new
> > attributes, etc.
>
> They can do both - direct access and arbitrary code. And I agree that in
> the case of direct access it is not that expensive.
>
> >> Example:
> >>
> >> type
> >> TMyClass=class
> >> private
> >>   FProp: Integer;
> >>   FPropIsStored: Boolean;
> >> protected
> >>   function GetProp: Integer;
> >>   function GetItem(const Index: Integer): string;
> >> public
> >>   property IndividualItem[Index: Integer]: string read GetItem;
> >> published
> >>   property Prop: Integer read GetProp stored FPropIsStored;
> >>   property OtherProp: Integer read FProp write FProp;
> >>   property Item2: string index 2 read GetItem;
> >> end;
> >>
> >> var
> >> MyClass: TMyClass;
> >>
> >> Reading MyClass.Prop effectively calls GetProp.
> >>
> >> MyClass.Prop is read-only, and during streaming the information in
> >> FPropIsStored is being used.
> >>
> >> MyClass.OtherProp is read/write, and is more or less an alias for the
> >> private FProp field.
> >>
> >> MyClass.IndividualItem[6] is accessible like it is an array. And Item2
> >> has a fixed index.
> >>
> >> I want to encode this property into the Dwarf debug-information.
> >
> > First question I usually ask is: Is there any prior art? (does GCC
> > support this situation, what DWARF does it use to describe these
> > entities? What about other DWARF producers?)
>
> Yes, that would be easy. But I do not know of any feature in C or C++ or
> any other language that is comparable to this functionality, except for
> C#, but that does not generate Dwarf debug-information. But I am well
> aware that I do not know all existing languages that well. ;)
>
> >> At this
> >> moment we only generate debug-information for cases similar to
> >> MyClass.OtherProp by duplicating the debug-information of FProp with
> >> another visibility-class.
> >
> > Ah, guess that's my suggestion above?
>
> Yes.
>
> >> I've added the following attributes:
> >> DW_AT_FPC_property_read (0x3230)
> >> DW_AT_FPC_property_write (0x3231)
> >> DW_AT_FPC_property_stored (0x3232)
> >>
> >> And then those attributes contain a link to the corresponding
> >> DW_TAG_members.
> >
> > OK, so DW_AT_FPC_property_stored uses a reference form, most likely a
> > CU-local one, like DW_FORM_ref4, etc?
>
> Well, my current implementation always uses DW_FORM_ref_addr, but I now
> realize that this should be a DW_FORM_ref4 most of the time.
> It is possible to reference a member in another CU, though. So local
> is not always possible.
>
> >> This to keep the debug-information as compact as
> >> possible. Furthermore I've added the tag DW_TAG_FPC_property (0x4230) or
> >> else other debuggers may be confused when they encounter a DW_TAG_member
> >> with only one or more of these specific fpc-attributes.
> >>
> >> I'll also have to add something like DW_AT_FPC_property_index to

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-04-09 Thread David Blaikie via Dwarf-Discuss
On Fri, Apr 9, 2021 at 11:13 AM Samy Al Bahra  wrote:
>
> Responses inline.
>
> On Fri, Mar 19, 2021 at 9:59 PM David Blaikie  wrote:
>>
>> On Fri, Mar 19, 2021 at 9:34 AM Samy Al Bahra  wrote:
>
>
> [...]
>
>>>
>>> This is quite old (excuse the formatting) but numbers are here: 
>>> https://engineering.backtrace.io/2014-09-15-bt-lightweight-backtrace-tool/ 
>>> , search for "Chromium".  This is something other debuggers can take 
>>> advantage of if they run in a non-interactive / batch mode (think bulk 
>>> processing of millions - billions of dumps a month)
>>
>>
>> "This is something... " - what is "this" you're referring to there? Lazy 
>> loading? Yeah, for sure. Why do you restrict/suggest that a highly lazy 
>> approach would only be suitable for non-interactive/batch execution?
>
>
> This is quite old, this = blog post.
>
> This is something other debuggers can take advantage of: Lazy loading is more 
> effective for automated analysis tools than interactive debuggers which more 
> often than not don't benefit from lazy evaluation if folks are expecting 
> auto-complete for types, variables, etc... Of course, it is still useful for 
> non-blocking loads of debug data especially if you implement job cancellation 
> (allow commands to be executed concurrently while loading is being completed).
>
> [...]
>
>>
>>
>>>
>>> I'm also happy to run benchmarks for you with and without .debug_aranges on 
>>> top of our debugger if it'll be useful.
>>
>>
>> Yeah, I'd certainly be curious if you have a chance! Though it may depend a 
>> bit on what your implementation does in the absence of .debug_aranges.
>
>
> I'll get back to you on this shortly!
>
>>
>>
>>>
>>> One of the crucial optimizations we made is incremental indexing on top of 
>>> .debug_aranges based on PC values
>>
>>
>> Could you explain that in more detail - and why that approach can't be used 
>> with CU ranges?
>
>
> .debug_aranges is significantly smaller and faster to load than scanning all 
> of .debug_info.

I'm not suggesting scanning all of .debug_info - only the CU DIE for
DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
unit header's next unit offset).
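
For reference, the unit-to-unit hop can be sketched as below. This is a simplified illustration, not code from any real debugger: it assumes a little-endian .debug_info image already loaded into memory, the function names are made up, and it only reads the initial length field (with the 0xffffffff escape for the 64-bit format) without touching the abbreviations or the CU DIE itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an n-byte little-endian unsigned value; caller guarantees bounds.
static std::uint64_t read_le(const std::vector<std::uint8_t>& d,
                             std::size_t off, unsigned n) {
    std::uint64_t v = 0;
    for (unsigned i = 0; i < n; ++i)
        v |= static_cast<std::uint64_t>(d[off + i]) << (8 * i);
    return v;
}

// Section offsets of every unit header, found by hopping over each unit
// using only its initial length.
std::vector<std::uint64_t> unit_offsets(const std::vector<std::uint8_t>& sec) {
    std::vector<std::uint64_t> offsets;
    std::size_t off = 0;
    while (sec.size() - off >= 4) {
        std::uint64_t len = read_le(sec, off, 4);
        std::size_t header = 4;
        if (len == 0xffffffffu) {  // 64-bit DWARF format escape
            if (sec.size() - off < 12) break;
            len = read_le(sec, off + 4, 8);
            header = 12;
        }
        if (len > sec.size() - off - header) break;  // truncated or bogus unit
        offsets.push_back(off);
        off += header + static_cast<std::size_t>(len);
    }
    return offsets;
}
```

A consumer building a PC index this way would still need to parse each CU DIE's abbreviation to find DW_AT_ranges or low/high_pc, so this is only the skipping half of the cost.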

It sounded like CU ranges couldn't be used to build such an index at
all/that your code used quite a different strategy in the absence of
aranges? (rather than building the index from the CU ranges - somewhat
slower I'm sure, but I wouldn't've thought (& am trying to understand
if it is/why) so fundamentally slower that it wouldn't be the next
fallback rather than skipping the index entirely or employing some
more fundamentally different approach)

If you mean building ranges from all the DIEs deep inside a CU - yeah,
that's going to be fundamentally slower in a bunch of ways that maybe
I could see that would necessitate a totally different approach/that
the index wouldn't make sense anymore (though I'd still like to
understand it) - but I'm especially curious about the case where the
CU DIE itself does have comprehensive address range information.

- Dave

>
>>
>>
>>>
>>> (+ complexities Greg mentions later in the thread). In cases where we lack 
>>> this, we use our own persistent cache which introduces unnecessary 
>>> complexity. Now I am considering going as far as adding a multi-threaded 
>>> indexer for cases where a persistent cache / build system modifications 
>>> aren't an option (work to begin in the next week or two).
>>>
>>> .debug_aranges would provide a lot of value to our users.
>>>
>>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss 
>>>  wrote:
>>>>
>>>> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>>>>>
>>>>> Hopefully not to side-track things too much... maybe wants its own
>>>>> thread, if there's more to debate here.
>>>>
>>>>
>>>> Yeah, how about we spin it off into another thread (done here)
>>>>
>>>>>
>>>>> >> For the case you suggested where it would be useful to keep the range
>>>>> >> list for the CU in the .o file, I think .debug_aranges is what you're
>>>>> >> looking for.
>>>>> >
>>>>> > aranges has been off by default in LLVM for a while - it adds a lot of
>>>>> > overhead (doesn't have all the nice rnglist encodings for instance -
>>>>> > nor can it use debug_addr, and if it did it'd still be duplicate with

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-19 Thread David Blaikie via Dwarf-Discuss
On Fri, Mar 19, 2021 at 9:34 AM Samy Al Bahra  wrote:

> Hi David,
>
> Sorry I'm a bit late to the game.
>

No worries at all - appreciate any/all perspectives/data here, for sure!


> On the value of having .debug_aranges and the performance impact:
>
> Our debugger was designed for performance and does end to end lazy
> evaluation (down to the DIE).
>

Nice! (certainly aspects of LLVM's DWARF parsing I'd love to move towards
more of a lazy model like this)


> This is quite old (excuse the formatting) but numbers are here:
> https://engineering.backtrace.io/2014-09-15-bt-lightweight-backtrace-tool/
> , search for "Chromium".  This is something other debuggers can take
> advantage of if they run in a non-interactive / batch mode (think bulk
> processing of millions - billions of dumps a month)
>

"This is something... " - what is "this" you're referring to there? Lazy
loading? Yeah, for sure. Why do you restrict/suggest that a highly lazy
approach would only be suitable for non-interactive/batch execution?


> and is generally useful when folks are iterating in development (fast
> feedback for crashes while having some background indexing work going on).
>

If it's useful in non-interactive/batch and iterative - is there a use case
you're suggesting such lazy evaluation isn't applicable to?


> I'm also happy to run benchmarks for you with and without .debug_aranges
> on top of our debugger if it'll be useful.
>

Yeah, I'd certainly be curious if you have a chance! Though it may depend a
bit on what your implementation does in the absence of .debug_aranges. \/


> One of the crucial optimizations we made is incremental indexing on top of
> .debug_aranges based on PC values
>

Could you explain that in more detail - and why that approach can't be used
with CU ranges?


> (+ complexities Greg mentions later in the thread). In cases where we lack
> this, we use our own persistent cache which introduces unnecessary
> complexity. Now I am considering going as far as adding a multi-threaded
> indexer for cases where a persistent cache / build system modifications
> aren't an option (work to begin in the next week or two).
>
> .debug_aranges would provide a lot of value to our users.
>
> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
>> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>>
>>> Hopefully not to side-track things too much... maybe wants its own
>>> thread, if there's more to debate here.
>>>
>>
>> Yeah, how about we spin it off into another thread (done here)
>>
>>
>>> >> For the case you suggested where it would be useful to keep the range
>>> >> list for the CU in the .o file, I think .debug_aranges is what you're
>>> >> looking for.
>>> >
>>> > aranges has been off by default in LLVM for a while - it adds a lot of
>>> > overhead (doesn't have all the nice rnglist encodings for instance -
>>> > nor can it use debug_addr, and if it did it'd still be duplicate with
>>> > the CU ranges wherever they were).
>>>
>>> Did you want to file an issue to improve how .debug_aranges works?
>>>
>>
>> I don't currently understand the value it provides, and I at least don't
>> have a use case for it, so I'm not sure I'd be the best person to
>> advocate/drive that work.
>>
>>> Complaining that it duplicates CU ranges is missing the point, though;
>>> it's an index, like .debug_names, of course it duplicates other info.
>>> If you want to suggest an improved index, like we did with .debug_names,
>>> that would be great too.
>>>
>>
>> .debug_names is quite different though - it collects information from
>> across the DIE tree - information that is expensive to otherwise gather
>> (walking the whole DIE tree).
>>
>> .debug_aranges is not like that for most producers (producers that do
>> include the address ranges on the CU DIE) - the data is readily available
>> immediately on the CU. That does involve reading some of .debug_abbrev, and
>> interpreting a handful of attributes - but at least for the use cases I'm
>> aware of, that overhead isn't worth the size increase.
>>
>> Do you have numbers on the benefits of .debug_aranges compared to parsing
>> the ranges from CU DIEs?
>>
>> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
>> its children DIEs contain addresses - having that as a guarantee, or some
>> preferred way of encoding zero length (high/low of 0 would be acceptable, I
>> guess) would be nice & make it cheap to s

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 4:29 PM Greg Clayton  wrote:

>
>
> On Mar 11, 2021, at 1:12 PM, Paul Robinson via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>
>
> We will use them if they are there, but one interesting issue that we ran
> into with LLDB is some compile units might be in .debug_aranges because the
> compiler made a .debug_aranges section in the .o file, but others might
> not. So we had to add code to LLDB to figure out which compile units have
> any entries in the .debug_aranges section, and read the DW_AT_ranges from
> the DW_TAG_compile_unit if it exists, and if it doesn't, manually index the
> DWARF to create one on the fly each time.
>
>
> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>
> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
>
> A big issue with any of the DWARF sections is we are subject to making the
> contents work with linkers that just want to concatenate + relocate. This
> often leads to information being kept around when dead stripping occurs
> because anything that is dead stripped will just have its address zero'ed
> out or -1'ed out, but this bogus info is still in the data.
>

Yeah, we talked some last year about formalizing this more into the -1
tombstone - I thought maybe Paul had proposed that for standardization,
though at a glance I don't see the proposal. It's probably somewhere there.


> If we don't need a format that can simply be concatenated and relocated,
> the GSYM format, which is open sourced in llvm.org already, might be good
> inspiration for a .debug_aranges successor section that has very efficient
> lookups. The GSYM format could actually be used as is by adding only a new
> DIE offset InfoType.
>
> Besides ".debug_names", all other DWARF accelerator tables are really just
> random indexes that must be linearly scanned or pre-indexed prior to being
> used because of the concatenate + relocate style that is used for these
> DWARF sections. It would be great if any future accelerator tables are "map
> into memory and use as is" kind of tables like ".debug_names" and the
> ".apple_XXX" name accelerator tables.
>

Ah, fair point - could come up with a rather different structure if it were
designed for fast on-disk query (though then, like .debug_names (which I
don't think we have any linkers that can link today, for instance), you'd
probably /really/ want it to be linked in a content-aware manner, because
probing separate lookup tables (even if they're more designed for that)
per-CU doesn't probably gain you a lot).

- Dave


>
>
> Thanks,
> --paulr
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is 

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 1:12 PM  wrote:

> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>

Yeah, I'd be curious to know more, for sure. Might resort to writing the
smallest DWARF parser to, say, handle address queries using debug_aranges
or CU ranges for comparison.
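
As a rough illustration of how small the .debug_aranges side of such a parser could be, here is a minimal sketch. It assumes 32-bit DWARF, little-endian data, version 2 sets and no segment selectors; `parse_aranges` and `find_cu` are names invented for this example, not taken from any existing tool.

```python
import struct

def parse_aranges(data):
    """Parse .debug_aranges sets; return (cu_offset, start, length) tuples.

    Sketch only: 32-bit DWARF, little-endian, no segment selectors.
    """
    entries = []
    pos = 0
    while pos < len(data):
        set_start = pos
        unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
            "<IHIBB", data, pos)
        if unit_length == 0xFFFFFFFF:
            raise ValueError("64-bit DWARF is not handled in this sketch")
        if version != 2 or seg_size != 0 or addr_size not in (4, 8):
            raise ValueError("unsupported .debug_aranges set")
        set_end = set_start + 4 + unit_length
        tuple_size = 2 * addr_size
        # The first tuple starts at a multiple of the tuple size,
        # measured from the start of the set header (12 bytes here).
        pos = set_start + 12
        pos += -(pos - set_start) % tuple_size
        fmt = "<QQ" if addr_size == 8 else "<II"
        while pos < set_end:
            start, length = struct.unpack_from(fmt, data, pos)
            pos += tuple_size
            if start == 0 and length == 0:
                break  # terminator tuple
            entries.append((cu_offset, start, length))
        pos = set_end
    return entries

def find_cu(entries, pc):
    """Linear scan for the CU whose range covers pc (None if not found)."""
    for cu_offset, start, length in entries:
        if start <= pc < start + length:
            return cu_offset
    return None
```

The CU-ranges side would additionally need .debug_abbrev decoding plus DW_AT_low_pc/DW_AT_high_pc/DW_AT_ranges handling, which is the extra work under discussion.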


> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>

Yeah, it might be that it's more practical to fixup debug_aranges for dead
stripping than it is to fixup debug_rnglists (I mean, it is, for sure
easier to do) - and that fixing up likely makes the aranges much more
compact and thus cheaper to use/parse, which might be part of the
motivation for them.
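
A minimal sketch of that kind of fixup, assuming the linker marks dead entries with a start address of 0 or an all-ones (-1) tombstone. Treating address 0 as dead is an assumption that would not hold on targets where 0 is a valid code address, and `compact_aranges` is a hypothetical helper name.

```python
def compact_aranges(ranges, addr_size=8):
    """Drop dead or empty (start, length) entries and merge touching ranges."""
    tombstone = (1 << (8 * addr_size)) - 1  # the "-1" address
    live = sorted(
        (start, start + length)
        for start, length in ranges
        # Assumption: start == 0 or start == -1 means the code was stripped.
        if length > 0 and start != 0 and start != tombstone
    )
    merged = []
    for start, end in live:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]
```

With -ffunction-sections producing one entry per function, merging adjacent survivors is where most of the size reduction would come from.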

One of the things I've thought about in that direction would be a flag on
debug_rnglists contributions (a bit in the header) that says "all rnglists
in here are referenced /only/ by rnglistx" - that way a linker could know
that it could rewrite the whole rnglist contribution and so long as it
fixed up the offset table at the start to adjust for any shrinking or
removed rnglists, it would still be correct. Hmm, now that I think about it
- such an attribute wouldn't be needed, necessarily - if the linker was
willing to adjust how relocations referring to the debug_rnglist section
were applied as things shifted around. (& you've got to use relocations
anyway, if you're not using rnglistx)
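
To make the offset-table fixup concrete, here is a sketch under some assumptions: 32-bit DWARF, offsets measured from the start of the offset table, and a dead list shrunk to a bare DW_RLE_end_of_list so that existing rnglistx indexes stay valid. `rebuild_rnglists_body` is an invented name, and the contribution header (unit_length in particular) would need updating separately.

```python
import struct

DW_RLE_end_of_list = b"\x00"

def rebuild_rnglists_body(lists, keep):
    """Rebuild the offset table plus list data of one rnglists contribution.

    lists: encoded range lists (bytes) in rnglistx index order.
    keep:  parallel booleans; False means the list refers only to dead code.
    """
    # A dead list keeps its index but shrinks to a single terminator byte.
    bodies = [data if alive else DW_RLE_end_of_list
              for data, alive in zip(lists, keep)]
    table_size = 4 * len(bodies)
    offsets = []
    pos = table_size  # offsets are relative to the start of the table
    for body in bodies:
        offsets.append(pos)
        pos += len(body)
    return struct.pack(f"<{len(offsets)}I", *offsets) + b"".join(bodies)
```

This only works if nothing refers to the lists by raw section offset, which is the condition the header flag would have promised.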


> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
> Thanks,
>
> --paulr
>
>
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
>
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is quite different though - it collects information from
> across the DIE tree - information that is expensive to otherwise gather
> (walking the whole DIE tree).
>
> .debug_aranges is not like that for most producers (producers that do
> include the address ranges on the CU DIE) - the data is readily available
> immediately on the CU. That does involve reading some of .debug_abbrev, and
> interpreting a handful of attributes - but at least for the use cases I'm
> aware of, that overhead isn't worth the size increase.
>
> Do you have numbers on the benefits of .debug_aranges compared to parsing
> the ranges from CU DIEs?
>
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
> its children DIEs contain addresses - having that as a guarantee, or some
> preferred way of encoding zero length (high/low of 0 would be acceptable, I
> guess) would be nice & make it cheap to skip over CUs that don't have any
> address ranges)
>
> Roughly, a modern debug_aranges to me would look something like:
>
> 
> 
> 
> 
> 
>
> So it could fully re-use the rnglist encoding. If this was going to be as
> compact as possible, it'd need to be configurable which encodings it uses -
> ranges V 

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 12:07 PM Mark Wielaard  wrote:

> Hi David,
>
> On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> > > > (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> > > > there's something about it that llvm-dwarfdump can't understand - it only
> > > > prints a handful of rather mangled location lists... not sure which
> > > > component (GCC, llvm-dwarfdump, or both) is getting things confused here -
> > > > oh, maybe some kind of DWARF extension for the "views" system, by the looks
> > > > of it)
> > >
> > > Yes, you might try -gno-variable-location-views or simply use binutils or
> > > elfutils readelf to look at them.
> > >
> >
> > Thanks! - is this proposed as a DWARF extension? I thought I remembered it
> > coming up, but hadn't realized how non-standard it was/that it was already
> > implemented. (quick search on the issues page and I can't find any mention
> > of it at least)
>
> We kind of need a dwarf-extensions discussion list to document/discuss
> these kind of non-extendable DWARF extensions. Only half kidding. Some
> things in DWARF are well designed to allow vendor extensions that can
> be skipped/ignored, but some aren't and we probably need to coordinate
> more because it is years between standard spec releases.
>

Yeah, happy to do that anywhere - dwarf-discuss is probably OK for it, I'd
guess.

& happy to co-implement DWARF extensions/future proposals - especially when
they're carving out an extension space, so it's less a question of "is this
a good extension" (a more nuanced/difficult debate - then it comes down to
is anyone going to use it/need it in lldb, etc) and more "is it reasonable
for this feature to be extensible and how should that work". Won't mean
immediate implementation in LLVM, but at least agreeing on the direction &
will make adding support at least in dumpers more clearly
motivated/understood/etc.

Implementing not-yet-standardized things, especially if they look like a
plausible direction for the standard, is a good thing - getting some
implementation experience, ironing out any gotchas, etc, before it's
published and possibly more widely adopted. (that said, I wouldn't mind
knowing what "widely adopted" looks like - folks mention maintaining old or
obscure toolchains, but not sure if they're using more modern DWARF, or is
it basically Clang, LLDB, GCC, and GDB using anything like DWARFv5 and
beyond?)


> Extending loclists is a bit of a pain because they aren't really
> extendable. Making them extendable is
> http://dwarfstd.org/ShowIssue.php?issue=170427.2


Ah, indeed - thanks for the link!


> but I am still
> pondering whether that really helps here because as written you can
> only interpret them end of list, but not really skip them.
>

Yeah, it would be nice if extension opcodes had a uleb length as their
first argument.

(this is essentially the difference between a custom DW_TAG or DW_AT (very
cheap, easy for consumers to ignore if they don't recognize it) and a
custom DW_FORM (expensive - consumer can't parse the list at all) - though
I guess this extension issue /might/ fall in between, as you could read the
list up until you hit an extension, and use that partial information for
locations, even if you couldn't parse all of it)
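
A sketch of what a length-prefixed extension opcode would buy a consumer. The rule that any opcode at or above `HYPOTHETICAL_EXT_BASE` carries a ULEB128 payload length is invented here purely for illustration; it is not what DWARF v5 or the linked issue specifies. Only DW_LLE_offset_pair is decoded, to keep the example short.

```python
def read_uleb128(data, pos):
    """Decode one ULEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

DW_LLE_end_of_list = 0x00
DW_LLE_offset_pair = 0x04
HYPOTHETICAL_EXT_BASE = 0x80  # assumption: extension opcodes start here

def parse_loclist(data):
    """Return (start, end, expression bytes) entries, skipping extensions."""
    pos = 0
    entries = []
    while True:
        opcode = data[pos]
        pos += 1
        if opcode == DW_LLE_end_of_list:
            return entries
        if opcode == DW_LLE_offset_pair:
            start, pos = read_uleb128(data, pos)
            end, pos = read_uleb128(data, pos)
            expr_len, pos = read_uleb128(data, pos)
            entries.append((start, end, data[pos:pos + expr_len]))
            pos += expr_len
        elif opcode >= HYPOTHETICAL_EXT_BASE:
            # Unknown extension: its length prefix lets us step over it.
            length, pos = read_uleb128(data, pos)
            pos += length
        else:
            raise ValueError(f"opcode {opcode:#x} not handled in this sketch")
```

Without the length prefix, the `elif` branch has no way to find the next entry, so the consumer has to stop at the first unknown opcode.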


> Location views themselves are
> http://dwarfstd.org/ShowIssue.php?issue=170427.1


Right right - thanks for that!


> Alexandre Oliva, who proposed the Location Views as DWARF
> extension. He has some more background material at
> http://www.fsfla.org/~lxoliva/papers/sfn/ Which is mostly on variable
> tracking assignments and statement frontier annotations.  Which
> describes the GCC implementation that makes it possible to have
> location views.
>
> What is proposed is slightly different from what GCC currently
> implements though. Caroline, Cary and I are supposed to sit down and
> discuss it to see how it can be standardized. But finding time has
> been tricky.
>

*nod* takes some time, I'm interested to see how it comes along.

- Dave
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


[Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 5:48 AM  wrote:

> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>

Yeah, how about we spin it off into another thread (done here)


> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>

I don't currently understand the value it provides, and I at least don't
have a use case for it, so I'm not sure I'd be the best person to
advocate/drive that work.

> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>

.debug_names is quite different though - it collects information from
across the DIE tree - information that is expensive to otherwise gather
(walking the whole DIE tree).

.debug_aranges is not like that for most producers (producers that do
include the address ranges on the CU DIE) - the data is readily available
immediately on the CU. That does involve reading some of .debug_abbrev, and
interpreting a handful of attributes - but at least for the use cases I'm
aware of, that overhead isn't worth the size increase.

Do you have numbers on the benefits of .debug_aranges compared to parsing
the ranges from CU DIEs?

(one possible issue: the CU doesn't /have/ to contain low/high/ranges if
its children DIEs contain addresses - having that as a guarantee, or some
preferred way of encoding zero length (high/low of 0 would be acceptable, I
guess) would be nice & make it cheap to skip over CUs that don't have any
address ranges)

Roughly, a modern debug_aranges to me would look something like:







So it could fully re-use the rnglist encoding. If this was going to be as
compact as possible, it'd need to be configurable which encodings it uses -
ranges V high/low, addrx V addr - at which point it'd probably look like a
small DIE with an inline abbrev (similar to the way DWARFv5 encodes the
file and directory entries now, and how debug_names is self-describing) -
at which point it looks to me a lot like parsing the CU DIEs.


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 11:44 AM Jakub Jelinek  wrote:

> On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> > Thanks! - is this proposed as a DWARF extension? I thought I remembered it
>
> 170427.1 I think.  Note, what is emitted is different from what is being
> proposed, the problem with DW_LLE_* and DW_RLE_* is that they aren't easily
> extensible (in a way that would allow consumers that don't know about the
> extension skip it and parse just the standard ones; because when seeing
> an unknown opcode, the consumer doesn't know what arguments if any it has).
> E.g. in the way .debug_macro allows producers to define what arguments
> extension opcodes have (how many and what DW_FORM_* each has).
> So I think what GCC currently produces puts the stuff before the location
> sequences such that if a consumer can't handle those, it can skip those.
>

Ah, cunning! Yeah, there's a few places where LLVM just keeps trying to
parse the next thing, rather than only parsing parts that are referenced
from elsewhere (the other one I know of is a bug in location lists when
combined with bfd's linker tombstoning of gc'd sections (it sets any
relocation to a gc'd section to zero): if a location list were to span
across a gc'd section (such as for a global, raised into a register in one
function - LLVM can't produce the right DWARF for this, not sure about GCC)
binutils readelf, etc, will only dump sections of debug_loc that are
referenced from .debug_info, so the early list termination just leaves
holes, rather than mangled parsing trying to interpret the location
expression following the accidental terminator as the start of another
location list)


> The only thing that doesn't really work well for a consumer unaware of that
> extension is walking the whole .debug_rnglists and dumping everything that
> it contains.
>

Yup - yeah, LLVM will just try to parse each offset and then go to the next
one, etc. (I don't think lldb would do this, hopefully - this is only an
issue with llvm-dwarfdump trying to dump as much as possible)

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 2:55 AM Mark Wielaard  wrote:

> Hi David,
>
> On Thu, Mar 11, 2021 at 01:01:05AM -0800, David Blaikie wrote:
> > +Mark in case he's got further context/perspective to share in the context
> > of this thread
>
> I haven't yet caught up on the mailinglist, but I think I understand
> the context, it was a discussion Simon and I had about how to handle
> .debug_rnglists in the main object file vs the split object.
>
> > One particular thing I'll pull out of the gdb-patches thread is:
> >
> > "But the rnglists
> > (loclists) themselves can still use relocations. A large part of them
> > is non-shared addresses, so using indexes (into the .debug_addr
> > addr_base) would simply be extra overhead."
> >
> > That's not quite right - while a direct mapping from debug_loc and
> > debug_ranges (at least location and range lists not using base address
> > selection entries) to debug_loclist and debug_rnglist would produce a
> > similar number of addresses and relocations - there's a lot to be gained by
> > using DW_RLE/LLE_base_addressx entries - then you can strategically reuse
> > an already-existing debug_addr entry and avoid another relocation all
> > together (debug_loc and debug_ranges couldn't do this, even when using a
> > base address selection entry - that base address couldn't be shared with
> > other lists, since it was inline).
>
> I admit I didn't implement anything to measure. So I can certainly be
> convinced of the opposite. But if your strategically reuse algorithm
> can also identify when it isn't strategic, then just not having an
> indirection for that address through the .debug_addr index is still a
> win. It just means you don't get to move the relocation to the
> .debug_addr. But I see why this is still important because...
>

Yeah - I haven't implemented anything in LLVM to bail out and avoid addrx
when there isn't another use of the address in DWARFv5 non-split because I
don't have much use for non-split (Google's not switched to split by
default, but that's the mode to use when size matters - so optimizing
non-split for size isn't a high priority for me) and there's no case I know
of where DW_AT_low_pc or any address in DW_AT_ranges wouldn't be used in at
least one other place: all those addresses will be used as the starting
address of a subprogram at least, so that's at least two uses.
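
A toy model of why the shared pool helps, assuming one relocation per direct address use versus one per unique .debug_addr entry. `AddressPool` and `count_relocations` are made-up names for this sketch, not LLVM's.

```python
class AddressPool:
    """Hands out one .debug_addr index per distinct symbol."""

    def __init__(self):
        self._index = {}

    def index_of(self, symbol):
        # Reuse the existing slot if this address was already added.
        return self._index.setdefault(symbol, len(self._index))

    def __len__(self):
        return len(self._index)

def count_relocations(uses):
    """uses: one symbol name per address reference in the DWARF.

    Returns (relocations with direct addresses, relocations with addrx).
    """
    direct = len(uses)  # every direct address needs its own relocation
    pool = AddressPool()
    for symbol in uses:
        pool.index_of(symbol)
    return direct, len(pool)  # only the pool entries are relocated
```

For a function start used by the CU ranges, the subprogram low_pc and a location list base, `count_relocations(["f", "f", "f"])` gives 3 direct relocations against 1 pooled one.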


>
> > also, as to the original motivation for Split DWARF (reduce object size,
> > reduce relocations, etc) - mostly a distributed build system where the cost
> > of shipping all the object files to the link step is a significant
> > bottleneck - so reduced object size (so reducing the DWARF object size -
> > which is both .debug_* sizes, and .rela.debug_* sizes equally - well,
> > except we do use -gz so .debug_* sizes are much smaller, but .rela.debug_*
> > is not compressed - so reducing relocations is /extra/ important).
>
> Although I see how a distributed build system where there is a cost of
> shipping object files to the linker might be a motivating factor. I
> also think that isn't a common setup. So yes to reducing relocations,
> having less work for the linker to do. But reducing transport cost
> wouldn't be that high on my list.
>

Fair enough - yeah, it boils down to similarly, as you said, fewer actions
for the linker to perform (both in terms of relocations to apply, and in
terms of bytes to write to the output file/linked executable (&
subsequently/also a smaller final linked executable)).


> > (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> > there's something about it that llvm-dwarfdump can't understand - it only
> > prints a handful of rather mangled location lists... not sure which
> > component (GCC, llvm-dwarfdump, or both) is getting things confused here -
> > oh, maybe some kind of DWARF extension for the "views" system, by the looks
> > of it)
>
> Yes, you might try -gno-variable-location-views or simply use binutils or
> elfutils readelf to look at them.
>

Thanks! - is this proposed as a DWARF extension? I thought I remembered it
coming up, but hadn't realized how non-standard it was/that it was already
implemented. (quick search on the issues page and I can't find any mention
of it at least)

(aside: Hmm, readelf doesn't have support for the offset entry tables in
either rnglists or loclists, I think:

readelf: Warning: The .debug_rnglists section contains unsupported offset
entry count: 2819.)

Unrecognized debug section: .debug_loclists.dwo

But, yeah, using readelf on a non-split-DWARF build I see these "location
view pair"s showing up.

> I think you did convince me we need to look at smarter .debug_addr usage.
>

Great! Happy to chat about it further any time! I can point you to some of
the patches in LLVM and/or provide examples that demonstrate the
interesting cases of reuse.

- Dave

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 1:39 AM Jakub Jelinek  wrote:

> On Thu, Mar 11, 2021 at 01:05:06AM -0800, David Blaikie wrote:
> > What's your take on:
> >
> > 1) Fixing GDB to handle GCC's current output.
>
> I don't know what GDB will do, it is up to the GDB people.
>
> > 2) Fixing GCC to produce something maybe more standards conforming (to my
> > mind, ideally: ranges on the skeleton CU (using either
> > rnglists_base+rnglistx (like LLVM), or sec_offset (actually more
> > compact/better than LLVM anyway, and avoids the ambiguous situation), and
> > rnglistx in child DIEs in the split full unit using .debug_rnglists.dwo)
>
> Given the
> 3.1.3 "The following attributes are not part of a split full compilation
> unit entry but instead are inherited (if present) from the corresponding
> skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
> DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
> and DW_AT_rnglists_base."
> sentence, at least for DWARF5 putting DW_AT_ranges into the full unit rather
> than skeleton unit for split DWARF seems non-conforming, so I'll
> probably adjust my patch, but see below.
> Now, for DW_AT_addr_base and DW_AT_rnglists_base the spec talks about
> it affecting just .debug_addr or .debug_rnglists section, doesn't mention
> the .debug_rnglists.dwo section, while for DW_AT_str_offsets_base it talks
> about .debug_str_offsets or .debug_str_offsets.dwo.
> So, maybe one reason why DW_AT_rnglists_base might be ok on the skeleton
> unit.  On the other side, e.g. in Table F.1 I see there for Skeleton and
> Split:
> DW_AT_low_pc - Skeleton only
> DW_AT_ranges - Split Full only (so, in contradiction of 3.1.3)
> DW_AT_rnglists_base - not present
> So, DWARF5 is inconsistent.  But appendix F is informative and so I think
> the normative 3.1.3 wins.
>

Yup, I hope to get those inconsistencies addressed through an issue I've
filed earlier today. Glad for the discussion/confirmation/etc.


> So, I think I'll go with DW_AT_ranges and DW_AT_low_pc in
> DW_TAG_skeleton_unit, but the former using DW_FORM_sec_offset rather than
> DW_FORM_rnglistx and no DW_AT_rnglists_base (I really don't see a benefit
> of that, there is one relocation either way, either on the DW_AT_ranges
> with DW_FORM_sec_offset or on DW_AT_rnglists_base with DW_FORM_sec_offset,
> but for the latter one needs one byte for the DW_FORM_rnglistx too and
> two extra bytes in .debug_abbrev).  DW_FORM_rnglistx can be beneficial if there
> is more than one range, which is not the case for the skeleton.
>

Yep, that's my take on it too - while I can argue that the LLVM output
is/should be valid, it's not the most efficient, and the most efficient
(using sec_offset for ranges on the skeleton CU) dodges this particular
question of validity - sounds good to me.
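
The byte counts quoted above can be checked with a small calculation. This sketch assumes 32-bit DWARF (4-byte offsets) and a rnglistx index of 0; it does not count the offset-table entry that rnglistx also needs in .debug_rnglists. The attribute and form codes are the DWARF v5 values, and the function name is invented.

```python
def uleb128_size(value):
    """Number of bytes needed to encode value as ULEB128."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

DW_AT_ranges, DW_AT_rnglists_base = 0x55, 0x74
DW_FORM_sec_offset, DW_FORM_rnglistx = 0x17, 0x23
OFFSET_SIZE = 4  # assumption: 32-bit DWARF

def skeleton_ranges_cost(use_rnglistx, index=0):
    """Bytes spent in .debug_abbrev and .debug_info for the skeleton's ranges."""
    if use_rnglistx:
        # Two attribute/form pairs: rnglists_base (sec_offset) + ranges (rnglistx).
        abbrev = sum(uleb128_size(code) for code in (
            DW_AT_rnglists_base, DW_FORM_sec_offset,
            DW_AT_ranges, DW_FORM_rnglistx))
        info = OFFSET_SIZE + uleb128_size(index)
    else:
        # One attribute/form pair: ranges (sec_offset).
        abbrev = uleb128_size(DW_AT_ranges) + uleb128_size(DW_FORM_sec_offset)
        info = OFFSET_SIZE
    return {"abbrev": abbrev, "info": info}
```

Both variants carry a single relocated 4-byte offset; the rnglistx form adds one byte in .debug_info and two in .debug_abbrev, matching the numbers in the quoted message.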


> But in .debug_rnglists I'll probably use the *x suffixed DW_RLE_* opcodes when
> DW_RLE_offset_pair can't be used even in the skeleton .debug_rnglists.
>

Yeah, I can certainly +1 to that. Sharing the debug_addr entries is great
for object file size.

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 12:32 AM Jakub Jelinek  wrote:

> On Wed, Mar 10, 2021 at 10:07:27PM -0800, David Blaikie wrote:
> > On Wed, Mar 10, 2021 at 9:38 PM Jakub Jelinek  wrote:
> >
> > > On Wed, Mar 10, 2021 at 04:12:57PM -0800, David Blaikie via Dwarf-Discuss
> > > wrote:
> > > > On Wed, Mar 10, 2021 at 4:02 PM Cary Coutant  wrote:
> > > >
> > > > > > > So in the end the logical thing to do when encountering a
> > > > > > > DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> > > > > > > probably to go to the .debug_rnglists.dwo section, if there's one,
> > > > > > > disregarding the (inherited) DW_AT_rnglists_base.  If there isn't, then
> > > > > > > try the linked file's .debug_rnglists section, using
> > > > > > > DW_AT_rnglists_base.  If there isn't, then something is malformed.
> > > > >
> > > > > Looks reasonable to me. I think we need a new issue to clarify this in
> > > > > DWARF 6.
> > > > >
> > > >
> > > > Given that DWARFv5 isn't on by default in GCC yet & I think has a few more
> > >
> > > It is on by default.  But -gstrict-dwarf is not on by default.
> > >
> >
> > Oh, it is - in a released version of the compiler, or only in development?
>
> Still in development, but the prerelease already widely deployed by multiple
> Linux distributions.
>
> > & you're proposing changing the behavior only under -gstrict-dwarf, rather
> > than in general? Any particular reason?
>
> Just a typo, sorry, meant -gsplit-dwarf.
>

Ah, right right - I'm with you.

What's your take on:

1) Fixing GDB to handle GCC's current output.
2) Fixing GCC to produce something maybe more standards conforming (to my
mind, ideally: ranges on the skeleton CU (using either
rnglists_base+rnglistx (like LLVM), or sec_offset (actually more
compact/better than LLVM anyway, and avoids the ambiguous situation)), and
rnglistx in child DIEs in the split full unit using .debug_rnglists.dwo)
3) both? (so GDB can handle old GCC's output and the newer/more correct
output)

Personally, I'd have thought it'd be enough to move forward, change GCC and
be done - but if folks would like GDB (& GDB folks are cool with it) to
handle the old/weird GCC output, that's cool/up to GDB folks. Though I hope
that sort of DWARF doesn't stick around long/need lots of long-lived
support (I hope we don't need to add it to llvm's symbolizer for instance).

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
+Mark in case he's got further context/perspective to share in the context
of this thread

One particular thing I'll pull out of the gdb-patches thread is:

"But the rnglists
(loclists) themselves can still use relocations. A large part of them
is non-shared addresses, so using indexes (into the .debug_addr
addr_base) would simply be extra overhead."

That's not quite right - while a direct mapping from debug_loc and
debug_ranges (at least location and range lists not using base address
selection entries) to debug_loclists and debug_rnglists would produce a
similar number of addresses and relocations - there's a lot to be gained by
using DW_RLE/LLE_base_addressx entries - then you can strategically reuse
an already-existing debug_addr entry and avoid another relocation
altogether (debug_loc and debug_ranges couldn't do this, even when using a
base address selection entry - that base address couldn't be shared with
other lists, since it was inline). This has significant savings, and was
the main reason I suggested to Paul Robinson that range lists should get
the same handling as loclists - since optimized builds use a lot of range
lists and their relocations were taking up a huge amount of the remaining
.o/executable contribution to debug info. This (rnglists.dwo with strategic
use of base address selection entries) was the main win we saw when
switching Google from DWARFv4+GNU-extension Split DWARF to DWARFv5 and
justified the not insignificant work of updating the various DWARF
consumers we had (including a few of those patches upstreamed to gdb, lldb,
and various internal and external symbolizers).
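
A rough way to see the saving, as a simplified model rather than a measurement: the sketch below counts address relocations for the same range lists under three encodings. The function name, the tuple layout and the section names are made up for illustration, and the model ignores everything except address relocations.

```python
from typing import Dict, List, Tuple

# One range is (section_name, start_offset, end_offset).
# A range list is a list of such ranges.
Range = Tuple[str, int, int]


def count_relocations(range_lists: List[List[Range]]) -> Dict[str, int]:
    """Count address relocations under three simplified encodings."""
    # Every start and end address is written out in full,
    # so each one needs its own relocation.
    direct = sum(2 * len(rl) for rl in range_lists)

    # An inline base address per section per list: one relocation for
    # each base, and the base cannot be shared with other lists.
    inline_base = sum(len({sec for sec, _, _ in rl}) for rl in range_lists)

    # DW_RLE_base_addressx style: the base lives in .debug_addr and is
    # shared by every list, so one relocated slot per section is enough.
    shared_addrx = len({sec for rl in range_lists for sec, _, _ in rl})

    return {
        "direct": direct,
        "inline_base": inline_base,
        "shared_addrx": shared_addrx,
    }


example = [
    [(".text.f", 0, 16), (".text.f", 32, 48)],
    [(".text.f", 4, 8), (".text.g", 0, 8)],
    [(".text.g", 8, 12)],
]
print(count_relocations(example))
```

In the shared case the .debug_addr slots often exist already for other attributes, which is the "strategically reuse" part.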

Also, as to the original motivation for Split DWARF (reduce object size,
reduce relocations, etc.): it was mostly a distributed build system where the
cost of shipping all the object files to the link step is a significant
bottleneck. So reducing object size matters, and the DWARF contribution to
object size is both the .debug_* sizes and the .rela.debug_* sizes equally
(well, except we do use -gz, so .debug_* sizes are much smaller, but
.rela.debug_* is not compressed, so reducing relocations is /extra/ important).

And a neat note: Actually after DWARFv5 we have a .rela.debug_addr which is
/smaller/ than .rela.debug_line, which is sort of surprising/noteworthy.
That means that .rela.debug_addr has just one address for each section* -
but it also has an entry for each global variable, which .debug_line
doesn't have, so I'd usually expect .rela.debug_line to be a strict subset
of .rela.debug_addr - except that DWARFv5 moved the debug_line strings out
to .debug_line_str, which added a relocation for every file/directory name
- pushing the number of relocations up over the trimmed down
rela.debug_addr.

I haven't done GCC vs. Clang comparisons in a while - but it might be worth
trying some with Split DWARF, as I suspect this rnglist stuff and strategic
base address selection logic may carry quite some weight.

Picked a random file from the LLVM tree and built it with -O3 -gdwarf-5
-gsplit-dwarf with Clang and GCC ToT and some relevant stats:

Probably the lower bound for relocations is GCC's .rela.debug_line since it
uses the DWARFv3 line tables, without relocations for each directory and
file name, so basically one relocation per section (this build used
-ffunction-sections, so that amounts to one relocation per function):

 2.29Ki .rela.debug_line

GCC's other .rela.debug_*:

  291Ki .rela.debug_addr

   74Ki .rela.debug_rnglists

Clang's:

  7Ki .rela.debug_line
 45Ki .rela.debug_addr

Or, with the "prefer DW_AT_ranges, even when the range is contiguous":

  6Ki .rela.debug_addr


And with a custom form...

 2.20Ki .rela.debug_addr

And comparing the .rela.debug_line and .rela.debug_addr in this last
example - there's exactly one more debug_addr relocation than debug_line
relocation, the one global variable in this CU.
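
To turn the section sizes above into rough relocation counts, one can divide by the size of a relocation record. The sketch below assumes a 64-bit ELF target using RELA relocations (24 bytes per Elf64_Rela entry); the thread does not state the target, so treat that as an assumption, and the results as approximate because the reported sizes are rounded.

```python
# sizeof(Elf64_Rela): r_offset, r_info and r_addend, 8 bytes each.
# Assumes a 64-bit ELF target; the thread does not say which target was used.
ELF64_RELA_SIZE = 24


def approx_relocation_count(size_kib: float) -> int:
    """Estimate how many RELA entries fit in a section of the given size."""
    return round(size_kib * 1024 / ELF64_RELA_SIZE)


for label, size_kib in [
    ("GCC .rela.debug_line", 2.29),
    ("GCC .rela.debug_addr", 291),
    ("GCC .rela.debug_rnglists", 74),
    ("Clang .rela.debug_addr", 45),
    ("Clang .rela.debug_addr, prefer DW_AT_ranges", 6),
    ("Clang .rela.debug_addr, custom form", 2.20),
]:
    print(f"{label}: ~{approx_relocation_count(size_kib)} relocations")
```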

(I went to look a bit further at GCC's .debug_loclists.dwo but it seems
there's something about it that llvm-dwarfdump can't understand - it only
prints a handful of rather mangled location lists... not sure which
component (GCC, llvm-dwarfdump, or both) is getting things confused here -
oh, maybe some kind of DWARF extension for the "views" system, by the looks
of it)

* actually there's one bit left: DW_AT_low_pc - it can't use an
addrx+offset encoding. So for now I've implemented a mode in Clang where
DW_AT_ranges instead of DW_AT_low/high_pc is used even when a DIE has a
contiguous address range, so that a strategic base address can be used, reducing
the size of .{rela.}debug_addr a bit more, at the expense of a slightly
larger .debug_rnglists.dwo. I hope to propose/add some kind of addrx+offset
form to DWARFv6 to address this gap (& I've prototyped that in Clang too,
under a flag).
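
For illustration, this is roughly what a single contiguous range looks like when it is written as a range list against a shared base. The DW_RLE_* values are the DWARFv5 ones; the helper names are made up, and this is only a sketch of the byte layout, not Clang's implementation.

```python
from typing import Iterable, Tuple

# DWARFv5 range list entry kinds used here.
DW_RLE_end_of_list = 0x00
DW_RLE_base_addressx = 0x01
DW_RLE_offset_pair = 0x04


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_range_list(base_addr_index: int,
                      ranges: Iterable[Tuple[int, int]]) -> bytes:
    """Encode ranges as offsets from one .debug_addr entry (by index)."""
    out = bytearray([DW_RLE_base_addressx])
    out += uleb128(base_addr_index)
    for start, end in ranges:
        out.append(DW_RLE_offset_pair)
        out += uleb128(start)
        out += uleb128(end)
    out.append(DW_RLE_end_of_list)
    return bytes(out)


# A function occupying [base+0x10, base+0x40), base being .debug_addr index 3.
print(encode_range_list(3, [(0x10, 0x40)]).hex())
```

The list costs a few bytes in .debug_rnglists.dwo but needs no new .debug_addr entry or relocation, which is the trade described above.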

On Wed, Mar 10, 2021 at 11:23 PM Simon Marchi via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> On 2021-03-10 10:59 a.m., Jakub Jelinek via Dwarf-Discuss wrote:
> > Hi!
> >
> > We got a report today that 

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 9:38 PM Jakub Jelinek  wrote:

> On Wed, Mar 10, 2021 at 04:12:57PM -0800, David Blaikie via Dwarf-Discuss
> wrote:
> > On Wed, Mar 10, 2021 at 4:02 PM Cary Coutant  wrote:
> >
> > > > > So in the end the logical thing to do when encountering a
> > > > > DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> > > > > probably to go to the .debug_rnglists.dwo section, if there's one,
> > > > > disregarding the (inherited) DW_AT_rnglists_base.  If there isn't,
> then
> > > > > try the linked file's .debug_rnglists section, using
> > > > > DW_AT_rnglists_base.  If there isn't, then something is malformed.
> > >
> > > Looks reasonable to me. I think we need a new issue to clarify this in
> > > DWARF 6.
> > >
> >
> > Given that DWARFv5 isn't on by default in GCC yet & I think has a few
> more
>
> It is on by default.  But -gstrict-dwarf is not on by default.
>

Oh, it is - in a released version of the compiler, or only in development?

& you're proposing changing the behavior only under -gstrict-dwarf, rather
than in general? Any particular reason?

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 4:02 PM Cary Coutant  wrote:

> > > So in the end the logical thing to do when encountering a
> > > DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> > > probably to go to the .debug_rnglists.dwo section, if there's one,
> > > disregarding the (inherited) DW_AT_rnglists_base.  If there isn't, then
> > > try the linked file's .debug_rnglists section, using
> > > DW_AT_rnglists_base.  If there isn't, then something is malformed.
>
> Looks reasonable to me. I think we need a new issue to clarify this in
> DWARF 6.
>

Given that DWARFv5 isn't on by default in GCC yet & I think has a few more
things to do - would it be OK if we avoided that complexity if GCC's going
to move its rnglists to .dwo (except for the CU, per the spec) which should
match LLVM's behavior? Either way old GDBs won't handle GCC's DWARFv5 and
new GCCs will produce GDB-compatible DWARFv5.

I've posted an issue to clarify rnglists the way LLVM uses them - but if
preferred, I can amend that to support the fallback described here.
Hopefully we can avoid that variance/complexity for consumers, though.
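
For reference, the fallback quoted at the top of this message could be sketched like this. The function name and its arguments are hypothetical, chosen only to mirror the three steps in the quoted text; this is not how any particular consumer is implemented.

```python
from typing import Optional, Tuple


def pick_rnglists_section(has_dwo_rnglists: bool,
                          has_linked_rnglists: bool,
                          rnglists_base: Optional[int]
                          ) -> Tuple[str, Optional[int]]:
    """Choose where a DW_FORM_rnglistx in a split unit should be resolved.

    Returns (section name, base to use). A base of None means the
    inherited DW_AT_rnglists_base is disregarded.
    """
    # Step 1: prefer .debug_rnglists.dwo, ignoring any inherited base.
    if has_dwo_rnglists:
        return (".debug_rnglists.dwo", None)
    # Step 2: fall back to the linked file's .debug_rnglists, using the
    # DW_AT_rnglists_base inherited from the skeleton unit.
    if has_linked_rnglists and rnglists_base is not None:
        return (".debug_rnglists", rnglists_base)
    # Step 3: neither is available, so the input is malformed.
    raise ValueError("DW_FORM_rnglistx with no usable range list section")


print(pick_rnglists_section(True, True, 12))
print(pick_rnglists_section(False, True, 12))
```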

Jakub: Any thoughts on this?

- Dave


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 2:31 PM Michael Eager  wrote:

> On 3/10/21 1:28 PM, David Blaikie wrote:
> > On Wed, Mar 10, 2021 at 1:21 PM Cary Coutant wrote:
> >
> >  > Speculation beyond the original question:
> >  > Given that it's a pretty common/core feature of a debugger to
> > call functions, perhaps a start would be some way for the producer
> > to communicate, via DWARF, that it has changed the ABI of a function
> > and so the consumer should not try to synthesize calls to it.
> > Providing much more functionality than that I think will amount to
> > encoding the ad-hoc ABIs that compilers create in these situations
> > (possible, but a fairly non-trivial proposal/enhancement to DWARF)
> >
> > I believe that's what DW_AT_calling_convention and DW_CC_nocall are
> > for (Section 3.3.1.1).
> >
> >
> > Oh, sweet - yep, that looks like the ticket indeed.
> >
> > "If the value of the calling convention attribute is the constant
> > DW_CC_nocall, the subroutine does not obey standard calling conventions,
> > and it may not be safe for the debugger to call this subroutine."
>
> All that says is that you can't call the function.  It doesn't
> describe how to call functions with non-ABI calling conventions.
>

Yep, but as I was saying, it seems like a minimum that - since we can't
describe how to call non-ABI-conforming functions, the least we could do is
flag them so the DWARF consumer can differentiate and not accidentally call
functions in a way that would produce problems.
So this'd at least address outright bugs/bogus behavior in DWARF consumers,
instead of calling a function expecting to use the ABI, they could tell the
user that that function can't be called.
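
A minimal sketch of that consumer-side check, assuming the subprogram DIE has already been parsed into a plain dict of attribute name to value (that representation and the function name are made up for illustration). The DW_CC_* values are the ones from the DWARF standard.

```python
from typing import Optional

# Calling convention codes from the DWARF standard.
DW_CC_normal = 0x01
DW_CC_program = 0x02
DW_CC_nocall = 0x03


def call_refusal_reason(subprogram: dict) -> Optional[str]:
    """Return a message if the debugger should refuse to call, else None."""
    # A missing attribute is treated the same as DW_CC_normal.
    cc = subprogram.get("DW_AT_calling_convention", DW_CC_normal)
    if cc == DW_CC_nocall:
        name = subprogram.get("DW_AT_name", "<unnamed>")
        return (f"{name} does not follow the standard calling convention "
                "(DW_CC_nocall); refusing to call it")
    return None


print(call_refusal_reason({"DW_AT_name": "helper",
                           "DW_AT_calling_convention": DW_CC_nocall}))
print(call_refusal_reason({"DW_AT_name": "main"}))
```

This only tells the user why the call was refused; it says nothing about how such a function could be called.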

But, yeah, the bigger puzzle is still unsolved: if consumers want to be
able to call such functions, it'll take a fairly complicated/expensive
proposal, I expect.

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 2:36 PM Cary Coutant  wrote:

> On Wed, Mar 10, 2021 at 1:27 PM David Blaikie  wrote:
> >
> > On Wed, Mar 10, 2021 at 1:16 PM Cary Coutant  wrote:
> >>
> >> > But what about the DW_AT_ranges on the skeleton CU when using split
> DWARF?
> >> >
> >> > Are you suggesting that both LLVM and GCC's emission is incorrect -
> and that it's not possible to use rnglistx in the skeleton CU (instead you
> must use sec_offset for DW_AT_ranges on the skeleton CU)? (& that there's
> no way to refer to range lists in the .o (debug_rnglists) from the .dwo -
> all ranges in the split full unit must be in debug_rnglists.dwo?)
> >>
> >> If you've moved range lists over to the dwo, having DW_AT_ranges in
> >> the skeleton CU would be pointless — the consumer would still have to
> >> go find the dwo to get the ranges.
> >>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> overhead (doesn't have all the nice rnglist encodings for instance - nor
> can it use debug_addr, and if it did it'd still be duplicate with the CU
> ranges wherever they were).
> >
> > DWARFv5 says:
> >
> > "A skeleton compilation unit may have additional attributes, which are
> the same as for conventional compilation unit entries except as noted, from
> among the following:
> >   2. Either a DW_AT_low_pc and DW_AT_high_pc pair of attributes or a
> DW_AT_ranges attribute."
> >
> > and
> >
> > "The following attributes are not part of a split full compilation unit
> entry but instead are inherited (if present) from the corresponding
> skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges, ..."
> >
> > Even before the rnglist move, this still disallowed using
> DW_AT_low/high_pc with addrx encodings from the split full unit, instead
> requiring them to be in the skeleton unit. I think this is the right call
> (& guess it was motivated by this use case) to make for cheap unit lookup.
>
> Hmm. I'm thinking that wording is from before the rnglist move, and
> did not get updated properly. Or maybe it was intentional, but not
> thought all the way through. Forcing DW_AT_ranges to be in the
> skeleton CU when the actual range lists are in the dwo is silly.
> DW_AT_low_pc/_high_pc in the skeleton makes sense, though.
>

Why one but not the other? Pre-range move, we could've still had smaller
skeleton CUs (& thus smaller objects/executables) if low/high/ranges were
all in the split full unit - using addrx and the v4 fission prototype style
ranges_base-relative offsets.

Yeah, the wiki write up for Fission ( https://gcc.gnu.org/wiki/DebugFission
) doesn't mention the motivation for DW_AT_low_pc, DW_AT_high_pc,
DW_AT_ranges, and DW_AT_stmt_list to be in the skeleton unit. stmt_list
makes sense because it needs to be relocated to refer to the .debug_line
contribution wherever it's linked in. (oh, I guess there's another use case
for low/high/ranges in the .o - it means you can symbolize (without inline
stack frames) without dwo files present - and with two level line tables
you could do full symbolization without dwo files)

In any case I disagree that forcing ranges to be on the skeleton CU is
silly - it's important to use cases I care about/support (& I think is
pretty generally a good idea, but wouldn't strenuously object if someone
really wanted the freedom to put it in the split full unit if they really
want to/have a use case).

> But you've got me thinking. Perhaps we should have *both*
> .debug_rnglists *and* .debug_rnglists.dwo. The former could contain
> only the range list needed for the DW_AT_ranges attribute in the
> skeleton CU, while the latter would contain the range lists for any
> DW_AT_ranges attributes in the dwo sections.
>

That's what LLVM emits today, so I'm certainly in favor of it. (& that got
me thinking about those relationship diagram appendices - and indeed "Figure
B.2: Split DWARF section relationships" is another inconsistency, leaning
towards your description, since it does not include a .debug_rnglists
non-dwo section in the diagram)

Which comes back to the "what does rnglists_base on a skeleton .o file
mean, if anything" - should the ranges on the skeleton CU always use
sec_offset (never rnglistx), or can you use rnglists_base on the skeleton
CU , rnglistx in both skeleton and split full - and the rnglistx in the
split full unit always refer to rnglists.dwo no matter what's going on in
the skeleton - and the rnglistx in the skeleton uses the rnglists_base?
That's what LLVM does today and seems consistent/general.
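
As a sketch of the skeleton-side lookup only: with rnglists_base pointing at the offsets table that follows the section header, an rnglistx index selects a table entry, and that entry is an offset relative to the same base. The code assumes 32-bit DWARF (4-byte offsets) and little-endian data, and the function name is made up for illustration.

```python
import struct


def resolve_rnglistx(section: bytes, rnglists_base: int, index: int) -> int:
    """Return the section offset of range list number `index`.

    Assumes 32-bit DWARF (4-byte table entries) and little-endian data.
    rnglists_base is the offset of the first entry of the offsets table.
    """
    (relative,) = struct.unpack_from("<I", section, rnglists_base + 4 * index)
    # Table entries are relative to the start of the offsets table.
    return rnglists_base + relative


# Toy section: a 12-byte header (unit_length, version, address_size,
# segment_selector_size, offset_entry_count), two table entries, then
# placeholder bytes standing in for the lists themselves.
header = struct.pack("<IHBBI", 0, 5, 8, 0, 2)
table = struct.pack("<II", 8, 11)
toy_section = header + table + b"\x00" * 8

print(resolve_rnglistx(toy_section, len(header), 0))
print(resolve_rnglistx(toy_section, len(header), 1))
```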

I'd say the fix to the wording would be to say that DW_AT_rnglists_base is
not inherited by the split full CU from the skeleton CU, and split full CUs
effectively always have an implicit rnglists_base of . (plus fixing the block diagram)


> If you (LLVM) are already choosing not to use 

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 1:27 PM Jakub Jelinek  wrote:

> On Wed, Mar 10, 2021 at 01:16:24PM -0800, Cary Coutant wrote:
> > > But what about the DW_AT_ranges on the skeleton CU when using split
> DWARF?
> > >
> > > Are you suggesting that both LLVM and GCC's emission is incorrect -
> and that it's not possible to use rnglistx in the skeleton CU (instead you
> must use sec_offset for DW_AT_ranges on the skeleton CU)? (& that there's
> no way to refer to range lists in the .o (debug_rnglists) from the .dwo -
> all ranges in the split full unit must be in debug_rnglists.dwo?)
> >
> > If you've moved range lists over to the dwo, having DW_AT_ranges in
> > the skeleton CU would be pointless — the consumer would still have to
> > go find the dwo to get the ranges.
>
> My current patch to start using .debug_rnglists.dwo in GCC for -gdwarf-5
> -gstrict-dwarf
> will emit DW_AT_ranges in the .debug_info.dwo only (both in the CU and
> other
> DIEs there),


That seems unfortunate to me - I think having the CU ranges readily
accessible without loading dwo files is of significant value (& using
aranges to achieve this is a lot of size overhead/duplication - works
against the benefits of Split DWARF).

(if this is a first step with intended refinement/improvement later - sure,
I totally appreciate incremental development. Certainly LLVM's DWARFv5
support probably looked really weird along the way)


> but if I need a DW_AT_low_pc for the base address too, do you think
> it is ok to keep it in the DW_TAG_skeleton_unit with DW_FORM_addr?
>

I think it would be OK. I don't think there's anything about addr_base that
/requires/ all address type attributes to use addrx.

I'd still probably use addrx (even though keeping it in the skeleton unit)
though, since you'll probably benefit from having the address in the
address pool for other reasons (such as in rnglists and loclists) - so you
would have fewer relocations by having it once in debug_addr, rather than
once in the CU and then again in debug_addr for use elsewhere. (while the
exact address might not be used in rnglist/loclist - it can be a good base
address to use, which reduces the number of debug address relocations and
the size of debug_addr)


> If I had to move it to .debug_info.dwo DW_TAG_compile_unit, it would need
> to
> be DW_FORM_addrx there and in the end would require larger size.
>

Only if the address wasn't used elsewhere.


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 1:21 PM Cary Coutant  wrote:

> > Speculation beyond the original question:
> > Given that it's a pretty common/core feature of a debugger to call
> functions, perhaps a start would be some way for the producer to
> communicate, via DWARF, that it has changed the ABI of a function and so
> the consumer should not try to synthesize calls to it. Providing much more
> functionality than that I think will amount to encoding the ad-hoc ABIs
> that compilers create in these situations (possible, but a fairly
> non-trivial proposal/enhancement to DWARF)
>
> I believe that's what DW_AT_calling_convention and DW_CC_nocall are
> for (Section 3.3.1.1).
>

Oh, sweet - yep, that looks like the ticket indeed.

"If the value of the calling convention attribute is the constant
DW_CC_nocall, the subroutine does not obey standard calling conventions,
and it may not be safe for the debugger to call this subroutine."

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 1:16 PM Cary Coutant  wrote:

> > But what about the DW_AT_ranges on the skeleton CU when using split
> DWARF?
> >
> > Are you suggesting that both LLVM and GCC's emission is incorrect - and
> that it's not possible to use rnglistx in the skeleton CU (instead you must
> use sec_offset for DW_AT_ranges on the skeleton CU)? (& that there's no way
> to refer to range lists in the .o (debug_rnglists) from the .dwo - all
> ranges in the split full unit must be in debug_rnglists.dwo?)
>
> If you've moved range lists over to the dwo, having DW_AT_ranges in
> the skeleton CU would be pointless — the consumer would still have to
> go find the dwo to get the ranges.
>
> For the case you suggested where it would be useful to keep the range
> list for the CU in the .o file, I think .debug_aranges is what you're
> looking for.
>

aranges has been off by default in LLVM for a while - it adds a lot of
overhead (doesn't have all the nice rnglist encodings for instance - nor
can it use debug_addr, and if it did it'd still be duplicate with the CU
ranges wherever they were).

DWARFv5 says:

"A skeleton compilation unit may have additional attributes, which are the
same as for conventional compilation unit entries except as noted, from
among the following:
  2. Either a DW_AT_low_pc and DW_AT_high_pc pair of attributes or a
DW_AT_ranges attribute."

and

"The following attributes are not part of a split full compilation unit
entry but instead are inherited (if present) from the corresponding
skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges, ..."

Even before the rnglist move, this still disallowed using DW_AT_low/high_pc
with addrx encodings from the split full unit, instead requiring them to be
in the skeleton unit. I think this is the right call (& guess it was
motivated by this use case) to make for cheap unit lookup.

- Dave


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 9:51 AM Michael Eager via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> On 3/9/21 7:05 AM, Andrew Cagney via Dwarf-Discuss wrote:
> > Part of a typical Application Binary Interface is to specify the
> > function calling convention.  Several uses are:
> >
> > - ensuring function calls across interface boundaries work (function
> > in one object calls function in second object)
> > - the debugger supplementing the debug information describing the
> > location of parameters
> > - the debugger implementing inferior function calls
> >
> > Typically calls both between and within object files (DWARF
> > compilation unit) follow the ABI (with exceptions for things like
> > __mul, but good ABIs even defined those).
> >
> > Technically, however, only functions visible via an interface need
> > comply with the ABI.  This means that:
> >
> > - for simple objects, local functions; and
> > - with link-time-optimization, everything except library interface
> functions
> >
> > are fair game for ABI non-compliant call optimizations.
> >
> > Is anyone aware of a compiler doing this (I figure with LTO there's a
> > strong incentive)?  And if so, how is this described to the debugger.
> > The ABI / calling-convention is no longer on hand for filling in the
> > blanks.
>
> This is an instance of a more general issue: debugging optimized code.
>
> DWARF has ways to describe some of the optimizations which a compiler
> can perform, such as inlining.  This is because at least some of the
> semantics of the process of inlining are well defined.  Even then, a
> debugger calling an inlined function is generally not possible, unless
> there is a non-inlined instance.
>

Yep, and DWARF makes it clear to the consumer that this situation has
occurred - the consumer can provide an informative error message to the
user about the function having been inlined & so it is not callable.

> There are a range of optimizations which a compiler can perform as
> long as the program performs as if the optimization was not done.
> These include eliminated code, merged code, duplicated code, non-ABI
> calls, non-ABI returns, and lots more.  DWARF describes the
> correspondence between source code and executable instruction, the
> locations of arguments and variables, and how to walk the stack.
>

Fair enough - sounds like we're in agreement about the original question:
Compilers do this (create non-ABI conforming functions) [and DWARF doesn't
currently have a way to describe how to call such a function].

GCC and Clang/LLVM do describe these situations in somewhat different ways
- but I don't think either would provide a strong basis on which to
synthesize correct calls to these functions from a DWARF consumer.


> This seems to be adequate to debug many optimized programs, even if it
> is not a complete description of the optimizations.
>
> If you want a debugger to be able to call a function which has been
> merged (partially or completely) into another function, or which has
> a streamlined non-ABI compliant call/return sequence, DWARF does not
> provide this information.  There might be a way to describe this
> piecemeal, addressing this one instance of the general issue of how
> to debug optimized code.  Or perhaps there is a more general way to
> describe optimizations.


Speculation beyond the original question:
Given that it's a pretty common/core feature of a debugger to call
functions, perhaps a start would be some way for the producer to
communicate, via DWARF, that it has changed the ABI of a function and so
the consumer should not try to synthesize calls to it. Providing much more
functionality than that I think will amount to encoding the ad-hoc ABIs
that compilers create in these situations (possible, but a fairly
non-trivial proposal/enhancement to DWARF)

This won't be complete, though - since I expect debuggers call functions
that don't even have any DWARF description, by using the mangled name to
determine the ABI - so compilers would probably still need some work to
modify the function names to make them not accidentally communicate how
they are to be called. (& maybe if that were fixed, there would be no need
for a DWARF feature to flag such functions - instead it could be
communicated in the mangling)

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-10 Thread David Blaikie via Dwarf-Discuss
On Wed, Mar 10, 2021 at 12:32 PM Cary Coutant  wrote:

> >> We got a report today that GCC even for -gdwarf-5 -gsplit-dwarf uses
> >> .debug_rnglists section + DW_AT_ranges + DW_AT_low_pc +
> DW_AT_rnglists_base
> >> attributes in the DW_TAG_skeleton_unit (and then some DW_AT_ranges in
> >> .debug_info.dwo that use DW_FORM_rnglistx, but no .debug_rnglists.dwo
> >> section).
>
> The original split DWARF proposal, and the prototype implementation
> based on DWARF-4 in GCC did not use .debug_rnglists.dwo (this was
> before .debug_ranges was converted to .debug_rnglists by issue
> 160123.1), so we used DW_AT_ranges_base in the skeleton CU so that dwo
> files could use DW_AT_ranges with a non-relocatable offset relative to
> that base.
>
> With issues 160123.1 (Unify Location Lists and Range Lists) and
> 160714.1 (Location List and Range List Sections
> Improvement/Enhancement), we replaced .debug_ranges with
> .debug_rnglists, and made it possible to place range lists into
> .debug_rnglists.dwo when using split DWARF.
>
> DW_AT_rnglists_base is useful to reduce the number of relocations in a
> non-split-DWARF object file. It's not necessary when placing range
> lists into the dwo file, but if it were to be used there, it would not
> make sense to put it in the skeleton CU. It's pointless in a split
> DWARF situation, since no relocations are necessary for
> DW_FORM_rnglistx.
>

But what about the DW_AT_ranges on the skeleton CU when using split DWARF?

Are you suggesting that both LLVM and GCC's emission is incorrect - and
that it's not possible to use rnglistx in the skeleton CU (instead you must
use sec_offset for DW_AT_ranges on the skeleton CU)? (& that there's no way
to refer to range lists in the .o (debug_rnglists) from the .dwo - all
ranges in the split full unit must be in debug_rnglists.dwo?)


> It sounds like the DWARF-5 implementation of split DWARF in GCC still
> has some residuals from the prototype based on DWARF-4. That's likely
> due to my retirement and the move of the rest of the Google compiler
> team over to LLVM.
>
> > I think the spec is ambiguous here:
> >
> > 3.1.3 "The following attributes are not part of a split full compilation
> unit entry but instead are inherited (if present) from the corresponding
> skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
> DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
> and DW_AT_rnglists_base."
> >
> > So, on the one hand, if rnglists_base is inherited that implies that
> rnglists_base on a skeleton CU means rnglists.dwo is not used. (so the only
> way to use rnglists.dwo is to not have rnglists_base on the skeleton CU and
> you don't have it on the split full CU and /that's/ how rnglistx in split
> full unit refer to rnglists.dwo instead of debug_rnglists in the linked
> executable)
>
> Yes, it would be possible to keep range lists in .debug_rnglists (and
> in the .o) under split DWARF, and then DW_AT_rnglists_base in the
> skeleton CU would make sense. But that's not the intent.
>
> -cary
>


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-09 Thread David Blaikie via Dwarf-Discuss
On Tue, Mar 9, 2021 at 3:40 PM Jakub Jelinek  wrote:

> On Tue, Mar 09, 2021 at 03:22:35PM -0800, David Blaikie wrote:
> > > So, when the consumer evaluates DW_OP_GNU_parameter_ref, it handles it
> > > similarly to DW_OP_entry_value, unwinds to caller if it can identify
> it,
> > > and just looks up if some value is specified for it in that particular
> > > caller.
> > >
> >
> > Could you help me understand more how DW_OP_GNU_parameter_ref
> > works/differently from DW_OP_entry_value?
>
> DW_OP_entry_value refers to a register or memory in which a parameter
> is passed.  When a parameter is not passed at all, there is no register or
> memory to which it can refer to.  So, DW_OP_GNU_parameter_ref instead
> refers to the DW_TAG_formal_parameter DIE and the consumer needs to find
> a DW_TAG_call_site_parameter that refers to the same DIE.
>

Ah, OK. Hmm - do you have different call_site_parameters for registers
versus parameters? Or I guess a call_site_parameter without a
DW_AT_location and only a DW_AT_call_value?
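
To make the lookup Jakub describes concrete, here is a toy model. Each call site parameter is a dict with a hypothetical "param_ref" key (the DIE offset of the DW_TAG_formal_parameter it refers to) and a "call_value" key (the raw expression bytes); these keys are stand-ins for illustration, not attribute names GCC is claimed to emit.

```python
from typing import List, Optional


def find_call_value(call_site_params: List[dict],
                    formal_param_die: int) -> Optional[bytes]:
    """Find the value expression the caller recorded for one parameter.

    formal_param_die is the DIE offset that DW_OP_GNU_parameter_ref names.
    Returns None if this caller recorded nothing for that parameter.
    """
    for param in call_site_params:
        if param.get("param_ref") == formal_param_die:
            return param.get("call_value")
    return None


# Two call site parameters in the caller, keyed by the callee's DIEs.
params = [
    {"param_ref": 0x40, "call_value": b"\x31"},
    {"param_ref": 0x50, "call_value": b"\x35"},
]
print(find_call_value(params, 0x50))
print(find_call_value(params, 0x60))
```

The unwind to the caller and the choice of call site happen before this step and are not modelled here.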

> > > I think GCC doesn't do that, instead it would if considered beneficial
> > > copy the function to a non-exported one and optimize away the
> parameters in
> > > there (etc.).
> >
> >
> > Ah, OK - in which case there would be no DWARF for the copy? And the
> > original function would look as though it were "optimized away" (ie: not
> > have any DW_AT_low/high_pc, etc)?
>
> No, there is DWARF for the copy.  The original user function is the
> abstract origin and then it has two (or more) DW_TAG_subprogram DIEs that
> refer to that (and refer to DW_TAG_formal_parameter and DW_TAG_variable
> etc.) in it.
>

Hmm - is that "conforming" (I'm all for "DWARF provides some tools and
here's a way we can use them to describe this situation") DWARF? I would've
thought that there could only be one concrete instance of an abstract
definition.

And given that, could the user call this function from their debugger, and
how would the debugger get the ABI correct? (let's say there were no other
callers in other files - and we used gc-sections to optimize away that
original external copy, perhaps - if that makes a difference)

I guess since GCC didn't use the original symbol name, maybe the debugger
wouldn't consider the modified copy to be a valid target to call the
original function even though the DWARF says this is an instance of the
original function?

To come back to the original question - are there gaps between DWARF and
the ABI? I think the answer is yes (for internalized functions), certainly
for LLVM but I think for GCC too. At least in terms of what's
guaranteed/explicitly communicated - we might have to formalize some things
to make clear what conclusions can be reached when certain parameter
locations are used. As it is today, DWARF doesn't guarantee that the
DW_AT_location of a variable has any bearing on the way the function is
called - even a DW_AT_location that doesn't use a location list only has to
be valid after the prologue - so you can't guarantee that the location can
be used for calling the function. (The simplest example of this is at
-O0: you get nothing about how to call the function - but this DWARF would
be ambiguous with an optimized build that took the parameter from some
non-ABI register in the prologue and moved it into fbreg.) Is that
correct? Have I misunderstood something?

- Dave


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-09 Thread David Blaikie via Dwarf-Discuss
On Tue, Mar 9, 2021 at 11:52 AM Jakub Jelinek  wrote:

> On Tue, Mar 09, 2021 at 11:43:54AM -0800, David Blaikie wrote:
> > Thanks for the details! So in this case GCC changes the ABI of foo(int x,
> > int y) to be equivalent to foo(int y) and the parameter description of
> > 'y'
>
> No, it is actually equivalent to foo(void)


Ah, I see - the used parameter is constant, so it doesn't need to be passed
(it was constant propagated from all callers to the callee) and the varied
parameter was unused so it didn't need to be passed either.


> but DW_TAG_call_site_parameter
> in those cases holds the value the optimized away parameter would have if
> it would be passed.
> So, when the consumer evaluates DW_OP_GNU_parameter_ref, it handles it
> similarly to DW_OP_entry_value, unwinds to caller if it can identify it,
> and just looks up if some value is specified for it in that particular
> caller.
>

Could you help me understand more how DW_OP_GNU_parameter_ref
works/differently from DW_OP_entry_value?

> > Does GCC do anything like the LLVM optimization when the function is
> > externally visible, but some callers are visible to the optimizer - and
> > the compiler concludes that since the parameter is unused inside the
> > function implementation those callers can be modified to pass effectively
> > garbage in that parameter? And what sort of DWARF does GCC use to describe
> > that?
>
> I think GCC doesn't do that, instead it would if considered beneficial
> copy the function to a non-exported one and optimize away the parameters in
> there (etc.).


Ah, OK - in which case there would be no DWARF for the copy? And the
original function would look as though it were "optimized away" (ie: not
have any DW_AT_low/high_pc, etc)?


> Also note that if the callee is exported, there is the question
> if it can be semantically interposed (for GCC yes by default with
> -fpic/-fPIC unless playing e.g. with visibility attributes) and in that
> case you don't know if the parameter will be unused there.
>

Ah, yeah, this only applies in the non-interposable case.

- Dave


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-09 Thread David Blaikie via Dwarf-Discuss
On Tue, Mar 9, 2021 at 11:29 AM Jakub Jelinek  wrote:

> On Tue, Mar 09, 2021 at 11:16:01AM -0800, David Blaikie via Dwarf-Discuss
> wrote:
> > void f1(int i) { }
> >
> > to include a DW_AT_location with fbreg, nothing about how the ABI
> > represents 'i' - so that would be an ABI gap.
> >
> > In the cases where the compiler does modify any ABI-relevant properties,
> > how would the DWARF consumer know that /that/ location represents the ABI
> > calling convention, but the above fbreg location does not?
> >
> > At least for Clang - the two cases of DAE described below (either
> > modifying a parameter to be unused but present (because the signature
> > must be maintained for external callers) and truly removing the parameter
> > (thus shifting other parameters down)) would be indistinguishable in any
> > DWARF I can imagine - in both cases the parameter would have no
> > DW_AT_location. Any idea how GCC handles that, if it does that kind of
> > optimization?
>
> If GCC optimizes away some parameter (of course that is possible only for
> functions not exported from TUs or on non-exported copies of the exported
> functions), e.g. with
> static __attribute__((noinline)) int foo (int x, int y) { return x; }
> int bar (void) { return foo (3, 17) + foo (3, 18) + foo (3, 19); }
> it will emit:
>         .uleb128 0xa    # (DIE (0xb9) DW_TAG_formal_parameter)
>         .long   0x85    # DW_AT_abstract_origin
>         .byte   0x3     # DW_AT_const_value
> for the x case (where the parameter has been optimized away and is known to
> be 3 in all cases) and for y uses a GNU extension (that has been added too
> late in the DWARF5 cycle so it was too late to propose it for DWARF5)
>         .uleb128 0x9    # (DIE (0xad) DW_TAG_formal_parameter)
>         .long   0x8d    # DW_AT_abstract_origin
>         .uleb128 0x6    # DW_AT_location
>         .byte   0xfa    # DW_OP_GNU_parameter_ref
>         .long   0x8d
>         .byte   0x9f    # DW_OP_stack_value
> which has a reference to the formal parameter and value can be (with luck)
> found in DW_TAG_call_site_parameter in the callers.
>

Thanks for the details! So in this case GCC changes the ABI of foo(int x,
int y) to be equivalent to foo(int y) and the parameter description of 'y'
using DW_OP_GNU_parameter_ref says something like "look at where the first
ABI parameter would be stored"? So a DWARF consumer should read the
locations of the parameters - see the constant value of the first parameter
and conclude that the first parameter is omitted from the ABI? Then read
the second parameter and see that it's stored in the ABI first parameter -
and then conclude that since no other parameters are mentioned, only that
first parameter slot is used (to pass the second source parameter)/should
be passed?

Does GCC do anything like the LLVM optimization when the function is
externally visible, but some callers are visible to the optimizer - and the
compiler concludes that since the parameter is unused inside the function
implementation those callers can be modified to pass effectively garbage in
that parameter? And what sort of DWARF does GCC use to describe that?


Re: [Dwarf-Discuss] compilers generating ABI non-compliant function calls?

2021-03-09 Thread David Blaikie via Dwarf-Discuss
Frank:

FWIW, gcc does not leave ABI-dependent gaps in the DWARF generated for
function parameters.  First class location lists are given, whether or
not they are in the ABI-governed locations, or whether they've been
moved somewhere else, or whether they've been optimized out so that a
consumer must recompute it somehow, or whether they exist at all.

I don't think it's valid for a DWARF consumer to use the DW_AT_location of
a parameter to determine the calling convention, though. DW_AT_locations
for parameters only need to be valid after the prologue, so they don't
necessarily describe how the parameter was passed to the function (for
instance - LLVM and GCC at -O0 describe parameters as being on the stack,
not in registers, even though the calling convention dictates that the
parameters be passed in registers).

eg: clang and gcc both compile this:
void f1(int i) { }

to include a DW_AT_location with fbreg, nothing about how the ABI
represents 'i' - so that would be an ABI gap.
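
To make the fbreg point concrete, here is a rough Python sketch of what a consumer does with such a location: decode the SLEB128 operand of DW_OP_fbreg (opcode 0x91) and add it to the frame base. The offset -20 and the frame base value are illustrative assumptions, not output from any particular compiler, and the helper names are made up. Note that nothing in the result says which register 'i' arrived in.

```python
DW_OP_fbreg = 0x91  # opcode value from the DWARF standard


def decode_sleb128(data, pos=0):
    """Decode one signed LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:  # sign bit of the final byte
                result -= 1 << shift
            return result, pos


def eval_fbreg(expr, frame_base):
    """Evaluate a location that consists of a single DW_OP_fbreg."""
    if expr[0] != DW_OP_fbreg:
        raise ValueError("only DW_OP_fbreg is handled in this sketch")
    offset, _ = decode_sleb128(expr, 1)
    return frame_base + offset


# Hypothetical location "DW_OP_fbreg -20" with a made-up frame base.
address_of_i = eval_fbreg(bytes([0x91, 0x6C]), 0x7FFC1000)
```

The computed address is only meaningful after the prologue has stored the parameter there, which is the gap being described.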

In the cases where the compiler does modify any ABI-relevant properties,
how would the DWARF consumer know that /that/ location represents the ABI
calling convention, but the above fbreg location does not?

At least for Clang - the two cases of DAE described below (either modifying
a parameter to be unused but present (because the signature must be
maintained for external callers) and truly removing the parameter (thus
shifting other parameters down)) would be indistinguishable in any DWARF I
can imagine - in both cases the parameter would have no DW_AT_location. Any
idea how GCC handles that, if it does that kind of optimization?

On Tue, Mar 9, 2021 at 8:36 AM Michael Eager via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> On 3/9/21 7:13 AM, Frank Ch. Eigler via Dwarf-Discuss wrote:
> > As I understand it, the location of *function return values* is
> > however a gap in DWARF, and a consumer tool must resort to ABI specs.
> > (Thus the elfutils dwfl_module_return_value_location() function.)  I'm
> > sure there's a Reason for this, but having worked on a consumer, it'd
> > be handy if DWARF did explicitly identify the return value location
> > too.
>
> DWARF does not duplicate information which is documented in the ABI
> or in other information which is shared by compilers and debuggers.
> For example, DWARF does not describe the calling convention for a
> function.  Producers and consumers are expected to know this info.
>
> The rationale is that duplicating shared ABI knowledge would greatly
> increase the size of a debug file, while not improving the ability
> to debug a program.
>
> For example, DWARF describes the arguments to a function.  It doesn't
> describe the calling convention, which registers are preserved, which
> are clobbered, or anything else which is specified by the ABI.  In
> most ABIs, the location of a function return value is similarly
> constrained.  If there are multiple calling conventions, this is
> identified to allow a debugger to generate a valid call to a function.
>
> DWARF only contains information which describes what a compiler
> generates which cannot be unambiguously determined by knowledge of the
> ABI.  A limited exception is the CFI, which in many cases mirrors the
> ABI.
>
> If there are occasions when a compiler might place a function return
> value in a variety of different locations, not constrained by the
> ABI, a DWARF attribute might be useful.
>

Essentially, with LTO, everything can be (& is) changed by the compiler -
because it can see all callers to a function. Basically the ABI doesn't
hold for anything except functions that need to be called from outside the
binary (in some cases, that means only "main" conforms to the ABI).

A simple one I know of for LLVM is in the optimizations Argument Promotion
and Dead Argument Elimination.

Argument Promotion can change a pointer parameter to a value parameter, for
instance - how should a compiler describe this situation in the DWARF? The
source code says the function parameter is a pointer, but if the debugger
tries to call that function with a pointer it will be quite broken - the
pointer must be dereferenced before the function is called. (but what
happens if the compiler optimized away a null test for that parameter?
Because the compiler determined all callers were passing non-null
parameters - so the user might try to run a function call with the
parameter being null, but that could crash, for instance)

With Dead Argument Elimination the compiler can determine that certain
parameters are unused - this can happen in two flavors:
1) the easy case, if the function ABI must be preserved: Some callers might
be modified to not pass a certain parameter because it's known to be unused
inside the function. This generally means that the variable can have no
valid location inside the function (except using call_site_parameters) - so
I think that case has a pretty clear answer today
2) if the ABI doesn't have to be maintained, 

Re: [Dwarf-Discuss] Retrieving variables, function address using dwarf

2021-03-05 Thread David Blaikie via Dwarf-Discuss
On Fri, Mar 5, 2021 at 8:28 PM Archana Deshmukh via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Hello,
>
> I need to read the address of local variable, global variable, function
> name and function arguments from the process.
>
> For global variables , I read the address "55b51afea000" from
> /proc/<pid>/maps file. I use DW_OP_addr parameter to retrieve the address.
> 55b51afea000 + DW_OP_addr gives me the address of global variable.
>
> I need to read the stack segment, heap.
>

As you found with the global variables, DWARF doesn't provide you with the
means to read things, but does tell you where the things are and then you
go find out how to read them with OS features to read process memory, etc.
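
As a rough sketch of the global variable case in Python: parse one line of the maps file to get a load address, then add the DW_OP_addr operand to it. The assumptions here are loud ones: the binary is position independent, its first mapping corresponds to link-time address 0, the maps line has the usual five leading fields, and the operand 0x201010 is invented for the example. The function names are hypothetical too.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into its numeric parts."""
    fields = line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
    }


def runtime_address(load_bias, dw_op_addr_operand):
    """DWARF gives the link-time address; the OS mapping gives the bias."""
    return load_bias + dw_op_addr_operand


first = parse_maps_line("55b51afea000-55b51afeb000 r-xp 00000000 fd:00 5902563")
# 0x201010 stands in for a DW_OP_addr operand read from a variable's DIE.
global_address = runtime_address(first["start"], 0x201010)
```

Actually reading the bytes at that address is then an OS matter (ptrace, /proc/<pid>/mem and so on), not something DWARF describes.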

But for the stack, for instance - if you read the pc register, then use
that to look through the DWARF to find the DW_TAG_subprogram with either
DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges that cover the address the pc
points to, then look at the frame_base in the DWARF to figure out which
register you should look at to figure out where the frame starts - then you
can use that to figure out how to interpret the DW_AT_locations of any
local DW_TAG_variables that are covered by any scope that includes that pc.
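
The first step of that walk (pc to DW_TAG_subprogram) might look something like the following in Python. It's only a sketch: the subprogram records are hand-written dicts standing in for parsed DIEs, the addresses are invented, and DW_AT_low_pc/DW_AT_high_pc is treated as just a one-element range list.

```python
def covers(subprogram, pc):
    """True if pc falls inside any half-open [low, high) range."""
    return any(low <= pc < high for low, high in subprogram["ranges"])


def find_subprogram(subprograms, pc):
    """Return the first subprogram whose ranges cover pc, else None."""
    for sp in subprograms:
        if covers(sp, pc):
            return sp
    return None


# Hypothetical data: f2 is contiguous, f3 is split into two ranges.
subprograms = [
    {"name": "f2", "ranges": [(0x401000, 0x401010)]},
    {"name": "f3", "ranges": [(0x401010, 0x401018), (0x402000, 0x402020)]},
]

hit = find_subprogram(subprograms, 0x402010)
```

Once the subprogram is known, its DW_AT_frame_base says how to compute the base that fbreg-style variable locations are relative to.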

Heap similarly - you'll start at the global or local variables, and then
look at the data they point to, and follow the description of the
variable's value, query the memory, follow the pointers, etc...


> Is there any way to read segments? DW_AT_segment parameter seems to be for
> 16 bit.
>
> I need to read the following process map using dwarf.
>
> Any suggestion, pointers are welcome.
>
> 55b51afea000-55b51afeb000 r-xp  fd:00 5902563
>
> 55b51b1ea000-55b51b1eb000 r--p  fd:00 5902563
>
> 55b51b1eb000-55b51b1ec000 rw-p 1000 fd:00 5902563
>
> 55b51c094000-55b51c0b5000 rw-p  00:00 0 [heap]
>
> 7fca24956000-7fca24957000 rwxp  00:00 0
>
> 7fca24958000-7fca24959000 rwxp  00:00 0
>
> 7fca2496d000-7fca2496f000 rwxp  00:00 0
>
> 7fca24974000-7fca24975000 rwxp  00:00 0
>
> 7fca2497f000-7fca2498 rwxp  00:00 0
>
> 7fca24983000-7fca2498c000 rwxp  00:00 0
>
> 7fca2498f000-7fca2499a000 rwxp  00:00 0
>
> 7fca2499b000-7fca24aec000 rwxp  00:00 0
>
> 7fca24aec000-7fca24af8000 rwxp  00:00 0
>
> 7fca24af8000-7fca24afa000 rwxp  00:00 0
>
> 7fca24afa000-7fca24ce1000 r-xp  fd:00 10230842
>
> 7fca24ce1000-7fca24ee1000 ---p 001e7000 fd:00 10230842
>
> 7fca24ee1000-7fca24ee5000 r--p 001e7000 fd:00 10230842
>
> 7fca24ee5000-7fca24ee7000 rw-p 001eb000 fd:00 10230842
>
> 7fca24ee7000-7fca24eeb000 rw-p  00:00 0
>
> 7fca24eeb000-7fca24f2d000 rwxp  00:00 0
>
> 7fca24f2d000-7fca24f32000 rwxp  00:00 0
>
> 7fca24f32000-7fca24f34000 rw-p  00:00 0
>
> 7fca24f34000-7fca24f36000 rwxp  00:00 0
>
> 7fca24f36000-7fca251a rwxp  00:00 0
>
> 7fca251a-7fca350e ---p  00:00 0
>
> 7fca350e-7fca361f3000 rwxp  00:00 0
>
> 7fca361f3000-7fca36f4b000 rw-p  00:00 0
>
> 7fca36f4b000-7fca36f81000 r--p  fd:00 5902405
>
> 7fca36f81000-7fca3719a000 r-xp 00036000 fd:00 5902405
>
> 7fca3719a000-7fca37218000 r--p 0024f000 fd:00 5902405
>
> 7fca37218000-7fca3722c000 r--p 002cc000 fd:00 5902405
>
> 7fca3722d000-7fca3723 rw-p  00:00 0
>
> 7fca3723-7fca37426000 r-xp  fd:00 5902408
>
> 7fca37426000-7fca37625000 ---p  00:00 0
>
> 7fca37625000-7fca37627000 r--p 001f5000 fd:00 5902408
>
> 7fca37627000-7fca37629000 rw-p 001f7000 fd:00 5902408
>
> 7fca37629000-7fca3764d000 rw-p  00:00 0
>
> 7fca3764d000-7fca376dc000 r-xp  fd:00 5902391
>
> 7fca376dc000-7fca378db000 ---p  00:00 0
>
> 7fca378db000-7fca378de000 r--p 0008e000 fd:00 5902391
>
> 7fca378de000-7fca378df000 rw-p 00091000 fd:00 5902391
>
> 7fca378df000-7fca37908000 rw-p  00:00 0
>
> 7fca37908000-7fca37937000 r-xp  fd:00 5902373
>
> 7fca37937000-7fca37b36000 ---p  00:00 0
>
> 7fca37b36000-7fca37b37000 r--p 0002e000 fd:00 5902373
>
> 7fca37b37000-7fca37b38000 rw-p 0002f000 fd:00 5902373
>
> 7fca37b38000-7fca37bd8000 r-xp  fd:00 5902368
>
> 7fca37bd8000-7fca37dd7000 ---p  00:00 0
>
> 7fca37dd7000-7fca37ddb000 r--p 0009f000 fd:00 5902368
>
> 7fca37ddb000-7fca37ddc000 rw-p 000a3000 fd:00 5902368
>
> 7fca37ddc000-7fca37ddd000 rw-p  00:00 0
>
> 7fca37ddd000-7fca37dec000 r-xp  fd:00 5902381
>
> 7fca37dec000-7fca37feb000 ---p  00:00 0
>
> Best Regards,
> Archana
>


Re: [Dwarf-Discuss] HTML documentation for DWARF

2021-03-01 Thread David Blaikie via Dwarf-Discuss
I'd +1 that in general - probably mostly a question of who's going to do
the initial work, and how much ongoing maintenance cost it'll incur. If
you're volunteering to do the initial work, and it can be done so as to
keep the ongoing maintenance cost low - perhaps the editor(s) will be open
to it.

On Mon, Mar 1, 2021 at 2:53 PM Konrad Kleine via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Hi there.
>
> On a side note, the comment function doesn't work:
> http://www.dwarfstd.org/Comment.php .
>
> The verification doesn't work there: "Verification failed. Please go back
> and try again."
>
> I'm working on LLVM and LLDB and sometimes when I see a fix like the
> following I wonder why there's no link to the documentation about what
> "DW_FORM_line_strp" is doing: https://reviews.llvm.org/D97721. I would
> like to have a simple Hypertext link to the DWARF documentation. I could
> link to a specific PDF page, but that's not the same as linking to a
> section or a chapter.
>
> I'm asking if there's a way to generate not only a PDF but also a publicly
> accessible HTML from the source of the DWARF documentation. I learned that
> for DWARF4 the source was a MS Word document, so I guess that ship has
> sailed. But for the documentation in git (
> http://git.dwarfstd.org/?p=dwarf-doc.git;a=summary) I think it could be
> possible to have a documentation in HTML with URLs with /latest/ or
> /current/ or /5.0/ in it to resemble the HEAD of the development or some
> latest released or tagged version. What are the thoughts on having HTML
> documentation pages?
>
> Have you given some thought about using pandoc or publican as a more
> high-level organization for the latex sources? Those would allow different
> outputs.
>
> Regards
> Konrad


Re: [Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 15, 2020 at 2:47 PM Greg Clayton via Dwarf-Discuss
 wrote:
>
> One simple approach would be to be able to represent a DW_AT_low_pc with a 
> DW_FORM_data encoding just like the DW_AT_high_pc does when it is an offset 
> from the DW_AT_low_pc.

I'm not sure this would catch all the desired cases/be especially tidy
to implement, unfortunately.

> The value of the DW_AT_low_pc would be an offset from either:
> 1 - the parent DIE's DW_AT_low_pc (which itself might need to be resolved by 
> looking at the parent scope). If the parent DIE's range is a DW_AT_ranges, 
> then use the lowest address out of all of them.

"lowest address in DW_AT_ranges" wouldn't be suitable when ranges are
used across sections (eg: some CU ranges - when functions are in
different sections due to inline functions or -ffunction-sections). If
everything was in one section then an implementation could use low_pc
to indicate a good base address even if they still needed DW_AT_ranges
(eg: void f1() { } __attribute__((nodebug)) void f2() { } void f3() {
} - or other cases where a single section with multiple hunks of debug
info could exist with holes in between) - but it's possible to have
that and ranges. eg:

// compiled without function sections, so f1 is in one section, but f2
// and f3 are in a single section together, separate from f1
inline void f1() { }
void f2() { f1(); }
void f3() { }

the low_pc of f3 could benefit from using the same address (+offset)
as the low_pc of f2 - but there would be no clear way to indicate
which part of the CU's DW_AT_ranges could be used as the base address
for 'f3'.
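
A small Python illustration of why the section matters, under made-up numbers: the distance from f2 to f3 is fixed at compile time because they share a section, while the distance from f1 to f3 depends on where the linker happens to place each section. The section names, offsets and layouts are all assumptions for the example.

```python
def symbol_address(layout, section, offset):
    """Final address = where the linker put the section + offset within it."""
    return layout[section] + offset


def delta(layout, a, b):
    """Distance from symbol a to symbol b under a given link layout."""
    return symbol_address(layout, *b) - symbol_address(layout, *a)


# (section, offset within section), mirroring the f1/f2/f3 example above.
f1 = (".text.f1", 0x0)  # inline function, assumed to get its own section
f2 = (".text", 0x0)
f3 = (".text", 0x10)

# Two hypothetical links that place f1's section differently.
layout_a = {".text.f1": 0x401000, ".text": 0x402000}
layout_b = {".text.f1": 0x409000, ".text": 0x402000}

same_section = (delta(layout_a, f2, f3), delta(layout_b, f2, f3))
cross_section = (delta(layout_a, f1, f3), delta(layout_b, f1, f3))
```

So "f2 + 0x10" is a usable encoding for f3's low_pc without a second relocation, but "lowest address in the CU's ranges + offset" is not, if that lowest address turns out to be f1.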

> 2 - the first parent DIE with a DW_AT_low_pc that has a DW_FORM_addrXXX 
> encoding.

Similar in the example above, 'f3' has no parent with a suitable
low_pc, but would benefit from sharing the same debug_addr entry as
'f2'.

A more extreme example happens in LLVM's prototype "Propeller" feature
- which essentially is "basic block sections" - where even a single
function may be fragmented across multiple sections and have no
specific ordering/scope based hierarchy about which base address to
use (so the function would have DW_AT_ranges, not just DW_AT_low/high
- and some internal scope could have a contiguous range and would want
to reuse one of the addresses used in DW_AT_ranges (+an offset from
it)).

- Dave

> Solution #1 is nice because it keeps the offset in the DW_FORM_data encoding 
> small since it is always relative to the first parent scope's DW_AT_low_pc. 
> So this could save a lot of space in the DWARF if we use the smallest 
> possible DW_FORM_data encoding all the time.
> Solution #2 could be easier as you would traverse parent scopes looking for 
> an address encoding as the DW_FORM.
>
> This would allow DW_TAG_subprogram DIEs to have a single relocation on the 
> DW_AT_low_pc.
>
> Greg Clayton
>
>
> > On Sep 15, 2020, at 10:12 AM, Robinson, Paul via Dwarf-Discuss 
> >  wrote:
> >
> > David Blaikie has brought this up with me (or in conversations that
> > I observed) a couple of times:
> >
> > It's common to want to refer to a particular address plus an offset,
> > for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> > block or inlined subprogram within another subprogram.  Generally
> > the only symbolic address available is the entry point of the
> > containing subprogram.  Back when addresses were held directly in
> > the .debug_info section, the attributes would have relocations, the
> > offset would be encoded into the relocation and the linker would
> > just do the right thing.
> >
> > With DWARF v5, we now have the .debug_addr section, which contains
> > the addresses to be fixed up by the linker.  But, we don't have a
> > way to specify an offset to add to an entry in the .debug_addr
> > section; instead, each unique addr+offset requires its own entry
> > in the .debug_addr table.  This consumes additional space, these
> > entries are generally not reusable, and it doesn't reduce the
> > overall number of relocations that the linker must process.
> >
> > It's not feasible to define a new attribute for address+offset,
> > because an attribute has only one value, and the attribute would
> > have to specify both the .debug_addr index and the offset to add.
> > But, we could define an "indirect" entry in .debug_addr, and then
> > reference it with an attribute in the same way that we reference
> > any other .debug_addr entry.
> >
> > An indirect entry would be the same size as all other entries in
> > .debug_addr (i.e., the size of an address on the target).  The
> > upper half would be another index into .debug_addr and the lower
> > half would be the addend.  The consumer adds the addend to the
> > value from the entry specified by the "another index."
> >
> > This solution doesn't save space in .debug_addr, but it does
> > reduce the number of relocations.  Ideally .debug_addr would
> > require only one relocation per function.
> >
> > We can debate whether the addend should be signed or 

Re: [Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 15, 2020 at 10:13 AM Robinson, Paul via Dwarf-Discuss
 wrote:
>
> David Blaikie has brought this up with me (or in conversations that
> I observed) a couple of times:

Thanks for bringing this up! Not sure if I've raised this on
dwarf-discuss specifically before.. ah, yeah, 3 years ago:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-June/004378.html
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-July/thread.html#4380
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-August/004393.html

Most recently I had an idea for a workaround that I proposed on the
llvm-dev mailing list:
https://groups.google.com/g/llvm-dev/c/g3eGxhi4ATU/m/fbrBPFxNBwAJ
The idea being that actually using debug_rnglists even for contiguous
ranges would reduce .o/executable file size when using Split DWARF. I
think the data I had even showed breakeven for non-split DWARF object
files, probably slight growth for linked executables in that case,
though.

> It's common to want to refer to a particular address plus an offset,
> for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> block or inlined subprogram within another subprogram.

Yep - the ones I'm especially interested in now, are those that won't
be addressed even by a "ranges everywhere" approach (though that
approach does have size tradeoffs that I'd like to avoid/improve on
too, for sure!) - DW_TAG_call_site's
DW_AT_call_pc/DW_AT_call_return_pc and DW_TAG_label's DW_AT_low_pc.
The latter isn't super common in code I'm dealing with, but the former
is pretty ubiquitous now.

>  Generally
> the only symbolic address available is the entry point of the
> containing subprogram.  Back when addresses were held directly in
> the .debug_info section, the attributes would have relocations, the
> offset would be encoded into the relocation and the linker would
> just do the right thing.
>
> With DWARF v5, we now have the .debug_addr section, which contains
> the addresses to be fixed up by the linker.  But, we don't have a
> way to specify an offset to add to an entry in the .debug_addr
> section; instead, each unique addr+offset requires its own entry
> in the .debug_addr table.  This consumes additional space, these
> entries are generally not reusable, and it doesn't reduce the
> overall number of relocations that the linker must process.

If you're encountering size penalties with non-split DWARFv5 due to
debug_addr indirection - we could change LLVM to choose which
addresses to indirect and which ones to use the classic/DWARFv4-esque
representations.
(But, yeah, overall, I think it's better for lots of use cases to
support an addr+offset encoding)

> It's not feasible to define a new attribute for address+offset,
> because an attribute has only one value, and the attribute would
> have to specify both the .debug_addr index and the offset to add.

I don't follow this ^ - I think previously we've discussed at least 2
representations that could do this:
uleb+uleb
generalized exprloc support

admittedly uleb+uleb has the problem that it's a variable-length
encoding, but at least LLVM currently is using addrx exclusively, and
not the addrxN fixed length encodings.

> But, we could define an "indirect" entry in .debug_addr, and then
> reference it with an attribute in the same way that we reference
> any other .debug_addr entry.

This direction would, for my use case, be unfortunate - since my goal
is to remove as much DWARF from object files as possible under Split
DWARF - so leaving anything extra in debug_addr works against that
goal.

> An indirect entry would be the same size as all other entries in
> .debug_addr (i.e., the size of an address on the target).  The
> upper half would be another index into .debug_addr and the lower
> half would be the addend.  The consumer adds the addend to the
> value from the entry specified by the "another index."
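
For readers following along, the indirect entry described in the quoted proposal could be decoded along these lines in Python. This is a sketch of a proposal, not of anything in DWARF v5: how a consumer would tell an indirect entry from a plain one is not specified in the quoted text, so the sketch simply takes a set of indirect indices as an input, and it assumes an unsigned addend. All names are hypothetical.

```python
def resolve_addr(table, index, indirect, addr_size=8):
    """Resolve a .debug_addr index, following one proposed indirect entry.

    table    -- list of already-relocated entries (integers)
    indirect -- set of indices that hold (base index, addend) pairs
    """
    entry = table[index]
    if index not in indirect:
        return entry
    half_bits = addr_size * 4             # half of the entry, in bits
    base_index = entry >> half_bits       # upper half: another index
    addend = entry & ((1 << half_bits) - 1)  # lower half: unsigned addend
    return table[base_index] + addend


# Entry 0 is a function's entry point (the only relocated entry);
# entry 1 means "entry 0 plus 0x24".
debug_addr = [0x401000, (0 << 32) | 0x24]
resolved = resolve_addr(debug_addr, 1, indirect={1})
```

Only entry 0 would need a relocation here, which is the saving the proposal is after.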

If it's OK to use such a small fixed length encoding (addrx supports
variable length with fixed lengths of 1/2/3/4 - offsets in LLVM are
emitted as data4) then we could introduce that as the
FORM_addrx4_offset4 (or could make it variable length depending on
pointer size - but that seems less relevant when it's not in the
debug_addr section) form and a uleb+uleb form, without providing all
the possible combinations of addrx{1,2,3,4,N}_offset{1,2,3,4,M}.

In any case, I think of these forms as sort of special
case/compact/easier to parse encodings of the generalized exprloc
(DW_OP_addrx(N), DW_OP_constu(M), DW_OP_plus).
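
To spell out that equivalence, here's a minimal Python evaluator for just those three operations. The opcode values (DW_OP_addrx 0xa1, DW_OP_constu 0x10, DW_OP_plus 0x22) are the standard ones; everything else (the function names, the list standing in for a relocated .debug_addr, the addresses) is an assumption for the sake of the sketch.

```python
DW_OP_constu = 0x10
DW_OP_plus = 0x22
DW_OP_addrx = 0xA1


def decode_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def evaluate(expr, debug_addr):
    """Evaluate an expression built only from addrx, constu and plus."""
    stack = []
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addrx:
            index, pos = decode_uleb128(expr, pos)
            stack.append(debug_addr[index])
        elif op == DW_OP_constu:
            value, pos = decode_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError("opcode 0x%02x not handled in this sketch" % op)
    return stack[-1]


# DW_OP_addrx(0), DW_OP_constu(0x24), DW_OP_plus
call_pc = evaluate(bytes([0xA1, 0x00, 0x10, 0x24, 0x22]), [0x401000])
```

A dedicated addrx+offset form would carry the same two numbers (N and M) without the three opcode bytes and without requiring the consumer to run an expression evaluator.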

>
> This solution doesn't save space in .debug_addr, but it does
> reduce the number of relocations.  Ideally .debug_addr would
> require only one relocation per function.
>
> We can debate whether the addend should be signed or unsigned,
> and whether the indirect entries should be a separate subtable,
> but I wanted to float the idea here before I wrote it up as a
> proposal.

I'd be fairly in favor of unsigned. Generally LLVM already 

Re: [Dwarf-Discuss] More on DW_AT_str_offset_base debug_str_offsets.dwo confusion

2020-09-01 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 1, 2020 at 10:24 AM David Anderson 
wrote:
>
> On 8/31/20 8:39 PM, David Blaikie wrote:
> > On Mon, Aug 31, 2020 at 8:22 PM David Anderson wrote:
> >
> > On 8/31/20 1:03 PM, David Blaikie wrote:
> > > I'd rather go with LLVM's existing interpretation - that strx
> > > encodings used in .dwo do not attempt to use str_offsets in the
> > > skeleton.
> > > But I wouldn't mind adding a str_offsets_base to the split full unit
> > > to make it clear - this would be consistent with rnglists, I think?
> > > (I think, in theory a rnglistx in a .dwo with a split full unit
> > > without a rnglists_base would use the rnglists_base (and
> > > .debug_rnglists non-dwo) in the executable, but if the split full unit
> > > has a rnglists_base, then the rnglistx in the split full unit use that
> > > base to find rnglists in debug_rnglists.dwo - arguably I'd say we
> > > might as well say the same thing about loclists, too, for consistency,
> > > though I don't have any use for skeleton location lists right now)
> >
> > It seems to me that rnglists_base and loclists_base in Split Full
> > always reference the data in .debug_rnglists/.debug_loclists
> >
> > 3.1.3  Split Full Compilation Unit Entries
> > The following attributes are not part of a split full compilation unit
> > entry but instead are inherited (if present) from the corresponding
> > skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
> > DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
> > and DW_AT_rnglists_base.
> >
> >
> > Hmm... yeah. I guess LLVM implements rnglistx /rnglist_base the same
> > as strx/str_offsets_base. Where it assumes that any *x encoding refers
> > to entities in the .dwo, even in the absence of a
> > rnglists_base/str_offsets_base in the split full unit. I had thought
> > we'd implemented it to emit a rnglists_base in the split full unit,
> > which would've been in contrast to the str_offsets_base - so my
> > mistake/apologies for the previous description.
> Still confused.
>
> Let's say skeleton A is in object file OB.
> And OB.dwp contains the split-full CU DIE.
> Let's say non-empty .debug_rnglists and .debug_rnglists.dwo exist.
>
> The compiler could create the rnglists for A in *either* OB or OB.dwp.

It sounds like you might be talking specifically/only about the CU-level
ranges (in the phrasing "rnglists for A")? Not about ranges attached to,
say, a lexical_block or inlined_subroutine? Is that the case?

> And could pick and choose, for each split-able Compilation Unit,
> which place to put rnglists
> independently of all other CUs.

FWIW, I'm not objecting to the DWARF spec's requirement that the CU-level
ranges must go in the skeleton CU (though I wouldn't've minded if that was
a "quality of implementation" thing - some producers might want to scrape
those extra few bytes out of the skeleton at the cost of consumers needing
to do the indirection/read the dwo/dwp to find the CU's ranges)

> Meaning both OB and OB.dwp could have rnglists, but only
> one of them has the rnglists entry for any given CU.
>
> How do we know  which .debug_rnglists section  to look at
> given Skeleton A and split-full A?
> Which does the DW_AT_rnglists_base apply to?

The way I think of it - reading some parts of the spec and ignoring others,
and the way it's implemented in LLVM based on my thinking (model (1) in my
previous email) - any rnglistx encoding used in the split full unit (on the
CU DIE or any child DIEs) would be resolved into debug_rnglists.dwo - no
matter the presence/absence of a rnglists_base on the skeleton CU DIE (and
there would never be a rnglists_base on the split full CU DIE). If the
skeleton unit used a rnglistx encoding for anything, it would need a
rnglists_base and the rnglistx would be resolved relative to that.

I guess a few things I'd say:
  I don't think I'd ever want to suggest that a rnglistx on a skeleton DIE
should refer to rnglists.dwo (if that's the case, just move the
rnglistx-encoded attribute into the split full unit DIE, since it's useless
on the skeleton by itself). If you have a rnglistx in the skeleton unit,
you must have a rnglists_base on that skeleton DIE.
  I also think it's important that a unit be able to have references from
the skeleton unit to rnglists non-dwo, and to have references from the
split full unit (less important for the unit DIE itself (but perhaps
someone has a need for some other rnglistx encoded extension attribute, for
instance, that they would like to put on the split full unit DIE) but
certainly for the children of that DIE) to rnglists.dwo - the question is
just how to support both of those. Either we assume all *x encodings refer
within their own unit (from the previous comment this, to me, is already
definitely true for the skeleton unit - so that 

Re: [Dwarf-Discuss] More on DW_AT_str_offset_base debug_str_offsets.dwo confusion

2020-09-01 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 1, 2020 at 6:59 AM David Anderson  wrote:

> On 8/31/20 8:39 PM, David Blaikie wrote:
> > Hmm... yeah. I guess LLVM implements rnglistx /rnglist_base the same
> > as strx/str_offsets_base. Where it assumes that any *x encoding refers
> > to entities in the .dwo, even in the absence of a
> > rnglists_base/str_offsets_base in the split full unit. I had thought
> > we'd implemented it to emit a rnglists_base in the split full unit,
> > which would've been in contrast to the str_offsets_base - so my
> > mistake/apologies for the previous description.
>
> So the base addresses are in the skeleton and the actual section
> (rnglists/loclists/str_offsets/str)
> can go with Split Full (i.e, in a .dwo) if it has no addresses but must
> go with the skeleton if has addresses.
>

Sorry, I missed a step/not sure I understand this ^ comment - could you
rephrase/expound/clarify a bit?

I'm suggesting there are two possible ways we could spec this:

1) all loclistx, rnglistx, strx in .dwo are required/guaranteed/defined to
always refer to debug_loclists.dwo, debug_rnglists.dwo,
debug_str_offsets.dwo
  In this model, there's no way to

  -> this is how parts of the spec seem to be already defined, and how
strx/loclistx work in the DWARFv4 GNU extension Split DWARF implementation
(there's no loclists_base or str_offsets_base - the strx/loclistx in
debug_info.dwo is assumed to refer to the str_offsets.dwo/loc.dwo sections)

2) allow/require a *x encoding in a split full unit to refer (when combined
with a *_base attribute on the skeleton CU) to that unit's
debug_rnglists/loclists/str_offsets (non-dwo) contributions if the split
full unit has no *_base attributes. The split full unit's *_base attribute
could then be optionally specified to say "resolve *x encodings relative
to/in the split unit, instead of searching back up into the skeleton unit".

(1) is what LLVM's implemented at the moment, and changing that to (2)
wouldn't be too hard (we'd just always emit a *_base attribute in the split
full unit any time we were using the corresponding *x forms in the split
full unit)
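
To make the difference between (1) and (2) concrete, here is a minimal
sketch of the two lookup rules for a rnglistx form. It is only an
illustration of the wording above, not LLVM's code: the Unit class and both
function names are made up for this example, and only the rnglists_base
attribute is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """Toy stand-in for a unit DIE; only DW_AT_rnglists_base is modeled."""
    rnglists_base: Optional[int] = None


def resolve_rnglistx_model1(in_split_unit: bool, skeleton: Unit) -> str:
    """Model (1): a *x form in a .dwo always resolves into the .dwo section."""
    if in_split_unit:
        # No *_base is consulted; the base is implicit in the .dwo contribution.
        return ".debug_rnglists.dwo"
    if skeleton.rnglists_base is None:
        raise ValueError("rnglistx in a skeleton unit needs DW_AT_rnglists_base")
    return ".debug_rnglists"


def resolve_rnglistx_model2(in_split_unit: bool, skeleton: Unit, split: Unit) -> str:
    """Model (2): the split unit's own *_base wins, else inherit from the skeleton."""
    if in_split_unit and split.rnglists_base is not None:
        return ".debug_rnglists.dwo"
    if skeleton.rnglists_base is None:
        raise ValueError("no DW_AT_rnglists_base available to resolve rnglistx")
    return ".debug_rnglists"
```

Under (2), a producer that wants today's LLVM behavior would have to put a
rnglists_base on every split full unit that uses rnglistx, which is the
emission change described above.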

>
> Ok.
>
> This way the standard is not in error as written.


I think there's still some contradictions - the two bits I quoted
previously:

"The DW_AT_addr_base and DW_AT_str_offsets_base attributes provide context
that may be necessary to interpret the contents of the corresponding split
DWARF object file."
"The following attributes are not part of a split full compilation unit
entry but instead are inherited (if present) from the corresponding
skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
and DW_AT_rnglists_base."

If we're going with model (1), then the first of those two quotations
should remove "and DW_AT_str_offsets_base" and the second should carve out a
special case for str_offsets_base and rnglists_base to say they cannot be
specified on a skeleton unit, but also are /not/ inherited by the split
full compilation unit. (essentially the split full compilation unit has
implicit *_base attributes (if you think of them more like the (2) model
above) equal to the size of the contribution headers for those 3 sections)


> This understanding
> restricts what information can be derived from
> the Split Full CU by itself (ie, without the skeleton) a bit since the
> base addresses are not in the Split Full CU DIE.
>

That confuses me a bit. If you have rnglists.dwo, for example - you can't
use the actual rnglists_base from the skeleton CU to find the rnglists.dwo
contribution (because the rnglists_base will be relocated in the final
executable, to a value that has nothing to do with the rnglists.dwo
contribution location in the dwo or the index-relative location in the
dwp). So having the skeleton CU shouldn't make you any more or less able to
parse/dump/etc *x forms in a split full unit if we're using (1). If we're
using (2), then the wording needs to change to say that you must specify
*_base on the split full unit if you want to resolve *x forms in the split
full unit into rnglists.dwo/loclists.dwo/str_offsets.dwo references, and in
the absence of *_base on the split full unit, such *x forms would use the
*_base in the skeleton unit and the rnglists/loclists/str_offsets (non-dwo)
in the linked executable.


>
> Mike Eager: please delete the new issue 200831.1 as it is simply wrong.
>
> DavidA
>
>
>
>
>
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] More on DW_AT_str_offset_base debug_str_offsets.dwo confusion

2020-08-31 Thread David Blaikie via Dwarf-Discuss
On Mon, Aug 31, 2020 at 10:33 AM David Anderson via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> It has occurred to me that simply restricting skeleton CUs
> to use DW_FORM_string or DW_FORM_strp
> would restore the unique meaning of DW_AT_str_offsets_base
> to apply to the dwp  (letting non-skeleton CUs use
> DW_FORM_strx1 etc ).  With seemingly little impact on
> overall size.
>

Seems a pity for orthogonality (and for a non-standard/extension use that
LLVM has, where skeleton units carry some DIEs (essentially "gmlt"-like
data, enough to symbolize with inline stack frames) in the skeleton CU -
not being able to use indexed strings would be an object size penalty due
to potentially needing to use more relocations)

I think the DWARFv5 spec is a bit conflicted, but does have some wording
that supports LLVM's existing usage:

".debug_info.dwo to .debug_str_offsets.dwo: Attribute values of class
string may have one of the forms DW_FORM_strx, DW_FORM_strx1,
DW_FORM_strx2, DW_FORM_strx3 or DW_FORM_strx4, whose value is an index
into the .debug_str_offsets.dwo section for the corresponding string"
"The string table section in .debug_str.dwo contains all the strings
referenced from DWARF attributes using any of the forms DW_FORM_strx,
DW_FORM_strx1, DW_FORM_strx2, DW_FORM_strx3 or DW_FORM_strx4. Any attribute
in a compilation unit or a type unit using this form refers to an entry in
that unit’s contribution to the .debug_str_offsets.dwo section, which in
turn provides the offset of a string in the .debug_str.dwo section."

 (& some that contradicts it):

"The DW_AT_addr_base and DW_AT_str_offsets_base attributes provide context
that may be necessary to interpret the contents of the corresponding split
DWARF object file."
"The following attributes are not part of a split full compilation unit
entry but instead are inherited (if present) from the corresponding
skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
and DW_AT_rnglists_base."

I'd rather go with LLVM's existing interpretation - that strx encodings
used in .dwo do not attempt to use str_offsets in the skeleton.
But I wouldn't mind adding a str_offsets_base to the split full unit to
make it clear - this would be consistent with rnglists, I think? (I think,
in theory a rnglistx in a .dwo with a split full unit without a
rnglists_base would use the rnglists_base (and .debug_rnglists non-dwo) in
the executable, but if the split full unit has a rnglists_base, then the
rnglistx in the split full unit use that base to find rnglists in
debug_rnglists.dwo - arguably I'd say we might as well say the same thing
about loclists, too, for consistency, though I don't have any use for
skeleton location lists right now)
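
For reference, the two-step lookup that the quoted spec text describes
(strx index -> entry in the unit's str_offsets contribution -> string in
debug_str) can be sketched as below. This assumes 32-bit DWARF and
little-endian data, and assumes the implicit base is the 8-byte DWARFv5
contribution header (unit_length, version, padding); read_strx is a name
invented for this example.

```python
import struct

# DWARFv5 32-bit .debug_str_offsets header: unit_length (4) + version (2) + padding (2)
HEADER_SIZE_DWARF32 = 8


def read_strx(index: int, str_offsets: bytes, debug_str: bytes,
              base: int = HEADER_SIZE_DWARF32) -> str:
    """Resolve a DW_FORM_strx* index to a string (sketch; DWARF32, little-endian)."""
    # Each entry is a 4-byte offset into the string section.
    (offset,) = struct.unpack_from("<I", str_offsets, base + 4 * index)
    end = debug_str.index(b"\x00", offset)  # strings are NUL-terminated
    return debug_str[offset:end].decode("utf-8")


# Hand-built sections for illustration only.
debug_str = b"main\x00int\x00"
str_offsets = struct.pack("<IHH", 12, 5, 0) + struct.pack("<II", 0, 5)
name = read_strx(1, str_offsets, debug_str)
```

In a .dwo the base is never relocated, so nothing from the skeleton is
needed to do this lookup, which is the point of the interpretation above.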


Re: [Dwarf-Discuss] modeling different address spaces

2020-08-07 Thread David Blaikie via Dwarf-Discuss
On Fri, Aug 7, 2020 at 9:43 AM Pedro Alves via Dwarf-Discuss wrote:
>
> Hi there!
>
> On 7/31/20 1:17 AM, Tye, Tony via Dwarf-Discuss wrote:
> > For optimized code involving multiple address spaces it is possible to run 
> > into cases where the location of a source language variable requires 
> > multiple address spaces. For example, a source variable may be optimized 
> > and different pieces may be in different places including memory of 
> > multiple address spaces, registers, etc. Describing this situation with a 
> > DW_AT_address_class on a source language variable is not possible as it 
> > provides a single address space that applies to the variable as a whole. 
> > The concept of "address space" is not available on the expression stack 
> > where a composite location description is created.
> >
> >
> >
> > Instead, making address space a property of memory location descriptions 
> > and making location descriptions a first-class concept on the expression 
> > stack solves the problem in a general way and leads to other nice 
> > properties.
>
> I am interested in re-reading the document describing your changes, but the
> url that was pasted on the list before:
>
>  https://llvm.org/docs/AMDGPUDwarfProposalForHeterogeneousDebugging.html

https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html

>
> is now returning "Not Found".  Did the document move somewhere else?
>
> Pedro Alves


Re: [Dwarf-Discuss] Question about DW_TAG_inlined_subroutine tags

2020-07-30 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 30, 2020 at 12:00 PM Greg Clayton via Dwarf-Discuss wrote:
>
> The LTO in clang creates some really interesting DWARF... One of the latest 
> things I discovered is DW_TAG_inlined_subroutine tags that are not contained 
> within a DW_TAG_subprogram. I am guessing the compiler/linker wanted to 
> outline an inlined function and tried its best to move the DWARF and didn't 
> end up changing the tag from DW_TAG_inlined_subroutine to DW_TAG_subprogram.

(You've mentioned a couple of quirky LTO situations that I don't think
I've seen with LLVM's LTO - do you have examples of these (this one
and the other decl file/line one discussed on llvm-commits)?)

> I was thinking of adding code to "llvm-dwarfdump --verify" to detect this 
> issue, but wanted to check with the DWARF list first to make sure this would 
> be considered an error. So I am looking for an answer to:
>
> Is it ok for DW_TAG_inlined_subroutine with high and low PC values to appear 
> on their own, not enclosed in a DW_TAG_subprogram?

The only wording I can find is:

"Each inline expansion of a subroutine is represented by a debugging
information entry with the tag DW_TAG_inlined_subroutine. Each such
entry is a direct child of the entry that represents the scope within
which the inlining occurs."

I guess this could still technically allow inlining into some place
that isn't described as a subprogram or child of a subprogram in the
DWARF (eg: inlining into a global initializer - I guess some DWARF
producer could model that as
DW_TAG_compile_unit{DW_TAG_inlined_subroutine}) - so I'd err on the
side of saying DWARF doesn't categorically disallow this.

But as an LLVM maintainer, I'd be totally fine adding that as a
verifier check to llvm-dwarfdump.
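
A rough sketch of what such a check could look like, over a toy DIE tree
rather than real DWARF. The Die class and the function name are invented
for this example; this is not llvm-dwarfdump's code or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Die:
    """Toy DIE: just a tag name and child DIEs."""
    tag: str
    children: List["Die"] = field(default_factory=list)


def orphan_inlined_subroutines(die: Die, inside_subprogram: bool = False) -> List[Die]:
    """Collect DW_TAG_inlined_subroutine DIEs with no DW_TAG_subprogram ancestor."""
    found = []
    if die.tag == "DW_TAG_inlined_subroutine" and not inside_subprogram:
        found.append(die)
    # Any descendant of a subprogram (through lexical blocks etc.) counts as enclosed.
    inside = inside_subprogram or die.tag == "DW_TAG_subprogram"
    for child in die.children:
        found.extend(orphan_inlined_subroutines(child, inside))
    return found


cu = Die("DW_TAG_compile_unit", [
    Die("DW_TAG_subprogram", [
        Die("DW_TAG_lexical_block", [Die("DW_TAG_inlined_subroutine")]),
    ]),
    Die("DW_TAG_inlined_subroutine"),  # directly under the CU: reported
])
orphans = orphan_inlined_subroutines(cu)
```

Since the spec wording doesn't categorically rule the CU-level case out,
this probably fits better as a warning than as a hard error.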


[Dwarf-Discuss] DWARF for linker GC'd code

2020-07-27 Thread David Blaikie via Dwarf-Discuss
Paul filed an issue covering most of the details of this issue here:
http://dwarfstd.org/ShowIssue.php?issue=200609.1

A quick summary:

non-DWARF-aware linkers (ie: that aren't parsing all the DWARF and
rewriting it during link time, which is expensive) have to resolve
relocations in DWARF that refer to functions that have been garbage
collected or deduplicated at link time. They use a few different
strategies, all with different limitations:

Original Behavior:

   - bfd: 1 for debug_ranges (0 would prematurely terminate the list), 0
   elsewhere
   - gold/lld: 0+addend everywhere

Limitations/bugs:

   - bfd/gold/lld
      - doesn't support 0 as a valid executable address without ambiguities
   - gold/lld
      - ambiguities with large gc'd functions combined with a .text mapping
      that starts in relatively low addresses
      - premature debug_range termination with zero-length functions (Clang
      produces these with __builtin_unreachable or non-void return type
      functions without a return statement)

New behavior:

   - -1 (or "maximum value of the address representation") everywhere,
   except:
   - -2 for DWARFv4 debug_loc, debug_ranges (because -1 is a base address
   specifier there)
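
As a small sketch of the new behavior listed above (the function name and
the use of section names as keys are assumptions made for this example, not
lld's implementation):

```python
def tombstone(section: str, address_size: int = 8) -> int:
    """Value a linker would write for a relocation against discarded code (sketch)."""
    max_addr = (1 << (8 * address_size)) - 1  # "-1" in the address representation
    if section in (".debug_loc", ".debug_ranges"):
        # In these DWARFv4 sections -1 already marks a base address selection
        # entry, so the next value down ("-2") is used instead.
        return max_addr - 1
    return max_addr
```

A consumer can then treat any address equal to the tombstone for its
section as "dead", without having to guess whether 0 (or 0+addend) is a
real address.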


The trunk version of lld currently implements the new behavior with flags
to customize - but this has broken a bunch of consumers, so we're hoping to
switch to bfd's behavior (fewer limitations/bugs/issues, and more
broadly/long-term deployed/tested than the gold behavior), with a linker
flag to opt-in to the new behavior.

Do folks have any ideas about a deployment strategy for this new behavior?
Especially interested to hear from linker owners and DWARF consumers about
the sort of timelines they might be interested in adopting the new
behavior. (is this a matter of standardizing it in DWARFv6 and waiting
until DWARFv6 is on-by-default then turning it on-by-default in linkers?
Realizing that this functionality won't be able to be tied to the DWARF
version, since the linker doesn't get told the DWARF version and might be
linking DWARF spanning multiple versions - but that timeline might be long
enough to get broad adoption into consumers (it is a rather long timescale
though - I'd guess something like at least another 5 years from now,
probably longer?))

- Dave


Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-16 Thread David Blaikie via Dwarf-Discuss
d descendants have this
> > functionality.  (It's a many to one mapping in the 8086
> implementation,
> > but that's a problem for a bygone era.)  There's a reference to i386
> > memory models in Table 2.7.
> >
> > DWARF assumes a linear address space.  A segmented address maps to a
> > specific address in this linear address space.  The entries in
> > DW_AT_ranges for subprograms with different segment addresses would
> > usually be referenced by their address in the linear address space.
> If
> > DW_AT_ranges has a DW_AT_segment, this is an indication that the
> > debugger is to perform the processor-specific computation to
> translate
> > the segment-address pair to the linear address.
> >
> > There is no need to do anything with segments in the line table,
> since
> > the line table contains addresses in the linear address space.
> >
> > There is some (perhaps considerable) confusion in terminology in the
> > x86
> > world, because the x86 has multiple segment registers which on other
> > processors would be called base registers.  The values in these
> > registers reference memory segments and are added to whatever offset
> is
> > contained in the program to generate an address.  These segment
> > registers, and the memory segments which they point to, are NOT the
> > segments represented by DW_AT_segment.
> >
> > Re "reading the segment selector" and "addrx encoding":  The
> addresses
> > in DWARF DIEs are static, not dynamic.  There is no register+offset
> > encoding, and processor registers are not read to determine where a
> > subprogram is in memory.
> >
> >
> > Sorry, I don't quite follow the connections between all those statements.
>
> Perhaps I didn't understand your comments about "reading the segment
> selector" and "addrx encoding".
>
> TL;DR:
> DW_AT_segment was designed to describe x86 memory model addresses:
> https://en.wikipedia.org/wiki/Intel_Memory_Model.
>
> Possibly other architectures can use it, but I'm not familiar with any
> that do.
>
>
> > 2.17 says that if a DIE has a DW_AT_high_pc and DW_AT_segment, then the
> > high_pc is relative to the specified segment. That's a bit redundant if
> > high_pc uses FORM_addrx, because the address in the address pool can
> > specify its own segment, but a producer could choose which way to go
> > there. (presumably if the AT_segment is there, you should interpret the
> > addrx high_pc relative to that segment - assuming debug_addr has no
> > segment selector in it - or perhaps it should go the other way and
> > ignore the local AT_segment and only rely on whatever segment is in
> > debug_addr)
>
> DW_FORM_addrx (and the .debug_addr section) were introduced in DWARF V5
> to allow compression of DW_FORM_addr addresses.  DW_AT_segment is
> intended to describe an (x86) address in the form that the processor
> uses.  The first is one of many different compression schemes in DWARF,
> the second is part of an architectural description.
>

debug_addr supports segment selectors - in the debug_addr header it has a
field for "segment selector size" and the entries in the address list are
"segment/address pairs".

So now there's two ways a segment selector for an address could be
specified - if you had a DW_TAG_subprogram with a DW_AT_low_pc using addrx
into a debug_addr with a non-zero segment selector and the subprogram also
had a DW_AT_segment, I wonder which one's meant to win.

Though mostly my point was: since debug_addr entries can have segment
selectors, then debug_rnglists can have different segments for different
subranges within a singular range list. But without that (either using
direct addresses, or in v4 debug_ranges) you couldn't vary segment across a
single range list. Though the debug_rnglist header does have a segment
selector size in it - it doesn't seem to use it anywhere in its format
(similarly, debug_loclists and debug_line v5 have a segment selector size,
but don't seem to use it?).
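
To show what "segment/address pairs" means in practice, here is a minimal
sketch of reading one debug_addr contribution. It assumes 32-bit DWARF,
little-endian data, and that the selector precedes the address within each
pair; parse_debug_addr is a name made up for this example.

```python
import struct


def parse_debug_addr(data: bytes):
    """Parse one DWARFv5 .debug_addr contribution (sketch; DWARF32, little-endian)."""
    unit_length, version, address_size, seg_size = struct.unpack_from("<IHBB", data, 0)
    if unit_length == 0xFFFFFFFF:
        raise NotImplementedError("64-bit DWARF is not handled in this sketch")
    end = 4 + unit_length  # unit_length does not count its own 4 bytes
    pos = 8                # first entry follows the 8-byte header
    entries = []
    while pos < end:
        # With a zero selector size the entries are plain addresses.
        segment = int.from_bytes(data[pos:pos + seg_size], "little") if seg_size else None
        pos += seg_size
        address = int.from_bytes(data[pos:pos + address_size], "little")
        pos += address_size
        entries.append((segment, address))
    return version, entries


# Hand-built table: 2-byte selectors, 4-byte addresses, two entries.
table = struct.pack("<IHBB", 16, 5, 4, 2) + struct.pack("<HIHI", 0x10, 0x1000, 0x20, 0x2000)
version, entries = parse_debug_addr(table)
```

With a table like this, two addrx operands in one range list can name
different segments, which is the variation that direct addresses (or v4
debug_ranges) can't express.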


>
> > On 7/15/20 4:31 PM, David Blaikie via Dwarf-Discuss wrote:
> >  > Looking at how segment selectors work:
> >  >
> >  > DW_AT_segment: Applies to a DIE subtree, including any ranges,
> > high/low
> >  > pc, locations, labels, etc
> >  > debug_range/loc (v4 and below): Doesn't seem to allow specifying
> > segment
> >  > variation - inherits from the segment given on the nearest parent
> > DIE
> >  > that refers to the entry
> >  > debug_rnglis

Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 6:35 PM Michael Eager  wrote:

> On 7/16/20 5:37 PM, David Blaikie wrote:
> >
> >
> > On Thu, Jul 16, 2020 at 4:25 PM Michael Eager 
> > The language used to describe segmented addressing in DW_AT_segment
> > reads to me like the same language used to describe segmented addresses
> > in debug_aranges - it reads to me like they refer to the same concept.
> > What aspect of the wording do you find distinguishes between these two
> > discussions on segmented addresses?
>
> They refer to the same segmented addresses.
>
> There are not two discussions of segmented addresses in the DWARF spec.
>
> > Reread my previous comment about the meaning of segment as used in
> > DW_AT_segment.   You apparently want it to mean something different.
> >
> >
> > There seems to be some significant disconnect between how we're each
> > understanding these concepts - could you help me understand/perhaps use
> > different language that might help me understand/connect with your
> > reading/understanding here? I'm trying my best to understand what the
> > spec is saying/how to interpret it.
>
> DW_AT_segment was intended to support addressing schemes like that used
> by x86 models.  The description is somewhat generalized, but that is the
> core of the concept.  If Intel used different terminology for their
> scheme, we would have used a different name.  In their usage, a
> segmented address maps (many-to-one) on a linear address space.
>
> Whether DW_AT_segment can be used to support a more general concept of
> disjoint address spaces is unclear.  AFAIK, no one has used it for this
> purpose.  DW_AT_address_class has been used, as I understand it, but I'm
> not familiar with these architectures.
>
> > You've mentioned they can be used - but I'm still pretty confused by how
> > they would be used to achieve that result. Do you happen to know of an
> > implementation that uses them in this way/any examples of DWARF using
> > the feature? I think that'd be really helpful to ground the discussion
> > with concrete examples.
>
> I've given you a concrete example: x86.
>

I was hoping for steps to produce DWARF that uses the feature from an
existing producer, to see how that producer interpreted the wording we're
discussing.

I'll see if I can make GCC do this, but I'm pretty unfamiliar with GCC for
non-host architectures - wouldn't mind any more specific tips if you have
them.
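
For anyone following along, the many-to-one mapping mentioned above is easy
to see for 8086 real mode, where the linear address is segment * 16 +
offset. A small sketch (the function name is made up, and the 20-bit
wraparound is the original 8086 behavior, assumed here for simplicity):

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Map an 8086 real-mode segment:offset pair to a 20-bit linear address."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same byte.
a = real_mode_linear(0x1234, 0x0010)
b = real_mode_linear(0x1235, 0x0000)
```

Both pairs land on 0x12350, which is the aliasing that makes it unclear how
a table keyed on segmented addresses (like the line table) would work.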

> My reading still seems to indicate that all the dwarf sections are meant
> > to use segment-relative addresses (all the wording for address size says
> > it should be the segment-relative address size, not an address in an
> > alternative linear address space)
>
> The majority (that is, all but one) of architectures use flat address
> spaces and have segment size = 0.
>
> The fact that DWARF supports an architecture which has a non-zero
> segment size should not be taken (as you appear to) to imply that all
> architectures must be somehow compelled to use non-zero segment size.
> The availability of a feature does not mandate its use.
>

Sorry, I didn't mean to imply that. I think what I meant to imply is that
it seems like if a producer is using segmented addresses in one part of
their output, they may want or need to do so in all sections.

But yeah, fully don't expect to need/use segmented addressing at all in
most/current producers.

> It sounds like you're suggesting that an implementation may choose on a
> > per-section basis whether to use segmented addressing (because this
> > assumes an existing alternative linear addressing per x86) - and that
> > some sections /require/ the use of that linear addressing mode. Is that
> > the idea?
>
> An x86 memory model implementation might use segmented addressing in the
> .debug_info section, but not in the .debug_line section.
>
> Given the aliasing in x86 segmented memory, I'm not sure how it could be
> supported in the line table.
>

Yep, pretty confused about that - hopefully with a concrete example this'll
be more clear to me.

> But the segmented addressing section seems to say "addresses are
> > specified as offsets within a given segment rather than as locations
> > within a single flat address space" - sounds like it's talking about
> > systems where there is no flat address space, in which case the sections
> > requiring a linear addressing would present a problem when it comes to
> > rendering them in DWARF.
>
> You are reading more into the text than is there.  Indeed, the example
> given does have a linear address space.
>
> If you want a concrete example, x86.   If you want to postulate an
> architecture which cannot be described in DWARF, have at it, but it's
> not a fruitful or illuminating discussion.
>
> > DWARF uses the same description for segmented addresses almost
> > everywhere that an address is used.  This is for consistency.  The
> > meaning is the same everywhere.
> >
> >
> > If the meaning is the same everywhere, then it seems strange that 

Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 4:25 PM Michael Eager  wrote:

> On 7/16/20 2:57 PM, David Blaikie wrote:
> >
> >
> > On Thu, Jul 16, 2020 at 2:05 PM Michael Eager wrote:
> > You appear to be starting with a counterfactual premise and using
> that
> > to postulate a problem where none exists.
> >
> >
> > Sorry - I seem to be misunderstanding what you mean by "there are no
> > restrictions on where code or data are located" - I understood that to
> > mean different bits of code could be in different segments.
>
> It does no one good when you take a comment about one point
> (.debug_aranges) and contort it to apply to an unrelated point
> (DW_AT_segment).
>

The language used to describe segmented addressing in DW_AT_segment reads
to me like the same language used to describe segmented addresses in
debug_aranges - it reads to me like they refer to the same concept. What
aspect of the wording do you find distinguishes between these two
discussions on segmented addresses?


> Reread my previous comment about the meaning of segment as used in
> DW_AT_segment.   You apparently want it to mean something different.
>

There seems to be some significant disconnect between how we're each
understanding these concepts - could you help me understand/perhaps use
different language that might help me understand/connect with your
reading/understanding here? I'm trying my best to understand what the spec
is saying/how to interpret it.

> As previously stated, DW_AT_segment provides a way to represent x86
> > segmented addressing.  Each segmented address is mapped to an
> > address in
> > a linear address space.  The mapped address can be used in the
> ranges.
> >
> >
> > Where does the spec say that? How do we construct an answer to the
> > original question from this thread from the words in the spec?
>
> There are many things that the spec does not say about implementation.
> We sometimes suggest best practices, but we don't require implementors
> to follow them.  The DWARF spec is also not written to prevent
> implementors from doing things badly.
>
> Providing an example of the use of DW_AT_segment provides a strong hint
> of how it may be used, without constraining it to one specific
> architecture, or preventing it from being used in other ways.  When we
> have had discussions about whether to give more specific implementation
> details, we have frequently decided to let matters go with a hint,
> rather than a prescription.
>
> While we cannot prevent people from misreading or misinterpreting the
> DWARF spec, we try to answer questions as they arise.
>
> Having been told how address ranges might be implemented for x86
> segmented addresses, is there more to add?
>

You've mentioned they can be used - but I'm still pretty confused by how
they would be used to achieve that result. Do you happen to know of an
implementation that uses them in this way/any examples of DWARF using the
feature? I think that'd be really helpful to ground the discussion with
concrete examples.

My reading still seems to indicate that all the dwarf sections are meant to
use segment-relative addresses (all the wording for address size says it
should be the segment-relative address size, not an address in an
alternative linear address space)

It sounds like you're suggesting that an implementation may choose on a
per-section basis whether to use segmented addressing (because this assumes
an existing alternative linear addressing per x86) - and that some sections
/require/ the use of that linear addressing mode. Is that the idea?

But the segmented addressing section seems to say "addresses are specified
as offsets within a given segment rather than as locations within a single
flat address space" - sounds like it's talking about systems where there is
no flat address space, in which case the sections requiring a linear
addressing would present a problem when it comes to rendering them in DWARF.


>
> > I don't especially have a need for segmented addresses myself - so I may
> > not be the best person to push for changes/clarifications here - I was
> > trying to answer the original poster's question using what I can see
> > from the spec.
>
> It's rarely beneficial to get into a hypothetical debate.  If you have
> no use case and are just postulating problems for which there is no
> evidence, there isn't much to be gained pursuing this.
>

The use case seems to be the original poster's question - and some
questions/uncertainties I had when reading the spec to try to understand it
to answer the question.

> You appear to be reading the standard to mean something other than
> what
> > it says.  FORM_addrx is a method to compress FORM_addr, nothing more.
> >
> >
> > But it describes segmented addresses - it says so specifically, doesn't
> > it? Using the same wording that is used to describe segmented addresses
> > elsewhere in the spec.
>
> DWARF uses the same description for segmented addresses almost
> everywhere that an 

Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 2:05 PM Michael Eager  wrote:

> On 7/16/20 1:36 PM, David Blaikie wrote:
> >
> >
> > On Thu, Jul 16, 2020 at 1:07 PM Michael Eager wrote:
> >
> >  > Perhaps it's more like Paul was postulating - that the spec
> > assumes code
> >  > is in a code segment/doesn't need to be clarified. (but that gets
> > a bit
> >  > confused in debug_aranges - if it only is meant to contain code
> (not
> >  > data), why does it need a segment selector - and also in the DIEs
> > - if
> >  > code is always in a known/assumable segment then why can you vary
> >  > segment for low_pc/high_pc/ranges?)
> >
> > No, the spec says what it says.   There are no restrictions on where
> > code
> > or data are located.
> >
> >
> > OK - then if subprograms can be in different segments, how would the
> > ranges on the CU be used to describe that? It seems to me that a range
> > list can't contain regions in more than one segment, which presents a
> > problem for describing such a situation, no?
>
> You appear to be starting with a counterfactual premise and using that
> to postulate a problem where none exists.
>

Sorry - I seem to be misunderstanding what you mean by "there are no
restrictions on where code or data are located" - I understood that to mean
different bits of code could be in different segments.


> As previously stated, DW_AT_segment provides a way to represent x86
> segmented addressing.  Each segmented address is mapped to an address in
> a linear address space.  The mapped address can be used in the ranges.
>

Where does the spec say that? How do we construct an answer to the original
question from this thread from the words in the spec?


>
> > (speaking of header consistency - it's a pity the debug_macro section
> > didn't end up with a more consistent header that started with a length,
> > then a version - without the length prefix it'll be hard to skip over
> > these in a consumer (especially something like a dumper) if their
> > version is unknown (in future versions, for instance))
>
> Rather than lard a thread with extraneous comments about unrelated
> issues, submit a comment on the DWARF Standard Public Comment page:
> http://dwarfstd.org/Comment.php


Fair point, for sure - sorry about that. Done.


>
>
> >  > debug_addr supports segment selectors - in the debug_addr header
> > it has
> >  > a field for "segment selector size" and the entries in the
> > address list
> >  > are "segment/address pairs.".
> >  >
> >  > So now there's two ways a segment selector for an address could be
> >  > specified - if you had a DW_TAG_subprogram with a DW_AT_low_pc
> using
> >  > addrx into a debug_addr with a non-zero segment selector and the
> >  > subprogram also had a DW_AT_segment, wonder which one's meant to
> win.
> >
> > Again, FORM_addrx doesn't mean the same as DW_AT_segment.
> >
> >
> > The spec seems to use exactly the same language. All the addresses seem
> > to say they only contain the offset portion of the address - so they'd
> > all need a segment selector to resolve which segment they should be
> > relative to, right? But some of them don't have a segment selector -
> > though the spec says loc/range/etc should be relative to the segment on
> > the DIE that references the loclist/rnglist/address - but then the
> > debug_addr has redundant (or possibly contradictory... ) segment
> > selectors and the range/loclists could only describe things in one
> > segment (so you couldn't use CU ranges to describe one function in one
> > segment and another function in a different segment)
>
> There are always ambiguities in written text.  If you have a specific
> comment about wording in the DWARF Specification, please submit a Public
> Comment.


I don't especially have a need for segmented addresses myself - so I may
not be the best person to push for changes/clarifications here - I was
trying to answer the original poster's question using what I can see from
the spec.

> They are orthogonal concepts.  Compression techniques, like
> FORM_addrx,
> > should not be used to describe architectural features.
> >
> >
> > I agree there clearly shouldn't be something you can express in a
> > debug_rnglist or debug_loclist (or a DIE attribute using FORM_addrx)
> > with the *x forms that you can't describe with the non-*x forms, but
> > from the semantics represented, it looks like that's the case, even if
> > it's not how it should be.
>
> There are no semantics represented in DWARF.
>

Semantics in the sense that a range list can refer to an address and that
address can have a segment selector. The meaning of the bits in the file -
not higher level semantics of "what is a class" (whatever a producer and
consumer agree that it is) but "these bits are specified to describe a
range using this address and this address has a segment selector".

You appear to be reading the 

Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 1:07 PM Michael Eager  wrote:

> > Perhaps it's more like Paul was postulating - that the spec assumes code
> > is in a code segment/doesn't need to be clarified. (but that gets a bit
> > confused in debug_aranges - if it only is meant to contain code (not
> > data), why does it need a segment selector - and also in the DIEs - if
> > code is always in a known/assumable segment then why can you vary
> > segment for low_pc/high_pc/ranges?)
>
> No, the spec says what it says.   There are no restrictions on where code
> or data are located.
>

OK - then if subprograms can be in different segments, how would the ranges
on the CU be used to describe that? It seems to me that a range list can't
contain regions in more than one segment, which presents a problem for
describing such a situation, no?

> AFAIK, all addresses can be segmented addresses, except in the line
> > table where it isn't needed.
> >
> > Perhaps we should have (long ago) required flat/linear addresses for
> > x86
> > instead of segmented addresses.
> >
> >
> > What's the line table's segment_selector_size (in the DWARFv5 header)
> > for? (this sort of agrees and disagrees with you - it's there, but it's
> > not used in any part of the debug_line format that I can see)
>
> It may be there for consistency across all headers.
>

Seems a bit quirky at best, given the value is unused.
(speaking of header consistency - it's a pity the debug_macro section
didn't end up with a more consistent header that started with a length,
then a version - without the length prefix it'll be hard to skip over these
in a consumer (especially something like a dumper) if their version is
unknown (in future versions, for instance))
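To illustrate why the leading length matters to a dumper, here is a minimal
sketch (not taken from any real consumer) of walking a section whose units
start with unit_length followed by a 2-byte version. The function name
iter_units is made up for this example, and it assumes little-endian data.
A unit with an unknown version can still be stepped over, which is not
possible when the header starts with the version alone.

```python
import struct


def iter_units(section: bytes):
    """Yield (offset, version, body) for each length-prefixed unit.

    Assumes little-endian data and a header that begins with unit_length
    (4 bytes, or 0xffffffff followed by an 8-byte length for 64-bit DWARF)
    and then a 2-byte version.
    """
    pos = 0
    while pos < len(section):
        start = pos
        (length,) = struct.unpack_from("<I", section, pos)
        pos += 4
        if length == 0xFFFFFFFF:  # 64-bit DWARF escape value
            (length,) = struct.unpack_from("<Q", section, pos)
            pos += 8
        end = pos + length  # unit_length counts the bytes after itself
        (version,) = struct.unpack_from("<H", section, pos)
        # The body is handed back undecoded; a caller that does not know
        # this version can simply ignore it and move on to the next unit.
        yield start, version, section[pos + 2:end]
        pos = end
```

The point is only that the skip to the next unit never needs to understand
the unit's contents.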


> > debug_addr supports segment selectors - in the debug_addr header it has
> > a field for "segment selector size" and the entries in the address list
> > are "segment/address pairs.".
> >
> > So now there's two ways a segment selector for an address could be
> > specified - if you had a DW_TAG_subprogram with a DW_AT_low_pc using
> > addrx into a debug_addr with a non-zero segment selector and the
> > subprogram also had a DW_AT_segment, wonder which one's meant to win.
>
> Again, FORM_addrx doesn't mean the same as DW_AT_segment.
>

The spec seems to use exactly the same language. All the addresses seem to
say they only contain the offset portion of the address - so they'd all
need a segment selector to resolve which segment they should be relative
to, right? But some of them don't have a segment selector - though the spec
says loc/range/etc should be relative to the segment on the DIE that
references the loclist/rnglist/address - but then the debug_addr has
redundant (or possibly contradictory... ) segment selectors and the
range/loclists could only describe things in one segment (so you couldn't
use CU ranges to describe one function in one segment and another function
in a different segment)

2.12 Segmented Addresses:
"The description evaluates to the segment selector of the item being
described."

6.1.2 Lookup by Address: (this one it sounds like you're saying is meant to
refer to the same kind of segmenting as 2.12? To allow lookup by address
across data and code, for instance, that would be in different segments?)
"address_size (ubyte) The size of an address in bytes on the target
architecture. For segmented addressing, this is the size of the offset
portion of the address."
"segment_selector_size (ubyte): The size of a segment selector in bytes on
the target architecture. If the target system uses a flat address space,
this value is 0."

6.2.4: The Line Number Program Header
"address_size (ubyte) A 1-byte unsigned integer containing the size in
bytes of an address (or offset portion of an address for segmented
addressing) on the target system"
"segment_selector_size (ubyte) A 1-byte unsigned integer containing the
size in bytes of a segment selector on the target system."
"*The segment_selector_size field is new in DWARF Version 5. It is needed
in combination with the address_size field to accurately characterize the
address representation on the target system.*"
(but doesn't seem to actually use the segment selector size anywhere? so it
assumes some known constant/implementation-defined segment?)

6.4.1 Structure of Call Frame Information
"segment_selector_size (ubyte) The size of a segment selector in this CIE
and any FDEs that use it, in bytes"
"initial_location (segment selector and target address) The address of
the first location associated with this table entry. If the
segment_selector_size field of this FDE’s CIE is non-zero, the initial
location is preceded by a segment selector of the given length."
6.4.2.1: "If the segment_selector_size field of this FDE’s CIE is non-zero,
the initial location is preceded by a segment selector of the given
length."

7.5.1.1 Full and Partial Compile Unit Headers:
"address_size (ubyte) A 1-byte unsigned integer representing the size in

Re: [Dwarf-Discuss] modeling different address spaces

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 12:55 PM Michael Eager  wrote:

> On 7/16/20 11:51 AM, David Blaikie wrote:
> >
> >
> > On Thu, Jul 16, 2020 at 11:41 AM Robinson, Paul via Dwarf-Discuss
> > The example that most often comes up is Harvard architectures.  As it
> > happens, I think it's nearly always obvious from context whether a
> given
> > address is data-segment or code-segment.  The only time it's not,
> > that I'm
> > aware of, is in the .debug_aranges section, where addresses are
> > associated
> > with compile-units without any indication of whether they are code
> > or data
> > addresses.  I've heard arguments that .debug_aranges should only
> > have code
> > addresses in it, but I don't think that's what the spec says.
> >
> >
> > Curious - the spec doesn't seem to read that way to me - and if that
> > were its goal, it seems like DW_AT_segment wouldn't really be needed.
>
> As Paul said, DW_AT_segment is not generally needed to describe a
> Harvard architecture.
>
> > DW_AT_location would always be data, DW_AT_high/low/ranges would always
> > be code, etc? The spec... specifically says DW_AT_segment applies to
> > high/low/ranges, and describes a parent-DIE delegation scheme that seems
> > to suggest that some DIEs could have one segment, and others could have
> > a different segment - but no way for ranges to have different segments
> > in different parts of the same range list, which seems to be at odds
> > with the ability to vary segment across a DIE tree - you couldn't put a
> > ranges at the top of such a variegated tree...
>
> On some (many?) architectures, data and code may be interspersed, for
> example, to place constant data with the executable text.
>
> DIEs can have different DW_AT_segments.
>
> Entries in .debug_aranges have segment, offset, and length.
>
> What is the use case for having multiple segments in a range list?
>

The same as the use case for varying the segment per high_pc, I'd have
thought - if code is in different segments. But if you're suggesting the
only valid use of DW_AT_segment is to always have it have the same value on
any DIE with ranges/low/high pc, and always the same (but possibly a
distinct value) for DW_AT_location-having DIEs, fair enough. The spec
doesn't seem to say that, though. It suggests it could vary more than that?

Guess for things like the original poster's case (though, like Paul, I've
not read AMD's proposal here) - GPU code with a separate address space from
CPU code, for instance.


> > (& yeah, the arange situation crossed my mind too - on both counts you
> > noted (that it needs it, but that it may not - because some
> > interpretations suggest it should only contain code addresses anyway))
>
> As Paul mentioned, that's not what the spec says.
>
> > & not sure how any of this resolves the "but debug_addr has segment
> > selectors"
> >
> > nor "what's the point of segment selector size in debug_rnglists,
> > debug_loclists, and debug_line" - none of those sections seem to contain
> > segment selectors, so why do their headers describe the size of such a
> > thing?
>
> Location lists contain segment, offset, and length.
>

I don't see any mention of segment in 2.6.2 - or do you mean base address
selection entries? Yeah, you could use those (& can use them even in an
unsegmented situation) & can use those in debug_ranges too. So, again, if
the notion is that everything's really in a contiguous address space &
segment selectors are a convenience - that lines up here.

loc/range and loclist/rnglist seem to use the same encoding for their
address ranges (loc/loclist just have the extra location tacked on after
the address range description) - as far as I've read.


>
> --
> Michael Eager
>
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] modeling different address spaces

2020-07-16 Thread David Blaikie via Dwarf-Discuss
On Thu, Jul 16, 2020 at 11:41 AM Robinson, Paul via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> (resending, this time without dropping the list from the cc: grump grump)
>
> > -----Original Message-----
> > From: Dwarf-Discuss  On Behalf
> > Of Michael Eager via Dwarf-Discuss
> > Sent: Thursday, July 16, 2020 2:12 PM
> > To: todd.al...@concurrent-rt.com; Metzger, Markus T
> > 
> > Cc: dwarf-discuss@lists.dwarfstd.org
> > Subject: Re: [Dwarf-Discuss] modeling different address spaces
> >
> > On 7/16/20 10:06 AM, Todd Allen via Dwarf-Discuss wrote:
> > > Markus, Michael, David, Xing,
> > >
> > > I always assumed that the segment support in DWARF was meant to be more
> > general,
> > > and support architectures where there was no single flat memory, and so
> > the
> > > segments were necessary for memory accesses.  I personally have not
> > dealt with
> > > any architectures where DW_AT_segment came into play, though.
> >
> > It is phrased in a way to make it less architecturally specific.  That's
> > in keeping with our desire to prevent DWARF from including architecture
> > specific specifications.  For example, we don't want to say "on ARM do
> > this" but on "MIPS do that".  DWARF doesn't specify how the translation
> > from segmented to linear addresses is done.
>
> The example that most often comes up is Harvard architectures.  As it
> happens, I think it's nearly always obvious from context whether a given
> address is data-segment or code-segment.  The only time it's not, that I'm
> aware of, is in the .debug_aranges section, where addresses are associated
> with compile-units without any indication of whether they are code or data
> addresses.  I've heard arguments that .debug_aranges should only have code
> addresses in it, but I don't think that's what the spec says.
>

Curious - the spec doesn't seem to read that way to me - and if that were
its goal, it seems like DW_AT_segment wouldn't really be needed.
DW_AT_location would always be data, DW_AT_high/low/ranges would always be
code, etc? The spec... specifically says DW_AT_segment applies to
high/low/ranges, and describes a parent-DIE delegation scheme that seems to
suggest that some DIEs could have one segment, and others could have a
different segment - but no way for ranges to have different segments in
different parts of the same range list, which seems to be at odds with the
ability to vary segment across a DIE tree - you couldn't put a ranges at
the top of such a variegated tree...

(& yeah, the arange situation crossed my mind too - on both counts you
noted (that it needs it, but that it may not - because some interpretations
suggest it should only contain code addresses anyway))

& not sure how any of this resolves the "but debug_addr has segment
selectors"

nor "what's the point of segment selector size in debug_rnglists,
debug_loclists, and debug_line" - none of those sections seem to contain
segment selectors, so why do their headers describe the size of such a
thing?


> --paulr


Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-15 Thread David Blaikie via Dwarf-Discuss
On Wed, Jul 15, 2020 at 8:00 PM Xing GUO  wrote:

> Hi David,
>
> On 7/16/20, David Blaikie  wrote:
> > Looking at how segment selectors work:
> >
> > DW_AT_segment: Applies to a DIE subtree, including any ranges, high/low
> pc,
> > locations, labels, etc
> > debug_range/loc (v4 and below): Doesn't seem to allow specifying segment
> > variation - inherits from the segment given on the nearest parent DIE
> that
> > refers to the entry
> > debug_rnglist/loclist (v5): includes segment selector size in the header,
> > but doesn't seem to use it - segment selection via addresses in the
> address
> > pool (RLE/LLE_*x encodings) would allow fine-grained segment selection,
> but
> > direct address forms don't seem to allow segment selection ("This operand
> > is the same size as used in DW_FORM_addr.")
>
> Thanks for your explanation.
>

Note: there seems to be some disagreement with my understanding - and I
haven't been around these parts nearly as long as some other folk - so
there's a fair chance I'm wrong/misunderstanding/misreading things.


> So, if we want to use segmented addresses in the debug_rnglist or
> debug_loclist table, we should use encodings like: startx_endx,
> startx_length, etc., to index the {segment/address} pairs in the
> address table of the .debug_addr section rather than the direct
> address forms like: start_end, start_length, etc., right?
>

That's /my/ understanding. (side note: imho: it's good to use the addrx
forms anyway - since you can reduce relocations (& thus object size) that
way - try some experiments with Clang's DWARFv5 support to see how I
implemented it there/how I think it should be done (admittedly I have a
bias towards reducing object size especially, since that's a particular
issue for my customers))
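A rough sketch of that reading, for what it's worth: decode a .debug_addr
contribution into {segment, address} pairs, then resolve DW_RLE_startx_endx
entries through it. This assumes 32-bit DWARF, little-endian data, and that
the selector comes first in each pair (as the "segment/address pairs" wording
suggests). The function names are made up for the example, and only the one
range list entry kind is handled.

```python
import struct

DW_RLE_end_of_list = 0x00
DW_RLE_startx_endx = 0x02


def parse_debug_addr(data: bytes):
    """Return the (segment, address) pairs of one .debug_addr contribution."""
    unit_length, _version, addr_size, seg_size = struct.unpack_from("<IHBB", data, 0)
    pos, end = 8, 4 + unit_length  # unit_length excludes its own 4 bytes
    pool = []
    while pos < end:
        # With seg_size == 0 the slice is empty and the selector reads as 0.
        seg = int.from_bytes(data[pos:pos + seg_size], "little")
        pos += seg_size
        addr = int.from_bytes(data[pos:pos + addr_size], "little")
        pos += addr_size
        pool.append((seg, addr))
    return pool


def read_uleb128(data: bytes, pos: int):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def resolve_rnglist(entries: bytes, pool):
    """Resolve startx_endx entries to ((seg, start), (seg, end)) tuples."""
    pos, ranges = 0, []
    while True:
        kind = entries[pos]
        pos += 1
        if kind == DW_RLE_end_of_list:
            return ranges
        if kind != DW_RLE_startx_endx:
            raise NotImplementedError(f"entry kind {kind:#x} not handled in this sketch")
        start_idx, pos = read_uleb128(entries, pos)
        end_idx, pos = read_uleb128(entries, pos)
        ranges.append((pool[start_idx], pool[end_idx]))
```

With the *x forms each range carries its own selector via the pool, so one
list can name ranges in two different segments; the direct address forms
have no operand in which to put one.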


> Besides, the segment_selector_size is just an indicator that tells us
> if the current debug_rnglist/debug_loclist table is using segmented
> addresses, and it should have no effect on parsing the encoding and
> operands (e.g., {DW_RLE_startx_endx, operands0, operands1},
> {DW_RLE_start_end, operands0, operands1}) in the entries.
>

None that I can see - which seems suspicious to me. (similarly in
debug_line - there's a segment_selector_size in the header, but no use that
I know of in the actual parsing of the implementation... )


>
> > debug_addr: segment_size in header, then list of {segment selector,
> > address}
> > debug_aranges: segment_size in header, then the list contains triples
> > of {segment selector, start address, length}
> > debug_line: v5 encodes the address and segment selector size in the
> header,
> > but I'm not sure if/how it's used. The DW_LNE_set_address operation says:
> > "The DW_LNE_set_address opcode takes a single relocatable address as an
> > operand. The size of the operand is the size of an address on the target
> > machine. It sets the address register to the value given by the
> relocatable
> > address and sets the op_index register to 0." - doesn't sound like it's
> > reading the segment selector there.
> >
> > So... I don't think DWARFv5 made anything worse - if anything it did
> enable
> > /a/ way to use fine grained segment selectors in range lists and location
> > lists that doesn't appear, to me, to have been provided before. (it could
> > be needed if you had some functions in some segment and some functions in
> > another segment (which could be represented at the subprogram DIE level -
> > DW_AT_segment 1 on one DW_TAG_subprogram, DW_AT_segment 2 on another
> > DW_TAG_subprogram - but how would you represent the DW_AT_ranges for this
> > CU (in DWARFv4, or in DWARFv5 without using addrx encodings)? I don't
> > know how, because I don't think debug_ranges could describe one range
> > list entry as being from one segment, and another range list entry as
> > being in another segment - they would all be in whatever segment was in
> > DW_AT_segment on the CU)
> >
> > does that make sense? Have I missed something about how you could use
> > segment selectors in a debug_loc, debug_ranges, or loclist/rnglist that
> > isn't using an addrx encoding?
> >
> > On Wed, Jul 15, 2020 at 6:37 AM Robinson, Paul via Dwarf-Discuss <
> > dwarf-discuss@lists.dwarfstd.org> wrote:
> >
> >>
> >>
> >> > -----Original Message-----
> >> > From: Dwarf-Discuss  On
> >> > Behalf
> >> > Of Xing GUO via Dwarf-Discuss
> >> > Sent: Tuesday, July 14, 2020 10:39 PM
> >> > To: dwarf-discuss@lists.dwarfstd.org
> >> > Subject: [Dwarf-Discuss] Segment selectors for the range list table.
> >> >
> >> > Hi there,
> >> >
> >> > The DWARFv5 spec mentioned that there might be segment selectors in
> >> > the range list entries and when the segment_selector_size is 0, the
> >> > segment selectors are omitted from the range list entries. However, it
> >> > didn't mention how the segment selector should be encoded when the
> >> > segment_selector_size isn't 0. Can anyone help me figure it out?
> >> > Thanks a lot!
> >>
> >> Hi Xing,
> >>
> >> The segment selectors 

Re: [Dwarf-Discuss] Segment selectors for the range list table.

2020-07-15 Thread David Blaikie via Dwarf-Discuss
On Wed, Jul 15, 2020 at 7:07 PM Michael Eager via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Segmented addresses have been in the DWARF specification since Version 2
>   and AFAIK have not been changed since that time.  DWARF V5 did not add
> any functionality to segmented addresses that was not present in DWARF
> V2/3.  At least, there was no intention to do so.  Segmented addresses
> are described in Section 2.12.
>
> A segmented address maps into a linear address in a processor-specific
> fashion.


That seems at odds with the non-normative text of 2.12 "In some systems,
addresses are specified as offsets within a given segment rather than as
locations within a single flat address space."

And also would be confusing to me - if there is a contiguous linear address
space, why would DWARF need to specify the use of a segment selector, and
why do some references to addresses allow the inclusion of a segment
selector and some don't? Why not just always use the non-segmented address
description for DWARF?

& I don't find any mention of this idea that some addresses are absolute
and some are segment-relative in 2.12 - it does say that "If none of the
entries in the chain of parents for this entry back to its containing
compilation unit entry have DW_AT_segment attributes, then the entry is
assumed to exist within a flat address space." - as though a flat (I assume
this is synonymous with "linear"?) address space is distinct from the
segmented address space being discussed otherwise?
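The parent-chain rule quoted from 2.12 can be sketched as below. This is
only my reading of the text: the Die class and effective_segment are
hypothetical names, and DW_AT_segment is simplified to a plain integer
(in the spec it is a location description that evaluates to the selector).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    tag: str
    segment: Optional[int] = None   # simplified value of DW_AT_segment, if present
    parent: Optional["Die"] = None  # None for the compilation unit entry


def effective_segment(die: Optional[Die]) -> Optional[int]:
    """Return the nearest DW_AT_segment on the entry or its parents.

    None means no entry in the chain back to the compilation unit has
    DW_AT_segment, i.e. the entry is assumed to be in a flat address space.
    """
    while die is not None:
        if die.segment is not None:
            return die.segment
        die = die.parent
    return None
```

Under this reading a variable nested in a subprogram with DW_AT_segment 1
resolves to segment 1, while a sibling subprogram with no DW_AT_segment
anywhere in its chain is treated as flat.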


> AFAIK, only the Intel 8086 and descendants have this
> functionality.  (It's a many to one mapping in the 8086 implementation,
> but that's a problem for a bygone era.)  There's a reference to i386
> memory models in Table 2.7.
>
> DWARF assumes a linear address space.  A segmented address maps to a
> specific address in this linear address space.  The entries in
> DW_AT_ranges for subprograms with different segment addresses would
> usually be referenced by their address in the linear address space.  If
> DW_AT_ranges has a DW_AT_segment, this is an indication that the
> debugger is to perform the processor-specific computation to translate
> the segment-address pair to the linear address.
>
> There is no need to do anything with segments in the line table, since
> the line table contains addresses in the linear address space.
>
> There is some (perhaps considerable) confusion in terminology in the x86
> world, because the x86 has multiple segment registers which on other
> processors would be called base registers.  The values in these
> registers reference memory segments and are added to whatever offset is
> contained in the program to generate an address.  These segment
> registers, and the memory segments which they point to, are NOT the
> segments represented by DW_AT_segment.
>
> Re "reading the segment selector" and "addrx encoding":  The addresses
> in DWARF DIEs are static, not dynamic.  There is no register+offset
> encoding, and processor registers are not read to determine where a
> subprogram is in memory.
>

Sorry, I don't quite follow the connections between all those statements.

2.17 says that if a DIE has a DW_AT_high_pc and DW_AT_segment, then the
high_pc is relative to the specified segment. That's a bit redundant if
high_pc uses FORM_addrx, because the address in the address pool can
specify its own segment, but a producer could choose which way to go there.
(presumably if the AT_segment is there, you should interpret the addrx
high_pc relative to that segment - assuming debug_addr has no segment
selector in it - or perhaps it should go the other way and ignore the local
AT_segment and only rely on whatever segment is in debug_addr)
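For reference, the "many to one" 8086 mapping mentioned in the quoted text
is the real-mode one: the linear address is segment * 16 + offset, wrapped
to 20 bits on the original 8086. The sketch below shows only that
arithmetic; the function name is made up, and whether a DWARF consumer is
expected to apply this particular mapping is exactly what the spec leaves
as processor-specific.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Map an 8086 real-mode segment:offset pair to a 20-bit linear address."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs naming the same linear address,
# which is what makes the mapping many-to-one.
same_a = real_mode_linear(0x1234, 0x0010)
same_b = real_mode_linear(0x1235, 0x0000)
```

Both pairs above come out at 0x12350, so a linear address alone cannot be
turned back into a unique segment:offset pair.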


>
> On 7/15/20 4:31 PM, David Blaikie via Dwarf-Discuss wrote:
> > Looking at how segment selectors work:
> >
> > DW_AT_segment: Applies to a DIE subtree, including any ranges, high/low
> > pc, locations, labels, etc
> > debug_range/loc (v4 and below): Doesn't seem to allow specifying segment
> > variation - inherits from the segment given on the nearest parent DIE
> > that refers to the entry
> > debug_rnglist/loclist (v5): includes segment selector size in the
> > header, but doesn't seem to use it - segment selection via addresses in
> > the address pool (RLE/LLE_*x encodings) would allow fine-grained segment
> > selection, but direct address forms don't seem to allow segment
> > selection ("This operand is the same size as used in DW_FORM_addr.")
> > debug_addr: segment_size in header, then list of {segment selector,
> address}
> > debug_aranges: segment_size in header, then the list contains
> > triples of {segment selector, start address, length}

Re: [Dwarf-Discuss] end_seq row at same address as previous row

2020-07-09 Thread David Blaikie via Dwarf-Discuss
I think LLVM produces some cases like this (maybe not at sequence end,
but for other instructions: it emits two copies without an advance PC
between them, though maybe with an advance line, etc.). In both cases,
yeah, I'd consider this to be probably-valid-but-trivially-inefficient
output. The way I think of reading this is that the line table
describes instructions in the (empty) range [0x4004c6, 0x4004c6) as
being on line 30.
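To make that reading concrete, here is a small simulation of the program
from the readelf output quoted below. It is a sketch of just the registers
involved (address, line, end_sequence), assuming minimum_instruction_length
is 1 and a default line of 1; the operation names and helper functions are
made up for the example.

```python
def run_line_program(ops):
    """Run a simplified line number program; return (address, line, end_seq) rows."""
    address, line, rows = 0, 1, []
    for op, arg in ops:
        if op == "set_address":
            address = arg
        elif op == "advance_line":
            line += arg
        elif op == "advance_pc":
            address += arg
        elif op == "copy":
            rows.append((address, line, False))
        elif op == "end_sequence":
            rows.append((address, line, True))
            address, line = 0, 1  # registers reset after a sequence ends
    return rows


def row_ranges(rows):
    """Pair each non-terminating row with the next row's address: (low, high, line)."""
    return [(lo[0], hi[0], lo[1]) for lo, hi in zip(rows, rows[1:]) if not lo[2]]


PROGRAM = [
    ("set_address", 0x4004BA),
    ("advance_line", 10),
    ("copy", None),
    ("advance_pc", 12),
    ("advance_line", 19),
    ("copy", None),
    ("end_sequence", None),
]
```

row_ranges(run_line_program(PROGRAM)) gives line 11 for [0x4004ba, 0x4004c6)
and line 30 for the empty range [0x4004c6, 0x4004c6), so the second Copy
describes no instructions at all.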

On Thu, Jul 9, 2020 at 11:18 AM Tom de Vries via Dwarf-Discuss
 wrote:
>
> Hi,
>
> I came across the following line table program in gdb test-case
> dw2-ranges-base.exp:
> ...
> $ readelf -wl outputs/gdb.dwarf2/dw2-ranges-base/dw2-ranges-base
>
>  Line Number Statements:
>   [0x0154]  Extended opcode 2: set Address to 0x4004ba
>   [0x015f]  Advance Line by 10 to 11
>   [0x0161]  Copy
>   [0x0162]  Advance PC by 12 to 0x4004c6
>   [0x0164]  Advance Line by 19 to 30
>   [0x0166]  Copy
>   [0x0167]  Extended opcode 1: End of Sequence
> ...
>
> My understanding of this is as follows.
>
> The Copy followed by End-of-Sequence is incorrect.
>
> Both the Copy and the End-of-Sequence append a row to the matrix, each
> using the same address: 0x4004c6.  The Copy declares a target
> instruction at that address.  The End-of-Sequence declares that the
> sequence ends before that address.
>
> It's a contradiction that the target instruction is both part of the
> sequence (according to Copy) and not part of the sequence (according to
> End-of-Sequence).
>
> Can you confirm that this analysis is correct?
>
> If so, is there a standard term to describe this problem?
> Incorrect/malformed/invalid/non-conforming/non-sensical or some such?
>
> [ FWIW, gdb handles this type of line table program by deleting the row
> corresponding to the Copy.  The related comment mentions that it removes
> "empty lines" from the line table.  I don't understand the use of the
> term "empty line" for this. ]
>
> Thanks,
> - Tom


Re: [Dwarf-Discuss] Usage of STRP forms in DWO files.

2020-05-20 Thread David Blaikie via Dwarf-Discuss
my 2c: "If STRP forms are allowed only in DWO files which cannot be
combined into a DWP file, then a packaging utility should be smart enough
to detect such input files and reject them. As there is no simple sign for
that, the tool should analyze sections in input files to check if those
forms are actually used; that parsing will slow down the processing, but it
seems inevitable if we want the tool to be reliable."
That would be too expensive (not the worst - it could be accomplished by
scanning debug_abbrev only - but the point of the dwo/dwp format was to
avoid the packaging tool having to read much of the DWARF at all - index,
str_offsets/strings, and that's basically it).
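For a sense of what the debug_abbrev-only scan would involve, a minimal
sketch follows. The function names are made up; the constants are the
DWARF v5 encodings of DW_FORM_strp (0x0e) and DW_FORM_implicit_const
(0x21). It only reports whether any abbreviation declares DW_FORM_strp,
which is an approximation: a declared abbreviation might be unused, and
DW_FORM_indirect could hide the real form in .debug_info.dwo.

```python
DW_FORM_strp = 0x0E
DW_FORM_implicit_const = 0x21


def read_leb128_raw(data: bytes, pos: int):
    """Decode LEB128 bytes as unsigned; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def abbrev_uses_strp(data: bytes) -> bool:
    """Return True if any abbreviation declaration uses DW_FORM_strp."""
    pos = 0
    while pos < len(data):
        code, pos = read_leb128_raw(data, pos)
        if code == 0:  # end of one abbreviation table; another may follow
            continue
        _tag, pos = read_leb128_raw(data, pos)
        pos += 1  # DW_CHILDREN_yes / DW_CHILDREN_no byte
        while True:
            attr, pos = read_leb128_raw(data, pos)
            form, pos = read_leb128_raw(data, pos)
            if form == DW_FORM_implicit_const:
                # The constant is SLEB128, but it has the same byte length
                # rules, so the unsigned reader is enough to skip it.
                _, pos = read_leb128_raw(data, pos)
            if attr == 0 and form == 0:
                break
            if form == DW_FORM_strp:
                return True
    return False
```

Even this is more DWARF parsing than the packaging tool otherwise needs,
which is the cost being weighed above.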

I think it's probably not worth supporting strp in dwo files. I doubt
either Clang or GCC implement it/ever do that anyway, so probably easy
enough to remove that one sentence from the spec that allows it (& add some
wording to clarify that it's disallowed).

On Wed, May 20, 2020 at 8:26 AM Igor Kudrin via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> Hi all,
>
> It looks like there is an ambiguity in the DWARF standard concerning using
> STRP forms in DWO files.
>
> On the one hand, they are (conditionally) allowed:
> * Section B.2, fig. B.2, p. 278. There are other arcs to ".debug_str.dwo"
> apart from an arc from ".debug_str_offsets.dwo".
> * Section B.2, p. 279, l. 27-29: "(do) .debug_info.dwo to .debug_str.dwo.
> Attribute values of class string may have form DW_FORM_strp, whose value is
> an offset in the .debug_str.dwo section of the corresponding string."
> * Section F.1, p. 393, l. 1-3: "In a .dwo file, referring to a string
> using DW_FORM_strp is valid, but such use results in a file that cannot be
> incorporated into a package file (which involves string merging)."
>
> On the other hand, they are prohibited:
> * Section F.2.3, p. 403, l. 4-6: "In a split DWARF object file, all
> references to strings go through this table (there are no other offsets to
> .debug_str.dwo in a split DWARF object file). That is, there is no use of
> DW_FORM_strp in a split DWARF object file."
> * Section F.3, p. 409, l. 19-22: "Because all references to these strings
> use form DW_FORM_strx, the packaging utility only needs to adjust the
> string offsets in each .debug_str_offsets.dwo contribution after building
> the new .debug_str.dwo section."
>
> All these excerpts are from informative parts. I cannot find any direct
> allowance or prohibition in normative sections. There is a very indirect
> restriction in section 7.3.5, p. 190, l. 24-26: "The string table section
> in .debug_str.dwo contains all the strings referenced from DWARF attributes
> using any of the forms DW_FORM_strx, DW_FORM_strx1, DW_FORM_strx2,
> DW_FORM_strx3 or DW_FORM_strx4." Note that this excerpt does not say
> anything about referencing strings from a .debug_macro.dwo section.
>
> If STRP forms are allowed only in DWO files which cannot be combined into
> a DWP file, then a packaging utility should be smart enough to detect such
> input files and reject them. As there is no simple sign for that, the tool
> should analyze sections in input files to check if those forms are actually
> used; that parsing will slow down the processing, but it seems inevitable
> if we want the tool to be reliable.
>
> I would like to discuss possible ways to avoid that ambiguity in the
> standard. I see the following variants:
>
> 1. Prohibit STRP forms in all DWO files. That would probably be the
> simplest solution, but maybe it is too restrictive.
> 2. Allow only STRP or STRX forms in a DWO file, but not both. In that
> case, a packaging tool can use the absence of a .debug_str_offsets.dwo
> section with a non-empty .debug_str.dwo section as a sign that the input
> file uses STRP forms and easily reject it.
> 3. Allow STRP forms in all DWO files. Consequently, allow STRP forms in
> DWP files. A packaging tool is expected to be smart enough to merge strings
> even from such input files and update references in the corresponding
> sections, not only in .debug_str_offsets.dwo.
>
> Any thoughts?
>
> Best Regards,
> Igor Kudrin
> C++ Developer, Access Softek, Inc.
> ___
> Dwarf-Discuss mailing list
> Dwarf-Discuss@lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>
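
The detection heuristic from variant 2 in the quoted message can be sketched
as follows. This is only an illustration under that variant's assumption (a
DWO file uses either STRP or STRX forms, never both); the enum and function
names are hypothetical and do not come from any real packaging tool.

```cpp
// Hypothetical classification of one .dwo input, based only on which
// string-related sections are present and non-empty.
enum class DwoStringForms {
    NoStrings,      // no .debug_str.dwo contribution at all
    StrxOnly,       // strings are reached through .debug_str_offsets.dwo
    StrpSuspected   // strings exist but there is no offsets table
};

// Assumes variant 2: STRP and STRX forms are never mixed in one DWO file.
constexpr DwoStringForms classifyDwo(bool hasDebugStrDwo,
                                     bool hasDebugStrOffsetsDwo) {
    return !hasDebugStrDwo
               ? DwoStringForms::NoStrings
               : (hasDebugStrOffsetsDwo ? DwoStringForms::StrxOnly
                                        : DwoStringForms::StrpSuspected);
}

// A packaging tool following variant 2 would refuse StrpSuspected inputs.
constexpr bool canPackage(bool hasDebugStrDwo, bool hasDebugStrOffsetsDwo) {
    return classifyDwo(hasDebugStrDwo, hasDebugStrOffsetsDwo) !=
           DwoStringForms::StrpSuspected;
}
```

With this check the tool never has to parse .debug_info.dwo to find the
forms in use, which is the cost the quoted message is trying to avoid.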


Re: [Dwarf-Discuss] Discrepancy Between Implementation and Spec in Template Types

2020-04-10 Thread David Blaikie via Dwarf-Discuss
On Fri, Apr 10, 2020 at 4:20 PM Jay Kamat wrote:

> Ah, I see, that makes a lot of sense. However, I have a couple questions:
>
> > The DW_AT_type of v1 and the DW_AT_type of t2::m1 would need to point
> > to the same DIE, otherwise there would be much confusion about these being
> > different types, but being the same type in the DWARF
>
> Would a level of indirection here cause that much confusion? Consumers could
> effectively treat it like a typedef; if they wanted to get the concrete type
> they could just follow the template type. I'm guessing this probably conflicts
> with a lot of existing software and style though.
>
> So in this example:
>
> v1 -> [struct t1]
> m1 -> T2 -> [struct t1]
>

m1 isn't of type T2 though, it's of type t1 - so, mechanically/in more
literal detail, what would that look like in the DWARF?

I think it'd involve introducing a new concept - well, maybe we could use
alias templates as a proxy? But it'd be weird, right? The type DIE
describing t1 has to be able to refer to T2, the template parameter,
but it should probably be scoped wherever t1 is scoped, I guess. It's going
to get pretty weird/unwieldy, I'd think. Say something like this:

DW_TAG_compile_unit
  DW_TAG_structure_type // DIE X
    DW_AT_name "t1"
    DW_TAG_template_type_parameter
      DW_AT_name "T1"
      DW_AT_type "int"
  DW_TAG_structure_type
    DW_AT_name "t2"
    DW_TAG_template_type_parameter // DIE Y
      DW_AT_name "T2"
      DW_AT_type "int"
    DW_TAG_member
      DW_AT_name "m1"
      DW_AT_type // -> DIE Z
  DW_TAG_structure_type // DIE Z
    DW_AT_specification // -> DIE X
    DW_AT_declaration true // maybe? Not sure
    DW_AT_name "t1"
    DW_TAG_template_type_parameter
      DW_AT_type // -> DIE Y

That seems a bit unwieldy - and unclear /exactly/ how it'd work, so far as
I can tell.
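
The constraint behind this can be checked in C++ itself. The sketch below
assumes the template definitions from the earlier message read as shown (the
angle-bracketed parts were lost in the archive, so the parameter lists here
are a reconstruction), and the constant name is made up for illustration.

```cpp
#include <type_traits>

// Assumed reconstruction of the example from earlier in the thread.
template <typename T1> struct t1 {};
template <typename T2> struct t2 { t1<T2> m1; };

t1<int> v1;
t2<int> v2;

// Hypothetical name: true when the member's declared type and the
// variable's type are one and the same type.
constexpr bool kMemberTypeIsSameAsVariableType =
    std::is_same<decltype(t2<int>::m1), decltype(v1)>::value;

// Inside t2<int>, the member is written as t1<T2>, but after instantiation
// it is exactly t1<int>. The language has one type here, so a producer
// wants one DIE for it, and that DIE has no place to mention T2.
static_assert(kMemberTypeIsSameAsVariableType,
              "t2<int>::m1 and v1 share a single type");
```

Since the static_assert holds, any DWARF that gives m1 a separate type DIE
(DIE Z above) has to explain to consumers that it is still the same type as
DIE X, which is where the extra machinery comes from.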


> For software consuming DWARF right now, can we assume that the type referenced
> is always the concrete type, or should we account for the indirection that is
> described in the spec? Ideally consumers should be able to handle both, but
> not having an implementation makes things a little more difficult.
>

I think it's probably reasonable for an implementation to support this in
the ways that are simple/easy (when a template parameter isn't used as a
template argument - but just directly like for a member in your original
example) & probably not super hard to support? I could see how an
implementation might do that for a language that doesn't have the
complicated cases C++ does, maybe? Not sure.
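
On the consumer side, one way to tolerate both encodings is to look through
any DW_TAG_template_type_parameter before using a member's type. A minimal
sketch, assuming a toy in-memory DIE table: the Tag enum, the Die struct and
both functions are hypothetical and not taken from any real DWARF library.

```cpp
// Toy model: each DIE has a tag and the table index of its DW_AT_type
// target (-1 when it has none).
enum class Tag { BaseType, TemplateTypeParameter, Member };

struct Die {
    Tag tag;
    int type;
};

// Follow template type parameters until a concrete type DIE is reached.
constexpr int resolveConcrete(const Die* dies, int idx) {
    return (dies[idx].tag == Tag::TemplateTypeParameter && dies[idx].type >= 0)
               ? resolveConcrete(dies, dies[idx].type)
               : idx;
}

// Concrete type of a member, whichever way the producer encoded it.
constexpr int memberConcreteType(const Die* dies, int member) {
    return resolveConcrete(dies, dies[member].type);
}

// Encoding described in the spec example: comp -> T -> int.
constexpr Die kSpecStyle[] = {
    {Tag::BaseType, -1},              // 0: int
    {Tag::TemplateTypeParameter, 0},  // 1: T, DW_AT_type -> int
    {Tag::Member, 1},                 // 2: comp, DW_AT_type -> T
};

// Encoding reported for GCC and Clang in this thread: comp -> int.
constexpr Die kCompilerStyle[] = {
    {Tag::BaseType, -1},              // 0: int
    {Tag::TemplateTypeParameter, 0},  // 1: T, DW_AT_type -> int
    {Tag::Member, 0},                 // 2: comp, DW_AT_type -> int
};
```

Both tables resolve the member to DIE 0, so a consumer written this way does
not need to know which encoding the producer chose.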


> Lastly, would it be possible to modify the example in future versions of the
> spec to limit confusion, or at least add a note saying that that behavior is
> optional?
>

Seems reasonable - I don't know much about how the process of changing the
DWARF spec works. I'm a bit more of an implementor/spectator to the spec
process.


>
> Thanks,
> -Jay
>
> David Blaikie writes:
>
> > "quality of implementation" thing - but in general, even if a few bugs were
> > fixed/improvements were made to both Clang and GCC, it's going to be
> > hard/impossible to track certain things through templates in DWARF - for
> > similar reasons that it's hard to provide diagnostic messages that describe
> > types in the way the user wrote them (not impossible in a few cases - Clang
> > and GCC are getting better at saying "std::string" often instead of
> > "std::basic_string<...>" for instance)
> >
> > For instance - if you had a member that was an instantiation of another
> > template:
> >
> > template <typename T1> struct t1 { };
> > template <typename T2> struct t2 { t1<T2> m1; };
> > t1<int> v1;
> > t2<int> v2;
> >
> > The DW_AT_type of v1 and the DW_AT_type of t2::m1 would need to point
> > to the same DIE, otherwise there would be much confusion about these being
> > different types, but being the same type in the DWARF - so that type
> > description can't mention T2 (because T2 has nothing to do with t1) so
> > there's no way to describe that use of T2, for instance.
> >
> > Basically: Due to necessary canonicalization, this isn't doable in general,
> > so compilers don't bother doing it at all - roughly?
> >
> > On Thu, Apr 9, 2020 at 10:37 AM Jay Kamat via Dwarf-Discuss <
> > dwarf-discuss@lists.dwarfstd.org> wrote:
> >
> >> I wasn't on the list when I originally sent this message, and it didn't show
> >> up in the archive, so I'm sending it again. Sorry if there's duplication:
> >>
> >> Hi!
> >>
> >> I'm currently working on a debugger which consumes dwarf information
> >> and I noticed a possible discrepancy between output from popular
> >> compilers (gcc, clang) and the DWARF 5 and 4 spec.
> >>
> >> In section 'D.11 Template Example' for the given source:
> >>
> >> template<class T>
> >> struct wrapper {
> >>   T comp;
> >> };
> >> wrapper<int> obj;
> >>
> >> It says:
> >>
> >> The actual type of the component comp is int, but in the DWARF the
> >> type references the DW_TAG_template_type_parameter for T, which in
> >> turn references int. This 
