Re: [Dwarf-discuss] Proposal: Describe prologue and epilogue ranges

2024-03-19 Thread Robinson, Paul via Dwarf-discuss


Andrew Cagney wrote:

> > A single location description (which can be either simple or composite
> > location descriptions) has the lifetime of its closest containing scope.
> > The case we care about here is when that scope is a subprogram, and
> > therefore the lifetime spans the entire subprogram. Pedantically, that
> > lifetime includes prologue and epilogue ranges.
> >
> > It is common practice for unoptimized code to allocate local variables
> > to a stack frame, and use that stack location in the single location
> > description. Because the stack frame is not necessarily in a valid state
> > during prologue or epilogue code, in practice, debuggers typically assume
> > that a single location description is not valid during a prologue or
> > epilogue, although the DWARF spec does not explicitly say so (AFAIK).
> 
> Does this problem extend to instructions within a statement where a
> simple location can also be invalid?  For instance, given:
> 
>    load r1 from i    # i++
>    inc r1
> -> store r1 in i
> 
> an attempt to modify "i" would be trashed when the store instruction is
> executed.
> 
> I'm not sure if this should be mentioned in the standard though.
> Perhaps this is covered by "... and it does not move during its
> lifetime."

I don't see this case as any different from any other assignment.
"i" hasn't moved, it has been copied in order to do some computation.
The assignment doesn't actually occur until the store is executed.
In typical unoptimized code, you wouldn't stop between the "inc"
and the "store."
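
As a toy illustration only (the `memory` and `regs` dictionaries are invented
for this sketch and do not model any real debugger), the effect of a user
write to "i" just before the store looks like this:

```python
# Toy model of the three-instruction "i++" sequence. The dicts standing in
# for memory and registers are assumptions made for illustration.
def run_increment(memory, debugger_write=None):
    regs = {}
    regs["r1"] = memory["i"]      # load r1 from i
    regs["r1"] += 1               # inc r1
    if debugger_write is not None:
        # The debugger is stopped before the store and the user sets i.
        memory["i"] = debugger_write
    memory["i"] = regs["r1"]      # store r1 in i: overwrites the user's value
    return memory["i"]


print(run_increment({"i": 5}))                      # 6
print(run_increment({"i": 5}, debugger_write=100))  # 6, the user's 100 is lost
```

The location of "i" is the same before and after; only the pending store
makes the user's write short-lived.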

--paulr
-- 
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss


[Dwarf-discuss] Proposal: Describe prologue and epilogue ranges

2024-03-18 Thread Robinson, Paul via Dwarf-discuss
After today's call, hearing some viewpoints and hopefully learning a
few things, I thought I'd take a stab at reframing 240108.1. (Without
once mentioning CFI!) It ended up becoming an alternative proposal,
but I'm fine with Zoran taking it over if he wants to.

# Describe prologue and epilogue ranges

## Background

### Stopping Points

Ordinarily, a source-level debugger will prefer to pause execution of a
program at instructions identified by the compiler as good places to do
so. These include instructions flagged as `is_stmt`, `prologue_end`, or
`epilogue_begin`. A user expects debug info such as source coordinates
and variable locations to be sensible and useful at those points.

It is entirely possible for execution to pause at other instructions.
There are a number of possible reasons for this.

- The user has chosen to single-step instructions rather than statements.
- The user has requested a breakpoint at a specific instruction that
happens not to have any of the above flags.
- An asynchronous exception has occurred and the debugger intercepted it.
- The program has crashed and the user is looking at a core dump.

This list is not exhaustive.

Let's call the instruction where a debugger has paused execution (or
the instruction where a crash was triggered) a "stopping point."

### Prologue/Epilogue Ranges

In DWARF v3 thru v5, a subprogram's prologue(s) and epilogue(s) are
described indirectly by the line table. A prologue generally consists
of all instructions from an entry point up to the first executed
instruction that is flagged as `prologue_end`. An epilogue generally
consists of all instructions from an instruction flagged as
`epilogue_begin` to where the subprogram returns to its caller. These
groups of instructions implicitly form ranges. (These ranges might be
empty.)
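
A rough sketch of how a consumer might derive these implicit ranges from
line-table rows follows. The `Row` type and `implicit_ranges` function are
hypothetical names, and the sketch assumes a single entry point, contiguous
code, and that each epilogue runs to the next `epilogue_begin` or
`end_sequence` row, which is a simplification rather than anything the spec
guarantees.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One line-table row, reduced to the fields this sketch needs."""
    address: int
    prologue_end: bool = False
    epilogue_begin: bool = False
    end_sequence: bool = False


def implicit_ranges(rows, entry_address):
    """Return (prologues, epilogues) as lists of half-open (low, high) ranges."""
    prologues, epilogues = [], []
    prologue_open = True      # still looking for the first prologue_end
    epilogue_start = None     # address of the epilogue currently open, if any
    for row in rows:
        if prologue_open and row.prologue_end:
            prologues.append((entry_address, row.address))
            prologue_open = False
        if epilogue_start is not None and (row.epilogue_begin or row.end_sequence):
            epilogues.append((epilogue_start, row.address))
            epilogue_start = None
        if row.epilogue_begin:
            epilogue_start = row.address
    return prologues, epilogues


rows = [
    Row(0x1000),
    Row(0x1004, prologue_end=True),
    Row(0x1010),
    Row(0x1020, epilogue_begin=True),
    Row(0x1028, end_sequence=True),
]
prologues, epilogues = implicit_ranges(rows, entry_address=0x1000)
print(prologues)  # [(4096, 4100)]
print(epilogues)  # [(4128, 4136)]
```

If the entry-point row itself carries `prologue_end`, the sketch yields an
empty range, matching the parenthetical above.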

A subprogram might have multiple prologues if it has multiple entry
points; more often, it might have multiple epilogues if it has multiple
exit or return points. In particular, when there are multiple epilogues
it is not necessarily clear when an epilogue ends and the next basic
block (which might not be part of any epilogue) begins. (Even in the
case of a single epilogue, a cold but functional basic block might be
placed after the epilogue.)

Due to optimization, prologue or epilogue instructions might be mixed
with other instructions, so in practice prologue and epilogue ranges
might not be contiguous. DWARF does not have a way to describe these
non-contiguous prologue and epilogue ranges. Compilers typically have
various heuristics to pick stopping points for optimized prologue and
epilogue ranges.

### Single Location Descriptions

A single location description (which can be either a simple or a composite
location description) has the lifetime of its closest containing scope.
The case we care about here is when that scope is a subprogram, and
therefore the lifetime spans the entire subprogram. Pedantically, that
lifetime includes prologue and epilogue ranges.

It is common practice for unoptimized code to allocate local variables
to a stack frame, and use that stack location in the single location
description. Because the stack frame is not necessarily in a valid state
during prologue or epilogue code, in practice, debuggers typically assume
that a single location description is not valid during a prologue or
epilogue, although the DWARF spec does not explicitly say so (AFAIK).

## Overview

A stopping point might occur during a prologue or epilogue range, which
means single location descriptions for subprogram-scope objects might
not be valid.

- It would be good if the DWARF spec actually said single location
descriptions were not necessarily valid in those ranges. This is simply
codifying existing practice.
- It would be good if debuggers could reliably identify prologue and
epilogue ranges.

The proposal adds text that excludes prologues and epilogues from the
implicit range of a subprogram-scope object, and adds a register to the
line-table state machine to identify prologues and epilogues.

Unlike `prologue_end` and `epilogue_begin`, the new `prologue_epilogue`
register is "sticky" in that it is not automatically reset on every
row of the line table. At an entry point, it must be set explicitly to
indicate the beginning of a prologue; it is automatically reset by
DW_LNS_set_prologue_end. In an epilogue, it is automatically set by
DW_LNS_set_epilogue_begin, and reset by DW_LNE_end_sequence. This means
that for a function with one contiguous prologue and one contiguous
epilogue, terminated by `end_sequence`, the line-number program needs
only one new opcode to support `prologue_epilogue`.
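
To illustrate the intended "sticky" behavior, here is a small simulation.
It is a sketch under assumptions: `set_prologue_epilogue` is a made-up name
for the new opcode (this text does not name one), addresses start at 0
instead of using DW_LNE_set_address, and DW_LNS_advance_pc is treated as
adding its operand directly.

```python
def run_line_program(ops):
    """ops is a list of (opcode_name, operand) pairs; returns the emitted rows."""
    state = {"address": 0, "prologue_end": False,
             "epilogue_begin": False, "prologue_epilogue": False}
    rows = []

    def emit():
        rows.append(dict(state))
        # Existing behavior: these two flags are cleared after every row.
        state["prologue_end"] = False
        state["epilogue_begin"] = False
        # prologue_epilogue is sticky, so it is deliberately left alone here.

    for op, arg in ops:
        if op == "set_prologue_epilogue":        # hypothetical new opcode
            state["prologue_epilogue"] = True
        elif op == "DW_LNS_set_prologue_end":
            state["prologue_end"] = True
            state["prologue_epilogue"] = False   # automatic reset
        elif op == "DW_LNS_set_epilogue_begin":
            state["epilogue_begin"] = True
            state["prologue_epilogue"] = True    # automatic set
        elif op == "DW_LNS_advance_pc":
            state["address"] += arg
        elif op == "DW_LNS_copy":
            emit()
        elif op == "DW_LNE_end_sequence":
            emit()
            state.update(address=0, prologue_epilogue=False)
    return rows


ops = [
    ("set_prologue_epilogue", None), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 4), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 4), ("DW_LNS_set_prologue_end", None), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 8), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 8), ("DW_LNS_set_epilogue_begin", None), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 4), ("DW_LNS_copy", None),
    ("DW_LNS_advance_pc", 4), ("DW_LNE_end_sequence", None),
]
rows = run_line_program(ops)
for row in rows:
    print(hex(row["address"]), row["prologue_epilogue"])
```

Only the first operation in `ops` is new; every other change to the flag
falls out of opcodes the program would contain anyway.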

Note: I have not tried to determine whether this minimizes size in
practice. It might be that prologues and/or epilogues typically occupy
only one row of the line table, in which case having the flag reset on
every row might take up less space.

## Proposed Changes

In Section 2.6 "Location Descriptions" modify the last sentence of
item 

[Dwarf-discuss] Update 240118.1: Allow padding in all tables

2024-02-05 Thread Robinson, Paul via Dwarf-discuss
Per comments on the list, revise the proposed non-normative paragraph for 
.debug_abbrev as follows.

### .debug_abbrev

In Section 7.5.3 "Abbreviations Tables" (p.207), at the end of the
section, add a new non-normative paragraph:

*An abbreviations table may be padded or aligned by adding 0 bytes
at the end.*
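
As a sketch of what a producer might do (the function name and the sample
table are illustrative assumptions, not part of the proposal), padding a
contribution to a chosen alignment is just a matter of appending 0 bytes. A
0 byte in the position of an abbreviation code reads as the end of a table,
so a consumer walking the section sees the padding as empty tables.

```python
def pad_abbrev_contribution(data: bytes, alignment: int) -> bytes:
    """Append 0 bytes so the contribution length is a multiple of alignment."""
    remainder = len(data) % alignment
    if remainder == 0:
        return data
    return data + b"\x00" * (alignment - remainder)


# One abbreviation: code 1, DW_TAG_compile_unit (0x11), DW_CHILDREN_no (0),
# the 0,0 pair ending the attribute list, then the 0 code ending the table.
table = bytes([0x01, 0x11, 0x00, 0x00, 0x00, 0x00])
padded = pad_abbrev_contribution(table, 8)
print(len(table), len(padded))  # 6 8
```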


> -----Original Message-----
> From: Dwarf-discuss On Behalf Of Robinson, Paul via Dwarf-discuss
> Sent: Thursday, January 18, 2024 2:08 PM
> To: dwarf-discuss@lists.dwarfstd.org
> Subject: [Dwarf-discuss] Proposal: Allow padding in all tables
> 
> # Allow padding in all tables
> 
> Enhancement; multiple sections.
> 
> ## Background
> 
> Issue 230329.1 requires all tables to be contiguous. During the discussion
> of that issue, the question came up of whether all tables allowed padding,
> so that contiguous concatenated contributions could be aligned reasonably.
> This is the result of my research.
> 
> ## Overview
> 
> The set of tables (merging the two tables from 230329.1) is as follows:
> 
> - .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
> - .debug_aranges (Section 6.1.2)
> - .debug_addr (Section 7.27)
> - .debug_frame (Section 6.4.1)
> - .debug_info / .debug_info.dwo (Section 7.5.1)
> - .debug_line / .debug_line.dwo  (Section 6.2.4)
> - .debug_line_str
> - .debug_loclists / .debug_loclists.dwo (Section 7.29)
> - .debug_macro / .debug_macro.dwo (Section 6.3.1)
> - .debug_names (Section 6.1.1)
> - .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
> - .debug_str / .debug_str.dwo
> - .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)
> 
> ### .debug_abbrev
> 
> Entries have arbitrary size. Can be padded by adding an unused abbrev
> entry. Proposing a non-normative paragraph describing this.
> 
> ### .debug_aranges
> 
> Removed by 220724.1.
> 
> ### .debug_addr
> 
> Entries have a size of (segment_selector_size + address_size) and don't
> explicitly provide a padding mechanism. Adding unused entries at the end
> of the table should suffice. Proposing a non-normative paragraph
> describing this.
> 
> ### .debug_frame
> 
> Already permits padding by use of DW_CFA_nop.
> 
> ### .debug_info
> 
> Already permits padding by use of the abbreviation code 0 (see Section
> 7.5.2).
> 
> ### .debug_line
> 
> Already has DW_LNE_padding.
> 
> ### .debug_line_str
> 
> This is a string section and does not need padding (typically would be
> merged, not concatenated).
> 
> ### .debug_loclists
> 
> Already permits padding by use of repeated DW_LLE_end_of_list, with a non-
> normative comment to that effect.
> 
> ### .debug_macro
> 
> This has no unit_length and no explicit provision for padding. One could
> insert unused opcodes into the opcode_operands_table but this seems like
> quite a hack. In keeping with other sections, I'm proposing a
> DW_MACRO_padding opcode.
> 
> ### .debug_names
> 
> Components are mostly 4- or 8-byte multiples, except for the abbreviation
> table. The abbreviation table explicitly permits padding (Section
> 6.1.1.4.7).
> 
> ### .debug_rnglists
> 
> Already permits padding by use of repeated DW_RLE_end_of_list, with a non-
> normative comment to that effect.
> 
> ### .debug_str
> 
> This is a string section and does not need padding (typically would be
> merged, not concatenated).
> 
> ### .debug_str_offsets
> 
> This has a header of 8 or 16 bytes, and entries of 4 or 8 bytes. This can
> still require padding if you want alignment greater than 4 bytes, and
> there is no explicit provision. Proposing a non-normative paragraph
> describing this.
> 
> ### Conclusion
> 
> Everything is already covered except .debug_abbrev, .debug_addr,
> .debug_str_offsets, and .debug_macro. The first three need non-normative
> notes describing how to pad the sections, and .debug_macro requires a new
> opcode to introduce padding cleanly.
> 
> ## Proposed Changes
> 
> I sorted these by affected section. In addition to the section-specific
> changes there is one general note.
> 
> ### .debug_abbrev
> 
> In Section 7.5.3 "Abbreviations Tables" (p.207), at the end of the
> section, add a new non-normative paragraph:
> 
> *This table may be padded by adding an unused abbreviation entry. The
> minimum number of bytes in an abbreviation entry is four (abbreviation
> number, child flag, and two 0 bytes indicating the end of the
> attribute/form pairs). This can be expanded by choosing a large
> abbreviation number with a longer LEB128 encoding, or adding non-zero
> attribute/form pairs.*
> 
> ### .debug_macro
> 
> Add new Section

Re: [Dwarf-discuss] Proposal: Allow padding in all tables

2024-01-31 Thread Robinson, Paul via Dwarf-discuss
This proposal is guidance for the producer, not the linker. The producer needs 
this guidance specifically because linkers don’t pad/align contributions.

I believe padding is rarely a functional requirement, and when it is, it’s not 
for alignment IME. This is where the line-table padding came from, allowing 
elbow room to replace a function’s line table without having to update 
references to other contributions. (Motivating examples include JIT 
(re-)compilation and incremental linking.)

Padding for alignment, which is generally for performance or convenience and 
which I have run into in past years (pre-LLVM), must not confuse dumpers (which 
would be inclined to interpret padding bytes as the next header); therefore the 
padding bytes have to be interpretable.

I think if we’re going to mention padding (which we already do in six of the 
ten non-string-section cases described below) we should be complete about it, 
hence this proposal. I’m not especially excited about the .debug_macro case, 
but as we failed to give that section a header with a length, we have to live 
with the consequences.

If you think padding should never be mentioned (and so anyone who feels moved 
to provide padding has to re-invent the wheel), feel free to write a 
counter-proposal removing the existing mentions.
--paulr

From: David Blaikie 
Sent: Tuesday, January 30, 2024 6:01 PM
To: Robinson, Paul 
Cc: dwarf-discuss@lists.dwarfstd.org
Subject: Re: [Dwarf-discuss] Proposal: Allow padding in all tables

Is anyone actually using this? In my experience linkers are generally 
concatenating these sections together with no extra padding/alignment.

I'd rather not spec something that's not used/needed. I'm happy for consumers 
to be improved in the face of degenerate entries that might be created for 
padding if developers of such consumers feel so inclined (though I'd probably 
push back a bit on it in the consumers I work on - in the absence of any 
evidence of particular need/use case).

On Thu, Jan 18, 2024 at 11:08 AM Robinson, Paul via Dwarf-discuss 
<dwarf-discuss@lists.dwarfstd.org> wrote:
# Allow padding in all tables

Enhancement; multiple sections.

## Background

Issue 230329.1 requires all tables to be contiguous. During the discussion of 
that issue, the question came up of whether all tables allowed padding, so that 
contiguous concatenated contributions could be aligned reasonably. This is the 
result of my research.

## Overview

The set of tables (merging the two tables from 230329.1) is as follows:

- .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
- .debug_aranges (Section 6.1.2)
- .debug_addr (Section 7.27)
- .debug_frame (Section 6.4.1)
- .debug_info / .debug_info.dwo (Section 7.5.1)
- .debug_line / .debug_line.dwo  (Section 6.2.4)
- .debug_line_str
- .debug_loclists / .debug_loclists.dwo (Section 7.29)
- .debug_macro / .debug_macro.dwo (Section 6.3.1)
- .debug_names (Section 6.1.1)
- .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
- .debug_str / .debug_str.dwo
- .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)

### .debug_abbrev

Entries have arbitrary size. Can be padded by adding an unused abbrev entry. 
Proposing a non-normative paragraph describing this.

### .debug_aranges

Removed by 220724.1.

### .debug_addr

Entries have a size of (segment_selector_size + address_size) and don't 
explicitly provide a padding mechanism. Adding unused entries at the end of the 
table should suffice. Proposing a non-normative paragraph describing this.

### .debug_frame

Already permits padding by use of DW_CFA_nop.

### .debug_info

Already permits padding by use of the abbreviation code 0 (see Section 7.5.2).

### .debug_line

Already has DW_LNE_padding.

### .debug_line_str

This is a string section and does not need padding (typically would be merged, 
not concatenated).

### .debug_loclists

Already permits padding by use of repeated DW_LLE_end_of_list, with a 
non-normative comment to that effect.

### .debug_macro

This has no unit_length and no explicit provision for padding. One could insert 
unused opcodes into the opcode_operands_table but this seems like quite a hack. 
In keeping with other sections, I'm proposing a DW_MACRO_padding opcode.

### .debug_names

Components are mostly 4- or 8-byte multiples, except for the abbreviation 
table. The abbreviation table explicitly permits padding (Section 6.1.1.4.7).

### .debug_rnglists

Already permits padding by use of repeated DW_RLE_end_of_list, with a 
non-normative comment to that effect.

### .debug_str

This is a string section and does not need padding (typically would be merged, 
not concatenated).

### .debug_str_offsets

This has a header of 8 or 16 bytes, and entries of 4 or 8 bytes. This can still 
require padding if you want alignment greater than 4 bytes, and there is no 
explicit provision. Proposing a non-normative paragraph describing this.

### Conclusion

Everything is already covered except .debug_abbrev, .debu

Re: [Dwarf-discuss] Proposal: Allow padding in all tables

2024-01-19 Thread Robinson, Paul via Dwarf-discuss
> > ### .debug_abbrev
> >
> > In Section 7.5.3 "Abbreviations Tables" (p.207), at the end of the
> section, add a new non-normative paragraph:
> >
> > *This table may be padded by adding an unused abbreviation entry. The
> minimum number of bytes in an abbreviation entry is four (abbreviation
> number, child flag, and two 0 bytes indicating the end of the
> attribute/form pairs). This can be expanded by choosing a large
> abbreviation number with a longer LEB128 encoding, or adding non-zero
> attribute/form pairs.*
> 
> Couldn't the abbrev table simply be padded with 0 bytes?

Hmmm... that would appear to a dumper as a series of zero-length tables,
I suppose? Would look funny in a dump but it could work. And would be a
lot simpler for the producer of course.
--paulr


[Dwarf-discuss] Proposal: Allow padding in all tables

2024-01-18 Thread Robinson, Paul via Dwarf-discuss
# Allow padding in all tables

Enhancement; multiple sections.

## Background

Issue 230329.1 requires all tables to be contiguous. During the discussion of 
that issue, the question came up of whether all tables allowed padding, so that 
contiguous concatenated contributions could be aligned reasonably. This is the 
result of my research.

## Overview

The set of tables (merging the two tables from 230329.1) is as follows:

- .debug_abbrev / .debug_abbrev.dwo (Section 7.5.3)
- .debug_aranges (Section 6.1.2)
- .debug_addr (Section 7.27)
- .debug_frame (Section 6.4.1)
- .debug_info / .debug_info.dwo (Section 7.5.1)
- .debug_line / .debug_line.dwo  (Section 6.2.4)
- .debug_line_str
- .debug_loclists / .debug_loclists.dwo (Section 7.29)
- .debug_macro / .debug_macro.dwo (Section 6.3.1)
- .debug_names (Section 6.1.1)
- .debug_rnglists / .debug_rnglists.dwo (Section 7.28)
- .debug_str / .debug_str.dwo
- .debug_str_offsets / .debug_str_offsets.dwo (Section 7.26)

### .debug_abbrev

Entries have arbitrary size. Can be padded by adding an unused abbrev entry. 
Proposing a non-normative paragraph describing this.

### .debug_aranges

Removed by 220724.1.

### .debug_addr

Entries have a size of (segment_selector_size + address_size) and don't 
explicitly provide a padding mechanism. Adding unused entries at the end of the 
table should suffice. Proposing a non-normative paragraph describing this.

### .debug_frame

Already permits padding by use of DW_CFA_nop.

### .debug_info

Already permits padding by use of the abbreviation code 0 (see Section 7.5.2).

### .debug_line

Already has DW_LNE_padding.

### .debug_line_str

This is a string section and does not need padding (typically would be merged, 
not concatenated).

### .debug_loclists

Already permits padding by use of repeated DW_LLE_end_of_list, with a 
non-normative comment to that effect.

### .debug_macro

This has no unit_length and no explicit provision for padding. One could insert 
unused opcodes into the opcode_operands_table but this seems like quite a hack. 
In keeping with other sections, I'm proposing a DW_MACRO_padding opcode.

### .debug_names

Components are mostly 4- or 8-byte multiples, except for the abbreviation 
table. The abbreviation table explicitly permits padding (Section 6.1.1.4.7).

### .debug_rnglists

Already permits padding by use of repeated DW_RLE_end_of_list, with a 
non-normative comment to that effect.

### .debug_str

This is a string section and does not need padding (typically would be merged, 
not concatenated).

### .debug_str_offsets

This has a header of 8 or 16 bytes, and entries of 4 or 8 bytes. This can still 
require padding if you want alignment greater than 4 bytes, and there is no 
explicit provision. Proposing a non-normative paragraph describing this.

### Conclusion

Everything is already covered except .debug_abbrev, .debug_addr, 
.debug_str_offsets, and .debug_macro. The first three need non-normative notes 
describing how to pad the sections, and .debug_macro requires a new opcode to 
introduce padding cleanly.

## Proposed Changes

I sorted these by affected section. In addition to the section-specific changes 
there is one general note.

### .debug_abbrev

In Section 7.5.3 "Abbreviations Tables" (p.207), at the end of the section, add 
a new non-normative paragraph:

*This table may be padded by adding an unused abbreviation entry. The minimum 
number of bytes in an abbreviation entry is four (abbreviation number, child 
flag, and two 0 bytes indicating the end of the attribute/form pairs). This can 
be expanded by choosing a large abbreviation number with a longer LEB128 
encoding, or adding non-zero attribute/form pairs.*

### .debug_macro

Add new Section 6.3.4 "Other Entries" (~ p.170) as follows:

1. DW_MACRO_padding
   The DW_MACRO_padding opcode takes two operands, a byte count and a sequence
   of arbitrary bytes. The byte count is an unsigned LEB128 encoded number and
   does not include the size of the opcode or the byte count operand. The opcode
   and operands have no effect on the macro information.

   *This permits a producer to pad the macro information with a minimum of two
   bytes.*
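
A sketch of how a producer could build such an entry follows. The opcode
value `0xE0` is an arbitrary placeholder (no value is assigned in the text
above), `macro_padding` is a hypothetical helper, and the payload bytes are
zero only for convenience since they are arbitrary.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as minimal-length unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


DW_MACRO_PADDING = 0xE0  # placeholder value, assumed for illustration only


def macro_padding(total_size: int) -> bytes:
    """Build one padding entry that occupies exactly total_size bytes."""
    if total_size < 2:
        raise ValueError("a padding entry needs at least opcode + byte count")
    # The byte count excludes the opcode and the count itself, so look for a
    # payload length whose LEB128 size makes the total come out right.
    for count_len in range(1, 11):
        payload = total_size - 1 - count_len
        if payload >= 0 and len(uleb128(payload)) == count_len:
            return bytes([DW_MACRO_PADDING]) + uleb128(payload) + b"\x00" * payload
    raise ValueError(f"no single entry is exactly {total_size} bytes")


print(macro_padding(2).hex())   # e000
print(len(macro_padding(10)))   # 10
```

With minimal-length LEB128 a few totals cannot be hit by one entry (130
bytes, for instance, where the count would need to grow from one byte to
two); a producer could split the padding into two entries in that case.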

### .debug_str_offsets

In Section 7.26 "String Offsets Table" (p.241), at the end of the section, add 
a new non-normative paragraph:

*This table may be padded with unused entries to fill out the table to some 
desired alignment. These entries should have all 1 bits as a hint that the 
entries are unused.*

### .debug_addr

In Section 7.27 "Address Table" (p.241), at the end of the section, add a new 
non-normative paragraph:

*This table may be padded with unused entries to fill out the table to some 
desired alignment. These entries should have all 1 bits as a hint that the 
entries are unused.*
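
For both .debug_str_offsets and .debug_addr, the padding described above
could be sketched as follows. The function name and the example sizes are
assumptions for illustration; a real producer would also need to make the
header's unit_length cover the added entries.

```python
def pad_with_unused_entries(header_size: int, entries: bytes,
                            entry_size: int, alignment: int) -> bytes:
    """Append all-ones entries until header + entries is a multiple of alignment."""
    padded = bytearray(entries)
    for _ in range(alignment):
        if (header_size + len(padded)) % alignment == 0:
            return bytes(padded)
        padded += b"\xff" * entry_size   # all 1 bits marks the entry as unused
    raise ValueError("cannot reach the requested alignment with whole entries")


# An 8-byte header followed by three 4-byte entries is 20 bytes in total;
# one unused entry brings the table to 24 bytes, a multiple of 8.
entries = bytes(12)
padded = pad_with_unused_entries(8, entries, entry_size=4, alignment=8)
print(len(padded))  # 16
```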

### General

In Section 7.34 "Contiguous Tables" (added by issue 230329.1), at the end of 
the section, add a new non-normative paragraph:

*Every table of information has a way for the table as a whole to be 

Re: [Dwarf-discuss] Question: ETA?

2023-11-13 Thread Robinson, Paul via Dwarf-discuss
Speaking only for myself: Questions about ETA seem reasonable, as the interval 
between v4 and v5 was 6 years 8 months, and it has already been 6 years 9 
months since v5 was published. That said, the committee has never worked to a 
specific timeline.

There is indeed a fair amount of work left to be done by the committee, some of 
which has had side discussions but not yet been formally proposed. My 
impression (I haven’t tried to verify this) is that the committee took longer 
than usual to get started on this round. Also we spent a fair amount of time on 
organizational issues, which obviously would detract from time spent on 
technical issues. The “change of administration” didn’t help either. But I 
think we are back in the groove.

Regarding time commitment, we meet one hour every other week, which is not 
significantly different from the two hours per month that we met during 
consideration of the previous two versions. On the other hand, the committee is 
noticeably larger than it used to be, which can mean that discussions take 
longer. Perhaps we should increase the meeting time to get through the backlog 
more efficiently, and make up for lost time.

--paulr

From: Dwarf-discuss On Behalf Of Ben Woodard via Dwarf-discuss
Sent: Sunday, November 12, 2023 10:31 AM
To: Eleanor Bartle 
Cc: dwarf-discuss@lists.dwarfstd.org
Subject: Re: [Dwarf-discuss] Question: ETA?

I’ve asked this question personally many times directly to members of the 
executive committee. The overall answer seems to be “when we are done”. The 
thing is, there are quite a few proposals sitting in the DWARF issue queue that 
have yet to be discussed AT ALL in the official DWARF committee meeting and the 
current meeting is only one hour every other week. Plus, there are a rather 
large number of additional proposals which are quite extensive which are still 
being discussed outside of the DWARF committee meeting and haven’t yet made it 
to the DWARF issue queue. E.g. https://github.com/ccoutant/dwarf-locations 
which is the standardization effort for 
https://www.llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html 
I also have 3 more that I’m incubating which haven’t seen the light of day yet 
because of some of this other work.

Anyway, before the change in administration, it seemed like we were rushing to 
get DWARF6 out the door with just minor corrections and revisions. Now, it 
seems like DWARF6 is going to have many more significant changes in it and it 
is going to take a while. Personally, I’m quite glad for this because I feel as 
though a lot more work needs to be done. Before the change in administration, I 
felt DWARF6 was being rushed. I would say check in again in 6 months and see 
where we are then.

In the meantime, the current working draft is available at 
https://snapshots.sourceware.org/dwarfstd/dwarf-spec/
and you can keep an eye on the issue queue.

-ben


On Nov 12, 2023, at 1:49 AM, Eleanor Bartle via Dwarf-discuss 
<dwarf-discuss@lists.dwarfstd.org> wrote:

Is there any plan for a time to release version 6? If not a time, then a 
condition? Say "2025" or "some time in the next year" or "when no new proposals 
are accepted for three months" or "when two independent implementations are 
fully compliant".


Re: [Dwarf-discuss] Do Dwarf symbols only use ascii?

2023-11-02 Thread Robinson, Paul via Dwarf-discuss
DWARF strongly recommends UTF-8 in all cases, and there's an attribute on the 
compile unit that allows the producer to claim it uses UTF-8. But, whether the 
producer actually uses UTF-8 or something else is up to the individual producer 
(usually the compiler).
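
If names may be UTF-8, a digit test that only ever accepts the bytes '0'
through '9' stays safe, because every byte of a multi-byte UTF-8 sequence is
0x80 or above. A minimal sketch (in Python rather than C, not taken from
elftoolchain, with hypothetical function names):

```python
def is_ascii_digit(byte: int) -> bool:
    """True only for the bytes '0'..'9', independent of any locale."""
    return 0x30 <= byte <= 0x39


def ascii_digits(name: bytes) -> bytes:
    """Collect the ASCII digits found in a (possibly UTF-8) name."""
    return bytes(b for b in name if is_ascii_digit(b))


print(ascii_digits("f\u00e9e42".encode("utf-8")))  # b'42'
```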
--paulr

From: Dwarf-discuss On Behalf Of Roger Phillips via Dwarf-discuss
Sent: Thursday, November 2, 2023 6:30 AM
To: dwarf-discuss@lists.dwarfstd.org
Subject: [Dwarf-discuss] Do Dwarf symbols only use ascii?

Greetings,

I'm currently trying to debug a problem in the dynamorio system where the 
isdigit function crashes in elftoolchain while trying to parse symbols from 
dwarf info:

https://github.com/DynamoRIO/dynamorio/issues/6161

My question is whether these symbols really need the locale functionality of 
libc's isdigit function or if the symbols in Dwarf are just standard ascii and 
could be parsed in a portable way with the simple method mentioned there.

Regards.


Re: [Dwarf-discuss] DW_AT_frame_base

2023-09-18 Thread Robinson, Paul via Dwarf-discuss
A "location description [that] is a register operation" is the language in 
DWARF v3; in later versions, it is "a simple register location description." 
This means something like DW_OP_reg5, which is allowed in a location 
description but not in a DWARF expression.

Form DW_FORM_data4, value 0, would be interpreted as a location list reference. 
It is not by itself an address or a register location description.
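
A consumer-side sketch of that interpretation might look like the following.
The function names and returned labels are made up for illustration, and the
register check ignores DW_OP_regx for brevity.

```python
def location_attr_class(dwarf_version: int, form: str) -> str:
    """Classify the form of a location attribute such as DW_AT_frame_base."""
    if form == "DW_FORM_exprloc":
        return "expression"
    if form == "DW_FORM_sec_offset":
        return "loclist"
    if form in ("DW_FORM_data4", "DW_FORM_data8"):
        # Before v4 these forms served as location list references.
        return "loclist" if dwarf_version < 4 else "constant"
    if form.startswith("DW_FORM_block"):
        # Before v4, location expressions were encoded as blocks.
        return "expression" if dwarf_version < 4 else "block"
    return "unknown"


def is_simple_register_description(expr: bytes) -> bool:
    """True for a lone DW_OP_reg0..DW_OP_reg31 (opcodes 0x50..0x6f)."""
    return len(expr) == 1 and 0x50 <= expr[0] <= 0x6F


print(location_attr_class(3, "DW_FORM_data4"))        # loclist
print(is_simple_register_description(bytes([0x55])))  # True (DW_OP_reg5)
```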
--paulr

From: Dwarf-discuss On Behalf Of Vsevolod Alekseyev via Dwarf-discuss
Sent: Monday, September 18, 2023 10:08 AM
To: dwarf-discuss@lists.dwarfstd.org
Subject: [Dwarf-discuss] DW_AT_frame_base

Please help me interpret the DWARF spec regarding DW_AT_frame_base. Quoting 
from spec v3, section 3.5.5 (v4 and v5 have similar wording):

"A subroutine or entry point entry may also have a DW_AT_frame_base attribute, 
whose value is a location description that computes the "frame base" for the 
subroutine or entry point. If the location description is a register operation, 
the given register contains the frame base address. If the location description 
is a DWARF expression, the result of evaluating that expression is the frame 
base address. Finally, for a location list, this interpretation applies to each 
location expression contained in the list of location list entries."

So what does "location description that is a register operation" mean here? 
Since the option of a DWARF expression block is covered by the second option, 
that rather suggests to me that "register operation" is not a DWARF operation. 
I mean, the wording "if A is X, then Y. If A is P, then Q" usually means that P 
is distinct from X, right?

On a more practical note, I'm currently staring at a crash report with a DWARF 
attribute parsing failure. DWARF v3 Linux ELF binary, produced by NASM. 
DW_AT_frame_base, form DW_FORM_data4, value 0. The code assumes it's a loclist 
pointer, but the binary doesn't contain a loclist section.

Zero as a loclist pointer in a v3 binary could make sense. As a "register 
operation" - I'm not sure. Could be a compiler quirk, but I'm admitting the 
possibility that I'm misreading something.





Re: [Dwarf-discuss] Ranges for DW_TAG_namespace

2023-09-14 Thread Robinson, Paul via Dwarf-discuss
I suppose it didn't seem useful to provide ranges on namespaces. A C++ 
namespace isn't a program entity of its own, it's a way of managing names of 
entities. It doesn't even restrict the scope of those names; you can refer to 
them anywhere if you use the fully qualified version of the name. (With the 
obvious caveat about names defined in anonymous namespaces.)

Did you have a reason for considering a namespace to be a program entity? What 
would that entity do?
--paulr

From: Dwarf-discuss On Behalf Of rifkin.jer--- via Dwarf-discuss
Sent: Thursday, September 14, 2023 6:51 PM
To: dwarf-discuss@lists.dwarfstd.org
Subject: [Dwarf-discuss] Ranges for DW_TAG_namespace

Hello,
What is the reasoning for not including range information on DW_TAG_namespace 
DIEs? Is there a canonical way to check if a DW_TAG_namespace DIE contains a 
given address?

Thank you,
Jeremy


[Dwarf-discuss] Proposal: Error: DW_OP_entry_value description and examples

2023-08-08 Thread Robinson, Paul via Dwarf-discuss
# Error: DW_OP_entry_value description and examples

## Overview

DW_OP_entry_value provides a way to compute a value as if the value
had been computed on entry to the current subprogram. Its operand is
either a DWARF expression, which assumes nothing is already on the
DWARF stack and leaves one value on the DWARF stack; or, it is a
register location description, and the content of the register (as it
was on entry to the subprogram) is implicitly read and pushed on the
DWARF stack. The register location description is simply a more compact 
representation of the common case of reading a register; for example, 
the producer can emit "DW_OP_reg1" instead of "DW_OP_breg1 0" with the 
same result.

However, the description and examples aren't completely consistent, and
in some cases are wrong. This proposal corrects the language and examples.

## Proposed Changes

### Section 2.5.1.7 Special Operations, p.37

The first sentence of the description of DW_OP_entry_value reads:

The DW_OP_entry_value operation pushes the value that the described
location held upon entering the current subprogram.

A DWARF expression does not describe a location, so this should read:

The DW_OP_entry_value operation evaluates an expression or register
location description as if it had been evaluated upon entering the
current subprogram, and pushes the value of the expression or content
of the register, respectively.

### Appendix D.1.3, pp.291-292

There are six examples of DW_OP_entry_value, some of which are not
consistent with the description and some of which are just plain wrong.
The six examples are:

DW_OP_entry_value 2 DW_OP_breg1 0
DW_OP_entry_value 1 DW_OP_reg1
DW_OP_entry_value 2 DW_OP_breg1 0 DW_OP_stack_value
DW_OP_entry_value 1 DW_OP_reg1 DW_OP_stack_value
DW_OP_entry_value 3 DW_OP_breg4 16 DW_OP_deref DW_OP_stack_value
DW_OP_entry_value 1 DW_OP_reg5 DW_OP_plus_uconst 16

The first two illustrate the smaller expression allowed by the special
case for a register location description; they are fine.

The third is just wrong and should be deleted. DW_OP_stack_value converts
a memory location description into a value, but what precedes it is not
a memory location description. Without the stack_value, it's identical to
the first example.

The fourth is just wrong and should be deleted. DW_OP_reg1 names a register
but pushes nothing on the stack, so DW_OP_stack_value is incorrect; and
without that, it's identical to the second example.

The fifth should not have DW_OP_stack_value at the end; the expression
already leaves a value on the stack.

The sixth is incorrectly using DW_OP_reg5, which is allowed in a location
description but not in a simple DWARF expression. The expression could be:
DW_OP_entry_value 2 DW_OP_breg5 16

The textual description of the sixth example should also be revised.
Currently it reads:
The address of the memory location is calculated by adding 16 to the
value contained in register 5 upon entering the current subprogram.
Note that unlike the previous DW_OP_entry_value examples, this one
does not end with DW_OP_stack_value.

This should be changed to:
Add 16 to the value register 5 had upon entering the current subprogram
and push the result.

(The italicized text after the sixth example should be removed entirely;
there is no reason ever to use DW_OP_stack_value in these expressions.)
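
As a rough illustration of the block-length operand in the examples above, here is
a minimal Python sketch that encodes the two unchanged examples and the corrected
fifth and sixth ones. The opcode values are taken from the DWARF v5 encoding
tables; the helper names (`reg`, `breg`, `entry_value`) are made up for this
sketch and are not part of any library.

```python
# Opcode values from the DWARF v5 operation encoding table.
DW_OP_deref = 0x06
DW_OP_reg0 = 0x50
DW_OP_breg0 = 0x70
DW_OP_entry_value = 0xA3


def uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value):
    """Encode a signed integer as SLEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        sign_bit = bool(byte & 0x40)
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def reg(n):
    """DW_OP_reg<n>: register location description (hypothetical helper)."""
    return bytes([DW_OP_reg0 + n])


def breg(n, offset):
    """DW_OP_breg<n> <offset>: push register content plus offset."""
    return bytes([DW_OP_breg0 + n]) + sleb128(offset)


def entry_value(body):
    """DW_OP_entry_value <ULEB length> <body>."""
    return bytes([DW_OP_entry_value]) + uleb128(len(body)) + body


examples = {
    "first": entry_value(breg(1, 0)),
    "second": entry_value(reg(1)),
    "fifth, corrected": entry_value(breg(4, 16) + bytes([DW_OP_deref])),
    "sixth, corrected": entry_value(breg(5, 16)),
}

for name, encoded in examples.items():
    # The second byte is the block length that appears after DW_OP_entry_value.
    print(f"{name}: length operand {encoded[1]}, bytes {encoded.hex(' ')}")
```

The length operands come out as 2, 1, 3 and 2, matching the numbers written
after DW_OP_entry_value in the text.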



Re: [Dwarf-discuss] ISSUE: tensor types. V3

2023-04-26 Thread Robinson, Paul via Dwarf-discuss
> > If it ever became necessary, you can always add a 2nd attribute for it.
> > As an example, in our Ada compiler decades ago, we did this for
> > DW_AT_artificial.  It's just a flag, so either present or not-present.
> > We added a 2nd DW_AT_artificial_kind with a whole bunch of different
> > enums for the various kinds our compiler generated.  The point is you
> > still can get there even if DW_AT_tensor is just a flag.
> 
> Totally, not opposed to that if that is the way that people want to
> handle it. My only (admittedly weak) argument against doing it that way
> is that there will now be two attributes rather than one and the
> space that it takes up.

With DW_FORM_flag_present, there's no extra space taken up in the DIE.
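
A small sketch of the size argument, assuming only fixed-size forms: the
form codes are from the DWARF v5 tables, while `FORM_SIZE` and
`die_attr_bytes` are names invented for this illustration.

```python
# Form codes from the DWARF v5 form encoding table.
DW_FORM_data1 = 0x0B
DW_FORM_flag = 0x0C
DW_FORM_flag_present = 0x19

# Bytes each form occupies in the DIE itself (the attribute/form pair
# lives in the abbreviation table and is shared by all DIEs using it).
FORM_SIZE = {
    DW_FORM_data1: 1,
    DW_FORM_flag: 1,
    DW_FORM_flag_present: 0,
}


def die_attr_bytes(forms):
    """Total bytes the listed attribute forms add to one DIE."""
    return sum(FORM_SIZE[form] for form in forms)


# A flag encoded with DW_FORM_flag costs one byte per DIE...
print(die_attr_bytes([DW_FORM_flag]))
# ...while the same flag encoded with DW_FORM_flag_present costs none.
print(die_attr_bytes([DW_FORM_flag_present]))
# A hypothetical second "kind" attribute would cost only its own value.
print(die_attr_bytes([DW_FORM_flag_present, DW_FORM_data1]))
```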
--paulr



Re: [Dwarf-Discuss] CU-local types

2022-05-18 Thread Robinson, Paul via Dwarf-Discuss
> Looks like gdb and lldb both have issues with C++ local types (either
> types defined in anonymous namespaces, or otherwise localized - eg: a
> non-local template with a local type or variable in one of its
> parameters). 
> ...
> So... what could/should we do about this?

Do you have a strong argument for why these are not debugger bugs?
It sounds to me like gdb/lldb are handling anonymous namespaces
incorrectly, in effect treating their contents as global rather than 
CU-local.

--paulr

___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] EXTERNAL: Corner-cases with bitfields

2022-05-09 Thread Robinson, Paul via Dwarf-Discuss
> Pedro Alves wrote:
> On 2022-05-09 16:48, Ron Brender via Dwarf-Discuss wrote:
> > So my suggestion is to file a bug report with CLANG, requesting they
> > correct their DWARF output to reflect all details needed by your
> > language.
> 
> An issue here is that DWARF does say this, in (DWARF 5, 5.7.6 Data Member
> Entries, page 119):
> 
>  "If the size of a data member is not the same as the size of the type
>  given for the data member, the data member has either a DW_AT_byte_size
>  or a DW_AT_bit_size attribute whose integer constant value (see Section
>  2.19 on page 55) is the amount of storage needed to hold the value of
>  the data member."
> 
> Note the part I underlined.  In Lancelot's case, the size of the data
> member IS the same as the size of the type given for the data member.
> So Clang could well pedantically claim that they _are_ following the
> spec.  Shouldn't the spec be clarified here?

What the spec says is that a producer isn't _required_ to emit the
DW_AT_bit_size attribute.  But, given that DWARF is a permissive
standard, the producer is certainly _allowed_ to emit the attribute.  
If this is a hint that the target debugger will understand, regarding
the ABI, it seems okay to me for the producer to do that.

> This then raises the question of whether a debugger can assume that the
> presence of a DW_AT_bit_size
> attribute indicates that the field is a bit field at the C/C++ source
> level.  GDB is assuming that
> today, as there's really no other way to tell, but I don't think the spec
> explicitly says so.

GDB is choosing to make that interpretation, which it's allowed to do.
The DWARF spec just doesn't promise that interpretation is correct.

You can propose to standardize that interpretation by filing an issue
with the DWARF committee at https://dwarfstd.org/Comment.php and it might
or might not become part of DWARF v6.  It might be tricky because you'd
be generalizing something very specific to your environment.

You can also, separately, try to get Clang to emit the DW_AT_bit_size
attribute in these cases for the AMDGPU target(s).  This seems more
likely to work, especially as there's an ABI requirement involved, and
(given that GDB makes this interpretation) I assume gcc already does this.

--paulr



Re: [Dwarf-Discuss] How to generate DWARF info for a template alias to a raw pointer

2022-05-06 Thread Robinson, Paul via Dwarf-Discuss
> Could someone help to point out what kind of DWARF info should
> be generated for below c++ source? Thanks
> 
> ```
> template <typename T>
> using ptr = T*;
> 
> ptr<int> abc;
> ```
> 
> We declare a template alias here, so we may generate
> `DW_TAG_template_type_parameter` like:
> 
> ```
> 0x0057:   DW_TAG_base_type
>     DW_AT_name  ("int")
>     DW_AT_byte_size  (0x04)
>     DW_AT_encoding   (DW_ATE_signed)
> 
> 0x005e:   DW_TAG_pointer_type
>     DW_AT_type    (0x0057 "int")
> 
> 0x0064:   DW_TAG_template_alias
>     DW_AT_name  ("ptr")
>     DW_AT_type    (0x005e "int *")
> 
> 0x0076: DW_TAG_template_type_parameter
>   DW_AT_name   ("T")
>   DW_AT_type  (0x0057 "int")
> 
> 0x007e:   DW_TAG_variable
>     DW_AT_name  ("abc")
>     DW_AT_type    (0x0064 "ptr")
> ```

This all looks okay to me, with DW_TAG_template_type_parameter
being a child of DW_TAG_template_alias.  There's an alias
named `ptr`, its formal parameter is `T`, its actual parameter
is `int`, and so the alias is a typedef of `int *`.

> ` DW_TAG_template_type_parameter ` should be for a notation to
> create a template base on another template, but as you can see
> the referred type 0x005e is not a template. What kind of
> DWARF info should we generate here? We should use 
> ` DW_TAG_typedef` instead of ` DW_TAG_template_type_parameter`
> for this special case?

DW_TAG_template_type_parameter is correctly describing the
parameter to the template.

There's no need for a DW_TAG_typedef, because DW_TAG_template_alias
is implicitly a typedef.

--paulr



Re: [Dwarf-Discuss] How to interpret DW_AT_artificial tag?

2022-02-28 Thread Robinson, Paul via Dwarf-Discuss
DW_AT_artificial generally means the item is compiler-generated, or otherwise 
has no explicit representation in the source.
An artificial member in a structure takes up however much space it takes, just 
like any other member, and the compiler should have generated the correct 
offsets for the other members of the structure.  So, I’d expect the first 
non-artificial member to have offset 4 (or greater).  Whether the consumer (in 
this case, your application) has to compensate really depends on what the 
application is doing.
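
To make the offset expectation concrete, here is a minimal sketch that models
such a structure with Python's `ctypes`. It assumes a 32-bit target where the
artificial pointer is 4 bytes (as in the question), so a `c_uint32` stands in
for it; the structure and field names are invented for illustration.

```python
import ctypes


class WithArtificialPointer(ctypes.Structure):
    """Stand-in for a class whose first member is a compiler-generated pointer."""

    _fields_ = [
        ("vptr", ctypes.c_uint32),    # models the 4-byte DW_AT_artificial member
        ("first", ctypes.c_int32),    # first member written in the source
        ("second", ctypes.c_int32),
    ]


# The artificial member occupies real storage, so later members are shifted.
print(WithArtificialPointer.vptr.offset)    # offset of the artificial pointer
print(WithArtificialPointer.first.offset)   # offset of the first source member
print(ctypes.sizeof(WithArtificialPointer))
```

In the DWARF itself, the same information would come from each member's
DW_AT_data_member_location, so a consumer that uses those offsets directly
should not need to add anything for the artificial member.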
--paulr

From: Dwarf-Discuss On Behalf Of Ron Louzon via Dwarf-Discuss
Sent: Monday, February 28, 2022 8:50 AM
To: Dwarf-Discuss@lists.dwarfstd.org
Subject: [Dwarf-Discuss] How to interpret DW_AT_artificial tag?

I have an application which uses DwarfLib to extract type information from 
debug executable images.  I have found in the DWARF data that some structure 
types have a "virtual" pointer added as their first member and this pointer's 
DIE contains the tag DW_AT_artificial=true.  How does that pointer member 
affect the offsets of the members that follow it in the structure?  Should this 
4-byte pointer be ignored, or will its size cause the other structure members to 
be pushed out in memory by 4 bytes?

thanks,
ron


Re: [Dwarf-Discuss] DWARF5 line table file numbering inconsistent

2020-10-15 Thread Robinson, Paul via Dwarf-Discuss
> On Thu, Oct 15, 2020 at 04:27:16PM +, Robinson, Paul wrote:
> > > Yes. Please do publish the document somewhere. It would be interesting
> > > to know exactly what is being said to be inconsistent. As far as I
> > > know the issue of the file index defaulting to one and not having a
> > > way to refer to index zero from an DIE attribute is
> > > inconvenient/inefficient (because you often end up duplicating the
> > > zero file entry), but not inconsistent. It is consistent with how
> > > DWARF4 line table file numbers are interpreted and I believe that is
> > > also how consumers do it.
> > >
> > The best place for a list/document would be on wiki.dwarfstd.org I
> > think?
> 
> Yes, or simply this mailinglist so people can point to the archived
> discussion.
> 
> > The line table has allowed file 0 to mean the primary source file
> > starting in DWARF v4; DWARF v5 just made file 0 be explicit in the file
> > table and not implicitly a reference to the info in the compile_unit
> > header.
> 
> That is not how I read DWARF v4, which said:
> 
>  The primary source file is described by an entry whose path name
>  exactly matches that given in the DW_AT_name attribute in the
>  compilation unit, and whose directory is understood to be given
>  by the implicit entry with index 0.
> 
>  The line number program assigns numbers to each of the file
>  entries in order, beginning with 1, and uses those numbers
>  instead of file names in the file register.
> 
> So it doesn't seem to say file 0 is allowed or has any implicit
> meaning.

Ah you're right, I was misreading the first of those paragraphs.

> Also the initial value of the file register is 1 so referring to the
> zero entry (which seems allowed in DWARFv5) is only possible when
> doing an explicit DW_LNS_set_file 0 first.
> 
> > It is
> > an oversight that we missed the reference regarding DW_AT_decl_file (and
> > presumably the other _file attributes, I haven't checked).
> 
> Both DWARF v4 and v5 say "The value 0 indicates that no source file
> has been specified." I assumed that was deliberate, but maybe it was
> an oversight. But given that both versions say the same I would avoid
> using zero to mean something different.

I'm sure it was deliberate to say that in DWARF v2, but I'm not sure
it was deliberate to leave the language in place in DWARF v5.

The intent was certainly for file 0 to mean the root source file, and
make it explicit in the line table; I don't recall any discussion
either way about DIE attributes but as they use the line table's
definitions, it's hard to imagine we wouldn't have wanted DW_AT_decl_file
= 0 to mean the root source file.  Because, of course, if we *don't* do
that, then the root source file has to occupy two entries in the file
table, which is wasteful.
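
A minimal sketch of the two readings discussed here, assuming the consumer
already has the line table's file names as a Python list in table order. The
function name and the `zero_is_primary` switch are invented for illustration;
they do not correspond to any existing consumer's API.

```python
def decl_file_name(version, file_names, index, zero_is_primary=False):
    """Resolve a DW_AT_decl_file value against a line-table file list.

    file_names holds the file entries in the order they appear in the table.
    """
    if version <= 4:
        # Up to v4 the entries are numbered from 1, and the attribute
        # value 0 means "no source file has been specified".
        if index == 0:
            return None
        return file_names[index - 1]

    # In v5 the table is 0-based and entry 0 is the primary source file.
    if index == 0 and not zero_is_primary:
        # Literal reading of the v5 attribute text: 0 still means "no file".
        return None
    return file_names[index]


# v4-style table: the primary file is entry 1.
print(decl_file_name(4, ["a.c", "a.h"], 1))

# v5 table under the literal reading: the root file must be repeated
# so that a DIE can refer to it with a nonzero index.
print(decl_file_name(5, ["a.c", "a.c", "a.h"], 1))

# v5 table under the intended reading: no duplicate entry is needed.
print(decl_file_name(5, ["a.c", "a.h"], 0, zero_is_primary=True))
```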

--paulr



Re: [Dwarf-Discuss] DWARF5 line table file numbering inconsistent

2020-10-15 Thread Robinson, Paul via Dwarf-Discuss
> On Thu, Oct 15, 2020 at 11:46:55AM -0400, Eric Christopher via
> Dwarf-Discuss wrote:
> > "This margin is too narrow to contain..." ;)
> >
> > I'd like to see the doc - it's easy to believe we've gotten something
> > wrong here.. Might be good to fix this as textual edits rather than
> > waiting on a full dwarf standard release because we're going to run
> > into this a lot if we can't get it sorted quickly as multiple producers
> > all produce something slightly different and incompatible.
> >
> > Thoughts?
> 
> Yes. Please do publish the document somewhere. It would be interesting
> to know exactly what is being said to be inconsistent. As far as I
> know the issue of the file index defaulting to one and not having a
> way to refer to index zero from an DIE attribute is
> inconvenient/inefficient (because you often end up duplicating the
> zero file entry), but not inconsistent. It is consistent with how
> DWARF4 line table file numbers are interpreted and I believe that is
> also how consumers do it.
> 
> Thanks,
> 
> Mark
> 

The best place for a list/document would be on wiki.dwarfstd.org I think?

The line table has allowed file 0 to mean the primary source file starting
in DWARF v4; DWARF v5 just made file 0 be explicit in the file table and
not implicitly a reference to the info in the compile_unit header.  It is
an oversight that we missed the reference regarding DW_AT_decl_file (and
presumably the other _file attributes, I haven't checked).

--paulr




[Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread Robinson, Paul via Dwarf-Discuss
David Blaikie has brought this up with me (or in conversations that
I observed) a couple of times:

It's common to want to refer to a particular address plus an offset,
for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
block or inlined subprogram within another subprogram.  Generally
the only symbolic address available is the entry point of the
containing subprogram.  Back when addresses were held directly in 
the .debug_info section, the attributes would have relocations, the
offset would be encoded into the relocation and the linker would
just do the right thing.

With DWARF v5, we now have the .debug_addr section, which contains
the addresses to be fixed up by the linker.  But, we don't have a
way to specify an offset to add to an entry in the .debug_addr
section; instead, each unique addr+offset requires its own entry
in the .debug_addr table.  This consumes additional space, these
entries are generally not reusable, and it doesn't reduce the
overall number of relocations that the linker must process.

It's not feasible to define a new attribute for address+offset,
because an attribute has only one value, and the attribute would
have to specify both the .debug_addr index and the offset to add.
But, we could define an "indirect" entry in .debug_addr, and then
reference it with an attribute in the same way that we reference
any other .debug_addr entry.

An indirect entry would be the same size as all other entries in 
.debug_addr (i.e., the size of an address on the target).  The
upper half would be another index into .debug_addr and the lower
half would be the addend.  The consumer adds the addend to the
value from the entry specified by the "another index."
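
To make the idea concrete, here is a minimal sketch of how a consumer might
resolve such an entry. It assumes 8-byte addresses, an unsigned addend, and
that the consumer somehow knows which indices are indirect (modeled here as a
plain set); all of those details are still open, and the names are invented
for illustration.

```python
def resolve_addr(entries, index, indirect_indices, address_size=8):
    """Return the address for .debug_addr entry `index`.

    entries          -- raw entry values, already relocated by the linker
    indirect_indices -- indices holding (base index, addend) pairs
    """
    raw = entries[index]
    if index not in indirect_indices:
        return raw  # ordinary entry: the value is the address

    half_bits = address_size * 4                 # half of the entry, in bits
    base_index = raw >> half_bits                # upper half: another index
    addend = raw & ((1 << half_bits) - 1)        # lower half: unsigned addend
    return entries[base_index] + addend


# Entry 0 is the function's entry point (the only relocated value);
# entry 1 describes "entry 0 plus 0x24", e.g. a lexical block's low_pc.
table = [0x401000, (0 << 32) | 0x24]
print(hex(resolve_addr(table, 1, indirect_indices={1})))
```

Only entry 0 needs a relocation in this example; entry 1 is a constant the
producer can emit directly.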

This solution doesn't save space in .debug_addr, but it does
reduce the number of relocations.  Ideally .debug_addr would
require only one relocation per function.

We can debate whether the addend should be signed or unsigned,
and whether the indirect entries should be a separate subtable,
but I wanted to float the idea here before I wrote it up as a
proposal.

Alternatively, the indirect sub-table could be encoded with
ULEB/SLEB pairs, but that makes it hard to find them by index.
They could be found by a direct reference, but that requires a
relocation from .debug_info to .debug_addr, so we haven't saved
any relocations that way.

If there are obvious flaws I can't see, or someone is inspired
to come up with another solution, please let me know!  Otherwise
I'll write it up as a formal proposal probably later this week.

Thanks,
--paulr



Re: [Dwarf-Discuss] modeling different address spaces

2020-07-16 Thread Robinson, Paul via Dwarf-Discuss
(resending, this time without dropping the list from the cc: grump grump)

> -Original Message-
> From: Dwarf-Discuss  On Behalf
> Of Michael Eager via Dwarf-Discuss
> Sent: Thursday, July 16, 2020 2:12 PM
> To: todd.al...@concurrent-rt.com; Metzger, Markus T
> 
> Cc: dwarf-discuss@lists.dwarfstd.org
> Subject: Re: [Dwarf-Discuss] modeling different address spaces
> 
> On 7/16/20 10:06 AM, Todd Allen via Dwarf-Discuss wrote:
> > Markus, Michael, David, Xing,
> >
> > I always assumed that the segment support in DWARF was meant to be more
> > general, and support architectures where there was no single flat memory,
> > and so the segments were necessary for memory accesses.  I personally
> > have not dealt with any architectures where DW_AT_segment came into
> > play, though.
> 
> It is phrased in a way to make it less architecturally specific.  That's
> in keeping with our desire to prevent DWARF from including architecture
> specific specifications.  For example, we don't want to say "on ARM do
> this" but "on MIPS do that".  DWARF doesn't specify how the translation
> from segmented to linear addresses is done.

The example that most often comes up is Harvard architectures.  As it
happens, I think it's nearly always obvious from context whether a given
address is data-segment or code-segment.  The only time it's not, that I'm
aware of, is in the .debug_aranges section, where addresses are associated
with compile-units without any indication of whether they are code or data
addresses.  I've heard arguments that .debug_aranges should only have code
addresses in it, but I don't think that's what the spec says.
--paulr



Re: [Dwarf-Discuss] modeling different address spaces

2020-07-16 Thread Robinson, Paul via Dwarf-Discuss



> -Original Message-
> From: Dwarf-Discuss  On Behalf
> Of Michael Eager via Dwarf-Discuss
> Sent: Thursday, July 16, 2020 2:22 PM
> To: John DelSignore ; todd.allen@concurrent-
> rt.com; Metzger, Markus T 
> Cc: dwarf-discuss@lists.dwarfstd.org; Tye, Tony 
> Subject: Re: [Dwarf-Discuss] modeling different address spaces
> 
> On 7/16/20 10:26 AM, John DelSignore via Dwarf-Discuss wrote:
> > FYI, Tony Tye and his team at AMD created a DWARF Proposal for
> > heterogeneous debugging, which is generally useful but required to debug
> > optimized code for GPUs. It directly addresses the issue of how to model
> > different address spaces and makes location descriptions first-class
> > objects that can be pushed onto the evaluation stack.
> >
> > https://llvm.org/docs/AMDGPUDwarfProposalForHeterogeneousDebugging.html
> >
> > AFAIK, these changes will be made to LLVM and there is interest in
> > adding to the DWARF standard eventually.
> 
> As mentioned in the past, I would be pleased to see proposals submitted
> for these changes.
> 
> I would like to avoid the situation where we have an informal proposal
> lacking specific changes to the DWARF standard, matched with an
> implementation which claims to match the proposal.  That's the opposite
> of standardization.

I think the idea was, it would be implementation experience, which will
inform the proposal.  I've had "review this work" on my to-do list for
an embarrassingly long time now.  The initial problem is that it's a
huge chunk of stuff and is a big time investment to look at.
--paulr

> 
> --
> Michael Eager


Re: [Dwarf-Discuss] Selectively strip CUs from .debug_info?

2020-04-09 Thread Robinson, Paul via Dwarf-Discuss


> -Original Message-
> From: Greg Clayton 
> Sent: Thursday, April 9, 2020 6:32 PM
> To: Robinson, Paul 
> Cc: dwarf-discuss@lists.dwarfstd.org
> Subject: Re: [Dwarf-Discuss] Selectively strip CUs from .debug_info?
> 
> Not aware of any tools that can do this, but can't you just do this in
> your build?:
> 
> - link once with full .o files and save debug info off
> - strip debug info from all third party .o files and link again. You still
> get all debug info for any .o files you didn't strip

For sanity we'd need to verify the two links did produce exactly the same 
non-debug-info parts; but it seems likely the linker would be idempotent
enough for that to work.  Thanks!
--paulr

> 
> 
> 
> 
> > On Apr 9, 2020, at 2:50 PM, Robinson, Paul via Dwarf-Discuss  disc...@lists.dwarfstd.org> wrote:
> >
> > Does anyone know of a tool that can strip debug info for specified
> > CUs from an executable?  I'm not aware of a way to do this, but
> > there are many things I'm not aware of. 
> >
> > The use case is someone who wants to build the entire program
> > (which includes a number of 3rd-party libraries) with debug info,
> > so they'll have full symbols for crash dump analysis; but then
> > strip the debug info for those libraries, in order to speed up
> > debugger load time.  In this scenario, stripping the 3rd-party
> > code before linking isn't going to satisfy the crash dump analysis
> > requirement.
> >
> > I've also brought up split DWARF, but I'm not sure it fits the need.
> >
> > Thanks,
> > --paulr
> >


[Dwarf-Discuss] Selectively strip CUs from .debug_info?

2020-04-09 Thread Robinson, Paul via Dwarf-Discuss
Does anyone know of a tool that can strip debug info for specified
CUs from an executable?  I'm not aware of a way to do this, but
there are many things I'm not aware of. 

The use case is someone who wants to build the entire program
(which includes a number of 3rd-party libraries) with debug info,
so they'll have full symbols for crash dump analysis; but then
strip the debug info for those libraries, in order to speed up
debugger load time.  In this scenario, stripping the 3rd-party
code before linking isn't going to satisfy the crash dump analysis
requirement.

I've also brought up split DWARF, but I'm not sure it fits the need.

Thanks,
--paulr



Re: [Dwarf-Discuss] Use of Location Description operations in DWARF Expressions?

2020-03-23 Thread Robinson, Paul via Dwarf-Discuss



> -Original Message-
> From: Dwarf-Discuss  On Behalf
> Of Adrian Prantl via Dwarf-Discuss
> Sent: Friday, March 20, 2020 1:29 PM
> To: Michael Eager 
> Cc: dwarf-discuss@lists.dwarfstd.org
> Subject: Re: [Dwarf-Discuss] Use of Location Description operations in
> DWARF Expressions?
> 
> 
> 
> > On Mar 19, 2020, at 5:49 PM, Michael Eager via Dwarf-Discuss  disc...@lists.dwarfstd.org> wrote:
> >
> > My reading of sections 2.5 & 2.6 is that you cannot have a DW_OP_piece
> in an DWARF expression.
> >
> 
> I wonder if this is an intentional part of the design because of
> ambiguity/correctness issues or is this just something that happens to
> fall out of the way the text is worded? I can see how such a restriction
> might simplify DWARF consumers, but it also seems like an arbitrary
> restriction for which there may not be a technical reason.

My intuition (clearly I wasn't there at the time) is that this is like
a C expression being an rvalue (DWARF expression) or lvalue (location
description).  Values and locations aren't the same thing.
--paulr

> That distinction is important, because if there is a *technical* reason
> for not supporting them we should refrain from implementing this in LLVM.
> But if there isn't, there is no harm done in implementing it as an
> extension, and DWARF consumers that don't support it can just ignore these
> expressions and return N/A.
> 
> -- adrian


[Dwarf-Discuss] Segment selectors for Harvard architectures

2020-03-19 Thread Robinson, Paul via Dwarf-Discuss
This recently came up in the LLVM project.  Harvard architectures
put code and data into separate address spaces, but those spaces
are not explicit; instructions that load/store memory implicitly
use the data space, while things like taking a function address or 
doing indirect branches will implicitly use the code space.  This 
doubles the effective size of memory without consuming an address 
bit, as well as having other secondary benefits like not allowing
self-modifying code.

Nearly all of the DWARF information does not need to distinguish
between code and data address spaces, because it's easy to derive that
from context.  Addresses in the line table or a range list will be
code addresses; in .debug_info, addresses of code elements will be
code addresses, while variables will be data addresses. And so on.

This only seems to break down in the .debug_aranges section, which
records both data and code addresses without any context to let a
consumer know which is what.  In a flat-address architecture, no
distinction is needed; in a segmented architecture, there will be
a segment selector as part of any address, and that includes the
.debug_aranges section.  What about for Harvard architectures?

What I suggested in the LLVM project is that .debug_aranges would
have a 1-byte segment selector and use some trivial scheme such as
0=code, 1=data to distinguish what kind of address it is.  Other
DWARF sections wouldn't need a selector because they can all use
context to figure it out; this avoids the size overhead of using
segment selectors everywhere else.
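
As a rough sketch of what a consumer would see, the following decodes the
tuples of one .debug_aranges set that uses a 1-byte segment selector. It
assumes little-endian data, that the header and any padding have already been
skipped, and the 0=code, 1=data convention suggested above (which is not
defined by DWARF); the function name is invented for illustration.

```python
# The trivial scheme floated above; an assumption, not part of the standard.
SEGMENT_KIND = {0: "code", 1: "data"}


def parse_aranges_tuples(data, address_size=4, segment_selector_size=1):
    """Decode (segment, address, length) tuples up to the all-zero terminator."""
    tuple_size = segment_selector_size + 2 * address_size
    result = []
    for off in range(0, len(data) - tuple_size + 1, tuple_size):
        seg_end = off + segment_selector_size
        addr_end = seg_end + address_size
        segment = int.from_bytes(data[off:seg_end], "little")
        address = int.from_bytes(data[seg_end:addr_end], "little")
        length = int.from_bytes(data[addr_end:off + tuple_size], "little")
        if segment == 0 and address == 0 and length == 0:
            break  # terminating tuple
        result.append((SEGMENT_KIND.get(segment, "unknown"), address, length))
    return result


def make_tuple(segment, address, length):
    """Build one 9-byte tuple for the example below."""
    return (bytes([segment])
            + address.to_bytes(4, "little")
            + length.to_bytes(4, "little"))


# The same numeric address appears once in each space.
raw = make_tuple(0, 0x100, 0x20) + make_tuple(1, 0x100, 0x8) + bytes(9)
print(parse_aranges_tuples(raw))
```

The example shows why some selector is needed here: address 0x100 is
ambiguous without it.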

Pavel Labath pointed out that this seems inconsistent and might
make consumers unhappy; segment selectors are described as a
characteristic of the target architecture, so having them in one
place and not others might look suspicious.  IMO it's a reasonable 
"permissive" use of the existing DWARF structures, but it seemed
worth asking here.

Does this (segment selector only in .debug_aranges) sound okay?
Should there be non-normative text or a wiki description of this?
Do we want to codify the 0=code 1=data use of segment selectors
for all Harvard architectures (that don't otherwise have explicit
segments) so that this doesn't have to be set by ABI committees?

I'm willing to write up whatever needs writing up, either as a
proposal or as a wiki entry.

Thanks,
--paulr



Re: [Dwarf-Discuss] File name encoding in DWARF

2019-11-27 Thread Robinson, Paul via Dwarf-Discuss
> I am trying to consume the file name of a compile unit. Fortunately
> DW_TAG_compile_unit has a member DW_AT_name and DW_AT_comp_dir.
> Unfortunately it is not clear which kind of encoding is used to
> store these strings. I tried around with GCC and clang. GCC under
> Windows produces latin1 encoding and clang produces UTF-8. I have
> not found any information inside the DWARF debug info on the type
> of encoding used. Is there a way to determine which encoding was
> used? Otherwise, searching for the file on disk can become a
> problem.

DWARF version 3 added the DW_AT_use_UTF8 flag on the compilation
unit to indicate that strings are encoded with UTF-8.  If the flag
is present, you can assume all strings are UTF-8.  Unfortunately,
in the absence of that flag, there is no specified encoding.
DWARF version 5 strongly recommends UTF-8 but does not require it.
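
On the consumer side, one possible way to cope with this is sketched below.
The fallback to Latin-1 is only a heuristic chosen for this example (it
matches the GCC-on-Windows behavior described in the question); nothing in
DWARF specifies it, and the function name is invented.

```python
def decode_dwarf_string(raw, use_utf8_flag, fallback="latin-1"):
    """Decode a DW_AT_name / DW_AT_comp_dir byte string.

    use_utf8_flag -- True if the CU carries DW_AT_use_UTF8
    fallback      -- encoding to guess when the flag is absent and the
                     bytes are not valid UTF-8 (a heuristic, not a rule)
    """
    if use_utf8_flag:
        return raw.decode("utf-8")
    try:
        # Without the flag there is no specified encoding; try UTF-8 first
        # since plain ASCII and most modern producers will decode cleanly.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(decode_dwarf_string("caf\u00e9.c".encode("utf-8"), use_utf8_flag=True))
print(decode_dwarf_string("caf\u00e9.c".encode("latin-1"), use_utf8_flag=False))
```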
--paulr



Re: [Dwarf-Discuss] dwarf stack operator for byte swap.

2019-10-28 Thread Robinson, Paul via Dwarf-Discuss
Hello Chirag,

Regarding a byte-swap operation, it seems that you have a reasonable use-case 
on a bi-endian machine.  Feel free to request a new operator on the "public 
comments" page at http://dwarfstd.org/Comment.php

Note that a byte-swap operator would swap all bytes in the top-of-stack value, 
which on your 64-bit machine would of course be a 64-bit value.  As you want a 
32-bit swapped value, you would still need to do a shift afterward, but even 
so, "DW_OP_byte_swap DW_OP_const1u 32 DW_OP_shr" would be considerably shorter 
than what you have to do now.

Of course a new operator would be introduced in a new DWARF revision, which is 
likely to be years away.  In the meantime let me suggest a shorter expression 
for doing the byte-swap operation.  The book "Hacker's Delight" shows a 
straightforward 32-bit byte swap with masks no wider than 16 bits, as follows:
   x = (x << 24) | ((x & 0xff00) << 8) | ((x >> 8) & 0xff00) | (x 
>> 24);
Your 64-bit machine will of course use 64-bit values on the expression stack, 
so to keep the result "32-bit clean" we want to do one additional mask:
   x = ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x >> 8) & 
0xff00) | (x >> 24);
Translating this into a DWARF expression, I get the following:
   DW_OP_dup, DW_OP_const1u 0xff, DW_OP_and, DW_OP_lit24, DW_OP_shl, DW_OP_swap,
   DW_OP_dup, DW_OP_const2u 0xff00, DW_OP_and, DW_OP_lit8, DW_OP_shl, DW_OP_swap,
   DW_OP_dup, DW_OP_lit8, DW_OP_shr, DW_OP_const2u 0xff00, DW_OP_and, DW_OP_swap,
   DW_OP_lit24, DW_OP_shr, DW_OP_or, DW_OP_or, DW_OP_or
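
As a sanity check, the expression above can be run through a toy stack machine. This is a minimal Python sketch, not a real DWARF evaluator: it handles only the operators used here, assumes 64-bit stack values with the upper 32 bits of the input zero (as after DW_OP_deref_size 4), and all names in it are made up for the example.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit generic stack values

# The expression from the message, one tuple per operator.
EXPR = [
    ("DW_OP_dup",), ("DW_OP_const1u", 0xFF), ("DW_OP_and",),
    ("DW_OP_lit24",), ("DW_OP_shl",), ("DW_OP_swap",),
    ("DW_OP_dup",), ("DW_OP_const2u", 0xFF00), ("DW_OP_and",),
    ("DW_OP_lit8",), ("DW_OP_shl",), ("DW_OP_swap",),
    ("DW_OP_dup",), ("DW_OP_lit8",), ("DW_OP_shr",),
    ("DW_OP_const2u", 0xFF00), ("DW_OP_and",), ("DW_OP_swap",),
    ("DW_OP_lit24",), ("DW_OP_shr",),
    ("DW_OP_or",), ("DW_OP_or",), ("DW_OP_or",),
]


def swap32_formula(x: int) -> int:
    """The 'Hacker's Delight' formula with the extra 0xff mask."""
    return ((x & 0xFF) << 24) | ((x & 0xFF00) << 8) | ((x >> 8) & 0xFF00) | (x >> 24)


def evaluate(expr, initial: int) -> int:
    """Evaluate the operator subset used above; stack[-1] is the top of stack."""
    stack = [initial & MASK64]
    for op, *operands in expr:
        if op in ("DW_OP_const1u", "DW_OP_const2u"):
            stack.append(operands[0])
        elif op.startswith("DW_OP_lit"):
            stack.append(int(op[len("DW_OP_lit"):]))
        elif op == "DW_OP_dup":
            stack.append(stack[-1])
        elif op == "DW_OP_swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            # Binary operators pop the top entry, then the former second entry.
            top, second = stack.pop(), stack.pop()
            if op == "DW_OP_and":
                result = second & top
            elif op == "DW_OP_or":
                result = second | top
            elif op == "DW_OP_shl":
                result = (second << top) & MASK64
            elif op == "DW_OP_shr":
                result = second >> top  # logical shift; values are non-negative
            else:
                raise ValueError(f"operator not modelled in this sketch: {op}")
            stack.append(result)
    return stack[-1]
```

With these definitions, `evaluate(EXPR, 0x11223344)` and `swap32_formula(0x11223344)` both give `0x44332211`. Each DW_OP_swap brings the saved copy of x back to the top, so the four partial results pile up beneath it until the three DW_OP_or operators merge them.
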

I hope this is helpful to you.
--paulr

From: Dwarf-Discuss On Behalf Of Chirag Patel via Dwarf-Discuss
Sent: Monday, October 28, 2019 12:47 AM
To: dwarf-discuss@lists.dwarfstd.org
Subject: [Dwarf-Discuss] dwarf stack operator for byte swap.

Hello Dwarf experts.

I am currently working on encoding DWARF for binaries that have bi-endian 
variables marked with the DW_AT_endianity attribute.
The location calculation for some variables depends on another variable with a 
different endianity, and the value of this other variable is known only at runtime.

At the moment I am using a location list to calculate the correct location of 
the first variable, and a list of DWARF operators to reverse the endianity of the 
variable "__gbloffset__" in the case below (I only needed the lower 32 bits on a 
64-bit machine).

0x01e5: DW_TAG_base_type
DW_AT_byte_size  (0x04)
DW_AT_encoding  (DW_ATE_signed)
DW_AT_name  ("int")
DW_AT_endianity (DW_END_big)
...
0x0057:   DW_TAG_variable
DW_AT_name  ("__gbloffset__")
DW_AT_type  (0x01e5 "int")
DW_AT_external  (true)
DW_AT_decl_file ("...")
DW_AT_decl_line (8)
DW_AT_location  (DW_OP_addr 0) // pre linkage
DW_AT_linkage_name  ("_gblsection__")

0x0170:   DW_TAG_variable
DW_AT_name  ("VAR1")
DW_AT_type  (0x010b "fixed.dec.display.72")
DW_AT_decl_file ("...")
DW_AT_decl_line (10)
DW_AT_location  (DW_OP_addr 0x0, DW_OP_call4 0x57, 
DW_OP_deref_size, 4,
DW_OP_dup, DW_OP_constu 0xff, DW_OP_lit0, DW_OP_shl, DW_OP_and, DW_OP_lit24, 
DW_OP_shl, DW_OP_swap, DW_OP_dup, DW_OP_constu 0xff, DW_OP_lit8, DW_OP_shl, 
DW_OP_and, DW_OP_lit8, DW_OP_shl, DW_OP_swap, DW_OP_dup, DW_OP_constu 0xff, 
DW_OP_lit16, DW_OP_shl, DW_OP_and, DW_OP_lit8, DW_OP_shr, DW_OP_swap, 
DW_OP_constu 0xff, DW_OP_lit24, DW_OP_shl, DW_OP_and, DW_OP_lit24, DW_OP_shr, 
DW_OP_swap, DW_OP_or, DW_OP_or, DW_OP_or, DW_OP_plus)
DW_AT_linkage_name  ("VAR1")


In the above snippet of the DWARF dump, I am using the highlighted list of 
operators (from the first DW_OP_dup through the last DW_OP_or) to swap the bytes.
I think there should be support for a simple DW_OP_byte_swap operator to 
accomplish this simple task. Does this idea look like it could be useful? Is 
there any specific reason why the DWARF spec does not have it, or am I missing 
something subtle?

I hope I conveyed the idea properly; apologies in advance, as English is not my 
first language.

Thanks and regards,

Chirag Patel
Software Engineer | Raincode Labs India
Tel: (+91) 080 41159811
Mob: (+91) 9049336744
www.raincodelabs.com
