Re: [Dwarf-Discuss] EXTERNAL: Corner-cases with bitfields

2022-05-09 Thread Todd Allen via Dwarf-Discuss
On Mon, May 09, 2022 at 04:09:59PM -0700, Michael Eager wrote:
> On 5/9/22 16:00, Todd Allen via Dwarf-Discuss wrote:
> > I suppose, if you didn't want to submit an issue, another solution would be to
> > require the necessary tags & attributes in the ABI itself.  We already expect
> > ABI documents to provide things like register values, CFI initial values, and
> > some more esoteric stuff (augmentations, non-standard endianity & isa).  An ABI
> > that required descriptions in ABI-specific situations like these two seems
> > reasonable to me.  And it places no burden on compilers for other ABIs.
> This creates the situation where there are two definitions for a DWARF
> attribute, one in an ABI and a different one in the DWARF Spec.  We want to
> avoid situations where one producer says "I'm following DWARF" and another
> "I'm following the ABI".  That makes interoperability difficult.
> The information you mention in an ABI is not in the DWARF Spec.

I don't know that it's quite that bad.  The ABI could say that DW_AT_bit_size
*also* implies that the field is a bit field for ABI purposes.  That doesn't
change the meaning from the DWARF specification; it merely adds to it.  Mind
you, I think an explicit DW_AT_bit_field (or something like that) is better.

Also, while the DWARF standard is intentionally permissive, an ABI need not be.
They could mandate either of the above solutions, and also mandate descriptions
of anonymous 0-sized fields.  (Unless there's a better, more direct
description for that case.)

Todd Allen
Concurrent Real-Time
Dwarf-Discuss mailing list

Re: [Dwarf-Discuss] EXTERNAL: Corner-cases with bitfields

2022-05-09 Thread Todd Allen via Dwarf-Discuss
On Mon, May 09, 2022 at 09:41:03PM +, Dwarf Discussion wrote:
> > Pedro Alves wrote:
> > On 2022-05-09 16:48, Ron Brender via Dwarf-Discuss wrote:
> > > So my suggestion is to file a bug report with CLANG, requesting they
> > > correct their DWARF output to reflect all details needed by your
> > > language.
> > 
> > An issue here is that DWARF does say this (DWARF 5, 5.7.6 Data Member
> > Entries, page 119):
> >
> >  "If the size of a data member is _not the same as the size of the type
> >  given for the data member_, the data member has either a DW_AT_byte_size
> >  or a DW_AT_bit_size attribute whose integer constant value (see Section
> >  2.19 on page 55) is the amount of storage needed to hold the value of the
> >  data member."
> > 
> > Note the part I underlined.  In Lancelot's case, the size of the data
> > member IS the same as the size of the type given for the data member.  So
> > Clang could well pedantically claim that they _are_ following the spec.
> > Shouldn't the spec be clarified here?
> What the spec says is that a producer isn't _required_ to emit the
> DW_AT_bit_size attribute.  But, given that DWARF is a permissive
> standard, the producer is certainly _allowed_ to emit the attribute.  
> If this is a hint that the target debugger will understand, regarding
> the ABI, it seems okay to me for the producer to do that.
> > This then raises the question of whether a debugger can assume that the
> > presence of a DW_AT_bit_size attribute indicates that the field is a bit
> > field at the C/C++ source level.  GDB is assuming that today, as there's
> > really no other way to tell, but I don't think the spec explicitly says so.
> GDB is choosing to make that interpretation, which it's allowed to do.
> The DWARF spec just doesn't promise that interpretation is correct.
> You can propose to standardize that interpretation by filing an issue
> with the DWARF committee at and it might
> or might not become part of DWARF v6.  It might be tricky because you'd
> be generalizing something very specific to your environment.
> You can also, separately, try to get Clang to emit the DW_AT_bit_size
> attribute in these cases for the AMDGPU target(s).  This seems more
> likely to work, especially as there's an ABI requirement involved, and
> (given that GDB makes this interpretation) I assume gcc already does this.


I suppose, if you didn't want to submit an issue, another solution would be to
require the necessary tags & attributes in the ABI itself.  We already expect
ABI documents to provide things like register values, CFI initial values, and
some more esoteric stuff (augmentations, non-standard endianity & isa).  An ABI
that required descriptions in ABI-specific situations like these two seems
reasonable to me.  And it places no burden on compilers for other ABIs.

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] EXTERNAL: Corner-cases with bitfields

2022-05-06 Thread Todd Allen via Dwarf-Discuss
> Dear all,
> During our work on debugging support of compute workloads on AMDGPU[1],
> we (at AMD) have been seeing two cases regarding description of
> bitfields in DWARF for which we do not find definitive answers in the
> DWARF documentation.  For those cases, when experiencing with usual CPU
> targets we observe different behaviors on different toolchains.  As a
> consequence, we would like to discuss those points here to gather
> feedbacks.  If deemed necessary, we will submit a formal clarification
> issue to the dwarf committee.
> Both of the cases we present below impact how arguments are passed
> during function calls in the ABI for at least our target (AMDGPU).
> However, the debug information available to the debugger does not give
> enough information to decide how to handle the type and the spec does
> not really say what debug information should be generated to properly
> describe those cases.  Also note that in both cases, the DWARF
> information we have is sufficient to describe the memory layout of the
> types.
> 1 - Bitfield member with a size matching its underlying type:
> The first point we would like to discuss is that of bitfield members
> whose sizes match their underlying type.  Let's consider the following
> example:
>  struct Foo
>  {
>    char a : 8;
>    char b : 8;
>  };
> If we compile such an example with GCC, it will add the `DW_AT_bit_size` and
> `DW_AT_bit_offset` attributes to the `a` and `b` DIEs.
> Clang on the other hand will not produce those attributes.
> On the debugger side, GDB currently considers a struct member as being
> packed (i.e. part of a bitfield) if it has the `DW_AT_bit_size`
> attribute present and is non-0.  Therefore, GDB will "understand"
> what GCC produces, but not what Clang produces.
> What Clang does seems to be a reasonable thing to do if one is only
> interested in the memory layout of the type.  This however is not
> sufficient in our case to decide how to handle such type when
> placing/inspecting arguments in registers in the context of function
> calls. In our ABI, bitfield members are passed packed together, while
> two chars in a struct would be placed in separate registers.
> To clarify this situation, it would be helpful if a producer always
> included the DW_AT_bit_size attribute for bit fields, which the standard
> neither suggests nor requires.

It sounds like your ABI is basing its decision on a boolean: is the field a bit
field or not.  And you're trying to deduce this from DW_AT_bit_size.  Perhaps
a better solution would be to make this explicit in the DWARF, some new
DW_AT_bitfield flag.  There's very little that the DWARF standard can do to
mandate such an attribute.  (Permissive standard yadda yadda.)  But if it's
necessary for debuggers to work correctly in a given ABI, compilers should be
well-motivated to produce it when generating code for that ABI.
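To make the difference concrete, here is a minimal C++ sketch of the two consumer policies being discussed: the GDB-style heuristic (a present, non-zero DW_AT_bit_size means "bit field") and a check that prefers an explicit flag when the producer emits one. `MemberDie` is a hypothetical, pared-down stand-in for a DW_TAG_member DIE, and `bitfield_flag` models the suggested DW_AT_bitfield attribute, which does not exist in DWARF 5.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical view of a DW_TAG_member DIE with only the relevant attributes.
struct MemberDie {
  std::uint64_t type_byte_size;           // DW_AT_byte_size of the member's type
  std::optional<std::uint64_t> bit_size;  // DW_AT_bit_size, if present
  std::optional<bool> bitfield_flag;      // hypothetical DW_AT_bitfield (not in DWARF 5)
};

// GDB-style heuristic: DW_AT_bit_size present and non-zero => bit field.
constexpr bool is_bitfield_heuristic(const MemberDie& m) {
  return m.bit_size.has_value() && *m.bit_size != 0;
}

// Prefer the explicit (hypothetical) flag; fall back to the heuristic.
constexpr bool is_bitfield(const MemberDie& m) {
  if (m.bitfield_flag.has_value()) return *m.bitfield_flag;
  return is_bitfield_heuristic(m);
}

// `char a : 8;` as the thread reports GCC describing it (DW_AT_bit_size 8)...
constexpr MemberDie gcc_style{1, 8, std::nullopt};
// ...and as reported for Clang (no DW_AT_bit_size, since the sizes match).
constexpr MemberDie clang_style{1, std::nullopt, std::nullopt};
// The Clang-style description plus the hypothetical explicit flag.
constexpr MemberDie flagged{1, std::nullopt, true};
```

With the heuristic alone, the Clang-style description of `char a : 8` is classified as an ordinary member; an explicit flag would remove the dependence on whether the producer happened to emit DW_AT_bit_size.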

> 2 - Unnamed zero sized bitfield
> Another case we met is related to unnamed fields with a size of 0 bits.
> Such a field is defined in the C++ standard (in 9.6 Bit-Fields) as follows:
>  > As a special case, an unnamed bit-field with a width of zero
>  > specifies alignment of the next bit-field at an allocation unit
>  > boundary
> If we now consider an extended version of our previous example:
>  struct Foo
>  {
>    char a : 8;
>    char : 0;
>    char b : 8;
>  };
> Neither GCC nor Clang gives any information about the unnamed bitfield.
> As in the previous case, the presence of such a field impacts how
> arguments are passed during function calls on our target, leaving the
> debugger unable to properly decide how to handle such cases.
> As in the previous case, both compilers can properly describe Foo's
> memory layout using DW_AT_bit_offset.
> It seems that such a 0-sized field also has an ABI impact on other targets
> such as armv7hl and aarch64, as discussed in [2].  Should the DWARF
> specification give some guidance on how to handle such situations?
> All thoughts on those cases are welcome.

I agree with Michael that describing the unnamed bitfield seems dubious.
If it does impact the ABI, I'm wondering if that impact is indirect: that is,
the presence of this 0-width bit field changes an attribute of the next field,
and that attribute is responsible for the difference in behavior.  If so, is
there any way other than a 0-width bit field to cause the same behavior?  This
might be another case where describing the attribute that's directly responsible
might be better.
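A small layout experiment shows why the layout description alone cannot tell the two versions of `Foo` apart. This is a sketch, not a statement about any particular ABI document: bit-field allocation is implementation-defined, and the sizes checked here assume a typical System V style ABI such as GCC or Clang on x86-64 Linux. The struct names are made up for the illustration.

```cpp
// Assumption: System V style bit-field allocation (e.g. GCC/Clang on x86-64 Linux).

// The thread's example: `a` already fills a whole char, so the unnamed
// zero-width field does not move `b`.
struct WithZeroWidth {
  char a : 8;
  char : 0;  // start the next bit-field at a new allocation unit
  char b : 8;
};
struct WithoutZeroWidth {
  char a : 8;
  char b : 8;
};

// With narrower fields the zero-width member does change the layout:
// without it `a` and `b` share one byte; with it `b` moves to the next byte.
struct Packed4 {
  char a : 4;
  char b : 4;
};
struct Split4 {
  char a : 4;
  char : 0;
  char b : 4;
};
```

In the thread's example the member offsets a producer would emit are identical with or without the zero-width field; it only changes the layout when the preceding bit-field leaves part of an allocation unit unused, as in `Split4`.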

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] [EXTERNAL] - RE: Multiple floating point types with the same size but different encodings

2022-01-25 Thread Todd Allen via Dwarf-Discuss
> For ATE codes, the problem is with standardization if we wanted
> to standardize it in some way for DWARF6.
> The current DW_ATE_{,complex_}float is far too unspecific and historically
> has covered various formats.
> So, we'd need something like DW_ATE_{,complex_}ieee754_float
> (or ieee754_binary_float?)
> which would, depending on DW_AT_byte_size, be the binary{16,32,64,128,256}
> format, and then add DW_ATE_* values for the floating point formats
> known to us, which would be:
>  * vax_f_float, vax_g_float, vax_d_float (what about vax_h_float?)
>  * bfloat16
>  * Intel extended precision
>  * IBM extended (double double)
>  * what else?
> Anyway, as can be seen in the DW_ATE_HP_* extensions, it might be possible
> to reuse the same DW_ATE_* code for multiple different formats as long as
> they are guaranteed to have different DW_AT_byte_size values.
> For DW_AT_precision/DW_AT_min_exponent/DW_AT_max_exponent we would
> just define them the same way as C/C++ defines the corresponding
> macros, e.g.
> (though of course, we can only reasonably use properties that are
> expressible as small integral values or booleans; we can't have
> attributes matching, say, FLT_MAX etc., which need some floating point
> values).  All could be optional and the producers would need to use them
> only if without those attributes it would be ambiguous what it is.
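A rough C++ sketch of how a consumer might interpret the proposed scheme, in which one encoding code covers the IEEE 754 binary interchange family and DW_AT_byte_size selects the member. `HypotheticalAte` and its enumerators are invented for the example; DWARF 5 has no DW_ATE_ieee754_float code.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical encodings modeling the proposal; neither exists in DWARF 5.
enum class HypotheticalAte { Ieee754Float, ComplexIeee754Float };

// Map (encoding, DW_AT_byte_size) to an IEEE 754 format name, or
// std::nullopt if the consumer should reject the type.
constexpr std::optional<std::string_view>
ieee754_format(HypotheticalAte ate, std::uint64_t byte_size) {
  std::uint64_t component_size = byte_size;
  if (ate == HypotheticalAte::ComplexIeee754Float) {
    // A complex value stores a real and an imaginary part of equal size.
    if (byte_size % 2 != 0) return std::nullopt;
    component_size = byte_size / 2;
  }
  switch (component_size) {
    case 2:  return "binary16";
    case 4:  return "binary32";
    case 8:  return "binary64";
    case 16: return "binary128";
    case 32: return "binary256";
    default: return std::nullopt;  // not an IEEE 754 binary interchange size
  }
}
```

Any other size falls through to the rejection path, so formats such as Intel extended precision would still need their own DW_ATE_* values, as the message suggests.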

I suspect you'd end up needing more attributes to completely pin down a
floating-point type.  Consider the i86/i87 FPU 80-bit floating-point type.  It
had a single bit which was the integer part, in addition to the traditional
fractional bits of the mantissa.  So determining the number of bits in the
exponent is not simply (bit_size - mantissa_bits - sign_bit).  Also, you'd need
to indicate lack of support for inf/nan, unless you were expecting that to be
deduced from the min_exponent/max_exponent.

The encodings do seem like a cleaner approach, as Ron argued: You either
recognize the enumerated value, probably because your hardware supports it
natively, and you don't care all that much about all the persnickety bits; or
you reject the type.  I suppose you might choose to support a non-native
floating-point type, but I suspect you'd need to have a priori knowledge of all
the details anyway.

If we do end up promoting any of these architecture-specific types into the
standard, perhaps we should provide some of the implementation details about
what they mean.  We could do that by referencing other documents, but it seems
reasonable to include a table containing the number of bits for each field, and
a mention of any major peculiarities (e.g. skewed bias, inf/nan unsupported,

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] string reduction techniques

2021-11-07 Thread Todd Allen via Dwarf-Discuss
rather hopeful we might be able
>  to reduce the overheads enough to avoid widespread use of DWARF64 -
>  but it's not a sure thing by any means.
>  Agreed. I'd like to explore as many avenues as we can to eliminate
>  the need for DWARF64.
>  >> Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
>  >> debugger almost never uses it.  (There is one use to detect "GNU indirect"
>  >> functions.)  I wonder if it would be possible to avoid them if you provided
>  >> enough info about the template parameters, if the debugger had its own name
>  >> mangler.  I had to write one for our debugger a couple years ago, and it
>  >> definitely was a persnickety beast.  But doable with enough information.  Mind
>  >> you, I'm not sure there is enough information to do it perfectly with the state
>  >> of DWARF & gcc right now.
>  >
>  > Yeah, that was/is certainly my first pass - the way I've done the
>  DW_AT_name one is to have a feature in clang that produces the short
>  name "t1" but then also embeds the template argument list in the
>  name (like this: "_STNt1|") - then llvm-dwarfdump will detect
>  this prefix, split up the name, rebuild the original name as it
>  would if it'd been given only the simple name ("t1") and compare it
>  to the one from clang. Then I can run this over large programs and
>  check everything round-trips correctly & in clang, classify any
>  names we can't roundtrip so they get emitted in full rather than
>  shortened.
>  > We could do something similar with linkage names - since to know
>  there's some prior art in your work there.
>  >
>  > I wouldn't be averse to considering what it'd take to make DWARF
>  robust enough to always roundtrip simple and linkage names in this
>  way - I don't think it'd take a /lot/ of extra DWARF content.
>  Fuzzy memory here, but as I recall, GCC didn't generate linkage names
>  (or only did in some very specific cases) until the LTO folks
>  convinced us they needed it in order to relate profile data back to
>  the source. Perhaps if we came up with a better way of doing that, we
>  could eliminate the linkage names.
>No, see, that's a mildly reasonable answer.
>If you go far enough back, the linkage names exist for a few reasons:
>1. Because the debug info wasn't always good enough, and so GDB used
>to demangle the linkage names and parse them using a hacked up C++-ish
>parser for type info.
>2. Even when it didn't, it decoded linkage names to detect things like
>destructors/constructors, etc.
>3. Because it used it to do remangling properly and try to generate
>method signatures to lookup (and for #1)
>4. Because it was used to do symbol lookup in the ELF/etc symbol
>tables for static things/etc.
>5. Because it saved space in STABS to do #1 (they predate DWARF by
>If you checkout gdb source code, circa 2001, and search for things
>like check_stub_method, and follow all the things it calls (like
>gdb_mangle_name), you can learn the history of linkage names (and
>probably throw up in your mouth a little).
> If you do a case insensitive search for things like "physname" and
>"phys_name", you'll see all the places it used to use the linkage
>I spent a lot of time abstracting out things like the
>constructor/destructor name testing, vptr name finding, etc, so that
>someone later might have a chance to get rid of linkage names (it was
>also necessary because of the gcc 2.95->3.0 ABI change).

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] string reduction techniques

2021-11-01 Thread Todd Allen via Dwarf-Discuss

If I understand right: The space saving you're expecting is the near-elimination
of DW_AT_name strings.  If they are only simple names like "T" and "int", they
can be placed into the string table once each, and it should be very small.  But
you're expecting the DW_AT_linkage_name attributes still to have lots of
replication because of the large composed names.  So I gather that was where
your estimate of 1/2 reduction came from.

I was trying to figure out how we came to opposite conclusions, and I think it's
that I have this (implicit) assumption of a sort of "DWARF Moore's Law", that
the size of debug info/strings/etc. would double periodically, just based on the
tendency of software systems to grooow.  I'm likening it to Moore's Law,
because I expect it's the same sort of vague, rough estimate that somehow still
applies to the real world.

Assuming it does apply, your halving of the string table amounts to buying
yourself one doubling period, and then you're back to requiring DWARF64 string
tables.  (Meanwhile, DWARF64 gives us 32 doubling periods over DWARF32.  So
hopefully that will last us for a while...)
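The doubling arithmetic can be checked with a few lines of C++. This is a back-of-the-envelope sketch under the same rough exponential-growth assumption; the function names are made up for the illustration.

```cpp
#include <cstdint>

// Rough model only: assumes debug string data keeps doubling periodically.
// Each extra offset bit doubles the largest addressable section, so going
// from 32-bit to 64-bit offsets buys 64 - 32 = 32 doubling periods.
constexpr int doubling_periods_gained(int old_offset_bits, int new_offset_bits) {
  return new_offset_bits - old_offset_bits;
}

// How many more doublings a section of `size` bytes can absorb before it
// exceeds 2^offset_bits bytes. Requires 0 < offset_bits < 64.
constexpr int doublings_left(std::uint64_t size, int offset_bits) {
  const std::uint64_t limit = std::uint64_t{1} << offset_bits;
  if (size == 0) size = 1;  // treat an empty section as one byte
  int n = 0;
  while (size <= limit / 2) {
    size *= 2;
    ++n;
  }
  return n;
}
```

For example, a 2^30-byte string section has two doublings of headroom under 32-bit offsets; halving it to 2^29 bytes raises that to three, i.e. exactly one extra period.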

I can't be sure about this exponential growth.  I don't have the data to back it
up.  But I will say, when we created DWARF64, I was skeptical that it would be
needed during my career.  And yet here we are...


The reduction for DW_AT_linkage_name does seem like a tougher nut to crack.  As
you mentioned, there is a tendency to eliminate *some* of the replication
because of the mangler's use of substitution strings (S_, S0_, S1_, etc.)  But
that same feature probably would make it a lot harder to do anything clever
about chopping up the linkage names into substrings.

Honestly, I've never been sure why gcc generates DW_AT_linkage_name.  Our
debugger almost never uses it.  (There is one use to detect "GNU indirect"
functions.)  I wonder if it would be possible to avoid them if you provided
enough info about the template parameters, if the debugger had its own name
mangler.  I had to write one for our debugger a couple years ago, and it
definitely was a persnickety beast.  But doable with enough information.  Mind
you, I'm not sure there is enough information to do it perfectly with the state
of DWARF & gcc right now.
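As a rough illustration of what a debugger-side mangler involves, here is a deliberately tiny C++ sketch following the Itanium C++ ABI rules for one narrow case: a function template at global scope whose template arguments, return type, and parameter types are all builtin types. `Builtin` and `mangle_template_function` are hypothetical names. Even this case shows why template-parameter information matters: if a parameter is declared as `T` rather than `int`, the ABI encodes it as a template-parameter reference (`T_`), not as `i`, so the consumer must know how the declaration was written.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper enum: a few Itanium C++ ABI <builtin-type> codes.
enum class Builtin { Void, Bool, Char, Int, UnsignedInt, Long, Float, Double };

inline char builtin_code(Builtin b) {
  switch (b) {
    case Builtin::Void:        return 'v';
    case Builtin::Bool:        return 'b';
    case Builtin::Char:        return 'c';
    case Builtin::Int:         return 'i';
    case Builtin::UnsignedInt: return 'j';
    case Builtin::Long:        return 'l';
    case Builtin::Float:       return 'f';
    case Builtin::Double:      return 'd';
  }
  return '?';  // unreachable for valid enumerators
}

// Mangle `ret name<targs...>(params...)` for a global-scope function template.
// Assumes no return or parameter type is spelled using a template parameter
// (those would need T_, T0_, ... instead of the builtin code).
inline std::string mangle_template_function(const std::string& name,
                                            const std::vector<Builtin>& template_args,
                                            Builtin return_type,
                                            const std::vector<Builtin>& params) {
  assert(!name.empty() && !template_args.empty());
  std::string out = "_Z" + std::to_string(name.size()) + name;  // <source-name>
  out += 'I';                                                    // <template-args>
  for (Builtin a : template_args) out += builtin_code(a);
  out += 'E';
  out += builtin_code(return_type);  // template functions encode the return type
  if (params.empty()) {
    out += 'v';  // an empty parameter list is encoded as "v"
  } else {
    for (Builtin p : params) out += builtin_code(p);
  }
  return out;
}
```

Calling it with name "f1", template argument `Int`, return type `Void`, and no parameters reproduces the `_Z2f1IiEvv` example from David's message in this thread.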


On Mon, Nov 01, 2021 at 01:06:33PM -0700, David Blaikie wrote:
>Hey Todd,
>Just some details regarding the string reduction strategies I'm pursuing
>to address DWARF32 overflowing .debug_str.dwo/.debug_str_offsets.dwo
>sections in some large binaries at Google.
>So the extreme cases I'm dealing with are predominantly C++ expression
>templates (in TensorFlow and Eigen) - these produce types with very large
>DW_AT_names ("f1<int>") and DW_AT_linkage_names (eg: "_Z2f1IiEvv") (but
>with many more template parameters, none of which are ever user-written
>but deduced).
>So the main fix I'm pursuing (roughly called "simplified template names")
>is to omit template parameter lists from DW_AT_names of templates in most
>cases, allowing the consumer to reconstruct the name from
>DW_AT_template_*_parameters itself, recursively. Further discussion and
>here: [1]
>- in terms of how this affects scaling factors, it means that adding an
>additional template instantiation of existing types would add no new data
>to .debug_str (eg: going from a program with "t1<int>" to "t1<t1<int>>"
>would add no new entries to .debug_str). Not all names can be readily
>reconstructed - so I'm opting the feature out on those, but we could have
>a more deeper discussion about how to handle them if we wanted to make
>this a full-fledged/robust feature (maybe one the DWARF spec
>GDB seems to handle this sort of debug info OK - I guess someone did real
>work to support that at some point (so maybe some other debugger already
>generates DWARF like this).
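The recursive reconstruction can be sketched in a few lines of C++. `TypeDie` is a hypothetical stand-in for a type DIE whose DW_AT_name omits the template argument list and whose DW_TAG_template_type_parameter children point at the argument types; only type parameters are modeled. The spelling choices (a ", " separator and no space between closing brackets) are assumptions: a real consumer would have to match whatever its producer emits, which is one reason the round-trip checking described earlier in the thread is useful.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, pared-down type DIE: DW_AT_name without a template argument
// list, plus the DW_AT_type of each DW_TAG_template_type_parameter child.
struct TypeDie {
  std::string simple_name;                    // e.g. "t1"
  std::vector<const TypeDie*> template_args;  // empty for non-templates
};

// Rebuild the full name recursively from the template parameters.
// Assumed spelling: ", " between arguments, no space in ">>".
inline std::string rebuild_name(const TypeDie& die) {
  std::string name = die.simple_name;
  if (die.template_args.empty()) return name;
  name += '<';
  for (std::size_t i = 0; i < die.template_args.size(); ++i) {
    assert(die.template_args[i] != nullptr);
    if (i != 0) name += ", ";
    name += rebuild_name(*die.template_args[i]);
  }
  name += '>';
  return name;
}

// Round-trip check: does the rebuilt name match the full name the producer
// would otherwise have emitted?
inline bool round_trips(const TypeDie& die, const std::string& full_name) {
  return rebuild_name(die) == full_name;
}
```

Every DIE in this model carries only short strings such as "t1" or "int", so instantiating `t1<t1<int>>` in a program that already has `t1<int>` adds no new string data.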
>The other half, though, is DW_AT_linkage_names - and in theory similar
>rebuilding could be done, but that'd require baking a lot of
>implementation knowledge into the DWARF Consumer that DWARF is meant to
>help avoid... so I'm unsure what the right solution is there just now, but
>there are a few ideas I'm still kicking around. At least linkage names have
>less redundancy (within a single name they avoid redundancy - "t1<int>,
>t1<t1<int>>" only ends up with a single description of "t1<int>" instead of
>two of them like you get with the DW_AT_name) than DW_AT_names, so they do
>scale a bit better already.
>Happy to discuss these ideas in specific, or their impact on debug_str
>growth in more detail any time (here, video chat, discords, etc).
>- Dave
> References
>Visible links

Re: [Dwarf-Discuss] modeling different address spaces

2020-07-20 Thread Todd Allen via Dwarf-Discuss

My experience with an architecture like this also is a GPU: the Nvidia CUDA
GPUs.  I don't work on nvcc.  I'm coming at this from the consumer side.  But
what I've observed:

They use DW_AT_address_class with a CUDA-specific enum of address spaces, with
values for things like: global memory, shared memory, const memory, etc.  They
don't attach these attributes to subroutines, because all the code on that
architecture is in a single "code" memory.  They do attach them to pointer
types, as the DWARF spec describes.  They also attach them to variables,
formals, etc.  That's a vendor extension (which I'd forgotten until I looked it
up again in the DWARF spec).  But an obvious one.  We might want to formalize it
at some point.

Anyway, these are the sorts of things we see:

  DW_AT_type  : ...
  DW_AT_address_class : ptxGenericStorage

  DW_AT_name  : myConstant
  DW_AT_type  : ...
  DW_AT_location  : ...
  DW_AT_address_class : ptxConstStorage

  DW_AT_abstract_origin : ...
  DW_AT_location: ...
  DW_AT_address_class   : ptxLocalStorage

I don't know your architecture, but I'd expect something similar to work for any
GPU with heterogeneous memories.
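On the consumer side, the practical consequence is that every read needs an (address space, address) pair rather than a flat address. Here is a minimal C++ sketch of such a toy memory model; `Space` and `SegmentedMemory` are hypothetical, and the enumerators only loosely mirror the ptx*Storage names shown above (they are not the values nvcc emits).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical address spaces standing in for a vendor enum carried in
// DW_AT_address_class; real enumerator values are vendor-defined.
enum class Space { Generic, Global, Local, Shared, Const };

// Toy target memory: each address space is a separate byte array, so the
// same numeric address can name different bytes in different spaces.
class SegmentedMemory {
 public:
  void map(Space space, std::vector<std::uint8_t> bytes) {
    assert(spaces_.count(space) == 0 && "address space mapped twice");
    spaces_[space] = std::move(bytes);
  }

  // Read one byte, e.g. at the address computed from a variable's
  // DW_AT_location, in the space named by its DW_AT_address_class.
  std::optional<std::uint8_t> read(Space space, std::uint64_t addr) const {
    auto it = spaces_.find(space);
    if (it == spaces_.end() || addr >= it->second.size()) return std::nullopt;
    return it->second[addr];
  }

 private:
  std::map<Space, std::vector<std::uint8_t>> spaces_;
};
```

Address 0 in const storage and address 0 in local storage are different bytes here, which is why the DW_AT_address_class on each variable matters to the debugger.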

Todd Allen
Concurrent Real-Time

On Mon, Jul 20, 2020 at 08:31:53AM +, Dwarf Discussion wrote:
> Hello Michael,
> > > What would be the recommended way to model variables that are allocated
> > > to different address spaces?
> > 
> > Can you describe the architecture a bit?
> It's a GPU.  It uses a different address space for shared local memory.
> > > I found DW_OP_xderef for dereferencing address-space qualified pointers
> > > but the resulting memory location description wouldn't have an
> > > address-space qualifier.
> > 
> > DW_OP_xderef translates from an architecturally defined memory
> > reference including an address space into a linear address space
> > (generic type).  DWARF doesn't support computations on address-space
> > qualified addresses, although using a typed stack, this could be an
> > extension.
> I don't see a need for this, right now.  It should suffice to describe that an
> object lives in address-space A so the location expression yields an 
> A-address.
> In another email you said: "CUDA address spaces or a DSP with multiple
> distinct address spaces are what would conventionally be described as
> segmented memory.  I think that using the DW_AT_address_space would be
> reasonable."
> I assume you mean DW_AT_address_class.  This should do the trick.  I just
> wasn't sure if that's the intended use of that attribute.
> > > I found DW_AT_address_class, which allows attaching an integer, which
> > > could represent the address-space.  This sounds pretty close.  I'm a bit
> > > thrown off by the example, though.
> > 
> > Which example?
> Table 2.7 "Example address class codes" on p. 48.  It uses DW_AT_address_class
> to describe addressing modes.
> Regards,
> Markus.

Re: [Dwarf-Discuss] modeling different address spaces

2020-07-16 Thread Todd Allen via Dwarf-Discuss
Markus, Michael, David, Xing,

I always assumed that the segment support in DWARF was meant to be more general,
and support architectures where there was no single flat memory, and so the
segments were necessary for memory accesses.  I personally have not dealt with
any architectures where DW_AT_segment came into play, though.

Certainly x86 does not fall into that "truly distinct segments" category, at
least not in modern times.  The segment registers there (fs & gs, for example)
are an indirect way of specifying a base address within the flat address space.
They usually end up being used for thread-specific data structures where each
thread has a different segment selector which implies a different base address.
And it requires a syscall to interact with the base addresses, at least on
Linux.  The other segment registers (cs, ds, ss) are set-and-forget by the OS

The CUDA architecture is an interesting case.  It doesn't use DW_AT_segment at
all.  But it does use the DW_AT_address_class attribute to specify CUDA segments
(e.g. Global, Local, Shared, among many others) for variables and/or types.  So
it's fairly fine-grained.  You can, for example, have a shared pointer to a
global pointer to a local integer, and the DW_AT_address_class attribute can
convey that.
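Here is a minimal C++ sketch of how a consumer could walk that example, a shared pointer to a global pointer to a local integer, using DW_AT_address_class on the variable and on each pointer type. The `Space` enumerators, `TypeDie`, and `VariableDie` are hypothetical and are not the values or structures nvcc actually uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical address spaces; real CUDA enumerators and values differ.
enum class Space { Generic, Global, Local, Shared, Const, Param };

// A type DIE: pointer types carry DW_AT_address_class for the memory they
// point into; a base type has pointee == nullptr.
struct TypeDie {
  const TypeDie* pointee = nullptr;
  Space pointee_space = Space::Generic;
};

// A variable DIE carrying its own DW_AT_address_class (where it lives).
struct VariableDie {
  Space storage;
  const TypeDie* type;
};

// The address space used for each successive memory read when fully
// dereferencing the variable: the variable itself, then one per pointer level.
inline std::vector<Space> access_path(const VariableDie& var) {
  assert(var.type != nullptr);
  std::vector<Space> path{var.storage};
  for (const TypeDie* t = var.type; t->pointee != nullptr; t = t->pointee)
    path.push_back(t->pointee_space);
  return path;
}
```

For the shared/global/local example, the path is Shared, then Global, then Local: three reads in three distinct spaces.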

Some of those CUDA segments are for radically different sorts of memory
(e.g. very low latency Shared memory vs. high latency Global memory).  But other
distinctions seem more gratuitous (e.g. Param vs. Global memory).  I assume that
there's a CUDA under-the-hood mapping of many of the segments to regions of a
flat Global address space in there, but the CUDA architectures & drivers
deliberately hide that mapping.  So effectively you end up with all the segments
being distinct, as far as a debugger can tell.

On Thu, Jul 16, 2020 at 09:23:51AM +, Dwarf Discussion wrote:
>What would be the recommended way to model variables that are allocated to
>different address spaces?
>I found DW_OP_xderef for dereferencing address-space qualified pointers
>but the resulting memory location description wouldn't have an
>address-space qualifier.
>I found DW_AT_address_class, which allows attaching an integer, which
>could represent the address-space.  This sounds pretty close.  I'm a bit
>thrown off by the example, though.

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] Segment selectors for Harvard architectures

2020-03-23 Thread Todd Allen via Dwarf-Discuss

I haven't needed to contend with this issue.  But as I was looking over the
standard, this was my initial gut reaction too: use the segment selectors.  This
use actually does seem like it's a characteristic of the target architecture to
me.  You started the discussion with "Harvard architectures".

DWARF does permit architectures to specify aspects of their DWARF description,
after all.  I can't recall it ever being done *formally*, but it's been done
informally for every architecture that uses DWARF.  At a bare minimum, register
encodings.  And usually you have to root around in somebody else's source code
to find it.

This one has a slightly higher chance of breaking a consumer, if that consumer
was written not to tolerate the segment selectors.  But I think it would be fair
to put any such blame on the consumer in that case.  If the consumer doesn't die
with a SIGSEGV, then it might ignore the segments.  And then it would be no
worse off than now.

On Thu, Mar 19, 2020 at 06:05:16PM +, Dwarf Discussion wrote:
> This recently came up in the LLVM project.  Harvard architectures
> put code and data into separate address spaces, but those spaces
> are not explicit; instructions that load/store memory implicitly
> use the data space, while things like taking a function address or 
> doing indirect branches will implicitly use the code space.  This 
> doubles the effective size of memory without consuming an address 
> bit, as well as having other secondary benefits like not allowing
> self-modifying code.
> Nearly all of the DWARF information does not need to distinguish
> between code and data address spaces, because it's easy to derive that
> from context.  Addresses in the line table or a range list will be
> code addresses; in .debug_info, addresses of code elements will be
> code addresses, while variables will be data addresses. And so on.
> This only seems to break down in the .debug_aranges section, which
> records both data and code addresses without any context to let a
> consumer know which is what.  In a flat-address architecture, no
> distinction is needed; in a segmented architecture, there will be
> a segment selector as part of any address, and that includes the
> .debug_aranges section.  What about for Harvard architectures?
> What I suggested in the LLVM project is that .debug_aranges would
> have a 1-byte segment selector and use some trivial scheme such as
> 0=code, 1=data to distinguish what kind of address it is.  Other
> DWARF sections wouldn't need a selector because they can all use
> context to figure it out; this avoids the size overhead of using
> segment selectors everywhere else.
> Pavel Labath pointed out that this seems inconsistent and might
> make consumers unhappy; segment selectors are described as a
> characteristic of the target architecture, so having them in one
> place and not others might look suspicious.  IMO it's a reasonable 
> "permissive" use of the existing DWARF structures, but it seemed
> worth asking here.
> Does this (segment selector only in .debug_aranges) sound okay?
> Should there be non-normative text or a wiki description of this?
> Do we want to codify the 0=code 1=data use of segment selectors
> for all Harvard architectures (that don't otherwise have explicit
> segments) so that this doesn't have to be set by ABI committees?
> I'm willing to write up whatever needs writing up, either as a
> proposal or as a wiki entry.
> Thanks,
> --paulr
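For concreteness, here is a C++ sketch of decoding one .debug_aranges set's tuples under the proposed scheme. It assumes a 1-byte segment selector, little-endian 4-byte addresses in the example, and the 0=code/1=data convention from the proposal, which is not part of any standard or ABI. `Harvard`, `ArangeEntry`, and `decode_tuples` are hypothetical names; header parsing and alignment padding are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed convention from the proposal (not standardized):
// 1-byte segment selector, 0 = code space, 1 = data space.
enum class Harvard : std::uint8_t { Code = 0, Data = 1 };

struct ArangeEntry {
  Harvard space;
  std::uint64_t address;
  std::uint64_t length;
};

// Little-endian unsigned read of `size` bytes (size <= 8).
inline std::uint64_t read_le(const std::uint8_t* p, std::size_t size) {
  assert(size <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < size; ++i) v |= std::uint64_t{p[i]} << (8 * i);
  return v;
}

// Decode the tuples of one set, starting just after the header (and any
// padding). Each tuple: segment selector (1 byte here), address, length.
// A tuple of all zeros ends the set.
inline std::vector<ArangeEntry> decode_tuples(const std::vector<std::uint8_t>& bytes,
                                              std::size_t address_size) {
  const std::size_t tuple_size = 1 + 2 * address_size;
  std::vector<ArangeEntry> out;
  for (std::size_t off = 0; off + tuple_size <= bytes.size(); off += tuple_size) {
    const std::uint8_t* p = bytes.data() + off;
    const std::uint8_t selector = p[0];
    const std::uint64_t address = read_le(p + 1, address_size);
    const std::uint64_t length = read_le(p + 1 + address_size, address_size);
    if (selector == 0 && address == 0 && length == 0) break;  // terminator
    if (selector > 1) continue;  // unknown selector: skipped in this toy policy
    out.push_back({static_cast<Harvard>(selector), address, length});
  }
  return out;
}
```

In a Harvard layout, a code range and a data range may share the same numeric address (for example 0x1000); only the selector says which memory each tuple covers.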
> _______
> Dwarf-Discuss mailing list

Todd Allen
Concurrent Real-Time

Re: [Dwarf-Discuss] Use of Location Description operations in DWARF Expressions?

2020-03-23 Thread Todd Allen via Dwarf-Discuss
I recall this being intentional as well.  This is how I think of these items.
And this is just the gist of things.  I didn't put on my ABI Lawyer hat for

A DWARF expression is a stack machine that evaluates to a value.

A location description describes the "location" of an object.  A "location" is
pure concept here, and doesn't necessarily require any physical location.  It
can be:
 * In memory at an address: It has a DWARF expression which computes the
   start address.
 * In a register: DW_OP_regN, DW_OP_regx.  No DWARF expression needed.
 * Nowhere, but with a known value: DW_OP_implicit_value, DW_OP_stack_value.
   I think of this as being an ephemeral "location", which in concrete terms
   would be in a buffer in the debugger or some other consumer.  There's a
   DWARF expression which computes the value.
 * Nowhere, but where it's an optimized-away pointer, and its designated
   (pointed-to) value is a known value, much like above.
 * Spread out across multiple distinct locations: DW_OP_piece's, where each
   piece can be any one of the above.
Oh, and one more:
 * Nowhere at all.  Go fish.

So locdescs can use a DWARF expression for a couple different purposes, or even
multiple DWARF expressions.  But they have additional operators for additional
cases (e.g. registers), and for "glue" (DW_OP_piece).
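One way to see the distinction is as a data model. Here is a C++17 sketch of the location kinds listed above, with DW_OP_piece as the glue that builds a composite from simple locations. All type names are hypothetical; offsets, bit pieces, and the actual DWARF encoding are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical consumer-side model of location kinds.
struct Undefined {};                                           // nowhere at all
struct MemoryLoc { std::uint64_t address; };                   // address computed by a DWARF expression
struct RegisterLoc { unsigned regno; };                        // DW_OP_regN / DW_OP_regx
struct ImplicitValueLoc { std::vector<std::uint8_t> bytes; };  // DW_OP_implicit_value / DW_OP_stack_value
struct ImplicitPointerLoc {                                    // optimized-away pointer (DW_OP_implicit_pointer)
  std::uint64_t target_die;  // DIE describing the pointed-to value
  std::int64_t offset;
};

using SimpleLoc =
    std::variant<Undefined, MemoryLoc, RegisterLoc, ImplicitValueLoc, ImplicitPointerLoc>;

// DW_OP_piece glue: each piece is a simple location plus its size in bytes.
struct Piece {
  SimpleLoc loc;
  std::uint64_t size_bytes;
};
struct CompositeLoc { std::vector<Piece> pieces; };

using Location = std::variant<SimpleLoc, CompositeLoc>;

// Does fetching the object's value require reading target memory?
inline bool reads_target_memory(const Location& loc) {
  if (const SimpleLoc* s = std::get_if<SimpleLoc>(&loc))
    return std::holds_alternative<MemoryLoc>(*s);
  assert(std::holds_alternative<CompositeLoc>(loc));
  for (const Piece& p : std::get<CompositeLoc>(loc).pieces)
    if (std::holds_alternative<MemoryLoc>(p.loc)) return true;
  return false;
}

// Total size covered by a composite location.
inline std::uint64_t total_size(const CompositeLoc& c) {
  std::uint64_t n = 0;
  for (const Piece& p : c.pieces) n += p.size_bytes;
  return n;
}
```

A value produced by a DWARF expression fits in a single machine word, whereas a composite like the one above has no single "value" until the consumer gathers the pieces.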

Conversely, DWARF expressions cannot use any of the locdesc special operators or

But, of course, there could be use cases where some of these would make sense in
a DWARF expression, and we just didn't think of it.  Nothing springs to my mind
right now...  But if you have a compelling case, we certainly could move some of
those special/glue operators from the locdesc category to the DWARF expression
category.
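To make the split above concrete, here is a minimal Python sketch, assuming a toy representation in which each operation is a `(name, operand)` tuple rather than the real binary encoding. `eval_expression` is a pure stack machine that yields a value and rejects the locdesc-only operators; `eval_locdesc` layers the register, implicit-value, and DW_OP_piece "glue" cases on top of it. Only a handful of opcodes are handled, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy representation: (opcode name, operand); operand is 0 when unused.
Op = Tuple[str, int]

# Operators treated here as location-description-only (registers, implicit
# values, and DW_OP_piece "glue"); a DWARF expression may not use them.
LOCDESC_ONLY = {"DW_OP_regx", "DW_OP_stack_value", "DW_OP_piece"}


def eval_expression(ops: List[Op]) -> int:
    """Run a DWARF expression on a stack machine and return the top value."""
    stack: List[int] = []
    for name, arg in ops:
        if name in LOCDESC_ONLY:
            raise ValueError(f"{name} is not allowed in a DWARF expression")
        if name == "DW_OP_constu":
            stack.append(arg)
        elif name == "DW_OP_plus":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif name == "DW_OP_plus_uconst":
            stack.append(stack.pop() + arg)
        else:
            raise NotImplementedError(name)
    return stack[-1]


@dataclass
class Memory:      # in memory; the expression computed the start address
    address: int

@dataclass
class Register:    # in a register; no DWARF expression involved
    regno: int

@dataclass
class Implicit:    # nowhere, but with a known value
    value: int

Location = Union[Memory, Register, Implicit, None]  # None: nowhere at all

@dataclass
class Piece:
    location: Location
    size: int      # in bytes, from the DW_OP_piece operand


def _simple_location(ops: List[Op]) -> Location:
    """Classify one non-composite location description."""
    if not ops:
        return None
    name, arg = ops[-1]
    if name == "DW_OP_regx":
        if len(ops) != 1:
            raise ValueError("DW_OP_regx must stand alone")
        return Register(arg)
    if name == "DW_OP_stack_value":
        return Implicit(eval_expression(ops[:-1]))
    return Memory(eval_expression(ops))


def eval_locdesc(ops: List[Op]) -> Union[Location, List[Piece]]:
    """Evaluate a location description, splitting composites at DW_OP_piece."""
    pieces: List[Piece] = []
    current: List[Op] = []
    for name, arg in ops:
        if name == "DW_OP_piece":
            pieces.append(Piece(_simple_location(current), arg))
            current = []
        else:
            current.append((name, arg))
    if not pieces:
        return _simple_location(current)
    if current:
        raise ValueError("a composite must end with DW_OP_piece")
    return pieces
```

For example, `[("DW_OP_regx", 0), ("DW_OP_piece", 4), ("DW_OP_piece", 4)]` describes an object whose first four bytes live in register 0 and whose next four bytes were optimized away; the empty run before the second DW_OP_piece is the "nowhere at all" case.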
I think it feels a little blurry only because locdescs came first, and then we
co-opted them for DWARF expressions, and restricted the use of certain operators
in that case.  And then it eventually changed into what we have now.  But a lot
of us remember the history, which creates that blur.

Todd Allen
Concurrent Real-Time

On Mon, Mar 23, 2020 at 12:04:58PM -0700, Dwarf Discussion wrote:
> On 3/23/20 6:28 AM, Robinson, Paul via Dwarf-Discuss wrote:
> > > From: Dwarf-Discuss On Behalf
> > > Of Adrian Prantl via Dwarf-Discuss
> > > > On Mar 19, 2020, at 5:49 PM, Michael Eager via Dwarf-Discuss wrote:
> > > > 
> > > > My reading of sections 2.5 & 2.6 is that you cannot have a DW_OP_piece
> > > in a DWARF expression.
> > > > 
> > > 
> > > I wonder if this is an intentional part of the design because of
> > > ambiguity/correctness issues or is this just something that happens to
> > > fall out of the way the text is worded? I can see how such a restriction
> > > might simplify DWARF consumers, but it also seems like an arbitrary
> > > restriction for which there may not be a technical reason.
> > 
> > My intuition (clearly I wasn't there at the time) is that this is like
> > a C expression being an rvalue (DWARF expression) or lvalue (location
> > description).  Values and locations aren't the same thing.
> It is somewhat an L-value vs R-value issue.
> You can craft a DWARF expression to extract a value (an R-value) from
> arbitrary memory locations or registers (for example, using DW_OP_and,
> DW_OP_sh?, etc.) and place it on the top of the stack.  A DW_OP_piece
> operator doesn't do this.  (There might be value in an operator which
> extracts a value from a composite location.)
> A location (an L-value) which includes potentially multiple register or
> memory locations and multiple DW_OP_piece or DW_OP_bit_piece operations
> can't be evaluated by a simple stack-based expression interpreter.
> The design is intentional, AFAIK, not accidental.
> I think that the description has become a bit less clear with the addition
> of the Implicit Location Descriptions in Section, which do compute
> values, rather than locations.  Perhaps these should have been described in
> Section 2.5 as parts of a DWARF expression, not as parts of a Location
> Description.
> The description (and implementation) of DWARF expressions and locations are
> somewhat muddled together.  This can be seen in the first sentence of
> Section 2.5:
>DWARF expressions describe how to compute a value or specify a
> A clearer definition would specify that the DWARF expression only computes a
> value, and leave what that value means (e.g., register/memory contents,
> arbitrary computation, memory address) to the context in which the
> expression is used.  A more precise definition of a location, especially a
> composite location, would he

Re: [Dwarf-Discuss] Some DWARFv5 draft feedback

2016-12-01 Thread Todd Allen
> Enumeration types. It is allowed to have a DW_AT_byte_size on a
> DW_TAG_enumeration_type, but not DW_AT_encoding. To describe both size
> and encoding one needs to use a DW_AT_type pointing to a base type that
> represents the "underlying type". For languages where enumerations don't
> have an underlying type, or for strongly typed enums it is easier to
> attach the encoding directly than adding an indirection to a base type.
> Add DW_AT_encoding to the attribute list for DW_TAG_enumeration_type.

FWIW, our Ada compiler always placed DW_AT_encoding attributes directly on
DW_TAG_enumeration_type to indicate the underlying signedness.  Ada never was
truly supported, so we just viewed it as part of the Ada support we'd defined.

Todd Allen
Concurrent Computer Corporation

Re: [Dwarf-Discuss] How to represent address space information in DWARF

2016-07-28 Thread Todd Allen
On Wed, Jul 27, 2016 at 07:39:54PM -0400, Tye, Tony wrote:
>Another question that has been raised as part of the HSA Foundation
>tools working group relates to the
>manner that address spaces should be represented in DWARF.
>HSA defines segments in which variables can be allocated. These are
>basically the same as the address spaces of OpenCL. HSA defines kernels
>that are basically the same as OpenCL kernels. A kernel is a grid launch
>of separate threads of execution (termed work-items). These work-items are
>grouped into work-groups. The work-items can access one of three main
>memory segments:
>1. The global segment is accessible by all work-items. In hardware it is
>typically just the global memory.
>2. The group segment (corresponding to the local address space of OpenCL)
>is accessible only to the work-items in the same work-group. Each
>work-group has its own copy of variables allocated in the group segment.
>On GPU hardware this can be implemented as special hardware managed
>scratch pad memory (not part of globally addressable memory), with special
>hardware instructions to access it.
>3. The private segment is accessible to a single work-item. Each work-item
>has its own copy of variables allocated in the private segment. On GPU
>hardware this could also involve special hardware instructions.
>HSA also defines the concept of a flat address (similar to OpenCL generic
>addresses). It is essentially a linearization of the addresses of the 3
>address spaces. For example, one range of a flat address maps to the group
>segment, another range maps to the private segment, and the rest map
>directly to the global segment. However, it is target specific what exact
>method is used to achieve the linearization.
>The following was the conclusion we reached from reading the DWARF
>standard and looking at how gdb and lldb would use the information. We are
>currently working on creating a patch for LLVM to support address spaces
>and would appreciate any feedback on if this matches the intended usage of
>DWARF features to support this style of address space.
>1. Use the DW_AT_address_class to specify that the value of a pointer-like
>value is the address within a specific address space. Pointer-like values
>include pointers, references, functions and function types. For HSA we are
>really only concerned with pointer/reference values currently. It would
>apply to a pointer-like type DIE, or a variable with a pointer-like type.
>In the case of a variable it does not specify the address space of the
>variable's location, but specifies how to treat the address value stored
>in the variable.
>2. Use DW_OP_xderef in the location expression of a variable to specify
>the address space in which the variable is located. Since location
>expressions can specify different locations depending on the PC, this
>allows the variable to be optimized to have multiple locations. For
>example, sometimes in a memory location in the group address space,
>sometimes in a register, sometimes in a memory location in the private
>address space (maybe due to spilling of the register), etc.
>Attempting to use DW_AT_address_class on variables to specify their
>address space location conflicts with DWARF stating that it applies to the
>pointee as described in #1. It also breaks the flexibility of location
>expressions allowing the location to change according to PC.
>When a debugger evaluates a DWARF location expression it can generate a
>flat address to encode the address space. It can do this by implementing
>the XDEREF as a target specific conversion from a segment address into a
>flat address. Similarly when using a value as an address that has a
>pointer-like type with an address class, the value can be converted to a
>flat address. When accessing addresses the debugger would have to provide
>the "current thread" so that the correct group/private address space
>instance can be accessed when given a flat address that maps to the group
>or private segments. It appears both gdb and lldb provide this.

FWIW, the use of DW_AT_address_class on pointer & reference types also is how
Nvidia's CUDA compiler describes its various segments.  I haven't encountered
any uses of DW_OP_xderef* operators, but it probably is just because it hasn't
been necessary.
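The flat-address scheme in the quoted message can be sketched in Python as follows. This is only an illustration under stated assumptions: the aperture base addresses, the segment names standing in for DW_OP_xderef address-space identifiers, and the helpers `segment_to_flat`, `flat_to_segment`, and `read_byte` are all hypothetical, since the real linearization is target specific. The `Thread` argument reflects the point that a debugger must supply the "current thread" to pick the right group or private instance.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical aperture layout. The real linearization is target specific;
# these constants are illustrative only.
GROUP_BASE = 0x1000_0000_0000
PRIVATE_BASE = 0x2000_0000_0000
APERTURE_SIZE = 1 << 32
APERTURES = {"group": GROUP_BASE, "private": PRIVATE_BASE}


def segment_to_flat(segment: str, offset: int) -> int:
    """Map a (segment, offset) pair, as an xderef implementation might see it, to a flat address."""
    if segment == "global":
        # Global addresses map directly, but must not overlap an aperture.
        for base in APERTURES.values():
            if base <= offset < base + APERTURE_SIZE:
                raise ValueError("global address collides with an aperture")
        return offset
    if not 0 <= offset < APERTURE_SIZE:
        raise ValueError("offset outside the segment aperture")
    return APERTURES[segment] + offset


def flat_to_segment(flat: int) -> Tuple[str, int]:
    """Invert the linearization: find which segment a flat address falls in."""
    for segment, base in APERTURES.items():
        if base <= flat < base + APERTURE_SIZE:
            return segment, flat - base
    return "global", flat


@dataclass(frozen=True)
class Thread:
    work_group: int
    work_item: int


def instance_key(segment: str, thread: Thread) -> Tuple[object, ...]:
    """Pick which copy of a segment the given thread sees."""
    if segment == "group":
        return ("group", thread.work_group)
    if segment == "private":
        return ("private", thread.work_group, thread.work_item)
    return ("global",)


def read_byte(memory: Dict[Tuple[object, ...], bytearray], thread: Thread, flat: int) -> int:
    """Read one byte at a flat address on behalf of the given thread."""
    segment, offset = flat_to_segment(flat)
    return memory[instance_key(segment, thread)][offset]
```

The same flat address in the group aperture reads different bytes for threads in different work-groups, which is why the consumer cannot resolve it without a thread context.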

Todd Allen
Concurrent Computer Corporation