Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 4:29 PM Greg Clayton  wrote:

>
>
> On Mar 11, 2021, at 1:12 PM, Paul Robinson via Dwarf-Discuss <
> dwarf-discuss@lists.dwarfstd.org> wrote:
>
> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>
>
> We will use them if they are there, but one interesting issue that we ran
> into with LLDB is some compile units might be in .debug_aranges because the
> compiler made a .debug_aranges section in the .o file, but others might
> not. So we had to add code to LLDB to figure out which compile units have
> any entries in the .debug_aranges section, and read the DW_AT_ranges from
> the DW_TAG_compile_unit if it exists, and if it doesn't, manually index the
> DWARF to create one on the fly each time.
>
>
> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>
> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
>
> A big issue with any of the DWARF sections is we are subject to making the
> contents work with linkers that just want to concatenate + relocate. This
> often leads to information being kept around when dead stripping occurs
> because anything that is dead stripped will just have its address zero'ed
> out or -1'ed out, but this bogus info is still in the data.
>

Yeah, we talked some last year about formalizing this more into the -1
tombstone - I thought maybe Paul had proposed that for standardization,
though at a glance I don't see the proposal. It's probably somewhere there.


> If we don't need a format that can simply be concatenated and relocated,
> the GSYM format, which is open sourced in llvm.org already, might be good
> inspiration for a .debug_aranges successor section that has very efficient
> lookups. The GSYM format could actually be used as is by adding only a new
> DIE offset InfoType.
>
> Besides ".debug_names", all other DWARF accelerator tables are really just
> random indexes that must be linearly scanned or pre-indexed prior to being
> used because of the concatenate + relocate style that is used for these
> DWARF sections. It would be great if any future accelerator tables are "map
> into memory and use as is" kind of tables like ".debug_names" and the
> ".apple_XXX" name accelerator tables.
>

Ah, fair point - could come up with a rather different structure if it were
designed for fast on-disk query (though then, like .debug_names (which I
don't think we have any linkers that can link today, for instance), you'd
probably /really/ want it to be linked in a content-aware manner, because
probing separate lookup tables (even if they're more designed for that)
per-CU doesn't probably gain you a lot).

- Dave


>
>
> Thanks,
> --paulr
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is 

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread Greg Clayton via Dwarf-Discuss


> On Mar 11, 2021, at 1:12 PM, Paul Robinson via Dwarf-Discuss 
>  wrote:
> 
> Tom Russell could perhaps speak to this better, but my understanding is that 
> our debugger guys like having .debug_aranges, because parsing the CU DIE does 
> take that extra effort.  I am unfamiliar with their code so I have to take 
> their word on it.  But I can certainly imagine that probing hundreds to 
> thousands of CUs in order to collect range information with lengthy range 
> lists would be more expensive than running through a comparatively compact 
> .debug_aranges list.  If Tom tells me I’m wrong, well, wouldn’t be the first 
> time.

We will use them if they are there, but one interesting issue that we ran into 
with LLDB is some compile units might be in .debug_aranges because the compiler 
made a .debug_aranges section in the .o file, but others might not. So we had 
to add code to LLDB to figure out which compile units have any entries in the 
.debug_aranges section, and read the DW_AT_ranges from the DW_TAG_compile_unit 
if it exists, and if it doesn't, manually index the DWARF to create one on the 
fly each time.
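
The per-CU fallback order described above can be sketched roughly as follows. This is only an illustration, not LLDB's actual code: the function name, the three dictionaries (keyed by CU offset), and the returned source labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (low_pc, high_pc), half-open


def ranges_for_cu(
    cu_offset: int,
    aranges: Dict[int, List[Range]],          # entries found in .debug_aranges
    cu_die_ranges: Dict[int, List[Range]],    # DW_AT_ranges on the CU DIE
    function_ranges: Dict[int, List[Range]],  # gathered by walking the DIEs
) -> Tuple[str, List[Range]]:
    """Return (source, ranges) for one CU, trying the cheapest source first."""
    if aranges.get(cu_offset):
        return "aranges", aranges[cu_offset]
    if cu_die_ranges.get(cu_offset):
        return "cu_die", cu_die_ranges[cu_offset]
    # Last resort: index the CU manually from its function DIEs.
    return "manual_index", sorted(function_ranges.get(cu_offset, []))
```

The point of the sketch is that the decision has to be made per CU, because one CU may have .debug_aranges entries while its neighbour has none.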

>  
> One thing we have encountered (see issue 210113.1) is that when we’ve done 
> dead-stripping, .debug_aranges entries (one per function, typically, because 
> -ffunction-sections) can end up pointing to nothing.  In our proprietary 
> linker I believe we compress/rewrite .debug_aranges to minimize the number of 
> entries, which by coincidence ends up producing a conforming aranges list; 
> LLD doesn’t do that, which means it produces a non-conforming list (with 
> zero-length entries), hence the issue. 
>  
> I’ll have to think about what a “modern” .debug_aranges might want to look 
> like.

A big issue with any of the DWARF sections is we are subject to making the 
contents work with linkers that just want to concatenate + relocate. This often 
leads to information being kept around when dead stripping occurs because 
anything that is dead stripped will just have its address zero'ed out or -1'ed 
out, but this bogus info is still in the data.
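
As a rough sketch of what a consumer has to do with such leftovers, the following filters out entries whose address was zeroed or set to -1. The helper names are hypothetical, and treating address 0 as dead is an assumption that does not hold on targets where 0 is a valid code address.

```python
from typing import List, Tuple


def is_tombstone(address: int, address_size: int = 8) -> bool:
    """True if the address looks like a dead-stripped placeholder (0 or -1)."""
    max_addr = (1 << (8 * address_size)) - 1  # all bits set, i.e. -1
    return address == 0 or address == max_addr


def live_entries(entries: List[Tuple[int, int]], address_size: int = 8) -> List[Tuple[int, int]]:
    """Keep only (address, length) entries that do not start at a tombstone."""
    return [(a, n) for a, n in entries if not is_tombstone(a, address_size)]
```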

If we don't need a format that can simply be concatenated and relocated, the 
GSYM format, which is open sourced in llvm.org  already, 
might be good inspiration for a .debug_aranges successor section that has very 
efficient lookups. The GSYM format could actually be used as is by adding only 
a new DIE offset InfoType. 

Besides ".debug_names", all other DWARF accelerator tables are really just 
random indexes that must be linearly scanned or pre-indexed prior to being used 
because of the concatenate + relocate style that is used for these DWARF 
sections. It would be great if any future accelerator tables are "map into 
memory and use as is" kind of tables like ".debug_names" and the ".apple_XXX" 
name accelerator tables. 
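
To illustrate what "map into memory and use as is" could mean for an address table, here is a minimal sketch of a binary search directly over sorted fixed-size records in a byte buffer, with no pre-indexing step. The record layout (start address, length, CU index) is invented for this example and is not the GSYM format or any DWARF section.

```python
import struct
from typing import Optional

# Hypothetical record: 8-byte start address, 8-byte length, 4-byte CU index.
RECORD = struct.Struct("<QQI")


def lookup(table, address: int) -> Optional[int]:
    """Binary-search sorted, non-overlapping records; return the CU index or None."""
    lo, hi = 0, len(table) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        start, length, cu_index = RECORD.unpack_from(table, mid * RECORD.size)
        if address < start:
            hi = mid
        elif address >= start + length:
            lo = mid + 1
        else:
            return cu_index
    return None
```

Because `struct.unpack_from` accepts any buffer object, `table` could be an `mmap.mmap` of the file rather than a `bytes` copy. This only works if the table is globally sorted, which is exactly what plain concatenation by a linker does not give you.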


> Thanks,
> --paulr
>  
> From: David Blaikie  
> Sent: Thursday, March 11, 2021 3:48 PM
> To: Robinson, Paul 
> Cc: Cary Coutant ; DWARF Discuss 
> 
> Subject: debug_aranges use and overhead
>  
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
> 
> Yeah, how about we spin it off into another thread (done here)
>  
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
> 
> Did you want to file an issue to improve how .debug_aranges works?
> 
> I don't currently understand the value it provides, and I at least don't have 
> a use case for it, so I'm not sure I'd be the best person to advocate/drive 
> that work.
> 
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
> 
> .debug_names is quite different though - it collects information from across 
> the DIE tree - information that is expensive to otherwise gather (walking the 
> whole DIE tree).
> 
> .debug_aranges is not like that for most producers (producers that do include 
> the address ranges on the CU DIE) - the data is readily available immediately 
> on the CU. That does involve reading some of .debug_abbrev, and interpreting 
> a handful of attributes - but at least for the use cases I'm aware of, that 
> overhead isn't worth the size increase.
> 
> Do you have numbers on the benefits of .debug_aranges compared to parsing the 
> ranges from CU DIEs?
> 
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if its 
> children DIEs 

Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 1:12 PM  wrote:

> Tom Russell could perhaps speak to this better, but my understanding is
> that our debugger guys like having .debug_aranges, because parsing the CU
> DIE does take that extra effort.  I am unfamiliar with their code so I have
> to take their word on it.  But I can certainly imagine that probing
> hundreds to thousands of CUs in order to collect range information with
> lengthy range lists would be more expensive than running through a
> comparatively compact .debug_aranges list.  If Tom tells me I’m wrong,
> well, wouldn’t be the first time.
>

Yeah, I'd be curious to know more, for sure. Might resort to writing the
smallest DWARF parser to, say, handle address queries using debug_aranges
or CU ranges for comparison.
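
For reference, the debug_aranges half of such a minimal parser might look something like the sketch below. It assumes 32-bit DWARF, little-endian data, no segment selectors, and tuple alignment measured from the start of the set; the function name is made up for illustration.

```python
import struct
from typing import List, Tuple


def parse_aranges_set(data: bytes, offset: int = 0) -> Tuple[int, List[Tuple[int, int]], int]:
    """Parse one .debug_aranges set; return (cu_offset, [(addr, len)], next_offset)."""
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        "<IHIBB", data, offset
    )
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    if seg_size != 0:
        raise ValueError("segment selectors are not handled in this sketch")
    end = offset + 4 + unit_length
    tuple_size = 2 * addr_size
    header_size = 12  # unit_length + version + cu_offset + two size bytes
    # The first tuple starts at a multiple of the tuple size (assumed to be
    # counted from the start of this set), so round the header size up.
    pos = offset + -(-header_size // tuple_size) * tuple_size
    fmt = {4: "<II", 8: "<QQ"}[addr_size]
    ranges = []
    while pos + tuple_size <= end:
        addr, length = struct.unpack_from(fmt, data, pos)
        pos += tuple_size
        if addr == 0 and length == 0:  # terminator tuple
            break
        ranges.append((addr, length))
    return cu_offset, ranges, end
```

Note that in this loop a dead-stripped entry rewritten to (0, 0) is indistinguishable from the terminator, so everything after it in the set would be ignored.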


> One thing we have encountered (see issue 210113.1) is that when we’ve done
> dead-stripping, .debug_aranges entries (one per function, typically,
> because -ffunction-sections) can end up pointing to nothing.  In our
> proprietary linker I believe we compress/rewrite .debug_aranges to minimize
> the number of entries, which by coincidence ends up producing a conforming
> aranges list; LLD doesn’t do that, which means it produces a non-conforming
> list (with zero-length entries), hence the issue.
>

Yeah, it might be that it's more practical to fixup debug_aranges for dead
stripping than it is to fixup debug_rnglists (I mean, it is, for sure
easier to do) - and that fixing up likely makes the aranges much more
compact and thus cheaper to use/parse, which might be part of the
motivation for them.

One of the things I've thought about in that direction would be a flag on
debug_rnglists contributions (a bit in the header) that says "all rnglists
in here are referenced /only/ by rnglistx" - that way a linker could know
that it could rewrite the whole rnglist contribution and so long as it
fixed up the offset table at the start to adjust for any shrinking or
removed rnglists, it would still be correct. Hmm, now that I think about it
-such an attribute wouldn't be needed, necessarily - if the linker was
willing to adjust how relocations referring to the debug_rnglist section
were applied as things shifted around. (& you've got to use relocations
anyway, if you're not using rnglistx)
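
A rough sketch of the offset-table fixup, assuming every range list in the contribution is reached only through rnglistx and that offsets are measured from the start of the offsets table (my reading of DWARFv5). The function name and the byte strings are placeholders; the input is the already-rewritten, possibly shrunk, encoding of each list in index order.

```python
from typing import List, Tuple


def rebuild_offsets(lists: List[bytes], offset_size: int = 4) -> Tuple[List[int], bytes]:
    """Recompute the offsets table for rewritten range lists.

    Returns (offsets, body), where body is the concatenated lists that
    follow the offsets table in the contribution.
    """
    pos = offset_size * len(lists)  # first list starts right after the table
    offsets = []
    for encoded in lists:
        offsets.append(pos)
        pos += len(encoded)
    return offsets, b"".join(lists)
```

A list whose ranges were all dead-stripped would shrink to a single DW_RLE_end_of_list byte (0x00) but keep its index, so rnglistx references elsewhere stay valid.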


> I’ll have to think about what a “modern” .debug_aranges might want to look
> like.
>
> Thanks,
>
> --paulr
>
>
>
> *From:* David Blaikie 
> *Sent:* Thursday, March 11, 2021 3:48 PM
> *To:* Robinson, Paul 
> *Cc:* Cary Coutant ; DWARF Discuss <
> dwarf-discuss@lists.dwarfstd.org>
> *Subject:* debug_aranges use and overhead
>
>
>
> On Thu, Mar 11, 2021 at 5:48 AM  wrote:
>
> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>
>
> Yeah, how about we spin it off into another thread (done here)
>
>
> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>
>
> I don't currently understand the value it provides, and I at least don't
> have a use case for it, so I'm not sure I'd be the best person to
> advocate/drive that work.
>
> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>
>
> .debug_names is quite different though - it collects information from
> across the DIE tree - information that is expensive to otherwise gather
> (walking the whole DIE tree).
>
> .debug_aranges is not like that for most producers (producers that do
> include the address ranges on the CU DIE) - the data is readily available
> immediately on the CU. That does involve reading some of .debug_abbrev, and
> interpreting a handful of attributes - but at least for the use cases I'm
> aware of, that overhead isn't worth the size increase.
>
> Do you have numbers on the benefits of .debug_aranges compared to parsing
> the ranges from CU DIEs?
>
> (one possible issue: the CU doesn't /have/ to contain low/high/ranges if
> its children DIEs contain addresses - having that as a guarantee, or some
> preferred way of encoding zero length (high/low of 0 would be acceptable, I
> guess) would be nice & make it cheap to skip over CUs that don't have any
> address ranges)
>
> Roughly, a modern debug_aranges to me would look something like:
>
> 
> 
> 
> 
> 
>
> So it could fully re-use the rnglist encoding. If this was going to be as
> compact as possible, it'd need to be configurable which encodings it uses -
> ranges V 

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 12:07 PM Mark Wielaard  wrote:

> Hi David,
>
> On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> > > > (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> > > > there's something about it that llvm-dwarfdump can't understand - it only
> > > > prints a handful of rather mangled location lists... not sure which
> > > > component (GCC, llvm-dwarfdump, or both) is getting things confused here -
> > > > oh, maybe some kind of DWARF extension for the "views" system, by the looks
> > > > of it)
> > >
> > > Yes, you might try -gno-variable-location-views or simply use binutils or
> > > elfutils readelf to look at them.
> > >
> >
> > Thanks! - is this proposed as a DWARF extension? I thought I remembered it
> > coming up, but hadn't realized how non-standard it was/that it was already
> > implemented. (quick search on the issues page and I can't find any mention
> > of it at least)
>
> We kind of need a dwarf-extensions discussion list to document/discuss
> these kind of non-extendable DWARF extensions. Only half kidding. Some
> things in DWARF are well designed to allow vendor extensions that can
> be skipped/ignored, but some aren't and we probably need to coordinate
> more because it is years between standard spec releases.
>

Yeah, happy to do that anywhere - dwarf-discuss is probably OK for it, I'd
guess.

& happy to co-implement DWARF extensions/future proposals - especially when
they're carving out an extension space, so it's less a question of "is this
a good extension" (a more nuanced/difficult debate - then it comes down to
is anyone going to use it/need it in lldb, etc) and more "is it reasonable
for this feature to be extensible and how should that work". Won't mean
immediate implementation in LLVM, but at least agreeing on the direction &
will make adding support at least in dumpers more clearly
motivated/understood/etc.

Implementing not-yet-standardized things, especially if they look like a
plausible direction for the standard, is a good thing - getting some
implementation experience, ironing out any gotchas, etc, before it's
published and possibly more widely adopted. (that said, I wouldn't mind
knowing what "widely adopted" looks like - folks mention maintaining old or
obscure toolchains, but not sure if they're using more modern DWARF, or is
it basically Clang, LLDB, GCC, and GDB using anything like DWARFv5 and
beyond?)


> Extending loclists is a bit of a pain because they aren't really
> extendable. Making them extendable is
> http://dwarfstd.org/ShowIssue.php?issue=170427.2


Ah, indeed - thanks for the link!


> but I am still
> pondering whether that really helps here because as written you can
> only interpret them as end of list, but not really skip them.
>

Yeah, it would be nice if extension opcodes had a uleb length as their
first argument.

(this is essentially the difference between a custom DW_TAG or DW_AT (very
cheap, easy for consumers to ignore if they don't recognize it) and a
custom DW_FORM (expensive - consumer can't parse the list at all) - though
I guess this extension issue /might/ fall in between, as you could read the
list up until you hit an extension, and use that partial information for
locations, even if you couldn't parse all of it)
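
To make the idea concrete, here is a sketch of a list walker in which extension opcodes carry a ULEB128 length as their first argument, so an unaware consumer can skip them. The encoding is hypothetical: the 0x80 extension threshold, the fixed payload sizes for known opcodes, and the function names are all invented for illustration and are not DW_LLE_*/DW_RLE_* as specified.

```python
from typing import Dict, List, Tuple

EXT_BASE = 0x80  # hypothetical: opcodes at or above this are length-prefixed


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a ULEB128 value at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def walk_entries(data: bytes, known_sizes: Dict[int, int]) -> List[Tuple[object, bytes]]:
    """Walk one list, skipping unknown extension opcodes by their length."""
    pos, seen = 0, []
    while pos < len(data):
        opcode = data[pos]
        pos += 1
        if opcode == 0x00:  # end of list
            seen.append(("end_of_list", b""))
            break
        if opcode >= EXT_BASE:
            length, pos = read_uleb128(data, pos)
            seen.append(("skipped", data[pos:pos + length]))
            pos += length
        else:
            size = known_sizes[opcode]  # fixed payload size, a simplification
            seen.append((opcode, data[pos:pos + size]))
            pos += size
    return seen
```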


> Location views themselves are
> http://dwarfstd.org/ShowIssue.php?issue=170427.1


Right right - thanks for that!


> Alexandre Oliva proposed Location Views as a DWARF
> extension. He has some more background material at
> http://www.fsfla.org/~lxoliva/papers/sfn/ which is mostly on variable
> tracking assignments and statement frontier annotations, and which
> describes the GCC implementation that makes it possible to have
> location views.
>
> What is proposed is slightly different from what GCC currently
> implements though. Caroline, Cary and I are supposed to sit down and
> discuss it to see how it can be standardized. But finding time has
> been tricky.
>

*nod* takes some time, I'm interested to see how it comes along.

- Dave
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread Paul Robinson via Dwarf-Discuss
Tom Russell could perhaps speak to this better, but my understanding is that 
our debugger guys like having .debug_aranges, because parsing the CU DIE does 
take that extra effort.  I am unfamiliar with their code so I have to take 
their word on it.  But I can certainly imagine that probing hundreds to 
thousands of CUs in order to collect range information with lengthy range lists 
would be more expensive than running through a comparatively compact 
.debug_aranges list.  If Tom tells me I’m wrong, well, wouldn’t be the first 
time.

One thing we have encountered (see issue 210113.1) is that when we’ve done 
dead-stripping, .debug_aranges entries (one per function, typically, because 
-ffunction-sections) can end up pointing to nothing.  In our proprietary linker 
I believe we compress/rewrite .debug_aranges to minimize the number of entries, 
which by coincidence ends up producing a conforming aranges list; LLD doesn’t 
do that, which means it produces a non-conforming list (with zero-length 
entries), hence the issue.
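
The kind of compress/rewrite step mentioned above might look roughly like this sketch, which drops zero-length entries and merges adjacent or overlapping ones. It is not our linker's actual algorithm; the function name and the (address, length) tuple representation are assumptions for illustration.

```python
from typing import List, Tuple


def compact_aranges(entries: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Drop zero-length (address, length) entries and merge touching ranges."""
    live = sorted((addr, length) for addr, length in entries if length > 0)
    merged: List[Tuple[int, int]] = []
    for addr, length in live:
        if merged and addr <= merged[-1][0] + merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            prev_addr, prev_len = merged[-1]
            end = max(prev_addr + prev_len, addr + length)
            merged[-1] = (prev_addr, end - prev_addr)
        else:
            merged.append((addr, length))
    return merged
```

With -ffunction-sections, many per-function entries end up contiguous after layout, so a pass like this can shrink the list considerably as well as removing the zero-length entries.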

I’ll have to think about what a “modern” .debug_aranges might want to look like.
Thanks,
--paulr

From: David Blaikie 
Sent: Thursday, March 11, 2021 3:48 PM
To: Robinson, Paul 
Cc: Cary Coutant ; DWARF Discuss 

Subject: debug_aranges use and overhead

On Thu, Mar 11, 2021 at 5:48 AM <paul.robin...@sony.com> wrote:
Hopefully not to side-track things too much... maybe wants its own
thread, if there's more to debate here.

Yeah, how about we spin it off into another thread (done here)

>> For the case you suggested where it would be useful to keep the range
>> list for the CU in the .o file, I think .debug_aranges is what you're
>> looking for.
>
> aranges has been off by default in LLVM for a while - it adds a lot of
> overhead (doesn't have all the nice rnglist encodings for instance -
> nor can it use debug_addr, and if it did it'd still be duplicate with
> the CU ranges wherever they were).

Did you want to file an issue to improve how .debug_aranges works?

I don't currently understand the value it provides, and I at least don't have a 
use case for it, so I'm not sure I'd be the best person to advocate/drive that 
work.
Complaining that it duplicates CU ranges is missing the point, though;
it's an index, like .debug_names, of course it duplicates other info.
If you want to suggest an improved index, like we did with .debug_names,
that would be great too.

.debug_names is quite different though - it collects information from across 
the DIE tree - information that is expensive to otherwise gather (walking the 
whole DIE tree).

.debug_aranges is not like that for most producers (producers that do include 
the address ranges on the CU DIE) - the data is readily available immediately 
on the CU. That does involve reading some of .debug_abbrev, and interpreting a 
handful of attributes - but at least for the use cases I'm aware of, that 
overhead isn't worth the size increase.

Do you have numbers on the benefits of .debug_aranges compared to parsing the 
ranges from CU DIEs?

(one possible issue: the CU doesn't /have/ to contain low/high/ranges if its 
children DIEs contain addresses - having that as a guarantee, or some preferred 
way of encoding zero length (high/low of 0 would be acceptable, I guess) would 
be nice & make it cheap to skip over CUs that don't have any address ranges)

Roughly, a modern debug_aranges to me would look something like:







So it could fully re-use the rnglist encoding. If this was going to be as 
compact as possible, it'd need to be configurable which encodings it uses - 
ranges V high/low, addrx V addr - at which point it'd probably look like a 
small DIE with an inline abbrev (similar to the way DWARFv5 encodes the file 
and directory entries now, and how debug_names is self-describing) - at which 
point it looks to me a lot like parsing the CU DIEs.



[Dwarf-Discuss] debug_aranges use and overhead

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 5:48 AM  wrote:

> Hopefully not to side-track things too much... maybe wants its own
> thread, if there's more to debate here.
>

Yeah, how about we spin it off into another thread (done here)


> >> For the case you suggested where it would be useful to keep the range
> >> list for the CU in the .o file, I think .debug_aranges is what you're
> >> looking for.
> >
> > aranges has been off by default in LLVM for a while - it adds a lot of
> > overhead (doesn't have all the nice rnglist encodings for instance -
> > nor can it use debug_addr, and if it did it'd still be duplicate with
> > the CU ranges wherever they were).
>
> Did you want to file an issue to improve how .debug_aranges works?
>

I don't currently understand the value it provides, and I at least don't
have a use case for it, so I'm not sure I'd be the best person to
advocate/drive that work.

> Complaining that it duplicates CU ranges is missing the point, though;
> it's an index, like .debug_names, of course it duplicates other info.
> If you want to suggest an improved index, like we did with .debug_names,
> that would be great too.
>

.debug_names is quite different though - it collects information from
across the DIE tree - information that is expensive to otherwise gather
(walking the whole DIE tree).

.debug_aranges is not like that for most producers (producers that do
include the address ranges on the CU DIE) - the data is readily available
immediately on the CU. That does involve reading some of .debug_abbrev, and
interpreting a handful of attributes - but at least for the use cases I'm
aware of, that overhead isn't worth the size increase.

Do you have numbers on the benefits of .debug_aranges compared to parsing
the ranges from CU DIEs?

(one possible issue: the CU doesn't /have/ to contain low/high/ranges if
its children DIEs contain addresses - having that as a guarantee, or some
preferred way of encoding zero length (high/low of 0 would be acceptable, I
guess) would be nice & make it cheap to skip over CUs that don't have any
address ranges)

Roughly, a modern debug_aranges to me would look something like:







So it could fully re-use the rnglist encoding. If this was going to be as
compact as possible, it'd need to be configurable which encodings it uses -
ranges V high/low, addrx V addr - at which point it'd probably look like a
small DIE with an inline abbrev (similar to the way DWARFv5 encodes the
file and directory entries now, and how debug_names is self-describing) -
at which point it looks to me a lot like parsing the CU DIEs.


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Mark Wielaard via Dwarf-Discuss
Hi David,

On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> > > (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> > > there's something about it that llvm-dwarfdump can't understand - it only
> > > prints a handful of rather mangled location lists... not sure which
> > > component (GCC, llvm-dwarfdump, or both) is getting things confused here -
> > > oh, maybe some kind of DWARF extension for the "views" system, by the looks
> > > of it)
> >
> > Yes, you might try -gno-variable-location-views or simply use binutils or
> > elfutils readelf to look at them.
> >
> 
> Thanks! - is this proposed as a DWARF extension? I thought I remembered it
> coming up, but hadn't realized how non-standard it was/that it was already
> implemented. (quick search on the issues page and I can't find any mention
> of it at least)

We kind of need a dwarf-extensions discussion list to document/discuss
these kind of non-extendable DWARF extensions. Only half kidding. Some
things in DWARF are well designed to allow vendor extensions that can
be skipped/ignored, but some aren't and we probably need to coordinate
more because it is years between standard spec releases.

Extending loclists is a bit of a pain because they aren't really
extendable. Making them extendable is
http://dwarfstd.org/ShowIssue.php?issue=170427.2 but I am still
pondering whether that really helps here because as written you can
only interpret them as end of list, but not really skip them.

Location views themselves are
http://dwarfstd.org/ShowIssue.php?issue=170427.1

Alexandre Oliva proposed Location Views as a DWARF
extension. He has some more background material at
http://www.fsfla.org/~lxoliva/papers/sfn/ which is mostly on variable
tracking assignments and statement frontier annotations, and which
describes the GCC implementation that makes it possible to have
location views.

What is proposed is slightly different from what GCC currently
implements though. Caroline, Cary and I are supposed to sit down and
discuss it to see how it can be standardized. But finding time has
been tricky.

Cheers,

Mark


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 11:44 AM Jakub Jelinek  wrote:

> On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> > Thanks! - is this proposed as a DWARF extension? I thought I remembered it
>
> 170427.1 I think.  Note, what is emitted is different from what is being
> proposed, the problem with DW_LLE_* and DW_RLE_* is that they aren't easily
> extensible (in a way that would allow consumers that don't know about the
> extension to skip it and parse just the standard ones; because when seeing
> an unknown opcode, the consumer doesn't know what arguments if any it has).
> E.g. in the way .debug_macro allows producers to define what arguments
> extension opcodes have (how many and what DW_FORM_* each has).
> So I think what GCC currently produces puts the stuff before the location
> sequences such that if a consumer can't handle those, it can skip those.
>

Ah, cunning! Yeah, there's a few places where LLVM just keeps trying to
parse the next thing, rather than only parsing parts that are referenced
from elsewhere (the other one I know of is a bug in location lists when
combined with bfd's linker tombstoning of gc'd sections (it sets any
relocation to a gc'd section to zero): if a location list were to span
across a gc'd section (such as for a global, raised into a register in one
function - LLVM can't produce the right DWARF for this, not sure about GCC)
binutils readelf, etc, will only dump sections of debug_loc that are
referenced from .debug_info, so the early list termination just leaves
holes, rather than mangled parsing trying to interpret the location
expression following the accidental terminator as the start of another
location list)


> The only thing that doesn't really work well for a consumer unaware of that
> extension is walking the whole .debug_rnglists and dumping everything that
> it contains.
>

Yup - yeah, LLVM will just try to parse each offset and then go to the next
one, etc. (I don't think lldb would do this, hopefully - this is only an
issue with llvm-dwarfdump trying to dump as much as possible)

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Jakub Jelinek via Dwarf-Discuss
On Thu, Mar 11, 2021 at 11:30:05AM -0800, David Blaikie wrote:
> Thanks! - is this proposed as a DWARF extension? I thought I remembered it

170427.1 I think.  Note, what is emitted is different from what is being
proposed, the problem with DW_LLE_* and DW_RLE_* is that they aren't easily
extensible (in a way that would allow consumers that don't know about the
extension to skip it and parse just the standard ones; because when seeing
an unknown opcode, the consumer doesn't know what arguments if any it has).
E.g. in the way .debug_macro allows producers to define what arguments
extension opcodes have (how many and what DW_FORM_* each has).
So I think what GCC currently produces puts the stuff before the location
sequences such that if a consumer can't handle those, it can skip those.
The only thing that doesn't really work well for a consumer unaware of that
extension is walking the whole .debug_rnglists and dumping everything that
it contains.
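
To make the .debug_macro comparison concrete, here is a minimal sketch of
how a consumer can step over an opcode it does not understand once the
producer has declared the operand forms. The table layout, the 0xE0 opcode
and the function names are illustrative assumptions, not what GCC emits and
not the exact .debug_macro header encoding; only three forms are handled.

```python
# Hypothetical sketch: skip an unknown extension opcode by consulting a
# producer-supplied table of operand forms (the idea used by .debug_macro).
DW_FORM_data4 = 0x06
DW_FORM_string = 0x08
DW_FORM_udata = 0x0F


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_operand(data, pos, form):
    """Advance past one operand whose encoding is named by a DW_FORM code."""
    if form == DW_FORM_data4:
        return pos + 4
    if form == DW_FORM_udata:
        return read_uleb128(data, pos)[1]
    if form == DW_FORM_string:
        return data.index(0, pos) + 1  # inline NUL-terminated string
    raise ValueError(f"form {form:#x} not handled in this sketch")


def skip_entry(data, pos, operand_table):
    """Skip a 1-byte opcode plus the operands declared for it."""
    opcode = data[pos]
    pos += 1
    for form in operand_table[opcode]:
        pos = skip_operand(data, pos, form)
    return pos


# Assumed vendor opcode 0xE0 taking (udata, string).
table = {0xE0: [DW_FORM_udata, DW_FORM_string]}
entry = bytes([0xE0, 0x81, 0x01]) + b"hi\x00" + bytes([0x01])
next_pos = skip_entry(entry, 0, table)
```

Without such a table, as with DW_LLE_* and DW_RLE_*, the consumer has no
way to compute next_pos for an opcode it has never seen.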

Jakub



Re: [Dwarf-Discuss] Retrieving variables, function address using dwarf

2021-03-11 Thread Greg Clayton via Dwarf-Discuss
Most local variables have locations that do require registers. 
DW_OP_call_frame_cfa says it needs to push the value that defines the call 
frame address which is typically based on the SP or FP depending on how things 
were compiled, so you would need registers for this. DW_OP_fbreg is another 
common opcode for local variables which relies on you being able to evaluate 
the DW_TAG_subprogram's DW_AT_frame_base attribute, which is a location 
expression, that often is something like "SP + " or "FP + ", so you 
will need registers for that too.
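
As a rough illustration of why register values are unavoidable, the sketch
below resolves a DW_OP_fbreg location against a frame base of the form
DW_OP_bregN. It is a simplified assumption-laden example: the register
values are made up, only single-operation expressions are handled, and
DW_OP_call_frame_cfa is not covered because it additionally needs the call
frame information to be interpreted.

```python
DW_OP_breg0 = 0x70   # DW_OP_breg0..DW_OP_breg31 occupy 0x70..0x8f
DW_OP_fbreg = 0x91


def read_sleb128(data, pos=0):
    """Decode one signed LEB128 value; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if byte & 0x40:  # sign bit of the last byte
        result -= 1 << shift
    return result, pos


def frame_base(expr, regs):
    """Evaluate a DW_AT_frame_base of the form 'DW_OP_bregN offset'."""
    op = expr[0]
    if not DW_OP_breg0 <= op <= DW_OP_breg0 + 31:
        raise NotImplementedError("only DW_OP_bregN is handled here")
    offset, _ = read_sleb128(expr, 1)
    return regs[op - DW_OP_breg0] + offset


def local_address(location, frame_base_expr, regs):
    """Evaluate a variable location of the form 'DW_OP_fbreg offset'."""
    if location[0] != DW_OP_fbreg:
        raise NotImplementedError("only DW_OP_fbreg is handled here")
    offset, _ = read_sleb128(location, 1)
    return frame_base(frame_base_expr, regs) + offset


# Made-up register snapshot: DWARF register 6 is RBP on x86-64.
regs = {6: 0x7FFC00001000}
addr = local_address(bytes([0x91, 0x6C]),   # DW_OP_fbreg -20
                     bytes([0x76, 0x10]),   # DW_OP_breg6 16
                     regs)
```

The regs dictionary is the part that cannot come from DWARF alone; it has
to be read from a live or recorded thread state.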

> On Mar 10, 2021, at 10:38 PM, Archana Deshmukh via Dwarf-Discuss 
>  wrote:
> 
> Thanks Michael for the response. Actually, I have only this much information. 
>  
> 
> I need to get information related to 
> 
>  For global variables, I read the address "55b51afea000" from
> /proc//maps file. I use DW_OP_addr parameter to retrieve the address.
> 
>  55b51afea000 + DW_OP_addr gives me the address of global variables.
> 
> For function, I read the address "55b51afea000" from /proc//maps
> file. I use DW_AT_low_pc parameter to retrieve the function starting address
> 
> Now, I need to read the local variables address. As I am not executing the 
> process, I cannot use registers. I need to use DW_OP_call_frame_cfa. I am not 
> able to understand how to retrieve addresses using  DW_OP_call_frame_cfa.
> 
> Any pointer or suggestion are most welcome.
> 
> Best Regards,
> Archana 
> 
> 
> 
>   
> 
> On Tue, Mar 9, 2021 at 11:07 PM Michael Eager wrote:
> It's difficult to offer advice with such a spare description.
> 
> You might read the executable and relocate the .debug_info and
> other debug sections using the process map.  If you have the
> process image, this probably would not be necessary.
> 
> On 3/8/21 1:49 AM, Archana Deshmukh via Dwarf-Discuss wrote:
> > Hello,
> > 
> > I have a pinatrace.out and process map of a file.
> > With this input, I need to build a symbol table.
> > 
> > Best Regards,
> > Archana Deshmukh
> > 
> > On Sun, Mar 7, 2021 at 10:29 AM Archana Deshmukh
> > <desharchan...@gmail.com> wrote:
> > 
> > 
> > 
> > -- Forwarded message -
> > From: *Michael Eager* <ea...@eagercon.com>
> > Date: Sat, Mar 6, 2021 at 10:53 PM
> > Subject: Re: [Dwarf-Discuss] Retrieving variables, function address
> > using dwarf
> > To: Archana Deshmukh,
> > <dwarf-discuss@lists.dwarfstd.org>
> > 
> > 
> > On 3/5/21 8:28 PM, Archana Deshmukh via Dwarf-Discuss wrote:
> >  > I need to read the address of local variable, global variable,
> > function
> >  > name and function arguments from the process.
> >  >
> >  > For global variables , I read the address "55b51afea000" from
> >  > /proc//maps file. I use DW_OP_addr parameter to retrieve the
> > address.
> >  > 55b51afea000 + DW_OP_addr gives me the address of global variable.
> >  >
> >  > I need to read the stack segment, heap. Is there any way to read
> >  > segments? DW_AT_segment parameter seems to be for 16 bit.
> >  >
> >  > I need to read the following process map using dwarf.
> >  >
> >  > Any suggestion, pointers are welcome.
> >  >
> >  > 55b51afea000-55b51afeb000 r-xp  fd:00 5902563
> > 
> > Can you explain what you are trying to do?
> > 
> > Usually a DWARF consumer (a debugger) does not need to read the
> > process memory map.  All of the information you mention is in
> > the DWARF data.  You may need to relocate addresses in the DWARF
> > debug data.
> > 
> > DWARF does not contain information about the process memory
> > layout, such as the location of the heap or the start of the
> > stack.
> > 
> > -- 
> > Michael Eager
> > 
> > 
> > 
> > 
> 
> -- 
> Michael Eager


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 2:55 AM Mark Wielaard  wrote:

> Hi David,
>
> On Thu, Mar 11, 2021 at 01:01:05AM -0800, David Blaikie wrote:
> > +Mark in case he's got further context/perspective to share in the
> context
> > of this thread
>
> I haven't yet caught up on the mailinglist, but I think I understand
> the context, it was a discussion Simon and I had about how to handle
> .debug_rnglists in the main object file vs the split object.
>
> > One particular thing I'll pull out of the gdb-patches thread is:
> >
> > "But the rnglists
> > (loclists) themselves can still use relocations. A large part of them
> > is non-shared addresses, so using indexes (into the .debug_addr
> > addr_base) would simply be extra overhead."
> >
> > That's not quite right - while a direct mapping from debug_loc and
> > debug_ranges (at least location and range lists not using base address
> > selection entries) to debug_loclist and debug_rnglist would produce a
> > similar number of addresses and relocations - there's a lot to be gained
> by
> > using DW_RLE/LLE_base_addressx entries - then you can strategically reuse
> > an already-existing debug_addr entry and avoid another relocation all
> > together (debug_loc and debug_ranges couldn't do this, even when using a
> > base address selection entry - that base address couldn't be shared with
> > other lists, since it was inline).
>
> I admit I didn't implement anything to measure. So I can certainly be
> convinced of the opposite. But if your strategic-reuse algorithm
> can also identify when it isn't strategic, then just not having an
> indirection for that address through the .debug_addr index is still a
> win. It just means you don't get to move the relocation to the
> .debug_addr. But I see why this is still important because...
>

Yeah - I haven't implemented anything in LLVM to bail out and avoid addrx
when there isn't another use of the address in DWARFv5 non-split because I
don't have much use for non-split (Google's not switched to split by
default, but that's the mode to use when size matters - so optimizing
non-split for size isn't a high priority for me) and there's no case I know
of where DW_AT_low_pc or any address in DW_AT_ranges wouldn't be used in at
least one other place: all those addresses will be used as the starting
address of a subprogram at least, so that's at least two uses.


>
> > also, as to the original motivation for Split DWARF (reduce object size,
> > reduce relocations, etc) - mostly a distributed build system where the
> cost
> > of shipping all the object files to the link step is a significant
> > bottleneck - so reduced object size (so reducing the DWARF object size -
> > which is both .debug_* sizes, and .rela.debug_* sizes equally - well,
> > except we do use -gz so .debug_* sizes are much smaller, but
> .rela.debug_*
> > is not compressed - so reducing relocations is /extra/ important).
>
> Although I see how a distributed build system where there is a cost of
> shipping object files to the linker might be a motivating factor. I
> also think that isn't a common setup. So yes to reducing relocations,
> having less work for the linker to do. But reducing transport cost
> wouldn't be that high on my list.
>

Fair enough - yeah, it boils down to similarly, as you said, fewer actions
for the linker to perform (both in terms of relocations to apply, and in
terms of bytes to write to the output file/linked executable (&
subsequently/also a smaller final linked executable)).


> > (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> > there's something about it that llvm-dwarfdump can't understand - it only
> > prints a handful of rather mangled location lists... not sure which
> > component (GCC, llvm-dwarfdump, or both) is getting things confused here
> -
> > oh, maybe some kind of DWARF extension for the "views" system, by the
> looks
> > of it)
>
> Yes, you might try -gno-variable-location-views or simply use binutils or
> elfutils readelf to look at them.
>

Thanks! - is this proposed as a DWARF extension? I thought I remembered it
coming up, but hadn't realized how non-standard it was/that it was already
implemented. (quick search on the issues page and I can't find any mention
of it at least)

(aside: Hmm, readelf doesn't have support for the offset entry tables in
either rnglists or loclists, I think:

readelf: Warning: The .debug_rnglists section contains unsupported offset
entry count: 2819.)

Unrecognized debug section: .debug_loclists.dwo

But, yeah, using readelf on a non-split-DWARF build I see these "location
view pair"s showing up.

> I think you did convince me we need to look at smarter .debug_addr usage.
>

Great! Happy to chat about it further any time! I can point you to some of
the patches in LLVM and/or provide examples that demonstrate the
interesting cases of reuse.

- Dave

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Paul Robinson via Dwarf-Discuss
Hopefully not to side-track things too much... maybe wants its own
thread, if there's more to debate here.

>> For the case you suggested where it would be useful to keep the range
>> list for the CU in the .o file, I think .debug_aranges is what you're
>> looking for.
>
> aranges has been off by default in LLVM for a while - it adds a lot of
> overhead (doesn't have all the nice rnglist encodings for instance -
> nor can it use debug_addr, and if it did it'd still be duplicate with
> the CU ranges wherever they were).

Did you want to file an issue to improve how .debug_aranges works?

Complaining that it duplicates CU ranges is missing the point, though;
it's an index, like .debug_names, of course it duplicates other info.
If you want to suggest an improved index, like we did with .debug_names,
that would be great too.
--paulr



Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Mark Wielaard via Dwarf-Discuss
Hi David,

On Thu, Mar 11, 2021 at 01:01:05AM -0800, David Blaikie wrote:
> +Mark in case he's got further context/perspective to share in the context
> of this thread

I haven't yet caught up on the mailinglist, but I think I understand
the context, it was a discussion Simon and I had about how to handle
.debug_rnglists in the main object file vs the split object.

> One particular thing I'll pull out of the gdb-patches thread is:
> 
> "But the rnglists
> (loclists) themselves can still use relocations. A large part of them
> is non-shared addresses, so using indexes (into the .debug_addr
> addr_base) would simply be extra overhead."
> 
> That's not quite right - while a direct mapping from debug_loc and
> debug_ranges (at least location and range lists not using base address
> selection entries) to debug_loclist and debug_rnglist would produce a
> similar number of addresses and relocations - there's a lot to be gained by
> using DW_RLE/LLE_base_addressx entries - then you can strategically reuse
> an already-existing debug_addr entry and avoid another relocation all
> together (debug_loc and debug_ranges couldn't do this, even when using a
> base address selection entry - that base address couldn't be shared with
> other lists, since it was inline).

I admit I didn't implement anything to measure. So I can certainly be
convinced of the opposite. But if your strategic-reuse algorithm
can also identify when it isn't strategic, then just not having an
indirection for that address through the .debug_addr index is still a
win. It just means you don't get to move the relocation to the
.debug_addr. But I see why this is still important because...

> also, as to the original motivation for Split DWARF (reduce object size,
> reduce relocations, etc) - mostly a distributed build system where the cost
> of shipping all the object files to the link step is a significant
> bottleneck - so reduced object size (so reducing the DWARF object size -
> which is both .debug_* sizes, and .rela.debug_* sizes equally - well,
> except we do use -gz so .debug_* sizes are much smaller, but .rela.debug_*
> is not compressed - so reducing relocations is /extra/ important).

Although I see how a distributed build system where there is a cost of
shipping object files to the linker might be a motivating factor. I
also think that isn't a common setup. So yes to reducing relocations,
having less work for the linker to do. But reducing transport cost
wouldn't be that high on my list.

> (I went to look a bit further and GCC's .debug_loclists.dwo but it seems
> there's something about it that llvm-dwarfdump can't understand - it only
> prints a handful of rather mangled location lists... not sure which
> component (GCC, llvm-dwarfdump, or both) is getting things confused here -
> oh, maybe some kind of DWARF extension for the "views" system, by the looks
> of it)

Yes, you might try -gno-variable-location-views or simply use binutils or
elfutils readelf to look at them.

I think you did convince me we need to look at smarter .debug_addr usage.

Thanks,

Mark


Re: [Dwarf-Discuss] Retrieving variables, function address using dwarf

2021-03-11 Thread Michael Eager via Dwarf-Discuss

On 3/10/21 10:38 PM, Archana Deshmukh wrote:
> Thanks Michael for the response. Actually, I have only this much
> information.
>
> I need to get information related to
>
> For global variables, I read the address "55b51afea000" from
> /proc//maps file. I use DW_OP_addr parameter to retrieve the address.
>
> 55b51afea000 + DW_OP_addr gives me the address of global variables.
>
> For function, I read the address "55b51afea000" from /proc//maps
> file. I use DW_AT_low_pc parameter to retrieve the function starting
> address
>
> Now, I need to read the local variables address. As I am not executing
> the process, I cannot use registers. I need to use DW_OP_call_frame_cfa.
> I am not able to understand how to retrieve addresses using
> DW_OP_call_frame_cfa.
>
> Any pointer or suggestion are most welcome.


Attach to the process using ptrace() and you can get process registers.
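
Registers aside, the global-variable and function cases in the question
come down to adding a load bias to the link-time address. A minimal sketch
of that arithmetic follows, assuming a position-independent executable
whose first loadable segment has a link-time address of zero, so that the
start of its lowest mapping can serve as the bias. The 0x4010 address is a
made-up DW_OP_addr operand, and the maps line is the one quoted in this
thread.

```python
def mapping_start(maps_line):
    """Return the start address of one maps line as an integer."""
    address_range = maps_line.split()[0]      # e.g. "55b5...-55b5..."
    start, _end = address_range.split("-")
    return int(start, 16)


def runtime_address(load_bias, link_time_address):
    """Relocate a DW_OP_addr or DW_AT_low_pc value into the process."""
    return load_bias + link_time_address


line = "55b51afea000-55b51afeb000 r-xp  fd:00 5902563"
bias = mapping_start(line)
global_var = runtime_address(bias, 0x4010)    # hypothetical DW_OP_addr operand
```

If the first loadable segment does not start at zero, the bias is the
mapping start minus that segment's link-time address instead.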



Best Regards,
Archana




On Tue, Mar 9, 2021 at 11:07 PM Michael Eager wrote:


It's difficult to offer advice with such a spare description.

You might read the executable and relocate the .debug_info and
other debug sections using the process map.  If you have the
process image, this probably would not be necessary.

On 3/8/21 1:49 AM, Archana Deshmukh via Dwarf-Discuss wrote:
 > Hello,
 >
 > I have a pinatrace.out and process map of a file.
 > With this input, I need to build a symbol table.
 >
 > Best Regards,
 > Archana Deshmukh
 >
 > On Sun, Mar 7, 2021 at 10:29 AM Archana Deshmukh
 > <desharchan...@gmail.com> wrote:
 >
 >
 >
 >     -- Forwarded message -
 >     From: *Michael Eager* <ea...@eagercon.com>
 >     Date: Sat, Mar 6, 2021 at 10:53 PM
 >     Subject: Re: [Dwarf-Discuss] Retrieving variables, function address
 >     using dwarf
 >     To: Archana Deshmukh <desharchan...@gmail.com>,
 >     <dwarf-discuss@lists.dwarfstd.org>
 >
 >
 >
 >     On 3/5/21 8:28 PM, Archana Deshmukh via Dwarf-Discuss wrote:
 >      > I need to read the address of local variable, global variable,
 >      > function name and function arguments from the process.
 >      >
 >      > For global variables, I read the address "55b51afea000" from
 >      > /proc//maps file. I use DW_OP_addr parameter to retrieve the
 >      > address.
 >      > 55b51afea000 + DW_OP_addr gives me the address of global
 >      > variable.
 >      >
 >      > I need to read the stack segment, heap. Is there any way to read
 >      > segments? DW_AT_segment parameter seems to be for 16 bit.
 >      >
 >      > I need to read the following process map using dwarf.
 >      >
 >      > Any suggestion, pointers are welcome.
 >      >
 >      > 55b51afea000-55b51afeb000 r-xp  fd:00 5902563
 >
 >     Can you explain what you are trying to do?
 >
 >     Usually a DWARF consumer (a debugger) does not need to read the
 >     process memory map.  All of the information you mention is in
 >     the DWARF data.  You may need to relocate addresses in the DWARF
 >     debug data.
 >
 >     DWARF does not contain information about the process memory
 >     layout, such as the location of the heap or the start of the
 >     stack.
 >
 >     --
 >     Michael Eager
 >
 >

-- 
Michael Eager




--
Michael Eager


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 1:39 AM Jakub Jelinek  wrote:

> On Thu, Mar 11, 2021 at 01:05:06AM -0800, David Blaikie wrote:
> > What's your take on:
> >
> > 1) Fixing GDB to handle GCC's current output.
>
> I don't know what GDB will do, it is up to the GDB people.
>
> > 2) Fixing GCC to produce something maybe more standards conforming (to my
> > mind, ideally: ranges on the skeleton CU (using either
> > rnglists_base+rnglistx (like LLVM), or sec_offset (actually more
> > compact/better than LLVM anyway, and avoids the ambiguous situation)), and
> > rnglistx in child DIEs of the split full unit using
> > .debug_rnglists.dwo)
>
> Given the
> 3.1.3 "The following attributes are not part of a split full compilation
> unit entry but instead are inherited (if present) from the corresponding
> skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
> DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
> and DW_AT_rnglists_base."
> sentence, at least for DWARF5 putting DW_AT_ranges into the full unit
> rather than skeleton unit for split DWARF seems non-conforming, so I'll
> probably adjust my patch, but see below.
> Now, for DW_AT_addr_base and DW_AT_rnglists_base the spec talks about
> it affecting just .debug_addr or .debug_rnglists section, doesn't mention
> the .debug_rnglists.dwo section, while for DW_AT_str_offsets_base it talks
> about .debug_str_offsets or .debug_str_offsets.dwo.
> So, maybe one reason why DW_AT_rnglists_base might be ok on the skeleton
> unit.  On the other side, e.g. in Table F.1 I see there for Skeleton and
> Split:
> DW_AT_low_pc - Skeleton only
> DW_AT_ranges - Split Full only (so, in contradiction of 3.1.3)
> DW_AT_rnglists_base - not present
> So, DWARF5 is inconsistent.  But appendix F is informative and so I think
> the normative 3.1.3 wins.
>

Yup, I hope to get those inconsistencies addressed through an issue I've
filed earlier today. Glad for the discussion/confirmation/etc.


> So, I think I'll go with DW_AT_ranges and DW_AT_low_pc in
> DW_TAG_skeleton_unit, but the former using DW_FORM_sec_offset rather than
> DW_FORM_rnglistx and no DW_AT_rnglists_base (I really don't see a benefit
> of that, there is one relocation either way, either on the DW_AT_ranges
> with DW_FORM_sec_offset or on DW_AT_rnglists_base with DW_FORM_sec_offset,
> but for the latter one needs one byte for the DW_FORM_rnglistx too and
> two extra bytes in .debug_abbrev).  DW_FORM_rnglistx can be beneficial if
> there is more than one range, which is not the case for the skeleton.
>

Yep, that's my take on it too - while I can argue that the LLVM output
is/should be valid, it's not the most efficient, and the most efficient
(using sec_offset for ranges on the skeleton CU) dodges this particular
question of validity - sounds good to me.


> But .debug_rnglists I'll probably use the *x suffixed DW_RLE_* opcodes when
> DW_RLE_offset_pair can't be used even in the skeleton .debug_rnglists.
>

Yeah, I can certainly +1 to that. Sharing the debug_addr entries is great
for object file size.

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Jakub Jelinek via Dwarf-Discuss
On Thu, Mar 11, 2021 at 01:05:06AM -0800, David Blaikie wrote:
> What's your take on:
> 
> 1) Fixing GDB to handle GCC's current output.

I don't know what GDB will do, it is up to the GDB people.

> 2) Fixing GCC to produce something maybe more standards conforming (to my
> mind, ideally: ranges on the skeleton CU (using either
> rnglists_base+rnglistx (like LLVM), or sec_offset (actually more
> compact/better than LLVM anyway, and avoids the ambiguous situation)), and
> rnglistx in child DIEs of the split full unit using .debug_rnglists.dwo)

Given the
3.1.3 "The following attributes are not part of a split full compilation
unit entry but instead are inherited (if present) from the corresponding
skeleton compilation unit: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges,
DW_AT_stmt_list, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
and DW_AT_rnglists_base."
sentence, at least for DWARF5 putting DW_AT_ranges into the full unit rather
than skeleton unit for split DWARF seems non-conforming, so I'll
probably adjust my patch, but see below.
Now, for DW_AT_addr_base and DW_AT_rnglists_base the spec talks about
it affecting just .debug_addr or .debug_rnglists section, doesn't mention
the .debug_rnglists.dwo section, while for DW_AT_str_offsets_base it talks
about .debug_str_offsets or .debug_str_offsets.dwo.
So, maybe one reason why DW_AT_rnglists_base might be ok on the skeleton
unit.  On the other side, e.g. in Table F.1 I see there for Skeleton and Split:
DW_AT_low_pc - Skeleton only
DW_AT_ranges - Split Full only (so, in contradiction of 3.1.3)
DW_AT_rnglists_base - not present
So, DWARF5 is inconsistent.  But appendix F is informative and so I think
the normative 3.1.3 wins.

So, I think I'll go with DW_AT_ranges and DW_AT_low_pc in
DW_TAG_skeleton_unit, but the former using DW_FORM_sec_offset rather than
DW_FORM_rnglistx and no DW_AT_rnglists_base (I really don't see a benefit
of that: there is one relocation either way, either on the DW_AT_ranges
with DW_FORM_sec_offset or on DW_AT_rnglists_base with DW_FORM_sec_offset,
but for the latter one needs one byte for the DW_FORM_rnglistx too and
two extra bytes in .debug_abbrev).  DW_FORM_rnglistx can be beneficial if there
is more than one range, which is not the case for the skeleton.
But in .debug_rnglists I'll probably use the *x suffixed DW_RLE_* opcodes when
DW_RLE_offset_pair can't be used even in the skeleton .debug_rnglists.
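
The byte counts above can be tallied in a small sketch. It assumes 32-bit
DWARF (4-byte section offsets), a range list index that fits in one
ULEB128 byte, and attribute and form codes that each encode in one byte in
.debug_abbrev. The function name is illustrative, and the contents of
.debug_rnglists itself are not modeled.

```python
OFFSET_SIZE = 4  # DW_FORM_sec_offset in 32-bit DWARF


def skeleton_ranges_cost(use_rnglistx):
    """Return (debug_info_bytes, debug_abbrev_bytes, relocations)."""
    if use_rnglistx:
        # DW_AT_rnglists_base (sec_offset) plus DW_AT_ranges (rnglistx).
        info_bytes = OFFSET_SIZE + 1
        abbrev_bytes = 2 * 2   # two (attribute, form) pairs
    else:
        # DW_AT_ranges (sec_offset) only.
        info_bytes = OFFSET_SIZE
        abbrev_bytes = 2       # one (attribute, form) pair
    relocations = 1            # one relocated section offset either way
    return info_bytes, abbrev_bytes, relocations


with_sec_offset = skeleton_ranges_cost(use_rnglistx=False)
with_rnglistx = skeleton_ranges_cost(use_rnglistx=True)
```

Under these assumptions the sec_offset form comes out one byte smaller in
.debug_info and two bytes smaller in .debug_abbrev, with the same single
relocation.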

Jakub



Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
On Thu, Mar 11, 2021 at 12:32 AM Jakub Jelinek  wrote:

> On Wed, Mar 10, 2021 at 10:07:27PM -0800, David Blaikie wrote:
> > On Wed, Mar 10, 2021 at 9:38 PM Jakub Jelinek  wrote:
> >
> > > On Wed, Mar 10, 2021 at 04:12:57PM -0800, David Blaikie via
> Dwarf-Discuss
> > > wrote:
> > > > On Wed, Mar 10, 2021 at 4:02 PM Cary Coutant 
> wrote:
> > > >
> > > > > > > So in the end the logical thing to do when encountering a
> > > > > > > DW_FORM_rnglistx in a split-unit, in order to support
> everybody, is
> > > > > > > probably to go to the .debug_rnglists.dwo section, if there's
> one,
> > > > > > > disregarding the (inherited) DW_AT_rnglists_base.  If there
> isn't,
> > > then
> > > > > > > try the linked file's .debug_rnglists section, using
> > > > > > > DW_AT_rnglists_base.  If there isn't, then something is
> malformed.
> > > > >
> > > > > Looks reasonable to me. I think we need a new issue to clarify
> this in
> > > > > DWARF 6.
> > > > >
> > > >
> > > > Given that DWARFv5 isn't on by default in GCC yet & I think has a few
> > > more
> > >
> > > It is on by default.  But -gstrict-dwarf is not on by default.
> > >
> >
> > Oh, it is - in a released version of the compiler, or only in
> development?
>
> Still in development, but the prerelease already widely deployed by
> multiple
> Linux distributions.
>
> > & you're proposing changing the behavior only under -gstrict-dwarf,
> rather
> > than in general? Any particular reason?
>
> Just a typo, sorry, meant -gsplit-dwarf.
>

Ah, right right - I'm with you.

What's your take on:

1) Fixing GDB to handle GCC's current output.
2) Fixing GCC to produce something maybe more standards conforming (to my
mind, ideally: ranges on the skeleton CU (using either
rnglists_base+rnglistx (like LLVM), or sec_offset (actually more
compact/better than LLVM anyway, and avoids the ambiguous situation)), and
rnglistx in child DIEs of the split full unit using .debug_rnglists.dwo)
3) both? (so GDB can handle old GCC's output and the newer/more correct
output)

Personally, I'd have thought it'd be enough to move forward, change GCC and
be done - but if folks would like GDB (& GDB folks are cool with it) to
handle the old/weird GCC output, that's cool/up to GDB folks. Though I hope
that sort of DWARF doesn't stick around long/need lots of long-lived
support (I hope we don't need to add it to llvm's symbolizer for instance).
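
For reference, the lookup order quoted earlier in this message (prefer
.debug_rnglists.dwo, otherwise fall back to the linked file's
.debug_rnglists with the inherited DW_AT_rnglists_base) could be sketched
roughly as below. The function name, the section-name sets and the return
shape are made up for illustration and do not reflect how GDB or LLVM
structure this.

```python
def resolve_rnglistx_section(dwo_sections, linked_sections, rnglists_base):
    """Decide where a DW_FORM_rnglistx found in a split unit is looked up.

    Returns (file, section, base); base is None when the inherited
    DW_AT_rnglists_base is deliberately disregarded.
    """
    if ".debug_rnglists.dwo" in dwo_sections:
        return ("dwo", ".debug_rnglists.dwo", None)
    if ".debug_rnglists" in linked_sections and rnglists_base is not None:
        return ("linked", ".debug_rnglists", rnglists_base)
    raise ValueError("malformed: no range list section for DW_FORM_rnglistx")


# The .dwo section wins even though the skeleton supplies a base.
new_style = resolve_rnglistx_section({".debug_rnglists.dwo"},
                                     {".debug_rnglists"}, 0xC)
# No .dwo range lists: fall back to the linked file and the inherited base.
fallback = resolve_rnglistx_section(set(), {".debug_rnglists"}, 0xC)
```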

- Dave


Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread David Blaikie via Dwarf-Discuss
+Mark in case he's got further context/perspective to share in the context
of this thread

One particular thing I'll pull out of the gdb-patches thread is:

"But the rnglists
(loclists) themselves can still use relocations. A large part of them
is non-shared addresses, so using indexes (into the .debug_addr
addr_base) would simply be extra overhead."

That's not quite right - while a direct mapping from debug_loc and
debug_ranges (at least location and range lists not using base address
selection entries) to debug_loclist and debug_rnglist would produce a
similar number of addresses and relocations - there's a lot to be gained by
using DW_RLE/LLE_base_addressx entries - then you can strategically reuse
an already-existing debug_addr entry and avoid another relocation all
together (debug_loc and debug_ranges couldn't do this, even when using a
base address selection entry - that base address couldn't be shared with
other lists, since it was inline). This has significant savings, and was
the main reason I suggested to Paul Robinson that range lists should get
the same handling as loclists - since optimized builds use a lot of range
lists and their relocations were taking up a huge amount of the remaining
.o/executable contribution to debug info. This (rnglists.dwo with strategic
use of base address selection entries) was the main win we saw when
switching Google from DWARFv4+GNU-extension Split DWARF to DWARFv5 and
justified the not insignificant work of updating the various DWARF
consumers we had (including a few of those patches upstreamed to gdb, lldb,
and various internal and external symbolizers).
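
A toy model of where the saving comes from, under simplifying assumptions:
every range in the old style stores a relocated start and end address,
while the new style spends one shared .debug_addr entry (one relocation)
per section and expresses each range as an offset pair from that base. The
section names and ranges are invented, and real producers have more cases
than this.

```python
def relocations_inline(ranges_by_section):
    """Old style: two relocated addresses per range, nothing shared."""
    return sum(2 * len(ranges) for ranges in ranges_by_section.values())


def relocations_with_base_addressx(ranges_by_section):
    """New style: one relocated .debug_addr entry per section, reused as
    the base address; the ranges themselves need no relocations."""
    return len(ranges_by_section)


# Invented input: (start_offset, end_offset) pairs within each section.
ranges = {
    ".text.f": [(0, 16), (32, 48)],
    ".text.g": [(0, 8)],
}
inline_count = relocations_inline(ranges)
shared_count = relocations_with_base_addressx(ranges)
```

In this model the relocation count stops growing with the number of ranges
and depends only on the number of sections, which is the effect described
above for optimized builds with many range lists.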

also, as to the original motivation for Split DWARF (reduce object size,
reduce relocations, etc) - mostly a distributed build system where the cost
of shipping all the object files to the link step is a significant
bottleneck - so reduced object size (so reducing the DWARF object size -
which is both .debug_* sizes, and .rela.debug_* sizes equally - well,
except we do use -gz so .debug_* sizes are much smaller, but .rela.debug_*
is not compressed - so reducing relocations is /extra/ important).

And a neat note: Actually after DWARFv5 we have a .rela.debug_addr which is
/smaller/ than .rela.debug_line, which is sort of surprising/noteworthy.
That means that .rela.debug_addr has just one address for each section* -
but it also has an entry for each global variable, which .debug_line
doesn't have, so I'd usually expect .rela.debug_line to be a strict subset
of .rela.debug_addr - except that DWARFv5 moved the debug_line strings out
to .debug_line_str, which added a relocation for every file/directory name
- pushing the number of relocations up over the trimmed down
rela.debug_addr.

I haven't done GCC vs. Clang comparisons in a while - but it might be worth
trying some with Split DWARF, as I suspect this rnglist stuff and strategic
base address selection logic may carry quite some weight.

Picked a random file from the LLVM tree and built it with -O3 -gdwarf-5
-gsplit-dwarf with Clang and GCC ToT and some relevant stats:

Probably the lower bound for relocations is GCC's .rela.debug_line since it
uses the DWARFv3 line tables, without relocations for each directory and
file name, so basically one relocation per section (this build used
-ffunction-sections, so that amounts to one relocation per function):

 2.29Ki .rela.debug_line

GCC's other .rela.debug_*:

  291Ki .rela.debug_addr

   74Ki .rela.debug_rnglists

Clang's:

  7Ki .rela.debug_line
 45Ki .rela.debug_addr

Or, with the "prefer DW_AT_ranges, even when the range is contiguous":

  6Ki .rela.debug_addr


And with a custom form...

 2.20Ki   0.0%   0 .rela.debug_addr

And comparing the .rela.debug_line and .rela.debug_addr in this last
example - there's exactly one more debug_addr relocation than debug_line
relocation, the one global variable in this CU.

(I went to look a bit further at GCC's .debug_loclists.dwo but it seems
there's something about it that llvm-dwarfdump can't understand - it only
prints a handful of rather mangled location lists... not sure which
component (GCC, llvm-dwarfdump, or both) is getting things confused here -
oh, maybe some kind of DWARF extension for the "views" system, by the looks
of it)

* actually there's one bit left: DW_AT_low_pc - it can't use an
addrx+offset encoding. So for now I've implemented a mode in Clang where
DW_AT_ranges instead of DW_AT_low/high_pc is used even when a DIE has a
contiguous address range, so that a strategic base address can be used, reducing
the size of .{rela.}debug_addr a bit more, at the expense of a slightly
larger .debug_rnglists.dwo. I hope to propose/add some kind of addrx+offset
form to DWARFv6 to address this gap (& I've prototyped that in Clang too,
under a flag).

On Wed, Mar 10, 2021 at 11:23 PM Simon Marchi via Dwarf-Discuss <
dwarf-discuss@lists.dwarfstd.org> wrote:

> On 2021-03-10 10:59 a.m., Jakub Jelinek via Dwarf-Discuss wrote:
> > Hi!
> >
> > We got a report today that 

Re: [Dwarf-Discuss] Split Dwarf vs. CU DW_AT_ranges / DW_AT_low_pc placement

2021-03-11 Thread Jakub Jelinek via Dwarf-Discuss
On Wed, Mar 10, 2021 at 10:07:27PM -0800, David Blaikie wrote:
> On Wed, Mar 10, 2021 at 9:38 PM Jakub Jelinek  wrote:
> 
> > On Wed, Mar 10, 2021 at 04:12:57PM -0800, David Blaikie via Dwarf-Discuss
> > wrote:
> > > On Wed, Mar 10, 2021 at 4:02 PM Cary Coutant  wrote:
> > >
> > > > > > So in the end the logical thing to do when encountering a
> > > > > > DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> > > > > > probably to go to the .debug_rnglists.dwo section, if there's one,
> > > > > > disregarding the (inherited) DW_AT_rnglists_base.  If there isn't,
> > then
> > > > > > try the linked file's .debug_rnglists section, using
> > > > > > DW_AT_rnglists_base.  If there isn't, then something is malformed.
> > > >
> > > > Looks reasonable to me. I think we need a new issue to clarify this in
> > > > DWARF 6.
> > > >
> > >
> > > Given that DWARFv5 isn't on by default in GCC yet & I think has a few
> > more
> >
> > It is on by default.  But -gstrict-dwarf is not on by default.
> >
> 
> Oh, it is - in a released version of the compiler, or only in development?

Still in development, but the prerelease already widely deployed by multiple
Linux distributions.

> & you're proposing changing the behavior only under -gstrict-dwarf, rather
> than in general? Any particular reason?

Just a typo, sorry, meant -gsplit-dwarf.

Jakub
