Re: [Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 15, 2020 at 2:47 PM Greg Clayton via Dwarf-Discuss
 wrote:
>
> One simple approach would be to be able to represent a DW_AT_low_pc with a 
> DW_FORM_data encoding just like the DW_AT_high_pc does when it is an offset 
> from the DW_AT_low_pc.

I'm not sure this would catch all the desired cases/be especially tidy
to implement, unfortunately.

> The value of the DW_AT_low_pc would be an offset from either:
> 1 - the parent DIE's DW_AT_low_pc (which itself might need to be resolved by 
> looking at the parent scope). If the parent DIE's range is a DW_AT_ranges, 
> then use the lowest address out of all of them.

"lowest address in DW_AT_ranges" wouldn't be suitable when ranges are
used across sections (eg: some CU ranges - when functions are in
different sections due to inline functions or -ffunction-sections). If
everything was in one section then an implementation could use low_pc
to indicate a good base address even if they still needed DW_AT_ranges
(eg: void f1() { } __attribute__((nodebug)) void f2() { } void f3() {
} - or other cases where a single section with multiple hunks of debug
info could exist with holes in between) - but it's possible to have
that and ranges. eg:

// compiled without function sections, so f1 is in one section, but f2
// and f3 are in a single section together, separate from f1
inline void f1() { }
void f2() { f1(); }
void f3() { }

the low_pc of f3 could benefit from using the same address (+offset)
as the low_pc of f2 - but there would be no clear way to indicate
which part of the CU's DW_AT_ranges could be used as the base address
for 'f3'.

> 2 - the first parent DIE with a DW_AT_low_pc that has a DW_FORM_addrXXX 
> encoding.

Similarly, in the example above, 'f3' has no parent with a suitable
low_pc, but would benefit from sharing the same debug_addr entry as
'f2'.

A more extreme example happens in LLVM's prototype "Propeller" feature
- which essentially is "basic block sections" - where even a single
function may be fragmented across multiple sections and have no
specific ordering/scope based hierarchy about which base address to
use (so the function would have DW_AT_ranges, not just DW_AT_low/high
- and some internal scope could have a contiguous range and would want
to reuse one of the addresses used in DW_AT_ranges (+an offset from
it)).

- Dave


Re: [Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread Greg Clayton via Dwarf-Discuss
One simple approach would be to be able to represent a DW_AT_low_pc with a 
DW_FORM_data encoding just like the DW_AT_high_pc does when it is an offset 
from the DW_AT_low_pc. The value of the DW_AT_low_pc would be an offset from 
either:
1 - the parent DIE's DW_AT_low_pc (which itself might need to be resolved by 
looking at the parent scope). If the parent DIE's range is a DW_AT_ranges, then 
use the lowest address out of all of them.
2 - the first parent DIE with a DW_AT_low_pc that has a DW_FORM_addrXXX 
encoding.

Solution #1 is nice because it keeps the offset in the DW_FORM_data encoding 
small since it is always relative to the first parent scope's DW_AT_low_pc. So 
this could save a lot of space in the DWARF if we use the smallest possible 
DW_FORM_data encoding all the time.
Solution #2 could be easier as you would traverse parent scopes looking for an 
address encoding as the DW_FORM.
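
A minimal sketch of how a consumer might resolve the two variants described above. The `Die` class, its field names, and both function names are hypothetical and exist only for this illustration; a data-form DW_AT_low_pc is not part of DWARF v5.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Die:
    name: str
    parent: Optional["Die"] = None
    low_pc_addr: Optional[int] = None    # address-form DW_AT_low_pc, already resolved
    low_pc_offset: Optional[int] = None  # hypothetical data-form DW_AT_low_pc
    ranges: Optional[List[Tuple[int, int]]] = None  # resolved DW_AT_ranges

def scope_base(die):
    """Base address a child scope would use under solution #1."""
    if die.low_pc_addr is not None or die.low_pc_offset is not None:
        return low_pc_solution1(die)
    if die.ranges:
        # Solution #1 falls back to the lowest address in the parent's ranges.
        return min(start for start, _ in die.ranges)
    raise ValueError(f"{die.name} has no usable base address")

def low_pc_solution1(die):
    """Offset is relative to the immediate parent's (resolved) low_pc."""
    if die.low_pc_addr is not None:
        return die.low_pc_addr
    if die.low_pc_offset is None or die.parent is None:
        raise ValueError(f"{die.name} cannot be resolved")
    return scope_base(die.parent) + die.low_pc_offset

def low_pc_solution2(die):
    """Offset is relative to the first ancestor with an address-form low_pc."""
    if die.low_pc_addr is not None:
        return die.low_pc_addr
    ancestor = die.parent
    while ancestor is not None and ancestor.low_pc_addr is None:
        ancestor = ancestor.parent
    if ancestor is None or die.low_pc_offset is None:
        raise ValueError(f"{die.name} cannot be resolved")
    return ancestor.low_pc_addr + die.low_pc_offset

# Same code layout described both ways: a function at 0x401000, an outer
# block at +0x10, and an inner block at +0x18 from the function entry.
func = Die("f", low_pc_addr=0x401000)
outer1 = Die("outer", parent=func, low_pc_offset=0x10)
inner1 = Die("inner", parent=outer1, low_pc_offset=0x08)  # relative to outer
outer2 = Die("outer", parent=func, low_pc_offset=0x10)
inner2 = Die("inner", parent=outer2, low_pc_offset=0x18)  # relative to f
```

In this layout both variants resolve the inner block to the same address; solution #1 simply stores the smaller offset (0x08 rather than 0x18).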

This would allow DW_TAG_subprogram DIEs to have a single relocation on the 
DW_AT_low_pc.

Greg Clayton



___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread David Blaikie via Dwarf-Discuss
On Tue, Sep 15, 2020 at 10:13 AM Robinson, Paul via Dwarf-Discuss
 wrote:
>
> David Blaikie has brought this up with me (or in conversations that
> I observed) a couple of times:

Thanks for bringing this up! Not sure if I've raised this on
dwarf-discuss specifically before.. ah, yeah, 3 years ago:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-June/004378.html
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-July/thread.html#4380
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-August/004393.html

Most recently I had an idea for a workaround that I proposed on the
llvm-dev mailing list:
https://groups.google.com/g/llvm-dev/c/g3eGxhi4ATU/m/fbrBPFxNBwAJ
The idea being that actually using debug_rnglists even for contiguous
ranges would reduce .o/executable file size when using Split DWARF. I
think the data I had even showed breakeven for non-split DWARF object
files, probably slight growth for linked executables in that case,
though.
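
As a rough illustration of why a range list can avoid extra debug_addr entries, here is a sketch of decoding a one-range list. It assumes the list is encoded as DW_RLE_base_addressx followed by DW_RLE_offset_pair, which is one plausible choice and not necessarily what the linked proposal uses. Only three entry kinds are handled, and the function names are hypothetical.

```python
# Entry kind codes from the DWARF v5 range list encoding table.
DW_RLE_end_of_list = 0x00
DW_RLE_base_addressx = 0x01
DW_RLE_offset_pair = 0x04

def read_uleb128(data, pos):
    """Decode one ULEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def decode_rnglist(data, debug_addr):
    """Decode a range list into (start, end) pairs.

    Simplification: this sketch requires an explicit base entry before any
    offset pair instead of falling back to the CU's base address.
    """
    ranges = []
    base = None
    pos = 0
    while True:
        kind = data[pos]
        pos += 1
        if kind == DW_RLE_end_of_list:
            return ranges
        if kind == DW_RLE_base_addressx:
            index, pos = read_uleb128(data, pos)
            base = debug_addr[index]
        elif kind == DW_RLE_offset_pair:
            start_off, pos = read_uleb128(data, pos)
            end_off, pos = read_uleb128(data, pos)
            if base is None:
                raise ValueError("offset pair without a base address")
            ranges.append((base + start_off, base + end_off))
        else:
            raise ValueError(f"unsupported entry kind {kind:#x}")

# One contiguous scope at [entry+0x10, entry+0x24), reusing debug_addr[0].
debug_addr = [0x401000]
rnglist = bytes([DW_RLE_base_addressx, 0,
                 DW_RLE_offset_pair, 0x10, 0x24,
                 DW_RLE_end_of_list])
```

The scope costs six bytes of range list here and needs no debug_addr entry of its own, since it reuses the function's entry.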

> It's common to want to refer to a particular address plus an offset,
> for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> block or inlined subprogram within another subprogram.

Yep - the ones I'm especially interested in now, are those that won't
be addressed even by a "ranges everywhere" approach (though that
approach does have size tradeoffs that I'd like to avoid/improve on
too, for sure!) - DW_TAG_call_site's
DW_AT_call_pc/DW_AT_call_return_pc and DW_TAG_label's DW_AT_low_pc.
The latter isn't super common in code I'm dealing with, but the former
is pretty ubiquitous now.

>  Generally
> the only symbolic address available is the entry point of the
> containing subprogram.  Back when addresses were held directly in
> the .debug_info section, the attributes would have relocations, the
> offset would be encoded into the relocation and the linker would
> just do the right thing.
>
> With DWARF v5, we now have the .debug_addr section, which contains
> the addresses to be fixed up by the linker.  But, we don't have a
> way to specify an offset to add to an entry in the .debug_addr
> section; instead, each unique addr+offset requires its own entry
> in the .debug_addr table.  This consumes additional space, these
> entries are generally not reusable, and it doesn't reduce the
> overall number of relocations that the linker must process.

If you're encountering size penalties with non-split DWARFv5 due to
debug_addr indirection - we could change LLVM to choose which
addresses to indirect and which ones should use the
classic/DWARFv4-esque representations.
(But, yeah, overall, I think it's better for lots of use cases to
support an addr+offset encoding)

> It's not feasible to define a new attribute for address+offset,
> because an attribute has only one value, and the attribute would
> have to specify both the .debug_addr index and the offset to add.

I don't follow this ^ - I think previously we've discussed at least 2
representations that could do this:
uleb+uleb
generalized exprloc support

admittedly uleb+uleb has the problem that it's a variable-length
encoding, but at least LLVM currently is using addrx exclusively, and
not the addrxN fixed length encodings.
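
For concreteness, a sketch of what a uleb+uleb attribute value could look like: a ULEB128 debug_addr index immediately followed by a ULEB128 offset. No such form exists in DWARF v5; `encode_addrx_offset` and `decode_addrx_offset` are hypothetical names for this illustration.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_uleb128(data, pos=0):
    """Decode one ULEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def encode_addrx_offset(index, offset):
    """Hypothetical form: ULEB128 debug_addr index, then ULEB128 offset."""
    return encode_uleb128(index) + encode_uleb128(offset)

def decode_addrx_offset(data, debug_addr, pos=0):
    """Return (resolved address, next position) for the hypothetical form."""
    index, pos = decode_uleb128(data, pos)
    offset, pos = decode_uleb128(data, pos)
    return debug_addr[index] + offset, pos

# A scope 300 bytes past the function entry held in debug_addr[0].
debug_addr = [0x401000]
encoded = encode_addrx_offset(0, 300)
```

Small indices and offsets take one byte each, so most scopes inside a function would need two or three bytes and no debug_addr entry of their own. The cost is that a consumer cannot skip the value without decoding it.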

> But, we could define an "indirect" entry in .debug_addr, and then
> reference it with an attribute in the same way that we reference
> any other .debug_addr entry.

This direction would, for my use case, be unfortunate - since my goal
is to remove as much DWARF from object files as possible under Split
DWARF - so leaving anything extra in debug_addr works against that
goal.

> An indirect entry would be the same size as all other entries in
> .debug_addr (i.e., the size of an address on the target).  The
> upper half would be another index into .debug_addr and the lower
> half would be the addend.  The consumer adds the addend to the
> value from the entry specified by the "another index."

If it's OK to use such a small fixed-length encoding (addrx comes in a
variable-length form plus fixed-length forms of 1/2/3/4 bytes - offsets
in LLVM are emitted as data4) then we could introduce that as a
FORM_addrx4_offset4 form (or could make its length depend on pointer
size - but that seems less relevant when it's not in the debug_addr
section), plus a uleb+uleb form, without providing all the possible
combinations of addrx{1,2,3,4,N}_offset{1,2,3,4,M}.

In any case, I think of these forms as sort of special
case/compact/easier to parse encodings of the generalized exprloc
(DW_OP_addrx(N), DW_OP_constu(M), DW_OP_plus).
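
A small sketch of that equivalence: a tiny evaluator for just those three operations, next to a decoder for a fixed-width index+offset pair. The fixed form is hypothetical, as are its little-endian layout and the function names; the opcode values are the DWARF v5 ones.

```python
import struct

DW_OP_constu = 0x10
DW_OP_plus = 0x22
DW_OP_addrx = 0xA1

def read_uleb128(data, pos):
    """Decode one ULEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def eval_expr(expr, debug_addr):
    """Evaluate an expression limited to addrx, constu and plus."""
    stack = []
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addrx:
            index, pos = read_uleb128(expr, pos)
            stack.append(debug_addr[index])
        elif op == DW_OP_constu:
            value, pos = read_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return stack[-1]

def decode_addrx4_offset4(data, debug_addr, pos=0):
    """Hypothetical fixed form: 4-byte index then 4-byte offset.

    Little-endian layout is an assumption of this sketch.
    """
    index, offset = struct.unpack_from("<II", data, pos)
    return debug_addr[index] + offset

debug_addr = [0x401000, 0x402000]
expr = bytes([DW_OP_addrx, 1, DW_OP_constu, 0x30, DW_OP_plus])
fixed = struct.pack("<II", 1, 0x30)
```

Both encodings resolve to the same address. The fixed form is longer in this small case (8 bytes against 5, not counting the exprloc length prefix) but can be skipped or parsed without interpreting opcodes.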

>
> This solution doesn't save space in .debug_addr, but it does
> reduce the number of relocations.  Ideally .debug_addr would
> require only one relocation per function.
>
> We can debate whether the addend should be signed or unsigned,
> and whether the indirect entries should be a separate subtable,
> but I wanted to float the idea here before I wrote it up as a
> proposal.

I'd be fairly in favor of unsigned. Generally LLVM already 

[Dwarf-Discuss] .debug_addr entry plus offset

2020-09-15 Thread Robinson, Paul via Dwarf-Discuss
David Blaikie has brought this up with me (or in conversations that
I observed) a couple of times:

It's common to want to refer to a particular address plus an offset,
for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
block or inlined subprogram within another subprogram.  Generally
the only symbolic address available is the entry point of the
containing subprogram.  Back when addresses were held directly in 
the .debug_info section, the attributes would have relocations, the
offset would be encoded into the relocation and the linker would
just do the right thing.

With DWARF v5, we now have the .debug_addr section, which contains
the addresses to be fixed up by the linker.  But, we don't have a
way to specify an offset to add to an entry in the .debug_addr
section; instead, each unique addr+offset requires its own entry
in the .debug_addr table.  This consumes additional space, these
entries are generally not reusable, and it doesn't reduce the
overall number of relocations that the linker must process.

It's not feasible to define a new attribute for address+offset,
because an attribute has only one value, and the attribute would
have to specify both the .debug_addr index and the offset to add.
But, we could define an "indirect" entry in .debug_addr, and then
reference it with an attribute in the same way that we reference
any other .debug_addr entry.

An indirect entry would be the same size as all other entries in 
.debug_addr (i.e., the size of an address on the target).  The
upper half would be another index into .debug_addr and the lower
half would be the addend.  The consumer adds the addend to the
value from the entry specified by the "another index."
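
A minimal sketch of how a consumer might resolve such an entry, assuming 8-byte addresses, so a 32-bit index in the upper half and an unsigned 32-bit addend in the lower half. How a consumer tells indirect entries from ordinary ones is left open above, so the sketch simply takes a hypothetical set of indirect indices. All names are illustrative.

```python
ADDR_BITS = 64               # assumption: 8-byte target addresses
HALF_BITS = ADDR_BITS // 2
HALF_MASK = (1 << HALF_BITS) - 1

def make_indirect(base_index, addend):
    """Pack (index, addend) into one address-sized entry."""
    if not (0 <= base_index <= HALF_MASK and 0 <= addend <= HALF_MASK):
        raise ValueError("index or addend does not fit in half an entry")
    return (base_index << HALF_BITS) | addend

def resolve(debug_addr, indirect_indices, index):
    """Return the address for debug_addr[index], following one indirection.

    Assumes the entry an indirect entry points at is an ordinary
    (relocated) address, not another indirect entry.
    """
    entry = debug_addr[index]
    if index not in indirect_indices:
        return entry
    base_index = entry >> HALF_BITS
    addend = entry & HALF_MASK
    return debug_addr[base_index] + addend

# Entry 0 is the function's entry point and the only one needing a
# relocation; entries 1 and 2 are link-time constants.
debug_addr = [0x401000, make_indirect(0, 0x24), make_indirect(0, 0x58)]
indirect = {1, 2}
```

The table still holds three address-sized entries, which matches the point that the scheme reduces relocations, not space.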

This solution doesn't save space in .debug_addr, but it does
reduce the number of relocations.  Ideally .debug_addr would
require only one relocation per function.

We can debate whether the addend should be signed or unsigned,
and whether the indirect entries should be a separate subtable,
but I wanted to float the idea here before I wrote it up as a
proposal.

Alternatively, the indirect sub-table could be encoded with
ULEB/SLEB pairs, but that makes it hard to find them by index.
They could be found by a direct reference, but that requires a
relocation from .debug_info to .debug_addr, so we haven't saved
any relocations that way.

If there are obvious flaws I can't see, or someone is inspired
to come up with another solution, please let me know!  Otherwise
I'll write it up as a formal proposal probably later this week.

Thanks,
--paulr
