[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-02-12 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1871932 , @labath wrote:

> In D70840#1871862 , @mstorsjo wrote:
>
> > > Now someone might argue that the looking up the address of the `ldr` 
> > > opcode is wrong, because that is not the actual pc value, and that we 
> > > should lookup the address of `ldr+1`. In that case we can point them to 
> > > the next function (`call_noreturn`), which ends with a call to a noreturn 
> > > function. Now if we get a crash in `does_not_return()`, the address we 
> > > will get from the backtrace will be one byte after the last opcode of 
> > > `call_noreturn` (`bl  does_not_return()`). This means that in the 
> > > **non-windows** case, it will not resolve correctly to the 
> > > `call_noreturn` even if we apply the usual trick of subtracting 1 from 
> > > the address to bring it into the range of the calling function.
> >
> > Hmm... I presume this issue is present now already (and doesn't really 
> > change due to normalizing the line tables and ranges on windows, to the 
> > same as linux). Not familiar with "the usual trick of subtracting 1 from 
> > the address to bring it into the range of the calling function", but 
> > wouldn't something like `GetOpcodeLoadAddress(pc) - 1` always bring you 
> > into the range of the calling function, given a return address?
>
>
> Lldb uses this when unwinding specifically to deal with the "call as a last 
> instruction" problem, and I'm pretty sure it's not the only tool doing that. 
> This is even described in the DWARF spec (non-normative text):
>
> > In most cases the return address is in the same context as the calling 
> > address, but that
> >  need not be the case, especially if the producer knows in some way the 
> > call never will
> >  return. The context of the ’return address’ might be on a different line, 
> > in a different
> >  lexical block, or past the end of the calling subroutine. If a consumer 
> > were to assume that
> >  it was in the same context as the calling address, the virtual unwind 
> > might fail.
> > 
> > For architectures with constant-length instructions where the return address
> >  immediately follows the call instruction, a simple solution is to subtract 
> > the length of an
> >  instruction from the return address to obtain the calling instruction. For 
> > architectures
> >  with variable-length instructions (for example, x86), this is not 
> > possible. However,
> >  subtracting 1 from the return address, although not guaranteed to provide 
> > the exact
> >  calling address, generally will produce an address within the same context 
> > as the calling
> >  address, and that usually is sufficient.
>
> Using `GetOpcodeLoadAddress(pc) - 1` would work, but this implies that the 
> caller should be passing in "load" addresses, which is in conflict with the 
> premise I have made at the beginning of the paragraph that one should be 
> passing in "code" addresses here. (My attempt at proof by contradiction.)


Ah, I see - so this also is another argument for making things work with "load" 
addresses - good then.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-02-12 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1871862 , @mstorsjo wrote:

> > Now someone might argue that the looking up the address of the `ldr` opcode 
> > is wrong, because that is not the actual pc value, and that we should 
> > lookup the address of `ldr+1`. In that case we can point them to the next 
> > function (`call_noreturn`), which ends with a call to a noreturn function. 
> > Now if we get a crash in `does_not_return()`, the address we will get from 
> > the backtrace will be one byte after the last opcode of `call_noreturn` 
> > (`bl  does_not_return()`). This means that in the **non-windows** case, 
> > it will not resolve correctly to the `call_noreturn` even if we apply the 
> > usual trick of subtracting 1 from the address to bring it into the range of 
> > the calling function.
>
> Hmm... I presume this issue is present now already (and doesn't really change 
> due to normalizing the line tables and ranges on windows, to the same as 
> linux). Not familiar with "the usual trick of subtracting 1 from the address 
> to bring it into the range of the calling function", but wouldn't something 
> like `GetOpcodeLoadAddress(pc) - 1` always bring you into the range of the 
> calling function, given a return address?


Lldb uses this when unwinding specifically to deal with the "call as a last 
instruction" problem, and I'm pretty sure it's not the only tool doing that. 
This is even described in the DWARF spec (non-normative text):

> In most cases the return address is in the same context as the calling 
> address, but that
>  need not be the case, especially if the producer knows in some way the call 
> never will
>  return. The context of the ’return address’ might be on a different line, in 
> a different
>  lexical block, or past the end of the calling subroutine. If a consumer were 
> to assume that
>  it was in the same context as the calling address, the virtual unwind might 
> fail.
> 
> For architectures with constant-length instructions where the return address
>  immediately follows the call instruction, a simple solution is to subtract 
> the length of an
>  instruction from the return address to obtain the calling instruction. For 
> architectures
>  with variable-length instructions (for example, x86), this is not possible. 
> However,
>  subtracting 1 from the return address, although not guaranteed to provide 
> the exact
>  calling address, generally will produce an address within the same context 
> as the calling
>  address, and that usually is sufficient.

Using `GetOpcodeLoadAddress(pc) - 1` would work, but this implies that the 
caller should be passing in "load" addresses, which is in conflict with the 
premise I have made at the beginning of the paragraph that one should be 
passing in "code" addresses here. (My attempt at proof by contradiction.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-02-12 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1871745 , @labath wrote:

> I did not send out any emails for this, but I did talk about this with 
> @dblaikie when I was in the US last week. My take from that is that we should 
> start by "fixing" llvm-symbolizer so that it works reasonably in these cases. 
> I'll use this  godbolt link to illustrate what 
> I mean.
>
> If we take a look at the `deref` function in that link, and try to pass the 
> address of the first instruction `ldr r0, [r0]` to llvm-symbolizer, then 
> we will get something reasonable in the non-windows case, but we will get 
> nothing in the windows case, because the debug info would claim that the 
> function starts halfway into that opcode. Fixing the tool so that it returns 
> the same thing everywhere sounds reasonable, and since it does both line 
> table and debug_info lookups, this will already answer the main questions we 
> have about where this logic should be implemented. Then we can just copy that 
> in lldb.


Thanks for keeping track of this! I'll look into fixing this at some suitable 
level for llvm-symbolizer for further discussion.

> Now someone might argue that the looking up the address of the `ldr` opcode 
> is wrong, because that is not the actual pc value, and that we should lookup 
> the address of `ldr+1`. In that case we can point them to the next function 
> (`call_noreturn`), which ends with a call to a noreturn function. Now if we 
> get a crash in `does_not_return()`, the address we will get from the 
> backtrace will be one byte after the last opcode of `call_noreturn` (`bl  
> does_not_return()`). This means that in the **non-windows** case, it will not 
> resolve correctly to the `call_noreturn` even if we apply the usual trick of 
> subtracting 1 from the address to bring it into the range of the calling 
> function.

Hmm... I presume this issue is present now already (and doesn't really change 
due to normalizing the line tables and ranges on windows, to the same as 
linux). Not familiar with "the usual trick of subtracting 1 from the address to 
bring it into the range of the calling function", but wouldn't something like 
`GetOpcodeLoadAddress(pc) - 1` always bring you into the range of the calling 
function, given a return address?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-02-12 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

I did not send out any emails for this, but I did talk about this with 
@dblaikie when I was in the US last week. My take from that is that we should 
start by "fixing" llvm-symbolizer so that it works reasonably in these cases. 
I'll use this  godbolt link to illustrate what I 
mean.

If we take a look at the `deref` function in that link, and try to pass the 
address of the first instruction `ldr r0, [r0]` to llvm-symbolizer, then we 
will get something reasonable in the non-windows case, but we will get nothing 
in the windows case, because the debug info would claim that the function 
starts halfway into that opcode. Fixing the tool so that it returns the same 
thing everywhere sounds reasonable, and since it does both line table and 
debug_info lookups, this will already answer the main questions we have about 
where this logic should be implemented. Then we can just copy that in lldb.

Now someone might argue that the looking up the address of the `ldr` opcode is 
wrong, because that is not the actual pc value, and that we should lookup the 
address of `ldr+1`. In that case we can point them to the next function 
(`call_noreturn`), which ends with a call to a noreturn function. Now if we get 
a crash in `does_not_return()`, the address we will get from the backtrace will 
be one byte after the last opcode of `call_noreturn` (`bl  
does_not_return()`). This means that in the **non-windows** case, it will not 
resolve correctly to the `call_noreturn` even if we apply the usual trick of 
subtracting 1 from the address to bring it into the range of the calling 
function. Either way, there's some fixup that needs to happen somewhere...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-29 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1845176 , @clayborg wrote:

> Speaking to having the LLVM layer strip things out, this should be an option 
> if it does get added there. If we have a stripped binary and we have no 
> symbols or any other CPU map, and DWARF is the only way to tell if a function 
> was ARM or Thumb by looking at bit zero, then we need to somehow be able to 
> get this raw information.


I would expect there will still be some way to get the raw data, because the 
dumping tools would want to get it.

However, if I understand things correctly, this is would not be a very useful 
source of information for us -- on windows arm, all debug info addresses will 
have the thumb bit set (because everything is thumb), and everywhere else the 
debug info will have the bit 0 cleared, regardless of whether the function is 
thumb or not...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-28 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added a comment.

Speaking to having the LLVM layer strip things out, this should be an option if 
it does get added there. If we have a stripped binary and we have no symbols or 
any other CPU map, and DWARF is the only way to tell if a function was ARM or 
Thumb by looking at bit zero, then we need to somehow be able to get this raw 
information.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-28 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added inline comments.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:400
+entry->base = arch.GetOpcodeLoadAddress(entry->base);
+  }
+  if (frame_base && frame_base->IsLocationList()) {

Either way, I am not picky.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-28 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1844318 , @mstorsjo wrote:

> In D70840#1844279 , @labath wrote:
>
> > This is something that probably should be discussed with the llvm dwarf 
> > owners too (though most of them are on this review already).
> >
> > However, if we assume that we can agree that this logic should be 
> > implemented at a fairly low level in the dwarf parsing code (so that e.g., 
> > when we switch to the llvm parser, it will do this thing for us),
>
>
> I guess the main question here, as follows from my comment above, is who/how 
> is the fixup controlled in the llvm side, where there's (to my knowledge) no 
> `GetOpcodeLoadAddress` that can abstract away the fixup? IIRC the dwarf code 
> itself is happily unaware of what architecture it is describing.


Yes, that is the trickiest part. Dwarf is architecture-agnostic for the most 
part (which makes this behavior of the windows linker very unfortunate), but 
the llvm DWARFContext already knows the architecture of the file it is 
describing (DWARFContext::getArch). The fixup could key off of that, but this 
will need careful consideration.

> 
> 
>> then this patch still seems quite "schizophrenic" to me, because the line 
>> table and location list fixups happen pretty high up. The location list case 
>> is particularly interesting here because we use a high level api 
>> (`llvm::DWARFLocationExpression`) to retrieve the location list, which may 
>> be reasonably expected to return "fixed" addresses, and so this fix should 
>> happen in llvm.
> 
> FWIW, as @clayborg said that `DWARFExpression` owns a Module in these cases, 
> and thus should have access to an ArchSpec, I could just slip the fixup into 
> `DWARFExpression::SetLocationListAddresses`. The reason I was messing around 
> there was primarily to avoid sprinkling fixups in many codepaths in 
> `DWARFDebugInfoEntry::GetDIENamesAndRanges` that call this method.

Here, I wasn't referring to `SetLocationListAddresses`, but rather to the 
addresses you get from the location list section 
(DWARFExpression::GetLocationExpression). I would expect you need to fix those 
do (but you don't do it in this patch). As for `SetLocationListAddresses`, I 
would still consider that high level, as the addresses pass through several 
high-level functions before landing here. If we go with the low-level approach, 
I'd expect this to happen somewhere near `DWARFUnit::GetBaseAddress` and 
`DWARFDIE::getLowPC` (imaginary function inspired by 
`llvm::DWARFDie::getLowAndHighPC`)

> 
> 
>> (The line tables also use llvm code for parsing, but we access them using  a 
>> very low-level api, so it's likely this check would have to done on lldb 
>> side even if llvm provided such a functionality -- but this could happen in 
>> SymbolFileDWARF::ParseLineTable instead of in the LineTable class).
> 
> Sure, I'll look into moving the fixups there.

I wouldn't bother with that too much until we figure out the overall strategy 
here.

>> If you want, I can send the initial email to start the discussion, but I 
>> can't commit to writing any of the patches. :(
> 
> If you can do that, that would be very much appreciated!

OK, I'll try to get the ball rolling there.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-28 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1844279 , @labath wrote:

> Right, your answer implicitly assumes (and I do agree with you) that the 
> answer is that `getSubroutineForAddress` should return the function that 
> address is in. That is not unreasonable -- I deliberately chose the most 
> high-level api I could find, to make arguing that case easy. But that case 
> should be argued nonetheless. For example, a counterargument to that might be 
> that one should only use valid PC register values as inputs to this function 
> (so an even value would not be a valid input here, because the function is 
> thumb), and in that case, the there would be no fixups needed. This position 
> could be further strengthened by the fact that the llvm dwarf parser is also 
> used by tools (e.g., llvm-dwarfdump) which want a fairly exact view of the 
> data in the file, without too much smartness, so there may need to be some 
> kind of an api which would provide the "raw" value.


I guess the selection of what kind of fixups to apply could be controlled via 
whatever object controls the fixup. In these patches, it's lldb's ArchSpec - 
the low level dwarf routines doesn't have any ArchSpec but uses the one given 
to them to fix up the addresses.

> This is something that probably should be discussed with the llvm dwarf 
> owners too (though most of them are on this review already).
> 
> However, if we assume that we can agree that this logic should be implemented 
> at a fairly low level in the dwarf parsing code (so that e.g., when we switch 
> to the llvm parser, it will do this thing for us),

I guess the main question here, as follows from my comment above, is who/how is 
the fixup controlled in the llvm side, where there's (to my knowledge) no 
`GetOpcodeLoadAddress` that can abstract away the fixup? IIRC the dwarf code 
itself is happily unaware of what architecture it is describing.

> then this patch still seems quite "schizophrenic" to me, because the line 
> table and location list fixups happen pretty high up. The location list case 
> is particularly interesting here because we use a high level api 
> (`llvm::DWARFLocationExpression`) to retrieve the location list, which may be 
> reasonably expected to return "fixed" addresses, and so this fix should 
> happen in llvm.

FWIW, as @clayborg said that `DWARFExpression` owns a Module in these cases, 
and thus should have access to an ArchSpec, I could just slip the fixup into 
`DWARFExpression::SetLocationListAddresses`. The reason I was messing around 
there was primarily to avoid sprinkling fixups in many codepaths in 
`DWARFDebugInfoEntry::GetDIENamesAndRanges` that call this method.

> (The line tables also use llvm code for parsing, but we access them using  a 
> very low-level api, so it's likely this check would have to done on lldb side 
> even if llvm provided such a functionality -- but this could happen in 
> SymbolFileDWARF::ParseLineTable instead of in the LineTable class).

Sure, I'll look into moving the fixups there.

> So, at the end of the day, I guess what I am saying is that we should send a 
> small RFC to the llvm debug info folks to clarify what should the behavior of 
> the llvm classes be for this kind of dwarf. If the conclusion is that this 
> should be implemented inside these classes (and not by their users), then we 
> should implement it there, and then have the lldb code mirror that. This is 
> the approach we've been trying to do for a while now when modifying or 
> implementing new dwarf features in lldb, but this is probably the first case 
> of wanting to implement a feature in lldb where there is no existing llvm 
> counterpart. (In contrast the DWZ feature is also blocked in part due to 
> concerns about increasing llvm divergence, but there I'm pretty sure the 
> answer is that DWZ should not be implemented in llvm, and the lldb parts 
> should be written in a way that does not require low-level dwarf changes.)
> 
> If you want, I can send the initial email to start the discussion, but I 
> can't commit to writing any of the patches. :(

If you can do that, that would be very much appreciated!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-28 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1838249 , @mstorsjo wrote:

> In D70840#1838186 , @labath wrote:
>
> > The thing that I am not sure we have fully explored is whether there is any 
> > need for this `&~1` business in the llvm dwarf code. For instance, what 
> > should `llvm::DWARFUnit::getSubroutineForAddress` return if you pass it an 
> > address that is equal to the actual memory address of the start of the 
> > thumb function, but the relevant DW_AT_low_pc contains `address|1`? 
> > Answering that might give us an indication on what is the layer at which 
> > this fixup should be applied.
>
>
> From a quick check at the code there, for that particular case, either it'd 
> be done in `llvm::DWARFUnit::updateAddressDieMap` or in 
> `llvm::DWARFDie::getAddressRanges`. Since all levels of the API is public, 
> the fix does become safer the closer to parsing it is, but one also generally 
> has less context to work with in those cases.


Right, your answer implicitly assumes (and I do agree with you) that the answer 
is that `getSubroutineForAddress` should return the function that address is 
in. That is not unreasonable -- I deliberately chose the most high-level api I 
could find, to make arguing that case easy. But that case should be argued 
nonetheless. For example, a counterargument to that might be that one should 
only use valid PC register values as inputs to this function (so an even value 
would not be a valid input here, because the function is thumb), and in that 
case, the there would be no fixups needed. This position could be further 
strengthened by the fact that the llvm dwarf parser is also used by tools 
(e.g., llvm-dwarfdump) which want a fairly exact view of the data in the file, 
without too much smartness, so there may need to be some kind of an api which 
would provide the "raw" value.

This is something that probably should be discussed with the llvm dwarf owners 
too (though most of them are on this review already).

However, if we assume that we can agree that this logic should be implemented 
at a fairly low level in the dwarf parsing code (so that e.g., when we switch 
to the llvm parser, it will do this thing for us), then this patch still seems 
quite "schizophrenic" to me, because the line table and location list fixups 
happen pretty high up. The location list case is particularly interesting here 
because we use a high level api (`llvm::DWARFLocationExpression`) to retrieve 
the location list, which may be reasonably expected to return "fixed" 
addresses, and so this fix should happen in llvm. (The line tables also use 
llvm code for parsing, but we access them using  a very low-level api, so it's 
likely this check would have to done on lldb side even if llvm provided such a 
functionality -- but this could happen in SymbolFileDWARF::ParseLineTable 
instead of in the LineTable class).

So, at the end of the day, I guess what I am saying is that we should send a 
small RFC to the llvm debug info folks to clarify what should the behavior of 
the llvm classes be for this kind of dwarf. If the conclusion is that this 
should be implemented inside these classes (and not by their users), then we 
should implement it there, and then have the lldb code mirror that. This is the 
approach we've been trying to do for a while now when modifying or implementing 
new dwarf features in lldb, but this is probably the first case of wanting to 
implement a feature in lldb where there is no existing llvm counterpart. (In 
contrast the DWZ feature is also blocked in part due to concerns about 
increasing llvm divergence, but there I'm pretty sure the answer is that DWZ 
should not be implemented in llvm, and the lldb parts should be written in a 
way that does not require low-level dwarf changes.)

If you want, I can send the initial email to start the discussion, but I can't 
commit to writing any of the patches. :(


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-27 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo marked 2 inline comments as done.
mstorsjo added inline comments.



Comment at: lldb/include/lldb/Expression/DWARFExpression.h:142
 
+  std::pair GetLocationListAddresses() const;
+

clayborg wrote:
> Might be better as two functions? One to get the CU address and func address:
> 
> ```
> lldb_private::Address GetCUAddress();
> lldb_private::Address GetFuncAddress();
> ```
> 
> The DWARFExpression has a Module when the location comes from DWARF and this 
> can be used to get the arch and sanitize the address by calling 
> GetOpcodeLoadAddress on the address before returning it. 
Oh, if we'd make DWARFExpression::SetLocationListAddresses do the sanitization, 
then we don't need to add the getters at all - that'd save a lot of extra 
changes as well.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:397-400
+  for (size_t i = 0; i < ranges.GetSize(); i++) {
+DWARFRangeList::Entry *entry = ranges.GetMutableEntryAtIndex(i);
+entry->base = arch.GetOpcodeLoadAddress(entry->base);
+  }

clayborg wrote:
> What about asking the DWARFRangeList to fix all code addresses? This code 
> would become:
> 
> ```
> ranges.GetOpcodeAddresses(arch);
> ```
> 
> 
Yeah, I was thinking of some refactoring like that. I would have gone for a 
method that mutates the current range, e.g. `ranges.FixOpcodeAddresses(arch)`, 
but would you prefer it to be something that returns a new sanitized range 
instead?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-27 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added inline comments.



Comment at: lldb/include/lldb/Expression/DWARFExpression.h:142
 
+  std::pair GetLocationListAddresses() const;
+

Might be better as two functions? One to get the CU address and func address:

```
lldb_private::Address GetCUAddress();
lldb_private::Address GetFuncAddress();
```

The DWARFExpression has a Module when the location comes from DWARF and this 
can be used to get the arch and sanitize the address by calling 
GetOpcodeLoadAddress on the address before returning it. 



Comment at: lldb/include/lldb/Utility/RangeMap.h:247-250
+  Entry *GetMutableEntryAtIndex(size_t i) {
+return ((i < m_entries.size()) ? _entries[i] : nullptr);
+  }
+

Remove if we make DWARFRangeList handle this as mentioned in inline comment



Comment at: lldb/source/Expression/DWARFExpression.cpp:102-106
+std::pair DWARFExpression::GetLocationListAddresses() const {
+  return std::make_pair(m_loclist_addresses->cu_file_addr,
+m_loclist_addresses->func_file_addr);
+}
+

Convert to:

```
lldb_private::Address DWARFExpression::GetCUAddress();
lldb_private::Address DWARFExpression::GetFuncAddress();
```




Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:397-400
+  for (size_t i = 0; i < ranges.GetSize(); i++) {
+DWARFRangeList::Entry *entry = ranges.GetMutableEntryAtIndex(i);
+entry->base = arch.GetOpcodeLoadAddress(entry->base);
+  }

What about asking the DWARFRangeList to fix all code addresses? This code would 
become:

```
ranges.GetOpcodeAddresses(arch);
```





Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:401-405
+  if (frame_base && frame_base->IsLocationList()) {
+std::pair loclist = 
frame_base->GetLocationListAddresses();
+loclist.second = arch.GetOpcodeLoadAddress(loclist.second);
+frame_base->SetLocationListAddresses(loclist.first, loclist.second);
+  }

Remove if we add:
```
lldb_private::Address DWARFExpression::GetCUAddress();
lldb_private::Address DWARFExpression::GetFuncAddress();
```




Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:776-779
+for (size_t i = 0; i < ranges.GetSize(); i++) {
+  DWARFRangeList::Entry *entry = ranges.GetMutableEntryAtIndex(i);
+  entry->base = arch.GetOpcodeLoadAddress(entry->base);
+}

Call new DWARFRangeList::GetOpcodeAddresses?

```
ranges.GetOpcodeAddresses(arch);
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-24 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo updated this revision to Diff 240292.
mstorsjo added a comment.

I tried doing the other alternative; now we primarily touch up the values in 
DWARFDebugInfoEntry::GetAttributeAddressRange(s) and 
DWARFDebugInfoEntry::GetDIENamesAndRanges. This variant feels fairly concise 
and consistent.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840

Files:
  lldb/include/lldb/Expression/DWARFExpression.h
  lldb/include/lldb/Symbol/LineTable.h
  lldb/include/lldb/Utility/ArchSpec.h
  lldb/include/lldb/Utility/RangeMap.h
  lldb/source/Expression/DWARFExpression.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
  lldb/source/Symbol/DWARFCallFrameInfo.cpp
  lldb/source/Symbol/LineTable.cpp
  lldb/source/Utility/ArchSpec.cpp
  lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s

Index: lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
===
--- /dev/null
+++ lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
@@ -0,0 +1,183 @@
+# Test that a linked windows executable, with a thumb bit in many address
+# fields, gets the thumb bit stripped out from addresses.
+
+# If the thumb bit isn't stripped out from subprogram ranges, 0x401006 is
+# associated with the function "entry", while it actually is the start of
+# the function "other".
+# If the thumb bit isn't stripped out from line tables, the LineEntry
+# points to the wrong line.
+
+# REQUIRES: lld, arm
+
+# RUN: llvm-mc -triple armv7-windows-gnu %s -filetype=obj > %t.o
+# RUN: lld-link %t.o -out:%t.exe -debug:dwarf -entry:entry -subsystem:console -lldmingw
+# RUN: %lldb %t.exe -o "image lookup -v -a 0x401006" -b | FileCheck %s
+
+# CHECK-LABEL: image lookup -v -a 0x401006
+# CHECK: Function: {{.*}}, name = "other", range = [0x00401006-0x0040100c)
+# CHECK: LineEntry: [0x00401006-0x0040100a): /path/to/src/dwarf-thumb.c:7:12
+
+.text
+.syntax unified
+.file   "dwarf-thumb.c"
+.file   1 "" "dwarf-thumb.c"
+.def entry;
+.scl2;
+.type   32;
+.endef
+.globl  entry   @ -- Begin function entry
+.p2align1
+.code   16  @ @entry
+.thumb_func
+entry:
+.Lfunc_begin0:
+.loc1 2 0   @ dwarf-thumb.c:2:0
+.cfi_sections .debug_frame
+.cfi_startproc
+.loc1 4 9 prologue_end  @ dwarf-thumb.c:4:9
+mov r1, r0
+.Ltmp0:
+b   other
+.Ltmp1:
+.Lfunc_end0:
+.cfi_endproc
+@ -- End function
+.def other;
+.scl2;
+.type   32;
+.endef
+.globl  other   @ -- Begin function other
+.p2align1
+.code   16  @ @other
+.thumb_func
+other:
+.Lfunc_begin1:
+.loc1 6 0   @ dwarf-thumb.c:6:0
+.cfi_startproc
+.loc1 7 12 prologue_end @ dwarf-thumb.c:7:12
+add.w   r0, r1, r1, lsl #1
+.loc1 7 2 is_stmt 0 @ dwarf-thumb.c:7:2
+bx  lr
+.Ltmp2:
+.Lfunc_end1:
+.cfi_endproc
+@ -- End function
+.section.debug_str,"dr"
+.Linfo_string:
+.Linfo_string1:
+.asciz  "dwarf-thumb.c"
+.Linfo_string2:
+.asciz  "/path/to/src"
+.Linfo_string3:
+.asciz  "other"
+.Linfo_string6:
+.asciz  "entry"
+.section.debug_loc,"dr"
+.Lsection_debug_loc:
+.Ldebug_loc0:
+.long   .Lfunc_begin0-.Lfunc_begin0
+.long   .Ltmp0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   80  @ DW_OP_reg0
+.long   .Ltmp0-.Lfunc_begin0
+.long   .Lfunc_end0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   81  @ DW_OP_reg1
+.long   0
+.long   0
+.section.debug_abbrev,"dr"
+.Lsection_abbrev:
+.byte   1   @ Abbreviation Code
+.byte   17  @ DW_TAG_compile_unit
+.byte   1   @ DW_CHILDREN_yes
+.byte   37  @ DW_AT_producer
+.byte   37  @ DW_FORM_strx1
+.byte   19  @ DW_AT_language
+.byte   5   @ DW_FORM_data2
+.byte   3   @ DW_AT_name
+.byte   14  @ DW_FORM_strp
+.byte   16  @ DW_AT_stmt_list
+.byte   23  @ DW_FORM_sec_offset
+.byte   27  @ DW_AT_comp_dir
+.byte   14  @ DW_FORM_strp
+.byte   17   

[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-24 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1838186 , @labath wrote:

> Yes, I was keeping this problem in mind when I was working on that patch. :) 
> I believe more work could be done to reduce the number of places that parse 
> DW_AT_low/high_pc, but I haven't gotten around to that yet..


Thanks :-)

Ok - this patch at least shows which codepaths I think are touching it at the 
moment. In addition to DW_AT_low/high_pc, there's also the case of 
DW_AT_ranges; I think all relevant paths that access that are taken care of 
here.

One downside to moving the fixups further out from the parsing, is that it 
easily becomes more brittle. If a new codepath accesses some getter without 
doing the fixup afterwards, we'd end up with the same hard-to-diagnose 
brokenness again.

> The thing that I am not sure we have fully explored is whether there is any 
> need for this `&~1` business in the llvm dwarf code. For instance, what 
> should `llvm::DWARFUnit::getSubroutineForAddress` return if you pass it an 
> address that is equal to the actual memory address of the start of the thumb 
> function, but the relevant DW_AT_low_pc contains `address|1`? Answering that 
> might give us an indication on what is the layer at which this fixup should 
> be applied.

From a quick check at the code there, for that particular case, either it'd be 
done in `llvm::DWARFUnit::updateAddressDieMap` or in 
`llvm::DWARFDie::getAddressRanges`. Since all levels of the API is public, the 
fix does become safer the closer to parsing it is, but one also generally has 
less context to work with in those cases.

I think the most narrow interface to apply the fix in would be 
`DWARFDebugInfoEntry::GetAttributeAddressRanges` (plus 
`DWARFDebugInfoEntry::GetAttributeAddressRange` which is used a few places 
internally in `DWARFDebugInfoEntry`) and 
`DWARFDebugInfoEntry::GetDIENamesAndRanges` (plus line tables); if only applied 
there, I think it would cover all the cases I'm fixing up at the moment...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-24 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

Yes, I was keeping this problem in mind when I was working on that patch. :) I 
believe more work could be done to reduce the number of places that parse 
DW_AT_low/high_pc, but I haven't gotten around to that yet..

The thing that I am not sure we have fully explored is whether there is any 
need for this `&~1` business in the llvm dwarf code. For instance, what should 
`llvm::DWARFUnit::getSubroutineForAddress` return if you pass it an address 
that is equal to the actual memory address of the start of the thumb function, 
but the relevant DW_AT_low_pc contains `address|1`? Answering that might give 
us an indication on what is the layer at which this fixup should be applied.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-24 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo updated this revision to Diff 240118.
mstorsjo added a comment.
Herald added a reviewer: shafik.

Hopefully this is a bit less schizophrenic now. It's still quite RFC'y and many 
of the fixup loops could probably still be refactored, if someone suggests 
where to place the shared code.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840

Files:
  lldb/include/lldb/Expression/DWARFExpression.h
  lldb/include/lldb/Symbol/LineTable.h
  lldb/include/lldb/Utility/ArchSpec.h
  lldb/include/lldb/Utility/RangeMap.h
  lldb/source/Expression/DWARFExpression.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
  lldb/source/Symbol/DWARFCallFrameInfo.cpp
  lldb/source/Symbol/LineTable.cpp
  lldb/source/Utility/ArchSpec.cpp
  lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s

Index: lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
===
--- /dev/null
+++ lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
@@ -0,0 +1,183 @@
+# Test that a linked windows executable, with a thumb bit in many address
+# fields, gets the thumb bit stripped out from addresses.
+
+# If the thumb bit isn't stripped out from subprogram ranges, 0x401006 is
+# associated with the function "entry", while it actually is the start of
+# the function "other".
+# If the thumb bit isn't stripped out from line tables, the LineEntry
+# points to the wrong line.
+
+# REQUIRES: lld, arm
+
+# RUN: llvm-mc -triple armv7-windows-gnu %s -filetype=obj > %t.o
+# RUN: lld-link %t.o -out:%t.exe -debug:dwarf -entry:entry -subsystem:console -lldmingw
+# RUN: %lldb %t.exe -o "image lookup -v -a 0x401006" -b | FileCheck %s
+
+# CHECK-LABEL: image lookup -v -a 0x401006
+# CHECK: Function: {{.*}}, name = "other", range = [0x00401006-0x0040100c)
+# CHECK: LineEntry: [0x00401006-0x0040100a): /path/to/src/dwarf-thumb.c:7:12
+
+.text
+.syntax unified
+.file   "dwarf-thumb.c"
+.file   1 "" "dwarf-thumb.c"
+.def entry;
+.scl2;
+.type   32;
+.endef
+.globl  entry   @ -- Begin function entry
+.p2align1
+.code   16  @ @entry
+.thumb_func
+entry:
+.Lfunc_begin0:
+.loc1 2 0   @ dwarf-thumb.c:2:0
+.cfi_sections .debug_frame
+.cfi_startproc
+.loc1 4 9 prologue_end  @ dwarf-thumb.c:4:9
+mov r1, r0
+.Ltmp0:
+b   other
+.Ltmp1:
+.Lfunc_end0:
+.cfi_endproc
+@ -- End function
+.def other;
+.scl2;
+.type   32;
+.endef
+.globl  other   @ -- Begin function other
+.p2align1
+.code   16  @ @other
+.thumb_func
+other:
+.Lfunc_begin1:
+.loc1 6 0   @ dwarf-thumb.c:6:0
+.cfi_startproc
+.loc1 7 12 prologue_end @ dwarf-thumb.c:7:12
+add.w   r0, r1, r1, lsl #1
+.loc1 7 2 is_stmt 0 @ dwarf-thumb.c:7:2
+bx  lr
+.Ltmp2:
+.Lfunc_end1:
+.cfi_endproc
+@ -- End function
+.section.debug_str,"dr"
+.Linfo_string:
+.Linfo_string1:
+.asciz  "dwarf-thumb.c"
+.Linfo_string2:
+.asciz  "/path/to/src"
+.Linfo_string3:
+.asciz  "other"
+.Linfo_string6:
+.asciz  "entry"
+.section.debug_loc,"dr"
+.Lsection_debug_loc:
+.Ldebug_loc0:
+.long   .Lfunc_begin0-.Lfunc_begin0
+.long   .Ltmp0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   80  @ DW_OP_reg0
+.long   .Ltmp0-.Lfunc_begin0
+.long   .Lfunc_end0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   81  @ DW_OP_reg1
+.long   0
+.long   0
+.section.debug_abbrev,"dr"
+.Lsection_abbrev:
+.byte   1   @ Abbreviation Code
+.byte   17  @ DW_TAG_compile_unit
+.byte   1   @ DW_CHILDREN_yes
+.byte   37  @ DW_AT_producer
+.byte   37  @ DW_FORM_strx1
+.byte   19  @ DW_AT_language
+.byte   5   @ DW_FORM_data2
+.byte   3   @ DW_AT_name
+.byte   14  @ 

[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2020-01-24 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1792498 , @labath wrote:

> In D70840#1791292 , @mstorsjo wrote:
>
> > And irrespectively if the ArchSpec vs Architecture design, can you (either 
> > of you) comment on the updated form of the patch?
>
>
> The code still seems somewhat schizophrenic to me. :/ The line tables are 
> fixed up super late, but DW_AT_low_pc is adjusted very early. The line table 
> adjustment happens even after sorting, which means the fixup could alter the 
> sort order. It probably wouldn't matter in practice, as everything would just 
> get decremented by one, but it still seems like a bad design. And adjusting 
> the low_pc so early will complicate the move to the llvm dwarf parser.
>
> I think I'd most prefer some middle ground where the fixup happens after the 
> lowest extraction layers are finished, but before the data hits the "generic" 
> code. It's possible that no such place exists right now, but it might be 
> possible to create something with a bit of refactoring...


I tried to revisit this a bit now. Thanks to D72920 
, some of the more problematic cases went 
away, and I tried to trace all callers of the relevant methods and moving the 
fixups into them. Now the DWARFDebugInfoEntry class is no longer touched at 
all. I also tried to move fixups to before sorting.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-20 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1792498 , @labath wrote:

> In D70840#1791292 , @mstorsjo wrote:
>
> > And irrespectively if the ArchSpec vs Architecture design, can you (either 
> > of you) comment on the updated form of the patch?
>
>
> The code still seems somewhat schizophrenic to me. :/ The line tables are 
> fixed up super late,


This bit was based on suggestions by @clayborg here; it could be moved earlier 
as well, there's not much restrictions on where that one is done.

> but DW_AT_low_pc is adjusted very early. The line table adjustment happens 
> even after sorting, which means the fixup could alter the sort order. It 
> probably wouldn't matter in practice, as everything would just get 
> decremented by one, but it still seems like a bad design. And adjusting the 
> low_pc so early will complicate the move to the llvm dwarf parser.
> 
> I think I'd most prefer some middle ground where the fixup happens after the 
> lowest extraction layers are finished, but before the data hits the "generic" 
> code. It's possible that no such place exists right now, but it might be 
> possible to create something with a bit of refactoring...

The main issue with DW_AT_low_pc/DW_AT_high_pc, is that they're used in many 
different places - it's not just a straightforward process of "read data from 
storage into intermediate representations", but there's different accessor 
methods that all end up going to the original source, fetching the data. 
`GetDIENamesAndRanges` is one of the methods that reads data and outputs arrays 
and ranges, and those could be considered to handle later after extracting. 
`GetAttributeAddressRange` is one that is used in different contexts, where it 
might be possible to move the fixing up to a higher layer. But then there's 
cases like `LookupAddress` which seems to fetch the raw data from storage, just 
to do `if ((lo_pc <= address) && (address < hi_pc))` to see if the info for the 
desired address is found.

But I guess it could be moved a few steps further out at least; for 
`LookupAddress` it could be done at the very end after fetching all the 
addresses, before doing the comparison, for `GetAttributeAddressRange` (which 
also is used by `LookupAddress`) it could be done right before returning, and 
with `GetDIENamesAndRanges` it could even be done by the caller.

The problem with the interface of the `DWARFDebugInfoEntry` class is that 
there's a lot of public methods, that provide access to data both at a high and 
low level of abstraction. Currently not all of the public methods are called 
from outside of the object, but the current implementation (where it's done 
immediately after loading values) was an attempt to safeguard all possible 
access patterns, even the ones not currently exercised. If we only do it in 
certain places, we could later end up with new code accessing the lower level 
methods, bypassing the fixups.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-20 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1791292 , @mstorsjo wrote:

> And irrespectively if the ArchSpec vs Architecture design, can you (either of 
> you) comment on the updated form of the patch?


The code still seems somewhat schizophrenic to me. :/ The line tables are fixed 
up super late, but DW_AT_low_pc is adjusted very early. The line table 
adjustment happens even after sorting, which means the fixup could alter the 
sort order. It probably wouldn't matter in practice, as everything would just 
get decremented by one, but it still seems like a bad design. And adjusting the 
low_pc so early will complicate the move to the llvm dwarf parser.

I think I'd most prefer some middle ground where the fixup happens after the 
lowest extraction layers are finished, but before the data hits the "generic" 
code. It's possible that no such place exists right now, but it might be 
possible to create something with a bit of refactoring...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-20 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

The main reason I don't want the Architecture class in ArchSpec is to keep the 
dependency graph of lldb-server clean (it's not very clean to begin with, but 
at least I want to avoid making it worse). lldb-server should not know anything 
about the Target class (that's a liblldb concept), but it has plenty of reasons 
to play around with ArchSpecs. If ArchSpec contained an Architecture instance, 
then lldb-server would end up transitively 
(Architecture::GetBreakableLoadAddress and friends) depending on Target. That's 
why I think we need two repositories for architecture-specific code. ArchSpec 
could be the home for the shared liblldb+lldb-server stuff (which will end up 
being low level code only, because lldb-server is low-level). The home for the 
second stuff might as well be the Architecture class. The &~1 logic seems 
sufficiently low-level to fall into the first category -- even though 
lldb-server does not seem to need it right now, it's not hard to imagine it 
needing that in the future.

A secondary issue is the inheritance. ArchSpec is currently a simple, (mostly) 
trivially copyable class. Introducing inheritance would complicate that. It's 
true that virtual dispatch can be cleaner than a switch, but I think this only 
applies if the logic inside the switch cases is complicated. And even then the 
code can be made relatively clean with a bit of care (and llvm is does not seem 
to be afraid of switches in general).  In this particular case, I think that 
the logic is actually so simple that the virtual dispatch would only obscure 
what is going on.

If, in the future, we end up needing to put some more complicated code here, we 
can revisit this decision (in that case, I'd probably argue for creating some 
sort of a different entity to hold this functionality), but for now, I don't 
see a reason to complicate the ArchSpec design on account of this.

I can see how, from the perspective of someone maintaining a downstream target, 
having everything architecture-specific in a single place would be appealing, 
but I wouldn't want to put this objective too high on the priority list. 
Otherwise, I fear that this entity (whatever it would end up being called) will 
end up being a central nexus for everything in lldb. For example there's a lot 
of arm-specific code in ObjectFileELF. Some of that could be restructured to be 
independent of elf, but doing that for everything would be tricky, and so we 
may end up with everything depending on ELF.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-19 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added a comment.

In D70840#1783351 , @labath wrote:

> I am strongly opposed to ArchSpec owing an Architecture object. The latter is 
> far more complicated -- it reads bytes from target memory and disassembles 
> them -- whereas ArchSpec just returns a bunch of constants. If anything, it 
> should be the other way around. That way the code which just fiddles with 
> triples does not need to depend on the whole world.


I know part of this re-org was to keep the ArchSpec class from accessing any 
Target stuff to keep each directory with source code from accessing stuff in 
another tree. So from that perspective I can see your objection. I don't think 
it adds anything at all to the code by splitting this up other than achieving 
this separation though. It makes sense to ask an architecture to do 
architecture specific things. No sure why they need to be split IMHO.

> Having an Architecture instance in the Module object (to ensure it's 
> independent of the target) seems /ok/, though I am not really convinced that 
> it is needed, as the &~1 logic seems like a good candidate for ArchSpec 
> (which already knows a lot about arm vs. thumb etc.

I would rather keep all architecture specific code in the Architecture plug-ins 
somehow. It keeps code modifications clearer as only on file changes that is 
architecture specific. If we inline stuff into ArchSpec we end up with a big 
switch statement for the core and every adds their code to the same function.

So not sure what the right solution is here. I still like folding the 
Architecture into ArchSpec myself, but I am open to any and all opinions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-19 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

@clayborg - can you follow up on @labath's reply here?

And irrespectively if the ArchSpec vs Architecture design, can you (either of 
you) comment on the updated form of the patch?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-13 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

I am strongly opposed to ArchSpec owing an Architecture object. The latter is 
far more complicated -- it reads bytes from target memory and disassembles them 
-- whereas ArchSpec just returns a bunch of constants. If anything, it should 
be the other way around. That way the code which just fiddles with triples does 
not need to depend on the whole world.

Having an Architecture instance in the Module object (to ensure it's 
independent of the target) seems /ok/, though I am not really convinced that it 
is needed, as the &~1 logic seems like a good candidate for ArchSpec (which 
already knows a lot about arm vs. thumb etc.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-12 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1782951 , @clayborg wrote:

> So my idea would be:
>  1 - add all virtual functions to from lldb_private::Architecture as normal 
> members of ArchSpec. 
>  2 - have ArchSpec grap the the lldb_private::Architecture using info in the 
> ArchSpec when it is needed for methods added in step 1 and still call through 
> to the lldb_private::Architecture plug-ins


So ArchSpec would have an Architecture* member which is lazy loaded when needed?

A couple details come to mind:

- This would have to be a shared_ptr, to keep ArchSpec easily copyable
- As those methods where this is used are const, should the shared_ptr 
Architecture be made mutable? Or skip the const on those methods?
- There's a few places (in DWARFDebugInfoEntry) where I fetch a new copy of an 
ArchSpec (almost) each time I use it. In those cases, we'd end up with a new 
lazy load of the plugin each time (if the originating object hasn't needed 
loading the plugin yet). So perhaps it's best to always load it (like in 
ArchSpec::SetTriple) and similar ones, so that there's always an initialized 
shared_ptr that can be copied along cheaply?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-12 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added a comment.

Sorry for the delay. I have had medical issues going on for the past month.

The hard part about Architecture vs ArchSpec is the architecture of the target 
doesn't always exactly match the arch of each module. But I agree that we need 
to be able to do more with ArchSpec level classes.

If we can get the Architecture from the inside the ArchSpec.cpp file only and 
always work through the ArchSpec class (an possible rename it to Architecture 
after get rid of the old Architecture?). There is nothing that seems target 
specific about the current lldb_private::Architecture class.

So my idea would be:
1 - add all virtual functions to from lldb_private::Architecture as normal 
members of ArchSpec. 
2 - have ArchSpec grap the the lldb_private::Architecture using info in the 
ArchSpec when it is needed for methods added in step 1 and still call through 
to the lldb_private::Architecture plug-ins 
3 - update all code that was directly accessing lldb_private::Architecture from 
the target and have them get the ArchSpec from the target instead and use that 
to make the same calls
4 - remove "Architecture *GetArchitecturePlugin()" from Target.h

My reasoning is there is no reason we need to access lldb_private::Architecture 
from a target only.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-12 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo updated this revision to Diff 233586.
mstorsjo edited the summary of this revision.
mstorsjo added a comment.

Added ArchSpec::GetOpcodeLoadAddress() (didn't refactor Architecture to use it 
yet, so this is still duplicated code with the ArchitectureArm and 
ArchitectureMips plugins, but posting for feedback).

Changed the existing DWARFCallFrameInfo to use it for clearing the lowest bit, 
and added a Finalize pass to both LineTable and DWARFDebugAranges for fixing it 
up.

DWARFDebugInfoEntry still has the calls scattered around, as there's many 
different call paths, and once parsed, the data ends up in many different 
places.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840

Files:
  lldb/include/lldb/Symbol/LineTable.h
  lldb/include/lldb/Utility/ArchSpec.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
  lldb/source/Symbol/DWARFCallFrameInfo.cpp
  lldb/source/Symbol/LineTable.cpp
  lldb/source/Utility/ArchSpec.cpp
  lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s

Index: lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
===
--- /dev/null
+++ lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
@@ -0,0 +1,183 @@
+# Test that a linked windows executable, with a thumb bit in many address
+# fields, gets the thumb bit stripped out from addresses.
+
+# If the thumb bit isn't stripped out from subprogram ranges, 0x401006 is
+# associated with the function "entry", while it actually is the start of
+# the function "other".
+# If the thumb bit isn't stripped out from line tables, the LineEntry
+# points to the wrong line.
+
+# REQUIRES: lld, arm
+
+# RUN: llvm-mc -triple armv7-windows-gnu %s -filetype=obj > %t.o
+# RUN: lld-link %t.o -out:%t.exe -debug:dwarf -entry:entry -subsystem:console -lldmingw
+# RUN: %lldb %t.exe -o "image lookup -v -a 0x401006" -b | FileCheck %s
+
+# CHECK-LABEL: image lookup -v -a 0x401006
+# CHECK: Function: {{.*}}, name = "other", range = [0x00401006-0x0040100c)
+# CHECK: LineEntry: [0x00401006-0x0040100a): /path/to/src/dwarf-thumb.c:7:12
+
+.text
+.syntax unified
+.file   "dwarf-thumb.c"
+.file   1 "" "dwarf-thumb.c"
+.def entry;
+.scl2;
+.type   32;
+.endef
+.globl  entry   @ -- Begin function entry
+.p2align1
+.code   16  @ @entry
+.thumb_func
+entry:
+.Lfunc_begin0:
+.loc1 2 0   @ dwarf-thumb.c:2:0
+.cfi_sections .debug_frame
+.cfi_startproc
+.loc1 4 9 prologue_end  @ dwarf-thumb.c:4:9
+mov r1, r0
+.Ltmp0:
+b   other
+.Ltmp1:
+.Lfunc_end0:
+.cfi_endproc
+@ -- End function
+.def other;
+.scl2;
+.type   32;
+.endef
+.globl  other   @ -- Begin function other
+.p2align1
+.code   16  @ @other
+.thumb_func
+other:
+.Lfunc_begin1:
+.loc1 6 0   @ dwarf-thumb.c:6:0
+.cfi_startproc
+.loc1 7 12 prologue_end @ dwarf-thumb.c:7:12
+add.w   r0, r1, r1, lsl #1
+.loc1 7 2 is_stmt 0 @ dwarf-thumb.c:7:2
+bx  lr
+.Ltmp2:
+.Lfunc_end1:
+.cfi_endproc
+@ -- End function
+.section.debug_str,"dr"
+.Linfo_string:
+.Linfo_string1:
+.asciz  "dwarf-thumb.c"
+.Linfo_string2:
+.asciz  "/path/to/src"
+.Linfo_string3:
+.asciz  "other"
+.Linfo_string6:
+.asciz  "entry"
+.section.debug_loc,"dr"
+.Lsection_debug_loc:
+.Ldebug_loc0:
+.long   .Lfunc_begin0-.Lfunc_begin0
+.long   .Ltmp0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   80  @ DW_OP_reg0
+.long   .Ltmp0-.Lfunc_begin0
+.long   .Lfunc_end0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   81  @ DW_OP_reg1
+.long   0
+.long   0
+.section.debug_abbrev,"dr"
+.Lsection_abbrev:
+.byte   1   @ Abbreviation Code
+.byte   17  @ DW_TAG_compile_unit
+.byte   1   @ DW_CHILDREN_yes
+.byte   37  @ DW_AT_producer
+.byte   37  @ DW_FORM_strx1
+.byte   19  @ 

[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-12 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1779077 , @mstorsjo wrote:

> In D70840#1763639 , @labath wrote:
>
> > - I think we ought to have some kind of a utility function to hold the 
> > logic for the `&~1` stuff. there is something like that in 
> > Architecture::GetOpcodeLoadAddress. The Architecture class is mostly 
> > independent of a target, so we may be able create one instance for each 
> > module or symbol file, but that feels quite heavy. I might even go for 
> > putting something like that into the ArchSpec class. The raison d'être of 
> > the Architecture class was to evict some heavy code out of ArchSpec -- this 
> > was ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so 
> > still don't thing it belongs in ArchSpec, but the `&~1` thingy seems like 
> > it could find a place there.)
>
>
> I'm trying to dig into this now even if @clayborg hasn't commented on your 
> suggestions.
>
> This was moved out from the Target class in D48623 
> /rG7aa9e7bc5739f468143b7b97060a9fbbfb94c6d2 
> . In 
> addition to GetOpcodeLoadAddress and GetCallableLoadAddress, there's also 
> GetBreakableLoadAddress in ArchitectureMips, which is a rather big piece of 
> code as well, so I guess that can't be motivated to be moved, but as you 
> write, the `&~1` bit in itself might be ok.


Yes `GetBreakableLoadAddress` is definitely too big (and a huge hack too)...

> If we add a GetOpcodeLoadAddress in ArchSpec, which does `&~1` on mips and 
> arm - should we try to call this from the Architecture plugins (we don't have 
> an ArchSpec stored there right now), or just see it as tolerable and 
> justifiable duplication?

I think it's fine to have an ArchSpec member in the "Architecture" plugin, and 
then implement the forwarding in the base class. As an alternative, we could 
look at the existing callers of this function, and see if they have an ArchSpec 
around so they could call this directly.

> The existing GetOpcodeLoadAddress takes an AddressClass as well, should we 
> keep that, or just make a plain GetOpcodeLoadAddress that assumes that the 
> provided address is a code address?

I /think/ keeping the address class argument would be less confusing, but I am 
open to other ideas too...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-11 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1763639 , @labath wrote:

> - I think we ought to have some kind of a utility function to hold the logic 
> for the `&~1` stuff. there is something like that in 
> Architecture::GetOpcodeLoadAddress. The Architecture class is mostly 
> independent of a target, so we may be able create one instance for each 
> module or symbol file, but that feels quite heavy. I might even go for 
> putting something like that into the ArchSpec class. The raison d'être of the 
> Architecture class was to evict some heavy code out of ArchSpec -- this was 
> ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so still 
> don't thing it belongs in ArchSpec, but the `&~1` thingy seems like it could 
> find a place there.)


I'm trying to dig into this now even if @clayborg hasn't commented on your 
suggestions.

This was moved out from the Target class in D48623 
/rG7aa9e7bc5739f468143b7b97060a9fbbfb94c6d2 
. In 
addition to GetOpcodeLoadAddress and GetCallableLoadAddress, there's also 
GetBreakableLoadAddress in ArchitectureMips, which is a rather big piece of 
code as well, so I guess that can't be motivated to be moved, but as you write, 
the `&~1` bit in itself might be ok.

If we add a GetOpcodeLoadAddress in ArchSpec, which does `&~1` on mips and arm 
- should we try to call this from the Architecture plugins (we don't have an 
ArchSpec stored there right now), or just see it as tolerable and justifiable 
duplication? The existing GetOpcodeLoadAddress takes an AddressClass as well, 
should we keep that, or just make a plain GetOpcodeLoadAddress that assumes 
that the provided address is a code address?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-06 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1767611 , @mstorsjo wrote:

> In D70840#1767049 , @labath wrote:
>
> > In D70840#1767028 , @mstorsjo 
> > wrote:
> >
> > > In D70840#1765373 , @clayborg 
> > > wrote:
> > >
> > > > So some background on how address masks are handled in LLDB:
> > > >
> > > > Typically the way we have tried to take care of the extra Thumb bit for 
> > > > ARM using:
> > > >
> > > >   lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool 
> > > > is_indirect = false) const;
> > > >   lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass 
> > > > addr_class = AddressClass::eInvalid) const;
> > > >   
> > > >
> > > > The first will add the extra bit to an address if needed. The latter 
> > > > will strip the bit if needed. This does require a target though and the 
> > > > target uses the "Architecture" class for ARM to do the work of using 
> > > > the mask. Not sure if we want to try to get an architecture class and 
> > > > use that here for stripping the bit instead of using an address mask?
> > >
> > >
> > > So, where do I get hold of an Architecture* object instance from within 
> > > the relevant contexts (within SymbolFileDWARF, where we have a reference 
> > > to the object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp 
> > > where we have existing code that does the same.
> >
> >
> > Well... that's the tricky part (which I've tried to allude to in  
> > D70840#1763639 . Currently the 
> > only thing holding an architecture object is the target class, but since 
> > one of the goals of the SymbolFile stuff is to be independent of any 
> > particular target, you cannot easily get hold of that. That is why I tried 
> > to steer this towards having this in the ArchSpec class. If we don't want 
> > that, we'll probably have to create one new Architecture instance local to 
> > each Module object...
>
>
> Ah, sorry for overlooking those parts of your comment. @clayborg - what do 
> you think of @labath's suggestions here?


Ping @clayborg - can you comment on @labath's suggestions?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-03 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1767049 , @labath wrote:

> In D70840#1767028 , @mstorsjo wrote:
>
> > In D70840#1765373 , @clayborg 
> > wrote:
> >
> > > So some background on how address masks are handled in LLDB:
> > >
> > > Typically the way we have tried to take care of the extra Thumb bit for 
> > > ARM using:
> > >
> > >   lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool 
> > > is_indirect = false) const;
> > >   lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass 
> > > addr_class = AddressClass::eInvalid) const;
> > >   
> > >
> > > The first will add the extra bit to an address if needed. The latter will 
> > > strip the bit if needed. This does require a target though and the target 
> > > uses the "Architecture" class for ARM to do the work of using the mask. 
> > > Not sure if we want to try to get an architecture class and use that here 
> > > for stripping the bit instead of using an address mask?
> >
> >
> > So, where do I get hold of an Architecture* object instance from within the 
> > relevant contexts (within SymbolFileDWARF, where we have a reference to the 
> > object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we 
> > have existing code that does the same.
>
>
> Well... that's the tricky part (which I've tried to allude to in  
> D70840#1763639 . Currently the only 
> thing holding an architecture object is the target class, but since one of 
> the goals of the SymbolFile stuff is to be independent of any particular 
> target, you cannot easily get hold of that. That is why I tried to steer this 
> towards having this in the ArchSpec class. If we don't want that, we'll 
> probably have to create one new Architecture instance local to each Module 
> object...


Ah, sorry for overlooking those parts of your comment. @clayborg - what do you 
think of @labath's suggestions here?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-03 Thread Pavel Labath via Phabricator via lldb-commits
labath added a comment.

In D70840#1767028 , @mstorsjo wrote:

> In D70840#1765373 , @clayborg wrote:
>
> > So some background on how address masks are handled in LLDB:
> >
> > Typically the way we have tried to take care of the extra Thumb bit for ARM 
> > using:
> >
> >   lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool 
> > is_indirect = false) const;
> >   lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class 
> > = AddressClass::eInvalid) const;
> >   
> >
> > The first will add the extra bit to an address if needed. The latter will 
> > strip the bit if needed. This does require a target though and the target 
> > uses the "Architecture" class for ARM to do the work of using the mask. Not 
> > sure if we want to try to get an architecture class and use that here for 
> > stripping the bit instead of using an address mask?
>
>
> So, where do I get hold of an Architecture* object instance from within the 
> relevant contexts (within SymbolFileDWARF, where we have a reference to the 
> object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have 
> existing code that does the same.


Well... that's the tricky part (which I've tried to allude to in  
D70840#1763639 . Currently the only 
thing holding an architecture object is the target class, but since one of the 
goals of the SymbolFile stuff is to be independent of any particular target, 
you cannot easily get hold of that. That is why I tried to steer this towards 
having this in the ArchSpec class. If we don't want that, we'll probably have 
to create one new Architecture instance local to each Module object...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-03 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1765373 , @clayborg wrote:

> So some background on how address masks are handled in LLDB:
>
> Typically the way we have tried to take care of the extra Thumb bit for ARM 
> using:
>
>   lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool 
> is_indirect = false) const;
>   lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = 
> AddressClass::eInvalid) const;
>   
>
> The first will add the extra bit to an address if needed. The latter will 
> strip the bit if needed. This does require a target though and the target 
> uses the "Architecture" class for ARM to do the work of using the mask. Not 
> sure if we want to try to get an architecture class and use that here for 
> stripping the bit instead of using an address mask?


So, where do I get hold of an Architecture* object instance from within the 
relevant contexts (within SymbolFileDWARF, where we have a reference to the 
object file)? Also within source/Symbol/DWARFCallFrameInfo.cpp where we have 
existing code that does the same.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread David Blaikie via Phabricator via lldb-commits
dblaikie added a comment.

In D70840#1765876 , @mstorsjo wrote:

> In D70840#1765219 , @labath wrote:
>
> > debug_aranges is a sort of a lookup table for speeding up address->compile 
> > unit searches. llvm does not generate it by default since, and I think the 
> > reason is that you can usually get the same kind of data from the 
> > DW_AT_ranges attribute of the compile unit.
>
>
> Ok, thanks for the explanation!
>
> On the topic of debug_aranges (but unrelated to this thread), I've been 
> notified about a project, drmingw, a postmortem debugger for the mingw 
> environment (I think), which stumbles on code built by llvm exactly due to 
> the lack of debug_aranges: 
> https://github.com/jrfonseca/drmingw/issues/42#issuecomment-526850588 They 
> speculate that it almost even would be mandated by the dwarf spec.


Yeah, I've had some conversations with folks on a couple of different things 
that have been curious about Clang/LLVM's lack of debug_aranges - my general 
take on it (as one of the people involved in turning off debug_aranges by 
default in LLVM's output - it can still be enabled by -gdwarf-aranges) is that 
they add a significant amount of object size for little benefit compared to the 
ranges on CU DIEs themselves. Clients who read input without aranges for a 
given CU should look in the CU DIE to see if there are ranges there - it should 
add very little overhead to consumers & save some significant debug info size 
(especially for split-dwarf in object sizes (where the relocations still exist 
& take up a fair bit of space - especially with compression enabled, where 
relocations are not compressed)).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo marked an inline comment as done.
mstorsjo added a comment.

In D70840#1765219 , @labath wrote:

> debug_aranges is a sort of a lookup table for speeding up address->compile 
> unit searches. llvm does not generate it by default since, and I think the 
> reason is that you can usually get the same kind of data from the 
> DW_AT_ranges attribute of the compile unit.


Ok, thanks for the explanation!

On the topic of debug_aranges (but unrelated to this thread), I've been 
notified about a project, drmingw, a postmortem debugger for the mingw 
environment (I think), which stumbles on code built by llvm exactly due to the 
lack of debug_aranges: 
https://github.com/jrfonseca/drmingw/issues/42#issuecomment-526850588 They 
speculate that it almost even would be mandated by the dwarf spec.

> Overall, I think this needs to be handled in DWARF code. That may even make 
> sense, since PDB may not suffer from the same problem (does it?)

Not sure actually, I'd have to check. (FWIW, llvm doesn't support generating 
codeview debug info for PDB for ARM, only x86 and aarch64, but I could check 
with MSVC tools.)




Comment at: lldb/include/lldb/Symbol/LineTable.h:333
+
+  bool m_clear_address_zeroth_bit = false;
 };

clayborg wrote:
> Might be nice to let the line table parse itself first, and then in a post 
> production step clean up all the addresses? Maybe
> 
> ```
> void LineTable::Finalize(Architecture *arch);
> ```
> 
> Then we let the architecture plug-in handle any stripping using:
> 
> ```
> lldb::addr_t Architecture::GetOpcodeLoadAddress(lldb::addr_t addr, 
> AddressClass addr_class) const;
> ```
> 
> The ArchitectureArm plugin does this:
> 
> ```
> addr_t ArchitectureArm::GetOpcodeLoadAddress(addr_t opcode_addr,
>  AddressClass addr_class) const {
>   switch (addr_class) {
>   case AddressClass::eData:
>   case AddressClass::eDebug:
> return LLDB_INVALID_ADDRESS;
>   default: break;
>   }
>   return opcode_addr & ~(1ull);
> }
> ```
> 
> 
Thanks for all the pointers and the explanations! I'll try to have a look into 
all of this in a few days.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread Adrian McCarthy via Phabricator via lldb-commits
amccarth resigned from this revision.
amccarth added a comment.

I'm following along, but I don't think I have enough domain knowledge to 
qualify as a reviewer.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added a comment.

So some background on how address masks are handled in LLDB:

Typically the way we have tried to take care of the extra Thumb bit for ARM 
using:

  lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect 
= false) const;
  lldb::addr_t GetOpcodeLoadAddress(Target *target, AddressClass addr_class = 
AddressClass::eInvalid) const;

The first will add the extra bit to an address if needed. The latter will strip 
the bit if needed. This does require a target though and the target uses the 
"Architecture" class for ARM to do the work of using the mask. Not sure if we 
want to try to get an architecture class and use that here for stripping the 
bit instead of using an address mask?

Also, any lldb_private::Address can be asked for its address class:

  AddressClass Address::GetAddressClass() const;

This will return "eCode" for ARM code and "eCodeAlternateISA" for Thumb code. 
This is resolved by the ObjectFile::GetAddressClass:

  /// Get the address type given a file address in an object file.
  ///
  /// Many binary file formats know what kinds This is primarily for ARM
  /// binaries, though it can be applied to any executable file format that
  /// supports different opcode types within the same binary. ARM binaries
  /// support having both ARM and Thumb within the same executable container.
  /// We need to be able to get \return
  /// The size of an address in bytes for the currently selected
  /// architecture (and object for archives). Returns zero if no
  /// architecture or object has been selected.
  virtual AddressClass GetAddressClass(lldb::addr_t file_addr);

So currently no code in LLDB tries to undo the address masks that might be in 
the object file or debug info, and we take care of it after the fact. Early in 
the ARM days there used to be extra symbols that were added to the symbol table 
with names like "$a" for ARM, "$t" for Thumb and "$d" for data. There would be 
multiple of these symbols in an ordered vector of symbols that would create a 
CPU map. This was early in the ARM days before the branch instruction would 
watch for bit zero. Later ARM architectures started using bit zero to indicate 
which mode to put the processor in instead of using an explicit "branch to arm" 
or "branch to thumb" instructions. When this new stuff came out bit zero 
started showing up in symbol tables. So the current code allows for either the 
old style (CPU map in symbol table with $a $t and $d symbols) and the new style 
(bit zero set and no CPU map).




Comment at: lldb/include/lldb/Symbol/LineTable.h:333
+
+  bool m_clear_address_zeroth_bit = false;
 };

Might be nice to let the line table parse itself first, and then in a post 
production step clean up all the addresses? Maybe

```
void LineTable::Finalize(Architecture *arch);
```

Then we let the architecture plug-in handle any stripping using:

```
lldb::addr_t Architecture::GetOpcodeLoadAddress(lldb::addr_t addr, AddressClass 
addr_class) const;
```

The ArchitectureArm plugin does this:

```
addr_t ArchitectureArm::GetOpcodeLoadAddress(addr_t opcode_addr,
 AddressClass addr_class) const {
  switch (addr_class) {
  case AddressClass::eData:
  case AddressClass::eDebug:
return LLDB_INVALID_ADDRESS;
  default: break;
  }
  return opcode_addr & ~(1ull);
}
```





Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp:19-20
 // Constructor
-DWARFDebugAranges::DWARFDebugAranges() : m_aranges() {}
+DWARFDebugAranges::DWARFDebugAranges(dw_addr_t addr_mask)
+: m_aranges(), m_addr_mask(addr_mask) {}
 

Use Architecture plug-in instead of hard coded mask.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h:53
   RangeToDIE m_aranges;
+  dw_addr_t m_addr_mask;
 };

See comment in LineTable.h above.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp:41-42
 
-  m_cu_aranges_up = std::make_unique();
+  m_cu_aranges_up =
+  std::make_unique(m_dwarf.GetAddressMask());
   const DWARFDataExtractor _aranges_data =

Use Architecture plug-in instead of hard coded mask.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:258
 case DW_AT_low_pc:
-  lo_pc = form_value.Address();
+  lo_pc = form_value.Address() & m_addr_mask;
 

Use Architecture plug-in instead of hard coded mask.



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:273
   form_value.Form() == DW_FORM_GNU_addr_index) {
-hi_pc = form_value.Address();
+hi_pc = form_value.Address() & m_addr_mask;
   } else {

ditto...



Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:44-45
 
+  void 

[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread Greg Clayton via Phabricator via lldb-commits
clayborg added a comment.

So the main goal here is to let the Architecture plug-in handle the address 
masking in all places and make sure there is no code like:

  if (ArchSpec arch = m_objfile_sp->GetArchitecture()) {
if (arch.GetTriple().getArch() == llvm::Triple::arm ||
arch.GetTriple().getArch() == llvm::Triple::thumb)
  return ~1ull;
  }

anywhere in the codebase. Otherwise when we add a new architecture that 
requires bit stripping/adding, we end up having to change many locations in the 
code (2 of which were added with this patch). So use the Architecture plug-in 
to do this and we just need to edit the Architecture plug-in code for each 
architecture.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread David Blaikie via Phabricator via lldb-commits
dblaikie added a subscriber: rnk.
dblaikie added a comment.

In D70840#1765219 , @labath wrote:

> Throwing some additional dwarf people in here...
>
> In D70840#1763750 , @mstorsjo wrote:
>
> > In D70840#1763705 , @mstorsjo 
> > wrote:
> >
> > > In D70840#1763639 , @labath 
> > > wrote:
> > >
> > > > Yeah, this is going to be tricky... I don't really know what's the 
> > > > right way to do this, but here are the thoughts I have on this so far:
> > > >
> > > > - The new DWARFDebugInfoEntry member is going to substantially increase 
> > > > our memory footprint -- that's a non-starter
> > >
> > >
> > > Ok, I feared that class was one where the size is critical yeah...
> > >
> > > > - I don't like the inconsistency where all addresses are demangled in 
> > > > the DWARF code, except the line tables, which are in the "generic" code
> > >
> > > Yup. The reason for doing it like this was as I tried to find the most 
> > > narrow interface where I could catch all of them.
> >
> >
> > Is there any corresponding place in generic code where one could do the 
> > same operation on the addresses that comes from pc_lo/pc_hi ranges from 
> > .debug_info and .debug_aranges (not familiar with this section and when 
> > it's generated etc), if that would end up touching fewer places? The 
> > existing predecent in DWARFCallFrameInfo::GetFDEIndex is dwarf specific, 
> > but in generic code outside of the dwarf symbolfile plugin.
>
>
> debug_aranges is a sort of a lookup table for speeding up address->compile 
> unit searches. llvm does not generate it by default since, and I think the 
> reason is that you can usually get the same kind of data from the 
> DW_AT_ranges attribute of the compile unit. I don't think you would be able 
> to handle that in generic code, as debug_aranges does not make it's way into 
> generic code.
>
> Overall, I think this needs to be handled in DWARF code. That may even make 
> sense, since PDB may not suffer from the same problem (does it?)
>
> TBH, the `m_addr_mask` member issue is not that hard to resolve -- all 
> DWARFDebugInfoEntry functions get a DWARFUnit argument (otherwise they 
> wouldn't be able to do anything). Theoretically, we could store the mask 
> there.
>
> However, the question on my mind is what does that say about the llvm<->lldb 
> dwarf parser convergence (one of our long term goals is to delete the lldb 
> dwarf parser altogether, leaving just the high level lldb-specific stuff). 
> Adding mask handling code low into the dwarf parser would go against that 
> goal, as there is no equivalent llvm functionality.
>
> One possible solution would be to try to add equivalent llvm functionality -- 
> it sounds like something like this is needed there too, as all llvm tools (I 
> don't know if there's anything besides llvm-symbolizer) will probably not 
> work without it. (If they do, it would be interesting to figure out how they 
> manage that.)


Yeah, I don't know much about the ARM tagged pointer stuff, but if the tags 
don't appear in the return addresses in a stack trace & thus the addresses 
you'd pass to llvm-symbolizer then I think it'd make sense to implement it 
somewhere in LLVM's libDebugInfoDWARF (& yeah, don't know about PDB either, 
perhaps @rnk does). If there is no common code between debug_aranges and other 
address parsing -perhaps there should be? a filter of some kind that could be 
used for all addresses being read?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-12-02 Thread Pavel Labath via Phabricator via lldb-commits
labath added reviewers: JDevlieghere, aprantl.
labath added a subscriber: dblaikie.
labath added a comment.

Throwing some additional dwarf people in here...

In D70840#1763750 , @mstorsjo wrote:

> In D70840#1763705 , @mstorsjo wrote:
>
> > In D70840#1763639 , @labath wrote:
> >
> > > Yeah, this is going to be tricky... I don't really know what's the right 
> > > way to do this, but here are the thoughts I have on this so far:
> > >
> > > - The new DWARFDebugInfoEntry member is going to substantially increase 
> > > our memory footprint -- that's a non-starter
> >
> >
> > Ok, I feared that class was one where the size is critical yeah...
> >
> > > - I don't like the inconsistency where all addresses are demangled in the 
> > > DWARF code, except the line tables, which are in the "generic" code
> >
> > Yup. The reason for doing it like this was as I tried to find the most 
> > narrow interface where I could catch all of them.
>
>
> Is there any corresponding place in generic code where one could do the same 
> operation on the addresses that comes from pc_lo/pc_hi ranges from 
> .debug_info and .debug_aranges (not familiar with this section and when it's 
> generated etc), if that would end up touching fewer places? The existing 
> predecent in DWARFCallFrameInfo::GetFDEIndex is dwarf specific, but in 
> generic code outside of the dwarf symbolfile plugin.


debug_aranges is a sort of a lookup table for speeding up address->compile unit 
searches. llvm does not generate it by default since, and I think the reason is 
that you can usually get the same kind of data from the DW_AT_ranges attribute 
of the compile unit. I don't think you would be able to handle that in generic 
code, as debug_aranges does not make it's way into generic code.

Overall, I think this needs to be handled in DWARF code. That may even make 
sense, since PDB may not suffer from the same problem (does it?)

TBH, the `m_addr_mask` member issue is not that hard to resolve -- all 
DWARFDebugInfoEntry functions get a DWARFUnit argument (otherwise they wouldn't 
be able to do anything). Theoretically, we could store the mask there.

However, the question on my mind is what does that say about the llvm<->lldb 
dwarf parser convergence (one of our long term goals is to delete the lldb 
dwarf parser altogether, leaving just the high level lldb-specific stuff). 
Adding mask handling code low into the dwarf parser would go against that goal, 
as there is no equivalent llvm functionality.

One possible solution would be to try to add equivalent llvm functionality -- 
it sounds like something like this is needed there too, as all llvm tools (I 
don't know if there's anything besides llvm-symbolizer) will probably not work 
without it. (If they do, it would be interesting to figure out how they manage 
that.)

Another possibility might be to implement this in the higher levels of the 
dwarf code (the parts that are likely to stay in lldb forever)...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-11-29 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1763705 , @mstorsjo wrote:

> In D70840#1763639 , @labath wrote:
>
> > Yeah, this is going to be tricky... I don't really know what's the right 
> > way to do this, but here are the thoughts I have on this so far:
> >
> > - The new DWARFDebugInfoEntry member is going to substantially increase our 
> > memory footprint -- that's a non-starter
>
>
> Ok, I feared that class was one where the size is critical yeah...
>
> > - I don't like the inconsistency where all addresses are demangled in the 
> > DWARF code, except the line tables, which are in the "generic" code
>
> Yup. The reason for doing it like this was as I tried to find the most narrow 
> interface where I could catch all of them.


Is there any corresponding place in generic code where one could do the same 
operation on the addresses that comes from pc_lo/pc_hi ranges from .debug_info 
and .debug_aranges (not familiar with this section and when it's generated 
etc), if that would end up touching fewer places? The existing predecent in 
DWARFCallFrameInfo::GetFDEIndex is dwarf specific, but in generic code outside 
of the dwarf symbolfile plugin.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-11-29 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo updated this revision to Diff 231517.
mstorsjo edited the summary of this revision.
mstorsjo added a comment.

Converted the test to .s form, using lld, split out the ObjectFile changes to 
D70848  (which this change now depends on for 
the tests).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840

Files:
  lldb/include/lldb/Symbol/LineTable.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
  lldb/source/Symbol/LineTable.cpp
  lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s

Index: lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
===
--- /dev/null
+++ lldb/test/Shell/SymbolFile/DWARF/thumb-windows.s
@@ -0,0 +1,183 @@
+# Test that a linked windows executable, with a thumb bit in many address
+# fields, gets the thumb bit stripped out from addresses.
+
+# If the thumb bit isn't stripped out from subprogram ranges, 0x401006 is
+# associated with the function "entry", while it actually is the start of
+# the function "other".
+# If the thumb bit isn't stripped out from line tables, the LineEntry
+# points to the wrong line.
+
+# REQUIRES: lld, arm
+
+# RUN: llvm-mc -triple armv7-windows-gnu %s -filetype=obj > %t.o
+# RUN: lld-link %t.o -out:%t.exe -debug:dwarf -entry:entry -subsystem:console -lldmingw
+# RUN: %lldb %t.exe -o "image lookup -v -a 0x401006" -b | FileCheck %s
+
+# CHECK-LABEL: image lookup -v -a 0x401006
+# CHECK: Function: {{.*}}, name = "other", range = [0x00401006-0x0040100c)
+# CHECK: LineEntry: [0x00401006-0x0040100a): /path/to/src/dwarf-thumb.c:7:12
+
+.text
+.syntax unified
+.file   "dwarf-thumb.c"
+.file   1 "" "dwarf-thumb.c"
+.def entry;
+.scl2;
+.type   32;
+.endef
+.globl  entry   @ -- Begin function entry
+.p2align1
+.code   16  @ @entry
+.thumb_func
+entry:
+.Lfunc_begin0:
+.loc1 2 0   @ dwarf-thumb.c:2:0
+.cfi_sections .debug_frame
+.cfi_startproc
+.loc1 4 9 prologue_end  @ dwarf-thumb.c:4:9
+mov r1, r0
+.Ltmp0:
+b   other
+.Ltmp1:
+.Lfunc_end0:
+.cfi_endproc
+@ -- End function
+.def other;
+.scl2;
+.type   32;
+.endef
+.globl  other   @ -- Begin function other
+.p2align1
+.code   16  @ @other
+.thumb_func
+other:
+.Lfunc_begin1:
+.loc1 6 0   @ dwarf-thumb.c:6:0
+.cfi_startproc
+.loc1 7 12 prologue_end @ dwarf-thumb.c:7:12
+add.w   r0, r1, r1, lsl #1
+.loc1 7 2 is_stmt 0 @ dwarf-thumb.c:7:2
+bx  lr
+.Ltmp2:
+.Lfunc_end1:
+.cfi_endproc
+@ -- End function
+.section.debug_str,"dr"
+.Linfo_string:
+.Linfo_string1:
+.asciz  "dwarf-thumb.c"
+.Linfo_string2:
+.asciz  "/path/to/src"
+.Linfo_string3:
+.asciz  "other"
+.Linfo_string6:
+.asciz  "entry"
+.section.debug_loc,"dr"
+.Lsection_debug_loc:
+.Ldebug_loc0:
+.long   .Lfunc_begin0-.Lfunc_begin0
+.long   .Ltmp0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   80  @ DW_OP_reg0
+.long   .Ltmp0-.Lfunc_begin0
+.long   .Lfunc_end0-.Lfunc_begin0
+.short  1   @ Loc expr size
+.byte   81  @ DW_OP_reg1
+.long   0
+.long   0
+.section.debug_abbrev,"dr"
+.Lsection_abbrev:
+.byte   1   @ Abbreviation Code
+.byte   17  @ DW_TAG_compile_unit
+.byte   1   @ DW_CHILDREN_yes
+.byte   37  @ DW_AT_producer
+.byte   37  @ DW_FORM_strx1
+.byte   19  @ DW_AT_language
+.byte   5   @ DW_FORM_data2
+.byte   3   @ DW_AT_name
+.byte   14  @ DW_FORM_strp
+.byte   16  @ DW_AT_stmt_list
+.byte   23  @ DW_FORM_sec_offset
+.byte   27  @ DW_AT_comp_dir
+.byte   14  @ DW_FORM_strp
+ 

[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-11-29 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo added a comment.

In D70840#1763639 , @labath wrote:

> Yeah, this is going to be tricky... I don't really know what's the right way 
> to do this, but here are the thoughts I have on this so far:
>
> - The new DWARFDebugInfoEntry member is going to substantially increase our 
> memory footprint -- that's a non-starter


Ok, I feared that class was one where the size is critical yeah...

> - I don't like the inconsistency where all addresses are demangled in the 
> DWARF code, except the line tables, which are in the "generic" code

Yup. The reason for doing it like this was as I tried to find the most narrow 
interface where I could catch all of them.

> - I think we ought to have some kind of a utility function to hold the logic 
> for the `&~1` stuff. there is something like that in 
> Architecture::GetOpcodeLoadAddress. The Architecture class is mostly 
> independent of a target, so we may be able create one instance for each 
> module or symbol file, but that feels quite heavy. I might even go for 
> putting something like that into the ArchSpec class. The raison d'être of the 
> Architecture class was to evict some heavy code out of ArchSpec -- this was 
> ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so still 
> don't thing it belongs in ArchSpec, but the `&~1` thingy seems like it could 
> find a place there.)
> - is this behavior reproducible with lld ? If so, I think it'd make the test 
> more readable to run llvm-mc+lld as a part of the test

Yes it is. Ok, that's fine with me.

> - the address byte size would best be handled separately

On second thought, yes, this should be trivial to split out and make a similar 
test with inspecting line tables of an i386 binary.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-11-29 Thread Pavel Labath via Phabricator via lldb-commits
labath added reviewers: clayborg, jingham.
labath added a comment.

Yeah, this is going to be tricky... I don't really know what's the right way to 
do this, but here are the thoughts I have on this so far:

- The new DWARFDebugInfoEntry member is going to substantially increase our 
memory footprint -- that's a non-starter
- I don't like the inconsistency where all addresses are demangled in the DWARF 
code, except the line tables, which are in the "generic" code
- I think we ought to have some kind of a utility function to hold the logic 
for the `&~1` stuff. there is something like that in 
Architecture::GetOpcodeLoadAddress. The Architecture class is mostly 
independent of a target, so we may be able create one instance for each module 
or symbol file, but that feels quite heavy. I might even go for putting 
something like that into the ArchSpec class. The raison d'être of the 
Architecture class was to evict some heavy code out of ArchSpec -- this was 
ArchitectureArm::OverrideStopInfo. That one is definitely too heavy, so still 
don't thing it belongs in ArchSpec, but the `&~1` thingy seems like it could 
find a place there.)
- is this behavior reproducible with lld ? If so, I think it'd make the test 
more readable to run llvm-mc+lld as a part of the test
- the address byte size would best be handled separately


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

2019-11-29 Thread Martin Storsjö via Phabricator via lldb-commits
mstorsjo created this revision.
mstorsjo added reviewers: labath, amccarth.
Herald added subscribers: kristof.beyls, aprantl.
Herald added a reviewer: jdoerfert.
Herald added a project: LLDB.

Windows on ARM is always thumb, and contrary to ELF, there's no per-symbol flag 
to indicate arm vs thumb mode. Therefore, the linkers both Microsoft's link.exe 
and lld) sets the thumb bit in all relocations pointing to code sections.

In ELF, the thumb bit is only set for symbols marked as %function, that is, it 
isn't set in local labels used when constructing the dwarf debug info and line 
tables.

As the windows linkers sets the thumb bit in all code addresses, we'd have to 
cope with this and strip out the lowest bit from all code addresses if the 
architecture is ARM/Thumb.

The testcase uses a prelinked .yaml source instead of an assembly source file, 
to avoid needing to run a linker as part of the test.

There's already predecent for this in DWARFCallFrameInfo::GetFDEIndex, this 
extends the same concept to DWARF debug info and line tables. I'm not sure if I 
managed to get all spots covered here, but I tried to look for all cases of 
DW_AT_low_pc/DW_AT_high_pc and find the best place to handle it, where it would 
have to be handled in as few places as possible.

This is admittedly not very pretty, improvement suggestions are welcome.

Finally, this makes sure to propagate the correct address byte size to 
DataExtractors that are taken from e.g. ObjectFile::ReadSectionData. 
Previously, the DataExtractor got the host address size set, which later 
tripped up DWARFDebugLine::LineTable::parse as the address size of the line 
table (4) mismatched the one set in the DWARFDataExtractor (which was 
propagated from ObjectFile's internal m_data, which never got anything else set 
than the default from the host's sizeof(void*)).

This caused line tables to end up missing if inspecting a foreign object file 
(like in the testcase).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D70840

Files:
  lldb/include/lldb/Symbol/LineTable.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
  lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h
  lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
  lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
  lldb/source/Symbol/LineTable.cpp
  lldb/source/Symbol/ObjectFile.cpp
  lldb/test/Shell/SymbolFile/DWARF/lit.local.cfg
  lldb/test/Shell/SymbolFile/DWARF/thumb-windows.yaml

Index: lldb/test/Shell/SymbolFile/DWARF/thumb-windows.yaml
===
--- /dev/null
+++ lldb/test/Shell/SymbolFile/DWARF/thumb-windows.yaml
@@ -0,0 +1,125 @@
+# Test that a linked windows executable, with a thumb bit in many address
+# fields, gets the thumb bit stripped out from addresses.
+
+# If the thumb bit isn't stripped out from subprogram ranges, 0x401006 is
+# associated with the function "entry", while it actually is the start of
+# the function "other".
+# If the thumb bit isn't stripped out from line tables, the LineEntry
+# points to the wrong line.
+
+# RUN: yaml2obj %s > %t.exe
+# RUN: %lldb %t.exe -o "image lookup -v -a 0x401006" -b | FileCheck %s
+
+# CHECK-LABEL: image lookup -v -a 0x401006
+# CHECK: Function: {{.*}}, name = "other", range = [0x00401006-0x0040100c)
+# CHECK: LineEntry: [0x00401006-0x0040100a): /path/to/src/dwarf-thumb.c:7:12
+
+--- !COFF
+OptionalHeader:
+  AddressOfEntryPoint: 4097
+  ImageBase:   4194304
+  SectionAlignment: 4096
+  FileAlignment:   512
+  MajorOperatingSystemVersion: 6
+  MinorOperatingSystemVersion: 0
+  MajorImageVersion: 0
+  MinorImageVersion: 0
+  MajorSubsystemVersion: 6
+  MinorSubsystemVersion: 0
+  Subsystem:   IMAGE_SUBSYSTEM_WINDOWS_CUI
+  DLLCharacteristics: [ IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE, IMAGE_DLL_CHARACTERISTICS_NX_COMPAT, IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE ]
+  SizeOfStackReserve: 1048576
+  SizeOfStackCommit: 4096
+  SizeOfHeapReserve: 1048576
+  SizeOfHeapCommit: 4096
+  ExportTable:
+RelativeVirtualAddress: 0
+Size:0
+  ImportTable:
+RelativeVirtualAddress: 0
+Size:0
+  ResourceTable:
+RelativeVirtualAddress: 0
+Size:0
+  ExceptionTable:
+RelativeVirtualAddress: 0
+Size:0
+  CertificateTable:
+RelativeVirtualAddress: 0
+Size:0
+  BaseRelocationTable:
+RelativeVirtualAddress: 0
+Size:0
+  Debug:
+RelativeVirtualAddress: 8192
+Size:28
+  Architecture:
+RelativeVirtualAddress: 0
+Size:0
+  GlobalPtr:
+RelativeVirtualAddress: 0
+Size:0
+  TlsTable:
+RelativeVirtualAddress: 0
+Size:0
+