Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Frédéric Riss via lldb-dev


> On Feb 27, 2019, at 3:14 PM, Zachary Turner wrote:
> 
> 
> 
> On Wed, Feb 27, 2019 at 2:52 PM Frédéric Riss wrote:
>> On Feb 27, 2019, at 10:12 AM, Zachary Turner wrote:
> 
> 
>> For what it's worth, in an earlier message I mentioned that I would probably 
>> build the server by using mostly code from LLVM, and making sure that it 
>> supported the union of things currently supported by LLDB and LLVM's DWARF 
>> parsers.  Doing that would naturally require merging the two (which has been 
>> talked about for a long time) as a pre-requisite, and I would expect that 
>> for testing purposes we might want something like llvm-dwarfdump but that 
>> dumps a higher level description of the information (if we change our DWARF 
>> emission code in LLVM for example, to output the exact same type in slightly 
>> different ways in the underlying DWARF, we wouldn't want our test to break, 
>> for example).  So for example imagine you could run something like 
>> `lldb-dwarfdump -lookup-type=foo a.out` and it would dump some description 
>> of the type that is resilient to insignificant changes in the underlying 
>> DWARF.
> 
> At which level do you consider the “DWARF parser” to stop and the debugger 
> policy to start? In my view, the DWARF parser stops at the DwarfDIE boundary. 
> Replacing it wouldn’t get us closer to a higher-level abstraction.
> 
> At the level where you have an alternative representation and no longer 
> have to access the debug info.  In LLDB today, this "representation" is a 
> combination of LLDB's own internal symbol hierarchy (e.g. lldb_private::Type, 
> lldb_private::Function, etc) and the Clang AST.  Once you have constructed 
> those 2 things, the DWARF parser is out of the picture.
> 
> A lot of the complexity in processing raw DWARF comes from handling different 
> versions of the DWARF spec (e.g. supporting DWARF 4 & DWARF 5), collecting 
> and interpreting the subset of attributes which happens to be present, following 
> references to other parts of the DWARF, and then at the end of all this (or 
> perhaps during all of this), dealing with "partial information" (e.g. 
> something that would have saved me a lot of trouble was missing, now I have 
> to do extra work to find it).
> 
> I'm treating DWARF expressions as an exception though, because it would be 
> somewhat tedious and not provide much value to convert those into some text 
> format and then evaluate the text representation of the expression since it's 
> already in a format suitable for processing.  So for this case, you could 
> just encode the byte sequence into a hex string and send that.
> 
> I hinted at this already, but part of the problem (at least in my mind) is 
> that our "DWARF parser" is intermingled with the code that *interprets the 
> parsed DWARF*.  We parse a little bit, build something, parse a little bit 
> more, add on to the thing we're building, etc.  This design is fragile and 
> makes error handling difficult, so part of what I'm proposing is a separation 
> here, where "parse as much as possible, and return an intermediate 
> representation that is as finished as we are able to make it".
> 
> This part is independent of whether DWARF parsing is out of process however.  
> That's still useful even if DWARF parsing is in process, and we've talked 
> about something like that for a long time, whereby we have some kind of API 
> that says "give me the thing, handle all errors internally, and either return 
> me a thing which I can trust or an error".  I'm viewing "thing which I can 
> trust" as some representation which is separate from the original DWARF, and 
> which we could test -- for example -- by writing a tool which dumps this 
> representation

Ok, here we are talking about something different (which you might have been 
expressing since the beginning and I misinterpreted). If you want to decouple 
dealing with DIEs from creating ASTs as a preliminary, then I think this would 
be super valuable and it addresses my concerns about duplicating the AST 
creation logic.

I’m sure Greg would have comments about the challenges of lazily parsing the 
DWARF in such a design.

>  
> 
>> At that point you're already 90% of the way towards what I'm proposing, and 
>> it's useful independently.
> 
> 
> I think that “90%” figure is a little off :-) But please don’t take my 
> questions as opposition to the general idea. I find the idea very 
> interesting, and we could maybe use something similar internally so I am 
> interested. That’s why I’m asking questions.
>  
> Hmm, well I think the 90% figure is pretty accurate.  Because if we envision 
> a hypothetical command line tool which ingests DWARF from a binary or set of 
> binaries, and has some command line interface that allows you to query it in 
> the same way our SymbolFile plugins can be queried, and dumps its output in 
> some intermediate format (maybe JSON, maybe something else) 

Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread via lldb-dev
I'm aware that GSYM doesn't have full info, but if you're both looking at 
symbol-server kinds of mechanics and protocols, it would be silly to separate 
them into Clayborg-servers and Zach-servers just because GSYM cares mainly 
about line info.
But whatever.  You guys are designing this, go for it.
--paulr

From: Zachary Turner [mailto:ztur...@google.com]
Sent: Wednesday, February 27, 2019 10:13 AM
To: Robinson, Paul
Cc: fr...@apple.com; lldb-dev@lists.llvm.org
Subject: Re: [lldb-dev] RFC: Moving debug info parsing out of process

GSYM, as I understand it, is basically just an evolution of Breakpad symbols.  
It doesn't contain full fidelity debug information (type information, function 
parameters, etc).
On Tue, Feb 26, 2019 at 5:56 PM <paul.robin...@sony.com> wrote:
When I see this "parsing DWARF and turning it into something else" it is very 
reminiscent of what clayborg is trying to do with GSYM.  You're both talking 
about leveraging LLVM's parser, which is great, but I have to wonder if there 
isn't more commonality being left on the table.  Just throwing that thought out 
there; I don't have anything specific to suggest.
--paulr

From: lldb-dev [mailto:lldb-dev-boun...@lists.llvm.org] On Behalf Of Frédéric Riss via lldb-dev
Sent: Tuesday, February 26, 2019 5:40 PM
To: Zachary Turner
Cc: LLDB
Subject: Re: [lldb-dev] RFC: Moving debug info parsing out of process



On Feb 26, 2019, at 4:52 PM, Zachary Turner <ztur...@google.com> wrote:


On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss <fr...@apple.com> wrote:

On Feb 26, 2019, at 4:03 PM, Zachary Turner <ztur...@google.com> wrote:

I would probably build the server by using mostly code from LLVM.  Since it 
would contain all of the low level debug info parsing libraries, I would expect 
that all knowledge of debug info (at least, in the form that compilers emit it 
in) could eventually be removed from LLDB entirely.

That’s quite an ambitious goal.

I haven’t looked at the SymbolFile API, what do you expect the exchange 
currency between the server and LLDB to be? Serialized compiler ASTs? If that’s 
the case, it seems like you need a strong rev-lock between the server and the 
client, which in turn adds quite some complexity to the rollout of new versions 
of the debugger.
Definitely not serialized ASTs, because you could be debugging some language 
other than C++.  Probably something more like JSON, where you parse the debug 
info and send back some JSON representation of the type / function / variable 
the user requested, which can almost be a direct mapping to LLDB's internal 
symbol hierarchy (e.g. the Function, Type, etc classes).  You'd still need to 
build the AST on the client

This seems fairly easy for Function or symbols in general, as it’s easy to 
abstract their few properties, but as soon as you get to the type system, I get 
worried.

Your representation needs to have the full expressivity of the underlying debug 
info format. Inventing something new in that space seems really expensive. For 
example, every piece of information we add to the debug info in the compiler 
would need to be handled in multiple places:
 - the server code
 - the client code that talks to the server
 - the current "local" code (for a pretty long while)
Not ideal. I wish there was a way to factor at least the last 2.

But maybe I’m misunderstanding exactly what you’d put in your JSON. If it’s 
very close to the debug format (basically a JSON representation of the DWARF or 
the PDB), then it becomes more tractable as the client code can be the same as 
the current local one with some refactoring.

Fred


So, for example, all of the efforts to merge LLDB and LLVM's DWARF parsing 
libraries could happen by first implementing inside of LLVM whatever 
functionality is missing, and then using that from within the server.  And yes, 
I would expect lldb to spin up a server, just as it does with lldb-server today 
if you try to debug something.  It finds the lldb-server binary and runs it.

When I say "switching the default", what I mean is that if someday this 
hypothetical server supports everything that the current in-process parsing 
codepath supports, we could just delete that entire codepath and switch 
everything to the out of process server, even if that server were running on 
the same physical machine as the debugger client (which would be functionally 
equivalent to what we have today).

(I obviously knew what you meant by "switching the default", I was trying to 
ask about how… to which the answer is by spinning up a local server)

Do you envision LLDB being able to talk to more than one server at the same 
time? It seems like this could be useful to debug a local build while still 
having access to debug symbols for your dependencies that have their symbols in 
a central repository.

I hadn't really thought of this, but it certainly seems possible.  Since the 
API is stateless, it 

Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Zachary Turner via lldb-dev
On Wed, Feb 27, 2019 at 2:52 PM Frédéric Riss wrote:

> On Feb 27, 2019, at 10:12 AM, Zachary Turner wrote:
>
>
>
> For what it's worth, in an earlier message I mentioned that I would
> probably build the server by using mostly code from LLVM, and making sure
> that it supported the union of things currently supported by LLDB and
> LLVM's DWARF parsers.  Doing that would naturally require merging the two
> (which has been talked about for a long time) as a pre-requisite, and I
> would expect that for testing purposes we might want something like
> llvm-dwarfdump but that dumps a higher level description of the information
> (if we change our DWARF emission code in LLVM for example, to output the
> exact same type in slightly different ways in the underlying DWARF, we
> wouldn't want our test to break, for example).  So for example imagine you
> could run something like `lldb-dwarfdump -lookup-type=foo a.out` and it
> would dump some description of the type that is resilient to insignificant
> changes in the underlying DWARF.
>
>
> At which level do you consider the “DWARF parser” to stop and the debugger
> policy to start? In my view, the DWARF parser stops at the DwarfDIE
> boundary. Replacing it wouldn’t get us closer to a higher-level abstraction.
>
At the level where you have an alternative representation and no
longer have to access the debug info.  In LLDB today, this
"representation" is a combination of LLDB's own internal symbol hierarchy
(e.g. lldb_private::Type, lldb_private::Function, etc) and the Clang AST.
Once you have constructed those 2 things, the DWARF parser is out of the
picture.

A lot of the complexity in processing raw DWARF comes from handling
different versions of the DWARF spec (e.g. supporting DWARF 4 & DWARF 5),
collecting and interpreting the subset of attributes which happens to be
present, following references to other parts of the DWARF, and then at the
end of all this (or perhaps during all of this), dealing with "partial
information" (e.g. something that would have saved me a lot of trouble was
missing, now I have to do extra work to find it).

I'm treating DWARF expressions as an exception though, because it would be
somewhat tedious and not provide much value to convert those into some text
format and then evaluate the text representation of the expression since
it's already in a format suitable for processing.  So for this case, you
could just encode the byte sequence into a hex string and send that.
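
A minimal sketch of the hex-string idea above, assuming a JSON-like payload. The function names and the `location_expr` key are hypothetical and not part of any existing LLDB or LLVM API; the sample bytes are DW_OP_fbreg (0x91) followed by the SLEB128 encoding of -8 (0x78).

```python
import json


def encode_expression(expr_bytes: bytes) -> str:
    """Turn a raw DWARF expression into a hex string for transport."""
    return expr_bytes.hex()


def decode_expression(hex_text: str) -> bytes:
    """Recover the original byte sequence on the receiving side."""
    return bytes.fromhex(hex_text)


# DW_OP_fbreg with operand -8, i.e. "frame base minus 8".
location = bytes([0x91, 0x78])

# The expression travels opaquely; only the client evaluates it.
wire = encode_expression(location)
payload = json.dumps({"name": "x", "location_expr": wire})

# Round trip back to the bytes a DWARF expression evaluator expects.
received = decode_expression(json.loads(payload)["location_expr"])
```

The server never has to interpret the expression here, which matches the point that the bytes are already in a format suitable for processing.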

I hinted at this already, but part of the problem (at least in my mind) is
that our "DWARF parser" is intermingled with the code that *interprets the
parsed DWARF*.  We parse a little bit, build something, parse a little bit
more, add on to the thing we're building, etc.  This design is fragile and
makes error handling difficult, so part of what I'm proposing is a
separation here, where "parse as much as possible, and return an
intermediate representation that is as finished as we are able to make it".

This part is independent of whether DWARF parsing is out of process
however.  That's still useful even if DWARF parsing is in process, and
we've talked about something like that for a long time, whereby we have
some kind of API that says "give me the thing, handle all errors
internally, and either return me a thing which I can trust or an error".
I'm viewing "thing which I can trust" as some representation which is
separate from the original DWARF, and which we could test -- for example --
by writing a tool which dumps this representation
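
As a rough illustration of "either return me a thing which I can trust or an error", here is a small sketch. Everything in it is an assumption made for the example: `TypeDescription`, `ParseError`, `describe_type`, and the dictionary standing in for attributes collected from the debug info are hypothetical names, not LLDB classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TypeDescription:
    """A finished representation, independent of the original DWARF."""
    name: str
    byte_size: int
    members: tuple = ()


@dataclass(frozen=True)
class ParseError:
    message: str


def describe_type(raw: dict) -> Union[TypeDescription, ParseError]:
    """Validate everything up front and never hand back a half-built value."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        return ParseError("type has no usable name")
    size = raw.get("byte_size")
    if not isinstance(size, int) or size < 0:
        return ParseError(f"type {name!r} has no usable byte size")
    members = []
    for member in raw.get("members", []):
        if "name" not in member or "offset" not in member:
            return ParseError(f"type {name!r} has an incomplete member")
        members.append((member["name"], member["offset"]))
    return TypeDescription(name, size, tuple(members))


good = describe_type(
    {"name": "foo", "byte_size": 8, "members": [{"name": "x", "offset": 0}]}
)
bad = describe_type({"name": "foo"})
```

The caller only ever sees a complete `TypeDescription` or a `ParseError`, so error handling stays inside the parsing layer.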



>
> At that point you're already 90% of the way towards what I'm proposing,
> and it's useful independently.
>
>
> I think that “90%” figure is a little off :-) But please don’t take my
> questions as opposition to the general idea. I find the idea very
> interesting, and we could maybe use something similar internally so I am
> interested. That’s why I’m asking questions.
>

Hmm, well I think the 90% figure is pretty accurate.  Because if we
envision a hypothetical command line tool which ingests DWARF from a binary
or set of binaries, and has some command line interface that allows you to
query it in the same way our SymbolFile plugins can be queried, and dumps
its output in some intermediate format (maybe JSON, maybe something else)
and is sufficiently descriptive to make a Clang AST or build LLDB's
internal symbol & type hierarchy out of it, then at that point the only
thing missing from my original proposal is a socket to send that over the
wire and something on the other end to make the Clang AST and LLDB type /
symbol hierarchy.
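
To make the hypothetical command line tool a little more concrete, here is a toy sketch. It is not an existing tool: the `--lookup-type` spelling, the in-memory `FAKE_SYMBOLS` table, and the JSON field names are all assumptions chosen for illustration, and a real implementation would read the debug info from the binary instead.

```python
import argparse
import json

# Stand-in for what a real tool would extract from the binary's debug info.
FAKE_SYMBOLS = {
    "a.out": {
        "types": {
            "foo": {
                "kind": "struct",
                "name": "foo",
                "byte_size": 8,
                "members": [
                    {"name": "x", "type": "int", "bit_offset": 0},
                    {"name": "y", "type": "int", "bit_offset": 32},
                ],
            }
        }
    }
}


def lookup_type(binary: str, name: str) -> dict:
    """Answer one query with a description or an explicit error."""
    types = FAKE_SYMBOLS.get(binary, {}).get("types", {})
    if name not in types:
        return {"error": f"no type named {name!r} in {binary}"}
    return types[name]


def main(argv: list) -> str:
    parser = argparse.ArgumentParser(prog="lldb-dwarfdump")
    parser.add_argument("--lookup-type", required=True)
    parser.add_argument("binary")
    args = parser.parse_args(argv)
    # Sorted keys keep the dump stable, which helps when diffing test output.
    return json.dumps(lookup_type(args.binary, args.lookup_type),
                      sort_keys=True, indent=2)


if __name__ == "__main__":
    import sys
    print(main(sys.argv[1:]))
```

Sending the same JSON over a socket rather than printing it is the remaining step the paragraph above refers to.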
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Frédéric Riss via lldb-dev


> On Feb 27, 2019, at 10:12 AM, Zachary Turner wrote:
> 
> 
> 
> On Tue, Feb 26, 2019 at 5:39 PM Frédéric Riss wrote:
> 
>> On Feb 26, 2019, at 4:52 PM, Zachary Turner wrote:
>> 
>> 
>> 
>> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss wrote:
>> 
>>> On Feb 26, 2019, at 4:03 PM, Zachary Turner wrote:
>>> 
>>> I would probably build the server by using mostly code from LLVM.  Since it 
>>> would contain all of the low level debug info parsing libraries, I would 
>>> expect that all knowledge of debug info (at least, in the form that 
>>> compilers emit it in) could eventually be removed from LLDB entirely.
>> 
>> That’s quite an ambitious goal.
>> 
>> I haven’t looked at the SymbolFile API, what do you expect the exchange 
>> currency between the server and LLDB to be? Serialized compiler ASTs? If 
>> that’s the case, it seems like you need a strong rev-lock between the server 
>> and the client, which in turn adds quite some complexity to the rollout of 
>> new versions of the debugger.
>> Definitely not serialized ASTs, because you could be debugging some language 
>> other than C++.  Probably something more like JSON, where you parse the 
>> debug info and send back some JSON representation of the type / function / 
>> variable the user requested, which can almost be a direct mapping to LLDB's 
>> internal symbol hierarchy (e.g. the Function, Type, etc classes).  You'd 
>> still need to build the AST on the client
> 
> This seems fairly easy for Function or symbols in general, as it’s easy to 
> abstract their few properties, but as soon as you get to the type system, I 
> get worried.
> 
> Your representation needs to have the full expressivity of the underlying 
> debug info format. Inventing something new in that space seems really 
> expensive. For example, every piece of information we add to the debug info 
> in the compiler would need to be handled in multiple places:
>  - the server code
>  - the client code that talks to the server
>  - the current "local" code (for a pretty long while)
> Not ideal. I wish there was a way to factor at least the last 2.
> 
> How often does this actually happen though?  The C++ type system hasn't 
> really undergone very many fundamental changes over the years.

I think over the last year we’ve done at least a couple extensions to what we 
put in DWARF (for ObjC classes and ARM PAC support which is not upstream yet). 
Adrian usually does those evolutions, so he might have a better idea. We plan 
on potentially adding a bunch more information to DWARF to more accurately 
represent the Obj-C type system.  

> I mocked up a few samples of what some JSON descriptions would look like, 
> and it didn't seem terrible.  It certainly is some work -- there's no denying 
> that -- but I think a lot of the "expressivity" of the underlying format is 
> actually more accurately described as "flexibility".  What I mean by this is 
> that there are both many different ways to express the same thing, as well as 
> many entities that can express different things depending on how they're 
> used.  An intermediate format gives us a way to eliminate all of that 
> flexibility and instead offer consistency, which makes client code much 
> simpler.  In a way, this is a similar benefit to what one gets by compiling a 
> source language down to LLVM IR and then operating on the LLVM IR because you 
> have a much simpler grammar to deal with, along with more semantic 
> restrictions on what kind of descriptions you form with that grammar (to be 
> clear: JSON itself is not restrictive, but we can make our schema 
> restrictive).
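
One way to picture "many different ways to express the same thing" collapsing into a single consistent form: DWARF can place a data member with DW_AT_data_member_location (a byte offset when it is a constant) or with DW_AT_data_bit_offset (a bit offset). The sketch below is a simplified assumption, not the mocked-up schema mentioned in the quoted text: the input dictionaries stand in for already-decoded attributes, `normalize_member` is a hypothetical helper, and location expressions are ignored.

```python
def normalize_member(attrs: dict) -> dict:
    """Map either DWARF spelling of a member offset to one canonical field."""
    if "DW_AT_data_bit_offset" in attrs:
        bit_offset = attrs["DW_AT_data_bit_offset"]
    elif "DW_AT_data_member_location" in attrs:
        # Constant form only: a byte offset from the start of the struct.
        bit_offset = attrs["DW_AT_data_member_location"] * 8
    else:
        # Neither attribute present: the member starts with its container.
        bit_offset = 0
    return {"name": attrs["DW_AT_name"], "bit_offset": bit_offset}


# Two producers describing the same member in different ways.
as_bytes = {"DW_AT_name": "y", "DW_AT_data_member_location": 4}
as_bits = {"DW_AT_name": "y", "DW_AT_data_bit_offset": 32}

same = normalize_member(as_bytes) == normalize_member(as_bits)
```

Client code written against the normalized form only has to handle `bit_offset`, whichever spelling the compiler chose.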

What I’m worried about is not exactly the amount of work, just the scope of the 
new abstraction. It needs to be good enough for any language and any debug 
information format. It needs efficient implementation of at least symbols, 
types, decl contexts, frame information, location expressions, target register 
mappings... And it’ll require the equivalent of the various ASTParser 
implementations. That’s a lot of new and forked code. I’d feel way better if we 
were able to reuse some of the existing code. I’m not sure how feasible this is 
though.

> For what it's worth, in an earlier message I mentioned that I would probably 
> build the server by using mostly code from LLVM, and making sure that it 
> supported the union of things currently supported by LLDB and LLVM's DWARF 
> parsers.  Doing that would naturally require merging the two (which has been 
> talked about for a long time) as a pre-requisite, and I would expect that for 
> testing purposes we might want something like llvm-dwarfdump but that dumps a 
> higher level description of the information (if we change our DWARF emission 
> code in LLVM for example, to output the exact same type in slightly different 
> ways in the underlying DWARF, we wouldn't want our test to break, for 
> 

Re: [lldb-dev] When should ArchSpecs match?

2019-02-27 Thread Ted Woodward via lldb-dev
Hexagon uses “hexagon-unknown-elf” as its triple when running standalone (no 
OS) or with QuRT (our embedded OS), which expands to 
“hexagon-unknown-unknown-elf” sometimes, or “hexagon-unknown--elf” other times. 
For Linux we use “hexagon-unknown-linux”.

One issue I’ve seen is the Linux platform will match against 
“hexagon-unknown--elf”, so I need to make sure the Hexagon platform is in the 
plugin list before the Linux platform.

Ted

From: lldb-dev On Behalf Of Greg Clayton via lldb-dev
Sent: Wednesday, February 27, 2019 4:15 PM
To: Zachary Turner
Cc: ted.woodw...@codeaurora.org; LLDB
Subject: [EXT] Re: [lldb-dev] When should ArchSpecs match?




On Dec 7, 2018, at 8:10 AM, Zachary Turner via lldb-dev <lldb-dev@lists.llvm.org> wrote:

“Unknown” is a perfectly fine value for the os though, and I’m not suggesting 
to change that.

My point is simply that Jason’s situation (baremetal) is one that is not even 
expressible by the Triple syntax. As long as there’s some enum value that 
describes the situation (of which unknown is a valid choice), the problem goes 
away.

We currently use a "specified unknown" (where enum and string are unknown) to 
mean "none", which is what we use to specify bare metal (no OS). I am happy 
to change that though. If we change this, then a few people's workflows might 
have to change where they used to say "armv7-apple-unknown" to 
"armv7-apple-none". Not a big deal since not many people are using LLDB for 
bare board debugging right now, but something we will need to document.

Greg



On Fri, Dec 7, 2018 at 8:06 AM <ted.woodw...@codeaurora.org> wrote:
We use 2 triples for Hexagon:
hexagon-unknown-elf (which becomes hexagon-unknown-unknown-elf internally), and 
hexagon-unknown-linux.

We follow the Linux standard and add in magic to the elf to identify it as a 
Linux binary. But in the hexagon-unknown-elf case we have no way to distinguish 
between standalone (no OS, running on our simulator) or QuRT (proprietary OS, 
could be running on hardware or simulator). In fact, the same shared library 
that has no OS calls (just standard library calls that go into the appropriate 
.so) could run under either one.

I think requiring a value for every OS would be a non-starter for us.

--
Ted Woodward
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project

From: lldb-dev <lldb-dev-boun...@lists.llvm.org> On Behalf Of Zachary Turner via lldb-dev
Sent: Friday, December 7, 2018 4:38 AM
To: Pavel Labath <pa...@labath.sk>
Cc: LLDB <lldb-dev@lists.llvm.org>
Subject: Re: [lldb-dev] When should ArchSpecs match?

We can already say that with OSType::Unknown. That’s different than “I know 
that no OS exists”.
On Fri, Dec 7, 2018 at 12:00 AM Pavel Labath <pa...@labath.sk> wrote:
On 07/12/2018 01:22, Jason Molenda via lldb-dev wrote:
> Oh sorry I missed that.  Yes, I think a value added to the OSType for NoOS or 
> something would work.  We need to standardize on a textual representation for 
> this in a triple string as well, like 'none'.  Then with arm64-- and 
> arm64-*-* as UnknownVendor + UnknownOS we can have these marked as 
> "compatible" with any other value in the case Adrian is looking at.
>
>

Sounds good to me.

As another data point, it is usually impossible to tell from looking at
an ELF file which OS it is intended to run on. You can tell the
architecture because it's right in the ELF header, but that's about it.
Some OSs get around this by adding a special section like
.this.is.an.android.binary, but not all of them. So in general, we need
to be able to say "I have no idea which OS this binary is intended for".

pl


Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Sanimir Agovic via lldb-dev
Hi Zachary,

On Mon, Feb 25, 2019 at 7:23 PM Zachary Turner via lldb-dev <
lldb-dev@lists.llvm.org> wrote:
> [...]
> Thoughts?
Having a standalone symbols interface would open many tooling
possibilities; the available interfaces are too dwarfish and too primitive.
This does not necessarily require an out-of-process symbol server, but I see
that it is appealing to you, especially with the problems you are facing.

I do not want to start bikeshedding on implementation details already, as it
seems you have your own, but I suggest starting with a linetable interface.
It has a simple and stable interface addr2locs/loc2addrs, is complete on
its own (no symbols required), not prone to dwarf/pdb or language oddities,
and imho is the most fundamental debug information. This would allow you to
focus on the necessary details and still have a good portion of
functionality.
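
A small sketch of what an addr2locs/loc2addrs pair could look like, under stated assumptions: the `LineTable` class, its row format of (address, file, line), and the lookup rules are hypothetical, and end-of-sequence markers, columns, and inlining are left out.

```python
import bisect
from collections import defaultdict


class LineTable:
    """Toy line table built from (address, file, line) rows."""

    def __init__(self, rows):
        self._rows = sorted(rows)
        self._addrs = [row[0] for row in self._rows]
        self._by_loc = defaultdict(list)
        for addr, file, line in self._rows:
            self._by_loc[(file, line)].append(addr)

    def addr2locs(self, addr):
        """Source locations of the last row starting at or before addr."""
        i = bisect.bisect_right(self._addrs, addr) - 1
        if i < 0:
            return []
        start = self._addrs[i]
        # Several rows may share one start address; return all of them.
        lo = bisect.bisect_left(self._addrs, start)
        return [(file, line) for _, file, line in self._rows[lo:i + 1]]

    def loc2addrs(self, file, line):
        """Every address at which the given source line begins."""
        return list(self._by_loc.get((file, line), []))


table = LineTable([
    (0x1000, "a.c", 1),
    (0x1004, "a.c", 2),
    (0x1010, "a.c", 1),
])
```

For example, `table.addr2locs(0x1006)` resolves to line 2 of `a.c`, and `table.loc2addrs("a.c", 1)` yields both addresses for line 1, the case where a line maps to more than one code range.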
Out-of-process symbol servers do work but are less useful nowadays. Hope it
solves the problems you are facing.

 -Sanimir


On Mon, Feb 25, 2019 at 7:23 PM Zachary Turner via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> Hi all,
>
> We've got some internal efforts in progress, and one of those would
> benefit from debug info parsing being out of process (independently of
> whether or not the rest of LLDB is out of process).
>
> There are a few advantages to this, which I'll enumerate here:
>
>    - It improves one source of instability in LLDB which has been known
>      to be problematic -- specifically, that debug info can be bad and
>      handling this can often be difficult and bring down the entire debug
>      session.  While other efforts have been made to address stability by
>      moving things out of process, they have not been upstreamed, and even
>      if they had I think we would still want this anyway, for reasons that
>      follow.
>    - It becomes theoretically possible to move debug info parsing not
>      just to another process, but to another machine entirely.  In a
>      broader sense, this decouples the physical debug info location (and
>      for that matter, representation) from the debugger host.
>    - It becomes testable as an independent component, because you can
>      just send requests to it and dump the results and see if they make
>      sense.  Currently there is almost zero test coverage of this aspect
>      of LLDB apart from what you can get after going through many levels
>      of indirection via spinning up a full debug session and doing things
>      that indirectly result in symbol queries.
>
> The big win here, at least from my point of view, is the second one.
> Traditional symbol servers operate by copying entire symbol files (DSYM,
> DWP, PDB) from some machine to the debugger host.  These can be very large
> -- we've seen 12+ GB in some cases -- which ranges from "slow bandwidth
> hog" to "complete non-starter" depending on the debugger host and network.
> In this kind of scenario, one could theoretically run the debug info
> process on the same NAS, cloud, or whatever as the symbol server.  Then,
> rather than copying over an entire symbol file, it responds only to the
> query you issued -- if you asked for a type, it just returns a packet
> describing the type you requested.
>
> The API itself would be stateless (so that you could make queries for
> multiple targets in any order) as well as asynchronous (so that responses
> might arrive out of order).  Blocking could be implemented in LLDB, but
> having the server be asynchronous means multiple clients could connect to
> the same server instance.  This raises interesting possibilities.  For
> example, one can imagine thousands of developers connecting to an internal
> symbol server on the network and being able to debug remote processes or
> core dumps over slow network connections or on machines with very little
> storage (e.g. chromebooks).
>
>
> On the LLDB side, all of this is hidden behind the SymbolFile interface,
> so most of LLDB doesn't have to change at all.   While this is in
> development, we could have SymbolFileRemote and keep the existing local
> codepath the default, until such time that it's robust and complete enough
> that we can switch the default.
>
> Thoughts?
>
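
The stateless, asynchronous request model from the quoted RFC can be pictured with a toy simulation. This is only a sketch under assumptions: the request fields (`id`, `target`, `name`), the `handle` and `query_all` functions, and the artificial `delay` used to force out-of-order completion are all invented for the example, and no real network transport is involved.

```python
import asyncio


async def handle(request: dict) -> dict:
    """Stateless handler: everything it needs arrives in the request."""
    await asyncio.sleep(request["delay"])  # simulated lookup cost
    return {
        "id": request["id"],
        "target": request["target"],
        "result": f"type {request['name']}",
    }


async def query_all(requests: list):
    """Issue all queries at once and collect replies as they arrive."""
    tasks = [asyncio.create_task(handle(r)) for r in requests]
    arrival = []
    for finished in asyncio.as_completed(tasks):
        arrival.append(await finished)
    # The client matches replies to requests by id, not by arrival order.
    by_id = {response["id"]: response for response in arrival}
    return arrival, by_id


requests = [
    {"id": 1, "target": "a.out", "name": "foo", "delay": 0.05},
    {"id": 2, "target": "b.out", "name": "bar", "delay": 0.0},
]
arrival, by_id = asyncio.run(query_all(requests))
```

Because each request names its own target, queries for different targets can be interleaved freely, and a blocking client can simply wait on the id it cares about.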


Re: [lldb-dev] When should ArchSpecs match?

2019-02-27 Thread Greg Clayton via lldb-dev


> On Dec 7, 2018, at 8:10 AM, Zachary Turner via lldb-dev wrote:
> 
> “Unknown” is a perfectly fine value for the os though, and I’m not suggesting 
> to change that.
> 
> My point is simply that Jason’s situation (baremetal) is one that is not even 
> expressible by the Triple syntax. As long as there’s some enum value that 
> describes the situation (of which unknown is a valid choice), the problem 
> goes away.

We currently use a "specified unknown" (where enum and string are unknown) to 
mean "none", which is what we use to specify bare metal (no OS). I am happy 
to change that though. If we change this, then a few people's workflows might 
have to change where they used to say "armv7-apple-unknown" to 
"armv7-apple-none". Not a big deal since not many people are using LLDB for 
bare board debugging right now, but something we will need to document.
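
A simplified sketch of the matching rule being discussed, where an unknown vendor or OS is compatible with anything while "none" is an ordinary value meaning bare metal. This is not LLDB's ArchSpec logic: `parse_triple` and `compatible` are hypothetical helpers that assume plain arch-vendor-os strings and ignore the environment component.

```python
def parse_triple(text: str):
    """Split arch-vendor-os; empty, '*' and 'unknown' all mean unknown."""
    parts = text.split("-")
    arch, vendor, os_name = (parts + ["", ""])[:3]

    def norm(field: str) -> str:
        return "unknown" if field in ("", "*", "unknown") else field

    return arch, norm(vendor), norm(os_name)


def compatible(a: str, b: str) -> bool:
    """Architectures must match; unknown vendor or OS matches any value."""
    arch_a, vendor_a, os_a = parse_triple(a)
    arch_b, vendor_b, os_b = parse_triple(b)

    def field_ok(x: str, y: str) -> bool:
        return x == y or "unknown" in (x, y)

    return (arch_a == arch_b
            and field_ok(vendor_a, vendor_b)
            and field_ok(os_a, os_b))


# Unknown vendor and OS behave as wildcards.
wildcard = compatible("arm64--", "arm64-apple-ios")
# An explicit "none" is a real value and does not match a different OS.
bare_vs_linux = compatible("armv7-apple-none", "armv7-apple-linux")
```

With an explicit "none", a bare metal target no longer has to reuse "unknown", so "I have no idea" and "there is no OS" stay distinguishable.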

Greg


> On Fri, Dec 7, 2018 at 8:06 AM <ted.woodw...@codeaurora.org> wrote:
> We use 2 triples for Hexagon:
> 
> hexagon-unknown-elf (which becomes hexagon-unknown-unknown-elf internally), 
> and hexagon-unknown-linux.
> 
>  
> 
> We follow the Linux standard and add in magic to the elf to identify it as a 
> Linux binary. But in the hexagon-unknown-elf case we have no way to 
> distinguish between standalone (no OS, running on our simulator) or QuRT 
> (proprietary OS, could be running on hardware or simulator). In fact, the 
> same shared library that has no OS calls (just standard library calls that go 
> into the appropriate .so) could run under either one.
> 
>  
> 
> I think requiring a value for every OS would be a non-starter for us.
> 
>  
> 
> --
> 
> Ted Woodward
> 
> Qualcomm Innovation Center, Inc.
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
> Foundation Collaborative Project
> 
>  
> 
> From: lldb-dev On Behalf Of Zachary Turner via lldb-dev
> Sent: Friday, December 7, 2018 4:38 AM
> To: Pavel Labath <pa...@labath.sk>
> Cc: LLDB <lldb-dev@lists.llvm.org>
> Subject: Re: [lldb-dev] When should ArchSpecs match?
> 
>  
> 
> We can already say that with OSType::Unknown. That’s different than “I know 
> that no OS exists”.
> 
> On Fri, Dec 7, 2018 at 12:00 AM Pavel Labath wrote:
> 
> On 07/12/2018 01:22, Jason Molenda via lldb-dev wrote:
> > Oh sorry I missed that.  Yes, I think a value added to the OSType for NoOS 
> > or something would work.  We need to standardize on a textual 
> > representation for this in a triple string as well, like 'none'.  Then with 
> > arm64-- and arm64-*-* as UnknownVendor + UnknownOS we can have these marked 
> > as "compatible" with any other value in the case Adrian is looking at.
> > 
> > 
> 
> Sounds good to me.
> 
> As another data point, it is usually impossible to tell from looking at 
> an ELF file which OS it is intended to run on. You can tell the 
> architecture because it's right in the ELF header, but that's about it. 
> Some OSs get around this by adding a special section like 
> .this.is.an.android.binary, but not all of them. So in general, we need 
> to be able to say "I have no idea which OS this binary is intended for".
> 
> pl
> 


[lldb-dev] [8.0.0 Release] rc3 has been tagged

2019-02-27 Thread Hans Wennborg via lldb-dev
Dear testers,

8.0.0-rc3 was just tagged from the release_80 branch at r355015.

We're running a little behind schedule now, but I think we're also
close to being able to call this done.

Please take a close look at this release candidate. Unless anything
bad comes up, this is probably very similar to what the final release
will look like.

Testers, please run the test script, share your results, and upload binaries.

I'll publish source tarballs and docs as soon as possible, and
binaries as they become available.

Thanks,
Hans


Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Zachary Turner via lldb-dev
GSYM, as I understand it, is basically just an evolution of Breakpad
symbols.  It doesn't contain full fidelity debug information (type
information, function parameters, etc).

On Tue, Feb 26, 2019 at 5:56 PM  wrote:

> When I see this "parsing DWARF and turning it into something else" it is
> very reminiscent of what clayborg is trying to do with GSYM.  You're both
> talking about leveraging LLVM's parser, which is great, but I have to
> wonder if there isn't more commonality being left on the table.  Just
> throwing that thought out there; I don't have anything specific to suggest.
>
> --paulr
>
>
>
> *From:* lldb-dev [mailto:lldb-dev-boun...@lists.llvm.org] *On Behalf Of*
> Frédéric Riss via lldb-dev
> *Sent:* Tuesday, February 26, 2019 5:40 PM
> *To:* Zachary Turner
> *Cc:* LLDB
> *Subject:* Re: [lldb-dev] RFC: Moving debug info parsing out of process
>
> On Feb 26, 2019, at 4:52 PM, Zachary Turner  wrote:
>
>
>
>
>
> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss  wrote:
>
>
>
> On Feb 26, 2019, at 4:03 PM, Zachary Turner  wrote:
>
>
>
> I would probably build the server by using mostly code from LLVM.  Since
> it would contain all of the low level debug info parsing libraries, I would
> expect that all knowledge of debug info (at least, in the form that
> compilers emit it in) could eventually be removed from LLDB entirely.
>
>
>
> That’s quite an ambitious goal.
>
>
>
> I haven’t looked at the SymbolFile API. What do you expect the exchange
> currency between the server and LLDB to be? Serialized compiler ASTs? If
> that’s the case, it seems like you need a strong rev-lock between the
> server and the client, which in turn adds quite some complexity to the
> rollout of new versions of the debugger.
>
> Definitely not serialized ASTs, because you could be debugging some
> language other than C++.  Probably something more like JSON, where you
> parse the debug info and send back some JSON representation of the type /
> function / variable the user requested, which can almost be a direct
> mapping to LLDB's internal symbol hierarchy (e.g. the Function, Type, etc.
> classes).  You'd still need to build the AST on the client.
>
>
>
> This seems fairly easy for Function or symbols in general, as it’s easy to
> abstract their few properties, but as soon as you get to the type system, I
> get worried.
>
>
>
> Your representation needs to have the full expressivity of the underlying
> debug info format. Inventing something new in that space seems really
> expensive. For example, every piece of information we add to the debug info
> in the compiler would need to be handled in multiple places:
>
>  - the server code
>
>  - the client code that talks to the server
>
>  - the current “local” code (for a pretty long while)
>
> Not ideal. I wish there was a way to factor at least the last 2.
>
>
>
> But maybe I’m misunderstanding exactly what you’d put in your JSON. If
> it’s very close to the debug format (basically a JSON representation of the
> DWARF or the PDB), then it becomes more tractable as the client code can be
> the same as the current local one with some refactoring.
>
>
>
> Fred
>
>
>
>
>
> So, for example, all of the efforts to merge LLDB and LLVM's DWARF parsing
> libraries could happen by first implementing inside of LLVM whatever
> functionality is missing, and then using that from within the server.  And
> yes, I would expect lldb to spin up a server, just as it does with
> lldb-server today if you try to debug something.  It finds the lldb-server
> binary and runs it.
>
>
>
> When I say "switching the default", what I mean is that if someday this
> hypothetical server supports everything that the current in-process parsing
> codepath supports, we could just delete that entire codepath and switch
> everything to the out of process server, even if that server were running
> on the same physical machine as the debugger client (which would be
> functionally equivalent to what we have today).
>
>
>
> (I obviously knew what you meant by "switching the default”, I was trying
> to ask about how… to which the answer is by spinning up a local server)
>
>
>
> Do you envision LLDB being able to talk to more than one server at the
> same time? It seems like this could be useful to debug a local build while
> still having access to debug symbols for your dependencies that have their
> symbols in a central repository.
>
>
>
> I hadn't really thought of this, but it certainly seems possible.  Since
> the API is stateless, it could send requests to any server it wanted, with
> some mechanism of selecting between them.
>
>
>
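
The multi-server idea in the quoted exchange could be sketched roughly as follows. This assumes each request carries everything the server needs (module name plus type name), so the client keeps no per-server session state and can simply try servers in order. `make_server` and `lookup_type` are hypothetical names, and a real server would answer over a wire protocol rather than a function call.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "server" is modelled as a function from (module, type name) to a reply.
Server = Callable[[str, str], Optional[dict]]


def make_server(symbols: Dict[Tuple[str, str], dict]) -> Server:
    """Build a toy in-memory server backed by a fixed symbol table."""
    def lookup(module: str, type_name: str) -> Optional[dict]:
        return symbols.get((module, type_name))
    return lookup


def lookup_type(servers: List[Server], module: str, type_name: str) -> Optional[dict]:
    """Ask each server in turn; the first one that knows the type wins."""
    for server in servers:
        reply = server(module, type_name)
        if reply is not None:
            return reply
    return None


# One server for the local build, one standing in for a central repository.
local = make_server({("a.out", "Foo"): {"kind": "struct", "name": "Foo"}})
central = make_server({("libdep.so", "Bar"): {"kind": "struct", "name": "Bar"}})
servers = [local, central]
```

Trying servers in list order is only one possible selection mechanism; a client could equally route by module path or build ID.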


Re: [lldb-dev] RFC: Moving debug info parsing out of process

2019-02-27 Thread Zachary Turner via lldb-dev
On Tue, Feb 26, 2019 at 5:39 PM Frédéric Riss  wrote:

>
> On Feb 26, 2019, at 4:52 PM, Zachary Turner  wrote:
>
>
>
> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss  wrote:
>
>>
>> On Feb 26, 2019, at 4:03 PM, Zachary Turner  wrote:
>>
>> I would probably build the server by using mostly code from LLVM.  Since
>> it would contain all of the low level debug info parsing libraries, I would
>> expect that all knowledge of debug info (at least, in the form that
>> compilers emit it in) could eventually be removed from LLDB entirely.
>>
>>
>> That’s quite an ambitious goal.
>>
>> I haven’t looked at the SymbolFile API. What do you expect the exchange
>> currency between the server and LLDB to be? Serialized compiler ASTs? If
>> that’s the case, it seems like you need a strong rev-lock between the
>> server and the client, which in turn adds quite some complexity to the
>> rollout of new versions of the debugger.
>>
> Definitely not serialized ASTs, because you could be debugging some
> language other than C++.  Probably something more like JSON, where you
> parse the debug info and send back some JSON representation of the type /
> function / variable the user requested, which can almost be a direct
> mapping to LLDB's internal symbol hierarchy (e.g. the Function, Type, etc.
> classes).  You'd still need to build the AST on the client.
>
>
> This seems fairly easy for Function or symbols in general, as it’s easy to
> abstract their few properties, but as soon as you get to the type system, I
> get worried.
>
> Your representation needs to have the full expressivity of the underlying
> debug info format. Inventing something new in that space seems really
> expensive. For example, every piece of information we add to the debug info
> in the compiler would need to be handled in multiple places:
>  - the server code
>  - the client code that talks to the server
>  - the current “local” code (for a pretty long while)
> Not ideal. I wish there was a way to factor at least the last 2.
>
How often does this actually happen though?  The C++ type system hasn't
really undergone very many fundamental changes over the years.  I mocked up
a few samples of what some JSON descriptions would look like, and it didn't
seem terrible.  It certainly is some work -- there's no denying -- but I
think a lot of the "expressivity" of the underlying format is actually more
accurately described as "flexibility".  What I mean by this is that there
are both many different ways to express the same thing, as well as many
entities that can express different things depending on how they're used.
An intermediate format gives us a way to eliminate all of that flexibility
and instead offer consistency, which makes client code much simpler.  In a
way, this is a similar benefit to what one gets by compiling a source
language down to LLVM IR and then operating on the LLVM IR because you have
a much simpler grammar to deal with, along with more semantic restrictions
on what kind of descriptions you form with that grammar (to be clear: JSON
itself is not restrictive, but we can make our schema restrictive).
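
To make the "restrictive schema" point concrete, here is a minimal sketch of what one such JSON description and a strict check on it might look like. The field names (`kind`, `byte_size`, `byte_offset`, and so on) and the `validate` helper are assumptions made up for illustration; they are not the mock-ups mentioned above and not an existing LLDB format.

```python
import json

# Hypothetical schema: each node kind has exactly one allowed set of keys.
ALLOWED_KEYS = {
    "struct": {"kind", "name", "byte_size", "members"},
    "member": {"name", "type", "byte_offset"},
}


def validate(node: dict, kind: str) -> None:
    """Raise ValueError unless node has exactly the keys allowed for kind."""
    keys = set(node)
    if keys != ALLOWED_KEYS[kind]:
        diff = sorted(keys ^ ALLOWED_KEYS[kind])
        raise ValueError(f"{kind}: missing or unexpected keys {diff}")
    if kind == "struct":
        if node["kind"] != "struct":
            raise ValueError("kind must be 'struct'")
        for member in node["members"]:
            validate(member, "member")


# What a server reply for a simple two-int struct could look like.
reply = json.loads("""
{
  "kind": "struct",
  "name": "Point",
  "byte_size": 8,
  "members": [
    {"name": "x", "type": "int", "byte_offset": 0},
    {"name": "y", "type": "int", "byte_offset": 4}
  ]
}
""")
validate(reply, "struct")  # passes silently; any extra or missing key raises
```

The design choice being illustrated is that there is only one way to say "member at offset 4", so client code never has to branch on alternative encodings.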

For what it's worth, in an earlier message I mentioned that I would
probably build the server by using mostly code from LLVM, and making sure
that it supported the union of things currently supported by LLDB and
LLVM's DWARF parsers.  Doing that would naturally require merging the two
(which has been talked about for a long time) as a prerequisite.  For
testing purposes, I would expect that we might want something like
llvm-dwarfdump, but one that dumps a higher level description of the
information: if we change our DWARF emission code in LLVM to output the
exact same type in slightly different ways in the underlying DWARF, we
wouldn't want our tests to break.  So imagine you could run something like
`lldb-dwarfdump -lookup-type=foo a.out` and it would dump some description
of the type that is resilient to insignificant changes in the underlying
DWARF.
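
As a rough illustration of "resilient to insignificant changes", the sketch below takes one real DWARF variation, a member offset written either as a plain constant or as a `DW_OP_plus_uconst` location expression, and normalizes both to the same dump. The dict-based DIEs and the `member_offset` and `describe` helpers are toy stand-ins invented for this example; they are not LLVM or LLDB data structures, and the output format is not what a real `lldb-dwarfdump` would print.

```python
def member_offset(die: dict) -> int:
    """Return a member's byte offset from either encoding of its location."""
    loc = die["DW_AT_data_member_location"]
    if isinstance(loc, int):  # constant form
        return loc
    op, operand = loc  # expression form, e.g. ("DW_OP_plus_uconst", 4)
    if op != "DW_OP_plus_uconst":
        raise ValueError(f"unsupported location expression: {op}")
    return operand


def describe(struct_die: dict) -> str:
    """Dump a struct in a canonical form independent of the raw encoding."""
    lines = [f"struct {struct_die['DW_AT_name']} (size {struct_die['DW_AT_byte_size']})"]
    for m in sorted(struct_die["children"], key=member_offset):
        lines.append(f"  +{member_offset(m)} {m['type']} {m['DW_AT_name']}")
    return "\n".join(lines)


# The same type, emitted in two slightly different ways.
constant_form = {
    "DW_AT_name": "Point", "DW_AT_byte_size": 8,
    "children": [
        {"DW_AT_name": "x", "type": "int", "DW_AT_data_member_location": 0},
        {"DW_AT_name": "y", "type": "int", "DW_AT_data_member_location": 4},
    ],
}
expression_form = {
    "DW_AT_name": "Point", "DW_AT_byte_size": 8,
    "children": [
        {"DW_AT_name": "y", "type": "int",
         "DW_AT_data_member_location": ("DW_OP_plus_uconst", 4)},
        {"DW_AT_name": "x", "type": "int",
         "DW_AT_data_member_location": ("DW_OP_plus_uconst", 0)},
    ],
}
```

A test written against `describe` output would keep passing if the emitter switched from one encoding to the other, which is the property the paragraph above asks for.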

At that point you're already 90% of the way towards what I'm proposing, and
it's useful independently.
