Re: [lldb-dev] new tool (core2yaml) + a new top-level library (Formats)

2019-03-06 Thread Zachary Turner via lldb-dev
Well, all of the actual yamlization code in obj2yaml and yaml2obj is
library-ized, so you could always add the real code there, then have
core2yaml just link against it.

On Wed, Mar 6, 2019 at 5:11 AM Pavel Labath wrote:

> On 05/03/2019 22:52, Zachary Turner wrote:
> >
> >
> > On Tue, Mar 5, 2019 at 1:47 PM Jonas Devlieghere via lldb-dev
> > <lldb-dev@lists.llvm.org> wrote:
> >
> >
> > I don't know much about the minidump format or code, but it sounds
> > reasonable for me to have support for it in yaml2obj, which would be
> > a sufficient motivation to have the code live there. As you mention
> > in your footnote, MachO core files are already supported, and it
> > sounds like ELF could reuse a bunch of existing code as well. So
> > having everything in LLVM would give you even more symmetry. I also
> > doubt anyone would mind having more fine grained yamlization, even
> > if you cannot use it to reduce a test it's nicer to see structure
> > than a binary blob (imho). Anyway, that's just my take, I guess this
> > is more of a question for the LLVM mailing list.
> >
> > A lot of obj2yaml output is just "Section Name" / "Section Contents" and
> > then a long hex string consisting of the contents.  Since a core file is
> > an ELF file, this would already be supported for obj2yaml today (in
> > theory)
>
> Actually, even this is not true. An elf *core file* is an *elf file*,
> but it contains no sections. It contains "segments" instead. :P obj2yaml
> has absolutely no support for segments, so if you try to use it to
> yamlize a core file, you will get empty output.
>
> Interestingly, yaml2obj does contain some support for segments, but it's
> extremely limited, and can only be used to create very simple
> "executable" files. Core files still cannot be represented there right
> now, as yaml2obj is still very section-centric.
>
>
> However, I do see the appeal in having a single tool for yamlization of
> various "object" file formats, so I am going to send an email to
> llvm-dev and see what the response is like there. I'd encourage anyone
> interested in this to voice your opinion there too.
>
> regards,
> pavel
>
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] new tool (core2yaml) + a new top-level library (Formats)

2019-03-06 Thread Pavel Labath via lldb-dev

On 05/03/2019 22:52, Zachary Turner wrote:
> On Tue, Mar 5, 2019 at 1:47 PM Jonas Devlieghere via lldb-dev
> <lldb-dev@lists.llvm.org> wrote:
>
> > I don't know much about the minidump format or code, but it sounds
> > reasonable for me to have support for it in yaml2obj, which would be
> > a sufficient motivation to have the code live there. As you mention
> > in your footnote, MachO core files are already supported, and it
> > sounds like ELF could reuse a bunch of existing code as well. So
> > having everything in LLVM would give you even more symmetry. I also
> > doubt anyone would mind having more fine grained yamlization, even
> > if you cannot use it to reduce a test it's nicer to see structure
> > than a binary blob (imho). Anyway, that's just my take, I guess this
> > is more of a question for the LLVM mailing list.
>
> A lot of obj2yaml output is just "Section Name" / "Section Contents" and
> then a long hex string consisting of the contents.  Since a core file is
> an ELF file, this would already be supported for obj2yaml today (in
> theory)


Actually, even this is not true. An elf *core file* is an *elf file*, 
but it contains no sections. It contains "segments" instead. :P obj2yaml 
has absolutely no support for segments, so if you try to use it to
yamlize a core file, you will get empty output.


Interestingly, yaml2obj does contain some support for segments, but it's
extremely limited, and can only be used to create very simple 
"executable" files. Core files still cannot be represented there right 
now, as yaml2obj is still very section-centric.
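
To make the sections-versus-segments point concrete, here is a minimal
sketch in Python (not code from the patch series). It reads the ELF file
header and reports how many program headers (segments) and section headers
the file declares. The helper names summarize_elf_header and
make_core_header are made up for illustration, only 64-bit little-endian
ELF is handled, and the extended-count case (e_phnum == 0xffff) used by
very large core files is ignored.

```python
import struct

# ELF64 little-endian file header: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ET_CORE = 4  # e_type value identifying a core file


def summarize_elf_header(data):
    """Return (is_core, segment_count, section_count) for an ELF64 LE image.

    Hypothetical helper for illustration only; it does not handle the
    extended program header count used by very large core files.
    """
    fields = ELF64_HEADER.unpack_from(data)
    ident, e_type = fields[0], fields[1]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    e_phnum, e_shnum = fields[10], fields[12]
    return e_type == ET_CORE, e_phnum, e_shnum


def make_core_header(num_segments):
    """Build a synthetic core-file header that declares no sections."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ELF64_HEADER.pack(
        ident, ET_CORE, 62, 1,  # type, machine (x86-64), version
        0, 64, 0, 0,            # entry, phoff, shoff (none), flags
        64, 56, num_segments,   # ehsize, phentsize, phnum
        0, 0, 0,                # shentsize, shnum, shstrndx
    )


if __name__ == "__main__":
    print(summarize_elf_header(make_core_header(3)))
```

A section-centric tool has nothing to iterate over when the section count
is zero, which is consistent with the empty output described above.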



However, I do see the appeal in having a single tool for yamlization of 
various "object" file formats, so I am going to send an email to 
llvm-dev and see what the response is like there. I'd encourage anyone 
interested in this to voice your opinion there too.


regards,
pavel


Re: [lldb-dev] new tool (core2yaml) + a new top-level library (Formats)

2019-03-05 Thread Zachary Turner via lldb-dev
On Tue, Mar 5, 2019 at 1:47 PM Jonas Devlieghere via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

>
> I don't know much about the minidump format or code, but it sounds
> reasonable for me to have support for it in yaml2obj, which would be a
> sufficient motivation to have the code live there. As you mention in your
> footnote, MachO core files are already supported, and it sounds like ELF
> could reuse a bunch of existing code as well. So having everything in LLVM
> would give you even more symmetry. I also doubt anyone would mind having
> more fine grained yamlization, even if you cannot use it to reduce a test
> it's nicer to see structure than a binary blob (imho). Anyway, that's just
> my take, I guess this is more of a question for the LLVM mailing list.
>

A lot of obj2yaml output is just "Section Name" / "Section Contents" and
then a long hex string consisting of the contents.  Since a core file is an
ELF file, this would already be supported for obj2yaml today (in theory),
but I also agree that specific knowledge of breaking it down into finer
grained fields and subfields, and actually parsing the core, is probably
not useful for anything else in LLVM.



>
>
>> Discussion topic #3: Use of .def files in lldb. In one of the patches I
>> create a .def textual header to be used for avoiding repetitive code
>> when dealing with various constants. This is fairly common practice in llvm,
>> but would be a first in lldb.
>>
>
> I think this is a good idea. Although not exactly the same, we already got
> our feet wet with a tablegen file in the driver.
>
+1


Re: [lldb-dev] new tool (core2yaml) + a new top-level library (Formats)

2019-03-05 Thread Jonas Devlieghere via lldb-dev
Hi Pavel,

On Tue, Mar 5, 2019 at 8:31 AM Pavel Labath via lldb-dev <
lldb-dev@lists.llvm.org> wrote:

> Hello all,
>
> I have just posted a large-ish patch series for review (D58971, D58973,
> D58975, D58976), and I want to use this opportunity to draw more
> attention to it and highlight various bikeshedding
> opportunities^H^H^Htopics for discussion :).
>
> The new tool is called core2yaml, and its goal is to fill the gap in
> the testing story for core files. As you might know, at present, the
> only way to test core file parsing code (*) is to check in an opaque
> binary blob and have the debugger open that. This presents a couple of
> challenges:
> - it's really hard to review what is inside the core file
> - one has to jump through various hoops to create a "small" core file
> This tool fixes both issues by enabling one to check in text files,
> with human-readable content. The yaml files can also be easily edited to
> prune out the content which is not relevant for the test. While that's
> not my goal at present, I am hoping that this will one day enable us to
> write self-contained tests for the unwinder, as the core file can be
> used to synthesize (or capture) interesting unwinder scenarios.
>
> Since I also needed to find a home for the new code I was writing, I
> thought this would be a good opportunity to create a new library for
> various stuff. The goals I was trying to solve are:
> - make the yaml code a library. The reason for that is that we have a
> number of unittests using checked in binaries, and I thought it would be
> nice to be able to convert those to use yaml representation as well.
> - make the existing minidump parsing code more easily accessible. The
> parsing code currently lives in source/Plugins/Process/minidump, and it is
> impossible to use without pulling in the rest of lldb (which the tool
> doesn't need).
> The solution I came up with here is a new "Formats" library. I chose a
> fairly generic name, because I realized that we have code for
> (de)serializing a bunch of small formats, which don't really have a good
> place to live in. Currently I needed a parser for linux /proc/PID/maps
> files and minidump files, but I am hoping that a generic name would
> enable us to one day move the gdb-remote protocol code there (which is
> also currently buried in some plugin code, which makes it hard to depend
> on from lldb-server), as well as the future debug-info-server, if it
> ever comes into existence.
>
> Discussion topic #1: The library name and scope.
> There are lots of other ways this could be organized. One of the names I
> considered was "BinaryFormat" for symmetry with llvm, but then I chose
> to drop the "Binary" part as it seemed to me we have plenty of
> non-binary formats as well. As for its dependencies, I currently have it
> depending on Utility and nothing else (as far as lldb libraries go). I
> can imagine using some Host code might be useful there too, but I would
> like to avoid any other lldb dependencies right now. Another question is
> whether this should be a single library or a bunch of smaller ones. I
> chose a single library now because the things I initially plan to put
> there are fairly small (/proc/pid/maps parser is 200 LOC), but I can see
> how we may want to create sub-libraries for things that grow big (the
> debug-info server code might turn out to be one of those) or that have
> some additional dependencies.
>

I don't have strong opinions here, nor do I have a better suggestion for
the name.


> Discussion topic #2: tool name and scope
> A case could be made to integrate this functionality into the llvm
> yaml2obj utilities. Here I chose not to do that because the minidump
> format is not at all implemented in llvm, and I do not see a use case
> for it to be implemented/moved there. A stronger case could be made to
> put the elf core code there, since llvm already supports reading elf
> files. While originally being in favour of that, I eventually adopted
> the view that doing this in lldb would be better because:
> - it would bring more symmetry with minidumps
> - it would enable us to do fine-grained yamlization for things that we
> care about (e.g., registers), which is something that would probably be
> uninteresting to the rest of llvm.
>

I don't know much about the minidump format or code, but it sounds
reasonable for me to have support for it in yaml2obj, which would be a
sufficient motivation to have the code live there. As you mention in your
footnote, MachO core files are already supported, and it sounds like ELF
could reuse a bunch of existing code as well. So having everything in LLVM
would give you even more symmetry. I also doubt anyone would mind having
more fine grained yamlization, even if you cannot use it to reduce a test
it's nicer to see structure than a binary blob (imho). Anyway, that's just
my take, I guess this is more of a question for the LLVM mailing list.


> Discussion topic #3: Use of .def files in lldb. 

[lldb-dev] new tool (core2yaml) + a new top-level library (Formats)

2019-03-05 Thread Pavel Labath via lldb-dev

Hello all,

I have just posted a large-ish patch series for review (D58971, D58973, 
D58975, D58976), and I want to use this opportunity to draw more 
attention to it and highlight various bikeshedding 
opportunities^H^H^Htopics for discussion :).


The new tool is called core2yaml, and its goal is to fill the gap in
the testing story for core files. As you might know, at present, the 
only way to test core file parsing code (*) is to check in an opaque 
binary blob and have the debugger open that. This presents a couple of 
challenges:

- it's really hard to review what is inside the core file
- one has to jump through various hoops to create a "small" core file
This tool fixes both issues by enabling one to check in text files,
with human-readable content. The yaml files can also be easily edited to 
prune out the content which is not relevant for the test. While that's 
not my goal at present, I am hoping that this will one day enable us to 
write self-contained tests for the unwinder, as the core file can be 
used to synthesize (or capture) interesting unwinder scenarios.


Since I also needed to find a home for the new code I was writing, I 
thought this would be a good opportunity to create a new library for
various stuff. The goals I was trying to solve are:
- make the yaml code a library. The reason for that is that we have a 
number of unittests using checked in binaries, and I thought it would be 
nice to be able to convert those to use yaml representation as well.
- make the existing minidump parsing code more easily accessible. The 
parsing code currently lives in source/Plugins/Process/minidump, and it is
impossible to use without pulling in the rest of lldb (which the tool
doesn't need).
The solution I came up with here is a new "Formats" library. I chose a 
fairly generic name, because I realized that we have code for 
(de)serializing a bunch of small formats, which don't really have a good 
place to live in. Currently I needed a parser for linux /proc/PID/maps 
files and minidump files, but I am hoping that a generic name would 
enable us to one day move the gdb-remote protocol code there (which is 
also currently buried in some plugin code, which makes it hard to depend 
on from lldb-server), as well as the future debug-info-server, if it 
ever comes into existence.
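
As an illustration of the kind of small format meant here, the following
is a minimal Python sketch of a parser for Linux /proc/PID/maps text. It
is not the parser from the patches; the names MapsEntry, parse_maps_line
and parse_maps are hypothetical, and the sketch assumes the usual
"start-end perms offset dev inode [path]" line layout.

```python
from typing import NamedTuple


class MapsEntry(NamedTuple):
    """One mapping from /proc/PID/maps (hypothetical record type)."""
    start: int
    end: int
    perms: str
    offset: int
    device: str
    inode: int
    path: str


def parse_maps_line(line):
    """Parse one 'start-end perms offset dev inode [path]' line."""
    # Split on whitespace at most five times so that a path containing
    # spaces stays in one piece.
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError("malformed maps line: %r" % line)
    addr_range, perms, offset, device, inode = parts[:5]
    path = parts[5].strip() if len(parts) > 5 else ""
    start_text, end_text = addr_range.split("-")
    return MapsEntry(int(start_text, 16), int(end_text, 16), perms,
                     int(offset, 16), device, int(inode), path)


def parse_maps(text):
    """Parse the full contents of a maps file, skipping blank lines."""
    return [parse_maps_line(l) for l in text.splitlines() if l.strip()]


if __name__ == "__main__":
    sample = "00400000-0040b000 r-xp 00000000 08:01 131104 /bin/cat\n"
    for entry in parse_maps(sample):
        print(hex(entry.start), hex(entry.end), entry.perms, entry.path)
```

Because a parser like this only needs text in and records out, it has no
reason to depend on the rest of the debugger, which is the property the
proposed library is after.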


Discussion topic #1: The library name and scope.
There are lots of other ways this could be organized. One of the names I
considered was "BinaryFormat" for symmetry with llvm, but then I chose 
to drop the "Binary" part as it seemed to me we have plenty of 
non-binary formats as well. As for its dependencies, I currently have it
depending on Utility and nothing else (as far as lldb libraries go). I 
can imagine using some Host code might be useful there too, but I would 
like to avoid any other lldb dependencies right now. Another question is 
whether this should be a single library or a bunch of smaller ones. I 
chose a single library now because the things I initially plan to put 
there are fairly small (/proc/pid/maps parser is 200 LOC), but I can see 
how we may want to create sub-libraries for things that grow big (the 
debug-info server code might turn out to be one of those) or that have 
some additional dependencies.


Discussion topic #2: tool name and scope
A case could be made to integrate this functionality into the llvm 
yaml2obj utilities. Here I chose not to do that because the minidump 
format is not at all implemented in llvm, and I do not see a use case 
for it to be implemented/moved there. A stronger case could be made to 
put the elf core code there, since llvm already supports reading elf 
files. While originally being in favour of that, I eventually adopted 
the view that doing this in lldb would be better because:

- it would bring more symmetry with minidumps
- it would enable us to do fine-grained yamlization for things that we 
care about (e.g., registers), which is something that would probably be 
uninteresting to the rest of llvm.


Discussion topic #3: Use of .def files in lldb. In one of the patches I
create a .def textual header to be used for avoiding repetitive code
when dealing with various constants. This is fairly common practice in llvm,
but would be a first in lldb.


Discussion topic #4: Overlap with "process plugin dump". This tool has 
some overlap with the given command for minidump files, which also 
provides a textual description of minidump files. In case we are ok with 
tweaking the interface of that command slightly (and ok with some yaml 
artefacts in its output), it should be possible to reimplement that
command on top of the yaml serialization library.


Discussion topic #5: Anything else I haven't thought of.

regards,
pavel

(*) This is not entirely true for MachO core files, where yaml2obj is 
already able to convert the core files into text form. However, it is 
definitely true for ELF and minidump core files, and even the MachO yaml 
for isn't that well suited