Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-17 Thread Pavel Labath via lldb-dev
Thank you for the update, Jonas. I haven't looked at the patch in detail, 
but (for better or worse) it seems more-or-less like what I would expect 
at first glance.


One of the things that occurred to me while looking at this is that it 
would be great to be able to test this code in greater isolation instead 
of just the full "run lldb.exe" test. (I've already said this on the 
file provider patch, but I think that is doubly true here.) There 
shouldn't be anything (except the prolific use of the string SB) tying 
this code to the SB API, and so we should be able to use it to 
record/replay a dummy API for testing purposes.


That would make it easier to test some of the trickier corner cases. For 
instance you could use placement new to make sure you get two objects at 
the same address instead of hoping that this will eventually happen 
during the full lldb run. It would also enable you to make incremental 
patches while you work out the remaining issues instead of having to 
have a fully working system from the get-go.
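To make the placement-new idea concrete, a small unit test along these
lines (all names here are hypothetical, not part of the patch) could look
like this:

  #include <new>

  struct Dummy { int value = 0; }; // stand-in for an SB-like object

  void TestAddressReuse() {
    alignas(Dummy) unsigned char storage[sizeof(Dummy)];
    Dummy *first = new (storage) Dummy();  // recorded as one object
    first->~Dummy();                       // lifetime ends, address is free again
    Dummy *second = new (storage) Dummy(); // same address, distinct object
    // A correct recorder should hand out a fresh identifier for 'second'
    // instead of conflating it with 'first'.
    second->~Dummy();
  }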


I think the only big change that you would need to make for this to work 
is to move the core of this code out of the API folder (maybe all the 
way down to Utility?), since, the way the build is set up now, it won't 
get exported in a way that can be used by anyone else.


I think the patch tool you've made strikes a good compromise between 
ease of use and ease of development. I expect it will be particularly 
useful during the initial stages, when you need to annotate a large 
number of functions in batches.



On 17/01/2019 03:29, Jonas Devlieghere wrote:
I've put up a (WIP) patch for the tool (https://reviews.llvm.org/D56822) 
in case anybody is curious about that.


On Tue, Jan 15, 2019 at 1:41 PM Jonas Devlieghere wrote:


I've updated the patch with a new version of the prototype:
https://reviews.llvm.org/D56322

It uses Pavel's suggestion to use the function address as a runtime
ID. All the deserialization code is generated using templates, with
automatic mapping on indices during serialization and deserialization.

I (again) manually added the macros for the same set of functions I
had in the original prototype. Unsurprisingly this is very
error-prone. It's easy to forget to add the right macros for the
registry, the function, and the return type. Some of these things
can be detected at compile time, others only blow up at run-time. I
strongly believe that a tool to add the macros is the way forward.
It would be more of a developer tool than something that
hooks into the build process.

Note that it's still a prototype; there are outstanding issues like
void pointers, callbacks and other types of arguments that require
some kind of additional information to serialize. I also didn't get
around yet to the lifetime issue that was discussed on IRC last
week.

Please let me know what you think.

Thanks,
Jonas

On Wed, Jan 9, 2019 at 8:58 AM Jonas Devlieghere <jo...@devlieghere.com> wrote:



On Wed, Jan 9, 2019 at 8:42 AM Pavel Labath <pa...@labath.sk> wrote:

On 09/01/2019 17:15, Jonas Devlieghere wrote:
 >
 >
 >  > On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath <pa...@labath.sk> wrote:
 >
 >     On 08/01/2019 21:57, Jonas Devlieghere wrote:
 >      > Before I got around to coding this up I realized
you can't take the
 >      > address of constructors in C++, so the function
address won't
 >     work as an
 >      > identifier.
 >      >
 >
 >     You gave up way too easily. :P
 >
 >
 > I counted on you having something in mind, it sounded too
obvious for
 > you to have missed.  ;-)
 >
 >     I realized that constructors are going to be tricky,
but I didn't want
 >     to dive into those details until I knew if you liked
the general idea.
 >     The most important thing to realize here is that for
the identifier
 >     thingy to work, you don't actually need to use the
address of that
 >     method/constructor as the identifier. It is
sufficient to have
 >     something
 >     that can be deterministically computed from the
function. Then you can
 >     use the address of *that* as the identifier.
 >
 >
 > I was thinking about that yesterday. I still feel like it
would be
 > better to have this mapping all done at compile time. I
was considering
 > 

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-16 Thread Jonas Devlieghere via lldb-dev
I've put up a (WIP) patch for the tool (https://reviews.llvm.org/D56822) in
case anybody is curious about that.

On Tue, Jan 15, 2019 at 1:41 PM Jonas Devlieghere 
wrote:

> I've updated the patch with a new version of the prototype:
> https://reviews.llvm.org/D56322
>
> It uses Pavel's suggestion to use the function address as a runtime ID.
> All the deserialization code is generated using templates, with automatic
> mapping on indices during serialization and deserialization.
>
> I (again) manually added the macros for the same set of functions I had in
> the original prototype. Unsurprisingly this is very error-prone. It's easy
> to forget to add the right macros for the registry, the function, and the
> return type. Some of these things can be detected at compile time, others
> only blow up at run-time. I strongly believe that a tool to add the macros
> is the way forward. It would be more of a developer tool than
> something that hooks into the build process.
>
> Note that it's still a prototype; there are outstanding issues like void
> pointers, callbacks and other types of arguments that require some kind of
> additional information to serialize. I also didn't get around yet to the
> lifetime issue that was discussed on IRC last week.
>
> Please let me know what you think.
>
> Thanks,
> Jonas
>
> On Wed, Jan 9, 2019 at 8:58 AM Jonas Devlieghere 
> wrote:
>
>>
>>
>> On Wed, Jan 9, 2019 at 8:42 AM Pavel Labath  wrote:
>>
>>> On 09/01/2019 17:15, Jonas Devlieghere wrote:
>>> >
>>> >
>>> > On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath wrote:
>>> >
>>> > On 08/01/2019 21:57, Jonas Devlieghere wrote:
>>> >  > Before I got around to coding this up I realized you can't take
>>> the
>>> >  > address of constructors in C++, so the function address won't
>>> > work as an
>>> >  > identifier.
>>> >  >
>>> >
>>> > You gave up way too easily. :P
>>> >
>>> >
>>> > I counted on you having something in mind, it sounded too obvious for
>>> > you to have missed.  ;-)
>>> >
>>> > I realized that constructors are going to be tricky, but I didn't
>>> want
>>> > to dive into those details until I knew if you liked the general
>>> idea.
>>> > The most important thing to realize here is that for the identifier
>>> > thingy to work, you don't actually need to use the address of that
>>> > method/constructor as the identifier. It is sufficient to have
>>> > something
>>> > that can be deterministically computed from the function. Then you
>>> can
>>> > use the address of *that* as the identifier.
>>> >
>>> >
>>> > I was thinking about that yesterday. I still feel like it would be
>>> > better to have this mapping all done at compile time. I was
>>> considering
>>> > some kind of constexpr hashing but that sounded overkill.
>>> >
>>>
>>> Well.. most of this is done through template meta-programming, which
>>> _is_ compile-time. And the fact that I have a use for the new
>>> construct/invoke functions I create this way means that even the space
>>> used by those isn't completely wasted (although I'm sure this could be
>>> made smaller with hard-coded IDs). The biggest impact of this I can
>>> think of is the increased number of dynamic relocations that need to be
>>> performed by the loader, as it introduces a bunch of function pointers
>>> floating around. But even that shouldn't be too bad as we have plenty of
>>> other sources of dynamic relocs (currently about 4% of the size of
>>> liblldb and 10% of lldb-server).
>>>
>>
>> Yeah of course, it wasn't my intention to critique your approach. I was
>> talking specifically about the mapping (the std::map) in the prototype,
>> something I asked about earlier in the thread. FWIW I think this would be
>> an excellent trade-off if we don't need a tool to generate code for us. I'm
>> hopeful that we can have the bulk of the deserialization code generated
>> this way, most of the "special" cases are still pretty similar and dealing
>> with basic types. Anyway, that should become clear later today as I
>> integrate this into the lldb prototype.
>>
>


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-15 Thread Jonas Devlieghere via lldb-dev
I've updated the patch with a new version of the prototype:
https://reviews.llvm.org/D56322

It uses Pavel's suggestion to use the function address as a runtime ID. All
the deserialization code is generated using templates, with automatic
mapping on indices during serialization and deserialization.

I (again) manually added the macros for the same set of functions I had in
the original prototype. Unsurprisingly this is very error-prone. It's easy
to forget to add the right macros for the registry, the function, and the
return type. Some of these things can be detected at compile time, others
only blow up at run-time. I strongly believe that a tool to add the macros
is the way forward. It would be more of a developer tool than
something that hooks into the build process.

Note that it's still a prototype; there are outstanding issues like void
pointers, callbacks and other types of arguments that require some kind of
additional information to serialize. I also didn't get around yet to the
lifetime issue that was discussed on IRC last week.

Please let me know what you think.

Thanks,
Jonas

On Wed, Jan 9, 2019 at 8:58 AM Jonas Devlieghere 
wrote:

>
>
> On Wed, Jan 9, 2019 at 8:42 AM Pavel Labath  wrote:
>
>> On 09/01/2019 17:15, Jonas Devlieghere wrote:
>> >
>> >
>> > On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath wrote:
>> >
>> > On 08/01/2019 21:57, Jonas Devlieghere wrote:
>> >  > Before I got around to coding this up I realized you can't take
>> the
>> >  > address of constructors in C++, so the function address won't
>> > work as an
>> >  > identifier.
>> >  >
>> >
>> > You gave up way too easily. :P
>> >
>> >
>> > I counted on you having something in mind, it sounded too obvious for
>> > you to have missed.  ;-)
>> >
>> > I realized that constructors are going to be tricky, but I didn't
>> want
>> > to dive into those details until I knew if you liked the general
>> idea.
>> > The most important thing to realize here is that for the identifier
>> > thingy to work, you don't actually need to use the address of that
>> > method/constructor as the identifier. It is sufficient to have
>> > something
>> > that can be deterministically computed from the function. Then you
>> can
>> > use the address of *that* as the identifier.
>> >
>> >
>> > I was thinking about that yesterday. I still feel like it would be
>> > better to have this mapping all done at compile time. I was considering
>> > some kind of constexpr hashing but that sounded overkill.
>> >
>>
>> Well.. most of this is done through template meta-programming, which
>> _is_ compile-time. And the fact that I have a use for the new
>> construct/invoke functions I create this way means that even the space
>> used by those isn't completely wasted (although I'm sure this could be
>> made smaller with hard-coded IDs). The biggest impact of this I can
>> think of is the increased number of dynamic relocations that need to be
>> performed by the loader, as it introduces a bunch of function pointers
>> floating around. But even that shouldn't be too bad as we have plenty of
>> other sources of dynamic relocs (currently about 4% of the size of
>> liblldb and 10% of lldb-server).
>>
>
> Yeah of course, it wasn't my intention to critique your approach. I was
> talking specifically about the mapping (the std::map) in the prototype,
> something I asked about earlier in the thread. FWIW I think this would be
> an excellent trade-off if we don't need a tool to generate code for us. I'm
> hopeful that we can have the bulk of the deserialization code generated
> this way, most of the "special" cases are still pretty similar and dealing
> with basic types. Anyway, that should become clear later today as I
> integrate this into the lldb prototype.
>


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-09 Thread Jonas Devlieghere via lldb-dev
On Wed, Jan 9, 2019 at 8:42 AM Pavel Labath  wrote:

> On 09/01/2019 17:15, Jonas Devlieghere wrote:
> >
> >
> > On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath wrote:
> >
> > On 08/01/2019 21:57, Jonas Devlieghere wrote:
> >  > Before I got around to coding this up I realized you can't take
> the
> >  > address of constructors in C++, so the function address won't
> > work as an
> >  > identifier.
> >  >
> >
> > You gave up way too easily. :P
> >
> >
> > I counted on you having something in mind, it sounded too obvious for
> > you to have missed.  ;-)
> >
> > I realized that constructors are going to be tricky, but I didn't
> want
> > to dive into those details until I knew if you liked the general
> idea.
> > The most important thing to realize here is that for the identifier
> > thingy to work, you don't actually need to use the address of that
> > method/constructor as the identifier. It is sufficient to have
> > something
> > that can be deterministically computed from the function. Then you
> can
> > use the address of *that* as the identifier.
> >
> >
> > I was thinking about that yesterday. I still feel like it would be
> > better to have this mapping all done at compile time. I was considering
> > some kind of constexpr hashing but that sounded overkill.
> >
>
> Well.. most of this is done through template meta-programming, which
> _is_ compile-time. And the fact that I have a use for the new
> construct/invoke functions I create this way means that even the space
> used by those isn't completely wasted (although I'm sure this could be
> made smaller with hard-coded IDs). The biggest impact of this I can
> think of is the increased number of dynamic relocations that need to be
> performed by the loader, as it introduces a bunch of function pointers
> floating around. But even that shouldn't be too bad as we have plenty of
> other sources of dynamic relocs (currently about 4% of the size of
> liblldb and 10% of lldb-server).
>

Yeah of course, it wasn't my intention to critique your approach. I was
talking specifically about the mapping (the std::map) in the prototype,
something I asked about earlier in the thread. FWIW I think this would be
an excellent trade-off if we don't need a tool to generate code for us. I'm
hopeful that we can have the bulk of the deserialization code generated
this way, most of the "special" cases are still pretty similar and dealing
with basic types. Anyway, that should become clear later today as I
integrate this into the lldb prototype.


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-09 Thread Pavel Labath via lldb-dev

On 09/01/2019 17:15, Jonas Devlieghere wrote:



On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath wrote:


On 08/01/2019 21:57, Jonas Devlieghere wrote:
 > Before I got around to coding this up I realized you can't take the
 > address of constructors in C++, so the function address won't
work as an
 > identifier.
 >

You gave up way too easily. :P


I counted on you having something in mind, it sounded too obvious for 
you to have missed.  ;-)


I realized that constructors are going to be tricky, but I didn't want
to dive into those details until I knew if you liked the general idea.
The most important thing to realize here is that for the identifier
thingy to work, you don't actually need to use the address of that
method/constructor as the identifier. It is sufficient to have
something
that can be deterministically computed from the function. Then you can
use the address of *that* as the identifier.


I was thinking about that yesterday. I still feel like it would be 
better to have this mapping all done at compile time. I was considering 
some kind of constexpr hashing but that sounded overkill.




Well.. most of this is done through template meta-programming, which 
_is_ compile-time. And the fact that I have a use for the new 
construct/invoke functions I create this way means that even the space 
used by those isn't completely wasted (although I'm sure this could be 
made smaller with hard-coded IDs). The biggest impact of this I can 
think of is the increased number of dynamic relocations that need to be 
performed by the loader, as it introduces a bunch of function pointers 
floating around. But even that shouldn't be too bad as we have plenty of 
other sources of dynamic relocs (currently about 4% of the size of 
liblldb and 10% of lldb-server).



Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-09 Thread Jonas Devlieghere via lldb-dev
On Wed, Jan 9, 2019 at 5:05 AM Pavel Labath  wrote:

> On 08/01/2019 21:57, Jonas Devlieghere wrote:
> > Before I got around to coding this up I realized you can't take the
> > address of constructors in C++, so the function address won't work as an
> > identifier.
> >
>
> You gave up way too easily. :P
>

I counted on you having something in mind, it sounded too obvious for you
to have missed.  ;-)

I realized that constructors are going to be tricky, but I didn't want
> to dive into those details until I knew if you liked the general idea.
> The most important thing to realize here is that for the identifier
> thingy to work, you don't actually need to use the address of that
> method/constructor as the identifier. It is sufficient to have something
> that can be deterministically computed from the function. Then you can
> use the address of *that* as the identifier.
>

I was thinking about that yesterday. I still feel like it would be better
to have this mapping all done at compile time. I was considering some kind
of constexpr hashing but that sounded overkill.
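For reference, the kind of constexpr hashing meant here would be something
like a compile-time FNV-1a hash of the qualified method name (sketch only,
the names are made up):

  #include <cstdint>

  constexpr std::uint64_t fnv1a(const char *s,
                                std::uint64_t h = 1469598103934665603ull) {
    return *s ? fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) *
                                 1099511628211ull)
              : h;
  }

  constexpr std::uint64_t kSetDataFromDoubleArrayID =
      fnv1a("SBData::SetDataFromDoubleArray");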


> I've created a very simple prototype,
> where I do just that. The way I handle constructors there is that I
> create a special class template (construct), whose instantiations are
> going to be unique for each constructor (I achieve that by making the
> class name and the constructor argument types the template parameters of
> that class). Then I can take the address of the static member
> function inside this class (construct<ClassName, ArgTypes...>::doit), and
> use *that* as the ID.


Clever!

As a nice side-effect, the "doit" method actually does invoke the
> constructor in question, so I can also use that in the replay code to
> treat constructors like any other method that returns an object.
>

This is really neat.


> I also do the same thing for (non-static) member functions via the
> "invoke" template, because even though it is possible to take the
> address of those, it is very hard to do anything else with the retrieved
> pointer. So the effect of this is that in the rest of the code, I only have
> to work with free functions, as both constructors and member functions
> are converted into equivalent free functions. I haven't tried to handle
> destructors yet, but I don't think those should pose any problems that
> we haven't encountered already.
>
> The example also shows how you can use templates to automatically
> generate replay code for "simple" (i.e. those where you can
> (de)serialize each argument independently) functions, and then uses that
> to record/replay a very simple API.
>
> You can see it in action like this:
> $ g++ a.cc  # compile
> $ ./a.out 2>/tmp/recording # generate the recording
> SBFoo 47 42
> Method 1 2
> Static 10 11
> $ cat /tmp/recording
> 0  # ID of the constructor
> 47 # constructor arg 1
> 42 # constructor arg 2
> 0x7ffd74d9a0f7 # constructor result
> 1  # id of SBFoo::Method
> 0x7ffd74d9a0f7 # this
> 1  # arg 1
> 2  # arg 2
> 2  # id of SBFoo::Static
> 10 # arg 1
> 11 # arg 2
> $ ./a.out 1 < /tmp/recording # replay the recording
> SBFoo 47 42
> SBFoo 42 47
> Method 1 2
> Static 10 11
>
> Note that when replaying the SBFoo constructor is called twice. This is
> because this code does not attempt to track the object instances in any
> way... it just creates a new one each time. This obviously needs to be
> fixed, but that's independent of the function ID issue.
>
> hope you find that useful,
> pl
>

Definitely, thank you for taking the time to code up a prototype.

Cheers,
Jonas


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-09 Thread Pavel Labath via lldb-dev

On 08/01/2019 21:57, Jonas Devlieghere wrote:
Before I got around to coding this up I realized you can't take the 
address of constructors in C++, so the function address won't work as an 
identifier.




You gave up way too easily. :P

I realized that constructors are going to be tricky, but I didn't want 
to dive into those details until I knew if you liked the general idea. 
The most important thing to realize here is that for the identifier 
thingy to work, you don't actually need to use the address of that 
method/constructor as the identifier. It is sufficient to have something 
that can be deterministically computed from the function. Then you can 
use the address of *that* as the identifier.


I've created a very simple prototype, 
where I do just that. The way I handle constructors there is that I 
create a special class template (construct), whose instantiations are 
going to be unique for each constructor (I achieve that by making the 
class name and the constructor argument types the template parameters of 
that class). Then I can take the address of the static member 
function inside this class (construct<ClassName, ArgTypes...>::doit), and 
use *that* as the ID.


As a nice side-effect, the "doit" method actually does invoke the 
constructor in question, so I can also use that in the replay code to 
treat constructors like any other method that returns an object.
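In case it helps to visualize it, the shape of that template is roughly the
following (a minimal sketch; the names and the exact signature of doit are
assumptions, not copied from the prototype):

  template <typename Class, typename... Args>
  struct construct {
    static Class *doit(Args... args) { return new Class(args...); }
  };

  // &construct<SBFoo, int, int>::doit is a distinct static function per
  // constructor instantiation, so its address can serve as the ID, and
  // calling it re-runs the construction during replay.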


I also do the same thing for (non-static) member functions via the 
"invoke" template, because even though it is possible to take the 
address of those, it is very hard to do anything else with the retrieved 
pointer. So the effect of this is that in the rest of the code, I only have 
to work with free functions, as both constructors and member functions 
are converted into equivalent free functions. I haven't tried to handle 
destructors yet, but I don't think those should pose any problems that 
we haven't encountered already.
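The corresponding sketch for member functions (again with assumed names)
nests a non-type template parameter for the method pointer, so that a unique
free function is generated per method:

  template <typename Signature> struct invoke;

  template <typename Result, typename Class, typename... Args>
  struct invoke<Result (Class::*)(Args...)> {
    template <Result (Class::*Method)(Args...)> struct method {
      static Result doit(Class &self, Args... args) {
        return (self.*Method)(args...);
      }
    };
  };

  // &invoke<decltype(&SBFoo::Method)>::method<&SBFoo::Method>::doit is an
  // ordinary function pointer, unique to SBFoo::Method.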


The example also shows how you can use templates to automatically 
generate replay code for "simple" (i.e. those where you can 
(de)serialize each argument independently) functions, and then uses that 
to record/replay a very simple API.
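A sketch of what that generated replay step can look like for a free
function (Deserializer and its Deserialize<T>() method are assumed
interfaces here, not taken from the prototype):

  #include <tuple>

  struct Deserializer {
    template <typename T> T Deserialize(); // reads one value from the stream
  };

  template <typename Result, typename... Args>
  Result Replay(Deserializer &d, Result (*f)(Args...)) {
    // Braced initialization guarantees the arguments are read in order.
    std::tuple<Args...> args{d.Deserialize<Args>()...};
    return std::apply(f, args);
  }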


You can see it in action like this:
$ g++ a.cc  # compile
$ ./a.out 2>/tmp/recording # generate the recording
SBFoo 47 42
Method 1 2
Static 10 11
$ cat /tmp/recording
0  # ID of the constructor
47 # constructor arg 1
42 # constructor arg 2
0x7ffd74d9a0f7 # constructor result
1  # id of SBFoo::Method
0x7ffd74d9a0f7 # this
1  # arg 1
2  # arg 2
2  # id of SBFoo::Static
10 # arg 1
11 # arg 2
$ ./a.out 1 < /tmp/recording # replay the recording
SBFoo 47 42
SBFoo 42 47
Method 1 2
Static 10 11

Note that when replaying the SBFoo constructor is called twice. This is 
because this code does not attempt to track the object instances in any 
way... it just creates a new one each time. This obviously needs to be 
fixed, but that's independent of the function ID issue.


hope you find that useful,
pl


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-08 Thread Jonas Devlieghere via lldb-dev
Before I got around to coding this up I realized you can't take the address
of constructors in C++, so the function address won't work as an
identifier.

On Tue, Jan 8, 2019 at 9:28 AM Jonas Devlieghere 
wrote:

> On Tue, Jan 8, 2019 at 8:27 AM Frédéric Riss  wrote:
>
>>
>>
>> > On Jan 8, 2019, at 1:25 AM, Pavel Labath  wrote:
>> >
>> > On 07/01/2019 22:45, Frédéric Riss wrote:
>> >>> On Jan 7, 2019, at 11:31 AM, Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>> >>>
>> >>> On 07/01/2019 19:26, Jonas Devlieghere wrote:
>>  On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath wrote:
>> I've been thinking about how could this be done better, and the
>> best
>> (though not ideal) way I came up with is using the functions
>> address as
>> the key. That's guaranteed to be unique everywhere. Of course, you
>> cannot serialize that to a file, but since you already have a
>> central
>> place where you list all intercepted functions (to register their
>> replayers), that place can be also used to assign unique integer
>> IDs to
>> these functions. So then the idea would be that the SB_RECORD
>> macro
>> takes the address of the current function, that gets converted to
>> an ID
>> in the lookup table, and the ID gets serialized.
>>  It sounds like you would generate the indices at run-time. How would
>> that work with regard to the reverse mapping?
>> >>> In the current implementation, SBReplayer::Init contains a list of
>> all intercepted methods, right? Each of the SB_REGISTER calls takes two
>> arguments: The method name, and the replay implementation.
>> >>>
>> >>> I would change that so that this macro takes three arguments:
>> >>> - the function address (the "runtime" ID)
>> >>> - an integer (the "serialized" ID)
>> >>> - the replay implementation
>> >>>
>> >>> This creates a link between the function address and the serialized
>> ID. So when, during capture, a method calls SB_RECORD_ENTRY and passes in
>> the function address, that address can be looked up and translated to an ID
>> for serialization.
>> >>>
>> >>> The only thing that would need to be changed is to have
>> SBReplayer::Init execute during record too (which probably means it
>> shouldn't be called SBReplayer, but whatever..), so that the ID mapping is
>> also available when capturing.
>> >>>
>> >>> Does that make sense?
>> >> I think I understand what you’re explaining, and the mapping side of
>> things makes sense. But I’m concerned about the size and complexity of the
>> SB_RECORD macro that will need to be written. IIUC, those would need to
>> take the address of the current function and the prototype, which is a lot
>> of cumbersome text to type. It seems like having a specialized tool to
>> generate those would be nice, but once you have a tool you also don’t need
>> all this complexity, do you?
>> >> Fred
>> >
>> > Yes, if the tool generates the IDs for you and checks that the macro
>> invocations are correct, then you don't need the function prototype.
>> However, that tool also doesn't come for free: Somebody has to write it,
>> and it adds complexity in the form of an extra step in the build process.
>>
>> Definitely agreed, the complexity has to be somewhere.
>>
>> > My point is that this extended macro could provide all the
>> error-checking benefits of this tool. It's a tradeoff, of course, and the
>> cost here is a more complex macro invocation. I think the choice here is
>> mostly down to personal preference of whoever implements this. However, if
>> I was implementing this, I'd go for an extended macro, because I don't find
>> the extra macro complexity to be too much. For example, this should be the
>> macro invocation for SBData::SetDataFromDoubleArray:
>> >
>> > SB_RECORD(bool, SBData, SetDataFromDoubleArray, (double *, size_t),
>> >array, array_len);
>>
>> Yeah, this doesn’t seem so bad. For some reason I imagined it much more
>> verbose. Note that a verification tool that checks that every SB method is
>> instrumented correctly would still be nice (but it can come as a follow-up).
>>
>
> It sounds like this should work but we should try it out to be sure. I'll
> rework the prototype to use the function address and update the thread with
> my findings. I also like the idea of using templates to generate the
> parsers so I'll try that as well.
>
>
>> > It's a bit long, but it's not that hard to type, and all of this
>> information should be present on the previous line, where
>> SBData::SetDataFromDoubleArray is defined (I deliberately made the macro
>> argument order match the function definition syntax).
>> >
>> > And this approach can be further tweaked. For instance, if we're
>> willing to take the hit of having "weird" function definitions, then we can
>> avoid the repetition altogether, and make the macro define the function too:
>> >
>> > 

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-08 Thread Jonas Devlieghere via lldb-dev
On Tue, Jan 8, 2019 at 8:27 AM Frédéric Riss  wrote:

>
>
> > On Jan 8, 2019, at 1:25 AM, Pavel Labath  wrote:
> >
> > On 07/01/2019 22:45, Frédéric Riss wrote:
> >>> On Jan 7, 2019, at 11:31 AM, Pavel Labath via lldb-dev <
> lldb-dev@lists.llvm.org > wrote:
> >>>
> >>> On 07/01/2019 19:26, Jonas Devlieghere wrote:
>  On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath <pa...@labath.sk> wrote:
> I've been thinking about how could this be done better, and the
> best
> (though not ideal) way I came up with is using the functions
> address as
> the key. That's guaranteed to be unique everywhere. Of course, you
> cannot serialize that to a file, but since you already have a
> central
> place where you list all intercepted functions (to register their
> replayers), that place can be also used to assign unique integer
> IDs to
> these functions. So then the idea would be that the SB_RECORD macro
> takes the address of the current function, that gets converted to
> an ID
> in the lookup table, and the ID gets serialized.
>  It sounds like you would generate the indices at run-time. How would
> that work with regard to the reverse mapping?
> >>> In the current implementation, SBReplayer::Init contains a list of all
> intercepted methods, right? Each of the SB_REGISTER calls takes two
> arguments: The method name, and the replay implementation.
> >>>
> >>> I would change that so that this macro takes three arguments:
> >>> - the function address (the "runtime" ID)
> >>> - an integer (the "serialized" ID)
> >>> - the replay implementation
> >>>
> >>> This creates a link between the function address and the serialized
> ID. So when, during capture, a method calls SB_RECORD_ENTRY and passes in
> the function address, that address can be looked up and translated to an ID
> for serialization.
> >>>
> >>> The only thing that would need to be changed is to have
> SBReplayer::Init execute during record too (which probably means it
> shouldn't be called SBReplayer, but whatever..), so that the ID mapping is
> also available when capturing.
> >>>
> >>> Does that make sense?
> >> I think I understand what you’re explaining, and the mapping side of
> things makes sense. But I’m concerned about the size and complexity of the
> SB_RECORD macro that will need to be written. IIUC, those would need to
> take the address of the current function and the prototype, which is a lot
> of cumbersome text to type. It seems like having a specialized tool to
> generate those would be nice, but once you have a tool you also don’t need
> all this complexity, do you?
> >> Fred
> >
> > Yes, if the tool generates the IDs for you and checks that the macro
> invocations are correct, then you don't need the function prototype.
> However, that tool also doesn't come for free: Somebody has to write it,
> and it adds complexity in the form of an extra step in the build process.
>
> Definitely agreed, the complexity has to be somewhere.
>
> > My point is that this extended macro could provide all the
> error-checking benefits of this tool. It's a tradeoff, of course, and the
> cost here is a more complex macro invocation. I think the choice here is
> mostly down to personal preference of whoever implements this. However, if
> I was implementing this, I'd go for an extended macro, because I don't find
> the extra macro complexity to be too much. For example, this should be the
> macro invocation for SBData::SetDataFromDoubleArray:
> >
> > SB_RECORD(bool, SBData, SetDataFromDoubleArray, (double *, size_t),
> >array, array_len);
>
> Yeah, this doesn’t seem so bad. For some reason I imagined it much more
> verbose. Note that a verification tool that checks that every SB method is
> instrumented correctly would still be nice (but it can come as a follow-up).
>

It sounds like this should work but we should try it out to be sure. I'll
rework the prototype to use the function address and update the thread with
my findings. I also like the idea of using templates to generate the
parsers so I'll try that as well.


> > It's a bit long, but it's not that hard to type, and all of this
> information should be present on the previous line, where
> SBData::SetDataFromDoubleArray is defined (I deliberately made the macro
> argument order match the function definition syntax).
> >
> > And this approach can be further tweaked. For instance, if we're willing
> to take the hit of having "weird" function definitions, then we can avoid
> the repetition altogether, and make the macro define the function too:
> >
> > SB_METHOD2(bool, SBData, SetDataFromDoubleArray, double *, array,
> >size_t, array_len, {
> >  // Method body
> > })
>
> I personally don’t like this.
>
> Fred
>
> > This would also enable you to automatically capture method return value
> for the "object" results.
> >
> > pl
>
>

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-08 Thread Frédéric Riss via lldb-dev


> On Jan 8, 2019, at 1:25 AM, Pavel Labath  wrote:
> 
> On 07/01/2019 22:45, Frédéric Riss wrote:
>>> On Jan 7, 2019, at 11:31 AM, Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>> 
>>> On 07/01/2019 19:26, Jonas Devlieghere wrote:
 On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath wrote:
I've been thinking about how could this be done better, and the best
(though not ideal) way I came up with is using the functions address as
the key. That's guaranteed to be unique everywhere. Of course, you
cannot serialize that to a file, but since you already have a central
place where you list all intercepted functions (to register their
replayers), that place can be also used to assign unique integer IDs to
these functions. So then the idea would be that the SB_RECORD macro
takes the address of the current function, that gets converted to an ID
in the lookup table, and the ID gets serialized.
 It sounds like you would generate the indices at run-time. How would that 
 work with regard to the reverse mapping?
>>> In the current implementation, SBReplayer::Init contains a list of all 
>>> intercepted methods, right? Each of the SB_REGISTER calls takes two 
>>> arguments: The method name, and the replay implementation.
>>> 
>>> I would change that so that this macro takes three arguments:
>>> - the function address (the "runtime" ID)
>>> - an integer (the "serialized" ID)
>>> - the replay implementation
>>> 
>>> This creates a link between the function address and the serialized ID. So 
>>> when, during capture, a method calls SB_RECORD_ENTRY and passes in the 
>>> function address, that address can be looked up and translated to an ID for 
>>> serialization.
>>> 
>>> The only thing that would need to be changed is to have SBReplayer::Init 
>>> execute during record too (which probably means it shouldn't be called 
>>> SBReplayer, but whatever..), so that the ID mapping is also available when 
>>> capturing.
>>> 
>>> Does that make sense?
>> I think I understand what you’re explaining, and the mapping side of things 
>> makes sense. But I’m concerned about the size and complexity of the 
>> SB_RECORD macro that will need to be written. IIUC, those would need to take 
>> the address of the current function and the prototype, which is a lot of 
>> cumbersome text to type. It seems like having a specialized tool to generate 
>> those would be nice, but once you have a tool you also don’t need all this 
>> complexity, do you?
>> Fred
> 
> Yes, if the tool generates the IDs for you and checks that the macro 
> invocations are correct, then you don't need the function prototype. However, 
> that tool also doesn't come for free: Somebody has to write it, and it adds 
> complexity in the form of an extra step in the build process.

Definitely agreed, the complexity has to be somewhere.

> My point is that this extended macro could provide all the error-checking 
> benefits of this tool. It's a tradeoff, of course, and the cost here is a 
> more complex macro invocation. I think the choice here is mostly down to 
> personal preference of whoever implements this. However, if I was 
> implementing this, I'd go for an extended macro, because I don't find the 
> extra macro complexity to be too much. For example, this should be the macro 
> invocation for SBData::SetDataFromDoubleArray:
> 
> SB_RECORD(bool, SBData, SetDataFromDoubleArray, (double *, size_t),
>array, array_len);

Yeah, this doesn’t seem so bad. For some reason I imagined it much more 
verbose. Note that a verification tool that checks that every SB method is 
instrumented correctly would still be nice (but it can come as a follow-up).

> It's a bit long, but it's not that hard to type, and all of this information 
> should be present on the previous line, where SBData::SetDataFromDoubleArray 
> is defined (I deliberately made the macro argument order match the function 
> definition syntax).
> 
> And this approach can be further tweaked. For instance, if we're willing to 
> take the hit of having "weird" function definitions, then we can avoid the 
> repetition altogether, and make the macro define the function too:
> 
> SB_METHOD2(bool, SBData, SetDataFromDoubleArray, double *, array,
>size_t, array_len, {
>  // Method body
> })

I personally don’t like this.

Fred

> This would also enable you to automatically capture method return value for 
> the "object" results.
> 
> pl



Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-08 Thread Pavel Labath via lldb-dev

On 07/01/2019 22:13, Jonas Devlieghere wrote:



On Mon, Jan 7, 2019 at 3:52 AM Tamas Berghammer wrote:


One problem is when the behavior of LLDB is not deterministic for
whatever reason (e.g. multi-threading, unordered maps, etc.). Let's
take SBModule::FindSymbols(), which returns an SBSymbolContextList
without any specific order (haven't checked the implementation but I
would consider a random order to be valid). If a user calls this
function, then iterates through the elements to find an index `I`,
calls `GetContextAtIndex(I)` and passes the result into a subsequent
function, then what will we do? Will we capture what
`GetContextAtIndex(I)` returned in the trace and use that value, or
will we capture the value of `I`, call `GetContextAtIndex(I)` during
reproduction and use that value? Doing the first would be correct in
this case but would mean we don't call `GetContextAtIndex(I)`, while
doing the second would mean we call `GetContextAtIndex(I)` with
a wrong index if the order in SBSymbolContextList is
non-deterministic. In this case, as we know that GetContextAtIndex is
just an accessor into a vector, the first option is the correct one,
but I can imagine cases where this is not the case (e.g. if
GetContextAtIndex had some useful side effect).


Indeed, in this scenario we would replay the call with the same `I` 
resulting in an incorrect value. I think the only solution is fixing the 
non-determinism. This should be straightforward for lists (some kind of 
sensible ordering), but maybe there are other issues I'm not aware of.


For this, I think we should adopt the same rules that llvm has for 
nondeterminism: returning entries in an unspecified order is fine as 
long as that order is always the same for the given set of inputs. So, 
using unordered maps is fine as long as it doesn't use any 
runtime-dependent values (pointers), or this nondeterminism is removed 
(explicit sort) before letting the values out.
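Concretely, that just means sorting the results on a stable key before they
cross the SB boundary; an illustrative sketch (SymbolInfo is a stand-in
type, not an actual lldb class):

  #include <algorithm>
  #include <string>
  #include <tuple>
  #include <vector>

  struct SymbolInfo {
    std::string name;
    unsigned long long file_addr;
  };

  void Canonicalize(std::vector<SymbolInfo> &results) {
    // Impose a deterministic order that does not depend on pointer values.
    std::sort(results.begin(), results.end(),
              [](const SymbolInfo &a, const SymbolInfo &b) {
                return std::tie(a.file_addr, a.name) <
                       std::tie(b.file_addr, b.name);
              });
  }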


This way, when you're debugging something, and your replay doesn't work 
because of nondeterminism, you fix the nondeterministic bug. It may not 
have been the bug you set out to fix, but you still reduce the overall 
number of bugs.




Another interesting question is what to do with functions taking raw
binary data in the form of a pointer + size (e.g. SBData::SetData).
I think we will have to annotate these APIs to make the reproducer
system aware of the amount of data they have to capture and then
allocate these buffers with the correct lifetime during replay. I am
not sure what would be the best way to attach these annotations but
I think we might need a fairly generic framework because I won't be
surprised if there are more situations where we have to add
annotations to the API. A slightly related question is: if a function
returns a pointer to a raw buffer (e.g. const char* or void*), then
do we have to capture its contents or the pointer itself, and in
either case, what is the lifetime of the returned buffer (e.g.
SBError::GetCString() returns a buffer that goes out of scope when
the SBError goes out of scope)?


This is a good concern and not something I had a good solution for at this 
point. For const char* strings we work around this by serializing the 
actual string. Obviously that won't always work. Also we have the void* 
batons for callbacks, which is another tricky thing that wouldn't be 
supported. I'm wondering if we can get away with ignoring these at first 
(maybe printing something in the replay logic that warns the user that 
the reproducer contains an unsupported function?).


Overall, I think we should leave some space to enable hand-written 
record/replay logic for the tricky cases. Batons will be one of those 
cases, but I don't think they're unsolvable. The only thing we can do 
with a user-specified void* baton is pass it back to the user callback. 
But we're not going to do that since we don't have the code for the 
callback anyway. Nor do we need to, since we're not interested in 
what happens outside of the SB boundary.


For this, I think the right solution is to replay the *effects* of the 
call to the user callback by re-executing the SB calls it made, which 
can be recorded as usual, when they cross the SB boundary.




Additionally, I am pretty sure we have at least some functions
returning various indices that require remapping other than the
pointers, either because they are just indexing into a data structure
with undefined internal order or because they reference some other
resource. Just by randomly browsing some of the SB APIs I found for
example SBHostOS::ThreadCreate, which returns the pid/tid for the
newly created thread, which will have to be remapped (it also takes a
function as an argument, which is a problem as well). Because of this
I am not sure if we can get 

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-08 Thread Pavel Labath via lldb-dev

On 07/01/2019 22:45, Frédéric Riss wrote:



On Jan 7, 2019, at 11:31 AM, Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org> wrote:


On 07/01/2019 19:26, Jonas Devlieghere wrote:
On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath wrote:

   I've been thinking about how could this be done better, and the best
   (though not ideal) way I came up with is using the functions 
address as

   the key. That's guaranteed to be unique everywhere. Of course, you
   cannot serialize that to a file, but since you already have a central
   place where you list all intercepted functions (to register their
   replayers), that place can be also used to assign unique integer 
IDs to

   these functions. So then the idea would be that the SB_RECORD macro
   takes the address of the current function, that gets converted to 
an ID

   in the lookup table, and the ID gets serialized.
It sounds like you would generate the indices at run-time. How would 
that work with regard to the reverse mapping?
In the current implementation, SBReplayer::Init contains a list of all 
intercepted methods, right? Each of the SB_REGISTER calls takes two 
arguments: The method name, and the replay implementation.


I would change that so that this macro takes three arguments:
- the function address (the "runtime" ID)
- an integer (the "serialized" ID)
- the replay implementation

This creates a link between the function address and the serialized 
ID. So when, during capture, a method calls SB_RECORD_ENTRY and passes 
in the function address, that address can be looked up and translated 
to an ID for serialization.


The only thing that would need to be changed is to have 
SBReplayer::Init execute during record too (which probably means it 
shouldn't be called SBReplayer, but whatever..), so that the ID 
mapping is also available when capturing.


Does that make sense?


I think I understand what you’re explaining, and the mapping side of 
things makes sense. But I’m concerned about the size and complexity of 
the SB_RECORD macro that will need to be written. IIUC, those would need 
to take the address of the current function and the prototype, which is 
a lot of cumbersome text to type. It seems like having a specialized 
tool to generate those would be nice, but once you have a tool you also 
don’t need all this complexity, do you?


Fred



Yes, if the tool generates the IDs for you and checks that the macro 
invocations are correct, then you don't need the function prototype. 
However, that tool also doesn't come for free: Somebody has to write it, 
and it adds complexity in the form of an extra step in the build process.


My point is that this extended macro could provide all the 
error-checking benefits of this tool. It's a tradeoff, of course, and 
the cost here is a more complex macro invocation. I think the choice 
here is mostly down to personal preference of whoever implements this. 
However, if I was implementing this, I'd go for an extended macro, 
because I don't find the extra macro complexity to be too much. For 
example, this should be the macro invocation for 
SBData::SetDataFromDoubleArray:


SB_RECORD(bool, SBData, SetDataFromDoubleArray, (double *, size_t),
array, array_len);

It's a bit long, but it's not that hard to type, and all of this 
information should be present on the previous line, where 
SBData::SetDataFromDoubleArray is defined (I deliberately made the macro 
argument order match the function definition syntax).
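To make the error-checking point concrete, one hypothetical expansion of
such a macro (GetRecorder()/Record() are made-up names, not part of any
patch) could simply form the member-function pointer from the given
prototype, which forces the compiler to check it against the real method:

  #define SB_RECORD(Result, Class, Method, Signature, ...)                    \
    do {                                                                      \
      Result (Class::*sb_record_method) Signature = &Class::Method;           \
      GetRecorder().Record(sb_record_method, this, __VA_ARGS__);              \
    } while (0)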


And this approach can be further tweaked. For instance, if we're willing 
to take the hit of having "weird" function definitions, then we can 
avoid the repetition altogether, and make the macro define the function too:


SB_METHOD2(bool, SBData, SetDataFromDoubleArray, double *, array,
size_t, array_len, {
  // Method body
})

This would also enable you to automatically capture method return value 
for the "object" results.


pl


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Frédéric Riss via lldb-dev


> On Jan 7, 2019, at 11:31 AM, Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org> wrote:
> 
> On 07/01/2019 19:26, Jonas Devlieghere wrote:
>> On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath wrote:
>>I've been thinking about how could this be done better, and the best
>>(though not ideal) way I came up with is using the functions address as
>>the key. That's guaranteed to be unique everywhere. Of course, you
>>cannot serialize that to a file, but since you already have a central
>>place where you list all intercepted functions (to register their
>>replayers), that place can be also used to assign unique integer IDs to
>>these functions. So then the idea would be that the SB_RECORD macro
>>takes the address of the current function, that gets converted to an ID
>>in the lookup table, and the ID gets serialized.
>> It sounds like you would generate the indices at run-time. How would that 
>> work with regard to the reverse mapping?
> In the current implementation, SBReplayer::Init contains a list of all 
> intercepted methods, right? Each of the SB_REGISTER calls takes two 
> arguments: The method name, and the replay implementation.
> 
> I would change that so that this macro takes three arguments:
> - the function address (the "runtime" ID)
> - an integer (the "serialized" ID)
> - the replay implementation
> 
> This creates a link between the function address and the serialized ID. So 
> when, during capture, a method calls SB_RECORD_ENTRY and passes in the 
> function address, that address can be looked up and translated to an ID for 
> serialization.
> 
> The only thing that would need to be changed is to have SBReplayer::Init 
> execute during record too (which probably means it shouldn't be called 
> SBReplayer, but whatever..), so that the ID mapping is also available when 
> capturing.
> 
> Does that make sense?

I think I understand what you’re explaining, and the mapping side of things 
makes sense. But I’m concerned about the size and complexity of the SB_RECORD 
macro that will need to be written. IIUC, those would need to take the address 
of the current function and the prototype, which is a lot of cumbersome text to 
type. It seems like having a specialized tool to generate those would be nice, 
but once you have a tool you also don’t need all this complexity, do you?

Fred

>>The part that bugs me about this is that taking the address of an
>>overloaded function is extremely tedious (you have to write something
>>like static_cast<Ret (Class::*)(Args)>(&Class::Bar)). That would mean all
>>of these things would have to be passed to the RECORD macro. OTOH, the
>>upshot of this would be that the macro would now have sufficient
>>information to perform pretty decent error checking on its invocation.
>>Another nice thing about this could be that once you already have a prototype
>>and an address of the function, it should be possible (with sufficient
>>template-fu) to synthesize replay code for the function automatically,
>>at least in the simple cases, which would avoid the repetitiveness of
>>the current replay code. Together, this might obviate the need for any
>>clang plugins or other funny build steps.
>> See my previous question. I see how the signature would help with decoding 
>> but I'm still missing how you'd get the mapping.
>>The second thing I noticed is the usage of pointers for identifying
>>object. A pointer is good for that but only while the object it points
>>to is alive. Once the object is gone, the pointer can (and most likely
>>will) be reused. So, it sounds to me like you also need to track the
>>lifetime of these objects. That may be as simple as intercepting
>>constructor/destructor calls, but I haven't seen anything like that yet
>>(though I haven't looked at all details of the patch).
>> This shouldn't be a problem. When a new object is created it will be 
>> recorded in the table with a new identifier.
> Ok, sounds good.
> 
>>Tying into that is the recording of return values. It looks like the
>>current RECORD_RETURN macro will record the address of the temporary
>>object in the frame of the current function. However, that address will
>>become invalid as soon as the function returns as the result object
>>will
>>be copied into a location specified by the caller as a part of the
>>return processing. Are you handling this in any way?
>> We capture the temporary and the call to the copy-assignment constructor. 
>> This is not super efficient but it's the best we can do.
> 
> Ok, cool. I must have missed that part in the code.
> 
>>The final thing, which I noticed is the lack of any sign of threading
>>support. I'm not too worried about that, as that sounds like something
>>that could be fitted into the existing framework incrementally, but it
>>is something worth keeping in mind, as you're going 

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Jonas Devlieghere via lldb-dev
On Mon, Jan 7, 2019 at 3:52 AM Tamas Berghammer 
wrote:

> Thanks Pavel for looping me in. I haven't looked into the actual
> implementation of the prototype yet but reading your description I have
> some concerns regarding the amount of data you capture, as I feel it isn't
> sufficient to reproduce a set of use cases.
>

Thanks Tamas!


> One problem is when the behavior of LLDB is not deterministic for whatever
> reason (e.g. multi-threading, unordered maps, etc.). Let's take
> SBModule::FindSymbols(), which returns an SBSymbolContextList without any
> specific order (haven't checked the implementation but I would consider a
> random order to be valid). If a user calls this function, then iterates
> through the elements to find an index `I`, calls `GetContextAtIndex(I)` and
> passes the result into a subsequent function, then what will we do? Will we
> capture what `GetContextAtIndex(I)` returned in the trace and use that
> value, or will we capture the value of `I`, call `GetContextAtIndex(I)`
> during reproduction and use that value? Doing the first would be correct in
> this case but would mean we don't call `GetContextAtIndex(I)`, while doing
> the second would mean we call `GetContextAtIndex(I)` with a wrong
> index if the order in SBSymbolContextList is non-deterministic. In this
> case, as we know that GetContextAtIndex is just an accessor into a vector,
> the first option is the correct one, but I can imagine cases where this is
> not the case (e.g. if GetContextAtIndex had some useful side effect).
>

Indeed, in this scenario we would replay the call with the same `I`
resulting in an incorrect value. I think the only solution is fixing the
non-determinism. This should be straightforward for lists (some kind of
sensible ordering), but maybe there are other issues I'm not aware of.


> Another interesting question is what to do with functions taking raw binary
> data in the form of a pointer + size (e.g. SBData::SetData). I think we
> will have to annotate these APIs to make the reproducer system aware of the
> amount of data they have to capture and then allocate these buffers with
> the correct lifetime during replay. I am not sure what would be the best
> way to attach these annotations but I think we might need a fairly generic
> framework because I won't be surprised if there are more situations where we
> have to add annotations to the API. A slightly related question is: if a
> function returns a pointer to a raw buffer (e.g. const char* or void*), then
> do we have to capture its contents or the pointer itself, and in either
> case, what is the lifetime of the returned buffer (e.g.
> SBError::GetCString() returns a buffer that goes out of scope when the
> SBError goes out of scope)?
>

This is a good concern and not something I had a good solution for at this
point. For const char* strings we work around this by serializing the actual
string. Obviously that won't always work. Also we have the void* batons for
callbacks, which is another tricky thing that wouldn't be supported. I'm
wondering if we can get away with ignoring these at first (maybe printing
something in the replay logic that warns the user that the reproducer
contains an unsupported function?).
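For the pointer + size case specifically, the annotation could be as small
as a wrapper that tells the recorder how many bytes to copy; a sketch, with
made-up names:

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct RecordedBuffer {
    RecordedBuffer(const void *data, std::size_t size)
        : bytes(static_cast<const std::uint8_t *>(data),
                static_cast<const std::uint8_t *>(data) + size) {}
    const void *data() const { return bytes.data(); }
    std::size_t size() const { return bytes.size(); }
    // Owning copy of the caller's memory, so replay can hand out a pointer
    // with a well-defined lifetime.
    std::vector<std::uint8_t> bytes;
  };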


> Additionally, I am pretty sure we have at least some functions returning
> various indices that require remapping other than the pointers, either
> because they are just indexing into a data structure with undefined
> internal order or because they reference some other resource. Just by randomly
> browsing some of the SB APIs I found for example SBHostOS::ThreadCreate,
> which returns the pid/tid for the newly created thread, which will have to be
> remapped (it also takes a function as an argument, which is a problem as
> well). Because of this I am not sure if we can get away with an
> automatically generated set of API descriptions instead of writing one with
> explicit annotations for the various remapping rules.
>

Fixing the non-determinism should also address this, right?


> If there is interest I can try to take a deeper look into the topic
> sometime later but I hope that those initial thoughts are useful.
>

Thank you. I'll start by incorporating the feedback and ping the thread
when the patch is ready for another look.


> Tamas
>
> On Mon, Jan 7, 2019 at 9:40 AM Pavel Labath  wrote:
>
>> On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:
>> > Hi Everyone,
>> >
>> > In September I sent out an RFC [1] about adding reproducers to LLDB.
>> > Over the
>> > past few months, I landed the reproducer framework, support for the GDB
>> > remote
>> > protocol and a bunch of preparatory changes. There's still an open code
>> > review
>> > [2] for dealing with files, but that one is currently blocked by a
>> change to
>> > the VFS in LLVM [3].
>> >
>> > The next big piece of work is supporting user commands (e.g. in the
>> > driver) and
>> > SB API calls. Originally I expected these two things to be separate,
>> but
>> > Pavel
>> made a good case [4] that they're actually very similar.

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Pavel Labath via lldb-dev

On 07/01/2019 19:26, Jonas Devlieghere wrote:



On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath wrote:

I've been thinking about how this could be done better, and the best
(though not ideal) way I came up with is using the function's address as
the key. That's guaranteed to be unique everywhere. Of course, you
cannot serialize that to a file, but since you already have a central
place where you list all intercepted functions (to register their
replayers), that place can also be used to assign unique integer IDs to
these functions. So then the idea would be that the SB_RECORD macro
takes the address of the current function, that gets converted to an ID
in the lookup table, and the ID gets serialized.


It sounds like you would generate the indices at run-time. How would that 
work with regard to the reverse mapping?
In the current implementation, SBReplayer::Init contains a list of all 
intercepted methods, right? Each of the SB_REGISTER calls takes two 
arguments: the method name and the replay implementation.


I would change that so that this macro takes three arguments:
- the function address (the "runtime" ID)
- an integer (the "serialized" ID)
- the replay implementation

This creates a link between the function address and the serialized ID. 
So when, during capture, a method calls SB_RECORD_ENTRY and passes in 
the function address, that address can be looked up and translated to an 
ID for serialization.


The only thing that would need to be changed is to have SBReplayer::Init 
execute during record too (which probably means it shouldn't be called 
SBReplayer, but whatever..), so that the ID mapping is also available 
when capturing.
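
A very rough sketch of such a registry, usable from both the capture and the
replay side (SBRegistry, Replayer, etc. are hypothetical names, and for
brevity only free functions are handled; the real SB methods would
additionally need member-function support):

#include <cstdint>
#include <functional>
#include <map>

// Links a function's address (the runtime ID), a stable integer (the
// serialized ID), and the code that replays a recorded call to it.
class SBRegistry {
public:
  using Replayer = std::function<void()>; // would take a deserializer

  template <typename Result, typename... Args>
  void Register(Result (*Fn)(Args...), uint32_t SerializedID, Replayer R) {
    // Casting a function pointer to an integer key works on common platforms.
    RuntimeToSerialized[reinterpret_cast<uintptr_t>(Fn)] = SerializedID;
    Replayers[SerializedID] = std::move(R);
  }

  // Capture side: translate the address passed by the RECORD macro into the
  // ID that gets written to the serialized stream.
  template <typename Result, typename... Args>
  uint32_t GetSerializedID(Result (*Fn)(Args...)) const {
    return RuntimeToSerialized.at(reinterpret_cast<uintptr_t>(Fn));
  }

  // Replay side: dispatch on the ID read back from the stream.
  void Replay(uint32_t SerializedID) const { Replayers.at(SerializedID)(); }

private:
  std::map<uintptr_t, uint32_t> RuntimeToSerialized;
  std::map<uint32_t, Replayer> Replayers;
};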


Does that make sense?



The part that bugs me about this is that taking the address of an
overloaded function is extremely tedious (you have to write something
like static_cast<void (SBFoo::*)(int)>(&SBFoo::Bar)). That would mean all
of these things would have to be passed to the RECORD macro. OTOH, the
upshot of this would be that the macro would now have sufficient
information to perform pretty decent error checking on its invocation.
Another nice thing about this could be that once you already have a prototype
and an address of the function, it should be possible (with sufficient
template-fu) to synthesize replay code for the function automatically,
at least in the simple cases, which would avoid the repetitiveness of
the current replay code. Together, this might obviate the need for any
clang plugins or other funny build steps.
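
To make the overload verbosity and the "template-fu" concrete, a toy sketch
(SBFoo, Bar, Deserialize and MakeReplayer are made-up names; a real
deserializer would read the recorded values and resolve object indices rather
than default-constructing):

#include <iostream>
#include <tuple>
#include <utility>

// Stand-in for reading a value back from the serialized stream.
template <typename T> T Deserialize() { return T(); }

// Given a (possibly static_cast-disambiguated) pointer to member function,
// synthesize a replayer that reads the object and arguments back and calls it.
template <typename Class, typename Result, typename... Args>
auto MakeReplayer(Result (Class::*Method)(Args...)) {
  return [Method]() {
    Class Obj = Deserialize<Class>(); // really: look up the object by index
    std::tuple<Args...> ArgValues{Deserialize<Args>()...};
    std::apply(
        [&](auto &&... A) { (Obj.*Method)(std::forward<decltype(A)>(A)...); },
        ArgValues);
  };
}

struct SBFoo {
  void Bar(int X) { std::cout << "replayed Bar(int): " << X << "\n"; }
  void Bar(double X) { std::cout << "replayed Bar(double): " << X << "\n"; }
};

int main() {
  // Disambiguating the overload is as tedious as described above:
  auto Replay = MakeReplayer(static_cast<void (SBFoo::*)(int)>(&SBFoo::Bar));
  Replay(); // prints "replayed Bar(int): 0"
}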


See my previous question. I see how the signature would help with 
decoding, but I'm still missing how you'd get the mapping.


The second thing I noticed is the usage of pointers for identifying
objects. A pointer is good for that but only while the object it points
to is alive. Once the object is gone, the pointer can (and most likely
will) be reused. So, it sounds to me like you also need to track the
lifetime of these objects. That may be as simple as intercepting
constructor/destructor calls, but I haven't seen anything like that yet
(though I haven't looked at all details of the patch).


This shouldn't be a problem. When a new object is created it will be 
recorded in the table with a new identifier.

Ok, sounds good.



Tying into that is the recording of return values. It looks like the
current RECORD_RETURN macro will record the address of the temporary
object in the frame of the current function. However, that address will
become invalid as soon as the function returns as the result object
will
be copied into a location specified by the caller as a part of the
return processing. Are you handling this in any way?


We capture the temporary and the call to the copy constructor (or 
copy-assignment operator). This is not super efficient, but it's the best 
we can do.
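
A rough sketch of the capture-side bookkeeping this implies (ObjectTable is a
hypothetical name, not the actual patch): every constructor, including the
instrumented copy, hands out a fresh index, so reused addresses and copied
temporaries stay unambiguous:

#include <cstdint>
#include <iostream>
#include <map>

class ObjectTable {
public:
  uint32_t RegisterNew(const void *Obj) {
    uint32_t Index = NextIndex++;
    ToIndex[Obj] = Index; // overwrites any stale entry for a reused address
    return Index;
  }

  // Called from the instrumented copy constructor: the destination becomes a
  // new object, and the trace records which existing index it was copied from.
  uint32_t RegisterCopy(const void *Dst, const void *Src) {
    uint32_t DstIndex = RegisterNew(Dst);
    std::cout << "record: object " << DstIndex << " copied from object "
              << GetIndex(Src) << "\n"; // really: write to the serialized file
    return DstIndex;
  }

  uint32_t GetIndex(const void *Obj) const { return ToIndex.at(Obj); }

private:
  std::map<const void *, uint32_t> ToIndex;
  uint32_t NextIndex = 0;
};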


Ok, cool. I must have missed that part in the code.



The final thing, which I noticed is the lack of any sign of threading
support. I'm not too worried about that, as that sounds like something
that could be fitted into the existing framework incrementally, but it
is something worth keeping in mind, as you're going to run into that
pretty soon.


Yup, I've intentionally ignored this for now.


Awesome.


Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Jonas Devlieghere via lldb-dev
On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath  wrote:

> On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:
> > Hi Everyone,
> >
> > In September I sent out an RFC [1] about adding reproducers to LLDB.
> > Over the
> > past few months, I landed the reproducer framework, support for the GDB
> > remote
> > protocol and a bunch of preparatory changes. There's still an open code
> > review
> > [2] for dealing with files, but that one is currently blocked by a
> change to
> > the VFS in LLVM [3].
> >
> > The next big piece of work is supporting user commands (e.g. in the
> > driver) and
> > SB API calls. Originally I expected these two things to be separate, but
> > Pavel
> > made a good case [4] that they're actually very similar.
> >
> > I created a prototype of how I envision this to work. As usual, we can
> > differentiate between capture and replay.
> >
> > ## SB API Capture
> >
> > When capturing a reproducer, every SB function/method is instrumented
> > using a
> > macro at function entry. The added code tracks the function identifier
> > (currently we use its name with __PRETTY_FUNCTION__) and its arguments.
> >
> > It also tracks when a function crosses the boundary between internal and
> > external use. For example, when someone (be it the driver, the python
> > binding
> > or the RPC server) calls SBFoo, and in its implementation SBFoo calls
> > SBBar, we
> > don't need to record SBBar. When invoking SBFoo during replay, it will
> > itself
> > call SBBar.
> >
> > When a boundary is crossed, the function name and arguments are
> > serialized to a
> > file. This is trivial for basic types. For objects, we maintain a table
> that
> > maps pointer values to indices and serialize the index.
> >
> > To keep our table consistent, we also need to track returns for functions
> > that
> > return an object by value. We have a separate macro that wraps the
> returned
> > object.
> >
> > The index is sufficient because every object that is passed to a
> > function has
> > crossed the boundary and hence was recorded. During replay (see below)
> > we map
> > the index to an address again which ensures consistency.
> >
> > ## SB API Replay
> >
> > To replay the SB function calls we need a way to invoke the corresponding
> > function from its serialized identifier. For every SB function, there's a
> > counterpart that deserializes its arguments and invokes the function.
> These
> > functions are added to the map and are called by the replay logic.
> >
> > Replaying is just a matter of looping over the function identifiers in the
> > serialized file, dispatching the right deserialization function, until
> > no more
> > data is available.
> >
> > The deserialization function for constructors or functions that return
> > by value
> > contains additional logic for dealing with the aforementioned indices.
> The
> > resulting objects are added to a table (similar to the one described
> > earlier)
> > that maps indices to pointers. Whenever an object is passed as an
> > argument, the
> > index is used to get the actual object from the table.
> >
> > ## Tool
> >
> > Even when using macros, adding the necessary capturing and replay code is
> > tedious and scales poorly. For the prototype, we did this by hand, but we
> > propose a new clang-based tool to streamline the process.
> >
> > For the capture code, the tool would validate that the macro matches the
> > function signature, suggesting a fixit if the macros are incorrect or
> > missing.
> > Compared to generating the macros altogether, it has the advantage that
> we
> > don't have "configured" files that are harder to debug (without faking
> line
> > numbers etc).
> >
> > The deserialization code would be fully generated. As shown in the
> prototype
> > there are a few different cases, depending on whether we have to account
> for
> > objects or not.
> >
> > ## Prototype Code
> >
> > I created a differential [5] on Phabricator with the prototype. It
> > contains the
> > necessary methods to re-run the gdb remote (reproducer) lit test.
> >
> > ## Feedback
> >
> > Before moving forward I'd like to get the community's input. What do you
> > think
> > about this approach? Do you have concerns or can we be smarter
> > somewhere? Any
> > feedback would be greatly appreciated!
> >
> > Thanks,
> > Jonas
> >
> > [1] http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html
> > [2] https://reviews.llvm.org/D54617
> > [3] https://reviews.llvm.org/D54277
> > [4] https://reviews.llvm.org/D55582
> > [5] https://reviews.llvm.org/D56322
> >
> >
>
>
Thanks for the feedback Pavel!


> [Adding Tamas for his experience with recording and replaying APIs.]
>
>
> Thank you for sharing the prototype Jonas. It looks very interesting,
> but there are a couple of things that worry me about it.
>
> The first one is the usage of __PRETTY_FUNCTION__.

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Tamas Berghammer via lldb-dev
Thanks, Pavel, for looping me in. I haven't looked into the actual
implementation of the prototype yet, but reading your description I have
some concerns regarding the amount of data you capture, as I feel it isn't
sufficient to reproduce some use cases.

One problem is when the behavior of LLDB is not deterministic, for whatever
reason (e.g. multithreading, unordered maps, etc.). Let's take
SBModule::FindSymbols(), which returns an SBSymbolContextList without any
specific order (I haven't checked the implementation, but I would consider a
random order to be valid). If a user calls this function, then iterates
through the elements to find an index `I`, calls `GetContextAtIndex(I)`, and
passes the result into a subsequent function, what will we do? Will we
capture what `GetContextAtIndex(I)` returned in the trace and use that
value, or will we capture the value of `I`, call `GetContextAtIndex(I)`
during reproduction, and use that value? Doing the first would be correct in
this case but would mean we don't call `GetContextAtIndex(I)`, while doing
the second would mean we call `GetContextAtIndex(I)` with a wrong index if
the order in SBSymbolContextList is non-deterministic. In this case, as we
know that GetContextAtIndex is just an accessor into a vector, the first
option is the correct one, but I can imagine cases where this is not true
(e.g. if GetContextAtIndex had some useful side effect).

Another interesting question is what to do with functions taking raw binary
data in the form of a pointer + size (e.g. SBData::SetData). I think we
will have to annotate these APIs to make the reproducer system aware of the
amount of data they have to capture, and then allocate these buffers with
the correct lifetime during replay. I am not sure what would be the best
way to attach these annotations, but I think we might need a fairly generic
framework, because I won't be surprised if there are more situations where
we have to add annotations to the API. A slightly related question is: if a
function returns a pointer to a raw buffer (e.g. const char* or void*), do
we have to capture its contents or just the pointer, and in either case
what is the lifetime of the buffer returned (e.g. SBError::GetCString()
returns a buffer that goes out of scope when the SBError goes out of
scope)?

Additionally, I am pretty sure we have at least some functions returning
various indices that require remapping besides the pointers, either because
they are just indexing into a data structure with undefined internal order
or because they reference some other resource. Just by randomly browsing
some of the SB APIs I found, for example, SBHostOS::ThreadCreate, which
returns the pid/tid for the newly created thread, which will have to be
remapped (it also takes a function as an argument, which is a problem as
well). Because of this I am not sure we can get away with an automatically
generated set of API descriptions instead of writing one with explicit
annotations for the various remapping rules.

If there is interest I can try to take a deeper look into the topic
sometime later but I hope that those initial thoughts are useful.

Tamas

On Mon, Jan 7, 2019 at 9:40 AM Pavel Labath  wrote:

> On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:
> > Hi Everyone,
> >
> > In September I sent out an RFC [1] about adding reproducers to LLDB.
> > Over the
> > past few months, I landed the reproducer framework, support for the GDB
> > remote
> > protocol and a bunch of preparatory changes. There's still an open code
> > review
> > [2] for dealing with files, but that one is currently blocked by a
> change to
> > the VFS in LLVM [3].
> >
> > The next big piece of work is supporting user commands (e.g. in the
> > driver) and
> > SB API calls. Originally I expected these two things to be separate, but
> > Pavel
> > made a good case [4] that they're actually very similar.
> >
> > I created a prototype of how I envision this to work. As usual, we can
> > differentiate between capture and replay.
> >
> > ## SB API Capture
> >
> > When capturing a reproducer, every SB function/method is instrumented
> > using a
> > macro at function entry. The added code tracks the function identifier
> > (currently we use its name with __PRETTY_FUNCTION__) and its arguments.
> >
> > It also tracks when a function crosses the boundary between internal and
> > external use. For example, when someone (be it the driver, the python
> > binding
> > or the RPC server) calls SBFoo, and in its implementation SBFoo calls
> > SBBar, we
> > don't need to record SBBar. When invoking SBFoo during replay, it will
> > itself
> > call SBBar.
> >
> > When a boundary is crossed, the function name and arguments are
> > serialized to a
> > file. This is trivial for basic types. For objects, we maintain a table
> that
> > maps pointer values to indices and serialize the index.
> >
> > To keep our table consistent, we also need to track returns for functions
> > that
> > return an object by value.

Re: [lldb-dev] [Reproducers] SBReproducer RFC

2019-01-07 Thread Pavel Labath via lldb-dev

On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:

Hi Everyone,

In September I sent out an RFC [1] about adding reproducers to LLDB. Over the
past few months, I landed the reproducer framework, support for the GDB remote
protocol and a bunch of preparatory changes. There's still an open code review
[2] for dealing with files, but that one is currently blocked by a change to
the VFS in LLVM [3].

The next big piece of work is supporting user commands (e.g. in the driver) and
SB API calls. Originally I expected these two things to be separate, but Pavel
made a good case [4] that they're actually very similar.

I created a prototype of how I envision this to work. As usual, we can
differentiate between capture and replay.

## SB API Capture

When capturing a reproducer, every SB function/method is instrumented using a
macro at function entry. The added code tracks the function identifier
(currently we use its name with __PRETTY_FUNCTION__) and its arguments.

It also tracks when a function crosses the boundary between internal and
external use. For example, when someone (be it the driver, the python binding
or the RPC server) calls SBFoo, and in its implementation SBFoo calls SBBar, we
don't need to record SBBar. When invoking SBFoo during replay, it will itself
call SBBar.
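
A rough sketch of what such boundary tracking could look like (names made up;
a real implementation would presumably use a per-thread counter):

#include <iostream>

// Only the outermost SB call coming in from "outside" is recorded; nested
// SB-to-SB calls are skipped because they will happen again during replay.
static unsigned SBCallDepth = 0; // would be thread_local in a real version

struct BoundaryGuard {
  BoundaryGuard() : AtBoundary(SBCallDepth++ == 0) {}
  ~BoundaryGuard() { --SBCallDepth; }
  bool AtBoundary; // true only for the outermost SB call
};

void SBBar() {
  BoundaryGuard G;
  if (G.AtBoundary)
    std::cout << "record SBBar\n";
}

void SBFoo() {
  BoundaryGuard G;
  if (G.AtBoundary)
    std::cout << "record SBFoo\n";
  SBBar(); // internal use: not recorded, since the depth is already > 0
}

int main() {
  SBFoo(); // records only SBFoo
  SBBar(); // called from outside: records SBBar
}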

When a boundary is crossed, the function name and arguments are serialized to a
file. This is trivial for basic types. For objects, we maintain a table that
maps pointer values to indices and serialize the index.

To keep our table consistent, we also need to track returns for functions that
return an object by value. We have a separate macro that wraps the returned
object.

The index is sufficient because every object that is passed to a function has
crossed the boundary and hence was recorded. During replay (see below) we map
the index to an address again which ensures consistency.

## SB API Replay

To replay the SB function calls we need a way to invoke the corresponding
function from its serialized identifier. For every SB function, there's a
counterpart that deserializes its arguments and invokes the function. These
functions are added to the map and are called by the replay logic.

Replaying is just a matter of looping over the function identifiers in the
serialized file, dispatching the right deserialization function, until no more
data is available.

The deserialization function for constructors or functions that return by value
contains additional logic for dealing with the aforementioned indices. The
resulting objects are added to a table (similar to the one described earlier)
that maps indices to pointers. Whenever an object is passed as an argument, the
index is used to get the actual object from the table.
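
A rough sketch of that replay-side table (hypothetical names, error handling
omitted): indices assigned during capture are re-bound to the objects
recreated during replay, and argument indices are resolved through it.

#include <cstdint>
#include <vector>

class ReplayObjectTable {
public:
  // Called by the replayers for constructors and by-value returns, using the
  // index that was recorded for the object during capture.
  void Remember(uint32_t Index, void *Obj) {
    if (Index >= Objects.size())
      Objects.resize(Index + 1, nullptr);
    Objects[Index] = Obj;
  }

  // Called when a serialized argument refers to an object by index.
  template <typename T> T *Get(uint32_t Index) const {
    return static_cast<T *>(Objects.at(Index));
  }

private:
  std::vector<void *> Objects;
};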

## Tool

Even when using macros, adding the necessary capturing and replay code is
tedious and scales poorly. For the prototype, we did this by hand, but we
propose a new clang-based tool to streamline the process.

For the capture code, the tool would validate that the macro matches the
function signature, suggesting a fixit if the macros are incorrect or missing.
Compared to generating the macros altogether, it has the advantage that we
don't have "configured" files that are harder to debug (without faking line
numbers etc).

The deserialization code would be fully generated. As shown in the prototype
there are a few different cases, depending on whether we have to account for
objects or not.

## Prototype Code

I created a differential [5] on Phabricator with the prototype. It contains the
necessary methods to re-run the gdb remote (reproducer) lit test.

## Feedback

Before moving forward I'd like to get the community's input. What do you think
about this approach? Do you have concerns or can we be smarter somewhere? Any
feedback would be greatly appreciated!

Thanks,
Jonas

[1] http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html
[2] https://reviews.llvm.org/D54617
[3] https://reviews.llvm.org/D54277
[4] https://reviews.llvm.org/D55582
[5] https://reviews.llvm.org/D56322




[Adding Tamas for his experience with recording and replaying APIs.]


Thank you for sharing the prototype Jonas. It looks very interesting, 
but there are a couple of things that worry me about it.


The first one is the usage of __PRETTY_FUNCTION__. That sounds like a 
non-starter even for an initial implementation, as the string it 
expands to is going to differ between compilers (gcc and clang will 
probably agree on it, but I know for a fact it will be different on 
msvc). If that was just an internal property of the serialization 
format, then it might be fine, but it looks like you are hardcoding the 
values in code to connect the methods with their replayers, which is 
going to be a problem.


I've been thinking about how this could be done better, and the best (though
not ideal) way I came up with is using the function's address as the key.