Re: [GSOC2015] JSON Formatting

Louis Feuvrier Sat, 07 Mar 2015 13:16:12 -0800

On Sat, Mar 07, 2015 at 05:38:40AM +0300, Dmitry V. Levin wrote:
> > I think adding abstraction would be hard to do in incremental patches, but I
> > agree on the need for them and a potential CI system.
> 
> There are hundreds of raw tprintf and tprints calls.  I wonder how could
> you introduce an output state machine in incremental patches.


Yes, that is indeed what I was thinking.

> > out_update_state(&om, OSTATE_SYSNAME);
> > out(&om, F_STR, tcp->s_ent->sys_name);
> > out_update_state(&om, OSTATE_ARGS);
> > out(&om, F_FD, tcp->u_args[0]);
> > out(&om, F_FD, tcp->u_args[1]);
> > out_update_state(&om, OSTATE_RET);
> > out(&om, F_FD, tcp->u_rval);
> > out_update_state(&om, OSTATE_DONE);
> > ...

I have discussed this with Gabriel Laskar over lunch, and a more
descriptive API seems preferable, something like:

output_sysname(&om, tcp->s_ent->sys_name);
output_arg(&om, F_FD, tcp->u_args[0]);
output_arg(&om, F_FD, tcp->u_args[1]);
output_ret(&om, F_FD, tcp->u_rval);

with output_arg() being a simple wrapper, for single arguments, around:

output_begin_arg(&om);
/* printing of a structure for example */
output(...);
output_end_arg(&om);

This is the result of a quick brainstorming session and may well change
again.
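
To make the wrapper relationship concrete, here is a minimal sketch. The
layout of struct output_machine, the buffer-backed output() and the F_HEX
format are assumptions made for illustration only, not existing strace
code, and the sketch prints the classic text format rather than JSON.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output machine: accumulates text in a fixed buffer. */
struct output_machine {
	char buf[256];
	size_t len;
	int nargs;	/* arguments begun so far */
	int in_arg;	/* non-zero between begin_arg and end_arg */
};

enum out_format { F_FD, F_HEX };	/* F_HEX is an illustrative addition */

/* Append formatted text, truncating if the buffer is full. */
void output(struct output_machine *om, const char *fmt, ...)
{
	size_t room = sizeof(om->buf) - om->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(om->buf + om->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	/* On truncation vsnprintf reports the untruncated length; clamp it. */
	om->len += (size_t) n < room ? (size_t) n : room - 1;
}

void output_sysname(struct output_machine *om, const char *name)
{
	output(om, "%s(", name);
}

/* The machine, not the decoder, decides when a separator is needed. */
void output_begin_arg(struct output_machine *om)
{
	assert(!om->in_arg);
	if (om->nargs++ > 0)
		output(om, ", ");
	om->in_arg = 1;
}

void output_end_arg(struct output_machine *om)
{
	assert(om->in_arg);
	om->in_arg = 0;
}

/* Wrapper for arguments that are a single scalar value. */
void output_arg(struct output_machine *om, enum out_format fmt, long val)
{
	output_begin_arg(om);
	switch (fmt) {
	case F_FD:
		output(om, "%ld", val);
		break;
	case F_HEX:
		output(om, "%#lx", (unsigned long) val);
		break;
	}
	output_end_arg(om);
}

void output_ret(struct output_machine *om, enum out_format fmt, long val)
{
	(void) fmt;	/* this sketch always prints the return value as decimal */
	output(om, ") = %ld", val);
}
```

A decoder for a structure argument would call output_begin_arg(), then
output() as many times as needed, then output_end_arg(), while scalar
arguments keep using the one-line output_arg() form.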

> > This is just an idea in the works and I don't know up to what point we could
> > shorten this with implied state change after the printing of the syscall name,
> > printing of multiple arguments in a single call, and the like.
> 
> This machine is going to be a bit more complex: it would have to support
> output of nested objects like structures containing arrays of structures
> (e.g. struct msghdr), but in general I think this is the right approach.

I have been looking for a smart way to print out structures in C, but
given the lack of reflection in the language it is not a simple task, and
might require something like two compilation rounds.

Another solution might be to have a printing subroutine for each type of
structure, but this feels like overkill, and rewriting everything would
be very error-prone in my opinion. Writing unit tests on the fly would
take a very long time and would almost certainly result in the project
not being finished by the deadline.

The above-mentioned idea of just using output_{begin,end}_arg()
functions in the main code path looks like a softer break from the
current state of the codebase, even though abstraction would not be as
great as in the ideas stated above.

I do not have any idea other than that, and feel a bit stuck here.

> > {'syscall': 'dup2', 'args': [{'fd': 0, 'path': '/dev/pts/5'}, {'fd': 1,
> > 'path': '/dev/pts/5'}], 'ret': {'fd': 2, 'path': '/dev/pts/5'}}
> > 
> > I strongly believe the json output is not to be human readable, and should
> > therefore contain as much information as possible (all of it, why not). For
> > example, why not always output the -y option? Considering no human should read
> > the json output, there is no 'output polluting' per se. We could therefore
> > incorporate timings, syscall count, syscall timestamps, ... This decision would
> > allow us to also not abbreviate the arguments lists. Discarding information
> > would be left to the discretion of the user.
> 
> I agree that all available information should be included.  Whether
> a particular piece of information is actually available or not is another
> question.  For example, some information is readily available (e.g syscall
> name and number), some costs a syscall to obtain (e.g. timestamp, -y,
> and -i on some architectures), some is quite expensive (e.g. -yy).
> In each case user decides how much information needs to be obtained.

I didn't think at first of the additional syscalls and costly logic
needed to produce the -i/-y/-yy information. I thought these options
could be enabled by default when using json, but I see now how that
logic is flawed.

> > With line-delimited json, I am imagining this kind of output:
> > 
> > ---- start of the output
> > {'syscall': 'dup2'}
> > {'timestamp': '15:27:02'}
> > {'eip': 139901979798028}
> > {'args': [{'fd': 0, 'path': '/dev/pts/5'}, {'fd': 1, 'path': '/dev/pts/5'}]}
> > ---- potential hang on the syscall
> > {'ret': {'fd': 2, 'path': '/dev/pts/5'}}
> > {'time': 0.000010}
> > ---- delimiter of some sort
> > {'syscall': 'close'}
> > {'timestamp': '15:27:02'}
> > {'eip': 139901978813632}
> > {'args': [{'fd': 2, 'path': '/dev/pts/5'}]}
> > ---- potential hang on the syscall
> > {'ret': -1, 'errno': 13, 'error': 'EACCES', 'message': 'Permission denied'}
> > {'time': 0.000010}
> > ---- end of the output
> > 
> > Please correct me if my understanding of the json output we are expecting is
> > not at all the same, but this feels right to me.
> 
> Yes, but please keep in mind that not all syscalls are that simple.  For
> example, many syscalls have some of their arguments decoded on exiting
> syscall, and some syscall arguments are decoded both on entering and
> exiting syscall, e.g. _IOWR ioctls.

I fully understand that this is the simplest case. However, there is
something I don't understand about arguments being printed after the end
of the syscall. Are they printed out twice? I looked a bit at the code
path for _IOWR ioctls and I don't see where this happens. The sys_ioctl
function looks like it is called at the same time as all the other sys_
functions.

> 
> > The -p option is a bit of a problem: each pid given uses a different tcb
> > structure for each pid. Creating a different output machine for each tcb
> > structure would work in my opinion.
> 
> Exactly.
> 
> > It could simply output on different fds, or
> > maybe use a multiplexing logic for managing multiple `output machines` on the
> > same file descriptor.
> 
> This would follow the current practice: there is a multiplexing logic for
> the regular mode, and in -ff mode each tcb has its own output descriptor.

Given that json would be used in special cases and probably read by a
machine, I am entertaining the idea of making it depend on -ff. This
would remove the need for multiplexing logic. What do you think?
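
As a small illustration of what depending on -ff would mean in practice:
with "-ff -o <prefix>" each process gets its own "<prefix>.<pid>" file, so
each per-tcb output machine could own exactly one stream. The helper name
json_outfile_name() below is hypothetical, not an existing strace function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build the per-process output file name "<prefix>.<pid>", following the
 * naming used by "-ff -o <prefix>".  Returns 0 on success, -1 if the
 * prefix is empty or the result does not fit in buf.
 */
int json_outfile_name(char *buf, size_t size, const char *prefix, pid_t pid)
{
	int n;

	assert(buf != NULL && prefix != NULL);
	if (strlen(prefix) == 0)
		return -1;	/* an empty prefix would yield a hidden ".<pid>" file */
	n = snprintf(buf, size, "%s.%ld", prefix, (long) pid);
	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With one file per process, the records of different processes are never
interleaved, so no delimiter or multiplexing scheme is needed.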

> > It feels obvious to me that outputting json on stderr with
> > potential program output would make no sense and should not be handled.
> 
> There is an option (-o) to control this behaviour.

Yes, but I meant simply disallowing json output to anything other than
a file given with -o. After all, mixing the output of the program with
json just plain doesn't make sense.

> > The main issue I have not addressed is notification messages, unfinished,
> > resume stuff and the like.

I am still uncertain about the way to 

> > There are still a lot of questions to be asked and answers to be given but I'd
> > like to know first your opinion on these few ideas.
> 
> I think a per-tcb output state machine with its own stack (remember about
> nested objects) is the right approach.

A recursive approach to nested objects would imply a subroutine for each
type of argument, am I right? Would this be the right approach? I feel
that it is too large a change and that it shifts a great deal of logic
into the output module.
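
For comparison, here is a rough sketch of how I picture a machine with an
explicit stack, where the output module only tracks nesting and separators
and knows nothing about argument types. All names (struct out_machine,
out_begin(), out_end(), out_long()) are hypothetical, and keys are assumed
to need no JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OUT_MAX_DEPTH 16

enum out_container { OUT_OBJECT, OUT_ARRAY };

struct out_frame {
	enum out_container kind;
	int count;	/* members emitted so far in this container */
};

/* Hypothetical per-tcb machine: a text buffer plus a nesting stack. */
struct out_machine {
	char buf[512];
	size_t len;
	struct out_frame stack[OUT_MAX_DEPTH];
	int depth;
};

/* Append a string, truncating silently if the buffer is full. */
void out_raw(struct out_machine *om, const char *s)
{
	size_t room = sizeof(om->buf) - 1 - om->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(om->buf + om->len, s, n);
	om->len += n;
	om->buf[om->len] = '\0';
}

/* Emit the separator and, inside objects, the member key. */
void out_member(struct out_machine *om, const char *key)
{
	struct out_frame *top;

	if (om->depth == 0)
		return;
	top = &om->stack[om->depth - 1];
	if (top->count++ > 0)
		out_raw(om, ", ");
	if (top->kind == OUT_OBJECT) {
		assert(key != NULL);
		out_raw(om, "\"");
		out_raw(om, key);	/* assumed to need no escaping */
		out_raw(om, "\": ");
	}
}

/* Open a nested object or array and push it on the stack. */
void out_begin(struct out_machine *om, const char *key,
	       enum out_container kind)
{
	out_member(om, key);
	assert(om->depth < OUT_MAX_DEPTH);
	om->stack[om->depth].kind = kind;
	om->stack[om->depth].count = 0;
	om->depth++;
	out_raw(om, kind == OUT_OBJECT ? "{" : "[");
}

/* Close the innermost container; the stack remembers which kind it was. */
void out_end(struct out_machine *om)
{
	assert(om->depth > 0);
	om->depth--;
	out_raw(om, om->stack[om->depth].kind == OUT_OBJECT ? "}" : "]");
}

void out_long(struct out_machine *om, const char *key, long val)
{
	char tmp[32];

	out_member(om, key);
	snprintf(tmp, sizeof(tmp), "%ld", val);
	out_raw(om, tmp);
}
```

In this form the existing decoders would still walk their own structures
and only bracket them with out_begin()/out_end() calls, which is closer to
the output_{begin,end}_arg() idea than to one subroutine per type inside
the output module.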

-- 
Louis 'manny' Feuvrier
LSE - EPITA 2016

_______________________________________________
Strace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/strace-devel