> A better way (which requires much more work of course) would be to
> come with smaller incremental patches and clearly separate the parts
> that are JSON-specific for later, focusing first on an incremental
> refactoring and abstraction of the printf scattered code.

I think adding abstraction would be hard to do in incremental patches,
but I agree on the need for them and for a potential CI system.

> So IMHO a good proposal would go at this in small steps:
> - find a best approach abstract printing in abstract of any JSON
> output and submit these as small incremental and consistent patches.
> This should introduce no new feature.

Here are the questions that I think should be asked independently:
- what do we want the code/printing API to look like?
- what do we want the JSON output to look like?
- do we want the JSON output to be human-readable?

Currently, the output code is scattered across the code paths. Put
together, the different parts look like this:

    tprintf("%s(", tcp->s_ent->sys_name);
    printfd(tcp, tcp->u_args[0]);
    tprints(", ");
    printfd(tcp, tcp->u_args[1]);
    tprints(") ");
    tprintf("= %#lx", tcp->u_rval);
    tprints("\n");

I feel like the code that writes the commas, parentheses and so on
could be abstracted. These delimiters are specific to the classical
output, and JSON would require something else. I am thinking about an
`output machine` that would keep a state. This state would determine
several things: the next delimiter ("(", ")", "," or "=" in the case of
classical output) and whether to flush the output or not (yes after
outputting all arguments, while waiting for the syscall to return; no
in between arguments...).

We would probably end up with something like:

    out_update_state(&om, OSTATE_SYSNAME);
    out(&om, F_STR, tcp->s_ent->sys_name);
    out_update_state(&om, OSTATE_ARGS);
    out(&om, F_FD, tcp->u_args[0]);
    out(&om, F_FD, tcp->u_args[1]);
    out_update_state(&om, OSTATE_RET);
    out(&om, F_FD, tcp->u_rval);
    out_update_state(&om, OSTATE_DONE);
    ...

This is just an idea in the works, and I don't know how far we could
shorten this with an implied state change after printing the syscall
name, printing multiple arguments in a single call, and the like.

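To make the `output machine` idea a little more concrete, here is a
minimal sketch of the classical-output side only. It is not strace
code: struct out_machine, om_puts(), out_init() and the F_HEX kind are
names assumed for illustration, the output goes to a fixed buffer
instead of the real stream, and F_FD is printed as a plain decimal
rather than through printfd(). Flushing is left out entirely.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical states, mirroring the OSTATE_* names used above. */
enum ostate { OSTATE_INIT, OSTATE_SYSNAME, OSTATE_ARGS, OSTATE_RET, OSTATE_DONE };

/* Hypothetical value kinds; only F_STR and F_FD appear in the mail. */
enum ofmt { F_STR, F_FD, F_HEX };

struct out_machine {
	enum ostate state;
	int nitems;     /* values printed since the last state change */
	char buf[256];  /* stand-in for the real output stream */
	size_t len;
};

/* Append s to the buffer, silently truncating when it is full. */
static void om_puts(struct out_machine *om, const char *s)
{
	size_t n = strlen(s);

	if (om->len + n >= sizeof(om->buf))
		n = sizeof(om->buf) - 1 - om->len;
	memcpy(om->buf + om->len, s, n);
	om->len += n;
	om->buf[om->len] = '\0';
}

static void out_init(struct out_machine *om)
{
	om->state = OSTATE_INIT;
	om->nitems = 0;
	om->len = 0;
	om->buf[0] = '\0';
}

/* Emit the delimiter that separates the old state from the new one. */
static void out_update_state(struct out_machine *om, enum ostate next)
{
	if (om->state == OSTATE_SYSNAME && next == OSTATE_ARGS)
		om_puts(om, "(");
	else if (om->state == OSTATE_ARGS && next == OSTATE_RET)
		om_puts(om, ") = ");
	else if (next == OSTATE_DONE)
		om_puts(om, "\n");
	om->state = next;
	om->nitems = 0;
}

/*
 * Print one value. Numeric kinds must be passed as long (or unsigned
 * long for F_HEX), since va_arg() reads them with that type.
 */
static void out(struct out_machine *om, int fmt, ...)
{
	char tmp[64];
	va_list ap;

	/* The comma between arguments is the machine's job, not the caller's. */
	if (om->state == OSTATE_ARGS && om->nitems > 0)
		om_puts(om, ", ");

	va_start(ap, fmt);
	switch (fmt) {
	case F_STR:
		om_puts(om, va_arg(ap, const char *));
		break;
	case F_FD:
		snprintf(tmp, sizeof(tmp), "%ld", va_arg(ap, long));
		om_puts(om, tmp);
		break;
	case F_HEX:
		snprintf(tmp, sizeof(tmp), "%#lx", va_arg(ap, unsigned long));
		om_puts(om, tmp);
		break;
	}
	va_end(ap);
	om->nitems++;
}
```

Driving this with the call sequence shown above (name "dup2",
arguments 0L and 1L, return value 1L) leaves "dup2(0, 1) = 1" plus a
newline in om.buf. Under the same assumptions, a JSON backend would
only need different delimiter logic in out_update_state() and out();
the call sites would not change.
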
> - once and if that can be completed, implement JSON support, if possible

Since JSON has no hexadecimal or octal number literals, whether we
should output a string containing a hexadecimal number or just convert
that number to decimal depends on the third question asked at the
beginning of this e-mail: do we care for a human-readable JSON output?
Cases such as the -y option fall into the same category up to a point,
although they would be easier to handle with nested JSON.

    {"syscall": "dup2", "args": [0, 1], "ret": 2}

With the -y option, for example:

    {"syscall": "dup2",
     "args": [{"fd": 0, "path": "/dev/pts/5"},
              {"fd": 1, "path": "/dev/pts/5"}],
     "ret": {"fd": 2, "path": "/dev/pts/5"}}

I strongly believe the JSON output is not meant to be human-readable,
and should therefore contain as much information as possible (all of
it, why not). For example, why not always include what the -y option
provides? Since no human should read the JSON output, there is no
"output pollution" per se. We could therefore incorporate timings,
syscall counts, syscall timestamps, ... This decision would also allow
us not to abbreviate the argument lists. Discarding information would
be left to the discretion of the user.

> BTW, the line-by-line JSON approach has names and even specs now!
> See [3], [4] and [5]

With line-delimited JSON, I am imagining this kind of output:

---- start of the output
{"syscall": "dup2"}
{"timestamp": "15:27:02"}
{"eip": 139901979798028}
{"args": [{"fd": 0, "path": "/dev/pts/5"}, {"fd": 1, "path": "/dev/pts/5"}]}
---- potential hang on the syscall
{"ret": {"fd": 2, "path": "/dev/pts/5"}}
{"time": 0.000010}
---- delimiter of some sort
{"syscall": "close"}
{"timestamp": "15:27:02"}
{"eip": 139901978813632}
{"args": [{"fd": 2, "path": "/dev/pts/5"}]}
---- potential hang on the syscall
{"ret": -1, "errno": 13, "error": "EACCES", "message": "Permission denied"}
{"time": 0.000010}
---- end of the output

Please correct me if this is not at all the JSON output we are
expecting, but this feels right to me.

The -p option is a bit of a problem: each pid given uses a different
tcb structure. Creating a separate output machine for each tcb
structure would work in my opinion. Each could simply output on a
different fd, or we could use some multiplexing logic to manage
multiple `output machines` on the same file descriptor.

It feels obvious to me that outputting JSON on stderr, mixed with
potential program output, would make no sense and should not be
handled.

The main issue I have not addressed is notification messages, the
"unfinished"/"resumed" stuff and the like.

There are still a lot of questions to be asked and answers to be given,
but I'd first like to know your opinion on these few ideas. I also
don't believe it is possible to both refactor the printing code and
implement the JSON output in the same GSoC, but as Philippe said, one
thing at a time.

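For illustration, the line-delimited layout above could be produced by
a handful of small helpers like the ones below. This is only a sketch
under my own assumptions: struct jbuf and every jb_*() function are
hypothetical names, nothing here exists in strace, and the records are
written to a fixed buffer rather than to a file descriptor. Numbers are
always emitted in decimal, and strings are double-quoted and escaped as
JSON requires.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size output buffer, standing in for the real fd. */
struct jbuf {
	char data[512];
	size_t len;
};

static void jb_init(struct jbuf *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

/* Append one character, dropping it when the buffer is full. */
static void jb_putc(struct jbuf *b, char c)
{
	if (b->len + 1 < sizeof(b->data)) {
		b->data[b->len++] = c;
		b->data[b->len] = '\0';
	}
}

static void jb_puts(struct jbuf *b, const char *s)
{
	while (*s)
		jb_putc(b, *s++);
}

/* JSON string: double quotes, with '"', '\\' and control chars escaped. */
static void jb_string(struct jbuf *b, const char *s)
{
	char esc[8];

	jb_putc(b, '"');
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (c == '"' || c == '\\') {
			jb_putc(b, '\\');
			jb_putc(b, (char)c);
		} else if (c < 0x20) {
			snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
			jb_puts(b, esc);
		} else {
			jb_putc(b, (char)c);
		}
	}
	jb_putc(b, '"');
}

/* JSON has no hex or octal literals, so numbers are always decimal. */
static void jb_long(struct jbuf *b, long v)
{
	char tmp[32];

	snprintf(tmp, sizeof(tmp), "%ld", v);
	jb_puts(b, tmp);
}

/* The -y style descriptor object: {"fd": N, "path": "..."} */
static void jb_fd(struct jbuf *b, long fd, const char *path)
{
	jb_puts(b, "{\"fd\": ");
	jb_long(b, fd);
	jb_puts(b, ", \"path\": ");
	jb_string(b, path);
	jb_putc(b, '}');
}

/* One record per line: {"key": <value>} followed by a newline. */
static void jb_record_begin(struct jbuf *b, const char *key)
{
	jb_putc(b, '{');
	jb_string(b, key);
	jb_puts(b, ": ");
}

static void jb_record_end(struct jbuf *b)
{
	jb_puts(b, "}\n");
}
```

Each record is self-contained and ends with a newline, so the "args"
record can be written before a potential hang on the syscall and the
"ret" record afterwards. With one such buffer per tcb, complete lines
from different pids could also be interleaved on a single descriptor
without corrupting each other, provided each line also carried the pid.
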
Cheers,
-- 
Louis 'manny' Feuvrier
LSE - EPITA 2016