Grace O’Hair-Sherman
[email protected]
432 Flood Avenue
San Francisco, CA, 94112 USA
1-415-425-3451
Emergency contact: Amy O’Hair 1-415-334-5154

GSOC Proposal: Filter which will parse the plain text output of strace and convert it to structured formats such as JSON

Synopsis:
As it is, the output of strace is not easily machine-readable. I propose to solve this problem by providing a filter that parses strace output and converts it to a structured format. The parser will be written in Python and will be able to emit either JavaScript Object Notation (JSON) or MessagePack (http://msgpack.org/). Here is an example of how partial output of strace run on a hello-world program might be output as JSON (supposing the parser were named strace_to_structured):

Partial output:
    % strace -T ./hello
    execve("./hello", ["./hello"...], [/* 33 vars */]) = 0 <0.000071>
    brk(0) = 0x24e3000 <0.000006>

Newline-delimited JSON Stream:
    {"syscall": "execve", "args": ["./hello", "[\"./hello\"...]", "[/* 33 vars */]"], "ret_val": 0, "ret_val_hex": "0", "kernel_time": "0.000071"}
    {"syscall": "brk", "args": ["0"], "ret_val": 38678528, "ret_val_hex": "0x24e3000", "kernel_time": "0.000006"}

Benefits to Community
Anyone who wants to consume strace output programmatically must currently write their own parser first. This filter will save them that time and effort, because they can start from a format that is easy to parse.

Deliverables
Preparations completed: I have built strace and reviewed the previous JSON work done in the project.

Roadmap:
23 May - 29 May -- Investigate what useful JSON output would look like, where to put the Python program on SourceForge, and how to package and distribute it (with help from the community mailing list). (Spring quarter classes at university)

30 May - 5 June -- Set up the git repository, get dummy I/O working, propose a JSON format, and get review from the community mailing list. (Spring quarter classes)

6 June - 12 June -- Begin creating a prototype that can create JSON output for one test from strace-code/test.
I would approach this by writing a parser class containing the parsing logic (possibly regular expressions, as I have experience with them) that is instantiated once per run, a syscall class that is instantiated once for each system call in the strace output, and a formatter class for each output format. Each formatter class would contain functions for outputting the fields of the syscall objects in its format. I would begin with one formatter class for JSON and add classes for other formats as they become feasible. (Spring quarter final examinations at university)

13 June - 19 June -- Continue work on the prototype (see description above). Find/write a newline-delimited JSON validator (like JSLint, but accepting newline-delimited rather than comma-delimited JSON) to check JSON output syntax.

20 June - 26 June -- Finish the prototype (see above). Include in the prototype another formatter class that outputs the syscall objects' fields in strace format to validate/test program correctness (the goal being for the strace input and the filter output to look similar). (GSOC Midterm evaluation submission period)

27 June - 3 July -- Create an automated test using the initial test program. Run the filter with more existing strace programs, fixing problems as they appear. Make the filter accept flags/command-line arguments to control which formatter to use (JSON or otherwise).

4 July - 10 July -- Ensure the filter correctly reads strace output produced with flags (e.g. -T, -v) and correctly outputs the corresponding JSON. Write usage text that the filter emits when presented with unknown flags.
Because many strace flags add fields to the default strace output (for example, -T adds the time a system call spent in the kernel), I plan to expand the filter to accept output produced with such flags by adding more arguments to the initializer of the syscall object and making their default values null (giving them values only if the flag is enabled). Additionally, whatever function prints the objects as JSON would not print any values that are null, to avoid overcomplicating the filter’s output for default/no-flag strace output (making the common case simple).

11 July - 17 July -- Document the project so far and ensure flag support (continued from previous week).

18 July - 24 July -- Write a demo program that consumes the filter output and prints a summary of the average time taken by different system calls.

25 July - 31 July -- (Stretch goal) Enhance the filter to output MessagePack (and ensure it works with one test from strace-code/test).

1 August - 7 August -- (Stretch goal) Run the filter with MessagePack output and with more existing strace programs, fixing problems as they appear.

8 August - 14 August -- (Stretch goal) Ensure the filter correctly reads strace output produced with flags (e.g. -T, -v) and correctly outputs the corresponding MessagePack.

15 August - 23 August 19:00 UTC -- Final week: tidy code, improve documentation, and submit code sample.

Related Work:
A similar project was proposed and implemented during the 2014 Google Summer of Code, the main difference being that it was meant to be built directly into strace. It seems that project’s scope may have been too big, and it was never integrated with strace. This proposal has a smaller scope: it will be a separate script that post-processes strace output. Another difference is that this project will result in a program with options for different output formats, i.e. JSON or MessagePack.
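
As a rough illustration of the parser / syscall / formatter split described in the roadmap, here is a minimal sketch in Python. All names in it (StraceParser, Syscall, JsonFormatter, split_args, LINE_RE) are hypothetical and not part of strace or of any existing code. It assumes one complete system call per line, ignores the ret_val_hex field from the synopsis, and uses a deliberately simple regular expression that will not cope with every strace output form (for example unfinished/resumed calls).

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for one complete strace line such as
#   brk(0) = 0x24e3000 <0.000006>
# The trailing <...> part only appears when strace is run with -T.
LINE_RE = re.compile(
    r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\w+|\?)'
    r'(?P<rest>[^<]*)(?:<(?P<time>[\d.]+)>)?\s*$'
)


def split_args(text: str) -> List[str]:
    """Split on top-level commas, ignoring commas inside quotes or brackets."""
    args, current, depth, in_string = [], [], 0, False
    i = 0
    while i < len(text):
        ch = text[i]
        current.append(ch)
        if in_string:
            if ch == "\\" and i + 1 < len(text):
                current.append(text[i + 1])  # keep the escaped character
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            current.pop()  # drop the separating comma
            args.append("".join(current).strip())
            current = []
        i += 1
    last = "".join(current).strip()
    if last:
        args.append(last)
    # Mirror the synopsis example: a plain quoted string loses its outer quotes.
    return [a[1:-1] if len(a) >= 2 and a[0] == a[-1] == '"' else a for a in args]


@dataclass
class Syscall:
    name: str
    args: List[str]
    ret_val: Optional[int]
    kernel_time: Optional[str] = None  # stays None unless -T output is present


class StraceParser:
    def parse_line(self, line: str) -> Optional[Syscall]:
        m = LINE_RE.match(line.strip())
        if m is None:
            return None  # not a complete syscall line (signal, exit status, ...)
        try:
            ret = int(m.group("ret"), 0)  # base 0 accepts both "0" and "0x24e3000"
        except ValueError:
            ret = None  # e.g. "?" for calls with an unknown result
        return Syscall(m.group("name"), split_args(m.group("args")), ret, m.group("time"))


class JsonFormatter:
    def format(self, call: Syscall) -> str:
        record = {
            "syscall": call.name,
            "args": call.args,
            "ret_val": call.ret_val,
            "kernel_time": call.kernel_time,
        }
        # Omit null fields so default (no-flag) output stays simple.
        return json.dumps({k: v for k, v in record.items() if v is not None})


if __name__ == "__main__":
    parser, formatter = StraceParser(), JsonFormatter()
    for raw in ['brk(0) = 0x24e3000 <0.000006>', 'brk(0) = 0x24e3000']:
        call = parser.parse_line(raw)
        if call is not None:
            print(formatter.format(call))
```

For the brk line from the synopsis, this sketch prints {"syscall": "brk", "args": ["0"], "ret_val": 38678528, "kernel_time": "0.000006"}; for the same line without -T timing, the kernel_time key is simply absent. A MessagePack formatter could later be added as another class with the same format method, without touching the parser.
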
(Inspired by this post: goo.gl/2yvCTG)

Biographical Information:
I am a second-year computer science major at University of California, Santa Cruz. I have taken Computer Architecture, Algorithms and Abstract Data Types, Computer Systems and Assembly Language, Introduction to Data Structures, and Accelerated Introduction to Programming. By summer I will have taken Analysis of Algorithms as well. Almost all of these classes have involved UNIX or Linux, Bash, and Makefiles. I started developing on Ubuntu two years ago when I interned at Gametime United. I also used Git and wrote JSON, both by hand and by generating it with a Python script. I have experience meeting project deadlines; last summer I designed, coded, and shipped an iOS application from start to finish in less than eight weeks. (It is called Amino Ally: goo.gl/WTGgUz )

I haven’t contributed to any open source projects yet, although I’m a member of my school’s Linux Users’ Group, so I’m really excited for this opportunity to get more involved. The relevant skills that will help me achieve this project’s goal include Bash, Makefiles, Git, JSON, regular expressions, and Python (see the GitHub repository github.com/gracefulPotato/gsoc-python for some scripts I have written, including a very rough draft script that outputs JSON called struct_ex.py).

During the last 10 weeks of Google Summer of Code I will be available full time to work on my project. I have university classes during the first two weeks and final examinations during part of the third week, but I will nonetheless make sure to work at least 20 hours in each of those three weeks. I consider this a serious full-time commitment, and I will make up the 60 hours missed during the first three weeks by working 46 hours a week for the remaining 10 weeks.

_______________________________________________
Strace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/strace-devel
