Re: [GSOC 2014] Some thoughts about the code and the proposal

Philippe Ombredanne Tue, 18 Mar 2014 06:11:15 -0700

On Thu, Mar 13, 2014 at 4:55 PM, yangmin zhu <[email protected]> wrote:
> I have spent a few days reading the strace code to grasp the big picture.
> My main purpose was first to find out how strace dispatches each syscall
> to its printing function. I found that the main function (in strace.c)
> first calls init(argc, argv) to initialize some important data structures
> such as "static struct tcb **tcbtab;". It then processes the arguments
> (using getopt()), sets the corresponding global flags (such as
> "cflag_t cflag" for the -c/-C options), and finally uses sigaction() to
> install some signal handlers.
>
> After that, the main trace loop function "static int trace(void)" is called
> to handle all the work.
>
> The trace() function is really big, and I found that it eventually calls
> trace_syscall(tcp) to do the core output work, so I went into
> trace_syscall(tcp) to see what happens there.
>
> trace_syscall() is defined in syscall.c and simply uses exiting(tcp) to
> determine which function to call (trace_syscall_exiting(tcp) or
> trace_syscall_entering(tcp)). From there I came to understand why
> "strace sleep 2" produces the output it does (in fact, I later set a
> breakpoint there and saw it more clearly).
>
> trace_syscall_entering(tcp) calls some other functions to populate some
> fields of the tcp structure. I think the most important output is done
> by the following code:
>
>        if ((tcp->qual_flg & QUAL_RAW) && tcp->s_ent->sys_func != sys_exit)
>               res = printargs(tcp);
>        else
>               res = tcp->s_ent->sys_func(tcp);
>
> Apparently, for most syscalls the else branch is executed, and it
> dispatches to the right function through the s_ent structure, which stores
> the output function's address as a function pointer (together with the
> syscall's name string etc.). I then became interested in where and how
> strace assigns the right value to tcp->s_ent. I found it is done in
> "static int get_scno(struct tcb *tcp)" by the line
> "tcp->s_ent = &sysent[scno];". The global sysent refers to the table
> sysent0, which is defined as:
>
>  const struct_sysent sysent0[] = {
>      #include "syscallent.h"
>  };
>
> After looking at "syscallent.h", I finally understood how strace
> integrates the syscall table. I think that if we want to add support for
> a new syscall, we can start from syscallent.h.
>
> Back to the real output functions: the syscallent.h file gives us the
> name of each output function, which is named after the corresponding
> syscall. For example, in file.c I found sys_open(), which just calls
> "static int decode_open(struct tcb *tcp, int offset)". The decode_open()
> function does all the detailed work; it knows the meaning of each
> argument of the open() syscall.
>
> Another interesting finding is that strace has low-level output functions
> that do the actual printing. The higher-level functions just use them as
> a kind of API and do not care how they work. Typical low-level output
> functions are tprintf(), printstr(), printpath(), printfd() and so on.
>
> I will spend more time reading and debugging the code to understand its
> implementation and I think there is no need to understand all of the code
> deeply to finish the GSOC project.


yangmin zhu:
I think you are on the right track!
This is a very thorough approach you are taking, which is very good!
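
For anyone else following along, the dispatch path you describe can be
sketched roughly as below. This is a toy model under stated assumptions,
not strace's actual code: every name ending in _sketch, the two-entry
sysent_table, the syscall numbers 0 and 1, and the "raw" field (standing in
for tcp->qual_flg & QUAL_RAW) are made up for illustration. Output goes into
a buffer instead of the trace file, and only the syscall name and arguments
are printed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct tcb_sketch;
typedef int (*sys_func_t)(struct tcb_sketch *);

/* Hypothetical, much reduced version of one syscall table entry. */
struct sysent_sketch {
    unsigned nargs;          /* number of arguments the syscall takes */
    const char *sys_name;    /* name printed in the trace */
    sys_func_t sys_func;     /* decoder ("output function") */
};

/* Hypothetical, much reduced version of the per-tracee control block. */
struct tcb_sketch {
    long u_arg[6];                       /* raw syscall arguments */
    int raw;                             /* assumption: models QUAL_RAW */
    const struct sysent_sketch *s_ent;   /* entry picked for this syscall */
    char out[128];                       /* output buffer for the sketch */
};

/* Low-level output helper: append formatted text to the buffer. */
static void tprintf_sketch(struct tcb_sketch *tcp, const char *fmt, ...)
{
    size_t used = strlen(tcp->out);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(tcp->out + used, sizeof(tcp->out) - used, fmt, ap);
    va_end(ap);
}

/* Generic fallback: print every argument as a hex number. */
static int printargs_sketch(struct tcb_sketch *tcp)
{
    assert(tcp->s_ent != NULL);
    for (unsigned i = 0; i < tcp->s_ent->nargs; i++)
        tprintf_sketch(tcp, "%s%#lx", i ? ", " : "",
                       (unsigned long)tcp->u_arg[i]);
    return 0;
}

/* Two toy decoders that know their single argument is a plain integer. */
static int sys_exit_sketch(struct tcb_sketch *tcp)
{
    tprintf_sketch(tcp, "%ld", tcp->u_arg[0]);
    return 0;
}

static int sys_close_sketch(struct tcb_sketch *tcp)
{
    tprintf_sketch(tcp, "%ld", tcp->u_arg[0]);
    return 0;
}

/* Written inline here; the real table is pulled in from syscallent.h. */
static const struct sysent_sketch sysent_table[] = {
    { 1, "exit",  sys_exit_sketch  },   /* toy syscall number 0 */
    { 1, "close", sys_close_sketch },   /* toy syscall number 1 */
};

/* Look up the table entry, then dispatch through its function pointer. */
static int trace_syscall_entering_sketch(struct tcb_sketch *tcp, long scno)
{
    int res;

    if (scno < 0 ||
        (size_t)scno >= sizeof(sysent_table) / sizeof(sysent_table[0]))
        return -1;                       /* unknown syscall number */

    tcp->s_ent = &sysent_table[scno];    /* the get_scno() step */
    tcp->out[0] = '\0';
    tprintf_sketch(tcp, "%s(", tcp->s_ent->sys_name);

    if (tcp->raw && tcp->s_ent->sys_func != sys_exit_sketch)
        res = printargs_sketch(tcp);     /* raw mode: undecoded arguments */
    else
        res = tcp->s_ent->sys_func(tcp); /* normal mode: per-syscall decoder */
    return res;
}
```

With u_arg[0] set to 3, calling trace_syscall_entering_sketch(&tcb, 1)
leaves "close(3" in tcb.out, and the same call with raw set leaves
"close(0x3". Adding a syscall in this model means adding one table row and
one decoder, which matches your reading of syscallent.h.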

Small note: try to avoid HTML emails and stick to plain text on the list.


> From this mail
> (http://sourceforge.net/p/strace/mailman/strace-devel/thread/4515571.KdWbzpdtLr%40vapier/#msg32095710),
> I find "the advanced path decoding itself would be large enough to fill a
> whole 3 month GSOC project".
>
> So, are you suggesting that we not choose "advanced path decoding" as
> the proposal?

Forget that post; I was just suggesting that someone look into "advanced
path decoding" too, in addition to his suggestion.

Please make a proposal!

> I read the discussion in the mail and found that "Structured output" is
> also a good choice. From my current understanding of strace, we can just
> modify the output part of strace alone to finish the work.
>
> From this mail (http://sourceforge.net/p/strace/mailman/message/32072591/),
>
> Does it mean that I should first finish a very basic prototype addressing
> some of the problems in the list and post the patch to the mailing list?

This post was just a list of several details to possibly consider in a
proposal and an implementation.
It is great if you can post a few patches and ideas ahead of your
proposal, but this is not an absolute requirement.
The work mode, if you submit a proposal and it is accepted by
strace and GSOC, would be to discuss your approach and submit
patches to the mailing list for review as you go.


> By the way, I found in this
> mail (http://sourceforge.net/p/strace/mailman/message/31924683/) that in
> the current strace "Printing of decoded C constructs is mostly open-coded",
> that "Support of other formats inevitably means introducing some API for
> structured output", and that "the strace code base would have a framework
> to call an output module and that would take care of the exact output
> details."
>
> So I am just wondering why strace hard-codes these decoding functions and
> why it does not use the method used in flex/bison, such as:
>
> We first define a specification file (plain text with a specific grammar)
> like:
>
> define sys_open: open ( $1, $2 ) = $0
>
> Then strace parses this file, substitutes the $1, $2, $0 variables with
> the real arguments, and outputs the resulting string. I have used
> flex/bison before, and I think this may be better than the hard-coded way?
>
> This is just my very first thought and I know it is immature (we still
> need some special way to handle the arguments of complex syscalls, and
> this really requires a great deal of work).

This is an intriguing idea! I like it, though I am sure the devil is in
the details.
Are you suggesting to use a string template for the output, or to have
a grammar to do the actual decoding of arguments?
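
To make the first reading concrete, here is a rough sketch of what a pure
string template could look like. It is only an illustration under my own
assumptions: expand_template() is a made-up helper, it handles single-digit
$N placeholders only, and it expects every value to arrive already formatted
as a string. It ignores the "define sys_open:" prefix and any real grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy tmpl into out, replacing each "$N" (N a single
 * digit) with vals[N]. By the convention of the example in the mail,
 * vals[0] is the return value and vals[1..] are the arguments.
 * Returns 0 on success, -1 on a bad index or if out is too small.
 */
static int expand_template(const char *tmpl, const char *const vals[],
                           size_t nvals, char *out, size_t outsz)
{
    size_t used = 0;

    assert(tmpl != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;   /* default: copy one literal character */
        size_t len = 1;

        if (p[0] == '$' && isdigit((unsigned char)p[1])) {
            size_t idx = (size_t)(p[1] - '0');

            if (idx >= nvals)
                return -1;       /* template refers to a missing value */
            piece = vals[idx];
            len = strlen(piece);
            p++;                 /* skip the digit as well */
        }
        if (used + len >= outsz)
            return -1;           /* keep room for the terminating NUL */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the values "3", "\"/etc/passwd\"" and "O_RDONLY" for $0, $1 and $2,
the template "open ( $1, $2 ) = $0" expands to
open ( "/etc/passwd", O_RDONLY ) = 3. Note that the sketch sidesteps the
hard part: turning a raw flags word into "O_RDONLY", or a pointer into a
quoted path, still has to happen somewhere before the template is filled in,
which is why I am asking which of the two you have in mind.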


Thank you again for this detailed and thorough email!
Cordially

-- 
Philippe Ombredanne

