Re: proof-of-concept, utrace->ftrace engine

Roland McGrath Sun, 08 Feb 2009 23:22:39 -0800

> Here's the start of a little ditty that ties process-related events as
> hooked by the Roland McGrath's utrace code into the ftrace
> buffer/control widgetry.  If nothing else, think of it as one
> potential in-tree user of utrace.


Cool!  I won't comment on the use of the tracer or its interface code,
I'll leave that to others.  (It's simplistic and kludgey, but I know
that's what it's going for.)  I'll just review your use of the utrace API.

> +/* Should tracing apply to given task?  Compare against filter
> +   values. */
> +static int trace_test (struct task_struct *tsk) 
> +{
> +        if (trace_taskcomm_filter[0]
> +            && strcmp (trace_taskcomm_filter, tsk->comm))
> +                return 0;

Note that this is the most simple-minded approach for this.  The ->comm
value only changes at exec.  So the "normal", slightly more sophisticated,
way to approach this would be to check at attach time if ->comm matches.
If so, enable full tracing.  If not, enable only EXEC and CLONE events.  In
your report_exec callback, check ->comm to see if the task now should be
filtered in or now should be filtered out, and call utrace_set_events with
more or fewer bits set accordingly.  You always need the report_clone
callback to attach the new child so you can see when it execs; give the new
child the thin or fat event mask as its parent has.

This way, you don't go off the fast paths in signals, etc. when you are
never going to care about those events.  For a trivial hack like this one,
you might not care.  But for more serious use, you want to bother doing it
the fancier way.  If you added syscall tracing support, you probably would
care about the overhead of enabling that on all the uninteresting tasks.

> +        if (trace_taskuid_filter != (u32)-1 
> +            && trace_taskuid_filter != task_uid (tsk))
> +                return 0;

We don't have a utrace event for uid changes, so this one you do have to do
"eagerly".  (Some day in the future, we might well have an event for this
so it can be treated intelligently on transitions as with exec as the
"->comm change event".)

> +static struct utrace_engine_ops process_trace_ops __read_mostly;

This is normally const.  utrace never touches it (all const pointers).
You could change it yourself, but that would not be a normal way to do things.

> +        engine = utrace_attach_task (tsk, UTRACE_ATTACH_CREATE,
> +                                     & process_trace_ops, NULL);

Given how you use UTRACE_ATTACH_MATCH_OPS to effect detach, you might want
to use UTRACE_ATTACH_MATCH_OPS|UTRACE_ATTACH_EXCLUSIVE here.  It's probably
impossible to have another call than yours with the same ops pointer, but
if not then it probably indicates that your later detach could well foul
something up.

> +                /* XXX: Why is this not implicit from the fields
> +                   set in the process_trace_ops? */
> +                rc = utrace_set_events (tsk, engine,

The same reason FWRITE on a struct file is not implicit from having a
.write field set in your struct file_operations.  Your ops struct says
statically what your code is written to handle.  An engine's event mask
says what callbacks you want from that specific thread to that specific
engine at the moment.

> +                                        UTRACE_EVENT(SIGNAL) |

Note this means (exactly as documented):

        _UTRACE_EVENT_SIGNAL,   /* Signal delivery will run a user handler.  */

You might have had UTRACE_EVENT_SIGNAL_ALL in mind.  
That is the union of the five different kinds of SIGNAL* event.

> +u32 process_trace_report_clone (enum utrace_resume_action action,
[...]
> +        return action;
> +}

This is wrong.  If you have nothing special you want to do (just
observing, not perturbing), then "return UTRACE_RESUME;" is what you say.
In report_signal, the non-utrace_resume_action part of the return value
matters, so:
        return UTRACE_RESUME | utrace_signal_action(action);
is what doesn't change anything there.

As documented under 'struct utrace_engine_ops', the action argument is
what other engines are causing to be done independent of what your
engine does.  The utrace_resume_action part of the return value is
what *your engine* wants done, independent of what other engines say.
Your choices might be informed by what other engines are doing in some
cases, but it is not right to mimic what they said.  If some other
engine said UTRACE_STOP, then now you say UTRACE_STOP, but you'll
never call utrace_control to resume, and the thread will be stopped
forever.  If he says UTRACE_STOP and you don't care, you say
UTRACE_RESUME, and the thread stops (UTRACE_STOP < UTRACE_RESUME).
When he calls utrace_control in the future, the thread resumes because
there is no engine left whose last command was UTRACE_STOP.

The non-utrace_resume_action part of the return value (only nonempty for
SIGNAL* and SYSCALL* events) is different.  Unlike utrace_resume_action,
the different choices of different engines can't be combined into a
"least common denominator".  The choice of utrace_signal_action or
utrace_syscall_action setting is what the user-visible disposition
resolving the event will be; all the choices are mutually exclusive and
their effects final.  The last callback to run chooses the final answer.
So each callback has to decide something.  It gets the incoming choice
in its action argument, either from the preceding callback or from the
original normal default (what prevails in the absence of tracing).  The
idiom above passes through the incoming value to leave that choice alone.

> +                /* Skip over kernel threads. */
> +                mm = get_task_mm (tsk);
> +                if (!mm)
> +                        continue;

This should just check PF_KTHREAD.  (As it is, you leak an mm ref here.)
Or just don't bother and handle utrace_attach returning ERR_PTR(-EPERM),
which it will for a kernel thread.


Thanks,
Roland

Re: proof-of-concept, utrace->ftrace engine

Reply via email to