Re: UTRACE_STOP race condition?

Roland McGrath Sun, 15 Mar 2009 19:48:28 -0700

Thanks very much for the feedback, Renzo.  You seem to be about the only
person to thoroughly exercise this part of the API so far.  I'm sure it can
use some refinement.


> please help me. Either I have not understood the meaning of UTRACE_STOP
> or it is completely useless due to a race condition.

I'm confident it can be a little bit of each. ;-)

> There are always two entities in a utrace interaction: the traced
> process and the tracing module.

There are lots of ways to slice things into a notion like 'entity'.
Let's be precise in what we're specifically discussing right now.
The question at hand is about synchronization between two threads:
a traced task and a control task.

> When a traced event occurs in the traced process the corresponding
> report function gets called in the module.

Your engine's callback function is run by the traced task, yes.

> If the report function returns UTRACE_STOP the traced process stays in a
> quiescent state and the module wakes it up by a 
> utrace_control(...,UTRACE_RESUME) call *later*.

A control task (i.e. whatever other task) can make this call at some time, yes.

> If the module wakes the traced process too quickly, utrace has not yet put
> it into a "stopped" state, therefore UTRACE_RESUME gets lost.
> As a consequence, the execution is blocked.
> 
> IMHO, given the current utrace code, there is no way to set up some kind
> of synchronization in the module to prevent this error.

I understand what scenario you mean.  The rest of your message talks about
implementation details of utrace internals.  Frankly I find this confusing
and distracting from the API discussion.  I've gone to some pains to
explicitly document what all the API guarantees and requirements are (and
aren't), in the kerneldoc and docbook text.  I would like us to discuss the
problems for writing tracing engines in terms of the documented API
constraints and guarantees.

The API documentation says what the contract is between the kernel and the
module writer.  If that specification is ambiguous, we'll first fix the
descriptions to be clear.  If what it specifies needs to change into a
better contract for module writers, we'll decide what new contract to agree
on.  Finally, if the utrace implementation does not do what it says, then
we'll fix the implementation.  Your postings have thrown all this together,
which does not work for me.

Please start a separate thread about each separate issue, such as callback
order among engines.  I understand your motivation for all these things is
tied together, but they are separate subjects to address individually.

In commit 3a9f4c87, I made a change/clarification to the API documentation
for utrace_barrier() and a corresponding fix to the implementation.  What
was missing before is this: utrace_barrier() now does not consider your
engine's callback to be complete until your callback's return value has
been processed.  That means that if utrace_barrier() returned 0 and then
you call utrace_control(UTRACE_RESUME), the UTRACE_STOP return value of
your prior callback has definitely been processed before the UTRACE_RESUME
of your asynchronous control call is applied.
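
To make the ordering concrete, here is a minimal userspace sketch of the
barrier-then-resume pattern.  It is only a model built on POSIX threads, not
utrace code: every name in it (model_task, model_barrier, model_resume,
run_model) is hypothetical, with model_barrier() and model_resume() standing
in for utrace_barrier() and utrace_control(UTRACE_RESUME).  It assumes the
"stopped" state is recorded only when the callback's return value is
processed, which is the scenario under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model types; none of this is the real utrace API. */
enum model_action { MODEL_RESUME, MODEL_STOP };

struct model_task {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool callback_done;  /* callback's return value has been processed */
    bool stopped;        /* task is being held in the stopped state */
    bool finished;       /* task ran to completion after being resumed */
};

/* Stand-in for an engine's report callback: always asks for a stop. */
static enum model_action report_callback(void)
{
    return MODEL_STOP;
}

/* The traced task: runs the callback, then acts on its return value. */
static void *traced_thread(void *arg)
{
    struct model_task *t = arg;
    enum model_action act = report_callback();

    pthread_mutex_lock(&t->lock);
    if (act == MODEL_STOP)
        t->stopped = true;           /* return value is processed here */
    t->callback_done = true;
    pthread_cond_broadcast(&t->cond);
    while (t->stopped)               /* sleep until someone resumes us */
        pthread_cond_wait(&t->cond, &t->lock);
    t->finished = true;
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Wait until the callback's return value has been processed. */
static void model_barrier(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    while (!t->callback_done)
        pthread_cond_wait(&t->cond, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Resume the task; only meaningful once the barrier has returned. */
static void model_resume(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->callback_done);        /* ordering established by the barrier */
    t->stopped = false;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Control task: barrier first, then resume.  Returns 0 on success. */
int run_model(void)
{
    struct model_task t = { .callback_done = false, .stopped = false,
                            .finished = false };
    pthread_t th;

    pthread_mutex_init(&t.lock, NULL);
    pthread_cond_init(&t.cond, NULL);
    if (pthread_create(&th, NULL, traced_thread, &t) != 0)
        return -1;

    model_barrier(&t);   /* the stop request is now recorded */
    model_resume(&t);    /* so this resume is ordered after it */

    pthread_join(th, NULL);
    pthread_cond_destroy(&t.cond);
    pthread_mutex_destroy(&t.lock);
    return t.finished ? 0 : -1;
}
```

In this model, dropping the model_barrier() call lets model_resume() run
before the traced thread has set its stopped flag; the traced thread then
stops afterwards and nothing wakes it, which is the lost-resume hang
described in the quoted text.  Calling run_model() from a test program and
checking for a 0 return exercises the ordered path.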

Please address your concerns on the synchronization issue with respect to
the documented API guarantee now made by this utrace_barrier() behavior.


Thanks,
Roland
