Re: [PATCH 1/2] UTRACE_STOP race condition (updated)

Oleg Nesterov Thu, 12 Mar 2009 10:24:29 -0700

Hi Renzo,

This patch needs Roland's review, but I'd like to participate...

On 03/12, Renzo Davoli wrote:
>
> I have updated my patch #1 (it solves the race condition on utrace_stop but
> not the nesting issue) for the latest version of utrace.
>
> I am trying to get the patches updated downloading, compiling and testing
> the fixes every week or so...
> Things would be easier if these patch could be merged in the mainstream ;-)

I think it would be better if you describe the problem in the changelog.
It is not convenient to dig the archives to understand which problem
this patch fixes.

Can't really comment this change because I don't understand what is the
supposed behaviour of utrace_control(UTRACE_RESUME). Perhaps the caller
should wait until the target is stopped? The comment says:

         case UTRACE_RESUME:
                * This and all other cases imply resuming if stopped.

it doesn't explain what should we do if it is not stopped yet.

>  static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
>  {
>       bool killed;
> +     struct utrace_engine *engine, *next;
>
>       /*
>        * @utrace->stopped is the flag that says we are safely
> @@ -406,7 +414,23 @@
>               return true;
>       }
>
> -     utrace->stopped = 1;
> +     /* final check: it is really needed to stop? */
> +     list_for_each_entry_safe(engine, next, &utrace->attached, entry) {

I think we can do this earlier, before taking ->siglock

> +             if ((engine->ops != &utrace_detached_ops) && 
> engine_wants_stop(engine)) {

Do we need "!= &utrace_detached_ops" check? mark_engine_detached() removes
ENGINE_STOP from ->flags.

> +                       if (engine_wants_resume(engine)) {
> +                               clear_engine_wants_stop(engine);
> +                               clear_engine_wants_resume(engine);
> +                       }

I'm afraid _wants_resume() adds another problem. Let's suppose we do

        utrace_control(UTRACE_RESUME);
        utrace_control(UTRACE_STOP);

UTRACE_STOP doesn't do clear_engine_wants_resume(), so it can be lost.

And. Let's suppose we call utrace_control(UTRACE_RESUME), and later
report_xxx() returns UTRACE_STOP. Again, this stop request can be lost.
This doesn't look consistent.

Do we really need _wants_resume()? Note that utrace_control(UTRACE_RESUME)
does clear_engine_wants_stop(). Yes, we can race with finish_callback()
in case when ->report_xxx() returns UTRACE_STOP. But, perhaps, in that
case the caller of utrace_control(UTRACE_RESUME) should take care about
the synchronization with its own callbacks? Something like:

        make_sure_my_callback_wont_return_UTRACE_STOP();
        utrace_barrier();
        utrace_control(UTRACE_RESUME);

This way utrace_stop() can just check engine_wants_stop().

Oleg.

Re: [PATCH 1/2] UTRACE_STOP race condition (updated)

Reply via email to