On 09/13, Roland McGrath wrote:
>
> > It turns out, ptrace_detach_task() is absolutely wrong if
> > voluntary == F and the tracee didn't report the stop.
> >
> > For example, ptrace_report_signal() does:
> >
> >  if (utrace_control(current, engine, UTRACE_INTERRUPT))
> >    WARN_ON(1);
> >
> > this is wrong too. The exiting tracee can do detach in parallel.
>
> Huh?  You mean the exiting tracer can do detach in parallel, right?

Argh, tracer, yes.

> So utrace_control() inside the callback returns -ESRCH because @engine
> is already marked as detached.

Yes.

> No, of course you must stay around to give back the intercepted signal.
>
> > But,
> >
> >  ctx->resume = UTRACE_DETACH;
> >  utrace_barrier(...);
> >  utrace_control(UTRACE_RESUME);
> >
> > can't guarantee the tracee won't stop.
>
> The documented API is that this does guarantee that (if utrace_barrier
> returned zero), if you synchronized with the callback.  The effect of the
> callback's return value should be recorded before utrace_barrier returns.
> So you can use barriers or locks to make sure that ->resume is set before
> the callback runs or after it runs, and it checks ->resume, then you can
> win.  If the callback sees ->resume = UTRACE_DETACH, it will know to bail
> out and just deliver the new signal.  Otherwise, the tracer knows that if
> there was a callback, its UTRACE_STOP return value took effect before
> utrace_barrier returned zero.

The problem is, utrace_control(UTRACE_RESUME) can't prevent the stop if
the tracee has already returned UTRACE_STOP, but utrace_stop() didn't
take utrace->lock yet.

See also below.

> > But, even if we fix UTRACE_RESUME, so far I do not see how we can
> > fix the implicit detach and avoid the races with report_signal().
> > I am afraid we need some locks. Implicit detach must never use
> > UTRACE_DETACH blindly, even if ctx->siginfo/signr = 0.
>
> Locks just between the tracer and ptrace_report_signal are not bad.

OK, but let me think a bit more. I'd really like to avoid adding the
new lock to avoid the very unlikely race with the exiting tracer.

> I glanced at the patch, but I'd rather see an explanation of its plan.

Basically, ptrace_detach_task(sig => -1) should do:

        - if we are going to do utrace_control(UTRACE_DETACH), we
          should first instruct the (running) tracee to not report
          a signal, otherwise that signal will be lost.

        - if the tracee has already reported a signal, we should
          set ->resume = UTRACE_DETACH and resume the tracee, like
          explicit detach does.

          We can check ctx->siginfo != NULL to detect this case.

> Especially if it's as I suspect, that we can do
> that without changing the utrace layer.

No, this problem is orthogonal, or I missed something.

Please look at this message

        https://www.redhat.com/archives/utrace-devel/2010-June/msg00075.html

In particualar:

        Off hand I think it does matter today insofar as it violates the
        documented guarantees of the utrace_barrier API.  If utrace_barrier
        returns 0 it is said to mean that, "Any effect of its return value (such
        as %UTRACE_STOP) has already been applied to @engine."  So e.g. if you
        get a wake-up sent by your callback before it returns UTRACE_STOP, and
        then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME),
        then you should be guaranteed that your resumption cleared the
        callback's stop.

Yes, but currently UTRACE_RESUME can't guarantee this. utrace_control(RESUME)
should remove ENGINE_STOP like UTRACE_DETACH does, otherwise this code

        please_never_return_UTRACE_STOP(tracee);

        // make sure any subsequent callback sees the result of above
        utrace_barrier(...);

        // if it has already returned UTRACE_STOP, resume it
        utrace_control(UTRACE_RESUME);

is racy.

Oleg.

Reply via email to