On 09/13, Roland McGrath wrote: > > > It turns out, ptrace_detach_task() is absolutely wrong if > > voluntary == F and the tracee didn't report the stop. > > > > For example, ptrace_report_signal() does: > > > > if (utrace_control(current, engine, UTRACE_INTERRUPT)) > > WARN_ON(1); > > > > this is wrong too. The exiting tracee can do detach in parallel. > > Huh? You mean the exiting tracer can do detach in parallel, right?
Argh, tracer, yes. > So utrace_control() inside the callback returns -ESRCH because @engine > is already marked as detached. Yes. > No, of course you must stay around to give back the intercepted signal. > > > But, > > > > ctx->resume = UTRACE_DETACH; > > utrace_barrier(...); > > utrace_control(UTRACE_RESUME); > > > > can't guarantee the tracee won't stop. > > The documented API is that this does guarantee that (if utrace_barrier > returned zero), if you synchronized with the callback. The effect of the > callback's return value should be recorded before utrace_barrier returns. > So you can use barriers or locks to make sure that ->resume is set before > the callback runs or after it runs, and it checks ->resume, then you can > win. If the callback sees ->resume = UTRACE_DETACH, it will know to bail > out and just deliver the new signal. Otherwise, the tracer knows that if > there was a callback, its UTRACE_STOP return value took effect before > utrace_barrier returned zero. The problem is, utrace_control(UTRACE_RESUME) can't prevent the stop if the tracee has already returned UTRACE_STOP, but utrace_stop() didn't take utrace->lock yet. See also below. > > But, even if we fix UTRACE_RESUME, so far I do not see how we can > > fix the implicit detach and avoid the races with report_signal(). > > I am afraid we need some locks. Implicit detach must never use > > UTRACE_DETACH blindly, even if ctx->siginfo/signr = 0. > > Locks just between the tracer and ptrace_report_signal are not bad. OK, but let me think a bit more. I'd really like to avoid adding the new lock to avoid the very unlikely race with the exiting tracer. > I glanced at the patch, but I'd rather see an explanation of its plan. Basically, ptrace_detach_task(sig => -1) should do: - if we are going to do utrace_control(UTRACE_DETACH), we should first instruct the (running) tracee to not report a signal, otherwise that signal will be lost. - if the tracee has already reported a signal, we should set ->resume = UTRACE_DETACH and resume the tracee, like explicit detach does. We can check ctx->siginfo != NULL to detect this case. > Especially if it's as I suspect, that we can do > that without changing the utrace layer. No, this problem is orthogonal, or I missed something. Please look at this message https://www.redhat.com/archives/utrace-devel/2010-June/msg00075.html In particualar: Off hand I think it does matter today insofar as it violates the documented guarantees of the utrace_barrier API. If utrace_barrier returns 0 it is said to mean that, "Any effect of its return value (such as %UTRACE_STOP) has already been applied to @engine." So e.g. if you get a wake-up sent by your callback before it returns UTRACE_STOP, and then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME), then you should be guaranteed that your resumption cleared the callback's stop. Yes, but currently UTRACE_RESUME can't guarantee this. utrace_control(RESUME) should remove ENGINE_STOP like UTRACE_DETACH does, otherwise this code please_never_return_UTRACE_STOP(tracee); // make sure any subsequent callback sees the result of above utrace_barrier(...); // if it has already returned UTRACE_STOP, resume it utrace_control(UTRACE_RESUME); is racy. Oleg.