I'm replying out of order. So I'm sorry I started rehashing some of this same stuff in the other thread before reading where you'd already referred to it.
> What if the tracee reports a signal and stops, the tracer detaches > but does not wake it up because of SIGNAL_STOP_STOPPED ? In this > case this signal will be visible after the next attach. > > OK, currently ptrace_detach() unconditionally wakes up the tracee. > But we seem to agree this wakeup should be killed, it completely > breaks the group-stop logic and it has other problems. So today the manifest semantics in userland is that PTRACE_DETACH from a signal stop causes immediate delivery of the signal in that thread regardless of the group stoppedness. In a (converted) job stop or non-signal, non-syscall ptrace stop the tracee resumes and drops the passed signr on the floor. Those semantics are very clear for the single-threaded case at least, so we can just be compatible. In MT cases where the result is a group-exit, that is pretty consistent and reliable today, so we probably need to be compatible there too. In MT cases where the result is to flutter back into job stop (SIGSTOP et al), then there too it is reliable today that you'll get another SIGCHLD and wait wakeup to go with that "freshly stopped again". In MT cases where the result is to confuse the group-stop state, i.e. tracee runs (signal handler or ignored signal) while group is supposedly stopped, we probably don't break anything not already broken by cleaning it up somehow. But if the tracer does a loop of PTRACE_DETACH on each tracee thread, then the result today is probably pretty consistent in manifest behavior today: they all get back to running, even if the SIGNAL_STOP_STOPPED flag is still set. But I'm not at all sure anything could discern from userland that SIGNAL_STOP_STOPPED is still set. It will be set anew or cleared by the time it actually matters. So maybe it really is consistent enough with today's observable behavior just to have PTRACE_DETACH clear SIGNAL_STOP_STOPPED. I'm guessing that will be pretty well consistent in the loop-of-detach case (by the end of it). I'm not really sure what would go different in an oddball case where you PTRACE_DETACH'd one thread but the others were actually left stopped (e.g. in true job stop and you only ever attached the one thread). But it probably doesn't much matter, that usage mode must already be pretty broken I would think. > If "monitor" really wants to kill the tracee it should use SIGKILL ;) Say it wants a core dump. > Any other fatal signal can be ignored by the tracee's sub-thread after > detach but before the tracee returns from ptrace_signal(). Say they never tested the MT case before and haven't started caring about it now. They just want what they have observed before to stay the same as they saw with the old kernel. > Since detach_signal() actually sends the signal, ptrace(DETACH, SIGKILL) > really works: it wakes up SIGNAL_STOP_STOPPED tracee. Today's manifest behavior (via the bogus wake_up_process) is that this SIGKILL will get through and take effect (in signal stops, as above). So that should stay. As to the theory of things, this is the flip side of the whole SIGNAL_STOP_DEQUEUED issue I just posted about earlier. Whereas only an earlier SIGCONT/SIGKILL would have cleared stop signals and thus should clear any "pending-equivalent" "half-delivered" stop signals, earlier stop signals would have cleared a pending SIGCONT. I'm not entirely sure what should happen in all the SIGCONT cases, but we can probably enumerate them and hash those out. I doubt anybody is depending much on the corners of that. For the SIGKILL cases, it's more clear. No matter what, if a SIGKILL doesn't just get dropped on the floor (non-signal stop cases), then it's going to override everything else and soon. > Lets discuss how we can implement detach to be compatible... > We already discussed detached_ops. So, we either change engine->ops > (my choice) or add the new engine with ->ops = detached_ops and detach > the ptrace engine. I don't recall all of the earlier discussion about this in detail. But yes, they are equivalent and I think a fresh engine is cleaner. > We need detached_report_signal() callback which > fixups siginfo if signr != info->si_signo and returns > UTRACE_DETACH | UTRACE_SIGNAL_DELIVER. Right, ptrace_report_signal does that now with ->resume = UTRACE_DETACH. > How can detached_report_signal() set info->si_pid correctly? the > tracer has gone away. This probably means this info should be prepared > during detach, the tracer should fill context->siginfo. It means that if the tracer cared at all, he did PTRACE_SETSIGINFO so this would be moot. The vanilla kernel today uses si_pid and si_uid values of what's either the new tracer (with racing PTRACE_ATTACH) or the real parent at the time of actual resumption. It can be any of those values or 0 and I think it's fine. > But the real problem, I can't see how can we teach ptrace_report_signal() > to not play with parting signals. Do you have any idea? perhaps we can > introduce UTRACE_SIGNAL_DELIVER_I_AM_DETACHED_ENGINE, but this is ugly. I'm not sure I'm following you. Do you mean the new attach after the detach? Indeed, we don't want this to see the detached engine's action as "incoming state" and report it again. I guess I would arrange that you never have them both attached and active (i.e. events set). So, attach the new engine, then look up for any existing detached engine. If you find one, you can do one of two things. 1. Take over the old engine's parting signal and mark it in your own context. Then detach the old engine. You set the context to the same state it's in after you have resumed for PTRACE_CONT,signr. Your report_signal callback can always get UTRACE_SIGNAL_REPORT callbacks in that state due to other engines stopping and resuming, so it should already have logic for that state to deliver the signal. 2. Just mark a special state in your context saying that you should do nothing on the next UTRACE_SIGNAL_DELIVER report you see. The old engine will be earlier on the list and run first, getting UTRACE_SIGNAL_REPORT and returning UTRACE_SIGNAL_DELIVER|UTRACE_DETACH. Then your engine runs later, and knows to let this one signal pass through, resetting its context for the next real signal report. > Oh, I'd like to avoid detached_ops. Perhaps the new attach can re-use > the old engine/context? Both can be basically equivalent. I'm not clear on why you prefer that. > ptrace(DETACH, SIGNR) does: > > context->detached = true; // checked in ptrace_report_signal() > return utrace_control(INTERRUPT); > > Now, how ptrace(ATTACH) can clear ->detached and re-use this engine without > utrace->lock ? IOW, will you agree to abuse utrace->lock for this? That sounds insane. > OR. Perhaps we can introduce UTRACE_ATTACH_MATCH_FILTER_DATA ? This sounds more insane. I don't understand why you wouldn't just do a normal MATCH_OPS lookup and then look at the data structure. You could even do CREATE|MATCH_OPS without EXCLUSIVE, and then if (engine->data != ctx) drop your fresh ctx and use ctx=engine->data; use it if ctx->detached and -EPERM if not. > if (!IS_ERR(engine)) { > // We re-used the old engine, but perhaps it is > // going to return UTRACE_DETACH ? > utrace_barrier(engine); > if (engine->ops == utrace_detached_ops) > engine = -EANY; That's what the return value from utrace_barrier is for. > What do you think? I think the code you sent is better than some of these ideas. But it's still not real coherent to me. I feel like you are worrying about particular problems in contorting to some of this really strange hair, but I haven't become clear on what those problems are so as to understand what clean solutions to them might be. Thanks, Roland