On 04/21, Roland McGrath wrote:
>
> > IOW, 2 threads T1 and T2. T2 forks the child C. T1 ptraces C. C dies
> > and becomes EXIT_ZOMBIE. It sends the notification to thread-group.
> >
> > Then, any thread does do_wait(). But since ptrace_reparented() = T
> > we don't release C but send the notification again. This doesn't
> > look right.
>
> Technically, I think this really is "right". It just seems screwy because,
> well, the whole ptrace+wait interface is indeed screwy.
>
> T1 is the ptracer, and is not the natural parent. Consider that T1 runs a
> piece of code (library, isolated chunk in a giant complex program) that
> just got asked to trace C. It doesn't know anything about C, it just knows
> that PTRACE_ATTACH worked on it. So, it expects the usual behavior when it
> does waitpid(C) and gets !WIFSTOPPED: automatic detach, notification of the
> real parent, and the real parent's waits work.
>
> Imagine T2 runs another piece of code that forks and waits for that child,
> and doesn't know anything else, e.g. it called system(). That code is
> isolated in the function, and all it expects of the rest of the (unknown)
> code in the process is that any wait calls are waitpid() selecting only a
> known child (or are in other threads using __WNOTHREAD, etc.), so nobody
> will steal its child.
>
> These two isolated chunks of code have limiting (and perhaps short-sighted)
> assumptions. But things work out just right for them. (Naturally they
> have problems if both calls are in the same thread leaving the child alive
> in between, but imagine some current application that never does it that way.)
>
> Now C dies and the sequence is:
>
>     C dies -> wake_up_parent
>     T1 wakes up, enters wait loop
>     T2 wakes up, enters wait loop
>     T1 sees C in wait_task_zombie() -> will report, about to untrace it
>     T2 sees C in wait_task_zombie() -> task_ptrace(C) still true, skip it
>     T1 untraces C
>     T2 blocks again til 2nd wake_up_parent
>
> If we were to omit the second do_notify_parent() as you suggest, then T2
> stays blocked forever instead of reaping C.
>
> If we were to change ptrace_reparented() as you contemplate, then even
> after some other wakeup, T2 would get -ECHILD.
>
> Either way, the system call ABI compatibility is broken.
> It's just not an option, merits of interface choices aside.
>
> Note for this case it now works right when both use just __WNOTHREAD, which
> a caller "trying to be smart about it" might reasonably do. T1 is seeing C
> on its ->ptraced, and T2 is seeing (skipping) C on its ->children list.
> When everybody uses __WNOTHREAD, I bet they'd think that ptrace_reparented()
> losing that distinction is pretty counterintuitive.

OK, I see. Thanks!

Oleg.
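
For readers following along from userspace, here is a minimal sketch of the T2-style pattern described in the quoted text: fork a child, then wait only for that known pid with __WNOTHREAD. It assumes Linux and a libc that accepts __WNOTHREAD in waitpid(). The names fork_and_reap() and struct reap_result are hypothetical, made up for this illustration. It does not reproduce the two-thread ptrace case; it only shows that the caller reaps its own child once and that a later wait on the same pid fails with ECHILD, which is the error the quoted text says T2 must not see prematurely.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __WNOTHREAD
#define __WNOTHREAD 0x20000000 /* Linux value; fallback in case libc hides it */
#endif

/* Hypothetical result type for this sketch. */
struct reap_result {
    int exit_code;    /* child's exit status, or -1 if it did not exit normally */
    int second_errno; /* errno from waiting again after the child was reaped */
};

/*
 * Fork a child that exits with `code`, then wait for exactly that pid,
 * considering only children of the calling thread (__WNOTHREAD).
 * Returns 0 on success and fills *out, or -1 on failure.
 */
int fork_and_reap(int code, struct reap_result *out)
{
    /* Make sure SIGCHLD is not ignored, otherwise the kernel auto-reaps. */
    signal(SIGCHLD, SIG_DFL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code); /* child: exit immediately */

    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, __WNOTHREAD);
    } while (r < 0 && errno == EINTR);
    if (r != pid)
        return -1;
    out->exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* The child is gone now, so a second wait on the same pid must fail. */
    errno = 0;
    r = waitpid(pid, &status, __WNOTHREAD);
    out->second_errno = (r < 0) ? errno : 0;
    return 0;
}
```

Calling fork_and_reap(7, &res) from a thread should leave res.exit_code at 7 and res.second_errno at ECHILD. Selecting a specific pid is what keeps other wait calls in the process from stealing the child.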