On 04/19, Roland McGrath wrote:
>
> > X->real_parent calls do_wait() and gets -ECHLD, because X is EXIT_DEAD
> > and not traced.
>
> I see.  I don't think we should worry especially about fixing this existing
> bug (ancient, as you call it) on its own.  So let's only consider this in
> the context of the new locking and data structure plans, if that is simpler.

Agreed.

> I am convinced that "unless we know for sure the child will be released" is
> a nice invariant to have if we can get it.  That is, EXIT_DEAD can be on
> ->children but always means "you want to ignore it".
>
> The ptrace not-really-reaping case is the only time that rule is ever
> violated, right?  So, we can just try to close that hole.

Afaics, yes.

> The stronger "no EXIT_DEAD on ->children" invariant means that the
> non-ptrace do_wait() has to use a different method than the current xchg()
> to settle races.  I don't see any good reason to perturb that.  (At least,
> let it be a later concern far removed from ptrace work.)

OK, agreed.

> So, how can we do the ptrace_reparented() case better in wait_task_zombie?
> If we just don't touch exit_state, then we don't violate the invariant.
> Then all we need is a new way to settle the races between ptrace_do_wait()
> calls, right?
>
> We can drop tasklist_lock (maybe need get_task_struct?), take ptrace_mutex,
> do ptrace detaching/clear ->ptrace, drop ptrace_mutex.  Then re-take
> tasklist_lock (read_lock) for do_notify_parent and do the auto-reap
> handling.  (There's no reason I see that this juggling shouldn't happen
> right after dropping tasklist_lock, before getrusage/put_user, only
> release_task has to be after.)

Yes, I think this is right. We should untrace first.

But, I still think we should do this fix before introducing ->ptrace_mutex.
OK, we should avoid taking tasklist for writing. Then we should check
ptrace_reparented() first. If it is true get_task_struct, drop taslist,
take it for writing, untrace, etc. Then re-take tasklist for reading
and continue the reaping. And of course, we should re-check the task
every time we take tasklist and return E_GOTO_REPEAT if it was untraced
or released.

On top of this changes, it would be easier to change the locking.


Hmm... looking at the current code in wait_task_zombie() under
"if (traced)", shouldn't we check !same_thread_group(p->real_parent, current)
before do_notify_parent() ?

--- kernel/exit.c
+++ kernel/exit.c
@@ -1290,7 +1290,8 @@ static int wait_task_zombie(struct task_
                 * If it's still not detached after that, don't release
                 * it now.
                 */
-               if (!task_detached(p)) {
+               if (!task_detached(p) &&
+                   !same_thread_group(p->real_parent, current)) {
                        do_notify_parent(p, p->exit_signal);
                        if (!task_detached(p)) {
                                p->exit_state = EXIT_ZOMBIE;

Oleg.

Reply via email to