Re: strace lockup when tracing exec in go

2016-09-23 Thread Michal Hocko
On Fri 23-09-16 16:07:24, Oleg Nesterov wrote:
> On 09/23, Michal Hocko wrote:
> >
> > On Fri 23-09-16 15:21:02, Oleg Nesterov wrote:
> > >
> > > This change is simply wrong no matter what.
> >
> > I've just tried to extend the existing
> >
> > /*
> >  * Tracers may want to know about even ignored signals.
> >  */
> > return !t->ptrace;
> >
> > but I probably just do not understand what that actually means. I
> > thought that the tracer is _really_ interested in hearing about the
> > signal.
> 
> Yes, the tracer is really interested to know that a signal was sent to
> the _tracee_, not the tracer ;)

OK, now it makes more sense. I was really scratching my head to
understand this part...
 
> > > We could change do_notify_parent()
> > > to call signal_wake_up() if tsk->ptrace, but see above, this won't help.
> >
> > So does this mean WONTFIX? Can we at least document this behavior? It
> > surely is unexpected.
> 
> No, no, no. Of course this must be fixed. The only problem is that I still
> do not know what should we do. I'll try to return to this problem next week.
> I'm afraid we will need to change de_thread() to wait until all other sub-
> threads have passed exit_notify() or even exit_signals(),

Yes making de_thread completely independent on the state of the tracer
would be a huge improvement. While playing with this test case I
triggered some other interesting hangs (e.g. strace hanging in tty while
trying to print something). So just few interruptible waits is not a
full solution.

> but ooh I don't
> like this. Plus in this case we will need to finally define what
> PTRACE_EVENT_EXIT should actually do.

OK, considering this has been broken for quite some time I do not think
we are in hurry. I am slightly worried about how such a solution would
be stable kernel safe but ohh well.
-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Oleg Nesterov
On 09/23, Michal Hocko wrote:
>
> On Fri 23-09-16 15:21:02, Oleg Nesterov wrote:
> >
> > This change is simply wrong no matter what.
>
> I've just tried to extend the existing
>
>   /*
>* Tracers may want to know about even ignored signals.
>*/
>   return !t->ptrace;
>
> but I probably just do not understand what that actually means. I
> thought that the tracer is _really_ interested in hearing about the
> signal.

Yes, the tracer is really interested to know that a signal was sent to
the _tracee_, not the tracer ;)

>
> > We could change do_notify_parent()
> > to call signal_wake_up() if tsk->ptrace, but see above, this won't help.
>
> So does this mean WONTFIX? Can we at least document this behavior? It
> surely is unexpected.

No, no, no. Of course this must be fixed. The only problem is that I still
do not know what should we do. I'll try to return to this problem next week.
I'm afraid we will need to change de_thread() to wait until all other sub-
threads have passed exit_notify() or even exit_signals(), but ooh I don't
like this. Plus in this case we will need to finally define what
PTRACE_EVENT_EXIT should actually do.

Oleg.


--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Michal Hocko
On Fri 23-09-16 15:21:02, Oleg Nesterov wrote:
> On 09/23, Michal Hocko wrote:
> >
> > On Fri 23-09-16 12:21:41, Oleg Nesterov wrote:
> > > On 09/22, Michal Hocko wrote:
> > > >
> > > > --- a/kernel/signal.c
> > > > +++ b/kernel/signal.c
> > > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int 
> > > > sig, bool force)
> > > > if (!sig_task_ignored(t, sig, force))
> > > > return 0;
> > > >
> > > > +   /* Do not ignore signals sent from child to the parent */
> > > > +   if (current->ptrace && current->parent == t)
> > > > +   return 0;
> > >
> > > This doesn't look right in general, and this can't really help.
> > >
> > > This assumes that the tracer will call do_wait() after mm_access()
> > > fails, but this is not necessarily true.
> > >
> > > Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't
> > > even return to user-space if SIGCHLD is ignored, the tracer will silently
> > > restart the syscall.
> >
> > Well, it apparently does help the strace case.
> 
> Only because strace doesn't even try to handle -EINTR; it assumes this is not
> possible, gives up, and calls wait() after that. So this change actually
> breaks strace.

Hmm, OK. In this case process_vm_readv wouldn't give a usuful data which
still sounds better to me than a hang.

> And once again, this can't really help. SIGCHLD can come before strace calls
> process_vm_readv(), and in this case it will enter the syscall without
> signal_pending() == T. IOW, this hack can only help if the tracer already
> sleeps in process_vm_readv().

True.

> Plus, again, "strace -f" can equally hang if mt-exec races with PTRACE_ATTACH.
> 
> > So I am not arguing this
> > is the best fix but can it be harmful?
> 
> This change is simply wrong no matter what.

I've just tried to extend the existing

/*
 * Tracers may want to know about even ignored signals.
 */
return !t->ptrace;

but I probably just do not understand what that actually means. I
thought that the tracer is _really_ interested in hearing about the
signal.

> We could change do_notify_parent()
> to call signal_wake_up() if tsk->ptrace, but see above, this won't help.

So does this mean WONTFIX? Can we at least document this behavior? It
surely is unexpected.

-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Oleg Nesterov
On 09/23, Michal Hocko wrote:
>
> On Fri 23-09-16 12:21:41, Oleg Nesterov wrote:
> > On 09/22, Michal Hocko wrote:
> > >
> > > --- a/kernel/signal.c
> > > +++ b/kernel/signal.c
> > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, 
> > > bool force)
> > >   if (!sig_task_ignored(t, sig, force))
> > >   return 0;
> > >
> > > + /* Do not ignore signals sent from child to the parent */
> > > + if (current->ptrace && current->parent == t)
> > > + return 0;
> >
> > This doesn't look right in general, and this can't really help.
> >
> > This assumes that the tracer will call do_wait() after mm_access()
> > fails, but this is not necessarily true.
> >
> > Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't
> > even return to user-space if SIGCHLD is ignored, the tracer will silently
> > restart the syscall.
>
> Well, it apparently does help the strace case.

Only because strace doesn't even try to handle -EINTR; it assumes this is not
possible, gives up, and calls wait() after that. So this change actually
breaks strace.

And once again, this can't really help. SIGCHLD can come before strace calls
process_vm_readv(), and in this case it will enter the syscall without
signal_pending() == T. IOW, this hack can only help if the tracer already
sleeps in process_vm_readv().

Plus, again, "strace -f" can equally hang if mt-exec races with PTRACE_ATTACH.

> So I am not arguing this
> is the best fix but can it be harmful?

This change is simply wrong no matter what. We could change do_notify_parent()
to call signal_wake_up() if tsk->ptrace, but see above, this won't help.

Oleg.


--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Oleg Nesterov
On 09/22, Michal Hocko wrote:
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, 
> bool force)
>   if (!sig_task_ignored(t, sig, force))
>   return 0;
>  
> + /* Do not ignore signals sent from child to the parent */
> + if (current->ptrace && current->parent == t)
> + return 0;

This doesn't look right in general, and this can't really help.

This assumes that the tracer will call do_wait() after mm_access()
fails, but this is not necessarily true.

Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't
even return to user-space if SIGCHLD is ignored, the tracer will silently
restart the syscall.

Oleg.


--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Oleg Nesterov
Sorry for delay, I was offline. I'll try to return to this problem next
week, currently I can't even read this thread but at first glance the
proposed patch(es) do not look right...

On 09/21, Michal Hocko wrote:
>
> The further investigation shown that the tracer (strace) is stuck
> waiting for cred_guard_mutex
> [<>] mm_access+0x22/0xa0
> [<>] process_vm_rw_core.isra.1+0x112/0x6c0
> [<>] process_vm_rw+0xab/0x110
> [<>] SyS_process_vm_readv+0x15/0x20
> [<>] system_call_fastpath+0x16/0x1b
> [<>] 0x7f186f031c3a
> [<>] 0x
...
> this however doesn't happen because both threads which are dead
> are zombies waiting to be reaped by the parent and to call their
> release_task->__exit_signal.

Yes, I know, this was already reported. And so far I do not see any
reasonable fix. I _think_ that the "real" fix should rework the
security_bprm_* helpers, but unlikely this is possible.

The trivial test-case:

void *thread(void *arg)
{
ptrace(PTRACE_TRACEME, 0,0,0);
return NULL;
}

int main(void)
{
int pid = fork();

if (!pid) {
pthread_t pt;
pthread_create(, NULL, thread, NULL);
pthread_join(pt, NULL);
execlp("echo", "echo", "passed", NULL);
}

sleep(1);
// or anything else which needs ->cred_guard_mutex,
// say open(/proc/$pid/mem)
ptrace(PTRACE_ATTACH, pid, 0,0);
kill(pid, SIGCONT);

return 0;
}

Oleg.


--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Michal Hocko
On Fri 23-09-16 11:50:32, Oleg Nesterov wrote:
> Sorry for delay, I was offline. I'll try to return to this problem next
> week, currently I can't even read this thread but at first glance the
> proposed patch(es) do not look right...
> 
> On 09/21, Michal Hocko wrote:
> >
> > The further investigation shown that the tracer (strace) is stuck
> > waiting for cred_guard_mutex
> > [<>] mm_access+0x22/0xa0
> > [<>] process_vm_rw_core.isra.1+0x112/0x6c0
> > [<>] process_vm_rw+0xab/0x110
> > [<>] SyS_process_vm_readv+0x15/0x20
> > [<>] system_call_fastpath+0x16/0x1b
> > [<>] 0x7f186f031c3a
> > [<>] 0x
> ...
> > this however doesn't happen because both threads which are dead
> > are zombies waiting to be reaped by the parent and to call their
> > release_task->__exit_signal.
> 
> Yes, I know, this was already reported. And so far I do not see any
> reasonable fix. I _think_ that the "real" fix should rework the
> security_bprm_* helpers, but unlikely this is possible.

Rework them to not rely on the cred_guard_mutex?  Is there any way to
workaround this in the strace code?

> The trivial test-case:
> 
>   void *thread(void *arg)
>   {
>   ptrace(PTRACE_TRACEME, 0,0,0);
>   return NULL;
>   }
> 
>   int main(void)
>   {
>   int pid = fork();
> 
>   if (!pid) {
>   pthread_t pt;
>   pthread_create(, NULL, thread, NULL);
>   pthread_join(pt, NULL);
>   execlp("echo", "echo", "passed", NULL);
>   }
> 
>   sleep(1);
>   // or anything else which needs ->cred_guard_mutex,
>   // say open(/proc/$pid/mem)
>   ptrace(PTRACE_ATTACH, pid, 0,0);
>   kill(pid, SIGCONT);
> 
>   return 0;
>   }
> 
> Oleg.

-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-23 Thread Michal Hocko
On Fri 23-09-16 12:21:41, Oleg Nesterov wrote:
> On 09/22, Michal Hocko wrote:
> >
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, 
> > bool force)
> > if (!sig_task_ignored(t, sig, force))
> > return 0;
> >  
> > +   /* Do not ignore signals sent from child to the parent */
> > +   if (current->ptrace && current->parent == t)
> > +   return 0;
> 
> This doesn't look right in general, and this can't really help.
> 
> This assumes that the tracer will call do_wait() after mm_access()
> fails, but this is not necessarily true.
> 
> Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't
> even return to user-space if SIGCHLD is ignored, the tracer will silently
> restart the syscall.

Well, it apparently does help the strace case. So I am not arguing this
is the best fix but can it be harmful?
-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Aleksa Sarai
> This patch doesn't help, nor does the previous patch... but with both
> applied, all is well.  All you have to do now is figure out why :)

 Ohh, I should be more explicit, this needs the mm_access part as well.
 Sorry for not being clear enough. So the full change is
>>>
>>> Ah.  That was gonna happen after lunch, but since you already know it
>>> works, I can get back to un-b0rking one of my trees.
>>
>> Yeah, it should work. The testcase is running in a loop for more than
>> hour already and everything seems to be ok. Now the question is whether
>> the fix is really correct which is something for Oleg.
>>
>> Also I think there is at least one more issue lurking there. Without or
>> without the patch I can make tracer get stuck in do_wait waiting for
>> task which doesn't seem to be alive anymore. I will report it separately
>> after this one gets resolved to not pull more confusion in.
>
> OK, the test in the loop has survived 3h of runtime without a single
> lockup so the patch seems to be working for this case. I am posting the
> patch with the full changelog, let's see if it is correct as well. As
> I've said earlier this might be a strace bug which does an unexpected
> syscall while it should be doing wait on the child process instead.
>
> If the patch is correct then I would mark it for stable as well.

This patch fixes the test case, but it also fixes the original issue 
(that strace of runC would fail when the container init process is 
exec-ing the user init process). Thanks Michal, I'll apply it to my 
local kernel. :D

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Michal Hocko
On Thu 22-09-16 13:09:25, Michal Hocko wrote:
> On Thu 22-09-16 12:09:05, Mike Galbraith wrote:
> > On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote:
> > > On Thu 22-09-16 11:40:09, Mike Galbraith wrote:
> > 
> > > > This patch doesn't help, nor does the previous patch... but with both
> > > > applied, all is well.  All you have to do now is figure out why :)
> > > 
> > > Ohh, I should be more explicit, this needs the mm_access part as well.
> > > Sorry for not being clear enough. So the full change is
> > 
> > Ah.  That was gonna happen after lunch, but since you already know it
> > works, I can get back to un-b0rking one of my trees.
> 
> Yeah, it should work. The testcase is running in a loop for more than
> hour already and everything seems to be ok. Now the question is whether
> the fix is really correct which is something for Oleg.
> 
> Also I think there is at least one more issue lurking there. Without or
> without the patch I can make tracer get stuck in do_wait waiting for
> task which doesn't seem to be alive anymore. I will report it separately
> after this one gets resolved to not pull more confusion in.

OK, the test in the loop has survived 3h of runtime without a single
lockup so the patch seems to be working for this case. I am posting the
patch with the full changelog, let's see if it is correct as well. As
I've said earlier this might be a strace bug which does an unexpected
syscall while it should be doing wait on the child process instead.

If the patch is correct then I would mark it for stable as well.
---
>From fe82d463fd2ef1585d2c37bf9fa6a1761e6ee0e5 Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Thu, 22 Sep 2016 10:09:34 +0200
Subject: [PATCH] signal: always signal tracer from the ptraced task

Aleksa has reported the following lockup when stracing the following go
program

% cat exec.go
package main

import (
"os"
"syscall"
)

func main() {
syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
}
$ go version
go version go1.6.3 linux/amd64
$ go build -o exec exec.go

$ strace -f ./exec
[snip]
[pid 10349] select(0, NULL, NULL, NULL, {0, 100} 
[pid 10346] <... select resumed> )  = 0 (Timeout)
[pid 10346] select(0, NULL, NULL, NULL, {0, 20} 
[pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] 

[pid 10346] <... select resumed> )  = 0 (Timeout)
[pid 10349] <... select resumed> )  = 0 (Timeout)

execve will never finish unless the strace process get killed with
SIGKILL. The reason is the following deadlock

tracer  thread_A
thread_$N
SyS_process_vm_readvSyS_execve  do_exit
  do_execveat_common  
exit_notify
prepare_bprm_creds  
do_notify_parent
  mutex_lock(cred_guard_mutex)
__group_send_sig_info
search_binary_handler   
send_signal
  load_elf_binary   
  prepare_signal -> fail SIGHCHLD is SGL_DFL
flush_old_exec
  # wait for sig->notify_count
  process_vm_rw
process_vm_rw_core
  mm_access
mutex_lock_killable(cred_guard_mutex)

So there seems to be 2 issues here. The first one is that exiting
threads (because of the ongoing exec) are not sending SIGCHLD signal
to the tracer but they rely on the tracer to reap them and call
release_task->__exit_signal which in turn would wake up the thread_A and
move on with the exec. The other part of the story is that the tracer
is not in do_wait but rather calls process_vm_readv (presumably to get
arguments of the syscall) and it waits for a lock in killable rather
than interruptible sleep.

The fix is therefore twofold. We want to teach mm_access to sleep in
interruptible sleep and we want to make sure that the traced child
will send the signal to the parent even when it is ignored or SIG_DFL.
sig_ignored already seems to be doing something along that line except
it doesn't handle when a traced child sends a signal to the tracer.
Fix this by checking the current ptrace status and whether the target
task is the tracer.

Reported-by: Aleksa Sarai 
Signed-off-by: Michal Hocko 
---
 kernel/fork.c   | 2 +-
 kernel/signal.c | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5a57b9bab85c..d5b7c3aea187 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, 
unsigned int mode)
struct mm_struct *mm;
int err;
 
-   err =  mutex_lock_killable(>signal->cred_guard_mutex);
+   err =  mutex_lock_interruptible(>signal->cred_guard_mutex);

Re: strace lockup when tracing exec in go

2016-09-22 Thread Michal Hocko
On Thu 22-09-16 12:09:05, Mike Galbraith wrote:
> On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote:
> > On Thu 22-09-16 11:40:09, Mike Galbraith wrote:
> 
> > > This patch doesn't help, nor does the previous patch... but with both
> > > applied, all is well.  All you have to do now is figure out why :)
> > 
> > Ohh, I should be more explicit, this needs the mm_access part as well.
> > Sorry for not being clear enough. So the full change is
> 
> Ah.  That was gonna happen after lunch, but since you already know it
> works, I can get back to un-b0rking one of my trees.

Yeah, it should work. The testcase is running in a loop for more than
hour already and everything seems to be ok. Now the question is whether
the fix is really correct which is something for Oleg.

Also I think there is at least one more issue lurking there. Without or
without the patch I can make tracer get stuck in do_wait waiting for
task which doesn't seem to be alive anymore. I will report it separately
after this one gets resolved to not pull more confusion in.

Anyway, thanks for your time Mike!
-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Mike Galbraith
On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote:
> On Thu 22-09-16 11:40:09, Mike Galbraith wrote:

> > This patch doesn't help, nor does the previous patch... but with both
> > applied, all is well.  All you have to do now is figure out why :)
> 
> Ohh, I should be more explicit, this needs the mm_access part as well.
> Sorry for not being clear enough. So the full change is

Ah.  That was gonna happen after lunch, but since you already know it
works, I can get back to un-b0rking one of my trees.

-Mike

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Mike Galbraith
On Wed, 2016-09-21 at 17:29 +0200, Michal Hocko wrote:
> [I am CCing strace mailing list because even if this turns out to be a
>  kernel bug strace seems to be doing something unexpected - more on that
>  below]
> 
> Hi,
> Aleksa has reported the following lockup when stracing the following go
> program
> 
> % cat exec.go
> package main
> 
> import (
> "os"
> "syscall"
> )
> 
> func main() {
> syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
> }
> $ go version
> go version go1.6.3 linux/amd64
> $ go build -o exec exec.go
> 
> $ strace -f ./exec
> [snip]
> [pid 10349] select(0, NULL, NULL, NULL, {0, 100} 
> [pid 10346] <... select resumed> )  = 0 (Timeout)
> [pid 10346] select(0, NULL, NULL, NULL, {0, 20} 
> [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] 
> [pid 10346] <... select resumed> )  = 0 (Timeout)
> [pid 10349] <... select resumed> )  = 0 (Timeout)
> 
> execve will never finish unless the strace process get killed with
> SIGKILL. The following was debugged on 3.12 kernel and the current
> kernel seems to not trigger the issue as easily but I believe the same
> problem is there as well.
> 
> The further investigation shown that the tracer (strace) is stuck
> waiting for cred_guard_mutex
> [<>] mm_access+0x22/0xa0
> [<>] process_vm_rw_core.isra.1+0x112/0x6c0
> [<>] process_vm_rw+0xab/0x110
> [<>] SyS_process_vm_readv+0x15/0x20
> [<>] system_call_fastpath+0x16/0x1b
> [<>] 0x7f186f031c3a
> [<>] 0x
> 
> while the traced process (3 threads) are trying to perform the exec.
> That means that 2 threads are dead (in fact zombies) waiting in their
> final schedule.
> Call Trace:
>  [] do_exit+0x6f7/0xa70
>  [] do_group_exit+0x39/0xa0
>  [] get_signal_to_deliver+0x1e8/0x5c0
>  [] do_signal+0x42/0x670
>  [] do_notify_resume+0x78/0xc0
>  [] int_signal+0x12/0x17
>  [<7f3a33f3ffb9>] 0x7f3a33f3ffb8
> 
> and one is
> Call Trace:
>  [] flush_old_exec+0xdf/0x890
>  [] load_elf_binary+0x307/0xda0
>  [] search_binary_handler+0xae/0x1f0
>  [] do_execve_common.isra.26+0x64e/0x810
>  [] SyS_execve+0x31/0x50
>  [] stub_execve+0x69/0xa0
>  [<7f3a33f16527>] 0x7f3a33f16526
> 
> waiting for notify_count to drop down to 0
> while (sig->notify_count) {
> __set_current_state(TASK_KILLABLE);
> spin_unlock_irq(lock);
> schedule(); <
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> spin_lock_irq(lock);
> }
> 
> this however doesn't happen because both threads which are dead
> are zombies waiting to be reaped by the parent and to call their
> release_task->__exit_signal. The tracer is blocked on the lock held by
> exec (in prepare_bprm_creds). This is the case in the current kernel as
> well AFAICS so the same should be possible as well. So is this a bug or
> something somewhere else makes sure that this will not happen in newer
> kernels?

master.today...

crash> bt 5138
PID: 5138   TASK: 8803fb4e4c80  CPU: 6   COMMAND: "strace"
 #0 [88038e323b68] __schedule at 81624660
 #1 [88038e323bb0] schedule at 81624b35
 #2 [88038e323bc8] schedule_preempt_disabled at 81624e0e
 #3 [88038e323bd8] __mutex_lock_killable_slowpath at 81625f9d
 #4 [88038e323c30] mutex_lock_killable at 8162605a
 #5 [88038e323c48] mm_access at 8105bbf7
 #6 [88038e323c70] process_vm_rw_core at 811823d3
 #7 [88038e323dc0] process_vm_rw at 8118287d
 #8 [88038e323f38] sys_process_vm_readv at 811829b9
 #9 [88038e323f50] entry_SYSCALL_64_fastpath at 81628736
RIP: 7faee6b2ac3a  RSP: 7ffdc95b62b8  RFLAGS: 0246
RAX: ffda  RBX: 019e7030  RCX: 7faee6b2ac3a
RDX: 0001  RSI: 7ffdc95b62d0  RDI: 1418
RBP: 1414   R8: 0001   R9: 
R10: 7ffdc95b62e0  R11: 0246  R12: 7faee73e66c0
R13:   R14:   R15: 
ORIG_RAX: 0136  CS: 0033  SS: 002b
crash> bt 5140
PID: 5140   TASK: 8803e703b300  CPU: 4   COMMAND: "exec"
 #0 [88038e0b3ce8] __schedule at 81624660
 #1 [88038e0b3d30] schedule at 81624b35
 #2 [88038e0b3d48] flush_old_exec at 811b1f26
 #3 [88038e0b3d88] load_elf_binary at 811fd70f
 #4 [88038e0b3e60] search_binary_handler at 811b13ac
 #5 [88038e0b3e98] do_execveat_common at 811b2f8f
 #6 [88038e0b3f00] sys_execve at 811b33ba
 #7 [88038e0b3f28] do_syscall_64 at 8100194e
RIP: 0045c1ef  RSP: 00c820043e00  RFLAGS: 0246
RAX: ffda  RBX: 00c82000e6c0  RCX: 0045c1ef
RDX: 

Re: strace lockup when tracing exec in go

2016-09-22 Thread Mike Galbraith
On Thu, 2016-09-22 at 10:36 +0200, Michal Hocko wrote:
> On Thu 22-09-16 10:01:26, Michal Hocko wrote:
> > On Thu 22-09-16 06:15:02, Mike Galbraith wrote:
> > [...]
> > > master.today...
> > 
> > Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads
> > per core) cannot reproduce even in 10 minutes or so. I've tried to use
> > the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per
> > soc. and 2 threas per core) and it hit almost instantly. I have tried 
> > mutex_lock_killable -> interruptible and it didn't help as I've
> > expected. So the current kernel doesn't do any magic to prevent from the
> > issue as well.
> > 
> > So I've stared into do_notify_parent some more and the following was
> > just very confusing
> > 
> > > > if (!tsk->ptrace && sig == SIGCHLD &&
> > > > (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
> > > >  (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
> > > > > > /*
> > > > > >  * We are exiting and our parent doesn't care.  POSIX.1
> > > > > >  * defines special semantics for setting SIGCHLD to SIG_IGN
> > > > > >  * or setting the SA_NOCLDWAIT flag: we should be reaped
> > > > > >  * automatically and not left for our parent's wait4 call.
> > > > > >  * Rather than having the parent do it as a magic kind of
> > > > > >  * signal handler, we just set this to tell do_exit that we
> > > > > >  * can be cleaned up without becoming a zombie.  Note that
> > > > > >  * we still call __wake_up_parent in this case, because a
> > > > > >  * blocked sys_wait4 might now return -ECHILD.
> > > > > >  *
> > > > > >  * Whether we send SIGCHLD or not for SA_NOCLDWAIT
> > > > > >  * is implementation-defined: we do (if you don't want
> > > > > >  * it, just use SIG_IGN instead).
> > > > > >  */
> > > > > > autoreap = true;
> > > > > > if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> > > > > > > > sig = 0;
> > > > }
> > 
> > it tries to prevent from what I am seeing in a way. If the SIGCHLD is
> > ignored then it just does autoreap and everything is fine. But this
> > doesn't seem to be the case here. In fact we are not sending the signal
> > because sig_task_ignored is true resp. sig_handler_ignored which can
> > fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD
> > seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried
> 
> Dohh, I've missed !tsk->ptrace check there. So we are not even going
> that via that path. So the sig_handler_ignored cannot possible help.
> I was just too lucky and didn't hit the lockup with the patch.
> 
> So what else might be wrong here? sig_ignored seems to be quite
> confusing
> 
>   > /*
>   >  * Tracers may want to know about even ignored signals.
>   >  */
>   > return !t->ptrace;
> 
> t is the tracer here but it shouldn't have t->ptrace because the child
> is not stopped. So do we need something like the following? Not tested
> yet
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1840c7f4e3c2..bd236ce4a29c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, 
> bool force)
>  >> if (!sig_task_ignored(t, sig, force))
>  >>   > return 0;
>  
> +>> /* Do not ignore signals sent from child to the parent */
> +>> if (current->ptrace && current->parent == t)
> +>>   > return 0;
> +
>  >> /*
>  >>  * Tracers may want to know about even ignored signals.
>  >>  */

This patch doesn't help, nor does the previous patch... but with both
applied, all is well.  All you have to do now is figure out why :)

-Mike


--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Michal Hocko
On Thu 22-09-16 11:40:09, Mike Galbraith wrote:
> On Thu, 2016-09-22 at 10:36 +0200, Michal Hocko wrote:
> > On Thu 22-09-16 10:01:26, Michal Hocko wrote:
> > > On Thu 22-09-16 06:15:02, Mike Galbraith wrote:
> > > [...]
> > > > master.today...
> > > 
> > > Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads
> > > per core) cannot reproduce even in 10 minutes or so. I've tried to use
> > > the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per
> > > soc. and 2 threas per core) and it hit almost instantly. I have tried 
> > > mutex_lock_killable -> interruptible and it didn't help as I've
> > > expected. So the current kernel doesn't do any magic to prevent from the
> > > issue as well.
> > > 
> > > So I've stared into do_notify_parent some more and the following was
> > > just very confusing
> > > 
> > >   > > if (!tsk->ptrace && sig == SIGCHLD &&
> > >   > > (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
> > >   > >  (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
> > >   > > > > /*
> > >   > > > >  * We are exiting and our parent doesn't care.  POSIX.1
> > >   > > > >  * defines special semantics for setting SIGCHLD to SIG_IGN
> > >   > > > >  * or setting the SA_NOCLDWAIT flag: we should be reaped
> > >   > > > >  * automatically and not left for our parent's wait4 call.
> > >   > > > >  * Rather than having the parent do it as a magic kind of
> > >   > > > >  * signal handler, we just set this to tell do_exit that we
> > >   > > > >  * can be cleaned up without becoming a zombie.  Note that
> > >   > > > >  * we still call __wake_up_parent in this case, because a
> > >   > > > >  * blocked sys_wait4 might now return -ECHILD.
> > >   > > > >  *
> > >   > > > >  * Whether we send SIGCHLD or not for SA_NOCLDWAIT
> > >   > > > >  * is implementation-defined: we do (if you don't want
> > >   > > > >  * it, just use SIG_IGN instead).
> > >   > > > >  */
> > >   > > > > autoreap = true;
> > >   > > > > if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> > >   > > > > > > sig = 0;
> > >   > > }
> > > 
> > > it tries to prevent from what I am seeing in a way. If the SIGCHLD is
> > > ignored then it just does autoreap and everything is fine. But this
> > > doesn't seem to be the case here. In fact we are not sending the signal
> > > because sig_task_ignored is true resp. sig_handler_ignored which can
> > > fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD
> > > seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried
> > 
> > Dohh, I've missed !tsk->ptrace check there. So we are not even going
> > that via that path. So the sig_handler_ignored cannot possible help.
> > I was just too lucky and didn't hit the lockup with the patch.
> > 
> > So what else might be wrong here? sig_ignored seems to be quite
> > confusing
> > 
> > > /*
> > >  * Tracers may want to know about even ignored signals.
> > >  */
> > > return !t->ptrace;
> > 
> > t is the tracer here but it shouldn't have t->ptrace because the child
> > is not stopped. So do we need something like the following? Not tested
> > yet
> > 
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 1840c7f4e3c2..bd236ce4a29c 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, 
> > bool force)
> >  >  > if (!sig_task_ignored(t, sig, force))
> >  >  >   > return 0;
> >  
> > +>  > /* Do not ignore signals sent from child to the parent */
> > +>  > if (current->ptrace && current->parent == t)
> > +>  >   > return 0;
> > +
> >  >  > /*
> >  >  >  * Tracers may want to know about even ignored signals.
> >  >  >  */
> 
> This patch doesn't help, nor does the previous patch... but with both
> applied, all is well.  All you have to do now is figure out why :)

Ohh, I should be more explicit, this needs the mm_access part as well.
Sorry for not being clear enough. So the full change is
---
diff --git a/kernel/fork.c b/kernel/fork.c
index 5a57b9bab85c..d5b7c3aea187 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, 
unsigned int mode)
struct mm_struct *mm;
int err;
 
-   err =  mutex_lock_killable(>signal->cred_guard_mutex);
+   err =  mutex_lock_interruptible(>signal->cred_guard_mutex);
if (err)
return ERR_PTR(err);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 96e9bc40667f..5c8b84b76f0b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, bool 
force)
if (!sig_task_ignored(t, sig, force))
return 0;
 
+   /* Do not ignore signals sent from child to the parent */
+   if (current->ptrace && current->parent == t)
+   return 0;
+
/*
 * Tracers may want to know 

Re: strace lockup when tracing exec in go

2016-09-22 Thread Michal Hocko
On Thu 22-09-16 10:01:26, Michal Hocko wrote:
> On Thu 22-09-16 06:15:02, Mike Galbraith wrote:
> [...]
> > master.today...
> 
> Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads
> per core) cannot reproduce even in 10 minutes or so. I've tried to use
> the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per
> soc. and 2 threas per core) and it hit almost instantly. I have tried 
> mutex_lock_killable -> interruptible and it didn't help as I've
> expected. So the current kernel doesn't do any magic to prevent from the
> issue as well.
> 
> So I've stared into do_notify_parent some more and the following was
> just very confusing
> 
>   if (!tsk->ptrace && sig == SIGCHLD &&
>   (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
>(psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
>   /*
>* We are exiting and our parent doesn't care.  POSIX.1
>* defines special semantics for setting SIGCHLD to SIG_IGN
>* or setting the SA_NOCLDWAIT flag: we should be reaped
>* automatically and not left for our parent's wait4 call.
>* Rather than having the parent do it as a magic kind of
>* signal handler, we just set this to tell do_exit that we
>* can be cleaned up without becoming a zombie.  Note that
>* we still call __wake_up_parent in this case, because a
>* blocked sys_wait4 might now return -ECHILD.
>*
>* Whether we send SIGCHLD or not for SA_NOCLDWAIT
>* is implementation-defined: we do (if you don't want
>* it, just use SIG_IGN instead).
>*/
>   autoreap = true;
>   if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>   sig = 0;
>   }
> 
> it tries to prevent from what I am seeing in a way. If the SIGCHLD is
> ignored then it just does autoreap and everything is fine. But this
> doesn't seem to be the case here. In fact we are not sending the signal
> because sig_task_ignored is true resp. sig_handler_ignored which can
> fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD
> seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried

Dohh, I've missed !tsk->ptrace check there. So we are not even going
that via that path. So the sig_handler_ignored cannot possible help.
I was just too lucky and didn't hit the lockup with the patch.

So what else might be wrong here? sig_ignored seems to be quite
confusing

/*
 * Tracers may want to know about even ignored signals.
 */
return !t->ptrace;

t is the tracer here but it shouldn't have t->ptrace because the child
is not stopped. So do we need something like the following? Not tested
yet

diff --git a/kernel/signal.c b/kernel/signal.c
index 1840c7f4e3c2..bd236ce4a29c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, bool 
force)
if (!sig_task_ignored(t, sig, force))
return 0;
 
+   /* Do not ignore signals sent from child to the parent */
+   if (current->ptrace && current->parent == t)
+   return 0;
+
/*
 * Tracers may want to know about even ignored signals.
 */
-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel


Re: strace lockup when tracing exec in go

2016-09-22 Thread Michal Hocko
On Thu 22-09-16 06:15:02, Mike Galbraith wrote:
[...]
> master.today...

Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads
per core) cannot reproduce even in 10 minutes or so. I've tried to use
the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per
soc. and 2 threas per core) and it hit almost instantly. I have tried 
mutex_lock_killable -> interruptible and it didn't help as I've
expected. So the current kernel doesn't do any magic to prevent from the
issue as well.

So I've stared into do_notify_parent some more and the following was
just very confusing

if (!tsk->ptrace && sig == SIGCHLD &&
(psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
 (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
/*
 * We are exiting and our parent doesn't care.  POSIX.1
 * defines special semantics for setting SIGCHLD to SIG_IGN
 * or setting the SA_NOCLDWAIT flag: we should be reaped
 * automatically and not left for our parent's wait4 call.
 * Rather than having the parent do it as a magic kind of
 * signal handler, we just set this to tell do_exit that we
 * can be cleaned up without becoming a zombie.  Note that
 * we still call __wake_up_parent in this case, because a
 * blocked sys_wait4 might now return -ECHILD.
 *
 * Whether we send SIGCHLD or not for SA_NOCLDWAIT
 * is implementation-defined: we do (if you don't want
 * it, just use SIG_IGN instead).
 */
autoreap = true;
if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
sig = 0;
}

it tries to prevent from what I am seeing in a way. If the SIGCHLD is
ignored then it just does autoreap and everything is fine. But this
doesn't seem to be the case here. In fact we are not sending the signal
because sig_task_ignored is true resp. sig_handler_ignored which can
fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD
seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried

diff --git a/kernel/fork.c b/kernel/fork.c
index 5a57b9bab85c..d5b7c3aea187 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, 
unsigned int mode)
struct mm_struct *mm;
int err;
 
-   err =  mutex_lock_killable(>signal->cred_guard_mutex);
+   err =  mutex_lock_interruptible(>signal->cred_guard_mutex);
if (err)
return ERR_PTR(err);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 96e9bc40667f..1840c7f4e3c2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1630,7 +1630,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
psig = tsk->parent->sighand;
spin_lock_irqsave(>siglock, flags);
if (!tsk->ptrace && sig == SIGCHLD &&
-   (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
+   (sig_handler_ignored(>action[SIGCHLD-1].sa.sa_handler, 
SIGCHLD) ||
 (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
/*
 * We are exiting and our parent doesn't care.  POSIX.1

And it worked out. Now I do not have any idea why the check is explicit
for SIG_IGN rather than sig_handler_ignored, maybe there is a strong
reason for that. Oleg? I will cook up a full patch if this looks ok.
-- 
Michal Hocko
SUSE Labs

--
___
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel