Re: strace lockup when tracing exec in go
On Fri 23-09-16 16:07:24, Oleg Nesterov wrote: > On 09/23, Michal Hocko wrote: > > > > On Fri 23-09-16 15:21:02, Oleg Nesterov wrote: > > > > > > This change is simply wrong no matter what. > > > > I've just tried to extend the existing > > > > /* > > * Tracers may want to know about even ignored signals. > > */ > > return !t->ptrace; > > > > but I probably just do not understand what that actually means. I > > thought that the tracer is _really_ interested in hearing about the > > signal. > > Yes, the tracer is really interested to know that a signal was sent to > the _tracee_, not the tracer ;) OK, now it makes more sense. I was really scratching my head to understand this part... > > > We could change do_notify_parent() > > > to call signal_wake_up() if tsk->ptrace, but see above, this won't help. > > > > So does this mean WONTFIX? Can we at least document this behavior? It > > surely is unexpected. > > No, no, no. Of course this must be fixed. The only problem is that I still > do not know what should we do. I'll try to return to this problem next week. > I'm afraid we will need to change de_thread() to wait until all other sub- > threads have passed exit_notify() or even exit_signals(), Yes making de_thread completely independent on the state of the tracer would be a huge improvement. While playing with this test case I triggered some other interesting hangs (e.g. strace hanging in tty while trying to print something). So just few interruptible waits is not a full solution. > but ooh I don't > like this. Plus in this case we will need to finally define what > PTRACE_EVENT_EXIT should actually do. OK, considering this has been broken for quite some time I do not think we are in hurry. I am slightly worried about how such a solution would be stable kernel safe but ohh well. -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On 09/23, Michal Hocko wrote: > > On Fri 23-09-16 15:21:02, Oleg Nesterov wrote: > > > > This change is simply wrong no matter what. > > I've just tried to extend the existing > > /* >* Tracers may want to know about even ignored signals. >*/ > return !t->ptrace; > > but I probably just do not understand what that actually means. I > thought that the tracer is _really_ interested in hearing about the > signal. Yes, the tracer is really interested to know that a signal was sent to the _tracee_, not the tracer ;) > > > We could change do_notify_parent() > > to call signal_wake_up() if tsk->ptrace, but see above, this won't help. > > So does this mean WONTFIX? Can we at least document this behavior? It > surely is unexpected. No, no, no. Of course this must be fixed. The only problem is that I still do not know what should we do. I'll try to return to this problem next week. I'm afraid we will need to change de_thread() to wait until all other sub- threads have passed exit_notify() or even exit_signals(), but ooh I don't like this. Plus in this case we will need to finally define what PTRACE_EVENT_EXIT should actually do. Oleg. -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Fri 23-09-16 15:21:02, Oleg Nesterov wrote: > On 09/23, Michal Hocko wrote: > > > > On Fri 23-09-16 12:21:41, Oleg Nesterov wrote: > > > On 09/22, Michal Hocko wrote: > > > > > > > > --- a/kernel/signal.c > > > > +++ b/kernel/signal.c > > > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int > > > > sig, bool force) > > > > if (!sig_task_ignored(t, sig, force)) > > > > return 0; > > > > > > > > + /* Do not ignore signals sent from child to the parent */ > > > > + if (current->ptrace && current->parent == t) > > > > + return 0; > > > > > > This doesn't look right in general, and this can't really help. > > > > > > This assumes that the tracer will call do_wait() after mm_access() > > > fails, but this is not necessarily true. > > > > > > Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't > > > even return to user-space if SIGCHLD is ignored, the tracer will silently > > > restart the syscall. > > > > Well, it apparently does help the strace case. > > Only because strace doesn't even try to handle -EINTR; it assumes this is not > possible, gives up, and calls wait() after that. So this change actually > breaks strace. Hmm, OK. In this case process_vm_readv wouldn't give a usuful data which still sounds better to me than a hang. > And once again, this can't really help. SIGCHLD can come before strace calls > process_vm_readv(), and in this case it will enter the syscall without > signal_pending() == T. IOW, this hack can only help if the tracer already > sleeps in process_vm_readv(). True. > Plus, again, "strace -f" can equally hang if mt-exec races with PTRACE_ATTACH. > > > So I am not arguing this > > is the best fix but can it be harmful? > > This change is simply wrong no matter what. I've just tried to extend the existing /* * Tracers may want to know about even ignored signals. */ return !t->ptrace; but I probably just do not understand what that actually means. I thought that the tracer is _really_ interested in hearing about the signal. > We could change do_notify_parent() > to call signal_wake_up() if tsk->ptrace, but see above, this won't help. So does this mean WONTFIX? Can we at least document this behavior? It surely is unexpected. -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On 09/23, Michal Hocko wrote: > > On Fri 23-09-16 12:21:41, Oleg Nesterov wrote: > > On 09/22, Michal Hocko wrote: > > > > > > --- a/kernel/signal.c > > > +++ b/kernel/signal.c > > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, > > > bool force) > > > if (!sig_task_ignored(t, sig, force)) > > > return 0; > > > > > > + /* Do not ignore signals sent from child to the parent */ > > > + if (current->ptrace && current->parent == t) > > > + return 0; > > > > This doesn't look right in general, and this can't really help. > > > > This assumes that the tracer will call do_wait() after mm_access() > > fails, but this is not necessarily true. > > > > Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't > > even return to user-space if SIGCHLD is ignored, the tracer will silently > > restart the syscall. > > Well, it apparently does help the strace case. Only because strace doesn't even try to handle -EINTR; it assumes this is not possible, gives up, and calls wait() after that. So this change actually breaks strace. And once again, this can't really help. SIGCHLD can come before strace calls process_vm_readv(), and in this case it will enter the syscall without signal_pending() == T. IOW, this hack can only help if the tracer already sleeps in process_vm_readv(). Plus, again, "strace -f" can equally hang if mt-exec races with PTRACE_ATTACH. > So I am not arguing this > is the best fix but can it be harmful? This change is simply wrong no matter what. We could change do_notify_parent() to call signal_wake_up() if tsk->ptrace, but see above, this won't help. Oleg. -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On 09/22, Michal Hocko wrote: > > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, > bool force) > if (!sig_task_ignored(t, sig, force)) > return 0; > > + /* Do not ignore signals sent from child to the parent */ > + if (current->ptrace && current->parent == t) > + return 0; This doesn't look right in general, and this can't really help. This assumes that the tracer will call do_wait() after mm_access() fails, but this is not necessarily true. Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't even return to user-space if SIGCHLD is ignored, the tracer will silently restart the syscall. Oleg. -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
Sorry for delay, I was offline. I'll try to return to this problem next week, currently I can't even read this thread but at first glance the proposed patch(es) do not look right... On 09/21, Michal Hocko wrote: > > The further investigation shown that the tracer (strace) is stuck > waiting for cred_guard_mutex > [<>] mm_access+0x22/0xa0 > [<>] process_vm_rw_core.isra.1+0x112/0x6c0 > [<>] process_vm_rw+0xab/0x110 > [<>] SyS_process_vm_readv+0x15/0x20 > [<>] system_call_fastpath+0x16/0x1b > [<>] 0x7f186f031c3a > [<>] 0x ... > this however doesn't happen because both threads which are dead > are zombies waiting to be reaped by the parent and to call their > release_task->__exit_signal. Yes, I know, this was already reported. And so far I do not see any reasonable fix. I _think_ that the "real" fix should rework the security_bprm_* helpers, but unlikely this is possible. The trivial test-case: void *thread(void *arg) { ptrace(PTRACE_TRACEME, 0,0,0); return NULL; } int main(void) { int pid = fork(); if (!pid) { pthread_t pt; pthread_create(, NULL, thread, NULL); pthread_join(pt, NULL); execlp("echo", "echo", "passed", NULL); } sleep(1); // or anything else which needs ->cred_guard_mutex, // say open(/proc/$pid/mem) ptrace(PTRACE_ATTACH, pid, 0,0); kill(pid, SIGCONT); return 0; } Oleg. -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Fri 23-09-16 11:50:32, Oleg Nesterov wrote: > Sorry for delay, I was offline. I'll try to return to this problem next > week, currently I can't even read this thread but at first glance the > proposed patch(es) do not look right... > > On 09/21, Michal Hocko wrote: > > > > The further investigation shown that the tracer (strace) is stuck > > waiting for cred_guard_mutex > > [<>] mm_access+0x22/0xa0 > > [<>] process_vm_rw_core.isra.1+0x112/0x6c0 > > [<>] process_vm_rw+0xab/0x110 > > [<>] SyS_process_vm_readv+0x15/0x20 > > [<>] system_call_fastpath+0x16/0x1b > > [<>] 0x7f186f031c3a > > [<>] 0x > ... > > this however doesn't happen because both threads which are dead > > are zombies waiting to be reaped by the parent and to call their > > release_task->__exit_signal. > > Yes, I know, this was already reported. And so far I do not see any > reasonable fix. I _think_ that the "real" fix should rework the > security_bprm_* helpers, but unlikely this is possible. Rework them to not rely on the cred_guard_mutex? Is there any way to workaround this in the strace code? > The trivial test-case: > > void *thread(void *arg) > { > ptrace(PTRACE_TRACEME, 0,0,0); > return NULL; > } > > int main(void) > { > int pid = fork(); > > if (!pid) { > pthread_t pt; > pthread_create(, NULL, thread, NULL); > pthread_join(pt, NULL); > execlp("echo", "echo", "passed", NULL); > } > > sleep(1); > // or anything else which needs ->cred_guard_mutex, > // say open(/proc/$pid/mem) > ptrace(PTRACE_ATTACH, pid, 0,0); > kill(pid, SIGCONT); > > return 0; > } > > Oleg. -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Fri 23-09-16 12:21:41, Oleg Nesterov wrote: > On 09/22, Michal Hocko wrote: > > > > --- a/kernel/signal.c > > +++ b/kernel/signal.c > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, > > bool force) > > if (!sig_task_ignored(t, sig, force)) > > return 0; > > > > + /* Do not ignore signals sent from child to the parent */ > > + if (current->ptrace && current->parent == t) > > + return 0; > > This doesn't look right in general, and this can't really help. > > This assumes that the tracer will call do_wait() after mm_access() > fails, but this is not necessarily true. > > Note also ptrace_attach(), -ERESTARTNOINTR means that the tracer won't > even return to user-space if SIGCHLD is ignored, the tracer will silently > restart the syscall. Well, it apparently does help the strace case. So I am not arguing this is the best fix but can it be harmful? -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
> This patch doesn't help, nor does the previous patch... but with both > applied, all is well. All you have to do now is figure out why :) Ohh, I should be more explicit, this needs the mm_access part as well. Sorry for not being clear enough. So the full change is >>> >>> Ah. That was gonna happen after lunch, but since you already know it >>> works, I can get back to un-b0rking one of my trees. >> >> Yeah, it should work. The testcase is running in a loop for more than >> hour already and everything seems to be ok. Now the question is whether >> the fix is really correct which is something for Oleg. >> >> Also I think there is at least one more issue lurking there. Without or >> without the patch I can make tracer get stuck in do_wait waiting for >> task which doesn't seem to be alive anymore. I will report it separately >> after this one gets resolved to not pull more confusion in. > > OK, the test in the loop has survived 3h of runtime without a single > lockup so the patch seems to be working for this case. I am posting the > patch with the full changelog, let's see if it is correct as well. As > I've said earlier this might be a strace bug which does an unexpected > syscall while it should be doing wait on the child process instead. > > If the patch is correct then I would mark it for stable as well. This patch fixes the test case, but it also fixes the original issue (that strace of runC would fail when the container init process is exec-ing the user init process). Thanks Michal, I'll apply it to my local kernel. :D -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Thu 22-09-16 13:09:25, Michal Hocko wrote: > On Thu 22-09-16 12:09:05, Mike Galbraith wrote: > > On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote: > > > On Thu 22-09-16 11:40:09, Mike Galbraith wrote: > > > > > > This patch doesn't help, nor does the previous patch... but with both > > > > applied, all is well. All you have to do now is figure out why :) > > > > > > Ohh, I should be more explicit, this needs the mm_access part as well. > > > Sorry for not being clear enough. So the full change is > > > > Ah. That was gonna happen after lunch, but since you already know it > > works, I can get back to un-b0rking one of my trees. > > Yeah, it should work. The testcase is running in a loop for more than > hour already and everything seems to be ok. Now the question is whether > the fix is really correct which is something for Oleg. > > Also I think there is at least one more issue lurking there. Without or > without the patch I can make tracer get stuck in do_wait waiting for > task which doesn't seem to be alive anymore. I will report it separately > after this one gets resolved to not pull more confusion in. OK, the test in the loop has survived 3h of runtime without a single lockup so the patch seems to be working for this case. I am posting the patch with the full changelog, let's see if it is correct as well. As I've said earlier this might be a strace bug which does an unexpected syscall while it should be doing wait on the child process instead. If the patch is correct then I would mark it for stable as well. --- >From fe82d463fd2ef1585d2c37bf9fa6a1761e6ee0e5 Mon Sep 17 00:00:00 2001 From: Michal HockoDate: Thu, 22 Sep 2016 10:09:34 +0200 Subject: [PATCH] signal: always signal tracer from the ptraced task Aleksa has reported the following lockup when stracing the following go program % cat exec.go package main import ( "os" "syscall" ) func main() { syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ()) } $ go version go version go1.6.3 linux/amd64 $ go build -o exec exec.go $ strace -f ./exec [snip] [pid 10349] select(0, NULL, NULL, NULL, {0, 100} [pid 10346] <... select resumed> ) = 0 (Timeout) [pid 10346] select(0, NULL, NULL, NULL, {0, 20} [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] [pid 10346] <... select resumed> ) = 0 (Timeout) [pid 10349] <... select resumed> ) = 0 (Timeout) execve will never finish unless the strace process get killed with SIGKILL. The reason is the following deadlock tracer thread_A thread_$N SyS_process_vm_readvSyS_execve do_exit do_execveat_common exit_notify prepare_bprm_creds do_notify_parent mutex_lock(cred_guard_mutex) __group_send_sig_info search_binary_handler send_signal load_elf_binary prepare_signal -> fail SIGHCHLD is SGL_DFL flush_old_exec # wait for sig->notify_count process_vm_rw process_vm_rw_core mm_access mutex_lock_killable(cred_guard_mutex) So there seems to be 2 issues here. The first one is that exiting threads (because of the ongoing exec) are not sending SIGCHLD signal to the tracer but they rely on the tracer to reap them and call release_task->__exit_signal which in turn would wake up the thread_A and move on with the exec. The other part of the story is that the tracer is not in do_wait but rather calls process_vm_readv (presumably to get arguments of the syscall) and it waits for a lock in killable rather than interruptible sleep. The fix is therefore twofold. We want to teach mm_access to sleep in interruptible sleep and we want to make sure that the traced child will send the signal to the parent even when it is ignored or SIG_DFL. sig_ignored already seems to be doing something along that line except it doesn't handle when a traced child sends a signal to the tracer. Fix this by checking the current ptrace status and whether the target task is the tracer. Reported-by: Aleksa Sarai Signed-off-by: Michal Hocko --- kernel/fork.c | 2 +- kernel/signal.c | 4 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index 5a57b9bab85c..d5b7c3aea187 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) struct mm_struct *mm; int err; - err = mutex_lock_killable(>signal->cred_guard_mutex); + err = mutex_lock_interruptible(>signal->cred_guard_mutex);
Re: strace lockup when tracing exec in go
On Thu 22-09-16 12:09:05, Mike Galbraith wrote: > On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote: > > On Thu 22-09-16 11:40:09, Mike Galbraith wrote: > > > > This patch doesn't help, nor does the previous patch... but with both > > > applied, all is well. All you have to do now is figure out why :) > > > > Ohh, I should be more explicit, this needs the mm_access part as well. > > Sorry for not being clear enough. So the full change is > > Ah. That was gonna happen after lunch, but since you already know it > works, I can get back to un-b0rking one of my trees. Yeah, it should work. The testcase is running in a loop for more than hour already and everything seems to be ok. Now the question is whether the fix is really correct which is something for Oleg. Also I think there is at least one more issue lurking there. Without or without the patch I can make tracer get stuck in do_wait waiting for task which doesn't seem to be alive anymore. I will report it separately after this one gets resolved to not pull more confusion in. Anyway, thanks for your time Mike! -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Thu, 2016-09-22 at 11:53 +0200, Michal Hocko wrote: > On Thu 22-09-16 11:40:09, Mike Galbraith wrote: > > This patch doesn't help, nor does the previous patch... but with both > > applied, all is well. All you have to do now is figure out why :) > > Ohh, I should be more explicit, this needs the mm_access part as well. > Sorry for not being clear enough. So the full change is Ah. That was gonna happen after lunch, but since you already know it works, I can get back to un-b0rking one of my trees. -Mike -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Wed, 2016-09-21 at 17:29 +0200, Michal Hocko wrote: > [I am CCing strace mailing list because even if this turns out to be a > kernel bug strace seems to be doing something unexpected - more on that > below] > > Hi, > Aleksa has reported the following lockup when stracing the following go > program > > % cat exec.go > package main > > import ( > "os" > "syscall" > ) > > func main() { > syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ()) > } > $ go version > go version go1.6.3 linux/amd64 > $ go build -o exec exec.go > > $ strace -f ./exec > [snip] > [pid 10349] select(0, NULL, NULL, NULL, {0, 100} > [pid 10346] <... select resumed> ) = 0 (Timeout) > [pid 10346] select(0, NULL, NULL, NULL, {0, 20} > [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] > [pid 10346] <... select resumed> ) = 0 (Timeout) > [pid 10349] <... select resumed> ) = 0 (Timeout) > > execve will never finish unless the strace process get killed with > SIGKILL. The following was debugged on 3.12 kernel and the current > kernel seems to not trigger the issue as easily but I believe the same > problem is there as well. > > The further investigation shown that the tracer (strace) is stuck > waiting for cred_guard_mutex > [<>] mm_access+0x22/0xa0 > [<>] process_vm_rw_core.isra.1+0x112/0x6c0 > [<>] process_vm_rw+0xab/0x110 > [<>] SyS_process_vm_readv+0x15/0x20 > [<>] system_call_fastpath+0x16/0x1b > [<>] 0x7f186f031c3a > [<>] 0x > > while the traced process (3 threads) are trying to perform the exec. > That means that 2 threads are dead (in fact zombies) waiting in their > final schedule. > Call Trace: > [] do_exit+0x6f7/0xa70 > [] do_group_exit+0x39/0xa0 > [] get_signal_to_deliver+0x1e8/0x5c0 > [] do_signal+0x42/0x670 > [] do_notify_resume+0x78/0xc0 > [] int_signal+0x12/0x17 > [<7f3a33f3ffb9>] 0x7f3a33f3ffb8 > > and one is > Call Trace: > [] flush_old_exec+0xdf/0x890 > [] load_elf_binary+0x307/0xda0 > [] search_binary_handler+0xae/0x1f0 > [] do_execve_common.isra.26+0x64e/0x810 > [] SyS_execve+0x31/0x50 > [] stub_execve+0x69/0xa0 > [<7f3a33f16527>] 0x7f3a33f16526 > > waiting for notify_count to drop down to 0 > while (sig->notify_count) { > __set_current_state(TASK_KILLABLE); > spin_unlock_irq(lock); > schedule(); < > if (unlikely(__fatal_signal_pending(tsk))) > goto killed; > spin_lock_irq(lock); > } > > this however doesn't happen because both threads which are dead > are zombies waiting to be reaped by the parent and to call their > release_task->__exit_signal. The tracer is blocked on the lock held by > exec (in prepare_bprm_creds). This is the case in the current kernel as > well AFAICS so the same should be possible as well. So is this a bug or > something somewhere else makes sure that this will not happen in newer > kernels? master.today... crash> bt 5138 PID: 5138 TASK: 8803fb4e4c80 CPU: 6 COMMAND: "strace" #0 [88038e323b68] __schedule at 81624660 #1 [88038e323bb0] schedule at 81624b35 #2 [88038e323bc8] schedule_preempt_disabled at 81624e0e #3 [88038e323bd8] __mutex_lock_killable_slowpath at 81625f9d #4 [88038e323c30] mutex_lock_killable at 8162605a #5 [88038e323c48] mm_access at 8105bbf7 #6 [88038e323c70] process_vm_rw_core at 811823d3 #7 [88038e323dc0] process_vm_rw at 8118287d #8 [88038e323f38] sys_process_vm_readv at 811829b9 #9 [88038e323f50] entry_SYSCALL_64_fastpath at 81628736 RIP: 7faee6b2ac3a RSP: 7ffdc95b62b8 RFLAGS: 0246 RAX: ffda RBX: 019e7030 RCX: 7faee6b2ac3a RDX: 0001 RSI: 7ffdc95b62d0 RDI: 1418 RBP: 1414 R8: 0001 R9: R10: 7ffdc95b62e0 R11: 0246 R12: 7faee73e66c0 R13: R14: R15: ORIG_RAX: 0136 CS: 0033 SS: 002b crash> bt 5140 PID: 5140 TASK: 8803e703b300 CPU: 4 COMMAND: "exec" #0 [88038e0b3ce8] __schedule at 81624660 #1 [88038e0b3d30] schedule at 81624b35 #2 [88038e0b3d48] flush_old_exec at 811b1f26 #3 [88038e0b3d88] load_elf_binary at 811fd70f #4 [88038e0b3e60] search_binary_handler at 811b13ac #5 [88038e0b3e98] do_execveat_common at 811b2f8f #6 [88038e0b3f00] sys_execve at 811b33ba #7 [88038e0b3f28] do_syscall_64 at 8100194e RIP: 0045c1ef RSP: 00c820043e00 RFLAGS: 0246 RAX: ffda RBX: 00c82000e6c0 RCX: 0045c1ef RDX:
Re: strace lockup when tracing exec in go
On Thu, 2016-09-22 at 10:36 +0200, Michal Hocko wrote: > On Thu 22-09-16 10:01:26, Michal Hocko wrote: > > On Thu 22-09-16 06:15:02, Mike Galbraith wrote: > > [...] > > > master.today... > > > > Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads > > per core) cannot reproduce even in 10 minutes or so. I've tried to use > > the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per > > soc. and 2 threas per core) and it hit almost instantly. I have tried > > mutex_lock_killable -> interruptible and it didn't help as I've > > expected. So the current kernel doesn't do any magic to prevent from the > > issue as well. > > > > So I've stared into do_notify_parent some more and the following was > > just very confusing > > > > > > if (!tsk->ptrace && sig == SIGCHLD && > > > > (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || > > > > (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { > > > > > > /* > > > > > > * We are exiting and our parent doesn't care. POSIX.1 > > > > > > * defines special semantics for setting SIGCHLD to SIG_IGN > > > > > > * or setting the SA_NOCLDWAIT flag: we should be reaped > > > > > > * automatically and not left for our parent's wait4 call. > > > > > > * Rather than having the parent do it as a magic kind of > > > > > > * signal handler, we just set this to tell do_exit that we > > > > > > * can be cleaned up without becoming a zombie. Note that > > > > > > * we still call __wake_up_parent in this case, because a > > > > > > * blocked sys_wait4 might now return -ECHILD. > > > > > > * > > > > > > * Whether we send SIGCHLD or not for SA_NOCLDWAIT > > > > > > * is implementation-defined: we do (if you don't want > > > > > > * it, just use SIG_IGN instead). > > > > > > */ > > > > > > autoreap = true; > > > > > > if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) > > > > > > > > sig = 0; > > > > } > > > > it tries to prevent from what I am seeing in a way. If the SIGCHLD is > > ignored then it just does autoreap and everything is fine. But this > > doesn't seem to be the case here. In fact we are not sending the signal > > because sig_task_ignored is true resp. sig_handler_ignored which can > > fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD > > seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried > > Dohh, I've missed !tsk->ptrace check there. So we are not even going > that via that path. So the sig_handler_ignored cannot possible help. > I was just too lucky and didn't hit the lockup with the patch. > > So what else might be wrong here? sig_ignored seems to be quite > confusing > > > /* > > * Tracers may want to know about even ignored signals. > > */ > > return !t->ptrace; > > t is the tracer here but it shouldn't have t->ptrace because the child > is not stopped. So do we need something like the following? Not tested > yet > > diff --git a/kernel/signal.c b/kernel/signal.c > index 1840c7f4e3c2..bd236ce4a29c 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, > bool force) > >> if (!sig_task_ignored(t, sig, force)) > >> > return 0; > > +>> /* Do not ignore signals sent from child to the parent */ > +>> if (current->ptrace && current->parent == t) > +>> > return 0; > + > >> /* > >> * Tracers may want to know about even ignored signals. > >> */ This patch doesn't help, nor does the previous patch... but with both applied, all is well. All you have to do now is figure out why :) -Mike -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Thu 22-09-16 11:40:09, Mike Galbraith wrote: > On Thu, 2016-09-22 at 10:36 +0200, Michal Hocko wrote: > > On Thu 22-09-16 10:01:26, Michal Hocko wrote: > > > On Thu 22-09-16 06:15:02, Mike Galbraith wrote: > > > [...] > > > > master.today... > > > > > > Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads > > > per core) cannot reproduce even in 10 minutes or so. I've tried to use > > > the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per > > > soc. and 2 threas per core) and it hit almost instantly. I have tried > > > mutex_lock_killable -> interruptible and it didn't help as I've > > > expected. So the current kernel doesn't do any magic to prevent from the > > > issue as well. > > > > > > So I've stared into do_notify_parent some more and the following was > > > just very confusing > > > > > > > > if (!tsk->ptrace && sig == SIGCHLD && > > > > > (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || > > > > > (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { > > > > > > > /* > > > > > > > * We are exiting and our parent doesn't care. POSIX.1 > > > > > > > * defines special semantics for setting SIGCHLD to SIG_IGN > > > > > > > * or setting the SA_NOCLDWAIT flag: we should be reaped > > > > > > > * automatically and not left for our parent's wait4 call. > > > > > > > * Rather than having the parent do it as a magic kind of > > > > > > > * signal handler, we just set this to tell do_exit that we > > > > > > > * can be cleaned up without becoming a zombie. Note that > > > > > > > * we still call __wake_up_parent in this case, because a > > > > > > > * blocked sys_wait4 might now return -ECHILD. > > > > > > > * > > > > > > > * Whether we send SIGCHLD or not for SA_NOCLDWAIT > > > > > > > * is implementation-defined: we do (if you don't want > > > > > > > * it, just use SIG_IGN instead). > > > > > > > */ > > > > > > > autoreap = true; > > > > > > > if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) > > > > > > > > > sig = 0; > > > > > } > > > > > > it tries to prevent from what I am seeing in a way. If the SIGCHLD is > > > ignored then it just does autoreap and everything is fine. But this > > > doesn't seem to be the case here. In fact we are not sending the signal > > > because sig_task_ignored is true resp. sig_handler_ignored which can > > > fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD > > > seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried > > > > Dohh, I've missed !tsk->ptrace check there. So we are not even going > > that via that path. So the sig_handler_ignored cannot possible help. > > I was just too lucky and didn't hit the lockup with the patch. > > > > So what else might be wrong here? sig_ignored seems to be quite > > confusing > > > > > /* > > > * Tracers may want to know about even ignored signals. > > > */ > > > return !t->ptrace; > > > > t is the tracer here but it shouldn't have t->ptrace because the child > > is not stopped. So do we need something like the following? Not tested > > yet > > > > diff --git a/kernel/signal.c b/kernel/signal.c > > index 1840c7f4e3c2..bd236ce4a29c 100644 > > --- a/kernel/signal.c > > +++ b/kernel/signal.c > > @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, > > bool force) > > > > if (!sig_task_ignored(t, sig, force)) > > > > > return 0; > > > > +> > /* Do not ignore signals sent from child to the parent */ > > +> > if (current->ptrace && current->parent == t) > > +> > > return 0; > > + > > > > /* > > > > * Tracers may want to know about even ignored signals. > > > > */ > > This patch doesn't help, nor does the previous patch... but with both > applied, all is well. All you have to do now is figure out why :) Ohh, I should be more explicit, this needs the mm_access part as well. Sorry for not being clear enough. So the full change is --- diff --git a/kernel/fork.c b/kernel/fork.c index 5a57b9bab85c..d5b7c3aea187 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) struct mm_struct *mm; int err; - err = mutex_lock_killable(>signal->cred_guard_mutex); + err = mutex_lock_interruptible(>signal->cred_guard_mutex); if (err) return ERR_PTR(err); diff --git a/kernel/signal.c b/kernel/signal.c index 96e9bc40667f..5c8b84b76f0b 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, bool force) if (!sig_task_ignored(t, sig, force)) return 0; + /* Do not ignore signals sent from child to the parent */ + if (current->ptrace && current->parent == t) + return 0; + /* * Tracers may want to know
Re: strace lockup when tracing exec in go
On Thu 22-09-16 10:01:26, Michal Hocko wrote: > On Thu 22-09-16 06:15:02, Mike Galbraith wrote: > [...] > > master.today... > > Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads > per core) cannot reproduce even in 10 minutes or so. I've tried to use > the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per > soc. and 2 threas per core) and it hit almost instantly. I have tried > mutex_lock_killable -> interruptible and it didn't help as I've > expected. So the current kernel doesn't do any magic to prevent from the > issue as well. > > So I've stared into do_notify_parent some more and the following was > just very confusing > > if (!tsk->ptrace && sig == SIGCHLD && > (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || >(psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { > /* >* We are exiting and our parent doesn't care. POSIX.1 >* defines special semantics for setting SIGCHLD to SIG_IGN >* or setting the SA_NOCLDWAIT flag: we should be reaped >* automatically and not left for our parent's wait4 call. >* Rather than having the parent do it as a magic kind of >* signal handler, we just set this to tell do_exit that we >* can be cleaned up without becoming a zombie. Note that >* we still call __wake_up_parent in this case, because a >* blocked sys_wait4 might now return -ECHILD. >* >* Whether we send SIGCHLD or not for SA_NOCLDWAIT >* is implementation-defined: we do (if you don't want >* it, just use SIG_IGN instead). >*/ > autoreap = true; > if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) > sig = 0; > } > > it tries to prevent from what I am seeing in a way. If the SIGCHLD is > ignored then it just does autoreap and everything is fine. But this > doesn't seem to be the case here. In fact we are not sending the signal > because sig_task_ignored is true resp. sig_handler_ignored which can > fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD > seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried Dohh, I've missed !tsk->ptrace check there. So we are not even going that via that path. So the sig_handler_ignored cannot possible help. I was just too lucky and didn't hit the lockup with the patch. So what else might be wrong here? sig_ignored seems to be quite confusing /* * Tracers may want to know about even ignored signals. */ return !t->ptrace; t is the tracer here but it shouldn't have t->ptrace because the child is not stopped. So do we need something like the following? Not tested yet diff --git a/kernel/signal.c b/kernel/signal.c index 1840c7f4e3c2..bd236ce4a29c 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -91,6 +91,10 @@ static int sig_ignored(struct task_struct *t, int sig, bool force) if (!sig_task_ignored(t, sig, force)) return 0; + /* Do not ignore signals sent from child to the parent */ + if (current->ptrace && current->parent == t) + return 0; + /* * Tracers may want to know about even ignored signals. */ -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel
Re: strace lockup when tracing exec in go
On Thu 22-09-16 06:15:02, Mike Galbraith wrote: [...] > master.today... Thanks for trying to reproduce this. My tiny laptop (2 cores, 2 threads per core) cannot reproduce even in 10 minutes or so. I've tried to use the same machine I was testing with 3.12 kernel (2 sockets, 8 cores per soc. and 2 threas per core) and it hit almost instantly. I have tried mutex_lock_killable -> interruptible and it didn't help as I've expected. So the current kernel doesn't do any magic to prevent from the issue as well. So I've stared into do_notify_parent some more and the following was just very confusing if (!tsk->ptrace && sig == SIGCHLD && (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { /* * We are exiting and our parent doesn't care. POSIX.1 * defines special semantics for setting SIGCHLD to SIG_IGN * or setting the SA_NOCLDWAIT flag: we should be reaped * automatically and not left for our parent's wait4 call. * Rather than having the parent do it as a magic kind of * signal handler, we just set this to tell do_exit that we * can be cleaned up without becoming a zombie. Note that * we still call __wake_up_parent in this case, because a * blocked sys_wait4 might now return -ECHILD. * * Whether we send SIGCHLD or not for SA_NOCLDWAIT * is implementation-defined: we do (if you don't want * it, just use SIG_IGN instead). */ autoreap = true; if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) sig = 0; } it tries to prevent from what I am seeing in a way. If the SIGCHLD is ignored then it just does autoreap and everything is fine. But this doesn't seem to be the case here. In fact we are not sending the signal because sig_task_ignored is true resp. sig_handler_ignored which can fail even for handler == SIG_DFL && sig_kernel_ignore() and SIGCHLD seems to be in SIG_KERNEL_IGNORE_MASK. So I've tried diff --git a/kernel/fork.c b/kernel/fork.c index 5a57b9bab85c..d5b7c3aea187 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) struct mm_struct *mm; int err; - err = mutex_lock_killable(>signal->cred_guard_mutex); + err = mutex_lock_interruptible(>signal->cred_guard_mutex); if (err) return ERR_PTR(err); diff --git a/kernel/signal.c b/kernel/signal.c index 96e9bc40667f..1840c7f4e3c2 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1630,7 +1630,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig) psig = tsk->parent->sighand; spin_lock_irqsave(>siglock, flags); if (!tsk->ptrace && sig == SIGCHLD && - (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || + (sig_handler_ignored(>action[SIGCHLD-1].sa.sa_handler, SIGCHLD) || (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { /* * We are exiting and our parent doesn't care. POSIX.1 And it worked out. Now I do not have any idea why the check is explicit for SIG_IGN rather than sig_handler_ignored, maybe there is a strong reason for that. Oleg? I will cook up a full patch if this looks ok. -- Michal Hocko SUSE Labs -- ___ Strace-devel mailing list Strace-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/strace-devel