Re: ptrace attach in multi-threaded processes
On Fri, Jul 15, 2016 at 11:01:59AM -0700, Mark Johnston wrote: > Thanks, this seems to give the desired behaviour in the single-threaded > case. I'll write a test case for the multi-threaded case next. > > Am I correct in thinking that r302179 could be reverted if your change > is committed? I suspect that it is not. Suppose that we have a single-threaded process which only thread is right on the syscall exit path when the debugger is attached, and debugger requested PTRACE_SCX stops. Then the debuggee reaches the ptracestop(SIGTRAP) stop point before cursig(9) is ever called. So despite the patch, first reported signal is SIGTRAP and not the attaching STOP. If debugger detaches right after that, the process should still be left in stopped state. You may object that gcore(1) does not request SCX, but my point is that even with the patch, first reported signal could be other than the STOP. I suspect that some other ptracestop() call might cause that behaviour either now or in future even with the default event mask. So I would left the r302179 in place. Below is the patch with reverted next_xthread() bits. I reverted them because Peter found the changes to require much more work to not cause regressions. diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c index 1da4b99..9e1a494 100644 --- a/sys/kern/kern_sig.c +++ b/sys/kern/kern_sig.c @@ -2726,7 +2726,20 @@ issignal(struct thread *td) SIG_STOPSIGMASK(sigpending); if (SIGISEMPTY(sigpending)) /* no signal to send */ return (0); - sig = sig_ffs(&sigpending); + if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED && + (p->p_flag2 & P2_PTRACE_FSTP) != 0 && + SIGISMEMBER(sigpending, SIGSTOP)) { + /* +* If debugger just attached, always consume +* SIGSTOP from ptrace(PT_ATTACH) first, to +* execute the debugger attach ritual in +* order. +*/ + sig = SIGSTOP; + p->p_flag2 &= ~P2_PTRACE_FSTP; + } else { + sig = sig_ffs(&sigpending); + } if (p->p_stops & S_SIG) { mtx_unlock(&ps->ps_mtx); @@ -2743,7 +2756,7 @@ issignal(struct thread *td) sigqueue_delete(&p->p_sigqueue, sig); continue; } - if (p->p_flag & P_TRACED && (p->p_flag & P_PPTRACE) == 0) { + if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED) { /* * If traced, always stop. * Remove old signal from queue before the stop. @@ -2846,6 +2859,8 @@ issignal(struct thread *td) mtx_unlock(&ps->ps_mtx); WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, &p->p_mtx.lock_object, "Catching SIGSTOP"); + sigqueue_delete(&td->td_sigqueue, sig); + sigqueue_delete(&p->p_sigqueue, sig); p->p_flag |= P_STOPPED_SIG; p->p_xsig = sig; PROC_SLOCK(p); @@ -2853,7 +2868,7 @@ issignal(struct thread *td) thread_suspend_switch(td, p); PROC_SUNLOCK(p); mtx_lock(&ps->ps_mtx); - break; + goto next; } else if (prop & SA_IGNORE) { /* * Except for SIGCONT, shouldn't get here. @@ -2884,6 +2899,7 @@ issignal(struct thread *td) } sigqueue_delete(&td->td_sigqueue, sig); /* take the signal! */ sigqueue_delete(&p->p_sigqueue, sig); +next:; } /* NOTREACHED */ } diff --git a/sys/kern/sys_process.c b/sys/kern/sys_process.c index f1477ce..86e7c52 100644 --- a/sys/kern/sys_process.c +++ b/sys/kern/sys_process.c @@ -900,6 +900,7 @@ kern_ptrace(struct thread *td, int req, pid_t pid, void *addr, int data) case PT_TRACE_ME: /* set my trace flag and "owner" so it can read/write me */ p->p_flag |= P_TRACED; + p->p_flag2 |= P2_PTRACE_FSTP; p->p_ptevents = PTRACE_DEFAULT; if (p->p_flag & P_PPWAIT) p->p_flag |= P_PPTRACE; @@ -919,6 +920,7 @@ kern_ptrace(struct thread *td, int req, pid_t pid, void *addr, int data) * on a "detach". */ p->p_flag |= P_TRACED; + p->p_flag2 |= P2_PTRACE_FSTP; p->p_ptevents = PTRACE_DEFAULT; p->p_oppid = p->p_p
Re: ptrace attach in multi-threaded processes
On Fri, Jul 15, 2016 at 10:27:20AM +0300, Konstantin Belousov wrote: > On Thu, Jul 14, 2016 at 11:16:05AM -0700, Mark Johnston wrote: > > Please see the program here: > > https://people.freebsd.org/~markj/ptrace_stop.c > > > > It cheats a bit: it uses SIGSTOP to stop the child before sending a > > SIGHUP to it. However, this is just for convenience; note that PT_ATTACH > > will result in a call to thread_unsuspend() on the child, so PT_ATTACH's > > SIGSTOP will be delivered to a running process. When ptrace attaches, > > the child stops and WSTOPSIG(status) == SIGHUP. When ptrace detaches, > > the child is left stopped. > No, it is not for convenience, it relies on another bug to get the effect, > see below. I see. I should have noted that the result can be reproduced without the first SIGSTOP, just not reliably. That is, I still occasionally get the following output when the kill(SIGSTOP) and subsequent waitpid() call are removed: stopping signal is 1 waiting on child... child is stopped after detach (sig 17) > > As I understand you intent, you prefer to get SIGSTOP from the first > waitpid(2) call after successful PT_ATTACH, am I right ? Hm, I don't care very much about that. I was just addressing your claim that the "debugger interface guarantees that SIGSTOP is noted." > At least for > single-threaded case, this can be achieved with a flag indicating that > we a doing first cursig(9) action after the attach, and preferring > SIGSTOP over any other queued signal. The new flag P2_PTRACE_FSTP > does just that. For mt case, I believe that some enchancements to > my proc_next_xthread() would fix that. This seems like a sound approach to me. It provides the guarantee I referenced above, and ensures that the SIGSTOP from PT_ATTACH is delivered before PT_DETACH. > > But when debugging the code, I found that it still does not work reliably > for your test. The reason is that issignal() consumes a queued stop signal > after the thread_suspend_switch(). It allows the attach to occur, but then > sigqueue_delete() calls ('take the signal!') eat the signal for attach. It > seems that we should consume stops before going to stop state. An open > question is how much this hurts when another (non-debugging) SIGSTOP is > queued while in stopped state. > > Please try this. Thanks, this seems to give the desired behaviour in the single-threaded case. I'll write a test case for the multi-threaded case next. Am I correct in thinking that r302179 could be reverted if your change is committed? ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Thu, Jul 14, 2016 at 11:16:05AM -0700, Mark Johnston wrote: > Please see the program here: > https://people.freebsd.org/~markj/ptrace_stop.c > > It cheats a bit: it uses SIGSTOP to stop the child before sending a > SIGHUP to it. However, this is just for convenience; note that PT_ATTACH > will result in a call to thread_unsuspend() on the child, so PT_ATTACH's > SIGSTOP will be delivered to a running process. When ptrace attaches, > the child stops and WSTOPSIG(status) == SIGHUP. When ptrace detaches, > the child is left stopped. No, it is not for convenience, it relies on another bug to get the effect, see below. As I understand you intent, you prefer to get SIGSTOP from the first waitpid(2) call after successful PT_ATTACH, am I right ? At least for single-threaded case, this can be achieved with a flag indicating that we a doing first cursig(9) action after the attach, and preferring SIGSTOP over any other queued signal. The new flag P2_PTRACE_FSTP does just that. For mt case, I believe that some enchancements to my proc_next_xthread() would fix that. But when debugging the code, I found that it still does not work reliably for your test. The reason is that issignal() consumes a queued stop signal after the thread_suspend_switch(). It allows the attach to occur, but then sigqueue_delete() calls ('take the signal!') eat the signal for attach. It seems that we should consume stops before going to stop state. An open question is how much this hurts when another (non-debugging) SIGSTOP is queued while in stopped state. Please try this. diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c index 2a5e6de..36ed15f 100644 --- a/sys/kern/kern_sig.c +++ b/sys/kern/kern_sig.c @@ -2525,22 +2525,21 @@ ptracestop(struct thread *td, int sig) PROC_SUNLOCK(p); return (sig); } - /* -* Just make wait() to work, the last stopped thread -* will win. -*/ - p->p_xsig = sig; - p->p_xthread = td; - p->p_flag |= (P_STOPPED_SIG|P_STOPPED_TRACE); - sig_suspend_threads(td, p, 0); - if ((td->td_dbgflags & TDB_STOPATFORK) != 0) { - td->td_dbgflags &= ~TDB_STOPATFORK; - cv_broadcast(&p->p_dbgwait); + if (p->p_xthread == NULL) + p->p_xthread = td; + if (p->p_xthread == td) { + p->p_xsig = sig; + p->p_flag |= P_STOPPED_SIG | P_STOPPED_TRACE; + sig_suspend_threads(td, p, 0); + if ((td->td_dbgflags & TDB_STOPATFORK) != 0) { + td->td_dbgflags &= ~TDB_STOPATFORK; + cv_broadcast(&p->p_dbgwait); + } } stopme: thread_suspend_switch(td, p); if (p->p_xthread == td) - p->p_xthread = NULL; + proc_next_xthread(p); if (!(p->p_flag & P_TRACED)) break; if (td->td_dbgflags & TDB_SUSPEND) { @@ -2725,7 +2724,20 @@ issignal(struct thread *td) SIG_STOPSIGMASK(sigpending); if (SIGISEMPTY(sigpending)) /* no signal to send */ return (0); - sig = sig_ffs(&sigpending); + if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED && + (p->p_flag2 & P2_PTRACE_FSTP) != 0 && + SIGISMEMBER(sigpending, SIGSTOP)) { + /* +* If debugger just attached, always consume +* SIGSTOP from ptrace(PT_ATTACH) first, to +* execute the debugger attach ritual in +* order. +*/ + sig = SIGSTOP; + p->p_flag2 &= ~P2_PTRACE_FSTP; + } else { + sig = sig_ffs(&sigpending); + } if (p->p_stops & S_SIG) { mtx_unlock(&ps->ps_mtx); @@ -2742,7 +2754,7 @@ issignal(struct thread *td) sigqueue_delete(&p->p_sigqueue, sig); continue; } - if (p->p_flag & P_TRACED && (p->p_flag & P_PPTRACE) == 0) { + if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED) { /* * If traced, always stop. * Remove old signal from queue before the stop. @@ -2845,6 +2857,8 @@ issignal(struct thread *td) mtx_unlock(&ps->ps_mtx); WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, &p->p_mtx.lock_object, "Catching SIGSTOP"); + sigqueue_delete
Re: ptrace attach in multi-threaded processes
On Thu, Jul 14, 2016 at 08:25:37AM +0300, Konstantin Belousov wrote: > On Wed, Jul 13, 2016 at 01:01:39PM -0700, Mark Johnston wrote: > > On Wed, Jul 13, 2016 at 10:19:47PM +0300, Konstantin Belousov wrote: > > > Hmm, I think no, we can not make such change. Issue is, debugger > > > interface guarantees (at least for single-threaded programs it is > > > done correctly) that SIGSTOP is noted. In my opinion, it would be the > > > incompatible API change. > > > > But this guarantee is not honoured in the single-threaded case where > > PT_ATTACH sends SIGSTOP after another signal is already pending. This > > other signal will stop the process in ptracestop(), so SIGSTOP will not > > be reported until after a PT_CONTINUE or PT_DETACH, which seems to > > violate the interface as you described it. Am I missing some reason > > that this cannot occur? If not, I'll write a test case for the > > single-threaded case first. > > Please give me some initial test case, I am fine with single-threaded case. > I do not think that the mt test would be much different ? Please see the program here: https://people.freebsd.org/~markj/ptrace_stop.c It cheats a bit: it uses SIGSTOP to stop the child before sending a SIGHUP to it. However, this is just for convenience; note that PT_ATTACH will result in a call to thread_unsuspend() on the child, so PT_ATTACH's SIGSTOP will be delivered to a running process. When ptrace attaches, the child stops and WSTOPSIG(status) == SIGHUP. When ptrace detaches, the child is left stopped. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Wed, Jul 13, 2016 at 01:01:39PM -0700, Mark Johnston wrote: > On Wed, Jul 13, 2016 at 10:19:47PM +0300, Konstantin Belousov wrote: > > On Wed, Jul 13, 2016 at 09:42:47AM -0700, Mark Johnston wrote: > > > I'm having trouble determining if the diff changes any userland-visible > > > behaviour. It seems that the only potential problem with the current > > > p_xthread handling is in stopevent(), since a thread calling stopevent() > > > from postsig() may clear p_xthread after it was set by another thread in > > > ptracestop(). But I also don't understand why we call stopevent(S_SIG) > > > from both issignal() and postsig() - this would appear to stop the > > > thread twice for the same signal. > > You mean that the patch would not fix your issue ? Quite possible, it > > might require some more code to 'move the torch' to next xthread, so to > > say. When you write the test case, I will spend efforts on the working > > patch. > > I don't think this addresses my issue of the process remaining stopped > after the PT_DETACH, but see below. Patch tries to add some coordination to ptracestop(), I do not object to the statement that what was done is not enough. > > > > > That said, I do not think that we should change anything about stopevent(), > > since this is code which is on life support. If we cannot remove procfs > > debugging interface, let not change it at least in incompatible ways. > > > > > > > > With respect to the desired direction, do you agree that the SIGSTOP > > > from PT_ATTACH should effectively be ignored if a different signal stops > > > the process first? As I said in a previous post, it seems that the > > > SA_STOP property of PT_ATTACH's SIGSTOP is not used in the common case, > > > since ptracestop() will stop the process if any signal is received, and > > > the PT_DETACH operation will typically overwrite the SIGSTOP with 0 in > > > td_xsig. > > Hmm, I think no, we can not make such change. Issue is, debugger > > interface guarantees (at least for single-threaded programs it is > > done correctly) that SIGSTOP is noted. In my opinion, it would be the > > incompatible API change. > > But this guarantee is not honoured in the single-threaded case where > PT_ATTACH sends SIGSTOP after another signal is already pending. This > other signal will stop the process in ptracestop(), so SIGSTOP will not > be reported until after a PT_CONTINUE or PT_DETACH, which seems to > violate the interface as you described it. Am I missing some reason > that this cannot occur? If not, I'll write a test case for the > single-threaded case first. Please give me some initial test case, I am fine with single-threaded case. I do not think that the mt test would be much different ? ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Wed, Jul 13, 2016 at 10:19:47PM +0300, Konstantin Belousov wrote: > On Wed, Jul 13, 2016 at 09:42:47AM -0700, Mark Johnston wrote: > > I'm having trouble determining if the diff changes any userland-visible > > behaviour. It seems that the only potential problem with the current > > p_xthread handling is in stopevent(), since a thread calling stopevent() > > from postsig() may clear p_xthread after it was set by another thread in > > ptracestop(). But I also don't understand why we call stopevent(S_SIG) > > from both issignal() and postsig() - this would appear to stop the > > thread twice for the same signal. > You mean that the patch would not fix your issue ? Quite possible, it > might require some more code to 'move the torch' to next xthread, so to > say. When you write the test case, I will spend efforts on the working > patch. I don't think this addresses my issue of the process remaining stopped after the PT_DETACH, but see below. > > That said, I do not think that we should change anything about stopevent(), > since this is code which is on life support. If we cannot remove procfs > debugging interface, let not change it at least in incompatible ways. > > > > > With respect to the desired direction, do you agree that the SIGSTOP > > from PT_ATTACH should effectively be ignored if a different signal stops > > the process first? As I said in a previous post, it seems that the > > SA_STOP property of PT_ATTACH's SIGSTOP is not used in the common case, > > since ptracestop() will stop the process if any signal is received, and > > the PT_DETACH operation will typically overwrite the SIGSTOP with 0 in > > td_xsig. > Hmm, I think no, we can not make such change. Issue is, debugger > interface guarantees (at least for single-threaded programs it is > done correctly) that SIGSTOP is noted. In my opinion, it would be the > incompatible API change. But this guarantee is not honoured in the single-threaded case where PT_ATTACH sends SIGSTOP after another signal is already pending. This other signal will stop the process in ptracestop(), so SIGSTOP will not be reported until after a PT_CONTINUE or PT_DETACH, which seems to violate the interface as you described it. Am I missing some reason that this cannot occur? If not, I'll write a test case for the single-threaded case first. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Wed, Jul 13, 2016 at 09:42:47AM -0700, Mark Johnston wrote: > On Wed, Jul 13, 2016 at 07:54:39AM +0300, Konstantin Belousov wrote: > > I finally see. Might be something like the patch below is a step in > > the desired direction. Idea is in the proc_next_xthread(): p_xthread > > should be set to the next thread with a pending signal. Do you have a > > test case that demonstrates the issue ? > > Not yet, I'll work on one. I've only seen this occur once in an Isilon > test cluster. > > The diff makes sense to me, thanks. I'd find the code easier to read if > proc_next_xthread() was a pure function that returned the flagged > thread instead of setting p_xthread. I did coded it this way, struct thread *proc_next_xthread(struct proc *p); initially, but I did not liked that all uses are p->p_xthread = proc_next_xthread(p); > > I'm having trouble determining if the diff changes any userland-visible > behaviour. It seems that the only potential problem with the current > p_xthread handling is in stopevent(), since a thread calling stopevent() > from postsig() may clear p_xthread after it was set by another thread in > ptracestop(). But I also don't understand why we call stopevent(S_SIG) > from both issignal() and postsig() - this would appear to stop the > thread twice for the same signal. You mean that the patch would not fix your issue ? Quite possible, it might require some more code to 'move the torch' to next xthread, so to say. When you write the test case, I will spend efforts on the working patch. That said, I do not think that we should change anything about stopevent(), since this is code which is on life support. If we cannot remove procfs debugging interface, let not change it at least in incompatible ways. > > With respect to the desired direction, do you agree that the SIGSTOP > from PT_ATTACH should effectively be ignored if a different signal stops > the process first? As I said in a previous post, it seems that the > SA_STOP property of PT_ATTACH's SIGSTOP is not used in the common case, > since ptracestop() will stop the process if any signal is received, and > the PT_DETACH operation will typically overwrite the SIGSTOP with 0 in > td_xsig. Hmm, I think no, we can not make such change. Issue is, debugger interface guarantees (at least for single-threaded programs it is done correctly) that SIGSTOP is noted. In my opinion, it would be the incompatible API change. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Wed, Jul 13, 2016 at 07:54:39AM +0300, Konstantin Belousov wrote: > I finally see. Might be something like the patch below is a step in > the desired direction. Idea is in the proc_next_xthread(): p_xthread > should be set to the next thread with a pending signal. Do you have a > test case that demonstrates the issue ? Not yet, I'll work on one. I've only seen this occur once in an Isilon test cluster. The diff makes sense to me, thanks. I'd find the code easier to read if proc_next_xthread() was a pure function that returned the flagged thread instead of setting p_xthread. I'm having trouble determining if the diff changes any userland-visible behaviour. It seems that the only potential problem with the current p_xthread handling is in stopevent(), since a thread calling stopevent() from postsig() may clear p_xthread after it was set by another thread in ptracestop(). But I also don't understand why we call stopevent(S_SIG) from both issignal() and postsig() - this would appear to stop the thread twice for the same signal. With respect to the desired direction, do you agree that the SIGSTOP from PT_ATTACH should effectively be ignored if a different signal stops the process first? As I said in a previous post, it seems that the SA_STOP property of PT_ATTACH's SIGSTOP is not used in the common case, since ptracestop() will stop the process if any signal is received, and the PT_DETACH operation will typically overwrite the SIGSTOP with 0 in td_xsig. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Tue, Jul 12, 2016 at 09:02:10PM -0700, Mark Johnston wrote: > Hm, the only SIGSTOP in my scenario is the one generated by PT_ATTACH. > The problem occurs when this SIGSTOP races with *any* other signal that's > being delivered to the process and for which the process has registered > a handler. For instance, SIGHUP after a log rotation. > > If a real SIGSTOP races with PT_ATTACH, then I would indeed expect to > find the process in a stopped state after the detach. Does this make > more sense? I finally see. Might be something like the patch below is a step in the desired direction. Idea is in the proc_next_xthread(): p_xthread should be set to the next thread with a pending signal. Do you have a test case that demonstrates the issue ? diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c index 2a5e6de..1106f3a 100644 --- a/sys/kern/kern_sig.c +++ b/sys/kern/kern_sig.c @@ -2525,22 +2525,21 @@ ptracestop(struct thread *td, int sig) PROC_SUNLOCK(p); return (sig); } - /* -* Just make wait() to work, the last stopped thread -* will win. -*/ - p->p_xsig = sig; - p->p_xthread = td; - p->p_flag |= (P_STOPPED_SIG|P_STOPPED_TRACE); - sig_suspend_threads(td, p, 0); - if ((td->td_dbgflags & TDB_STOPATFORK) != 0) { - td->td_dbgflags &= ~TDB_STOPATFORK; - cv_broadcast(&p->p_dbgwait); + if (p->p_xthread == NULL) + p->p_xthread = td; + if (p->p_xthread == td) { + p->p_xsig = sig; + p->p_flag |= (P_STOPPED_SIG|P_STOPPED_TRACE); + sig_suspend_threads(td, p, 0); + if ((td->td_dbgflags & TDB_STOPATFORK) != 0) { + td->td_dbgflags &= ~TDB_STOPATFORK; + cv_broadcast(&p->p_dbgwait); + } } stopme: thread_suspend_switch(td, p); if (p->p_xthread == td) - p->p_xthread = NULL; + proc_next_xthread(p); if (!(p->p_flag & P_TRACED)) break; if (td->td_dbgflags & TDB_SUSPEND) { diff --git a/sys/kern/sys_process.c b/sys/kern/sys_process.c index a6037e3..3d9950b 100644 --- a/sys/kern/sys_process.c +++ b/sys/kern/sys_process.c @@ -1057,7 +1057,7 @@ kern_ptrace(struct thread *td, int req, pid_t pid, void *addr, int data) proctree_locked = 0; } p->p_xsig = data; - p->p_xthread = NULL; + proc_next_xthread(p); if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) != 0) { /* deliver or queue signal */ td2->td_dbgflags &= ~TDB_XSIG; @@ -1065,7 +1065,8 @@ kern_ptrace(struct thread *td, int req, pid_t pid, void *addr, int data) if (req == PT_DETACH) { FOREACH_THREAD_IN_PROC(p, td3) - td3->td_dbgflags &= ~TDB_SUSPEND; + td3->td_dbgflags &= ~(TDB_SUSPEND | + TDB_XSIG); } /* * unsuspend all threads, to not let a thread run, @@ -1376,9 +1377,24 @@ stopevent(struct proc *p, unsigned int event, unsigned int val) do { if (event != S_EXIT) p->p_xsig = val; - p->p_xthread = NULL; + proc_next_xthread(p); p->p_stype = event; /* Which event caused the stop? */ wakeup(&p->p_stype);/* Wake up any PIOCWAIT'ing procs */ msleep(&p->p_step, &p->p_mtx, PWAIT, "stopevent", 0); } while (p->p_step); } + +void +proc_next_xthread(struct proc *p) +{ + struct thread *td; + + PROC_LOCK_ASSERT(p, MA_OWNED); + FOREACH_THREAD_IN_PROC(p, td) { + if (td == p->p_xthread) + continue; + if ((td->td_dbgflags & TDB_XSIG) != 0) + break; + } + p->p_xthread = td; +} diff --git a/sys/sys/proc.h b/sys/sys/proc.h index f533db6..a3132d9 100644 --- a/sys/sys/proc.h +++ b/sys/sys/proc.h @@ -999,6 +999,7 @@ int proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb); void procinit(void); void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); +void proc_next_xthread(struct proc *p); struct proc *proc_realparent(struct proc *child); void proc_reap(struct thread *td, struct proc *p, int *status, int options); void proc_reparent(struct proc *child, struct proc *newparent); _
Re: ptrace attach in multi-threaded processes
On Wed, Jul 13, 2016 at 06:30:36AM +0300, Konstantin Belousov wrote: > On Tue, Jul 12, 2016 at 11:24:14AM -0700, Mark Johnston wrote: > > On Tue, Jul 12, 2016 at 08:51:50PM +0300, Konstantin Belousov wrote: > > > On Tue, Jul 12, 2016 at 10:05:02AM -0700, Mark Johnston wrote: > > > > On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote: > > > > I suppose it is not strictly incorrect. I find it surprising that a > > > > PT_ATTACH followed by a PT_DETACH may leave the process in a different > > > > state than it was in before the attach. This means that it is not > > > > possible to gcore a process without potentially leaving it stopped, for > > > > instance. This result may occur in a single-threaded process > > > > as well, since a signal may already be queued when the PT_ATTACH handler > > > > sends SIGSTOP. > > > I still miss somethine. Isn't this an expected outcome from sending a > > > signal with STOP action ? > > > > It is. But I also expect a PT_DETACH operation to resume a stopped > > process, assuming that a second SIGSTOP was not posted while the > > process was suspended. > But as far as the situation was discussed, it seems that real SIGSTOP raced > with PT_ATTACH. And the offered interpretation that SIGSTOP was delivered > 'a bit later' than PT_ATTACH would fit into the description. Hm, the only SIGSTOP in my scenario is the one generated by PT_ATTACH. The problem occurs when this SIGSTOP races with *any* other signal that's being delivered to the process and for which the process has registered a handler. For instance, SIGHUP after a log rotation. If a real SIGSTOP races with PT_ATTACH, then I would indeed expect to find the process in a stopped state after the detach. Does this make more sense? ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Tue, Jul 12, 2016 at 11:24:14AM -0700, Mark Johnston wrote: > On Tue, Jul 12, 2016 at 08:51:50PM +0300, Konstantin Belousov wrote: > > On Tue, Jul 12, 2016 at 10:05:02AM -0700, Mark Johnston wrote: > > > On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote: > > > I suppose it is not strictly incorrect. I find it surprising that a > > > PT_ATTACH followed by a PT_DETACH may leave the process in a different > > > state than it was in before the attach. This means that it is not > > > possible to gcore a process without potentially leaving it stopped, for > > > instance. This result may occur in a single-threaded process > > > as well, since a signal may already be queued when the PT_ATTACH handler > > > sends SIGSTOP. > > I still miss somethine. Isn't this an expected outcome from sending a > > signal with STOP action ? > > It is. But I also expect a PT_DETACH operation to resume a stopped > process, assuming that a second SIGSTOP was not posted while the > process was suspended. But as far as the situation was discussed, it seems that real SIGSTOP raced with PT_ATTACH. And the offered interpretation that SIGSTOP was delivered 'a bit later' than PT_ATTACH would fit into the description. > > > > > > Indeed, I somehow missed that. I had assumed that the leaked TDB_XSIG > > > represented a bug in ptracestop(). > > It could, I did not made any statements that deny the bug: > > To be clear, the root of my issue comes from the following: the SIGSTOP > from PT_ATTACH may be handled concurrently with a second signal > delivered to a second thread in the same process. Then, the resulting > behaviour depends on the order in which the recipient threads suspend in > ptracestop(). If the thread that received SIGSTOP suspends last, its > td_xsig will be overwritten with the userland-provided value in the > PT_DETACH handler. If it suspends first, its td_xsig will be preserved, > and upon PT_DETACH the process will be suspended again in issignal(). > > I'm not sure if this is considered a bug. ptracestop() is handling all > signals (including the SIGSTOP generated by the PT_ATTACH handler) in a > consistent way, but this results in inconsistent behaviour from the > perspective of a ptrace(2) consumer. Still I do not understand what is inconsistent. Let look at it from the other side (before, we discussed the implementation in kernel). Is this happens in gcore(1) ? If yes, gcore interaction with ptrace(2) looks like this: ptrace(PT_ATTACH, g_pid); waitpid(g_pid, &g_status, 0); ... if (sig == SIGSTOP) sig = 0; ptrace(PT_DETACH, g_pid, 1, sig); It sounds as if it is desirable for you to modify gcore(1) to consume all signals, or at least, all STOP signals before PT_DETACH. I do not understand why do you want it, but that would probably give you the behaviour you want: ptrace(PT_ATTACH, g_pid); waitpid(g_pid, &g_status, 0); ... /* still consume implicit SIGSTOP from attach */ if (sig == SIGSTOP) sig = 0; do { error = waitpid(g_pid, &g_status, WNOHANG | WSTOPPED); } while (error == 0); ptrace(PT_DETACH, g_pid, 1, sig); ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Tue, Jul 12, 2016 at 08:51:50PM +0300, Konstantin Belousov wrote: > On Tue, Jul 12, 2016 at 10:05:02AM -0700, Mark Johnston wrote: > > On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote: > > I suppose it is not strictly incorrect. I find it surprising that a > > PT_ATTACH followed by a PT_DETACH may leave the process in a different > > state than it was in before the attach. This means that it is not > > possible to gcore a process without potentially leaving it stopped, for > > instance. This result may occur in a single-threaded process > > as well, since a signal may already be queued when the PT_ATTACH handler > > sends SIGSTOP. > I still miss somethine. Isn't this an expected outcome from sending a > signal with STOP action ? It is. But I also expect a PT_DETACH operation to resume a stopped process, assuming that a second SIGSTOP was not posted while the process was suspended. > > > Indeed, I somehow missed that. I had assumed that the leaked TDB_XSIG > > represented a bug in ptracestop(). > It could, I did not made any statements that deny the bug: To be clear, the root of my issue comes from the following: the SIGSTOP from PT_ATTACH may be handled concurrently with a second signal delivered to a second thread in the same process. Then, the resulting behaviour depends on the order in which the recipient threads suspend in ptracestop(). If the thread that received SIGSTOP suspends last, its td_xsig will be overwritten with the userland-provided value in the PT_DETACH handler. If it suspends first, its td_xsig will be preserved, and upon PT_DETACH the process will be suspended again in issignal(). I'm not sure if this is considered a bug. ptracestop() is handling all signals (including the SIGSTOP generated by the PT_ATTACH handler) in a consistent way, but this results in inconsistent behaviour from the perspective of a ptrace(2) consumer. > > > > > Moreover, in my scenario I see a thread with TDB_XSIG set even after > > > > ptrace(PT_DETACH) was called (P_TRACED is cleared). > > > This is interesting, we indeed do not clear the flag consistently. > > > But again, the only consequence seems to be a possible invalid reporting > > > of events. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Tue, Jul 12, 2016 at 10:05:02AM -0700, Mark Johnston wrote: > On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote: > I suppose it is not strictly incorrect. I find it surprising that a > PT_ATTACH followed by a PT_DETACH may leave the process in a different > state than it was in before the attach. This means that it is not > possible to gcore a process without potentially leaving it stopped, for > instance. This result may occur in a single-threaded process > as well, since a signal may already be queued when the PT_ATTACH handler > sends SIGSTOP. I still miss somethine. Isn't this an expected outcome from sending a signal with STOP action ? > Indeed, I somehow missed that. I had assumed that the leaked TDB_XSIG > represented a bug in ptracestop(). It could, I did not made any statements that deny the bug: > > > Moreover, in my scenario I see a thread with TDB_XSIG set even after > > > ptrace(PT_DETACH) was called (P_TRACED is cleared). > > This is interesting, we indeed do not clear the flag consistently. > > But again, the only consequence seems to be a possible invalid reporting > > of events. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Tue, Jul 12, 2016 at 08:57:53AM +0300, Konstantin Belousov wrote: > On Mon, Jul 11, 2016 at 06:19:38PM -0700, Mark Johnston wrote: > > Hi, > > > > It seems to be possible for ptrace(PT_ATTACH) to race with the delivery > > of a signal to the same process. ptrace(PT_ATTACH) sets P_TRACED and > > sends SIGSTOP to a thread in the target process. Consider the case where > > a signal is delivered to a second thread, and both threads are executing > > ast() concurrently. The two threads will both call issignal() and from > > there call ptracestop() because P_TRACED is set, though they will be > > serialized by the proc lock. If the thread receiving SIGSTOP wins the > > race, it will suspend first and set p->p_xthread. The second thread will > > also suspend in ptracestop(), overwriting the p_xthread field set by the > > first thread. Later, ptrace(PT_DETACH) will unsuspend the threads, but > > it will set td->td_xsig only in the second thread. This means that the > > first thread will return SIGSTOP from ptracestop() and subsequently > > suspend the process, which seems rather incorrect. > Why ? In particular, why delivering STOP after attach, in the described > situation, is perceived as incorrect ? Parallel STOPs, one from attach, > and other from kill(2), must result in two stops. I suppose it is not strictly incorrect. I find it surprising that a PT_ATTACH followed by a PT_DETACH may leave the process in a different state than it was in before the attach. This means that it is not possible to gcore a process without potentially leaving it stopped, for instance. This result may occur in a single-threaded process as well, since a signal may already be queued when the PT_ATTACH handler sends SIGSTOP. To me it just seems a bit strange that ptrace's mechanism for stopping the target - sending SIGSTOP - interacts this way with ptrace's handling of signals - ptracestop()). Specifically, PT_ATTACH does not rely on the SA_STOP property of SIGSTOP to stop the process, but rather on the special signal handling in ptracestop(). > > The bit about overwriting p_xsig/p_xthread indeed initially sound worrysome, > but probably not too much. The only consequence of reassigning p_xthread > is the selection of the 'lead' thread in sys_process.c, it seems. > > > > > The above is just a theory to explain an unexpectedly-stopped > > multi-threaded process that I've observed. Is there some mechanism I'm > > missing that prevents multiple threads from suspending in ptracestop() > > at the same time? If not, then I think that's the root of the problem, > > since p_xthread is pretty clearly not meant to be overwritten this way. > Again, why ? > > Note the comment >* Just make wait() to work, the last stopped thread > * will win. > which seems to point to the situation. Indeed, I somehow missed that. I had assumed that the leaked TDB_XSIG represented a bug in ptracestop(). > > > Moreover, in my scenario I see a thread with TDB_XSIG set even after > > ptrace(PT_DETACH) was called (P_TRACED is cleared). > This is interesting, we indeed do not clear the flag consistently. > But again, the only consequence seems to be a possible invalid reporting > of events. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ptrace attach in multi-threaded processes
On Mon, Jul 11, 2016 at 06:19:38PM -0700, Mark Johnston wrote: > Hi, > > It seems to be possible for ptrace(PT_ATTACH) to race with the delivery > of a signal to the same process. ptrace(PT_ATTACH) sets P_TRACED and > sends SIGSTOP to a thread in the target process. Consider the case where > a signal is delivered to a second thread, and both threads are executing > ast() concurrently. The two threads will both call issignal() and from > there call ptracestop() because P_TRACED is set, though they will be > serialized by the proc lock. If the thread receiving SIGSTOP wins the > race, it will suspend first and set p->p_xthread. The second thread will > also suspend in ptracestop(), overwriting the p_xthread field set by the > first thread. Later, ptrace(PT_DETACH) will unsuspend the threads, but > it will set td->td_xsig only in the second thread. This means that the > first thread will return SIGSTOP from ptracestop() and subsequently > suspend the process, which seems rather incorrect. Why ? In particular, why delivering STOP after attach, in the described situation, is perceived as incorrect ? Parallel STOPs, one from attach, and other from kill(2), must result in two stops. The bit about overwriting p_xsig/p_xthread indeed initially sound worrysome, but probably not too much. The only consequence of reassigning p_xthread is the selection of the 'lead' thread in sys_process.c, it seems. > > The above is just a theory to explain an unexpectedly-stopped > multi-threaded process that I've observed. Is there some mechanism I'm > missing that prevents multiple threads from suspending in ptracestop() > at the same time? If not, then I think that's the root of the problem, > since p_xthread is pretty clearly not meant to be overwritten this way. Again, why ? Note the comment * Just make wait() to work, the last stopped thread * will win. which seems to point to the situation. > Moreover, in my scenario I see a thread with TDB_XSIG set even after > ptrace(PT_DETACH) was called (P_TRACED is cleared). This is interesting, we indeed do not clear the flag consistently. But again, the only consequence seems to be a possible invalid reporting of events. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"