Roland,

On Thu, Apr 05, 2007 at 02:41:39PM -0700, Roland McGrath wrote:
> > I am running into some problems with pfmon when attaching to a thread
> > inside a multi-threaded program. I pass the tid (not pid) to the
> > ptrace(), wait*() calls.
> >
> > During the attachment, I do the following sequence:
> > ptrace(PTRACE_ATTACH, tid, NULL, NULL);
> > ret = waitpid(pid, &status, WUNTRACED);
>
> Do you mean:
> ret = waitpid(tid, &status, WUNTRACED);
> here?
>
Yes.
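For illustration, here is a minimal sketch of the attach sequence with the TID passed to both calls, assuming Linux and glibc. It is not pfmon's code, and `attach_and_wait` is a hypothetical helper. Two points go beyond this exchange and are stated as assumptions: per wait(2), stops of ptraced children are reported even without `WUNTRACED`, and without `__WALL` (or `__WCLONE`) waitpid() only considers children that report to their parent with SIGCHLD. A non-leader thread of a multi-threaded process does not report with SIGCHLD, so omitting `__WALL` is one plausible source of the ECHILD shown in the strace below.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not pfmon code): attach to thread TID and wait
 * for the SIGSTOP that PTRACE_ATTACH queues on that thread.
 * Returns 0 once TID sits in that ptrace stop, -1 with errno set otherwise.
 *
 * __WALL (assumption, see wait(2)): without it or __WCLONE, waitpid()
 * ignores "clone" children, i.e. those that do not report with SIGCHLD,
 * which is how a non-leader thread looks to its tracer.
 */
int attach_and_wait(pid_t tid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;

    for (;;) {
        pid_t ret = waitpid(tid, &status, __WALL);
        if (ret == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a handler: retry */
            return -1;
        }
        assert(ret == tid);         /* waiting on one TID returns that TID */

        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP)
            return 0;               /* the stop caused by PTRACE_ATTACH */

        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            errno = ESRCH;          /* thread went away before stopping */
            return -1;
        }

        /* Another signal was dequeued first: re-inject it and keep waiting
         * for the still-pending SIGSTOP. */
        if (WIFSTOPPED(status) &&
            ptrace(PTRACE_CONT, tid, NULL,
                   (void *)(long)WSTOPSIG(status)) == -1)
            return -1;
    }
}
```

After a 0 return, the caller can inspect the stopped thread and later resume it with PTRACE_CONT or release it with PTRACE_DETACH.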

Here is the strace I get:

ptrace(PTRACE_ATTACH, 19779, 0, 0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (2aff00004d43) ---
wait4(19779, 0x60000fffffb7b4b4, WUNTRACED, NULL) = -1 ECHILD (No child processes)

The ps command shows:

  PID  SPID TTY          TIME CMD
19652 19652 pts/4    00:00:00 tcsh
19661 19661 pts/4    00:00:00 vim
19778 19778 pts/4    00:00:00 mytest
19778 19779 pts/4    00:00:00 mytest
19778 19780 pts/4    00:00:00 mytest
19786 19786 pts/4    00:00:00 ps

> > The wait fails with errno=10 (ECHILD). If I remove it, I can go past
> > this point.
>
> Which kernel is this? Once PTRACE_ATTACH returns success for TID, then
> waitpid on that TID should never produce ECHILD (unless maybe the traced
> process is doing a multithreaded exec right then). A notable exception is
> that a security module like SELinux can refuse the security_task_wait, and
> this leads to false ECHILD failures. I recently posted a patch on lkml for
> this (so you'd get EPERM or something else instead of ECHILD). If you are
> using SELinux, check for avc messages in dmesg. (I discovered this false
> ECHILD behavior because of a bug in SELinux and/or its standard policy that
> broke using gdb on certain processes.)

This is with 2.6.21-rc5/ia64. I have seen the same behavior with 2.6.20 on
i386. The thread is not doing exec(); it is likely blocked in sleep(). No
SELinux.

>
> PTRACE_ATTACH generates a SIGSTOP for the TID attached, as if by
> tgkill/tkill. Hence, if it's not already stopped, then it should dequeue
> that SIGSTOP and get to a ptrace signal stop soon, so that a waitpid by the
> ptracer should return.
>
> > I do understand that SIGSTOP probably applies to the entire process and
> > not just that one thread. Yet it seems strange that the notification is
> > not propagated.
>
> The SIGSTOP is queued for the one thread if you use tgkill/tkill or ptrace
> to send it, and for the process as a whole if you use kill to send it. The

Well, I am using regular kill(). I did not know about tkill().
This one seems to accept regular pid as well, right?

> only instantaneous effect it has is to clear all pending SIGCONT signals
> from all queues. In the latter case, any thread in the process might be
> the first that happens to dequeue it, but in the former it is on the one
> thread's private queue. When a thread dequeues a SIGSTOP, if that thread
> is ptrace'd it will stop for ptrace; only when the signal is delivered, by
> a PTRACE_CONT or similar call, or if there is no ptracer for that thread,
> will the process-wide effect of the SIGSTOP take place.
>
> I hope that helps, though I don't think I know for sure precisely what you
> are doing and what you are seeing so as to try to explain a specific scenario.
>

--
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
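On the tkill() question: tkill() and tgkill() both take a kernel thread ID (the SPID column in the ps output above). For the thread-group leader that ID equals the PID, so passing a PID reaches only the main thread's private queue, not the process as a whole. tgkill() additionally checks that the thread belongs to the given thread group, which guards against a TID that has been recycled. The sketch below assumes Linux; `current_tid`, `signal_thread` and `signal_process` are hypothetical helpers that go through syscall(2) because older glibc versions do not wrap gettid or tgkill.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the calling thread's kernel TID (ps "SPID"). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/*
 * Hypothetical helper: queue SIG on the private queue of thread TID,
 * which must belong to thread group TGID. Returns 0 on success, -1 with
 * errno set otherwise (ESRCH if TID is not in TGID, EINVAL for a
 * non-positive ID). SIG == 0 only performs the existence/permission check.
 */
int signal_thread(pid_t tgid, pid_t tid, int sig)
{
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}

/*
 * Hypothetical helper, for contrast: kill() queues SIG for the whole
 * thread group, so any thread not blocking it may dequeue it first.
 */
int signal_process(pid_t pid, int sig)
{
    return kill(pid, sig);
}
```

For example, `signal_thread(19778, 19779, SIGSTOP)` would queue SIGSTOP on thread 19779 of the mytest process listed above, whereas `signal_process(19778, SIGSTOP)` would queue it for the process as a whole.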
