Re: Deadlock involving truss -f, pdfork() and wait4()
As Conrad has pointed out, it's an explicit PID. The test completes successfully when not run under truss -f. On Fri, Sep 13, 2019 at 2:37 PM Mark Johnston wrote: > > On Fri, Sep 13, 2019 at 02:12:56PM -0400, Ryan Stone wrote: > > This gets me a little further but now the wait4 call by the parent > > never reaps the child and instead blocks forever: > > Does it perform a wildcarded wait(), or does it explicitly specify the > PID of the child? By design, the former will not return children > created by pdfork(). ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Deadlock involving truss -f, pdfork() and wait4()
On Fri, Sep 13, 2019 at 11:37 AM Mark Johnston wrote: > > On Fri, Sep 13, 2019 at 02:12:56PM -0400, Ryan Stone wrote: > > This gets me a little further but now the wait4 call by the parent > > never reaps the child and instead blocks forever: > > Does it perform a wildcarded wait(), or does it explicitly specify the > PID of the child? By design, the former will not return children > created by pdfork(). Explicit PID: https://people.freebsd.org/~rstone/pdfork.c Best, Conrad ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Deadlock involving truss -f, pdfork() and wait4()
On Fri, Sep 13, 2019 at 02:12:56PM -0400, Ryan Stone wrote: > This gets me a little further but now the wait4 call by the parent > never reaps the child and instead blocks forever: Does it perform a wildcarded wait(), or does it explicitly specify the PID of the child? By design, the former will not return children created by pdfork(). ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Deadlock involving truss -f, pdfork() and wait4()
This gets me a little further but now the wait4 call by the parent never reaps the child and instead blocks forever: # truss -f ./pdfork -p 706: mmap(0x0,131072,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34361 970688 (0x800221000) 706: issetugid() = 0 (0x0) 706: open("/etc/libmap.conf",O_RDONLY|O_CLOEXEC,010440020) = 3 (0x3) 706: fstat(3,{ mode=-rw-r--r-- ,inode=241058,size=47,blksize=32768 }) = 0 (0x0 ) 706: read(3,"# $FreeBSD$\nincludedir /usr/loc"...,47) = 47 (0x2f) 706: close(3) = 0 (0x0) 706: open("/usr/local/etc/libmap.d",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC, 0165) ERR#2 'No such file or directory' 706: open("/var/run/ld-elf.so.hints",O_RDONLY|O_CLOEXEC,010002355) = 3 (0x3) 706: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0A\0\0\0\0"...,128) = 128 (0x80) 706: fstat(3,{ mode=-r--r--r-- ,inode=72,size=193,blksize=4096 }) = 0 (0x0) 706: pread(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,65,0x80) = 65 (0x41) 706: close(3) = 0 (0x0) 706: open("/lib/libc.so.7",O_RDONLY|O_CLOEXEC|O_VERIFY,057400) = 3 (0x3) 706: fstat(3,{ mode=-r--r--r-- ,inode=402754,size=1915744,blksize=32768 }) = 0 (0x0) 706: mmap(0x0,4096,PROT_READ,MAP_PRIVATE|MAP_PREFAULT_READ,3,0x0) = 3436210176 0 (0x800241000) 706: mmap(0x0,4169728,PROT_NONE,MAP_GUARD,-1,0x0) = 34362105856 (0x800242000) 706: mmap(0x800242000,524288,PROT_READ,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE|MAP_PR EFAULT_READ,3,0x0) = 34362105856 (0x800242000) 706: mmap(0x8002c2000,1327104,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NO CORE|MAP_PREFAULT_READ,3,0x8) = 34362630144 (0x8002c2000) 706: mmap(0x800406000,61440,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_PRE FAULT_READ,3,0x1c4000) = 34363957248 (0x800406000) 706: mmap(0x800415000,2256896,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_A NON,-1,0x0) = 34364018688 (0x800415000) 706: munmap(0x800241000,4096) = 0 (0x0) 706: close(3) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ) = 0 (0x0) 706: sysarch(AMD64_SET_FSBASE,0x7fffe110) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ|PROT_WRITE) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x80040c000,36864,PROT_READ) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: readlink("/etc/malloc.conf",0x7fffd830,1024) ERR#2 'No such file or d irectory' 706: issetugid() = 0 (0x0) 706: __sysctl(0x7fffd7e0,0x2,0x7fffd7dc,0x7fffd7d0,0x0,0x0) = 0 (0 x0) 706: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21 ),-1,0x0) = 34368126976 (0x80080) 706: cap_getmode({ 0 })= 0 (0x0) 706: open("/dev/hpet0",O_RDONLY,00)= 3 (0x3) 706: mmap(0x0,4096,PROT_READ,MAP_SHARED,3,0x0) = 34362101760 (0x800241000) 706: close(3) = 0 (0x0) 706: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12), -1,0x0) = 34366275584 (0x80063c000) 706: mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(21 ),-1,0x0) = 34370224128 (0x800a0) 706: mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),- 1,0x0) = 34366308352 (0x800644000) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 706: mprotect(0x203000,4096,PROT_READ) = 0 (0x0) 706: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 706: pdfork(0x7fffeba8,0x0)= 708 (0x2c4) 708: 708: nanosleep({ 1.0 })= 0 (0x0) 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG TERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFS Z|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) 708: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 708: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIG
Re: Deadlock involving truss -f, pdfork() and wait4()
Hello Ryan, Can you verify is this patch fix your issue: https://reviews.freebsd.org/D20362 Thanks, Mariusz On Thu, 12 Sep 2019 at 21:37, Ryan Stone wrote: > > I've hit an issue with a simple use of pdfork(). I have a process > that calls pdfork() and the parent immediately does a wait4() on the > child pid. This works fine under normal conditions, but if the parent > is run under truss -f, the three processes deadlock. If I switch out > pdfork() for fork(), the deadlock does not occur. > > This C file demonstrates the issue: > > https://people.freebsd.org/~rstone/pdfork.c > > If I run "truss -f ./pdfork", which uses fork(), it completes within a > second. If I run "truss -f ./pdfork -p", which uses pdfork(), the > processes deadlock. If I run "./pdfork -p" without truss, it > completes normally. > > procstat reports the following kernel stacks: > > 27572 102043 truss - mi_switch+0xe2 > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf > kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e > fast_syscall_common+0x101 > 27573 102469 pdfork - mi_switch+0xe2 > sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf > kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e > fast_syscall_common+0x101 > 27574 102053 pdfork - mi_switch+0xe2 > thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e > fork_exit+0x83 fork_trampoline+0xe > > As near as I can tell, truss is blocked waiting for ptrace events, the > parent process is blocked in wait4, and the child process is perhaps > waiting for its parent to exit the kernel so it can send the ptrace > event? > > I really don't see anything obvious in the pdfork() code path that > would cause this to happen when fork() doesn't have the problem. It > may be that pdfork() just changes the timing enough to expose a latent > bug. > > I'm seeing this on a recentish current (r351363). > ___ > freebsd-current@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org" ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Deadlock involving truss -f, pdfork() and wait4()
I've hit an issue with a simple use of pdfork(). I have a process that calls pdfork() and the parent immediately does a wait4() on the child pid. This works fine under normal conditions, but if the parent is run under truss -f, the three processes deadlock. If I switch out pdfork() for fork(), the deadlock does not occur. This C file demonstrates the issue: https://people.freebsd.org/~rstone/pdfork.c If I run "truss -f ./pdfork", which uses fork(), it completes within a second. If I run "truss -f ./pdfork -p", which uses pdfork(), the processes deadlock. If I run "./pdfork -p" without truss, it completes normally. procstat reports the following kernel stacks: 27572 102043 truss - mi_switch+0xe2 sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e fast_syscall_common+0x101 27573 102469 pdfork - mi_switch+0xe2 sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e fast_syscall_common+0x101 27574 102053 pdfork - mi_switch+0xe2 thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e fork_exit+0x83 fork_trampoline+0xe As near as I can tell, truss is blocked waiting for ptrace events, the parent process is blocked in wait4, and the child process is perhaps waiting for its parent to exit the kernel so it can send the ptrace event? I really don't see anything obvious in the pdfork() code path that would cause this to happen when fork() doesn't have the problem. It may be that pdfork() just changes the timing enough to expose a latent bug. I'm seeing this on a recentish current (r351363). ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"