I've been exchanging email off-list about this with a few people. One of them remarked that a kernel coredump would help.
Yesterday it wedged again.  I got a kernel coredump...and, well, as I
put it in off-list mail:

>> I now realize I don't know how to coax [process stack traces] out of
>> a kernel core.  I don't recall hearing of any sort of postmortem
>> ddb.  I have the corresponding netbsd.gdb, and I found gdb's
>> target kvm, but I haven't managed to get a stack trace for any
>> process out of it.

The response turned out to be exactly the cluesticking I needed to get
stack traces.

I've now got (kernel) stack traces.  They explain very neatly how
unrelated processes end up in puffsrpl - it's the vnode version of the
memory-pressure theory I mentioned (as implausible) upthread:

#0  0xc04b7beb in mi_switch ()
#1  0xc04b3dbb in sleepq_block ()
#2  0xc048eb0f in cv_wait_sig ()
#3  0xc038b3ea in puffs_msg_wait ()
#4  0xc038b547 in puffs_msg_wait2 ()
#5  0xc038ff40 in puffs_vnop_inactive ()
#6  0xc05281f8 in VOP_INACTIVE ()
#7  0xc051b7bc in vclean ()
#8  0xc051d36a in getcleanvnode ()
#9  0xc051d52e in getnewvnode ()
#10 0xc0404aa3 in ffs_vget ()
#11 0xc03f3a45 in ffs_valloc ()
#12 0xc042f052 in ufs_makeinode ()
#13 0xc04309fa in ufs_create ()
#14 0xc05290af in VOP_CREATE ()
#15 0xc0525df2 in vn_open ()
#16 0xc0521d44 in sys_open ()
#17 0xc05a9fcf in syscall ()
#18 0xc010058e in syscall1 ()

#0  0xc04b7beb in mi_switch ()
#1  0xc04b3dbb in sleepq_block ()
#2  0xc048eb0f in cv_wait_sig ()
#3  0xc038b3ea in puffs_msg_wait ()
#4  0xc038b547 in puffs_msg_wait2 ()
#5  0xc038ff40 in puffs_vnop_inactive ()
#6  0xc05281f8 in VOP_INACTIVE ()
#7  0xc051b7bc in vclean ()
#8  0xc051d36a in getcleanvnode ()
#9  0xc051d52e in getnewvnode ()
#10 0xc0404aa3 in ffs_vget ()
#11 0xc03f3a45 in ffs_valloc ()
#12 0xc042f052 in ufs_makeinode ()
#13 0xc04309fa in ufs_create ()
#14 0xc05290af in VOP_CREATE ()
#15 0xc0525df2 in vn_open ()
#16 0xc0521d44 in sys_open ()
#17 0xc05a9fcf in syscall ()
#18 0xc010058e in syscall1 ()

#0  0xc04b7beb in mi_switch ()
#1  0xc04b3dbb in sleepq_block ()
#2  0xc048eb0f in cv_wait_sig ()
#3  0xc038b3ea in puffs_msg_wait ()
#4  0xc038b547 in puffs_msg_wait2 ()
#5  0xc038ff40 in puffs_vnop_inactive ()
#6  0xc05281f8 in VOP_INACTIVE ()
#7  0xc051b7bc in vclean ()
#8  0xc051d36a in getcleanvnode ()
#9  0xc051d52e in getnewvnode ()
#10 0xc0404aa3 in ffs_vget ()
#11 0xc042d59b in ufs_lookup ()
#12 0xc052917c in VOP_LOOKUP ()
#13 0xc0516ddb in lookup ()
#14 0xc05175c5 in namei ()
#15 0xc05205a6 in sys_access ()
#16 0xc05a9fcf in syscall ()
#17 0xc010058e in syscall1 ()

(Arguments are not shown because I made a stupid mistake; I did not
have a netbsd.gdb available.  But the above traces are informative
enough, to me.)

There was a git process, it was a child of the main gitfs process, and
it was in puffsrpl (it's the last of the above stack traces).

So my best-guess theory now is that I have a codepath somewhere in
gitfs that forks git and waits for it to finish _without_ processing
other puffs requests while waiting.  There should be no such, but I
can't explain this any other way.  The gitfs process is blocked in
select, but that's exactly what I'd expect.

I now would like _userland_ stack traces.  The kernel stack trace for
the main gitfs process is exactly what I'd expect

#0  0xc04b7beb in mi_switch ()
#1  0xc04b3dbb in sleepq_block ()
#2  0xc04e60ed in pollcommon ()
#3  0xc04e639f in sys_poll ()
#4  0xc05a9fcf in syscall ()
#5  0xc010058e in syscall1 ()

but waiting for git to finish could very well be in poll() waiting for
git to print output.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B