Re: lockmgr panic on shutdown
More info. I think this is actually related to kan's reversion of src/sys/kern/vfs_default.c. I'm trying rev 1.88 of that file. In the meantime, here is the panic message and backtrace. I have a crashdump if desired.

panic: lockmgr: thread 0xc493be40, not exclusive lock holder 0xc071f320 unlocking

#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc0510884 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:372
#2  0xc0510be7 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
#3  0xc05043ea in lockmgr (lkp=0xc49af428, flags=6, interlkp=0x140, td=0xc493be40)
    at /usr/src/sys/kern/kern_lock.c:414
#4  0xc055e91f in vop_stdunlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:299
#5  0xc055e768 in vop_defaultop (ap=0x0) at /usr/src/sys/kern/vfs_default.c:161
#6  0xc061ed88 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2793
#7  0xc06173d0 in ufs_inactive (ap=0x0) at vnode_if.h:1044
#8  0xc061ed88 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2793
#9  0xc05686d7 in vput (vp=0xc49af36c) at vnode_if.h:953
#10 0xc0610245 in ffs_sync (mp=0xc47b2800, waitfor=2, cred=0xc1d07e80, td=0xc071f320)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1169
#11 0xc056ade4 in sync (td=0xc071f320, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:142
#12 0xc05103e0 in boot (howto=8) at /usr/src/sys/kern/kern_shutdown.c:281
#13 0xc0510046 in reboot (td=0x0, uap=0x0) at /usr/src/sys/kern/kern_shutdown.c:178
#14 0xc066cf32 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = 0,
       tf_ebp = -1077937104, tf_isp = -574005900, tf_ebx = 2, tf_edx = -1,
       tf_ecx = 3, tf_eax = 55, tf_trapno = 12, tf_err = 2, tf_eip = 134516011,
       tf_cs = 31, tf_eflags = 582, tf_esp = -1077937172, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1012
#15 0xc065d91d in Xint0x80_syscall () at {standard input}:144

On Sat, 1 Nov 2003, Doug White wrote:
> I can confirm the lockmgr panic on shutdown reported by someone else earlier
> (whose message I mistakenly deleted). It looks like swapper is trying to undo
> a lock from pagedaemon and runs into trouble. This is probably related to the
> Giant pushdown of vm_pageout() that alc did last week. I'm building with
> INVARIANTS to see if that will catch more info. Will report back soon.

--
Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]
RE: lockmgr panic on shutdown
> I can confirm the lockmgr panic on shutdown reported by someone else earlier
> (whose message I mistakenly deleted). It looks like swapper is trying to undo
> a lock from pagedaemon and runs into trouble.

Just happened to me too. I think I see the problem. When boot() calls sync(), it passes thread0 as the thread argument. This gets propagated up to ffs_sync(), which:

- calls vget(), which takes a thread argument;
- does some stuff;
- calls vput(), which does _not_ take a thread argument.

The vget() is passed thread0, as passed from boot(). The vput() gets the current thread, which is the process calling boot(). The unlocking in vput() asserts that the same thread that acquired the lock is releasing it, which seems reasonable.

The obvious solution might be to change line 1161 of ffs_vfsops.c to pass curthread rather than td to vget(). I assume there's a good reason why thread0 is passed from boot(), but I can't see why that's of any use to the vnode locking. i.e.:

Index: ffs_vfsops.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.221
diff -u -r1.221 ffs_vfsops.c
--- ffs_vfsops.c	1 Nov 2003 05:51:54 -0000	1.221
+++ ffs_vfsops.c	2 Nov 2003 03:06:42 -0000
@@ -1158,7 +1158,7 @@
 			continue;
 		}
 		mtx_unlock(mntvnode_mtx);
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, curthread)) != 0) {
 			mtx_lock(mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;

How come the parameters to vget() and vput() are lopsided like this? This might have something to do with the commit of revision 1.218 of ffs_vfsops.c, but I'm not sure.
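To make the owner mismatch concrete, here is a toy userland model in C. It is not kernel code: struct toy_thread, struct toy_lock and the toy_* functions are hypothetical stand-ins, and the only lockmgr() behaviour assumed is the one implied by the panic message, namely that an exclusive lock may only be released by the thread recorded as its holder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct thread; only its identity matters here. */
struct toy_thread {
    const char *name;
};

/* Toy exclusive lock that remembers which thread acquired it, loosely
 * modelled on the lockmgr() owner check. Not kernel code. */
struct toy_lock {
    struct toy_thread *holder;   /* NULL while unlocked */
};

void toy_lock_acquire(struct toy_lock *lk, struct toy_thread *td)
{
    assert(lk->holder == NULL);  /* the toy has no sleeping or contention */
    lk->holder = td;
}

/* Returns 0 on success, -1 if td is not the recorded holder.
 * The real kernel panics at this point instead of returning an error. */
int toy_lock_release(struct toy_lock *lk, struct toy_thread *td)
{
    if (lk->holder != td) {
        fprintf(stderr, "toy lockmgr: thread %s, not exclusive lock holder %s unlocking\n",
                td->name, lk->holder != NULL ? lk->holder->name : "(none)");
        return -1;
    }
    lk->holder = NULL;
    return 0;
}

/* Mimics one pass of ffs_sync() during shutdown: vget() locks on behalf of
 * the td handed down from boot(), while vput() unlocks on behalf of
 * curthread. A nonzero vget_gets_curthread models the proposed one-line fix. */
int toy_shutdown_sync(int vget_gets_curthread)
{
    struct toy_thread thread0 = { "thread0" };    /* what boot() passes to sync() */
    struct toy_thread caller = { "curthread" };   /* the thread that called reboot() */
    struct toy_lock vnode_lock = { NULL };
    struct toy_thread *td = vget_gets_curthread ? &caller : &thread0;

    toy_lock_acquire(&vnode_lock, td);              /* vget(vp, lockreq, td) */
    return toy_lock_release(&vnode_lock, &caller);  /* vput(vp) */
}
```

toy_shutdown_sync(0) follows the current code and fails the owner check, as in the backtrace; toy_shutdown_sync(1) models passing curthread to vget() and succeeds.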
RE: lockmgr panic on shutdown
For giggles I'm rolling vfs_default.c back to 1.87, since it's along the backtrace path. I suspect I'll need to back up the whole thing to before the commit for the struct mount locking until jeff and kan can straighten things out.

On Sun, 2 Nov 2003 [EMAIL PROTECTED] wrote:
> When boot() calls sync(), it passes thread0 as the thread argument.
> [...]
> The obvious solution might be to change line 1161 of ffs_vfsops.c to pass
> curthread rather than td to vget().

--
Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
RE: lockmgr panic on shutdown
On Sat, 1 Nov 2003, Doug White wrote:
> For giggles I'm rolling vfs_default.c back to 1.87, since it's along the
> backtrace path.

This didn't work, so -CURRENT is fully broken. I'd suggest staying on 10/30 not before 4PM PST if you want to not crash on shutdown.

> I suspect I'll need to back up the whole thing to before the commit for the
> struct mount locking until jeff and kan can straighten things out.

--
Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
RE: lockmgr panic on shutdown
> This didn't work, so -CURRENT is fully broken. I'd suggest staying on 10/30
> not before 4PM PST if you want to not crash on shutdown.

The patch worked for me. (Well, a slightly modified one: I passed 0 for the thread argument to vget(); it recognises that as special.) Included here is the patch to both the ffs and default sync operations. I didn't exercise the default one, but the ffs case is certainly behaving itself.

Index: kern/vfs_default.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/vfs_default.c,v
retrieving revision 1.89
diff -u -r1.89 vfs_default.c
--- kern/vfs_default.c	1 Nov 2003 05:51:54 -0000	1.89
+++ kern/vfs_default.c	2 Nov 2003 03:36:03 -0000
@@ -898,7 +898,7 @@
 		}
 
 		mtx_unlock(mntvnode_mtx);
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, 0)) != 0) {
 			mtx_lock(mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;
Index: ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.221
diff -u -r1.221 ffs_vfsops.c
--- ufs/ffs/ffs_vfsops.c	1 Nov 2003 05:51:54 -0000	1.221
+++ ufs/ffs/ffs_vfsops.c	2 Nov 2003 03:22:13 -0000
@@ -1158,7 +1158,7 @@
 			continue;
 		}
 		mtx_unlock(mntvnode_mtx);
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, 0)) != 0) {
 			mtx_lock(mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;
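A toy model of the 0-for-td variant, in the same spirit as a userland sketch. It takes the message's word that vget() treats a null thread specially and assumes, for illustration, that this means the lock is recorded under a sentinel "kernel" owner which any thread may release. TOY_KERNPROC and the other toy_* names are hypothetical, not the kernel's identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct thread. */
struct toy_thread {
    const char *name;
};

/* Sentinel owner meaning "held by the kernel, not by any one thread".
 * ASSUMPTION: models how a NULL thread argument is treated; the name
 * TOY_KERNPROC is made up for this sketch. */
static struct toy_thread toy_kernproc = { "kernel" };
#define TOY_KERNPROC (&toy_kernproc)

struct toy_lock {
    struct toy_thread *holder;   /* NULL while unlocked */
};

void toy_lock_acquire(struct toy_lock *lk, struct toy_thread *td)
{
    assert(lk->holder == NULL);                    /* no contention in the toy */
    lk->holder = (td == NULL) ? TOY_KERNPROC : td;
}

/* Returns 0 on success, -1 where the real kernel would panic. */
int toy_lock_release(struct toy_lock *lk, struct toy_thread *td)
{
    struct toy_thread *who = (td == NULL) ? TOY_KERNPROC : td;

    /* A kernel-owned lock may be released by any thread; otherwise
     * only the thread that acquired it may release it. */
    if (lk->holder != who && lk->holder != TOY_KERNPROC)
        return -1;
    lk->holder = NULL;
    return 0;
}

/* Models vget(vp, lockreq, 0) followed by vput(vp) running as curthread. */
int toy_sync_with_null_td(void)
{
    struct toy_thread caller = { "curthread" };
    struct toy_lock vnode_lock = { NULL };

    toy_lock_acquire(&vnode_lock, NULL);
    return toy_lock_release(&vnode_lock, &caller);
}
```

The design point is that the sentinel sidesteps the question of which thread "really" holds the lock, at the cost of losing the owner check for that lock entirely.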
RE: lockmgr panic on shutdown
On Sun, 2 Nov 2003 [EMAIL PROTECTED] wrote:
> The patch worked for me. (Well, a slightly modified one: I passed 0 for the
> thread argument to vget(); it recognises that as special.)

kan came up with a different patch that replaces the vput() in ffs_vfsops.c:ffs_sync() with a vrele(). That should be committed shortly. Since he's been working in that area, I'll defer to him :)

--
Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
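A toy model of the vrele() alternative. The message only says the vput() becomes a vrele(); this sketch additionally assumes that the vnode is unlocked with an explicit thread argument matching the one given to vget() before the reference is dropped, since in the model vrele() only drops a reference. struct toy_vnode and all toy_* functions are hypothetical and are not kan's actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct thread. */
struct toy_thread {
    const char *name;
};

/* Hypothetical toy vnode: a use count plus an exclusive lock owner. */
struct toy_vnode {
    int usecount;
    struct toy_thread *holder;   /* NULL while unlocked */
};

/* Models vget(vp, lockreq, td): take a reference and lock on behalf of td. */
void toy_vget(struct toy_vnode *vp, struct toy_thread *td)
{
    assert(vp->holder == NULL);
    vp->usecount++;
    vp->holder = td;
}

/* Unlock on behalf of an explicitly named thread; -1 where lockmgr() would panic. */
int toy_unlock(struct toy_vnode *vp, struct toy_thread *td)
{
    if (vp->holder != td)
        return -1;
    vp->holder = NULL;
    return 0;
}

/* Models vrele(vp): drop a reference on an already-unlocked vnode. */
void toy_vrele(struct toy_vnode *vp)
{
    assert(vp->holder == NULL && vp->usecount > 0);
    vp->usecount--;
}

/* Models vput(vp): no td argument, so the unlock implicitly names curthread. */
int toy_vput(struct toy_vnode *vp, struct toy_thread *curthread)
{
    if (toy_unlock(vp, curthread) != 0)
        return -1;
    toy_vrele(vp);
    return 0;
}

/* One ffs_sync()-style pass during shutdown. Returns the final use count,
 * or -1 if the unlock was rejected. */
int toy_ffs_sync_one(int use_vrele)
{
    struct toy_thread thread0 = { "thread0" };    /* td as passed down from boot() */
    struct toy_thread caller = { "curthread" };   /* the thread that called reboot() */
    struct toy_vnode vn = { 0, NULL };
    struct toy_thread *td = &thread0;

    toy_vget(&vn, td);
    if (use_vrele) {
        if (toy_unlock(&vn, td) != 0)   /* unlock names td, matching vget() */
            return -1;
        toy_vrele(&vn);
    } else if (toy_vput(&vn, &caller) != 0) {
        return -1;
    }
    return vn.usecount;
}
```

In this model the vput() path is rejected because it unlocks as curthread, while the unlock-with-td plus vrele() path balances both the lock and the reference count.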
RE: lockmgr panic on shutdown
On Sun, 2 Nov 2003 [EMAIL PROTECTED] wrote:
> The obvious solution might be to change line 1161 of ffs_vfsops.c to pass
> curthread rather than td to vget(). I assume there's a good reason why
> thread0 is passed from boot(), but I can't see why that's of any use to the
> vnode locking.

Passing thread0 in boot() is a quick (and not even wrong) fix for the problem that there is no valid current process^Wthread in the panic case. Long ago in Net/2 (still in Lite2, for at least the i386 version), sync() in boot() was passed the completely bogus parameters ((struct sigcontext *)0) instead of (p, uap, retval). This worked to the extent that sync()'s proc pointer was not passed further or not dereferenced. Now there are lots of locks, and since thread0 is never the correct lock holder, things work at most to the extent that sync()'s proc pointer is not passed further.

curthread is never null in -current, so upgrading to the version that passes it (i386/i386/machdep.c 1.111, which actually passes curproc) would probably help in the non-panic case without increasing bugs for the panic case. However, passing curthread is still wrong for the panic case due to the following complications:

- panics may occur during context switches or in other critical regions when curthread is not quite current.
- under SMP, curthread is per-CPU, so having it non-null doesn't really help. Locks may be held by curprocs running on other CPUs, and in panic() it is difficult to handle the other CPUs correctly: if you stop them, they won't be able to release their locks, and if you let them run, they may run into you.

Hopefully in the case of a normal shutdown all the other CPUs release their locks and stop before the sync().

Bruce