Re: VFS MFC testers wanted
Kris, If relevant, I may be able to test your patch, but the problem is occurring only rarely. Do you have any suggestions for isolating and reproducing this bug? Run your script in a loop? And I assume I can meaningfully test w/o the rsync call? I.e., just mounting and unmounting the drive over and over again should trigger the error, no? chad ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: VFS MFC testers wanted
Jeff,

Kris Kennaway directed me to this thread from FreeBSD-questions. I am
seeing a panic: unmount: dangling vnode with 6.0-RELEASE. Here are the
relevant threads:

  2 probs w/ backup.sh: Device busy and dangling vnode
  http://lists.freebsd.org/pipermail/freebsd-questions/2006-March/114825.html

  Panic: unmount: dangling vnode
  http://lists.freebsd.org/pipermail/freebsd-questions/2006-March/115060.html

If relevant, I may be able to test your patch, but the problem is
occurring only rarely. Do you have any suggestions for isolating and
reproducing this bug?

Also, we are seeing this problem on a production box. I notice that the
patch fixes 6 issues, and apparently breaks the kernel ABI, which sounds
nasty from out here in userland. Any chance of getting a patch that
isolates this specific issue? I'll be more likely able to apply such a
patch. Our alternative is to simply keep our backup drive always mounted
until 6.1 comes out and test your patch then. :^)

Thoughts? Thanks for your work on this.

Chad Whitacre
http://www.zetadev.com/

Jeff Roberson wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes
> and test this. This has the following changes in it:
>
> 1) Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2) Fixed an INACTIVE leak.
> 3) Fixed several unmount races.
> 4) Fixed several nullfs unmount issues.
> 5) Some more Giant related VFS fixes and asserts.
> 6) Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen
> them. So just let me know if this introduces any new problems etc.
> I will be MFCing within a week.
>
> Thanks,
> Jeff
Re: VFS MFC testers wanted
On Tue, Mar 07, 2006 at 09:58:35AM -0500, Chad Whitacre wrote:
> Kris,
>
> If relevant, I may be able to test your patch, but the problem is
> occurring only rarely. Do you have any suggestions for isolating and
> reproducing this bug? Run your script in a loop? And I assume I can
> meaningfully test w/o the rsync call? I.e., just mounting and
> unmounting the drive over and over again should trigger the error, no?

No, the rsync (i.e. activity on the filesystem) is important.

Kris
Re: VFS MFC testers wanted
Kris,

> No, the rsync (i.e. activity on the filesystem) is important.

Yeah, just ran a test w/o it actually. We are planning to run a test w/
some disk activity later this afternoon. I suppose the more activity,
the more likely to see the bug, eh?

chad
Re: VFS MFC testers wanted
On Tue, Mar 07, 2006 at 12:26:47PM -0500, Chad Whitacre wrote:
> Kris,
>
> > No, the rsync (i.e. activity on the filesystem) is important.
>
> Yeah, just ran a test w/o it actually. We are planning to run a test
> w/ some disk activity later this afternoon. I suppose the more
> activity, the more likely to see the bug, eh?

Yes. FYI, I am fairly confident this is fixed, because I make extensive
use of mount/umount+filesystem activity, and I am no longer seeing
problems like this.

Kris
Re: VFS MFC testers wanted
Kris,

> Yes. FYI, I am fairly confident this is fixed, because I make
> extensive use of mount/umount+filesystem activity, and I am no longer
> seeing problems like this.

Great! Thanks for the info. We won't kill ourselves trying to test this
then.

chad
Re: VFS MFC testers wanted
On Tue, Mar 07, 2006 at 12:33:28PM -0500, Chad Whitacre wrote:
> Kris,
>
> > Yes. FYI, I am fairly confident this is fixed, because I make
> > extensive use of mount/umount+filesystem activity, and I am no
> > longer seeing problems like this.
>
> Great! Thanks for the info. We won't kill ourselves trying to test
> this then.

Well, I'd still like you to test it just to be sure. I'm not able to
detect all FreeBSD bugs, after all (though I try :-)

Kris
Re: VFS MFC testers wanted
On 3/3/06, Jeff Roberson [EMAIL PROTECTED] wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes
> and test this. This has the following changes in it:
>
> 1) Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2) Fixed an INACTIVE leak.
> 3) Fixed several unmount races.
> 4) Fixed several nullfs unmount issues.
> 5) Some more Giant related VFS fixes and asserts.
> 6) Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen
> them. So just let me know if this introduces any new problems etc.
> I will be MFCing within a week.

Do you have a list of the PRs that this affects and/or resolves?

I'm curious, specifically, about kern/84589. I believe it may be
related to snapshots in some way, but I don't know. I'm not going to be
able to test the patch on these servers, unfortunately, as they're
fully in production now. The servers are stable now with
background_fsck=NO, but YES is still the default (last I checked).

FWIW: the deadlock in 84589 doesn't involve quotas, as there were no
quotas enabled in the kernel. The URLs referenced in the PR are no
longer valid (they were never clicked) but I can produce them if
necessary.

Also: http://www.freebsd.org/releases/6.1R/todo.html

The todo page mentions deadlocks but doesn't link to any specific
PRs/threads that discuss them. Are these fixed with this MFC?
Re: VFS MFC testers wanted
On Tue, Mar 07, 2006 at 10:15:59AM -0800, David Kirchner wrote:

> Do you have a list of the PRs that this affects and/or resolves?

I don't know of any relevant PRs, but I haven't looked very hard. The
bugs Jeff has been fixing are mostly those I've been able to reproduce
myself in testing.

> I'm curious, specifically, about kern/84589. I believe it may be
> related to snapshots in some way, but I don't know. I'm not going to
> be able to test the patch on these servers, unfortunately, as they're
> fully in production now. The servers are stable now with
> background_fsck=NO, but YES is still the default (last I checked).

You need to enable debugging, specifically KDB, DDB, INVARIANTS,
INVARIANT_SUPPORT, DEBUG_LOCKS and DEBUG_VFS_LOCKS (ideally with the
patch applied, as it gives more useful debugging). Then reproduce the
deadlock condition, break to DDB and do 'show lockedvnods' and 'wh pid'
where pid are the processes listed as holding locks.

> Also: http://www.freebsd.org/releases/6.1R/todo.html
>
> The todo page mentions deadlocks but doesn't link to any specific
> PRs/threads that discuss them. Are these fixed with this MFC?

Some deadlocks are resolved, yes. There are other snapshot-related
deadlocks that are not fixed with this patch, but which Jeff and others
are still working on.

Kris
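The debugging setup Kris describes could be prepared roughly as follows. This is a minimal sketch and not a procedure taken from the thread: the helper name make_debug_config and the config name VFSDEBUG are hypothetical, and it assumes an existing base kernel config file (for example a copy of GENERIC) is available. It copies the base config and appends the options named above, skipping any that the base already sets.

```sh
#!/bin/sh
# Hypothetical helper: derive a debug kernel config from an existing one.
# Usage: make_debug_config BASE_CONFIG NEW_CONFIG
make_debug_config() {
    base=$1
    new=$2
    cp "$base" "$new" || return 1
    # Options named in the thread for tracking down VFS deadlocks.
    for opt in KDB DDB INVARIANTS INVARIANT_SUPPORT DEBUG_LOCKS DEBUG_VFS_LOCKS; do
        # Append only if the base config does not already enable the option.
        if ! grep -Eq "^options[[:space:]]+$opt([[:space:]]|\$)" "$new"; then
            printf 'options\t\t%s\n' "$opt" >> "$new"
        fi
    done
}
```

For example, make_debug_config GENERIC VFSDEBUG in the kernel conf directory would leave a VFSDEBUG file that can be built in the usual way with make buildkernel KERNCONF=VFSDEBUG from the source tree.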
Re: VFS MFC testers wanted
Kris,

For the record, we ran the following for about 30 minutes, with no ill
effect:

    #!/bin/sh
    # Send stdout and stderr to the log file.
    exec >/var/log/panic
    exec 2>&1
    echo
    echo `date` -- trying to panic
    # Mount, generate some filesystem activity, unmount; repeat forever.
    while [ 1 ]
    do
        /sbin/mount /backup/
        /bin/rm -rf /backup/foo
        /bin/cp -R /usr/bin /backup/foo
        /sbin/umount /backup/
        echo -n '.'
    done

At this point our plan is to cross our fingers and wait for 6.1. Thanks
for all your efforts!

chad

Kris Kennaway wrote:
> On Tue, Mar 07, 2006 at 12:33:28PM -0500, Chad Whitacre wrote:
> > Kris,
> >
> > > Yes. FYI, I am fairly confident this is fixed, because I make
> > > extensive use of mount/umount+filesystem activity, and I am no
> > > longer seeing problems like this.
> >
> > Great! Thanks for the info. We won't kill ourselves trying to test
> > this then.
>
> Well, I'd still like you to test it just to be sure. I'm not able to
> detect all FreeBSD bugs, after all (though I try :-)
>
> Kris
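A bounded variant of the loop above may be easier to run unattended. The following is only a sketch under stated assumptions: stress_loop and the RUN variable are hypothetical names, the mount point is assumed to have an fstab entry (as /backup/ evidently does in the script above), and the cp step simply stands in for whatever filesystem activity is wanted, which Kris notes is the important part. Setting RUN=echo prints the commands instead of executing them.

```sh
#!/bin/sh
# Hypothetical sketch: run a fixed number of mount/activity/unmount cycles.
# Usage: stress_loop MOUNT_POINT SOURCE_DIR ITERATIONS
# Set RUN=echo for a dry run that only prints each command.
stress_loop() {
    mount_point=$1
    src_dir=$2
    iterations=$3
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Stop at the first mount or unmount failure (e.g. "Device busy").
        ${RUN:-} mount "$mount_point" || return 1
        ${RUN:-} rm -rf "$mount_point/foo"
        ${RUN:-} cp -R "$src_dir" "$mount_point/foo"
        ${RUN:-} umount "$mount_point" || return 1
        i=$((i + 1))
    done
    echo "completed $i cycles"
}
```

Something like RUN=echo stress_loop /backup /usr/bin 3 shows what would be executed before running it for real as root.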
Re: VFS MFC testers wanted
On Sat, 4 Mar 2006, Kostik Belousov wrote:

> On Fri, Mar 03, 2006 at 03:41:55PM -0800, Jeff Roberson wrote:
> > I plan to MFC all of this lovely stuff for 6.1:
> >
> > http://www.chesapeake.net/~jroberson/vfsmfc.diff
> >
> > I'm looking for people who are willing to patch their stable boxes
> > and test this. This has the following changes in it:
> >
> > 1) Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> > 2) Fixed an INACTIVE leak.
> > 3) Fixed several unmount races.
> > 4) Fixed several nullfs unmount issues.
> > 5) Some more Giant related VFS fixes and asserts.
> > 6) Fixed the quota deadlock.
> >
> > These problems should be rare enough that most of you have not seen
> > them. So just let me know if this introduces any new problems etc.
> > I will be MFCing within a week.
> >
> > Thanks,
> > Jeff
>
> I applied the patch to today's 6-STABLE and am now testing it on a
> (relatively slow, K6/266MHz) machine, by cvs-ing the sources and
> building the world. Kernel config is custom (see issue #2 below); I
> added the DEBUG_* and WITNESS* options from your patch. Config does
> not include QUOTAS.
>
> 1. The patch breaks the kernel ABI, even when DEBUG_VFS_LOCKS is not
> defined. This is due to changes inside struct mount: adding mnt_ref
> and rearranging several existing fields. This issue should at least
> be mentioned in the release notes.

Yes, you are correct. I had mentioned this to re but not in the mail
here. I intend to make some changes to fix the ABI breakage.

> 2. I built a custom kernel with options MAC. After some fs activity,
> I got the LOR:

Strange, I thought I had fixed this some time ago. I'll look into it,
thanks.

> lock order reversal:
>  1st 0xc1a018f0 vnode interlock (vnode interlock) @ /usr/home/kostik/work/bsd/sys/kern/vfs_subr.c:2449
>  2nd 0xc0c43144 system map (system map) @ /usr/home/kostik/work/bsd/sys/vm/vm_kern.c:295
> KDB: stack backtrace:
> kdb_backtrace(0,,c06676b0,c0667700,c0636024) at 0xc049d3c9 = kdb_backtrace+0x29
> witness_checkorder(c0c43144,9,c061fe28,127) at 0xc04a80c2 = witness_checkorder+0x582
> _mtx_lock_flags(c0c43144,0,c061fe28,127) at 0xc047b998 = _mtx_lock_flags+0x58
> _vm_map_lock(c0c430c0,c061fe28,127) at 0xc059eb46 = _vm_map_lock+0x26
> kmem_malloc(c0c430c0,1000,101,c819fbe0,c059679f) at 0xc059e0d2 = kmem_malloc+0x32
> page_alloc(c0c4d300,1000,c819fbd3,101,c06a3bf8) at 0xc0596bda = page_alloc+0x1a
> slab_zalloc(c0c4d300,101,c0c4d300,c0647a64,c0c4e460) at 0xc059679f = slab_zalloc+0x9f
> uma_zone_slab(c0c4d300,1,c0c4e468,0,c061f05a,8a2) at 0xc0597dec = uma_zone_slab+0xec
> uma_zalloc_internal(c0c4d300,0,1,0,c0c4dc48) at 0xc0598129 = uma_zalloc_internal+0x29
> bucket_alloc(80,1,c0c380a0,0,c19ab6a4) at 0xc0595eac = bucket_alloc+0x2c
> uma_zfree_arg(c0c4dc00,c19ab6a4,0) at 0xc0598483 = uma_zfree_arg+0x283
> mac_labelzone_free(c19ab6a4,c1a01828,e8,c819fc9c,c0565ad2) at 0xc055dab3 = mac_labelzone_free+0x13
> mac_vnode_label_free(c19ab6a4,c1a01828,c819fcac,c04d8766,c1a01828) at 0xc0565aaa = mac_vnode_label_free+0x6a
> mac_destroy_vnode(c1a01828) at 0xc0565ad2 = mac_destroy_vnode+0x12
> vdestroy(c1a01828,c1a01828,c819fcec,c04d8142,c1a01828) at 0xc04d8766 = vdestroy+0x1c6
> vdropl(c1a01828,7,a8,c0653ee0,c1a01828) at 0xc04dad1e = vdropl+0x3e
> vlrureclaim(c15e8000,c1529000,c156f000,c04d8360,c156f000) at 0xc04d8142 = vlrureclaim+0x282
> vnlru_proc(0,c819fd38,0,c04d8360,0) at 0xc04d84e3 = vnlru_proc+0x183
> fork_exit(c04d8360,0,c819fd38) at 0xc046de7d = fork_exit+0x9d
> fork_trampoline() at 0xc05d33bc = fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xc819fd6c, ebp = 0 ---
>
> A patch to fix the LOR (seems to be relevant for CURRENT too):
>
> --- sys/kern/vfs_subr.c.orig	Sat Mar  4 10:44:47 2006
> +++ sys/kern/vfs_subr.c	Sat Mar  4 10:45:21 2006
> @@ -787,9 +787,6 @@
>  	VNASSERT(bo->bo_dirty.bv_root == NULL, vp, ("dirtyblkroot not NULL"));
>  	VNASSERT(TAILQ_EMPTY(&vp->v_cache_dst), vp, ("vp has namecache dst"));
>  	VNASSERT(LIST_EMPTY(&vp->v_cache_src), vp, ("vp has namecache src"));
> -#ifdef MAC
> -	mac_destroy_vnode(vp);
> -#endif
>  	if (vp->v_pollinfo != NULL) {
>  		knlist_destroy(&vp->v_pollinfo->vpi_selinfo.si_note);
>  		mtx_destroy(&vp->v_pollinfo->vpi_lock);
> @@ -801,6 +798,9 @@
>  #endif
>  	lockdestroy(vp->v_vnlock);
>  	mtx_destroy(&vp->v_interlock);
> +#ifdef MAC
> +	mac_destroy_vnode(vp);
> +#endif
>  	uma_zfree(vnode_zone, vp);
>  }
>
> Up to this moment (uptime 11:09AM up 1:24, 1 user, load averages:
> 1.32, 1.56, 1.56) everything else seems to be okay.
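To read the backtrace above as a plain call chain, the frame names can be pulled out with a small filter. This is an illustrative sketch only: backtrace_chain is a hypothetical helper, and it assumes every frame ends in the name+0xoffset form seen in this trace.

```sh
#!/bin/sh
# Hypothetical helper: print the function name of each frame in a KDB
# backtrace read from stdin, innermost frame first.
# Assumes frames look like "name(args) at 0xADDR = name+0xOFFSET".
backtrace_chain() {
    # Split on spaces so the filter also works on re-flowed traces,
    # then keep only tokens of the form name+0xOFFSET.
    tr ' ' '\n' |
        sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)+0x[0-9a-f]*$/\1/p'
}
```

Fed the trace above, the output read bottom-up runs from vnlru_proc through vdropl and vdestroy into mac_destroy_vnode, and ends with kmem_malloc taking the system map lock. According to the witness report, that happens with the vnode interlock held, which appears to be why the patch moves mac_destroy_vnode() to after the interlock is destroyed.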
Re: VFS MFC testers wanted
On Fri, Mar 03, 2006 at 03:41:55PM -0800, Jeff Roberson wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes
> and test this. This has the following changes in it:
>
> 1) Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2) Fixed an INACTIVE leak.
> 3) Fixed several unmount races.
> 4) Fixed several nullfs unmount issues.
> 5) Some more Giant related VFS fixes and asserts.
> 6) Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen
> them. So just let me know if this introduces any new problems etc.
> I will be MFCing within a week.
>
> Thanks,
> Jeff

I applied the patch to today's 6-STABLE and am now testing it on a
(relatively slow, K6/266MHz) machine, by cvs-ing the sources and
building the world. Kernel config is custom (see issue #2 below); I
added the DEBUG_* and WITNESS* options from your patch. Config does not
include QUOTAS.

1. The patch breaks the kernel ABI, even when DEBUG_VFS_LOCKS is not
defined. This is due to changes inside struct mount: adding mnt_ref and
rearranging several existing fields. This issue should at least be
mentioned in the release notes.

2. I built a custom kernel with options MAC. After some fs activity, I
got the LOR:

lock order reversal:
 1st 0xc1a018f0 vnode interlock (vnode interlock) @ /usr/home/kostik/work/bsd/sys/kern/vfs_subr.c:2449
 2nd 0xc0c43144 system map (system map) @ /usr/home/kostik/work/bsd/sys/vm/vm_kern.c:295
KDB: stack backtrace:
kdb_backtrace(0,,c06676b0,c0667700,c0636024) at 0xc049d3c9 = kdb_backtrace+0x29
witness_checkorder(c0c43144,9,c061fe28,127) at 0xc04a80c2 = witness_checkorder+0x582
_mtx_lock_flags(c0c43144,0,c061fe28,127) at 0xc047b998 = _mtx_lock_flags+0x58
_vm_map_lock(c0c430c0,c061fe28,127) at 0xc059eb46 = _vm_map_lock+0x26
kmem_malloc(c0c430c0,1000,101,c819fbe0,c059679f) at 0xc059e0d2 = kmem_malloc+0x32
page_alloc(c0c4d300,1000,c819fbd3,101,c06a3bf8) at 0xc0596bda = page_alloc+0x1a
slab_zalloc(c0c4d300,101,c0c4d300,c0647a64,c0c4e460) at 0xc059679f = slab_zalloc+0x9f
uma_zone_slab(c0c4d300,1,c0c4e468,0,c061f05a,8a2) at 0xc0597dec = uma_zone_slab+0xec
uma_zalloc_internal(c0c4d300,0,1,0,c0c4dc48) at 0xc0598129 = uma_zalloc_internal+0x29
bucket_alloc(80,1,c0c380a0,0,c19ab6a4) at 0xc0595eac = bucket_alloc+0x2c
uma_zfree_arg(c0c4dc00,c19ab6a4,0) at 0xc0598483 = uma_zfree_arg+0x283
mac_labelzone_free(c19ab6a4,c1a01828,e8,c819fc9c,c0565ad2) at 0xc055dab3 = mac_labelzone_free+0x13
mac_vnode_label_free(c19ab6a4,c1a01828,c819fcac,c04d8766,c1a01828) at 0xc0565aaa = mac_vnode_label_free+0x6a
mac_destroy_vnode(c1a01828) at 0xc0565ad2 = mac_destroy_vnode+0x12
vdestroy(c1a01828,c1a01828,c819fcec,c04d8142,c1a01828) at 0xc04d8766 = vdestroy+0x1c6
vdropl(c1a01828,7,a8,c0653ee0,c1a01828) at 0xc04dad1e = vdropl+0x3e
vlrureclaim(c15e8000,c1529000,c156f000,c04d8360,c156f000) at 0xc04d8142 = vlrureclaim+0x282
vnlru_proc(0,c819fd38,0,c04d8360,0) at 0xc04d84e3 = vnlru_proc+0x183
fork_exit(c04d8360,0,c819fd38) at 0xc046de7d = fork_exit+0x9d
fork_trampoline() at 0xc05d33bc = fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc819fd6c, ebp = 0 ---

A patch to fix the LOR (seems to be relevant for CURRENT too):

--- sys/kern/vfs_subr.c.orig	Sat Mar  4 10:44:47 2006
+++ sys/kern/vfs_subr.c	Sat Mar  4 10:45:21 2006
@@ -787,9 +787,6 @@
 	VNASSERT(bo->bo_dirty.bv_root == NULL, vp, ("dirtyblkroot not NULL"));
 	VNASSERT(TAILQ_EMPTY(&vp->v_cache_dst), vp, ("vp has namecache dst"));
 	VNASSERT(LIST_EMPTY(&vp->v_cache_src), vp, ("vp has namecache src"));
-#ifdef MAC
-	mac_destroy_vnode(vp);
-#endif
 	if (vp->v_pollinfo != NULL) {
 		knlist_destroy(&vp->v_pollinfo->vpi_selinfo.si_note);
 		mtx_destroy(&vp->v_pollinfo->vpi_lock);
@@ -801,6 +798,9 @@
 #endif
 	lockdestroy(vp->v_vnlock);
 	mtx_destroy(&vp->v_interlock);
+#ifdef MAC
+	mac_destroy_vnode(vp);
+#endif
 	uma_zfree(vnode_zone, vp);
 }

Up to this moment (uptime 11:09AM up 1:24, 1 user, load averages: 1.32,
1.56, 1.56) everything else seems to be okay.