Re: VFS MFC testers wanted

2006-03-07 Thread Chad Whitacre

Kris,

>> If relevant, I may be able to test your patch, but the problem is
>> occurring only rarely. Do you have any suggestions for isolating and
>> reproducing this bug?
>
> Run your script in a loop?


And I assume I can meaningfully test w/o the rsync call? I.e., just 
mounting and unmounting the drive over and over again should trigger the 
error, no?




chad
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: VFS MFC testers wanted

2006-03-07 Thread Chad Whitacre

Jeff,

Kris Kennaway directed me to this thread from FreeBSD-questions. I am 
seeing a panic: unmount: dangling vnode with 6.0-RELEASE. Here are the 
relevant threads:


2 probs w/ backup.sh: Device busy and dangling vnode
http://lists.freebsd.org/pipermail/freebsd-questions/2006-March/114825.html

Panic: unmount: dangling vnode
http://lists.freebsd.org/pipermail/freebsd-questions/2006-March/115060.html


If relevant, I may be able to test your patch, but the problem is 
occurring only rarely. Do you have any suggestions for isolating and 
reproducing this bug?


Also, we are seeing this problem on a production box. I notice that the 
patch fixes 6 issues, and apparently breaks the kernel ABI, which 
sounds nasty from out here in userland. Any chance of getting a patch 
that isolates this specific issue? I'll be more likely able to apply 
such a patch. Our alternative is to simply keep our backup drive always 
mounted until 6.1 comes out and test your patch then. :^)


Thoughts?

Thanks for your work on this.



Chad Whitacre
http://www.zetadev.com/



Jeff Roberson wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes and
> test this.  This has the following changes in it:
>
> 1)  Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2)  Fixed an INACTIVE leak.
> 3)  Fixed several unmount races.
> 4)  Fixed several nullfs unmount issues.
> 5)  Some more Giant related VFS fixes and asserts.
> 6)  Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen
> them. So just let me know if this introduces any new problems etc.  I
> will be MFCing within a week.
>
> Thanks,
> Jeff


Re: VFS MFC testers wanted

2006-03-07 Thread Kris Kennaway
On Tue, Mar 07, 2006 at 09:58:35AM -0500, Chad Whitacre wrote:
> Kris,
>
>>> If relevant, I may be able to test your patch, but the problem is
>>> occurring only rarely. Do you have any suggestions for isolating and
>>> reproducing this bug?
>>
>> Run your script in a loop?
>
> And I assume I can meaningfully test w/o the rsync call? I.e., just
> mounting and unmounting the drive over and over again should trigger the
> error, no?

No, the rsync (i.e. activity on the filesystem) is important.
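
A minimal sketch of such a loop, for illustration only: each cycle mounts the filesystem, generates activity on it with rsync, and unmounts it again. The mount point, the source directory, the `stress/` subdirectory, and the `cycle`/`stress` function names are assumptions, not taken from this thread; the mount point is assumed to have an fstab entry.

```shell
#!/bin/sh
# Hypothetical stress loop: mount, create filesystem activity, unmount.
# All paths and names below are illustrative assumptions.
MOUNTPOINT=${MOUNTPOINT:-/backup}
SRC=${SRC:-/usr/bin}
MOUNT=${MOUNT:-/sbin/mount}
UMOUNT=${UMOUNT:-/sbin/umount}
RSYNC=${RSYNC:-rsync}

# Run one mount/activity/unmount cycle; return non-zero if any step fails.
cycle() {
    "$MOUNT" "$MOUNTPOINT" || return 1
    if ! "$RSYNC" -a --delete "$SRC/" "$MOUNTPOINT/stress/"; then
        "$UMOUNT" "$MOUNTPOINT"
        return 1
    fi
    "$UMOUNT" "$MOUNTPOINT" || return 1
}

# Repeat the cycle the requested number of times and report progress.
stress() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        cycle || { echo "cycle $((i + 1)) failed"; return 1; }
        i=$((i + 1))
    done
    echo "$i cycles completed"
}
```

Run as root, something like `stress 100` would perform 100 cycles; a larger source tree means more filesystem activity per cycle.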

Kris




Re: VFS MFC testers wanted

2006-03-07 Thread Chad Whitacre

Kris,


> No, the rsync (i.e. activity on the filesystem) is important.


Yeah, just ran a test w/o it actually. We are planning to run a test w/ 
some disk activity later this afternoon. I suppose the more activity, 
the more likely to see the bug, eh?



chad


Re: VFS MFC testers wanted

2006-03-07 Thread Kris Kennaway
On Tue, Mar 07, 2006 at 12:26:47PM -0500, Chad Whitacre wrote:
> Kris,
>
>> No, the rsync (i.e. activity on the filesystem) is important.
>
> Yeah, just ran a test w/o it actually. We are planning to run a test w/
> some disk activity later this afternoon. I suppose the more activity,
> the more likely to see the bug, eh?

Yes.  FYI, I am fairly confident this is fixed, because I make
extensive use of mount/umount+filesystem activity, and I am no longer
seeing problems like this.

Kris




Re: VFS MFC testers wanted

2006-03-07 Thread Chad Whitacre

Kris,


> Yes.  FYI, I am fairly confident this is fixed, because I make
> extensive use of mount/umount+filesystem activity, and I am no longer
> seeing problems like this.


Great! Thanks for the info. We won't kill ourselves trying to test this 
then.




chad


Re: VFS MFC testers wanted

2006-03-07 Thread Kris Kennaway
On Tue, Mar 07, 2006 at 12:33:28PM -0500, Chad Whitacre wrote:
> Kris,
>
>> Yes.  FYI, I am fairly confident this is fixed, because I make
>> extensive use of mount/umount+filesystem activity, and I am no longer
>> seeing problems like this.
>
> Great! Thanks for the info. We won't kill ourselves trying to test this
> then.

Well, I'd still like you to test it just to be sure.  I'm not able to
detect all FreeBSD bugs, after all (though I try :-)

Kris




Re: VFS MFC testers wanted

2006-03-07 Thread David Kirchner
On 3/3/06, Jeff Roberson [EMAIL PROTECTED] wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes and
> test this.  This has the following changes in it:
>
> 1)  Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2)  Fixed an INACTIVE leak.
> 3)  Fixed several unmount races.
> 4)  Fixed several nullfs unmount issues.
> 5)  Some more Giant related VFS fixes and asserts.
> 6)  Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen them.
> So just let me know if this introduces any new problems etc.  I will be
> MFCing within a week.

Do you have a list of the PRs that this affects and/or resolves?

I'm curious, specifically, about kern/84589. I believe it may be
related to snapshots in some way, but I don't know. I'm not going to
be able to test the patch on these servers, unfortunately, as they're
fully in production now. The servers are stable now with
background_fsck=NO, but YES is still the default (last I checked).

FWIW: the deadlock in 84589 doesn't involve quotas, as there were no
quotas enabled in the kernel. The URLs referenced in the PR are no
longer valid (they were never clicked) but I can produce them if
necessary.

Also: http://www.freebsd.org/releases/6.1R/todo.html

The todo page mentions deadlocks but doesn't link to any specific
PRs/threads that discuss them. Are these fixed with this MFC?


Re: VFS MFC testers wanted

2006-03-07 Thread Kris Kennaway
On Tue, Mar 07, 2006 at 10:15:59AM -0800, David Kirchner wrote:

> Do you have a list of the PRs that this affects and/or resolves?

I don't know of any relevant PRs, but I haven't looked very hard.  The
bugs Jeff has been fixing are mostly those I've been able to reproduce
myself in testing.

> I'm curious, specifically, about kern/84589. I believe it may be
> related to snapshots in some way, but I don't know. I'm not going to
> be able to test the patch on these servers, unfortunately, as they're
> fully in production now. The servers are stable now with
> background_fsck=NO, but YES is still the default (last I checked).

You need to enable debugging, specifically KDB, DDB, INVARIANTS,
INVARIANT_SUPPORT, DEBUG_LOCKS and DEBUG_VFS_LOCKS (ideally with the
patch applied, as it gives more useful debugging).  Then reproduce the
deadlock condition, break to DDB and do 'show lockedvnods' and 'wh
pid' where pid are the processes listed as holding locks.
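
For reference, the options named above would appear in a custom kernel configuration file roughly as follows. This is only a sketch: the option names are the ones listed in this message, and it assumes they are added to an existing custom config before rebuilding the kernel.

```
options 	KDB
options 	DDB
options 	INVARIANTS
options 	INVARIANT_SUPPORT
options 	DEBUG_LOCKS
options 	DEBUG_VFS_LOCKS
```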

> Also: http://www.freebsd.org/releases/6.1R/todo.html
>
> The todo page mentions deadlocks but doesn't link to any specific
> PRs/threads that discuss them. Are these fixed with this MFC?

Some deadlocks are resolved, yes.  There are other snapshot-related
deadlocks that are not fixed with this patch, but which Jeff and
others are still working on.

Kris




Re: VFS MFC testers wanted

2006-03-07 Thread Chad Whitacre

Kris,

For the record, we ran the following for about 30 minutes, with no ill 
effect:


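  #!/bin/sh
  # Log stdout and stderr to /var/log/panic (append mode assumed).
  exec >> /var/log/panic
  exec 2>&1
  echo
  echo `date` -- trying to panic
  # Mount, generate filesystem activity, and unmount, forever.
  while [ 1 ]
  do
      /sbin/mount /backup/
      /bin/rm -rf /backup/foo
      /bin/cp -R /usr/bin /backup/foo
      /sbin/umount /backup/
      echo -n '.'
  done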


At this point our plan is to cross our fingers and wait for 6.1.

Thanks for all your efforts!



chad






Kris Kennaway wrote:
> On Tue, Mar 07, 2006 at 12:33:28PM -0500, Chad Whitacre wrote:
>> Kris,
>>
>>> Yes.  FYI, I am fairly confident this is fixed, because I make
>>> extensive use of mount/umount+filesystem activity, and I am no longer
>>> seeing problems like this.
>>
>> Great! Thanks for the info. We won't kill ourselves trying to test this
>> then.
>
> Well, I'd still like you to test it just to be sure.  I'm not able to
> detect all FreeBSD bugs, after all (though I try :-)
>
> Kris



Re: VFS MFC testers wanted

2006-03-05 Thread Jeff Roberson

On Sat, 4 Mar 2006, Kostik Belousov wrote:

> On Fri, Mar 03, 2006 at 03:41:55PM -0800, Jeff Roberson wrote:
>> I plan to MFC all of this lovely stuff for 6.1:
>>
>> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>>
>> I'm looking for people who are willing to patch their stable boxes and
>> test this.  This has the following changes in it:
>>
>> 1)  Improved debugging with DEBUG_LOCKS via the new stack(9) api.
>> 2)  Fixed an INACTIVE leak.
>> 3)  Fixed several unmount races.
>> 4)  Fixed several nullfs unmount issues.
>> 5)  Some more Giant related VFS fixes and asserts.
>> 6)  Fixed the quota deadlock.
>>
>> These problems should be rare enough that most of you have not seen them.
>> So just let me know if this introduces any new problems etc.  I will be
>> MFCing within a week.
>>
>> Thanks,
>> Jeff
>
> I applied the patch to today's 6-STABLE and am now testing it
> on a (relatively slow, K6/266MHz) machine by cvs-ing the
> sources and building world. The kernel config is custom (see
> issue #2 below); I added the DEBUG_* and WITNESS* options from your patch.
> The config does not include QUOTAS.
>
> 1. The patch breaks the kernel ABI, even when DEBUG_VFS_LOCKS
> is not defined. This is due to changes inside struct mount: adding mnt_ref
> and rearranging several existing fields. This issue should at least be
> mentioned in the release notes.


Yes, you are correct.  I had mentioned this to re but not in the mail 
here.  I intend to make some changes to fix the abi breakage.




> 2. I built a custom kernel with options MAC. After some fs activity, I got
> this LOR:


Strange, I thought I had fixed this some time ago.  I'll look into it, 
thanks.




> lock order reversal:
>  1st 0xc1a018f0 vnode interlock (vnode interlock) @
> /usr/home/kostik/work/bsd/sys/kern/vfs_subr.c:2449
>  2nd 0xc0c43144 system map (system map) @
> /usr/home/kostik/work/bsd/sys/vm/vm_kern.c:295
> KDB: stack backtrace:
> kdb_backtrace(0,,c06676b0,c0667700,c0636024) at 0xc049d3c9 =
> kdb_backtrace+0x29
> witness_checkorder(c0c43144,9,c061fe28,127) at 0xc04a80c2 =
> witness_checkorder+0x582
> _mtx_lock_flags(c0c43144,0,c061fe28,127) at 0xc047b998 = _mtx_lock_flags+0x58
> _vm_map_lock(c0c430c0,c061fe28,127) at 0xc059eb46 = _vm_map_lock+0x26
> kmem_malloc(c0c430c0,1000,101,c819fbe0,c059679f) at 0xc059e0d2 =
> kmem_malloc+0x32
> page_alloc(c0c4d300,1000,c819fbd3,101,c06a3bf8) at 0xc0596bda = page_alloc+0x1a
> slab_zalloc(c0c4d300,101,c0c4d300,c0647a64,c0c4e460) at 0xc059679f =
> slab_zalloc+0x9f
> uma_zone_slab(c0c4d300,1,c0c4e468,0,c061f05a,8a2) at 0xc0597dec =
> uma_zone_slab+0xec
> uma_zalloc_internal(c0c4d300,0,1,0,c0c4dc48) at 0xc0598129 =
> uma_zalloc_internal+0x29
> bucket_alloc(80,1,c0c380a0,0,c19ab6a4) at 0xc0595eac = bucket_alloc+0x2c
> uma_zfree_arg(c0c4dc00,c19ab6a4,0) at 0xc0598483 = uma_zfree_arg+0x283
> mac_labelzone_free(c19ab6a4,c1a01828,e8,c819fc9c,c0565ad2) at 0xc055dab3 =
> mac_labelzone_free+0x13
> mac_vnode_label_free(c19ab6a4,c1a01828,c819fcac,c04d8766,c1a01828) at
> 0xc0565aaa = mac_vnode_label_free+0x6a
> mac_destroy_vnode(c1a01828) at 0xc0565ad2 = mac_destroy_vnode+0x12
> vdestroy(c1a01828,c1a01828,c819fcec,c04d8142,c1a01828) at 0xc04d8766 =
> vdestroy+0x1c6
> vdropl(c1a01828,7,a8,c0653ee0,c1a01828) at 0xc04dad1e = vdropl+0x3e
> vlrureclaim(c15e8000,c1529000,c156f000,c04d8360,c156f000) at 0xc04d8142 =
> vlrureclaim+0x282
> vnlru_proc(0,c819fd38,0,c04d8360,0) at 0xc04d84e3 = vnlru_proc+0x183
> fork_exit(c04d8360,0,c819fd38) at 0xc046de7d = fork_exit+0x9d
> fork_trampoline() at 0xc05d33bc = fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xc819fd6c, ebp = 0 ---

> A patch to fix the LOR (seems to be relevant for CURRENT too):
>
> --- sys/kern/vfs_subr.c.orig	Sat Mar  4 10:44:47 2006
> +++ sys/kern/vfs_subr.c	Sat Mar  4 10:45:21 2006
> @@ -787,9 +787,6 @@
>  	VNASSERT(bo->bo_dirty.bv_root == NULL, vp, ("dirtyblkroot not NULL"));
>  	VNASSERT(TAILQ_EMPTY(&vp->v_cache_dst), vp, ("vp has namecache dst"));
>  	VNASSERT(LIST_EMPTY(&vp->v_cache_src), vp, ("vp has namecache src"));
> -#ifdef MAC
> -	mac_destroy_vnode(vp);
> -#endif
>  	if (vp->v_pollinfo != NULL) {
>  		knlist_destroy(&vp->v_pollinfo->vpi_selinfo.si_note);
>  		mtx_destroy(&vp->v_pollinfo->vpi_lock);
> @@ -801,6 +798,9 @@
>  #endif
>  	lockdestroy(vp->v_vnlock);
>  	mtx_destroy(&vp->v_interlock);
> +#ifdef MAC
> +	mac_destroy_vnode(vp);
> +#endif
>  	uma_zfree(vnode_zone, vp);
>  }
>
> So far
> (uptime 11:09AM  up  1:24, 1 user, load averages: 1.32, 1.56, 1.56)
> everything else seems to be okay.






Re: VFS MFC testers wanted

2006-03-04 Thread Kostik Belousov
On Fri, Mar 03, 2006 at 03:41:55PM -0800, Jeff Roberson wrote:
> I plan to MFC all of this lovely stuff for 6.1:
>
> http://www.chesapeake.net/~jroberson/vfsmfc.diff
>
> I'm looking for people who are willing to patch their stable boxes and
> test this.  This has the following changes in it:
>
> 1)  Improved debugging with DEBUG_LOCKS via the new stack(9) api.
> 2)  Fixed an INACTIVE leak.
> 3)  Fixed several unmount races.
> 4)  Fixed several nullfs unmount issues.
> 5)  Some more Giant related VFS fixes and asserts.
> 6)  Fixed the quota deadlock.
>
> These problems should be rare enough that most of you have not seen them.
> So just let me know if this introduces any new problems etc.  I will be
> MFCing within a week.
>
> Thanks,
> Jeff

I applied the patch to today's 6-STABLE and am now testing it
on a (relatively slow, K6/266MHz) machine by cvs-ing the
sources and building world. The kernel config is custom (see
issue #2 below); I added the DEBUG_* and WITNESS* options from your patch.
The config does not include QUOTAS.

1. The patch breaks the kernel ABI, even when DEBUG_VFS_LOCKS
is not defined. This is due to changes inside struct mount: adding mnt_ref
and rearranging several existing fields. This issue should at least be
mentioned in the release notes.

2. I built a custom kernel with options MAC. After some fs activity, I got
this LOR:

lock order reversal:
 1st 0xc1a018f0 vnode interlock (vnode interlock) @ 
/usr/home/kostik/work/bsd/sys/kern/vfs_subr.c:2449
 2nd 0xc0c43144 system map (system map) @ 
/usr/home/kostik/work/bsd/sys/vm/vm_kern.c:295
KDB: stack backtrace:
kdb_backtrace(0,,c06676b0,c0667700,c0636024) at 0xc049d3c9 = 
kdb_backtrace+0x29
witness_checkorder(c0c43144,9,c061fe28,127) at 0xc04a80c2 = 
witness_checkorder+0x582
_mtx_lock_flags(c0c43144,0,c061fe28,127) at 0xc047b998 = _mtx_lock_flags+0x58
_vm_map_lock(c0c430c0,c061fe28,127) at 0xc059eb46 = _vm_map_lock+0x26
kmem_malloc(c0c430c0,1000,101,c819fbe0,c059679f) at 0xc059e0d2 = 
kmem_malloc+0x32
page_alloc(c0c4d300,1000,c819fbd3,101,c06a3bf8) at 0xc0596bda = page_alloc+0x1a
slab_zalloc(c0c4d300,101,c0c4d300,c0647a64,c0c4e460) at 0xc059679f = 
slab_zalloc+0x9f
uma_zone_slab(c0c4d300,1,c0c4e468,0,c061f05a,8a2) at 0xc0597dec = 
uma_zone_slab+0xec
uma_zalloc_internal(c0c4d300,0,1,0,c0c4dc48) at 0xc0598129 = 
uma_zalloc_internal+0x29
bucket_alloc(80,1,c0c380a0,0,c19ab6a4) at 0xc0595eac = bucket_alloc+0x2c
uma_zfree_arg(c0c4dc00,c19ab6a4,0) at 0xc0598483 = uma_zfree_arg+0x283
mac_labelzone_free(c19ab6a4,c1a01828,e8,c819fc9c,c0565ad2) at 0xc055dab3 = 
mac_labelzone_free+0x13
mac_vnode_label_free(c19ab6a4,c1a01828,c819fcac,c04d8766,c1a01828) at 
0xc0565aaa = mac_vnode_label_free+0x6a
mac_destroy_vnode(c1a01828) at 0xc0565ad2 = mac_destroy_vnode+0x12
vdestroy(c1a01828,c1a01828,c819fcec,c04d8142,c1a01828) at 0xc04d8766 = 
vdestroy+0x1c6
vdropl(c1a01828,7,a8,c0653ee0,c1a01828) at 0xc04dad1e = vdropl+0x3e
vlrureclaim(c15e8000,c1529000,c156f000,c04d8360,c156f000) at 0xc04d8142 = 
vlrureclaim+0x282
vnlru_proc(0,c819fd38,0,c04d8360,0) at 0xc04d84e3 = vnlru_proc+0x183
fork_exit(c04d8360,0,c819fd38) at 0xc046de7d = fork_exit+0x9d
fork_trampoline() at 0xc05d33bc = fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc819fd6c, ebp = 0 ---

A patch to fix the LOR (seems to be relevant for CURRENT too):

--- sys/kern/vfs_subr.c.orig	Sat Mar  4 10:44:47 2006
+++ sys/kern/vfs_subr.c	Sat Mar  4 10:45:21 2006
@@ -787,9 +787,6 @@
 	VNASSERT(bo->bo_dirty.bv_root == NULL, vp, ("dirtyblkroot not NULL"));
 	VNASSERT(TAILQ_EMPTY(&vp->v_cache_dst), vp, ("vp has namecache dst"));
 	VNASSERT(LIST_EMPTY(&vp->v_cache_src), vp, ("vp has namecache src"));
-#ifdef MAC
-	mac_destroy_vnode(vp);
-#endif
 	if (vp->v_pollinfo != NULL) {
 		knlist_destroy(&vp->v_pollinfo->vpi_selinfo.si_note);
 		mtx_destroy(&vp->v_pollinfo->vpi_lock);
@@ -801,6 +798,9 @@
 #endif
 	lockdestroy(vp->v_vnlock);
 	mtx_destroy(&vp->v_interlock);
+#ifdef MAC
+	mac_destroy_vnode(vp);
+#endif
 	uma_zfree(vnode_zone, vp);
 }
 
So far
(uptime 11:09AM  up  1:24, 1 user, load averages: 1.32, 1.56, 1.56)
everything else seems to be okay.



