Re: Nullfs leaks i-nodes
On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> The patch below should fix the issue for you; at least it did so in my limited testing. What it does:
>
> 1. When inactivating a nullfs vnode, check whether the lower vnode is unlinked, and reclaim the upper vnode if so. [This fixes your case.]
> 2. Besides the callback to upper filesystems on lower vnode reclamation, also call the upper fs on lower vnode unlink. This allows nullfs to purge cached vnodes for the unlinked lower vnode. [This fixes the opposite case, where the vnode is removed from the lower mount but upper aliases prevent it from being recycled.]
> 3. Fix a wart that has existed since the introduction of nullfs caching: do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should be completely innocent, but now it is also formally safe.
> 4. Fix a vnode reference leak in nullfs_reclaim_lowervp().
>
> Please note that the patch is basically untested; I only verified your scenario and a mirror of it as described in item 2.
>
> diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> index 4f37020..a624be6 100644
> --- a/sys/fs/nullfs/null.h

The page fault seen in fifo_close() seems unrelated to this patch; I will
continue testing some more. The scenario that triggers the page fault is
the rm:

mdconfig -a -t swap -s 1g -u 5
bsdlabel -w md5 auto
newfs -U md5a
mount /dev/md5a /mnt
mount -t nullfs /mnt /mnt2
mkfifo /mnt2/fifo
rm /mnt/fifo

Not a problem on 8.3-STABLE r247938.

- Peter
Re: Nullfs leaks i-nodes
On Thu, May 09, 2013 at 09:02:56AM +0200, Peter Holm wrote:
> On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> > On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
> >
> > The patch below should fix the issue for you; at least it did so in my limited testing. What it does:
> >
> > 1. When inactivating a nullfs vnode, check whether the lower vnode is unlinked, and reclaim the upper vnode if so. [This fixes your case.]
> > 2. Besides the callback to upper filesystems on lower vnode reclamation, also call the upper fs on lower vnode unlink. This allows nullfs to purge cached vnodes for the unlinked lower vnode. [This fixes the opposite case, where the vnode is removed from the lower mount but upper aliases prevent it from being recycled.]
> > 3. Fix a wart that has existed since the introduction of nullfs caching: do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should be completely innocent, but now it is also formally safe.
> > 4. Fix a vnode reference leak in nullfs_reclaim_lowervp().
> >
> > Please note that the patch is basically untested; I only verified your scenario and a mirror of it as described in item 2.
> >
> > diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> > index 4f37020..a624be6 100644
> > --- a/sys/fs/nullfs/null.h
>
> The page fault seen in fifo_close() seems unrelated to this patch; I will
> continue testing some more. The scenario that triggers the page fault is
> the rm:
>
> mdconfig -a -t swap -s 1g -u 5
> bsdlabel -w md5 auto
> newfs -U md5a
> mount /dev/md5a /mnt
> mount -t nullfs /mnt /mnt2
> mkfifo /mnt2/fifo
> rm /mnt/fifo

Yeah, I figured this out from the backtrace. The panic in kostik563 is
related and directly caused by the patch: it erroneously performs vgone()
on an active (removed) vnode. As a result, a spurious VOP_CLOSE() call is
performed on the vnode, which is more or less innocent for regular files
but fatal for fifos, and probably for NFSv4 as well. An updated patch is
below.

> Not a problem on 8.3-STABLE r247938.

Yes, the new nullfs is only in HEAD and stable/9.
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 4f37020..0972dfc 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -53,8 +53,12 @@ struct null_node {
 	LIST_ENTRY(null_node)	null_hash;	/* Hash list */
 	struct vnode		*null_lowervp;	/* VREFed once */
 	struct vnode		*null_vnode;	/* Back pointer */
+	u_int			null_flags;
 };

+#define	NULLV_NOUNLOCK	0x0001
+#define	NULLV_DROP	0x0002
+
 #define	MOUNTTONULLMOUNT(mp)	((struct null_mount *)((mp)->mnt_data))
 #define	VTONULL(vp)	((struct null_node *)(vp)->v_data)
 #define	NULLTOV(xp)	((xp)->null_vnode)
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 02932bd..ad02236 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -65,7 +65,6 @@ static vfs_statfs_t	nullfs_statfs;
 static vfs_unmount_t	nullfs_unmount;
 static vfs_vget_t	nullfs_vget;
 static vfs_extattrctl_t	nullfs_extattrctl;
-static vfs_reclaim_lowervp_t	nullfs_reclaim_lowervp;

 /*
  * Mount null layer
@@ -391,8 +390,37 @@ nullfs_reclaim_lowervp(struct mount *mp, struct vnode *lowervp)
 	vp = null_hashget(mp, lowervp);
 	if (vp == NULL)
 		return;
+	VTONULL(vp)->null_flags |= NULLV_NOUNLOCK;
 	vgone(vp);
-	vn_lock(lowervp, LK_EXCLUSIVE | LK_RETRY);
+	vput(vp);
+}
+
+static void
+nullfs_unlink_lowervp(struct mount *mp, struct vnode *lowervp)
+{
+	struct vnode *vp;
+	struct null_node *xp;
+
+	vp = null_hashget(mp, lowervp);
+	if (vp == NULL)
+		return;
+	xp = VTONULL(vp);
+	xp->null_flags |= NULLV_DROP | NULLV_NOUNLOCK;
+	vhold(vp);
+	vunref(vp);
+
+	/*
+	 * If vunref() dropped the last use reference on the nullfs
+	 * vnode, it must be reclaimed, and its lock was split from
+	 * the lower vnode lock.  Need to do extra unlock before
+	 * allowing the final vdrop() to free the vnode.
+	 */
+	if (vp->v_usecount == 0) {
+		KASSERT((vp->v_iflag & VI_DOOMED) != 0,
+		    ("not reclaimed %p", vp));
+		VOP_UNLOCK(vp, 0);
+	}
+	vdrop(vp);
 }

 static struct vfsops null_vfsops = {
@@ -408,6 +436,7 @@ static struct vfsops null_vfsops = {
 	.vfs_unmount =		nullfs_unmount,
 	.vfs_vget =		nullfs_vget,
 	.vfs_reclaim_lowervp =	nullfs_reclaim_lowervp,
+	.vfs_unlink_lowervp =	nullfs_unlink_lowervp,
 };

 VFS_SET(null_vfsops, nullfs, VFCF_LOOPBACK | VFCF_JAIL);
diff --git
Re: Nullfs leaks i-nodes
--On Wednesday, 08 May, 2013 12:13 PM +0300 Konstantin Belousov
kostik...@gmail.com wrote:

> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> The patch below should fix the issue for you; at least it did so in my limited testing. What it does:
>
> 1. When inactivating a nullfs vnode, check whether the lower vnode is unlinked, and reclaim the upper vnode if so. [This fixes your case.]
> 2. Besides the callback to upper filesystems on lower vnode reclamation, also call the upper fs on lower vnode unlink. This allows nullfs to purge cached vnodes for the unlinked lower vnode. [This fixes the opposite case, where the vnode is removed from the lower mount but upper aliases prevent it from being recycled.]
> 3. Fix a wart that has existed since the introduction of nullfs caching: do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should be completely innocent, but now it is also formally safe.
> 4. Fix a vnode reference leak in nullfs_reclaim_lowervp().
>
> Please note that the patch is basically untested; I only verified your scenario and a mirror of it as described in item 2.
>
> diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> index 4f37020..a624be6 100644
> --- a/sys/fs/nullfs/null.h
> +++ b/sys/fs/nullfs/null.h
> @@ -53,8 +53,11 @@ struct null_node {
>  	LIST_ENTRY(null_node)	null_hash;	/* Hash list */
>  	struct vnode		*null_lowervp;	/* VREFed once */
>  	struct vnode		*null_vnode;	/* Back pointer */
> +	u_int			null_flags;
>  };
>
> +#define	NULLV_NOUNLOCK	0x0001
> +
>  #define	MOUNTTONULLMOUNT(mp)	((struct null_mount *)((mp)->mnt_data))
>  #define	VTONULL(vp)	((struct null_node *)(vp)->v_data)
>  #define	NULLTOV(xp)	((xp)->null_vnode)
> diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
> index 02932bd..544c358 100644
> --- a/sys/fs/nullfs/null_vfsops.c
> +++ b/sys/fs/nullfs/null_vfsops.c
> @@ -65,7 +65,6 @@ static vfs_statfs_t	nullfs_statfs;
>  static vfs_unmount_t	nullfs_unmount;
>  static vfs_vget_t	nullfs_vget;
>  static vfs_extattrctl_t	nullfs_extattrctl;
> -static vfs_reclaim_lowervp_t	nullfs_reclaim_lowervp;
>
>  /*
>   * Mount null layer
> @@ -391,8 +390,22 @@ nullfs_reclaim_lowervp(struct mount *mp, struct vnode *lowervp)
>  	vp = null_hashget(mp, lowervp);
>  	if (vp == NULL)
>  		return;
> +	VTONULL(vp)->null_flags |= NULLV_NOUNLOCK;
>  	vgone(vp);
> -	vn_lock(lowervp, LK_EXCLUSIVE | LK_RETRY);
> +	vput(vp);
> +}
> +
> +static void
> +nullfs_unlink_lowervp(struct mount *mp, struct vnode *lowervp)
> +{
> +	struct vnode *vp;
> +
> +	vp = null_hashget(mp, lowervp);
> +	if (vp == NULL || vp->v_usecount > 1)
> +		return;
> +	VTONULL(vp)->null_flags |= NULLV_NOUNLOCK;
> +	vgone(vp);
> +	vput(vp);
>  }
>
>  static struct vfsops null_vfsops = {
> @@ -408,6 +421,7 @@ static struct vfsops null_vfsops = {
>  	.vfs_unmount =		nullfs_unmount,
>  	.vfs_vget =		nullfs_vget,
>  	.vfs_reclaim_lowervp =	nullfs_reclaim_lowervp,
> +	.vfs_unlink_lowervp =	nullfs_unlink_lowervp,
>  };
>
>  VFS_SET(null_vfsops, nullfs, VFCF_LOOPBACK | VFCF_JAIL);
> diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
> index f59865f..3c41124 100644
> --- a/sys/fs/nullfs/null_vnops.c
> +++ b/sys/fs/nullfs/null_vnops.c
> @@ -692,18 +692,21 @@ null_unlock(struct vop_unlock_args *ap)
>  static int
>  null_inactive(struct vop_inactive_args *ap __unused)
>  {
> -	struct vnode *vp;
> +	struct vnode *vp, *lvp;
>  	struct mount *mp;
>  	struct null_mount *xmp;
>
>  	vp = ap->a_vp;
> +	lvp = NULLVPTOLOWERVP(vp);
>  	mp = vp->v_mount;
>  	xmp = MOUNTTONULLMOUNT(mp);
> -	if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
> +	if ((xmp->nullm_flags & NULLM_CACHE) == 0 ||
> +	    (lvp->v_vflag & VV_NOSYNC) != 0) {
>  		/*
>  		 * If this is the last reference and caching of the
> -		 * nullfs vnodes is not enabled, then free up the
> -		 * vnode so as not to tie up the lower vnodes.
> +		 * nullfs vnodes is not enabled, or the lower vnode is
> +		 * deleted, then free up the vnode so as not to tie up
> +		 * the lower vnodes.
>  		 */
>  		vp->v_object = NULL;
>  		vrecycle(vp);
> @@ -748,7 +751,10 @@ null_reclaim(struct vop_reclaim_args *ap)
>  	 */
>  	if (vp->v_writecount > 0)
>  		VOP_ADD_WRITECOUNT(lowervp, -1);
> -	vput(lowervp);
> +	if ((xp->null_flags & NULLV_NOUNLOCK) != 0)
> +		vunref(lowervp);
> +	else
> +		vput(lowervp);
>  	free(xp, M_NULLFSNODE);
>
>  	return (0);
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index
Re: Nullfs leaks i-nodes
On Thu, May 09, 2013 at 02:11:44PM +0200, Goran Lowkrantz wrote:
> I assume this is CURRENT? Tried on STABLE but got this:
>
> cc1: warnings being treated as errors
> /usr/src/sys/kern/vfs_subr.c: In function 'vfs_notify_upper':
> /usr/src/sys/kern/vfs_subr.c:2801: warning: implicit declaration of function 'VFS_PROLOGUE'
> /usr/src/sys/kern/vfs_subr.c:2801: warning: nested extern declaration of 'VFS_PROLOGUE' [-Wnested-externs]
> /usr/src/sys/kern/vfs_subr.c:2801: warning: implicit declaration of function 'VFS_EPILOGUE'
> /usr/src/sys/kern/vfs_subr.c:2801: warning: nested extern declaration of 'VFS_EPILOGUE' [-Wnested-externs]
> *** [vfs_subr.o] Error code 1

Yes, the patch is for CURRENT. You can simply remove the VFS_PROLOGUE and
VFS_EPILOGUE lines from the patch if applying it on stable. I am not sure
whether it needs more trivial adjustments there.
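One way to follow that advice on stable, sketched only (the patch file name below is made up, not from the thread): apply the patch, locate the offending lines with the names the compiler reported, and delete them by hand before rebuilding the kernel.

cd /usr/src
patch -p1 < ~/nullfs-unlink.diff             # patch file name is an assumption
grep -nE 'VFS_(PROLOGUE|EPILOGUE)' sys/kern/vfs_subr.c
# delete the reported lines in sys/kern/vfs_subr.c, then rebuild the kernel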
Re: Nullfs leaks i-nodes
--On May 7, 2013 23:41:51 +0300 Mikolaj Golub troc...@freebsd.org wrote:

> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> I am observing an effect that might look like an inode leak, which I
> think is due to the caching of free nullfs vnodes recently added by kib
> (r240285): the free inode count does not increase after unlink, but if I
> purge the free-vnode cache (temporarily setting vfs.wantfreevnodes to 0
> and watching vfs.freevnodes decrease to 0), the free inode count grows
> back.
>
> You have only about 1000 inodes available on your underlying fs, while
> vfs.wantfreevnodes is, I think, much higher, resulting in running out of
> i-nodes.
>
> If this is really your case, you can disable caching by mounting nullfs
> with nocache (it looks like caching is not important in your case).
>
> --
> Mikolaj Golub

Thanks Mikolaj, mounting the active fs with 'nocache' fixed it, keeping
ifree steady.

Any idea how to fix this in NanoBSD? The data partition is created with
only 1024 i-nodes in the script, so any use that includes file deletion on
this r/w area will be bitten. As the nocache attribute is not valid for
device mounts, I see no way to inherit it to the nullfs mounts for this
specific partition.

The easiest thing might be to set vfs.wantfreevnodes=0 in the default
sysctl.conf, maybe? But would this have implications for non-nullfs
filesystems? Only UFS? Even ZFS?

Thanks again,
Göran
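For reference, the nocache option is applied to the nullfs mounts themselves rather than to the underlying device mount, so it can be set per nullfs entry; a sketch with made-up paths (the real NanoBSD paths will differ):

mount -t nullfs -o nocache /data /usr/jail/work/data
# or the equivalent /etc/fstab line:
/data   /usr/jail/work/data   nullfs   rw,nocache   0   0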
RE: Nullfs leaks i-nodes
--On May 8, 2013 8:35:18 +1000 Dewayne Geraghty
dewayne.gerag...@heuristicsystems.com.au wrote:

> -----Original Message-----
> From: owner-freebsd-sta...@freebsd.org
> [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Mikolaj Golub
> Sent: Wednesday, 8 May 2013 6:42 AM
> To: Göran Löwkrantz
> Cc: Kostik Belousov; freebsd-stable@freebsd.org
> Subject: Re: Nullfs leaks i-nodes
>
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> I am observing an effect that might look like an inode leak, which I think is due to the caching of free nullfs vnodes recently added by kib (r240285): the free inode count does not increase after unlink, but if I purge the free-vnode cache (temporarily setting vfs.wantfreevnodes to 0 and watching vfs.freevnodes decrease to 0), the free inode count grows back.
>
> You have only about 1000 inodes available on your underlying fs, while vfs.wantfreevnodes is, I think, much higher, resulting in running out of i-nodes.
>
> If this is really your case, you can disable caching by mounting nullfs with nocache (it looks like caching is not important in your case).
>
> --
> Mikolaj Golub
>
> Hi Goran,
>
> After I included Kib's vnode caching patch, the performance on my port
> builder machine decreased significantly. The port builder is one of many
> jails and nullfs is used extensively. I was starving the system of
> vnodes. Increasing kern.maxvnodes resulted in better performance than
> the original system configuration without vnode caching. Thanks Kib :)
>
> I don't think you'll run out of vnodes as it is self-adjusting (that was
> my concern too). I changed kern.maxvnodes to approximately 3 times what
> it wanted and tuned for my needs. Try it and keep an eye on:
>
> sysctl vfs.numvnodes vfs.wantfreevnodes vfs.freevnodes vm.stats.vm.v_vnodepgsout vm.stats.vm.v_vnodepgsin
>
> Regards,
> Dewayne

Hi Dewayne,

I got a few of those too, but I didn't connect them with the FW problem,
as here there seems to be reclaim pressure.

On the FW I get these numbers:

vfs.numvnodes: 7500
vfs.wantfreevnodes: 27936
vfs.freevnodes: 5663
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 4399

while on the jail systems I get something like this:

vfs.numvnodes: 51212
vfs.wantfreevnodes: 35668
vfs.freevnodes: 35665
vm.stats.vm.v_vnodepgsout: 5952
vm.stats.vm.v_vnodepgsin: 939563

and as far as I can understand, the fact that vfs.wantfreevnodes and
vfs.freevnodes are almost the same suggests that we have reclaim pressure.

So one fix for small NanoBSD systems would be to lower vfs.wantfreevnodes,
and I will test that on a virtual machine and see if I can get better
reclaim.

MVH
Göran
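A sketch of the tuning Dewayne describes; the figure is illustrative only and should be sized to the machine (kern.maxvnodes is a writable run-time sysctl, so an /etc/sysctl.conf entry is sufficient):

# /etc/sysctl.conf -- example value only, roughly 3x the autotuned figure on a given box
kern.maxvnodes=600000

# then watch the counters over time:
sysctl vfs.numvnodes vfs.wantfreevnodes vfs.freevnodes vm.stats.vm.v_vnodepgsout vm.stats.vm.v_vnodepgsin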
Re: Nullfs leaks i-nodes
On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> I created a PR, kern/178238, on this but would like to know if anyone
> has any ideas or patches?
>
> Have updated the system where I see this to FreeBSD 9.1-STABLE #0
> r250229 and still have the problem.

The patch below should fix the issue for you; at least it did so in my
limited testing. What it does:

1. When inactivating a nullfs vnode, check whether the lower vnode is
   unlinked, and reclaim the upper vnode if so. [This fixes your case.]
2. Besides the callback to upper filesystems on lower vnode reclamation,
   also call the upper fs on lower vnode unlink. This allows nullfs to
   purge cached vnodes for the unlinked lower vnode. [This fixes the
   opposite case, where the vnode is removed from the lower mount but
   upper aliases prevent it from being recycled.]
3. Fix a wart that has existed since the introduction of nullfs caching:
   do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should
   be completely innocent, but now it is also formally safe.
4. Fix a vnode reference leak in nullfs_reclaim_lowervp().

Please note that the patch is basically untested; I only verified your
scenario and a mirror of it as described in item 2.

diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 4f37020..a624be6 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -53,8 +53,11 @@ struct null_node {
 	LIST_ENTRY(null_node)	null_hash;	/* Hash list */
 	struct vnode		*null_lowervp;	/* VREFed once */
 	struct vnode		*null_vnode;	/* Back pointer */
+	u_int			null_flags;
 };

+#define	NULLV_NOUNLOCK	0x0001
+
 #define	MOUNTTONULLMOUNT(mp)	((struct null_mount *)((mp)->mnt_data))
 #define	VTONULL(vp)	((struct null_node *)(vp)->v_data)
 #define	NULLTOV(xp)	((xp)->null_vnode)
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 02932bd..544c358 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -65,7 +65,6 @@ static vfs_statfs_t	nullfs_statfs;
 static vfs_unmount_t	nullfs_unmount;
 static vfs_vget_t	nullfs_vget;
 static vfs_extattrctl_t	nullfs_extattrctl;
-static vfs_reclaim_lowervp_t	nullfs_reclaim_lowervp;

 /*
  * Mount null layer
@@ -391,8 +390,22 @@ nullfs_reclaim_lowervp(struct mount *mp, struct vnode *lowervp)
 	vp = null_hashget(mp, lowervp);
 	if (vp == NULL)
 		return;
+	VTONULL(vp)->null_flags |= NULLV_NOUNLOCK;
 	vgone(vp);
-	vn_lock(lowervp, LK_EXCLUSIVE | LK_RETRY);
+	vput(vp);
+}
+
+static void
+nullfs_unlink_lowervp(struct mount *mp, struct vnode *lowervp)
+{
+	struct vnode *vp;
+
+	vp = null_hashget(mp, lowervp);
+	if (vp == NULL || vp->v_usecount > 1)
+		return;
+	VTONULL(vp)->null_flags |= NULLV_NOUNLOCK;
+	vgone(vp);
+	vput(vp);
 }

 static struct vfsops null_vfsops = {
@@ -408,6 +421,7 @@ static struct vfsops null_vfsops = {
 	.vfs_unmount =		nullfs_unmount,
 	.vfs_vget =		nullfs_vget,
 	.vfs_reclaim_lowervp =	nullfs_reclaim_lowervp,
+	.vfs_unlink_lowervp =	nullfs_unlink_lowervp,
 };

 VFS_SET(null_vfsops, nullfs, VFCF_LOOPBACK | VFCF_JAIL);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f59865f..3c41124 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,18 +692,21 @@ null_unlock(struct vop_unlock_args *ap)
 static int
 null_inactive(struct vop_inactive_args *ap __unused)
 {
-	struct vnode *vp;
+	struct vnode *vp, *lvp;
 	struct mount *mp;
 	struct null_mount *xmp;

 	vp = ap->a_vp;
+	lvp = NULLVPTOLOWERVP(vp);
 	mp = vp->v_mount;
 	xmp = MOUNTTONULLMOUNT(mp);
-	if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+	if ((xmp->nullm_flags & NULLM_CACHE) == 0 ||
+	    (lvp->v_vflag & VV_NOSYNC) != 0) {
 		/*
 		 * If this is the last reference and caching of the
-		 * nullfs vnodes is not enabled, then free up the
-		 * vnode so as not to tie up the lower vnodes.
+		 * nullfs vnodes is not enabled, or the lower vnode is
+		 * deleted, then free up the vnode so as not to tie up
+		 * the lower vnodes.
 		 */
 		vp->v_object = NULL;
 		vrecycle(vp);
@@ -748,7 +751,10 @@ null_reclaim(struct vop_reclaim_args *ap)
 	 */
 	if (vp->v_writecount > 0)
 		VOP_ADD_WRITECOUNT(lowervp, -1);
-	vput(lowervp);
+	if ((xp->null_flags & NULLV_NOUNLOCK) != 0)
+		vunref(lowervp);
+	else
+		vput(lowervp);
 	free(xp, M_NULLFSNODE);

 	return (0);
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index de118f7..988a114 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -2700,19
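A rough way to exercise the reported leak before and after the patch, using a small scratch filesystem so the inode pool is easy to watch; the device name, size, and file count below are assumed, not taken from the thread:

mdconfig -a -t swap -s 32m -u 9
newfs /dev/md9
mount /dev/md9 /mnt
mount -t nullfs /mnt /mnt2
df -i /mnt                                   # note the starting ifree
for i in $(jot 500); do touch /mnt2/f$i; done
rm /mnt2/f*
df -i /mnt                                   # with the fix (or nocache), ifree returns to the starting value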
Re: Nullfs leaks i-nodes
On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> The patch below should fix the issue for you; at least it did so in my limited testing. What it does:
>
> 1. When inactivating a nullfs vnode, check whether the lower vnode is unlinked, and reclaim the upper vnode if so. [This fixes your case.]
> 2. Besides the callback to upper filesystems on lower vnode reclamation, also call the upper fs on lower vnode unlink. This allows nullfs to purge cached vnodes for the unlinked lower vnode. [This fixes the opposite case, where the vnode is removed from the lower mount but upper aliases prevent it from being recycled.]
> 3. Fix a wart that has existed since the introduction of nullfs caching: do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should be completely innocent, but now it is also formally safe.
> 4. Fix a vnode reference leak in nullfs_reclaim_lowervp().
>
> Please note that the patch is basically untested; I only verified your scenario and a mirror of it as described in item 2.
>
> diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> index 4f37020..a624be6 100644

I got this page fault after interrupting a nullfs test that had been
running for three hours:

http://people.freebsd.org/~pho/stress/log/kostik562.txt

- Peter
Re: Nullfs leaks i-nodes
On Wed, May 08, 2013 at 07:58:56PM +0200, Peter Holm wrote:
> On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> > On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
> >
> > The patch below should fix the issue for you; at least it did so in my limited testing. What it does:
> >
> > 1. When inactivating a nullfs vnode, check whether the lower vnode is unlinked, and reclaim the upper vnode if so. [This fixes your case.]
> > 2. Besides the callback to upper filesystems on lower vnode reclamation, also call the upper fs on lower vnode unlink. This allows nullfs to purge cached vnodes for the unlinked lower vnode. [This fixes the opposite case, where the vnode is removed from the lower mount but upper aliases prevent it from being recycled.]
> > 3. Fix a wart that has existed since the introduction of nullfs caching: do not unlock the lower vnode in nullfs_reclaim_lowervp(). It should be completely innocent, but now it is also formally safe.
> > 4. Fix a vnode reference leak in nullfs_reclaim_lowervp().
> >
> > Please note that the patch is basically untested; I only verified your scenario and a mirror of it as described in item 2.
> >
> > diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> > index 4f37020..a624be6 100644
>
> I got this page fault after interrupting a nullfs test that had been
> running for three hours:
>
> http://people.freebsd.org/~pho/stress/log/kostik562.txt

Seems to be easily reproduced, so I compiled null_vnops.c and fifo_vnops.c
without -O in order to get some more info:

http://people.freebsd.org/~pho/stress/log/kostik563.txt

- Peter
Re: Nullfs leaks i-nodes
On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> I created a PR, kern/178238, on this but would like to know if anyone
> has any ideas or patches?
>
> Have updated the system where I see this to FreeBSD 9.1-STABLE #0
> r250229 and still have the problem.

I am observing an effect that might look like an inode leak, which I think
is due to the caching of free nullfs vnodes recently added by kib
(r240285): the free inode count does not increase after unlink, but if I
purge the free-vnode cache (temporarily setting vfs.wantfreevnodes to 0
and watching vfs.freevnodes decrease to 0), the free inode count grows
back.

You have only about 1000 inodes available on your underlying fs, while
vfs.wantfreevnodes is, I think, much higher, resulting in running out of
i-nodes.

If this is really your case, you can disable caching by mounting nullfs
with nocache (it looks like caching is not important in your case).

--
Mikolaj Golub
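A sketch of the check Mikolaj describes, for confirming the diagnosis on one's own system (the mount point below is a placeholder): temporarily drop vfs.wantfreevnodes so the cached free vnodes are flushed, watch the free inode count recover, then restore the old value.

old=$(sysctl -n vfs.wantfreevnodes)
sysctl vfs.wantfreevnodes=0
sysctl vfs.freevnodes            # repeat until this has dropped toward 0
df -i /path/to/lower/fs          # ifree should grow back as the cache drains
sysctl vfs.wantfreevnodes=$old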
RE: Nullfs leaks i-nodes
-----Original Message-----
From: owner-freebsd-sta...@freebsd.org
[mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Mikolaj Golub
Sent: Wednesday, 8 May 2013 6:42 AM
To: Göran Löwkrantz
Cc: Kostik Belousov; freebsd-stable@freebsd.org
Subject: Re: Nullfs leaks i-nodes

On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> I created a PR, kern/178238, on this but would like to know if anyone
> has any ideas or patches?
>
> Have updated the system where I see this to FreeBSD 9.1-STABLE #0
> r250229 and still have the problem.

I am observing an effect that might look like an inode leak, which I think
is due to the caching of free nullfs vnodes recently added by kib
(r240285): the free inode count does not increase after unlink, but if I
purge the free-vnode cache (temporarily setting vfs.wantfreevnodes to 0
and watching vfs.freevnodes decrease to 0), the free inode count grows
back.

You have only about 1000 inodes available on your underlying fs, while
vfs.wantfreevnodes is, I think, much higher, resulting in running out of
i-nodes.

If this is really your case, you can disable caching by mounting nullfs
with nocache (it looks like caching is not important in your case).

--
Mikolaj Golub

Hi Goran,

After I included Kib's vnode caching patch, the performance on my port
builder machine decreased significantly. The port builder is one of many
jails and nullfs is used extensively. I was starving the system of vnodes.
Increasing kern.maxvnodes resulted in better performance than the original
system configuration without vnode caching. Thanks Kib :)

I don't think you'll run out of vnodes as it is self-adjusting (that was
my concern too). I changed kern.maxvnodes to approximately 3 times what it
wanted and tuned for my needs. Try it and keep an eye on:

sysctl vfs.numvnodes vfs.wantfreevnodes vfs.freevnodes vm.stats.vm.v_vnodepgsout vm.stats.vm.v_vnodepgsin

Regards,
Dewayne
Re: Nullfs leaks i-nodes
On Wed, May 08, 2013 at 08:35:18AM +1000, Dewayne Geraghty wrote:
> -----Original Message-----
> From: owner-freebsd-sta...@freebsd.org
> [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Mikolaj Golub
> Sent: Wednesday, 8 May 2013 6:42 AM
> To: Göran Löwkrantz
> Cc: Kostik Belousov; freebsd-stable@freebsd.org
> Subject: Re: Nullfs leaks i-nodes
>
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has any ideas or patches?
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 and still have the problem.
>
> I am observing an effect that might look like an inode leak, which I think is due to the caching of free nullfs vnodes recently added by kib (r240285): the free inode count does not increase after unlink, but if I purge the free-vnode cache (temporarily setting vfs.wantfreevnodes to 0 and watching vfs.freevnodes decrease to 0), the free inode count grows back.
>
> You have only about 1000 inodes available on your underlying fs, while vfs.wantfreevnodes is, I think, much higher, resulting in running out of i-nodes.
>
> If this is really your case, you can disable caching by mounting nullfs with nocache (it looks like caching is not important in your case).
>
> --
> Mikolaj Golub
>
> Hi Goran,
>
> After I included Kib's vnode caching patch, the performance on my port
> builder machine decreased significantly. The port builder is one of many
> jails and nullfs is used extensively. I was starving the system of
> vnodes. Increasing kern.maxvnodes resulted in better performance than
> the original system configuration without vnode caching. Thanks Kib :)
>
> I don't think you'll run out of vnodes as it is self-adjusting (that was
> my concern too). I changed kern.maxvnodes to approximately 3 times what
> it wanted and tuned for my needs. Try it and keep an eye on:
>
> sysctl vfs.numvnodes vfs.wantfreevnodes vfs.freevnodes vm.stats.vm.v_vnodepgsout vm.stats.vm.v_vnodepgsin
>
> Regards,
> Dewayne

Telling people to "keep an eye on these sysctls" is not exactly helpful
when there isn't an understanding of what they represent -- it's akin to
people using munin or SNMP polling software to monitor some MIBs without
actually knowing, truly, deep down inside, what it is they're looking at.
(I cannot tell you how often this happens. In fact, most systems
monitoring software/graphs/other crap I see these days tends to suffer
from exactly this.)

The only thing I'm aware of is what's in vnode(9) and what I could find
here:

http://www.youtube.com/watch?v=SpS7Ajnx9Q8
http://bsd-id.blogspot.com/2007/11/vnode.html

All said -- has anyone actually seen vfs.freevnodes hit 0? On some of my
systems I've seen it reach small numbers (in the 3-digit range), but it
would later increase (to the mid-4-digit range), even after lots of
(new/unique, i.e. not previously cached) file I/O. So the auto-adjusting
nature of this makes it very hard to say "keep an eye on these sysctls"
when the administrator does not know when he/she should become concerned
or consider increasing kern.maxvnodes.

Next, maxvnodes - numvnodes != freevnodes, which doesn't make sense to me:

$ sysctl kern.maxvnodes vfs.freevnodes vfs.wantfreevnodes vfs.numvnodes
kern.maxvnodes: 393216
vfs.freevnodes: 51543
vfs.wantfreevnodes: 51545
vfs.numvnodes: 244625
$ expr 393216 - 244625
148591

And finally, the lack of a sysctl description for vfs.wantfreevnodes is
quite bothersome:

$ sysctl -d kern.maxvnodes vfs.freevnodes vfs.wantfreevnodes vfs.numvnodes
kern.maxvnodes: Maximum number of vnodes
vfs.freevnodes: Number of vnodes in the free list
vfs.wantfreevnodes:
vfs.numvnodes: Number of vnodes in existence

--
| Jeremy Chadwick                                    j...@koitsu.org |
| UNIX Systems Administrator                 http://jdc.koitsu.org/ |
| Mountain View, CA, US                                             |
| Making life hard for others since 1977.              PGP 4BD6C0CB |
RE: Nullfs leaks i-nodes
Jeremy,

Thank you for your advice. I take your point: the next time I have an idea
that might help (that is, suggesting that maxvnodes be increased and that
vnode depletion versus starvation shouldn't be a concern), I'll reflect on
your reply regarding my failure to review the vnode source and to fully
understand the ramifications of examining the sysctls.

Regards,
Dewayne.