Re: NFS deadlock (unkillable nfsd and no mounts work)
on 05/11/2010 23:27 Kostik Belousov said the following:
> I agree that the fix is the right fix for a real issue. It should only
> affect the filesystems that do support VFS_VGET(). In other words, it is
> relevant for e.g. UFS exports, but not for ZFS, which is Andrey's case.

Actually ZFS does implement vfs_vget, but with a special quirk for .zfs/ and
stuff under it:

static int
zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp)
{
	zfsvfs_t	*zfsvfs = vfsp->vfs_data;
	znode_t		*zp;
	int		err;

	/*
	 * zfs_zget() can't operate on virtual entries like .zfs/ or
	 * .zfs/snapshot/ directories, that's why we return EOPNOTSUPP.
	 * This will make NFS to switch to LOOKUP instead of using VGET.
	 */
	if (ino == ZFSCTL_INO_ROOT || ino == ZFSCTL_INO_SNAPDIR)
		return (EOPNOTSUPP);
	...
...

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
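For readers following along, the fallback that the zfs_vget() comment describes can be sketched as a small stand-alone toy model. This is not the nfsd code: every toy_* name and constant is hypothetical, and the "sticky" behaviour (once EOPNOTSUPP is seen, stop trying VGET for later entries) is an assumption, since the message only says that NFS switches to LOOKUP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real ZFSCTL_INO_* values or struct vnode. */
enum { TOY_INO_CTL_ROOT = 1, TOY_INO_CTL_SNAPDIR = 2 };

struct toy_vnode {
	unsigned long ino;
	int via_lookup;		/* 1 if resolved by name, 0 if by inode number */
};

/* Mimics the quirk quoted above: refuse the virtual control entries. */
static int
toy_vget(unsigned long ino, struct toy_vnode *out)
{
	assert(out != NULL);
	if (ino == TOY_INO_CTL_ROOT || ino == TOY_INO_CTL_SNAPDIR)
		return (EOPNOTSUPP);
	out->ino = ino;
	out->via_lookup = 0;
	return (0);
}

/* Name-based path; in this toy model it works for every entry. */
static int
toy_lookup(unsigned long ino, struct toy_vnode *out)
{
	assert(out != NULL);
	out->ino = ino;
	out->via_lookup = 1;
	return (0);
}

/*
 * Resolve one directory entry. *usevget starts at 1; the first
 * EOPNOTSUPP clears it (assumed behaviour), so later entries go
 * straight to toy_lookup().
 */
static int
toy_resolve(unsigned long ino, int *usevget, struct toy_vnode *out)
{
	int error;

	if (*usevget) {
		error = toy_vget(ino, out);
		if (error != EOPNOTSUPP)
			return (error);
		*usevget = 0;
	}
	return (toy_lookup(ino, out));
}
```

A caller would keep one usevget flag per directory scan and pass its address to toy_resolve() for each entry; an ordinary inode number goes through toy_vget(), while the first control inode flips the flag and falls through to toy_lookup().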
Re: NFS deadlock (unkillable nfsd and no mounts work)
on 05/11/2010 07:35 Josh Carroll said the following:
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it cannot be killed. This problem started happening
> sometime after November 2nd, since a kernel from 11/2 sources does not
> exhibit this problem.
>
> The current kernel I'm running is from SVN sources I just grabbed this
> evening (around 5pm PDT on November 4th), but I was having the same
> problem yesterday around 9pm PDT after a csup yesterday (I switched to
> SVN today to rule out a stale /usr/src from an out of sync cvsup mirror).
>
> Here are the svn details:
>
> Path: /usr/src
> URL: svn://svn.freebsd.org/base/stable/8
> Repository Root: svn://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 214807
> Node Kind: directory
> Schedule: normal
> Last Changed Author: jhb
> Last Changed Rev: 214791
> Last Changed Date: 2010-11-04 10:25:31 -0700 (Thu, 04 Nov 2010)
>
> uname -a:
>
> FreeBSD 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807: Thu Nov 4 17:13:05 PDT 2010 r...@pflog.net:/usr/obj/usr/src/sys/PFLOG amd64
>
> I have a Popcorn Hour, and as soon as I try to connect to my NFS mount
> with it, it hangs on the Popcorn Hour, then eventually pops up a message
> that says "Request cannot be processed". Likewise if I try to mount it
> from my macbook, it hangs for quite a while, then just says "operation
> timed out" or something like that.
>
> During this hang, there is nothing in /var/log indicating a problem, nor
> any other indication that something is wrong, except that none of my NFS
> mounts work and the nfsd process will not die. When I try to reboot the
> server, I wind up having to fsck all my drives (except the ZFS one),
> since nfsd will not die. Even kill -9 doesn't kill it (it's showing as
> in the D state):
>
> root 444 0.0 0.0 5812 1384 ?? D 9:30PM 0:00.00 nfsd: server (nfsd)

You can try 'procstat -kk pid' next time this happens.

-- 
Andriy Gapon
Re: NFS deadlock (unkillable nfsd and no mounts work)
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it cannot be killed. This problem started happening
> sometime after November 2nd, since a kernel from 11/2 sources does not
> exhibit this problem.

Please try the attached patch, rick
ps: Starting about Monday I won't be able to do commits for about 3 weeks
    so, if this patch works, could someone else please commit it, thanks, rick

--- nfs_serv.c.sav	2010-11-05 08:15:57.0 -0400
+++ nfs_serv.c	2010-11-05 08:18:40.0 -0400
@@ -3252,7 +3252,7 @@
 			nfhp->fh_fsid = nvp->v_mount->mnt_stat.f_fsid;
 			if ((error1 = VOP_VPTOFH(nvp, &nfhp->fh_fid)) == 0)
 				error1 = VOP_GETATTR(nvp, vap, cred);
-			if (vp == nvp)
+			if (usevget == 0 && vp == nvp)
 				vunref(nvp);
 			else
 				vput(nvp);
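To make the effect of that one-line change easier to see, here is a toy accounting model of the "." entry case. It is only a sketch under stated assumptions: struct toy_vnode and all toy_* functions are hypothetical, the lock is reduced to a counter, and the scenario (a vget-style call returning the directory itself, referenced and locked) is one guess at how the imbalance could arise, not a confirmed trace. It relies only on the documented difference that vunref() drops a reference and leaves the vnode locked, while vput() unlocks and drops the reference.

```c
#include <assert.h>

/* Toy vnode: a use count and a lock count. Not the kernel's struct vnode. */
struct toy_vnode {
	int usecount;
	int lockcount;
};

/* Like vunref(): drop one reference, leave the lock held. */
static void
toy_vunref(struct toy_vnode *v)
{
	assert(v->usecount > 0 && v->lockcount > 0);
	v->usecount--;
}

/* Like vput(): unlock and drop one reference. */
static void
toy_vput(struct toy_vnode *v)
{
	assert(v->usecount > 0 && v->lockcount > 0);
	v->lockcount--;
	v->usecount--;
}

/* Release nvp using either the old condition or the patched one. */
static void
toy_release(struct toy_vnode *vp, struct toy_vnode *nvp, int usevget,
    int patched)
{
	int keep_locked;

	keep_locked = patched ? (usevget == 0 && vp == nvp) : (vp == nvp);
	if (keep_locked)
		toy_vunref(nvp);
	else
		toy_vput(nvp);
}

/*
 * Process a "." entry, so nvp is the directory itself, and return the
 * lock count left on the directory afterwards.
 */
static int
toy_locks_after_dot(int usevget, int patched)
{
	struct toy_vnode dir = { 1, 0 };	/* caller's reference, unlocked */

	if (usevget) {
		/* vget-style: the vnode comes back referenced and locked. */
		dir.usecount++;
		dir.lockcount++;
	} else {
		/* lookup-style: caller locked dir first; "." adds a reference. */
		dir.lockcount++;
		dir.usecount++;
	}
	toy_release(&dir, &dir, usevget, patched);
	return (dir.lockcount);
}
```

In this model toy_locks_after_dot(1, 0) leaves the lock count at 1 with nobody left to drop it, while toy_locks_after_dot(1, 1) leaves it at 0. In the lookup-style case the count stays at 1 with or without the patch, which is intended there because the caller still owns that lock.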
Re: NFS deadlock (unkillable nfsd and no mounts work)
>> I'm having a problem with nfsd hanging and not serving mount points,
>> during which time it cannot be killed. This problem started happening
>> sometime after November 2nd, since a kernel from 11/2 sources does not
>> exhibit this problem.
>
> Please try the attached patch, rick

Thanks! I had to manually patch for some reason, but I can confirm that
nfsd is now well-behaved with your patch applied. I tested a couple of
different mounts and played two separate files on the Popcorn Hour (one
lower bitrate, the other higher bitrate) and both played without a hiccup.
While those were playing I was also able to automount my home directory on
the macbook and move around in it.

So it looks like this patch did the trick. Thanks Rick, I really appreciate
the fast response. Is there a reason why this doesn't seem to be getting
reported a lot? What in particular about my setup broke it?

> ps: Starting about Monday I won't be able to do commits for about 3 weeks
> so, if this patch works, could someone else please commit it, thanks, rick

If someone can commit this, I'd really appreciate it. I will report back if
I notice any problems, but I imagine this would probably get fixed in HEAD
first, then MFC'd anyway, right? Unless this is already fixed in HEAD.

Anyway, thanks again Rick! I appreciate it.

Regards,
Josh

As far as I can tell, there have been no adverse effects or regressions with the kernel built with this patch (I had t
Re: NFS deadlock (unkillable nfsd and no mounts work)
>>> I'm having a problem with nfsd hanging and not serving mount points,
>>> during which time it cannot be killed. This problem started happening
>>> sometime after November 2nd, since a kernel from 11/2 sources does not
>>> exhibit this problem.
>>
>> Please try the attached patch, rick
>
> Thanks! I had to manually patch for some reason, but I can confirm that
> nfsd is now well-behaved with your patch applied. I tested a couple of
> different mounts and played two separate files on the Popcorn Hour (one
> lower bitrate, the other higher bitrate) and both played without a hiccup.
> While those were playing I was also able to automount my home directory on
> the macbook and move around in it.
>
> So it looks like this patch did the trick. Thanks Rick, I really appreciate
> the fast response. Is there a reason why this doesn't seem to be getting
> reported a lot? What in particular about my setup broke it?

Well, the commit that broke things just hit stable/8 on Nov. 3. Also, I'm
not sure what scenarios would have caused the breakage. I think it would be
something like a file system where vget worked that dropped out of the loop
just after looking up . or .. at the root, so that the nvp remained locked.
But I'm not sure what the exact scenarios are. (Holding the shared lock
shouldn't have stopped further VFS_VGET()s from succeeding, I think?)

>> ps: Starting about Monday I won't be able to do commits for about 3 weeks
>> so, if this patch works, could someone else please commit it, thanks, rick
>
> If someone can commit this, I'd really appreciate it. I will report back if
> I notice any problems, but I imagine this would probably get fixed in HEAD
> first, then MFC'd anyway, right? Unless this is already fixed in HEAD.

The patch isn't in head, but hopefully someone like kib@ or jhb@ can do it,
since I won't be able to MFC it before code freeze. They might have a
better patch?

Anyhow, good to hear it fixes the problem.

Thanks for reporting the problem and testing the patch, rick
Re: NFS deadlock (unkillable nfsd and no mounts work)
On Fri, Nov 05, 2010 at 10:27:09AM -0700, Josh Carroll wrote:
>> I'm having a problem with nfsd hanging and not serving mount points,
>> during which time it cannot be killed. This problem started happening
>> sometime after November 2nd, since a kernel from 11/2 sources does not
>> exhibit this problem.
>
> Please try the attached patch, rick

Thanks! I had to manually patch for some reason, but I can confirm that
nfsd is now well-behaved with your patch applied. I tested a couple of
different mounts and played two separate files on the Popcorn Hour (one
lower bitrate, the other higher bitrate) and both played without a hiccup.
While those were playing I was also able to automount my home directory on
the macbook and move around in it.

So it looks like this patch did the trick. Thanks Rick, I really appreciate
the fast response. Is there a reason why this doesn't seem to be getting
reported a lot? What in particular about my setup broke it?

> ps: Starting about Monday I won't be able to do commits for about 3 weeks
> so, if this patch works, could someone else please commit it, thanks, rick

If someone can commit this, I'd really appreciate it. I will report back if
I notice any problems, but I imagine this would probably get fixed in HEAD
first, then MFC'd anyway, right? Unless this is already fixed in HEAD.

Anyway, thanks again Rick! I appreciate it.

Regards,
Josh

As far as I can tell, there have been no adverse effects or regressions with the kernel built with this patch (I had t

[Kostik Belousov's reply to the message quoted above follows.]

I agree that the fix is the right fix for a real issue. It should only
affect the filesystems that do support VFS_VGET(). In other words, it is
relevant for e.g. UFS exports, but not for ZFS, which is Andrey's case.

The change is committed as r214851 with the shortest MFC timeout possible.

There is a further issue with the use of VOP_ISLOCKED(). Andrey, can you
try this untested change in your setup? Thanks and sorry.
diff --git a/sys/nfsserver/nfs_serv.c b/sys/nfsserver/nfs_serv.c
index 2b9131f..668b02c 100644
--- a/sys/nfsserver/nfs_serv.c
+++ b/sys/nfsserver/nfs_serv.c
@@ -3037,6 +3037,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	struct vattr va, at, *vap = &va;
 	struct nfs_fattr *fp;
 	int len, nlen, rem, xfer, tsiz, i, error = 0, error1, getret = 1;
+	int vp_locked;
 	int siz, cnt, fullsiz, eofflag, rdonly, dirlen, ncookies;
 	u_quad_t off, toff, verf;
 	u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
@@ -3067,10 +3068,12 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	fullsiz = siz;
 	error = nfsrv_fhtovp(fhp, 1, &vp, &vfslocked, nfsd, slp,
 	    nam, &rdonly, TRUE);
+	vp_locked = 1;
 	if (!error && vp->v_type != VDIR) {
 		error = ENOTDIR;
 		vput(vp);
 		vp = NULL;
+		vp_locked = 0;
 	}
 	if (error) {
 		nfsm_reply(NFSX_UNSIGNED);
@@ -3090,6 +3093,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	error = nfsrv_access(vp, VEXEC, cred, rdonly, 0);
 	if (error) {
 		vput(vp);
+		vp_locked = 0;
 		vp = NULL;
 		nfsm_reply(NFSX_V3POSTOPATTR);
 		nfsm_srvpostop_attr(getret, &at);
@@ -3097,6 +3101,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		goto nfsmout;
 	}
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	rbuf = malloc(siz, M_TEMP, M_WAITOK);
 again:
 	iv.iov_base = rbuf;
@@ -3110,6 +3115,7 @@ again:
 	io.uio_td = NULL;
 	eofflag = 0;
 	vn_lock(vp, LK_SHARED | LK_RETRY);
+	vp_locked = 1;
 	if (cookies) {
 		free((caddr_t)cookies, M_TEMP);
 		cookies = NULL;
@@ -3118,6 +3124,7 @@ again:
 	off = (u_quad_t)io.uio_offset;
 	getret = VOP_GETATTR(vp, &at, cred);
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	if (!cookies && !error)
 		error = NFSERR_PERM;
 	if (!error)
@@ -3238,8 +3245,10 @@ again:
 				} else {
 					cn.cn_flags &= ~ISDOTDOT;
 				}
-				if (!VOP_ISLOCKED(vp))
+				if (!vp_locked) {
 					vn_lock(vp, LK_SHARED | LK_RETRY);
+					vp_locked = 1;
+				}
 				if ((vp->v_vflag & VV_ROOT) != 0 &&
 				    (cn.cn_flags & ISDOTDOT) != 0) {
 					vref(vp);
@@ -3342,7 +3351,7 @@ invalid:
 		cookiep++;
 		ncookies--;
 	}
-	if (!usevget && VOP_ISLOCKED(vp))
+	if (!usevget && vp_locked)
 		vput(vp);
 	else
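The message does not spell out what the VOP_ISLOCKED() issue is. One plausible reading, stated here as an assumption, is that a status query on a shared lock only says that somebody holds it, not that the calling thread does, whereas a local flag such as vp_locked records exactly what this code path has done. The toy model below illustrates that difference; struct toy_lock and all toy_* and acquire_* names are hypothetical and the lock is just a counter of shared holders.

```c
#include <assert.h>

/* Toy shared lock: only counts holders, with no notion of who they are. */
struct toy_lock {
	int sharers;
};

/* Status query: true if anyone at all holds the lock. */
static int
toy_islocked(const struct toy_lock *l)
{
	return (l->sharers > 0);
}

static void
toy_lock_shared(struct toy_lock *l)
{
	l->sharers++;
}

static void
toy_unlock(struct toy_lock *l)
{
	assert(l->sharers > 0);
	l->sharers--;
}

/*
 * Decide by querying the lock. Returns 1 if the caller really took the
 * lock, 0 if it skipped locking because somebody (possibly another
 * thread) already held it.
 */
static int
acquire_by_query(struct toy_lock *l)
{
	if (!toy_islocked(l)) {
		toy_lock_shared(l);
		return (1);
	}
	return (0);
}

/* Decide by a caller-owned flag, as the vp_locked variable does. */
static int
acquire_by_flag(struct toy_lock *l, int *mine)
{
	if (!*mine) {
		toy_lock_shared(l);
		*mine = 1;
	}
	return (*mine);
}

/* Drop the lock only if the flag says this caller holds it. */
static void
release_by_flag(struct toy_lock *l, int *mine)
{
	if (*mine) {
		toy_unlock(l);
		*mine = 0;
	}
}
```

With another holder already present, acquire_by_query() returns 0 and the caller proceeds without a lock of its own, while acquire_by_flag() takes a second shared hold and later releases exactly that one.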
Re: NFS deadlock (unkillable nfsd and no mounts work)
On Thu, Nov 04, 2010 at 10:35:15PM -0700, Josh Carroll wrote:
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it cannot be killed. This problem started happening
> sometime after November 2nd, since a kernel from 11/2 sources does not
> exhibit this problem.

I had a similar issue on -current a few weeks ago, with processes that
would lock up and become unkillable when they tried to access certain parts
of the filesystem (running all zfs here). One time it managed to lock up
every time you'd do an ls /, but a reboot would always clear it, then a few
days later it would pop up again somewhere else. I never lost any data, zfs
never found anything wrong, and the drives and hw all checked out.

I sync'd up with the latest -current on Oct 18th and it stopped happening
(or maybe I just stopped noticing it, entirely possible for a very lightly
used personal box), plus I was traveling and super busy at the time, so I
didn't bother pursuing it further.

-- 
Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)