Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-16 Thread Daniel Braniss
 on 05/11/2010 23:27 Kostik Belousov said the following:
  I agree that the fix is the right fix for a real issue. It should only
  affect filesystems that support VFS_VGET(). In other words,
  it is relevant for e.g. UFS exports, but not for ZFS, which is
  Andrey's case.
 
 Actually ZFS does implement vfs_vget, but with a special quirk for .zfs/ and
 stuff under it:
 
 static int
 zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp)
 {
 zfsvfs_t *zfsvfs = vfsp->vfs_data;
 znode_t *zp;
 int err;
 
 /*
  * zfs_zget() can't operate on virtual entires like .zfs/ or
                                        entries
                                        =====
  * .zfs/snapshot/ directories, that's why we return EOPNOTSUPP.
  * This will make NFS to switch to LOOKUP instead of using VGET.
  */
 if (ino == ZFSCTL_INO_ROOT || ino == ZFSCTL_INO_SNAPDIR)
 return (EOPNOTSUPP);
 ...
 ...
 
 
 -- 
 Andriy Gapon
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
 




Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-06 Thread Andriy Gapon
on 05/11/2010 23:27 Kostik Belousov said the following:
 I agree that the fix is the right fix for a real issue. It should only
 affect filesystems that support VFS_VGET(). In other words,
 it is relevant for e.g. UFS exports, but not for ZFS, which is
 Andrey's case.

Actually ZFS does implement vfs_vget, but with a special quirk for .zfs/ and
stuff under it:

static int
zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp)
{
	zfsvfs_t *zfsvfs = vfsp->vfs_data;
	znode_t *zp;
	int err;

	/*
	 * zfs_zget() can't operate on virtual entires like .zfs/ or
	 * .zfs/snapshot/ directories, that's why we return EOPNOTSUPP.
	 * This will make NFS to switch to LOOKUP instead of using VGET.
	 */
	if (ino == ZFSCTL_INO_ROOT || ino == ZFSCTL_INO_SNAPDIR)
		return (EOPNOTSUPP);
...
...
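
The fallback described in that comment can be sketched in userland C. This is
only a toy model under stated assumptions: the mock_* names and the two
MOCK_INO_* constants are hypothetical stand-ins, not the kernel's identifiers
or values, and the real NFS server code is more involved than a single call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ZFSCTL_INO_ROOT and ZFSCTL_INO_SNAPDIR. */
#define MOCK_INO_CTL_ROOT	UINT64_C(0x1000)
#define MOCK_INO_CTL_SNAPDIR	UINT64_C(0x1001)

enum resolve_path { RESOLVED_BY_VGET, RESOLVED_BY_LOOKUP };

/* Mock of a filesystem vget entry point that refuses virtual entries. */
static int
mock_vget(uint64_t ino)
{
	if (ino == MOCK_INO_CTL_ROOT || ino == MOCK_INO_CTL_SNAPDIR)
		return (EOPNOTSUPP);
	return (0);
}

/* Mock of the caller: try vget first, fall back to lookup on EOPNOTSUPP. */
static int
mock_resolve(uint64_t ino, enum resolve_path *how)
{
	int error;

	assert(how != NULL);
	error = mock_vget(ino);
	if (error == 0) {
		*how = RESOLVED_BY_VGET;
		return (0);
	}
	if (error == EOPNOTSUPP) {
		/* The filesystem declined; resolve the entry by name instead. */
		*how = RESOLVED_BY_LOOKUP;
		return (0);
	}
	return (error);
}
```

Calling mock_resolve() with an ordinary inode number reports the vget path,
while either of the two special numbers reports the lookup path.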


-- 
Andriy Gapon


Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Andriy Gapon
on 05/11/2010 07:35 Josh Carroll said the following:
 Greetings!
 
 I'm having a problem with nfsd hanging and not serving mount points,
 during which time it cannot be killed. This problem started
 happening sometime after November 2nd, since a kernel built from 11/2
 sources does not exhibit this problem.
 
 The kernel I'm currently running was built from SVN sources I grabbed
 this evening (around 5pm PDT on November 4th), but I had the same problem
 yesterday around 9pm PDT after a csup (I switched to SVN
 today to rule out a stale /usr/src from an out-of-sync cvsup mirror).
 Here are the svn details:
 
 Path: /usr/src
 URL: svn://svn.freebsd.org/base/stable/8
 Repository Root: svn://svn.freebsd.org/base
 Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
 Revision: 214807
 Node Kind: directory
 Schedule: normal
 Last Changed Author: jhb
 Last Changed Rev: 214791
 Last Changed Date: 2010-11-04 10:25:31 -0700 (Thu, 04 Nov 2010)
 
 uname -a:
 
 FreeBSD 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807: Thu Nov  4 17:13:05
 PDT 2010 r...@pflog.net:/usr/obj/usr/src/sys/PFLOG  amd64
 
 I have a Popcorn Hour, and as soon as I try to connect to my NFS mount
 with it, the Popcorn Hour hangs, then eventually pops up a
 message that says "Request cannot be processed". Likewise, if I try to
 mount it from my MacBook, it hangs for quite a while and then reports
 "operation timed out" or something like that.
 
 During this hang, there is nothing in /var/log indicating a problem
 nor any other indications something is wrong, except that none of my
 NFS mounts work and the nfsd process will not die.
 
 When I try to reboot the server, I wind up having to fsck all my
 drives (except the ZFS one), since nfsd will not die. Even kill -9
 doesn't kill it (it's showing as in the D state):
 
 root 444 0.0 0.0 5812 1384 ?? D   9:30PM  0:00.00 nfsd: server (nfsd)

You can try 'procstat -kk pid' next time this happens.


-- 
Andriy Gapon


Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Rick Macklem
 Greetings!
 
 I'm having a problem with nfsd hanging and not serving mount points,
 during which time it cannot be killed. This problem started
 happening sometime after November 2nd, since kernel from 11/2 sources
 does not exhibit this problem.

Please try the attached patch, rick
ps: Starting about Monday I won't be able to do commits for about 3 weeks
so, if this patch works, could someone else please commit it, thanks,
rick
--- nfs_serv.c.sav	2010-11-05 08:15:57.0 -0400
+++ nfs_serv.c	2010-11-05 08:18:40.0 -0400
@@ -3252,7 +3252,7 @@
 			nfhp->fh_fsid = nvp->v_mount->mnt_stat.f_fsid;
 			if ((error1 = VOP_VPTOFH(nvp, &nfhp->fh_fid)) == 0)
 				error1 = VOP_GETATTR(nvp, vap, cred);
-			if (vp == nvp)
+			if (usevget == 0 && vp == nvp)
 				vunref(nvp);
 			else
 				vput(nvp);

Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Josh Carroll
 I'm having a problem with nfsd hanging and not serving mount points,
 during which time it cannot be killed. This problem started
 happening sometime after November 2nd, since kernel from 11/2 sources
 does not exhibit this problem.

 Please try the attached patch, rick

Thanks! I had to manually patch for some reason, but I can confirm
that nfsd is now well-behaved with your patch applied. I tested a
couple of different mounts and played two separate files on the
Popcorn Hour (one lower bitrate, the other higher bitrate) and both
played without a hiccup. While those were playing I also was able to
automount my home directory on the macbook and move around my home
directory.

So it looks like this patch did the trick. Thanks Rick, really
appreciate the fast response. Is there a reason why this doesn't seem
to be getting reported a lot? What in particular about my setup
broke it?

 ps: Starting about Monday I won't be able to do commits for about 3 weeks
    so, if this patch works, could someone else please commit it, thanks,
    rick


If someone can commit this, I'd really appreciate it. I will report
back if I notice any problems, but I imagine this would probably get
fixed in HEAD first, then MFC'd anyway, right? Unless this is already
fixed in HEAD.

Anyway, thanks again Rick! I appreciate it.

Regards,
Josh
As far as I can tell, there have been no adverse effects or
regressions with the kernel built with this patch (I had t


Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Rick Macklem
  I'm having a problem with nfsd hanging and not serving mount points,
  during which time it cannot be killed. This problem started
  happening sometime after November 2nd, since kernel from 11/2 sources
  does not exhibit this problem.
 
  Please try the attached patch, rick
 
 Thanks! I had to manually patch for some reason, but I can confirm
 that nfsd is now well-behaved with your patch applied. I tested a
 couple of different mounts and played two separate files on the
 Popcorn Hour (one lower bitrate, the other higher bitrate) and both
 played without a hiccup. While those were playing I also was able to
 automount my home directory on the macbook and move around my home
 directory.
 
 So it looks like this patch did the trick. Thanks Rick, really
 appreciate the fast response. Is there a reason why this doesn't seem
 to be getting reported a lot? What in particular about my setup
 broke it?
 
Well, the commit that broke things just hit stable/8 on Nov. 3. Also,
I'm not sure what scenarios would have caused the breakage. I think it
would be something like a file system where vget worked that dropped
out of the loop just after looking up "." or ".." at the root, so that
the nvp remained locked. But I'm not sure what the exact scenarios are.
(Holding the shared lock shouldn't have stopped further VFS_VGET()s from
 succeeding, I think?)
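
A toy userland model may help show why the one-line patch matters. It assumes,
as the thread suggests, that a vget-style call hands back a vnode that is both
referenced and locked, that looking up "." only adds a reference to a
directory the caller already holds locked, that a vunref-style release drops
the reference but keeps the lock, and that a vput-style release drops both.
Every mock_* name is hypothetical; none of this is the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vnode: just a reference count and a lock depth. */
struct mock_vnode {
	int refs;
	int lock_depth;
};

/* vget-like: returns the vnode referenced AND locked. */
static void
mock_vget(struct mock_vnode *vn)
{
	vn->refs++;
	vn->lock_depth++;
}

/* lookup of ".": caller already holds the lock, so only add a reference. */
static void
mock_lookup_dot(struct mock_vnode *vn)
{
	assert(vn->lock_depth > 0);
	vn->refs++;
}

/* vunref-like: drop the reference, leave the lock in place. */
static void
mock_vunref(struct mock_vnode *vn)
{
	assert(vn->refs > 0);
	vn->refs--;
}

/* vput-like: drop the reference and the lock together. */
static void
mock_vput(struct mock_vnode *vn)
{
	assert(vn->refs > 0 && vn->lock_depth > 0);
	vn->refs--;
	vn->lock_depth--;
}

/* Release nvp; with_fix selects the patched condition. */
static void
mock_release(struct mock_vnode *vp, struct mock_vnode *nvp, bool usevget,
    bool with_fix)
{
	bool same = (vp == nvp);

	if (with_fix ? (!usevget && same) : same)
		mock_vunref(nvp);
	else
		mock_vput(nvp);
}

/* Process a "." entry and return how many locks were left behind. */
static int
mock_dot_entry_leak(bool usevget, bool with_fix)
{
	struct mock_vnode dir = { .refs = 1, .lock_depth = usevget ? 0 : 1 };
	int expected_depth = dir.lock_depth;

	if (usevget)
		mock_vget(&dir);
	else
		mock_lookup_dot(&dir);
	mock_release(&dir, &dir, usevget, with_fix);
	return (dir.lock_depth - expected_depth);
}
```

In this model only the combination "vget path, unpatched condition" leaves a
lock behind, which matches the symptom of a directory vnode staying locked.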

  ps: Starting about Monday I won't be able to do commits for about 3 weeks
     so, if this patch works, could someone else please commit it, thanks,
     rick
 
 
 If someone can commit this, I'd really appreciate it. I will report
 back if I notice any problems, but I imagine this would probably get
 fixed in HEAD first, then MFC'd anyway, right? Unless this is already
 fixed in HEAD.
 
The patch isn't in head, but hopefully someone like kib@ or jhb@ can do it,
since I won't be able to MFC it before code freeze. They might have a
better patch?

Anyhow, good to hear it fixes the problem. Thanks for reporting the problem
and testing the patch, rick


Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Kostik Belousov
On Fri, Nov 05, 2010 at 10:27:09AM -0700, Josh Carroll wrote:
  I'm having a problem with nfsd hanging and not serving mount points,
  during which time it cannot be killed. This problem started
  happening sometime after November 2nd, since kernel from 11/2 sources
  does not exhibit this problem.
 
  Please try the attached patch, rick
 
 Thanks! I had to manually patch for some reason, but I can confirm
 that nfsd is now well-behaved with your patch applied. I tested a
 couple of different mounts and played two separate files on the
 Popcorn Hour (one lower bitrate, the other higher bitrate) and both
 played without a hiccup. While those were playing I also was able to
 automount my home directory on the macbook and move around my home
 directory.
 
 So it looks like this patch did the trick. Thanks Rick, really
 appreciate the fast response. Is there a reason why this doesn't seem
 to be getting reported a lot? What in particular about my setup
 broke it?
 
  ps: Starting about Monday I won't be able to do commits for about 3 weeks
     so, if this patch works, could someone else please commit it, thanks,
     rick
 
 
 If someone can commit this, I'd really appreciate it. I will report
 back if I notice any problems, but I imagine this would probably get
 fixed in HEAD first, then MFC'd anyway, right? Unless this is already
 fixed in HEAD.
 
 Anyway, thanks again Rick! I appreciate it.
 
 Regards,
 Josh
 As far as I can tell, there have been no adverse effects or
 regressions with the kernel built with this patch (I had t

I agree that the fix is the right fix for a real issue. It should only
affect filesystems that support VFS_VGET(). In other words,
it is relevant for e.g. UFS exports, but not for ZFS, which is
Andrey's case.

The change is committed as r214851 with the shortest MFC timeout possible.

There is a further issue with the use of VOP_ISLOCKED(). Andrey, can you
try this untested change in your setup?

Thanks and sorry.

diff --git a/sys/nfsserver/nfs_serv.c b/sys/nfsserver/nfs_serv.c
index 2b9131f..668b02c 100644
--- a/sys/nfsserver/nfs_serv.c
+++ b/sys/nfsserver/nfs_serv.c
@@ -3037,6 +3037,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
struct vattr va, at, *vap = &va;
struct nfs_fattr *fp;
int len, nlen, rem, xfer, tsiz, i, error = 0, error1, getret = 1;
+   int vp_locked;
int siz, cnt, fullsiz, eofflag, rdonly, dirlen, ncookies;
u_quad_t off, toff, verf;
u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
@@ -3067,10 +3068,12 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
fullsiz = siz;
error = nfsrv_fhtovp(fhp, 1, &vp, &vfslocked, nfsd, slp,
nam, &rdonly, TRUE);
+   vp_locked = 1;
if (!error && vp->v_type != VDIR) {
error = ENOTDIR;
vput(vp);
vp = NULL;
+   vp_locked = 0;
}
if (error) {
nfsm_reply(NFSX_UNSIGNED);
@@ -3090,6 +3093,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
error = nfsrv_access(vp, VEXEC, cred, rdonly, 0);
if (error) {
vput(vp);
+   vp_locked = 0;
vp = NULL;
nfsm_reply(NFSX_V3POSTOPATTR);
nfsm_srvpostop_attr(getret, &at);
@@ -3097,6 +3101,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
goto nfsmout;
}
VOP_UNLOCK(vp, 0);
+   vp_locked = 0;
rbuf = malloc(siz, M_TEMP, M_WAITOK);
 again:
iv.iov_base = rbuf;
@@ -3110,6 +3115,7 @@ again:
io.uio_td = NULL;
eofflag = 0;
vn_lock(vp, LK_SHARED | LK_RETRY);
+   vp_locked = 1;
if (cookies) {
free((caddr_t)cookies, M_TEMP);
cookies = NULL;
@@ -3118,6 +3124,7 @@ again:
off = (u_quad_t)io.uio_offset;
getret = VOP_GETATTR(vp, &at, cred);
VOP_UNLOCK(vp, 0);
+   vp_locked = 0;
if (!cookies && !error)
error = NFSERR_PERM;
if (!error)
@@ -3238,8 +3245,10 @@ again:
} else {
cn.cn_flags &= ~ISDOTDOT;
}
-   if (!VOP_ISLOCKED(vp))
+   if (!vp_locked) {
vn_lock(vp, LK_SHARED | LK_RETRY);
+   vp_locked = 1;
+   }
if ((vp->v_vflag & VV_ROOT) != 0 &&
(cn.cn_flags & ISDOTDOT) != 0) {
vref(vp);
@@ -3342,7 +3351,7 @@ invalid:
cookiep++;
ncookies--;
}
-   if (!usevget && VOP_ISLOCKED(vp))
+   if (!usevget && vp_locked)
vput(vp);
else
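
The message does not spell out what the VOP_ISLOCKED() issue is. One plausible
reading, offered here only as an assumption, is that a query on a shared lock
can say whether somebody holds it but not whether the caller does, so the
caller is safer tracking its own state in a local flag such as vp_locked. The
userland sketch below models that idea; all mock_* names are hypothetical and
the lock is a bare holder counter, not a real kernel lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy shared lock: counts holders but does not record who they are. */
struct mock_shared_lock {
	int holders;
};

static void
mock_lock_shared(struct mock_shared_lock *lk)
{
	lk->holders++;
}

static void
mock_unlock(struct mock_shared_lock *lk)
{
	assert(lk->holders > 0);
	lk->holders--;
}

/* Global query: true if anyone at all holds the lock. */
static bool
mock_islocked(const struct mock_shared_lock *lk)
{
	return (lk->holders > 0);
}

/* Caller-side bookkeeping, in the spirit of the vp_locked variable. */
struct mock_caller {
	struct mock_shared_lock *lk;
	bool locked;
};

/* Take the lock only if this caller does not already hold it. */
static void
mock_caller_lock(struct mock_caller *c)
{
	if (!c->locked) {
		mock_lock_shared(c->lk);
		c->locked = true;
	}
}

/* Drop the lock only if this caller actually holds it. */
static void
mock_caller_unlock(struct mock_caller *c)
{
	if (c->locked) {
		mock_unlock(c->lk);
		c->locked = false;
	}
}
```

If another holder takes the lock first, mock_islocked() already returns true
even though the caller holds nothing, whereas the caller's own flag still says
"not locked" and leads it to acquire and later release exactly its own hold.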

Re: NFS deadlock (unkillable nfsd and no mounts work)

2010-11-05 Thread Richard A Steenbergen
On Thu, Nov 04, 2010 at 10:35:15PM -0700, Josh Carroll wrote:
 Greetings!
 
 I'm having a problem with nfsd hanging and not serving mount points,
 during which time it cannot be killed. This problem started
 happening sometime after November 2nd, since kernel from 11/2 sources
 does not exhibit this problem.

I had a similar issue on -current a few weeks ago, with processes that 
would lock up and become unkillable when they tried to access certain 
parts of the filesystem (running all zfs here). One time it managed to 
lock up every time you'd do an ls /, but a reboot would always clear it, 
then a few days later it would pop up again somewhere else. I never lost 
any data, zfs never found anything wrong, and the drives and hw all 
checked out. I sync'd up with the latest -current on oct 18th and it 
stopped happening (or maybe I just stopped noticing it, entirely 
possible for a very lightly used personal box), plus I was traveling and 
super busy at the time, so I didn't bother pursuing it further.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)