Re: processes stuck on a vnode lock

2010-11-05 Thread Andriy Gapon
on 04/11/2010 16:45 Andriy Gapon said the following:
> on 04/11/2010 09:49 Andriy Gapon said the following:
>> I see a few processes stuck on the same vnode, trying to take or to
>> upgrade to an exclusive lock on it, while the lock data suggests that
>> it is already shared-locked.  The vnode is a root vnode of one of ZFS
>> filesystems (it's not a global root).
>>
>> I couldn't find any (other) threads that could actually hold the vnode
>> lock, but lock shared count is suspiciously or coincidentally the same
>> as number of threads in zfs_root call.
>
> BTW, I still have the system alive and online, so if anyone has ideas I
> can try them.
 

The kernel is not live now, but I have saved it and vmcore of the system.

Kostik,

just pure guesswork here - could r214049 have something to do with this?
I looked at the change and it looks completely correct - I don't think that a
vnode lock can be leaked by that code.  But, OTOH, it has some special handling
for VV_ROOT, it's in NFS code and it's in the right time-frame, so I'm just
asking.

Here's a link to the start of this report thread:
http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/10659/focus=128893

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: processes stuck on a vnode lock

2010-11-05 Thread Rick Macklem
> on 04/11/2010 16:45 Andriy Gapon said the following:
>> on 04/11/2010 09:49 Andriy Gapon said the following:
>>> I see a few processes stuck on the same vnode, trying to take or to
>>> upgrade to an exclusive lock on it, while the lock data suggests that
>>> it is already shared-locked.  The vnode is a root vnode of one of ZFS
>>> filesystems (it's not a global root).
>>>
>>> I couldn't find any (other) threads that could actually hold the
>>> vnode lock, but lock shared count is suspiciously or coincidentally
>>> the same as number of threads in zfs_root call.
>>
>> BTW, I still have the system alive and online, so if anyone has ideas
>> I can try them.
>
> The kernel is not live now, but I have saved it and vmcore of the
> system.
>
> Kostik,
>
> just pure guesswork here - could r214049 have something to do with
> this?  I looked at the change and it looks completely correct - I don't
> think that a vnode lock can be leaked by that code.  But, OTOH, it has
> some special handling for VV_ROOT, it's in NFS code and it's in the
> right time-frame, so I'm just asking.
You could try the attached patch which seems to have worked for Josh
Carroll, who had a similar problem with stable/8.

rick
--- nfs_serv.c.sav	2010-11-05 08:15:57.0 -0400
+++ nfs_serv.c	2010-11-05 08:18:40.0 -0400
@@ -3252,7 +3252,7 @@
 			nfhp->fh_fsid = nvp->v_mount->mnt_stat.f_fsid;
 			if ((error1 = VOP_VPTOFH(nvp, nfhp->fh_fid)) == 0)
 				error1 = VOP_GETATTR(nvp, vap, cred);
-			if (vp == nvp)
+			if (usevget == 0 && vp == nvp)
 				vunref(nvp);
 			else
 				vput(nvp);

processes stuck on a vnode lock

2010-11-04 Thread Andriy Gapon

I see a few processes stuck on the same vnode, trying to take or to upgrade to
an exclusive lock on it, while the lock data suggests that it is already
shared-locked.  The vnode is a root vnode of one of ZFS filesystems (it's not a
global root).

I couldn't find any (other) threads that could actually hold the vnode lock, but
lock shared count is suspiciously or coincidentally the same as number of
threads in zfs_root call.

Relevant data dump:
 1125 100129 mountd   -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock zfs_root lookup
namei getfh syscallenter syscall Xfast_syscall
 1135 100209 nfsd     nfsd: service  mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock zfs_fhtovp
nfsrv_fhtovp nfsrv_readdirplus nfssvc_program svc_run_internal svc_thread_start
fork_exit fork_trampoline
39672 100779 find     -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock zfs_root lookup
namei vn_open_cred vn_open kern_openat kern_open open syscallenter syscall
Xfast_syscall
61414 100769 smbd     -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock cache_lookup
vfs_cache_lookup VOP_LOOKUP_APV lookup namei vn_open_cred vn_open kern_openat
kern_open open syscallenter
61644 100525 smbd     -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock cache_lookup
vfs_cache_lookup VOP_LOOKUP_APV lookup namei vn_open_cred vn_open kern_openat
kern_open open syscallenter
61645 100504 smbd     -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock cache_lookup
vfs_cache_lookup VOP_LOOKUP_APV lookup namei vn_open_cred vn_open kern_openat
kern_open open syscallenter
61646 100822 smbd     -          mi_switch sleepq_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock cache_lookup
vfs_cache_lookup VOP_LOOKUP_APV lookup namei vn_open_cred vn_open kern_openat
kern_open open syscallenter
==

(kgdb) tid 100779
[Switching to thread 521 (Thread 100779)]#0  sched_switch
(td=0xff0051e59450, newtd=0xff0001a4c450, flags=Variable flags is not
available.
)
at /usr/src/sys/kern/sched_ule.c:1851
1851			cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xff0051e59450, newtd=0xff0001a4c450,
flags=Variable flags is not available.
) at /usr/src/sys/kern/sched_ule.c:1851
#1  0x8038631e in mi_switch (flags=Variable flags is not available.
) at /usr/src/sys/kern/kern_synch.c:449
#2  0x803bd87b in sleepq_switch (wchan=Variable wchan is not 
available.
) at /usr/src/sys/kern/subr_sleepqueue.c:538
#3  0x803be5a5 in sleepq_wait (wchan=0xff000a3e4098, pri=80) at
/usr/src/sys/kern/subr_sleepqueue.c:617
#4  0x80362d62 in __lockmgr_args (lk=0xff000a3e4098, flags=524288,
ilk=0xff000a3e40c8, wmesg=Variable wmesg is not available.
) at /usr/src/sys/kern/kern_lock.c:218
#5  0x804037f1 in vop_stdlock (ap=Variable ap is not available.
) at lockmgr.h:97
#6  0x805bd322 in VOP_LOCK1_APV (vop=0x807e2580,
a=0xff8126ec05b0) at vnode_if.c:1988
#7  0x80422d98 in _vn_lock (vp=0xff000a3e4000, flags=524288,
file=0x80b23c58
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c,
line=1305) at vnode_if.h:859
#8  0x80abd185 in zfs_root (vfsp=Variable vfsp is not available.
) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1305
#9  0x80408323 in lookup (ndp=0xff8126ec09a0) at
/usr/src/sys/kern/vfs_lookup.c:785
#10 0x80408f0f in namei (ndp=0xff8126ec09a0) at
/usr/src/sys/kern/vfs_lookup.c:273
#11 0x80422120 in vn_open_cred (ndp=0xff8126ec09a0,
flagp=0xff8126ec099c, cmode=2432, vn_open_flags=Variable vn_open_flags is
not available.
)
at /usr/src/sys/kern/vfs_vnops.c:189
#12 0x804223cc in vn_open (ndp=Variable ndp is not available.
) at /usr/src/sys/kern/vfs_vnops.c:95
#13 0x80420b9d in kern_openat (td=0xff0051e59450, fd=-100,
path=0x800c61100 Error reading address 0x800c61100: Bad address,
pathseg=UIO_USERSPACE, flags=131077, mode=13052800) at
/usr/src/sys/kern/vfs_syscalls.c:1083
#14 0x80420f19 in kern_open (td=Variable td is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:1039
#15 0x80420f38 in open (td=Variable td is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:1015
#16 0x803c0f8e in syscallenter (td=0xff0051e59450,
sa=0xff8126ec0bb0) at /usr/src/sys/kern/subr_trap.c:318
#17 0x8055b5f1 in syscall (frame=0xff8126ec0c40) at
/usr/src/sys/amd64/amd64/trap.c:939
#18 0x80546262 in Xfast_syscall () at

Re: processes stuck on a vnode lock

2010-11-04 Thread Andriy Gapon
on 04/11/2010 09:49 Andriy Gapon said the following:
 
> I see a few processes stuck on the same vnode, trying to take or to
> upgrade to an exclusive lock on it, while the lock data suggests that
> it is already shared-locked.  The vnode is a root vnode of one of ZFS
> filesystems (it's not a global root).
>
> I couldn't find any (other) threads that could actually hold the vnode
> lock, but lock shared count is suspiciously or coincidentally the same
> as number of threads in zfs_root call.

BTW, I still have the system alive and online, so if anyone has ideas I
can try them.

-- 
Andriy Gapon