Re: ZFS deadlock on rrl-rr_ -- look familiar to anyone?

2013-01-29 Thread Andriy Gapon
on 29/01/2013 05:21 Garrett Wollman said the following:
 When
 I restarted mountd, it hung waiting on rrl-rr_, but the system may
 already have been deadlocked at that point.  procstat reported:
 
 87678 104365 mountd   -mi_switch sleepq_wait _cv_wait 
 rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall 
 Xfast_syscall 
...
 If it happens again

procstat -kk -a

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS deadlock on rrl-rr_ -- look familiar to anyone?

2013-01-28 Thread Garrett Wollman
I just had a big fileserver deadlock in an odd way.  I was
investigating a user's problem, and decided for various reasons to
restart mountd.  It had been complaining like this:

Jan 28 21:06:43 nfs-prod-1 mountd[1108]: can't delete exports for 
/usr/local/.zfs/snapshot/monthly-2013-01: Invalid argument 

for a while, which is odd because /usr/local was never exported.  When
I restarted mountd, it hung waiting on rrl-rr_, but the system may
already have been deadlocked at that point.  procstat reported:

87678 104365 mountd   -mi_switch sleepq_wait _cv_wait 
rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall 
Xfast_syscall 

I was able to run shutdown, and the rc scripts eventually hung in
sync(1) and timed out.  The kernel then hung trying to do the same
thing, but I was able to break into the debugger.  The debugger
interrupted an idle thread, which was not particularly helpful, but I
was able to quickly gather the following information before I had to
reset the machine to restore normal service.

Locked vnodes


0xfe00536383c0: 0xfe00536383c0: tag syncer, type VNON
tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
usecount 1, writecount 0, refcount 2 mountedhere 0
flags (VI(0x200))
flags (VI(0x200))
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)

db ps
  pid  ppid  pgrp   uid   state   wmesg wchancmd
87996 1 87994 65534  D   rrl-rr_ 0xfe0048ff8108 df
87976 1 87726 0  D+  rrl-rr_ 0xfe0048ff8108 sync
87707 1 87705 65534  D   rrl-rr_ 0xfe0048ff8108 df
87700 1 87698 65534  D   rrl-rr_ 0xfe0048ff8108 df
87678 1 87657 0  D+  rrl-rr_ 0xfe0048ff8108 mountd
87531 1 87529 65534  D   rrl-rr_ 0xfe0048ff8108 df
87387 1 87385 65534  D   rrl-rr_ 0xfe0048ff8108 df
87380 1 87378 65534  D   rrl-rr_ 0xfe0048ff8108 df
87103 1 87101 65534  D   rrl-rr_ 0xfe0048ff8108 df
87096 1 87094 65534  D   rrl-rr_ 0xfe0048ff8108 df
85193 1 85192 0  D   zio-io_ 0xfe10d3e75320 zfs
   24 0 0 0  DL  sdflush  0x80e50878 [softdepflush]
   23 0 0 0  DL  vlruwt   0xfe0048c0a940 [vnlru]
   22 0 0 0  DL  rrl-rr_ 0xfe0048ff8108 [syncer]
   21 0 0 0  DL  psleep   0x80e3c048 [bufdaemon]
   20 0 0 0  DL  pgzero   0x80e5a81c [pagezero]
   19 0 0 0  DL  psleep   0x80e599e8 [vmdaemon]
   18 0 0 0  DL  psleep   0x80e599ac [pagedaemon]
   17 0 0 0  DL  gkt:wait 0x80de6c0c [g_mp_kt]
   16 0 0 0  DL  ipmireq  0xfe00347400b8 [ipmi0: kcs]
9 0 0 0  DL  ccb_scan 0x80dc1360 [xpt_thrd]
8 0 0 0  DL  waiting_ 0x80e41e80 [sctp_iterator]
7 0 0 0  DL  (threaded)  [zfskern]
101355   D   tx-tx_s 0xfe0050342e10 [txg_thread_enter]
101354   D   tx-tx_q 0xfe0050342e30 [txg_thread_enter]
100989   D   tx-tx_s 0xfe004fd27a10 [txg_thread_enter]
100988   D   tx-tx_q 0xfe004fd27a30 [txg_thread_enter]
100593   D   tx-tx_s 0xfe004a8c0a10 [txg_thread_enter]
100592   D   tx-tx_q 0xfe004a8c0a30 [txg_thread_enter]
100216   D   l2arc_fe 0x81228bc0 [l2arc_feed_thread]
100215   D   arc_recl 0x81218d20 
[arc_reclaim_thread]
   15 0 0 0  DL  (threaded)  [usb]
[32 uninteresting and identical threads deleted]
6 0 0 0  DL  mps_scan 0xfe00276816a8 [mps_scan2]
5 0 0 0  DL  mps_scan 0xfe0027612ca8 [mps_scan1]
4 0 0 0  DL  mps_scan 0xfe00274ef4a8 [mps_scan0]
   14 0 0 0  DL  -0x80ded764 [yarrow]
3 0 0 0  DL  crypto_r 0x80e4e0a0 [crypto returns]
2 0 0 0  DL  crypto_w 0x80e4e060 [crypto]
   13 0 0 0  DL  (threaded)  [geom]
100055   D   -0x80de6b90 [g_down]
100054   D   -0x80de6b88 [g_up]
100053   D   -0x80de6b78 [g_event]
   12 0 0 0  WL  (threaded)  [intr]
100189   I   [irq1: atkbd0]
100188   I   [swi0: uart uart]
100187   I   [irq19: atapci1]
100186   I   [irq18: atapci0+]
100169   I