I just had a big fileserver deadlock in an odd way. I was
investigating a user's problem, and decided for various reasons to
restart mountd. It had been complaining like this:
Jan 28 21:06:43 nfs-prod-1 mountd[1108]: can't delete exports for
/usr/local/.zfs/snapshot/monthly-2013-01: Invalid argument
for a while, which is odd because /usr/local was never exported. When
I restarted mountd, it hung waiting on rrl-rr_, but the system may
already have been deadlocked at that point. procstat reported:
87678 104365 mountd -mi_switch sleepq_wait _cv_wait
rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall
Xfast_syscall
I was able to run shutdown, and the rc scripts eventually hung in
sync(1) and timed out. The kernel then hung trying to do the same
thing, but I was able to break into the debugger. The debugger
interrupted an idle thread, which was not particularly helpful, but I
was able to quickly gather the following information before I had to
reset the machine to restore normal service.
Locked vnodes
0xfe00536383c0: 0xfe00536383c0: tag syncer, type VNON
tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
usecount 1, writecount 0, refcount 2 mountedhere 0
flags (VI(0x200))
flags (VI(0x200))
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)
db ps
pid ppid pgrp uid state wmesg wchancmd
87996 1 87994 65534 D rrl-rr_ 0xfe0048ff8108 df
87976 1 87726 0 D+ rrl-rr_ 0xfe0048ff8108 sync
87707 1 87705 65534 D rrl-rr_ 0xfe0048ff8108 df
87700 1 87698 65534 D rrl-rr_ 0xfe0048ff8108 df
87678 1 87657 0 D+ rrl-rr_ 0xfe0048ff8108 mountd
87531 1 87529 65534 D rrl-rr_ 0xfe0048ff8108 df
87387 1 87385 65534 D rrl-rr_ 0xfe0048ff8108 df
87380 1 87378 65534 D rrl-rr_ 0xfe0048ff8108 df
87103 1 87101 65534 D rrl-rr_ 0xfe0048ff8108 df
87096 1 87094 65534 D rrl-rr_ 0xfe0048ff8108 df
85193 1 85192 0 D zio-io_ 0xfe10d3e75320 zfs
24 0 0 0 DL sdflush 0x80e50878 [softdepflush]
23 0 0 0 DL vlruwt 0xfe0048c0a940 [vnlru]
22 0 0 0 DL rrl-rr_ 0xfe0048ff8108 [syncer]
21 0 0 0 DL psleep 0x80e3c048 [bufdaemon]
20 0 0 0 DL pgzero 0x80e5a81c [pagezero]
19 0 0 0 DL psleep 0x80e599e8 [vmdaemon]
18 0 0 0 DL psleep 0x80e599ac [pagedaemon]
17 0 0 0 DL gkt:wait 0x80de6c0c [g_mp_kt]
16 0 0 0 DL ipmireq 0xfe00347400b8 [ipmi0: kcs]
9 0 0 0 DL ccb_scan 0x80dc1360 [xpt_thrd]
8 0 0 0 DL waiting_ 0x80e41e80 [sctp_iterator]
7 0 0 0 DL (threaded) [zfskern]
101355 D tx-tx_s 0xfe0050342e10 [txg_thread_enter]
101354 D tx-tx_q 0xfe0050342e30 [txg_thread_enter]
100989 D tx-tx_s 0xfe004fd27a10 [txg_thread_enter]
100988 D tx-tx_q 0xfe004fd27a30 [txg_thread_enter]
100593 D tx-tx_s 0xfe004a8c0a10 [txg_thread_enter]
100592 D tx-tx_q 0xfe004a8c0a30 [txg_thread_enter]
100216 D l2arc_fe 0x81228bc0 [l2arc_feed_thread]
100215 D arc_recl 0x81218d20
[arc_reclaim_thread]
15 0 0 0 DL (threaded) [usb]
[32 uninteresting and identical threads deleted]
6 0 0 0 DL mps_scan 0xfe00276816a8 [mps_scan2]
5 0 0 0 DL mps_scan 0xfe0027612ca8 [mps_scan1]
4 0 0 0 DL mps_scan 0xfe00274ef4a8 [mps_scan0]
14 0 0 0 DL -0x80ded764 [yarrow]
3 0 0 0 DL crypto_r 0x80e4e0a0 [crypto returns]
2 0 0 0 DL crypto_w 0x80e4e060 [crypto]
13 0 0 0 DL (threaded) [geom]
100055 D -0x80de6b90 [g_down]
100054 D -0x80de6b88 [g_up]
100053 D -0x80de6b78 [g_event]
12 0 0 0 WL (threaded) [intr]
100189 I [irq1: atkbd0]
100188 I [swi0: uart uart]
100187 I [irq19: atapci1]
100186 I [irq18: atapci0+]
100169 I