Hey folks!
We're using ZFS-based file servers for our backups, and lately we've
been hitting situations where zfs/zpool commands hang.
Currently, it appears that raid3155 is in this broken state:
r...@homiebackup10:~# ps auxwww | grep zfs
root 15873 0.0 0.0 4216 1236 pts/2 S 15:56:54 0:00 grep zfs
root 13678 0.0 0.1 7516 2176 ? S 14:18:00 0:00 zfs list -t filesystem raid3155/angels
root 13691 0.0 0.1 7516 2188 ? S 14:18:04 0:00 zfs list -t filesystem raid3155/blazers
root 13731 0.0 0.1 7516 2200 ? S 14:18:20 0:00 zfs list -t filesystem raid3155/broncos
root 13792 0.0 0.1 7516 2220 ? S 14:18:51 0:00 zfs list -t filesystem raid3155/diamondbacks
root 13910 0.0 0.1 7516 2216 ? S 14:19:52 0:00 zfs list -t filesystem raid3155/knicks
root 13911 0.0 0.1 7516 2196 ? S 14:19:53 0:00 zfs list -t filesystem raid3155/lions
root 13916 0.0 0.1 7516 2220 ? S 14:19:55 0:00 zfs list -t filesystem raid3155/magic
root 13933 0.0 0.1 7516 2232 ? S 14:20:01 0:00 zfs list -t filesystem raid3155/mariners
root 13966 0.0 0.1 7516 2212 ? S 14:20:11 0:00 zfs list -t filesystem raid3155/mets
root 13971 0.0 0.1 7516 2208 ? S 14:20:21 0:00 zfs list -t filesystem raid3155/niners
root 13982 0.0 0.1 7516 2220 ? S 14:20:32 0:00 zfs list -t filesystem raid3155/padres
root 14064 0.0 0.1 7516 2220 ? S 14:21:03 0:00 zfs list -t filesystem raid3155/redwings
root 14123 0.0 0.1 7516 2212 ? S 14:21:20 0:00 zfs list -t filesystem raid3155/seahawks
root 14323 0.0 0.1 7420 2184 ? S 14:22:51 0:00 zfs allow zfsrcv create,mount,receive,share raid3155
root 15245 0.0 0.1 7468 2256 ? S 15:17:59 0:00 zfs create raid3155/angels
root 15250 0.0 0.1 7468 2244 ? S 15:18:03 0:00 zfs create raid3155/blazers
root 15256 0.0 0.1 7468 2248 ? S 15:18:19 0:00 zfs create raid3155/broncos
root 15284 0.0 0.1 7468 2256 ? S 15:18:51 0:00 zfs create raid3155/diamondbacks
root 15322 0.0 0.1 7468 2260 ? S 15:19:51 0:00 zfs create raid3155/knicks
root 15332 0.0 0.1 7468 2260 ? S 15:19:53 0:00 zfs create raid3155/magic
root 15333 0.0 0.1 7468 2236 ? S 15:19:53 0:00 zfs create raid3155/lions
root 15345 0.0 0.1 7468 2264 ? S 15:20:01 0:00 zfs create raid3155/mariners
root 15355 0.0 0.1 7468 2260 ? S 15:20:10 0:00 zfs create raid3155/mets
root 15363 0.0 0.1 7468 2252 ? S 15:20:20 0:00 zfs create raid3155/niners
root 15368 0.0 0.1 7468 2256 ? S 15:20:33 0:00 zfs create raid3155/padres
root 15384 0.0 0.1 7468 2256 ? S 15:21:01 0:00 zfs create raid3155/redwings
root 15389 0.0 0.1 7468 2264 ? S 15:21:20 0:00 zfs create raid3155/seahawks
Attempting to run 'zpool list' hangs, as does 'zpool status raid3155'.
Forcefully rebooting the system seems to 'fix' the problem, and once
it comes back up, 'zpool list' and 'zpool status' show no issues with
any of the drives.
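As the ps output shows, our backup script keeps launching new 'zfs list' and 'zfs create' processes that pile up behind the hung ones. One caller-side mitigation (not a fix for the underlying hang) is to run each zfs command with a timeout and skip the dataset for that run if it doesn't come back. A minimal sketch in modern Python — illustrative only, since snv_111b ships an older Python, and the dataset name here is just an example:

```python
import subprocess

def run_with_timeout(cmd, timeout_secs):
    """Run a command; return (returncode, stdout), or None if it hangs.

    A zfs command stuck in the kernel may ignore SIGTERM, so we go
    straight to SIGKILL on timeout; even SIGKILL can't reap a process
    blocked in the kernel, but at least the caller stops stacking up
    new copies behind it.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    try:
        out, _ = proc.communicate(timeout=timeout_secs)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()          # SIGKILL; a truly stuck process may survive
        proc.communicate()   # reap whatever exits, avoid a zombie
        return None

# Hypothetical usage in the backup loop:
# result = run_with_timeout(["zfs", "list", "-t", "filesystem",
#                            "raid3155/angels"], timeout_secs=60)
# if result is None:
#     print("zfs list hung; skipping raid3155/angels this run")
```

This keeps one hung dataset from wedging the entire backup pass, though the stuck kernel threads still require the reboot described above.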
(after a reboot):
r...@homiebackup10:~# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
raid3066 32.5T 18.1T 14.4T 55% ONLINE -
raid3154 32.5T 18.2T 14.3T 55% ONLINE -
raid3155 32.5T 18.7T 13.8T 57% ONLINE -
raid3156 32.5T 22.0T 10.5T 67% ONLINE -
rpool 59.5G 14.1G 45.4G 23% ONLINE -
We are using silmech Storform iServ R505 machines with 3x silmech
Storform D55J JBOD SAS expanders connected to LSI Logic SAS1068E B3
eSAS cards, all containing 1.5TB Seagate 7200.11 SATA hard drives. We
make a single striped raidz2 pool out of each chassis, giving us ~29TB
of storage per 'brick', and we use rsync to copy the data from the
machines being backed up.
They're currently running OpenSolaris 2009.06 (snv_111b).
We have had issues with the backplanes on these machines, but this
particular machine has been up and running for nearly a year without
any problems. It's currently at about 50% capacity on all pools.
I'm not really sure how to proceed from here as far as gathering debug
information while the system is hung like this. I saw someone with
similar issues post a few days ago but don't see any replies; the
thread title is [zfs-discuss] Problem with resilvering and faulty disk.
We've been seeing that issue as well while rebuilding these drives.
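For anyone else hitting this, one way to capture state before the forced reboot may be to pull kernel stacks for the stuck processes with mdb. A hedged sketch, assuming your snv_111b mdb build has the ::stacks dcmd (the PID below is one of the hung 'zfs list' processes from the ps output above):

```shell
# Summarize kernel thread stacks for threads in the zfs module,
# run as root while the commands are hung:
echo "::stacks -m zfs" | mdb -k

# Full kernel stack for one specific stuck process, e.g. PID 13678
# (the '0t' prefix tells mdb the number is decimal):
echo "0t13678::pid2proc | ::walk thread | ::findstack -v" | mdb -k

# Userland stack of the same process, for comparison:
pstack 13678
```

If the stacks show the threads parked in the same zfs/spa function, that's the detail the list will likely ask for.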
Any assistance with this would be greatly appreciated. If there's any
information you folks need to help troubleshoot, just let me know and
I can provide it!
-Jeremy
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss