Hey folks!

We're using ZFS-based file servers for our backups, and lately we've been running into situations that cause zfs/zpool commands to hang.

Currently, it appears that raid3155 is in this broken state:

r...@homiebackup10:~# ps auxwww | grep zfs
root     15873  0.0  0.0 4216 1236 pts/2    S 15:56:54  0:00 grep zfs
root     13678  0.0  0.1 7516 2176 ?        S 14:18:00  0:00 zfs list -t filesystem raid3155/angels
root     13691  0.0  0.1 7516 2188 ?        S 14:18:04  0:00 zfs list -t filesystem raid3155/blazers
root     13731  0.0  0.1 7516 2200 ?        S 14:18:20  0:00 zfs list -t filesystem raid3155/broncos
root     13792  0.0  0.1 7516 2220 ?        S 14:18:51  0:00 zfs list -t filesystem raid3155/diamondbacks
root     13910  0.0  0.1 7516 2216 ?        S 14:19:52  0:00 zfs list -t filesystem raid3155/knicks
root     13911  0.0  0.1 7516 2196 ?        S 14:19:53  0:00 zfs list -t filesystem raid3155/lions
root     13916  0.0  0.1 7516 2220 ?        S 14:19:55  0:00 zfs list -t filesystem raid3155/magic
root     13933  0.0  0.1 7516 2232 ?        S 14:20:01  0:00 zfs list -t filesystem raid3155/mariners
root     13966  0.0  0.1 7516 2212 ?        S 14:20:11  0:00 zfs list -t filesystem raid3155/mets
root     13971  0.0  0.1 7516 2208 ?        S 14:20:21  0:00 zfs list -t filesystem raid3155/niners
root     13982  0.0  0.1 7516 2220 ?        S 14:20:32  0:00 zfs list -t filesystem raid3155/padres
root     14064  0.0  0.1 7516 2220 ?        S 14:21:03  0:00 zfs list -t filesystem raid3155/redwings
root     14123  0.0  0.1 7516 2212 ?        S 14:21:20  0:00 zfs list -t filesystem raid3155/seahawks
root     14323  0.0  0.1 7420 2184 ?        S 14:22:51  0:00 zfs allow zfsrcv create,mount,receive,share raid3155
root     15245  0.0  0.1 7468 2256 ?        S 15:17:59  0:00 zfs create raid3155/angels
root     15250  0.0  0.1 7468 2244 ?        S 15:18:03  0:00 zfs create raid3155/blazers
root     15256  0.0  0.1 7468 2248 ?        S 15:18:19  0:00 zfs create raid3155/broncos
root     15284  0.0  0.1 7468 2256 ?        S 15:18:51  0:00 zfs create raid3155/diamondbacks
root     15322  0.0  0.1 7468 2260 ?        S 15:19:51  0:00 zfs create raid3155/knicks
root     15332  0.0  0.1 7468 2260 ?        S 15:19:53  0:00 zfs create raid3155/magic
root     15333  0.0  0.1 7468 2236 ?        S 15:19:53  0:00 zfs create raid3155/lions
root     15345  0.0  0.1 7468 2264 ?        S 15:20:01  0:00 zfs create raid3155/mariners
root     15355  0.0  0.1 7468 2260 ?        S 15:20:10  0:00 zfs create raid3155/mets
root     15363  0.0  0.1 7468 2252 ?        S 15:20:20  0:00 zfs create raid3155/niners
root     15368  0.0  0.1 7468 2256 ?        S 15:20:33  0:00 zfs create raid3155/padres
root     15384  0.0  0.1 7468 2256 ?        S 15:21:01  0:00 zfs create raid3155/redwings
root     15389  0.0  0.1 7468 2264 ?        S 15:21:20  0:00 zfs create raid3155/seahawks

Attempting to run a zpool list hangs, as does a zpool status raid3155. Forcefully rebooting the system seems to 'fix' the problem, but once it comes back up, zpool list and zpool status show no issues with any of the drives.
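Next time it wedges, before the forced reboot, I was thinking of at least saving the userland stacks of all the stuck zfs/zpool processes with pstack(1). Something like this (the output path is just my own choice):

```shell
#!/bin/sh
# Sketch: dump userland stacks of every hung zfs/zpool process to a file
# we can post to the list. OUT is an arbitrary writable location.
OUT=/var/tmp/zfs-hang-stacks.txt

# pgrep -x matches the exact command name; each stuck process gets a header
# line followed by its pstack output.
for pid in $(pgrep -x zfs) $(pgrep -x zpool); do
    echo "=== pid $pid ===" >> "$OUT"
    pstack "$pid" >> "$OUT" 2>&1
done
```

If the commands are blocked in the kernel (which I suspect, given they're unkillable), the userland stacks will all end in the same syscall, but that at least confirms where each one stopped.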

(after a reboot):
r...@homiebackup10:~# zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
raid3066  32.5T  18.1T  14.4T    55%  ONLINE  -
raid3154  32.5T  18.2T  14.3T    55%  ONLINE  -
raid3155  32.5T  18.7T  13.8T    57%  ONLINE  -
raid3156  32.5T  22.0T  10.5T    67%  ONLINE  -
rpool     59.5G  14.1G  45.4G    23%  ONLINE  -

We are using Silmech Storform iServ R505 machines with 3x Silmech Storform D55J JBOD SAS expanders connected to LSI Logic SAS1068E B3 eSAS cards, all populated with 1.5TB Seagate 7200.11 SATA hard drives. We make a single striped raidz2 pool out of each chassis, giving us ~29TB of storage per 'brick', and we use rsync to copy the data from the machines being backed up.

They're currently running OpenSolaris 2009.06 (snv_111b).

We have had issues with the backplanes on these machines, but this particular machine has been up and running for nearly a year without any problems. It's currently at about 50% capacity on all pools.

I'm not really sure how to proceed from here as far as getting debug information while it's hung like this. I saw someone post about similar issues a few days ago, but I don't see any replies; the thread title is "[zfs-discuss] Problem with resilvering and faulty disk". We've been seeing that issue as well while rebuilding these drives.
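If kernel-side stacks would be useful, I could run some mdb(1) one-liners as root the next time it hangs. This sketch just prints the commands I'd run (PID is a placeholder for one of the stuck zfs creates, e.g. 15245 above); happy to post the actual output:

```shell
#!/bin/sh
# Dry-run sketch: print the mdb -k invocations I'd use on the hung host.
PID=15245   # placeholder: pid of one wedged `zfs create`

STACKS='::stacks -m zfs'    # unique kernel stacks of threads in the zfs module
FINDSTACK="0t${PID}::pid2proc | ::walk thread | ::findstack -v"
SPA='::spa -v'              # per-pool in-kernel state, including raid3155

for dcmd in "$STACKS" "$FINDSTACK" "$SPA"; do
    echo "echo \"$dcmd\" | mdb -k"
done
```

My understanding is that ::stacks deduplicates threads by stack, so if everything is piled up behind one blocked txg sync thread it should show up as a handful of distinct stacks rather than hundreds, but someone please correct me if there's a better set of dcmds to collect.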

Any assistance with this would be greatly appreciated. If you need any information to help troubleshoot, just let me know and I can provide it!

-Jeremy
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss