On May 30, 2012, at 9:25 AM, Antonio S. Cofiño wrote:
> Dear All,
> This may not be the correct mailing list, but I'm having a ZFS issue when
> a disk is failing.
> The system is a Supermicro X8DTH-6F motherboard in a 4U chassis
> (SC847E1-R1400LPB) and an external SAS2 JBOD (SC847E16-RJBOD1).
> This gives a system with a total of 4 backplanes (2x SAS + 2x SAS2), each
> connected to a different HBA (2x LSI 3081E-R (1068 chip) + 2x LSI
> SAS9200-8e (2008 chip)).
> The system has a total of 81 disks (2x SAS (SEAGATE ST3146356SS) + 34
> SATA3 (Hitachi HDS722020ALA330) + 45 SATA6 (Hitachi HDS723020BLA642)).
> The system runs OpenSolaris (snv_134) and normally works fine. All the
> SATA disks are part of the same pool, split into raidz2 vdevs of about
> 11 disks each.
> The issue arises when one of the disks starts to fail and its accesses take
> a long time. After some time (minutes, but I'm not sure) all the disks
> connected to the same HBA start to report errors. This produces a general
> failure in ZFS, making the whole pool unavailable.
> Once the originally failing disk that produces the access errors is
> identified and removed, the pool resilvers with no problem, and all the
> spurious errors produced by the general failure are recovered.
> My question is: is there any way to anticipate this "choking" situation when
> a disk is failing, to avoid the general failure?
> Any help or suggestion is welcome.
The best, proven solution is to not use SATA disks with SAS expanders.
Since that is likely to be beyond your time and budget, consider upgrading to
the latest HBA and expander firmware.
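
To spot the slow disk before its neighbours on the same HBA start reporting
errors, one option is to watch per-device service times. Below is a minimal
sketch, assuming `iostat -xn` output in the usual Solaris column layout. The
function name, the 500 ms threshold, and the sample rows (including the device
names) are made up for illustration. It only reports a suspect disk; it does
not prevent the choke.

```python
def slow_devices(iostat_text, threshold_ms=500.0):
    """Return (device, asvc_t) pairs whose active service time exceeds threshold_ms.

    Assumes the `iostat -xn` layout:
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    """
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner or blank line
        try:
            asvc_t = float(fields[7])  # average active service time, in ms
        except ValueError:
            continue  # column header row
        if asvc_t > threshold_ms:
            flagged.append((fields[10], asvc_t))
    return flagged


# Made-up sample: one healthy disk and one that is taking seconds per I/O.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   12.0    3.0  768.0  192.0  0.0  0.1    0.0    8.4   0   9 c4t5d0
    0.3    0.0   19.2    0.0  0.0  1.0    0.0 3120.7   0 100 c4t9d0
"""

if __name__ == "__main__":
    for dev, ms in slow_devices(SAMPLE):
        print(f"{dev}: asvc_t {ms:.1f} ms")
```

In practice the text would come from periodic samples such as `iostat -xn 10`.
The first report of such a run shows averages since boot, so it is best
skipped.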
ZFS Performance and Training
zfs-discuss mailing list