This may not be the correct mailing list, but I'm having a ZFS issue
when a disk is failing.
The system is a supermicro motherboard X8DTH-6F in a 4U chassis
(SC847E1-R1400LPB) and an external SAS2 JBOD (SC847E16-RJBOD1).
This makes a system with a total of 4 backplanes (2x SAS + 2x SAS2),
each connected to a different HBA (2x LSI 3081E-R (1068 chip) + 2x
LSI SAS9200-8e (2008 chip)).
The system has a total of 81 disks (2x SAS (SEAGATE ST3146356SS) + 34x
SATA3 (Hitachi HDS722020ALA330) + 45x SATA6 (Hitachi HDS723020BLA642)).
The system runs OpenSolaris (snv_134) and normally works fine.
All the SATA disks are part of the same pool, split into raidz2 vdevs
of roughly 11 disks each.
The issue arises when one of the disks starts to fail, producing very
long access times. After some time (minutes, I think, but I'm not
sure), all the disks connected to the same HBA start to report errors.
This cascades into a general ZFS failure, making the whole pool
unavailable.
Once the originally failing disk is identified and removed, the pool
resilvers with no problem, and all the spurious errors produced by the
cascade are recovered.
My question is: is there any way to anticipate this "choking" situation
when a disk is failing, so as to avoid the general failure?
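One idea I've been toying with (just a sketch, not a tested solution) is to watch per-disk average service times from `iostat -xn` and flag any disk that gets far slower than its peers before it chokes the HBA. The threshold and device names below are made-up examples; here I parse a captured sample instead of live output, but on the live system one would feed `iostat -xn <interval>` into the same awk filter:

```shell
#!/bin/sh
# Hypothetical slow-disk watchdog sketch.
# Flags any device whose average service time (asvc_t, field 8 of
# `iostat -xn` data lines) exceeds THRESHOLD milliseconds.
THRESHOLD=500   # ms; purely illustrative, tune to your workload

# Captured sample standing in for `iostat -xn` output:
cat <<'EOF' |
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.5    1.2   12.3   45.6  0.0  0.1    0.2    8.4   0   1 c7t0d0
    0.4    1.1   11.9   44.8  0.0  9.8    0.3 1250.7   0  99 c7t1d0
EOF
awk -v t="$THRESHOLD" 'NR > 1 && $8 > t { print $11, "slow:", $8, "ms" }'
```

Run periodically (cron or an SMF service), this would at least give a warning that one disk is dragging, so it could be offlined by hand before the whole HBA starts timing out.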
Any help or suggestion is welcome.
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
Ciencias de la Computación
Universidad de Cantabria
Escuela de Caminos
Avenida de los Castros, 44
39005 Santander, Spain
Tel: (+34) 942 20 1731
Fax: (+34) 942 20 1703
zfs-discuss mailing list