Hi, I'm trying to build an OpenSolaris storage server, but I'm experiencing regular 
zpool corruption after one or two days of operation.
I'd appreciate it if someone could comment on the hardware I'm using in this setup 
and give me some pointers on how to troubleshoot this.

The machine OpenSolaris is installed on has a Supermicro Intel X7DCT 
motherboard and an LSI22320SE SGL SCSI HBA. An Aberdeen XDAS P6 Series 3U SCSI DAS 
(16 bays with 2 TB Hitachi drives) is attached to the HBA, and the drives are 
configured as pass-through.
I built just one pool, testPool, with a single vdev containing 8 drives in raidz2.
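For reference, this is roughly how I created the pool (reconstructed, I didn't save the exact command; device names are as they appear in the zpool status output below):

```shell
# sketch of the pool creation, not the exact command I ran
zpool create testPool raidz2 \
    c10t0d0 c10t0d1 c10t0d2 c10t0d3 \
    c10t0d4 c10t0d5 c10t0d6 c10t0d7
```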

This is the "zpool status" output after the zpool crash:

  pool: testPool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jan 22 09:29:51 2010
config:

        NAME         STATE     READ WRITE CKSUM
        testPool     UNAVAIL      0     0     0  insufficient replicas
          raidz2     UNAVAIL      1    28     0  insufficient replicas
            c10t0d0  FAULTED      6    95     3  too many errors
            c10t0d1  FAULTED      5    89     3  too many errors
            c10t0d2  ONLINE       2     0     0
            c10t0d3  ONLINE       2     1     0  6K resilvered
            c10t0d4  ONLINE       4     4     0  5.50K resilvered
            c10t0d5  ONLINE       2     8     0  4K resilvered
            c10t0d6  DEGRADED     1     9     3  too many errors
            c10t0d7  ONLINE       3     8     0  3.50K resilvered

errors: 3 data errors, use '-v' for a list


And these are the relevant lines from my /var/adm/messages:

Jan 22 08:02:54 disk    got firmware SCSI bus reset.
Jan 22 08:02:54 disk log info = 0
Jan 22 08:03:07 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8 (mpt0):
Jan 22 08:03:07 disk    Rev. 8 LSI, Inc. 1030 found.
Jan 22 08:03:07 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8 (mpt0):
Jan 22 08:03:07 disk    mpt0 supports power management.
Jan 22 08:03:07 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8 (mpt0):
Jan 22 08:03:07 disk    mpt0 unrecognized capability 0x6.
Jan 22 08:03:10 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8 (mpt0):
Jan 22 08:03:10 disk    mpt0: IOC Operational.
Jan 22 08:03:13 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8,1 (mpt1):
Jan 22 08:03:13 disk    Rev. 8 LSI, Inc. 1030 found.
Jan 22 08:03:13 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8,1 (mpt1):
Jan 22 08:03:13 disk    mpt1 supports power management.
Jan 22 08:03:13 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8,1 (mpt1):
Jan 22 08:03:13 disk    mpt1 unrecognized capability 0x0.
Jan 22 08:03:13 disk scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci10b5,8...@0/pci1000,1...@8,1 (mpt1):
Jan 22 08:03:13 disk    mpt1: IOC Operational.
Jan 22 08:04:50 disk fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 22 08:04:50 disk EVENT-TIME: Fri Jan 22 08:04:50 GMT 2010
Jan 22 08:04:50 disk PLATFORM: X7DCT, CSN: 0123456789, HOSTNAME: disk
Jan 22 08:04:50 disk SOURCE: zfs-diagnosis, REV: 1.0
Jan 22 08:04:50 disk EVENT-ID: 857d4e64-9a2f-e6fb-94c2-9337566aa6c9
Jan 22 08:04:50 disk DESC: The number of I/O errors associated with a ZFS device exceeded
Jan 22 08:04:50 disk         acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 22 08:04:50 disk AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jan 22 08:04:50 disk         will be made to activate a hot spare if available.
Jan 22 08:04:50 disk IMPACT: Fault tolerance of the pool may be compromised.
Jan 22 08:04:50 disk REC-ACTION: Run 'zpool status -x' and replace the bad device.



Can I build a ZFS storage server with this kind of hardware? If so, how can I 
troubleshoot the problem?
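So far, the only diagnostics I know to run are along these lines (a sketch of what I'd try next with the standard OpenSolaris tools; I haven't captured their output yet):

```shell
# Pool-level view: which devices erred and which files are affected
zpool status -xv testPool

# Raw FMA error telemetry: helps distinguish transport errors
# (cabling/HBA) from media errors (bad drives)
fmdump -eV | tail -50

# Per-drive hard/soft/transport error counters
iostat -En

# SCSI bus resets and other mpt driver complaints
grep mpt /var/adm/messages
```

The bus resets in the log make me suspect the SCSI path (cables, termination, or the HBA) rather than eight drives all failing at once, but I don't know how to confirm that.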
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss