I have a VMware 4-based system running Solaris 10 5/09 (s10x_u7wos_08)
as a VM, configured with PCI passthrough so that it controls an
LSI 22320SE SCSI HBA. The HBA is connected to an external Areca
SATA-to-SCSI RAID array in JBOD mode. The drives are 750 GB Seagate
Barracudas (ST3750640AS) and are part of a raidz2 pool in ZFS.
The Areca controller reports that a single drive has failed, and Solaris
notices at the same time. So far, so good.
But immediately afterwards, the other drives on the array start
acting frightened, as if whatever the first drive had is contagious.
Two of them even go offline temporarily. What could cause this to
happen?
Of course, the pool goes offline. However, after a reboot, the pool comes
up with only that first failed drive in a "cannot open" state. Here is
the relevant part of the log:
Aug 4 16:34:51 therealvault fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 4 16:34:51 therealvault EVENT-TIME: Tue Aug 4 16:34:51 EDT 2009
Aug 4 16:34:51 therealvault PLATFORM: VMware Virtual Platform, CSN: VMware-42 0d 39 ad 2b 8f 11 e0-5a 2f 74 a0 ad 7a f4 86, HOSTNAME: therealvault
Aug 4 16:34:51 therealvault SOURCE: zfs-diagnosis, REV: 1.0
Aug 4 16:34:51 therealvault EVENT-ID: c8bc8758-7a63-ecd3-b550-e3181e9c64be
Aug 4 16:34:51 therealvault DESC: The number of checksum errors associated with a ZFS device
Aug 4 16:34:51 therealvault exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Aug 4 16:34:51 therealvault AUTO-RESPONSE: The device has been marked as degraded. An attempt
Aug 4 16:34:51 therealvault will be made to activate a hot spare if available.
Aug 4 16:34:51 therealvault IMPACT: Fault tolerance of the pool may be compromised.
Aug 4 16:34:51 therealvault REC-ACTION: Run 'zpool status -x' and replace the bad device.
Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0 (mpt2):
Aug 4 16:36:05 therealvault Disconnected command timeout for Target 1
Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:05 therealvault SCSI transport failed: reason 'reset': retrying command
Aug 4 16:36:08 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:08 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:10 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:10 therealvault disk not responding to selection
Aug 4 16:36:12 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:12 therealvault disk not responding to selection
Aug 4 16:36:14 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:14 therealvault disk not responding to selection
Aug 4 16:36:16 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:16 therealvault disk not responding to selection
Aug 4 16:36:18 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:18 therealvault disk not responding to selection
Aug 4 16:36:20 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug 4 16:36:20 therealvault disk not responding to selection
Now the others are getting nervous:
Aug 4 16:36:20 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug 4 16:36:20 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,5 (sd32):
Aug 4 16:36:21 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,2 (sd37):
Aug 4 16:36:21 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,3 (sd38):
Aug 4 16:36:21 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,0 (sd27):
Aug 4 16:36:21 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug 4 16:36:22 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,6 (sd33):
Aug 4 16:36:22 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,7 (sd34):
Aug 4 16:36:22 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,4 (sd39):
Aug 4 16:36:22 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,5 (sd40):
Aug 4 16:36:23 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug 4 16:36:23 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug 4 16:36:23 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,6 (sd41):
Aug 4 16:36:23 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:24 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,7 (sd42):
Aug 4 16:36:24 therealvault SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:24 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
And these two were frightened to "death", at least until I rebooted:
Aug 4 16:37:35 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug 4 16:37:35 therealvault drive offline
Aug 4 16:37:35 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,0 (sd27):
Aug 4 16:37:35 therealvault drive offline
Finally, the log winds down with these errors:
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,5 (sd32):
Aug 4 16:48:59 therealvault SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug 4 16:48:59 therealvault disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug 4 16:48:59 therealvault SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug 4 16:48:59 therealvault disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug 4 16:48:59 therealvault SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug 4 16:48:59 therealvault disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug 4 16:48:59 therealvault SYNCHRONIZE CACHE command failed (5)
Aug 4 16:49:00 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug 4 16:49:00 therealvault disk not responding to selection
Aug 4 16:49:00 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug 4 16:49:00 therealvault SYNCHRONIZE CACHE command failed (5)
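To make the order of the cascade easier to see, the warnings can be grouped by device instance. What follows is only a rough Python sketch, not a Solaris tool. It assumes the unwrapped /var/adm/messages layout, where a "WARNING: <path> (sdNN):" header line is followed by one line carrying the message text. The regexes and the helper names summarize and format_report are my own, chosen for illustration.

```python
import re

# Header line of a scsi warning in the (assumed) unwrapped syslog layout, e.g.
#   Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/... (sd35):
# Group 1 is the timestamp, group 2 the instance name (sd35, mpt2, ...).
HEADER = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+scsi:.*WARNING:.*\(([a-z]+\d+)\):\s*$")

# The following line that carries the message text for that header, e.g.
#   Aug 4 16:36:05 therealvault SCSI transport failed: reason 'reset': retrying command
BODY = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+(.*\S)\s*$")


def summarize(lines):
    """Group warning messages by device instance.

    Returns a dict mapping each instance to (timestamp of its first
    warning, list of message texts). Dicts keep insertion order
    (Python 3.7+), so iterating the result walks the devices in the
    order they started complaining.
    """
    summary = {}
    pending = None  # (timestamp, instance) waiting for its message line
    for line in lines:
        header = HEADER.match(line)
        if header:
            # A header with no message (e.g. a cut-off excerpt) is simply
            # replaced by the next header.
            pending = (header.group(1), header.group(2))
            continue
        if pending is not None:
            body = BODY.match(line)
            if body:
                timestamp, instance = pending
                _, messages = summary.setdefault(instance, (timestamp, []))
                messages.append(body.group(1))
            pending = None
    return summary


def format_report(summary):
    """Render one line per instance: first timestamp, name, warning count."""
    return [
        f"{first}  {instance:<5} {len(messages):3d} warning(s), first: {messages[0]}"
        for instance, (first, messages) in summary.items()
    ]
```

To use it, open the log with `with open("/var/adm/messages") as f:` and print each line of `format_report(summarize(f))`. On the unwrapped excerpt above, mpt2 and sd35 should come first at 16:36:05. All the other targets then first appear between 16:36:20 and 16:36:24.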
--
Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss