I have a VMware 4-based system running Solaris 10 5/09 (s10x_u7wos_08) as a VM, configured with PCI passthrough to give it control of an LSI 22320SE SCSI HBA. The HBA is connected to an external Areca SATA-to-SCSI RAID array in JBOD mode. The drives are 750 GB Seagate Barracudas (ST3750640AS) and are part of a raidz2 in ZFS.

The Areca controller reports a single drive has failed and Solaris notices it simultaneously. So far, so good.

But immediately after that, the other drives on this array start acting frightened, as if whatever the first drive had were contagious. Two others even go offline temporarily. What could cause this to happen?

Of course, the pool goes offline. However, on reboot, the pool comes up with just that first failed drive in a "cannot open" state.


Aug 4 16:34:51 therealvault fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Aug  4 16:34:51 therealvault EVENT-TIME: Tue Aug  4 16:34:51 EDT 2009
Aug 4 16:34:51 therealvault PLATFORM: VMware Virtual Platform, CSN: VMware-42 0d 39 ad 2b 8f 11 e0-5a 2f 74 a0 ad 7a f4 86, HOSTNAME: therealvault
Aug  4 16:34:51 therealvault SOURCE: zfs-diagnosis, REV: 1.0
Aug  4 16:34:51 therealvault EVENT-ID: c8bc8758-7a63-ecd3-b550-e3181e9c64be
Aug 4 16:34:51 therealvault DESC: The number of checksum errors associated with a ZFS device
Aug 4 16:34:51 therealvault exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Aug 4 16:34:51 therealvault AUTO-RESPONSE: The device has been marked as degraded. An attempt
Aug  4 16:34:51 therealvault will be made to activate a hot spare if available.
Aug 4 16:34:51 therealvault IMPACT: Fault tolerance of the pool may be compromised.
Aug 4 16:34:51 therealvault REC-ACTION: Run 'zpool status -x' and replace the bad device.
Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0 (mpt2):
Aug  4 16:36:05 therealvault    Disconnected command timeout for Target 1
Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:05 therealvault    SCSI transport failed: reason 'reset': retrying command
Aug 4 16:36:08 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:08 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:10 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:10 therealvault    disk not responding to selection
Aug 4 16:36:12 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:12 therealvault    disk not responding to selection
Aug 4 16:36:14 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:14 therealvault    disk not responding to selection
Aug 4 16:36:16 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:16 therealvault    disk not responding to selection
Aug 4 16:36:18 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:18 therealvault    disk not responding to selection
Aug 4 16:36:20 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:20 therealvault    disk not responding to selection

Now the others are getting nervous:
Aug 4 16:36:20 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug  4 16:36:20 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,5 (sd32):
Aug  4 16:36:21 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,2 (sd37):
Aug  4 16:36:21 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,3 (sd38):
Aug  4 16:36:21 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:21 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,0 (sd27):
Aug  4 16:36:21 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug  4 16:36:22 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,6 (sd33):
Aug  4 16:36:22 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,7 (sd34):
Aug  4 16:36:22 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:22 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,4 (sd39):
Aug  4 16:36:22 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,5 (sd40):
Aug  4 16:36:23 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug  4 16:36:23 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug  4 16:36:23 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:23 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,6 (sd41):
Aug  4 16:36:23 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:24 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,7 (sd42):
Aug  4 16:36:24 therealvault    SCSI transport failed: reason 'incomplete': retrying command
Aug 4 16:36:24 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):

And these two were frightened to "death," at least until I rebooted:
Aug 4 16:37:35 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug  4 16:37:35 therealvault    drive offline
Aug 4 16:37:35 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,0 (sd27):
Aug  4 16:37:35 therealvault    drive offline


Now the log is winding down with errors here:
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,5 (sd32):
Aug  4 16:48:59 therealvault    SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug  4 16:48:59 therealvault    disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,4 (sd31):
Aug  4 16:48:59 therealvault    SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug  4 16:48:59 therealvault    disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,3 (sd30):
Aug  4 16:48:59 therealvault    SYNCHRONIZE CACHE command failed (5)
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug  4 16:48:59 therealvault    disk not responding to selection
Aug 4 16:48:59 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,2 (sd29):
Aug  4 16:48:59 therealvault    SYNCHRONIZE CACHE command failed (5)
Aug 4 16:49:00 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug  4 16:49:00 therealvault    disk not responding to selection
Aug 4 16:49:00 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug  4 16:49:00 therealvault    SYNCHRONIZE CACHE command failed (5)
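
For anyone trying to follow the order in which the devices complained, a short script along these lines could group the warnings by driver instance. It is only a rough sketch: it assumes the two-line layout seen above (a WARNING line ending in "(sdNN):" followed by one message line), and the names summarize, HEADER, LINE and SAMPLE are made up for illustration. The device paths in SAMPLE are copied as they appear in this post, elisions included.

```python
import re
from collections import defaultdict

# Header line: "... WARNING: <device path> (sdNN):" (assumed layout)
HEADER = re.compile(r"WARNING: \S+ \((\w+)\):\s*$")
# Any syslog line: "<Mon> <day> <HH:MM:SS> <host> <rest>"
LINE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def summarize(lines):
    """Map each driver instance (e.g. 'sd35') to a list of (time, message)."""
    events = defaultdict(list)
    pending = None  # instance named by the most recent header line
    for raw in lines:
        m = LINE.match(raw.rstrip("\n"))
        if not m:
            pending = None  # blank or unrelated line ends any open header
            continue
        time, rest = m.group(3), m.group(5).strip()
        header = HEADER.search(rest)
        if header:
            pending = header.group(1)
        elif pending:
            # The line right after a header carries the actual message.
            events[pending].append((time, rest))
            pending = None
    return dict(events)


# A few lines taken from the log above (paths elided exactly as posted).
SAMPLE = """\
Aug 4 16:36:05 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0 (mpt2):
Aug  4 16:36:05 therealvault    Disconnected command timeout for Target 1
Aug 4 16:36:10 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@1,0 (sd35):
Aug  4 16:36:10 therealvault    disk not responding to selection
Aug 4 16:37:35 therealvault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci15ad,7...@17,1/pci1000,1...@0/s...@0,1 (sd28):
Aug  4 16:37:35 therealvault    drive offline
""".splitlines()

if __name__ == "__main__":
    for instance, items in summarize(SAMPLE).items():
        for time, message in items:
            print(instance, time, message)
```

Run against the full log file instead of SAMPLE, this would show at a glance that sd35 is the only device with errors before 16:36:20 and that every other target starts failing within a few seconds afterwards.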


--

Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
