James: We are running Phase 16 firmware on our LSISAS3801E controllers, and
have also tried the recently released Phase 17, but it didn't help. All
firmware NVRAM settings are at their defaults. Whenever we put the disks behind
this controller under load (e.g. scrubbing, or a recursive ls on a large ZFS
filesystem), we get the following series of log entries at random intervals:
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0
(mpt0):
mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0
(mpt0):
mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0
(mpt0):
mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0
(mpt0):
mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):
incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0: IOC Operational.
It seems to be timing out while accessing a disk, retrying, giving up, and then
doing a bus reset. Is that the right reading?
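
To make the hex values in the log easier to compare, here is a minimal sketch
that splits them into bit fields. It assumes the usual Fusion-MPT layout for
SAS log info words (bus type in the top 4 bits, then a 4-bit originator, an
8-bit code and a 16-bit subcode) and that 0x8000 in the IOC status is the
"log info available" flag. The function names are made up for illustration,
and the meaning of each code still has to be looked up in LSI's MPI headers.

```python
def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo word into fields (assumed layout)."""
    return {
        "bus_type": (value >> 28) & 0xF,    # top nibble
        "originator": (value >> 24) & 0xF,  # next nibble
        "code": (value >> 16) & 0xFF,       # 8-bit code
        "subcode": value & 0xFFFF,          # low 16 bits
    }


def split_ioc_status(value):
    """Separate the assumed 0x8000 log-info flag from the status code."""
    return {
        "log_info_available": bool(value & 0x8000),
        "status": value & 0x7FFF,
    }


if __name__ == "__main__":
    # Values copied from the log excerpt above.
    for raw in (0x31110B00, 0x31112000):
        fields = decode_loginfo(raw)
        print(hex(raw), {k: hex(v) for k, v in fields.items()})
    print(split_ioc_status(0x804B))
```

Both log info words in the excerpt share the same bus type, originator and
code under this layout; only the subcode differs.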
This is happening with random disks behind the controller and on multiple
systems with the same hardware config. We are running snv_118 right now and
were hoping this was some magic mpt-related "bug" that was going to be fixed in
snv_125, but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBODs,
which is a dense configuration, but one it should be able to handle. We are
also using wide raidz2 vdevs (22 disks each, one per JBOD), which admittedly is
slower performance-wise, but the goal here is density, not performance. I would
have hoped that the system would just "slow down" if there was IO contention,
rather than experience things like bus resets.
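
One way to check whether the affected disks really are random is to count the
warnings per sd instance and per mpt target. The following is a rough sketch
only: the function name and the SAMPLE text are illustrative, the regular
expressions are written against the message wording in the excerpt above, and
the wrapped lines are joined before matching because the log text is split
across lines.

```python
import re
from collections import Counter

# Short sample in the same shape as the excerpt above (paths left elided).
SAMPLE = """\
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Log info 0x31110b00 received for target 40.
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):
incomplete read- retrying
"""


def tally_events(log_text):
    """Count 'incomplete read' warnings per disk and log-info hits per target."""
    # Collapse line breaks so messages wrapped over several lines still match.
    flat = " ".join(log_text.split())
    retries = Counter(re.findall(r"\((sd\d+)\): incomplete read", flat))
    targets = Counter(
        int(t)
        for t in re.findall(
            r"Log info 0x[0-9a-fA-F]+ received for target (\d+)", flat
        )
    )
    return retries, targets


if __name__ == "__main__":
    retries, targets = tally_events(SAMPLE)
    print("incomplete reads per disk:", dict(retries))
    print("log info events per target:", dict(targets))
```

Run against a full messages file, a roughly even spread across disks and
targets would support the "random disks" observation, while a few hot
entries would point at specific drives or slots instead.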
Your thoughts?
--
This message posted from opensolaris.org