Frank Fischer wrote:
> After having massive problems with a supermicro X7DBE box using AOC-SAT2-MV8
> Marvell controllers and opensolaris snv79 (same as described here:
> http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1) we just
> started over using new hardware and opensolaris 2008.05 upgraded to snv94. We
> again used a supermicro X7DBE, but now with two LSI SAS3081E SAS controllers.
> And guess what? Now we get these error messages in /var/adm/messages:
>
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd11):
> Aug 11 18:20:52 thumper2 Error for Command: read(10) Error Level: Retryable
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Requested Block: 1423173120 Error Block: 1423173120
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: WD-WCAP
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
>
> Along with these messages there are a lot of messages like this:
>
> Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 18:20:51 thumper2 Log info 0x31123000 received for target 5.
> Aug 11 18:20:51 thumper2 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>
> I could believe in one faulty disk, but not two:
>
> Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 17:47:47 thumper2 Log info 0x31123000 received for target 4.
> Aug 11 17:47:47 thumper2 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd10):
> Aug 11 17:47:48 thumper2 Error for Command: read(10) Error Level: Retryable
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Requested Block: 252165120 Error Block: 252165120
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
>
> Does somebody know what is going on here?
> I have checked the disks with iostat -En:
>
> -bash-3.2# iostat -En
> ...
> c4t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: FUJITSU Product: MBA3073RC Revision: 0103 Serial No:
> Size: 73.54GB <73543163904 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t5d0 Soft Errors: 4 Hard Errors: 24 Transport Errors: 179
> Vendor: ATA Product: ST3750330NS Revision: SN04 Serial No:
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t6d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c6t4d0 Soft Errors: 6 Hard Errors: 17 Transport Errors: 466
> Vendor: ATA Product: ST3750640NS Revision: G Serial No:
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6
> Illegal Request: 0 Predictive Failure Analysis: 0
> c6t5d0 Soft Errors: 2 Hard Errors: 23 Transport Errors: 539
> Vendor: ATA Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2
> Illegal Request: 0 Predictive Failure Analysis: 0
>
> I have checked the drives with smartctl:
>
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   115   075   006    Pre-fail  Always       -       94384069
>   3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       263091894
>   9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       4050
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       22
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   068   062   045    Old_age   Always       -       32 (Lifetime Min/Max 30/34)
> 194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 25 0 0)
> 195 Hardware_ECC_Recovered  0x001a   065   056   000    Old_age   Always       -       173161329
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
>
> But with no UDMA_CRC_Errors I believe the disks are fine.
>
> Message was edited by:
> a0040
>
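As an aside, the per-device counters in the quoted iostat -En output can be summarized with a small script to see at a glance which disks accumulate transport errors. The following is only a minimal sketch and not something from the thread: the function name `transport_error_devices` and the regular expression are assumptions made for illustration, and the sample text is an abbreviated copy of the summary lines quoted above.

```python
import re

# Matches the per-device summary line of `iostat -En`, for example:
# "c4t5d0 Soft Errors: 4 Hard Errors: 24 Transport Errors: 179"
DEVICE_LINE = re.compile(
    r"(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def transport_error_devices(iostat_output):
    """Return {device: transport error count} for devices with a nonzero count.

    Hypothetical helper for illustration; it only looks at the summary line
    of each device and ignores the Vendor/Size/Media Error detail lines.
    """
    result = {}
    for match in DEVICE_LINE.finditer(iostat_output):
        count = int(match.group("transport"))
        if count > 0:
            result[match.group("dev")] = count
    return result


# Abbreviated sample taken from the summary lines quoted in this thread.
sample = """\
c4t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c4t5d0 Soft Errors: 4 Hard Errors: 24 Transport Errors: 179
c4t6d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t4d0 Soft Errors: 6 Hard Errors: 17 Transport Errors: 466
c6t5d0 Soft Errors: 2 Hard Errors: 23 Transport Errors: 539
"""

if __name__ == "__main__":
    for dev, count in sorted(transport_error_devices(sample).items()):
        print(dev, count)
```

On the quoted data this singles out c4t5d0, c6t4d0 and c6t5d0, which sit on two different controllers and come from different drive vendors.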
Could it be that you have faulty cables? I'm using an LSI SAS controller (4 port variant) on SPARC, and it works like a charm. The only problem I'm observing is during boot time: the mpt driver is resetting/initializing all buses twice. This takes quite some time, but finally the machine comes up without a problem. The messages appearing in syslog are of the following form:

Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin initiator SCSI ID now 7
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin Rev. 1 LSI, Inc. 1064 found.
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin mpt2 supports power management.
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin mpt2 Firmware version v0.3.1e.0 (IR)
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin mpt2: IOC Operational.
Aug 12 11:47:43 azalin scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:43 azalin mpt2: Initiator WWNs: 0x500062b000005e88-0x500062b000005e8b

But as I said - once the system is up and running it works perfectly.

- Thomas