Frank Fischer wrote:
> After having massive problems with a Supermicro X7DBE box using AOC-SAT2-MV8 
> Marvell controllers and OpenSolaris snv79 (the same as described here: 
> http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1), we 
> started over with new hardware and OpenSolaris 2008.05 upgraded to snv94. We 
> again used a Supermicro X7DBE, but now with two LSI SAS3081E SAS controllers. 
> And guess what? Now we get these error messages in /var/adm/messages:
> 
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
> PROTECTED],0 (sd11):
> Aug 11 18:20:52 thumper2        Error for Command: read(10)                
> Error Level: Retryable
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
> 1423173120                Error Block: 1423173120
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA           
>                      Serial Number:      WD-WCAP
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
> Unit_Attention
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
> reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> 
> Along with these messages there are a lot of messages like this:
> 
> Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 18:20:51 thumper2        Log info 0x31123000 received for target 5.
> Aug 11 18:20:51 thumper2        scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0xc
> 
> 
> I could believe that one disk is faulty, but not two:
> 
> Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 17:47:47 thumper2        Log info 0x31123000 received for target 4.
> Aug 11 17:47:47 thumper2        scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0xc
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
> PROTECTED],0 (sd10):
> Aug 11 17:47:48 thumper2        Error for Command: read(10)                
> Error Level: Retryable
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
> 252165120                 Error Block: 252165120
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA           
>                      Serial Number:
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
> Unit_Attention
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
> reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
> 
> 
> Does somebody know what is going on here?
> I have checked the disks with iostat -En:
> 
> -bash-3.2# iostat -En
> ...
> c4t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
> Vendor: FUJITSU  Product: MBA3073RC        Revision: 0103 Serial No:  
> Size: 73.54GB <73543163904 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c4t5d0           Soft Errors: 4 Hard Errors: 24 Transport Errors: 179 
> Vendor: ATA      Product: ST3750330NS      Revision: SN04 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c4t6d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
> Vendor: ATA      Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c6t4d0           Soft Errors: 6 Hard Errors: 17 Transport Errors: 466 
> Vendor: ATA      Product: ST3750640NS      Revision: G    Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c6t5d0           Soft Errors: 2 Hard Errors: 23 Transport Errors: 539 
> Vendor: ATA      Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> 
> I have checked the drives with smartctl:
> 
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  
> WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   115   075   006    Pre-fail  Always      
>  -       94384069
>   3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always      
>  -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always      
>  -       15
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always      
>  -       0
>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always      
>  -       263091894
>   9 Power_On_Hours          0x0032   096   096   000    Old_age   Always      
>  -       4050
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always      
>  -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always      
>  -       22
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always      
>  -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always      
>  -       0
> 190 Airflow_Temperature_Cel 0x0022   068   062   045    Old_age   Always      
>  -       32 (Lifetime Min/Max 30/34)
> 194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always      
>  -       32 (0 25 0 0)
> 195 Hardware_ECC_Recovered  0x001a   065   056   000    Old_age   Always      
>  -       173161329
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always      
>  -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline     
>  -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always      
>  -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline     
>  -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always      
>  -       0
> 
> But with a UDMA_CRC_Error_Count of 0, I believe the disks are fine.
> 

Could it be that you have faulty cables? I'm using an LSI SAS controller
(4 port variant) on SPARC, and it works like a charm.
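
As a rough way to check the cabling idea against the iostat -En output
quoted above, one could tabulate the transport-error counters per device
and per controller. The following is only a minimal Python sketch: it
assumes the raw iostat -En layout shown in the quoted message (without the
"> " quote prefix), and the function names transport_errors and
by_controller are hypothetical, not part of any Solaris tool.

```python
import re

# First line of each device record in `iostat -En` output, for example:
# "c4t5d0  Soft Errors: 4 Hard Errors: 24 Transport Errors: 179"
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def transport_errors(iostat_text):
    """Return {device: count} for devices with a nonzero transport-error count."""
    result = {}
    for line in iostat_text.splitlines():
        match = HEADER.match(line.strip())
        if match and int(match.group("transport")) > 0:
            result[match.group("dev")] = int(match.group("transport"))
    return result


def by_controller(errors):
    """Sum transport errors per controller prefix (the "cN" part of cNtNdN)."""
    totals = {}
    for dev, count in errors.items():
        prefix = re.match(r"c\d+", dev)
        key = prefix.group(0) if prefix else dev  # fall back to the full name
        totals[key] = totals.get(key, 0) + count
    return totals


# Device header lines copied from the quoted iostat -En output; the Vendor
# line is included to show that non-header lines are skipped.
SAMPLE = """\
c4t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MBA3073RC        Revision: 0103 Serial No:
c4t5d0           Soft Errors: 4 Hard Errors: 24 Transport Errors: 179
c4t6d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t4d0           Soft Errors: 6 Hard Errors: 17 Transport Errors: 466
c6t5d0           Soft Errors: 2 Hard Errors: 23 Transport Errors: 539
"""

if __name__ == "__main__":
    errors = transport_errors(SAMPLE)
    print(errors)
    print(by_controller(errors))
```

If the nonzero counters cluster on particular ports or on one controller,
that would be at least consistent with a cable or backplane problem rather
than several disks failing at once, though it does not prove it.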

The only problem I'm observing is during boot time: the mpt driver is
resetting/initializing all buses twice. This takes quite some time, but
finally the machine comes up without a problem. The messages appearing
in syslog are of the following form:
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin  initiator SCSI ID now 7
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin  Rev. 1 LSI, Inc. 1064 found.
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin  mpt2 supports power management.
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin  mpt2 Firmware version v0.3.1e.0 (IR)
Aug 12 11:47:28 azalin scsi: [ID 365881 kern.notice]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:28 azalin  mpt2: IOC Operational.
Aug 12 11:47:43 azalin scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],700000/[EMAIL PROTECTED] (mpt2):
Aug 12 11:47:43 azalin  mpt2: Initiator WWNs:
0x500062b000005e88-0x500062b000005e8b

But as I said - once the system is up and running it works perfectly.

- Thomas
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
