Hi,

I would like to add yet another mpt timeout report. The system suddenly became slow, which I noticed because some Linux clients were complaining about NFS server timeouts, and after some time I saw a lot of bus reset messages in the /var/adm/messages file. I took a quick look at the JBOD chassis and one of the disks had its light on steadily. After I physically removed this disk, the system started responding again and the resilver process kicked in, because a spare disk took the place of the removed disk, as seen with zpool status -v:
zpool status -v DATAPOOL04
  pool: DATAPOOL04
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 1h40m, 8.26% done, 18h32m to go
config:

        NAME           STATE     READ WRITE CKSUM
        DATAPOOL04     DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c5t27d0    ONLINE       0     0     0  105M resilvered
            c5t29d0    ONLINE       0     0     0  105M resilvered
            c5t30d0    ONLINE       0     0     0  105M resilvered
            spare      DEGRADED     0     0     0
              c5t31d0  REMOVED      0  423K     0
              c5t28d0  ONLINE       0     0     0  9.83G resilvered
            c5t32d0    ONLINE       0     0     0  105M resilvered
        spares
          c5t28d0      INUSE     currently in use

errors: No known data errors

At this moment the system is resilvering, but the messages regarding the disks/disk controller still appear in the log. Could these messages appear because the resilver process is a heavy one, or are more disks probably affected?

In cases such as this one, what is the best procedure to follow?

* shut down the server and the JBOD, including a power off/power on, and see how it goes
* replace the HBA/disk?
* other?

Thanks for your time, and if any other information is required (even ssh access can be granted) please feel free to ask.

Best regards,
Bruno Sousa

System specs:

* OpenSolaris snv_101b, with two Dual-Core AMD CPUs and 16 GB RAM
* LSI Logic SAS1068E, revision B3, MPT Rev 105, Firmware Rev 011a0000
* 24 disks are attached to this HBA; the disks are Seagate SATA 1 TB "Enterprise" class (ATA-ST31000340NS-SN06-931.51GB)
* the LSI HBA is connected with one SFF-8087 connector cable (SAS 846EL1 BP 1-Port Internal Cascading Cable) to a Supermicro SC 846 chassis with a SAS/SATA expander backplane with a single LSI SASX36 expander chip

/var/adm/messages content:

Dec 7 13:57:12 san01 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@17,0 (sd18):
Dec 7 13:57:12 san01 Error for Command: write(10) Error Level: Retryable
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Requested Block: 48696432 Error Block: 48696432
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:15 san01 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec 7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:15 san01 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@15,0 (sd16):
Dec 7 13:57:45 san01 Error for Command: write(10) Error Level: Retryable
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Requested Block: 445125208 Error Block: 445125208
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:50 san01 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec 7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:50 san01 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):

iostat -En results:

Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 125 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t22d0 Soft Errors: 18 Hard Errors: 106 Transport Errors: 686
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 106 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t23d0 Soft Errors: 18 Hard Errors: 80 Transport Errors: 339
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 80 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t24d0 Soft Errors: 18 Hard Errors: 59 Transport Errors: 228
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 59 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t25d0 Soft Errors: 18 Hard Errors: 55 Transport Errors: 219
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 55 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t26d0 Soft Errors: 18 Hard Errors: 63 Transport Errors: 249
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 63 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t27d0 Soft Errors: 18 Hard Errors: 11 Transport Errors: 274
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 182 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t29d0 Soft Errors: 18 Hard Errors: 8 Transport Errors: 201
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 8 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t30d0 Soft Errors: 18 Hard Errors: 10 Transport Errors: 249
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t31d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 115
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12
Illegal Request: 4 Predictive Failure Analysis: 0
c5t32d0 Soft Errors: 18 Hard Errors: 11 Transport Errors: 222
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 11 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
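
To help judge whether more than one disk is involved, the log and the iostat -En output above could be summarised per target and per device. The following is only a minimal Python sketch, assuming the message and header formats exactly as they appear in this report; the function names tally_targets and rank_by_transport_errors are made up for illustration and are not part of any Solaris tool.

```python
import re
from collections import Counter

# Matches mpt lines such as "Log info 0x31123000 received for target 21."
TARGET_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")

# Matches iostat -En header lines such as
# "c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255"
IOSTAT_RE = re.compile(
    r"(c\d+t\d+d\d+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+"
    r"Transport Errors:\s*(\d+)"
)


def tally_targets(log_text):
    """Count mpt 'Log info' messages per target ID."""
    return Counter(int(m.group(2)) for m in TARGET_RE.finditer(log_text))


def rank_by_transport_errors(iostat_text):
    """Return (device, soft, hard, transport) tuples, highest transport count first."""
    rows = [
        (dev, int(soft), int(hard), int(transport))
        for dev, soft, hard, transport in IOSTAT_RE.findall(iostat_text)
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Small samples copied from the excerpts in this report.
    sample_log = (
        "Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.\n"
        "Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.\n"
        "Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.\n"
    )
    sample_iostat = (
        "c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255\n"
        "c5t31d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 115\n"
        "c5t22d0 Soft Errors: 18 Hard Errors: 106 Transport Errors: 686\n"
    )
    print(tally_targets(sample_log).most_common())
    for row in rank_by_transport_errors(sample_iostat):
        print(row)
```

Fed with the full iostat -En output above, such a ranking would put c5t28d0 (1255 transport errors) first among the devices whose header line is shown; that is the spare currently in use.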
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss