Hi,

I would like to add yet another mpt timeout report.
Suddenly the system started to get slow. This was noticeable because
some Linux clients were complaining about NFS server timeouts, and
after some time I saw a lot of bus reset messages in the
/var/adm/messages file.
I quickly took a look at the JBOD chassis, and one of the disks had a
steady (fixed) light. After I physically removed this disk, the system
started responding again and the resilver process kicked in, because a
spare disk took the place of the disconnected disk, as seen in the
output of zpool status -v:

 zpool status -v DATAPOOL04
  pool: DATAPOOL04
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 1h40m, 8.26% done, 18h32m to go
config:

        NAME           STATE     READ WRITE CKSUM
        DATAPOOL04     DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c5t27d0    ONLINE       0     0     0  105M resilvered
            c5t29d0    ONLINE       0     0     0  105M resilvered
            c5t30d0    ONLINE       0     0     0  105M resilvered
            spare      DEGRADED     0     0     0
              c5t31d0  REMOVED      0  423K     0
              c5t28d0  ONLINE       0     0     0  9.83G resilvered
            c5t32d0    ONLINE       0     0     0  105M resilvered
        spares
          c5t28d0      INUSE     currently in use

errors: No known data errors

At the moment the system is resilvering, but the disk/disk controller
messages still appear in the log. Could these messages be caused by
the resilver process being I/O-heavy, or are more disks probably
affected?
In a case such as this one, what is the best procedure to follow?

    * shut down the server and the JBOD, including a full power
      off/power on, and see how it goes
    * replace the HBA or the disk?
    * something else?

Thanks for your time. If any other information is required (even SSH
access can be granted), please feel free to ask.

Best regards,
Bruno Sousa



System specs:

    * OpenSolaris snv_101b, with two dual-core AMD CPUs and 16 GB RAM
    * LSI Logic SAS1068E, revision B3, MPT Rev 105, Firmware Rev 011a0000
    * 24 disks are attached to this HBA; the disks are Seagate SATA 1 TB
      "Enterprise" class (ATA-ST31000340NS-SN06-931.51GB)
    * the LSI HBA is connected with one SFF-8087 cable (SAS 846EL1
      BP 1-Port Internal Cascading Cable) to a Supermicro SC846 chassis
      with a SAS/SATA expander backplane with a single LSI SASX36
      expander chip


/var/adm/messages content

Dec  7 13:57:12 san01 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@17,0 (sd18):
Dec  7 13:57:12 san01      Error for Command: write(10)              Error Level: Retryable
Dec  7 13:57:12 san01 scsi: [ID 107833 kern.notice]        Requested Block: 48696432                  Error Block: 48696432
Dec  7 13:57:12 san01 scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:
Dec  7 13:57:12 san01 scsi: [ID 107833 kern.notice]        Sense Key: Unit_Attention
Dec  7 13:57:12 san01 scsi: [ID 107833 kern.notice]        ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec  7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:15 san01      mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec  7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:15 san01      mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:45 san01      Log info 0x31123000 received for target 21.
Dec  7 13:57:45 san01      scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec  7 13:57:45 san01 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@15,0 (sd16):
Dec  7 13:57:45 san01      Error for Command: write(10)              Error Level: Retryable
Dec  7 13:57:45 san01 scsi: [ID 107833 kern.notice]        Requested Block: 445125208                 Error Block: 445125208
Dec  7 13:57:45 san01 scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:
Dec  7 13:57:45 san01 scsi: [ID 107833 kern.notice]        Sense Key: Unit_Attention
Dec  7 13:57:45 san01 scsi: [ID 107833 kern.notice]        ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec  7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:50 san01      mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec  7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:50 san01      mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec  7 13:57:52 san01      Log info 0x31123000 received for target 28.
Dec  7 13:57:52 san01      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  7 13:57:52 san01 scsi: [ID 365881 kern.notice] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
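
To see whether the resets are concentrated on a few devices or spread across the bus, the mpt "Log info ... received for target N" lines in the excerpt above can be tallied per target. This is a minimal sketch in Python, assuming the message format shown in that excerpt; the helper name count_loginfo_by_target is hypothetical, and the script only counts messages, it does not decode what IOCLogInfo 0x31123000 means.

```python
import re
from collections import Counter

# Matches mpt driver lines like the ones in the excerpt, e.g.
# "Log info 0x31123000 received for target 21."
# (format assumed from the pasted /var/adm/messages content)
LOGINFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)\.")


def count_loginfo_by_target(text):
    """Hypothetical helper: map (loginfo, target) -> number of occurrences."""
    counts = Counter()
    for m in LOGINFO_RE.finditer(text):
        # Normalise the hex code to lowercase so 0x3112ABCD and 0x3112abcd merge.
        counts[(m.group(1).lower(), int(m.group(2)))] += 1
    return counts
```

For example, calling count_loginfo_by_target() on the contents of /var/adm/messages and then .most_common() on the result would list the noisiest targets first.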


iostat -En results

Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 125 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t22d0          Soft Errors: 18 Hard Errors: 106 Transport Errors: 686
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 106 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t23d0          Soft Errors: 18 Hard Errors: 80 Transport Errors: 339
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 80 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t24d0          Soft Errors: 18 Hard Errors: 59 Transport Errors: 228
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 59 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t25d0          Soft Errors: 18 Hard Errors: 55 Transport Errors: 219
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 55 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t26d0          Soft Errors: 18 Hard Errors: 63 Transport Errors: 249
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 63 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t27d0          Soft Errors: 18 Hard Errors: 11 Transport Errors: 274
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t28d0          Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 182 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t29d0          Soft Errors: 18 Hard Errors: 8 Transport Errors: 201
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 8 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t30d0          Soft Errors: 18 Hard Errors: 10 Transport Errors: 249
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t31d0          Soft Errors: 12 Hard Errors: 0 Transport Errors: 115
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12
Illegal Request: 4 Predictive Failure Analysis: 0
c5t32d0          Soft Errors: 18 Hard Errors: 11 Transport Errors: 222
Vendor: ATA      Product: ST31000340NS     Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 11 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
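
To help answer whether more disks are probably affected, the per-device header lines of the iostat -En output above can be parsed and ranked. This is a minimal sketch in Python, assuming the exact "Soft Errors / Hard Errors / Transport Errors" layout shown above; parse_iostat_en and rank_by_transport_errors are hypothetical helper names. In the output above, for instance, c5t28d0 (the spare now being resilvered) has the highest counts, with 182 hard and 1255 transport errors.

```python
import re

# Header line of each device block in `iostat -En` output, e.g.
# "c5t22d0          Soft Errors: 18 Hard Errors: 106 Transport Errors: 686"
# (layout assumed from the output pasted above)
HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)",
    re.MULTILINE,
)


def parse_iostat_en(text):
    """Hypothetical helper: return {device: (soft, hard, transport)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in HEADER_RE.finditer(text)
    }


def rank_by_transport_errors(stats):
    """Device names sorted by transport errors, then hard errors, highest first."""
    return sorted(stats, key=lambda d: (stats[d][2], stats[d][1]), reverse=True)
```

The Vendor/Size/Media Error lines are ignored because they do not match the header pattern.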


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
