The nightly disk scrub on a new OpenSolaris/ZFS fileserver turned up
errors on one of the disks, and investigating that turned up scarier
information. While googling one of the errors in the log to see whether
it was real (http://opensolaris.org/jive/thread.jspa?messageID=268273
discusses a little of what I'm seeing), I found that iostat gives some
interesting error reports in OpenSolaris (output below slightly
reformatted for readability). Note that the failed disk is the only one
with media errors (and has been replaced); the errors on the others all
appear to be timeout and/or cabling issues.
I'm seeing a slight amount of this behavior on another, similar file
server (zfs1.rdrop.com), which uses slightly different hardware, and not
at all on my home fileserver (zfs.batie.org), which is using
substantially different hardware (e.g. onboard SATA, rather than the
LSI1068 SAS controller the others use, though I'm in the process of
building a replacement that is a twin of the rdrop server).
These servers use the ASUS P5BV/SAS motherboard in a chassis with two
3-drive hotswap bays (4-drive in the peak server) connected with
standard SATA cables.
The question, given apparently normal operation, is "is this normal SATA
behavior that we just never see, or do we have cabling issues?" Neither
answer is particularly good, but I'd like to know just how worried I
need to be...
It's interesting that drives 0 and 4 (drive 0 in each bay, respectively)
have 3x the transport error rate of the other drives... The other thing
to note is that the SSD system disk ("PATRIOT MEMORY") is on SATA in all
the systems, and none of those show errors. It sounds to me like these
errors are only showing up because the LSI chip is actually reporting
and/or doing the error checking at all, but on the other hand, I'm given
to understand that ZFS does its own error checking...
<zfs01.server.peak.org> [317] # zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: scrub completed after 4h23m with 0 errors on Wed Apr 8 07:30:43 2009
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0     0
          mirror    ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
          mirror    DEGRADED     0     0     0
            c3t2d0  FAULTED  7.78M 6.71M     0  too many errors
            c3t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0

errors: No known data errors
<zfs01.server.peak.org> [316] # iostat -En
c5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: PATRIOT MEMORY Revision: Serial No: MK0209040DE0700 Size: 32.04GB <32044400640 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c3t0d0 Soft Errors: 0 Hard Errors: 158 Transport Errors: 2377
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 155 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t1d0 Soft Errors: 0 Hard Errors: 61 Transport Errors: 782
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 58 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t2d0 Soft Errors: 0 Hard Errors: 225 Transport Errors: 868
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 4 Device Not Ready: 0 No Device: 56 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t3d0 Soft Errors: 0 Hard Errors: 61 Transport Errors: 700
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 59 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t4d0 Soft Errors: 0 Hard Errors: 146 Transport Errors: 2495
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 141 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t5d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 649
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 51 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t6d0 Soft Errors: 0 Hard Errors: 63 Transport Errors: 872
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 61 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t7d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 759
Vendor: ATA Product: ST3500320AS Revision: SD15 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 51 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
<zfs1.rdrop.com> [101] # iostat -En
c5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: PATRIOT MEMORY Revision: Serial No: DC3201008092300 Size: 32.04GB <32044400640 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c3t0d0 Soft Errors: 0 Hard Errors: 7 Transport Errors: 122
Vendor: ATA Product: WDC WD3200AAKS-0 Revision: 0A40 Serial No:
Size: 320.07GB <320072933376 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 7 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t1d0 Soft Errors: 0 Hard Errors: 1 Transport Errors: 7
Vendor: ATA Product: WDC WD3200AAKS-0 Revision: 0A40 Serial No:
Size: 320.07GB <320072933376 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000333AS Revision: SD35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t4d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD10EADS-00L Revision: 1A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
<zfs.batie.org> [101] # iostat -En
c4d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: HDS722525VLSA80 Revision: Serial No: VN6J8FCFD Size: 250.06GB <250058833920 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c6d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD2500AAKS- Revision: Serial No: WD-WMAT110 Size: 250.06GB <250058833920 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c7d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST31000340AS Revision: Serial No: 5QJ Size: 1000.20GB <1000202305536 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST31000340AS Revision: Serial No: 9QJ Size: 1000.20GB <1000202305536 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
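
For anyone who wants to compare counters across drives without
eyeballing the output, here is a rough Python sketch that pulls the
per-device error counts out of `iostat -En` text and checks the "3x"
observation about drives 0 and 4. The function names (parse_iostat_en,
slot_ratio) and the regular expression are my own illustration, not part
of any Solaris tool, and the pattern assumes the record layout shown in
the output above; the sample lines are copied from the
zfs01.server.peak.org listing.

```python
import re
from statistics import mean

# First line of each device record in `iostat -En` output, e.g.
# "c3t0d0 Soft Errors: 0 Hard Errors: 158 Transport Errors: 2377"
# (layout assumed from the listings in this message).
RECORD = re.compile(
    r"^(?P<dev>c\d+(?:t\d+)?d\d+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def parse_iostat_en(text):
    """Map device name -> soft/hard/transport counters (other fields ignored)."""
    counters = {}
    for line in text.splitlines():
        m = RECORD.match(line.strip())
        if m:
            counters[m["dev"]] = {
                key: int(m[key]) for key in ("soft", "hard", "transport")
            }
    return counters


def slot_ratio(counters, first_slot):
    """Mean transport errors of the first_slot drives over the mean of the rest."""
    first = [c["transport"] for d, c in counters.items() if d in first_slot]
    rest = [c["transport"] for d, c in counters.items() if d not in first_slot]
    return mean(first) / mean(rest)


# Record header lines copied from the zfs01.server.peak.org output
# (drives on the LSI controller only; the SSD on c5d0 is left out).
ZFS01 = """\
c3t0d0 Soft Errors: 0 Hard Errors: 158 Transport Errors: 2377
c3t1d0 Soft Errors: 0 Hard Errors: 61 Transport Errors: 782
c3t2d0 Soft Errors: 0 Hard Errors: 225 Transport Errors: 868
c3t3d0 Soft Errors: 0 Hard Errors: 61 Transport Errors: 700
c3t4d0 Soft Errors: 0 Hard Errors: 146 Transport Errors: 2495
c3t5d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 649
c3t6d0 Soft Errors: 0 Hard Errors: 63 Transport Errors: 872
c3t7d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 759
"""

if __name__ == "__main__":
    counters = parse_iostat_en(ZFS01)
    for dev, c in sorted(counters.items()):
        print(f"{dev}: hard={c['hard']} transport={c['transport']}")
    ratio = slot_ratio(counters, {"c3t0d0", "c3t4d0"})
    print(f"slot-0 drives vs. the rest: {ratio:.2f}x")
```

On the zfs01 numbers the ratio comes out a little above 3, which matches
the rough 3x figure. In practice the text would come from running
`iostat -En` on the host rather than from a pasted string.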
_______________________________________________ storage-discuss mailing list [email protected] http://mail.opensolaris.org/mailman/listinfo/storage-discuss
