> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Karl Rossing
> 
> So i figured out after a couple of scrubs and fmadm faulty that drive
> c9t15d0 was bad.
> 
> My pool now looks like this:
>          NAME           STATE     READ WRITE CKSUM
>          vdipool        DEGRADED     0     0     2
>            raidz1       DEGRADED     0     0     4
>              c9t14d0    ONLINE       0     0     1  512 resilvered
>              spare      DEGRADED     0     0     0
>                c9t15d0  OFFLINE      0     0     0
>                c9t19d0  ONLINE       0     0     0  16.1G resilvered
>              c9t16d0    ONLINE       0     0     1  512 resilvered
>              c9t17d0    ONLINE       0     0     5  2.50K resilvered
>              c9t18d0    ONLINE       0     0     1  512 resilvered
>          spares
>            c9t19d0      INUSE     currently in use

Um...  Call me crazy, but ...  If c9t15d0 was bad, then why do all those
other disks have checksum errors on them?

Although what you said is distinctly possible (a faulty disk behaving so
badly that it causes all the components around it to also exhibit failures),
it seems unlikely.  It seems much more likely that a common component (HBA,
RAM, etc.) is faulty, possibly in addition to c9t15d0.  Another possibility
is that the faulty HBA (or whatever) caused a false positive on c9t15d0.
Maybe c9t15d0 isn't any more unhealthy than all the other drives on that
bus, which may all be bad, or they may all be good including c9t15d0.  (It
wouldn't be the first time I've seen a whole batch of disks be bad, from the
same manufacturer with closely related serial numbers and manufacture
dates.)

I think you have to explain the checksum errors on all the other disks
before drawing any conclusions.
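
To make that concrete, here is a minimal Python sketch that pulls the CKSUM
column out of the status listing quoted above and lists every disk with a
nonzero count.  The function names and the regex are my own illustration,
not part of any ZFS tooling, and the pattern assumes cXtYdZ-style device
names with plain integer counters (it would need adjusting for abbreviated
counts such as "1.2K").

```python
import re

# The device table from the quoted `zpool status` output.
STATUS = """\
         NAME           STATE     READ WRITE CKSUM
         vdipool        DEGRADED     0     0     2
           raidz1       DEGRADED     0     0     4
             c9t14d0    ONLINE       0     0     1  512 resilvered
             spare      DEGRADED     0     0     0
               c9t15d0  OFFLINE      0     0     0
               c9t19d0  ONLINE       0     0     0  16.1G resilvered
             c9t16d0    ONLINE       0     0     1  512 resilvered
             c9t17d0    ONLINE       0     0     5  2.50K resilvered
             c9t18d0    ONLINE       0     0     1  512 resilvered
"""

# Assumption: leaf disks are named like c9t14d0 and the three counters
# (READ, WRITE, CKSUM) are plain integers.
DISK_ROW = re.compile(r"^\s*(c\d+t\d+d\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)")


def cksum_by_disk(status_text):
    """Map each leaf disk name to its CKSUM error count."""
    counts = {}
    for line in status_text.splitlines():
        match = DISK_ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(5))
    return counts


def disks_with_cksum_errors(status_text):
    """Return the sorted names of disks whose CKSUM count is nonzero."""
    return sorted(
        disk for disk, count in cksum_by_disk(status_text).items() if count > 0
    )


if __name__ == "__main__":
    print(disks_with_cksum_errors(STATUS))
```

Run against the listing above, the disks it reports are the ones that were
never suspected, while c9t15d0 itself shows a count of zero.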

And the fact that it resilvered again immediately after it resilvered only
lends more credence to my suspicion about your bad-disk diagnosis.

BTW, what OS and what hardware are you running?  How long has it been
running, and how much attention do you give it?  That is, can you
confidently say it was running without errors for 6 months and then suddenly
started exhibiting this behavior?  If this is in fact a new system, or if
you haven't been paying much attention, I would not be surprised to see this
type of behavior if you're running on unsupported or generic hardware.  And
when I say "unsupported" or "generic" I mean ... Intel, Asus, Dell, HP, etc.;
big name brands count as "unsupported" and "generic."  Basically anything
other than Sun hardware and software, fully updated and still under a
support contract, if I'm exaggerating to the extreme.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
