----- Original Message -----
> After doing some research, this is what I have found:
> 
> Command "iostat -E" returns, among other things, a count of "hard
> errors." If this count is greater than zero, it is time to retire the
> disk.
> 
> There are two other fields, "soft errors" and "transport errors." What
> is not clear is what action to take if any of these counts are greater
> than 0. Do we just ignore them? Or, is there any heuristic such as if
> soft errors is greater than 5, it is time to replace the disk?

Try the script at http://karlsbakk.net/zfs-stats.sh to parse iostat output in a 
somewhat easier-to-read form. We currently have some 400 spindles across diverse 
servers, and we run it regularly to check whether the error rates are climbing. 
I haven't found a good way to automate this yet, but we use automated Icinga 
checks for ZFS health as reported by zpool status. The errors we find with 
zfs-stats.sh/iostat are most useful in cases where a single drive slows down a 
whole array. For disks that are outright dying, zpool status can usually pick 
that up fine.
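If you just want the raw counts without the script, here is a minimal sketch 
(Python, assuming the usual Solaris `iostat -E` summary-line format, e.g. 
"sd0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0") that pulls out the 
three error fields per device and flags disks with a nonzero hard-error count, 
per the "retire on hard errors > 0" rule discussed above:

```python
# Sketch: parse `iostat -E` device summary lines and flag disks whose
# hard-error count is nonzero. The assumed line format matches typical
# Solaris output; adjust the regex if your platform's output differs.
import re

SUMMARY_RE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)

def parse_iostat_errors(text):
    """Return a dict mapping device name -> (soft, hard, transport)."""
    stats = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            stats[m.group("dev")] = (
                int(m.group("soft")),
                int(m.group("hard")),
                int(m.group("transport")),
            )
    return stats

def disks_to_retire(stats):
    """Hard errors > 0 means it is time to retire the disk."""
    return sorted(dev for dev, (_, hard, _) in stats.items() if hard > 0)

if __name__ == "__main__":
    # Hypothetical sample output for illustration only.
    sample = (
        "sd0       Soft Errors: 2 Hard Errors: 0 Transport Errors: 0\n"
        "sd1       Soft Errors: 0 Hard Errors: 3 Transport Errors: 1\n"
    )
    print(disks_to_retire(parse_iostat_errors(sample)))  # -> ['sd1']
```

You would feed it the output of `iostat -E` via subprocess or a pipe; to watch 
for climbing soft/transport counts over time, store the tuples and diff them 
between runs rather than alerting on a single snapshot.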

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
[Norwegian, translated:] In all pedagogy it is essential that the curriculum be 
presented intelligibly. It is an elementary imperative for all pedagogues to 
avoid excessive application of idioms of foreign origin. In most cases, adequate 
and relevant synonyms exist in Norwegian.
_______________________________________________
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
