----- Original Message -----
> After doing some research, this is what I have found:
>
> Command "iostat -E" returns, among other things, a count of "hard errors."
> If this count is greater than zero, it is time to retire the disk.
>
> There are two other fields, "soft errors" and "transport errors." What is
> not clear is what action to take if any of these counts are greater than 0.
> Do we just ignore them? Or is there a heuristic such as: if soft errors is
> greater than 5, it is time to replace the disk?
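For illustration, a rough sketch of checking these counters automatically. It assumes the common Solaris "iostat -En" line layout ("c0t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0"), which can vary between releases, and the soft-error threshold here is purely illustrative, not an official heuristic:

    #!/usr/bin/env python
    # Sketch: flag disks whose iostat -En error counters look suspicious.
    # Assumes the usual "dev  Soft Errors: N Hard Errors: N Transport Errors: N"
    # line layout; SOFT_ERROR_THRESHOLD is an arbitrary illustrative value.
    import re
    import subprocess

    SOFT_ERROR_THRESHOLD = 5   # illustrative only; tune to taste

    out = subprocess.check_output(["iostat", "-En"]).decode()
    pattern = re.compile(
        r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
        r"\s+Transport Errors:\s*(\d+)",
        re.MULTILINE)

    for dev, soft, hard, transport in pattern.findall(out):
        soft, hard, transport = int(soft), int(hard), int(transport)
        if hard > 0:
            print("%s: %d hard errors -- consider retiring this disk" % (dev, hard))
        elif soft > SOFT_ERROR_THRESHOLD or transport > 0:
            print("%s: soft=%d transport=%d -- worth keeping an eye on"
                  % (dev, soft, transport))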
Try the script at http://karlsbakk.net/zfs-stats.sh to parse the iostat output into a somewhat easier-to-read form. We currently have some 400 spindles in diverse servers, and we use it regularly to check whether the error rates are climbing. I haven't found a good way to automate this yet, but we do use automated Icinga checks for ZFS health as reported by zpool status (a minimal sketch of such a check follows below the signature). The errors we find with zfs-stats.sh/iostat are most useful in cases where a single drive slows down an array; for drives that are actually dying, zpool status can usually pick that up fine.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for every pedagogue to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
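As referenced above, here is a minimal sketch of the kind of Icinga/Nagios-style zpool health check described in the reply. It is not the actual check in use; it only assumes that "zpool status -x" prints "all pools are healthy" (or "no pools available") when nothing is wrong, and follows the usual plugin exit-code convention:

    #!/usr/bin/env python
    # Sketch of a Nagios/Icinga-style ZFS health check based on "zpool status -x".
    # Exit codes follow the common plugin convention: 0 = OK, 2 = CRITICAL.
    import subprocess
    import sys

    out = subprocess.check_output(["zpool", "status", "-x"]).decode().strip()

    if out in ("all pools are healthy", "no pools available"):
        print("OK: %s" % out)
        sys.exit(0)
    else:
        # Any other output means at least one pool is DEGRADED/FAULTED or has
        # outstanding errors; report the first line so the alert shows details.
        print("CRITICAL: %s" % out.splitlines()[0])
        sys.exit(2)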