Re: [zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Jason J. W. Williams wrote:
> Have any of y'all seen a condition where the ILOM considers a disk faulted (status is 3 instead of 1), but ZFS keeps writing to the disk and doesn't report any errors? I'm going to do a scrub tomorrow and see what comes back. I'm curious what caused the ILOM to fault the disk. Any advice is greatly appreciated.

What does `iostat -E` tell you?

I've seen several times that ZFS is very fault tolerant - a bit too tolerant for my taste - when it comes to faulting a disk. I saw external FC drives with hundreds or even thousands of errors, even entire hanging loops or drives with hardware trouble, and neither ZFS nor /var/adm/messages reported a problem. So I prefer examining the iostat output over `zpool status`. The unattractive side effect is that the error counts which iostat reports cannot be reset without a reboot, so this method is not suitable for monitoring purposes.

-- 
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
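One possible way around the non-resettable counters is to keep the previous reading and look at the difference rather than the absolute values. What follows is only a minimal Python sketch of that idea and is not taken from this thread: the function name `counter_deltas`, the dictionary layout, and the device name `sd20` are made up for illustration, and it assumes the counters have already been extracted from `iostat -E` into one dictionary per device. A counter that has gone down is read as a sign that the host rebooted in between.

```python
def counter_deltas(baseline, current):
    """Return per-device counter increases since the baseline snapshot.

    Both arguments map a device name to {counter name: cumulative count}.
    Devices and counters without an increase are left out of the result.
    """
    deltas = {}
    for device, counters in current.items():
        previous = baseline.get(device, {})
        changed = {}
        for name, value in counters.items():
            old = previous.get(name, 0)
            # The counters only restart at boot, so a lower value means the
            # host rebooted; everything counted since then is new.
            increase = value - old if value >= old else value
            if increase > 0:
                changed[name] = increase
        if changed:
            deltas[device] = changed
    return deltas


if __name__ == "__main__":
    # Hypothetical readings for one disk, taken some time apart.
    baseline = {"sd20": {"Soft Errors": 1968, "Hard Errors": 0}}
    current = {"sd20": {"Soft Errors": 1969, "Hard Errors": 2}}
    print(counter_deltas(baseline, current))
    # -> {'sd20': {'Soft Errors': 1, 'Hard Errors': 2}}
```

A monitoring job built on this would have to store each snapshot somewhere (a file, for instance) and alert on non-empty results, so the absolute counts never need to be reset.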
[zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Hey Guys,

Have any of y'all seen a condition where the ILOM considers a disk faulted (status is 3 instead of 1), but ZFS keeps writing to the disk and doesn't report any errors? I'm going to do a scrub tomorrow and see what comes back. I'm curious what caused the ILOM to fault the disk. Any advice is greatly appreciated.

Best Regards,
Jason

P.S. The system is running OpenSolaris Build 54.
Re: [zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Hi Ralf,

Thank you for the suggestion. About half of the disks are reporting 1968-1969 in the Soft Errors field. All disks are reporting 1968 in the Illegal Request field. There don't appear to be any other errors; all other counters are 0. The Illegal Request count seems a little fishy... like iostat -E doesn't like the X4500 for some reason.

Thank you again for your help.

Best Regards,
Jason

On Dec 4, 2007 2:54 AM, Ralf Ramge wrote:
> What does `iostat -E` tell you?
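To see at a glance whether a count is the same on every disk (like the Illegal Request value above) or only shows up on some of them, the output can be parsed and grouped by counter. This is a rough Python sketch under several assumptions: the sample text imitates the usual layout of Solaris `iostat -E` output and may not match every release, the device names and the vendor, product, and serial strings are placeholders, and `parse_iostat_e` and `split_uniform_counters` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Counter labels as they usually appear in `iostat -E` output (assumed layout).
COUNTER = re.compile(
    r"(Soft Errors|Hard Errors|Transport Errors|Media Error|Device Not Ready|"
    r"No Device|Recoverable|Illegal Request|Predictive Failure Analysis):\s*(\d+)"
)
# A device block is assumed to start with "<name>  Soft Errors: ...".
DEVICE_LINE = re.compile(r"^(\S+)\s+Soft Errors:")

# Made-up sample; the values mirror the pattern described in the message above.
SAMPLE = """
sd0       Soft Errors: 1968 Hard Errors: 0 Transport Errors: 0
Vendor: EXAMPLE  Product: EXAMPLE DISK     Revision: 0000 Serial No: 0001
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1968 Predictive Failure Analysis: 0
sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: EXAMPLE  Product: EXAMPLE DISK     Revision: 0000 Serial No: 0002
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1968 Predictive Failure Analysis: 0
sd2       Soft Errors: 1969 Hard Errors: 0 Transport Errors: 0
Vendor: EXAMPLE  Product: EXAMPLE DISK     Revision: 0000 Serial No: 0003
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1968 Predictive Failure Analysis: 0
"""


def parse_iostat_e(text):
    """Map each device name to its {counter name: count} dictionary."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_LINE.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
        if current is None:
            continue  # skip anything before the first device block
        for name, value in COUNTER.findall(line):
            current[name] = int(value)
    return devices


def split_uniform_counters(devices):
    """Separate nonzero counters that are equal on all disks from the rest."""
    per_counter = defaultdict(dict)
    for device, counters in devices.items():
        for name, value in counters.items():
            per_counter[name][device] = value

    uniform, varying = {}, {}
    for name, values in per_counter.items():
        distinct = set(values.values())
        if distinct == {0}:
            continue  # nothing was ever counted for this field
        if len(distinct) == 1 and len(values) > 1:
            uniform[name] = distinct.pop()
        else:
            varying[name] = {dev: v for dev, v in values.items() if v}
    return uniform, varying


if __name__ == "__main__":
    uniform, varying = split_uniform_counters(parse_iostat_e(SAMPLE))
    print("same on every disk:", uniform)
    print("differs per disk:  ", varying)
```

A value that is identical on every disk points more toward something systematic (driver or controller behaviour, for example) than toward one bad drive, though the counters alone cannot confirm that.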