On Thu, Dec 22, 2011 at 11:25 AM, Tim Cook <t...@cook.ms> wrote:
> On Thu, Dec 22, 2011 at 10:00 AM, Myers Carpenter <my...@maski.org> wrote:

>> So the lesson here: Don't be a dumbass like me.  Set up Nagios or some
>> other system to alert you when a pool has become degraded.  ZFS works very
>> well with one drive out of the array, so you probably aren't going to notice
>> problems unless you are proactively looking for them.

> Or, if you aren't scrubbing on a regular basis, just change your zpool
> failmode property.  Had you set it to wait or panic, it would've been very
> clear, very quickly that something was wrong.
> http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/

    I'm not sure this will help, as a single failed drive in a raidz1,
or two in a raidz2, will make the zpool DEGRADED, not FAULTED. I
believe the failmode property only governs behavior for a FAULTED zpool.
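
For reference, the property can be inspected and changed per pool. A minimal sketch, assuming a pool named tank (a placeholder, not something from this thread) and two wrapper function names I made up for illustration; the values zpool accepts for failmode are wait, continue and panic:

```shell
# Hypothetical wrappers around the real zpool subcommands.
# "tank" in the example below is only a placeholder pool name.

# Print the current failmode setting of the pool given as $1.
show_failmode() {
    zpool get failmode "$1"
}

# Set failmode on pool $1 to value $2 (wait, continue or panic).
set_failmode() {
    zpool set "failmode=$2" "$1"
}

# Example: show_failmode tank; set_failmode tank continue
```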

    We have a very simple shell script that runs hourly, does a
`zpool status -x`, and emails the admins if any pool is in any state
other than ONLINE. As soon as a zpool goes DEGRADED we get notified
and can initiate the correct response (usually opening a case with
Oracle to replace the failed drive). Here is the snippet from the
script that does the actual health check (not my code, I would have
done it differently, but this works) ...

not_ok=`${zfs_path}/zpool status -x | egrep -v "all pools are healthy|no pools available"`

# Any remaining output means at least one pool is not healthy.
if [ "X${not_ok}" != "X" ]
then
        fault_details="There is at least one zpool error."
        let fault_count=fault_count+1
fi
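
A self-contained variant of the same check might look like the sketch below. This is not our production script: the function names and the ZPOOL override variable are made up for illustration. It uses the same idea as the snippet above, filtering the two "nothing wrong" messages out of the `zpool status -x` output and treating anything left over as a problem.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status -x` output on stdin and
# succeeds only if nothing but the "healthy" messages is present.
pools_healthy() {
    # Drop the two "nothing wrong" messages; anything left is a problem report.
    not_ok=$(grep -E -v 'all pools are healthy|no pools available' || true)
    [ -z "${not_ok}" ]
}

# Hypothetical wrapper: runs zpool (overridable via ZPOOL, an assumption
# made here so the check can be exercised without a real pool), prints
# the status text and returns non-zero when a pool needs attention.
check_pools() {
    status=$("${ZPOOL:-zpool}" status -x 2>&1 || true)
    if printf '%s\n' "${status}" | pools_healthy; then
        return 0
    fi
    printf '%s\n' "${status}"
    return 1
}
```

From an hourly cron job, the output of check_pools could then be piped to something like `mailx -s "zpool problem" admins@example.com` whenever it returns non-zero (the address is a placeholder).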

Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
zfs-discuss mailing list
