Hi Krunal,
It looks to me like FMA thinks that you removed the disk, so you'll need
to confirm whether the cable dropped or something else happened.
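If it does turn out to be a dropped cable, then once it is reseated,
something along these lines should bring the device back (just a sketch,
using the c10t3d0 device from your zpool status output):

# zpool online tank c10t3d0
# zpool status tank
# zpool scrub tank

The scrub is optional, but it would re-verify the data on the reattached
disk since your earlier scrub was canceled.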
I agree that we need to get email updates for failing devices.
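For the notifications, if your snv_151a build includes the smtp-notify
service, a rough sketch would be something like the following (the event
tag and address are only examples, so please double-check on your system):

# svcadm enable svc:/system/fm/smtp-notify:default
# svccfg setnotify problem-diagnosed mailto:you@example.com
# svccfg listnotify problem-diagnosed

The last command just lists the notification parameters that are set for
that event class.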
See if fmdump generated an error report using the commands below.
# fmdump
TIME                 UUID                                 SUNW-MSG-ID EVENT
Jan 07 14:01:14.7839 04ee736a-b2cb-612f-ce5e-a0e43d666762 ZFS-8000-GH Diagnosed
Jan 13 10:34:32.2301 04ee736a-b2cb-612f-ce5e-a0e43d666762 FMD-8000-58 Updated
Then, review the contents:
# fmdump -u 04ee736a-b2cb-612f-ce5e-a0e43d666762 -v
TIME                 UUID                                 SUNW-MSG-ID EVENT
Jan 07 14:01:14.7839 04ee736a-b2cb-612f-ce5e-a0e43d666762 ZFS-8000-GH Diagnosed
  100%  fault.fs.zfs.vdev.checksum

        Problem in: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
           Affects: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
               FRU: -
          Location: -

Jan 13 10:34:32.2301 04ee736a-b2cb-612f-ce5e-a0e43d666762 FMD-8000-58 Updated
  100%  fault.fs.zfs.vdev.checksum

        Problem in: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
           Affects: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
               FRU: -
          Location: -
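If fmdump shows nothing on your system, you can also look at the error
log directly; the underlying telemetry (ereports) is recorded there even
when no fault has been diagnosed. For example:

# fmadm faulty -a
# fmdump -e
# fmdump -eV | more

The -a option lists all cached faults, including ones already marked
repaired, and the verbose error log should show whether the disk removal
generated any ereports.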
Thanks,
Cindy
On 02/01/11 09:55, Krunal Desai wrote:
I recently discovered a drive failure (either that or a loose cable, I
need to investigate further) on my home fileserver. 'fmadm faulty'
returns no output, but I can clearly see a failure when I do zpool
status -v:
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub canceled on Tue Feb  1 11:51:58 2011
config:
        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz2-0   DEGRADED     0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t1d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
            c10t3d0  REMOVED      0     0     0
            c10t4d0  ONLINE       0     0     0
            c10t5d0  ONLINE       0     0     0
            c10t6d0  ONLINE       0     0     0
            c10t7d0  ONLINE       0     0     0
In dmesg, I see:
Feb 1 11:14:33 megatron scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,2e21@1/pci15d9,a580@0/sd@3,0 (sd8):
Feb 1 11:14:33 megatron Command failed to complete...Device is gone
I never had any problems with these drives + mpt in snv_134 (I'm on
snv_151a now); the only change was adding a second 1068E-IT that's
currently unpopulated with drives. But more importantly, why can't I see
this failure in fmadm, and how would I go about setting up automatic
e-mail notification when something like this happens? Is a pool going
DEGRADED not considered a failure?
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss