Hi Krunal,
It looks to me like FMA thinks that you removed the disk, so you'll need
to confirm whether the cable dropped or something else happened.
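If it does turn out to be a dropped cable, then once it is reseated,
something along these lines should bring the device back (just a sketch,
using the c10t3d0 device from your zpool status output):

# zpool online tank c10t3d0
# zpool status tank
# zpool scrub tank

The scrub is optional, but it would re-verify the data on the reattached
disk since your earlier scrub was canceled.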
I agree that we need to get email updates for failing devices.
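For the notifications, if your snv_151a build includes the smtp-notify
service, a rough sketch would be something like the following (the event
tag and address are only examples, so please double-check on your system):

# svcadm enable svc:/system/fm/smtp-notify:default
# svccfg setnotify problem-diagnosed mailto:you@example.com
# svccfg listnotify problem-diagnosed

The last command just lists the notification parameters that are set for
that event class.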
See if fmdump generated an error report using the commands below.
# fmdump
TIME                 UUID                                 SUNW-MSG-ID EVENT
Jan 07 14:01:14.7839 04ee736a-b2cb-612f-ce5e-a0e43d666762 ZFS-8000-GH Diagnosed
Jan 13 10:34:32.2301 04ee736a-b2cb-612f-ce5e-a0e43d666762 FMD-8000-58 Updated
Then, review the contents:
# fmdump -u 04ee736a-b2cb-612f-ce5e-a0e43d666762 -v
TIME                 UUID                                 SUNW-MSG-ID EVENT
Jan 07 14:01:14.7839 04ee736a-b2cb-612f-ce5e-a0e43d666762 ZFS-8000-GH Diagnosed
  100%  fault.fs.zfs.vdev.checksum

        Problem in: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
           Affects: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
               FRU: -
          Location: -

Jan 13 10:34:32.2301 04ee736a-b2cb-612f-ce5e-a0e43d666762 FMD-8000-58 Updated
  100%  fault.fs.zfs.vdev.checksum

        Problem in: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
           Affects: zfs://pool=c4538d8607c1e030/vdev=7954b2ff7a8383
               FRU: -
          Location: -
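If fmdump shows nothing on your system, you can also look at the error
log directly; the underlying telemetry (ereports) is recorded there even
when no fault has been diagnosed. For example:

# fmadm faulty -a
# fmdump -e
# fmdump -eV | more

The -a option lists all cached faults, including ones already marked
repaired, and the verbose error log should show whether the disk removal
generated any ereports.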
Thanks,
Cindy
On 02/01/11 09:55, Krunal Desai wrote:
I recently discovered a drive failure (either that or a loose cable, I
need to investigate further) on my home fileserver. 'fmadm faulty'
returns no output, but I can clearly see a failure when I do zpool
status -v:
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub canceled on Tue Feb  1 11:51:58 2011
config:
        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz2-0   DEGRADED     0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t1d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
            c10t3d0  REMOVED      0     0     0
            c10t4d0  ONLINE       0     0     0
            c10t5d0  ONLINE       0     0     0
            c10t6d0  ONLINE       0     0     0
            c10t7d0  ONLINE       0     0     0
In dmesg, I see:
Feb 1 11:14:33 megatron scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,2e21@1/pci15d9,a580@0/sd@3,0 (sd8):
Feb 1 11:14:33 megatron Command failed to complete...Device is gone
I never had any problems with these drives + mpt in snv_134 (I'm on
snv_151a now); the only change was adding a second 1068E-IT that's
currently unpopulated with drives. But more importantly, why can't I see
this failure in fmadm, and how would I go about setting up automatic
e-mail notification when something like this happens? Is a pool going
DEGRADED not considered a failure?
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss