Lou,

Tried to answer this when you asked on IRC. Try a 'zpool clear' and another scrub to see whether the errors persist.
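Something along these lines (a minimal sketch; -v just adds the per-file error listing, and the <0x...> entries in your output are often just stale references to a destroyed dataset that age out after a fresh scrub or two):

  # reset the error counters and the logged error list
  zpool clear rpool

  # start a fresh scrub, then re-check once it completes
  zpool scrub rpool
  zpool status -v rpool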
Cheers,
Bayard

On Sat, 2012-01-28 at 17:52 +0000, Lou Picciano wrote:
> Hello ZFS wizards,
>
> Have an odd ZFS problem I'd like to run by you -
>
> Root pool on this machine is a 'simple' mirror - just two disks. # zpool status
>
>   NAME          STATE     READ WRITE CKSUM
>   rpool         ONLINE       0     0     3
>     mirror-0    ONLINE       0     0     6
>       c2t0d0s0  ONLINE       0     0     6
>       c2t1d0s0  ONLINE       0     0     6
>
> errors: Permanent errors have been detected in the following files:
>
>   rpool/ROOT/openindiana-userland-154@zfs-auto-snap_monthly-2011-11-22-09h19:/etc/svc/repository-boot-tmpEdaGba
>
> ... or similar; CKSUM counts have varied, but were always in that 1x - 2x, 'symmetrical' pattern.
>
> After working through the problems above, scrubbing and zfs destroying the snapshot with 'permanent errors', the CKSUM counts clear up, but vestiges of the file remain as hex addresses:
>
>   NAME          STATE     READ WRITE CKSUM
>   rpool         ONLINE       0     0     0
>     mirror-0    ONLINE       0     0     0
>       c2t0d0s0  ONLINE       0     0     0
>       c2t1d0s0  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>   <0x18e73>:<0x78007>
>
> I have no evidence that ZFS is itself the direct culprit here; it may just be on the receiving end of one of the couple of problems we've recently worked through on this machine:
> 1. a defective CPU, managed by the fault manager, but without a fully-configured crashdump (now rectified), then
> 2. the SandyBridge 'interrupt storm' problem, which we seem to have now worked around.
>
> The storage pools are scrubbed pretty regularly, and we generally have no cksum errors at all. At one point, vmstat reported 7+ million interrupt faults over 5 seconds! I've attempted to clear stats on the pool as well (didn't expect this to work, but worth a try, right?)
>
> Important to note that Memtest+ had been run, last time for ~14 hrs, with no errors reported.
>
> Don't think the storage controller is the culprit, either, as _all_ drives are controlled by the P67A - and no other problems have been seen. And no errors reported via smartctl.
>
> Would welcome input from two perspectives:
>
> 1) Before I rebuild the pool/reinstall/whatever, is anyone here interested in any diagnostic output which might still be available? Is any of this useful as a bug report?
> 2) Then, would love to hear ideas on a solution.
>
> Proposed solutions include:
> 1) creating a new BE based on a snap of the root pool:
>    - Snapshot root pool
>    - (zfs send to datapool for safekeeping)
>    - Split rpool
>    - zpool create newpool (on Drive 'B')
>    - beadm create -p newpool NEWboot (being sure to use slice 0 of Drive 'B')
> 2) Simply deleting _all_ snapshots on the rpool.
> 3) Complete re-install.
>
> Tks for feedback. Lou Picciano
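For what it's worth, your option 1 would look roughly like the sketch below. The names other than rpool, datapool and the c2t1d0s0 slice are taken from your own outline or are placeholders (the @rescue snapshot and datapool/rpool-backup are just examples), so adjust for your layout:

  # recursive snapshot of the root pool, with a copy sent to the data pool
  # (-u on the receive keeps the copies from mounting over the live filesystems)
  zfs snapshot -r rpool@rescue
  zfs create datapool/rpool-backup
  zfs send -R rpool@rescue | zfs receive -du datapool/rpool-backup

  # pull the second disk out of the mirror and build a new pool on its slice 0
  # (zpool split rpool newpool is an alternative that keeps the data on that half;
  #  zpool create may need -f if old labels are still present)
  zpool detach rpool c2t1d0s0
  zpool create newpool c2t1d0s0

  # clone the active BE into the new pool, then activate it
  # (you'll still need boot blocks on the new disk, e.g. via installgrub)
  beadm create -p newpool NEWboot
  beadm activate NEWboot

Whether that buys you much over your option 3 (a clean reinstall) mostly comes down to how far you trust what is already on rpool.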