Lou,

Tried to answer this when you asked on IRC. Try a zpool clear and scrub
again to see if the errors persist.
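
A rough sketch, using the rpool name from your status output below:

  # zpool clear rpool         (reset the error counters)
  # zpool scrub rpool         (then kick off a fresh scrub)
  # zpool status -v rpool     (check again once the scrub completes)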

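If you do end up deleting all of the rpool snapshots (your option 2), something
along these lines would do it; definitely eyeball the list from the first
command before piping it into destroy:

  # zfs list -H -o name -t snapshot -r rpool
  # zfs list -H -o name -t snapshot -r rpool | xargs -n1 zfs destroy
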
Cheers,
Bayard

On Sat, 2012-01-28 at 17:52 +0000, Lou Picciano wrote:
> 
> Hello ZFS wizards, 
> 
> Have an odd ZFS problem I'd like to run by you - 
> 
> Root pool on this machine is a 'simple' mirror - just two disks.
> 
> # zpool status
> 
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     3
>           mirror-0    ONLINE       0     0     6
>             c2t0d0s0  ONLINE       0     0     6
>             c2t1d0s0  ONLINE       0     0     6
> 
> errors: Permanent errors have been detected in the following files: 
> 
>         rpool/ROOT/openindiana-userland-154@zfs-auto-snap_monthly-2011-11-22-09h19:/etc/svc/repository-boot-tmpEdaGba
>  
> 
> ... or similar; the CKSUM counts have varied, but always in that 1x/2x 
> 'symmetrical' pattern (here, 3 on the pool and 6 on the mirror and each disk). 
> 
> After working through the problems above, scrubbing, and zfs destroying the 
> snapshot with the 'permanent errors', the CKSUM counts clear up, but vestiges 
> of the file remain as hex addresses: 
> 
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c2t0d0s0  ONLINE       0     0     0
>             c2t1d0s0  ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files: 
> 
>         <0x18e73>:<0x78007> 
> 
> I have no evidence that ZFS is itself the direct culprit here; it may just be 
> on the receiving end of one of a couple of problems we've recently worked 
> through on this machine: 
> 1. a defective CPU, managed by the fault manager but without a 
> fully-configured crashdump (now rectified), then 
> 2. the Sandy Bridge 'interrupt storm' problem, which we now seem to have 
> worked around. 
> 
> The storage pools are scrubbed pretty regularly, and we generally have no 
> CKSUM errors at all. At one point, vmstat reported 7+ million interrupt 
> faults over 5 seconds! I've attempted to clear stats on the pool as well 
> (didn't expect this to work, but worth a try, right?). 
> 
> Important to note that Memtest+ had been run, most recently for ~14 hrs, with 
> no errors reported. 
> 
> Don't think the storage controller is the culprit either, as _all_ drives are 
> controlled by the P67A, and no other problems have been seen. No errors are 
> reported via smartctl, either. 
> 
> Would welcome input from two perspectives: 
> 
> 1) Before I rebuild the pool/reinstall/whatever, is anyone here interested in 
> any diagnostic output which might still be available? Is any of this useful 
> as a bug report? 
> 2) Then, would love to hear ideas on a solution. 
> 
> Proposed solutions include: 
> 1) creating a new BE based on a snapshot of the root pool: 
> - Snapshot root pool 
> - (zfs send to datapool for safekeeping) 
> - Split rpool 
> - zpool create newpool (on Drive 'B') 
> - beadm create -p newpool NEWboot (being sure to use slice 0 of Drive 'B') 
> 
> 2) Simply deleting _all_ snapshots on the rpool. 
> 
> 3) complete re-install 
> 
> Tks for feedback. Lou Picciano 


