I have a box running snv_134 that had a little boo-boo.

The problem first started a couple of weeks ago with some corruption on two 
filesystems in an 11-disk 10 TB raidz2 set.  I ran a couple of scrubs that 
revealed a handful of corrupt files on my two de-duplicated ZFS filesystems.  No 
biggie.

I thought my problems had something to do with de-duplication in 134, so I 
went about creating new filesystems and copying the "good" files over to 
another box.  Every time I touched the "bad" files I got a filesystem error 5 
(EIO).  When I tried to delete them manually, I got kernel panics - which 
eventually turned into reboot loops.

I tried installing Nexenta on another disk to see if that would let me get 
past the reboot loop - which it did.  I finished moving the "good" files over 
(using rsync, which skipped over the error 5 files, unlike cp or mv), and 
destroyed one of the two filesystems.  Unfortunately, this caused a kernel 
panic in the middle of the destroy operation, which then became another 
panic/reboot loop.
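
For reference, the copy ran roughly like this (host and dataset paths are 
just examples, not the real ones):

  rsync -av /tank/data/ otherbox:/backup/data/

rsync reported an I/O error on each of the "bad" files and kept going, which 
is why it got further than cp or mv did.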

I was able to get in with milestone=none and delete the ZFS cache file, but now 
I have a new problem:  any attempt to import the pool results in a panic.  I 
have tried from my snv_134 install, from the live CD, and from Nexenta.  I have 
tried various zdb incantations (with aok=1 and zfs:zfs_recover=1), to no avail 
- these error out after a few minutes.  I have even tried another controller.
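
For the record, the recovery attempts looked roughly like this (pool name is 
just an example):

  # added to /etc/system before retrying the import:
  set aok=1
  set zfs:zfs_recover=1

  # booted with "-m milestone=none" on the GRUB kernel line, then removed
  # the cache file so the pool is not imported automatically at boot:
  rm /etc/zfs/zpool.cache

  # the import itself is what panics:
  zpool import -f tank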

I now have zdb -e -bcsvL running from 134 (without aok=1); it has been running 
for several hours.  Can zdb recover from this kind of situation (with a 
half-destroyed filesystem that panics the kernel on import)?  What is the 
impact of the above zdb operation without aok=1?  Is there any likelihood of 
recovering the unaffected filesystems?
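
For clarity, the exact command currently running (pool name is an example):

  zdb -e -bcsvL tank

i.e. open the exported pool directly from the devices (-e), traverse and 
checksum-verify the blocks (-bc), with stats and verbose output (-sv), and 
leak checking disabled (-L).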

Any suggestions?

Regards,

Matthew Ellison