Re: [zfs-discuss] Corrupt meta data, the coredump

Richard Elling Fri, 12 Jun 2009 07:07:19 -0700

Timh Bergström wrote:

Hi,


It indeed does. I am running a really old version of ZFS (3?), so I
figured a newer release would at least not panic, but the bug report
shows exactly what I saw.


A newer release should not panic, or at least not at the same place.
If it does, then we might be seeing a regression, which would need a
new bug to be filed against it.
-- richard

I'll give it a shot, thanks.

//Timh

On 11 June 2009 17:35, Richard Elling <[email protected]> wrote:

This sounds like
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6587723
which was fixed a long time ago.  You might check that bug against your
stack trace (which was not included in this post).

You may be able to boot from a later OS release and import/export the pool
to repair it.
-- richard

Timh Bergström wrote:

Hi all,

I've encountered a not-so-fun problem with one of our pools. The pool
was built with raidz1 according to the ZFS manual, and the disks were
imported through an ERQ 16x750GB FC array (exported as JBOD) via
QLogic FC HBAs to Solaris 10u3 (x86). Everything had worked fine
and dandy until this morning, when the disk enclosure "crashed" (reason
unknown) and subsequently dragged the whole system down with it. I didn't
get the core dump at the time, but now that I've restarted,
reattached the enclosure and tried to import the zpool again, I get the
following:

# zpool status -vx
pool: migrated_data
state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
see: http://www.sun.com/msg/ZFS-8000-CS
scrub: none requested
config:
...

And just a couple of seconds after zpool status -vx, the machine panics and
dumps core with:

panic[cpu0]/thread=fffffe80fcd34ba0: BAD TRAP: type=e (#pf Page fault)
rp=fffffe800138cb10 addr=0 occurred in module "zfs" due to a NULL pointer
dereference
zpool: #pf Page fault
Bad kernel fault at addr=0x0
pid=1116, pc=0xfffffffff0663b45, sp=0xfffffe800138cc00, eflags=0x10202
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: 0 cr3: e5f2000 cr8: c
        rdi: ffffffff80039200 rsi: ffffffff89d883c0 rdx:                0
        rcx: fffffe80e3667000  r8:                1  r9:                0
        rax:                0 rbx:                1 rbp: fffffe800138cc10
        r10: ffffffff938eb920 r11:                3 r12: ffffffffb0bc4080
        r13: ffffffffb0bc42f0 r14:                1 r15:                0
        fsb: ffffffff80000000 gsb: fffffffffbc240e0  ds:               43
        es:               43  fs:                0  gs:              1c3
        trp:                e err:                0 rip: fffffffff0663b45
         cs:               28 rfl:            10202 rsp: fffffe800138cc00
         ss:               30
...

This occurs a couple of seconds after the system has fully booted. I've
tried several times to unconfigure the FC controllers fast enough, but
I'm too slow :-). So I shut off the machine's path to the FC enclosure,
and of course the pool is now "UNAVAIL", which is OK since my other
pools work fine.

I'm curious, though: how can metadata be corrupted like that? Why does
the system panic? Can it be repaired?

I know I should have backups, but I don't, and if it's a lost cause, that's
fine; the data itself is not important.

_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
