Re: [osol-discuss] OpenSolaris snv_130 panic.

Brian Ruthven - Sun UK Thu, 14 Jan 2010 09:23:04 -0800

Sorry - 0xc0000000 is a bit over-zealous (it's what I had to use for a different problem).

Try 0x80000000 instead. I'm just curious what the last module loaded was. I expected the majority of the output to scroll off the top of the screen - it's only the last few lines I was hoping to catch with a "loading module XXX" message. It does depend somewhat on how big the stack trace is though.

I'm not sure why kmdb is unusable, but it does ring a bell. Maybe I suffered that problem in the past. I don't recall what the fault was though :-(


Regards,
Brian


Juris Krumins wrote:

I've tried.
But strings are scrolling too fast (is it possible to do some kind of
paging), and the only thing I can see before "trap type 8 ..." is
kobj_read_file: size ....
kobj_close: 0x82
after that I have "trap type 8 ..." and
panic: entering debugger ....
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq .....


More than that, after I got "trap type 8 ..." I can't use kmdb because
it's unresponsive.
Maybe you can suggest some tricks (commands for kmdb) to get all the
info, because I'm not that experienced with kmdb.
-----Original Message-----
From: Brian Ruthven - Sun UK <[email protected]>
To: Juris Krumins <[email protected]>
Cc: [email protected]
Subject: Re: [osol-discuss] OpenSolaris snv_130 panic.
Date: Thu, 14 Jan 2010 11:12:53 +0000


Juris Krumins wrote:
panic[cpu0]/thread = fffffffffbc2e3a0: mutex_enter: bad mutex, lp=20 
owner=f000e987f000fea0 thread=fffffffffbc2e3a0

unix: mutex_panic + 73 ()
unix: mutex_vector_enter +446()
genunix: zone_getspecific+2b ()
genunix: core+5f ()
unix: kern_gpfault+18e ()
unix: trap+41e ()
unix: cmntrap + e6 ()
I've dug a little through src.opensolaris.org, and it seems to me that mutex_panic comes from the dispinit() function call at startup.c:1517. dispinit() then calls mutex_enter(&cpu_lock) at disp.c:221, which is the cause of the mutex_panic().
That's not quite right. The call to mutex_panic is from mutex_vector_enter, and is caused by a "NULL" pointer being passed to mutex_enter (actually the value 0x20, but this was probably from dereferencing a struct member at offset 0x20 from a NULL pointer).
I'm intrigued that the mutex address (the "lp=" part in the panic message) indicates what should be unmapped memory, and I would have expected it to panic with a page fault (or whatever the x86 equivalent of a BAD TRAP type 0x31 is). Instead, it appears to have found the value 0xf000e987f000fea0 in there, which failed the validation by mutex_vector_enter.
There are three possible calls to mutex_panic from mutex_vector_enter, but only one causes "bad mutex":
        if (!MUTEX_TYPE_ADAPTIVE(lp)) {
                mutex_panic("mutex_enter: bad mutex", lp);
                return;
        }
Anyway, I digress somewhat. The problem is further down the stack - zone_getspecific attempts to acquire zone_lock:
void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
        struct zsd_entry *t;
        void *data;

        mutex_enter(&zone->zone_lock);


Sure enough, zone_lock is at offset 0x20:

 > ::print -a zone_t zone_lock
20 zone_lock {
    20 zone_lock._opaque
}
So "zone" was NULL, and the caller of zone_getspecific passed in a NULL - this was core().
Rather worryingly, if I look at the core() function on my system (albeit snv_128), this is what I see just before the return address on the stack:
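To make the arithmetic behind "lp=20" concrete, here is a minimal sketch. It does not use the real zone_t: struct demo_zone, struct demo_lock and lock_address() are hypothetical stand-ins, and the only detail taken from the ::print output above is that the lock member sits at offset 0x20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque kernel mutex. */
struct demo_lock {
	uint64_t opaque;
};

/* Hypothetical stand-in for zone_t: only the offset of the lock matters. */
struct demo_zone {
	char pad[0x20];			/* placeholder for the earlier members */
	struct demo_lock zone_lock;	/* offset 0x20, as ::print -a reported */
};

/* Compile-time check that the layout assumption above holds. */
static_assert(offsetof(struct demo_zone, zone_lock) == 0x20,
    "demo lock must sit at offset 0x20");

/*
 * Address that &zone->zone_lock works out to for a given base pointer
 * value, computed without dereferencing anything.
 */
static uintptr_t
lock_address(uintptr_t zone_base)
{
	return (zone_base + offsetof(struct demo_zone, zone_lock));
}
```

With a base of 0 (a NULL zone pointer), lock_address() yields 0x20, which matches the lp value in the panic message.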
 > core::dis
[snip]
core+0x4d:                      movl   +0x3255d5(%rip),%edi     <core_zone_key>
core+0x53:                      movq   +0x2dbba6(%rip),%rsi     <global_zone>
core+0x5a:                      call   +0x187701        <zone_getspecific>
core+0x5f:                      movq   %rax,%r15

This corresponds to the first call to zone_getspecific:

        global_cg = zone_getspecific(core_zone_key, global_zone);
So it would seem that the global zone pointer 'global_zone' was not yet initialised when you took this panic. It starts life as a NULL and is initialised by zone_init, which happens very early in boot. I don't know offhand what user-land processes have been started by then, but I'm prepared for the answer to be "none". In which case, why is a trap taken in user mode?
Can you try this next time please:

    Add the "-kd" option to the kernel$ line as you did before.
    At the kmdb prompt, type:
        moddebug/W 0xe0000000
        :c
Moddebug at this value will cause much to be printed out. See what the last "loading XXX" message was just before the "trap type 8..." message. Also, gather the output of ::ps to check what user-land processes are running. I suspect the list will be short :-)
Thanks,
Brian


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG

_______________________________________________
opensolaris-discuss mailing list
[email protected]
