Re: [osol-discuss] OpenSolaris snv_130 panic.

Juris Krumins Thu, 14 Jan 2010 07:23:16 -0800

I've tried.
But the lines scroll past too fast (is it possible to do some kind of
paging?), and the only thing I can see before "trap type 8 ..." is

kobj_read_file: size ....
kobj_close: 0x82

after that I have "trap type 8 ..." and 

panic: entering debugger ....
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq .....

More than that, after I get "trap type 8 ..." I can't use kmdb because
it's unresponsive.
Maybe you can suggest some tricks (kmdb commands) to get all the
info, because I'm not that experienced with kmdb.

-----Original Message-----
From: Brian Ruthven - Sun UK <[email protected]>
To: Juris Krumins <[email protected]>
Cc: [email protected]
Subject: Re: [osol-discuss] OpenSolaris snv_130 panic.
Date: Thu, 14 Jan 2010 11:12:53 +0000

Juris Krumins wrote:
> panic[cpu0]/thread = fffffffffbc2e3a0: mutex_enter: bad mutex, lp=20 
> owner=f000e987f000fea0 thread=fffffffffbc2e3a0
>
> unix: mutex_panic + 73 ()
> unix: mutex_vector_enter +446()
> genunix: zone_getspecific+2b ()
> genunix: core+5f ()
> unix: kern_gpfault+18e ()
> unix: trap+41e ()
> unix: cmntrap + e6 ()
>
>
> I've dug a little through src.opensolaris.org and it seems to me that 
> mutex_panic comes from the dispinit() call at startup.c:1517. 
> dispinit() then calls mutex_enter(&cpu_lock) at disp.c:221, which is the 
> cause of mutex_panic().
>   

That's not quite right. The call to mutex_panic is from 
mutex_vector_enter, and is caused by a "NULL" pointer being passed to 
mutex_enter (actually the value 0x20, but this was probably from 
dereferencing a struct member at offset 0x20 from a NULL pointer).

I'm intrigued that the mutex address (the "lp=" part in the panic 
message) indicates what should be unmapped memory, and I would have 
expected it to panic with a page fault (or whatever the x86 equivalent 
of a BAD TRAP type 0x31 is). Instead, it appears to have found the value 
0xf000e987f000fea0 in there, which failed the validation by 
mutex_vector_enter.

There are three possible calls to mutex_panic from mutex_vector_enter, 
but only one causes "bad mutex":

        if (!MUTEX_TYPE_ADAPTIVE(lp)) {
                mutex_panic("mutex_enter: bad mutex", lp);
                return;
        }
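
As a rough illustration of why a garbage owner word fails this test, here is a minimal sketch. It assumes the adaptive-mutex check looks only at a few low "type" bits of the owner word; the mask value 0x6 and the demo_ names are assumptions made for this example, not something confirmed in this thread.

```c
#include <stdint.h>

/*
 * ASSUMPTION: low bits of the owner word that must be clear for an
 * adaptive mutex. The value 0x6 is illustrative, not taken from the thread.
 */
#define	DEMO_MUTEX_DEAD	((uint64_t)0x6)

/*
 * Hypothetical stand-in for MUTEX_TYPE_ADAPTIVE(): returns non-zero when
 * the owner word looks like an adaptive mutex (type bits clear).
 */
static int
demo_type_adaptive(uint64_t owner_word)
{
	return ((owner_word & DEMO_MUTEX_DEAD) == 0);
}
```

Under that assumption, an 8-byte-aligned thread pointer such as fffffffffbc2e3a0 passes the check, while an arbitrary word with either of those bits set does not. The owner value printed in the panic message may already have had its low bits masked off, so it is not necessarily the raw word that was tested.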

Anyway, I digress somewhat. The problem is further down the stack - 
zone_getspecific attempts to acquire zone_lock:

void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
        struct zsd_entry *t;
        void *data;

        mutex_enter(&zone->zone_lock);

Sure enough, zone_lock is at offset 0x20:

 > ::print -a zone_t zone_lock
20 zone_lock {
    20 zone_lock._opaque
}

So "zone" was NULL, and the caller of zone_getspecific passed in a NULL 
- this was core().
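
The arithmetic behind "lp=20" can be sketched in a few lines of C. The demo_zone_t layout below is hypothetical: only the 0x20 offset of zone_lock comes from the ::print output above, and the padding stands in for whatever members really precede it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque kernel mutex type. */
typedef struct demo_mutex {
	uint64_t	_opaque[1];
} demo_mutex_t;

/* Hypothetical stand-in for zone_t; only the 0x20 offset is real. */
typedef struct demo_zone {
	unsigned char	pad[0x20];	/* placeholder for the leading members */
	demo_mutex_t	zone_lock;	/* lands at offset 0x20 */
} demo_zone_t;

/*
 * Address that mutex_enter(&zone->zone_lock) would receive for a given
 * zone base address. Plain integer arithmetic, nothing is dereferenced.
 */
static uint64_t
demo_lock_address(uint64_t zone_base)
{
	return (zone_base + offsetof(demo_zone_t, zone_lock));
}
```

With a zone base of 0 (a NULL zone pointer), demo_lock_address() yields 0x20, which matches the lp value in the panic message.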

Rather worryingly, if I look at the core() function on my system (albeit 
snv_128), this is what sits just before the return address on the stack:

 > core::dis
[snip]
core+0x4d:                      movl   +0x3255d5(%rip),%edi     <core_zone_key>
core+0x53:                      movq   +0x2dbba6(%rip),%rsi     <global_zone>
core+0x5a:                      call   +0x187701        <zone_getspecific>
core+0x5f:                      movq   %rax,%r15

This corresponds to the first call to zone_getspecific:

        global_cg = zone_getspecific(core_zone_key, global_zone);

So it would seem that the global zone pointer 'global_zone' was not yet 
initialised when you took this panic. It starts life as a NULL and is 
initialised by zone_init, which happens very early in boot. I don't know 
offhand what user-land processes have been started by then, but I'm 
prepared for the answer to be "none". In which case, why is a trap taken 
in user mode?

Can you try this next time please:

    Add the "-kd" option to the kernel$ line as you did before.
    At the kmdb prompt, type:
        moddebug/W 0xe0000000
        :c

With moddebug set to this value, a lot of module-loading output is printed. 
See what the last "loading XXX" message was just before the "trap type 8..." 
message.
Also, gather the output of ::ps to check what user-land processes are 
running. I suspect the list will be short :-)
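
For reference, 0xe0000000 is simply the top three bits of a 32-bit word. The sketch below shows that composition. The flag names and their meanings are assumptions made for illustration (modelled on the moddebug flags as I recall them from sys/modctl.h), so check the header on your build before relying on them.

```c
#include <stdint.h>

/* ASSUMED flag names and meanings; only the combined value is from the mail. */
#define	DEMO_MODDEBUG_LOADMSG	0x80000000u	/* assumed: message on module load */
#define	DEMO_MODDEBUG_ERRMSG	0x40000000u	/* assumed: message on load errors */
#define	DEMO_MODDEBUG_LOADMSG2	0x20000000u	/* assumed: extra load detail */

/* Combine the three assumed flags into the value written with moddebug/W. */
static uint32_t
demo_moddebug_value(void)
{
	return (DEMO_MODDEBUG_LOADMSG | DEMO_MODDEBUG_ERRMSG |
	    DEMO_MODDEBUG_LOADMSG2);
}
```

The three bits OR together to 0xe0000000, the value used in the kmdb command above.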

Thanks,
Brian

_______________________________________________
opensolaris-discuss mailing list
[email protected]
