Sorry, 0xc0000000 is a bit over-zealous (it's what I had to use for a different problem).

Try 0x80000000 instead. I'm just curious what the last module loaded was. I expected most of the output to scroll off the top of the screen; it's only the last few lines, with a "loading module XXX" message, that I was hoping to catch. How much survives depends somewhat on how big the stack trace is, though.

I'm not sure why kmdb is unusable, but it does ring a bell. Perhaps I ran into that problem in the past, but I don't recall what the fault was :-(

Regards,
Brian


Juris Krumins wrote:
I've tried.
But the lines scroll by too fast (is it possible to enable some kind of
paging?), and the only thing I can see before "trap type 8 ..." is

kobj_read_file: size ....
kobj_close: 0x82

After that I get "trap type 8 ..." and
panic: entering debugger ....
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq .....


Moreover, after I get "trap type 8 ..." I can't use kmdb because
it's unresponsive.
Maybe you can suggest some tricks (kmdb commands) to get all the
info, because I'm not that experienced with kmdb.

-----Original Message-----
From: Brian Ruthven - Sun UK <[email protected]>
To: Juris Krumins <[email protected]>
Cc: [email protected]
Subject: Re: [osol-discuss] OpenSolaris snv_130 panic.
Date: Thu, 14 Jan 2010 11:12:53 +0000


Juris Krumins wrote:
panic[cpu0]/thread = fffffffffbc2e3a0: mutex_enter: bad mutex, lp=20 
owner=f000e987f000fea0 thread=fffffffffbc2e3a0

unix:mutex_panic+73 ()
unix:mutex_vector_enter+446 ()
genunix:zone_getspecific+2b ()
genunix:core+5f ()
unix:kern_gpfault+18e ()
unix:trap+41e ()
unix:cmntrap+e6 ()


I've dug a little through src.opensolaris.org, and it seems to me that mutex_panic comes from the dispinit() call at startup.c:1517. dispinit() calls mutex_enter(&cpu_lock) at disp.c:221, which is where mutex_panic() comes from.

That's not quite right. The call to mutex_panic comes from mutex_vector_enter, and is caused by a "NULL" pointer being passed to mutex_enter (actually the value 0x20, but this was probably the result of taking the address of a struct member at offset 0x20 from a NULL pointer).

I'm intrigued that the mutex address (the "lp=" part in the panic message) indicates what should be unmapped memory, and I would have expected it to panic with a page fault (or whatever the x86 equivalent of a BAD TRAP type 0x31 is). Instead, it appears to have found the value 0xf000e987f000fea0 in there, which failed the validation by mutex_vector_enter.

There are three possible calls to mutex_panic from mutex_vector_enter, but only one causes "bad mutex":

        if (!MUTEX_TYPE_ADAPTIVE(lp)) {
                mutex_panic("mutex_enter: bad mutex", lp);
                return;
        }


Anyway, I digress somewhat. The problem is further down the stack, where zone_getspecific attempts to acquire zone_lock:


void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
        struct zsd_entry *t;
        void *data;

        mutex_enter(&zone->zone_lock);


Sure enough, zone_lock is at offset 0x20:

 > ::print -a zone_t zone_lock
20 zone_lock {
    20 zone_lock._opaque
}


So "zone" was NULL: the caller of zone_getspecific, which was core(), passed in a NULL pointer.

Rather worryingly, when I look at the core() function on my system (albeit snv_128), just before the return address on the stack I see:

 > core::dis
[snip]
core+0x4d:                      movl   +0x3255d5(%rip),%edi    <core_zone_key>
core+0x53:                      movq   +0x2dbba6(%rip),%rsi    <global_zone>
core+0x5a:                      call   +0x187701        <zone_getspecific>
core+0x5f:                      movq   %rax,%r15

This corresponds to the first call to zone_getspecific:

        global_cg = zone_getspecific(core_zone_key, global_zone);

So it would seem that the global zone pointer 'global_zone' was not yet initialised when you took this panic. It starts life as NULL and is initialised by zone_init, which happens very early in boot. I don't know offhand what user-land processes have been started by then, but I'm prepared for the answer to be "none". If so, why is a trap being taken in user mode?

Can you try this next time please:

    Add the "-kd" option to the kernel$ line as you did before.
    At the kmdb prompt, type:
        moddebug/W 0xe0000000
        :c

With moddebug at this value, a lot of output will be printed. See what the last "loading XXX" message was just before the "trap type 8..." message. Also, gather the output of ::ps to check which user-land processes are running. I suspect the list will be short :-)



Thanks,
Brian




--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG

_______________________________________________
opensolaris-discuss mailing list
[email protected]
