Sorry - 0xc0000000 is a bit over-zealous (it's what I had to use for a
different problem).
Try 0x80000000 instead. I'm just curious what the last module loaded
was. I expected the majority of the output to scroll off the top of the
screen - it's only the last few lines I was hoping to catch with a
"loading module XXX" message. It does depend somewhat on how big the
stack trace is though.
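As a quick sanity check on the moddebug values that come up in this thread (0xe0000000, 0xc0000000 and 0x80000000), the sketch below only does the bit arithmetic. It deliberately does not name the individual flags, and the helper names bits_set and extra_bits are made up for illustration.

```c
#include <stdint.h>

/* Count how many bits are set in a moddebug-style value. */
int
bits_set(uint32_t value)
{
	int count = 0;

	while (value != 0) {
		count += (int)(value & 1u);
		value >>= 1;
	}
	return (count);
}

/* Bits that are set in 'wide' but not in 'narrow'. */
uint32_t
extra_bits(uint32_t wide, uint32_t narrow)
{
	return (wide & ~narrow);
}
```

With these, extra_bits(0xc0000000, 0x80000000) is 0x40000000 and extra_bits(0xe0000000, 0x80000000) is 0x60000000, so 0x80000000 is a strict subset of both larger values: it keeps only the top bit and drops whatever extra output the other one or two bits enable.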
I'm not sure why kmdb is unusable, but it does ring a bell. Maybe I
suffered that problem in the past. I don't recall what the fault was
though :-(
Regards,
Brian
Juris Krumins wrote:
I've tried.
But the messages scroll past too fast (is it possible to do some kind of
paging?), and the only things I can see before "trap type 8 ..." are
kobj_read_file: size ....
kobj_close: 0x82
after that I have "trap type 8 ..." and
panic: entering debugger ....
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq .....
More than that, after I get "trap type 8 ..." I can't use kmdb because
it's unresponsive.
Maybe you can suggest some tricks (kmdb commands) to get all the
info, because I'm not that experienced with kmdb.
-----Original Message-----
From: Brian Ruthven - Sun UK <[email protected]>
To: Juris Krumins <[email protected]>
Cc: [email protected]
Subject: Re: [osol-discuss] OpenSolaris snv_130 panic.
Date: Thu, 14 Jan 2010 11:12:53 +0000
Juris Krumins wrote:
panic[cpu0]/thread = fffffffffbc2e3a0: mutex_enter: bad mutex, lp=20
owner=f000e987f000fea0 thread=fffffffffbc2e3a0
unix: mutex_panic + 73 ()
unix: mutex_vector_enter +446()
genunix: zone_getspecific+2b ()
genunix: core+5f ()
unix: kern_gpfault+18e ()
unix: trap+41e ()
unix: cmntrap + e6 ()
I've dug a little through src.opensolaris.org, and it seems to me that mutex_panic comes from the dispinit() call at startup.c:1517.
dispinit() then calls mutex_enter(&cpu_lock) at disp.c:221, which is the cause of mutex_panic().
That's not quite right. The call to mutex_panic is from
mutex_vector_enter, and is caused by a "NULL" pointer being passed to
mutex_enter (actually the value 0x20, but this was probably from
dereferencing a struct member at offset 0x20 from a NULL pointer).
I'm intrigued that the mutex address (the "lp=" part in the panic
message) indicates what should be unmapped memory, and I would have
expected it to panic with a page fault (or whatever the x86 equivalent
of a BAD TRAP type 0x31 is). Instead, it appears to have found the value
0xf000e987f000fea0 in there, which failed the validation by
mutex_vector_enter.
There are three possible calls to mutex_panic from mutex_vector_enter,
but only one causes "bad mutex":
	if (!MUTEX_TYPE_ADAPTIVE(lp)) {
		mutex_panic("mutex_enter: bad mutex", lp);
		return;
	}
Anyway, I digress somewhat. The problem is further down the stack -
zone_getspecific attempts to acquire zone_lock:
void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
	struct zsd_entry *t;
	void *data;
	mutex_enter(&zone->zone_lock);
Sure enough, zone_lock is at offset 0x20:
> ::print -a zone_t zone_lock
20 zone_lock {
20 zone_lock._opaque
}
So "zone" was NULL, and the caller of zone_getspecific passed in a NULL
- this was core().
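The "NULL plus member offset" arithmetic can be sketched in a few lines of C. This is only an illustration: struct fake_zone and struct fake_mutex are hypothetical stand-ins, not the real zone_t layout, and the 0x20 bytes of padding are there purely to mirror the ::print output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel mutex; one opaque word. */
struct fake_mutex {
	uint64_t opaque;
};

/*
 * Hypothetical stand-in for zone_t. NOT the real layout: just 0x20 bytes
 * of placeholder members followed by the lock.
 */
struct fake_zone {
	unsigned char other_members[0x20];
	struct fake_mutex zone_lock;
};

/* Compile-time check that the lock really sits at offset 0x20 here. */
static_assert(offsetof(struct fake_zone, zone_lock) == 0x20,
    "zone_lock is expected at offset 0x20");

/*
 * The address that &zone->zone_lock works out to for a given base
 * address, computed arithmetically so nothing is ever dereferenced.
 */
uintptr_t
lock_address(uintptr_t zone_base)
{
	return (zone_base + offsetof(struct fake_zone, zone_lock));
}
```

lock_address(0) comes out as 0x20, which matches the "lp=20" in the panic message: taking the address of a member through a NULL pointer yields just the member's offset.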
Rather worryingly, if I look at the core() function on my system (albeit
snv_128), just before the return address on the stack (core+0x5f) I see:
> core::dis
[snip]
core+0x4d: movl +0x3255d5(%rip),%edi <core_zone_key>
core+0x53: movq +0x2dbba6(%rip),%rsi <global_zone>
core+0x5a: call +0x187701 <zone_getspecific>
core+0x5f: movq %rax,%r15
This corresponds to the first call to zone_getspecific:
global_cg = zone_getspecific(core_zone_key, global_zone);
So it would seem that the global zone pointer 'global_zone' was not yet
initialised when you took this panic. It starts life as a NULL and is
initialised by zone_init, which happens very early in boot. I don't know
offhand what user-land processes have been started by then, but I'm
prepared for the answer to be "none". In which case, why is a trap taken
in user mode?
Can you try this next time please:
Add the "-kd" option to the kernel$ line as you did before.
At the kmdb prompt, type:
moddebug/W 0xe0000000
:c
Moddebug at this value will cause much to be printed out. See what the
last "loading XXX" message was just before the "trap type 8..." message.
Also, gather the output of ::ps to check what user-land processes are
running. I suspect the list will be short :-)
Thanks,
Brian
--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG