http://defect.opensolaris.org/bz/show_bug.cgi?id=6702
Eric Saxe <eric.saxe at sun.com> changed:
What      |Removed              |Added
----------------------------------------------------------------------------
AssignedTo|eric.saxe at sun.com  |bill.holler at sun.com
Status    |ACCEPTED             |CAUSEKNOWN
--- Comment #2 from Eric Saxe <eric.saxe at sun.com> 2009-02-18 16:01:35 ---
Investigating on PIT's system:
panic[cpu0]/thread=fec25460: BAD TRAP: type=e (#pf Page fault) rp=fec406dc
addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xfe861f02, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 0 cr3: c803000
gs: 794a01b0 fs: c6420000 es: 160 ds: 160
edi: 10000000 esi: fec43388 ebp: fec40774 esp: fec40714
ebx: fec266b8 edx: 1 ecx: 1c eax: 0
trp: e err: 0 eip: 0 cs: 158
efl: 10282 usp: fe861f02 ss: fec266b8
cpu address timestamp type vc handler pc
0 fec10650 21eac7f4d4 trap e #pf 0
0 fec105a8 21e9c0bef8 intr d1 cbe_fire mod_hold_by_name_common+5e
0 fec10500 21e8404f20 intr d1 cbe_fire atomic_add_int+b
0 fec10458 21e6bfea54 intr d1 cbe_fire htable_e2va+44
0 fec103b0 21e53f8a74 intr d1 cbe_fire hati_demap_func+56
0 fec10308 21e3bf2fac intr d1 cbe_fire x86pte_mapin+107
0 fec10260 21e26d15a4 intr 4 asyintr dispatch_softint+13
0 fec101b8 21e26d0a54 intr ff unknown fakesoftint+13
0 fec10110 21e264b100 intr 4 asyintr dispatch_softint+13
0 fec10068 21e2648fec intr ff unknown fakesoftint+13
fec40618 unix:die+a1 (e, fec406dc, 0, 0)
fec406c8 unix:trap+179b (fec406dc, 0, 0)
fec406dc unix:cmntrap+10b (794a01b0, c6420000,)
fec40774 0 (caa92dc0)
fec407b4 genunix:thread_create+466 (0, 0, fea75430, c99)
fec40814 genunix:taskq_create_common+1f3 (fec4086c, 0, 1, 3c,)
fec40844 genunix:taskq_create_instance+35 (fec4086c, 0, 1, 3c,)
fec40894 genunix:ddi_taskq_create+81 (c959bc50, feb26c64,)
fec408c4 genunix:attach_node+c5 (c959bc50, 1, fec409)
fec408f4 genunix:i_ndi_config_node+dc (c959bc50, 6, 0, fff)
fec40914 genunix:i_ddi_attachchild+61 (c959bc50, 0)
fec40944 genunix:i_ddi_attach_node_hierarchy+62 (c959bc50, 41000, 20)
fec40984 genunix:attach_driver_nodes+51 (40, fec95f80, fec40)
fec409c4 genunix:ddi_hold_installed_driver+152 (40)
fec409e4 genunix:i_ddi_forceattach_drivers+2f (c0ffc0, fecae720, c)
fec40a04 genunix:main+1b1 ()
panic: entering debugger (no dump device, continue to reboot)
thread_create+0x466 is the return address of the indirect call through the
CL_SETRUN function pointer:
> thread_create+0x466::dis
thread_create+0x45a: movl 0x50(%esi),%eax
thread_create+0x45d: movl 0x3c(%eax),%eax
thread_create+0x460: subl $0xc,%esp
thread_create+0x463: pushl %esi
thread_create+0x464: call *%eax
thread_create+0x466: addl $0x4,%esp
thread_create+0x469: pushl 0x120(%esi)
thread_create+0x46f: call -0x131ee8 <disp_lock_exit>
Since this is a taskq thread being created, it belongs to the sys scheduling
class, so CL_SETRUN is setbackdq().
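To make the indirect call concrete, here is a minimal sketch of dispatching through a per-class operations vector, which is the shape the two loads followed by `call *%eax` suggest. Everything in it (struct layouts, field names, sys_setrun, class_setrun) is a hypothetical stand-in for illustration, not the real Solaris definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel structures. */
struct thread;

struct class_ops {
    void (*cl_setrun)(struct thread *);  /* per-class "make runnable" hook */
};

struct thread {
    const struct class_ops *t_clfuncs;   /* ops vector of the thread's class */
    int t_queued;                        /* set once the thread is enqueued */
};

/* Stand-in for the sys class hook (setbackdq() in the analysis above). */
static void sys_setrun(struct thread *t)
{
    t->t_queued = 1;
}

static const struct class_ops sys_class_ops = { .cl_setrun = sys_setrun };

/*
 * Fetch the ops vector from the thread, fetch the hook from the vector,
 * then call through it. The assert is illustrative only: it turns a call
 * through a NULL hook into an immediate, attributable failure.
 */
static void class_setrun(struct thread *t)
{
    assert(t->t_clfuncs != NULL && t->t_clfuncs->cl_setrun != NULL);
    t->t_clfuncs->cl_setrun(t);
}
```

With a thread whose t_clfuncs points at sys_class_ops, class_setrun() ends up in sys_setrun(), just as the call in thread_create() ends up in the sys class's setrun routine.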
Looking at setbackdq(), I began to grow suspicious of the disp_enq_thread()
function pointer, and sure enough, on this machine it's NULL at the time of the
panic. (Presumably, calling through it is what lands us at pc=0x0.)
Since disp_enq_thread is statically initialized to generic_enq_thread(),
something must be wiping the function pointer out prior to the panic. Setting a
watchpoint on disp_enq_thread early in boot stops at:
SunOS Release 5.11 Version pad-gate 32-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
DEBUG enabled
Loaded modules: [ scsi_vhci mac uppc ufs specfs pcplusmp cpu.generic ]
kmdb: stop on write of [disp_enq_thread, disp_enq_thread+1)
kmdb: target stopped at:
cpu_idle_fini+0x39: movl -0x1c(%ebp),%eax
[0]> $c
cpu_idle_fini+0x39(fec266b8, fe8eaab4, 5, fec266b8)
cpu_idle_init+0x1d4(fec266b8, fe8eb52c, 0, 0)
cpupm_init+0x113(fec266b8, fe8f0978, fec409e4, fe83a5db)
post_startup+0xad()
main+0x137()
_locore_start+0x2da()
Disassembling around the area:
cpu_idle_fini+0x29: movl %ebx,0xfec048c0 <idle_cpu>
cpu_idle_fini+0x2f: movl 0xfec4861c,%eax <non_deep_idle_disp_enq_thread>
cpu_idle_fini+0x34: movl %eax,0xfec048bc <disp_enq_thread>
cpu_idle_fini+0x39: movl -0x1c(%ebp),%eax
cpu_idle_fini+0x3c: movl 0x6c(%eax),%esi
cpu_idle_fini+0x3f: testl %esi,%esi
So it looks like we're replacing disp_enq_thread with whatever
non_deep_idle_disp_enq_thread has, which is:
[0]> non_deep_idle_disp_enq_thread/X
non_deep_idle_disp_enq_thread:
non_deep_idle_disp_enq_thread: 0
So there's the problem. cpu_idle_fini() seems to be trying to restore
disp_enq_thread, but it's doing so with a NULL pointer.
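The failure mode can be reproduced in miniature. The sketch below assumes the save into the "previous value" variable simply never ran before the restore did; all names (enq_hook, saved_enq_hook, idle_init, idle_fini_*) are hypothetical, and the guarded variant is only one possible way to avoid the problem, not necessarily the fix that will be chosen for this bug.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*enq_fn_t)(int cpu_id);

static int last_enq_cpu = -1;

static void generic_enq(int cpu_id)   { last_enq_cpu = cpu_id; }
static void deep_idle_enq(int cpu_id) { last_enq_cpu = cpu_id; }

/* Plays the role of disp_enq_thread: statically initialized. */
static enq_fn_t enq_hook = generic_enq;
/* Plays the role of non_deep_idle_disp_enq_thread: NULL until saved. */
static enq_fn_t saved_enq_hook;

/* Save the current hook, then install the replacement. */
static void idle_init(void)
{
    saved_enq_hook = enq_hook;
    enq_hook = deep_idle_enq;
}

/* Unguarded restore: writes NULL over the hook if nothing was saved. */
static void idle_fini_unguarded(void)
{
    enq_hook = saved_enq_hook;
}

/* Guarded restore: only put back a value that was actually saved. */
static void idle_fini_guarded(void)
{
    if (saved_enq_hook != NULL) {
        enq_hook = saved_enq_hook;
        saved_enq_hook = NULL;
    }
}
```

Calling idle_fini_unguarded() before idle_init() has run leaves enq_hook NULL, so the next call through it jumps to address 0. The guarded version leaves the statically initialized hook alone in that case.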