http://defect.opensolaris.org/bz/show_bug.cgi?id=6702


Eric Saxe <eric.saxe at sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|eric.saxe at sun.com           |bill.holler at sun.com
             Status|ACCEPTED                    |CAUSEKNOWN




--- Comment #2 from Eric Saxe <eric.saxe at sun.com>  2009-02-18 16:01:35 ---
Investigating on PIT's system:


panic[cpu0]/thread=fec25460: BAD TRAP: type=e (#pf Page fault) rp=fec406dc
addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xfe861f02, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 0 cr3: c803000
         gs: 794a01b0  fs: c6420000  es:      160  ds:      160
        edi: 10000000 esi: fec43388 ebp: fec40774 esp: fec40714
        ebx: fec266b8 edx:        1 ecx:       1c eax:        0
        trp:        e err:        0 eip:        0  cs:      158
        efl:    10282 usp: fe861f02  ss: fec266b8

cpu  address     timestamp type  vc  handler   pc
  0 fec10650   21eac7f4d4 trap   e      #pf 0
  0 fec105a8   21e9c0bef8 intr  d1 cbe_fire mod_hold_by_name_common+5e
  0 fec10500   21e8404f20 intr  d1 cbe_fire atomic_add_int+b
  0 fec10458   21e6bfea54 intr  d1 cbe_fire htable_e2va+44
  0 fec103b0   21e53f8a74 intr  d1 cbe_fire hati_demap_func+56
  0 fec10308   21e3bf2fac intr  d1 cbe_fire x86pte_mapin+107
  0 fec10260   21e26d15a4 intr   4  asyintr dispatch_softint+13
  0 fec101b8   21e26d0a54 intr  ff unknown  fakesoftint+13
  0 fec10110   21e264b100 intr   4  asyintr dispatch_softint+13
  0 fec10068   21e2648fec intr  ff unknown  fakesoftint+13

fec40618 unix:die+a1 (e, fec406dc, 0, 0)
fec406c8 unix:trap+179b (fec406dc, 0, 0)
fec406dc unix:cmntrap+10b (794a01b0, c6420000,)
fec40774 0 (caa92dc0)
fec407b4 genunix:thread_create+466 (0, 0, fea75430, c99)
fec40814 genunix:taskq_create_common+1f3 (fec4086c, 0, 1, 3c,)
fec40844 genunix:taskq_create_instance+35 (fec4086c, 0, 1, 3c,)
fec40894 genunix:ddi_taskq_create+81 (c959bc50, feb26c64,)
fec408c4 genunix:attach_node+c5 (c959bc50, 1, fec409)
fec408f4 genunix:i_ndi_config_node+dc (c959bc50, 6, 0, fff)
fec40914 genunix:i_ddi_attachchild+61 (c959bc50, 0)
fec40944 genunix:i_ddi_attach_node_hierarchy+62 (c959bc50, 41000, 20)
fec40984 genunix:attach_driver_nodes+51 (40, fec95f80, fec40)
fec409c4 genunix:ddi_hold_installed_driver+152 (40)
fec409e4 genunix:i_ddi_forceattach_drivers+2f (c0ffc0, fecae720, c)
fec40a04 genunix:main+1b1 ()

panic: entering debugger (no dump device, continue to reboot)

thread_create+0x466 is the return address of the indirect call through the
CL_SETRUN function pointer:
> thread_create+0x466::dis
thread_create+0x45a:            movl   0x50(%esi),%eax
thread_create+0x45d:            movl   0x3c(%eax),%eax
thread_create+0x460:            subl   $0xc,%esp
thread_create+0x463:            pushl  %esi
thread_create+0x464:            call   *%eax
thread_create+0x466:            addl   $0x4,%esp
thread_create+0x469:            pushl  0x120(%esi)
thread_create+0x46f:            call   -0x131ee8        <disp_lock_exit>

Since this is a taskq thread being created, it belongs to the sys scheduling
class, so CL_SETRUN is setbackdq().
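As a rough illustration of that dispatch, the C sketch below models CL_SETRUN
as an indirect call through a per-thread class-operations table. This is the
same load, load, call sequence visible in the disassembly above. The struct
layouts, the field names and sketch_setbackdq() are simplified assumptions for
illustration, not the actual kernel definitions.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for kthread_t and the per-class
 * operations vector; the real kernel structures have many more fields. */
typedef struct kthread kthread_t;

typedef struct thread_ops {
	void (*cl_setrun)(kthread_t *);	/* slot CL_SETRUN() calls through */
} thread_ops_t;

struct kthread {
	const thread_ops_t *t_clfuncs;	/* ops for the thread's class */
	int t_queued;			/* illustrative: set by setrun */
};

/*
 * Same shape as the disassembly: load t->t_clfuncs, load the setrun
 * slot, then "call *%eax". Structure offsets differ in this sketch.
 */
#define	CL_SETRUN(t)	((t)->t_clfuncs->cl_setrun(t))

/* Stand-in for setbackdq(): only records that the thread was queued. */
static void
sketch_setbackdq(kthread_t *t)
{
	t->t_queued = 1;
}

/* SYS-class ops: setrun resolves to (the stand-in for) setbackdq(). */
static const thread_ops_t sys_thread_ops = { sketch_setbackdq };
```

A taskq thread in this model would carry &sys_thread_ops, so CL_SETRUN(t) ends
up in sketch_setbackdq(t).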

Looking at setbackdq(), I grew suspicious of the disp_enq_thread function
pointer, and sure enough, on this machine it is NULL at the time of the panic.
Calling through a NULL pointer transfers control to address 0, which matches
the pc=0x0, addr=0x0 page fault above.
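The enqueue hook can be modeled the same way. In this hedged C sketch,
disp_enq_thread is a statically initialized function pointer whose default
target is generic_enq_thread(), as described in this comment. The one-field
cpu_t, the (cpu_t *, int) signature and enqueue_notify() are assumptions for
illustration. Unlike the kernel path in the trace, the sketch checks for NULL,
so the failure can be observed rather than jumping to address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal cpu_t; the real structure is far larger. */
typedef struct cpu {
	int cpu_poked;		/* illustrative: set when the hook runs */
} cpu_t;

/* Stand-in for the default hook named in the comment. */
static void
generic_enq_thread(cpu_t *cp, int bound)
{
	(void) bound;
	cp->cpu_poked = 1;
}

/* Statically initialized, so it should never be NULL after boot. */
static void (*disp_enq_thread)(cpu_t *, int) = generic_enq_thread;

/*
 * Tail of a setbackdq()-like enqueue. The crash shows the real path
 * calling the hook with no NULL check; this sketch returns -1 instead
 * of faulting so the bad state is visible.
 */
static int
enqueue_notify(cpu_t *cp, int bound)
{
	void (*fn)(cpu_t *, int) = disp_enq_thread;

	if (fn == NULL)
		return (-1);	/* the kernel would fault at pc=0 here */
	fn(cp, bound);
	return (0);
}
```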

Since disp_enq_thread is statically initialized to generic_enq_thread(),
something must be wiping the function pointer out prior to the panic. Setting a
watchpoint on disp_enq_thread early in boot stops at:

SunOS Release 5.11 Version pad-gate 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
DEBUG enabled
Loaded modules: [ scsi_vhci mac uppc ufs specfs pcplusmp cpu.generic ]
kmdb: stop on write of [disp_enq_thread, disp_enq_thread+1)
kmdb: target stopped at:
cpu_idle_fini+0x39:     movl   -0x1c(%ebp),%eax
[0]> $c
cpu_idle_fini+0x39(fec266b8, fe8eaab4, 5, fec266b8)
cpu_idle_init+0x1d4(fec266b8, fe8eb52c, 0, 0)
cpupm_init+0x113(fec266b8, fe8f0978, fec409e4, fe83a5db)
post_startup+0xad()
main+0x137()
_locore_start+0x2da()

Disassembling around the area:
cpu_idle_fini+0x29:             movl   %ebx,0xfec048c0  <idle_cpu>
cpu_idle_fini+0x2f:             movl   0xfec4861c,%eax  <non_deep_idle_disp_enq_thread>
cpu_idle_fini+0x34:             movl   %eax,0xfec048bc  <disp_enq_thread>
cpu_idle_fini+0x39:             movl   -0x1c(%ebp),%eax
cpu_idle_fini+0x3c:             movl   0x6c(%eax),%esi
cpu_idle_fini+0x3f:             testl  %esi,%esi

So it looks like we're replacing disp_enq_thread with whatever
non_deep_idle_disp_enq_thread has, which is:

[0]> non_deep_idle_disp_enq_thread/X
non_deep_idle_disp_enq_thread:
non_deep_idle_disp_enq_thread:  0 

So there's the problem. cpu_idle_fini() seems to be trying to restore
disp_enq_thread, but it's doing so with a NULL pointer.
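The save/restore pattern implied by the disassembly can be sketched in C.
Only disp_enq_thread, non_deep_idle_disp_enq_thread and generic_enq_thread
come from the trace; idle_init(), idle_fini_buggy(), idle_fini_checked() and
deep_idle_enq_thread() are hypothetical stand-ins for cpu_idle_init(),
cpu_idle_fini() and a deep-idle hook. The kmdb stack shows cpu_idle_fini()
being called from cpu_idle_init(); the sketch assumes (unverified) that this
is an error path taken before the original pointer was saved, which would
leave the save slot at its zero initial value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cpu cpu_t;			/* opaque in this sketch */
typedef void (*enq_fn_t)(cpu_t *, int);

static void
generic_enq_thread(cpu_t *cp, int bound)
{
	(void) cp;
	(void) bound;
}

/* Hypothetical hook installed while deep idle is enabled. */
static void
deep_idle_enq_thread(cpu_t *cp, int bound)
{
	(void) cp;
	(void) bound;
}

static enq_fn_t disp_enq_thread = generic_enq_thread;

/* Save slot: zero-initialized like any static, i.e. NULL until saved. */
static enq_fn_t non_deep_idle_disp_enq_thread;

/* Hypothetical init: save the current hook, then install our own. */
static void
idle_init(void)
{
	non_deep_idle_disp_enq_thread = disp_enq_thread;
	disp_enq_thread = deep_idle_enq_thread;
}

/* Mirrors the store at cpu_idle_fini+0x34: restore unconditionally. */
static void
idle_fini_buggy(void)
{
	disp_enq_thread = non_deep_idle_disp_enq_thread;
}

/* Guarded restore: only undo what idle_init() actually saved. */
static void
idle_fini_checked(void)
{
	if (non_deep_idle_disp_enq_thread != NULL) {
		disp_enq_thread = non_deep_idle_disp_enq_thread;
		non_deep_idle_disp_enq_thread = NULL;
	}
}
```

Calling idle_fini_buggy() before idle_init() reproduces the NULL hook seen
here. The guarded restore is only one possible fix; another, equally
hypothetical, is to initialize the save slot to generic_enq_thread so that an
early restore is harmless.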
