[xen-discuss] onnv_98 domain 0 panic on xvm built from latest xvm source code

Lu Baolu Wed, 15 Oct 2008 01:26:32 -0700

Hi,

I am trying to build a Nevada domain 0 for the xvm hypervisor, which
was built from the latest source code. I followed the instructions
in the thread below to build xvm.


http://mail.opensolaris.org/pipermail/xen-discuss/2008-May/003278.html

Solaris domain 0 panicked during boot. The panic output is posted
below.



panic[cpu0]/thread=fffffffffbc736e0: BAD TRAP: type=e (#pf Page fault)
rp=fffffffffbca6090 addr=d occurred in module "unix" due to a NULL
pointer dereference

#pf Page fault

Bad kernel fault at addr=0xd
pid=0, pc=0xfffffffffb846d3b, sp=0xfffffffffbca6180, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2620<vmxe,xmme,fxsr,pae>
cr2: d
        rdi:              286 rsi:                0 rdx:         fffffffe
        rcx:                1  r8:                0  r9:            40000
        rax:                d rbx:                0 rbp: fffffffffbca61c0
        r10: fffffffffbc74ab0 r11: ffffff012fe59000 r12:                0
        r13: fffffffffbcb6dc0 r14:                1 r15: ffffff0135bc9580
        fsb:        200000000 gsb: fffffffffbc74ab0  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                e err:                2 rip: fffffffffb846d3b
         cs:             e030 rfl:            10246 rsp: fffffffffbca6180
         ss:             e02b

cpu          address    timestamp type  vc  handler   pc
  0 fffffffffbc1ffc8    dc8add987 trap   e      #pf ec_bind_virq_to_irq+ab
  0 fffffffffbc1fe40    dc8ac5dfb intr   4  asyintr sti+86
  0 fffffffffbc1fcb8    dc8ac53ff intr  ff unknown  fakesoftint+4a
  0 fffffffffbc1fb30    dc8ac386f intr 104 cbe_fire restore_int_flag+fc
  0 fffffffffbc1f9a8    dc84d0cb2 intr 104 cbe_fire restore_int_flag+fc
  0 fffffffffbc1f820    dc6a1eceb intr 104 cbe_fire restore_int_flag+fc
  0 fffffffffbc1f698    dc4c7bd21 intr 104 cbe_fire restore_int_flag+fc
  0 fffffffffbc1f510    dc31c5d55 intr 104 cbe_fire restore_int_flag+fc
  0 fffffffffbc1f388    dc20e75f1 intr  13 uhci_intr HYPERVISOR_sched_op+29
  0 fffffffffbc1f200    dc1efa45a intr  13 uhci_intr HYPERVISOR_sched_op+29

fffffffffbca5f50 unix:die+d2 ()
fffffffffbca6080 unix:trap+162f ()
fffffffffbca6090 unix:cmntrap+24d ()
fffffffffbca61c0 unix:ec_bind_virq_to_irq+ab ()
fffffffffbca61f0 xpv_psm:xen_psm_cpu_start+4b ()
fffffffffbca6210 unix:mach_cpu_start+4a ()
fffffffffbca6270 unix:start_cpu+5e ()
fffffffffbca62b0 unix:start_other_cpus+db ()
fffffffffbca62f0 genunix:main+2bf ()
fffffffffbca6300 unix:_locore_start+80 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci neti xpv_psm zfs uhci hook ip usba specfs sctp arp
xpv_uppc ]

kmdb: target stopped at:
kmdb_enter+0xb: movq   %rax,%rdi

[0]>


The source code of the panicking function is:

i86xpv/os/evtchn.c
 707 int
 708 ec_bind_virq_to_irq(int virq, int cpu)
 709 {
 710         int err;
 711         int evtchn;
 712         mec_info_t *virqp;
 713
 714         virqp = &virq_info[virq];
 715         cmn_err(CE_CONT, "ec_bind_virq_to_irq: virq = %d\n", virq);
 716         mutex_enter(&ec_lock);
 717
 718         err = xen_bind_virq(virq, cpu, &evtchn);
 719         ASSERT(err == 0);
 720
 721         ASSERT(evtchn_to_irq[evtchn] == INVALID_IRQ);
 722
 723         if (virqp->mi_irq == INVALID_IRQ) {
 724                 virqp->mi_irq = alloc_irq(IRQT_VIRQ, virq, evtchn, cpu);
 725         } else {
 726                 alloc_irq_evtchn(virqp->mi_irq, virq, evtchn, cpu);
 727         }
 728
 729         mutex_exit(&ec_lock);
 730         return (virqp->mi_irq);
 731 }


The panic happened between lines 729 and 730. The disassembly of this code is:

[0]> ec_bind_virq_to_irq::dis
ec_bind_virq_to_irq+0x95:       call   -0x97a   <alloc_irq>
ec_bind_virq_to_irq+0x9a:       movw   %ax,0xfffffffffbc46ac0(%r12)     <virq_info+0x200>
ec_bind_virq_to_irq+0xa3:       movq   %r13,%rdi
ec_bind_virq_to_irq+0xa6:       call   +0x16d35 <mutex_exit>
ec_bind_virq_to_irq+0xab:       addb   %al,(%rax)
ec_bind_virq_to_irq+0xad:       addb   %al,(%rax)
ec_bind_virq_to_irq+0xaf:       addb   %al,(%rax)
ec_bind_virq_to_irq+0xb1:       addb   %al,(%rax)
ec_bind_virq_to_irq+0xb3:       sti
ec_bind_virq_to_irq+0xb4:       popq   %r14
ec_bind_virq_to_irq+0xb6:       popq   %r13
ec_bind_virq_to_irq+0xb8:       popq   %r12
ec_bind_virq_to_irq+0xba:       popq   %rbx
ec_bind_virq_to_irq+0xbb:       leave
ec_bind_virq_to_irq+0xbc:       ret
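
A note on reading the listing above: on x86-64 the two-byte sequence
00 00 decodes as "addb %al,(%rax)", so the four identical instructions
from +0xab to +0xb1 correspond to eight zero bytes in the instruction
stream. This says nothing about why the bytes are zero. The following
is only a minimal sketch for checking a dumped byte range for such
pairs; count_zero_addb() is a hypothetical helper introduced here for
illustration and is not part of the Solaris kernel or kmdb.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many consecutive two-byte 00 00 sequences start at buf[0].
 * Each such pair is what a disassembler prints as "addb %al,(%rax)".
 * Hypothetical helper, for illustration only.
 */
static size_t
count_zero_addb(const uint8_t *buf, size_t len)
{
    size_t n = 0;

    /* Stop at the first pair that is incomplete or not all zero. */
    while (2 * n + 1 < len && buf[2 * n] == 0x00 && buf[2 * n + 1] == 0x00)
        n++;
    return (n);
}
```

Given a buffer of eight zero bytes followed by a nonzero byte, the
function returns 4, which matches the four addb lines in the listing.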


"rax" was "0xd" when "addb   %al,(%rax)" was executed. That led to the
panic. However, before mutex_exit() was called, "rax" still contained
a valid pointer. I have no idea why it was changed during
mutex_exit().
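
The register dump in the panic output can be cross-checked against
this. On x86, the "err" field of a page-fault trap is a bit mask and
cr2 holds the faulting address. The following is a minimal sketch, not
kernel code, that decodes the mask; pf_err_t and pf_decode() are
hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an x86 page-fault error code ("err" in the dump). */
typedef struct pf_err {
    bool protection;    /* bit 0: 1 = protection violation, 0 = not present */
    bool write;         /* bit 1: 1 = write access, 0 = read access */
    bool user;          /* bit 2: 1 = user mode, 0 = supervisor mode */
    bool reserved_bit;  /* bit 3: 1 = reserved bit set in a paging entry */
    bool instr_fetch;   /* bit 4: 1 = fault on an instruction fetch */
} pf_err_t;

/* Hypothetical helper: split the error code into its documented bits. */
static pf_err_t
pf_decode(uint64_t err)
{
    pf_err_t d;

    d.protection = (err & 0x01) != 0;
    d.write = (err & 0x02) != 0;
    d.user = (err & 0x04) != 0;
    d.reserved_bit = (err & 0x08) != 0;
    d.instr_fetch = (err & 0x10) != 0;
    return (d);
}
```

For the value err = 2 from the dump, only the write bit is set: a
supervisor-mode write to a page that is not present. That fits a store
through rax = 0xd, with cr2 = d as the faulting address.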

Another strange thing is that when I add an ASSERT between mutex_exit()
and the return statement, the panic disappears. That is:

 728
 729         mutex_exit(&ec_lock);
             ASSERT(virqp != NULL);
 730         return (virqp->mi_irq);
 731 }

I would appreciate any feedback.

Thanks
