Is it possible that the hypervisor writes 64-bits 0's to domain 0
kernel space (ec_bind_virq_to_irq+0x99)? Below is the byte change when
panic happens.
Bytes before kernel start runing:
[0]> fffffffffb846d29,10/B
ec_bind_virq_to_irq+0x99: 41 f bf 84 24 c0
6a c4 fb 41 5e 41 5d 41
5c 5b
Bytes after kernel panic:
[0]> fffffffffb846d29,10/B
ec_bind_virq_to_irq+0x99: 0 0 0 0 0 0
0 0 fb 41 5e 41 5d 41
5c 5b
Thanks,
-Baolu
On Thu, Oct 16, 2008 at 11:06 AM, Lu Baolu <[EMAIL PROTECTED]> wrote:
> Hi Keil,
>
> The assemble of ec_bind_virq_to_irq() are different between the points
> where the kernel hasn't yet started running and where the panic
> happens. I set a write watch point for the changed place, but nothing
> happens before panic. I posted my debug processes below for your
> information.
>
> I removed two cmn_err() lines in ec_bind_virq_to_irq() which I added
> for debugging. So the offset is different from the last post.
>
>
> krtld: Unused kernel arguments: `xen_processor_pmbits=0'.
> Loading kmdb...
>
> Welcome to kmdb
> kmdb: unable to determine terminal type: assuming `vt100'
> Loaded modules: [ unix krtld genunix ]
> [0]> ec_bind_virq_to_irq+0x90::dis
> ec_bind_virq_to_irq+0x90: sti
> ec_bind_virq_to_irq+0x91: movq %r13,%rdi
> ec_bind_virq_to_irq+0x94: call +0x16d37 <mutex_exit>
> ec_bind_virq_to_irq+0x99:
> movswl 0xfffffffffbc46ac0(%r12),%eax <virq_info+0x200>
> << this line of code corrupt before panic
> ec_bind_virq_to_irq+0xa2: popq %r14
> ec_bind_virq_to_irq+0xa4: popq %r13
> ec_bind_virq_to_irq+0xa6: popq %r12
> ec_bind_virq_to_irq+0xa8: popq %rbx
> ec_bind_virq_to_irq+0xa9: leave
> ec_bind_virq_to_irq+0xaa: ret
> ec_bind_virq_to_irq+0xab: leaq +0x110296(%rip),%rdi
> <0xfffffffffb956fd8>
> [0]> ec_bind_virq_to_irq+0x90::wp -w -L 8
> [0]> ec_bind_virq_to_irq+0x98::wp -w -L 8
> [0]> ec_bind_virq_to_irq+0xa0::wp -w -L 2
> [0]> $b
> ID S TA HT LM Description Action
> ----- - -- -- -- ----------------------------------------
> ----------------------
> [ 1 ] + 0 0 stop on write of [ec_bind_virq_to_irq+0x -
> [ 2 ] + 0 0 stop on write of [ec_bind_virq_to_irq+0x -
> [ 3 ] + 0 0 stop on write of [ec_bind_virq_to_irq+0x -
> [0]> :c
> WARNING: Cannot extract revision on this hypervisor version:
> v-unstable-xvm, unexpected version format
> v3.3-unstable-xvm chgset 'Tue Oct 14 12:54:19 2008 +0800 17779:88ed7e891155'
>
> SunOS Release 5.11 Version onnv-98 64-bit
> Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
> Use is subject to license terms.
> DEBUG enabled
>
> panic[cpu0]/thread=fffffffffbc736e0: BAD TRAP: type=e (#pf Page fault)
> rp=fffffffffbca6090 addr=d occurred in module "unix" due to a NULL
> pointer dereference
>
> #pf Page fault
> Bad kernel fault at addr=0xd
> pid=0, pc=0xfffffffffb846d29, sp=0xfffffffffbca6180, eflags=0x10246
> cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2620<vmxe,xmme,fxsr,pae>
> cr2: d
> rdi: 286 rsi: 0 rdx: fffffffe
> rcx: 1 r8: 0 r9: 40000
> rax: d rbx: 0 rbp: fffffffffbca61c0
> r10: fffffffffbc74ab0 r11: ffffff012fe59000 r12: 0
> r13: fffffffffbcb6dc0 r14: 1 r15: ffffff0135a8d580
> fsb: 200000000 gsb: fffffffffbc74ab0 ds: 0
> es: 0 fs: 0 gs: 0
> trp: e err: 2 rip: fffffffffb846d29
> cs: e030 rfl: 10246 rsp: fffffffffbca6180
> ss: e02b
>
> cpu address timestamp type vc handler pc
> 0 fffffffffbc20f18 6afbf91146 trap e #pf ec_bind_virq_to_irq+99
> 0 fffffffffbc20d90 6afb5cd249 intr 104 cbe_fire inflate_fast+112
> 0 fffffffffbc20c08 6afb155e6b intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc20a80 6afb02521b intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc208f8 6afaeef1e9 intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc20770 6afad1750e intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc205e8 6afab239db intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc20460 6afa8da042 intr 13 uhci_intr HYPERVISOR_sched_op+29
> 0 fffffffffbc202d8 6afa725ad3 intr 13 uhci_intr copy_pattern+1f
> 0 fffffffffbc20150 6afa618a15 intr 13 uhci_intr HYPERVISOR_sched_op+29
>
> fffffffffbca5f50 unix:die+d2 ()
> fffffffffbca6080 unix:trap+162f ()
> fffffffffbca6090 unix:cmntrap+24d ()
> fffffffffbca61c0 unix:ec_bind_virq_to_irq+99 ()
> fffffffffbca61f0 xpv_psm:xen_psm_cpu_start+4b ()
> fffffffffbca6210 unix:mach_cpu_start+4a ()
> fffffffffbca6270 unix:start_cpu+5e ()
> fffffffffbca62b0 unix:start_other_cpus+db ()
> fffffffffbca62f0 genunix:main+2bf ()
> fffffffffbca6300 unix:_locore_start+80 ()
>
> panic: entering debugger (no dump device, continue to reboot)
>
> Loaded modules: [ scsi_vhci neti xpv_psm zfs uhci hook ip usba specfs sctp arp
> xpv_uppc ]
> kmdb: target stopped at:
> kmdb_enter+0xb: movq %rax,%rdi
> [0]> ec_bind_virq_to_irq+0x90::dis
> ec_bind_virq_to_irq+0x90: sti
> ec_bind_virq_to_irq+0x91: movq %r13,%rdi
> ec_bind_virq_to_irq+0x94: call +0x16d37 <mutex_exit>
> ec_bind_virq_to_irq+0x99: addb %al,(%rax)
> << assemble after corrupt
> ec_bind_virq_to_irq+0x9b: addb %al,(%rax)
> ec_bind_virq_to_irq+0x9d: addb %al,(%rax)
> ec_bind_virq_to_irq+0x9f: addb %al,(%rax)
> ec_bind_virq_to_irq+0xa1: sti
> ec_bind_virq_to_irq+0xa2: popq %r14
> ec_bind_virq_to_irq+0xa4: popq %r13
> ec_bind_virq_to_irq+0xa6: popq %r12
> [0]> ::dis
> ec_bind_virq_to_irq+0xa8: popq %rbx
> ec_bind_virq_to_irq+0xa9: leave
> ec_bind_virq_to_irq+0xaa: ret
> ec_bind_virq_to_irq+0xab: leaq +0x110296(%rip),%rdi
> <0xfffffffffb956fd8>
> [0]>
>
> End of debugging information.
>
> Thanks,
> -Baolu
>
> On Wed, Oct 15, 2008 at 5:13 PM, Juergen Keil <[EMAIL PROTECTED]> wrote:
>>> This panic happened between line 729 and 730. The disassemble of this code
>>> is:
>>>
>>> [0]> ec_bind_virq_to_irq::dis
>>> ec_bind_virq_to_irq+0x95: call -0x97a <alloc_irq>
>>> ec_bind_virq_to_irq+0x9a:
>>> movw %ax,0xfffffffffbc46ac0(%r12) <virq_info+0x200>
>>> ec_bind_virq_to_irq+0xa3: movq %r13,%rdi
>>> ec_bind_virq_to_irq+0xa6: call +0x16d35 <mutex_exit>
>>> ec_bind_virq_to_irq+0xab: addb %al,(%rax)
>>> ec_bind_virq_to_irq+0xad: addb %al,(%rax)
>>> ec_bind_virq_to_irq+0xaf: addb %al,(%rax)
>>> ec_bind_virq_to_irq+0xb1: addb %al,(%rax)
>>> ec_bind_virq_to_irq+0xb3: sti
>>> ec_bind_virq_to_irq+0xb4: popq %r14
>>> ec_bind_virq_to_irq+0xb6: popq %r13
>>> ec_bind_virq_to_irq+0xb8: popq %r12
>>> ec_bind_virq_to_irq+0xba: popq %rbx
>>> ec_bind_virq_to_irq+0xbb: leave
>>> ec_bind_virq_to_irq+0xbc: ret
>>
>>
>> The code starting at ec_bind_virq_to_irq+0xab looks corrupted.
>>
>> Seems as if someone stored a 64-bit "0" to that location, corrupting
>> the original code.
>>
>>
>> When you boot the dom0 kernel with options "-kd" and disassemble
>> ec_bind_virq_to_irq+0xab (before the kernel starts running),
>> does the code look different?
>>
>> In case it's already corrupted before the kernel starts running:
>> try to check the dom0 kernel binary /platform/i86xpv/kernel/amd64/unix
>> with "dis -F ec_bind_virq_to_irq /platform/i86xpv/kernel/amd64/unix",
>> check if ec_bind_virq_to_irq+0xab contains reasonble code there.
>>
>>
>> If case the corruption happens at kernel runtime, it might be possble
>> to find the source of the corruption using kmdb and a write watchpoint:
>>
>> ec_bind_virq_to_irq+0xab::wp -w -L 8
>>
>>
>>
>
_______________________________________________
xen-discuss mailing list
[email protected]