On 02/09/2025 11:17 am, Manuel Bouyer wrote:
> Hello,
> I'm trying to boot a NetBSD PVH dom0 on Xen 4.20.
> The same NetBSD kernel works fine with Xen 4.18.
>
> The boot options are:
> menu=Boot netbsd-current PVH Xen420:dev hd0f:;load /netbsd-PVH console=com0
> root=wd0f; multiboot /xen420-debug.gz dom0_mem=1024M console=com1
> com1=38400,8n1 loglvl=all guest_loglvl=all gnttab_max_nr_frames=64
> sync_console=1 dom0=pvh
>
> and the full log from the serial console is attached.
>
> With 4.20 the boot fails with:
>
> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
> (XEN) Freed 664kB init memory
> (XEN) d0v0 Triple fault - invoking HVM shutdown action 1
> (XEN) *** Dumping Dom0 vcpu#0 state: ***
> (XEN) ----[ Xen-4.20.2-pre_20250821nb0  x86_64  debug=y  Tainted: C ]----
> (XEN) CPU:    7
> (XEN) RIP:    0008:[<000000000020e268>]
> (XEN) RFLAGS: 0000000000010006   CONTEXT: hvm guest (d0v0)
> (XEN) rax: 000000002024c003   rbx: 000000000020e260   rcx: 00000000000dfeb7
> (XEN) rdx: 0000000000100000   rsi: 0000000000103000   rdi: 000000000013e000
> (XEN) rbp: 0000000080000000   rsp: 00000000014002e4   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
> (XEN) cr3: 0000000000000000   cr2: 0000000000000000
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0010   es: 0010   fs: 0000   gs: 0000   ss: 0010   cs: 0008
>
> Because of the triple fault, the RIP above doesn't point to the code.
>
> I tracked it down to this code:
>     cmpl    $0,%ecx                 ;  /* zero-sized? */      \
>     je      2f                      ;                         \
>     pushl   %ebp                    ;                         \
>     movl    RELOC(nox_flag),%ebp    ;                         \
> 1:  movl    %ebp,(PDE_SIZE-4)(%ebx) ;  /* upper 32 bits: NX */ \
>     movl    %eax,(%ebx)             ;  /* store phys addr */  \
>     addl    $PDE_SIZE,%ebx          ;  /* next PTE/PDE */     \
>     addl    $PAGE_SIZE,%eax         ;  /* next phys page */   \
>     loop    1b                      ;                         \
>     popl    %ebp                    ;                         \
> 2:                                  ;
>
> There are other pushl/popl instructions before this, so I don't think
> that's the problem (in fact the exact same fragment is called just before
> with different inputs and it doesn't fault). So the culprit is probably
> the write to (%ebx), which would be 0x20e260.
> This is in the range:
> (XEN) [0000000000100000, 0000000040068e77] (usable)
> so I can't see why this would be a problem.
>
> Any idea, including how to debug this further, is welcome.
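For readers less used to the macro's assembly, the quoted fragment can be read roughly as the C loop below. This is a minimal sketch, assuming 8-byte entries (PDE_SIZE == 8) and 4 KiB pages; `fill_entries()` and its parameter names are hypothetical, chosen only to mirror the registers, and are not taken from the NetBSD sources. Storing the two 32-bit halves at (PDE_SIZE-4)(%ebx) and (%ebx) on little-endian x86 produces one 64-bit entry, which is what the single store in the sketch models.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; entries are 8 bytes, i.e. PDE_SIZE == 8. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical C rendering of the quoted fill loop:
 *   entry    ~ %ebx : first PTE/PDE to write
 *   pa_flags ~ %eax : physical address of the first page, OR'ed with flags
 *   count    ~ %ecx : number of entries to write
 *   nx_hi    ~ %ebp : value loaded from nox_flag (upper 32 bits of entry) */
static void fill_entries(uint64_t *entry, uint32_t pa_flags, uint32_t count,
                         uint32_t nx_hi)
{
    assert(count == 0 || entry != NULL);

    /* The asm tests %ecx first because `loop` decrements %ecx before
     * testing it, so a zero count would otherwise run 2^32 iterations. */
    for (uint32_t i = 0; i < count; i++) {
        /* upper half: NX; lower half: phys addr | flags */
        entry[i] = ((uint64_t)nx_hi << 32) | pa_flags;
        /* next phys page; uint32_t wraps just like addl */
        pa_flags += SKETCH_PAGE_SIZE;
    }
}
```

In this reading, each iteration writes exactly one entry at `entry[i]`, so the guest address being stored to on iteration i is the initial %ebx plus i * PDE_SIZE.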
Even though triple faults are aborts, they're generally accurate under
virt, so 0x20e268 is most likely where things die.

Your best bet is to instrument hvm_triple_fault(), e.g.:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 23bd7f078a1d..0f960576b3e6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1746,14 +1746,22 @@ void hvm_hlt(unsigned int eflags)
 
 void hvm_triple_fault(void)
 {
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
     struct vcpu *v = current;
     struct domain *d = v->domain;
     u8 reason = d->arch.hvm.params[HVM_PARAM_TRIPLE_FAULT_REASON];
+    char insns[32];
 
     gprintk(XENLOG_ERR,
             "Triple fault - invoking HVM shutdown action %d\n",
             reason);
     vcpu_show_execution_state(v);
+
+    if ( !copy_from_user_hvm(insns, _p(regs->rip), ARRAY_SIZE(insns)) )
+        printk("Guest code stream: %32ph\n", insns);
+    else
+        printk("Bad pagetables for %%rip\n");
+
     domain_shutdown(d, reason);
 }
 

This will try to dump the bytes at %rip, which should get you the
instruction that finally caused the fault (compile tested only).

Given that 4.18 works and 4.20 doesn't, it's probably to do with the
initial memory map of the guest.  Do you have the full logs of both boots?

Just for sanity's sake (I don't know if it's going to be relevant or
not), boot with dom0=pvh,pf-fixup, which just might highlight whether
we've got a mixup between the memory map presented to the guest and the
system's.

~Andrew
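When comparing the memory maps from the 4.18 and 4.20 boot logs, a small helper like the one below can check whether a given store (address plus access width) falls entirely inside a range the log reports as usable. This is a hypothetical sketch: `struct mem_range`, `find_range()` and `access_is_usable()` are illustrative names, not Xen or NetBSD APIs, and the sketch assumes the end bound printed in the log (e.g. `[0000000000100000, 0000000040068e77]`) is inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transcription of one memory-map line from a boot log.
 * Assumption: both printed bounds are inclusive. */
struct mem_range {
    uint64_t start;
    uint64_t end;    /* inclusive */
    int usable;      /* 1 for "(usable)", 0 for any other type */
};

/* Return the range containing addr, or NULL if no range covers it. */
static const struct mem_range *find_range(const struct mem_range *map,
                                          size_t n, uint64_t addr)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addr >= map[i].start && addr <= map[i].end)
            return &map[i];
    return NULL;
}

/* True if all len bytes starting at addr lie inside one usable range.
 * Comparing len - 1 against end - addr avoids overflowing addr + len. */
static int access_is_usable(const struct mem_range *map, size_t n,
                            uint64_t addr, uint64_t len)
{
    const struct mem_range *r = find_range(map, n, addr);

    return r != NULL && r->usable && len > 0 && len - 1 <= r->end - addr;
}
```

The map entries would be transcribed by hand from each boot's log; nothing here parses Xen's output. For instance, with the single range quoted above, an 8-byte store at 0x20e260 is reported as usable, which matches the reasoning in the original report.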