On 09/12/2020 10:15, Manuel Bouyer wrote:
> On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote:
>> On 08/12/2020 17:57, Manuel Bouyer wrote:
>>> Hello,
>>> for the first time I tried to boot a xen kernel from devel with
>>> a NetBSD PV dom0. The kernel boots, but when the first userland prcess
>>> is launched, it seems to enter a loop involving search_pre_exception_table()
>>> (I see an endless stream from the dprintk() at arch/x86/extable.c:202)
>>>
>>> With xen 4.13 I see it, but exactly once:
>>> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8
>>>
>>> with devel:
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>>
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>>
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>>
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>>
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>>
>>> [...]
>>>
>>> the dom0 kernel is the same.
>>>
>>> At first glance it looks like a fault in the guest is not handled at it
>>> should,
>>> and the userland process keeps faulting on the same address.
>>>
>>> Any idea what to look at ?
>> That is a reoccurring fault on IRET back to guest context, and is
>> probably caused by some unwise-in-hindsight cleanup which doesn't
>> escalate the failure to the failsafe callback.
>>
>> This ought to give something useful to debug with:
> thanks, I got:
> (XEN) IRET fault: #PF[0000]
> (XEN) domain_crash called from extable.c:209
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: 0047:[<00007f7e184007d0>]
> (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0)
> (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008
> (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010
> (XEN) rbp: 0000000000000000 rsp: 00007f7fff53e3e0 r8: 0000000e00000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660
> (XEN) cr3: 0000000079cdb000 cr2: 00007f7fff53e3e0
> (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0
>
> (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047
> (XEN) Guest stack trace from rsp=00007f7fff53e3e0:
> (XEN) 0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000
> (XEN) 0000000000000003 000000004b600040 0000000000000004 0000000000000038
> (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000
> (XEN) 0000000000000007 00007f7e18400000 0000000000000008 0000000000000000
> (XEN) 0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000
> (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000
> (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000
> (XEN) 00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000
> (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Pagefaults on IRET come either from stack accesses for operands (not the
case here as Xen is otherwise working fine), or from segment selector
loads for %cs and %ss.

In this example, %ss is in the LDT, which specifically does use
pagefaults to promote the frame to PGT_segdesc.

I suspect that what is happening is that handle_ldt_mapping_fault() is
failing to promote the page (for some reason), and we're taking the "In
hypervisor mode? Leave it to the #PF handler to fix up." path due to the
confusion in context, and Xen's #PF handler is concluding "nothing else
to do".

The older behaviour of escalating to the failsafe callback would have
broken this cycle by rewriting %ss and re-entering the kernel.

Please try the attached debugging patch, which is an extension of what I
gave you yesterday. First, it ought to print %cr2, which I expect will
point to Xen's virtual mapping of the vcpu's LDT. The logic ought to
loop a few times so we can inspect the hypervisor codepaths which are
effectively livelocked in this state, and I've also instrumented
check_descriptor() failures because I've got a gut feeling that is the
root cause of the problem.

~Andrew

From 841a6950fec5b43b370653e0c833a54fed64882e Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.coop...@citrix.com>
Date: Wed, 9 Dec 2020 12:50:38 +0000
Subject: extable-dbg

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 70972f1085..88b05bef38 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -191,6 +191,10 @@ static int __init stub_selftest(void)
 __initcall(stub_selftest);
 #endif
 
+#include <xen/sched.h>
+#include <xen/softirq.h>
+const char *vec_name(unsigned int vec);
+
 unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
@@ -199,7 +203,21 @@ search_pre_exception_table(struct cpu_user_regs *regs)
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
-        dprintk(XENLOG_INFO, "Pre-exception: %p -> %p\n", _p(addr), _p(fixup));
+        static int count;
+
+        printk(XENLOG_ERR "IRET fault: %s[%04x]\n",
+               vec_name(regs->entry_vector), regs->error_code);
+
+        if ( regs->entry_vector == X86_EXC_PF )
+            printk(XENLOG_ERR "%%cr2 %016lx\n", read_cr2());
+
+        if ( count++ > 2 )
+        {
+            domain_crash(current->domain);
+            for ( ;; )
+                do_softirq();
+        }
+
         perfc_incr(exception_fixed);
     }
     return fixup;
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index 39c1a2311a..6bc58bba67 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -282,6 +282,10 @@ int validate_segdesc_page(struct page_info *page)
 
     unmap_domain_page(descs);
 
+    if ( i != 512 )
+        printk_once("Check Descriptor failed: idx %u, a: %08x, b: %08x\n",
+                    i, descs[i].a, descs[i].b);
+
     return i == 512 ? 0 : -EINVAL;
 }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 0459cee9fb..1059f3ce66 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -687,7 +687,7 @@ const char *trapstr(unsigned int trapnr)
     return trapnr < ARRAY_SIZE(strings) ? strings[trapnr] : "???";
 }
 
-static const char *vec_name(unsigned int vec)
+const char *vec_name(unsigned int vec)
 {
     static const char names[][4] = {
 #define P(x) [X86_EXC_ ## x] = "#" #x