On 09/12/2020 10:15, Manuel Bouyer wrote:
> On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote:
>> On 08/12/2020 17:57, Manuel Bouyer wrote:
>>> Hello,
>>> for the first time I tried to boot a xen kernel from devel with
>>> a NetBSD PV dom0. The kernel boots, but when the first userland prcess
>>> is launched, it seems to enter a loop involving search_pre_exception_table()
>>> (I see an endless stream from the dprintk() at arch/x86/extable.c:202)
>>>
>>> With xen 4.13 I see it, but exactly once:
>>> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8
>>>
>>> with devel:
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8    
>>>     
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8    
>>>     
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8    
>>>     
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8    
>>>     
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8    
>>>     
>>> [...]
>>>
>>> the dom0 kernel is the same.
>>>
>>> At first glance it looks like a fault in the guest is not handled at it 
>>> should,
>>> and the userland process keeps faulting on the same address.
>>>
>>> Any idea what to look at ?
>> That is a reoccurring fault on IRET back to guest context, and is
>> probably caused by some unwise-in-hindsight cleanup which doesn't
>> escalate the failure to the failsafe callback.
>>
>> This ought to give something useful to debug with:
> thanks, I got:
> (XEN) IRET fault: #PF[0000]                                                 
> (XEN) domain_crash called from extable.c:209                                
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:                                   
> (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:   C   ]----       
> (XEN) CPU:    0                                                             
> (XEN) RIP:    0047:[<00007f7e184007d0>]                                     
> (XEN) RFLAGS: 0000000000000202   EM: 0   CONTEXT: pv guest (d0v0)           
> (XEN) rax: ffff82d04038c309   rbx: 0000000000000000   rcx: 000000000000e008 
> (XEN) rdx: 0000000000010086   rsi: ffff83007fcb7f78   rdi: 000000000000e010 
> (XEN) rbp: 0000000000000000   rsp: 00007f7fff53e3e0   r8:  0000000e00000000 
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000 
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000 
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000002660 
> (XEN) cr3: 0000000079cdb000   cr2: 00007f7fff53e3e0                         
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: ffffffff80cf2dc0   
>  
> (XEN) ds: 0023   es: 0023   fs: 0000   gs: 0000   ss: 003f   cs: 0047       
> (XEN) Guest stack trace from rsp=00007f7fff53e3e0:          
> (XEN)    0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000
> (XEN)    0000000000000003 000000004b600040 0000000000000004 0000000000000038
> (XEN)    0000000000000005 0000000000000008 0000000000000006 0000000000001000
> (XEN)    0000000000000007 00007f7e18400000 0000000000000008 0000000000000000
> (XEN)    0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000
> (XEN)    00000000000007d1 0000000000000000 00000000000007d2 0000000000000000
> (XEN)    00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000
> (XEN)    00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000
> (XEN)    6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Pagefaults on IRET come either from stack accesses for operands (not the
case here as Xen is otherwise working fine), or from segement selector
loads for %cs and %ss.

In this example, %ss is in the LDT, which specifically does use
pagefaults to promote the frame to PGT_segdesc.

I suspect that what is happening is that handle_ldt_mapping_fault() is
failing to promote the page (for some reason), and we're taking the "In
hypervisor mode? Leave it to the #PF handler to fix up." path due to the
confusion in context, and Xen's #PF handler is concluding "nothing else
to do".

The older behaviour of escalating to the failsafe callback would have
broken this cycle by rewriting %ss and re-entering the kernel.


Please try the attached debugging patch, which is an extension of what I
gave you yesterday.  First, it ought to print %cr2, which I expect will
point to Xen's virtual mapping of the vcpu's LDT.  The logic ought to
loop a few times so we can inspect the hypervisor codepaths which are
effectively livelocked in this state, and I've also instrumented
check_descriptor() failures because I've got a gut feeling that is the
root cause of the problem.

~Andrew
>From 841a6950fec5b43b370653e0c833a54fed64882e Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.coop...@citrix.com>
Date: Wed, 9 Dec 2020 12:50:38 +0000
Subject: extable-dbg


diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 70972f1085..88b05bef38 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -191,6 +191,10 @@ static int __init stub_selftest(void)
 __initcall(stub_selftest);
 #endif
 
+#include <xen/sched.h>
+#include <xen/softirq.h>
+const char *vec_name(unsigned int vec);
+
 unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
@@ -199,7 +203,21 @@ search_pre_exception_table(struct cpu_user_regs *regs)
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
-        dprintk(XENLOG_INFO, "Pre-exception: %p -> %p\n", _p(addr), _p(fixup));
+        static int count;
+
+        printk(XENLOG_ERR "IRET fault: %s[%04x]\n",
+               vec_name(regs->entry_vector), regs->error_code);
+
+        if ( regs->entry_vector == X86_EXC_PF )
+            printk(XENLOG_ERR "%%cr2 %016lx\n", read_cr2());
+
+        if ( count++ > 2 )
+        {
+            domain_crash(current->domain);
+            for ( ;; )
+                do_softirq();
+        }
+
         perfc_incr(exception_fixed);
     }
     return fixup;
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index 39c1a2311a..6bc58bba67 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -282,6 +282,10 @@ int validate_segdesc_page(struct page_info *page)
 
     unmap_domain_page(descs);
 
+    if ( i != 512 )
+        printk_once("Check Descriptor failed: idx %u, a: %08x, b: %08x\n",
+                    i, descs[i].a, descs[i].b);
+
     return i == 512 ? 0 : -EINVAL;
 }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 0459cee9fb..1059f3ce66 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -687,7 +687,7 @@ const char *trapstr(unsigned int trapnr)
     return trapnr < ARRAY_SIZE(strings) ? strings[trapnr] : "???";
 }
 
-static const char *vec_name(unsigned int vec)
+const char *vec_name(unsigned int vec)
 {
     static const char names[][4] = {
 #define P(x) [X86_EXC_ ## x] = "#" #x

Reply via email to