xen panic
Hello,
build from 202005272200Z panics on Xen, on both i386 and amd64 but for
different reasons:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/

i386 fails early at boot with:
[ 1.000] panic: kernel diagnostic assertion "!*zpte" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/pmap.c", line 3832 pmap_zero_page: lock botch
[ 1.000] cpu0: Begin traceback...
[ 1.000] vpanic(c059333c,c0da7e00,c0da7e2c,c0127fb3,c059333c,c058e222,c0593bf7,c0592904,ef8,1) at netbsd:vpanic+0x134
[ 1.000] kern_assert(c059333c,c058e222,c0593bf7,c0592904,ef8,1,bfe07090,8000,0,c17a8830) at netbsd:kern_assert+0x23
[ 1.000] pmap_zero_page(1ffef000,0,c03d53c5,c0c875aa,c0c862b0,1,8,c0c875aa,2,1) at netbsd:pmap_zero_page+0x1e3
[ 1.000] uvm_pagealloc_strat(0,0,0,0,3,0,0,15554000,1ffee000,0) at netbsd:uvm_pagealloc_strat+0x2d6
[ 1.000] pmap_get_physpage(8,1,3abee003,1,10002,8,8,8,28,c) at netbsd:pmap_get_physpage+0x203
[ 1.000] pmap_growkernel(d6cfd000,c05b90ea,c17a9000,15554000,1000,0,0,0,0,10002) at netbsd:pmap_growkernel+0xce
[ 1.000] uvm_km_bootstrap(c17a9000,f560,0,c17a9000,f560,c0da7fb0,c055f14a,e,3,9) at netbsd:uvm_km_bootstrap+0x2c8
[ 1.000] uvm_init(e,3,9,2,0,0,c0da5000,7ff,c0e1b000,756e6547) at netbsd:uvm_init+0x63

amd64 can boot and run tests, but panics with:
kernel/t_trapsignal (97/860): 20 test cases
    bus_handle: [0.193910s] Passed.
    bus_handle_recurse: [0.201020s] Passed.
    bus_ignore: [0.200598s] Passed.
    bus_mask: [0.199164s] Passed.
    bus_simple: [0.199066s] Passed.
    fpe_handle: [0.210561s] Passed.
    fpe_handle_recurse: [ 872.0704774] panic: kernel diagnostic assertion "curlwp->l_md.md_flags & MDL_FPU_IN_CPU" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/fpu.c", line 487
[ 872.0704774] cpu0: Begin traceback...
[ 872.0704774] vpanic() at netbsd:vpanic+0x146
[ 872.0704774] kern_assert() at netbsd:kern_assert+0x48
[ 872.0704774] fputrap() at netbsd:fputrap+0x171
[ 872.0704774] cpu0: End traceback...
[ 872.0704774] dumping to dev 168,1 (offset=524254, size=0): not possible
[ 872.0704774] rebooting...

Any idea what could have changed to cause this?
2020-05-26 08:40 UTC builds did complete tests.

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: pmap lock changes: Xen panic
Manuel,

On Tue, Jan 07, 2020 at 10:39:33AM +0100, Manuel Bouyer wrote:
> Hello,
> with 2020-01-05 00:40 UTC sources, Xen domUs panic because of what looks like
> locking changes in the pmap code (full log at
> http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/):
> mlock error: Mutex: mutex_vector_enter,506: assertion failed: !cpu_intr_p(): lock 0xc0b46300 cpu 0 lwp 0xc1aac100
> [ 39.3701414] cpu0: Begin traceback...
> [ 39.3701414] vpanic(c05b72e8,d7c7c728,d7c7c744,c041c699,c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300) at netbsd:vpanic+0x134
> [ 39.3701414] panic(c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300,0,c1aac100,d7c7c768,c03d2ec9) at netbsd:panic+0x18
> [ 39.3701414] lockdebug_abort(c05746b4,1fa,c0b46300,c0b414c8,c05b0454,c0b46240,c0b46300,d7c7c794,c03d3405,c05b0454) at netbsd:lockdebug_abort+0xc9
> [ 39.3701414] mutex_abort(c05b0454,0,d39e7004,79339,c0b37340,c1aac100,c0b46240,c0b46300,c1a5e1e8,d7c7c7dc) at netbsd:mutex_abort+0x39
> [ 39.3701414] mutex_vector_enter(c0b46300,d7c7c7b8,d7c7c7b4,c03cc135,86908857,d7c7a024,c0b37340,c1aac100,d7c7c7d4,c011924f) at netbsd:mutex_vector_enter+0x355
> [ 39.3701414] pmap_extract_ma(c0b46240,d6db3000,d7c7c820,0,c1733508,6,d7e7e000,c1a5e1e0,6,c1a5e008) at netbsd:pmap_extract_ma+0x1a
> [ 39.3701414] xbd_diskstart(c189e908,c2672e2c,1c0,d7c7c884,c011d25e,c0b35880,c1a5e018,32b0,c1a5e008,) at netbsd:xbd_diskstart+0x234
> [ 39.3701414] dk_start(c1a5e008,0,c23fc4dc,23f1000,0,1,c2670558,c1a5e1e0,6,d7e7e000) at netbsd:dk_start+0xea
> [ 39.3701414] xbd_handler(c1a5e008,6,d7c7c978,c18a6318,d7c7c924,c011cc99,c18a6318,d7c7c978,d7c7ca3c,c04a1a7d) at netbsd:xbd_handler+0x12e
> [ 39.3701414] xen_intr_biglock_wrapper(c18a6318,d7c7c978,d7c7ca3c,c04a1a7d,c1cba008,23f,0,d7c7c9ec,d7c7ca10,c0c589b8) at netbsd:xen_intr_biglock_wrapper+0x1f
> [ 39.3701414] evtchn_do_event(6,d7c7c978,c0e670fc,c0e62548,c0e68494,0,c0b37340,c0d9a000,c0b37340,0) at netbsd:evtchn_do_event+0xf9
> [ 39.3701414] do_hypervisor_callback(d7c7c978,0,11,31,11,c0b40011,0,0,d7c7ca44,bfec0d70) at netbsd:do_hypervisor_callback+0x15f
> [ 39.3701414] Xhypervisor_pvhvm_callback(c0b46240,d81ae000,696f3000,1,1ef3000,0,1,21,7ff0,21) at netbsd:Xhypervisor_pvhvm_callback+0x67
>
> In rev 1.35 of xen/x86/xen_pmap.c the pmap lock is taken unconditionally,
> even in the pmap_kernel() case, which means pmap_extract_ma() can't be
> used from interrupt context any more. I don't think we can impose such
> restrictions on pmap_kernel(); bus_dma(9) needs it.

This was a mistake on my part. We should never need to lock pmap_kernel()
for pmap_extract() since it only touches the PTEs. It should be fixed now
with xen_pmap.c 1.37.

Cheers,
Andrew
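
To illustrate the point about pmap_kernel() above, here is a minimal user-space sketch in C. It is not the code from xen_pmap.c 1.37: the struct, the model_* names and the in_interrupt flag are all hypothetical stand-ins (for the pmap, its mutex and cpu_intr_p()). It only shows the idea that a lookup on the kernel pmap skips the lock and so stays usable from interrupt context, while other pmaps are still locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these are the real NetBSD types. */
struct model_pmap {
	bool     is_kernel;   /* stands in for "pmap == pmap_kernel()" */
	int      lock_depth;  /* stands in for the pmap mutex being held */
	int      lock_taken;  /* how many times the lock was ever acquired */
	uint64_t pte[4];      /* toy page table: index -> address, 0 = invalid */
};

static bool in_interrupt;     /* stands in for cpu_intr_p() */

static void
model_lock(struct model_pmap *pm)
{
	/* Same kind of check that fired in the trace: no mutex from interrupt. */
	assert(!in_interrupt);
	pm->lock_depth++;
	pm->lock_taken++;
}

static void
model_unlock(struct model_pmap *pm)
{
	pm->lock_depth--;
}

/* Look up an address; take the lock only for non-kernel pmaps. */
static bool
model_extract_ma(struct model_pmap *pm, unsigned idx, uint64_t *pap)
{
	bool locked = !pm->is_kernel;
	bool ok = false;

	if (locked)
		model_lock(pm);
	if (idx < 4 && pm->pte[idx] != 0) {
		*pap = pm->pte[idx];
		ok = true;
	}
	if (locked)
		model_unlock(pm);
	return ok;
}
```

With in_interrupt set, calling model_extract_ma() on a pmap whose is_kernel is true never reaches the assertion in model_lock(), which is the property bus_dma(9) relies on in the report above.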
pmap lock changes: Xen panic
Hello,
with 2020-01-05 00:40 UTC sources, Xen domUs panic because of what looks like
locking changes in the pmap code (full log at
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/):
mlock error: Mutex: mutex_vector_enter,506: assertion failed: !cpu_intr_p(): lock 0xc0b46300 cpu 0 lwp 0xc1aac100
[ 39.3701414] cpu0: Begin traceback...
[ 39.3701414] vpanic(c05b72e8,d7c7c728,d7c7c744,c041c699,c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300) at netbsd:vpanic+0x134
[ 39.3701414] panic(c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300,0,c1aac100,d7c7c768,c03d2ec9) at netbsd:panic+0x18
[ 39.3701414] lockdebug_abort(c05746b4,1fa,c0b46300,c0b414c8,c05b0454,c0b46240,c0b46300,d7c7c794,c03d3405,c05b0454) at netbsd:lockdebug_abort+0xc9
[ 39.3701414] mutex_abort(c05b0454,0,d39e7004,79339,c0b37340,c1aac100,c0b46240,c0b46300,c1a5e1e8,d7c7c7dc) at netbsd:mutex_abort+0x39
[ 39.3701414] mutex_vector_enter(c0b46300,d7c7c7b8,d7c7c7b4,c03cc135,86908857,d7c7a024,c0b37340,c1aac100,d7c7c7d4,c011924f) at netbsd:mutex_vector_enter+0x355
[ 39.3701414] pmap_extract_ma(c0b46240,d6db3000,d7c7c820,0,c1733508,6,d7e7e000,c1a5e1e0,6,c1a5e008) at netbsd:pmap_extract_ma+0x1a
[ 39.3701414] xbd_diskstart(c189e908,c2672e2c,1c0,d7c7c884,c011d25e,c0b35880,c1a5e018,32b0,c1a5e008,) at netbsd:xbd_diskstart+0x234
[ 39.3701414] dk_start(c1a5e008,0,c23fc4dc,23f1000,0,1,c2670558,c1a5e1e0,6,d7e7e000) at netbsd:dk_start+0xea
[ 39.3701414] xbd_handler(c1a5e008,6,d7c7c978,c18a6318,d7c7c924,c011cc99,c18a6318,d7c7c978,d7c7ca3c,c04a1a7d) at netbsd:xbd_handler+0x12e
[ 39.3701414] xen_intr_biglock_wrapper(c18a6318,d7c7c978,d7c7ca3c,c04a1a7d,c1cba008,23f,0,d7c7c9ec,d7c7ca10,c0c589b8) at netbsd:xen_intr_biglock_wrapper+0x1f
[ 39.3701414] evtchn_do_event(6,d7c7c978,c0e670fc,c0e62548,c0e68494,0,c0b37340,c0d9a000,c0b37340,0) at netbsd:evtchn_do_event+0xf9
[ 39.3701414] do_hypervisor_callback(d7c7c978,0,11,31,11,c0b40011,0,0,d7c7ca44,bfec0d70) at netbsd:do_hypervisor_callback+0x15f
[ 39.3701414] Xhypervisor_pvhvm_callback(c0b46240,d81ae000,696f3000,1,1ef3000,0,1,21,7ff0,21) at netbsd:Xhypervisor_pvhvm_callback+0x67

In rev 1.35 of xen/x86/xen_pmap.c the pmap lock is taken unconditionally,
even in the pmap_kernel() case, which means pmap_extract_ma() can't be
used from interrupt context any more. I don't think we can impose such
restrictions on pmap_kernel(); bus_dma(9) needs it.

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: Xen panic in lwp_need_userret()
OK, I just checked in a fix.

Andrew

On Fri, Nov 29, 2019 at 09:42:44AM +0100, Manuel Bouyer wrote:
> On Tue, Nov 26, 2019 at 01:38:08PM +, Andrew Doran wrote:
> > Hi Manuel,
> >
> > On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:
> >
> > > Any chance this has been fixed since 2 days ago?
> >
> > Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.
>
> Well, the 201911261940Z now panics with:
> [ 1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [ 1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
> [ 1.030] cpu0: Begin traceback...
> [ 1.030] vpanic() at netbsd:vpanic+0x146
> [ 1.030] kern_assert() at netbsd:kern_assert+0x48
> [ 1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
> [ 1.030] hardclock() at netbsd:hardclock+0xc4
> [ 1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
> [ 1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
> [ 1.030] --- interrupt ---
> [ 1.030] Xspllower() at netbsd:Xspllower+0xe
> [ 1.030] cpu0: End traceback...
>
> (http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)
>
> --
> Manuel Bouyer
>      NetBSD: 26 years of experience will always make the difference
> --
Re: Xen panic in lwp_need_userret()
Hi,

On Fri, Nov 29, 2019 at 09:42:44AM +0100, Manuel Bouyer wrote:
> > Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.
>
> Well, the 201911261940Z now panics with:
> [ 1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [ 1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
> [ 1.030] cpu0: Begin traceback...
> [ 1.030] vpanic() at netbsd:vpanic+0x146
> [ 1.030] kern_assert() at netbsd:kern_assert+0x48
> [ 1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
> [ 1.030] hardclock() at netbsd:hardclock+0xc4
> [ 1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
> [ 1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
> [ 1.030] --- interrupt ---
> [ 1.030] Xspllower() at netbsd:Xspllower+0xe
> [ 1.030] cpu0: End traceback...
>
> (http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)

You have a CPU hog stuck in kernel and it's trying to force it off with a
kernel preemption, but NetBSD/xen doesn't have kernel preemption. I'll fix
it when I get home from work later.

Andrew
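
The situation described above can be sketched in plain C. This is an assumption-laden model, not the fix that was committed: the MODEL_* flag values, model_resched_flags() and the has_kpreempt parameter are hypothetical. It shows one way a request for kernel preemption could be downgraded to a user preemption on a platform without kernel preemption, so that a check like "(flags & RESCHED_UPREEMPT) != 0" would hold afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not taken from the NetBSD headers. */
#define MODEL_RESCHED_UPREEMPT 0x01 /* preempt on return to user mode */
#define MODEL_RESCHED_KPREEMPT 0x02 /* preempt while still in the kernel */

/*
 * Hypothetical helper: if kernel preemption was requested but the
 * platform cannot do it, fall back to a user preemption request.
 * All other flag bits are passed through unchanged.
 */
static int
model_resched_flags(int flags, bool has_kpreempt)
{
	if ((flags & MODEL_RESCHED_KPREEMPT) != 0 && !has_kpreempt) {
		flags &= ~MODEL_RESCHED_KPREEMPT;
		flags |= MODEL_RESCHED_UPREEMPT;
	}
	return flags;
}
```

On a platform with kernel preemption (has_kpreempt true) the request is returned untouched; only the no-kpreempt case is rewritten.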
Re: Xen panic in lwp_need_userret()
On Tue, Nov 26, 2019 at 01:38:08PM +, Andrew Doran wrote:
> Hi Manuel,
>
> On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:
>
> > Any chance this has been fixed since 2 days ago?
>
> Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.

Well, the 201911261940Z now panics with:
[ 1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
[ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
[ 1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
[ 1.030] cpu0: Begin traceback...
[ 1.030] vpanic() at netbsd:vpanic+0x146
[ 1.030] kern_assert() at netbsd:kern_assert+0x48
[ 1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
[ 1.030] hardclock() at netbsd:hardclock+0xc4
[ 1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
[ 1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
[ 1.030] --- interrupt ---
[ 1.030] Xspllower() at netbsd:Xspllower+0xe
[ 1.030] cpu0: End traceback...

(http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: Xen panic in lwp_need_userret()
Hi Manuel,

On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:
> Any chance this has been fixed since 2 days ago?

Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.

Cheers,
Andrew
Xen panic in lwp_need_userret()
Hello,
as shown here:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/i386/201911240030Z_anita.txt
Xen domU panics at boot with:
[ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
[ 1.030] panic: kernel diagnostic assertion "lwp_locked(l, NULL)" failed: file "/home/source/ab/HEAD/src/sys/kern/kern_lwp.c", line 1641
[ 1.030] cpu0: Begin traceback...
[ 1.030] vpanic(c0576fec,c0d6af0c,c0d6af20,c03cabc2,c0576fec,c0576f86,c05a4b17,c05a77c4,669,cce48314) at netbsd:vpanic+0x134
[ 1.030] kern_assert(c0576fec,c0576f86,c05a4b17,c05a77c4,669,cce48314,c0d6af4c,c03edf47,c0b36ce0,330d9a57) at netbsd:kern_assert+0x23
[ 1.030] lwp_need_userret(c0b36ce0,330d9a57,11db16,c0b36ce0,32c,0,c03e0b00,0,314,c0d6af64) at netbsd:lwp_need_userret+0x52
[ 1.030] softint_schedule(314,c03e0b00,0,0,c0d6afb0,c0548f9a,c0c89c18,6,3,0) at netbsd:softint_schedule+0xd7
[ 1.030] rnd_init_softint(c0c89c18,6,3,0,3,9,1,0,0,c0d68000) at netbsd:rnd_init_softint+0x5e
[ 1.030] main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x3ca
[ 1.030] cpu0: End traceback...
[ 1.030] fatal breakpoint trap in supervisor mode
[ 1.030] trap type 1 code 0 eip 0xc0105574 cs 0x9 eflags 0x202 cr2 0 ilevel 0x8 esp 0xc0d6aef0
[ 1.030] curlwp 0xc0b36ce0 pid 0 lid 1 lowest kstack 0xc0d682c0

Any chance this has been fixed since 2 days ago?

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--