xen panic

2020-05-30 Thread Manuel Bouyer
Hello,
The build from 202005272200Z panics on Xen on both i386 and amd64, but for
different reasons: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/

i386 fails early at boot with:
[   1.000] panic: kernel diagnostic assertion "!*zpte" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/pmap.c", line 3832 pmap_zero_page: lock botch
[   1.000] cpu0: Begin traceback...
[   1.000] vpanic(c059333c,c0da7e00,c0da7e2c,c0127fb3,c059333c,c058e222,c0593bf7,c0592904,ef8,1) at netbsd:vpanic+0x134
[   1.000] kern_assert(c059333c,c058e222,c0593bf7,c0592904,ef8,1,bfe07090,8000,0,c17a8830) at netbsd:kern_assert+0x23
[   1.000] pmap_zero_page(1ffef000,0,c03d53c5,c0c875aa,c0c862b0,1,8,c0c875aa,2,1) at netbsd:pmap_zero_page+0x1e3
[   1.000] uvm_pagealloc_strat(0,0,0,0,3,0,0,15554000,1ffee000,0) at netbsd:uvm_pagealloc_strat+0x2d6
[   1.000] pmap_get_physpage(8,1,3abee003,1,10002,8,8,8,28,c) at netbsd:pmap_get_physpage+0x203
[   1.000] pmap_growkernel(d6cfd000,c05b90ea,c17a9000,15554000,1000,0,0,0,0,10002) at netbsd:pmap_growkernel+0xce
[   1.000] uvm_km_bootstrap(c17a9000,f560,0,c17a9000,f560,c0da7fb0,c055f14a,e,3,9) at netbsd:uvm_km_bootstrap+0x2c8
[   1.000] uvm_init(e,3,9,2,0,0,c0da5000,7ff,c0e1b000,756e6547) at netbsd:uvm_init+0x63

amd64 can boot and run tests, but panics with:
kernel/t_trapsignal (97/860): 20 test cases
bus_handle: [0.193910s] Passed.
bus_handle_recurse: [0.201020s] Passed.
bus_ignore: [0.200598s] Passed.
bus_mask: [0.199164s] Passed.
bus_simple: [0.199066s] Passed.
fpe_handle: [0.210561s] Passed.
fpe_handle_recurse: [ 872.0704774] panic: kernel diagnostic assertion "curlwp->l_md.md_flags & MDL_FPU_IN_CPU" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/fpu.c", line 487
[ 872.0704774] cpu0: Begin traceback...
[ 872.0704774] vpanic() at netbsd:vpanic+0x146
[ 872.0704774] kern_assert() at netbsd:kern_assert+0x48
[ 872.0704774] fputrap() at netbsd:fputrap+0x171
[ 872.0704774] cpu0: End traceback...

[ 872.0704774] dumping to dev 168,1 (offset=524254, size=0): not possible
[ 872.0704774] rebooting...

Any idea what could have changed to cause this?
The 2020-05-26 08:40 UTC builds did complete the tests.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: pmap lock changes: Xen panic

2020-01-07 Thread Andrew Doran
Manuel,

On Tue, Jan 07, 2020 at 10:39:33AM +0100, Manuel Bouyer wrote:

> Hello,
> with 2020-01-05 00:40 UTC sources, Xen domUs panic because of what looks like
> locking changes in the pmap code (full log at
> http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/):
> mlock error: Mutex: mutex_vector_enter,506: assertion failed: !cpu_intr_p(): lock 0xc0b46300 cpu 0 lwp 0xc1aac100
> [  39.3701414] cpu0: Begin traceback...
> [  39.3701414] vpanic(c05b72e8,d7c7c728,d7c7c744,c041c699,c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300) at netbsd:vpanic+0x134
> [  39.3701414] panic(c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300,0,c1aac100,d7c7c768,c03d2ec9) at netbsd:panic+0x18
> [  39.3701414] lockdebug_abort(c05746b4,1fa,c0b46300,c0b414c8,c05b0454,c0b46240,c0b46300,d7c7c794,c03d3405,c05b0454) at netbsd:lockdebug_abort+0xc9
> [  39.3701414] mutex_abort(c05b0454,0,d39e7004,79339,c0b37340,c1aac100,c0b46240,c0b46300,c1a5e1e8,d7c7c7dc) at netbsd:mutex_abort+0x39
> [  39.3701414] mutex_vector_enter(c0b46300,d7c7c7b8,d7c7c7b4,c03cc135,86908857,d7c7a024,c0b37340,c1aac100,d7c7c7d4,c011924f) at netbsd:mutex_vector_enter+0x355
> [  39.3701414] pmap_extract_ma(c0b46240,d6db3000,d7c7c820,0,c1733508,6,d7e7e000,c1a5e1e0,6,c1a5e008) at netbsd:pmap_extract_ma+0x1a
> [  39.3701414] xbd_diskstart(c189e908,c2672e2c,1c0,d7c7c884,c011d25e,c0b35880,c1a5e018,32b0,c1a5e008,) at netbsd:xbd_diskstart+0x234
> [  39.3701414] dk_start(c1a5e008,0,c23fc4dc,23f1000,0,1,c2670558,c1a5e1e0,6,d7e7e000) at netbsd:dk_start+0xea
> [  39.3701414] xbd_handler(c1a5e008,6,d7c7c978,c18a6318,d7c7c924,c011cc99,c18a6318,d7c7c978,d7c7ca3c,c04a1a7d) at netbsd:xbd_handler+0x12e
> [  39.3701414] xen_intr_biglock_wrapper(c18a6318,d7c7c978,d7c7ca3c,c04a1a7d,c1cba008,23f,0,d7c7c9ec,d7c7ca10,c0c589b8) at netbsd:xen_intr_biglock_wrapper+0x1f
> [  39.3701414] evtchn_do_event(6,d7c7c978,c0e670fc,c0e62548,c0e68494,0,c0b37340,c0d9a000,c0b37340,0) at netbsd:evtchn_do_event+0xf9
> [  39.3701414] do_hypervisor_callback(d7c7c978,0,11,31,11,c0b40011,0,0,d7c7ca44,bfec0d70) at netbsd:do_hypervisor_callback+0x15f
> [  39.3701414] Xhypervisor_pvhvm_callback(c0b46240,d81ae000,696f3000,1,1ef3000,0,1,21,7ff0,21) at netbsd:Xhypervisor_pvhvm_callback+0x67
> 
> 
> In rev 1.35 of xen/x86/xen_pmap.c the pmap lock is taken unconditionally,
> even in the pmap_kernel() case, which means pmap_extract_ma() can't be
> used from interrupt context any more. I don't think we can impose such
> restrictions on pmap_kernel(); bus_dma(9) needs it.

This was a mistake on my part.  We should never need to lock pmap_kernel()
for pmap_extract() since it only touches the PTEs.  It should be fixed now
with xen_pmap.c 1.37.
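
The idea can be illustrated with a small user-space model. This is only a sketch of the reasoning above, not the code from xen_pmap.c: every name in it (model_pmap, model_extract, model_in_interrupt and so on) is hypothetical, the lock is a plain counter standing in for an adaptive mutex, and a lock taken in interrupt context is counted instead of causing a panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT the real NetBSD structures. */
#define MODEL_PTE_VALID ((uintptr_t)0x1)

struct model_pmap {
	bool      is_kernel;   /* true for the model of pmap_kernel() */
	int       lock_depth;  /* counter standing in for an adaptive mutex */
	uintptr_t pte;         /* one fake PTE: frame address | valid bit */
};

static bool model_in_interrupt;     /* stands in for cpu_intr_p() */
static int  model_lock_violations;  /* locks taken in interrupt context */

static void
model_lock(struct model_pmap *pm)
{
	/* The real kernel asserts here; the model only counts the violation. */
	if (model_in_interrupt)
		model_lock_violations++;
	pm->lock_depth++;
}

static void
model_unlock(struct model_pmap *pm)
{
	assert(pm->lock_depth > 0);
	pm->lock_depth--;
}

/*
 * Look up the fake PTE. With lock_kernel_too set, the lock is taken
 * unconditionally (the behaviour reported as broken). Otherwise the
 * kernel pmap is read without the lock, since only the PTE is touched.
 */
static bool
model_extract(struct model_pmap *pm, bool lock_kernel_too, uintptr_t *pap)
{
	assert(pm != NULL && pap != NULL);

	bool need_lock = lock_kernel_too || !pm->is_kernel;

	if (need_lock)
		model_lock(pm);
	uintptr_t pte = pm->pte;
	if (need_lock)
		model_unlock(pm);

	if ((pte & MODEL_PTE_VALID) == 0)
		return false;
	*pap = pte & ~MODEL_PTE_VALID;
	return true;
}
```

Calling model_extract() on a kernel pmap with model_in_interrupt set records a violation only when lock_kernel_too is true; with it false, the lookup succeeds without touching the lock.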

Cheers,
Andrew


pmap lock changes: Xen panic

2020-01-07 Thread Manuel Bouyer
Hello,
with 2020-01-05 00:40 UTC sources, Xen domUs panic because of what looks like
locking changes in the pmap code (full log at
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/):
mlock error: Mutex: mutex_vector_enter,506: assertion failed: !cpu_intr_p(): lock 0xc0b46300 cpu 0 lwp 0xc1aac100
[  39.3701414] cpu0: Begin traceback...
[  39.3701414] vpanic(c05b72e8,d7c7c728,d7c7c744,c041c699,c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300) at netbsd:vpanic+0x134
[  39.3701414] panic(c05b72e8,c05b04e0,c05746b4,1fa,c05b0454,c0b46300,0,c1aac100,d7c7c768,c03d2ec9) at netbsd:panic+0x18
[  39.3701414] lockdebug_abort(c05746b4,1fa,c0b46300,c0b414c8,c05b0454,c0b46240,c0b46300,d7c7c794,c03d3405,c05b0454) at netbsd:lockdebug_abort+0xc9
[  39.3701414] mutex_abort(c05b0454,0,d39e7004,79339,c0b37340,c1aac100,c0b46240,c0b46300,c1a5e1e8,d7c7c7dc) at netbsd:mutex_abort+0x39
[  39.3701414] mutex_vector_enter(c0b46300,d7c7c7b8,d7c7c7b4,c03cc135,86908857,d7c7a024,c0b37340,c1aac100,d7c7c7d4,c011924f) at netbsd:mutex_vector_enter+0x355
[  39.3701414] pmap_extract_ma(c0b46240,d6db3000,d7c7c820,0,c1733508,6,d7e7e000,c1a5e1e0,6,c1a5e008) at netbsd:pmap_extract_ma+0x1a
[  39.3701414] xbd_diskstart(c189e908,c2672e2c,1c0,d7c7c884,c011d25e,c0b35880,c1a5e018,32b0,c1a5e008,) at netbsd:xbd_diskstart+0x234
[  39.3701414] dk_start(c1a5e008,0,c23fc4dc,23f1000,0,1,c2670558,c1a5e1e0,6,d7e7e000) at netbsd:dk_start+0xea
[  39.3701414] xbd_handler(c1a5e008,6,d7c7c978,c18a6318,d7c7c924,c011cc99,c18a6318,d7c7c978,d7c7ca3c,c04a1a7d) at netbsd:xbd_handler+0x12e
[  39.3701414] xen_intr_biglock_wrapper(c18a6318,d7c7c978,d7c7ca3c,c04a1a7d,c1cba008,23f,0,d7c7c9ec,d7c7ca10,c0c589b8) at netbsd:xen_intr_biglock_wrapper+0x1f
[  39.3701414] evtchn_do_event(6,d7c7c978,c0e670fc,c0e62548,c0e68494,0,c0b37340,c0d9a000,c0b37340,0) at netbsd:evtchn_do_event+0xf9
[  39.3701414] do_hypervisor_callback(d7c7c978,0,11,31,11,c0b40011,0,0,d7c7ca44,bfec0d70) at netbsd:do_hypervisor_callback+0x15f
[  39.3701414] Xhypervisor_pvhvm_callback(c0b46240,d81ae000,696f3000,1,1ef3000,0,1,21,7ff0,21) at netbsd:Xhypervisor_pvhvm_callback+0x67


In rev 1.35 of xen/x86/xen_pmap.c the pmap lock is taken unconditionally,
even in the pmap_kernel() case, which means pmap_extract_ma() can't be
used from interrupt context any more. I don't think we can impose such
restrictions on pmap_kernel(); bus_dma(9) needs it.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Xen panic in lwp_need_userret()

2019-11-29 Thread Andrew Doran
OK, I just checked in a fix.

Andrew

On Fri, Nov 29, 2019 at 09:42:44AM +0100, Manuel Bouyer wrote:
> On Tue, Nov 26, 2019 at 01:38:08PM +, Andrew Doran wrote:
> > Hi Manuel,
> > 
> > On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:
> > 
> > > Any chance this has been fixed in the last 2 days?
> > 
> > Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.
> 
> Well, the 201911261940Z build now panics with:
> [   1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [   1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
> [   1.030] cpu0: Begin traceback...
> [   1.030] vpanic() at netbsd:vpanic+0x146
> [   1.030] kern_assert() at netbsd:kern_assert+0x48
> [   1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
> [   1.030] hardclock() at netbsd:hardclock+0xc4
> [   1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
> [   1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
> [   1.030] --- interrupt ---
> [   1.030] Xspllower() at netbsd:Xspllower+0xe
> [   1.030] cpu0: End traceback...
> 
> (http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)
> 
> -- 
> Manuel Bouyer 
>  NetBSD: 26 years of experience will always make the difference
> --


Re: Xen panic in lwp_need_userret()

2019-11-29 Thread Andrew Doran
Hi,

On Fri, Nov 29, 2019 at 09:42:44AM +0100, Manuel Bouyer wrote:

> > Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.
> 
> Well, the 201911261940Z build now panics with:
> [   1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [   1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
> [   1.030] cpu0: Begin traceback...
> [   1.030] vpanic() at netbsd:vpanic+0x146
> [   1.030] kern_assert() at netbsd:kern_assert+0x48
> [   1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
> [   1.030] hardclock() at netbsd:hardclock+0xc4
> [   1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
> [   1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
> [   1.030] --- interrupt ---
> [   1.030] Xspllower() at netbsd:Xspllower+0xe
> [   1.030] cpu0: End traceback...
> 
> (http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)

You have a CPU hog stuck in the kernel, and the scheduler is trying to force
it off the CPU with a kernel preemption, but NetBSD/xen doesn't have kernel
preemption.  I'll fix it when I get home from work later.
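
As a rough illustration of the mismatch described above, here is a user-space sketch. It is not the fix that was committed, whose details are not given in this thread. Only the name RESCHED_UPREEMPT comes from the panic message; the MODEL_ flag values, the function names and the have_kpreempt parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; only the UPREEMPT name appears in the log. */
#define MODEL_RESCHED_UPREEMPT 0x01 /* preempt on return to user mode */
#define MODEL_RESCHED_KPREEMPT 0x02 /* preempt while inside the kernel */

/*
 * Model of a machine-dependent hook on a platform that may lack kernel
 * preemption. Without it, only user preemption requests are acceptable,
 * which mirrors the asserted condition "(flags & RESCHED_UPREEMPT) != 0".
 * Returns false where the real kernel would trip the assertion.
 */
static bool
model_cpu_need_resched(int flags, bool have_kpreempt)
{
	if (!have_kpreempt)
		return (flags & MODEL_RESCHED_UPREEMPT) != 0;
	return (flags &
	    (MODEL_RESCHED_UPREEMPT | MODEL_RESCHED_KPREEMPT)) != 0;
}

/*
 * One plausible way for the caller to cope: downgrade a kernel
 * preemption request to a user preemption request when the platform
 * cannot preempt in the kernel.
 */
static int
model_resched_flags(int flags, bool have_kpreempt)
{
	if ((flags & MODEL_RESCHED_KPREEMPT) != 0 && !have_kpreempt) {
		flags &= ~MODEL_RESCHED_KPREEMPT;
		flags |= MODEL_RESCHED_UPREEMPT;
	}
	return flags;
}
```

Passing MODEL_RESCHED_KPREEMPT straight to model_cpu_need_resched() with have_kpreempt false reproduces the failing condition; routing the flags through model_resched_flags() first avoids it.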

Andrew


Re: Xen panic in lwp_need_userret()

2019-11-29 Thread Manuel Bouyer
On Tue, Nov 26, 2019 at 01:38:08PM +, Andrew Doran wrote:
> Hi Manuel,
> 
> On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:
> 
> > Any chance this has been fixed in the last 2 days?
> 
> Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.

Well, the 201911261940Z build now panics with:
[   1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
[   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
[   1.030] panic: kernel diagnostic assertion "(flags & RESCHED_UPREEMPT) != 0" failed: file "/home/source/ab/HEAD/src/sys/arch/x86/x86/x86_machdep.c", line 317
[   1.030] cpu0: Begin traceback...
[   1.030] vpanic() at netbsd:vpanic+0x146
[   1.030] kern_assert() at netbsd:kern_assert+0x48
[   1.030] cpu_need_resched() at netbsd:cpu_need_resched+0xb3
[   1.030] hardclock() at netbsd:hardclock+0xc4
[   1.030] xen_timer_handler() at netbsd:xen_timer_handler+0x66
[   1.030] Xresume_xenev7() at netbsd:Xresume_xenev7+0x49
[   1.030] --- interrupt ---
[   1.030] Xspllower() at netbsd:Xspllower+0xe
[   1.030] cpu0: End traceback...

(http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201911261940Z_anita.txt)

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Xen panic in lwp_need_userret()

2019-11-26 Thread Andrew Doran
Hi Manuel,

On Tue, Nov 26, 2019 at 09:01:28AM +0100, Manuel Bouyer wrote:

> Any chance this has been fixed in the last 2 days?

Yes indeed, since yesterday with rev 1.51 src/sys/kern/kern_softint.c.

Cheers,
Andrew


Xen panic in lwp_need_userret()

2019-11-26 Thread Manuel Bouyer
Hello,
as shown here:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/i386/201911240030Z_anita.txt
Xen domU panics at boot with:
[   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
[   1.030] panic: kernel diagnostic assertion "lwp_locked(l, NULL)" failed: file "/home/source/ab/HEAD/src/sys/kern/kern_lwp.c", line 1641
[   1.030] cpu0: Begin traceback...
[   1.030] vpanic(c0576fec,c0d6af0c,c0d6af20,c03cabc2,c0576fec,c0576f86,c05a4b17,c05a77c4,669,cce48314) at netbsd:vpanic+0x134
[   1.030] kern_assert(c0576fec,c0576f86,c05a4b17,c05a77c4,669,cce48314,c0d6af4c,c03edf47,c0b36ce0,330d9a57) at netbsd:kern_assert+0x23
[   1.030] lwp_need_userret(c0b36ce0,330d9a57,11db16,c0b36ce0,32c,0,c03e0b00,0,314,c0d6af64) at netbsd:lwp_need_userret+0x52
[   1.030] softint_schedule(314,c03e0b00,0,0,c0d6afb0,c0548f9a,c0c89c18,6,3,0) at netbsd:softint_schedule+0xd7
[   1.030] rnd_init_softint(c0c89c18,6,3,0,3,9,1,0,0,c0d68000) at netbsd:rnd_init_softint+0x5e
[   1.030] main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x3ca
[   1.030] cpu0: End traceback...
[   1.030] fatal breakpoint trap in supervisor mode
[   1.030] trap type 1 code 0 eip 0xc0105574 cs 0x9 eflags 0x202 cr2 0 ilevel 0x8 esp 0xc0d6aef0
[   1.030] curlwp 0xc0b36ce0 pid 0 lid 1 lowest kstack 0xc0d682c0

Any chance this has been fixed in the last 2 days?

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--