Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-24 Thread Manuel Bouyer
On Mon, Jul 16, 2018 at 05:02:01AM -0600, Jan Beulich wrote: > > Unfortunately there has been a crash last week: > > Hmm, looks to be still all the same as before (except for the line > number). I'm afraid I'm out of ideas, at least for the moment. OK, FYI I committed xen 4.11 packages for NetBSD,

Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-16 Thread Jan Beulich
>>> On 16.07.18 at 12:30, wrote: > On Fri, Jul 06, 2018 at 04:26:38PM +0200, Manuel Bouyer wrote: >> On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote: >> > > So instead of the debugging patch, could you give the one below >> > > a try? >> > >> > Sure, the test server is now running

Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-16 Thread Manuel Bouyer
On Fri, Jul 06, 2018 at 04:26:38PM +0200, Manuel Bouyer wrote: > On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote: > > > So instead of the debugging patch, could you give the one below > > > a try? > > > > Sure, the test server is now running with it. > > As I'm still using 4.11rc4

Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-06 Thread Manuel Bouyer
On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote: > > So instead of the debugging patch, could you give the one below > > a try? > > Sure, the test server is now running with it. > As I'm still using 4.11rc4 sources I had to adjust it a bit (the second chunk > didn't apply cleanly)

Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-03 Thread Manuel Bouyer
On Tue, Jul 03, 2018 at 09:14:30AM -0600, Jan Beulich wrote: > >>> On 25.06.18 at 10:33, wrote: > > the dom0 has been running for a week now, running the daily NetBSD tests. > > Attached is the console log. > > I didn't notice anything suspect, except a few domU crashes (crashing in > > Xen, the

Re: [Xen-devel] 4.11.0 RC1 panic

2018-07-03 Thread Jan Beulich
>>> On 25.06.18 at 10:33, wrote: > the dom0 has been running for a week now, running the daily NetBSD tests. > Attached is the console log. > I didn't notice anything suspect, except a few domU crashes (crashing in > Xen, the problem is not reported back to the domU). But as this is > running

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-26 Thread Jan Beulich
>>> On 25.06.18 at 10:33, wrote: > On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote: >> > So far I've not been able to make Xen panic with the new xen kernel. >> > Attached is a log of the serial console, in case you notice something. >> >> None of the printk()s replacing ASSERT()s

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-25 Thread Manuel Bouyer
On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote: > > So far I've not been able to make Xen panic with the new xen kernel. > > Attached is a log of the serial console, in case you notice something. > > None of the printk()s replacing ASSERT()s have triggered, so nothing > interesting

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-15 Thread Manuel Bouyer
On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote: > > So far I've not been able to make Xen panic with the new xen kernel. > > Attached is a log of the serial console, in case you notice something. > > None of the printk()s replacing ASSERT()s have triggered, so nothing > interesting

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-14 Thread Jan Beulich
>>> On 14.06.18 at 00:16, wrote: > On Wed, Jun 13, 2018 at 03:59:19AM -0600, Jan Beulich wrote: >> >>> On 13.06.18 at 10:57, wrote: >> > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote: >> >> >> >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596 >> >> >> >> In fact,

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Manuel Bouyer
On Wed, Jun 13, 2018 at 03:59:19AM -0600, Jan Beulich wrote: > >>> On 13.06.18 at 10:57, wrote: > > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote: > >> > >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596 > >> > >> In fact, there's no assertion with that expression

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Jan Beulich
>>> On 13.06.18 at 10:57, wrote: > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote: >> >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596 >> >> In fact, there's no assertion with that expression anywhere I could >> see. Do you have any local patches in place? > > Yes, 2

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Manuel Bouyer
On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote: > > (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596 > > In fact, there's no assertion with that expression anywhere I could > see. Do you have any local patches in place? Yes, 2 of them from you (the first one is where the

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Jan Beulich
>>> On 12.06.18 at 22:55, wrote: > On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote: >> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: >> > I applied this patch to 4.11rc4 (let's not change too much things at the >> > same time) and rebooted my test host. Hopefully

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Jan Beulich
>>> On 12.06.18 at 18:29, wrote: > On 12/06/18 17:00, Manuel Bouyer wrote: >> On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote: >>> On 12/06/18 16:38, Manuel Bouyer wrote: On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: > I applied this patch to 4.11rc4 (let's

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-13 Thread Jan Beulich
>>> On 12.06.18 at 22:55, wrote: > On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote: >> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: >> > I applied this patch to 4.11rc4 (let's not change too much things at the >> > same time) and rebooted my test host. Hopefully

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Manuel Bouyer
On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote: > On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: > > I applied this patch to 4.11rc4 (let's not change too much things at the > > same time) and rebooted my test host. Hopefully I'll have some data to > > report > >

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Andrew Cooper
On 12/06/18 17:00, Manuel Bouyer wrote: > On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote: >> On 12/06/18 16:38, Manuel Bouyer wrote: >>> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: I applied this patch to 4.11rc4 (let's not change too much things at the

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Manuel Bouyer
On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote: > On 12/06/18 16:38, Manuel Bouyer wrote: > > On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: > >> I applied this patch to 4.11rc4 (let's not change too much things at the > >> same time) and rebooted my test host.

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Andrew Cooper
On 12/06/18 16:38, Manuel Bouyer wrote: > On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote: >> I applied this patch to 4.11rc4 (let's not change too much things at the >> same time) and rebooted my test host. Hopefully I'll have some data to report >> soon > Got the first panic (still

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Manuel Bouyer
On Tue, Jun 12, 2018 at 01:57:35AM -0600, Jan Beulich wrote: > Let's focus on this scenario for now, as it is under better (timing) control > on the Xen side. Below is a first debugging patch which > - avoids the ASSERT() in question, instead triggering a printk(), in the hope > that the data

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-12 Thread Jan Beulich
>>> On 10.06.18 at 12:57, wrote: > (XEN) Xen call trace: > (XEN)[] mm.c#dec_linear_entries+0x12/0x20 > (XEN)[] mm.c#_put_page_type+0x13e/0x350 > (XEN)[] _spin_lock+0xd/0x50 > (XEN)[] mm.c#put_page_from_l2e+0xdf/0x110 > (XEN)[] free_page_type+0x2f9/0x790 > (XEN)[]

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-11 Thread Manuel Bouyer
On Mon, Jun 11, 2018 at 03:58:01AM -0600, Jan Beulich wrote: > >>> On 10.06.18 at 18:32, wrote: > > On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote: > >> What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't > >> allow > >> them to be pinned, hence I'd expect there

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-11 Thread Jan Beulich
>>> On 10.06.18 at 18:32, wrote: > On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote: >> What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't >> allow >> them to be pinned, hence I'd expect there to be some special casing in your >> code. Considering no similar

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-10 Thread Manuel Bouyer
On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote: > >>> Manuel Bouyer 06/10/18 1:30 PM >>> > >When a new set of page tables is needed (this is pmap_create()), a pdp is > >requested from a cache. If the cache is empty, pages are allocated in > >pmap_pdp_ctor(), which is going to also

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-10 Thread Jan Beulich
>>> Manuel Bouyer 06/10/18 1:30 PM >>> >When a new set of page tables is needed (this is pmap_create()), a pdp is >requested from a cache. If the cache is empty, pages are allocated in >pmap_pdp_ctor(), which is going to also pin the L2 pages. >When the page table is not needed any more (this is

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-10 Thread Manuel Bouyer
On Sun, Jun 10, 2018 at 03:54:45AM -0600, Jan Beulich wrote: > [...] > > So I've been trying to look into this some more, and I've noticed an oddity in > the raw stack dump you had provided with the first report. Unfortunately you > didn't include that part for either of the above (the first one

Re: [Xen-devel] 4.11.0 RC1 panic

2018-06-10 Thread Jan Beulich
>>> Manuel Bouyer 05/22/18 1:01 PM >>> >So far I've seen 2 stack traces with 4.11: >(XEN) Xen call trace: >(XEN)[] mm.c#dec_linear_entries+0x12/0x20 >(XEN)[] mm.c#_put_page_type+0x13e/0x350 >(XEN)[] _spin_lock+0xd/0x50 >(XEN)[] mm.c#put_page_from_l2e+0xdf/0x110 >(XEN)[]

Re: [Xen-devel] 4.11.0 RC1 panic

2018-05-22 Thread Jan Beulich
>>> On 22.05.18 at 13:01, wrote: > On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote: >> - reduce the test environment (ideally to a simple [XTF?] test), or >> - at least narrow the conditions, or > > Now that I know where to find the domU number in the panic

Re: [Xen-devel] 4.11.0 RC1 panic

2018-05-22 Thread Manuel Bouyer
On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote: > >> So in combination with your later reply I'm confused: Are you observing > >> this with 64-bit guests as well (your later reply appears to hint towards > >> 64-bit-ness), or (as the stack trace suggests) only 32-bit ones? Knowing >

Re: [Xen-devel] 4.11.0 RC1 panic

2018-05-15 Thread Jan Beulich
>>> On 01.05.18 at 22:22, wrote: > On Mon, Apr 30, 2018 at 07:31:28AM -0600, Jan Beulich wrote: >> >>> On 25.04.18 at 16:42, wrote: >> > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: >> >> > Without line numbers associated with at

Re: [Xen-devel] 4.11.0 RC1 panic

2018-05-01 Thread Manuel Bouyer
On Mon, Apr 30, 2018 at 07:31:28AM -0600, Jan Beulich wrote: > >>> On 25.04.18 at 16:42, wrote: > > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: > >> > Without line numbers associated with at least the top stack trace entry > >> > I can only guess what it

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-30 Thread Jan Beulich
>>> On 25.04.18 at 16:42, wrote: > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: >> > Without line numbers associated with at least the top stack trace entry >> > I can only guess what it might be - could you give the patch below a try? >> > (This may not

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Manuel Bouyer
On Wed, Apr 25, 2018 at 09:28:03AM -0600, Jan Beulich wrote: > >>> On 25.04.18 at 16:42, wrote: > > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: > >> > Without line numbers associated with at least the top stack trace entry > >> > I can only guess what it

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Jan Beulich
>>> On 25.04.18 at 16:42, wrote: > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: >> > Without line numbers associated with at least the top stack trace entry >> > I can only guess what it might be - could you give the patch below a try? >> > (This may not

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Manuel Bouyer
On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote: > > Without line numbers associated with at least the top stack trace entry > > I can only guess what it might be - could you give the patch below a try? > > (This may not be the final patch, as I'm afraid there may be some race > >

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Manuel Bouyer
On Wed, Apr 25, 2018 at 09:16:59AM +0100, Andrew Cooper wrote: > Manuel: As a tangentially related question, does NetBSD ever try to page > out its LDT? AFAIK no, LDTs are allocated as kernel wired memory -- Manuel Bouyer NetBSD: 26 years of experience will always

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Manuel Bouyer
On Wed, Apr 25, 2018 at 12:58:47AM -0600, Jan Beulich wrote: > >>> On 24.04.18 at 18:06, wrote: > > Hello, > > I tested xen 4.11.0 rc1 with NetBSD as dom0. > > I could boot a NetBSD PV domU without problem, but at shutdown time > > (poweroff > > in the domU), I got a Xen

Re: [Xen-devel] 4.11.0 RC1 panic

2018-04-25 Thread Andrew Cooper
On 25/04/2018 07:58, Jan Beulich wrote: On 24.04.18 at 18:06, wrote: >> Hello, >> I tested xen 4.11.0 rc1 with NetBSD as dom0. >> I could boot a NetBSD PV domU without problem, but at shutdown time >> (poweroff >> in the domU), I got a Xen panic: >> (XEN) Assertion

[Xen-devel] 4.11.0 RC1 panic

2018-04-24 Thread Manuel Bouyer
Hello, I tested xen 4.11.0 rc1 with NetBSD as dom0. I could boot a NetBSD PV domU without problem, but at shutdown time (poweroff in the domU), I got a Xen panic: (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97 An xl destroy instead of poweroff