Re: [Xen-devel] [TESTDAY] Test report

2019-11-18 Thread Roman Shaposhnik
On Sun, Nov 17, 2019 at 10:15 PM Jürgen Groß wrote:
>
> On 16.11.19 02:12, Roman Shaposhnik wrote:
> > NOTE: this may or may not be a hair on fire problem, reporting it
> > anyway since I'd hate to pass on something that may be a serious issue.
> > I haven't had time to debug this just yet -- so just reporting it here
> > pretty raw.
> >
> > Software:
> > Xen 4.13 RC2
> > Linux kernel 4.19.5
> > Hardware:
> > Supermicro E300
> > https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> > Supermicro E100
> > https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> > Supermicro E50
> > https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
> >
> > Functionality tested: trying to boot Dom0
> > Comments: Xen boots completely and then seems like it either dies
> > right after saying
> >  Xen relinquishing a console
> > or Dom0 dies (without printing a single line of output)
> >
> > FWIW, this started happening after upgrade to RC2. IOW, if I take my
> > previous RC1 binary and stick it into the very same setup --
> > everything boots fine.
> >
> > The issue doesn't seem to be reproducible on Dell boxes (and in my
> > virtual QEMU setup) that I've got.
>
> Can you please add the following to dom0's boot parameters:
>
> console=hvc0 earlyprintk=xen
>
> and send the Xen boot log (obtained via serial line)?

Will do once I get to the lab (traveling for KubeCon for the next
couple of days).

That said, if you see the other thread -- we've figured out that the
culprit was efi=no-rs, which regressed in functionality between RC1 and
RC2. Marek has suggested a patch that I need to test.

Now, if I drop efi=no-rs -- I can boot all the hardware mentioned in
*this* report just fine.

A much bigger problem is that the following entire product line is now
busted with Xen 4.13 RC2:
https://www.dell.com/en-us/work/shop/gateways-embedded-computing/sc/gateways-embedded-pcs/edge-gateway?~ck=bt

On all these boxes:
   - Without the efi=no-rs option, Xen panics on boot
   - With efi=no-rs, Xen boots fine, but Dom0 can't come up

Thanks,
Roman.

P.S. An additional complication with these Dell boxes is that it
required reasonably major brain surgery with a soldering iron to rig
console output on them. I did it for one box in my lab, but I need
physical access to it and I'm currently traveling.
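
For anyone who wants to script the with/without efi=no-rs comparison
described above, here is a minimal sketch in Python. It only edits a
command-line string; the base options shown are placeholders, and the
helper name set_xen_option is made up for illustration. How the
resulting string reaches the bootloader depends on the setup and is not
covered here.

```python
def set_xen_option(cmdline, option, enabled):
    """Return cmdline with `option` present exactly once, or removed."""
    # Drop every existing copy of the option first so the result is stable
    # no matter how many times the helper is applied.
    tokens = [tok for tok in cmdline.split() if tok != option]
    if enabled:
        tokens.append(option)
    return " ".join(tokens)


# Placeholder base command line (an assumption, not taken from the report).
base = "console=com1 com1=115200,8n1"

with_no_rs = set_xen_option(base, "efi=no-rs", True)
without_no_rs = set_xen_option(with_no_rs, "efi=no-rs", False)

print(with_no_rs)
print(without_no_rs)
```

Booting the same box once with each string keeps everything else
constant, so the only variable in the comparison is efi=no-rs.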

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [TESTDAY] Test report

2019-11-17 Thread Jürgen Groß

On 16.11.19 02:12, Roman Shaposhnik wrote:
> NOTE: this may or may not be a hair on fire problem, reporting it
> anyway since I'd hate to pass on something that may be a serious issue.
> I haven't had time to debug this just yet -- so just reporting it here
> pretty raw.
>
> Software:
> Xen 4.13 RC2
> Linux kernel 4.19.5
> Hardware:
> Supermicro E300
> https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> Supermicro E100
> https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> Supermicro E50
> https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
>
> Functionality tested: trying to boot Dom0
> Comments: Xen boots completely and then seems like it either dies
> right after saying
>  Xen relinquishing a console
> or Dom0 dies (without printing a single line of output)
>
> FWIW, this started happening after upgrade to RC2. IOW, if I take my
> previous RC1 binary and stick it into the very same setup --
> everything boots fine.
>
> The issue doesn't seem to be reproducible on Dell boxes (and in my
> virtual QEMU setup) that I've got.

Can you please add the following to dom0's boot parameters:

console=hvc0 earlyprintk=xen

and send the Xen boot log (obtained via serial line)?


Juergen
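
As a hedged illustration of the request above, the following Python
sketch appends the two parameters to an existing dom0 kernel command
line without duplicating them. The existing command line and the helper
name add_kernel_params are assumptions for the example; only
console=hvc0 and earlyprintk=xen come from the message.

```python
def add_kernel_params(cmdline, params):
    """Return cmdline with each of `params` appended unless already present."""
    tokens = cmdline.split()
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


# Hypothetical existing dom0 command line; only the two appended
# parameters are taken from the request above.
existing = "root=/dev/sda1 ro"
debug_cmdline = add_kernel_params(existing, ["console=hvc0", "earlyprintk=xen"])

print(debug_cmdline)
```

Running the helper twice yields the same string, which makes it safe to
apply to a configuration that may already carry one of the parameters.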


Re: [Xen-devel] [TESTDAY] Test report

2019-11-15 Thread Jürgen Groß

On 15.11.19 16:19, Tamas K Lengyel wrote:
> On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper wrote:
>>
>> On 14/11/2019 22:36, Tamas K Lengyel wrote:
>>> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>>>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>>>> boot flag to be specified to work around a deadlock in Xen
>>>> Is this separate from the general problem with EPT A/D and
>>>> write-protecting pagetables?
>>>>
>>> It sounds like it is; it happens without write-protecting in-guest
>>> pagetables. I didn't have time to investigate where the deadlock
>>> happens, and since the workaround is fine for the use case it wasn't a
>>> priority to figure out.
>>
>> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
>> as any altp2m gfn translations are used.
>>
>> I'd be tempted to work around the deadlock by disabling pml the moment
>> altp2m is touched.  That would give a slightly less bad user experience,
>> and should be easy to sort for 4.13.
>>
>> Thoughts (inc. Juergen as RM)?
>
> That sounds like a good idea to me; that way you can keep pml for
> guests where it doesn't cause an issue instead of disabling it
> system-wide.

Sounds like a decent way to handle it.


Juergen
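
To make the difference between the proposal and the ept=pml=0 boot flag
concrete, here is a toy model in Python. It is not Xen code and does not
reflect how Xen implements either feature; the class and attribute names
are invented for illustration. It only shows the policy being discussed:
PML is switched off for a domain when altp2m is first enabled on it,
while other domains keep PML.

```python
class Domain:
    """Toy stand-in for a guest; not a Xen data structure."""

    def __init__(self, name, pml_enabled=True):
        self.name = name
        self.pml_enabled = pml_enabled
        self.altp2m_active = False

    def enable_altp2m(self):
        # Proposed policy: turn PML off for this domain only, at the
        # moment altp2m is first touched.
        self.pml_enabled = False
        self.altp2m_active = True


guests = [Domain("introspected"), Domain("ordinary")]
guests[0].enable_altp2m()

for guest in guests:
    print(guest.name, "pml" if guest.pml_enabled else "no-pml")
```

With the system-wide boot flag, both guests in this model would start
with pml_enabled=False, which is the cost the per-domain approach avoids.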



Re: [Xen-devel] [TESTDAY] Test report

2019-11-15 Thread Tamas K Lengyel
On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper wrote:
>
> On 14/11/2019 22:36, Tamas K Lengyel wrote:
> > On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
> >> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> >>> * Comments: All works, altp2m+introspection requires the ept=pml=0
> >>> boot flag to be specified to work around a deadlock in Xen
> >> Is this separate from the general problem with EPT A/D and
> >> write-protecting pagetables?
> >>
> > It sounds like it is; it happens without write-protecting in-guest
> > pagetables. I didn't have time to investigate where the deadlock
> > happens, and since the workaround is fine for the use case it wasn't a
> > priority to figure out.
>
> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
> as any altp2m gfn translations are used.
>
> I'd be tempted to work around the deadlock by disabling pml the moment
> altp2m is touched.  That would give a slightly less bad user experience,
> and should be easy to sort for 4.13.
>
> Thoughts (inc. Juergen as RM)?

That sounds like a good idea to me; that way you can keep pml for
guests where it doesn't cause an issue instead of disabling it
system-wide.

Tamas


Re: [Xen-devel] [TESTDAY] Test report

2019-11-15 Thread Andrew Cooper
On 14/11/2019 22:36, Tamas K Lengyel wrote:
> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>> boot flag to be specified to work around a deadlock in Xen
>> Is this separate from the general problem with EPT A/D and
>> write-protecting pagetables?
>>
> It sounds like it is; it happens without write-protecting in-guest
> pagetables. I didn't have time to investigate where the deadlock
> happens, and since the workaround is fine for the use case it wasn't a
> priority to figure out.

Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
as any altp2m gfn translations are used.

I'd be tempted to work around the deadlock by disabling pml the moment
altp2m is touched.  That would give a slightly less bad user experience,
and should be easy to sort for 4.13.

Thoughts (inc. Juergen as RM)?

~Andrew


Re: [Xen-devel] [TESTDAY] Test report

2019-11-14 Thread Jürgen Groß

On 15.11.19 03:39, Roman Shaposhnik wrote:
> * Software: Xen 4.13 RC2
> * Hardware: Dell IoT Gateway 3000 series
> * Software: Project EVE
> * Guest operating systems: Alpine Linux
> * Functionality tested: compiling, installing, booting with dom0=pv
> * Comments: All works, aside from xl create often timing out
>
> The timeout happens either when doing xl create directly or when
> creating the domain in a paused state (with -p) and resuming it later.
> The error message is below:
> libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9
>
> We've actually tracked this issue down to this piece of code:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be0672;hb=HEAD#l515
>
> Curiously enough, it seems to be the only place (aside from
> libxl__wait_for_device_model_deprecated) that uses a timeout
> value that low. Everywhere else it seems to be
>  LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000


Thanks for the thorough analysis.

It's clearly a regression. Patch sent.


Juergen
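
For testers who want to spot this failure in collected logs, a small
Python sketch follows. It parses the one error line quoted above; the
regular expression is inferred from that single sample and is an
assumption, not a description of libxl's logging format in general. The
function name parse_libxl_error is made up for the example.

```python
import re

# Pattern inferred from the single error line quoted in the report.
LIBXL_ERROR = re.compile(
    r"libxl: error: (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+): "
    r"Domain\s+(?P<domid>\d+):\s*(?P<msg>.*?): rc=(?P<rc>-?\d+)"
)


def parse_libxl_error(text):
    """Return the fields of the first matching libxl error, or None."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = LIBXL_ERROR.search(flat)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "func": match.group("func"),
        "domid": int(match.group("domid")),
        "msg": match.group("msg"),
        "rc": int(match.group("rc")),
    }


sample = """libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain
3:Failed to resume device model: rc=-9"""

print(parse_libxl_error(sample))
```

Collapsing whitespace before matching lets the same helper handle both
the raw log line and the wrapped form seen in the mail.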


Re: [Xen-devel] [TESTDAY] Test report

2019-11-14 Thread Tamas K Lengyel
On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>
> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> > * Comments: All works, altp2m+introspection requires the ept=pml=0
> > boot flag to be specified to work around a deadlock in Xen
>
> Is this separate from the general problem with EPT A/D and
> write-protecting pagetables?
>

It sounds like it is; it happens without write-protecting in-guest
pagetables. I didn't have time to investigate where the deadlock
happens, and since the workaround is fine for the use case it wasn't a
priority to figure out.

Tamas


Re: [Xen-devel] [TESTDAY] Test report

2019-11-14 Thread Andrew Cooper
On 14/11/2019 18:34, Tamas K Lengyel wrote:
> * Comments: All works, altp2m+introspection requires the ept=pml=0
> boot flag to be specified to work around a deadlock in Xen

Is this separate from the general problem with EPT A/D and
write-protecting pagetables?

~Andrew
