Re: [Xen-devel] [TESTDAY] Test report
On Sun, Nov 17, 2019 at 10:15 PM Jürgen Groß wrote:
>
> On 16.11.19 02:12, Roman Shaposhnik wrote:
> > NOTE: this may or may not be a hair-on-fire problem; reporting it
> > anyway since I'd hate to pass on something that may be a serious issue.
> > I haven't had time to debug this just yet -- so just reporting it here
> > pretty raw.
> >
> > Software:
> >   Xen 4.13 RC2
> >   Linux kernel 4.19.5
> > Hardware:
> >   Supermicro E300
> >   https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
> >   Supermicro E100
> >   https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
> >   Supermicro E50
> >   https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
> >
> > Functionality tested: trying to boot Dom0
> > Comments: Xen boots completely and then it seems that either Xen dies
> > right after saying
> >   Xen relinquishing a console
> > or Dom0 dies (without printing a single line of output).
> >
> > FWIW, this started happening after the upgrade to RC2. IOW, if I take my
> > previous RC1 binary and stick it into the very same setup --
> > everything boots fine.
> >
> > The issue doesn't seem to be reproducible on the Dell boxes (or in the
> > virtual QEMU setup) that I've got.
>
> Can you please add the following to dom0's boot parameters:
>
> console=hvc0 earlyprintk=xen
>
> and send the Xen boot log (obtained via serial line)?

Will do once I get to the lab (I'm traveling for KubeCon for the next couple of days).

That said, as you may have seen in the other thread, we've figured out that the culprit was efi=no-rs, which regressed in functionality between RC1 and RC2. Marek has suggested a patch that I need to test. Now, if I drop efi=no-rs, I can boot all the hardware mentioned in *this* report just fine.
A much bigger problem is that the following entire product line is now busted with Xen 4.13 RC2:
https://www.dell.com/en-us/work/shop/gateways-embedded-computing/sc/gateways-embedded-pcs/edge-gateway?~ck=bt

On all these boxes:
 - Without the efi=no-rs option, Xen panics on boot
 - With efi=no-rs, Xen boots fine, but Dom0 can't come up

Thanks,
Roman.

P.S. An additional complication with these Dell boxes is that it took reasonably major brain surgery with a soldering iron to rig console output on them. I did it for one box in my lab, but I need physical access to it and I'm currently traveling.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [TESTDAY] Test report
On 16.11.19 02:12, Roman Shaposhnik wrote:
> NOTE: this may or may not be a hair-on-fire problem; reporting it
> anyway since I'd hate to pass on something that may be a serious issue.
> I haven't had time to debug this just yet -- so just reporting it here
> pretty raw.
>
> Software:
>   Xen 4.13 RC2
>   Linux kernel 4.19.5
> Hardware:
>   Supermicro E300
>   https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-8D.cfm
>   Supermicro E100
>   https://www.supermicro.com/en/products/system/Box_PC/SYS-E100-9S.cfm
>   Supermicro E50
>   https://www.supermicro.com/en/products/system/Box_PC/SYS-E50-9AP.cfm
>
> Functionality tested: trying to boot Dom0
> Comments: Xen boots completely and then it seems that either Xen dies
> right after saying
>   Xen relinquishing a console
> or Dom0 dies (without printing a single line of output).
>
> FWIW, this started happening after the upgrade to RC2. IOW, if I take my
> previous RC1 binary and stick it into the very same setup --
> everything boots fine.
>
> The issue doesn't seem to be reproducible on the Dell boxes (or in the
> virtual QEMU setup) that I've got.

Can you please add the following to dom0's boot parameters:

console=hvc0 earlyprintk=xen

and send the Xen boot log (obtained via serial line)?

Juergen
Re: [Xen-devel] [TESTDAY] Test report
On 15.11.19 16:19, Tamas K Lengyel wrote:
> On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper wrote:
>> On 14/11/2019 22:36, Tamas K Lengyel wrote:
>>> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>>>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>>>> boot flag to be specified to work around a deadlock in Xen
>>>> Is this separate from the general problem with EPT A/D and
>>>> write-protecting pagetables?
>>>>
>>> It sounds like it is; it happens without write-protecting in-guest
>>> pagetables. I didn't have time to investigate where the deadlock
>>> happens, and since the workaround is fine for the use case, it wasn't a
>>> priority to figure out.
>>
>> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
>> as any altp2m gfn translations are used.
>>
>> I'd be tempted to work around the deadlock by disabling PML the moment
>> altp2m is touched. That would give a slightly less bad user experience,
>> and should be easy to sort for 4.13.
>>
>> Thoughts (inc. Juergen as RM)?
>
> That sounds like a good idea to me; that way you can keep PML for
> guests where it doesn't cause an issue instead of disabling it
> system-wide.

Sounds like a decent way to handle it.

Juergen
Re: [Xen-devel] [TESTDAY] Test report
On Fri, Nov 15, 2019 at 4:56 AM Andrew Cooper wrote:
>
> On 14/11/2019 22:36, Tamas K Lengyel wrote:
> > On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
> >> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> >>> * Comments: All works, altp2m+introspection requires the ept=pml=0
> >>> boot flag to be specified to work around a deadlock in Xen
> >> Is this separate from the general problem with EPT A/D and
> >> write-protecting pagetables?
> >>
> > It sounds like it is; it happens without write-protecting in-guest
> > pagetables. I didn't have time to investigate where the deadlock
> > happens, and since the workaround is fine for the use case, it wasn't a
> > priority to figure out.
>
> Thinking about it, PML will do the wrong thing (deadlocks aside) as soon
> as any altp2m gfn translations are used.
>
> I'd be tempted to work around the deadlock by disabling PML the moment
> altp2m is touched. That would give a slightly less bad user experience,
> and should be easy to sort for 4.13.
>
> Thoughts (inc. Juergen as RM)?

That sounds like a good idea to me; that way you can keep PML for guests where it doesn't cause an issue instead of disabling it system-wide.

Tamas
Re: [Xen-devel] [TESTDAY] Test report
On 14/11/2019 22:36, Tamas K Lengyel wrote:
> On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>> On 14/11/2019 18:34, Tamas K Lengyel wrote:
>>> * Comments: All works, altp2m+introspection requires the ept=pml=0
>>> boot flag to be specified to work around a deadlock in Xen
>> Is this separate from the general problem with EPT A/D and
>> write-protecting pagetables?
>>
> It sounds like it is; it happens without write-protecting in-guest
> pagetables. I didn't have time to investigate where the deadlock
> happens, and since the workaround is fine for the use case, it wasn't a
> priority to figure out.

Thinking about it, PML will do the wrong thing (deadlocks aside) as soon as any altp2m gfn translations are used.

I'd be tempted to work around the deadlock by disabling PML the moment altp2m is touched. That would give a slightly less bad user experience, and should be easy to sort for 4.13.

Thoughts (inc. Juergen as RM)?

~Andrew
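The policy Andrew proposes (turn PML off for a domain the first time altp2m is touched, instead of requiring the host-wide ept=pml=0 flag) can be sketched as a small toy model. Everything below is hypothetical: `struct toy_domain`, `toy_altp2m_enable()` and the other `toy_*` names are illustrative stand-ins, not Xen's real domain, p2m or VMX code, and this is not the patch that eventually went into 4.13. It only shows the per-domain invariant being discussed: PML and altp2m are never active on the same domain, while other guests keep PML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model only: these types and functions are hypothetical and do not
 * correspond to Xen's real struct domain, p2m or VMX code.
 */
struct toy_domain {
    int  id;
    bool pml_enabled;   /* Page Modification Logging on for this domain */
    bool altp2m_active; /* altp2m has been touched for this domain */
};

/* Stand-in for the host-wide ept=pml=0 boot option (true = PML allowed). */
static bool toy_opt_pml = true;

/* Invariant the proposal aims for: PML and altp2m are never both active. */
static bool toy_state_ok(const struct toy_domain *d)
{
    return !(d->pml_enabled && d->altp2m_active);
}

static void toy_domain_init(struct toy_domain *d, int id)
{
    d->id = id;
    d->altp2m_active = false;
    d->pml_enabled = toy_opt_pml; /* default: follow the global option */
}

/*
 * Proposed policy: the first time altp2m is touched, disable PML for this
 * domain only, instead of requiring PML to be off for every guest.
 */
static void toy_altp2m_enable(struct toy_domain *d)
{
    if (d->pml_enabled) {
        d->pml_enabled = false;
        printf("d%d: altp2m in use, disabling PML\n", d->id);
    }
    d->altp2m_active = true;
    assert(toy_state_ok(d));
}

/* Refuse to (re-)enable PML while altp2m is in use or PML is off globally. */
static bool toy_pml_enable(struct toy_domain *d)
{
    if (d->altp2m_active || !toy_opt_pml)
        return false;
    d->pml_enabled = true;
    return true;
}
```

Initialising two domains and calling `toy_altp2m_enable()` on only one of them leaves PML running on the other, which is the advantage Tamas points out over disabling PML system-wide.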
Re: [Xen-devel] [TESTDAY] Test report
On 15.11.19 03:39, Roman Shaposhnik wrote:
> * Software: Xen 4.13 RC2
> * Hardware: Dell IoT Gateway 3000 series
> * Software: Project EVE
> * Guest operating systems: Alpine Linux
> * Functionality tested: compiling, installing, booting with dom0=pv
> * Comments: All works, aside from xl create often timing out
>
> The timeout happens either when doing xl create, or when creating the
> domain in a paused state (with -p) and later resuming it. The error
> message is below:
>
> libxl: error: libxl_dom_suspend.c:609:dm_resume_done: Domain 3:Failed to resume device model: rc=-9
>
> We've actually tracked this issue down to this piece of code:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_dom_suspend.c;h=248dbc33e384ae008e4ab9ce8fb573be0672;hb=HEAD#l515
>
> Curiously enough, it seems to be the only place (aside from
> libxl__wait_for_device_model_deprecated) that uses a timeout value
> that low. Everywhere else it seems to be
> LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000.

Thanks for the thorough analysis. It's clearly a regression.

Patch sent.

Juergen
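Roman's comparison with the other call sites suggests a unit mismatch: elsewhere the timeout is LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000, i.e. a value in seconds converted to milliseconds. The sketch below illustrates why passing the bare seconds value where milliseconds are expected makes resume time out so often. It assumes (not confirmed by the thread) that the constant is in seconds and the timer takes milliseconds; the value 60 and all `toy_*` names are placeholders, not libxl code and not Juergen's actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative values only. Assumption: the device-model timeout constant
 * is in seconds while the timer registration takes milliseconds, which is
 * what the "* 1000" at the other call sites suggests. 60 is a placeholder.
 */
#define TOY_DM_START_TIMEOUT_S 60

/* Convert a timeout in seconds into the milliseconds a timer API expects. */
static int toy_timeout_ms(int seconds)
{
    assert(seconds >= 0);
    return seconds * 1000;
}

/* True if an operation taking elapsed_ms fires a timer armed for timeout_ms. */
static bool toy_times_out(int elapsed_ms, int timeout_ms)
{
    return elapsed_ms >= timeout_ms;
}
```

Under these assumptions, a device model that needs 500 ms to resume trips the timer when it is armed with the bare seconds value (`toy_times_out(500, TOY_DM_START_TIMEOUT_S)` is true) but not when the value is converted first (`toy_times_out(500, toy_timeout_ms(TOY_DM_START_TIMEOUT_S))` is false).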
Re: [Xen-devel] [TESTDAY] Test report
On Thu, Nov 14, 2019 at 11:39 AM Andrew Cooper wrote:
>
> On 14/11/2019 18:34, Tamas K Lengyel wrote:
> > * Comments: All works, altp2m+introspection requires the ept=pml=0
> > boot flag to be specified to work around a deadlock in Xen
>
> Is this separate from the general problem with EPT A/D and
> write-protecting pagetables?
>

It sounds like it is; it happens without write-protecting in-guest pagetables. I didn't have time to investigate where the deadlock happens, and since the workaround is fine for the use case, it wasn't a priority to figure out.

Tamas
Re: [Xen-devel] [TESTDAY] Test report
On 14/11/2019 18:34, Tamas K Lengyel wrote:
> * Comments: All works, altp2m+introspection requires the ept=pml=0
> boot flag to be specified to work around a deadlock in Xen

Is this separate from the general problem with EPT A/D and write-protecting pagetables?

~Andrew