Re: [Xen-devel] Test for osstest, features used in Qubes OS
Jan Beulich:
> On 23.05.18 at 00:21, wrote:
>> I have done some more testing in the meantime. The issue also affects
>> 4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
>> A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
>> all .data/.rodata/.bss mappings" as the commit which breaks suspend.
>>
>> 8462c575d9 is a squashed backport of:
>>
>> 422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss
>> mappings
>> d1d6fc97d6 x86/xpti: really hide almost all of Xen image
>> 044fedfaa2 x86/traps: Put idt_table[] back into .bss
>>
>> And indeed, reverting those on staging fixes suspend. (This also matches
>> the behavior that xpti=off fixes suspend as George already reported
>> earlier today).
>
> Okay, that was quite helpful - I think I see now where I screwed up (i.e.
> the issue is in the middle of the three commits). Could you confirm that a
> Xen booted with "nosmp" suspends and resumes fine?

Yes, with nosmp suspend works.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Test for osstest, features used in Qubes OS
>>> On 23.05.18 at 00:21, wrote:
> I have done some more testing in the meantime. The issue also affects
> 4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
> A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
> all .data/.rodata/.bss mappings" as the commit which breaks suspend.
>
> 8462c575d9 is a squashed backport of:
>
> 422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss
> mappings
> d1d6fc97d6 x86/xpti: really hide almost all of Xen image
> 044fedfaa2 x86/traps: Put idt_table[] back into .bss
>
> And indeed, reverting those on staging fixes suspend. (This also matches
> the behavior that xpti=off fixes suspend as George already reported
> earlier today).

Okay, that was quite helpful - I think I see now where I screwed up (i.e.
the issue is in the middle of the three commits). Could you confirm that a
Xen booted with "nosmp" suspends and resumes fine?

Jan
Re: [Xen-devel] Test for osstest, features used in Qubes OS
George Dunlap:
> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski wrote:
>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>> On 18.05.18 at 17:33, wrote:
>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>> broken on staging doesn't make it easier,
>>>
>>> Details on the breakage would be appreciated (on a separate thread),
>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>> well, but also not providing sufficient data to consider looking into it
>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>> frequently is the case with S3 resume). I think it would be nice if we could
>>> release 4.11 without a regression here.
>>
>> I only know that Simon has tested it and it fails. Cc'ing him.

I ran into the same problem as George below (see [1] for the initial report).

> Well I tried it with a post-RC 4.11 and got the below. I haven't done
> any investigation.
>
> -George
> [...]
> (XEN) *** DOUBLE FAULT ***
> (XEN) [ Xen-4.11-rc x86_64 debug=y Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7
> (XEN) RFLAGS: 00010006 CONTEXT: hypervisor
> (XEN) rax: c900422480b8 rbx: rcx: 0005
> (XEN) rdx: rsi: rdi:
> (XEN) rbp: 36ffbddb7f27 rsp: c90042248000 r8:
> (XEN) r9: r10: r11:
> (XEN) r12: r13: r14: c9004224
> (XEN) r15: cr0: 8005003b cr4: 26e0
> (XEN) cr3: 00018a10 cr2: c90042247ff8
> (XEN) fsb: 7f6242d95700 gsb: 88003dc0 gss:
> (XEN) ds: es: fs: gs: ss: e010 cs: e008
> (XEN) Current stack base c90042248000 differs from expected 8300dfa8
> (XEN) Valid stack range: c9004224e000-c9004225, sp=c90042248000, tss.rsp0=8300dfa87fa0
> (XEN) No stack overflow detected. Skipping stack trace.
> (XEN)
> (XEN)
> (XEN) Panic on CPU 0:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN)
> (XEN)
> (XEN) Reboot in five seconds...

I have done some more testing in the meantime. The issue also affects
4.10.1, but not 4.10.0. That's useful since it makes the bisect shorter.
A bisect identifies 8462c575d9 "x86/xpti: Hide almost all of .text and
all .data/.rodata/.bss mappings" as the commit which breaks suspend.

8462c575d9 is a squashed backport of:

422588e885 x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
d1d6fc97d6 x86/xpti: really hide almost all of Xen image
044fedfaa2 x86/traps: Put idt_table[] back into .bss

And indeed, reverting those on staging fixes suspend. (This also matches
the behavior that xpti=off fixes suspend as George already reported
earlier today).

[1]: https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg01137.html
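The bisect described in the message above is a binary search over an ordered commit range: knowing that 4.10.0 is good and 4.10.1 is bad narrows the range, which is why the bisect gets shorter. A minimal sketch of that search follows, assuming the usual bisect precondition (once a commit is bad, all later ones are bad too). The commit list and the `suspend_fails` check are hypothetical placeholders; only the ID 8462c575d9 comes from the thread.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes the last commit is known bad and that every commit after
    the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical history between a good and a bad release. "c1".."c6" are
# placeholder names, not real commits.
history: List[str] = ["c1", "c2", "c3", "8462c575d9", "c5", "c6"]
tested: List[str] = []


def suspend_fails(commit: str) -> bool:
    """Stand-in for 'build this commit, boot it, try to suspend'."""
    tested.append(commit)
    return history.index(commit) >= history.index("8462c575d9")


print(first_bad(history, suspend_fails))
```

With six candidate commits the search needs at most three build-and-suspend cycles, which is the practical benefit of starting from a narrow good/bad range.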
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Mon, 2018-05-21 at 14:57 +0100, Ian Jackson wrote:
> > On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> > > What if we 1) have two versions of the test -- "Fake suspend" and
> > > "Real Suspend"; 2) only run "Real suspend" on hardware specifically
> > > marked as having a suspend that works reliably; 3) default all
> > > hardware to 'false' until we do some testing to find out how
> > > reliable it is?
> > >
> OK, for starters, how about we add the fake suspend test to every
> flight.
>
> Do we want or need to do that test with a guest running ?
>
Doing it with a guest running would be more complete, I think.

I think the best would be to do both, i.e.:
- suspend without any guest
- (when resumed) start a guest
- suspend with a guest

Dario
--
<> (Raistlin Majere)
- Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
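The three-step order proposed above can be sketched as a small driver that stops at the first failing step. This is only an illustration: `suspend` and `start_guest` are hypothetical callables standing in for whatever osstest would actually run, and the step names are made up for the example.

```python
from typing import Callable, List


def run_suspend_sequence(suspend: Callable[[], bool],
                         start_guest: Callable[[], bool]) -> List[str]:
    """Run the three steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    steps = [
        ("suspend-without-guest", suspend),
        ("start-guest", start_guest),
        ("suspend-with-guest", suspend),
    ]
    passed: List[str] = []
    for name, action in steps:
        if not action():
            break  # later steps depend on this one having worked
        passed.append(name)
    return passed


# Stubbed example: every step succeeds.
print(run_suspend_sequence(lambda: True, lambda: True))
```

Keeping the guest-less suspend first means a failure there points at the hypervisor or host, before any guest state is involved.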
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Mon, May 21, 2018 at 5:28 PM, George Dunlap wrote:
> On Mon, May 21, 2018 at 5:17 PM, Andrew Cooper wrote:
>> On 21/05/18 16:48, George Dunlap wrote:
>>> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski wrote:
>>>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>>> On 18.05.18 at 17:33, wrote:
>>>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>>>> broken on staging doesn't make it easier,
>>>>> Details on the breakage would be appreciated (on a separate thread),
>>>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>>>> well, but also not providing sufficient data to consider looking into it
>>>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>>>> frequently is the case with S3 resume). I think it would be nice if we could
>>>>> release 4.11 without a regression here.
>>>> I only know that Simon has tested it and it fails. Cc'ing him.
>>> Well I tried it with a post-RC 4.11 and got the below. I haven't done
>>> any investigation.
>>>
>>> -George
>>>
>>>
>>> (XEN) CPU3: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
>>> (XEN) *** DOUBLE FAULT ***
>>> (XEN) [ Xen-4.11-rc x86_64 debug=y Not tainted ]
>>> (XEN) CPU:0
>>> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7
>>
>> Do you have xen-syms from this build? That looks like it's in the middle
>> of the Spectre alternative, but isn't the wrmsr instruction itself.
>
> Hmm, sorry, I've trashed it -- I was really trying to test my
> "acpi_sleep=s3_fake" test.
>
> I've never tried suspend on this particular box, so I'm not sure it
> works generally. Let me get a reasonable baseline first.

OK, well suspend / resume works on this box in all the following configurations:
* 4.8.0 (real)
* 4.8.0 with s3_fake backported (fake)
* 4.8.3 (real)
* staging-4.8 with bti=false and xpti=false (real)

It fails in the following configuration:
* staging-4.8 with speculation mitigations at default.
  (It is an Intel box, so BTI and XPTI will both be on.)

I didn't get a stack trace unfortunately -- the box just stopped responding.

I'll do some more playing around on staging tomorrow.

-George
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Mon, May 21, 2018 at 5:17 PM, Andrew Cooper wrote:
> On 21/05/18 16:48, George Dunlap wrote:
>> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski wrote:
>>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>>> On 18.05.18 at 17:33, wrote:
>>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>>> broken on staging doesn't make it easier,
>>>> Details on the breakage would be appreciated (on a separate thread),
>>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>>> well, but also not providing sufficient data to consider looking into it
>>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>>> frequently is the case with S3 resume). I think it would be nice if we could
>>>> release 4.11 without a regression here.
>>> I only know that Simon has tested it and it fails. Cc'ing him.
>> Well I tried it with a post-RC 4.11 and got the below. I haven't done
>> any investigation.
>>
>> -George
>>
>>
>> (XEN) CPU3: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) [ Xen-4.11-rc x86_64 debug=y Not tainted ]
>> (XEN) CPU:0
>> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7
>
> Do you have xen-syms from this build? That looks like it's in the middle
> of the Spectre alternative, but isn't the wrmsr instruction itself.

Hmm, sorry, I've trashed it -- I was really trying to test my
"acpi_sleep=s3_fake" test.

I've never tried suspend on this particular box, so I'm not sure it
works generally. Let me get a reasonable baseline first.

-George
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On 21/05/18 16:48, George Dunlap wrote:
> On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski wrote:
>> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>>> On 18.05.18 at 17:33, wrote:
>>>> Yes, I'm happy to help with that. As I've said, the basic test is very
>>>> simple (rtcwake command) and already very useful. The fact that it is(?)
>>>> broken on staging doesn't make it easier,
>>> Details on the breakage would be appreciated (on a separate thread),
>>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>>> well, but also not providing sufficient data to consider looking into it
>>> (perhaps simply because it wasn't easy to obtain useful data, as
>>> frequently is the case with S3 resume). I think it would be nice if we could
>>> release 4.11 without a regression here.
>> I only know that Simon has tested it and it fails. Cc'ing him.
> Well I tried it with a post-RC 4.11 and got the below. I haven't done
> any investigation.
>
> -George
>
>
> (XEN) CPU3: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
> (XEN) *** DOUBLE FAULT ***
> (XEN) [ Xen-4.11-rc x86_64 debug=y Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e008:[] handle_exception+0x9c/0xf7

Do you have xen-syms from this build? That looks like it's in the middle
of the Spectre alternative, but isn't the wrmsr instruction itself.

> (XEN) RFLAGS: 00010006 CONTEXT: hypervisor
> (XEN) rax: c900422480b8 rbx: rcx: 0005
> (XEN) rdx: rsi: rdi:
> (XEN) rbp: 36ffbddb7f27 rsp: c90042248000 r8:
> (XEN) r9: r10: r11:
> (XEN) r12: r13: r14: c9004224
> (XEN) r15: cr0: 8005003b cr4: 26e0
> (XEN) cr3: 00018a10 cr2: c90042247ff8
> (XEN) fsb: 7f6242d95700 gsb: 88003dc0 gss:
> (XEN) ds: es: fs: gs: ss: e010 cs: e008
> (XEN) Current stack base c90042248000 differs from expected 8300dfa8
> (XEN) Valid stack range: c9004224e000-c9004225, sp=c90042248000, tss.rsp0=8300dfa87fa0
> (XEN) No stack overflow detected. Skipping stack trace.

I really need to wire up the code dump, irrespective of this particular issue.

~Andrew

> (XEN)
> (XEN)
> (XEN) Panic on CPU 0:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN)
> (XEN)
> (XEN) Reboot in five seconds...
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Fri, May 18, 2018 at 5:19 PM, Marek Marczykowski wrote:
> On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
>> >>> On 18.05.18 at 17:33, wrote:
>> > Yes, I'm happy to help with that. As I've said, the basic test is very
>> > simple (rtcwake command) and already very useful. The fact that it is(?)
>> > broken on staging doesn't make it easier,
>>
>> Details on the breakage would be appreciated (on a separate thread),
>> unless you plan to address it yourself. I recall Simon(?) mentioning this as
>> well, but also not providing sufficient data to consider looking into it
>> (perhaps simply because it wasn't easy to obtain useful data, as
>> frequently is the case with S3 resume). I think it would be nice if we could
>> release 4.11 without a regression here.
>
> I only know that Simon has tested it and it fails. Cc'ing him.

Well I tried it with a post-RC 4.11 and got the below. I haven't done
any investigation.

-George

(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) CPU0: Thermal monitoring enabled (TM1)
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 49
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) Enabling non-boot CPUs ...
(XEN) Booting processor 1/2 eip 8e000
(XEN) Initializing CPU#1
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU1: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
(XEN) Booting processor 2/18 eip 8e000
(XEN) Initializing CPU#2
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 9
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU2: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
(XEN) Booting processor 3/20 eip 8e000
(XEN) Initializing CPU#3
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 10
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 12288K
(XEN) CPU3: Intel(R) Xeon(R) CPU E5630 @ 2.53GHz stepping 02
(XEN) *** DOUBLE FAULT ***
(XEN) [ Xen-4.11-rc x86_64 debug=y Not tainted ]
(XEN) CPU:0
(XEN) RIP:e008:[] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 00010006 CONTEXT: hypervisor
(XEN) rax: c900422480b8 rbx: rcx: 0005
(XEN) rdx: rsi: rdi:
(XEN) rbp: 36ffbddb7f27 rsp: c90042248000 r8:
(XEN) r9: r10: r11:
(XEN) r12: r13: r14: c9004224
(XEN) r15: cr0: 8005003b cr4: 26e0
(XEN) cr3: 00018a10 cr2: c90042247ff8
(XEN) fsb: 7f6242d95700 gsb: 88003dc0 gss:
(XEN) ds: es: fs: gs: ss: e010 cs: e008
(XEN) Current stack base c90042248000 differs from expected 8300dfa8
(XEN) Valid stack range: c9004224e000-c9004225, sp=c90042248000, tss.rsp0=8300dfa87fa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN)
(XEN)
(XEN) Reboot in five seconds...
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Mon, May 21, 2018 at 2:57 PM, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] Test for osstest, features used in
> Qubes OS"):
>> On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
>> > What if we 1) have two versions of the test -- "Fake suspend" and
>> > "Real Suspend"; 2) only run "Real suspend" on hardware specifically
>> > marked as having a suspend that works reliably; 3) default all
>> > hardware to 'false' until we do some testing to find out how reliable
>> > it is?
>> >
>> > That way we get suspend testing 95% effective as quickly as possible,
>> > and we can complete it as we have time.
>>
>> That sounds a very good plan to me, FWIW.
>
> OK, for starters, how about we add the fake suspend test to every
> flight.
>
> What is the rune for that.
>
> Do we want or need to do that test with a guest running ?

Unfortunately the patch was never checked in. I'll send an updated patch.

-George
Re: [Xen-devel] Test for osstest, features used in Qubes OS
Dario Faggioli writes ("Re: [Xen-devel] Test for osstest, features used in Qubes OS"):
> On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> > What if we 1) have two versions of the test -- "Fake suspend" and
> > "Real Suspend"; 2) only run "Real suspend" on hardware specifically
> > marked as having a suspend that works reliably; 3) default all
> > hardware to 'false' until we do some testing to find out how reliable
> > it is?
> >
> > That way we get suspend testing 95% effective as quickly as possible,
> > and we can complete it as we have time.
>
> That sounds a very good plan to me, FWIW.

OK, for starters, how about we add the fake suspend test to every
flight.

What is the rune for that.

Do we want or need to do that test with a guest running ?

Ian.
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Mon, 2018-05-21 at 12:04 +0100, George Dunlap wrote:
> On Thu, May 17, 2018 at 4:12 PM, Ian Jackson wrote:
> > That's not entirely trivial then, especially for you, unless you want
> > to set up your own osstest production instance. However, I can
> > probably do the osstest-machinery work if you will help debug it,
> > review logs, tell me what to do next, etc. :-).
>
> I'm pretty sure it would be possible to test the Xen "get ready for
> suspend" and "resume from suspend" functionality without actually
> needing to interact with ACPI -- we just get it to the point where it
> would start interacting with ACPI, and then have it return instead.
> From a "I'm positive this will continue to work" point of view it's
> not as satisfying as actually doing the suspend; but from a practical
> point of view, it will catch the vast majority of bugs in Xen (as
> opposed to hardware-specific quirks); and it will run on any hardware
> (which means not having to do reliability testing).
>
> IIRC Dario actually had a patch for something like this for his own
> testing at some point -- Dario, anything to add?
>
Indeed I had a patch (it's originally from Ben, actually). I sent it,
so it can be found in list archives. And, in any case, I still have it
around and can resend it.

I did catch quite a few bugs with it back then.

> What if we 1) have two versions of the test -- "Fake suspend" and
> "Real Suspend"; 2) only run "Real suspend" on hardware specifically
> marked as having a suspend that works reliably; 3) default all
> hardware to 'false' until we do some testing to find out how reliable
> it is?
>
> That way we get suspend testing 95% effective as quickly as possible,
> and we can complete it as we have time.
>
That sounds a very good plan to me, FWIW.

Regards,
Dario
--
<> (Raistlin Majere)
- Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Thu, 2018-05-17 at 16:12 +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features
> used in Qubes OS"):
> > On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > > Is it likely that this will depend on non-buggy host firmware ? If so
> > > then we need to make arrangements to test it and only do it on hosts
> > > which are not buggy. In practice this probably means wiring it up to
> > > the automatic host examiner.
> >
> > Yes, probably.
>
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance. However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).
>
I'm not sure what 'non-bugs' in the firmware we're talking about, but a
problem I had when trying to do something like testing S3 suspend/resume
in osstest was that most server-class hardware I could find did not
support that.

If that's the bug you're talking about, yes, I agree it's not trivial. :-)
(although, I did not actually check the boxes in the MA colo, they were
just servers from Citrix's lab).

There's a (non-perfect) workaround, though, as George suggests, which
would allow us to run a "quasi-suspend" test at every flight on every
hardware.

Regards,
Dario
--
<> (Raistlin Majere)
- Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Thu, May 17, 2018 at 4:12 PM, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in
> Qubes OS"):
>> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
>> > Is it likely that this will depend on non-buggy host firmware ? If so
>> > then we need to make arrangements to test it and only do it on hosts
>> > which are not buggy. In practice this probably means wiring it up to
>> > the automatic host examiner.
>>
>> Yes, probably.
>
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance. However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).

I'm pretty sure it would be possible to test the Xen "get ready for
suspend" and "resume from suspend" functionality without actually
needing to interact with ACPI -- we just get it to the point where it
would start interacting with ACPI, and then have it return instead.
From a "I'm positive this will continue to work" point of view it's
not as satisfying as actually doing the suspend; but from a practical
point of view, it will catch the vast majority of bugs in Xen (as
opposed to hardware-specific quirks); and it will run on any hardware
(which means not having to do reliability testing).

IIRC Dario actually had a patch for something like this for his own
testing at some point -- Dario, anything to add?

What if we 1) have two versions of the test -- "Fake suspend" and
"Real Suspend"; 2) only run "Real suspend" on hardware specifically
marked as having a suspend that works reliably; 3) default all
hardware to 'false' until we do some testing to find out how reliable
it is?

That way we get suspend testing 95% effective as quickly as possible,
and we can complete it as we have time.

-George
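The three-point proposal above amounts to a per-host opt-in flag that defaults to off. A minimal sketch of that selection logic follows. The `Host` type, the `real_suspend_reliable` field and the test names are hypothetical and are not osstest's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    # Hypothetical per-host flag: stays False until real suspend has
    # been shown to work reliably on this machine (point 3).
    real_suspend_reliable: bool = False


def suspend_tests_for(host: Host) -> List[str]:
    """Pick the suspend test variants to run on a given host."""
    tests = ["fake-suspend"]              # runs on any hardware (point 1)
    if host.real_suspend_reliable:
        tests.append("real-suspend")      # only on vetted hardware (point 2)
    return tests


print(suspend_tests_for(Host("unvetted-box")))
print(suspend_tests_for(Host("vetted-box", real_suspend_reliable=True)))
```

Defaulting the flag to False means a newly added machine can never produce a spurious real-suspend failure before anyone has checked its firmware.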
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Fri, May 18, 2018 at 09:54:37AM -0600, Jan Beulich wrote:
> >>> On 18.05.18 at 17:33, wrote:
> > Yes, I'm happy to help with that. As I've said, the basic test is very
> > simple (rtcwake command) and already very useful. The fact that it is(?)
> > broken on staging doesn't make it easier,
>
> Details on the breakage would be appreciated (on a separate thread),
> unless you plan to address it yourself. I recall Simon(?) mentioning this as
> well, but also not providing sufficient data to consider looking into it
> (perhaps simply because it wasn't easy to obtain useful data, as
> frequently is the case with S3 resume). I think it would be nice if we could
> release 4.11 without a regression here.

I only know that Simon has tested it and it fails. Cc'ing him.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [Xen-devel] Test for osstest, features used in Qubes OS
>>> On 18.05.18 at 17:33, wrote:
> Yes, I'm happy to help with that. As I've said, the basic test is very
> simple (rtcwake command) and already very useful. The fact that it is(?)
> broken on staging doesn't make it easier,

Details on the breakage would be appreciated (on a separate thread),
unless you plan to address it yourself. I recall Simon(?) mentioning this as
well, but also not providing sufficient data to consider looking into it
(perhaps simply because it wasn't easy to obtain useful data, as
frequently is the case with S3 resume). I think it would be nice if we could
release 4.11 without a regression here.

Jan
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Thu, May 17, 2018 at 08:00:38PM +0200, Sander Eikelenboom wrote:
> Marek / Ian,
>
> Nice to see PCI-passthrough getting some attention again.
>
> On 17/05/18 17:12, Ian Jackson wrote:
> > Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in
> > Qubes OS"):
> >> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> >>> Is there some kind of cheap USB HID, that is interactable-with, which
> >>> we could plug into each machine's USB port ? I'm slightly concerned
> >>> that plugging in a storage device, or connecting the other NIC, might
> >>> interfere with booting.
> >>
> >> I use mass storage for tests... But if you use network boot, it
> >> shouldn't really interfere, no?
> >
> > We do both network boot and disk boot. I think the BIOS disk boot has
> > to continue to work and boot the HDD.

In fact, using any device should be enough for a start. A USB mouse, for
example. Just reading the USB descriptor involves some communication with
the controller, so it should give some indication about its state.

> As a user of pci-passthrough for quite some time and reporting some
> pci-passthrough bugs in the past,
> I do have some comments:
>
> - First of all it would be very nice to get some autotesting :).
> - But if you want to thoroughly test pci-passthrough,
>   it will be far from easy since there is quite a multi-dimensional support matrix
>   (I'm not implying that everything should be done or it won't be valuable if
>   any is missing, it's only meant for reference):
>   1) Guest side implementation:
>      - PV guest (pcifront)
>      - HVM (qemu-traditional)
>      - HVM (qemu-xen)
>      - HVM (qemu-upstream)
>      - perhaps PVH support for pci passthrough coming around the corner.
>
>   2) (Un)Binding method to pciback:
>      - binding pci devices to pciback on host boot (command line)
>      - de/re/unbinding devices from dom0 while running.
>
>   3) (Un)binding to guest:
>      - On guest start (guest.cfg pci=[...])
>      - After the guest has been started with 'xl pci-*' commands
>   3) Device interrupts: legacy versus MSI versus MSI-X
>   4) Other pci device features: roms, BAR sizes, etc.
>   5) AMD versus Intel IOMMU
>
> From the past reports, I know (1) and (3) did matter (problems being isolated
> to one of these variants only).

Yes, that's right, my experience is similar in that matter. Especially
point 3 is tricky/problematic, as some devices (or rather: drivers)
don't correctly fall back to legacy interrupts if MSI/MSI-X isn't
available. So, the ideal test should check those things too - whether the
guest driver really uses what it's expected to use.

But let's start with something first. I don't know how osstest handles it
yet, but I'd expect adding more guest configurations to run the same
test on should be easy.

> As for restarting guests and reassigning pci-devices again to other guests
> the current pciback reset support lacks
> the bus-reset patches at present in upstream linux kernels. Passthrough of
> AMD Radeon graphics adapters works only one
> time without it (if you stop and restart a guest it doesn't work anymore and
> you need to reboot the host).
> With the bus-reset patches (which have been posted to the list and seem to be
> in both Qubes and Xenserver
> in some form but not in upstream linux). Someone from Oracle had picked them
> up to get them upstream some time ago,
> but that effort seems to have stalled.

Can you point out specifically which patches you are talking about? In
Qubes in most cases device reset is handled by libvirt...

> The code in libxl seems to be quite messy for pci-passthrough especially for
> handling all the guest side implementations (1)
> and xenstore interactions that go with it (or don't for qemu).
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Thu, May 17, 2018 at 04:12:09PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in
> Qubes OS"):
> > On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > > Is it likely that this will depend on non-buggy host firmware ? If so
> > > then we need to make arrangements to test it and only do it on hosts
> > > which are not buggy. In practice this probably means wiring it up to
> > > the automatic host examiner.
> >
> > Yes, probably.
>
> That's not entirely trivial then, especially for you, unless you want
> to set up your own osstest production instance. However, I can
> probably do the osstest-machinery work if you will help debug it,
> review logs, tell me what to do next, etc. :-).

Yes, I'm happy to help with that. As I've said, the basic test is very
simple (rtcwake command) and already very useful. The fact that it is(?)
broken on staging doesn't make it easier, but I think setting up the
test using 4.8 branch first should be fine.

If you want to talk on IRC about it, just ping me on email first, I
don't have my irc client running all the time.

In the meantime, I'll try to familiarize myself with osstest...

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
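The message above only says the basic test is an "rtcwake command" and does not give the exact invocation. As a sketch, one plausible form uses util-linux rtcwake with `-m mem` (suspend to RAM) and `-s` (wake-up delay in seconds). The helper name, the default mode and the 30-second delay are assumptions made for illustration.

```python
import shlex
from typing import List


def rtcwake_command(seconds: int, mode: str = "mem") -> List[str]:
    """Build an rtcwake argv that suspends and wakes after `seconds`.

    Assumed invocation, not taken from the thread: -m selects the
    suspend mode, -s sets the RTC alarm relative to now.
    """
    if seconds <= 0:
        raise ValueError("wake-up delay must be positive")
    return ["rtcwake", "-m", mode, "-s", str(seconds)]


# Only prints the command; nothing is suspended here.
print(shlex.join(rtcwake_command(30)))
```

A test harness could pass the returned list to `subprocess.run(..., check=True)` as root in dom0 and treat a clean return after resume as a pass.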
Re: [Xen-devel] Test for osstest, features used in Qubes OS
Marek / Ian,

Nice to see PCI-passthrough getting some attention again.

On 17/05/18 17:12, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
>> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
>>> Is it likely that this will depend on non-buggy host firmware? If so then we need to make arrangements to test it and only do it on hosts which are not buggy. In practice this probably means wiring it up to the automatic host examiner.
>>
>> Yes, probably.
>
> That's not entirely trivial then, especially for you, unless you want to set up your own osstest production instance. However, I can probably do the osstest-machinery work if you will help debug it, review logs, tell me what to do next, etc. :-).
>
>>> Is there some kind of cheap USB HID, that is interactable-with, which we could plug into each machine's USB port? I'm slightly concerned that plugging in a storage device, or connecting the other NIC, might interfere with booting.
>>
>> I use mass storage for tests... But if you use network boot, it shouldn't really interfere, no?
>
> We do both network boot and disk boot. I think the BIOS disk boot has to continue to work and boot the HDD.

As someone who has used pci-passthrough for quite some time and has reported some pci-passthrough bugs in the past, I do have some comments:

- First of all, it would be very nice to get some autotesting :).
- But if you want to test pci-passthrough thoroughly, it will be far from easy, since there is quite a multi-dimensional support matrix (I'm not implying that everything should be done or that it won't be valuable if anything is missing; it's only meant for reference):

  1) Guest side implementation:
     - PV guest (pcifront)
     - HVM (qemu-traditional)
     - HVM (qemu-xen)
     - HVM (qemu-upstream)
     - perhaps PVH support for pci passthrough coming around the corner.
  2) (Un)Binding method to pciback:
     - binding pci devices to pciback on host boot (command line)
     - de/re/unbinding devices from dom0 while running.
  3) (Un)binding to guest:
     - On guest start (guest.cfg pci=[...])
     - After the guest has been started, with 'xl pci-*' commands
  3) Device interrupts: legacy versus MSI versus MSI-X
  4) Other pci device features: roms, BAR sizes, etc.
  5) AMD versus Intel IOMMU

From the past reports, I know (1) and (3) did matter (problems being isolated to one of these variants only).

As for restarting guests and reassigning pci devices to other guests: the pciback reset support in upstream Linux kernels currently lacks the bus-reset patches. Without them, passthrough of AMD Radeon graphics adapters works only once (if you stop and restart a guest, it doesn't work anymore and you need to reboot the host). The bus-reset patches have been posted to the list and seem to be in both Qubes and XenServer in some form, but not in upstream Linux. Someone from Oracle had picked them up to get them upstream some time ago, but that effort seems to have stalled.

The code in libxl seems to be quite messy for pci-passthrough, especially for handling all the guest side implementations (1) and the xenstore interactions that go with it (or don't, for qemu).

--
Sander

>>> If you want to get pci passthrough tests working I would suggest testing it with non-stubdom first. I assume the config etc. is the same, so having got that working, osstest would be able to test it for the stubdom tests too.
>>
>> Oh, I thought there were already tests for that...
>
> There are no PCI passthrough tests at all. For a while we had some SRIOV NIC tests which were requested by Intel. But they always failed giving kernel stack dumps. We kept poking Intel to get them to fix them, or tell us how the tests were wrong, but to no avail. So we dropped them.
>
> So any work in this area would be greatly appreciated!
>
> Ian.
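To give a feel for how quickly the support matrix in Sander's message grows, here is a minimal Python sketch that enumerates the combinations. The dictionary keys and labels are made up for illustration and are not osstest identifiers. PVH and the open-ended "other pci device features" dimension are left out because the message does not give a fixed set of values for them.

```python
from itertools import product

# Dimensions taken from the list in the message; the key names are
# hypothetical labels chosen for this sketch.
MATRIX = {
    "guest": [
        "PV (pcifront)",
        "HVM (qemu-traditional)",
        "HVM (qemu-xen)",
        "HVM (qemu-upstream)",
    ],
    "pciback_binding": ["host boot command line", "runtime bind/unbind in dom0"],
    "guest_assignment": ["guest.cfg pci=[...]", "xl pci-* after guest start"],
    "interrupts": ["legacy", "MSI", "MSI-X"],
    "iommu": ["AMD", "Intel"],
}


def combinations(matrix):
    """Return every combination as a dict mapping dimension -> chosen value."""
    keys = list(matrix)
    return [
        dict(zip(keys, values))
        for values in product(*(matrix[key] for key in keys))
    ]


ALL_CASES = combinations(MATRIX)
```

Even with these five dimensions alone, `len(ALL_CASES)` is 4 * 2 * 2 * 3 * 2 = 96, which is presumably why the message stresses that partial coverage is still valuable.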
Re: [Xen-devel] Test for osstest, features used in Qubes OS
Marek Marczykowski-Górecki writes ("Re: Test for osstest, features used in Qubes OS"):
> On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> > Is it likely that this will depend on non-buggy host firmware? If so then we need to make arrangements to test it and only do it on hosts which are not buggy. In practice this probably means wiring it up to the automatic host examiner.
>
> Yes, probably.

That's not entirely trivial then, especially for you, unless you want to set up your own osstest production instance. However, I can probably do the osstest-machinery work if you will help debug it, review logs, tell me what to do next, etc. :-).

> > Is there some kind of cheap USB HID, that is interactable-with, which we could plug into each machine's USB port? I'm slightly concerned that plugging in a storage device, or connecting the other NIC, might interfere with booting.
>
> I use mass storage for tests... But if you use network boot, it shouldn't really interfere, no?

We do both network boot and disk boot. I think the BIOS disk boot has to continue to work and boot the HDD.

> > If you want to get pci passthrough tests working I would suggest testing it with non-stubdom first. I assume the config etc. is the same, so having got that working, osstest would be able to test it for the stubdom tests too.
>
> Oh, I thought there were already tests for that...

There are no PCI passthrough tests at all. For a while we had some SRIOV NIC tests which were requested by Intel. But they always failed giving kernel stack dumps. We kept poking Intel to get them to fix them, or tell us how the tests were wrong, but to no avail. So we dropped them.

So any work in this area would be greatly appreciated!

Ian.
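As an illustration of the "config etc. is the same" assumption, the following Python sketch renders a minimal guest config with a passed-through device, with and without a device-model stubdomain. The function name, guest name and BDF are hypothetical. The `pci` and `device_model_stubdomain_override` keys are taken from xl.cfg; `type = "hvm"` is the newer spelling and older branches such as 4.8 would use `builder = "hvm"` instead. A real test config would also need disks, a network and so on.

```python
def render_guest_cfg(name, bdf, stubdom=False, memory_mb=1024):
    """Render a minimal, incomplete xl guest config as text.

    `bdf` is the PCI address of the device to pass through,
    for example "0000:03:00.0".
    """
    lines = [
        f'name = "{name}"',
        'type = "hvm"',
        f"memory = {memory_mb}",
        f'pci = [ "{bdf}" ]',
    ]
    if stubdom:
        # Run the device model in a stubdomain instead of dom0.
        lines.append("device_model_stubdomain_override = 1")
    return "\n".join(lines) + "\n"


PLAIN_CFG = render_guest_cfg("pt-test", "0000:03:00.0")
STUBDOM_CFG = render_guest_cfg("pt-test", "0000:03:00.0", stubdom=True)
```

In this sketch the two variants differ by a single line, so a test that works for the plain case could be reused for the stubdom case by flipping one parameter. Whether that holds in osstest is exactly the assumption made in the message.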
Re: [Xen-devel] Test for osstest, features used in Qubes OS
On Thu, May 17, 2018 at 01:26:30PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Test for osstest, features used in Qubes OS"):
> > As discussed some time ago, I'd like to help with adding tests for some features we use in Qubes OS.
> >
> > IMO the easiest thing to test is host suspend. You just need to execute "rtcwake -s 30 -m mem", and see if the host is back to life after ~30s. Right now I know it works on Xen 4.8, but supposedly is broken on staging (haven't tested the most recent version).
> > Next step would be the same while having some domains running.
> >
> > What should the test look like (where to add this? etc)?
>
> I guess this should be a new
>    ts-host-suspend-test
> script.
>
> Is it likely that this will depend on non-buggy host firmware? If so then we need to make arrangements to test it and only do it on hosts which are not buggy. In practice this probably means wiring it up to the automatic host examiner.

Yes, probably.

> > Next things would be mostly related to PCI passthrough:
> >  - PCI passthrough with qemu in stubdomain
> >  - the same as above, but with Linux-based stubdomain (we need to clean up and send patches for that first, probably 4.12 material)
> >  - guest suspend (recently added libxl_domain_suspend_only), for different guest types (PV, PVH, HVM), also with/without PCI device
> >
> > For this, the machine obviously needs to have an IOMMU (I assume at least some of the hardware used in the test lab has it), and some spare PCI device. I use a sound card for some of such tests. But testing on USB controllers would be more useful (from our experience, one of the most problematic devices for suspend, sadly also lacking FLR or such...).
>
> I doubt any of our x86 machines have sound cards. ... Just looked at one and it says
>   00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
> which is obviously mad.
>
> I'm pretty sure they all have usb controllers. Almost all of them have multiple NICs, often on different pci devices, although it is difficult to tell if a NIC not connected to anything is working.
>
> Eg,
>   02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>   03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>
> Is there some kind of cheap USB HID, that is interactable-with, which we could plug into each machine's USB port? I'm slightly concerned that plugging in a storage device, or connecting the other NIC, might interfere with booting.

I use mass storage for tests... But if you use network boot, it shouldn't really interfere, no?

> If you want to get pci passthrough tests working I would suggest testing it with non-stubdom first. I assume the config etc. is the same, so having got that working, osstest would be able to test it for the stubdom tests too.

Oh, I thought there were already tests for that... Yes, good idea.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
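Picking a spare device for passthrough starts from lspci output like the lines quoted above. A small Python sketch of that step follows, assuming the default one-line-per-device lspci format. The function names and the set of wanted classes are illustrative choices, not part of osstest, and the sample text is simply the three lines from this thread.

```python
import re

# Matches e.g. "02:00.0 Ethernet controller: Intel Corporation I210 ...",
# with an optional leading PCI domain such as "0000:".
LSPCI_LINE = re.compile(
    r"^(?P<bdf>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>[^:]+): (?P<desc>.+)$"
)

SAMPLE = """\
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
"""


def parse_lspci(text):
    """Return a list of (bdf, device class, description) tuples."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append(
                (match.group("bdf"), match.group("cls"), match.group("desc"))
            )
    return devices


def candidates(devices, wanted=("USB controller", "Ethernet controller", "Audio device")):
    """Keep only devices whose class is one we might pass through."""
    return [dev for dev in devices if dev[1] in wanted]


DEVICES = parse_lspci(SAMPLE)
NICS = candidates(DEVICES, wanted=("Ethernet controller",))
```

This only narrows the list. It cannot tell which NIC the host itself boots from, or whether an unconnected NIC works, which are the concerns raised in the quoted text.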
Re: [Xen-devel] Test for osstest, features used in Qubes OS
Marek Marczykowski-Górecki writes ("Test for osstest, features used in Qubes OS"):
> As discussed some time ago, I'd like to help with adding tests for some features we use in Qubes OS.
>
> IMO the easiest thing to test is host suspend. You just need to execute "rtcwake -s 30 -m mem", and see if the host is back to life after ~30s. Right now I know it works on Xen 4.8, but supposedly is broken on staging (haven't tested the most recent version).
> Next step would be the same while having some domains running.
>
> What should the test look like (where to add this? etc)?

I guess this should be a new
   ts-host-suspend-test
script.

Is it likely that this will depend on non-buggy host firmware? If so then we need to make arrangements to test it and only do it on hosts which are not buggy. In practice this probably means wiring it up to the automatic host examiner.

> Next things would be mostly related to PCI passthrough:
>  - PCI passthrough with qemu in stubdomain
>  - the same as above, but with Linux-based stubdomain (we need to clean up and send patches for that first, probably 4.12 material)
>  - guest suspend (recently added libxl_domain_suspend_only), for different guest types (PV, PVH, HVM), also with/without PCI device
>
> For this, the machine obviously needs to have an IOMMU (I assume at least some of the hardware used in the test lab has it), and some spare PCI device. I use a sound card for some of such tests. But testing on USB controllers would be more useful (from our experience, one of the most problematic devices for suspend, sadly also lacking FLR or such...).

I doubt any of our x86 machines have sound cards. ... Just looked at one and it says
  00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
which is obviously mad.

I'm pretty sure they all have usb controllers. Almost all of them have multiple NICs, often on different pci devices, although it is difficult to tell if a NIC not connected to anything is working.

Eg,
  02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
  03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

Is there some kind of cheap USB HID, that is interactable-with, which we could plug into each machine's USB port? I'm slightly concerned that plugging in a storage device, or connecting the other NIC, might interfere with booting.

If you want to get pci passthrough tests working I would suggest testing it with non-stubdom first. I assume the config etc. is the same, so having got that working, osstest would be able to test it for the stubdom tests too.

Ian.
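The basic suspend check described in the original message could look roughly like the Python sketch below. This is not the proposed ts-host-suspend-test script (osstest has its own conventions). The function names and the 60 second slack are assumptions made for the example. Only the rtcwake command line comes from the thread.

```python
import subprocess
import time


def suspend_cmd(seconds=30, mode="mem"):
    """Build the rtcwake command line quoted in the message."""
    return ["rtcwake", "-s", str(seconds), "-m", mode]


def resumed_in_time(elapsed, seconds=30, slack=60):
    """True if the host was away for about `seconds` and came back
    no later than `slack` seconds after the alarm (slack is a guess)."""
    return seconds - 1 <= elapsed <= seconds + slack


def run_suspend_test(seconds=30):
    """Suspend the local host and report whether it resumed on time.

    Must run as root on the host under test. Wall-clock time is used
    because the monotonic clock on Linux typically does not advance
    while the machine is suspended.
    """
    start = time.time()
    subprocess.run(suspend_cmd(seconds), check=True)
    return resumed_in_time(time.time() - start, seconds)
```

Calling `run_suspend_test()` on the host only covers the case where resume succeeds or rtcwake fails outright. If the host never comes back, the script never returns, so a real test would presumably be driven from a separate controller that applies its own timeout.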