Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]
On 05/12/17 15:31, Jan Beulich wrote:
> >>> On 05.12.17 at 16:05, <ian.jack...@eu.citrix.com> wrote:
>> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL"):
>>> This is a blue screen, recurring, and has first been reported in flight
>>> 116779, i.e. was likely introduced in the batch ending in commit
>>> 4cd0fad645. Among those the most likely candidates appear to be
>>> the SVM changes (the failures are all on AMD hardware). The logs
>>> there also have huge amounts of "Unexpected nested vmexit",
>>> albeit not directly connected with the failed test afaict.
>> Ian Jackson writes ("Re: [xen-unstable test] 116832: regressions - FAIL"):
>>> This is the expected Windows failure. Force pushed.
>> Oops. Sorry about that.
>>
>> I think this goes to show that (i) leaving known failures languishing
>> for months and expecting them to be force pushed results in human
>> error (ii) I should read the whole email thread first.
> Oh, that's pretty unfortunate. I think we'll then need a custom flight
> tied to the box that this failure occurred on, to have a way to tell
> whether the fix I'm about to prepare has actually helped, the more
> that the same issue is presumably also present on the 4.10 branch.
> Thing is that newer AMD hardware (with decode assist) doesn't
> appear to demonstrate the misbehavior, and for some reason it also
> doesn't show on Intel systems.
>
> I've spent quite a bit of time to repro this on my old AMD box, but the
> distro on there is just too old to be able to start a suitable Windows
> guest (part(?) of the reason being that scripts in /etc/xen/scripts
> appear to get invoked alongside the ones from the separate unstable
> install tree, and at some point I then decided to give up trying to hack
> things up so they would work together again).

If you've got a provisional patch, I can get some testing organised on
newer and older hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]
Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]"):
> Oh, that's pretty unfortunate. I think we'll then need a custom flight
> tied to the box that this failure occurred on, to have a way to tell
> whether the fix I'm about to prepare has actually helped,

Even though it is no longer regarded as a regression by osstest, the
job will still be host-sticky. So if you commit a patch to staging,
you should be able to see whether it has helped.

Ian.
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]
>>> On 05.12.17 at 16:05, <ian.jack...@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL"):
>> This is a blue screen, recurring, and has first been reported in flight
>> 116779, i.e. was likely introduced in the batch ending in commit
>> 4cd0fad645. Among those the most likely candidates appear to be
>> the SVM changes (the failures are all on AMD hardware). The logs
>> there also have huge amounts of "Unexpected nested vmexit",
>> albeit not directly connected with the failed test afaict.
>
> Ian Jackson writes ("Re: [xen-unstable test] 116832: regressions - FAIL"):
>> This is the expected Windows failure. Force pushed.
>
> Oops. Sorry about that.
>
> I think this goes to show that (i) leaving known failures languishing
> for months and expecting them to be force pushed results in human
> error (ii) I should read the whole email thread first.

Oh, that's pretty unfortunate. I think we'll then need a custom flight
tied to the box that this failure occurred on, to have a way to tell
whether the fix I'm about to prepare has actually helped, the more
that the same issue is presumably also present on the 4.10 branch.
Thing is that newer AMD hardware (with decode assist) doesn't
appear to demonstrate the misbehavior, and for some reason it also
doesn't show on Intel systems.

I've spent quite a bit of time to repro this on my old AMD box, but the
distro on there is just too old to be able to start a suitable Windows
guest (part(?) of the reason being that scripts in /etc/xen/scripts
appear to get invoked alongside the ones from the separate unstable
install tree, and at some point I then decided to give up trying to hack
things up so they would work together again).

Jan
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
osstest service owner writes ("[xen-unstable test] 116832: regressions - FAIL"):
> flight 116832 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744
> version targeted for testing:
>  xen 553ac37137c2d1c03bf1b69cfb192ffbfe29daa4

This is the expected Windows failure. Force pushed.

Ian.
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
On 05/12/17 11:16, Jan Beulich wrote:
> >>> On 05.12.17 at 11:03, wrote:
>> On 05/12/2017 09:30, Jan Beulich wrote:
>>> >>> On 05.12.17 at 09:49, wrote:
>>>> flight 116832 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744
>>> This is a blue screen, recurring, and has first been reported in flight
>>> 116779, i.e. was likely introduced in the batch ending in commit
>>> 4cd0fad645. Among those the most likely candidates appear to be
>>> the SVM changes (the failures are all on AMD hardware). The logs
>>> there also have huge amounts of "Unexpected nested vmexit",
>>> albeit not directly connected with the failed test afaict.
>> The unexpected nested vmexit is from a previous test, (and hopefully the
>> nested virt test, as that path shouldn't be reachable elsehow).
>>
>> The windows boot which actually failed has:
>>
>> Dec 5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
> How did I not spot this? This is a REP INSW, which then makes it
> far more likely to be a result of Paul's 9c9384d6d8. Sadly the
> windows-install tests of the two 4.10 flights we've had so far all
> ran on Intel hardware, so we can't easily tell whether the
> problem is present there as well (in which case it would for sure
> be that commit).
>
> For the moment I'm struggling to understand how we can end up
> on the CR access emulation path here, but I'll take a closer look.

I am equally perplexed. The SVM code definitely used to (ab)use the MMIO
path, so I expect there is still some remnants left.

The CR access in the message proves that we started this emulation from
a CR vmexit. My best guess is that _hvm_emulate_one() reused the
instruction cache rather than starting fresh. The unhandleable is
probably from a ->validate() failure, and the bytes are probably stale
from the previous emulation.

~Andrew
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
>>> On 05.12.17 at 11:03, wrote:
> On 05/12/2017 09:30, Jan Beulich wrote:
>> >>> On 05.12.17 at 09:49, wrote:
>>> flight 116832 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744
>> This is a blue screen, recurring, and has first been reported in flight
>> 116779, i.e. was likely introduced in the batch ending in commit
>> 4cd0fad645. Among those the most likely candidates appear to be
>> the SVM changes (the failures are all on AMD hardware). The logs
>> there also have huge amounts of "Unexpected nested vmexit",
>> albeit not directly connected with the failed test afaict.
>
> The unexpected nested vmexit is from a previous test, (and hopefully the
> nested virt test, as that path shouldn't be reachable elsehow).
>
> The windows boot which actually failed has:
>
> Dec 5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc

How did I not spot this? This is a REP INSW, which then makes it
far more likely to be a result of Paul's 9c9384d6d8. Sadly the
windows-install tests of the two 4.10 flights we've had so far all
ran on Intel hardware, so we can't easily tell whether the
problem is present there as well (in which case it would for sure
be that commit).

For the moment I'm struggling to understand how we can end up
on the CR access emulation path here, but I'll take a closer look.

Jan
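The reading of the logged bytes as REP INSW can be checked by hand: 0x66 is the operand-size override prefix, 0xF3 is REP, and 0x6D is the word/doubleword form of INS. The following is a minimal sketch of that check, not a general x86 decoder; the function name `decode_string_io` is hypothetical, and only the string I/O opcodes 0x6C to 0x6F and three legacy prefixes are handled.

```python
# Illustrative only: recognises legacy prefixes followed by INS/OUTS.
# It ignores REX and every other opcode, so it is not a real decoder.

LEGACY_PREFIXES = {0x66, 0xF2, 0xF3}  # operand-size override, REPNE, REP


def decode_string_io(code: bytes) -> str:
    """Return a mnemonic such as 'REP INSW' for the first instruction."""
    i = 0
    prefixes = set()
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        prefixes.add(code[i])
        i += 1
    if i >= len(code):
        raise ValueError("no opcode byte after prefixes")

    opcode = code[i]
    rep = "REP " if 0xF3 in prefixes else ""
    # Without 0x66 the default operand size is 32 bits (also in 64-bit mode).
    wide = "W" if 0x66 in prefixes else "D"

    if opcode == 0x6C:
        return rep + "INSB"
    if opcode == 0x6D:
        return rep + "INS" + wide
    if opcode == 0x6E:
        return rep + "OUTSB"
    if opcode == 0x6F:
        return rep + "OUTS" + wide
    raise ValueError(f"opcode {opcode:#04x} is not INS/OUTS")


if __name__ == "__main__":
    logged = bytes.fromhex("66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc")
    print(decode_string_io(logged))
```

An instruction of this kind is a port I/O string read, not a control register access, which is why its appearance under "CR access emulation failed" is surprising.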
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
On 05/12/2017 10:03, Andrew Cooper wrote:
> On 05/12/2017 09:30, Jan Beulich wrote:
>> >>> On 05.12.17 at 09:49, wrote:
>>> flight 116832 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744
>> This is a blue screen, recurring, and has first been reported in flight
>> 116779, i.e. was likely introduced in the batch ending in commit
>> 4cd0fad645. Among those the most likely candidates appear to be
>> the SVM changes (the failures are all on AMD hardware). The logs
>> there also have huge amounts of "Unexpected nested vmexit",
>> albeit not directly connected with the failed test afaict.
> The unexpected nested vmexit is from a previous test, (and hopefully the
> nested virt test, as that path shouldn't be reachable elsehow).
>
> The windows boot which actually failed has:
>
> Dec 5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
> Dec 5 04:21:49.555130 (XEN) stdvga.c:173:d1v0 entering stdvga mode
>
> which I expect is the root of the problem.

Yes - that is the cause of the problem. The BSOD is a 0x3D (exception
not handled) referencing the same %rip as the CR emulation failure.

~Andrew
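Matching the %rip in a bug check against the hypervisor log is easier if the log line is split into fields first. The sketch below assumes the line layout seen in the single example quoted in this thread; the regular expression and the name `parse_emulation_failure` are illustrative and are not taken from Xen or osstest.

```python
import re

# Layout inferred from the one log line quoted in the thread (an assumption):
#   (XEN) <kind> emulation failed (<rc>): d<dom>v<vcpu> <mode> @ <cs>:<rip> -> <bytes>
EMUL_FAIL_RE = re.compile(
    r"\(XEN\) (?P<kind>.+?) emulation failed \((?P<rc>\d+)\): "
    r"d(?P<dom>\d+)v(?P<vcpu>\d+) (?P<mode>\S+) @ "
    r"(?P<cs>[0-9a-f]{4}):(?P<rip>[0-9a-f]+) -> "
    r"(?P<bytes>[0-9a-f]{2}(?: [0-9a-f]{2})*)"
)


def parse_emulation_failure(line: str) -> dict:
    """Split an 'emulation failed' console line into typed fields."""
    m = EMUL_FAIL_RE.search(line)
    if m is None:
        raise ValueError("not an emulation failure line")
    return {
        "kind": m["kind"],
        "rc": int(m["rc"]),
        "domain": int(m["dom"]),
        "vcpu": int(m["vcpu"]),
        "mode": m["mode"],
        "cs": int(m["cs"], 16),
        "rip": int(m["rip"], 16),
        "insn_bytes": bytes.fromhex(m["bytes"]),
    }


if __name__ == "__main__":
    line = ("Dec 5 04:20:08.735216 (XEN) CR access emulation failed (1): "
            "d1v0 64bit @ 0010:f8000ce9e4ab -> "
            "66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc")
    info = parse_emulation_failure(line)
    print(hex(info["rip"]), info["insn_bytes"].hex(" "))
```

The parsed `rip` value can then be compared with the address reported by the guest's bug check.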
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
On 05/12/2017 09:30, Jan Beulich wrote:
> >>> On 05.12.17 at 09:49, wrote:
>> flight 116832 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744
> This is a blue screen, recurring, and has first been reported in flight
> 116779, i.e. was likely introduced in the batch ending in commit
> 4cd0fad645. Among those the most likely candidates appear to be
> the SVM changes (the failures are all on AMD hardware). The logs
> there also have huge amounts of "Unexpected nested vmexit",
> albeit not directly connected with the failed test afaict.

The unexpected nested vmexit is from a previous test, (and hopefully the
nested virt test, as that path shouldn't be reachable elsehow).

The windows boot which actually failed has:

Dec 5 04:20:08.735216 (XEN) CR access emulation failed (1): d1v0 64bit @ 0010:f8000ce9e4ab -> 66 f3 6d 48 8b 7c 24 08 c3 cc cc cc cc cc cc cc
Dec 5 04:21:49.555130 (XEN) stdvga.c:173:d1v0 entering stdvga mode

which I expect is the root of the problem.

~Andrew
Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL
>>> On 05.12.17 at 09:49, wrote:
> flight 116832 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/116832/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 116744

This is a blue screen, recurring, and has first been reported in flight
116779, i.e. was likely introduced in the batch ending in commit
4cd0fad645. Among those the most likely candidates appear to be
the SVM changes (the failures are all on AMD hardware). The logs
there also have huge amounts of "Unexpected nested vmexit",
albeit not directly connected with the failed test afaict.

Jan
[Xen-devel] [xen-unstable test] 116832: regressions - FAIL
flight 116832 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/116832/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install  fail REGR. vs. 116744

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check  fail  like 116744
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10  fail  like 116744
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check  fail  like 116744
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail  like 116744
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail  like 116744
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail  like 116744
 test-armhf-armhf-libvirt 14 saverestore-support-check  fail  like 116744
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop  fail  like 116744
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop  fail  like 116744
 test-amd64-amd64-xl-pvhv2-intel 12 guest-start  fail  never pass
 test-amd64-amd64-xl-pvhv2-amd 12 guest-start  fail  never pass
 test-amd64-i386-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qcow2 12 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail  never pass
 test-armhf-armhf-xl-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail  never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install  fail  never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install  fail  never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install  fail  never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install  fail  never pass

version targeted for testing:
 xen  553ac37137c2d1c03bf1b69cfb192ffbfe29daa4
baseline version:
 xen  6da091d95dfcbe00daf91308d044ee5151b1ac9e

Last test of basis   116744  2017-12-01 13:53:15 Z  3 days
Failing since        116779  2017-12-02 17:06:23 Z  2 days  3 attempts
Testing same since   116832  2017-12-04 13:57:29 Z  0 days  1 attempts

People who touched revisions under test:
  Andrew Cooper
  Boris Ostrovsky
  Brian Woods
  David Esler
  Euan Harris
  Gregory Herrero
  Haozhong Zhang
  Ian Jackson
  Jan Beulich
  Kevin Tian
  Paul Durrant
  Roger Pau Monné
  Sergey Dyasli
  Stefano Stabellini
  Zhenzhong Duan