Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]

2017-12-05 Thread Andrew Cooper
On 05/12/17 15:31, Jan Beulich wrote:
 On 05.12.17 at 16:05,  wrote:
>> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL"):
>>> This is a blue screen, recurring, and has first been reported in flight
>>> 116779, i.e. was likely introduced in the batch ending in commit
>>> 4cd0fad645. Among those the most likely candidates appear to be
>>> the SVM changes (the failures are all on AMD hardware). The logs
>>> there also have huge amounts of "Unexpected nested vmexit",
>>> albeit not directly connected with the failed test afaict.
>> Ian Jackson writes ("Re: [xen-unstable test] 116832: regressions - FAIL"):
>>> This is the expected Windows failure.  Force pushed.
>> Oops.  Sorry about that.
>>
>> I think this goes to show that (i) leaving known failures languishing
>> for months and expecting them to be force pushed results in human
>> error (ii) I should read the whole email thread first.
> Oh, that's pretty unfortunate. I think we'll then need a custom flight
> tied to the box that this failure occurred on, to have a way to tell
> whether the fix I'm about to prepare has actually helped, all the more so
> since the same issue is presumably also present on the 4.10 branch.
> Thing is that newer AMD hardware (with decode assist) doesn't
> appear to demonstrate the misbehavior, and for some reason it also
> doesn't show on Intel systems.
>
> I've spent quite a bit of time trying to repro this on my old AMD box, but the
> distro on there is just too old to be able to start a suitable Windows
> guest (part(?) of the reason being that scripts in /etc/xen/scripts
> appear to get invoked alongside the ones from the separate unstable
> install tree, and at some point I then decided to give up trying to hack
> things up so they would work together again).

If you've got a provisional patch, I can get some testing organised on
newer and older hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]

2017-12-05 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]"):
> Oh, that's pretty unfortunate. I think we'll then need a custom flight
> tied to the box that this failure occurred on, to have a way to tell
> whether the fix I'm about to prepare has actually helped,

Even though it is no longer regarded as a regression by osstest, the
job will still be host-sticky.  So if you commit a patch to staging,
you should be able to see whether it has helped.

Ian.


Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL [and 1 more messages]

2017-12-05 Thread Jan Beulich
>>> On 05.12.17 at 16:05,  wrote:
> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 116832: regressions - FAIL"):
>> This is a blue screen, recurring, and has first been reported in flight
>> 116779, i.e. was likely introduced in the batch ending in commit
>> 4cd0fad645. Among those the most likely candidates appear to be
>> the SVM changes (the failures are all on AMD hardware). The logs
>> there also have huge amounts of "Unexpected nested vmexit",
>> albeit not directly connected with the failed test afaict.
> 
> Ian Jackson writes ("Re: [xen-unstable test] 116832: regressions - FAIL"):
>> This is the expected Windows failure.  Force pushed.
> 
> Oops.  Sorry about that.
> 
> I think this goes to show that (i) leaving known failures languishing
> for months and expecting them to be force pushed results in human
> error (ii) I should read the whole email thread first.

Oh, that's pretty unfortunate. I think we'll then need a custom flight
tied to the box that this failure occurred on, to have a way to tell
whether the fix I'm about to prepare has actually helped, all the more so
since the same issue is presumably also present on the 4.10 branch.
Thing is that newer AMD hardware (with decode assist) doesn't
appear to demonstrate the misbehavior, and for some reason it also
doesn't show on Intel systems.

I've spent quite a bit of time trying to repro this on my old AMD box, but the
distro on there is just too old to be able to start a suitable Windows
guest (part(?) of the reason being that scripts in /etc/xen/scripts
appear to get invoked alongside the ones from the separate unstable
install tree, and at some point I then decided to give up trying to hack
things up so they would work together again).

Jan

