Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-07-16 Thread Jim Fehlig

On 06/13/2018 05:18 AM, Ian Jackson wrote:

> Jim: please read down to where I discuss
> test-amd64-amd64-libvirt-pair.  If you have any insight I'd appreciate
> it.  Let me know if you want me to preserve the logs, which will
> otherwise expire in a few weeks.


Whoa, sorry for the delay. This mail found a dumb bug in my filter for xen-devel 
mail.



> >  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701


> From the log:
> 
> 2018-06-12 20:59:40 Z executing ssh ... root@172.16.144.61 virsh migrate --live debian.guest.osstest xen+ssh://joubertin0
> error: Timed out during operation: cannot acquire state change lock
> 2018-06-12 21:00:16 Z command nonzero waitstatus 256: [..]
> 
> The libvirt libxl logs seem to show libxl doing a successful
> migration.


With the long delay, I'm afraid the logs have expired. Do you still see the
problem? All the recent runs seem to be plagued by libvirt's change to require
GnuTLS:


https://libvirt.org/git/?p=libvirt.git;a=commit;h=60d9ad6f1e42618fce10baeb0f02c35e5ebd5b24


> Looking at the logs I see this:
> 
> 2018-06-12 21:00:16.784+: 3507: warning : libxlDomainObjBeginJob:151 : Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24947)
> 
> That job number looks like it's about right for a pid, but I think it
> must be a thread because it doesn't show up in the ps output.


Likely a libvirtd worker thread doing something that requires modifying the 
state of virDomainObj.



> I did see this:
> 
> Jun 12 21:00:20 joubertin0 logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> 
> but that seems to be after the failure.


A wild guess, but is it possible thread 24947 is running a domain create 
operation, which includes executing vif-bridge, that is taking longer than 
expected to complete?



> I don't have an explanation.  I don't really know what this lock is.


It's a lock that serializes domain state modifications (changes to virDomainObj).
The wait time for the lock is currently hardcoded to 30 seconds. The thread emitting
the warning timed out waiting for 24947 to finish whatever it was doing.
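The behaviour described above can be modelled in a few lines. This is a hypothetical Python illustration, not libvirt's implementation (libvirt is written in C, and `DomainJobLock`, `begin_job` and `end_job` are invented names): at most one job owns the domain at a time, and a second caller waits a bounded time, 30 seconds by default to match the figure above, before failing with a message shaped like the libvirt warning.

```python
import threading
import time


class JobTimeout(Exception):
    """Raised when the current job owner does not finish within the wait limit."""


class DomainJobLock:
    """Hypothetical model of a per-domain job lock: one job at a time,
    and a would-be owner waits at most `wait_seconds` before giving up."""

    def __init__(self, wait_seconds=30.0):
        self._cond = threading.Condition()
        self._owner = None  # thread ident of the current job owner, or None
        self._job = None    # name of the current job, e.g. "modify"
        self._wait_seconds = wait_seconds

    def begin_job(self, job="modify"):
        deadline = time.monotonic() + self._wait_seconds
        with self._cond:
            while self._owner is not None:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise JobTimeout(
                        f"Cannot start job ({job}); current job is "
                        f"({self._job}) owned by ({self._owner})")
                self._cond.wait(remaining)
            self._owner = threading.get_ident()
            self._job = job

    def end_job(self):
        with self._cond:
            self._owner = None
            self._job = None
            self._cond.notify_all()


if __name__ == "__main__":
    # A short wait stands in for the 30 second limit so the demo runs quickly.
    lock = DomainJobLock(wait_seconds=0.1)
    holding = threading.Event()

    def slow_modify():
        lock.begin_job("modify")
        holding.set()
        time.sleep(0.5)  # e.g. a domain create still running hotplug scripts
        lock.end_job()

    worker = threading.Thread(target=slow_modify)
    worker.start()
    holding.wait()
    try:
        lock.begin_job("modify")  # e.g. the migrate request
    except JobTimeout as err:
        print("timed out:", err)
    worker.join()
```

In this model a single long-running owner, such as the slow domain create guessed at above, is enough on its own to make the next caller time out; no deadlock is needed.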


Regards,
Jim

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-06-14 Thread Ian Jackson
Jan Beulich writes ("Re: [xen-4.8-testing test] 124100: regressions - FAIL"):
> I'd favor a sufficiently justified (as it is now) force push, and then
> an immediate release. As you say elsewhere, the problem with in
> particular the albanas has been bad enough for it to be sufficiently
> unpredictable when a normal push might happen (in fact I was
> surprised by how quickly this happened on 4.7, as I had also
> expressed in a reply to that flight's report).
> 
> For the release you'd need to tag qemu-trad and (as was asked by
> Wei iirc on irc) mini-os, such that I could then push the version
> update on the main tree.

Right.  So, I have force pushed
1522a81acea5c6109f6f791d528fd8724117fb63.

I have tagged xen-4.8.4 in qemu-xen-traditional and xen-RELEASE-4.8.4
in mini-os.

Ian.


Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-06-13 Thread Jan Beulich
>>> On 13.06.18 at 13:18,  wrote:
> Jan asked me to investigate why we weren't getting a push on Xen 4.8.
> I investigated the failures in this osstest report.
> 
> I think we have a combination of:
> 
>  * Flaky armhf hardware
> 
>  * A real Xen-related heisenbug (but which is not a regression)
>    (the EFAULT libxc/linux bug; CC to Juergen)
> 
>  * Mystery failures to make progress during local computation
>    and I/O which look like Linux kernel bugs
> 
>  * Incompatibility between Xen 4.8 and osstest's approach to UEFI
>    booting, now fixed.
> 
>  * A mystery libvirt heisenbug.  (Hence the CC to Jim.)

Thanks a lot for the analysis!

> Jan: I would be inclined to force push this.  OTOH, if we wait,
> eventually osstest will collect a set of flights which osstest's
> archaeologist can see justifies a push.

Considering

Last test of basis   123091  2018-05-23 07:11:28 Z   20 days
Failing since        123345  2018-05-29 08:36:34 Z   14 days   13 attempts
Testing same since   123492  2018-05-31 20:14:51 Z   12 days   11 attempts

I'd favor a sufficiently justified (as it is now) force push, and then
an immediate release. As you say elsewhere, the problem with in
particular the albanas has been bad enough for it to be sufficiently
unpredictable when a normal push might happen (in fact I was
surprised by how quickly this happened on 4.7, as I had also
expressed in a reply to that flight's report).

For the release you'd need to tag qemu-trad and (as was asked by
Wei iirc on irc) mini-os, such that I could then push the version
update on the main tree.

Jan




Re: [Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-06-13 Thread Ian Jackson
Jan asked me to investigate why we weren't getting a push on Xen 4.8.
I investigated the failures in this osstest report.

I think we have a combination of:

 * Flaky armhf hardware

 * A real Xen-related heisenbug (but which is not a regression)
   (the EFAULT libxc/linux bug; CC to Juergen)

 * Mystery failures to make progress during local computation
   and I/O which look like Linux kernel bugs

 * Incompatibility between Xen 4.8 and osstest's approach to UEFI
   booting, now fixed.

 * A mystery libvirt heisenbug.  (Hence the CC to Jim.)

Jan: I would be inclined to force push this.  OTOH, if we wait,
eventually osstest will collect a set of flights which osstest's
archaeologist can see justifies a push.

Jim: please read down to where I discuss
test-amd64-amd64-libvirt-pair.  If you have any insight I'd appreciate
it.  Let me know if you want me to preserve the logs, which will
otherwise expire in a few weeks.

Juergen: this is just FYI.

HTH.


osstest service owner writes ("[xen-4.8-testing test] 124100: regressions - FAIL"):
> flight 124100 xen-4.8-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/124100/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-armhf-xsm  broken  in 124070
>  build-armhf-xsm  5 host-build-prep fail in 124070 REGR. vs. 123091
>  build-armhf   6 xen-build  fail in 123844 REGR. vs. 123091

I haven't looked, but I think these are the arndale bug.  osstest looked
at 124070 and 123844 because it was trying to determine that other
failures were heisenbugs.


>  build-i386-pvops  6 kernel-build   fail in 123844 REGR. vs. 123091

This is "git checkout 57a3ca7835962109d94533465a75e8c716b26845" taking
more than 4000 seconds (!) on albana1.

I have looked at the host logs for albana1 and there seem to be no
other build failures, except for libvirt ones (which is expected
because there is a race in the libvirt makefiles).

I looked at the logs for this particular failure.  osstest collected
ps output, which shows this:

14583 ?        D    00:00:03   0   3  0.6  1.7  1 balance_dirty_pages_ratel
 \_ git checkout 57a3ca7835962109d94533465a75e8c716b26845

There is nothing unexpected or interesting in any of the logfiles.
Note that this host was not running Xen.  The kernel was the default
Debian jessie i386 kernel.

I have no real explanation.  This seems like it must be a bug in the
kernel.
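For anyone chasing a similar hang: the `D` in the ps line above is uninterruptible sleep, and the last column appears to be the kernel wait channel (cut short by ps's column width). A rough way to list such tasks on Linux is to read `/proc/<pid>/stat` and `/proc/<pid>/wchan` directly. This is a hypothetical helper, not part of osstest; it only looks at whole processes, not individual threads under `/proc/<pid>/task`, and depending on kernel configuration `wchan` may be hidden or blank.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace."""
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for each process in uninterruptible sleep (D)."""
    if not os.path.isdir(proc):
        return  # not Linux, or /proc not mounted
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        base = os.path.join(proc, name)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(base, "wchan"), errors="replace") as f:
                wchan = f.read().strip()
        except OSError:
            continue  # the process exited meanwhile, or access was denied
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan)
```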


> Tests which are failing intermittently (not blocking):
>  test-amd64-amd64-xl-credit2   7 xen-boot fail in 123701 pass in 124100
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail in 123701 pass in 124100
>  test-amd64-amd64-livepatch    7 xen-boot fail in 123701 pass in 124100
>  test-amd64-amd64-pair   10 xen-boot/src_host fail in 123701 pass in 124100
>  test-amd64-amd64-pair   11 xen-boot/dst_host fail in 123701 pass in 124100
>  test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail in 123701 pass in 124100
>  test-amd64-i386-rumprun-i386  7 xen-boot fail in 123701 pass in 124100
>  test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail in 123701 pass in 124100
>  test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail in 123701 pass in 124100
>  test-amd64-i386-libvirt-xsm   7 xen-boot fail in 123701 pass in 124100
>  test-amd64-i386-migrupgrade 10 xen-boot/src_host fail in 123701 pass in 124100
>  test-amd64-i386-migrupgrade 11 xen-boot/dst_host fail in 123701 pass in 124100
>  test-amd64-amd64-xl-multivcpu  7 xen-boot fail in 123701 pass in 124100
>  test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail in 123701 pass in 124100
>  test-xtf-amd64-amd64-3  7 xen-boot fail in 123844 pass in 124100

I haven't looked at all of these, but I have looked at a few,
including the xtf test in 123844.  The jobs I looked at ran on one of
the albanas (the new uefi hosts).  These flights were after albana*
were put into service but before I taught osstest to avoid trying to
boot xen.gz from 4.9 and earlier on uefi hosts (by avoiding running
4.9 tests on those hosts at all).

So I think these failures are all understood and expected.  osstest is
fixed now, so they will not occur in new runs.  osstest is trying to
justify them as heisenbugs, by observing that they passed in 124100.

The wide range of affected tests means that osstest ends up looking
for a lot of other passes to justify these, and I think that is a big
part of the reason why the push is taking so long.
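The intermittent-failure lines in the report have a regular shape: job, step number, step name, then either `fail in A pass in B` (the step failed in flight A and passed in flight B) or `fail pass in B` (it failed in this flight and passed in B). The hypothetical parser below reads that shape and lists the other flights a report leans on, which is one way to see how many extra flights these justifications pull in. The format is inferred from the lines in this report rather than taken from osstest's code, and lines wrapped by a mailer must be rejoined first.

```python
import re
from typing import NamedTuple, Optional

# Format inferred from this report, e.g.
#   test-amd64-amd64-xl-credit2   7 xen-boot fail in 123701 pass in 124100
#   test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701
INTERMITTENT_RE = re.compile(
    r"^\s*(?P<job>\S+)\s+(?P<step>\d+)\s+(?P<testid>\S+)\s+"
    r"fail(?:\s+in\s+(?P<failed_in>\d+))?\s+pass\s+in\s+(?P<passed_in>\d+)\s*$"
)


class Intermittent(NamedTuple):
    job: str
    step: int
    testid: str
    failed_in: int  # flight in which the step failed
    passed_in: int  # flight in which the same step passed


def parse_intermittent(line: str, this_flight: int) -> Optional[Intermittent]:
    """Parse one 'failing intermittently' line; return None if it doesn't match.
    'fail pass in B' (no 'in A') means the failure was in this flight."""
    m = INTERMITTENT_RE.match(line)
    if m is None:
        return None
    failed_in = int(m["failed_in"]) if m["failed_in"] else this_flight
    return Intermittent(m["job"], int(m["step"]), m["testid"],
                        failed_in, int(m["passed_in"]))


def other_flights(lines, this_flight):
    """Sorted list of flights, other than this one, that the lines refer to."""
    flights = set()
    for line in lines:
        rec = parse_intermittent(line, this_flight)
        if rec is not None:
            flights.update({rec.failed_in, rec.passed_in})
    flights.discard(this_flight)
    return sorted(flights)
```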


>  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701

From the log:

2018-06-12 20:59:40 Z executing ssh ... root@172.16.144.61 virsh migrate --live debian.guest.osstest xen+ssh://joubertin0
error: Timed out during operation: cannot acquire state change lock
2018-06-12 21:00:16 Z command nonzero 

[Xen-devel] [xen-4.8-testing test] 124100: regressions - FAIL

2018-06-12 Thread osstest service owner
flight 124100 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/124100/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-xsm  broken  in 124070
 build-i386-pvops  6 kernel-build   fail in 123844 REGR. vs. 123091
 build-armhf   6 xen-build  fail in 123844 REGR. vs. 123091
 build-armhf-xsm  5 host-build-prep fail in 124070 REGR. vs. 123091

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit2   7 xen-boot fail in 123701 pass in 124100
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail in 123701 pass in 124100
 test-amd64-amd64-livepatch    7 xen-boot fail in 123701 pass in 124100
 test-amd64-amd64-pair   10 xen-boot/src_host fail in 123701 pass in 124100
 test-amd64-amd64-pair   11 xen-boot/dst_host fail in 123701 pass in 124100
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail in 123701 pass in 124100
 test-amd64-i386-rumprun-i386  7 xen-boot fail in 123701 pass in 124100
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail in 123701 pass in 124100
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail in 123701 pass in 124100
 test-amd64-i386-libvirt-xsm   7 xen-boot fail in 123701 pass in 124100
 test-amd64-i386-migrupgrade 10 xen-boot/src_host fail in 123701 pass in 124100
 test-amd64-i386-migrupgrade 11 xen-boot/dst_host fail in 123701 pass in 124100
 test-amd64-amd64-xl-multivcpu  7 xen-boot fail in 123701 pass in 124100
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail in 123701 pass in 124100
 test-xtf-amd64-amd64-3  7 xen-boot fail in 123844 pass in 124100
 test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 123701
 test-amd64-amd64-xl-rtds 10 debian-install fail pass in 123844
 test-amd64-i386-libvirt-pair 22 guest-migrate/src_host/dst_host fail pass in 124070
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 13 guest-saverestore fail pass in 124070

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked in 123844 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-xl-xsm        1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-xl-raw        1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl            1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked in 123844 n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-livepatch 1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64 1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1) blocked in 123844 n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1) blocked in 123844 n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked in 123844 n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl-qemuu-win10-i386  1 build-check(1)  blocked in 123844 n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked in 123844 n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked in 123844 n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)  blocked in 123844 n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-check(1)  blocked in 123844 n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked in 123844 n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-rumprun-i386  1 build-check(1)   blocked in 123844 n/a
 build-armhf-libvirt   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl-qemut-ws16-amd64  1 build-check(1)  blocked in 123844 n/a
 test-amd64-i386-pair  1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-migrupgrade   1 build-check(1)   blocked in 123844 n/a
 test-amd64-i386-xl-qemut-win10-i386  1 build-check(1)  blocked in 123844 n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 1 build-check(1) blocked in