Public bug reported:
Hi,
I have run into an issue rather often (though it is not 100% reproducible) that I
want to document, and I'd like to ask whether it is a known issue.
It happens on a migration with options like:
$ virsh migrate --live --postcopy --postcopy-after-precopy \
    kvmguest-zesty-postcopy qemu+ssh://10.93.175.192/system
Note: all other migration types we test work.
Even postcopy works fine without background workload.
Only the combination of postcopy-after-precopy plus background workload seems
to make it fail.
FYI - the background load (BG-Load) in each 4-vCPU guest is:
- nohup stress-ng -m 1 --vm-keep --vm-bytes 256M 1>/dev/null 2>&1 &
- nohup md5sum /dev/urandom 1>/dev/null 2>&1 &
- nohup bash -c "while /bin/true; do dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=100; done" 1>/dev/null 2>&1
That load runs in 3 such guests on an 8-CPU host, so the load alone keeps more
than the host's 8 CPUs busy.
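A rough estimate of that overcommit, as a small shell sketch. It assumes each of the three load commands keeps about one vCPU busy; that figure is an assumption, not a measurement, and the variable names are illustrative only.

```shell
#!/bin/sh
# Rough estimate only: assumes each of the three background-load commands
# (stress-ng -m 1, md5sum, the dd loop) keeps about one vCPU busy.
guests=3                 # guests running the background load
vcpus_per_guest=4        # each guest has 4 vCPUs
busy_tasks_per_guest=3   # assumption: one busy vCPU per load command
host_cpus=8              # s390x host CPUs

total_vcpus=$(( guests * vcpus_per_guest ))
busy_tasks=$(( guests * busy_tasks_per_guest ))

echo "vCPUs defined: ${total_vcpus} on ${host_cpus} host CPUs"
echo "CPU-bound tasks from the load: ${busy_tasks}"
if [ "$busy_tasks" -gt "$host_cpus" ]; then
    echo "the background load alone overcommits the host CPUs"
fi
```

Under that assumption, the 3 guests define 12 vCPUs and produce about 9 CPU-bound tasks, which already exceeds the 8 host CPUs before any builder load on the machine is added.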
Migration is reported as successful on the initiator (source), but the guest is
paused on the target:
State: paused
The error I get from QEMU is a failed kvm run, like this:
$ cat /var/log/libvirt/qemu/kvmguest-zesty-postcopy.log
[...]
error: kvm run failed Bad address
PSW=mask 0404d00180000000 addr 0000000000831996 cc 00
R00=0000000021109f7a R01=000000002b218d03 R02=00000000f8e7ec87 R03=0000000053e86c4c
R04=000000005a446a8d R05=0000000099f29f74 R06=0000000037fee6fa R07=000000009640eb9d
R08=00000000ac47c987 R09=0000000089a8182d R10=070000004656507d R11=00000000b1edcf28
R12=00000000d915d7c0 R13=00000000008a5eb0 R14=000000000060f146 R15=000000001da3bc58
F00=000003ffc0f7eb58 F01=000002aa112cc260 F02=000002aa10c88b40 F03=0000000000008000
F04=0000000000008000 F05=000003ffc0f7eeb0 F06=000002aa112cc030 F07=000003ffc0f7ebfc
F08=000002aa10c8d100 F09=000003ffa9b92200 F10=0000000021deb968 F11=000002aa3f7a9820
F12=0000000021dea7c8 F13=000003ffcdcfeaa8 F14=000003ffefc7f390 F15=000003ffc0f7eea8
V00=000003ffc0f7eb580000000000000000 V01=000002aa112cc2600000000000000000
V02=000002aa10c88b400000000000000000 V03=00000000000080000000000000000000
V04=00000000000080000000000000000000 V05=000003ffc0f7eeb00000000000000000
V06=000002aa112cc0300000000000000000 V07=000003ffc0f7ebfc0000000000000000
V08=000002aa10c8d1000000000000000000 V09=000003ffa9b922000000000000000000
V10=0000000021deb9680000000000000000 V11=000002aa3f7a98200000000000000000
V12=0000000021dea7c80000000000000000 V13=000003ffcdcfeaa80000000000000000
V14=000003ffefc7f3900000000000000000 V15=000003ffc0f7eea80000000000000000
V16=00000000000000050000000000000000 V17=00000000000000060000000000000000
V18=40404040404040404040404040404040 V19=00000000000000050000000000000000
V20=0f0e0d0c0b0a09080706050403020100 V21=ffffffff00ffff000000000000000000
V22=0000ff00000000000000000000000000 V23=00000000000000000000000000000000
V24=00000000000000000000000000000000 V25=00000000000000000000000000000000
V26=00000000000000000000000000000000 V27=00000000000000000000000000000000
V28=00000000000000000000000000000000 V29=00000000000000000000000000000000
V30=000002aa0ba5bc300000000000000000 V31=00000000010e14190000000000000001
C00=0080000014866a10 C01=000000001d3d41c7 C02=0000000000011140 C03=0000000000000000
C04=0000000000000a74 C05=0000000000000400 C06=0000000010000000 C07=000000001d3d41c7
C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
C12=0000000000000000 C13=0000000000d6c007 C14=00000000db000000 C15=0000000000011280
FYI: our machine is generally very slow, especially on I/O, but also on CPU
when the builders are busy. The same test ran fine a few days ago, so the
failure seems to depend on the overall machine load adding to the background
load of the migration test, which together is enough to break it on s390x.
Note: it is also a very unfair comparison: we have 8 cores on s390x, while on
x86 and ppc we have far more.
I haven't caught it "live" so far to debug it any further; I only noticed it
in automated testing, where it occurs at least once every other week.
Affected releases seem to be Yakkety (libvirt 2.1 / qemu 2.6.1) and Zesty
(libvirt 2.5 / qemu 2.8).
As soon as our Artful stack is complete, I'll add results for it as well.
For now, a check against known issues would be nice.
** Affects: ubuntu-z-systems
Importance: Undecided
Assignee: bugproxy (bugproxy)
Status: New
** Affects: qemu (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1704829
Title:
crash on s390 in kvm run due to background load on postcopy
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1704829/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs