Public bug reported:

Hi,
I have run rather often (though not 100% reproducibly) into an issue that I 
want to document, and to ask whether it is a known issue.

It happens on a migration with options like:
$ virsh migrate --live --postcopy --postcopy-after-precopy kvmguest-zesty-postcopy qemu+ssh://10.93.175.192/system


Note: All other migration types we test are working.
Even postcopy works without a background workload.
Only the combination of postcopy-after-precopy + background workload seems to 
make it fail.

FYI: the background load runs on a 4-vCPU guest:
- nohup stress-ng -m 1 --vm-keep --vm-bytes 256M 1>/dev/null 2>&1 &
- nohup md5sum /dev/urandom 1>/dev/null 2>&1 &
- nohup bash -c "while /bin/true; do dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=100; done" 1>/dev/null 2>&1
That load runs on 3 of those guests on an 8-CPU host,
so the load keeps more than the 8 CPUs we have busy.

The migration is reported as successful on the initiator, but the guest ends 
up paused on the target:
State:          paused
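
For anyone trying to reproduce this, the mismatch between initiator and target can be checked right after the migration returns. This is only a minimal sketch: the helper names `is_running` and `check_target_state` are made up for illustration, and it assumes the guest name and target URI from the command above and that `virsh domstate` prints the state as a single word.

```shell
#!/bin/sh
# Guest name and target URI taken from the migrate command in this report.
GUEST=kvmguest-zesty-postcopy
TARGET=qemu+ssh://10.93.175.192/system

# Hypothetical helper: succeeds only if the given state string is "running".
is_running() {
    [ "$1" = "running" ]
}

# Hypothetical helper: query the domain state on the target and report
# whether it matches what a successful migration should leave behind.
check_target_state() {
    state=$(virsh -c "$TARGET" domstate "$GUEST") || return 2
    if is_running "$state"; then
        echo "target state OK: $state"
    else
        echo "target state unexpected: $state"
        return 1
    fi
}
```

Calling `check_target_state` after `virsh migrate` has returned would, in the failing case described here, be expected to report the unexpected `paused` state.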

The error I get from qemu is a failing kvm run, like:
cat /var/log/libvirt/qemu/kvmguest-zesty-postcopy.log
[...]
error: kvm run failed Bad address
PSW=mask 0404d00180000000 addr 0000000000831996 cc 00
R00=0000000021109f7a R01=000000002b218d03 R02=00000000f8e7ec87 R03=0000000053e86c4c
R04=000000005a446a8d R05=0000000099f29f74 R06=0000000037fee6fa R07=000000009640eb9d
R08=00000000ac47c987 R09=0000000089a8182d R10=070000004656507d R11=00000000b1edcf28
R12=00000000d915d7c0 R13=00000000008a5eb0 R14=000000000060f146 R15=000000001da3bc58
F00=000003ffc0f7eb58 F01=000002aa112cc260 F02=000002aa10c88b40 F03=0000000000008000
F04=0000000000008000 F05=000003ffc0f7eeb0 F06=000002aa112cc030 F07=000003ffc0f7ebfc
F08=000002aa10c8d100 F09=000003ffa9b92200 F10=0000000021deb968 F11=000002aa3f7a9820
F12=0000000021dea7c8 F13=000003ffcdcfeaa8 F14=000003ffefc7f390 F15=000003ffc0f7eea8
V00=000003ffc0f7eb580000000000000000 V01=000002aa112cc2600000000000000000
V02=000002aa10c88b400000000000000000 V03=00000000000080000000000000000000
V04=00000000000080000000000000000000 V05=000003ffc0f7eeb00000000000000000
V06=000002aa112cc0300000000000000000 V07=000003ffc0f7ebfc0000000000000000
V08=000002aa10c8d1000000000000000000 V09=000003ffa9b922000000000000000000
V10=0000000021deb9680000000000000000 V11=000002aa3f7a98200000000000000000
V12=0000000021dea7c80000000000000000 V13=000003ffcdcfeaa80000000000000000
V14=000003ffefc7f3900000000000000000 V15=000003ffc0f7eea80000000000000000
V16=00000000000000050000000000000000 V17=00000000000000060000000000000000
V18=40404040404040404040404040404040 V19=00000000000000050000000000000000
V20=0f0e0d0c0b0a09080706050403020100 V21=ffffffff00ffff000000000000000000
V22=0000ff00000000000000000000000000 V23=00000000000000000000000000000000
V24=00000000000000000000000000000000 V25=00000000000000000000000000000000
V26=00000000000000000000000000000000 V27=00000000000000000000000000000000
V28=00000000000000000000000000000000 V29=00000000000000000000000000000000
V30=000002aa0ba5bc300000000000000000 V31=00000000010e14190000000000000001
C00=0080000014866a10 C01=000000001d3d41c7 C02=0000000000011140 C03=0000000000000000
C04=0000000000000a74 C05=0000000000000400 C06=0000000010000000 C07=000000001d3d41c7
C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
C12=0000000000000000 C13=0000000000d6c007 C14=00000000db000000 C15=0000000000011280


FYI: Our machine is generally very slow, especially on I/O, but also on CPU 
when the builders are busy. The same test ran fine a few days ago, so the 
failure seems to depend on the overall machine load adding to the background 
load of the migration test, which together is enough to break it on s390x.
Note: It is also a very unfair comparison, we have 8 cores on s390x, while on 
x86 and ppc we have way more.

I haven't caught it "live" so far to debug it any further. Only in
automated testing did I notice that it occurs at least once every
other week.

Affected releases seem to be Yakkety (libvirt 2.1 / qemu 2.6.1) and Zesty 
(libvirt 2.5 / qemu 2.8).
As soon as our Artful stack is fully done I'll add results for it as well.

For now, a check against known issues would be nice.

** Affects: ubuntu-z-systems
     Importance: Undecided
     Assignee: bugproxy (bugproxy)
         Status: New

** Affects: qemu (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1704829

Title:
  crash on s390 in kvm run due to background load on postcopy