> On Thursday 30 January 2014 13:23:04 Neil Skrypuch wrote:
>> First, let me briefly outline the way we use live migration, as it is
>> probably not typical. We use live migration (with block migration) to make
>> backups of VMs with zero downtime. The basic process goes like this:
>>
>> 1) migrate src VM -> dest VM
>> 2) migration completes
>> 3) cont src VM
>> 4) gracefully shut down dest VM
>> 5) dest VM's disk image is now a valid backup
>>
>> In general, this works very well.
>>
>> Up until now we have been using qemu-kvm 1.1.2 and have not had any issues
>> with the above process. I am now attempting to upgrade us to a newer version
>> of qemu, but all newer versions I've tried occasionally result in the
>> virtio-net device ceasing to function on the src VM after step 3.
>>
>> I am able to reproduce this reliably (given enough iterations); it happens
>> in roughly 2% of all migrations.
>>
>> Here is the complete qemu command line for the src VM:
>>
>> /usr/bin/qemu-system-x86_64 -machine accel=kvm -drive
>> file=/var/lib/kvm/testbackup.polldev.com.img,if=virtio -m 2048 -smp
>> 4,cores=4,sockets=1,threads=1 -net
>> nic,macaddr=52:54:98:00:00:00,model=virtio -net
>> tap,script=/etc/qemu-ifup-br2,downscript=no -curses -name
>> "testbackup.polldev.com",process=testbackup.polldev.com -monitor
>> unix:/var/lib/kvm/monitor/testbackup,server,nowait
>>
>> The dest VM:
>>
>> /usr/bin/qemu-system-x86_64 -machine accel=kvm -drive
>> file=/backup/testbackup.polldev.com.img.bak20140129,if=virtio -m 2048 -smp
>> 4,cores=4,sockets=1,threads=1 -net
>> nic,macaddr=52:54:98:00:00:00,model=virtio -net tap,script=no,downscript=no
>> -curses -name "testbackup.polldev.com",process=testbackup.polldev.com
>> -monitor unix:/var/lib/kvm/monitor/testbackup.bak,server,nowait -incoming
>> tcp:0:4444
>>
>> The migration is performed like so:
>>
>> echo "migrate -b tcp:localhost:4444" | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup
>> echo "migrate_set_speed 1G" | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup
>> #wait
>> echo cont | socat STDIO UNIX-CONNECT:/var/lib/kvm/monitor/testbackup
>>
>> The guest in question is a minimal install of CentOS 6.5.
>>
>> I have observed this issue across the following qemu versions:
>>
>> qemu 1.4.2
>> qemu 1.6.0
>> qemu 1.6.1
>> qemu 1.7.0
>>
>> I also attempted to test qemu 1.5.3, but live migration flat out crashed
>> there (a totally different issue).
>>
>> I have also tested a number of other scenarios with qemu 1.6.0, all of which
>> exhibit the same failure mode:
>>
>> qemu 1.6.0 + host kernel 3.1.0
>> qemu 1.6.0 + host kernel 3.10.7
>> qemu 1.6.0 + host kernel 3.10.17
>> qemu 1.6.0 + virtio with -netdev/-device syntax
>> qemu 1.6.0 + accel=tcg
>>
>> The one case I have found that works properly is the following:
>>
>> qemu 1.6.0 + e1000
>>
>> It is worth noting that when the virtio-net device ceases to function in the
>> guest, removing and reinserting the virtio-net kernel module gets the
>> device working again (except in 1.4.2, where this had no effect).
>>
>> As mentioned above, I can reproduce this with minimal effort, and am willing
>> to test out any patches or provide further details as necessary.
>>
>> - Neil
>
> Ok, I was able to narrow this down to somewhere between 1.2.2 (or rather,
> 1.2.0) and 1.3.0. Migration in 1.3.0 is broken; however, I was able to
> cherry-pick d7cd369, d5f1f28, and 9ee0cb2 on top of 1.3.0 to fix the
> unrelated migration bug and confirm that the bug from this thread is still
> present in 1.3.0.
>
> I started a git bisect on 1.2.2..1.3.0 but didn't get very far before running
> into several unrelated bugs that kept migration from working.
>
> I also tested out the latest master code (d844a7b) and it fails in the same
> way as 1.7.0.
>
> - Neil
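The backup procedure in steps 1) to 5) can be scripted against the same HMP monitor sockets, which also makes it easier to loop the roughly 2% reproduction. This is a minimal, hypothetical sketch, not the reporter's actual tooling. The helper names (`hmp`, `migration_status`, `backup_vm`) and the `SRC_MON`/`DST_MON` variables are illustrative. It assumes that the destination QEMU is already listening with `-incoming tcp:0:4444`, that `info migrate` prints a `Migration status:` line, and that the guest reacts to the ACPI power-button event sent by `system_powerdown`. Unlike the reported sequence, it sets the speed before starting the migration and passes `-d` so the `migrate` command returns without waiting.

```shell
#!/bin/sh
# Hypothetical sketch of the migration-based backup (steps 1-5).
# Monitor socket paths are taken from the report; override via env vars.
SRC_MON=${SRC_MON:-/var/lib/kvm/monitor/testbackup}
DST_MON=${DST_MON:-/var/lib/kvm/monitor/testbackup.bak}

# Send one HMP command to the monitor socket in $1 and print the reply.
hmp() {
    echo "$2" | socat STDIO "UNIX-CONNECT:$1"
}

# Read "info migrate" output on stdin and print the status word
# (e.g. "active", "completed", "failed"); prints nothing if absent.
migration_status() {
    grep -o 'Migration status: [a-z-]*' | head -n 1 | cut -d ' ' -f 3
}

backup_vm() {
    # Step 1: start block migration; -d returns without waiting.
    hmp "$SRC_MON" "migrate_set_speed 1G" >/dev/null
    hmp "$SRC_MON" "migrate -d -b tcp:localhost:4444" >/dev/null

    # Step 2: poll until the source reports a final state.
    while :; do
        state=$(hmp "$SRC_MON" "info migrate" | migration_status)
        case "$state" in
            completed) break ;;
            failed|cancelled)
                echo "migration $state" >&2
                return 1 ;;
        esac
        sleep 1
    done

    # Step 3: resume the source VM (the step after which virtio-net
    # was reported to stop working in roughly 2% of runs).
    hmp "$SRC_MON" "cont" >/dev/null

    # Step 4: ask the destination guest to shut down via ACPI.
    # Step 5 (using the image) must wait for the destination QEMU to exit.
    hmp "$DST_MON" "system_powerdown" >/dev/null
}

# Only act when called with "run", so the helpers can be sourced.
if [ "${1:-}" = "run" ]; then
    backup_vm
fi
```

Run it as `sh backup.sh run` (the script name is hypothetical). The script only requests the shutdown; the `.bak` image should be treated as a backup only after the destination QEMU process has exited. The polling loop keeps retrying on any other status, including an empty reply if a socket is unreachable, so a timeout may be worth adding.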
Hi, have you tried pinging another host from the VM after migration?