Bug#627615: VM terminates when doing a live migration
25.05.2011 22:42, Daniel Bareiro wrote:
> I tested the new version of qemu-kvm (0.12.5+dfsg-5+squeeze2) available on Squeeze which was notified in the DSA 2241-1, but the problem still persists.

Sure it persists because DSA 2241-1 has absolutely nothing to do with the problem. It was an unrelated security update. You may guess that by the fact that this bug has not been closed by the update.

I told you where the fix is.

/mjt

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#627615: VM terminates when doing a live migration
On Thursday, 26 May 2011 12:22:36 +0400, Michael Tokarev wrote:
>> I tested the new version of qemu-kvm (0.12.5+dfsg-5+squeeze2) available on Squeeze which was notified in the DSA 2241-1, but the problem still persists.
>
> Sure it persists because DSA 2241-1 has absolutely nothing to do with the problem. It was an unrelated security update. You may guess that by the fact that this bug has not been closed by the update.

Yes, I know it has nothing to do with the problem. But as you told me that both problems will be fixed in the upcoming Squeeze version of qemu-kvm, I wanted to try this version to see if there was any difference. I found it odd that nobody has answered your report in #625571.

> I told you where the fix is.

Great! Thanks!

Regards,
Daniel
--
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
17:03:20 up 5 days, 3:22, 10 users, load average: 0.07, 0.04, 0.00
Bug#627615: VM terminates when doing a live migration
On Monday, 23 May 2011 15:45:20 +0400, Michael Tokarev wrote:
> This bug has 2 halves, one half is general 32bit migration issue (it does not actually work on 32bits, the fact it worked for you is pure luck), and second half is special case of 32bit userspace running on 64bit kernel (due to wrong kernel/user space communications).
>
> Both will be fixed in the upcoming squeeze version of qemu-kvm.

I tested the new version of qemu-kvm (0.12.5+dfsg-5+squeeze2) available on Squeeze which was notified in the DSA 2241-1, but the problem still persists.

Regards,
Daniel
--
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
15:34:52 up 4 days, 1:54, 9 users, load average: 0.07, 0.10, 0.04
Bug#627615: VM terminates when doing a live migration
On Monday, 23 May 2011 15:45:20 +0400, Michael Tokarev wrote:
>> This is the output without daemonizing:
>>
>> *** glibc detected *** kvm: free(): invalid next size (fast): 0x09fad3c0 ***
>> [...]
>>
>> Can we confirm that it is the same problem? If you need to do another test, please don't hesitate to ask me.
>
> Yes it's exactly this problem, you can check the other bug I mentioned - it shows this very memory corruption too. That's why I merged the two.

I have not even tested a migration between two 64-bit VMHosts. Is there any problem in that scenario?

>> As I noted earlier, trying to migrate from Defiant (Debian GNU/Linux 5.0.8 with Linux 2.6.32-15~bpo50+1 and qemu-kvm 0.12.5+dfsg-3~bpo50+2) to SS01, this problem does not occur. Both installations are 32-bit, but the kernel in SS01 is amd64 and the kernel in Defiant is i686. I.e. both are 32bit userspace with the difference that ss01 has a 64-bit kernel. Is that where the problem lies? The versions of Linux and qemu-kvm look the same.
>
> This bug has 2 halves, one half is general 32bit migration issue (it does not actually work on 32bits, the fact it worked for you is pure luck), and second half is special case of 32bit userspace running on 64bit kernel (due to wrong kernel/user space communications).
>
> Both will be fixed in the upcoming squeeze version of qemu-kvm.

From what I read in #625571, the problem was detected in 0.12.5+dfsg-5+squeeze1 (the same version as in ss01) and 0.12.0+dfsg-5 (Lenny backports?), but on the VMHost where the problem did not appear I am using 0.12.5+dfsg-3~bpo50+2 (Lenny backports). Perhaps 0.12.5+dfsg-3~bpo50+2 is not affected.

> You can try patching and rebuilding your qemu-kvm using patches in the package waiting upload. Unfortunately right now the anonscm.debian.org service (with http access to the git repository) does not work due to system maintenance.
>
> It's in git://git.debian.org/collab-maint/qemu-kvm.git, I extracted the patch into my site here:
>
> http://www.corpit.ru/mjt/tmp/fix-crash-in-migration-32-bit-51b0c6065a.diff
>
> This patch fixes both halves of the problem.

Perfect! Thanks!

Thanks for your reply.

Regards,
Daniel
--
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
17:51:22 up 3 days, 4:10, 10 users, load average: 0.00, 0.00, 0.00
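The "patch and rebuild" suggestion above could be carried out roughly as follows. This is only a sketch under stated assumptions, not the maintainer's procedure: the helper names run and rebuild_with_patch, the local file name migration-fix.diff, the -p1 strip level and the qemu-kvm-* source directory pattern are all guesses made for illustration. Only the patch URL comes from the thread. By default the script just prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: rebuild the Debian qemu-kvm package with the migration fix.
# Assumptions (not from the thread): deb-src entries are configured, build
# dependencies are installed, and the patch applies with -p1.

PATCH_URL='http://www.corpit.ru/mjt/tmp/fix-crash-in-migration-32-bit-51b0c6065a.diff'

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_with_patch() {
    # Fetch and unpack the source package into the current directory.
    run apt-get source qemu-kvm
    # Download the patch extracted by the maintainer.
    run wget -O migration-fix.diff "$PATCH_URL"
    # The directory pattern is a guess; adjust it to what apt-get source created.
    run sh -c 'cd qemu-kvm-*/ && patch -p1 < ../migration-fix.diff && dpkg-buildpackage -us -uc -b'
}

rebuild_with_patch
```

Setting DRY_RUN=0 before calling rebuild_with_patch would execute the steps instead of printing them; the resulting .deb files would then be installed with dpkg -i.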
Bug#627615: VM terminates when doing a live migration
On Monday, 23 May 2011 01:54:01 +0400, Michael Tokarev wrote:
> forcemerge 625571 627615
> thanks
>
> 22.05.2011 22:11, Daniel Bareiro wrote:
>> Package: qemu-kvm
>> Version: 0.12.5+dfsg-5+squeeze1
>> Severity: important
>>
>> model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
>>
>> -- System Information:
>> Debian Release: 6.0.1
>> APT prefers stable
>> APT policy: (500, 'stable')
>> Architecture: i386 (x86_64)
>>
>> Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
>>
>> ss01:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
>>     -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
>>     /mnt/systemrescuecd-x86-2.0.1.iso -monitor telnet:localhost:4055,server,nowait
>>
>> Destination:
>>
>> defiant:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 -net tap \
>>     -daemonize -vnc :1 -k es -localtime -cdrom /mnt/systemrescuecd-x86-2.0.1.iso -monitor \
>>     telnet:localhost:4041,server,nowait -incoming tcp:0:4455
>>
>> Migration:
>>
>> ss01:~# telnet localhost 4055
>> Trying ::1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 0.12.5 monitor - type 'help' for more information
>> (qemu) migrate -d tcp:10.1.0.65:4455
>> (qemu) Connection closed by foreign host.
>>
>> ss01:~# ps ax|grep systemrescuecd
>> 15640 pts/0    R+     0:00 grep systemrescuecd
>
> When debugging don't enable daemonizing, instead run it in foreground to see what messages, if any, it prints.
>
> But this is, with a very good chance, #625571 - migration fails on 32bit userspace always. That bug is finally fixed, after more than 2 years, and is pending upload once we sort out other, more important issues.
This is the output without daemonizing:

*** glibc detected *** kvm: free(): invalid next size (fast): 0x09fad3c0 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b281)[0xf723e281]
/lib/i686/cmov/libc.so.6(+0x6cad8)[0xf723fad8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xf7242bbd]
kvm[0x806f6e7]
kvm[0x806f7d3]
kvm[0x8051c85]
kvm[0x8051e1b]
kvm[0x810d3c7]
kvm[0x8104fe9]
kvm[0x8105e06]
kvm[0x80529b0]
kvm[0x806de64]
kvm[0x8055a95]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xf71e9c76]
kvm[0x804f3a1]
======= Memory map: ========
08048000-0823e000 r-xp 09:01 426838 /usr/bin/kvm
0823e000-0825 rw-p 001f6000 09:01 426838 /usr/bin/kvm
0825-0846 rw-p 00:00 0
09f6-09f89000 rw-p 00:00 0
09f89000-09f91000 rw-p 00:00 0
09f91000-0a081000 rw-p 00:00 0
0a081000-0a091000 rw-p 00:00 0
0a091000-0a16 rw-p 00:00 0
e440-e4421000 rw-p 00:00 0
e4421000-e450 ---p 00:00 0
e4586000-e45a3000 r-xp 09:01 16315 /lib/libgcc_s.so.1
e45a3000-e45a4000 rw-p 0001c000 09:01 16315 /lib/libgcc_s.so.1
e45a4000-e45a5000 ---p 00:00 0
e45a5000-e4da5000 rwxp 00:00 0
e4da5000-e4e06000 rw-p 00:00 0
e4e19000-e4f33000 rw-p 00:00 0
e4f33000-e5096000 r-xp 09:01 424347 /usr/lib/libdb-4.8.so
e5096000-e5099000 rw-p 00163000 09:01 424347 /usr/lib/libdb-4.8.so
e509e000-e50a2000 r-xp 09:01 440618 /usr/lib/sasl2/libsasldb.so.2.0.23
e50a2000-e50a3000 rw-p 4000 09:01 440618 /usr/lib/sasl2/libsasldb.so.2.0.23
e50a3000-e52a2000 rw-p 00:00 0
e52b5000-e52b6000 rw-p 00:00 0
e52b6000-e62b6000 rw-p 00:00 0
e62b6000-e62b8000 rw-p 00:00 0
e62b8000-e62d8000 rw-p 00:00 0
e62d8000-e62d9000 rw-p 00:00 0
e62fa000-e62fb000 rw-p 00:00 0
e62fb000-e631b000 rw-p 00:00 0
e631b000-e631d000 rw-p 00:00 0
e631d000-f631d000 rw-p 00:00 0
f631d000-f631e000 rw-p 00:00 0
f631e000-f631f000 ---p 00:00 0
f631f000-f6b1f000 rwxp 00:00 0
f6b1f000-f6b29000 r-xp 09:01 27073 /lib/i686/cmov/libnss_files-2.11.2.so
f6b29000-f6b2a000 r--p 9000 09:01 27073 /lib/i686/cmov/libnss_files-2.11.2.so
f6b2a000-f6b2b000 rw-p a000 09:01 27073 /lib/i686/cmov/libnss_files-2.11.2.so
f6b2b000-f6b2e000 rw-p 00:00 0
f6b2e000-f6b33000 r-xp 09:01 425645 /usr/lib/libogg.so.0.7.0
f6b33000-f6b34000 rw-p 4000 09:01 425645 /usr/lib/libogg.so.0.7.0
f6b34000-f6b5b000 r-xp 09:01 425649 /usr/lib/libvorbis.so.0.4.4
f6b5b000-f6b5c000 rw-p 00026000 09:01 425649 /usr/lib/libvorbis.so.0.4.4
f6b5c000-f6b5d000 rw-p 00:00 0
f6b5d000-f6cc2000 r-xp 09:01 425652 /usr/lib/libvorbisenc.so.2.0.7
f6cc2000-f6cd3000 rw-p 00165000 09:01 425652 /usr/lib/libvorbisenc.so.2.0.7
Bug#627615: VM terminates when doing a live migration
23.05.2011 15:30, Daniel Bareiro wrote:
> This is the output without daemonizing:
>
> *** glibc detected *** kvm: free(): invalid next size (fast): 0x09fad3c0 ***
> [...]
>
> Can we confirm that it is the same problem? If you need to do another test, please don't hesitate to ask me.

Yes it's exactly this problem, you can check the other bug I mentioned - it shows this very memory corruption too. That's why I merged the two.

> As I noted earlier, trying to migrate from Defiant (Debian GNU/Linux 5.0.8 with Linux 2.6.32-15~bpo50+1 and qemu-kvm 0.12.5+dfsg-3~bpo50+2) to SS01, this problem does not occur. Both installations are 32-bit, but the kernel in SS01 is amd64 and the kernel in Defiant is i686. I.e. both are 32bit userspace with the difference that ss01 has a 64-bit kernel. Is that where the problem lies? The versions of Linux and qemu-kvm look the same.

This bug has 2 halves, one half is general 32bit migration issue (it does not actually work on 32bits, the fact it worked for you is pure luck), and second half is special case of 32bit userspace running on 64bit kernel (due to wrong kernel/user space communications).

Both will be fixed in the upcoming squeeze version of qemu-kvm.

You can try patching and rebuilding your qemu-kvm using patches in the package waiting upload. Unfortunately right now the anonscm.debian.org service (with http access to the git repository) does not work due to system maintenance.

It's in git://git.debian.org/collab-maint/qemu-kvm.git, I extracted the patch into my site here:

http://www.corpit.ru/mjt/tmp/fix-crash-in-migration-32-bit-51b0c6065a.diff

This patch fixes both halves of the problem.

/mjt
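To see which of the two halves described above a given host falls under, the userspace word size can be compared with the kernel architecture. The following is a minimal sketch; the function names kernel_bits and migration_risk are made up for illustration, and the mapping of hosts to "halves" simply restates the description in this thread. On a live system the two inputs would come from getconf LONG_BIT (userspace word size) and uname -m (kernel machine type).

```shell
#!/bin/sh
# Sketch: classify a host by userspace word size and kernel architecture.

# kernel_bits MACHINE: print 32 or 64 for an x86 machine name, fail otherwise.
kernel_bits() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *) return 1 ;;
    esac
}

# migration_risk USER_BITS MACHINE: report which half of the bug, as
# described in the thread, would apply to such a host.
migration_risk() {
    kbits=$(kernel_bits "$2") || { echo "unknown kernel: $2"; return 1; }
    if [ "$1" = 32 ] && [ "$kbits" = 64 ]; then
        echo "32-bit userspace on 64-bit kernel: both halves apply"
    elif [ "$1" = 32 ]; then
        echo "32-bit userspace on 32-bit kernel: general 32-bit migration issue applies"
    else
        echo "$1-bit userspace: not covered by either half described here"
    fi
}

# The two hosts from this report:
migration_risk 32 x86_64   # ss01: i386 userspace, amd64 kernel
migration_risk 32 i686     # defiant: i386 userspace, i686 kernel

# On a live host one could run:
#   migration_risk "$(getconf LONG_BIT)" "$(uname -m)"
```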
Bug#627615: VM terminates when doing a live migration
Package: qemu-kvm
Version: 0.12.5+dfsg-5+squeeze1
Severity: important

-- Package-specific info:

/proc/cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping        : 2
cpu MHz         : 2009.081
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips        : 4018.16
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping        : 2
cpu MHz         : 2009.081
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips        : 4018.54
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

-- System Information:
Debian Release: 6.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=es_AR.UTF-8, LC_CTYPE=es_AR.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to es_AR.UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages qemu-kvm depends on:
ii  adduser           3.112+nmu2          add and remove users and groups
ii  bridge-utils      1.4-5               Utilities for configuring the Linu
ii  iproute           20100519-3          networking and traffic control too
ii  libaio1           0.3.107-7           Linux kernel AIO access library -
ii  libasound2        1.0.23-2.1          shared library for ALSA applicatio
ii  libbluetooth3     4.66-3              Library to use the BlueZ Linux Blu
ii  libbrlapi0.5      4.2-7               braille display access via BRLTTY
ii  libc6             2.11.2-10           Embedded GNU C Library: Shared lib
ii  libcurl3-gnutls   7.21.0-1            Multi-protocol file transfer libra
ii  libgnutls26       2.8.6-1             the GNU TLS library - runtime libr
ii  libncurses5       5.7+20100313-5      shared libraries for terminal hand
ii  libpci3           1:3.1.7-6           Linux PCI Utilities (shared librar
ii  libpulse0         0.9.21-3+squeeze1   PulseAudio client libraries
ii  libsasl2-2        2.1.23.dfsg1-7      Cyrus SASL - authentication abstra
ii  libsdl1.2debian   1.2.14-6.1          Simple DirectMedia Layer
ii  libuuid1          2.17.2-9            Universally Unique ID library
ii  libvdeplug2       2.2.3-3             Virtual Distributed Ethernet - Plu
ii  libx11-6          2:1.3.3-4           X11 client-side library
ii  python            2.6.6-3+squeeze6    interactive high-level object-orie
ii  zlib1g            1:1.2.3.4.dfsg-3    compression library - runtime

Versions of packages qemu-kvm recommends:
ii  linux-image-2.6.32-5-686 [lin   2.6.32-31   Linux 2.6.32 for modern PCs
ii  linux-image-2.6.32-5-amd64 [l   2.6.32-31   Linux 2.6.32 for 64-bit PCs

Versions of packages qemu-kvm suggests:
pn  debootstrap   <none>   (no description available)
pn  samba         <none>   (no description available)
pn  vde2          <none>   (no description available)

-- Configuration Files:
/etc/kvm/kvm-ifup changed:
switch=$(ip route ls | \
    awk '/^default / { for(i=0;i<NF;i++) { if ($i == "dev") { print $(i+1); exit; } } }'
    )
/sbin/ifconfig "$1" 0.0.0.0 up
if [ -n "$switch" -a -d /sys/class/net/$switch/bridge/. ]; then
  /usr/sbin/brctl addif $switch $1 || :
fi

-- no debconf information

First Test:
===========

Starting the VM in the source VMHost and starting the VM on the destination VMHost with the exact same parameters as the VM on the source, in migration-listen mode:

Source:

ss01:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
    -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
    /mnt/systemrescuecd-x86-2.0.1.iso -monitor telnet:localhost:4055,server,nowait

Destination:

defiant:~# kvm -m 256
Bug#627615: VM terminates when doing a live migration
forcemerge 625571 627615
thanks

22.05.2011 22:11, Daniel Bareiro wrote:
> Package: qemu-kvm
> Version: 0.12.5+dfsg-5+squeeze1
> Severity: important
>
> model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
>
> -- System Information:
> Debian Release: 6.0.1
> APT prefers stable
> APT policy: (500, 'stable')
> Architecture: i386 (x86_64)
>
> Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
>
> ss01:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 \
>     -net tap -daemonize -vnc :15 -k es -localtime -cdrom \
>     /mnt/systemrescuecd-x86-2.0.1.iso -monitor telnet:localhost:4055,server,nowait
>
> Destination:
>
> defiant:~# kvm -m 256 -boot d -net nic,vlan=0,macaddr=52:54:67:92:9d:63 -net tap \
>     -daemonize -vnc :1 -k es -localtime -cdrom /mnt/systemrescuecd-x86-2.0.1.iso -monitor \
>     telnet:localhost:4041,server,nowait -incoming tcp:0:4455
>
> Migration:
>
> ss01:~# telnet localhost 4055
> Trying ::1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 0.12.5 monitor - type 'help' for more information
> (qemu) migrate -d tcp:10.1.0.65:4455
> (qemu) Connection closed by foreign host.
>
> ss01:~# ps ax|grep systemrescuecd
> 15640 pts/0    R+     0:00 grep systemrescuecd

When debugging don't enable daemonizing, instead run it in foreground to see what messages, if any, it prints.

But this is, with a very good chance, #625571 - migration fails on 32bit userspace always. That bug is finally fixed, after more than 2 years, and is pending upload once we sort out other, more important issues.

> ss01:~# telnet localhost 4055
> Trying ::1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 0.12.5 monitor - type 'help' for more information
> (qemu) stop
> (qemu) migrate_set_speed 4095m
> (qemu) migrate "exec:gzip -c > STATEFILE.gz"
> Connection closed by foreign host.
>
> ss01:~# ps ax|grep systemrescuecd
> 26564 pts/0    S+     0:00 grep systemrescuecd

Again, it is a very bad idea to run it in the background when debugging. It's impossible to tell exactly what it is doing that way.

/mjt
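The advice above (run kvm in the foreground while debugging) amounts to starting the same command line without -daemonize. A small wrapper can do that without retyping the command. This is only an illustrative sketch: the function name without_daemonize is hypothetical, and the demo uses echo in place of a real kvm invocation so that nothing is started.

```shell
#!/bin/sh
# Sketch: run a command with every -daemonize argument removed, so that
# it stays in the foreground and its messages reach the terminal.

without_daemonize() {
    n=$#
    # Append each argument we want to keep after the original ones ...
    for arg in "$@"; do
        if [ "$arg" != "-daemonize" ]; then
            set -- "$@" "$arg"
        fi
    done
    # ... then drop the original n arguments, leaving only the kept copies.
    shift "$n"
    "$@"
}

# Harmless demo: echo stands in for actually launching kvm.
without_daemonize echo kvm -m 256 -net tap -daemonize -vnc :15
```

With a real guest one would put the original kvm command line after without_daemonize and watch the terminal for output such as the glibc message reported in this bug.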