[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
If I've got comment 27 right, the issue has also been fixed upstream, so I'm setting the status now to "Fix released". If there's still something left to do here, feel free to change it again. ** Changed in: qemu Status: New => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: Fix Released Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Fix Released Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.25 --- qemu (2.0.0+dfsg-2ubuntu1.25) trusty; urgency=medium [Kai Storbeck] * backport patch to fix guest hangs after live migration (LP: #1297218) -- Serge HallynFri, 01 Jul 2016 14:25:20 -0500 ** Changed in: qemu (Ubuntu Trusty) Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Fix Released Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Awesome- thanks for verifying ** Tags removed: verification-needed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Fix Committed Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Hello Chris, Steps taken to test the proposed package: 1) enabled trusty-proposed 2) installed qemu-system-arm qemu-system-common qemu-system-misc qemu-system-x86 qemu-user version 2.0.0+dfsg-2ubuntu1.25 3) again on a second trusty14.04 server 4) migrate 41 running VM's (uptimes vary between 1 and 142 days) between 2 upgraded hosts, several times. Result: I haven't observed any problems after 168 live migrations. Hypervisor OS: ubuntu 14.04 (trusty): ceph: 0.94.7-1trusty qemu: 2.0.0+dfsg-2ubuntu1.25 linux: 3.13.0.92.99 libvirt: 1.2.2-0ubuntu13.1.17 VMs are a mix of debian-jessie and debian-wheezy. ** Tags added: verification-done -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Fix Committed Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Hello Paul, or anyone else affected, Accepted qemu into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg- 2ubuntu1.25 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! ** Changed in: qemu (Ubuntu Trusty) Status: Confirmed => Fix Committed ** Tags added: verification-needed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Fix Committed Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
No, I'm afraid not. But if you can test when this package is accepted into trusty-proposed that'll be great. ** Description changed: + = + SRU Justification: + 1. Impact: guests hang after live migration with 100% cpu + 2. Upstream fix: a set of four patches fix this upstream + 3. Stable fix: we have a backport of the four patches into a single patch. + 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) + 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. + = + We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). - The interconnect between both machines (both for migration and gluster) is 10GbE. + The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. - The XML definition of the guests only contains: + The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. - --- + --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: - + _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: = SRU Justification: 1. Impact: guests hang after live migration with 100% cpu 2. Upstream fix: a set of four patches fix this upstream 3. Stable fix: we have a backport of the four patches into a single patch. 4. Test case: try a set of migrations of different VMS (it is unfortunately not 100% reproducible) 5. Regression potential: the patch is not trivial, however the lp:qa-regression-tests testsuite passed 100% with this package. = We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
We have four identical Ubuntu servers running libvirt/kvm/qemu, with access to CEPH rbd filesystems. Guests can be live migrated between them. However, live migration leads in ~30% of the cases to guests being stuck at 100%. Only a few times we had the patience to wait, and upon logging in our passwords were said to be expired; thus we couldn't log in. This happened mostly to long running VM's, but not all of them; this makes debugging hard. These servers run ubuntu 14.04 (trusty): ceph:0.94.7-1trusty qemu:2.0.0+dfsg-2ubuntu1.22 linux: 3.13.0.88.94 libvirt: 1.2.2-0ubuntu13.1.16 Is this what you request? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Conflicting experimental packages in that ppa, trying ubuntu-virt/ppa instead. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Thank you. I'm doing a test build in ppa:serge-hallyn/virt, and will run a full regression test from there. I'll push for SRU if that passes. Would you mind putting in the bug Description (at top) a concise summary of the test case, for the SRU process? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
I can reasonably assume that this solved my problem. I've live migrated 41 VM's 5 times between 2 hypervisors without the 100% cpu problem appearing. My production servers run 2.0.0+dfsg-2ubuntu1.22, and still observe the same problem. Attached is the patch that I created with quilt in debian/patches; This one mirrors the 4 patches that are listed in the debian bugreport[1]. The patch should apply cleanly with qemu 2.0.0+dfsg-2ubuntu1.24 (from trusty-security). Let's hope others can benefit from an ubuntu update :) [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789 ** Patch added: "backport-fixtime.patch" https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1297218/+attachment/469/+files/backport-fixtime.patch -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
That patch does not apply cleanly on qemu (2.0.0+dfsg, from trusty). There are changes in "kvmclock_pre_save" and "kvmclock_post_save", there's only "kvmclock_vm_state_change" in 2.0.0. Peeking at the 4 referenced patches on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789 the code changes are "about the same", so I'll concatenate those 4 and build that. Thanks, -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
See https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1297218/+attachment/4301780/+files/backport.patch referenced in comment #29 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
@serge, I'd be happy to test each of the patches, but considering the length of this page I'd like an exact link to a patch and/or patches that need to be tested, and against which version (trusty-security i suppose?). -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
@Steve, it seems to me those are the same as the 'backport.patch' from an earlier comment? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
In the absence of any progress on this, and suddenly remembering that I'm occasionally affected by this, I did some digging and found this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789 Looks like those four patches might be the solution? ** Bug watch added: Debian Bug tracker #786789 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
But unfortunately we do not know which patch fixed it, making an SRU much more problematic. Someone who is able to reproduce the bug would need to try to either bisect, or make educated guesses and test patch cherrypicks. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Thanks - marked fixed released for development release. We can SRU this into trusty if we know exactly which patch actualy fixed it. ** Changed in: qemu (Ubuntu) Status: Confirmed => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Fix Released Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Thanks for looking into this. We started experimenting with live migration on 14.04 and stumbled over this bug. As a workaround we've installed qemu from the Ubuntu Cloud archive (https://wiki.ubuntu.com/ServerTeam/CloudArchive). I can confirm this bug is fixed in qemu-kvm:amd64/trusty-updates 1:2.2+dfsg-5expubuntu9.3~cloud0 An SRU for 14.04 would be nice. Thanks in advance. ** Changed in: qemu (Ubuntu) Status: Incomplete => Confirmed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Confirmed Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Could someone confirm whether this is fixed in 15.04 and/or 15.10? ** Changed in: qemu (Ubuntu) Status: Triaged = Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Incomplete Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
I'm sorry, but I'm not clear at this point on the status of this bug. I never received an answer to comments #32 and comment #35, and don't know what, if anything, to apply in an SRU. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
I believe this should be fixed in 15.04, as the cited patches are present. Could someone confirm? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Status changed to 'Confirmed' because the bug affects multiple users. ** Changed in: glusterfs (Ubuntu Trusty) Status: New = Confirmed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Ping. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Hi Serge, Yes, that's the case. Let me also make it clear that this is a backport on top of qemu 1.2 stable. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Hi, just to be clear, the backport.patch which you uploaded actually increased jitter for you, making the situation worse, right? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Hi, I've seen some strange time behavior in some of our VMs usually triggered by live migration. In some VMs we have seen some significant time drift which NTP was not able to correct after doing a live migration. I've not been able so far to reproduce the same case, however, I did notice that live migration does introduce some increase in clock jitter values, and I am not sure if that might have anything to do with any significant time drift. Here is an example of a CentOS 6 guest running under qemu 1.2 before doing a live migration: [root@centos ~]# ntpq -pcrv remote refid st t when poll reach delay offset jitter == +helium.constant 18.26.4.105 2 u 65 64 377 60.539 -0.011 0.554 -209.118.204.201 128.9.176.30 2 u 47 64 377 15.750 -1.835 0.388 *time3.chpc.utah 198.60.22.2402 u 46 64 377 30.5853.934 0.253 +dns2.untangle.c 216.218.254.202 2 u 21 64 377 22.1962.345 0.740 associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync, version=ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1), processor=x86_64, system=Linux/2.6.32-504.3.3.el6.x86_64, leap=00, stratum=3, precision=-21, rootdelay=32.355, rootdisp=53.173, refid=155.101.3.115, reftime=d86264f3.444c75e7 Thu, Jan 15 2015 16:10:27.266, clock=d86265ed.10a34c1c Thu, Jan 15 2015 16:14:37.064, peer=3418, tc=6, mintc=3, offset=0.000, frequency=2.863, sys_jitter=2.024, clk_jitter=2.283, clk_wander=0.000 [root@centos ~]# ntpdc -c kerninfo pll offset: 0 s pll frequency:2.863 ppm maximum error:0.19838 s estimated error: 0.002282 s status: 2001 pll nano pll time constant:6 precision:1e-09 s frequency tolerance: 500 ppm Immediately after live migration, you can see that there is an increase in jitter values: [root@centos ~]# ntpq -pcrv remote refid st t when poll reach delay offset jitter == -helium.constant 18.26.4.105 2 u 59 64 377 60.556 -0.916 31.921 +209.118.204.201 128.9.176.30 2 u 38 64 377 15.717 28.879 12.220 +time3.chpc.utah 132.163.4.1032 u 45 64 353 30.6393.240 26.975 *dns2.untangle.c 216.218.254.202 2 u 17 64 377 22.248 33.039 11.791 associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync, version=ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1), processor=x86_64, system=Linux/2.6.32-504.3.3.el6.x86_64, leap=00, stratum=3, precision=-21, rootdelay=25.086, rootdisp=83.736, refid=74.123.29.4, reftime=d8626838.47529689 Thu, Jan 15 2015 16:24:24.278, clock=d8626849.4920018a Thu, Jan 15 2015 16:24:41.285, peer=3419, tc=6, mintc=3, offset=24.118, frequency=11.560, sys_jitter=15.145, clk_jitter=8.056, clk_wander=2.757 [root@centos ~]# ntpdc -c kerninfo pll offset: 0.0211957 s pll frequency:11.560 ppm maximum error:0.112523 s estimated error: 0.008055 s status: 2001 pll nano pll time constant:6 precision:1e-09 s frequency tolerance: 500 ppm The increase in the jitter and offset values is well within the 500 ppm frequency tolerance limit, and therefore are easily corrected by subsequent NTP clock sync events, but some live migrations do cause much higher jitter and offset jumps, which can not be corrected by NTP and cause the time to go way off. Any idea why this is the case? I've tried backporting the patches (9a48bcd1b82494671c09b0eefdb882581499 and 317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c) on top of upstream qemu 1.2, but it actually caused even higher jitter in the order of 100+ ppm. Any idea what I might be missing? ** Patch added: backport.patch https://bugs.launchpad.net/qemu/+bug/1297218/+attachment/4301780/+files/backport.patch -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
The attachment backport.patch seems to be a patch. If it isn't, please remove the patch flag from the attachment, remove the patch tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team. [This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.] ** Tags added: patch -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
It looks as though the relevant commits were re-committed to upstream git HEAD (9a48bcd1b82494671c09b0eefdb882581499 and 317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c). So this may be fixed in vivid, and we might be able to cherrypick the final patches to trusty. ** Package changed: libvirt (Ubuntu) = qemu (Ubuntu) ** Also affects: qemu (Ubuntu Trusty) Importance: Undecided Status: New ** Also affects: glusterfs (Ubuntu Trusty) Importance: Undecided Status: New ** Changed in: qemu (Ubuntu Trusty) Status: New = Triaged ** Changed in: qemu (Ubuntu Trusty) Status: Triaged = Confirmed ** Changed in: qemu (Ubuntu Trusty) Importance: Undecided = High -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
@Paul, could you confirm whether qemu 1:2.2+dfsg-3exp~ubuntu1 from https://launchpad.net/~ubuntu-virt/+archive/ubuntu/virt-daily-upstream fixes this issue? If it does then I'll go ahead and backport the patch. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: New Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Andrey: the bug also occurs when not using '--copy-storage-inc'. I originally encountered the bug on a pair of servers that share a glusterfs filesystem. As part of the debugging effort, I took glusterfs out of the equation to show that it is not the cause of the issue. My test envirement is therefore currently setup with --copy-storage-inc, but my production environment uses glusterfs, and has the same issue. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Another test with NTP disabled on the servers (but enabled on the guest): Still running qemu-git-2.1.0-rc2-git-20140721 server-a:~$ ntpdate -q cl0 stratum 3, offset -29.405612, delay 0.02597 server-b$ ntpdate -q cl0 stratum 3, offset -32.990292, delay 0.02597 The guest is running NTP, hosted on server-b for more than 10 days. Its NTP reports:. ntp.drift=-35.387 ppm. This matches the offset of server-b (32.99 seconds in 10 days, 19h21) very well: -35.33 ppm. Doing a live migration of the guest from server-b to server-a: Guest does not hang at all (NTP offset after migration: 0.38s). So if NTP is not running on the hosts, then there are no issues with the guest, it seems. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Andrey, I don't quite understand (I suspect Paul didn't quite understand) what you wanted tested. Could you please rephrase, as specifically as possible? Do you want Paul to verify that the packages with the patchset still work with --copy-storage-inc enabled? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Paul, do I understand right that: 1. disabling ksm on the hosts always fixes the pause on migration 2. disabling ksm on the host is not needed with the patchset by Alexander Graf? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
As another test (still running qemu-git-2.1.0-rc2-git-20140721), I disabled NTP on the two servers (and rebooted them), but left it running on the guest. When doing the migration, server a (where the guest was running) had an NTP offset of -3.037619 s, and server b was at -3.337718 s. The guest was nicely synchronized before the migration, but afterwards had a clock offset of 0.349590 s, which roughly corresponds to the difference in offsets. The small NTP offset on the guest after the migration implies that it did briefly freeze, but too short to notice. I'll leave it running for longer to be able to confirm this with sufficient accuracy. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
Re: [Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
On 29/07/2014 17:45, Paul Boven wrote: https://bugs.launchpad.net/bugs/1297218 ... Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. I've seen this a couple of times, though under odd circumstances. If it's useful to repeat it, here's what I was doing. * Run a host with ntpd (in my case the host was 12.04 running in a VMware fusion (sorry) VM on my Mac with VMware tools installed, which also syncs time), with nested virtualisation permitted * Run 14.04 in two lxc containers which are fully privileged and also running ntp * Run qemu in the two containers (with kvm), and migrate from one to the other and back repeatedly. I was using a cirros image. Given that there are 3 versions of ntp and vmware tools all fighting over synchronizing one virtual clock, this configuration is clearly broken timewise, so problems were not unexpected. But it might help someone debug it. Once, eventually the guest broke and gave an oops type thing which I didn't even bother to copy down - repeated lines showing it trying to sync up clocks by increasing numbers of microseconds and finally giving up. -- Alex Bligh
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Serge, Paul As I can see all the cases in this ticket involves storage blkcopy with --copy-storage-inc. My initial reason for rolling this patchset back was a freeze on p2p migration without this flag being set. Although I am hitting both of the problems, and cumulative after-migration delay hitting me for almost a year, I prefer to live with it instead of observing guest with completely stuck I/O. If possible, please confirm if bug disappears for migration without flag from above in your case. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
** Also affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in “glusterfs” package in Ubuntu: Invalid Status in “libvirt” package in Ubuntu: Triaged Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: clock offset='utc'/ Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions