[Bug 1131284] Re: Folsom erroneously destroys paused VMs
Will a fix be released for the folsom (2012.2) packages? That is the intent of the bug filing.

Thanks!

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1131284

Title:
  Folsom erroneously destroys paused VMs

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1131284/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1131284] Re: Folsom erroneously destroys paused VMs
As mentioned in the description, the issue has been fixed in nova grizzly and backported to nova folsom. Sorry for any confusion.
[Bug 1131284] [NEW] Folsom erroneously destroys paused VMs
Public bug reported:

Requesting to add upstream stable commit:
https://github.com/openstack/nova/commit/7ace55fcf9e1b7fea074f6c0331b6feafbbc4178
reviewed here: https://review.openstack.org/#/c/20337/
and which addresses this upstream bug: https://bugs.launchpad.net/nova/+bug/1097806

(Updated description of the bug follows.)

Libvirt-managed qemu/KVM VMs can be paused outside of nova-compute's workflow through a variety of means:

* By issuing virsh suspend
* By issuing virsh qemu-monitor-command '{"execute":"stop"}'
* By causing qemu to emit a STOP event, for example when attaching a GDB debugger and single-stepping
* By connecting through an additional qemu monitor and issuing any command that may cause qemu to emit a STOP event

Starting in Folsom (specifically https://github.com/openstack/nova/commit/129b87e17daeaa9e855a70dea51e6581ea63#L6R2502, i.e. commit 129b87e, diff line 2502), nova-compute will destroy a VM if libvirt reports it as paused and this doesn't match nova-compute's recorded state for the VM. The original rationale is to destroy VMs that were paused by I/O errors or KVM emulation errors, which also cause qemu to emit STOP events. The problem is that this also destroys VMs that were paused for any of the valid reasons outlined above.

The problem is exacerbated by a libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=892791) which latches a VM's state to paused even though the VM is running. The fix is already committed upstream (http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f28a39fd655574fe34f1265), has been integrated into Raring, and is triaged for backport into Precise: https://bugs.launchpad.net/bugs/1097824.

Even with libvirt's bug fixed, there are still points in time at which nova-compute will check a VM's state, find it paused for a valid reason, and erroneously decide to destroy it. The fix is either to remove this behavior, or to further query libvirt for the paused reason, which will show conclusively whether the VM is effectively crashed or just paused.

** Affects: nova (Ubuntu)
     Importance: Undecided
         Status: New
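The second option in the proposed fix, querying libvirt for the paused reason, could look roughly like the Python sketch below. It is not the upstream nova commit referenced in this bug: the function names, the policy of treating only I/O-error pauses as fatal, and the locally defined enums (whose values are assumed to mirror libvirt's virDomainState and virDomainPausedReason) are all illustrative assumptions.

```python
from enum import IntEnum


class DomainState(IntEnum):
    # Assumed to mirror libvirt's virDomainState; real code would use
    # libvirt.VIR_DOMAIN_RUNNING / libvirt.VIR_DOMAIN_PAUSED instead.
    RUNNING = 1
    PAUSED = 3


class PausedReason(IntEnum):
    # Assumed to mirror part of libvirt's virDomainPausedReason.
    UNKNOWN = 0
    USER = 1     # e.g. virsh suspend or a QMP "stop" issued by a third party
    IOERROR = 5  # guest stopped because of a block device I/O error


# Hypothetical policy: only pause reasons listed here count as "crashed".
FATAL_PAUSE_REASONS = {PausedReason.IOERROR}


def folsom_should_destroy(state, expected_running):
    """Folsom-style check: any pause nova did not expect leads to destroy."""
    return expected_running and state == DomainState.PAUSED


def should_destroy(state, reason, expected_running):
    """Reason-aware check: destroy only if paused for a fatal reason.

    With libvirt-python, (state, reason) would come from dom.state().
    """
    if not expected_running or state != DomainState.PAUSED:
        return False
    return reason in FATAL_PAUSE_REASONS


# A guest paused with "virsh suspend" while nova expects it to be running:
print(folsom_should_destroy(DomainState.PAUSED, True))                 # True
print(should_destroy(DomainState.PAUSED, PausedReason.USER, True))     # False
print(should_destroy(DomainState.PAUSED, PausedReason.IOERROR, True))  # True
```

From the command line, `virsh domstate <domain> --reason` reports the same state/reason pair. Treating an UNKNOWN reason as non-fatal is a conservative choice made for this sketch; a real policy might differ.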
[Bug 1097824] Re: Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
Serge, additional (and unexpected!) motivation to include this patch:
http://www.redhat.com/archives/libvir-list/2013-January/msg01049.html

Thanks
Andres
[Bug 1097824] Re: Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
The backport needs a small tweak for Raring/1.0.0.

** Patch added: Backport for Raring Ringtail
   https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1097824/+attachment/3480721/+files/handle_resume_1.0.0-0ubuntu4.patch
[Bug 1097824] [NEW] Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
Public bug reported:

If a qemu/KVM VM is paused through a monitor by manually issuing the stop command, the state of the VM in libvirtd's view transitions to paused. This is because libvirtd listens to STOP events on the JSON monitor. However, libvirt does not listen to RESUME events on any monitor. So when the VM is resumed by manually issuing cont, the internal state remains paused even though the VM is running.

Libvirt keeps its internal view of the state in sync for migration, etc. But without listening to RESUME events it cannot correctly cope with third parties issuing stop commands (such as GDB, virsh qemu-monitor-command, or software opening another QMP monitor).

This has been verified on the libvirt versions in Precise and Quantal. Since the bug is in upstream, I expect Raring to be affected as well.

The upshot in OpenStack is that VMs, even though running, will be reported as paused to nova. Due to https://bugs.launchpad.net/nova/+bug/1097806, nova-compute will erroneously destroy them. This is a nova-compute problem that is exacerbated by this bug.

Steps to Reproduce:

# virsh list
 Id    Name             State
 1     instance-0020    running
# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-10"}
# virsh list
 Id    Name             State
 1     instance-0020    paused
# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-11"}
# virsh list
 Id    Name             State
 1     instance-0020    paused

(the state should be running)

Another way to reproduce this is to attach GDB to qemu and start single-stepping: libvirt will drop dozens of RESUME events and be mightily confused. Client software like OpenStack will tag the VM as paused.
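To make the failure mode concrete, here is a toy Python model (not libvirt's actual C code) of a daemon that mirrors a guest's run state from QMP events. STOP and RESUME are the QMP event names the report refers to; the track_state function and its handle_resume flag are hypothetical.

```python
def track_state(events, handle_resume):
    """Return the state a tracker would report after replaying QMP events.

    events: iterable of QMP event names such as "STOP" and "RESUME".
    handle_resume: False models the unpatched behaviour described in this
    bug, True models the behaviour after the upstream fix.
    """
    state = "running"
    for event in events:
        if event == "STOP":
            state = "paused"
        elif event == "RESUME" and handle_resume:
            state = "running"
        # Any other event, or an ignored RESUME, leaves the state unchanged.
    return state


# The reproduction steps above: "stop" emits STOP, "cont" emits RESUME.
print(track_state(["STOP", "RESUME"], handle_resume=False))  # paused
print(track_state(["STOP", "RESUME"], handle_resume=True))   # running
```

Once a single STOP has been seen, the unpatched tracker reports paused from then on, however many RESUME events follow (as with GDB single-stepping). That latched state is what nova-compute then acts on.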
Upstream:
Reported to libvirt upstream: https://bugzilla.redhat.com/show_bug.cgi?id=892791
Fixed in libvirt's master git: http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f28a39fd655574fe34f1265

I will attach a backport of the master branch fix to 0.9.13-0ubuntu12~cloud0.

** Affects: libvirt (Ubuntu)
     Importance: Undecided
         Status: New
[Bug 1097824] Re: Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
** Patch added: handle_resume_0.9.13-0ubuntu12~cloud0.patch
   https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1097824/+attachment/3478189/+files/handle_resume_0.9.13-0ubuntu12%7Ecloud0.patch
[Bug 1097824] Re: Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
With the above patch:

# virsh list
 Id    Name             State
 1     instance-0022    running
# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-12"}
# virsh list
 Id    Name             State
 1     instance-0022    paused
# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-13"}
# virsh list
 Id    Name             State
 1     instance-0022    running
[Bug 1097824] Re: Libvirt does not follow RESUME qemu monitor events. VMs remain in paused state forever
Serge, no problem. What is the status for Raring? The upstream commit is not in 1.0.0. I suspect the patch won't apply as is; should I rebase?

Thanks