** Description changed:

- If a qemu/KVM VM is paused through a monitor by manual issuing of the
- "stop" command, the state of the VM in libvirtd's view will transition
- to "paused". This is because libvirtd listens to "STOP" events in the
- JSON monitor. However, libvirt does not listen to RESUME events on any
- monitor. So, when the VM is resumed by manually issuing "cont", the
- internal state will remain as "paused" even though the VM is running.
+ =================================
+ SRU Justification:
+ 1. Impact: if a VM is paused over the monitor and then resumed,
+    libvirt will continue to report the running VM as paused.
+ 2. Development fix: add a hook to follow the resume event
+ 3. Stable fix: same as development fix
+ 4. Test case: see below
+ 5. Regression potential: an error in the hook could turn the above
+    situation into a crash instead of libvirt following the VM resume.
+    All regression tests passed with this fix.
+ =================================
+ If a qemu/KVM VM is paused through a monitor by manually issuing the
+ "stop" command, the state of the VM in libvirtd's view will transition
+ to "paused". This is because libvirtd listens to "STOP" events in the
+ JSON monitor. However, libvirt does not listen to RESUME events on any
+ monitor. So, when the VM is resumed by manually issuing "cont", the
+ internal state will remain as "paused" even though the VM is running.
  
  Libvirt keeps its internal view of the VM state in sync for migration,
  etc. But without listening to RESUME events it cannot correctly cope
  with third parties issuing stop commands (such as GDB, virsh
  qemu-monitor-command, or software opening another QMP monitor).
  
  This is verified to happen on Precise and Quantal's libvirt versions.
  Since it's a bug in upstream, I expect it to be faulty in Raring as
  well.
  
  The upshot in OpenStack is that VMs, even though running, will be
  reported as paused to nova.
  Due to (https://bugs.launchpad.net/nova/+bug/1097806), nova compute
  will erroneously destroy them. This is a nova-compute problem that is
  exacerbated by this bug.
  
  Steps to Reproduce:
  
  # virsh list
-  Id Name State
+  Id    Name                           State
  ----------------------------------------------------
-  1 instance-00000020 running
+  1     instance-00000020              running
  
  # virsh qemu-monitor-command 1 '{"execute":"stop"}'
  {"return":{},"id":"libvirt-10"}
  
  # virsh list
-  Id Name State
+  Id    Name                           State
  ----------------------------------------------------
-  1 instance-00000020 paused
+  1     instance-00000020              paused
  
  # virsh qemu-monitor-command 1 '{"execute":"cont"}'
  {"return":{},"id":"libvirt-11"}
  
  # virsh list
-  Id Name State
+  Id    Name                           State
  ----------------------------------------------------
-  1 instance-00000020 paused
+  1     instance-00000020              paused
  
  (the state should be "running")
  
  Another way to reproduce this is to attach GDB to qemu and start
  single-stepping: libvirt will drop dozens of RESUME events and be
  mightily confused. Client software like OpenStack will tag the VM as
  paused.
  
  Upstream:
  
  Reported to libvirt upstream:
  https://bugzilla.redhat.com/show_bug.cgi?id=892791
  
  Fixed in libvirt's master git:
  http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f266668a39fd655574fe34f1265
  
  I will attach a backport of the master branch fix to
  0.9.13-0ubuntu12~cloud0.
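To make the failure mode above concrete, here is a toy model of a state tracker that reacts to monitor events. This is only a sketch and not libvirt's code: the class DomainStateTracker, the handle_resume flag and the replay helper are hypothetical names made up for illustration. Only the event names STOP and RESUME and the state strings "running" and "paused" come from the report.

```python
class DomainStateTracker:
    """Toy model of a daemon tracking VM state from monitor events.

    Hypothetical illustration only; this is not libvirt's implementation.
    """

    def __init__(self, handle_resume):
        # The VM is assumed to be running when tracking starts.
        self.state = "running"
        # handle_resume=False models the reported bug (RESUME is ignored),
        # handle_resume=True models the behaviour after the fix.
        self.handle_resume = handle_resume

    def on_event(self, event):
        """Update the tracked state for one monitor event and return it."""
        if event == "STOP":
            self.state = "paused"
        elif event == "RESUME" and self.handle_resume:
            self.state = "running"
        # Any other event (or an ignored RESUME) leaves the state unchanged.
        return self.state


def replay(events, handle_resume):
    """Feed a sequence of event names to a fresh tracker; return final state."""
    tracker = DomainStateTracker(handle_resume)
    for event in events:
        tracker.on_event(event)
    return tracker.state


if __name__ == "__main__":
    # "stop" followed by "cont" on the monitor emits STOP, then RESUME.
    print("RESUME ignored:", replay(["STOP", "RESUME"], handle_resume=False))
    print("RESUME handled:", replay(["STOP", "RESUME"], handle_resume=True))
```

With RESUME ignored, the tracked state stays "paused" after the stop/cont sequence, which matches the last "virsh list" output in the steps to reproduce. With RESUME handled, it returns to "running".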
https://bugs.launchpad.net/bugs/1097824

Title:
  Libvirt does not follow RESUME qemu monitor events. VMs remain in
  "paused" state forever
