Public bug reported:
== Comment: #0 - Frank P. Novak <[email protected]> - 2016-06-29 18:12:57 ==
libvirt support for QEMU live patching.
QEMU hot-patching should be just {sic} a matter of updating the RPM package
and then live migrating the VMs to another QEMU instance on the same host
(which would point to the just-installed new QEMU executable).
It is typical for software deployed in production to undergo multiple updates
during its lifecycle. Updates are typically done to fix bugs, apply security
fixes and add new features. Often these updates involve some downtime for the
business, as the software deployment needs to be brought down, a newer
version installed, and the deployment brought up again. If this downtime is
not tolerable, then the software has to be updated live, without bringing it
down at all. This method of patching a running software instance, mostly to
fix bugs, is usually referred to as hot patching.
QEMU hot-patching
QEMU is an important component in KVM virtualization that provides the
virtual machine environment for the virtual machines (VMs) or guests. QEMU
does much of the heavy lifting in KVM virtualization and is invoked by the
user (typically via libvirt, or directly) with a plethora of command line
options to start a virtual machine of a given flavour and required
configuration.
Updating a running instance of QEMU to a newer version, with the virtual
machine experiencing near-zero downtime, is now possible with recent
developments in QEMU as well as the Linux kernel. This is extremely useful in
virtualization deployments where the software solution deployed on the
virtual machines will not be aware of the updates that are done to the
underlying hypervisor infrastructure (QEMU).
QEMU hot-patching can essentially be achieved by live migrating a VM running
on a buggy instance of QEMU to an instance that has the fixes applied.
However, this has two problems:

- Migration involves copying the entire RAM of the guest from the source to
the destination instance, and this can take quite a while to complete.
- If guest RAM is being dirtied at a fast rate, there can be situations where
the migration just doesn't complete, as it keeps copying the newly dirtied
RAM pages from the source to the destination.

These two issues have been addressed by a new migration technique in QEMU
called postcopy migration.
Userfaultfd and postcopy migration
Page faults have always been handled entirely by the kernel, but a new
system call named userfaultfd, added to the Linux kernel, changes that.
userfaultfd allows for memory externalization: a user space program can
create a file descriptor (FD) using this call and register regions of the
process's virtual address space for which page faults are then handled by
user space, which talks to the kernel using the userfaultfd protocol.
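As a rough illustration of the protocol (a minimal sketch, not QEMU's actual
code), the following program registers one page with userfaultfd and resolves
the resulting fault from a second thread; the constant page contents stand in
for the data that postcopy would fetch from the migration source. Build with
cc -pthread.

  #include <fcntl.h>
  #include <linux/userfaultfd.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static long page_size;

  /* Plays the role of QEMU's postcopy fault handler: waits for the
   * kernel to report a fault, then supplies the missing page with
   * UFFDIO_COPY (QEMU would fill it with data from the source host). */
  static void *handler(void *arg)
  {
      int uffd = *(int *)arg;
      struct uffd_msg msg;

      if (read(uffd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
          perror("read"); exit(1);
      }
      if (msg.event != UFFD_EVENT_PAGEFAULT) {
          fprintf(stderr, "unexpected event\n"); exit(1);
      }

      char *page = malloc(page_size);
      if (!page) { perror("malloc"); exit(1); }
      memset(page, 'X', page_size);

      struct uffdio_copy copy = {
          .dst = msg.arg.pagefault.address & ~((unsigned long)page_size - 1),
          .src = (unsigned long)page,
          .len = page_size,
      };
      if (ioctl(uffd, UFFDIO_COPY, &copy) < 0) {
          perror("UFFDIO_COPY"); exit(1);
      }
      return NULL;
  }

  int main(void)
  {
      page_size = sysconf(_SC_PAGESIZE);

      /* No glibc wrapper exists, so invoke the raw system call. */
      int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
      if (uffd < 0) { perror("userfaultfd"); exit(1); }

      /* API handshake with the kernel. */
      struct uffdio_api api = { .api = UFFD_API };
      if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); exit(1); }

      /* Map a region and register it for missing-page faults, as QEMU
       * does for the guest RAM areas on the destination. */
      char *region = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (region == MAP_FAILED) { perror("mmap"); exit(1); }

      struct uffdio_register reg = {
          .range = { .start = (unsigned long)region, .len = page_size },
          .mode  = UFFDIO_REGISTER_MODE_MISSING,
      };
      if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
          perror("UFFDIO_REGISTER"); exit(1);
      }

      pthread_t t;
      pthread_create(&t, NULL, handler, &uffd);

      /* This access faults; the handler thread supplies the page. */
      printf("first byte: %c\n", region[0]);
      pthread_join(t, NULL);
      return 0;
  }

In QEMU the same mechanism is applied to the guest's RAM blocks, with the
page contents requested from the migration source instead of being
synthesized locally.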
One of the main motivations for this system call was to allow QEMU to
achieve what is called postcopy VM migration. Live migrating a VM from a
source host to a destination host involves, among other things, copying the
contents of guest RAM from the source to the destination host over the
network. Since the guest RAM can keep changing, copying becomes an iterative
process which continues until convergence is reached, at which point the VM
becomes live at the destination and is brought down at the source. If the
guest memory is being dirtied at a rate higher than the rate at which guest
memory pages are transferred from the migration source, the live migration
would never terminate, and this is where postcopy migration helps.
In postcopy migration, the VM becomes available at the destination as soon as
some minimal required data is transferred from the source to the destination.
The bulk of the guest memory is transferred on demand from source to
destination whenever the guest requires those pages. QEMU registers the RAM
areas of the guest with userfaultfd, and whenever there is a page fault on
the destination host for any page from the guest's RAM space, the required
page is brought over from the source host. Thus postcopy migration in QEMU
uses the userfaultfd system call to achieve a quick and time-bound live
migration of the guest.
QEMU hot-patching using postcopy migration
The quick and time-bound way to migrate a VM between two hosts using postcopy
migration opens up an interesting use case when the source and the
destination hosts are the same. If postcopy migration is used to migrate a
guest from one instance to another within the same host, we can effectively
achieve QEMU hot-patching. If a VM is running on a QEMU that has a known bug,
we can start another instance with the updated QEMU (the incoming instance)
and migrate the VM from the source to the destination instance on the same
host using postcopy migration. Given that postcopy migration doesn't copy all
the guest RAM upfront, we can migrate the VM from one instance to another
with practically zero downtime. So we can use postcopy migration within the
same host to dynamically update QEMU, thus achieving QEMU hot-patching.
Prerequisites
- The userfaultfd system call is part of the Linux kernel and is available
from kernel version 4.3 onwards.
- Postcopy migration support is under development and will hopefully be
available from QEMU 2.5 onwards.
Steps involved
- Assume a VM is running on a buggy instance of QEMU.
# qemu-system-ppc64 ....
- Start a new incoming instance on the same host.
# qemu-system-ppc64 .... -incoming tcp:localhost:4444
- Postcopy-migrate the existing VM
(qemu) migrate_set_capability x-postcopy-ram on
(qemu) migrate -d tcp:localhost:4444
(qemu) migrate_start_postcopy
- After this, the VM will be up and running on the fixed version of QEMU.
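- The progress and eventual completion of the migration can be watched from
the source monitor with the standard info migrate command (not specific to
postcopy, shown here for convenience):
(qemu) info migrate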
Limitations
- The destination instance of QEMU can't vary widely from the source instance
and must conform to all the requirements that have to be met for VM
migration.
- Support for postcopy migration isn't present in libvirt yet. Even once it
is supported, libvirt doesn't have the notion of VM migration within the same
host, so some development effort beyond just postcopy migration is needed in
libvirt to support hot-patching; a sketch of a possible libvirt-driven flow
follows this list.
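For illustration only, and not something available at the time of this
report: a minimal sketch of what a libvirt-driven postcopy migration could
look like, assuming the --postcopy support that later libvirt releases
expose, and assuming a remote destination host (since same-host migration is
exactly the missing piece):

# virsh migrate --live --postcopy <guest> qemu+ssh://<desthost>/system
# virsh migrate-postcopy <guest>

Here virsh migrate-postcopy switches the already running migration into
postcopy mode, analogous to migrate_start_postcopy in the QEMU monitor.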
** Affects: ubuntu-power-systems
Importance: Undecided
Assignee: Canonical Server Team (canonical-server)
Status: New
** Affects: libvirt (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New
** Tags: architecture-ppc64le bugnameltc-143246 severity-high
targetmilestone-inin1710
** Tags added: architecture-ppc64le bugnameltc-143246 severity-high
targetmilestone-inin1710
** Changed in: ubuntu
Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage
(ubuntu-power-triage)
** Package changed: ubuntu => libvirt (Ubuntu)
https://bugs.launchpad.net/bugs/1692457
Title:
[17.10 FEAT] libvirt: libvirt support for QEMU live patching