Public bug reported:

== Comment: #0 - Frank P. Novak <[email protected]> - 2016-06-29
18:12:57 ==

libvirt support for QEMU live patching.

Should just be a matter of updating the RPM package and then live
migrating the VMs to another QEMU instance on the same host (which would
point to the just-installed new QEMU executable).

It is typical for software deployed in production to undergo multiple
updates during its lifecycle. Updates are typically done to fix bugs, apply
security fixes and add new features. Often, these updates involve some
downtime for the business, as the software deployment needs to be brought
down, a newer version installed, and the deployment brought back up. If this
downtime is not tolerable, then the software has to be updated live, without
bringing it down at all. This method of patching a running software instance,
mostly to fix bugs, is usually referred to as hot patching.
QEMU hot-patching

QEMU is an important component in KVM virtualization: it provides the
virtual machine environment for the virtual machines (VMs) or guests.
QEMU does much of the heavy lifting in KVM virtualization and is invoked
(typically via libvirt, or directly by the user) with a plethora of
command line options to start a virtual machine of a given flavour and
required configuration.

Updating a running instance of QEMU to a newer version, with the virtual
machine experiencing near-zero downtime, is now possible with recent
developments in QEMU as well as the Linux kernel. This is extremely useful
in virtualization deployments where the software solution deployed on the
virtual machines is not aware of updates made to the underlying
hypervisor infrastructure (QEMU).

QEMU hot patching can essentially be achieved by live migrating a VM
running on a buggy instance of QEMU to an instance that has the fixes
applied. However, this has two problems:

- Migration involves copying the entire RAM of the guest from the source
to the destination instance, and this can take quite a while to complete.
- If guest RAM is being dirtied at a fast rate, the migration may never
complete, as it keeps copying the newly dirtied RAM pages from source to
destination.

These two issues have been addressed by a new migration technique in QEMU
called postcopy migration.
Userfaultfd and postcopy migration

Page faults have always been handled entirely by the kernel, but a new
system call named userfaultfd, added to the Linux kernel, changes that.
userfaultfd allows for memory externalization: a user space program can
create a file descriptor (FD) using this call and register regions of the
process's virtual address space whose page faults are then handled by
user space, which talks to the kernel using the userfaultfd protocol.

One of the main motivations for this system call was to allow QEMU to
implement what is called postcopy VM migration. Live migrating a VM from
a source host to a destination host involves, among other things,
copying the contents of guest RAM from source to destination over the
network. Since the guest RAM keeps changing, copying becomes an
iterative process which continues until convergence is reached, at
which point the VM becomes live at the destination and is brought down
at the source. If guest memory is being dirtied at a rate higher than
the rate at which pages are transferred from the source of the
migration, the live migration never terminates; this is where postcopy
migration helps.

In postcopy migration, the VM becomes available at the destination as soon
as some minimal required data has been transferred from the source. The bulk
of guest memory is then transferred on demand, whenever the guest requires
those pages. QEMU registers the RAM areas of the guest with userfaultfd,
and whenever there is a page fault on the destination host for any page in
the guest's RAM space, the required page is brought over from the source.
Thus postcopy migration in QEMU uses the userfaultfd system call to achieve
a quick and time-bound live migration of the guest.
QEMU hot patching using postcopy migration

This quick and time-bound way to migrate a VM between two hosts using
postcopy migration opens up an interesting use case when the source and
destination hosts are the same. If postcopy migration is used to migrate a
guest from one QEMU instance to another within the same host, we effectively
achieve QEMU hot-patching. If a VM is running on a QEMU instance that has a
known bug, we can start another instance with the updated QEMU (the incoming
instance) and migrate the VM from the source to the destination instance on
the same host using postcopy migration. Given that postcopy migration
doesn't copy all the guest RAM upfront, we can migrate the VM from one
instance to another with practically zero downtime. So postcopy migration
within the same host can be used to dynamically update QEMU, thus achieving
QEMU hot-patching.
Prerequisites

- The userfaultfd system call is part of the Linux kernel from version
4.3 onwards.

- Postcopy migration support is under development and will hopefully be
available from QEMU 2.5 onwards.
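As a quick sanity check, the kernel prerequisite above can be probed from a
shell. This is only a sketch: the 4.3 threshold comes from the text above,
and `qemu-system-ppc64` is just one example binary name for the QEMU side.

```shell
#!/bin/sh
# Sketch: check the kernel prerequisite listed above.
# userfaultfd needs Linux >= 4.3.

min_kernel=4.3
kver=$(uname -r | cut -d- -f1)

# sort -V orders version strings; if the minimum sorts first, the
# running kernel is new enough.
if [ "$(printf '%s\n' "$min_kernel" "$kver" | sort -V | head -n1)" = "$min_kernel" ]; then
    echo "kernel $kver: userfaultfd syscall should be available"
else
    echo "kernel $kver: too old for userfaultfd (need >= $min_kernel)"
fi

# The QEMU version check is analogous (uncomment if QEMU is installed):
# qemu-system-ppc64 --version
```

The same pattern works for any minimum-version check done in plain POSIX
shell, without relying on distribution-specific tooling.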
Steps involved

- Assume a VM is running on a buggy instance of QEMU.
  # qemu-system-ppc64 ....
- Start a new incoming instance on the same host.
  # qemu-system-ppc64 .... -incoming tcp:localhost:4444
- Postcopy-migrate the existing VM from the monitor of the source instance.
  (qemu) migrate_set_capability x-postcopy-ram on
  (qemu) migrate -d tcp:localhost:4444
  (qemu) migrate_start_postcopy
- After this, the VM will be up and running on the fixed version of QEMU.
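The steps above can be sketched as a single script. Everything here is
illustrative: the monitor socket path, port 4444, and the use of socat to
feed commands to the source QEMU monitor are assumptions, and DRY_RUN=1
(the default) only prints each command instead of executing it.

```shell
#!/bin/sh
# Sketch of same-host postcopy migration between two QEMU instances.
# DRY_RUN=1 (default) prints commands instead of running them; the
# socket path, port and socat usage are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The source VM is assumed to already be running on the buggy QEMU with
# a monitor socket, e.g.:
#   qemu-system-ppc64 .... -monitor unix:/tmp/vm-src.sock,server,nowait

# Start the incoming instance, pointing at the updated QEMU binary.
run qemu-system-ppc64 .... -incoming tcp:localhost:4444

# Drive the source monitor: enable postcopy, start the migration in the
# background, then switch to the postcopy phase.
for cmd in \
    'migrate_set_capability x-postcopy-ram on' \
    'migrate -d tcp:localhost:4444' \
    'migrate_start_postcopy'
do
    run sh -c "echo '$cmd' | socat - UNIX-CONNECT:/tmp/vm-src.sock"
done
```

With DRY_RUN=0 and real command lines substituted for the "....", the same
sequence would drive an actual same-host migration.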
Limitations

- The destination instance of QEMU can't vary widely from the source
instance and must conform to all the requirements that normally have to
be met for VM migration.
- Support for postcopy migration isn't present in libvirt yet. Even when
it is supported, libvirt doesn't support the notion of VM migration
within the same host. Some development effort, in addition to just
postcopy migration, is needed in libvirt to support hot-patching.
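For illustration only, the snippet below prints what a libvirt-driven flow
could look like once both gaps are closed. The flag names and the same-host
connection URI are hypothetical here, not syntax that libvirt supports at
the time of writing; nothing is executed against a hypervisor.

```shell
#!/bin/sh
# Hypothetical virsh invocations (printed, not executed) showing what
# same-host postcopy hot-patching might look like if libvirt gains the
# support discussed above. Flags, domain name and URI are assumptions.
hotpatch_cmds='virsh migrate --live --postcopy mydomain qemu+unix:///system
virsh migrate-postcopy mydomain'
echo "$hotpatch_cmds"
```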

** Affects: ubuntu-power-systems
     Importance: Undecided
     Assignee: Canonical Server Team (canonical-server)
         Status: New

** Affects: libvirt (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-143246 severity-high 
targetmilestone-inin1710


** Changed in: ubuntu
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage 
(ubuntu-power-triage)

** Package changed: ubuntu => libvirt (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1692457

Title:
  [17.10 FEAT] libvirt: libvirt support for QEMU live patching

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1692457/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
