Hello Antonio (@arcimboldo),

The fix only makes sense for newer QEMUs (>= Xenial, like the one from the Mitaka Ubuntu Cloud Archive).
OBS: The "migration check" is done in VHOST initialization functions when the devices are virtually attached to the virtual machine. If you are using kernel 3.13 and have apparmor enabled, then all the running instances have the "migration blocker" ON - because of this buggy migration check - and won't be able to live migration. Unfortunately there is a "in-memory" linked list telling qemu that is has a blocker (with the reason). This blocker was added during instance startup and will be checked/used only when instance is live-migrated. Check this: http://pastebin.ubuntu.com/23517175/ If you started the instance in a host not running apparmor (or not having libvirt profile loaded, for example) it won't block the creation of /tmp/memfd-XXX files during instance initialization. That won't trigger the "blocker flag" inside the running program and, if/when needed, the live migration will be able to occur. This means that, after installing the new package, if you're using apparmor, yes, you would have to RESTART running instances that were affected by this bug in order to live migrating them. Sorry for the bad news! Even if you remove the apparmor rules, the migration blocker is already set. Hacking your process virtual memory would jeopardize the contents of the virtual memory (could be catastrophic specially for a virtual machine). -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1626972 Title: QEMU memfd_create fallback mechanism change for security drivers Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive mitaka series: Fix Committed Status in QEMU: Fix Committed Status in qemu package in Ubuntu: Fix Released Status in qemu source package in Xenial: Fix Committed Status in qemu source package in Yakkety: Fix Committed Status in qemu source package in Zesty: Fix Released Bug description: [Impact] * Updated QEMU (from UCA) live migration doesn't work with 3.13 kernels. * QEMU code checks if it can create /tmp/memfd-XXX files wrongly. * Apparmor will block access to /tmp/ and QEMU will fail migrating. [Test Case] * Install 2 Ubuntu Trusty (3.13) + UCA Mitaka + apparmor rules. * Try to live-migration from one to another. * Apparmor will block creation of /tmp/memfd-XXX files. [Regression Potential] Pros: * Exhaustively tested this. * Worked with upstream on this fix. * I'm implementing new vhost log mechanism for upstream. * One line change to a blocker that is already broken. Cons: * To break live migration in other circumstances. [Other Info] * Christian Ehrhardt has been following this. ORIGINAL DESCRIPTION: When libvirt starts using apparmor, and creating apparmor profiles for every virtual machine created in the compute nodes, mitaka qemu (2.5 - and upstream also) uses a fallback mechanism for creating shared memory for live-migrations. This fall back mechanism, on kernels 3.13 - that don't have memfd_create() system-call, try to create files on /tmp/ directory and fails.. causing live-migration not to work. Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability = can't live migrate. From qemu 2.5, logic is on : void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd) { if (memfd_create)... ### only works with HWE kernels else ### 3.13 kernels, gets blocked by apparmor tmpdir = g_get_tmp_dir ... 
And you can see the errors:

From the host trying to send the virtual machine:

2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

From the host trying to receive the virtual machine:

Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

When leaving libvirt without apparmor capabilities (thus not confining the virtual machines on the compute nodes), live migration works as expected, so, clearly, apparmor is stepping into the live migration. I'm sure that virtual machines have to be confined and that this isn't the desired behaviour...

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1626972/+subscriptions