Re: [PATCH v3 3/4] failover: really display a warning when the primary device is not found
On Fri, Feb 12, 2021 at 02:52:49PM +0100, Laurent Vivier wrote: In failover_add_primary(), we search the id of the failover device by scanning the list of the devices in the opts list to find a device with a failover_pair_id equals to the id of the virtio-net device. If the failover_pair_id is not found, QEMU ignores the primary device silently (which also means it will not be hidden and it will be enabled directly at boot). After that, we search the id in the opts list to do a qdev_device_add() with it. The device will be always found as otherwise we had exited before, and thus the warning is never displayed. Fix that by moving the error report to the first exit condition. Also add a g_assert() to be sure the compiler will not complain about a possibly NULL pointer. Signed-off-by: Laurent Vivier --- hw/net/virtio-net.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) Thank you Laurent! Reviewed-by: Jens Freimann regards, Jens
Re: [PATCH 2/2] virtio-net: add missing object_unref()
On Sat, Feb 06, 2021 at 01:39:55PM +0100, Laurent Vivier wrote: failover_add_primary() calls qdev_device_add() and doesn't unref the device. Because of that, when the device is unplugged a reference is remaining and prevents the cleanup of the object. This prevents to be able to plugin back the failover primary device, with errors like: (qemu) device_add vfio-pci,host=:41:00.0,id=hostdev0,bus=root.3,failover_pair_id=net0 (qemu) device_del hostdev0 We can check with "info qtree" and "info pci" that the device has been removed, and then: (qemu) device_add vfio-pci,host=:41:00.0,id=hostdev1,bus=root.3,failover_pair_id=net0 Error: vfio :41:00.0: device is already attached (qemu) device_add vfio-pci,host=:41:00.0,id=hostdev0,bus=root.3,failover_pair_id=net0 qemu-kvm: Duplicate ID 'hostdev0' for device Fixes: 21e8709b29cd ("failover: Remove primary_dev member") Cc: quint...@redhat.com Signed-off-by: Laurent Vivier --- hw/net/virtio-net.c | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Jens Freimann Thank you Laurent! regards, Jens
Re: [PATCH 1/2] pci: cleanup failover sanity check
On Sat, Feb 06, 2021 at 01:39:54PM +0100, Laurent Vivier wrote: Commit a1190ab628 has added a "allow_unplug_during_migration = true" at the end of the main "if" block, so it is not needed to set it anymore in the previous checking. Remove it, to have only sub-ifs that check for needed conditions and exit if one fails. Fixes: 4f5b6a05a4e7 ("pci: add option for net failover") Fixes: a1190ab628c0 ("migration: allow unplug during migration for failover devices") Cc: jfreim...@redhat.com Signed-off-by: Laurent Vivier --- hw/pci/pci.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) Reviewed-by: Jens Freimann Thank you Laurent! regards, Jens
Re: Documents not in sphinx toctree
On Thu, Nov 05, 2020 at 04:19:00PM +, Peter Maydell wrote: On Thu, 5 Nov 2020 at 16:14, Marc-André Lureau wrote: By running sphinx over the docs/ directory (like readthedocs.org presumably does), it finds a couple of rst documents that are not referenced: - cpu-hotplug.rst - microvm.rst - pr-manager.rst - virtio-net-failover.rst Given the current structure of the content in https://qemu.readthedocs.io/en/latest/, would adding this as a new bullet in "QEMU System Emulation User’s Guide" be the right thing to do? regards, Jens
Re: [PULL 1/2] error: Remove NULL checks on error_propagate() calls (again)
On Fri, Oct 09, 2020 at 08:48:57AM +0200, Markus Armbruster wrote: Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \ --macro-file scripts/cocci-macro-file.h \ --use-gitgrep . Cc: Jens Freimann Cc: Hailiang Zhang Cc: Juan Quintela Signed-off-by: Markus Armbruster Message-Id: <20200722084048.1726105-4-arm...@redhat.com> Reviewed-by: Eric Blake --- hw/net/virtio-net.c | 8 ++-- migration/colo.c | 4 +--- migration/migration.c | 8 ++-- 3 files changed, 5 insertions(+), 15 deletions(-) Reviewed-by: Jens Freimann
Re: [PATCH] virtiofsd: avoid /proc/self/fd tempdir
On Tue, Oct 06, 2020 at 10:58:26AM +0100, Stefan Hajnoczi wrote: In order to prevent /proc/self/fd escapes a temporary directory is created where /proc/self/fd is bind-mounted. This doesn't work on read-only file systems. Avoid the temporary directory by bind-mounting /proc/self/fd over /proc. This does not affect other processes since we remounted / with MS_REC | MS_SLAVE. /proc must exist and virtiofsd does not use it so it's safe to do this. Path traversal can be tested with the following function: static void test_proc_fd_escape(struct lo_data *lo) { int fd; int level = 0; ino_t last_ino = 0; fd = lo->proc_self_fd; for (;;) { struct stat st; if (fstat(fd, &st) != 0) { perror("fstat"); return; } if (last_ino && st.st_ino == last_ino) { fprintf(stderr, "inode number unchanged, stopping\n"); return; } last_ino = st.st_ino; fprintf(stderr, "Level %d dev %lu ino %lu\n", level, (unsigned long)st.st_dev, (unsigned long)last_ino); fd = openat(fd, "..", O_PATH | O_DIRECTORY | O_NOFOLLOW); level++; } } Before and after this patch only Level 0 is displayed. Without /proc/self/fd bind-mount protection it is possible to traverse parent directories. Fixes: 397ae982f4df4 ("virtiofsd: jail lo->proc_self_fd") Cc: Miklos Szeredi Cc: Jens Freimann Signed-off-by: Stefan Hajnoczi Thanks Stefan, it fixes the problem we had! Tested-by: Jens Freimann Reviewed-by: Jens Freimann regards, Jens
Re: [RFC 0/1] tools/virtiofsd: don't create temporary directory in /
On Thu, Oct 01, 2020 at 08:15:18AM +0200, Jens Freimann wrote: I'm sending this as an RFC because: Maybe just prepending "/tmp" is not generic enough and we should make it somehow configurable or use $TMPDIR. Also there might be security implications I'm not aware of. The process is running with container_kvm_t context which also needs a change to be allowed to create files in tmpfs to make it work. Fabiano had the idea to use a glib function to create the temporary directory. It would be good because it uses the $TMPDIR env variable. But before we decide about glib or not: the change is in the call chain of setup_sandbox() and there was a question what other implications that has. What do you think? regards, Jens
[RFC 0/1] tools/virtiofsd: don't create temporary directory in /
When running a Kata container with virtiofs in OpenShift/k8s I get a "Operation not permitted" error from a mkdtemp() call in virtiofsd because it is trying to create a directory like /virtiofsd.11RAND To avoid this change in virtiofsd, I've tried to set the TMPDIR environment variable for the virtiofsd process, hoping that mkdtemp() would use it, but it does not. Looking at glibc code it seems to be used by tmpfile() etc. only. I'm sending this as an RFC because: Maybe just prepending "/tmp" is not generic enough and we should make it somehow configurable or use $TMPDIR. Also there might be security implications I'm not aware of. The process is running with container_kvm_t context which also needs a change to be allowed to create files in tmpfs to make it work. Jens Freimann (1): tools/virtiofsd: create tmpdir in /tmp tools/virtiofsd/passthrough_ll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.26.2
[RFC 1/1] tools/virtiofsd: create temporary directory in /tmp
mkdtemp() will try to create a current directory in the working directory of the process. In this case it's trying to create it in /. This is a problem when the process doesn't have write access there. This patch changes the template string and prepends "/tmp" which is typically writable. Signed-off-by: Jens Freimann --- tools/virtiofsd/passthrough_ll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index 0b229ebd57..f79bcce0d7 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -2393,7 +2393,7 @@ static void setup_wait_parent_capabilities(void) static void setup_namespaces(struct lo_data *lo, struct fuse_session *se) { pid_t child; -char template[] = "virtiofsd-XX"; +char template[] = "/tmp/virtiofsd-XX"; char *tmpdir; /* -- 2.26.2
Re: [PATCH 02/22] pci: Delete useless error_propagate()
On Mon, Jun 22, 2020 at 12:42:30PM +0200, Markus Armbruster wrote: Cc: Jens Freimann Cc: Michael S. Tsirkin Cc: Marcel Apfelbaum Signed-off-by: Markus Armbruster --- hw/pci/pci.c | 3 --- 1 file changed, 3 deletions(-) Reviewed-by: Jens Freimann Thanks! regards, Jens
Re: [PATCH 01/22] net/virtio: Fix failover_replug_primary() return value regression
On Mon, Jun 22, 2020 at 12:42:29PM +0200, Markus Armbruster wrote: Commit 150ab54aa6 "net/virtio: fix re-plugging of primary device" fixed failover_replug_primary() to return false on failure. Commit 5a0948d36c "net/virtio: Fix failover error handling crash bugs" broke it again for hotplug_handler_plug() failure. Unbreak it. Commit 5a0948d36c4cbc1c5534afac6fee99de55245d12 Fixes: 5a0948d36c4cbc1c5534afac6fee99de55245d12 Cc: Jens Freimann Cc: Michael S. Tsirkin Cc: qemu-sta...@nongnu.org Signed-off-by: Markus Armbruster --- hw/net/virtio-net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Jens Freimann Thanks! regards, Jens
Re: [PATCH] migration: Optimization about wait-unplug migration state
On Tue, Feb 04, 2020 at 01:08:41PM +0800, Keqian Zhu wrote: qemu_savevm_nr_failover_devices() is originally designed to get the number of failover devices, but it actually returns the number of "unplug-pending" failover devices now. Moreover, what drives migration state to wait-unplug should be the number of "unplug-pending" failover devices, not all failover devices. We can also notice that qemu_savevm_state_guest_unplug_pending() and qemu_savevm_nr_failover_devices() is equivalent almost (from the code view). So the latter is incorrect semantically and useless, just delete it. In the qemu_savevm_state_guest_unplug_pending(), once hit a unplug-pending failover device, then it can return true right now to save cpu time. Signed-off-by: Keqian Zhu --- Cc: jfreim...@redhat.com Cc: Juan Quintela Cc: "Dr. David Alan Gilbert" --- migration/migration.c | 2 +- migration/savevm.c| 24 +++- migration/savevm.h| 1 - 3 files changed, 4 insertions(+), 23 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 3a21a4686c..deedc968cf 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -,7 +,7 @@ static void *migration_thread(void *opaque) qemu_savevm_state_setup(s->to_dst_file); -if (qemu_savevm_nr_failover_devices()) { +if (qemu_savevm_state_guest_unplug_pending()) { migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_WAIT_UNPLUG); diff --git a/migration/savevm.c b/migration/savevm.c index f19cb9ec7a..1d4220ece8 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1140,36 +1140,18 @@ void qemu_savevm_state_header(QEMUFile *f) } } -int qemu_savevm_nr_failover_devices(void) +bool qemu_savevm_state_guest_unplug_pending(void) { SaveStateEntry *se; -int n = 0; QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (se->vmsd && se->vmsd->dev_unplug_pending && se->vmsd->dev_unplug_pending(se->opaque)) { -n++; -} -} - -return n; -} - -bool qemu_savevm_state_guest_unplug_pending(void) -{ -SaveStateEntry *se; -int n = 0; - -QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { -if (!se->vmsd || !se->vmsd->dev_unplug_pending) { -continue; -} -if (se->vmsd->dev_unplug_pending(se->opaque)) { -n++; +return true; } } -return n > 0; +return false; } void qemu_savevm_state_setup(QEMUFile *f) diff --git a/migration/savevm.h b/migration/savevm.h index c42b9c80ee..ba64a7e271 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -31,7 +31,6 @@ bool qemu_savevm_state_blocked(Error **errp); void qemu_savevm_state_setup(QEMUFile *f); -int qemu_savevm_nr_failover_devices(void); bool qemu_savevm_state_guest_unplug_pending(void); int qemu_savevm_state_resume_prepare(MigrationState *s); void qemu_savevm_state_header(QEMUFile *f); -- 2.19.1 Looks good to me. I tested it and it still works, so Tested-by: Jens Freimann Reviewed-by: Jens Freimann regards Jens
Re: qom device lifecycle interaction with hotplug/hotunplug ?
On Wed, Dec 11, 2019 at 01:52:33PM +0100, Damien Hedde wrote: On 12/4/19 7:51 PM, Eduardo Habkost wrote: On Wed, Dec 04, 2019 at 05:21:25PM +0100, Jens Freimann wrote: On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote: On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote: On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote: +jfreimann, +mst And if migration fails this same device is plugged back using failover_replug_primary(): static bool failover_replug_primary(VirtIONet *n, Error **errp) { [...] qdev_set_parent_bus(n->primary_dev, n->primary_bus); [...] hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); if (hotplug_ctrl) { hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, &err); if (err) { goto out; } hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); } [...] } My concern is about the qdev_set_parent_bus() call above (because I touch this function in my series) and don't want to have side effects there. I looked at the code and thought the partial unplug has the effect of cutting off the unplug procedure just before doing the qdev real unplug. In pcie_unplug_device() we return before doing the object_unparent(): static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, ... { [...] if (dev->partially_hotplugged) { dev->qdev.pending_deleted_event = false; return; } hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); object_unparent(OBJECT(dev)); } From my understanding, object_unparent() is the only way to really unplug a device from a bus regarding qdev (and it also unrealizes the device). So I have the feeling that qdev_set_parent_bus() here is a no-op (because primary_dev is already on primary_bus). I tested this now and it really is a no-op. It was required in a earlier version of the patches and I missed to remove it when I reworked the re-plug functionality. I will send a patch to remove it. regards Jens
Re: qom device lifecycle interaction with hotplug/hotunplug ?
On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote: On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote: On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote: > +jfreimann, +mst > > On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote: > > On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost wrote: > > > So, to summarize the current issues: > > > > > > 1) realize triggers a plug operation implicitly. > > > 2) unplug triggers unrealize implicitly. > > > > > > Do you expect to see use cases that will require us to implement > > > realize-without-plug? > > > > I don't think so, but only because of the oddity that > > we put lots of devices on the 'sysbus' and claim that > > that's plugging them into the bus. The common case of > > 'realize' is where one device (say an SoC) has a bunch of child > > devices (like UARTs); the SoC's realize method realizes its child > > devices. Those devices all end up plugged into the 'sysbus' > > but there's no actual bus there, it's fictional and about > > the only thing it matters for is reset propagation (which > > we don't model right either). A few devices don't live on > > buses at all. > > That's my impression as well. > > > > > > Similarly, do you expect use cases that will require us to > > > implement unplug-without-unrealize? > > > > I don't know enough about hotplug to answer this one: > > it's essentially what I'm hoping you'd be able to answer. > > I vaguely had in mind that eg the user might be able to > > create a 'disk' object, plug it into a SCSI bus, then > > unplug it from the bus without the disk and all its data > > evaporating, and maybe plug it back into the SCSI > > bus (or some other SCSI bus) later ? But I don't know > > anything about how we expose that kind of thing to the > > user via QMP/HMP. > > This ability isn't exposed to the user at all. Our existing > interfaces are -device, device_add and device_del. > > We do have something new that sounds suspiciously similar to > "unplugged but not unrealized", though: the new hidden device > API, added by commit f3a850565693 ("qdev/qbus: add hidden device > support"). > > Jens, Michael, what exactly is the difference between a "hidden" > device and a "unplugged" device? "hidden" the way we use it for virtio-net failover is actually unplugged. But it doesn't have to be that way. You can register a function that decides if the device should be hidden, i.e. plugged now, or do something else with it (in the virtio-net failover case we just save everything we need to plug the device later). We did introduce a "unplugged but not unrealized" function too as part of the failover feature. See "a99c4da9fc pci: mark devices partially unplugged" This was needed so we would be able to re-plug the device in case a migration failed and we need to hotplug the primary device back to the guest. To avoid the risk of not getting the resources the device needs we don't unrealize but just trigger the unplug from the guest OS. Thanks for the explanation. Let me confirm if I understand the purpose of the new mechanisms: should_be_hidden is a mechanism for implementing realize-without-plug. partially_hotplugged is a mechanism for implementing unplug-without-unrealize. Is that correct? should_be_hidden is a mechanism for implementing realize-without-plug: kind of. It's a mechanism that ensures qdev_device_add() returns early as long as the condition to hide the device is true. You could to the realize-without-plug in the handler function that decides if the device should be "hidden". partially_hotplugged is a mechanism for implementing unplug-without-unrealize: yes. regards Jens
Re: qom device lifecycle interaction with hotplug/hotunplug ?
On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote: +jfreimann, +mst On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote: On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost wrote: > So, to summarize the current issues: > > 1) realize triggers a plug operation implicitly. > 2) unplug triggers unrealize implicitly. > > Do you expect to see use cases that will require us to implement > realize-without-plug? I don't think so, but only because of the oddity that we put lots of devices on the 'sysbus' and claim that that's plugging them into the bus. The common case of 'realize' is where one device (say an SoC) has a bunch of child devices (like UARTs); the SoC's realize method realizes its child devices. Those devices all end up plugged into the 'sysbus' but there's no actual bus there, it's fictional and about the only thing it matters for is reset propagation (which we don't model right either). A few devices don't live on buses at all. That's my impression as well. > Similarly, do you expect use cases that will require us to > implement unplug-without-unrealize? I don't know enough about hotplug to answer this one: it's essentially what I'm hoping you'd be able to answer. I vaguely had in mind that eg the user might be able to create a 'disk' object, plug it into a SCSI bus, then unplug it from the bus without the disk and all its data evaporating, and maybe plug it back into the SCSI bus (or some other SCSI bus) later ? But I don't know anything about how we expose that kind of thing to the user via QMP/HMP. This ability isn't exposed to the user at all. Our existing interfaces are -device, device_add and device_del. We do have something new that sounds suspiciously similar to "unplugged but not unrealized", though: the new hidden device API, added by commit f3a850565693 ("qdev/qbus: add hidden device support"). Jens, Michael, what exactly is the difference between a "hidden" device and a "unplugged" device? "hidden" the way we use it for virtio-net failover is actually unplugged. But it doesn't have to be that way. You can register a function that decides if the device should be hidden, i.e. plugged now, or do something else with it (in the virtio-net failover case we just save everything we need to plug the device later). We did introduce a "unplugged but not unrealized" function too as part of the failover feature. See "a99c4da9fc pci: mark devices partially unplugged" This was needed so we would be able to re-plug the device in case a migration failed and we need to hotplug the primary device back to the guest. To avoid the risk of not getting the resources the device needs we don't unrealize but just trigger the unplug from the guest OS. regards Jens
Re: [PATCH 00/21] Error handling fixes, may contain 4.2 material
On Sat, Nov 30, 2019 at 08:42:19PM +0100, Markus Armbruster wrote: PATCH 2-4 fix crash bugs. Including them would be a no-brainer at -rc0. But we're post -rc3, and even for crash bugs we require a certain likelihood of users getting bitten. Jens, please assess impact of PATCH 2's crash bug. Guest can't use it to trigger a qemu crash because the hotplug_handler called after the pre-plug doesn't do anything with errp. qemu_opt_set_bool() is also unlikely to trigger the bug because both reasons for why it would set errp are not true in this path. I would prefer for this fix to go into 4.2 but if it is the only one and triggers a large amount of work then don't do it. regards Jens
Re: [PATCH 01/21] net/virtio: Drop useless n->primary_dev not null checks
On Sat, Nov 30, 2019 at 08:42:20PM +0100, Markus Armbruster wrote: virtio_net_handle_migration_primary() returns early when it can't ensure n->primary_dev is non-null. Checking it again right after that early return is redundant. Drop. If n->primary_dev is null on entering failover_replug_primary(), @pdev will become null, and pdev->partially_hotplugged will crash. Checking n->primary_dev later is useless. It can't actually be null, because its caller virtio_net_handle_migration_primary() ensures it isn't. Drop the useless check. Cc: Jens Freimann Cc: Michael S. Tsirkin Signed-off-by: Markus Armbruster --- hw/net/virtio-net.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) Thanks Markus! Reviewed-by: Jens Freimann regards Jens
Re: [PATCH 02/21] net/virtio: Fix failover error handling crash bugs
On Sat, Nov 30, 2019 at 08:42:21PM +0100, Markus Armbruster wrote: Functions that take an Error ** parameter to pass an error to the caller expect the parameter to point to null. failover_replug_primary() violates this precondition in several places: * After qemu_opts_from_qdict() failed, *errp is no longer null. Passing it to error_setg() is wrong, and will trip the assertion in error_setv(). Messed up in commit 150ab54aa6 "net/virtio: fix re-plugging of primary device". Simply drop the error_setg(). * Passing @errp to qemu_opt_set_bool(), hotplug_handler_pre_plug(), and hotplug_handler_plug() is wrong. If one of the first two fails, *errp is no longer null. Risks tripping the same assertion. Moreover, continuing after such errors is unsafe. Messed up in commit 9711cd0dfc "net/virtio: add failover support". Fix by handling each error properly. failover_replug_primary() crashes when passed a null @errp. Also messed up in commit 9711cd0dfc. This bug can't bite as no caller actually passes null. Fix it anyway. Fixes: 9711cd0dfc3fa414f7f64935713c07134ae67971 Fixes: 150ab54aa6934583180f88a2bd540bc6fc4fbff3 Cc: Jens Freimann Cc: Michael S. Tsirkin Signed-off-by: Markus Armbruster --- hw/net/virtio-net.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) Thanks Markus! Reviewed-by: Jens Freimann
Re: [PULL 0/5] i386 patches for QEMU 4.2-rc
On Wed, Nov 27, 2019 at 09:14:01AM +, Dr. David Alan Gilbert wrote: * Philippe Mathieu-Daudé (phi...@redhat.com) wrote: On 11/26/19 10:19 AM, no-re...@patchew.org wrote: > Patchew URL: https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/ > > This series failed the docker-quick@centos7 build test. Please find the testing commands and > their output below. If you have Docker installed, you can probably reproduce it > locally. > > === TEST SCRIPT BEGIN === > #!/bin/bash > make docker-image-centos7 V=1 NETWORK=1 > time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 > === TEST SCRIPT END === > >TESTcheck-unit: tests/test-thread-pool > wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 > ** > ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) > ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) > make: *** [check-qtest-aarch64] Error 1 Should we worry about this error? Interesting; that should be fixed by Jens' 284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending is still lying; it's showing we're still landing in 'wait-unplug' on aarch, because it's got a virtio-net by default; even though we've not got a failover device setup. CCing Jens. I've run this test on aarch64 in a loop today for a few hours but could not reproduce this error. One bug I found is that in primary_unplug_device() I look at the virtio guest feature bits instead of the negotiated bits. But I don't think this could lead to the above problem because even if the check for the feature bit fails, primary_unplug_pending would still return false because no primary device was set and n->primary_dev is NULL. I'll keep the test running until I can reproduce it. regards Jens
Re: [PULL 0/5] i386 patches for QEMU 4.2-rc
On Wed, Nov 27, 2019 at 09:14:01AM +, Dr. David Alan Gilbert wrote: * Philippe Mathieu-Daudé (phi...@redhat.com) wrote: On 11/26/19 10:19 AM, no-re...@patchew.org wrote: > Patchew URL: https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/ > > This series failed the docker-quick@centos7 build test. Please find the testing commands and > their output below. If you have Docker installed, you can probably reproduce it > locally. > > === TEST SCRIPT BEGIN === > #!/bin/bash > make docker-image-centos7 V=1 NETWORK=1 > time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 > === TEST SCRIPT END === > >TESTcheck-unit: tests/test-thread-pool > wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 > ** > ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) > ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) > make: *** [check-qtest-aarch64] Error 1 Should we worry about this error? Interesting; that should be fixed by Jens' 284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending is still lying; it's showing we're still landing in 'wait-unplug' on aarch, because it's got a virtio-net by default; even though we've not got a failover device setup. CCing Jens. Hmm, I did test it. I'm looking into it. regards Jens
[PATCH v3 2/4] net/virtio: return early when failover primary alread added
Bail out when primary device was already added before. This avoids printing a wrong warning message during reboot. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 946039c0dc..ac4d19109e 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error **errp) { Error *err = NULL; +if (n->primary_dev) { +return; +} + n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"), n->primary_device_id); if (n->primary_device_opts) { -- 2.21.0
[PATCH v3 3/4] net/virtio: fix re-plugging of primary device
failover_replug_primary was returning true on failure which lead to re-plug not working when a migration failed. Fix this by returning success when hotplug worked. This is a bug that was missed in last round of testing but was tested succesfully with this version. Also make sure we don't pass NULL to qdev_set_parent_bus(). This fixes CID 1407224. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index ac4d19109e..565dea0b93 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) n->primary_device_opts = qemu_opts_from_qdict( qemu_find_opts("device"), n->primary_device_dict, errp); -} -if (n->primary_device_opts) { -if (n->primary_dev) { -n->primary_bus = n->primary_dev->parent_bus; -} -qdev_set_parent_bus(n->primary_dev, n->primary_bus); -n->primary_should_be_hidden = false; -qemu_opt_set_bool(n->primary_device_opts, -"partially_hotplugged", true, errp); -hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); -if (hotplug_ctrl) { -hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); -hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +if (!n->primary_device_opts) { +error_setg(errp, "virtio_net: couldn't find primary device opts"); +goto out; } -if (!n->primary_dev) { +} +if (!n->primary_dev) { error_setg(errp, "virtio_net: couldn't find primary device"); -} +goto out; } -return *errp != NULL; + +n->primary_bus = n->primary_dev->parent_bus; +if (!n->primary_bus) { +error_setg(errp, "virtio_net: couldn't find primary bus"); +goto out; +} +qdev_set_parent_bus(n->primary_dev, n->primary_bus); +n->primary_should_be_hidden = false; +qemu_opt_set_bool(n->primary_device_opts, + "partially_hotplugged", true, errp); +hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); +if (hotplug_ctrl) { +hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); +hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +} + +out: +return *errp == NULL; } static void virtio_net_handle_migration_primary(VirtIONet *n, @@ -2852,7 +2860,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, warn_report("couldn't unplug primary device"); } } else if (migration_has_failed(s)) { -/* We already unplugged the device let's plugged it back */ +/* We already unplugged the device let's plug it back */ if (!failover_replug_primary(n, &err)) { if (err) { error_report_err(err); -- 2.21.0
[PATCH v3 1/4] net/virtio: fix dev_unplug_pending
.dev_unplug_pending is set up by virtio-net code indepent of failover support was set for the device or not. This gives a wrong result when we check for existing primary devices in migration code. Fix this by actually calling dev_unplug_pending() instead of just checking if the function pointer was set. When the feature was not negotiated dev_unplug_pending() will always return false. This prevents us from going into the wait-unplug state when there's no primary device present. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reported-by: Dr. David Alan Gilbert Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 3 +++ migration/savevm.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 97a5113f7e..946039c0dc 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIONet *n = VIRTIO_NET(vdev); +if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) { +return false; +} return n->primary_dev ? n->primary_dev->pending_deleted_event : false; } diff --git a/migration/savevm.c b/migration/savevm.c index 966a9c3bdb..a71b930b91 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void) int n = 0; QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { -if (se->vmsd && se->vmsd->dev_unplug_pending) { +if (se->vmsd && se->vmsd->dev_unplug_pending && +se->vmsd->dev_unplug_pending(se->opaque)) { n++; } } -- 2.21.0
[PATCH v3 4/4] net/virtio: return error when device_opts arg is NULL
This fixes CID 1407222. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 565dea0b93..3c31471026 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2880,9 +2880,12 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, QemuOpts *device_opts) { VirtIONet *n = container_of(listener, VirtIONet, primary_listener); -bool match_found; -bool hide; +bool match_found = false; +bool hide = false; +if (!device_opts) { +return -1; +} n->primary_device_dict = qemu_opts_to_qdict(device_opts, n->primary_device_dict); if (n->primary_device_dict) { @@ -2890,7 +2893,7 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict, "failover_pair_id")); } -if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) { +if (g_strcmp0(n->standby_id, n->netclient_name) == 0) { match_found = true; } else { match_found = false; -- 2.21.0
[PATCH v3 0/4] net/virtio: fixes for failover
This series fixes bugs found by coverity and one reported by David Gilbert. v2->v3: * change patch description and subject of patch 3/4 Jens Freimann (4): net/virtio: fix dev_unplug_pending net/virtio: return early when failover primary alread added net/virtio: fix re-plugging of primary device net/virtio: return error when device_opts arg is NULL hw/net/virtio-net.c | 58 + migration/savevm.c | 3 ++- 2 files changed, 40 insertions(+), 21 deletions(-) -- 2.21.0
Re: [PATCH v2 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus
On Wed, Nov 20, 2019 at 10:04:01AM -0500, Michael S. Tsirkin wrote: On Wed, Nov 20, 2019 at 03:38:58PM +0100, Jens Freimann wrote: Make sure we don't pass NULL to qdev_set_parent_bus(). Also simplify code a bit and fix a typo. This fixes CID 1407224. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann patch itself is ok I think Reviewed-by: Michael S. Tsirkin but some questions on the commit log --- hw/net/virtio-net.c | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index ac4d19109e..78f1e4e87c 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) n->primary_device_opts = qemu_opts_from_qdict( qemu_find_opts("device"), n->primary_device_dict, errp); -} -if (n->primary_device_opts) { -if (n->primary_dev) { -n->primary_bus = n->primary_dev->parent_bus; -} -qdev_set_parent_bus(n->primary_dev, n->primary_bus); -n->primary_should_be_hidden = false; -qemu_opt_set_bool(n->primary_device_opts, -"partially_hotplugged", true, errp); -hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); -if (hotplug_ctrl) { -hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); -hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +if (!n->primary_device_opts) { +error_setg(errp, "virtio_net: couldn't find primary device opts"); +goto out; } -if (!n->primary_dev) { +} +if (!n->primary_dev) { error_setg(errp, "virtio_net: couldn't find primary device"); -} +goto out; } -return *errp != NULL; + +n->primary_bus = n->primary_dev->parent_bus; +if (!n->primary_bus) { +error_setg(errp, "virtio_net: couldn't find primary bus"); +goto out; +} +qdev_set_parent_bus(n->primary_dev, n->primary_bus); +n->primary_should_be_hidden = false; +qemu_opt_set_bool(n->primary_device_opts, + "partially_hotplugged", true, errp); +hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); +if (hotplug_ctrl) { +hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); +hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +} + +out: +return *errp == NULL; } static void virtio_net_handle_migration_primary(VirtIONet *n, So the logic actually was inverted here? Then ... how did this work? and how was it tested? It did not work and I missed the error in my test output. I tested it now manually by migrating and then cancelling. Device was added back to source VM successfully. Will sent new version with better patch description. Thanks! regards Jens
[PATCH v2 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus
Make sure we don't pass NULL to qdev_set_parent_bus(). Also simplify code a bit and fix a typo. This fixes CID 1407224. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann --- hw/net/virtio-net.c | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index ac4d19109e..78f1e4e87c 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) n->primary_device_opts = qemu_opts_from_qdict( qemu_find_opts("device"), n->primary_device_dict, errp); -} -if (n->primary_device_opts) { -if (n->primary_dev) { -n->primary_bus = n->primary_dev->parent_bus; -} -qdev_set_parent_bus(n->primary_dev, n->primary_bus); -n->primary_should_be_hidden = false; -qemu_opt_set_bool(n->primary_device_opts, -"partially_hotplugged", true, errp); -hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); -if (hotplug_ctrl) { -hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); -hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +if (!n->primary_device_opts) { +error_setg(errp, "virtio_net: couldn't find primary device opts"); +goto out; } -if (!n->primary_dev) { +} +if (!n->primary_dev) { error_setg(errp, "virtio_net: couldn't find primary device"); -} +goto out; } -return *errp != NULL; + +n->primary_bus = n->primary_dev->parent_bus; +if (!n->primary_bus) { +error_setg(errp, "virtio_net: couldn't find primary bus"); +goto out; +} +qdev_set_parent_bus(n->primary_dev, n->primary_bus); +n->primary_should_be_hidden = false; +qemu_opt_set_bool(n->primary_device_opts, + "partially_hotplugged", true, errp); +hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev); +if (hotplug_ctrl) { +hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); +hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); +} + +out: +return *errp == NULL; } static void virtio_net_handle_migration_primary(VirtIONet *n, @@ -2852,7 +2860,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, warn_report("couldn't unplug primary device"); } } else if (migration_has_failed(s)) { -/* We already unplugged the device let's plugged it back */ +/* We already unplugged the device let's plug it back */ if (!failover_replug_primary(n, &err)) { if (err) { error_report_err(err); -- 2.21.0
[PATCH v2 0/4] net/virtio: fixes for failover
This series fixes bugs found by coverity and one reported by David Gilbert. Jens Freimann (4): net/virtio: fix dev_unplug_pending net/virtio: return early when failover primary alread added net/virtio: avoid passing NULL to qdev_set_parent_bus net/virtio: return error when device_opts arg is NULL hw/net/virtio-net.c | 58 + migration/savevm.c | 3 ++- 2 files changed, 40 insertions(+), 21 deletions(-) -- 2.21.0
[PATCH v2 4/4] net/virtio: return error when device_opts arg is NULL
This fixes CID 1407222. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 78f1e4e87c..600c6e164f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2880,9 +2880,12 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, QemuOpts *device_opts) { VirtIONet *n = container_of(listener, VirtIONet, primary_listener); -bool match_found; -bool hide; +bool match_found = false; +bool hide = false; +if (!device_opts) { +return -1; +} n->primary_device_dict = qemu_opts_to_qdict(device_opts, n->primary_device_dict); if (n->primary_device_dict) { @@ -2890,7 +2893,7 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict, "failover_pair_id")); } -if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) { +if (g_strcmp0(n->standby_id, n->netclient_name) == 0) { match_found = true; } else { match_found = false; -- 2.21.0
[PATCH v2 2/4] net/virtio: return early when failover primary alread added
Bail out when primary device was already added before. This avoids printing a wrong warning message during reboot. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 946039c0dc..ac4d19109e 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error **errp) { Error *err = NULL; +if (n->primary_dev) { +return; +} + n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"), n->primary_device_id); if (n->primary_device_opts) { -- 2.21.0
[PATCH v2 1/4] net/virtio: fix dev_unplug_pending
.dev_unplug_pending is set up by virtio-net code indepent of failover support was set for the device or not. This gives a wrong result when we check for existing primary devices in migration code. Fix this by actually calling dev_unplug_pending() instead of just checking if the function pointer was set. When the feature was not negotiated dev_unplug_pending() will always return false. This prevents us from going into the wait-unplug state when there's no primary device present. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reported-by: Dr. David Alan Gilbert Reviewed-by: Michael S. Tsirkin --- hw/net/virtio-net.c | 3 +++ migration/savevm.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 97a5113f7e..946039c0dc 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIONet *n = VIRTIO_NET(vdev); +if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) { +return false; +} return n->primary_dev ? n->primary_dev->pending_deleted_event : false; } diff --git a/migration/savevm.c b/migration/savevm.c index 966a9c3bdb..a71b930b91 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void) int n = 0; QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { -if (se->vmsd && se->vmsd->dev_unplug_pending) { +if (se->vmsd && se->vmsd->dev_unplug_pending && +se->vmsd->dev_unplug_pending(se->opaque)) { n++; } } -- 2.21.0
Re: [PATCH 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus
On Wed, Nov 20, 2019 at 05:46:36AM -0500, Michael S. Tsirkin wrote: On Thu, Nov 14, 2019 at 03:16:12PM +0100, Jens Freimann wrote: Make sure no arguments for qdev_set_parent_bus are NULL. This fixes CID 1407224. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann --- hw/net/virtio-net.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index ac4d19109e..81650d4dc0 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2809,8 +2809,16 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) if (n->primary_device_opts) { if (n->primary_dev) { n->primary_bus = n->primary_dev->parent_bus; +if (n->primary_bus) { +qdev_set_parent_bus(n->primary_dev, n->primary_bus); +} else { +error_setg(errp, "virtio_net: couldn't set bus for primary"); +goto out; +} +} else { +error_setg(errp, "virtio_net: couldn't find primary device"); +goto out; } A bit less messy: if (!n->primary_dev) { error_setg(errp, "virtio_net: couldn't find primary device"); goto err; } n->primary_bus = n->primary_dev->parent_bus; if (!n->primary_bus) { error_setg(errp, "virtio_net: couldn't find primary device"); goto err; } qdev_set_parent_bus(n->primary_dev, n->primary_bus); err: return false; also is it valid for primary_device_opts to not be present at all? Maybe we should check that too. -qdev_set_parent_bus(n->primary_dev, n->primary_bus); n->primary_should_be_hidden = false; qemu_opt_set_bool(n->primary_device_opts, "partially_hotplugged", true, errp); @@ -2819,10 +2827,8 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); } -if (!n->primary_dev) { -error_setg(errp, "virtio_net: couldn't find primary device"); -} } +out: return *errp != NULL; } I'd handle errors separately from good path. BTW I don't understand something here: *errp != NULL is true on error, no? Why are we returning success? Confused. Just return true would be cleaner imho. I'll fix this and sent a new version as a single patch. As you said this is not really a series, just individual patches. Thanks! regards Jens
[PATCH 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus
Make sure no arguments for qdev_set_parent_bus are NULL. This fixes CID 1407224. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann --- hw/net/virtio-net.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index ac4d19109e..81650d4dc0 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2809,8 +2809,16 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) if (n->primary_device_opts) { if (n->primary_dev) { n->primary_bus = n->primary_dev->parent_bus; +if (n->primary_bus) { +qdev_set_parent_bus(n->primary_dev, n->primary_bus); +} else { +error_setg(errp, "virtio_net: couldn't set bus for primary"); +goto out; +} +} else { +error_setg(errp, "virtio_net: couldn't find primary device"); +goto out; } -qdev_set_parent_bus(n->primary_dev, n->primary_bus); n->primary_should_be_hidden = false; qemu_opt_set_bool(n->primary_device_opts, "partially_hotplugged", true, errp); @@ -2819,10 +2827,8 @@ static bool failover_replug_primary(VirtIONet *n, Error **errp) hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp); hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp); } -if (!n->primary_dev) { -error_setg(errp, "virtio_net: couldn't find primary device"); -} } +out: return *errp != NULL; } -- 2.21.0
[PATCH 4/4] net/virtio: return error when device_opts arg is NULL
device_opts could be NULL. Make sure we don't pass it to qemu_opts_to_dict. When we made sure it can't be NULL we can also remove it from the if condition. This fixes CID 1407222. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann --- hw/net/virtio-net.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 81650d4dc0..d53aa5796b 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2878,9 +2878,12 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, QemuOpts *device_opts) { VirtIONet *n = container_of(listener, VirtIONet, primary_listener); -bool match_found; -bool hide; +bool match_found = false; +bool hide = false; +if (!device_opts) { +return -1; +} n->primary_device_dict = qemu_opts_to_qdict(device_opts, n->primary_device_dict); if (n->primary_device_dict) { @@ -2888,7 +2891,7 @@ static int virtio_net_primary_should_be_hidden(DeviceListener *listener, n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict, "failover_pair_id")); } -if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) { +if (g_strcmp0(n->standby_id, n->netclient_name) == 0) { match_found = true; } else { match_found = false; -- 2.21.0
[PATCH 2/4] net/virtio: return early when failover primary alread added
Bail out when primary device was already added before. This avoids printing a wrong warning message during reboot. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann --- hw/net/virtio-net.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 946039c0dc..ac4d19109e 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error **errp) { Error *err = NULL; +if (n->primary_dev) { +return; +} + n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"), n->primary_device_id); if (n->primary_device_opts) { -- 2.21.0
[PATCH 1/4] net/virtio: fix dev_unplug_pending
.dev_unplug_pending is set up by virtio-net code indepent of whether failover=on was set for the device or not. This gives a wrong result when we check for existing primary devices in migration code. Fix this by actually calling dev_unplug_pending() instead of just checking if the function pointer was set. When the feature was not negotiated dev_unplug_pending() will always return false. This prevents us from going into the wait-unplug state when there's no primary device present. Fixes: 9711cd0dfc3f ("net/virtio: add failover support") Signed-off-by: Jens Freimann Reported-by: Dr. David Alan Gilbert --- hw/net/virtio-net.c | 3 +++ migration/savevm.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 97a5113f7e..946039c0dc 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIONet *n = VIRTIO_NET(vdev); +if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) { +return false; +} return n->primary_dev ? n->primary_dev->pending_deleted_event : false; } diff --git a/migration/savevm.c b/migration/savevm.c index 966a9c3bdb..a71b930b91 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void) int n = 0; QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { -if (se->vmsd && se->vmsd->dev_unplug_pending) { +if (se->vmsd && se->vmsd->dev_unplug_pending && +se->vmsd->dev_unplug_pending(se->opaque)) { n++; } } -- 2.21.0
[PATCH] vfio: don't ignore return value of migrate_add_blocker
When an error occurs in migrate_add_blocker() it sets a negative return value and uses error pointer we pass in. Instead of just looking at the error pointer check for a negative return value and avoid a coverity error because the return value is set but never used. This fixes CID 1407219. Fixes: f045a0104c8c ("vfio: unplug failover primary device before migration") Signed-off-by: Jens Freimann --- hw/vfio/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index e6569a7968..ed01774673 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2737,7 +2737,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) error_setg(&vdev->migration_blocker, "VFIO device doesn't support migration"); ret = migrate_add_blocker(vdev->migration_blocker, &err); -if (err) { +if (ret) { error_propagate(errp, err); error_free(vdev->migration_blocker); return; -- 2.21.0
Re: [PATCH] tests/migration: Print some debug on bad status
On Fri, Nov 08, 2019 at 06:38:17PM +, Dr. David Alan Gilbert wrote: Hi Jens, the unplug failover stuff is triggering an assertion occasionally on aarch64; but a) I'm not sure the right way to fix it b) And I'm out for a little over a week so... I'll take care of it. * no-re...@patchew.org (no-re...@patchew.org) wrote: Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilb...@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === TESTcheck-unit: tests/test-bdrv-drain wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 In tests/migration-test.c we've got wait_for_migration_fail, and it's expecting the state to be any one of: setup, failed or maybe active but it's getting surprised by seeing a 'wait-unplug' So the question is should we see a wait-unplug? the migration code has: if (qemu_savevm_nr_failover_devices()) { migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_WAIT_UNPLUG); Should qemu_savevm_nr_failover_devices() be true? On aarch64 it seems to have a virtio-net device by default and qemu_savevm_nr_failover_devices() checks for devices having dev_unplug_pending but doesn't call it. I see two fixes but am not sure which is right: a) Add 'wait-unplug' to the wait_for_migration_fail (easy) b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices so that on a guest which has a virtio-net, but no failover device, the state isn't entered. I also prefer b over a, but I should only set up dev_unplug_pending when failover=on to avoid this. I'll send a patch. regards Jens
Re: [PATCH v6 06/11] qapi: add failover negotiated event
On Tue, Oct 29, 2019 at 06:41:58PM -0400, Michael S. Tsirkin wrote: On Fri, Oct 25, 2019 at 07:39:21PM +0200, Jens Freimann wrote: On Fri, Oct 25, 2019 at 04:03:54PM +0200, Markus Armbruster wrote: > Bear with me, I know next to nothing about failover. > > Jens Freimann writes: > > > This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature > > was enabled. The primary device this virtio-net device is associated > > with, will now be hotplugged via qdev_device_add(). > > Passive voice deftly avoids telling the reader who will do the > hot-plugging. Intentional? Not really, it's in the comment to the event. The hotplug will be done by the virtio-net device code that activates the feature, in virtio_net_set_features(). > > > Signed-off-by: Jens Freimann > > Acked-by: Cornelia Huck > > --- > > qapi/net.json | 19 +++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/qapi/net.json b/qapi/net.json > > index 728990f4fb..ea64f7 100644 > > --- a/qapi/net.json > > +++ b/qapi/net.json > > @@ -737,3 +737,22 @@ > > ## > > { 'command': 'announce-self', 'boxed': true, > >'data' : 'AnnounceParameters'} > > + > > +## > > +# @FAILOVER_NEGOTIATED: > > +# > > +# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation. > > +# Failover primary devices which were hidden (not hotplugged when requested) > > +# before will now be hotplugged by the virtio-net standby device. > > +# > > +# device-id: QEMU device id of the unplugged device > > @device-id is new since v5. > > A quick skim of > https://www.kernel.org/doc/html/latest/networking/net_failover.html > tells me there are three devices involved: master, primary slave, > standby slave. Which one is @device-id? Or am I confused? Yes, the device-id is new and it's the device-id of the standby (i.e. virtio-net) device. regards, Jens And now I am confused. How is standby "the unplugged device"? Why not just say "the standby device"? Yes, that's better. Do you want a patch on top? diff --git a/qapi/net.json b/qapi/net.json index ea64f7..0f225cb900 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -745,7 +745,7 @@ # Failover primary devices which were hidden (not hotplugged when requested) # before will now be hotplugged by the virtio-net standby device. # -# device-id: QEMU device id of the unplugged device +# device-id: QEMU device id of the standby device # Since: 4.2 # # Example: Signed-off-by: Jens Freimann regards, Jens -- MST
Re: [PATCH v7 05/11] qapi: add unplug primary event
On Tue, Oct 29, 2019 at 01:50:02PM +0100, Markus Armbruster wrote: Jens Freimann writes: This event is emitted when we sent a request to unplug a Uh, "we sent a requestion [...] from the Guest OS"... do you mean "we received"? No, we sent a pci hotplug event to the guest by "pushing" the pcie attention button. So, it's QEMU requesting the unplug. +## +# @UNPLUG_PRIMARY: +# +# Emitted from source side of a migration when migration state is +# WAIT_UNPLUG. Device was unplugged by guest operating system. +# Device resources in QEMU are kept on standby to be able to re-plug it in case +# of migration failure. +# +# @device-id: QEMU device id of the unplugged device +# +# Since: 4.2 +# +# Example: +# {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} } +# +## +{ 'event': 'UNPLUG_PRIMARY', + 'data': { 'device-id': 'str' } } Commit message might need a touch-up. Regardless: Acked-by: Markus Armbruster Thanks for the review! regards, Jens
[PATCH v7 11/11] vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann Acked-by: Alex Williamson --- hw/vfio/pci.c | 26 -- hw/vfio/pci.h | 1 + 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 12fac39804..e6569a7968 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -40,6 +40,7 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "migration/blocker.h" #define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) @@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) return; } +if (!pdev->failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +error_free(vdev->migration_blocker); +return; +} +} + vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev); vdev->vbasedev.ops = &vfio_pci_ops; vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI; @@ -3008,6 +3020,10 @@ out_teardown: vfio_bars_exit(vdev); error: error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} } static void vfio_instance_finalize(Object *obj) @@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj) vfio_bars_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} /* * XXX Leaking igd_opregion is not an oversight, we can't remove the * fw_cfg entry therefore leaking this allocation seems like the safest @@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_END_OF_LIST(), }; -static const VMStateDescription vfio_pci_vmstate = { -.name = "vfio-pci", -.unmigratable = 1, -}; - static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) dc->reset = vfio_pci_reset; dc->props = vfio_pci_dev_properties; -dc->vmsd = &vfio_pci_vmstate; dc->desc = "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize = vfio_realize; diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 834a90d646..b329d50338 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice { bool no_vfio_ioeventfd; bool enable_ramfb; VFIODisplay *dpy; +Error *migration_blocker; } VFIOPCIDevice; uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); -- 2.21.0
[PATCH v7 09/11] libqos: tolerate wait-unplug migration state
Signed-off-by: Jens Freimann --- tests/libqos/libqos.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c index d71557c5cb..f229eb2cb8 100644 --- a/tests/libqos/libqos.c +++ b/tests/libqos/libqos.c @@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri) break; } -if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) { +if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0) +|| (strcmp(st, "wait-unplug") == 0)) { qobject_unref(rsp); g_usleep(5000); continue; -- 2.21.0
[PATCH v7 08/11] migration: add new migration state wait-unplug
This patch adds a new migration state called wait-unplug. It is entered after the SETUP state if failover devices are present. It will transition into ACTIVE once all devices were succesfully unplugged from the guest. So if a guest doesn't respond or takes long to honor the unplug request the user will see the migration state 'wait-unplug'. In the migration thread we query failover devices if they're are still pending the guest unplug. When all are unplugged the migration continues. If one device won't unplug migration will stay in wait_unplug state. Signed-off-by: Jens Freimann Reviewed-by: Dr. David Alan Gilbert --- include/migration/vmstate.h | 2 ++ migration/migration.c | 21 + migration/migration.h | 3 +++ migration/savevm.c | 31 +++ migration/savevm.h | 2 ++ qapi/migration.json | 5 - 6 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index b9ee563aa4..ac4f46a67d 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -186,6 +186,8 @@ struct VMStateDescription { int (*pre_save)(void *opaque); int (*post_save)(void *opaque); bool (*needed)(void *opaque); +bool (*dev_unplug_pending)(void *opaque); + const VMStateField *fields; const VMStateDescription **subsections; }; diff --git a/migration/migration.c b/migration/migration.c index 4133ed2684..354ad072fa 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -52,6 +52,7 @@ #include "hw/qdev-properties.h" #include "monitor/monitor.h" #include "net/announce.h" +#include "qemu/queue.h" #define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */ @@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state) case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return true; default: @@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info) case MIGRATION_STATUS_CANCELLED: info->has_status = true; break; +case MIGRATION_STATUS_WAIT_UNPLUG: +info->has_status = true; +break; } info->status = s->state; } @@ -1694,6 +1699,7 @@ bool migration_is_idle(void) case MIGRATION_STATUS_COLO: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return false; case MIGRATION_STATUS__MAX: g_assert_not_reached(); @@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque) qemu_savevm_state_setup(s->to_dst_file); +if (qemu_savevm_nr_failover_devices()) { +migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, + MIGRATION_STATUS_WAIT_UNPLUG); + +while (s->state == MIGRATION_STATUS_WAIT_UNPLUG && + qemu_savevm_state_guest_unplug_pending()) { +qemu_sem_timedwait(&s->wait_unplug_sem, 250); +} + +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, +MIGRATION_STATUS_ACTIVE); +} + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); @@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj) qemu_mutex_destroy(&ms->qemu_file_lock); g_free(params->tls_hostname); g_free(params->tls_creds); +qemu_sem_destroy(&ms->wait_unplug_sem); qemu_sem_destroy(&ms->rate_limit_sem); qemu_sem_destroy(&ms->pause_sem); qemu_sem_destroy(&ms->postcopy_pause_sem); @@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj) qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); qemu_sem_init(&ms->rp_state.rp_sem, 0); qemu_sem_init(&ms->rate_limit_sem, 0); +qemu_sem_init(&ms->wait_unplug_sem, 0); qemu_mutex_init(&ms->qemu_file_lock); } diff --git a/migration/migration.h b/migration/migration.h index 4f2fe193dc..79b3dda146 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -206,6 +206,9 @@ struct MigrationState /* Flag set once the migration thread called bdrv_inactivate_all */ bool block_inactive; +/* Migration is waiting for guest to unplug device */ +QemuSemaphore wait_unplug_sem; + /* Migration is paused due to pause-before-switchover */ QemuSemaphore pause_sem; diff --git a/migration/savevm.c b/migration/savevm.c index 8d95e261f6..966a9c3bdb 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1113,6 +1113,37 @@ void qemu_savevm_state_header(QEMUFile *f) } } +int qemu_sav
[PATCH v7 06/11] qapi: add failover negotiated event
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature is enabled. The primary device this virtio-net (standby) device is associated with, is now hotplugged by the virtio-net device. Signed-off-by: Jens Freimann --- qapi/net.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 728990f4fb..ea64f7 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -737,3 +737,22 @@ ## { 'command': 'announce-self', 'boxed': true, 'data' : 'AnnounceParameters'} + +## +# @FAILOVER_NEGOTIATED: +# +# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation. +# Failover primary devices which were hidden (not hotplugged when requested) +# before will now be hotplugged by the virtio-net standby device. +# +# device-id: QEMU device id of the unplugged device +# Since: 4.2 +# +# Example: +# +# <- { "event": "FAILOVER_NEGOTIATED", +# "data": "net1" } +# +## +{ 'event': 'FAILOVER_NEGOTIATED', + 'data': {'device-id': 'str'} } -- 2.21.0
[PATCH v7 05/11] qapi: add unplug primary event
This event is emitted when we sent a request to unplug a failover primary device from the Guest OS and it includes the device id of the primary device. Signed-off-by: Jens Freimann --- qapi/migration.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index 82feb5bd39..e9e7a97c03 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1448,3 +1448,22 @@ # Since: 3.0 ## { 'command': 'migrate-pause', 'allow-oob': true } + +## +# @UNPLUG_PRIMARY: +# +# Emitted from source side of a migration when migration state is +# WAIT_UNPLUG. Device was unplugged by guest operating system. +# Device resources in QEMU are kept on standby to be able to re-plug it in case +# of migration failure. +# +# @device-id: QEMU device id of the unplugged device +# +# Since: 4.2 +# +# Example: +# {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} } +# +## +{ 'event': 'UNPLUG_PRIMARY', + 'data': { 'device-id': 'str' } } -- 2.21.0
[PATCH v7 10/11] net/virtio: add failover support
This patch adds support to handle failover device pairs of a virtio-net device and a (vfio-)pci device, where the virtio-net acts as the standby device and the (vfio-)pci device as the primary. The general idea is that we have a pair of devices, a (vfio-)pci and a emulated (virtio-net) device. Before migration the vfio device is unplugged and data flows to the emulated device, on the target side another (vfio-)pci device is plugged in to take over the data-path. In the guest the net_failover module will pair net devices with the same MAC address. To achieve this we need: 1. Provide a callback function for the should_be_hidden DeviceListener. It is called when the primary device is plugged in. Evaluate the QOpt passed in to check if it is the matching primary device. It returns if the device should be hidden or not. When it should be hidden it stores the device options in the VirtioNet struct and the device is added once the VIRTIO_NET_F_STANDBY feature is negotiated during virtio feature negotiation. If the virtio-net devices are not realized at the time the (vfio-)pci devices are realized, we need to connect the devices later. This way we make sure primary and standby devices can be specified in any order. 2. Register a callback for migration status notifier. When called it will unplug its primary device before the migration happens. 3. Register a callback for the migration code that checks if a device needs to be unplugged from the guest. Signed-off-by: Jens Freimann --- MAINTAINERS| 1 + docs/virtio-net-failover.rst | 68 hw/net/virtio-net.c| 302 + include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + 5 files changed, 384 insertions(+) create mode 100644 docs/virtio-net-failover.rst diff --git a/MAINTAINERS b/MAINTAINERS index 42e702f346..9d94cbacf8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1421,6 +1421,7 @@ S: Odd Fixes F: hw/net/ F: include/hw/net/ F: tests/virtio-net-test.c +F: docs/virtio-net-failover.rst T: git https://github.com/jasowang/qemu.git net Parallel NOR Flash devices diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst new file mode 100644 index 00..22f64c7bc8 --- /dev/null +++ b/docs/virtio-net-failover.rst @@ -0,0 +1,68 @@ + +QEMU virtio-net standby (net_failover) + + +This document explains the setup and usage of virtio-net standby feature which +is used to create a net_failover pair of devices. + +The general idea is that we have a pair of devices, a (vfio-)pci and a +virtio-net device. Before migration the vfio device is unplugged and data flows +through the virtio-net device, on the target side another vfio-pci device is +plugged in to take over the data-path. In the guest the net_failover kernel +module will pair net devices with the same MAC address. + +The two devices are called primary and standby device. The fast hardware based +networking device is called the primary device and the virtio-net device is the +standby device. + +Restrictions + + +Currently only PCIe devices are allowed as primary devices, this restriction +can be lifted in the future with enhanced QEMU support. Also, only networking +devices are allowed as primary device. The user needs to ensure that primary +and standby devices are not plugged into the same PCIe slot. + +Usecase +--- + + Virtio-net standby allows easy migration while using a passed-through fast + networking device by falling back to a virtio-net device for the duration of + the migration. It is like a simple version of a bond, the difference is that it + requires no configuration in the guest. When a guest is live-migrated to + another host QEMU will unplug the primary device via the PCIe based hotplug + handler and traffic will go through the virtio-net device. On the target + system the primary device will be automatically plugged back and the + net_failover module registers it again as the primary device. + +Usage +- + + The primary device can be hotplugged or be part of the startup configuration + + -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \ +bus=root2,failover=on + + With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be enabled. + + -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 + + failover_pair_id references the id of the virtio-net standby device. This + is only for pairing the devices within QEMU. The guest kernel module + net_failover will match devices with identical MAC addresses. + +Hotplug +--- + + Both primary and standby device can be hotplugged via the QEMU monitor. Note + that if the virtio-net device is plugged first a warning will be issued that it + couldn't find the primary device. + +Migration +- + + A new migration state wait-unplug was added for
[PATCH v7 04/11] pci: mark device having guest unplug request pending
Set pending_deleted_event in DeviceState for failover primary devices that were successfully unplugged by the Guest OS. Signed-off-by: Jens Freimann --- hw/pci/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 19363ff8ce..08718188bb 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); if (dev->partially_hotplugged) { +dev->qdev.pending_deleted_event = false; return; } hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); @@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev, return; } +dev->pending_deleted_event = true; + /* In case user cancel the operation of multi-function hot-add, * remove the function that is unexposed to guest individually, * without interaction with guest. -- 2.21.0
[PATCH v7 03/11] pci: mark devices partially unplugged
Only the guest unplug request was triggered. This is needed for the failover feature. In case of a failed migration we need to plug the device back to the guest. Signed-off-by: Jens Freimann --- hw/pci/pcie.c| 3 +++ include/hw/pci/pci.h | 1 + 2 files changed, 4 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index a6beb567bd..19363ff8ce 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) { HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); +if (dev->partially_hotplugged) { +return; +} hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); object_unparent(OBJECT(dev)); } diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 69d1f0228b..db75c6dfd0 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache; struct PCIDevice { DeviceState qdev; +bool partially_hotplugged; /* PCI config space */ uint8_t *config; -- 2.21.0
[PATCH v7 07/11] migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figured out what works. For failover primary devices qdev_unplug() is called from the migration handler, i.e. during migration. This patch adds a flag to DeviceState which is set to false for all devices and makes an exception for PCI devices that are also primary devices in a failover pair. Signed-off-by: Jens Freimann --- hw/core/qdev.c | 1 + hw/pci/pci.c | 1 + include/hw/qdev-core.h | 1 + qdev-monitor.c | 2 +- 4 files changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 3b8d43d0fd..cf1ba28fe3 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -996,6 +996,7 @@ static void device_initfn(Object *obj) dev->instance_id_alias = -1; dev->realized = false; +dev->allow_unplug_during_migration = false; object_property_add_bool(obj, "realized", device_get_realized, device_set_realized, NULL); diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 824ab4ed7b..c68498c0de 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2130,6 +2130,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) pci_qdev_unrealize(DEVICE(pci_dev), NULL); return; } +qdev->allow_unplug_during_migration = true; } /* rom loading */ diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 710981af36..1518495b1e 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -156,6 +156,7 @@ struct DeviceState { bool pending_deleted_event; QemuOpts *opts; int hotplugged; +bool allow_unplug_during_migration; BusState *parent_bus; QLIST_HEAD(, NamedGPIOList) gpios; QLIST_HEAD(, BusState) child_bus; diff --git a/qdev-monitor.c b/qdev-monitor.c index ffa08c670f..e6b112eb0a 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) return; } -if (!migration_is_idle()) { +if (!migration_is_idle() && !dev->allow_unplug_during_migration) { error_setg(errp, "device_del not allowed while migrating"); return; } -- 2.21.0
[PATCH v7 02/11] pci: add option for net failover
This patch adds a failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. The requirement for PCIe is because it doesn't support other hotplug controllers at the moment. The failover functionality can be added to other hotplug controllers like ACPI, SHCP,... later on. Signed-off-by: Jens Freimann --- hw/pci/pci.c | 31 +++ include/hw/pci/pci.h | 3 +++ 2 files changed, 34 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..824ab4ed7b 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -75,6 +75,8 @@ static Property pci_props[] = { QEMU_PCIE_LNKSTA_DLLLA_BITNR, true), DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present, QEMU_PCIE_EXTCAP_INIT_BITNR, true), +DEFINE_PROP_STRING("failover_pair_id", PCIDevice, + failover_pair_id), DEFINE_PROP_END_OF_LIST() }; @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) ObjectClass *klass = OBJECT_CLASS(pc); Error *local_err = NULL; bool is_default_rom; +uint16_t class_id; /* initialize cap_present for pci_is_express() and pci_config_size(), * Note that hybrid PCIs are not set automatically and need to manage @@ -2101,6 +2104,34 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->failover_pair_id) { +if (!pci_bus_is_express(pci_get_bus(pci_dev))) { +error_setg(errp, "failover primary device must be on " + "PCIExpress bus"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE); +if (class_id != PCI_CLASS_NETWORK_ETHERNET) { +error_setg(errp, "failover primary device is not an " + "Ethernet device"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) +&& (PCI_FUNC(pci_dev->devfn) == 0)) { +qdev->allow_unplug_during_migration = true; +} else { +error_setg(errp, "failover: primary device must be in its own " + "PCI slot"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +} + /* rom loading */ is_default_rom = false; if (pci_dev->romfile == NULL && pc->romfile != NULL) { diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index f3f0ffd5fb..69d1f0228b 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -352,6 +352,9 @@ struct PCIDevice { MSIVectorUseNotifier msix_vector_use_notifier; MSIVectorReleaseNotifier msix_vector_release_notifier; MSIVectorPollNotifier msix_vector_poll_notifier; + +/* ID of standby device in net_failover pair */ +char *failover_pair_id; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, -- 2.21.0
[PATCH v7 0/11] add failover feature for assigned network devices
-qmp unix:/tmp/qmp.socket,server,nowait \ -monitor telnet:127.0.0.1:,server,nowait \ -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \ -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \ -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \ -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \ -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id =net1 \ /root/rhel-guest-image-8.0-1781.x86_64.qcow2 I'm grateful for any remarks or ideas! Thanks! regards, Jens Jens Freimann (11): qdev/qbus: add hidden device support pci: add option for net failover pci: mark devices partially unplugged pci: mark device having guest unplug request pending qapi: add unplug primary event qapi: add failover negotiated event migration: allow unplug during migration for failover devices migration: add new migration state wait-unplug libqos: tolerate wait-unplug migration state net/virtio: add failover support vfio: unplug failover primary device before migration MAINTAINERS| 1 + docs/virtio-net-failover.rst | 68 hw/core/qdev.c | 25 +++ hw/net/virtio-net.c| 302 + hw/pci/pci.c | 32 hw/pci/pcie.c | 6 + hw/vfio/pci.c | 26 ++- hw/vfio/pci.h | 1 + include/hw/pci/pci.h | 4 + include/hw/qdev-core.h | 30 include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + include/migration/vmstate.h| 2 + migration/migration.c | 21 +++ migration/migration.h | 3 + migration/savevm.c | 33 migration/savevm.h | 2 + qapi/migration.json| 24 ++- qapi/net.json | 19 +++ qdev-monitor.c | 43 - tests/libqos/libqos.c | 3 +- vl.c | 6 +- 22 files changed, 649 insertions(+), 15 deletions(-) create mode 100644 docs/virtio-net-failover.rst -- 2.21.0 *** BLURB HERE *** Jens Freimann (11): qdev/qbus: add hidden device support pci: add option for net failover pci: mark devices partially unplugged pci: mark device having guest unplug request pending qapi: add unplug primary event qapi: add failover negotiated event migration: allow unplug during migration for failover devices migration: add new migration state wait-unplug libqos: tolerate wait-unplug migration state net/virtio: add failover support vfio: unplug failover primary device before migration MAINTAINERS| 1 + docs/virtio-net-failover.rst | 68 hw/core/qdev.c | 25 +++ hw/net/virtio-net.c| 302 + hw/pci/pci.c | 32 hw/pci/pcie.c | 6 + hw/vfio/pci.c | 26 ++- hw/vfio/pci.h | 1 + include/hw/pci/pci.h | 4 + include/hw/qdev-core.h | 30 include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + include/migration/vmstate.h| 2 + migration/migration.c | 21 +++ migration/migration.h | 3 + migration/savevm.c | 31 migration/savevm.h | 2 + qapi/migration.json| 24 ++- qapi/net.json | 19 +++ qdev-monitor.c | 43 - tests/libqos/libqos.c | 3 +- vl.c | 6 +- 22 files changed, 647 insertions(+), 15 deletions(-) create mode 100644 docs/virtio-net-failover.rst -- 2.21.0
[PATCH v7 01/11] qdev/qbus: add hidden device support
This adds support for hiding a device to the qbus and qdev APIs. The first user of this will be the virtio-net failover feature but the API introduced with this patch could be used to implement other features as well, for example hiding pci devices when a pci bus is powered off. qdev_device_add() is modified to check for a failover_pair_id argument in the option string. A DeviceListener callback should_be_hidden() is added. It can be used by a standby device to inform qdev that this device should not be added now. The standby device handler can store the device options to plug the device in at a later point in time. One reason for hiding the device is that we don't want to expose both devices to the guest kernel until the respective virtio feature bit VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be handled correctly by the guest. More information on the kernel feature this is using: https://www.kernel.org/doc/html/latest/networking/net_failover.html An example where the primary device is a vfio-pci device and the standby device is a virtio-net device: A device is hidden when it has an "failover_pair_id" option, e.g. -device virtio-net-pci,...,failover=on,... -device vfio-pci,...,failover_pair_id=net1,... Signed-off-by: Jens Freimann --- hw/core/qdev.c | 24 include/hw/qdev-core.h | 29 + qdev-monitor.c | 41 + vl.c | 6 -- 4 files changed, 94 insertions(+), 6 deletions(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index cbad6c1d55..3b8d43d0fd 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -212,6 +212,30 @@ void device_listener_unregister(DeviceListener *listener) QTAILQ_REMOVE(&device_listeners, listener, link); } +bool qdev_should_hide_device(QemuOpts *opts) +{ +int rc = -1; +DeviceListener *listener; + +QTAILQ_FOREACH(listener, &device_listeners, link) { + if (listener->should_be_hidden) { +/* + * should_be_hidden_will return + * 1 if device matches opts and it should be hidden + * 0 if device matches opts and should not be hidden + * -1 if device doesn't match ops + */ +rc = listener->should_be_hidden(listener, opts); +} + +if (rc > 0) { +break; +} +} + +return rc > 0; +} + void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, int required_for_version) { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index aa123f88cb..710981af36 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp); * respective parent types. * * + * + * # Hiding a device # + * To hide a device, a DeviceListener function should_be_hidden() needs to + * be registered. + * It can be used to defer adding a device and therefore hide it from the + * guest. The handler registering to this DeviceListener can save the QOpts + * passed to it for re-using it later and must return that it wants the device + * to be/remain hidden or not. When the handler function decides the device + * shall not be hidden it will be added in qdev_device_add() and + * realized as any other device. Otherwise qdev_device_add() will return early + * without adding the device. The guest will not see a "hidden" device + * until it was marked don't hide and qdev_device_add called again. + * */ typedef struct DeviceClass { /*< private >*/ @@ -154,6 +167,12 @@ struct DeviceState { struct DeviceListener { void (*realize)(DeviceListener *listener, DeviceState *dev); void (*unrealize)(DeviceListener *listener, DeviceState *dev); +/* + * This callback is called upon init of the DeviceState and allows to + * inform qdev that a device should be hidden, depending on the device + * opts, for example, to hide a standby device. + */ +int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts); QTAILQ_ENTRY(DeviceListener) link; }; @@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus) void device_listener_register(DeviceListener *listener); void device_listener_unregister(DeviceListener *listener); +/** + * @qdev_should_hide_device: + * @opts: QemuOpts as passed on cmdline. + * + * Check if a device should be added. + * When a device is added via qdev_device_add() this will be called, + * and return if the device should be added now or not. + */ +bool qdev_should_hide_device(QemuOpts *opts); + #endif diff --git a/qdev-monitor.c b/qdev-monitor.c index 148df9cacf..ffa08c670f 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -32,9 +32,11 @@ #include "qemu/help_option.h" #include "qemu/option.h" #include "qemu/qemu-print
Re: [PATCH v6 0/11] add failover feature for assigned network devices
On Mon, Oct 28, 2019 at 11:58:53AM -0400, Michael S. Tsirkin wrote: I see at least comments from Markus. You answered but don't you need to also tweak the patch? That comment was addressed already IMO, but I'll change the patch description as well and while I'm at it will also fix David's comment. regards, Jens
Re: [PATCH v6 08/11] migration: add new migration state wait-unplug
Dr. David Alan Gilbert schrieb am Di., 29. Okt. 2019, 03:57: > * Jens Freimann (jfreim...@redhat.com) wrote: > > This patch adds a new migration state called wait-unplug. It is entered > > after the SETUP state if failover devices are present. It will transition > > into ACTIVE once all devices were succesfully unplugged from the guest. > > > > So if a guest doesn't respond or takes long to honor the unplug request > > the user will see the migration state 'wait-unplug'. > > > > In the migration thread we query failover devices if they're are still > > pending the guest unplug. When all are unplugged the migration > > continues. If one device won't unplug migration will stay in wait_unplug > > state. > > > > Signed-off-by: Jens Freimann > > Acked-by: Cornelia Huck > > I think this is OK, so > > > Reviewed-by: Dr. David Alan Gilbert > > but see question below > > > --- > > include/migration/vmstate.h | 2 ++ > > migration/migration.c | 21 + > > migration/migration.h | 3 +++ > > migration/savevm.c | 36 > > migration/savevm.h | 2 ++ > > qapi/migration.json | 5 - > > 6 files changed, 68 insertions(+), 1 deletion(-) > > > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h > > index b9ee563aa4..ac4f46a67d 100644 > > --- a/include/migration/vmstate.h > > +++ b/include/migration/vmstate.h > > @@ -186,6 +186,8 @@ struct VMStateDescription { > > int (*pre_save)(void *opaque); > > int (*post_save)(void *opaque); > > bool (*needed)(void *opaque); > > +bool (*dev_unplug_pending)(void *opaque); > > + > > const VMStateField *fields; > > const VMStateDescription **subsections; > > }; > > diff --git a/migration/migration.c b/migration/migration.c > > index 3febd0f8f3..51764f2565 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -52,6 +52,7 @@ > > #include "hw/qdev-properties.h" > > #include "monitor/monitor.h" > > #include "net/announce.h" > > +#include "qemu/queue.h" > > > > #define MAX_THROTTLE (32 << 20) /* Migration transfer speed > throttling */ > > > > @@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state) > > case MIGRATION_STATUS_SETUP: > > case MIGRATION_STATUS_PRE_SWITCHOVER: > > case MIGRATION_STATUS_DEVICE: > > +case MIGRATION_STATUS_WAIT_UNPLUG: > > return true; > > > > default: > > @@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo > *info) > > case MIGRATION_STATUS_CANCELLED: > > info->has_status = true; > > break; > > +case MIGRATION_STATUS_WAIT_UNPLUG: > > +info->has_status = true; > > +break; > > } > > info->status = s->state; > > } > > @@ -1694,6 +1699,7 @@ bool migration_is_idle(void) > > case MIGRATION_STATUS_COLO: > > case MIGRATION_STATUS_PRE_SWITCHOVER: > > case MIGRATION_STATUS_DEVICE: > > +case MIGRATION_STATUS_WAIT_UNPLUG: > > return false; > > case MIGRATION_STATUS__MAX: > > g_assert_not_reached(); > > @@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque) > > > > qemu_savevm_state_setup(s->to_dst_file); > > > > +if (qemu_savevm_nr_failover_devices()) { > > +migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > > + MIGRATION_STATUS_WAIT_UNPLUG); > > + > > +while (s->state == MIGRATION_STATUS_WAIT_UNPLUG && > > +!qemu_savevm_state_guest_unplug_pending()) { > > +qemu_sem_timedwait(&s->wait_unplug_sem, 250); > > +} > > + > > +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, > > +MIGRATION_STATUS_ACTIVE); > > +} > > + > > s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; > > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > >MIGRATION_STATUS_ACTIVE); > > @@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object > *obj) > > qemu_mutex_destroy(&ms->qemu_file_lock); > > g_free(params->tls_hostname); > > g_free(params->tls_creds); > > +qemu_sem_destroy(&ms->wai
Re: [PATCH v6 0/11] add failover feature for assigned network devices
Hi Michael, I addressed all comments and feedback and think this can be merged but I'm unclear about which tree it should go to. Will you merge it into the virtio-tree? regards, Jens On Fri, Oct 25, 2019 at 02:19:19PM +0200, Jens Freimann wrote: This is implementing the host side of the net_failover concept (https://www.kernel.org/doc/html/latest/networking/net_failover.html) Changes since v5: * rename net_failover_pair_id parameter/property to failover_pair_id * in PCI code use pci_bus_is_express(). This won't fail on functions > 0 * make sure primary and standby can't be added to same PCI slot * add documentation file in docs/ to virtio-net patch, add file to MAINTAINERS (added to networking devices section) * add comment to QAPI event for failover negotiation, try to improve commit message The general idea is that we have a pair of devices, a vfio-pci and a virtio-net device. Before migration the vfio device is unplugged and data flows to the virtio-net device, on the target side another vfio-pci device is plugged in to take over the data-path. In the guest the net_failover module will pair net devices with the same MAC address. * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs * Patch 2 adds checks to PCIDevice for only allowing ethernet devices as failover primary and only PCIExpress capable devices * Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we use to skip the unrealize code path when doing a unplug of the primary device * Patch 4 sets the pending_deleted_event before triggering the guest unplug request * Patch 5 and 6 add new qmp events, one sends the device id of a device that was just requested to be unplugged from the guest and another one to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated * Patch 7 make sure that we can unplug the vfio-device before migration starts * Patch 8 adds a new migration state that is entered while we wait for devices to be unplugged by guest OS * Patch 9 just adds the new migration state to a check in libqos code * Patch 10 In the second patch the virtio-net uses the API to defer adding the vfio device until the VIRTIO_NET_F_STANDBY feature is acked. It also implements the migration handler to unplug the device from the guest and re-plug in case of migration failure * Patch 11 allows migration for failover vfio-pci devices Previous discussion: RFC v1 https://patchwork.ozlabs.org/cover/989098/ RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html v3: https://patchew.org/QEMU/2019102015.11785-1-jfreim...@redhat.com/ v4: https://patchew.org/QEMU/20191018202040.30349-1-jfreim...@redhat.com/ v5: https://patchew.org/QEMU/20191023082711.16694-1-jfreim...@redhat.com/ To summarize concerns/feedback from previous discussion: 1.- guest OS can reject or worse _delay_ unplug by any amount of time. Migration might get stuck for unpredictable time with unclear reason. This approach combines two tricky things, hot/unplug and migration. -> We need to let libvirt know what's happening. Add new qmp events and a new migration state. When a primary device is (partially) unplugged (only from guest) we send a qmp event with the device id. When it is unplugged from the guest the DEVICE_DELETED event is sent. Migration will enter the wait-unplug state while waiting for the guest os to unplug all primary devices and then move on with migration. 2. PCI devices are a precious ressource. The primary device should never be added to QEMU if it won't be used by guest instead of hiding it in QEMU. -> We only hotplug the device when the standby feature bit was negotiated. We save the device cmdline options until we need it for qdev_device_add() Hiding a device can be a useful concept to model. For example a pci device in a powered-off slot could be marked as hidden until the slot is powered on (mst). 3. Management layer software should handle this. Open Stack already has components/code to handle unplug/replug VFIO devices and metadata to provide to the guest for detecting which devices should be paired. -> An approach that includes all software from firmware to higher-level management software wasn't tried in the last years. This is an attempt to keep it simple and contained in QEMU as much as possible. One of the problems that stopped management software and libvirt from implementing this idea is that it can't be sure that it's possible to re-plug the primary device. By not freeing the devices resources in QEMU and only asking the guest OS to unplug it is possible to re-plug the device in case of a migration failure. 4. Hotplugging a device and then making it part of a failover setup is not possi
Re: [PATCH v6 06/11] qapi: add failover negotiated event
On Fri, Oct 25, 2019 at 04:03:54PM +0200, Markus Armbruster wrote: Bear with me, I know next to nothing about failover. Jens Freimann writes: This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature was enabled. The primary device this virtio-net device is associated with, will now be hotplugged via qdev_device_add(). Passive voice deftly avoids telling the reader who will do the hot-plugging. Intentional? Not really, it's in the comment to the event. The hotplug will be done by the virtio-net device code that activates the feature, in virtio_net_set_features(). Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- qapi/net.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 728990f4fb..ea64f7 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -737,3 +737,22 @@ ## { 'command': 'announce-self', 'boxed': true, 'data' : 'AnnounceParameters'} + +## +# @FAILOVER_NEGOTIATED: +# +# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation. +# Failover primary devices which were hidden (not hotplugged when requested) +# before will now be hotplugged by the virtio-net standby device. +# +# device-id: QEMU device id of the unplugged device @device-id is new since v5. A quick skim of https://www.kernel.org/doc/html/latest/networking/net_failover.html tells me there are three devices involved: master, primary slave, standby slave. Which one is @device-id? Or am I confused? Yes, the device-id is new and it's the device-id of the standby (i.e. virtio-net) device. regards, Jens
[PATCH v6 11/11] vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann Acked-by: Alex Williamson Acked-by: Cornelia Huck --- hw/vfio/pci.c | 26 -- hw/vfio/pci.h | 1 + 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 12fac39804..e6569a7968 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -40,6 +40,7 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "migration/blocker.h" #define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) @@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) return; } +if (!pdev->failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +error_free(vdev->migration_blocker); +return; +} +} + vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev); vdev->vbasedev.ops = &vfio_pci_ops; vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI; @@ -3008,6 +3020,10 @@ out_teardown: vfio_bars_exit(vdev); error: error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} } static void vfio_instance_finalize(Object *obj) @@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj) vfio_bars_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} /* * XXX Leaking igd_opregion is not an oversight, we can't remove the * fw_cfg entry therefore leaking this allocation seems like the safest @@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_END_OF_LIST(), }; -static const VMStateDescription vfio_pci_vmstate = { -.name = "vfio-pci", -.unmigratable = 1, -}; - static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) dc->reset = vfio_pci_reset; dc->props = vfio_pci_dev_properties; -dc->vmsd = &vfio_pci_vmstate; dc->desc = "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize = vfio_realize; diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 834a90d646..b329d50338 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice { bool no_vfio_ioeventfd; bool enable_ramfb; VFIODisplay *dpy; +Error *migration_blocker; } VFIOPCIDevice; uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); -- 2.21.0
[PATCH v6 10/11] net/virtio: add failover support
This patch adds support to handle failover device pairs of a virtio-net device and a (vfio-)pci device, where the virtio-net acts as the standby device and the (vfio-)pci device as the primary. The general idea is that we have a pair of devices, a (vfio-)pci and a emulated (virtio-net) device. Before migration the vfio device is unplugged and data flows to the emulated device, on the target side another (vfio-)pci device is plugged in to take over the data-path. In the guest the net_failover module will pair net devices with the same MAC address. To achieve this we need: 1. Provide a callback function for the should_be_hidden DeviceListener. It is called when the primary device is plugged in. Evaluate the QOpt passed in to check if it is the matching primary device. It returns if the device should be hidden or not. When it should be hidden it stores the device options in the VirtioNet struct and the device is added once the VIRTIO_NET_F_STANDBY feature is negotiated during virtio feature negotiation. If the virtio-net devices are not realized at the time the (vfio-)pci devices are realized, we need to connect the devices later. This way we make sure primary and standby devices can be specified in any order. 2. Register a callback for migration status notifier. When called it will unplug its primary device before the migration happens. 3. Register a callback for the migration code that checks if a device needs to be unplugged from the guest. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- MAINTAINERS| 1 + docs/virtio-net-failover.rst | 68 hw/net/virtio-net.c| 302 + include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + 5 files changed, 384 insertions(+) create mode 100644 docs/virtio-net-failover.rst diff --git a/MAINTAINERS b/MAINTAINERS index ed41d7d1b6..28f957e83d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1412,6 +1412,7 @@ S: Odd Fixes F: hw/net/ F: include/hw/net/ F: tests/virtio-net-test.c +F: docs/virtio-net-failover.rst T: git https://github.com/jasowang/qemu.git net Parallel NOR Flash devices diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst new file mode 100644 index 00..22f64c7bc8 --- /dev/null +++ b/docs/virtio-net-failover.rst @@ -0,0 +1,68 @@ + +QEMU virtio-net standby (net_failover) + + +This document explains the setup and usage of virtio-net standby feature which +is used to create a net_failover pair of devices. + +The general idea is that we have a pair of devices, a (vfio-)pci and a +virtio-net device. Before migration the vfio device is unplugged and data flows +through the virtio-net device, on the target side another vfio-pci device is +plugged in to take over the data-path. In the guest the net_failover kernel +module will pair net devices with the same MAC address. + +The two devices are called primary and standby device. The fast hardware based +networking device is called the primary device and the virtio-net device is the +standby device. + +Restrictions + + +Currently only PCIe devices are allowed as primary devices, this restriction +can be lifted in the future with enhanced QEMU support. Also, only networking +devices are allowed as primary device. The user needs to ensure that primary +and standby devices are not plugged into the same PCIe slot. + +Usecase +--- + + Virtio-net standby allows easy migration while using a passed-through fast + networking device by falling back to a virtio-net device for the duration of + the migration. It is like a simple version of a bond, the difference is that it + requires no configuration in the guest. When a guest is live-migrated to + another host QEMU will unplug the primary device via the PCIe based hotplug + handler and traffic will go through the virtio-net device. On the target + system the primary device will be automatically plugged back and the + net_failover module registers it again as the primary device. + +Usage +- + + The primary device can be hotplugged or be part of the startup configuration + + -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \ +bus=root2,failover=on + + With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be enabled. + + -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 + + failover_pair_id references the id of the virtio-net standby device. This + is only for pairing the devices within QEMU. The guest kernel module + net_failover will match devices with identical MAC addresses. + +Hotplug +--- + + Both primary and standby device can be hotplugged via the QEMU monitor. Note + that if the virtio-net device is plugged first a warning will be issued that it + couldn't find the primary device. + +Migration +- + + A new migration state
[PATCH v6 09/11] libqos: tolerate wait-unplug migration state
Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- tests/libqos/libqos.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c index d71557c5cb..f229eb2cb8 100644 --- a/tests/libqos/libqos.c +++ b/tests/libqos/libqos.c @@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri) break; } -if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) { +if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0) +|| (strcmp(st, "wait-unplug") == 0)) { qobject_unref(rsp); g_usleep(5000); continue; -- 2.21.0
[PATCH v6 05/11] qapi: add unplug primary event
This event is emitted when we sent a request to unplug a failover primary device from the Guest OS and it includes the device id of the primary device. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- qapi/migration.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index 82feb5bd39..e9e7a97c03 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1448,3 +1448,22 @@ # Since: 3.0 ## { 'command': 'migrate-pause', 'allow-oob': true } + +## +# @UNPLUG_PRIMARY: +# +# Emitted from source side of a migration when migration state is +# WAIT_UNPLUG. Device was unplugged by guest operating system. +# Device resources in QEMU are kept on standby to be able to re-plug it in case +# of migration failure. +# +# @device-id: QEMU device id of the unplugged device +# +# Since: 4.2 +# +# Example: +# {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} } +# +## +{ 'event': 'UNPLUG_PRIMARY', + 'data': { 'device-id': 'str' } } -- 2.21.0
[PATCH v6 08/11] migration: add new migration state wait-unplug
This patch adds a new migration state called wait-unplug. It is entered after the SETUP state if failover devices are present. It will transition into ACTIVE once all devices were succesfully unplugged from the guest. So if a guest doesn't respond or takes long to honor the unplug request the user will see the migration state 'wait-unplug'. In the migration thread we query failover devices if they're are still pending the guest unplug. When all are unplugged the migration continues. If one device won't unplug migration will stay in wait_unplug state. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- include/migration/vmstate.h | 2 ++ migration/migration.c | 21 + migration/migration.h | 3 +++ migration/savevm.c | 36 migration/savevm.h | 2 ++ qapi/migration.json | 5 - 6 files changed, 68 insertions(+), 1 deletion(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index b9ee563aa4..ac4f46a67d 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -186,6 +186,8 @@ struct VMStateDescription { int (*pre_save)(void *opaque); int (*post_save)(void *opaque); bool (*needed)(void *opaque); +bool (*dev_unplug_pending)(void *opaque); + const VMStateField *fields; const VMStateDescription **subsections; }; diff --git a/migration/migration.c b/migration/migration.c index 3febd0f8f3..51764f2565 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -52,6 +52,7 @@ #include "hw/qdev-properties.h" #include "monitor/monitor.h" #include "net/announce.h" +#include "qemu/queue.h" #define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */ @@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state) case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return true; default: @@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info) case MIGRATION_STATUS_CANCELLED: info->has_status = true; break; +case MIGRATION_STATUS_WAIT_UNPLUG: +info->has_status = true; +break; } info->status = s->state; } @@ -1694,6 +1699,7 @@ bool migration_is_idle(void) case MIGRATION_STATUS_COLO: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return false; case MIGRATION_STATUS__MAX: g_assert_not_reached(); @@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque) qemu_savevm_state_setup(s->to_dst_file); +if (qemu_savevm_nr_failover_devices()) { +migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, + MIGRATION_STATUS_WAIT_UNPLUG); + +while (s->state == MIGRATION_STATUS_WAIT_UNPLUG && +!qemu_savevm_state_guest_unplug_pending()) { +qemu_sem_timedwait(&s->wait_unplug_sem, 250); +} + +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, +MIGRATION_STATUS_ACTIVE); +} + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); @@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj) qemu_mutex_destroy(&ms->qemu_file_lock); g_free(params->tls_hostname); g_free(params->tls_creds); +qemu_sem_destroy(&ms->wait_unplug_sem); qemu_sem_destroy(&ms->rate_limit_sem); qemu_sem_destroy(&ms->pause_sem); qemu_sem_destroy(&ms->postcopy_pause_sem); @@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj) qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); qemu_sem_init(&ms->rp_state.rp_sem, 0); qemu_sem_init(&ms->rate_limit_sem, 0); +qemu_sem_init(&ms->wait_unplug_sem, 0); qemu_mutex_init(&ms->qemu_file_lock); } diff --git a/migration/migration.h b/migration/migration.h index 4f2fe193dc..79b3dda146 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -206,6 +206,9 @@ struct MigrationState /* Flag set once the migration thread called bdrv_inactivate_all */ bool block_inactive; +/* Migration is waiting for guest to unplug device */ +QemuSemaphore wait_unplug_sem; + /* Migration is paused due to pause-before-switchover */ QemuSemaphore pause_sem; diff --git a/migration/savevm.c b/migration/savevm.c index 8d95e261f6..0f18dea49e 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1113,6 +1113,42 @@ void qemu_savevm_state_header(QEMUFile *f) } } +int qemu_sav
[PATCH v6 07/11] migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figured out what works. For failover primary devices qdev_unplug() is called from the migration handler, i.e. during migration. This patch adds a flag to DeviceState which is set to false for all devices and makes an exception for PCI devices that are also primary devices in a failover pair. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- hw/core/qdev.c | 1 + hw/pci/pci.c | 1 + include/hw/qdev-core.h | 1 + qdev-monitor.c | 2 +- 4 files changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 3b8d43d0fd..cf1ba28fe3 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -996,6 +996,7 @@ static void device_initfn(Object *obj) dev->instance_id_alias = -1; dev->realized = false; +dev->allow_unplug_during_migration = false; object_property_add_bool(obj, "realized", device_get_realized, device_set_realized, NULL); diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 824ab4ed7b..c68498c0de 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2130,6 +2130,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) pci_qdev_unrealize(DEVICE(pci_dev), NULL); return; } +qdev->allow_unplug_during_migration = true; } /* rom loading */ diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 710981af36..1518495b1e 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -156,6 +156,7 @@ struct DeviceState { bool pending_deleted_event; QemuOpts *opts; int hotplugged; +bool allow_unplug_during_migration; BusState *parent_bus; QLIST_HEAD(, NamedGPIOList) gpios; QLIST_HEAD(, BusState) child_bus; diff --git a/qdev-monitor.c b/qdev-monitor.c index ffa08c670f..e6b112eb0a 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) return; } -if (!migration_is_idle()) { +if (!migration_is_idle() && !dev->allow_unplug_during_migration) { error_setg(errp, "device_del not allowed while migrating"); return; } -- 2.21.0
[PATCH v6 03/11] pci: mark devices partially unplugged
Only the guest unplug request was triggered. This is needed for the failover feature. In case of a failed migration we need to plug the device back to the guest. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- hw/pci/pcie.c| 3 +++ include/hw/pci/pci.h | 1 + 2 files changed, 4 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index a6beb567bd..19363ff8ce 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) { HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); +if (dev->partially_hotplugged) { +return; +} hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); object_unparent(OBJECT(dev)); } diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 69d1f0228b..db75c6dfd0 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache; struct PCIDevice { DeviceState qdev; +bool partially_hotplugged; /* PCI config space */ uint8_t *config; -- 2.21.0
[PATCH v6 01/11] qdev/qbus: add hidden device support
This adds support for hiding a device to the qbus and qdev APIs. The first user of this will be the virtio-net failover feature but the API introduced with this patch could be used to implement other features as well, for example hiding pci devices when a pci bus is powered off. qdev_device_add() is modified to check for a failover_pair_id argument in the option string. A DeviceListener callback should_be_hidden() is added. It can be used by a standby device to inform qdev that this device should not be added now. The standby device handler can store the device options to plug the device in at a later point in time. One reason for hiding the device is that we don't want to expose both devices to the guest kernel until the respective virtio feature bit VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be handled correctly by the guest. More information on the kernel feature this is using: https://www.kernel.org/doc/html/latest/networking/net_failover.html An example where the primary device is a vfio-pci device and the standby device is a virtio-net device: A device is hidden when it has an "failover_pair_id" option, e.g. -device virtio-net-pci,...,failover=on,... -device vfio-pci,...,failover_pair_id=net1,... Signed-off-by: Jens Freimann Reviewed-by: Cornelia Huck --- hw/core/qdev.c | 24 include/hw/qdev-core.h | 29 + qdev-monitor.c | 41 + vl.c | 6 -- 4 files changed, 94 insertions(+), 6 deletions(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index cbad6c1d55..3b8d43d0fd 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -212,6 +212,30 @@ void device_listener_unregister(DeviceListener *listener) QTAILQ_REMOVE(&device_listeners, listener, link); } +bool qdev_should_hide_device(QemuOpts *opts) +{ +int rc = -1; +DeviceListener *listener; + +QTAILQ_FOREACH(listener, &device_listeners, link) { + if (listener->should_be_hidden) { +/* + * should_be_hidden_will return + * 1 if device matches opts and it should be hidden + * 0 if device matches opts and should not be hidden + * -1 if device doesn't match ops + */ +rc = listener->should_be_hidden(listener, opts); +} + +if (rc > 0) { +break; +} +} + +return rc > 0; +} + void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, int required_for_version) { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index aa123f88cb..710981af36 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp); * respective parent types. * * + * + * # Hiding a device # + * To hide a device, a DeviceListener function should_be_hidden() needs to + * be registered. + * It can be used to defer adding a device and therefore hide it from the + * guest. The handler registering to this DeviceListener can save the QOpts + * passed to it for re-using it later and must return that it wants the device + * to be/remain hidden or not. When the handler function decides the device + * shall not be hidden it will be added in qdev_device_add() and + * realized as any other device. Otherwise qdev_device_add() will return early + * without adding the device. The guest will not see a "hidden" device + * until it was marked don't hide and qdev_device_add called again. + * */ typedef struct DeviceClass { /*< private >*/ @@ -154,6 +167,12 @@ struct DeviceState { struct DeviceListener { void (*realize)(DeviceListener *listener, DeviceState *dev); void (*unrealize)(DeviceListener *listener, DeviceState *dev); +/* + * This callback is called upon init of the DeviceState and allows to + * inform qdev that a device should be hidden, depending on the device + * opts, for example, to hide a standby device. + */ +int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts); QTAILQ_ENTRY(DeviceListener) link; }; @@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus) void device_listener_register(DeviceListener *listener); void device_listener_unregister(DeviceListener *listener); +/** + * @qdev_should_hide_device: + * @opts: QemuOpts as passed on cmdline. + * + * Check if a device should be added. + * When a device is added via qdev_device_add() this will be called, + * and return if the device should be added now or not. + */ +bool qdev_should_hide_device(QemuOpts *opts); + #endif diff --git a/qdev-monitor.c b/qdev-monitor.c index 148df9cacf..ffa08c670f 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -32,9 +32,11 @@ #include "qemu/help_option.h" #include "qemu/option.h" #
[PATCH v6 06/11] qapi: add failover negotiated event
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature was enabled. The primary device this virtio-net device is associated with, will now be hotplugged via qdev_device_add(). Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- qapi/net.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 728990f4fb..ea64f7 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -737,3 +737,22 @@ ## { 'command': 'announce-self', 'boxed': true, 'data' : 'AnnounceParameters'} + +## +# @FAILOVER_NEGOTIATED: +# +# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation. +# Failover primary devices which were hidden (not hotplugged when requested) +# before will now be hotplugged by the virtio-net standby device. +# +# device-id: QEMU device id of the unplugged device +# Since: 4.2 +# +# Example: +# +# <- { "event": "FAILOVER_NEGOTIATED", +# "data": "net1" } +# +## +{ 'event': 'FAILOVER_NEGOTIATED', + 'data': {'device-id': 'str'} } -- 2.21.0
[PATCH v6 04/11] pci: mark device having guest unplug request pending
Set pending_deleted_event in DeviceState for failover primary devices that were successfully unplugged by the Guest OS. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- hw/pci/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 19363ff8ce..08718188bb 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); if (dev->partially_hotplugged) { +dev->qdev.pending_deleted_event = false; return; } hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); @@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev, return; } +dev->pending_deleted_event = true; + /* In case user cancel the operation of multi-function hot-add, * remove the function that is unexposed to guest individually, * without interaction with guest. -- 2.21.0
[PATCH v6 02/11] pci: add option for net failover
This patch adds a failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. The requirement for PCIe is because it doesn't support other hotplug controllers at the moment. The failover functionality can be added to other hotplug controllers like ACPI, SHCP,... later on. Signed-off-by: Jens Freimann Acked-by: Cornelia Huck --- hw/pci/pci.c | 31 +++ include/hw/pci/pci.h | 3 +++ 2 files changed, 34 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..824ab4ed7b 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -75,6 +75,8 @@ static Property pci_props[] = { QEMU_PCIE_LNKSTA_DLLLA_BITNR, true), DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present, QEMU_PCIE_EXTCAP_INIT_BITNR, true), +DEFINE_PROP_STRING("failover_pair_id", PCIDevice, + failover_pair_id), DEFINE_PROP_END_OF_LIST() }; @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) ObjectClass *klass = OBJECT_CLASS(pc); Error *local_err = NULL; bool is_default_rom; +uint16_t class_id; /* initialize cap_present for pci_is_express() and pci_config_size(), * Note that hybrid PCIs are not set automatically and need to manage @@ -2101,6 +2104,34 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->failover_pair_id) { +if (!pci_bus_is_express(pci_get_bus(pci_dev))) { +error_setg(errp, "failover primary device must be on " + "PCIExpress bus"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE); +if (class_id != PCI_CLASS_NETWORK_ETHERNET) { +error_setg(errp, "failover primary device is not an " + "Ethernet device"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) +&& (PCI_FUNC(pci_dev->devfn) == 0)) { +qdev->allow_unplug_during_migration = true; +} else { +error_setg(errp, "failover: primary device must be in its own " + "PCI slot"); +error_propagate(errp, local_err); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} +} + /* rom loading */ is_default_rom = false; if (pci_dev->romfile == NULL && pc->romfile != NULL) { diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index f3f0ffd5fb..69d1f0228b 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -352,6 +352,9 @@ struct PCIDevice { MSIVectorUseNotifier msix_vector_use_notifier; MSIVectorReleaseNotifier msix_vector_release_notifier; MSIVectorPollNotifier msix_vector_poll_notifier; + +/* ID of standby device in net_failover pair */ +char *failover_pair_id; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, -- 2.21.0
[PATCH v6 0/11] add failover feature for assigned network devices
o migrate the VM. Command line example: qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \ -machine q35,kernel-irqchip=split -cpu host \ -serial stdio \ -net none \ -qmp unix:/tmp/qmp.socket,server,nowait \ -monitor telnet:127.0.0.1:,server,nowait \ -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \ -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \ -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \ -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \ -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id =net1 \ /root/rhel-guest-image-8.0-1781.x86_64.qcow2 I'm grateful for any remarks or ideas! Thanks! regards, Jens Jens Freimann (11): qdev/qbus: add hidden device support pci: add option for net failover pci: mark devices partially unplugged pci: mark device having guest unplug request pending qapi: add unplug primary event qapi: add failover negotiated event migration: allow unplug during migration for failover devices migration: add new migration state wait-unplug libqos: tolerate wait-unplug migration state net/virtio: add failover support vfio: unplug failover primary device before migration MAINTAINERS| 1 + docs/virtio-net-failover.rst | 68 hw/core/qdev.c | 25 +++ hw/net/virtio-net.c| 302 + hw/pci/pci.c | 32 hw/pci/pcie.c | 6 + hw/vfio/pci.c | 26 ++- hw/vfio/pci.h | 1 + include/hw/pci/pci.h | 4 + include/hw/qdev-core.h | 30 include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + include/migration/vmstate.h| 2 + migration/migration.c | 21 +++ migration/migration.h | 3 + migration/savevm.c | 36 migration/savevm.h | 2 + qapi/migration.json| 24 ++- qapi/net.json | 19 +++ qdev-monitor.c | 43 - tests/libqos/libqos.c | 3 +- vl.c | 6 +- 22 files changed, 652 insertions(+), 15 deletions(-) create mode 100644 docs/virtio-net-failover.rst -- 2.21.0
Re: [PATCH v5 02/11] pci: add option for net failover
On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote: On Wed, 23 Oct 2019 10:27:02 +0200 Jens Freimann wrote: [...] @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->net_failover_pair_id) { +if (!pci_is_express(pci_dev)) { +error_setg(errp, "failover device is not a PCIExpress device"); +error_propagate(errp, local_err); +return; +} Did we decide we don't need to test that the device is also in a hotpluggable slot? Are there also multi-function considerations that should be prevented or documented? For example, if a user tries to configure both the primary and failover NICs in the same slot, I assume bad things will happen. I added this check if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) && (PCI_FUNC(pci_dev->devfn) == 0)) { qdev->allow_unplug_during_migration = true; } else { error_setg(errp, "failover: primary device must be in its own " "PCI slot"); error_propagate(errp, local_err); pci_qdev_unrealize(DEVICE(pci_dev), NULL); return; } When I first add a vfio-pci with net_failover_pair_id=x,multifunction=on and addr=0.0 I will now get an error. (qemu) device_add vfio-pci,...,bus=root2,net_failover_pair_id=net1,multifunction=on,addr=0.0 Error: failover: primary device must be in its own PCI slot If I put in a virtio-net device in slot 0 and then try to add a vfio-pci device in the same slot I get the following error message: -device virtio-net-pci,...id=net1bus=root1,failover=on,multifunction=on,addr=0.0 (qemu) device_add vfio-pci,id=hostdev1,host=:5e:00.2,bus=root1,net_failover_pair_id=net1,addr=0.1 Error: PCI: slot 0 function 0 already ocuppied by virtio-net-pci, new func vfio-pci cannot be exposed to guest. regards, Jens
Re: [PATCH v5 06/11] qapi: add failover negotiated event
On Fri, Oct 25, 2019 at 07:35:28AM +0200, Markus Armbruster wrote: We ask patch submitters to cc: subject matter experts for review. You did. When such patches touch the QAPI schema, it's best to cc the qapi schema maintainers (Eric Blake and me) as well, because we can't require all subject matter experts to be fluent in the QAPI schema language and conventions. I found this one more or less by chance. Sorry about that, I'll make sure to get the right people next time. Jens Freimann writes: This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature was not negotiated during virtio feature negotiation. If this event is received it means any primary devices hotplugged before this were were never really added to QEMU devices. Too many negations for my poor old brain to process. I'll try to explain better :) Signed-off-by: Jens Freimann --- qapi/net.json | 16 1 file changed, 16 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 728990f4fb..8c5f3f1fb2 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -737,3 +737,19 @@ ## { 'command': 'announce-self', 'boxed': true, 'data' : 'AnnounceParameters'} + +## +# @FAILOVER_NEGOTIATED: +# +# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation +# +# Since: 4.2 +# +# Example: +# +# <- { "event": "FAILOVER_NEGOTIATED", +# "data": {} } +# +## +{ 'event': 'FAILOVER_NEGOTIATED', + 'data': {} } The commit message at least tries to explain intended use. The doc string does not. Should it? Sure, I'll add it. Thanks for the review! regards, Jens
Re: [PATCH v5 02/11] pci: add option for net failover
On Thu, Oct 24, 2019 at 10:52:36AM -0600, Alex Williamson wrote: On Thu, 24 Oct 2019 11:37:54 +0200 Jens Freimann wrote: [...] > >While reviewing, I realized that we shouldn't have this check for below reasons. > >1. It is user's responsibility to pass networking device. >But its ok to check the class, if PCI Device is passed. >So class comparison should be inside the pci_check(). I'm not sure I understand this point, could you please elaborate? You're suggesting to move the check for the class into the check for pci_is_express? Seems like the suggestion is that net_failover_pair_id should be an option on the base class of PCIDevice (DeviceState?) and only if it's a PCI device would we check the class code. But there are dependencies at the hotplug controller, which I think is why this is currently specific to PCI. Yes, It doesn't support acpi, shpc,... hotplug as of now. It shouldn't be hard to add but I'd like to do it as a follow-on series. However, it's an interesting point about pci_is_express(). This test is really just meant to check whether the hotplug controller supports this feature, which is only implemented in pciehp via this series. There's a bit of a mismatch though that pcie_is_express() checks whether the device is express, not whether the bus it sits on is express. I think we really want the latter, so maybe this should be: pci_bus_is_express(pci_get_bus(dev) For example this feature should work if I plug an e1000 (not e1000e) into an express slot, but not if I plug an e1000e into a conventional slot. I'll try this and test. Thanks! regards, Jens
Re: [PATCH v5 06/11] qapi: add failover negotiated event
On Thu, Oct 24, 2019 at 06:32:45PM +0100, Dr. David Alan Gilbert wrote: * Jens Freimann (jfreim...@redhat.com) wrote: This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature was not negotiated during virtio feature negotiation. If this event is received it means any primary devices hotplugged before this were were never really added to QEMU devices. Signed-off-by: Jens Freimann Can I just understand a bit more about what the meaning of this is. Say my VM boots: a) BIOS b) Boot loader c) Linux d) Reboots (possibly a',b', different c') When would I get that event? When can libvirt know it can use it? The event is sent every time we do feature negotiation for the virtio-net device, so you'll get it during Linux boot and reboots. In v6, I add a data field 'id' to the event to pass the device id. regards, Jens
Re: [PATCH v5 02/11] pci: add option for net failover
On Thu, Oct 24, 2019 at 06:22:27PM +0100, Dr. David Alan Gilbert wrote: * Jens Freimann (jfreim...@redhat.com) wrote: This patch adds a net_failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. QEMU will exit with an error message otherwise. Signed-off-by: Jens Freimann --- hw/pci/pci.c | 17 + include/hw/pci/pci.h | 3 +++ 2 files changed, 20 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..fa9b5219f8 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -75,6 +75,8 @@ static Property pci_props[] = { QEMU_PCIE_LNKSTA_DLLLA_BITNR, true), DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present, QEMU_PCIE_EXTCAP_INIT_BITNR, true), +DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice, +net_failover_pair_id), Should we just make this 'failover_pair_id' - then when someone in the future figures out how to make it work for something else (e.g. multipath block devices) then it's all good? Yes, I see no reason why not to rename it. Thanks! regards, Jens
Re: [PATCH v5 02/11] pci: add option for net failover
On Thu, Oct 24, 2019 at 05:03:46AM +, Parav Pandit wrote: @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->net_failover_pair_id) { +if (!pci_is_express(pci_dev)) { I am testing and integrating this piece with mlx5 devices. I see that pci_is_express() return true only for first PCI function. Didn't yet dig the API. Commenting out this check and below class check progresses further. First of all, thanks for testing this! Could you share your commandline please? I can't reproduce it. While reviewing, I realized that we shouldn't have this check for below reasons. 1. It is user's responsibility to pass networking device. But its ok to check the class, if PCI Device is passed. So class comparison should be inside the pci_check(). I'm not sure I understand this point, could you please elaborate? You're suggesting to move the check for the class into the check for pci_is_express? 2. It is limiting to only consider PCI devices. Automated and regression tests should be able validate this feature without PCI Device. This will enhance the stability of feature in long run. 3. net failover driver doesn't limit it to have it over only PCI device. So similarly hypervisor should be limiting. I agree that we don't have to limit it to PCI(e) forever. But for this first shot I think we should and then extend it continually. There are more things we can support in the future like other hotplug types etc. regards, Jens
Re: [PATCH v5 02/11] pci: add option for net failover
On Wed, Oct 23, 2019 at 02:02:11PM -0600, Alex Williamson wrote: On Wed, 23 Oct 2019 21:30:35 +0200 Jens Freimann wrote: On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote: >On Wed, 23 Oct 2019 10:27:02 +0200 >Jens Freimann wrote: [...] >Are there also multi-function considerations that >should be prevented or documented? For example, if a user tries to >configure both the primary and failover NICs in the same slot, I assume >bad things will happen. I would have expected that this is already checked in pci code, but it is not. I tried it and when I put both devices into the same slot they are both unplugged from the guest during boot but nothing else happens. I don't know what triggers that unplug of the devices. I'm not aware of any other problems regarding multi-function, which doesn't mean there aren't any. Hmm, was the hidden device at function #0? The guest won't find any functions if function #0 isn't present, but I don't know what would trigger the hotplug. The angle I'm thinking is that we only have slot level granularity for hotplug, so any sort of automatic hotplug of a slot should probably think about bystander devices within the slot. Yes that would be a problem, but isn't it the same in the non-failover case where a user configures it wrong? The slot where the device is plugged is not chosen automatically it's configured by the user, no? I might be mixing something up here. I have no idea yet how to check if a slot is already populated, but I'll think about it. Thanks! regards, Jens
Re: [PATCH v5 02/11] pci: add option for net failover
On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote: On Wed, 23 Oct 2019 10:27:02 +0200 Jens Freimann wrote: This patch adds a net_failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. QEMU will exit with an error message otherwise. Signed-off-by: Jens Freimann --- hw/pci/pci.c | 17 + include/hw/pci/pci.h | 3 +++ 2 files changed, 20 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..fa9b5219f8 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -75,6 +75,8 @@ static Property pci_props[] = { QEMU_PCIE_LNKSTA_DLLLA_BITNR, true), DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present, QEMU_PCIE_EXTCAP_INIT_BITNR, true), +DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice, +net_failover_pair_id), Nit, white space appears broken here. I'll fix it. DEFINE_PROP_END_OF_LIST() }; @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) ObjectClass *klass = OBJECT_CLASS(pc); Error *local_err = NULL; bool is_default_rom; +uint16_t class_id; /* initialize cap_present for pci_is_express() and pci_config_size(), * Note that hybrid PCIs are not set automatically and need to manage @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->net_failover_pair_id) { +if (!pci_is_express(pci_dev)) { +error_setg(errp, "failover device is not a PCIExpress device"); +error_propagate(errp, local_err); +return; +} Did we decide we don't need to test that the device is also in a hotpluggable slot? Hmm, my reply to you was never sent. I thought the check for qdev_device_add() was sufficient but you said that works only after qdev_hotplug is set (after machine creation). I modified the check to this: hide = should_hide_device(opts); if ((hide || qdev_hotplug) && bus && !qbus_is_hotpluggable(bus)) { error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name); return NULL; } if (hide) { return NULL; } This will make qemu fail when we have the device in the initial configuration or when we hotplug it to a bus that doesn't support it. I tested both with a device on pcie.0. Am I missing something? Are there also multi-function considerations that should be prevented or documented? For example, if a user tries to configure both the primary and failover NICs in the same slot, I assume bad things will happen. I would have expected that this is already checked in pci code, but it is not. I tried it and when I put both devices into the same slot they are both unplugged from the guest during boot but nothing else happens. I don't know what triggers that unplug of the devices. I'm not aware of any other problems regarding multi-function, which doesn't mean there aren't any. +class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE); +if (class_id != PCI_CLASS_NETWORK_ETHERNET) { +error_setg(errp, "failover device is not an Ethernet device"); +error_propagate(errp, local_err); +return; +} +} Looks like cleanup is missing both both error cases, the option rom error path below this does a pci_qdev_unrealize() before returning so I'd assume we want the same here. Thanks, Thanks, I'll fix this too. regards, Jens
[PATCH v5 11/11] vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann --- hw/vfio/pci.c | 26 -- hw/vfio/pci.h | 1 + 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 12fac39804..5dadfec6e8 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -40,6 +40,7 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "migration/blocker.h" #define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) @@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) return; } +if (!pdev->net_failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +error_free(vdev->migration_blocker); +return; +} +} + vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev); vdev->vbasedev.ops = &vfio_pci_ops; vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI; @@ -3008,6 +3020,10 @@ out_teardown: vfio_bars_exit(vdev); error: error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} } static void vfio_instance_finalize(Object *obj) @@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj) vfio_bars_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} /* * XXX Leaking igd_opregion is not an oversight, we can't remove the * fw_cfg entry therefore leaking this allocation seems like the safest @@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_END_OF_LIST(), }; -static const VMStateDescription vfio_pci_vmstate = { -.name = "vfio-pci", -.unmigratable = 1, -}; - static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) dc->reset = vfio_pci_reset; dc->props = vfio_pci_dev_properties; -dc->vmsd = &vfio_pci_vmstate; dc->desc = "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize = vfio_realize; diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 834a90d646..b329d50338 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice { bool no_vfio_ioeventfd; bool enable_ramfb; VFIODisplay *dpy; +Error *migration_blocker; } VFIOPCIDevice; uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); -- 2.21.0
[PATCH v5 08/11] migration: add new migration state wait-unplug
This patch adds a new migration state called wait-unplug. It is entered after the SETUP state if failover devices are present. It will transition into ACTIVE once all devices were succesfully unplugged from the guest. So if a guest doesn't respond or takes long to honor the unplug request the user will see the migration state 'wait-unplug'. In the migration thread we query failover devices if they're are still pending the guest unplug. When all are unplugged the migration continues. If one device won't unplug migration will stay in wait_unplug state. Signed-off-by: Jens Freimann --- include/migration/vmstate.h | 2 ++ migration/migration.c | 21 + migration/migration.h | 3 +++ migration/savevm.c | 36 migration/savevm.h | 2 ++ qapi/migration.json | 5 - 6 files changed, 68 insertions(+), 1 deletion(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index b9ee563aa4..ac4f46a67d 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -186,6 +186,8 @@ struct VMStateDescription { int (*pre_save)(void *opaque); int (*post_save)(void *opaque); bool (*needed)(void *opaque); +bool (*dev_unplug_pending)(void *opaque); + const VMStateField *fields; const VMStateDescription **subsections; }; diff --git a/migration/migration.c b/migration/migration.c index 3febd0f8f3..51764f2565 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -52,6 +52,7 @@ #include "hw/qdev-properties.h" #include "monitor/monitor.h" #include "net/announce.h" +#include "qemu/queue.h" #define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */ @@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state) case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return true; default: @@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info) case MIGRATION_STATUS_CANCELLED: info->has_status = true; break; +case MIGRATION_STATUS_WAIT_UNPLUG: +info->has_status = true; +break; } info->status = s->state; } @@ -1694,6 +1699,7 @@ bool migration_is_idle(void) case MIGRATION_STATUS_COLO: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: +case MIGRATION_STATUS_WAIT_UNPLUG: return false; case MIGRATION_STATUS__MAX: g_assert_not_reached(); @@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque) qemu_savevm_state_setup(s->to_dst_file); +if (qemu_savevm_nr_failover_devices()) { +migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, + MIGRATION_STATUS_WAIT_UNPLUG); + +while (s->state == MIGRATION_STATUS_WAIT_UNPLUG && +!qemu_savevm_state_guest_unplug_pending()) { +qemu_sem_timedwait(&s->wait_unplug_sem, 250); +} + +migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, +MIGRATION_STATUS_ACTIVE); +} + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); @@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj) qemu_mutex_destroy(&ms->qemu_file_lock); g_free(params->tls_hostname); g_free(params->tls_creds); +qemu_sem_destroy(&ms->wait_unplug_sem); qemu_sem_destroy(&ms->rate_limit_sem); qemu_sem_destroy(&ms->pause_sem); qemu_sem_destroy(&ms->postcopy_pause_sem); @@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj) qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); qemu_sem_init(&ms->rp_state.rp_sem, 0); qemu_sem_init(&ms->rate_limit_sem, 0); +qemu_sem_init(&ms->wait_unplug_sem, 0); qemu_mutex_init(&ms->qemu_file_lock); } diff --git a/migration/migration.h b/migration/migration.h index 4f2fe193dc..79b3dda146 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -206,6 +206,9 @@ struct MigrationState /* Flag set once the migration thread called bdrv_inactivate_all */ bool block_inactive; +/* Migration is waiting for guest to unplug device */ +QemuSemaphore wait_unplug_sem; + /* Migration is paused due to pause-before-switchover */ QemuSemaphore pause_sem; diff --git a/migration/savevm.c b/migration/savevm.c index 8d95e261f6..0f18dea49e 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1113,6 +1113,42 @@ void qemu_savevm_state_header(QEMUFile *f) } } +int qemu_sav
[PATCH v5 10/11] net/virtio: add failover support
This patch adds support to handle failover device pairs of a virtio-net device and a vfio-pci device, where the virtio-net acts as the standby device and the vfio-pci device as the primary. The general idea is that we have a pair of devices, a vfio-pci and a emulated (virtio-net) device. Before migration the vfio device is unplugged and data flows to the emulated device, on the target side another vfio-pci device is plugged in to take over the data-path. In the guest the net_failover module will pair net devices with the same MAC address. To achieve this we need: 1. Provide a callback function for the should_be_hidden DeviceListener. It is called when the primary device is plugged in. Evaluate the QOpt passed in to check if it is the matching primary device. It returns two values: - one to signal if the device to be added is the matching primary device - another one to signal to qdev if it should actually continue with adding the device or skip it. In the latter case it stores the device options in the VirtioNet struct and the device is added once the VIRTIO_NET_F_STANDBY feature is negotiated during virtio feature negotiation. If the virtio-net devices are not realized at the time the vfio-pci devices are realized, we need to connect the devices later. This way we make sure primary and standby devices can be specified in any order. 2. Register a callback for migration status notifier. When called it will unplug its primary device before the migration happens. 3. Register a callback for the migration code that checks if a device needs to be unplugged from the guest. Signed-off-by: Jens Freimann --- hw/net/virtio-net.c| 302 + include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + 3 files changed, 315 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 9f11422337..cccaf77bde 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -12,6 +12,7 @@ */ #include "qemu/osdep.h" +#include "qemu/atomic.h" #include "qemu/iov.h" #include "qemu/main-loop.h" #include "qemu/module.h" @@ -21,6 +22,10 @@ #include "net/tap.h" #include "qemu/error-report.h" #include "qemu/timer.h" +#include "qemu/option.h" +#include "qemu/option_int.h" +#include "qemu/config-file.h" +#include "qapi/qmp/qdict.h" #include "hw/virtio/virtio-net.h" #include "net/vhost_net.h" #include "net/announce.h" @@ -28,11 +33,15 @@ #include "qapi/error.h" #include "qapi/qapi-events-net.h" #include "hw/qdev-properties.h" +#include "qapi/qapi-types-migration.h" +#include "qapi/qapi-events-migration.h" #include "hw/virtio/virtio-access.h" #include "migration/misc.h" #include "standard-headers/linux/ethtool.h" #include "sysemu/sysemu.h" #include "trace.h" +#include "monitor/qdev.h" +#include "hw/pci/pci.h" #define VIRTIO_NET_VM_VERSION11 @@ -746,9 +755,99 @@ static inline uint64_t virtio_net_supported_guest_offloads(VirtIONet *n) return virtio_net_guest_offloads_by_features(vdev->guest_features); } +static void failover_add_primary(VirtIONet *n, Error **errp) +{ +Error *err = NULL; + +n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"), +n->primary_device_id); +if (n->primary_device_opts) { +n->primary_dev = qdev_device_add(n->primary_device_opts, &err); +if (err) { +qemu_opts_del(n->primary_device_opts); +} +if (n->primary_dev) { +n->primary_bus = n->primary_dev->parent_bus; +if (err) { +qdev_unplug(n->primary_dev, &err); +qdev_set_id(n->primary_dev, ""); + +} +} +} else { +error_setg(errp, "Primary device not found"); +error_append_hint(errp, "Virtio-net failover will not work. Make " +"sure primary device has parameter" +" net_failover_pair_id=\n"); +} +if (err) { +error_propagate(errp, err); +} +} + +static int is_my_primary(void *opaque, QemuOpts *opts, Error **errp) +{ +VirtIONet *n = opaque; +int ret = 0; + +const char *standby_id = qemu_opt_get(opts, "net_failover_pair_id"); + +if (standby_id != NULL && (g_strcmp0(standby_id, n->netclient_name) == 0)) { +n->primary_device_id = g_strdup(opts->id); +ret = 1; +} + +return ret; +} + +static DeviceState *virtio_net_find_primary(VirtIONet *n, Error **errp) +{ +DeviceState *dev = NULL; +Error *err = NULL; + +if (qemu_opts
[PATCH v5 05/11] qapi: add unplug primary event
This event is emitted when we sent a request to unplug a failover primary device from the Guest OS and it includes the device id of the primary device. Signed-off-by: Jens Freimann --- qapi/migration.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index 82feb5bd39..52e69e2868 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1448,3 +1448,22 @@ # Since: 3.0 ## { 'command': 'migrate-pause', 'allow-oob': true } + +## +# @UNPLUG_PRIMARY: +# +# Emitted from source side of a migration when migration state is +# WAIT_UNPLUG. Device was unplugged by guest operating system. +# Device resources in QEMU are kept on standby to be able to re-plug it in case +# of migration failure. +# +# @device_id: QEMU device id of the unplugged device +# +# Since: 4.2 +# +# Example: +# {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} } +# +## +{ 'event': 'UNPLUG_PRIMARY', + 'data': { 'device_id': 'str' } } -- 2.21.0
[PATCH v5 06/11] qapi: add failover negotiated event
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature was not negotiated during virtio feature negotiation. If this event is received it means any primary devices hotplugged before this were were never really added to QEMU devices. Signed-off-by: Jens Freimann --- qapi/net.json | 16 1 file changed, 16 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 728990f4fb..8c5f3f1fb2 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -737,3 +737,19 @@ ## { 'command': 'announce-self', 'boxed': true, 'data' : 'AnnounceParameters'} + +## +# @FAILOVER_NEGOTIATED: +# +# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation +# +# Since: 4.2 +# +# Example: +# +# <- { "event": "FAILOVER_NEGOTIATED", +# "data": {} } +# +## +{ 'event': 'FAILOVER_NEGOTIATED', + 'data': {} } -- 2.21.0
[PATCH v5 09/11] libqos: tolerate wait-unplug migration state
Signed-off-by: Jens Freimann --- tests/libqos/libqos.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c index d71557c5cb..f229eb2cb8 100644 --- a/tests/libqos/libqos.c +++ b/tests/libqos/libqos.c @@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri) break; } -if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) { +if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0) +|| (strcmp(st, "wait-unplug") == 0)) { qobject_unref(rsp); g_usleep(5000); continue; -- 2.21.0
[PATCH v5 02/11] pci: add option for net failover
This patch adds a net_failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. QEMU will exit with an error message otherwise. Signed-off-by: Jens Freimann --- hw/pci/pci.c | 17 + include/hw/pci/pci.h | 3 +++ 2 files changed, 20 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..fa9b5219f8 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -75,6 +75,8 @@ static Property pci_props[] = { QEMU_PCIE_LNKSTA_DLLLA_BITNR, true), DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present, QEMU_PCIE_EXTCAP_INIT_BITNR, true), +DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice, +net_failover_pair_id), DEFINE_PROP_END_OF_LIST() }; @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) ObjectClass *klass = OBJECT_CLASS(pc); Error *local_err = NULL; bool is_default_rom; +uint16_t class_id; /* initialize cap_present for pci_is_express() and pci_config_size(), * Note that hybrid PCIs are not set automatically and need to manage @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } +if (pci_dev->net_failover_pair_id) { +if (!pci_is_express(pci_dev)) { +error_setg(errp, "failover device is not a PCIExpress device"); +error_propagate(errp, local_err); +return; +} +class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE); +if (class_id != PCI_CLASS_NETWORK_ETHERNET) { +error_setg(errp, "failover device is not an Ethernet device"); +error_propagate(errp, local_err); +return; +} +} + /* rom loading */ is_default_rom = false; if (pci_dev->romfile == NULL && pc->romfile != NULL) { diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index f3f0ffd5fb..def5435685 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -352,6 +352,9 @@ struct PCIDevice { MSIVectorUseNotifier msix_vector_use_notifier; MSIVectorReleaseNotifier msix_vector_release_notifier; MSIVectorPollNotifier msix_vector_poll_notifier; + +/* ID of standby device in net_failover pair */ +char *net_failover_pair_id; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, -- 2.21.0
[PATCH v5 04/11] pci: mark device having guest unplug request pending
Set pending_deleted_event in DeviceState for failover primary devices that were successfully unplugged by the Guest OS. Signed-off-by: Jens Freimann --- hw/pci/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 19363ff8ce..08718188bb 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); if (dev->partially_hotplugged) { +dev->qdev.pending_deleted_event = false; return; } hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); @@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev, return; } +dev->pending_deleted_event = true; + /* In case user cancel the operation of multi-function hot-add, * remove the function that is unexposed to guest individually, * without interaction with guest. -- 2.21.0
[PATCH v5 07/11] migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figured out what works. For failover primary devices qdev_unplug() is called from the migration handler, i.e. during migration. This patch adds a flag to DeviceState which is set to false for all devices and makes an exception for PCI devices that are also primary devices in a failover pair. Signed-off-by: Jens Freimann --- hw/core/qdev.c | 1 + hw/pci/pci.c | 1 + include/hw/qdev-core.h | 1 + qdev-monitor.c | 2 +- 4 files changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index f786650446..dc1289da86 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -995,6 +995,7 @@ static void device_initfn(Object *obj) dev->instance_id_alias = -1; dev->realized = false; +dev->allow_unplug_during_migration = false; object_property_add_bool(obj, "realized", device_get_realized, device_set_realized, NULL); diff --git a/hw/pci/pci.c b/hw/pci/pci.c index fa9b5219f8..8fbf32d68c 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2116,6 +2116,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) error_propagate(errp, local_err); return; } +qdev->allow_unplug_during_migration = true; } /* rom loading */ diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 710981af36..1518495b1e 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -156,6 +156,7 @@ struct DeviceState { bool pending_deleted_event; QemuOpts *opts; int hotplugged; +bool allow_unplug_during_migration; BusState *parent_bus; QLIST_HEAD(, NamedGPIOList) gpios; QLIST_HEAD(, BusState) child_bus; diff --git a/qdev-monitor.c b/qdev-monitor.c index 676a759fb4..bc6a41fa37 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) return; } -if (!migration_is_idle()) { +if (!migration_is_idle() && !dev->allow_unplug_during_migration) { error_setg(errp, "device_del not allowed while migrating"); return; } -- 2.21.0
[PATCH v5 03/11] pci: mark devices partially unplugged
Only the guest unplug request was triggered. This is needed for the failover feature. In case of a failed migration we need to plug the device back to the guest. Signed-off-by: Jens Freimann --- hw/pci/pcie.c| 3 +++ include/hw/pci/pci.h | 1 + 2 files changed, 4 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index a6beb567bd..19363ff8ce 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) { HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); +if (dev->partially_hotplugged) { +return; +} hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); object_unparent(OBJECT(dev)); } diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index def5435685..7b7eac845c 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache; struct PCIDevice { DeviceState qdev; +bool partially_hotplugged; /* PCI config space */ uint8_t *config; -- 2.21.0
[PATCH v5 0/11] add failover feature for assigned network devices
stdio \ -net none \ -qmp unix:/tmp/qmp.socket,server,nowait \ -monitor telnet:127.0.0.1:,server,nowait \ -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \ -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \ -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \ -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \ -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id =net1 \ /root/rhel-guest-image-8.0-1781.x86_64.qcow2 I'm grateful for any remarks or ideas! Thanks! regards, Jens Jens Freimann (11): qdev/qbus: add hidden device support pci: add option for net failover pci: mark devices partially unplugged pci: mark device having guest unplug request pending qapi: add unplug primary event qapi: add failover negotiated event migration: allow unplug during migration for failover devices migration: add new migration state wait-unplug libqos: tolerate wait-unplug migration state net/virtio: add failover support vfio: unplug failover primary device before migration hw/core/qdev.c | 24 +++ hw/net/virtio-net.c| 302 + hw/pci/pci.c | 18 ++ hw/pci/pcie.c | 6 + hw/vfio/pci.c | 26 ++- hw/vfio/pci.h | 1 + include/hw/pci/pci.h | 4 + include/hw/qdev-core.h | 30 include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + include/migration/vmstate.h| 2 + migration/migration.c | 21 +++ migration/migration.h | 3 + migration/savevm.c | 36 migration/savevm.h | 2 + qapi/migration.json| 24 ++- qapi/net.json | 16 ++ qdev-monitor.c | 43 - tests/libqos/libqos.c | 3 +- vl.c | 6 +- 20 files changed, 565 insertions(+), 15 deletions(-) -- 2.21.0
[PATCH v5 01/11] qdev/qbus: add hidden device support
This adds support for hiding a device to the qbus and qdev APIs. The first user of this will be the virtio-net failover feature but the API introduced with this patch could be used to implement other features as well, for example hiding pci devices when a pci bus is powered off. qdev_device_add() is modified to check for a net_failover_pair_id argument in the option string. A DeviceListener callback should_be_hidden() is added. It can be used by a standby device to inform qdev that this device should not be added now. The standby device handler can store the device options to plug the device in at a later point in time. One reason for hiding the device is that we don't want to expose both devices to the guest kernel until the respective virtio feature bit VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be handled correctly by the guest. More information on the kernel feature this is using: https://www.kernel.org/doc/html/latest/networking/net_failover.html An example where the primary device is a vfio-pci device and the standby device is a virtio-net device: A device is hidden when it has an "net_failover_pair_id" option, e.g. -device virtio-net-pci,...,failover=on,... -device vfio-pci,...,net_failover_pair_id=net1,... Signed-off-by: Jens Freimann Reviewed-by: Cornelia Huck --- hw/core/qdev.c | 23 +++ include/hw/qdev-core.h | 29 + qdev-monitor.c | 41 + vl.c | 6 -- 4 files changed, 93 insertions(+), 6 deletions(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index cbad6c1d55..f786650446 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -212,6 +212,29 @@ void device_listener_unregister(DeviceListener *listener) QTAILQ_REMOVE(&device_listeners, listener, link); } +bool qdev_should_hide_device(QemuOpts *opts) +{ +int rc = -1; +DeviceListener *listener; + +QTAILQ_FOREACH(listener, &device_listeners, link) { + if (listener->should_be_hidden) { +/* should_be_hidden_will return + * 1 if device matches opts and it should be hidden + * 0 if device matches opts and should not be hidden + * -1 if device doesn't match opts + */ +rc = listener->should_be_hidden(listener, opts); +} + +if (rc > 0) { +break; +} +} + +return rc > 0; +} + void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, int required_for_version) { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index aa123f88cb..710981af36 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp); * respective parent types. * * + * + * # Hiding a device # + * To hide a device, a DeviceListener function should_be_hidden() needs to + * be registered. + * It can be used to defer adding a device and therefore hide it from the + * guest. The handler registering to this DeviceListener can save the QOpts + * passed to it for re-using it later and must return that it wants the device + * to be/remain hidden or not. When the handler function decides the device + * shall not be hidden it will be added in qdev_device_add() and + * realized as any other device. Otherwise qdev_device_add() will return early + * without adding the device. The guest will not see a "hidden" device + * until it was marked don't hide and qdev_device_add called again. + * */ typedef struct DeviceClass { /*< private >*/ @@ -154,6 +167,12 @@ struct DeviceState { struct DeviceListener { void (*realize)(DeviceListener *listener, DeviceState *dev); void (*unrealize)(DeviceListener *listener, DeviceState *dev); +/* + * This callback is called upon init of the DeviceState and allows to + * inform qdev that a device should be hidden, depending on the device + * opts, for example, to hide a standby device. + */ +int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts); QTAILQ_ENTRY(DeviceListener) link; }; @@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus) void device_listener_register(DeviceListener *listener); void device_listener_unregister(DeviceListener *listener); +/** + * @qdev_should_hide_device: + * @opts: QemuOpts as passed on cmdline. + * + * Check if a device should be added. + * When a device is added via qdev_device_add() this will be called, + * and return if the device should be added now or not. + */ +bool qdev_should_hide_device(QemuOpts *opts); + #endif diff --git a/qdev-monitor.c b/qdev-monitor.c index 148df9cacf..676a759fb4 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -32,9 +32,11 @@ #include "qemu/help_option.h" #include "qemu/option.h" #
Re: [PATCH 02/11] pci: add option for net failover
On Mon, Oct 21, 2019 at 01:01:22PM -0600, Alex Williamson wrote: On Mon, 21 Oct 2019 20:45:46 +0200 Jens Freimann wrote: On Mon, Oct 21, 2019 at 08:58:23AM -0600, Alex Williamson wrote: >On Fri, 18 Oct 2019 22:20:31 +0200 >Jens Freimann wrote: > >> This patch adds a net_failover_pair_id property to PCIDev which is >> used to link the primary device in a failover pair (the PCI dev) to >> a standby (a virtio-net-pci) device. >> >> It only supports ethernet devices. Also currently it only supports >> PCIe devices. QEMU will exit with an error message otherwise. > >Doesn't the PCIe device also need to be hotpluggable? We can have PCIe >devices attached to the root bus where we don't currently support >hotplug. Thanks, How do I recognize such a device? hotpluggable in DeviceClass? I wouldn't expect it to be a device property, it's more of a function of whether the bus that it's attached to supports hotplug, so probably something related to the bus controller. Thanks, IIUC this is checked for in qdev_device_add() by this: if (qdev_hotplug && bus && !qbus_is_hotpluggable(bus)) { error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name); return NULL; } So qemu will fail with 'Bus pcie.0 does not allow hotplug' when we try to hotplug the device. I tried a primary with bus=pcie.0 and it aborted. Aborting qemu is not nice so I'll make it print a error message QERR_BUS_NO_HOTPLUG instead but let it continue. This will be a change in the virtio-net patch, not in this one. regards, Jens
Re: [PATCH 11/11] vfio: unplug failover primary device before migration
On Mon, Oct 21, 2019 at 09:19:20AM -0600, Alex Williamson wrote: On Fri, 18 Oct 2019 22:20:40 +0200 Jens Freimann wrote: [...] +if (!pdev->net_failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +goto error; +} +} else { +pdev->qdev.allow_unplug_during_migration = true; Why is this unique to vfio-pci, shouldn't any device set as the primary for a failover allow unplug during migration, and therefore should it be done in the core code rather than device driver? With the net_failover_pair_id set in PCIDevice, I should be able to test with an e1000e NIC as primary, but this suggests they wouldn't be handled identically elsewhere. Thanks, I assume you're talking about pci core code not qdev core. Failover is pci specific at this time (only the part to hide devices is more generic and is in qdev code). I can set the flag to allow migration in pci_qdev_realize(). regards, Jens
Re: [PATCH 02/11] pci: add option for net failover
On Mon, Oct 21, 2019 at 08:58:23AM -0600, Alex Williamson wrote: On Fri, 18 Oct 2019 22:20:31 +0200 Jens Freimann wrote: This patch adds a net_failover_pair_id property to PCIDev which is used to link the primary device in a failover pair (the PCI dev) to a standby (a virtio-net-pci) device. It only supports ethernet devices. Also currently it only supports PCIe devices. QEMU will exit with an error message otherwise. Doesn't the PCIe device also need to be hotpluggable? We can have PCIe devices attached to the root bus where we don't currently support hotplug. Thanks, How do I recognize such a device? hotpluggable in DeviceClass? regards, Jens
Re: [PATCH 11/11] vfio: unplug failover primary device before migration
On Mon, Oct 21, 2019 at 03:45:04PM +0200, Cornelia Huck wrote: On Fri, 18 Oct 2019 22:20:40 +0200 Jens Freimann wrote: As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann --- hw/vfio/pci.c | 31 +-- hw/vfio/pci.h | 1 + 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 12fac39804..a15b83c6b6 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -40,6 +40,9 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "migration/blocker.h" +#include "qemu/option.h" +#include "qemu/option_int.h" #define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) @@ -2712,12 +2715,26 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) int i, ret; bool is_mdev; +if (!pdev->net_failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +goto error; This looks wrong, you haven't set up vbasedev.name yet. You're right. +} +} else { +pdev->qdev.allow_unplug_during_migration = true; +} + if (!vdev->vbasedev.sysfsdev) { if (!(~vdev->host.domain || ~vdev->host.bus || ~vdev->host.slot || ~vdev->host.function)) { error_setg(errp, "No provided host device"); error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F " "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n"); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); return; } vdev->vbasedev.sysfsdev = @@ -2729,6 +2746,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) if (stat(vdev->vbasedev.sysfsdev, &st) < 0) { error_setg_errno(errp, errno, "no such host device"); error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.sysfsdev); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); return; } Might be a bit easier cleanup-wise if you set up the blocker resp. allow unplug during migration only here. The only difference is that you'll get a different error message when trying to set up a non-failover device with invalid specs on a migratable-only setup. Yes, so I moved it to this place now. This saves me cleanup up the migration blocker in the above two cases. I don't jump to error if adding the migration blocker failed, I just have to free the blocker Error and can return. @@ -3008,6 +3027,8 @@ out_teardown: vfio_bars_exit(vdev); error: error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); Shouldn't you check whether migration_block has been set up, like in the finalize routine? yes, added the same check here. Thanks Conny! regards, Jens
Re: [PATCH 01/11] qdev/qbus: add hidden device support
On Mon, Oct 21, 2019 at 01:53:58PM +0100, Peter Maydell wrote: On Fri, 18 Oct 2019 at 21:22, Jens Freimann wrote: This adds support for hiding a device to the qbus and qdev APIs. The first user of this will be the virtio-net failover feature but the API introduced with this patch could be used to implement other features as well, for example hiding pci devices when a pci bus is powered off. In what sense is a hidden device hidden? Is it hidden from the guest (in what way) ? Is it hidden from the QMP/HMP monitor commands? Is it hidden from QEMU functions that iterate through the qbus, or that walk the QOM tree ? What does a hidden device have to do to be successfully hidden ? Is it completely disconnected from the rest of the simulation, or does it itself have to avoid doing things like asserting IRQ lines ? Expanding the DeviceClass doc comment might be a good place to answer questions like the above and generally describe the 'hidden device' mechanism. ok, I will add to the DeviceClass comment. }; @@ -451,4 +457,6 @@ static inline bool qbus_is_hotpluggable(BusState *bus) void device_listener_register(DeviceListener *listener); void device_listener_unregister(DeviceListener *listener); +bool qdev_should_hide_device(QemuOpts *opts); New globally visible functions should always have doc-comment format documentation comments, please. Will add this too. Thanks for the review! regards, Jens
Re: [PATCH 01/11] qdev/qbus: add hidden device support
On Mon, Oct 21, 2019 at 02:44:08PM +0200, Cornelia Huck wrote: On Fri, 18 Oct 2019 22:20:30 +0200 Jens Freimann wrote: This adds support for hiding a device to the qbus and qdev APIs. The first user of this will be the virtio-net failover feature but the API introduced with this patch could be used to implement other features as well, for example hiding pci devices when a pci bus is powered off. qdev_device_add() is modified to check for a net_failover_pair_id argument in the option string. A DeviceListener callback should_be_hidden() is added. It can be used by a standby device to inform qdev that this device should not be added now. The standby device handler can store the device options to plug the device in at a later point in time. One reason for hiding the device is that we don't want to expose both devices to the guest kernel until the respective virtio feature bit VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be handled correctly by the guest. More information on the kernel feature this is using: https://www.kernel.org/doc/html/latest/networking/net_failover.html An example where the primary device is a vfio-pci device and the standby device is a virtio-net device: A device is hidden when it has an "net_failover_pair_id" option, e.g. -device virtio-net-pci,...,failover=on,... -device vfio-pci,...,net_failover_pair_id=net1,... Signed-off-by: Jens Freimann --- hw/core/qdev.c | 23 +++ include/hw/qdev-core.h | 8 qdev-monitor.c | 36 +--- vl.c | 6 -- 4 files changed, 68 insertions(+), 5 deletions(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index cbad6c1d55..89c134ec53 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -212,6 +212,29 @@ void device_listener_unregister(DeviceListener *listener) QTAILQ_REMOVE(&device_listeners, listener, link); } +bool qdev_should_hide_device(QemuOpts *opts) +{ +int rc; Initialize to 0? Yes, that's what the test bot found as well. Fixed it already. Though I initialize to -1, otherwise all devices will be hidden. +DeviceListener *listener; + +QTAILQ_FOREACH(listener, &device_listeners, link) { + if (listener->should_be_hidden) { +/* should_be_hidden_will return + * 1 if device matches opts and it should be hidden + * 0 if device matches opts and should not be hidden + * -1 if device doesn't match ops + */ +rc = listener->should_be_hidden(listener, opts); +} + +if (rc > 0) { +break; +} +} + +return rc > 0; +} + void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, int required_for_version) { (...) +static bool should_hide_device(QemuOpts *opts) +{ +if (qemu_opt_foreach(opts, is_failover_device, opts, NULL) == 0) { +return false; +} +return true; +} I still think you should turn the check around to make it easier to extend in the future, but this is fine as well. Sorry, I've gone back and forth on this and ended up with the same. I'll take another look at it. (...) With the rc thing changed, Reviewed-by: Cornelia Huck Thanks! regards, Jens
Re: [PATCH v4 0/11] add failover feature for assigned network devices
On Sat, Oct 19, 2019 at 08:12:27AM -0700, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20191018202040.30349-1-jfreim...@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === CC hw/core/or-irq.o CC hw/core/split-irq.o /tmp/qemu-test/src/hw/core/qdev.c: In function 'qdev_should_hide_device': /tmp/qemu-test/src/hw/core/qdev.c:235:5: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized] return rc > 0; hmpf, always run all tests especially after last minute changes on friday afternoon. I'll fix this. regards, Jens ^ cc1: all warnings being treated as errors make: *** [hw/core/qdev.o] Error 1 make: *** Waiting for unfinished jobs Traceback (most recent call last): File "./tests/docker/docker.py", line 662, in --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ozx9dk_0/src/docker-src.2019-10-19-11.10.07.7972:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309 make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ozx9dk_0/src' make: *** [docker-run-test-quick@centos7] Error 2 real2m19.619s user0m8.409s The full log is available at http://patchew.org/logs/20191018202040.30349-1-jfreim...@redhat.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH 05/11] qapi: add unplug primary event
On Fri, Oct 18, 2019 at 03:28:36PM -0500, Eric Blake wrote: On 10/18/19 3:20 PM, Jens Freimann wrote: This event is emitted when we sent a request to unplug a failover primary device from the Guest OS and it includes the device id of the primary device. Signed-off-by: Jens Freimann --- qapi/migration.json | 19 +++ 1 file changed, 19 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index 82feb5bd39..52e69e2868 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1448,3 +1448,22 @@ # Since: 3.0 ## { 'command': 'migrate-pause', 'allow-oob': true } + +## +# @UNPLUG_PRIMARY: +# +# Emitted from source side of a migration when migration state is +# WAIT_UNPLUG. Device was unplugged by guest operating system. +# Device resources in QEMU are kept on standby to be able to re-plug it in case +# of migration failure. +# +# @device_id: QEMU device id of the unplugged device +# +# Since: 4.2 +# +# Example: +# {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} } Unless there is a strong reason in favor of 'device_id' (such as consistency with a similar event), our naming convention prefers this to be 'device-id'. No reason, I'll change it. Thanks Eric! regards, Jens
[PATCH 07/11] migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figured out what works. For failover primary devices qdev_unplug() is called from the migration handler, i.e. during migration. This patch adds a flag to DeviceState which is set to false for all devices and makes an exception for vfio-pci devices that are also primary devices in a failover pair. Signed-off-by: Jens Freimann --- hw/core/qdev.c | 1 + include/hw/qdev-core.h | 1 + qdev-monitor.c | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 89c134ec53..b1be568af3 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -995,6 +995,7 @@ static void device_initfn(Object *obj) dev->instance_id_alias = -1; dev->realized = false; +dev->allow_unplug_during_migration = false; object_property_add_bool(obj, "realized", device_get_realized, device_set_realized, NULL); diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 28f594a47d..6b690e85b1 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -143,6 +143,7 @@ struct DeviceState { bool pending_deleted_event; QemuOpts *opts; int hotplugged; +bool allow_unplug_during_migration; BusState *parent_bus; QLIST_HEAD(, NamedGPIOList) gpios; QLIST_HEAD(, BusState) child_bus; diff --git a/qdev-monitor.c b/qdev-monitor.c index 508d85df87..265ab4f0d5 100644 --- a/qdev-monitor.c +++ b/qdev-monitor.c @@ -848,7 +848,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) return; } -if (!migration_is_idle()) { +if (!migration_is_idle() && !dev->allow_unplug_during_migration) { error_setg(errp, "device_del not allowed while migrating"); return; } -- 2.21.0
[PATCH 10/11] net/virtio: add failover support
This patch adds support to handle failover device pairs of a virtio-net device and a vfio-pci device, where the virtio-net acts as the standby device and the vfio-pci device as the primary. The general idea is that we have a pair of devices, a vfio-pci and a emulated (virtio-net) device. Before migration the vfio device is unplugged and data flows to the emulated device, on the target side another vfio-pci device is plugged in to take over the data-path. In the guest the net_failover module will pair net devices with the same MAC address. To achieve this we need: 1. Provide a callback function for the should_be_hidden DeviceListener. It is called when the primary device is plugged in. Evaluate the QOpt passed in to check if it is the matching primary device. It returns two values: - one to signal if the device to be added is the matching primary device - another one to signal to qdev if it should actually continue with adding the device or skip it. In the latter case it stores the device options in the VirtioNet struct and the device is added once the VIRTIO_NET_F_STANDBY feature is negotiated during virtio feature negotiation. If the virtio-net devices are not realized at the time the vfio-pci devices are realized, we need to connect the devices later. This way we make sure primary and standby devices can be specified in any order. 2. Register a callback for migration status notifier. When called it will unplug its primary device before the migration happens. 3. Register a callback for the migration code that checks if a device needs to be unplugged from the guest. Signed-off-by: Jens Freimann --- hw/net/virtio-net.c| 282 + include/hw/virtio/virtio-net.h | 12 ++ include/hw/virtio/virtio.h | 1 + 3 files changed, 295 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 9f11422337..afe113f083 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -12,6 +12,7 @@ */ #include "qemu/osdep.h" +#include "qemu/atomic.h" #include "qemu/iov.h" #include "qemu/main-loop.h" #include "qemu/module.h" @@ -21,6 +22,10 @@ #include "net/tap.h" #include "qemu/error-report.h" #include "qemu/timer.h" +#include "qemu/option.h" +#include "qemu/option_int.h" +#include "qemu/config-file.h" +#include "qapi/qmp/qdict.h" #include "hw/virtio/virtio-net.h" #include "net/vhost_net.h" #include "net/announce.h" @@ -28,11 +33,15 @@ #include "qapi/error.h" #include "qapi/qapi-events-net.h" #include "hw/qdev-properties.h" +#include "qapi/qapi-types-migration.h" +#include "qapi/qapi-events-migration.h" #include "hw/virtio/virtio-access.h" #include "migration/misc.h" #include "standard-headers/linux/ethtool.h" #include "sysemu/sysemu.h" #include "trace.h" +#include "monitor/qdev.h" +#include "hw/pci/pci.h" #define VIRTIO_NET_VM_VERSION11 @@ -746,9 +755,85 @@ static inline uint64_t virtio_net_supported_guest_offloads(VirtIONet *n) return virtio_net_guest_offloads_by_features(vdev->guest_features); } +static void failover_add_primary(VirtIONet *n) +{ +Error *err = NULL; + +n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"), +n->primary_device_id); +if (n->primary_device_opts) { +n->primary_dev = qdev_device_add(n->primary_device_opts, &err); +if (err) { +qemu_opts_del(n->primary_device_opts); +} +if (n->primary_dev) { +n->primary_bus = n->primary_dev->parent_bus; +if (err) { +qdev_unplug(n->primary_dev, &err); +qdev_set_id(n->primary_dev, ""); + +} +} +} +if (err) { +error_report_err(err); +} +} + +static int is_my_primary(void *opaque, QemuOpts *opts, Error **errp) +{ +VirtIONet *n = opaque; +int ret = 0; + +const char *standby_id = qemu_opt_get(opts, "net_failover_pair_id"); + +if (standby_id != NULL && (g_strcmp0(standby_id, n->netclient_name) == 0)) { +n->primary_device_id = g_strdup(opts->id); +ret = 1; +} + +return ret; +} + +static DeviceState *virtio_net_find_primary(VirtIONet *n, Error *err) +{ +DeviceState *dev = NULL; + +if (qemu_opts_foreach(qemu_find_opts("device"), + is_my_primary, n, &err)) { +if (n->primary_device_id) { +dev = qdev_find_recursive(sysbus_get_default(), +n->primary_device_id); +} else { +return NULL; +} +} +return dev; +} + + +
[PATCH 11/11] vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann --- hw/vfio/pci.c | 31 +-- hw/vfio/pci.h | 1 + 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 12fac39804..a15b83c6b6 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -40,6 +40,9 @@ #include "pci.h" #include "trace.h" #include "qapi/error.h" +#include "migration/blocker.h" +#include "qemu/option.h" +#include "qemu/option_int.h" #define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) @@ -2712,12 +2715,26 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) int i, ret; bool is_mdev; +if (!pdev->net_failover_pair_id) { +error_setg(&vdev->migration_blocker, +"VFIO device doesn't support migration"); +ret = migrate_add_blocker(vdev->migration_blocker, &err); +if (err) { +error_propagate(errp, err); +goto error; +} +} else { +pdev->qdev.allow_unplug_during_migration = true; +} + if (!vdev->vbasedev.sysfsdev) { if (!(~vdev->host.domain || ~vdev->host.bus || ~vdev->host.slot || ~vdev->host.function)) { error_setg(errp, "No provided host device"); error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F " "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n"); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); return; } vdev->vbasedev.sysfsdev = @@ -2729,6 +2746,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) if (stat(vdev->vbasedev.sysfsdev, &st) < 0) { error_setg_errno(errp, errno, "no such host device"); error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.sysfsdev); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); return; } @@ -3008,6 +3027,8 @@ out_teardown: vfio_bars_exit(vdev); error: error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name); +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); } static void vfio_instance_finalize(Object *obj) @@ -3019,6 +3040,10 @@ static void vfio_instance_finalize(Object *obj) vfio_bars_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); +if (vdev->migration_blocker) { +migrate_del_blocker(vdev->migration_blocker); +error_free(vdev->migration_blocker); +} /* * XXX Leaking igd_opregion is not an oversight, we can't remove the * fw_cfg entry therefore leaking this allocation seems like the safest @@ -3151,11 +3176,6 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_END_OF_LIST(), }; -static const VMStateDescription vfio_pci_vmstate = { -.name = "vfio-pci", -.unmigratable = 1, -}; - static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -3163,7 +3183,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) dc->reset = vfio_pci_reset; dc->props = vfio_pci_dev_properties; -dc->vmsd = &vfio_pci_vmstate; dc->desc = "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize = vfio_realize; diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 834a90d646..b329d50338 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice { bool no_vfio_ioeventfd; bool enable_ramfb; VFIODisplay *dpy; +Error *migration_blocker; } VFIOPCIDevice; uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); -- 2.21.0
[PATCH 04/11] pci: mark device having guest unplug request pending
Set pending_deleted_event in DeviceState for failover primary devices that were successfully unplugged by the Guest OS. Signed-off-by: Jens Freimann --- hw/pci/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 19363ff8ce..08718188bb 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque) HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev)); if (dev->partially_hotplugged) { +dev->qdev.pending_deleted_event = false; return; } hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort); @@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev, return; } +dev->pending_deleted_event = true; + /* In case user cancel the operation of multi-function hot-add, * remove the function that is unexposed to guest individually, * without interaction with guest. -- 2.21.0