Re: [PATCH v3 3/4] failover: really display a warning when the primary device is not found

2021-02-12 Thread Jens Freimann

On Fri, Feb 12, 2021 at 02:52:49PM +0100, Laurent Vivier wrote:

In failover_add_primary(), we search the id of the failover device by
scanning the list of the devices in the opts list to find a device with
a failover_pair_id equals to the id of the virtio-net device.

If the failover_pair_id is not found, QEMU ignores the primary
device silently (which also means it will not be hidden and
it will be enabled directly at boot).

After that, we search the id in the opts list to do a qdev_device_add()
with it. The device will be always found as otherwise we had exited
before, and thus the warning is never displayed.

Fix that by moving the error report to the first exit condition.
Also add a g_assert() to be sure the compiler will not complain
about a possibly NULL pointer.

Signed-off-by: Laurent Vivier 
---
hw/net/virtio-net.c | 20 +---
1 file changed, 9 insertions(+), 11 deletions(-)



Thank you Laurent!

Reviewed-by: Jens Freimann 

regards,
Jens




Re: [PATCH 2/2] virtio-net: add missing object_unref()

2021-02-08 Thread Jens Freimann

On Sat, Feb 06, 2021 at 01:39:55PM +0100, Laurent Vivier wrote:

failover_add_primary() calls qdev_device_add() and doesn't unref
the device. Because of that, when the device is unplugged a reference
is remaining and prevents the cleanup of the object.

This prevents to be able to plugin back the failover primary device,
with errors like:

 (qemu) device_add 
vfio-pci,host=:41:00.0,id=hostdev0,bus=root.3,failover_pair_id=net0
 (qemu) device_del hostdev0

We can check with "info qtree" and "info pci" that the device has been removed, 
and then:

 (qemu) device_add 
vfio-pci,host=:41:00.0,id=hostdev1,bus=root.3,failover_pair_id=net0
 Error: vfio :41:00.0: device is already attached
 (qemu) device_add 
vfio-pci,host=:41:00.0,id=hostdev0,bus=root.3,failover_pair_id=net0
 qemu-kvm: Duplicate ID 'hostdev0' for device

Fixes: 21e8709b29cd ("failover: Remove primary_dev member")
Cc: quint...@redhat.com
Signed-off-by: Laurent Vivier 
---
hw/net/virtio-net.c | 2 ++
1 file changed, 2 insertions(+)



Reviewed-by: Jens Freimann 

Thank you Laurent!

regards,
Jens 





Re: [PATCH 1/2] pci: cleanup failover sanity check

2021-02-08 Thread Jens Freimann

On Sat, Feb 06, 2021 at 01:39:54PM +0100, Laurent Vivier wrote:

Commit a1190ab628 has added a "allow_unplug_during_migration = true" at
the end of the main "if" block, so it is not needed to set it anymore
in the previous checking.

Remove it, to have only sub-ifs that check for needed conditions and exit
if one fails.

Fixes: 4f5b6a05a4e7 ("pci: add option for net failover")
Fixes: a1190ab628c0 ("migration: allow unplug during migration for failover 
devices")
Cc: jfreim...@redhat.com
Signed-off-by: Laurent Vivier 
---
hw/pci/pci.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)



Reviewed-by: Jens Freimann 

Thank you Laurent!

regards,
Jens 





Re: Documents not in sphinx toctree

2020-11-05 Thread Jens Freimann

On Thu, Nov 05, 2020 at 04:19:00PM +, Peter Maydell wrote:

On Thu, 5 Nov 2020 at 16:14, Marc-André Lureau
 wrote:

By running sphinx over the docs/ directory (like readthedocs.org presumably 
does), it finds a couple of rst documents that are not referenced:
- cpu-hotplug.rst
- microvm.rst
- pr-manager.rst
- virtio-net-failover.rst


Given the current structure of the content in
https://qemu.readthedocs.io/en/latest/,
would adding this as a new bullet in "QEMU System Emulation User’s
Guide" be the right thing to do?  


regards,
Jens 





Re: [PULL 1/2] error: Remove NULL checks on error_propagate() calls (again)

2020-10-09 Thread Jens Freimann

On Fri, Oct 09, 2020 at 08:48:57AM +0200, Markus Armbruster wrote:

Patch created mechanically by rerunning:

   $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \
--macro-file scripts/cocci-macro-file.h \
--use-gitgrep .

Cc: Jens Freimann 
Cc: Hailiang Zhang 
Cc: Juan Quintela 
Signed-off-by: Markus Armbruster 
Message-Id: <20200722084048.1726105-4-arm...@redhat.com>
Reviewed-by: Eric Blake 
---
hw/net/virtio-net.c   | 8 ++--
migration/colo.c  | 4 +---
migration/migration.c | 8 ++--
3 files changed, 5 insertions(+), 15 deletions(-)



Reviewed-by: Jens Freimann 





Re: [PATCH] virtiofsd: avoid /proc/self/fd tempdir

2020-10-06 Thread Jens Freimann

On Tue, Oct 06, 2020 at 10:58:26AM +0100, Stefan Hajnoczi wrote:

In order to prevent /proc/self/fd escapes a temporary directory is
created where /proc/self/fd is bind-mounted. This doesn't work on
read-only file systems.

Avoid the temporary directory by bind-mounting /proc/self/fd over /proc.
This does not affect other processes since we remounted / with MS_REC |
MS_SLAVE. /proc must exist and virtiofsd does not use it so it's safe to
do this.

Path traversal can be tested with the following function:

 static void test_proc_fd_escape(struct lo_data *lo)
 {
 int fd;
 int level = 0;
 ino_t last_ino = 0;

 fd = lo->proc_self_fd;
 for (;;) {
 struct stat st;

 if (fstat(fd, ) != 0) {
 perror("fstat");
 return;
 }
 if (last_ino && st.st_ino == last_ino) {
 fprintf(stderr, "inode number unchanged, stopping\n");
 return;
 }
 last_ino = st.st_ino;

 fprintf(stderr, "Level %d dev %lu ino %lu\n", level,
 (unsigned long)st.st_dev,
 (unsigned long)last_ino);
 fd = openat(fd, "..", O_PATH | O_DIRECTORY | O_NOFOLLOW);
 level++;
 }
 }

Before and after this patch only Level 0 is displayed. Without
/proc/self/fd bind-mount protection it is possible to traverse parent
directories.

Fixes: 397ae982f4df4 ("virtiofsd: jail lo->proc_self_fd")
Cc: Miklos Szeredi 
Cc: Jens Freimann 
Signed-off-by: Stefan Hajnoczi 


Thanks Stefan, it fixes the problem we had!

Tested-by: Jens Freimann 
Reviewed-by: Jens Freimann  


regards,
Jens 





Re: [RFC 0/1] tools/virtiofsd: don't create temporary directory in /

2020-10-05 Thread Jens Freimann

On Thu, Oct 01, 2020 at 08:15:18AM +0200, Jens Freimann wrote:

I'm sending this as an RFC because:
Maybe just prepending "/tmp" is not generic enough and we should make it
somehow configurable or use $TMPDIR. Also there might be security
implications I'm not aware of.
The process is running with container_kvm_t context which also needs
a change to be allowed to create files in tmpfs to make it work.


Fabiano had the idea to use a glib function to create the
temporary directory. It would be good because it uses the $TMPDIR env
variable. 


But before we decide about glib or not: the change is in the call
chain of setup_sandbox() and there was a question what other implications
that has. What do you think?

regards,
Jens  





[RFC 0/1] tools/virtiofsd: don't create temporary directory in /

2020-10-01 Thread Jens Freimann
When running a Kata container with virtiofs in OpenShift/k8s I get a
"Operation not permitted" error from a mkdtemp() call in virtiofsd
because it is trying to create a directory like /virtiofsd.11RAND

To avoid this change in virtiofsd, I've tried to set the TMPDIR
environment variable for the virtiofsd process, hoping that mkdtemp()
would use it, but it does not. Looking at glibc code it seems to be used
by tmpfile() etc. only. 

I'm sending this as an RFC because:
Maybe just prepending "/tmp" is not generic enough and we should make it
somehow configurable or use $TMPDIR. Also there might be security
implications I'm not aware of.
The process is running with container_kvm_t context which also needs
a change to be allowed to create files in tmpfs to make it work.



Jens Freimann (1):
  tools/virtiofsd: create tmpdir in /tmp

 tools/virtiofsd/passthrough_ll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.26.2




[RFC 1/1] tools/virtiofsd: create temporary directory in /tmp

2020-10-01 Thread Jens Freimann
mkdtemp() will try to create a current directory in the working
directory of the process. In this case it's trying to create it in /.
This is a problem when the process doesn't have write access there.

This patch changes the template string and prepends "/tmp" which is
typically writable.

Signed-off-by: Jens Freimann 
---
 tools/virtiofsd/passthrough_ll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0b229ebd57..f79bcce0d7 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2393,7 +2393,7 @@ static void setup_wait_parent_capabilities(void)
 static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
 pid_t child;
-char template[] = "virtiofsd-XX";
+char template[] = "/tmp/virtiofsd-XX";
 char *tmpdir;
 
 /*
-- 
2.26.2




Re: [PATCH 02/22] pci: Delete useless error_propagate()

2020-06-22 Thread Jens Freimann

On Mon, Jun 22, 2020 at 12:42:30PM +0200, Markus Armbruster wrote:

Cc: Jens Freimann 
Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Signed-off-by: Markus Armbruster 
---
hw/pci/pci.c | 3 ---
1 file changed, 3 deletions(-)



Reviewed-by: Jens Freimann 

Thanks!

regards,
Jens 





Re: [PATCH 01/22] net/virtio: Fix failover_replug_primary() return value regression

2020-06-22 Thread Jens Freimann

On Mon, Jun 22, 2020 at 12:42:29PM +0200, Markus Armbruster wrote:

Commit 150ab54aa6 "net/virtio: fix re-plugging of primary device"
fixed failover_replug_primary() to return false on failure.  Commit
5a0948d36c "net/virtio: Fix failover error handling crash bugs" broke
it again for hotplug_handler_plug() failure.  Unbreak it.

Commit 5a0948d36c4cbc1c5534afac6fee99de55245d12

Fixes: 5a0948d36c4cbc1c5534afac6fee99de55245d12
Cc: Jens Freimann 
Cc: Michael S. Tsirkin 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Markus Armbruster 
---
hw/net/virtio-net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Jens Freimann 

Thanks!

regards,
Jens 





Re: [PATCH] migration: Optimization about wait-unplug migration state

2020-02-05 Thread Jens Freimann

On Tue, Feb 04, 2020 at 01:08:41PM +0800, Keqian Zhu wrote:

qemu_savevm_nr_failover_devices() is originally designed to
get the number of failover devices, but it actually returns
the number of "unplug-pending" failover devices now. Moreover,
what drives migration state to wait-unplug should be the number
of "unplug-pending" failover devices, not all failover devices.

We can also notice that qemu_savevm_state_guest_unplug_pending()
and qemu_savevm_nr_failover_devices() is equivalent almost (from
the code view). So the latter is incorrect semantically and
useless, just delete it.

In the qemu_savevm_state_guest_unplug_pending(), once hit a
unplug-pending failover device, then it can return true right
now to save cpu time.

Signed-off-by: Keqian Zhu 
---
Cc: jfreim...@redhat.com
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
---
migration/migration.c |  2 +-
migration/savevm.c| 24 +++-
migration/savevm.h|  1 -
3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3a21a4686c..deedc968cf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -,7 +,7 @@ static void *migration_thread(void *opaque)

qemu_savevm_state_setup(s->to_dst_file);

-if (qemu_savevm_nr_failover_devices()) {
+if (qemu_savevm_state_guest_unplug_pending()) {
migrate_set_state(>state, MIGRATION_STATUS_SETUP,
  MIGRATION_STATUS_WAIT_UNPLUG);

diff --git a/migration/savevm.c b/migration/savevm.c
index f19cb9ec7a..1d4220ece8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1140,36 +1140,18 @@ void qemu_savevm_state_header(QEMUFile *f)
}
}

-int qemu_savevm_nr_failover_devices(void)
+bool qemu_savevm_state_guest_unplug_pending(void)
{
SaveStateEntry *se;
-int n = 0;

QTAILQ_FOREACH(se, _state.handlers, entry) {
if (se->vmsd && se->vmsd->dev_unplug_pending &&
se->vmsd->dev_unplug_pending(se->opaque)) {
-n++;
-}
-}
-
-return n;
-}
-
-bool qemu_savevm_state_guest_unplug_pending(void)
-{
-SaveStateEntry *se;
-int n = 0;
-
-QTAILQ_FOREACH(se, _state.handlers, entry) {
-if (!se->vmsd || !se->vmsd->dev_unplug_pending) {
-continue;
-}
-if (se->vmsd->dev_unplug_pending(se->opaque)) {
-n++;
+return true;
}
}

-return n > 0;
+return false;
}

void qemu_savevm_state_setup(QEMUFile *f)
diff --git a/migration/savevm.h b/migration/savevm.h
index c42b9c80ee..ba64a7e271 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,7 +31,6 @@

bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_state_setup(QEMUFile *f);
-int qemu_savevm_nr_failover_devices(void);
bool qemu_savevm_state_guest_unplug_pending(void);
int qemu_savevm_state_resume_prepare(MigrationState *s);
void qemu_savevm_state_header(QEMUFile *f);
--
2.19.1


Looks good to me. I tested it and it still works, so 


Tested-by: Jens Freimann 
Reviewed-by: Jens Freimann  


regards
Jens




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-18 Thread Jens Freimann

On Wed, Dec 11, 2019 at 01:52:33PM +0100, Damien Hedde wrote:

On 12/4/19 7:51 PM, Eduardo Habkost wrote:

On Wed, Dec 04, 2019 at 05:21:25PM +0100, Jens Freimann wrote:

On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote:

On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote:

On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:

+jfreimann, +mst

And if migration fails this same device is plugged back using
failover_replug_primary():

static bool failover_replug_primary(VirtIONet *n, Error **errp)
{
[...]
qdev_set_parent_bus(n->primary_dev, n->primary_bus);
[...]
hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
if (hotplug_ctrl) {
hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, );
if (err) {
goto out;
}
hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
}
[...]
}


My concern is about the qdev_set_parent_bus() call above (because I
touch this function in my series) and don't want to have side effects there.

I looked at the code and thought the partial unplug has the effect of
cutting off the unplug procedure just before doing the qdev real unplug.
In pcie_unplug_device() we return before doing the object_unparent():

static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, ...
 {
 [...]
 if (dev->partially_hotplugged) {
 dev->qdev.pending_deleted_event = false;
 return;
 }
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
 object_unparent(OBJECT(dev));
 }


From my understanding, object_unparent() is the only way to really
unplug a device from a bus regarding qdev (and it also unrealizes the
device). So I have the feeling that qdev_set_parent_bus() here is a
no-op (because primary_dev is already on primary_bus).


I tested this now and it really is a no-op. It was required in a
earlier version of the patches and I missed to remove it when I
reworked the re-plug functionality.

I will send a patch to remove it.

regards
Jens




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-04 Thread Jens Freimann

On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote:

On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote:

On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:
> +jfreimann, +mst
>
> On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote:
> > On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost  wrote:
> > > So, to summarize the current issues:
> > >
> > > 1) realize triggers a plug operation implicitly.
> > > 2) unplug triggers unrealize implicitly.
> > >
> > > Do you expect to see use cases that will require us to implement
> > > realize-without-plug?
> >
> > I don't think so, but only because of the oddity that
> > we put lots of devices on the 'sysbus' and claim that
> > that's plugging them into the bus. The common case of
> > 'realize' is where one device (say an SoC) has a bunch of child
> > devices (like UARTs); the SoC's realize method realizes its child
> > devices. Those devices all end up plugged into the 'sysbus'
> > but there's no actual bus there, it's fictional and about
> > the only thing it matters for is reset propagation (which
> > we don't model right either). A few devices don't live on
> > buses at all.
>
> That's my impression as well.
>
> >
> > > Similarly, do you expect use cases that will require us to
> > > implement unplug-without-unrealize?
> >
> > I don't know enough about hotplug to answer this one:
> > it's essentially what I'm hoping you'd be able to answer.
> > I vaguely had in mind that eg the user might be able to
> > create a 'disk' object, plug it into a SCSI bus, then
> > unplug it from the bus without the disk and all its data
> > evaporating, and maybe plug it back into the SCSI
> > bus (or some other SCSI bus) later ? But I don't know
> > anything about how we expose that kind of thing to the
> > user via QMP/HMP.
>
> This ability isn't exposed to the user at all.  Our existing
> interfaces are -device, device_add and device_del.
>
> We do have something new that sounds suspiciously similar to
> "unplugged but not unrealized", though: the new hidden device
> API, added by commit f3a850565693 ("qdev/qbus: add hidden device
> support").
>
> Jens, Michael, what exactly is the difference between a "hidden"
> device and a "unplugged" device?

"hidden" the way we use it for virtio-net failover is actually unplugged. But it
doesn't have to be that way. You can register a function that decides
if the device should be hidden, i.e. plugged now, or do something else
with it (in the virtio-net failover case we just save everything we
need to plug the device later).

We did introduce a "unplugged but not unrealized" function too as part
of the failover feature. See "a99c4da9fc pci: mark devices partially
unplugged"

This was needed so we would be able to re-plug the device in case a
migration failed and we need to hotplug the primary device back to the
guest. To avoid the risk of not getting the resources the device needs
we don't unrealize but just trigger the unplug from the guest OS.


Thanks for the explanation.  Let me confirm if I understand the
purpose of the new mechanisms: should_be_hidden is a mechanism
for implementing realize-without-plug.  partially_hotplugged is a
mechanism for implementing unplug-without-unrealize.  Is that
correct?


should_be_hidden is a mechanism for implementing
realize-without-plug: kind of. It's a mechanism that ensures
qdev_device_add() returns early as long as the condition to hide the
device is true. You could to the realize-without-plug in the handler
function that decides if the device should be "hidden".  


partially_hotplugged is a mechanism for implementing
unplug-without-unrealize: yes. 


regards
Jens




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-04 Thread Jens Freimann

On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:

+jfreimann, +mst

On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote:

On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost  wrote:
> So, to summarize the current issues:
>
> 1) realize triggers a plug operation implicitly.
> 2) unplug triggers unrealize implicitly.
>
> Do you expect to see use cases that will require us to implement
> realize-without-plug?

I don't think so, but only because of the oddity that
we put lots of devices on the 'sysbus' and claim that
that's plugging them into the bus. The common case of
'realize' is where one device (say an SoC) has a bunch of child
devices (like UARTs); the SoC's realize method realizes its child
devices. Those devices all end up plugged into the 'sysbus'
but there's no actual bus there, it's fictional and about
the only thing it matters for is reset propagation (which
we don't model right either). A few devices don't live on
buses at all.


That's my impression as well.



> Similarly, do you expect use cases that will require us to
> implement unplug-without-unrealize?

I don't know enough about hotplug to answer this one:
it's essentially what I'm hoping you'd be able to answer.
I vaguely had in mind that eg the user might be able to
create a 'disk' object, plug it into a SCSI bus, then
unplug it from the bus without the disk and all its data
evaporating, and maybe plug it back into the SCSI
bus (or some other SCSI bus) later ? But I don't know
anything about how we expose that kind of thing to the
user via QMP/HMP.


This ability isn't exposed to the user at all.  Our existing
interfaces are -device, device_add and device_del.

We do have something new that sounds suspiciously similar to
"unplugged but not unrealized", though: the new hidden device
API, added by commit f3a850565693 ("qdev/qbus: add hidden device
support").

Jens, Michael, what exactly is the difference between a "hidden"
device and a "unplugged" device?


"hidden" the way we use it for virtio-net failover is actually unplugged. But it
doesn't have to be that way. You can register a function that decides
if the device should be hidden, i.e. plugged now, or do something else
with it (in the virtio-net failover case we just save everything we
need to plug the device later).  


We did introduce a "unplugged but not unrealized" function too as part
of the failover feature. See "a99c4da9fc pci: mark devices partially
unplugged"

This was needed so we would be able to re-plug the device in case a
migration failed and we need to hotplug the primary device back to the
guest. To avoid the risk of not getting the resources the device needs
we don't unrealize but just trigger the unplug from the guest OS.

regards
Jens




Re: [PATCH 00/21] Error handling fixes, may contain 4.2 material

2019-12-02 Thread Jens Freimann

On Sat, Nov 30, 2019 at 08:42:19PM +0100, Markus Armbruster wrote:

PATCH 2-4 fix crash bugs.  Including them would be a no-brainer at
-rc0.  But we're post -rc3, and even for crash bugs we require a
certain likelihood of users getting bitten.

Jens, please assess impact of PATCH 2's crash bug.


Guest can't use it to trigger a qemu crash because the hotplug_handler
called after the pre-plug doesn't do anything with errp.

qemu_opt_set_bool() is also unlikely to trigger the bug because both
reasons for why it would set errp are not true in this path.

I would prefer for this fix to go into 4.2 but if it is the only one
and triggers a large amount of work then don't do it.

regards
Jens




Re: [PATCH 01/21] net/virtio: Drop useless n->primary_dev not null checks

2019-12-02 Thread Jens Freimann

On Sat, Nov 30, 2019 at 08:42:20PM +0100, Markus Armbruster wrote:

virtio_net_handle_migration_primary() returns early when it can't
ensure n->primary_dev is non-null.  Checking it again right after that
early return is redundant.  Drop.

If n->primary_dev is null on entering failover_replug_primary(), @pdev
will become null, and pdev->partially_hotplugged will crash.  Checking
n->primary_dev later is useless.  It can't actually be null, because
its caller virtio_net_handle_migration_primary() ensures it isn't.
Drop the useless check.

Cc: Jens Freimann 
Cc: Michael S. Tsirkin 
Signed-off-by: Markus Armbruster 
---
hw/net/virtio-net.c | 8 +---
1 file changed, 1 insertion(+), 7 deletions(-)



Thanks Markus!

Reviewed-by: Jens Freimann 

regards
Jens




Re: [PATCH 02/21] net/virtio: Fix failover error handling crash bugs

2019-12-02 Thread Jens Freimann

On Sat, Nov 30, 2019 at 08:42:21PM +0100, Markus Armbruster wrote:

Functions that take an Error ** parameter to pass an error to the
caller expect the parameter to point to null.
failover_replug_primary() violates this precondition in several
places:

* After qemu_opts_from_qdict() failed, *errp is no longer null.
 Passing it to error_setg() is wrong, and will trip the assertion in
 error_setv().  Messed up in commit 150ab54aa6 "net/virtio: fix
 re-plugging of primary device".  Simply drop the error_setg().

* Passing @errp to qemu_opt_set_bool(), hotplug_handler_pre_plug(),
 and hotplug_handler_plug() is wrong.  If one of the first two fails,
 *errp is no longer null.  Risks tripping the same assertion.
 Moreover, continuing after such errors is unsafe.  Messed up in
 commit 9711cd0dfc "net/virtio: add failover support".  Fix by
 handling each error properly.

failover_replug_primary() crashes when passed a null @errp.  Also
messed up in commit 9711cd0dfc.  This bug can't bite as no caller
actually passes null.  Fix it anyway.

Fixes: 9711cd0dfc3fa414f7f64935713c07134ae67971
Fixes: 150ab54aa6934583180f88a2bd540bc6fc4fbff3
Cc: Jens Freimann 
Cc: Michael S. Tsirkin 
Signed-off-by: Markus Armbruster 
---
hw/net/virtio-net.c | 19 +--
1 file changed, 13 insertions(+), 6 deletions(-)



Thanks Markus!

Reviewed-by: Jens Freimann 





Re: [PULL 0/5] i386 patches for QEMU 4.2-rc

2019-11-28 Thread Jens Freimann

On Wed, Nov 27, 2019 at 09:14:01AM +, Dr. David Alan Gilbert wrote:

* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:

On 11/26/19 10:19 AM, no-re...@patchew.org wrote:
> Patchew URL: 
https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/
>
> This series failed the docker-quick@centos7 build test. Please find the 
testing commands and
> their output below. If you have Docker installed, you can probably reproduce 
it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
>
>TESTcheck-unit: tests/test-thread-pool
> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
> **
> ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> make: *** [check-qtest-aarch64] Error 1

Should we worry about this error?


Interesting; that should be fixed by Jens'
284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending
is still lying;  it's showing we're still landing in 'wait-unplug' on
aarch, because it's got a virtio-net by default; even though we've not
got a failover device setup.  CCing Jens.


I've run this test  on aarch64 in a loop today for a few hours but could not
reproduce this error.

One bug I found is that in primary_unplug_device() I look at the
virtio guest feature bits instead of the negotiated bits. But I don't
think this could lead to the above problem because even if the check
for the feature bit fails, primary_unplug_pending would still return
false because no primary device was set and n->primary_dev is NULL. 

I'll keep the test running until I can reproduce it. 


regards
Jens




Re: [PULL 0/5] i386 patches for QEMU 4.2-rc

2019-11-27 Thread Jens Freimann

On Wed, Nov 27, 2019 at 09:14:01AM +, Dr. David Alan Gilbert wrote:

* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:

On 11/26/19 10:19 AM, no-re...@patchew.org wrote:
> Patchew URL: 
https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/
>
> This series failed the docker-quick@centos7 build test. Please find the 
testing commands and
> their output below. If you have Docker installed, you can probably reproduce 
it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
>
>TESTcheck-unit: tests/test-thread-pool
> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
> **
> ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> make: *** [check-qtest-aarch64] Error 1

Should we worry about this error?


Interesting; that should be fixed by Jens'
284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending
is still lying;  it's showing we're still landing in 'wait-unplug' on
aarch, because it's got a virtio-net by default; even though we've not
got a failover device setup.  CCing Jens.


Hmm, I did test it. I'm looking into it.

regards
Jens




[PATCH v3 2/4] net/virtio: return early when failover primary alread added

2019-11-20 Thread Jens Freimann
Bail out when primary device was already added before.
This avoids printing a wrong warning message during reboot.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 946039c0dc..ac4d19109e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error 
**errp)
 {
 Error *err = NULL;
 
+if (n->primary_dev) {
+return;
+}
+
 n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
 n->primary_device_id);
 if (n->primary_device_opts) {
-- 
2.21.0




[PATCH v3 3/4] net/virtio: fix re-plugging of primary device

2019-11-20 Thread Jens Freimann
failover_replug_primary was returning true on failure which lead to
re-plug not working when a migration failed.  Fix this by returning
success when hotplug worked.  This is a bug that was missed in last
round of testing but was tested succesfully with this version.  Also
make sure we don't pass NULL to qdev_set_parent_bus().

This fixes CID 1407224.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ac4d19109e..565dea0b93 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 n->primary_device_opts = qemu_opts_from_qdict(
 qemu_find_opts("device"),
 n->primary_device_dict, errp);
-}
-if (n->primary_device_opts) {
-if (n->primary_dev) {
-n->primary_bus = n->primary_dev->parent_bus;
-}
-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
-n->primary_should_be_hidden = false;
-qemu_opt_set_bool(n->primary_device_opts,
-"partially_hotplugged", true, errp);
-hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
-if (hotplug_ctrl) {
-hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
-hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+if (!n->primary_device_opts) {
+error_setg(errp, "virtio_net: couldn't find primary device opts");
+goto out;
 }
-if (!n->primary_dev) {
+}
+if (!n->primary_dev) {
 error_setg(errp, "virtio_net: couldn't find primary device");
-}
+goto out;
 }
-return *errp != NULL;
+
+n->primary_bus = n->primary_dev->parent_bus;
+if (!n->primary_bus) {
+error_setg(errp, "virtio_net: couldn't find primary bus");
+goto out;
+}
+qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+n->primary_should_be_hidden = false;
+qemu_opt_set_bool(n->primary_device_opts,
+  "partially_hotplugged", true, errp);
+hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
+if (hotplug_ctrl) {
+hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
+hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+}
+
+out:
+return *errp == NULL;
 }
 
 static void virtio_net_handle_migration_primary(VirtIONet *n,
@@ -2852,7 +2860,7 @@ static void virtio_net_handle_migration_primary(VirtIONet 
*n,
 warn_report("couldn't unplug primary device");
 }
 } else if (migration_has_failed(s)) {
-/* We already unplugged the device let's plugged it back */
+/* We already unplugged the device let's plug it back */
 if (!failover_replug_primary(n, )) {
 if (err) {
 error_report_err(err);
-- 
2.21.0




[PATCH v3 1/4] net/virtio: fix dev_unplug_pending

2019-11-20 Thread Jens Freimann
.dev_unplug_pending is set up by virtio-net code indepent of failover
support was set for the device or not. This gives a wrong result when
we check for existing primary devices in migration code.

Fix this by actually calling dev_unplug_pending() instead of just
checking if the function pointer was set. When the feature was not
negotiated dev_unplug_pending() will always return false. This prevents
us from going into the wait-unplug state when there's no primary device
present.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reported-by: Dr. David Alan Gilbert 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 3 +++
 migration/savevm.c  | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 97a5113f7e..946039c0dc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIONet *n = VIRTIO_NET(vdev);
 
+if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
+return false;
+}
 return n->primary_dev ? n->primary_dev->pending_deleted_event : false;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 966a9c3bdb..a71b930b91 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void)
 int n = 0;
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
-if (se->vmsd && se->vmsd->dev_unplug_pending) {
+if (se->vmsd && se->vmsd->dev_unplug_pending &&
+se->vmsd->dev_unplug_pending(se->opaque)) {
 n++;
 }
 }
-- 
2.21.0




[PATCH v3 4/4] net/virtio: return error when device_opts arg is NULL

2019-11-20 Thread Jens Freimann
This fixes CID 1407222.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 565dea0b93..3c31471026 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2880,9 +2880,12 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 QemuOpts *device_opts)
 {
 VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
-bool match_found;
-bool hide;
+bool match_found = false;
+bool hide = false;
 
+if (!device_opts) {
+return -1;
+}
 n->primary_device_dict = qemu_opts_to_qdict(device_opts,
 n->primary_device_dict);
 if (n->primary_device_dict) {
@@ -2890,7 +2893,7 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
 "failover_pair_id"));
 }
-if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) {
+if (g_strcmp0(n->standby_id, n->netclient_name) == 0) {
 match_found = true;
 } else {
 match_found = false;
-- 
2.21.0




[PATCH v3 0/4] net/virtio: fixes for failover

2019-11-20 Thread Jens Freimann
This series fixes bugs found by coverity and one reported by David
Gilbert.

v2->v3:
 * change patch description and subject of patch 3/4

Jens Freimann (4):
  net/virtio: fix dev_unplug_pending
  net/virtio: return early when failover primary alread added
  net/virtio: fix re-plugging of primary device
  net/virtio: return error when device_opts arg is NULL

 hw/net/virtio-net.c | 58 +
 migration/savevm.c  |  3 ++-
 2 files changed, 40 insertions(+), 21 deletions(-)

-- 
2.21.0




Re: [PATCH v2 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus

2019-11-20 Thread Jens Freimann

On Wed, Nov 20, 2019 at 10:04:01AM -0500, Michael S. Tsirkin wrote:

On Wed, Nov 20, 2019 at 03:38:58PM +0100, Jens Freimann wrote:

Make sure we don't pass NULL to qdev_set_parent_bus(). Also simplify
code a bit and fix a typo.
This fixes CID 1407224.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 


patch itself is ok I think

Reviewed-by: Michael S. Tsirkin 

but some questions on the commit log


---
 hw/net/virtio-net.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ac4d19109e..78f1e4e87c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 n->primary_device_opts = qemu_opts_from_qdict(
 qemu_find_opts("device"),
 n->primary_device_dict, errp);
-}
-if (n->primary_device_opts) {
-if (n->primary_dev) {
-n->primary_bus = n->primary_dev->parent_bus;
-}
-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
-n->primary_should_be_hidden = false;
-qemu_opt_set_bool(n->primary_device_opts,
-"partially_hotplugged", true, errp);
-hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
-if (hotplug_ctrl) {
-hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
-hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+if (!n->primary_device_opts) {
+error_setg(errp, "virtio_net: couldn't find primary device opts");
+goto out;
 }
-if (!n->primary_dev) {
+}
+if (!n->primary_dev) {
 error_setg(errp, "virtio_net: couldn't find primary device");
-}
+goto out;
 }
-return *errp != NULL;
+
+n->primary_bus = n->primary_dev->parent_bus;
+if (!n->primary_bus) {
+error_setg(errp, "virtio_net: couldn't find primary bus");
+goto out;
+}
+qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+n->primary_should_be_hidden = false;
+qemu_opt_set_bool(n->primary_device_opts,
+  "partially_hotplugged", true, errp);
+hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
+if (hotplug_ctrl) {
+hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
+hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+}
+
+out:
+return *errp == NULL;
 }

 static void virtio_net_handle_migration_primary(VirtIONet *n,


So the logic actually was inverted here? Then ... how did this work?
and how was it tested?


It did not work and I missed the error in my test output. I tested it
now manually by migrating and then cancelling. Device was added back
to source VM successfully.

Will sent new version with better patch description.

Thanks!

regards
Jens




[PATCH v2 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus

2019-11-20 Thread Jens Freimann
Make sure we don't pass NULL to qdev_set_parent_bus(). Also simplify
code a bit and fix a typo.
This fixes CID 1407224.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ac4d19109e..78f1e4e87c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2805,25 +2805,33 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 n->primary_device_opts = qemu_opts_from_qdict(
 qemu_find_opts("device"),
 n->primary_device_dict, errp);
-}
-if (n->primary_device_opts) {
-if (n->primary_dev) {
-n->primary_bus = n->primary_dev->parent_bus;
-}
-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
-n->primary_should_be_hidden = false;
-qemu_opt_set_bool(n->primary_device_opts,
-"partially_hotplugged", true, errp);
-hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
-if (hotplug_ctrl) {
-hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
-hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+if (!n->primary_device_opts) {
+error_setg(errp, "virtio_net: couldn't find primary device opts");
+goto out;
 }
-if (!n->primary_dev) {
+}
+if (!n->primary_dev) {
 error_setg(errp, "virtio_net: couldn't find primary device");
-}
+goto out;
 }
-return *errp != NULL;
+
+n->primary_bus = n->primary_dev->parent_bus;
+if (!n->primary_bus) {
+error_setg(errp, "virtio_net: couldn't find primary bus");
+goto out;
+}
+qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+n->primary_should_be_hidden = false;
+qemu_opt_set_bool(n->primary_device_opts,
+  "partially_hotplugged", true, errp);
+hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
+if (hotplug_ctrl) {
+hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
+hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+}
+
+out:
+return *errp == NULL;
 }
 
 static void virtio_net_handle_migration_primary(VirtIONet *n,
@@ -2852,7 +2860,7 @@ static void virtio_net_handle_migration_primary(VirtIONet 
*n,
 warn_report("couldn't unplug primary device");
 }
 } else if (migration_has_failed(s)) {
-/* We already unplugged the device let's plugged it back */
+/* We already unplugged the device let's plug it back */
 if (!failover_replug_primary(n, )) {
 if (err) {
 error_report_err(err);
-- 
2.21.0




[PATCH v2 0/4] net/virtio: fixes for failover

2019-11-20 Thread Jens Freimann
This series fixes bugs found by coverity and one reported by David
Gilbert.

Jens Freimann (4):
  net/virtio: fix dev_unplug_pending
  net/virtio: return early when failover primary alread added
  net/virtio: avoid passing NULL to qdev_set_parent_bus
  net/virtio: return error when device_opts arg is NULL

 hw/net/virtio-net.c | 58 +
 migration/savevm.c  |  3 ++-
 2 files changed, 40 insertions(+), 21 deletions(-)

-- 
2.21.0




[PATCH v2 4/4] net/virtio: return error when device_opts arg is NULL

2019-11-20 Thread Jens Freimann
This fixes CID 1407222.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 78f1e4e87c..600c6e164f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2880,9 +2880,12 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 QemuOpts *device_opts)
 {
 VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
-bool match_found;
-bool hide;
+bool match_found = false;
+bool hide = false;
 
+if (!device_opts) {
+return -1;
+}
 n->primary_device_dict = qemu_opts_to_qdict(device_opts,
 n->primary_device_dict);
 if (n->primary_device_dict) {
@@ -2890,7 +2893,7 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
 "failover_pair_id"));
 }
-if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) {
+if (g_strcmp0(n->standby_id, n->netclient_name) == 0) {
 match_found = true;
 } else {
 match_found = false;
-- 
2.21.0




[PATCH v2 2/4] net/virtio: return early when failover primary alread added

2019-11-20 Thread Jens Freimann
Bail out when primary device was already added before.
This avoids printing a wrong warning message during reboot.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 946039c0dc..ac4d19109e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error 
**errp)
 {
 Error *err = NULL;
 
+if (n->primary_dev) {
+return;
+}
+
 n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
 n->primary_device_id);
 if (n->primary_device_opts) {
-- 
2.21.0




[PATCH v2 1/4] net/virtio: fix dev_unplug_pending

2019-11-20 Thread Jens Freimann
.dev_unplug_pending is set up by virtio-net code indepent of failover
support was set for the device or not. This gives a wrong result when
we check for existing primary devices in migration code.

Fix this by actually calling dev_unplug_pending() instead of just
checking if the function pointer was set. When the feature was not
negotiated dev_unplug_pending() will always return false. This prevents
us from going into the wait-unplug state when there's no primary device
present.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reported-by: Dr. David Alan Gilbert 
Reviewed-by: Michael S. Tsirkin 
---
 hw/net/virtio-net.c | 3 +++
 migration/savevm.c  | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 97a5113f7e..946039c0dc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIONet *n = VIRTIO_NET(vdev);
 
+if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
+return false;
+}
 return n->primary_dev ? n->primary_dev->pending_deleted_event : false;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 966a9c3bdb..a71b930b91 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void)
 int n = 0;
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
-if (se->vmsd && se->vmsd->dev_unplug_pending) {
+if (se->vmsd && se->vmsd->dev_unplug_pending &&
+se->vmsd->dev_unplug_pending(se->opaque)) {
 n++;
 }
 }
-- 
2.21.0




Re: [PATCH 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus

2019-11-20 Thread Jens Freimann

On Wed, Nov 20, 2019 at 05:46:36AM -0500, Michael S. Tsirkin wrote:

On Thu, Nov 14, 2019 at 03:16:12PM +0100, Jens Freimann wrote:

Make sure no arguments for qdev_set_parent_bus are NULL.
This fixes CID 1407224.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ac4d19109e..81650d4dc0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2809,8 +2809,16 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 if (n->primary_device_opts) {
 if (n->primary_dev) {
 n->primary_bus = n->primary_dev->parent_bus;
+if (n->primary_bus) {
+qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+} else  {
+error_setg(errp, "virtio_net: couldn't set bus for primary");
+goto out;
+}
+} else {
+error_setg(errp, "virtio_net: couldn't find primary device");
+goto out;
 }



A bit less messy:

if (!n->primary_dev) {
   error_setg(errp, "virtio_net: couldn't find primary device");
   goto err;
}

n->primary_bus = n->primary_dev->parent_bus;
if (!n->primary_bus) {
 error_setg(errp, "virtio_net: couldn't find primary device");
 goto err;
}

qdev_set_parent_bus(n->primary_dev, n->primary_bus);

err:
return false;

also is it valid for primary_device_opts to not be present at all?
Maybe we should check that too.




-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
 n->primary_should_be_hidden = false;
 qemu_opt_set_bool(n->primary_device_opts,
 "partially_hotplugged", true, errp);
@@ -2819,10 +2827,8 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
 hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
 }
-if (!n->primary_dev) {
-error_setg(errp, "virtio_net: couldn't find primary device");
-}
 }
+out:
 return *errp != NULL;
 }


I'd handle errors separately from good path.
BTW I don't understand something here:
*errp != NULL is true on error, no?
Why are we returning success?
Confused.
Just return true would be cleaner imho.


I'll fix this and sent a new version as a single patch. As you said
this is not really a series, just individual patches. 


Thanks!

regards
Jens




[PATCH 3/4] net/virtio: avoid passing NULL to qdev_set_parent_bus

2019-11-14 Thread Jens Freimann
Make sure no arguments for qdev_set_parent_bus are NULL.
This fixes CID 1407224.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ac4d19109e..81650d4dc0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2809,8 +2809,16 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 if (n->primary_device_opts) {
 if (n->primary_dev) {
 n->primary_bus = n->primary_dev->parent_bus;
+if (n->primary_bus) {
+qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+} else  {
+error_setg(errp, "virtio_net: couldn't set bus for primary");
+goto out;
+}
+} else {
+error_setg(errp, "virtio_net: couldn't find primary device");
+goto out;
 }
-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
 n->primary_should_be_hidden = false;
 qemu_opt_set_bool(n->primary_device_opts,
 "partially_hotplugged", true, errp);
@@ -2819,10 +2827,8 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
 hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
 }
-if (!n->primary_dev) {
-error_setg(errp, "virtio_net: couldn't find primary device");
-}
 }
+out:
 return *errp != NULL;
 }
 
-- 
2.21.0




[PATCH 4/4] net/virtio: return error when device_opts arg is NULL

2019-11-14 Thread Jens Freimann
device_opts could be NULL. Make sure we don't pass it to
qemu_opts_to_dict. When we made sure it can't be NULL we can also remove
it from the if condition.
This fixes CID 1407222.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 81650d4dc0..d53aa5796b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2878,9 +2878,12 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 QemuOpts *device_opts)
 {
 VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
-bool match_found;
-bool hide;
+bool match_found = false;
+bool hide = false;
 
+if (!device_opts) {
+return -1;
+}
 n->primary_device_dict = qemu_opts_to_qdict(device_opts,
 n->primary_device_dict);
 if (n->primary_device_dict) {
@@ -2888,7 +2891,7 @@ static int 
virtio_net_primary_should_be_hidden(DeviceListener *listener,
 n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
 "failover_pair_id"));
 }
-if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) {
+if (g_strcmp0(n->standby_id, n->netclient_name) == 0) {
 match_found = true;
 } else {
 match_found = false;
-- 
2.21.0




[PATCH 2/4] net/virtio: return early when failover primary alread added

2019-11-14 Thread Jens Freimann
Bail out when primary device was already added before.
This avoids printing a wrong warning message during reboot.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 946039c0dc..ac4d19109e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -759,6 +759,10 @@ static void failover_add_primary(VirtIONet *n, Error 
**errp)
 {
 Error *err = NULL;
 
+if (n->primary_dev) {
+return;
+}
+
 n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
 n->primary_device_id);
 if (n->primary_device_opts) {
-- 
2.21.0




[PATCH 1/4] net/virtio: fix dev_unplug_pending

2019-11-14 Thread Jens Freimann
.dev_unplug_pending is set up by virtio-net code indepent of whether
failover=on was set for the device or not. This gives a wrong result when
we check for existing primary devices in migration code.

Fix this by actually calling dev_unplug_pending() instead of just
checking if the function pointer was set. When the feature was not
negotiated dev_unplug_pending() will always return false. This prevents
us from going into the wait-unplug state when there's no primary device
present.

Fixes: 9711cd0dfc3f ("net/virtio: add failover support")
Signed-off-by: Jens Freimann 
Reported-by: Dr. David Alan Gilbert 
---
 hw/net/virtio-net.c | 3 +++
 migration/savevm.c  | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 97a5113f7e..946039c0dc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3124,6 +3124,9 @@ static bool primary_unplug_pending(void *opaque)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIONet *n = VIRTIO_NET(vdev);
 
+if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
+return false;
+}
 return n->primary_dev ? n->primary_dev->pending_deleted_event : false;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 966a9c3bdb..a71b930b91 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1119,7 +1119,8 @@ int qemu_savevm_nr_failover_devices(void)
 int n = 0;
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
-if (se->vmsd && se->vmsd->dev_unplug_pending) {
+if (se->vmsd && se->vmsd->dev_unplug_pending &&
+se->vmsd->dev_unplug_pending(se->opaque)) {
 n++;
 }
 }
-- 
2.21.0




[PATCH] vfio: don't ignore return value of migrate_add_blocker

2019-11-14 Thread Jens Freimann
When an error occurs in migrate_add_blocker() it sets a
negative return value and uses error pointer we pass in.
Instead of just looking at the error pointer check for a negative return
value and avoid a coverity error because the return value is
set but never used. This fixes CID 1407219.

Fixes: f045a0104c8c ("vfio: unplug failover primary device before
  migration")
Signed-off-by: Jens Freimann 
---
 hw/vfio/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e6569a7968..ed01774673 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2737,7 +2737,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 error_setg(>migration_blocker,
 "VFIO device doesn't support migration");
 ret = migrate_add_blocker(vdev->migration_blocker, );
-if (err) {
+if (ret) {
 error_propagate(errp, err);
 error_free(vdev->migration_blocker);
 return;
-- 
2.21.0




Re: [PATCH] tests/migration: Print some debug on bad status

2019-11-11 Thread Jens Freimann

On Fri, Nov 08, 2019 at 06:38:17PM +, Dr. David Alan Gilbert wrote:

Hi Jens,
 the unplug failover stuff is triggering an assertion occasionally on
aarch64; but
 a) I'm not sure the right way to fix it
 b) And I'm out for a little over a week

so...


I'll take care of it. 



* no-re...@patchew.org (no-re...@patchew.org) wrote:

Patchew URL: 
https://patchew.org/QEMU/20191108104307.125020-1-dgilb...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TESTcheck-unit: tests/test-bdrv-drain
wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1


In tests/migration-test.c we've got wait_for_migration_fail, and it's
expecting the state to be any one of:
  setup, failed or maybe active

but it's getting surprised by seeing a 'wait-unplug'

So the question is should we see a wait-unplug?

the migration code has:

   if (qemu_savevm_nr_failover_devices()) {
   migrate_set_state(>state, MIGRATION_STATUS_SETUP,
 MIGRATION_STATUS_WAIT_UNPLUG);

Should qemu_savevm_nr_failover_devices() be true?
On aarch64 it seems to have a virtio-net device by default
and qemu_savevm_nr_failover_devices() checks for devices
having dev_unplug_pending but doesn't call it.

I see two fixes but am not sure which is right:
  a) Add 'wait-unplug' to the wait_for_migration_fail
 (easy)
  b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices
 so that on a guest which has a virtio-net, but no failover device,
 the state isn't entered.


I also prefer b over a, but I should only set up dev_unplug_pending when
failover=on to avoid this. I'll send a patch.

regards
Jens




Re: [PATCH v6 06/11] qapi: add failover negotiated event

2019-10-30 Thread Jens Freimann

On Tue, Oct 29, 2019 at 06:41:58PM -0400, Michael S. Tsirkin wrote:

On Fri, Oct 25, 2019 at 07:39:21PM +0200, Jens Freimann wrote:

On Fri, Oct 25, 2019 at 04:03:54PM +0200, Markus Armbruster wrote:
> Bear with me, I know next to nothing about failover.
>
> Jens Freimann  writes:
>
> > This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature
> > was enabled. The primary device this virtio-net device is associated
> > with, will now be hotplugged via qdev_device_add().
>
> Passive voice deftly avoids telling the reader who will do the
> hot-plugging.  Intentional?

Not really, it's in the comment to the event. The hotplug will be
done by the virtio-net device code that activates the feature, in
virtio_net_set_features().

>
> > Signed-off-by: Jens Freimann 
> > Acked-by: Cornelia Huck 
> > ---
> >  qapi/net.json | 19 +++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 728990f4fb..ea64f7 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -737,3 +737,22 @@
> >  ##
> >  { 'command': 'announce-self', 'boxed': true,
> >'data' : 'AnnounceParameters'}
> > +
> > +##
> > +# @FAILOVER_NEGOTIATED:
> > +#
> > +# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation.
> > +# Failover primary devices which were hidden (not hotplugged when 
requested)
> > +# before will now be hotplugged by the virtio-net standby device.
> > +#
> > +# device-id: QEMU device id of the unplugged device
>
> @device-id is new since v5.
>
> A quick skim of
> https://www.kernel.org/doc/html/latest/networking/net_failover.html
> tells me there are three devices involved: master, primary slave,
> standby slave.  Which one is @device-id?  Or am I confused?

Yes, the device-id is new and it's the device-id of the standby (i.e.
virtio-net) device.

regards,
Jens


And now I am confused.
How is standby "the unplugged device"?
Why not just say "the standby device"?


Yes, that's better. Do you want a patch on top?

diff --git a/qapi/net.json b/qapi/net.json
index ea64f7..0f225cb900 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -745,7 +745,7 @@
# Failover primary devices which were hidden (not hotplugged when requested)
# before will now be hotplugged by the virtio-net standby device.
#
-# device-id: QEMU device id of the unplugged device
+# device-id: QEMU device id of the standby device
# Since: 4.2
#
# Example:


Signed-off-by: Jens Freimann 

regards,
Jens 



--
MST





Re: [PATCH v7 05/11] qapi: add unplug primary event

2019-10-29 Thread Jens Freimann

On Tue, Oct 29, 2019 at 01:50:02PM +0100, Markus Armbruster wrote:

Jens Freimann  writes:


This event is emitted when we sent a request to unplug a


Uh, "we sent a requestion [...] from the Guest OS"...  do you mean "we
received"?


No, we sent a pci hotplug event to the guest by "pushing" the pcie attention
button. So, it's QEMU requesting the unplug.


+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device-id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} }
+#
+##
+{ 'event': 'UNPLUG_PRIMARY',
+  'data': { 'device-id': 'str' } }


Commit message might need a touch-up.  Regardless:
Acked-by: Markus Armbruster 


Thanks for the review!

regards,
Jens 





[PATCH v7 11/11] vfio: unplug failover primary device before migration

2019-10-29 Thread Jens Freimann
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann 
Acked-by: Alex Williamson 
---
 hw/vfio/pci.c | 26 --
 hw/vfio/pci.h |  1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..e6569a7968 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
+if (!pdev->failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+error_free(vdev->migration_blocker);
+return;
+}
+}
+
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = _pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
@@ -3008,6 +3020,10 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-.name = "vfio-pci",
-.unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, 
void *data)
 
 dc->reset = vfio_pci_reset;
 dc->props = vfio_pci_dev_properties;
-dc->vmsd = _pci_vmstate;
 dc->desc = "VFIO-based PCI device assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 pdc->realize = vfio_realize;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..b329d50338 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
+Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0




[PATCH v7 09/11] libqos: tolerate wait-unplug migration state

2019-10-29 Thread Jens Freimann
Signed-off-by: Jens Freimann 
---
 tests/libqos/libqos.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
index d71557c5cb..f229eb2cb8 100644
--- a/tests/libqos/libqos.c
+++ b/tests/libqos/libqos.c
@@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri)
 break;
 }
 
-if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) {
+if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)
+|| (strcmp(st, "wait-unplug") == 0)) {
 qobject_unref(rsp);
 g_usleep(5000);
 continue;
-- 
2.21.0




[PATCH v7 08/11] migration: add new migration state wait-unplug

2019-10-29 Thread Jens Freimann
This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/vmstate.h |  2 ++
 migration/migration.c   | 21 +
 migration/migration.h   |  3 +++
 migration/savevm.c  | 31 +++
 migration/savevm.h  |  2 ++
 qapi/migration.json |  5 -
 6 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ac4f46a67d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -186,6 +186,8 @@ struct VMStateDescription {
 int (*pre_save)(void *opaque);
 int (*post_save)(void *opaque);
 bool (*needed)(void *opaque);
+bool (*dev_unplug_pending)(void *opaque);
+
 const VMStateField *fields;
 const VMStateDescription **subsections;
 };
diff --git a/migration/migration.c b/migration/migration.c
index 4133ed2684..354ad072fa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "net/announce.h"
+#include "qemu/queue.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state)
 case MIGRATION_STATUS_SETUP:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return true;
 
 default:
@@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info)
 case MIGRATION_STATUS_CANCELLED:
 info->has_status = true;
 break;
+case MIGRATION_STATUS_WAIT_UNPLUG:
+info->has_status = true;
+break;
 }
 info->status = s->state;
 }
@@ -1694,6 +1699,7 @@ bool migration_is_idle(void)
 case MIGRATION_STATUS_COLO:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return false;
 case MIGRATION_STATUS__MAX:
 g_assert_not_reached();
@@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque)
 
 qemu_savevm_state_setup(s->to_dst_file);
 
+if (qemu_savevm_nr_failover_devices()) {
+migrate_set_state(>state, MIGRATION_STATUS_SETUP,
+  MIGRATION_STATUS_WAIT_UNPLUG);
+
+while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
+   qemu_savevm_state_guest_unplug_pending()) {
+qemu_sem_timedwait(>wait_unplug_sem, 250);
+}
+
+migrate_set_state(>state, MIGRATION_STATUS_WAIT_UNPLUG,
+MIGRATION_STATUS_ACTIVE);
+}
+
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_ACTIVE);
@@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_mutex_destroy(>qemu_file_lock);
 g_free(params->tls_hostname);
 g_free(params->tls_creds);
+qemu_sem_destroy(>wait_unplug_sem);
 qemu_sem_destroy(>rate_limit_sem);
 qemu_sem_destroy(>pause_sem);
 qemu_sem_destroy(>postcopy_pause_sem);
@@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(>postcopy_pause_rp_sem, 0);
 qemu_sem_init(>rp_state.rp_sem, 0);
 qemu_sem_init(>rate_limit_sem, 0);
+qemu_sem_init(>wait_unplug_sem, 0);
 qemu_mutex_init(>qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 4f2fe193dc..79b3dda146 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,9 @@ struct MigrationState
 /* Flag set once the migration thread called bdrv_inactivate_all */
 bool block_inactive;
 
+/* Migration is waiting for guest to unplug device */
+QemuSemaphore wait_unplug_sem;
+
 /* Migration is paused due to pause-before-switchover */
 QemuSemaphore pause_sem;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e261f6..966a9c3bdb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1113,6 +1113,37 @@ void qemu_savevm_state_header(QEMUFile *f)
 }
 }
 
+int qemu_savevm_nr_failover_devices(void)
+{
+SaveStateEntry *se;
+int n = 0;
+
+QTAILQ_FOREACH(se, _state.handlers, entry)

[PATCH v7 06/11] qapi: add failover negotiated event

2019-10-29 Thread Jens Freimann
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature
is enabled. The primary device this virtio-net (standby) device is
associated with, is now hotplugged by the virtio-net device.

Signed-off-by: Jens Freimann 
---
 qapi/net.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..ea64f7 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,22 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation.
+# Failover primary devices which were hidden (not hotplugged when requested)
+# before will now be hotplugged by the virtio-net standby device.
+#
+# device-id: QEMU device id of the unplugged device
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "FAILOVER_NEGOTIATED",
+#  "data": "net1" }
+#
+##
+{ 'event': 'FAILOVER_NEGOTIATED',
+  'data': {'device-id': 'str'} }
-- 
2.21.0




[PATCH v7 05/11] qapi: add unplug primary event

2019-10-29 Thread Jens Freimann
This event is emitted when we sent a request to unplug a
failover primary device from the Guest OS and it includes the
device id of the primary device.

Signed-off-by: Jens Freimann 
---
 qapi/migration.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82feb5bd39..e9e7a97c03 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1448,3 +1448,22 @@
 # Since: 3.0
 ##
 { 'command': 'migrate-pause', 'allow-oob': true }
+
+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device-id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} }
+#
+##
+{ 'event': 'UNPLUG_PRIMARY',
+  'data': { 'device-id': 'str' } }
-- 
2.21.0




[PATCH v7 10/11] net/virtio: add failover support

2019-10-29 Thread Jens Freimann
This patch adds support to handle failover device pairs of a virtio-net
device and a (vfio-)pci device, where the virtio-net acts as the standby
device and the (vfio-)pci device as the primary.

The general idea is that we have a pair of devices, a (vfio-)pci and a
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows to the emulated device, on the target side
another (vfio-)pci device is plugged in to take over the data-path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. Evaluate the QOpt
   passed in to check if it is the matching primary device. It returns
   if the device should be hidden or not.
   When it should be hidden it stores the device options in the VirtioNet
   struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
   negotiated during virtio feature negotiation.

   If the virtio-net devices are not realized at the time the (vfio-)pci
   devices are realized, we need to connect the devices later. This way
   we make sure primary and standby devices can be specified in any
   order.

2. Register a callback for migration status notifier. When called it
   will unplug its primary device before the migration happens.

3. Register a callback for the migration code that checks if a device
   needs to be unplugged from the guest.

Signed-off-by: Jens Freimann 
---
 MAINTAINERS|   1 +
 docs/virtio-net-failover.rst   |  68 
 hw/net/virtio-net.c| 302 +
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 5 files changed, 384 insertions(+)
 create mode 100644 docs/virtio-net-failover.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 42e702f346..9d94cbacf8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1421,6 +1421,7 @@ S: Odd Fixes
 F: hw/net/
 F: include/hw/net/
 F: tests/virtio-net-test.c
+F: docs/virtio-net-failover.rst
 T: git https://github.com/jasowang/qemu.git net
 
 Parallel NOR Flash devices
diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst
new file mode 100644
index 00..22f64c7bc8
--- /dev/null
+++ b/docs/virtio-net-failover.rst
@@ -0,0 +1,68 @@
+
+QEMU virtio-net standby (net_failover)
+
+
+This document explains the setup and usage of virtio-net standby feature which
+is used to create a net_failover pair of devices.
+
+The general idea is that we have a pair of devices, a (vfio-)pci and a
+virtio-net device. Before migration the vfio device is unplugged and data flows
+through the virtio-net device, on the target side another vfio-pci device is
+plugged in to take over the data-path. In the guest the net_failover kernel
+module will pair net devices with the same MAC address.
+
+The two devices are called primary and standby device. The fast hardware based
+networking device is called the primary device and the virtio-net device is the
+standby device.
+
+Restrictions
+
+
+Currently only PCIe devices are allowed as primary devices, this restriction
+can be lifted in the future with enhanced QEMU support. Also, only networking
+devices are allowed as primary device. The user needs to ensure that primary
+and standby devices are not plugged into the same PCIe slot.
+
+Usecase
+---
+
+  Virtio-net standby allows easy migration while using a passed-through fast
+  networking device by falling back to a virtio-net device for the duration of
+  the migration. It is like a simple version of a bond, the difference is that 
it
+  requires no configuration in the guest. When a guest is live-migrated to
+  another host QEMU will unplug the primary device via the PCIe based hotplug
+  handler and traffic will go through the virtio-net device.  On the target
+  system the primary device will be automatically plugged back and the
+  net_failover module registers it again as the primary device.
+
+Usage
+-
+
+  The primary device can be hotplugged or be part of the startup configuration
+
+  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
+bus=root2,failover=on
+
+  With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be 
enabled.
+
+  -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
+
+  failover_pair_id references the id of the virtio-net standby device. This
+  is only for pairing the devices within QEMU. The guest kernel module
+  net_failover will match devices with identical MAC addresses.
+
+Hotplug
+---
+
+  Both primary and standby device can be hotplugged via the QEMU monitor.  Note
+  that if the virtio-net device is plugged first a warning will be issued that 
it
+  couldn't find the primary device.
+
+Migration
+-
+
+  A new migration state wait-unplug was added

[PATCH v7 04/11] pci: mark device having guest unplug request pending

2019-10-29 Thread Jens Freimann
Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.

Signed-off-by: Jens Freimann 
---
 hw/pci/pcie.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 19363ff8ce..08718188bb 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
 if (dev->partially_hotplugged) {
+dev->qdev.pending_deleted_event = false;
 return;
 }
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
@@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+dev->pending_deleted_event = true;
+
 /* In case user cancel the operation of multi-function hot-add,
  * remove the function that is unexposed to guest individually,
  * without interaction with guest.
-- 
2.21.0




[PATCH v7 03/11] pci: mark devices partially unplugged

2019-10-29 Thread Jens Freimann
Only the guest unplug request was triggered. This is needed for
the failover feature. In case of a failed migration we need to
plug the device back to the guest.

Signed-off-by: Jens Freimann 
---
 hw/pci/pcie.c| 3 +++
 include/hw/pci/pci.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index a6beb567bd..19363ff8ce 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 {
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
+if (dev->partially_hotplugged) {
+return;
+}
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
 object_unparent(OBJECT(dev));
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 69d1f0228b..db75c6dfd0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 
 struct PCIDevice {
 DeviceState qdev;
+bool partially_hotplugged;
 
 /* PCI config space */
 uint8_t *config;
-- 
2.21.0




[PATCH v7 07/11] migration: allow unplug during migration for failover devices

2019-10-29 Thread Jens Freimann
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann 
---
 hw/core/qdev.c | 1 +
 hw/pci/pci.c   | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c | 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 3b8d43d0fd..cf1ba28fe3 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -996,6 +996,7 @@ static void device_initfn(Object *obj)
 
 dev->instance_id_alias = -1;
 dev->realized = false;
+dev->allow_unplug_during_migration = false;
 
 object_property_add_bool(obj, "realized",
  device_get_realized, device_set_realized, NULL);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 824ab4ed7b..c68498c0de 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2130,6 +2130,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 pci_qdev_unrealize(DEVICE(pci_dev), NULL);
 return;
 }
+qdev->allow_unplug_during_migration = true;
 }
 
 /* rom loading */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 710981af36..1518495b1e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -156,6 +156,7 @@ struct DeviceState {
 bool pending_deleted_event;
 QemuOpts *opts;
 int hotplugged;
+bool allow_unplug_during_migration;
 BusState *parent_bus;
 QLIST_HEAD(, NamedGPIOList) gpios;
 QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index ffa08c670f..e6b112eb0a 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 return;
 }
 
-if (!migration_is_idle()) {
+if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
 error_setg(errp, "device_del not allowed while migrating");
 return;
 }
-- 
2.21.0




[PATCH v7 02/11] pci: add option for net failover

2019-10-29 Thread Jens Freimann
This patch adds a failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. The requirement for PCIe is because it doesn't support
other hotplug controllers at the moment. The failover functionality can
be added to other hotplug controllers like ACPI, SHCP,... later on.

Signed-off-by: Jens Freimann 
---
 hw/pci/pci.c | 31 +++
 include/hw/pci/pci.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..824ab4ed7b 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
 QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
+   failover_pair_id),
 DEFINE_PROP_END_OF_LIST()
 };
 
@@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 ObjectClass *klass = OBJECT_CLASS(pc);
 Error *local_err = NULL;
 bool is_default_rom;
+uint16_t class_id;
 
 /* initialize cap_present for pci_is_express() and pci_config_size(),
  * Note that hybrid PCIs are not set automatically and need to manage
@@ -2101,6 +2104,34 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
 
+if (pci_dev->failover_pair_id) {
+if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
+error_setg(errp, "failover primary device must be on "
+ "PCIExpress bus");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+error_setg(errp, "failover primary device is not an "
+ "Ethernet device");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
+&& (PCI_FUNC(pci_dev->devfn) == 0)) {
+qdev->allow_unplug_during_migration = true;
+} else {
+error_setg(errp, "failover: primary device must be in its own "
+  "PCI slot");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+}
+
 /* rom loading */
 is_default_rom = false;
 if (pci_dev->romfile == NULL && pc->romfile != NULL) {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3f0ffd5fb..69d1f0228b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -352,6 +352,9 @@ struct PCIDevice {
 MSIVectorUseNotifier msix_vector_use_notifier;
 MSIVectorReleaseNotifier msix_vector_release_notifier;
 MSIVectorPollNotifier msix_vector_poll_notifier;
+
+/* ID of standby device in net_failover pair */
+char *failover_pair_id;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
-- 
2.21.0




[PATCH v7 0/11] add failover feature for assigned network devices

2019-10-29 Thread Jens Freimann
erver,nowait \
-monitor telnet:127.0.0.1:,server,nowait \
-device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
-device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
-device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
-netdev 
tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
-device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on
 \
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id 
=net1 \
/root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 


Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 MAINTAINERS|   1 +
 docs/virtio-net-failover.rst   |  68 
 hw/core/qdev.c |  25 +++
 hw/net/virtio-net.c| 302 +
 hw/pci/pci.c   |  32 
 hw/pci/pcie.c  |   6 +
 hw/vfio/pci.c  |  26 ++-
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   4 +
 include/hw/qdev-core.h |  30 
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 include/migration/vmstate.h|   2 +
 migration/migration.c  |  21 +++
 migration/migration.h  |   3 +
 migration/savevm.c |  33 
 migration/savevm.h |   2 +
 qapi/migration.json|  24 ++-
 qapi/net.json  |  19 +++
 qdev-monitor.c |  43 -
 tests/libqos/libqos.c  |   3 +-
 vl.c   |   6 +-
 22 files changed, 649 insertions(+), 15 deletions(-)
 create mode 100644 docs/virtio-net-failover.rst

-- 
2.21.0


*** BLURB HERE ***

Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 MAINTAINERS|   1 +
 docs/virtio-net-failover.rst   |  68 
 hw/core/qdev.c |  25 +++
 hw/net/virtio-net.c| 302 +
 hw/pci/pci.c   |  32 
 hw/pci/pcie.c  |   6 +
 hw/vfio/pci.c  |  26 ++-
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   4 +
 include/hw/qdev-core.h |  30 
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 include/migration/vmstate.h|   2 +
 migration/migration.c  |  21 +++
 migration/migration.h  |   3 +
 migration/savevm.c |  31 
 migration/savevm.h |   2 +
 qapi/migration.json|  24 ++-
 qapi/net.json  |  19 +++
 qdev-monitor.c |  43 -
 tests/libqos/libqos.c  |   3 +-
 vl.c   |   6 +-
 22 files changed, 647 insertions(+), 15 deletions(-)
 create mode 100644 docs/virtio-net-failover.rst

-- 
2.21.0




[PATCH v7 01/11] qdev/qbus: add hidden device support

2019-10-29 Thread Jens Freimann
This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.

qdev_device_add() is modified to check for a failover_pair_id
argument in the option string. A DeviceListener callback
should_be_hidden() is added. It can be used by a standby device to
inform qdev that this device should not be added now. The standby device
handler can store the device options to plug the device in at a later
point in time.

One reason for hiding the device is that we don't want to expose both
devices to the guest kernel until the respective virtio feature bit
VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be
handled correctly by the guest.

More information on the kernel feature this is using:
 https://www.kernel.org/doc/html/latest/networking/net_failover.html

An example where the primary device is a vfio-pci device and the standby
device is a virtio-net device:

A device is hidden when it has an "failover_pair_id" option, e.g.

 -device virtio-net-pci,...,failover=on,...
 -device vfio-pci,...,failover_pair_id=net1,...

Signed-off-by: Jens Freimann 
---
 hw/core/qdev.c | 24 
 include/hw/qdev-core.h | 29 +
 qdev-monitor.c | 41 +
 vl.c   |  6 --
 4 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cbad6c1d55..3b8d43d0fd 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -212,6 +212,30 @@ void device_listener_unregister(DeviceListener *listener)
 QTAILQ_REMOVE(_listeners, listener, link);
 }
 
+bool qdev_should_hide_device(QemuOpts *opts)
+{
+int rc = -1;
+DeviceListener *listener;
+
+QTAILQ_FOREACH(listener, _listeners, link) {
+   if (listener->should_be_hidden) {
+/*
+ * should_be_hidden_will return
+ *  1 if device matches opts and it should be hidden
+ *  0 if device matches opts and should not be hidden
+ *  -1 if device doesn't match ops
+ */
+rc = listener->should_be_hidden(listener, opts);
+}
+
+if (rc > 0) {
+break;
+}
+}
+
+return rc > 0;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa123f88cb..710981af36 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp);
  * respective parent types.
  *   
  * 
+ *
+ * # Hiding a device #
+ * To hide a device, a DeviceListener function should_be_hidden() needs to
+ * be registered.
+ * It can be used to defer adding a device and therefore hide it from the
+ * guest. The handler registering to this DeviceListener can save the QOpts
+ * passed to it for re-using it later and must return that it wants the device
+ * to be/remain hidden or not. When the handler function decides the device
+ * shall not be hidden it will be added in qdev_device_add() and
+ * realized as any other device. Otherwise qdev_device_add() will return early
+ * without adding the device. The guest will not see a "hidden" device
+ * until it was marked don't hide and qdev_device_add called again.
+ *
  */
 typedef struct DeviceClass {
 /*< private >*/
@@ -154,6 +167,12 @@ struct DeviceState {
 struct DeviceListener {
 void (*realize)(DeviceListener *listener, DeviceState *dev);
 void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+/*
+ * This callback is called upon init of the DeviceState and allows to
+ * inform qdev that a device should be hidden, depending on the device
+ * opts, for example, to hide a standby device.
+ */
+int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts);
 QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
+/**
+ * @qdev_should_hide_device:
+ * @opts: QemuOpts as passed on cmdline.
+ *
+ * Check if a device should be added.
+ * When a device is added via qdev_device_add() this will be called,
+ * and return if the device should be added now or not.
+ */
+bool qdev_should_hide_device(QemuOpts *opts);
+
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 148df9cacf..ffa08c670f 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,9 +32,11 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h"
+#include "qemu/op

Re: [PATCH v6 0/11] add failover feature for assigned network devices

2019-10-29 Thread Jens Freimann

On Mon, Oct 28, 2019 at 11:58:53AM -0400, Michael S. Tsirkin wrote:

I see at least comments from Markus.
You answered but don't you need to also tweak the patch?


That comment was addressed already IMO, but I'll change the patch
description as well and while I'm at it will also fix David's comment.

regards,
Jens




Re: [PATCH v6 08/11] migration: add new migration state wait-unplug

2019-10-28 Thread Jens Freimann
Dr. David Alan Gilbert  schrieb am Di., 29. Okt. 2019,
03:57:

> * Jens Freimann (jfreim...@redhat.com) wrote:
> > This patch adds a new migration state called wait-unplug.  It is entered
> > after the SETUP state if failover devices are present. It will transition
> > into ACTIVE once all devices were succesfully unplugged from the guest.
> >
> > So if a guest doesn't respond or takes long to honor the unplug request
> > the user will see the migration state 'wait-unplug'.
> >
> > In the migration thread we query failover devices if they're are still
> > pending the guest unplug. When all are unplugged the migration
> > continues. If one device won't unplug migration will stay in wait_unplug
> > state.
> >
> > Signed-off-by: Jens Freimann 
> > Acked-by: Cornelia Huck 
>
> I think this is OK, so
>
>
> Reviewed-by: Dr. David Alan Gilbert 
>
> but see question below
>
> > ---
> >  include/migration/vmstate.h |  2 ++
> >  migration/migration.c   | 21 +
> >  migration/migration.h   |  3 +++
> >  migration/savevm.c  | 36 
> >  migration/savevm.h  |  2 ++
> >  qapi/migration.json |  5 -
> >  6 files changed, 68 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index b9ee563aa4..ac4f46a67d 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -186,6 +186,8 @@ struct VMStateDescription {
> >  int (*pre_save)(void *opaque);
> >  int (*post_save)(void *opaque);
> >  bool (*needed)(void *opaque);
> > +bool (*dev_unplug_pending)(void *opaque);
> > +
> >  const VMStateField *fields;
> >  const VMStateDescription **subsections;
> >  };
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 3febd0f8f3..51764f2565 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -52,6 +52,7 @@
> >  #include "hw/qdev-properties.h"
> >  #include "monitor/monitor.h"
> >  #include "net/announce.h"
> > +#include "qemu/queue.h"
> >
> >  #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed
> throttling */
> >
> > @@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state)
> >  case MIGRATION_STATUS_SETUP:
> >  case MIGRATION_STATUS_PRE_SWITCHOVER:
> >  case MIGRATION_STATUS_DEVICE:
> > +case MIGRATION_STATUS_WAIT_UNPLUG:
> >  return true;
> >
> >  default:
> > @@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo
> *info)
> >  case MIGRATION_STATUS_CANCELLED:
> >  info->has_status = true;
> >  break;
> > +case MIGRATION_STATUS_WAIT_UNPLUG:
> > +info->has_status = true;
> > +break;
> >  }
> >  info->status = s->state;
> >  }
> > @@ -1694,6 +1699,7 @@ bool migration_is_idle(void)
> >  case MIGRATION_STATUS_COLO:
> >  case MIGRATION_STATUS_PRE_SWITCHOVER:
> >  case MIGRATION_STATUS_DEVICE:
> > +case MIGRATION_STATUS_WAIT_UNPLUG:
> >  return false;
> >  case MIGRATION_STATUS__MAX:
> >  g_assert_not_reached();
> > @@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque)
> >
> >  qemu_savevm_state_setup(s->to_dst_file);
> >
> > +if (qemu_savevm_nr_failover_devices()) {
> > +migrate_set_state(>state, MIGRATION_STATUS_SETUP,
> > +  MIGRATION_STATUS_WAIT_UNPLUG);
> > +
> > +while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
> > +!qemu_savevm_state_guest_unplug_pending()) {
> > +qemu_sem_timedwait(>wait_unplug_sem, 250);
> > +}
> > +
> > +migrate_set_state(>state, MIGRATION_STATUS_WAIT_UNPLUG,
> > +MIGRATION_STATUS_ACTIVE);
> > +}
> > +
> >  s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> >  migrate_set_state(>state, MIGRATION_STATUS_SETUP,
> >MIGRATION_STATUS_ACTIVE);
> > @@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object
> *obj)
> >  qemu_mutex_destroy(>qemu_file_lock);
> >  g_free(params->tls_hostname);
> >  g_free(params->tls_creds);
> > +qemu_sem_destroy(>wait_unplug_sem);
> >  qemu_sem_destroy(&g

Re: [PATCH v6 0/11] add failover feature for assigned network devices

2019-10-28 Thread Jens Freimann

Hi Michael,

I addressed all comments and feedback and think this can be merged but
I'm unclear about which tree it should go to. Will you merge it into
the virtio-tree?

regards,
Jens

On Fri, Oct 25, 2019 at 02:19:19PM +0200, Jens Freimann wrote:

This is implementing the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since v5:
* rename net_failover_pair_id parameter/property to failover_pair_id
* in PCI code use pci_bus_is_express(). This won't fail on functions > 0
* make sure primary and standby can't be added to same PCI slot
* add documentation file in docs/ to virtio-net patch, add file to
  MAINTAINERS (added to networking devices section)
* add comment to QAPI event for failover negotiation, try to improve
  commit message

The general idea is that we have a pair of devices, a vfio-pci and a
virtio-net device. Before migration the vfio device is unplugged and data
flows to the virtio-net device, on the target side another vfio-pci device
is plugged in to take over the data-path. In the guest the net_failover
module will pair net devices with the same MAC address.

* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs

* Patch 2 adds checks to PCIDevice for only allowing ethernet devices as
 failover primary and only PCIExpress capable devices

* Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we
 use to skip the unrealize code path when doing a unplug of the primary
 device

* Patch 4 sets the pending_deleted_event before triggering the guest
 unplug request

* Patch 5 and 6 add new qmp events, one sends the device id of a device
 that was just requested to be unplugged from the guest and another one
 to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated

* Patch 7 make sure that we can unplug the vfio-device before
 migration starts

* Patch 8 adds a new migration state that is entered while we wait for
 devices to be unplugged by guest OS

* Patch 9 just adds the new migration state to a check in libqos code

* Patch 10 In the second patch the virtio-net uses the API to defer adding the 
vfio
 device until the VIRTIO_NET_F_STANDBY feature is acked. It also
 implements the migration handler to unplug the device from the guest and
 re-plug in case of migration failure

* Patch 11 allows migration for failover vfio-pci devices

Previous discussion:
 RFC v1 https://patchwork.ozlabs.org/cover/989098/
 RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
 v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
 v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
 v3: https://patchew.org/QEMU/2019102015.11785-1-jfreim...@redhat.com/
 v4: https://patchew.org/QEMU/20191018202040.30349-1-jfreim...@redhat.com/
 v5: https://patchew.org/QEMU/20191023082711.16694-1-jfreim...@redhat.com/

To summarize concerns/feedback from previous discussion:
1.- guest OS can reject or worse _delay_ unplug by any amount of time.
 Migration might get stuck for unpredictable time with unclear reason.
 This approach combines two tricky things, hot/unplug and migration.
 -> We need to let libvirt know what's happening. Add new qmp events
and a new migration state. When a primary device is (partially)
unplugged (only from guest) we send a qmp event with the device id. When
it is unplugged from the guest the DEVICE_DELETED event is sent.
Migration will enter the wait-unplug state while waiting for the guest
os to unplug all primary devices and then move on with migration.
2. PCI devices are a precious ressource. The primary device should never
 be added to QEMU if it won't be used by guest instead of hiding it in
 QEMU.
 -> We only hotplug the device when the standby feature bit was
negotiated. We save the device cmdline options until we need it for
qdev_device_add()
Hiding a device can be a useful concept to model. For example a
pci device in a powered-off slot could be marked as hidden until the slot is
powered on (mst).
3. Management layer software should handle this. Open Stack already has
 components/code to handle unplug/replug VFIO devices and metadata to
 provide to the guest for detecting which devices should be paired.
 -> An approach that includes all software from firmware to
higher-level management software wasn't tried in the last years. This is
an attempt to keep it simple and contained in QEMU as much as possible.
One of the problems that stopped management software and libvirt from
implementing this idea is that it can't be sure that it's possible to
re-plug the primary device. By not freeing the devices resources in QEMU
and only asking the guest OS to unplug it is possible to re-plug the
device in case of a migration failure.
4. Hotplugging a device and then making it part of a failover setup is
  not possible
 -> addressed by extending qdev hotplug fun

Re: [PATCH v6 06/11] qapi: add failover negotiated event

2019-10-25 Thread Jens Freimann

On Fri, Oct 25, 2019 at 04:03:54PM +0200, Markus Armbruster wrote:

Bear with me, I know next to nothing about failover.

Jens Freimann  writes:


This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature
was enabled. The primary device this virtio-net device is associated
with, will now be hotplugged via qdev_device_add().


Passive voice deftly avoids telling the reader who will do the
hot-plugging.  Intentional?


Not really, it's in the comment to the event. The hotplug will be
done by the virtio-net device code that activates the feature, in
virtio_net_set_features().




Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 qapi/net.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..ea64f7 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,22 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation.
+# Failover primary devices which were hidden (not hotplugged when requested)
+# before will now be hotplugged by the virtio-net standby device.
+#
+# device-id: QEMU device id of the unplugged device


@device-id is new since v5.

A quick skim of
https://www.kernel.org/doc/html/latest/networking/net_failover.html
tells me there are three devices involved: master, primary slave,
standby slave.  Which one is @device-id?  Or am I confused?


Yes, the device-id is new and it's the device-id of the standby (i.e.
virtio-net) device.

regards,
Jens 





[PATCH v6 11/11] vfio: unplug failover primary device before migration

2019-10-25 Thread Jens Freimann
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann 
Acked-by: Alex Williamson 
Acked-by: Cornelia Huck 
---
 hw/vfio/pci.c | 26 --
 hw/vfio/pci.h |  1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..e6569a7968 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
+if (!pdev->failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+error_free(vdev->migration_blocker);
+return;
+}
+}
+
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = _pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
@@ -3008,6 +3020,10 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-.name = "vfio-pci",
-.unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, 
void *data)
 
 dc->reset = vfio_pci_reset;
 dc->props = vfio_pci_dev_properties;
-dc->vmsd = _pci_vmstate;
 dc->desc = "VFIO-based PCI device assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 pdc->realize = vfio_realize;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..b329d50338 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
+Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0




[PATCH v6 10/11] net/virtio: add failover support

2019-10-25 Thread Jens Freimann
This patch adds support to handle failover device pairs of a virtio-net
device and a (vfio-)pci device, where the virtio-net acts as the standby
device and the (vfio-)pci device as the primary.

The general idea is that we have a pair of devices, a (vfio-)pci and a
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows to the emulated device, on the target side
another (vfio-)pci device is plugged in to take over the data-path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. Evaluate the QOpt
   passed in to check if it is the matching primary device. It returns
   if the device should be hidden or not.
   When it should be hidden it stores the device options in the VirtioNet
   struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
   negotiated during virtio feature negotiation.

   If the virtio-net devices are not realized at the time the (vfio-)pci
   devices are realized, we need to connect the devices later. This way
   we make sure primary and standby devices can be specified in any
   order.

2. Register a callback for migration status notifier. When called it
   will unplug its primary device before the migration happens.

3. Register a callback for the migration code that checks if a device
   needs to be unplugged from the guest.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 MAINTAINERS|   1 +
 docs/virtio-net-failover.rst   |  68 
 hw/net/virtio-net.c| 302 +
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 5 files changed, 384 insertions(+)
 create mode 100644 docs/virtio-net-failover.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ed41d7d1b6..28f957e83d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1412,6 +1412,7 @@ S: Odd Fixes
 F: hw/net/
 F: include/hw/net/
 F: tests/virtio-net-test.c
+F: docs/virtio-net-failover.rst
 T: git https://github.com/jasowang/qemu.git net
 
 Parallel NOR Flash devices
diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst
new file mode 100644
index 00..22f64c7bc8
--- /dev/null
+++ b/docs/virtio-net-failover.rst
@@ -0,0 +1,68 @@
+
+QEMU virtio-net standby (net_failover)
+
+
+This document explains the setup and usage of virtio-net standby feature which
+is used to create a net_failover pair of devices.
+
+The general idea is that we have a pair of devices, a (vfio-)pci and a
+virtio-net device. Before migration the vfio device is unplugged and data flows
+through the virtio-net device, on the target side another vfio-pci device is
+plugged in to take over the data-path. In the guest the net_failover kernel
+module will pair net devices with the same MAC address.
+
+The two devices are called primary and standby device. The fast hardware based
+networking device is called the primary device and the virtio-net device is the
+standby device.
+
+Restrictions
+
+
+Currently only PCIe devices are allowed as primary devices, this restriction
+can be lifted in the future with enhanced QEMU support. Also, only networking
+devices are allowed as primary device. The user needs to ensure that primary
+and standby devices are not plugged into the same PCIe slot.
+
+Usecase
+---
+
+  Virtio-net standby allows easy migration while using a passed-through fast
+  networking device by falling back to a virtio-net device for the duration of
+  the migration. It is like a simple version of a bond, the difference is that 
it
+  requires no configuration in the guest. When a guest is live-migrated to
+  another host QEMU will unplug the primary device via the PCIe based hotplug
+  handler and traffic will go through the virtio-net device.  On the target
+  system the primary device will be automatically plugged back and the
+  net_failover module registers it again as the primary device.
+
+Usage
+-
+
+  The primary device can be hotplugged or be part of the startup configuration
+
+  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
+bus=root2,failover=on
+
+  With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be 
enabled.
+
+  -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
+
+  failover_pair_id references the id of the virtio-net standby device. This
+  is only for pairing the devices within QEMU. The guest kernel module
+  net_failover will match devices with identical MAC addresses.
+
+Hotplug
+---
+
+  Both primary and standby device can be hotplugged via the QEMU monitor.  Note
+  that if the virtio-net device is plugged first a warning will be issued that 
it
+  couldn't find the primary device.
+
+Migration
+-
+
+  A new migration state wait

[PATCH v6 09/11] libqos: tolerate wait-unplug migration state

2019-10-25 Thread Jens Freimann
Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 tests/libqos/libqos.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
index d71557c5cb..f229eb2cb8 100644
--- a/tests/libqos/libqos.c
+++ b/tests/libqos/libqos.c
@@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri)
 break;
 }
 
-if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) {
+if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)
+|| (strcmp(st, "wait-unplug") == 0)) {
 qobject_unref(rsp);
 g_usleep(5000);
 continue;
-- 
2.21.0




[PATCH v6 05/11] qapi: add unplug primary event

2019-10-25 Thread Jens Freimann
This event is emitted when we sent a request to unplug a
failover primary device from the Guest OS and it includes the
device id of the primary device.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 qapi/migration.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82feb5bd39..e9e7a97c03 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1448,3 +1448,22 @@
 # Since: 3.0
 ##
 { 'command': 'migrate-pause', 'allow-oob': true }
+
+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device-id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"} }
+#
+##
+{ 'event': 'UNPLUG_PRIMARY',
+  'data': { 'device-id': 'str' } }
-- 
2.21.0




[PATCH v6 08/11] migration: add new migration state wait-unplug

2019-10-25 Thread Jens Freimann
This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 include/migration/vmstate.h |  2 ++
 migration/migration.c   | 21 +
 migration/migration.h   |  3 +++
 migration/savevm.c  | 36 
 migration/savevm.h  |  2 ++
 qapi/migration.json |  5 -
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ac4f46a67d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -186,6 +186,8 @@ struct VMStateDescription {
 int (*pre_save)(void *opaque);
 int (*post_save)(void *opaque);
 bool (*needed)(void *opaque);
+bool (*dev_unplug_pending)(void *opaque);
+
 const VMStateField *fields;
 const VMStateDescription **subsections;
 };
diff --git a/migration/migration.c b/migration/migration.c
index 3febd0f8f3..51764f2565 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "net/announce.h"
+#include "qemu/queue.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state)
 case MIGRATION_STATUS_SETUP:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return true;
 
 default:
@@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info)
 case MIGRATION_STATUS_CANCELLED:
 info->has_status = true;
 break;
+case MIGRATION_STATUS_WAIT_UNPLUG:
+info->has_status = true;
+break;
 }
 info->status = s->state;
 }
@@ -1694,6 +1699,7 @@ bool migration_is_idle(void)
 case MIGRATION_STATUS_COLO:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return false;
 case MIGRATION_STATUS__MAX:
 g_assert_not_reached();
@@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque)
 
 qemu_savevm_state_setup(s->to_dst_file);
 
+if (qemu_savevm_nr_failover_devices()) {
+migrate_set_state(>state, MIGRATION_STATUS_SETUP,
+  MIGRATION_STATUS_WAIT_UNPLUG);
+
+while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
+!qemu_savevm_state_guest_unplug_pending()) {
+qemu_sem_timedwait(>wait_unplug_sem, 250);
+}
+
+migrate_set_state(>state, MIGRATION_STATUS_WAIT_UNPLUG,
+MIGRATION_STATUS_ACTIVE);
+}
+
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_ACTIVE);
@@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_mutex_destroy(>qemu_file_lock);
 g_free(params->tls_hostname);
 g_free(params->tls_creds);
+qemu_sem_destroy(>wait_unplug_sem);
 qemu_sem_destroy(>rate_limit_sem);
 qemu_sem_destroy(>pause_sem);
 qemu_sem_destroy(>postcopy_pause_sem);
@@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(>postcopy_pause_rp_sem, 0);
 qemu_sem_init(>rp_state.rp_sem, 0);
 qemu_sem_init(>rate_limit_sem, 0);
+qemu_sem_init(>wait_unplug_sem, 0);
 qemu_mutex_init(>qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 4f2fe193dc..79b3dda146 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,9 @@ struct MigrationState
 /* Flag set once the migration thread called bdrv_inactivate_all */
 bool block_inactive;
 
+/* Migration is waiting for guest to unplug device */
+QemuSemaphore wait_unplug_sem;
+
 /* Migration is paused due to pause-before-switchover */
 QemuSemaphore pause_sem;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e261f6..0f18dea49e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1113,6 +1113,42 @@ void qemu_savevm_state_header(QEMUFile *f)
 }
 }
 
+int qemu_savevm_nr_failover_devices(void)
+{
+SaveStateEntry *se;
+int n = 0;
+
+QTAILQ_FOREACH(se, _state.handlers, entry)

[PATCH v6 07/11] migration: allow unplug during migration for failover devices

2019-10-25 Thread Jens Freimann
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 hw/core/qdev.c | 1 +
 hw/pci/pci.c   | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c | 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 3b8d43d0fd..cf1ba28fe3 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -996,6 +996,7 @@ static void device_initfn(Object *obj)
 
 dev->instance_id_alias = -1;
 dev->realized = false;
+dev->allow_unplug_during_migration = false;
 
 object_property_add_bool(obj, "realized",
  device_get_realized, device_set_realized, NULL);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 824ab4ed7b..c68498c0de 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2130,6 +2130,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 pci_qdev_unrealize(DEVICE(pci_dev), NULL);
 return;
 }
+qdev->allow_unplug_during_migration = true;
 }
 
 /* rom loading */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 710981af36..1518495b1e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -156,6 +156,7 @@ struct DeviceState {
 bool pending_deleted_event;
 QemuOpts *opts;
 int hotplugged;
+bool allow_unplug_during_migration;
 BusState *parent_bus;
 QLIST_HEAD(, NamedGPIOList) gpios;
 QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index ffa08c670f..e6b112eb0a 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 return;
 }
 
-if (!migration_is_idle()) {
+if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
 error_setg(errp, "device_del not allowed while migrating");
 return;
 }
-- 
2.21.0




[PATCH v6 03/11] pci: mark devices partially unplugged

2019-10-25 Thread Jens Freimann
Only the guest unplug request was triggered. This is needed for
the failover feature. In case of a failed migration we need to
plug the device back to the guest.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 hw/pci/pcie.c| 3 +++
 include/hw/pci/pci.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index a6beb567bd..19363ff8ce 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 {
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
+if (dev->partially_hotplugged) {
+return;
+}
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
 object_unparent(OBJECT(dev));
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 69d1f0228b..db75c6dfd0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 
 struct PCIDevice {
 DeviceState qdev;
+bool partially_hotplugged;
 
 /* PCI config space */
 uint8_t *config;
-- 
2.21.0




[PATCH v6 01/11] qdev/qbus: add hidden device support

2019-10-25 Thread Jens Freimann
This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.

qdev_device_add() is modified to check for a failover_pair_id
argument in the option string. A DeviceListener callback
should_be_hidden() is added. It can be used by a standby device to
inform qdev that this device should not be added now. The standby device
handler can store the device options to plug the device in at a later
point in time.

One reason for hiding the device is that we don't want to expose both
devices to the guest kernel until the respective virtio feature bit
VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be
handled correctly by the guest.

More information on the kernel feature this is using:
 https://www.kernel.org/doc/html/latest/networking/net_failover.html

An example where the primary device is a vfio-pci device and the standby
device is a virtio-net device:

A device is hidden when it has an "failover_pair_id" option, e.g.

 -device virtio-net-pci,...,failover=on,...
 -device vfio-pci,...,failover_pair_id=net1,...

Signed-off-by: Jens Freimann 
Reviewed-by: Cornelia Huck 
---
 hw/core/qdev.c | 24 
 include/hw/qdev-core.h | 29 +
 qdev-monitor.c | 41 +
 vl.c   |  6 --
 4 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cbad6c1d55..3b8d43d0fd 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -212,6 +212,30 @@ void device_listener_unregister(DeviceListener *listener)
 QTAILQ_REMOVE(_listeners, listener, link);
 }
 
+bool qdev_should_hide_device(QemuOpts *opts)
+{
+int rc = -1;
+DeviceListener *listener;
+
+QTAILQ_FOREACH(listener, _listeners, link) {
+   if (listener->should_be_hidden) {
+/*
+ * should_be_hidden_will return
+ *  1 if device matches opts and it should be hidden
+ *  0 if device matches opts and should not be hidden
+ *  -1 if device doesn't match ops
+ */
+rc = listener->should_be_hidden(listener, opts);
+}
+
+if (rc > 0) {
+break;
+}
+}
+
+return rc > 0;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa123f88cb..710981af36 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp);
  * respective parent types.
  *   
  * 
+ *
+ * # Hiding a device #
+ * To hide a device, a DeviceListener function should_be_hidden() needs to
+ * be registered.
+ * It can be used to defer adding a device and therefore hide it from the
+ * guest. The handler registering to this DeviceListener can save the QOpts
+ * passed to it for re-using it later and must return that it wants the device
+ * to be/remain hidden or not. When the handler function decides the device
+ * shall not be hidden it will be added in qdev_device_add() and
+ * realized as any other device. Otherwise qdev_device_add() will return early
+ * without adding the device. The guest will not see a "hidden" device
+ * until it was marked don't hide and qdev_device_add called again.
+ *
  */
 typedef struct DeviceClass {
 /*< private >*/
@@ -154,6 +167,12 @@ struct DeviceState {
 struct DeviceListener {
 void (*realize)(DeviceListener *listener, DeviceState *dev);
 void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+/*
+ * This callback is called upon init of the DeviceState and allows to
+ * inform qdev that a device should be hidden, depending on the device
+ * opts, for example, to hide a standby device.
+ */
+int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts);
 QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
+/**
+ * @qdev_should_hide_device:
+ * @opts: QemuOpts as passed on cmdline.
+ *
+ * Check if a device should be added.
+ * When a device is added via qdev_device_add() this will be called,
+ * and return if the device should be added now or not.
+ */
+bool qdev_should_hide_device(QemuOpts *opts);
+
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 148df9cacf..ffa08c670f 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,9 +32,11 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h&q

[PATCH v6 06/11] qapi: add failover negotiated event

2019-10-25 Thread Jens Freimann
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY feature
was enabled. The primary device this virtio-net device is associated
with, will now be hotplugged via qdev_device_add().

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 qapi/net.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..ea64f7 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,22 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was enabled during feature negotiation.
+# Failover primary devices which were hidden (not hotplugged when requested)
+# before will now be hotplugged by the virtio-net standby device.
+#
+# device-id: QEMU device id of the unplugged device
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "FAILOVER_NEGOTIATED",
+#  "data": "net1" }
+#
+##
+{ 'event': 'FAILOVER_NEGOTIATED',
+  'data': {'device-id': 'str'} }
-- 
2.21.0




[PATCH v6 04/11] pci: mark device having guest unplug request pending

2019-10-25 Thread Jens Freimann
Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 hw/pci/pcie.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 19363ff8ce..08718188bb 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
 if (dev->partially_hotplugged) {
+dev->qdev.pending_deleted_event = false;
 return;
 }
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
@@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+dev->pending_deleted_event = true;
+
 /* In case user cancel the operation of multi-function hot-add,
  * remove the function that is unexposed to guest individually,
  * without interaction with guest.
-- 
2.21.0




[PATCH v6 02/11] pci: add option for net failover

2019-10-25 Thread Jens Freimann
This patch adds a failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. The requirement for PCIe is because it doesn't support
other hotplug controllers at the moment. The failover functionality can
be added to other hotplug controllers like ACPI, SHCP,... later on.

Signed-off-by: Jens Freimann 
Acked-by: Cornelia Huck 
---
 hw/pci/pci.c | 31 +++
 include/hw/pci/pci.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..824ab4ed7b 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
 QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
+   failover_pair_id),
 DEFINE_PROP_END_OF_LIST()
 };
 
@@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 ObjectClass *klass = OBJECT_CLASS(pc);
 Error *local_err = NULL;
 bool is_default_rom;
+uint16_t class_id;
 
 /* initialize cap_present for pci_is_express() and pci_config_size(),
  * Note that hybrid PCIs are not set automatically and need to manage
@@ -2101,6 +2104,34 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
 
+if (pci_dev->failover_pair_id) {
+if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
+error_setg(errp, "failover primary device must be on "
+ "PCIExpress bus");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+error_setg(errp, "failover primary device is not an "
+ "Ethernet device");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
+&& (PCI_FUNC(pci_dev->devfn) == 0)) {
+qdev->allow_unplug_during_migration = true;
+} else {
+error_setg(errp, "failover: primary device must be in its own "
+  "PCI slot");
+error_propagate(errp, local_err);
+pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+return;
+}
+}
+
 /* rom loading */
 is_default_rom = false;
 if (pci_dev->romfile == NULL && pc->romfile != NULL) {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3f0ffd5fb..69d1f0228b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -352,6 +352,9 @@ struct PCIDevice {
 MSIVectorUseNotifier msix_vector_use_notifier;
 MSIVectorReleaseNotifier msix_vector_release_notifier;
 MSIVectorPollNotifier msix_vector_poll_notifier;
+
+/* ID of standby device in net_failover pair */
+char *failover_pair_id;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
-- 
2.21.0




[PATCH v6 0/11] add failover feature for assigned network devices

2019-10-25 Thread Jens Freimann
u-system-x86_64 -enable-kvm -m 3072 -smp 3 \
-machine q35,kernel-irqchip=split -cpu host   \
-serial stdio   \
-net none \
-qmp unix:/tmp/qmp.socket,server,nowait \
-monitor telnet:127.0.0.1:,server,nowait \
-device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
-device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
-device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
-netdev 
tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
-device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on
 \
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id 
=net1 \
/root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 


Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 MAINTAINERS|   1 +
 docs/virtio-net-failover.rst   |  68 
 hw/core/qdev.c |  25 +++
 hw/net/virtio-net.c| 302 +
 hw/pci/pci.c   |  32 
 hw/pci/pcie.c  |   6 +
 hw/vfio/pci.c  |  26 ++-
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   4 +
 include/hw/qdev-core.h |  30 
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 include/migration/vmstate.h|   2 +
 migration/migration.c  |  21 +++
 migration/migration.h  |   3 +
 migration/savevm.c |  36 
 migration/savevm.h |   2 +
 qapi/migration.json|  24 ++-
 qapi/net.json  |  19 +++
 qdev-monitor.c |  43 -
 tests/libqos/libqos.c  |   3 +-
 vl.c   |   6 +-
 22 files changed, 652 insertions(+), 15 deletions(-)
 create mode 100644 docs/virtio-net-failover.rst

-- 
2.21.0




Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-25 Thread Jens Freimann

On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:

On Wed, 23 Oct 2019 10:27:02 +0200
Jens Freimann  wrote:

[...]

@@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }

+if (pci_dev->net_failover_pair_id) {
+if (!pci_is_express(pci_dev)) {
+error_setg(errp, "failover device is not a PCIExpress device");
+error_propagate(errp, local_err);
+return;
+}


Did we decide we don't need to test that the device is also in a
hotpluggable slot?  Are there also multi-function considerations that
should be prevented or documented?  For example, if a user tries to
configure both the primary and failover NICs in the same slot, I assume
bad things will happen.


I added this check

   if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
   && (PCI_FUNC(pci_dev->devfn) == 0)) {
   qdev->allow_unplug_during_migration = true;
   } else {
   error_setg(errp, "failover: primary device must be in its own "
 "PCI slot");
   error_propagate(errp, local_err);
   pci_qdev_unrealize(DEVICE(pci_dev), NULL);
   return;
   }

When I first add a vfio-pci with net_failover_pair_id=x,multifunction=on
and addr=0.0 I will now get an error.

(qemu) device_add 
vfio-pci,...,bus=root2,net_failover_pair_id=net1,multifunction=on,addr=0.0
Error: failover: primary device must be in its own PCI slot

If I put in a virtio-net device in slot 0 and then try to add a
vfio-pci device in the same slot I get the following error message:

-device virtio-net-pci,...id=net1bus=root1,failover=on,multifunction=on,addr=0.0
(qemu) device_add 
vfio-pci,id=hostdev1,host=:5e:00.2,bus=root1,net_failover_pair_id=net1,addr=0.1
Error: PCI: slot 0 function 0 already ocuppied by virtio-net-pci,
  new func vfio-pci cannot be exposed to guest.


regards,
Jens




Re: [PATCH v5 06/11] qapi: add failover negotiated event

2019-10-25 Thread Jens Freimann

On Fri, Oct 25, 2019 at 07:35:28AM +0200, Markus Armbruster wrote:

We ask patch submitters to cc: subject matter experts for review.  You
did.  When such patches touch the QAPI schema, it's best to cc the qapi
schema maintainers (Eric Blake and me) as well, because we can't require
all subject matter experts to be fluent in the QAPI schema language and
conventions.  I found this one more or less by chance.


Sorry about that, I'll make sure to get the right people next time.


Jens Freimann  writes:


This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
feature was not negotiated during virtio feature negotiation. If this
event is received it means any primary devices hotplugged before
this were were never really added to QEMU devices.


Too many negations for my poor old brain to process.


I'll try to explain better :) 




Signed-off-by: Jens Freimann 
---
 qapi/net.json | 16 
 1 file changed, 16 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..8c5f3f1fb2 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,19 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
+#
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "FAILOVER_NEGOTIATED",
+#  "data": {} }
+#
+##
+{ 'event': 'FAILOVER_NEGOTIATED',
+  'data': {} }


The commit message at least tries to explain intended use.  The doc
string does not.  Should it?


Sure, I'll add it. 


Thanks for the review!

regards,
Jens 





Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-24 Thread Jens Freimann

On Thu, Oct 24, 2019 at 10:52:36AM -0600, Alex Williamson wrote:

On Thu, 24 Oct 2019 11:37:54 +0200
Jens Freimann  wrote:

[...]

>

>While reviewing, I realized that we shouldn't have this check for below 
reasons.
>
>1. It is user's responsibility to pass networking device.
>But its ok to check the class, if PCI Device is passed.
>So class comparison should be inside the pci_check().

I'm not sure I understand this point, could you please elaborate?
You're suggesting to move the check for the class into the check for
pci_is_express?


Seems like the suggestion is that net_failover_pair_id should be an
option on the base class of PCIDevice (DeviceState?) and only if it's a
PCI device would we check the class code.  But there are dependencies
at the hotplug controller, which I think is why this is currently
specific to PCI.


Yes, It doesn't support acpi, shpc,... hotplug as of now. It
shouldn't be hard to add but I'd like to do it as a follow-on series.


However, it's an interesting point about pci_is_express().  This test
is really just meant to check whether the hotplug controller supports
this feature, which is only implemented in pciehp via this series.
There's a bit of a mismatch though that pcie_is_express() checks
whether the device is express, not whether the bus it sits on is
express.  I think we really want the latter, so maybe this should be:

pci_bus_is_express(pci_get_bus(dev)

For example this feature should work if I plug an e1000 (not e1000e)
into an express slot, but not if I plug an e1000e into a conventional
slot.


I'll try this and test. 


Thanks!

regards,
Jens 





Re: [PATCH v5 06/11] qapi: add failover negotiated event

2019-10-24 Thread Jens Freimann

On Thu, Oct 24, 2019 at 06:32:45PM +0100, Dr. David Alan Gilbert wrote:

* Jens Freimann (jfreim...@redhat.com) wrote:

This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
feature was not negotiated during virtio feature negotiation. If this
event is received it means any primary devices hotplugged before
this were were never really added to QEMU devices.

Signed-off-by: Jens Freimann 


Can I just understand a bit more about what the meaning of this is.

Say my VM boots:
  a) BIOS
  b) Boot loader
  c) Linux
  d) Reboots
 (possibly a',b', different c')

When would I get that event?
When can libvirt know it can use it?


The event is sent every time we do feature negotiation for the virtio-net
device, so you'll get it during Linux boot and reboots.

In v6, I add a data field 'id' to the event to pass the device id. 


regards,
Jens 





Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-24 Thread Jens Freimann

On Thu, Oct 24, 2019 at 06:22:27PM +0100, Dr. David Alan Gilbert wrote:

* Jens Freimann (jfreim...@redhat.com) wrote:

This patch adds a net_failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. QEMU will exit with an error message otherwise.

Signed-off-by: Jens Freimann 
---
 hw/pci/pci.c | 17 +
 include/hw/pci/pci.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..fa9b5219f8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
 QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
+net_failover_pair_id),


Should we just make this 'failover_pair_id' - then when someone in the
future figures out how to make it work for something else (e.g.
multipath block devices) then it's all good?


Yes, I see no reason why not to rename it. 


Thanks!

regards,
Jens




Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-24 Thread Jens Freimann

On Thu, Oct 24, 2019 at 05:03:46AM +, Parav Pandit wrote:

@@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev,
Error **errp)
 }
 }

+if (pci_dev->net_failover_pair_id) {
+if (!pci_is_express(pci_dev)) {


I am testing and integrating this piece with mlx5 devices.
I see that pci_is_express() return true only for first PCI function.
Didn't yet dig the API.
Commenting out this check and below class check progresses further.


First of all, thanks for testing this!
Could you share your commandline please? I can't reproduce it.


While reviewing, I realized that we shouldn't have this check for below reasons.

1. It is user's responsibility to pass networking device.
But its ok to check the class, if PCI Device is passed.
So class comparison should be inside the pci_check().


I'm not sure I understand this point, could you please elaborate?
You're suggesting to move the check for the class into the check for
pci_is_express?


2. It is limiting to only consider PCI devices.
Automated and regression tests should be able validate this feature without PCI 
Device.
This will enhance the stability of feature in long run.

3. net failover driver doesn't limit it to have it over only PCI device.
So similarly hypervisor should be limiting.


I agree that we don't have to limit it to PCI(e) forever. But for this
first shot I think we should and then extend it continually. There are
more things we can support in the future like other hotplug types etc.

regards,
Jens  





Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-23 Thread Jens Freimann

On Wed, Oct 23, 2019 at 02:02:11PM -0600, Alex Williamson wrote:

On Wed, 23 Oct 2019 21:30:35 +0200
Jens Freimann  wrote:


On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
>On Wed, 23 Oct 2019 10:27:02 +0200
>Jens Freimann  wrote:

[...]

>Are there also multi-function considerations that
>should be prevented or documented?  For example, if a user tries to
>configure both the primary and failover NICs in the same slot, I assume
>bad things will happen.

  I would have expected that this is already checked in pci code, but
it is not. I tried it and when I put both devices into the same slot
they are both unplugged from the guest during boot but nothing else
happens. I don't know what triggers that unplug of the devices.

I'm not aware of any other problems regarding multi-function, which
doesn't mean there aren't any.


Hmm, was the hidden device at function #0?  The guest won't find any
functions if function #0 isn't present, but I don't know what would
trigger the hotplug.  The angle I'm thinking is that we only have slot
level granularity for hotplug, so any sort of automatic hotplug of a
slot should probably think about bystander devices within the slot.


Yes that would be a problem, but isn't it the same in the non-failover case
where a user configures it wrong? The slot where the device is plugged is not
chosen automatically it's configured by the user, no? I might be mixing 
something
up here.  I have no idea yet how to check if a slot is already populated, but
I'll think about it. 


Thanks!

regards,
Jens 





Re: [PATCH v5 02/11] pci: add option for net failover

2019-10-23 Thread Jens Freimann

On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:

On Wed, 23 Oct 2019 10:27:02 +0200
Jens Freimann  wrote:


This patch adds a net_failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. QEMU will exit with an error message otherwise.

Signed-off-by: Jens Freimann 
---
 hw/pci/pci.c | 17 +
 include/hw/pci/pci.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..fa9b5219f8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
 QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
+net_failover_pair_id),


Nit, white space appears broken here.


I'll fix it.


 DEFINE_PROP_END_OF_LIST()
 };

@@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 ObjectClass *klass = OBJECT_CLASS(pc);
 Error *local_err = NULL;
 bool is_default_rom;
+uint16_t class_id;

 /* initialize cap_present for pci_is_express() and pci_config_size(),
  * Note that hybrid PCIs are not set automatically and need to manage
@@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }

+if (pci_dev->net_failover_pair_id) {
+if (!pci_is_express(pci_dev)) {
+error_setg(errp, "failover device is not a PCIExpress device");
+error_propagate(errp, local_err);
+return;
+}


Did we decide we don't need to test that the device is also in a
hotpluggable slot?  


Hmm, my reply to you was never sent. I thought the check for
qdev_device_add() was sufficient but you said that works only
after qdev_hotplug is set (after machine creation). I modified
the check to this:

   hide = should_hide_device(opts);  
 
   if ((hide || qdev_hotplug) && bus && !qbus_is_hotpluggable(bus)) {
   error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name); 
   return NULL;  
   } 
 
   if (hide) {   
   return NULL;  
   }


This will make qemu fail when we have the device in the initial
configuration or when we hotplug it to a bus that doesn't support it.
I tested both with a device on pcie.0. Am I missing something? 


Are there also multi-function considerations that
should be prevented or documented?  For example, if a user tries to
configure both the primary and failover NICs in the same slot, I assume
bad things will happen.


 I would have expected that this is already checked in pci code, but
it is not. I tried it and when I put both devices into the same slot
they are both unplugged from the guest during boot but nothing else
happens. I don't know what triggers that unplug of the devices.

I'm not aware of any other problems regarding multi-function, which
doesn't mean there aren't any. 




+class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+error_setg(errp, "failover device is not an Ethernet device");
+error_propagate(errp, local_err);
+return;
+}
+}


Looks like cleanup is missing both both error cases, the option rom
error path below this does a pci_qdev_unrealize() before returning so
I'd assume we want the same here.  Thanks,


Thanks, I'll fix this too.

regards,
Jens 





[PATCH v5 11/11] vfio: unplug failover primary device before migration

2019-10-23 Thread Jens Freimann
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann 
---
 hw/vfio/pci.c | 26 --
 hw/vfio/pci.h |  1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..5dadfec6e8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
+if (!pdev->net_failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+error_free(vdev->migration_blocker);
+return;
+}
+}
+
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = _pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
@@ -3008,6 +3020,10 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-.name = "vfio-pci",
-.unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, 
void *data)
 
 dc->reset = vfio_pci_reset;
 dc->props = vfio_pci_dev_properties;
-dc->vmsd = _pci_vmstate;
 dc->desc = "VFIO-based PCI device assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 pdc->realize = vfio_realize;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..b329d50338 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
+Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0




[PATCH v5 08/11] migration: add new migration state wait-unplug

2019-10-23 Thread Jens Freimann
This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann 
---
 include/migration/vmstate.h |  2 ++
 migration/migration.c   | 21 +
 migration/migration.h   |  3 +++
 migration/savevm.c  | 36 
 migration/savevm.h  |  2 ++
 qapi/migration.json |  5 -
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ac4f46a67d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -186,6 +186,8 @@ struct VMStateDescription {
 int (*pre_save)(void *opaque);
 int (*post_save)(void *opaque);
 bool (*needed)(void *opaque);
+bool (*dev_unplug_pending)(void *opaque);
+
 const VMStateField *fields;
 const VMStateDescription **subsections;
 };
diff --git a/migration/migration.c b/migration/migration.c
index 3febd0f8f3..51764f2565 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "net/announce.h"
+#include "qemu/queue.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state)
 case MIGRATION_STATUS_SETUP:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return true;
 
 default:
@@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info)
 case MIGRATION_STATUS_CANCELLED:
 info->has_status = true;
 break;
+case MIGRATION_STATUS_WAIT_UNPLUG:
+info->has_status = true;
+break;
 }
 info->status = s->state;
 }
@@ -1694,6 +1699,7 @@ bool migration_is_idle(void)
 case MIGRATION_STATUS_COLO:
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
+case MIGRATION_STATUS_WAIT_UNPLUG:
 return false;
 case MIGRATION_STATUS__MAX:
 g_assert_not_reached();
@@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque)
 
 qemu_savevm_state_setup(s->to_dst_file);
 
+if (qemu_savevm_nr_failover_devices()) {
+migrate_set_state(>state, MIGRATION_STATUS_SETUP,
+  MIGRATION_STATUS_WAIT_UNPLUG);
+
+while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
+!qemu_savevm_state_guest_unplug_pending()) {
+qemu_sem_timedwait(>wait_unplug_sem, 250);
+}
+
+migrate_set_state(>state, MIGRATION_STATUS_WAIT_UNPLUG,
+MIGRATION_STATUS_ACTIVE);
+}
+
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_ACTIVE);
@@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_mutex_destroy(>qemu_file_lock);
 g_free(params->tls_hostname);
 g_free(params->tls_creds);
+qemu_sem_destroy(>wait_unplug_sem);
 qemu_sem_destroy(>rate_limit_sem);
 qemu_sem_destroy(>pause_sem);
 qemu_sem_destroy(>postcopy_pause_sem);
@@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(>postcopy_pause_rp_sem, 0);
 qemu_sem_init(>rp_state.rp_sem, 0);
 qemu_sem_init(>rate_limit_sem, 0);
+qemu_sem_init(>wait_unplug_sem, 0);
 qemu_mutex_init(>qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 4f2fe193dc..79b3dda146 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,9 @@ struct MigrationState
 /* Flag set once the migration thread called bdrv_inactivate_all */
 bool block_inactive;
 
+/* Migration is waiting for guest to unplug device */
+QemuSemaphore wait_unplug_sem;
+
 /* Migration is paused due to pause-before-switchover */
 QemuSemaphore pause_sem;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e261f6..0f18dea49e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1113,6 +1113,42 @@ void qemu_savevm_state_header(QEMUFile *f)
 }
 }
 
+int qemu_savevm_nr_failover_devices(void)
+{
+SaveStateEntry *se;
+int n = 0;
+
+QTAILQ_FOREACH(se, _state.handlers, entry) {
+if (se->vmsd && se-&g

[PATCH v5 10/11] net/virtio: add failover support

2019-10-23 Thread Jens Freimann
This patch adds support to handle failover device pairs of a virtio-net
device and a vfio-pci device, where the virtio-net acts as the standby
device and the vfio-pci device as the primary.

The general idea is that we have a pair of devices, a vfio-pci and a
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows to the emulated device, on the target side
another vfio-pci device is plugged in to take over the data-path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. Evaluate the QOpt
   passed in to check if it is the matching primary device. It returns
   two values:
 - one to signal if the device to be added is the matching
   primary device
 - another one to signal to qdev if it should actually
   continue with adding the device or skip it.

   In the latter case it stores the device options in the VirtioNet
   struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
   negotiated during virtio feature negotiation.

   If the virtio-net devices are not realized at the time the vfio-pci
   devices are realized, we need to connect the devices later. This way
   we make sure primary and standby devices can be specified in any
   order.

2. Register a callback for migration status notifier. When called it
   will unplug its primary device before the migration happens.

3. Register a callback for the migration code that checks if a device
   needs to be unplugged from the guest.

Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c| 302 +
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 3 files changed, 315 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f11422337..cccaf77bde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/atomic.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -21,6 +22,10 @@
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
+#include "qemu/config-file.h"
+#include "qapi/qmp/qdict.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/announce.h"
@@ -28,11 +33,15 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-net.h"
 #include "hw/qdev-properties.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
+#include "monitor/qdev.h"
+#include "hw/pci/pci.h"
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -746,9 +755,99 @@ static inline uint64_t 
virtio_net_supported_guest_offloads(VirtIONet *n)
 return virtio_net_guest_offloads_by_features(vdev->guest_features);
 }
 
+static void failover_add_primary(VirtIONet *n, Error **errp)
+{
+Error *err = NULL;
+
+n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
+n->primary_device_id);
+if (n->primary_device_opts) {
+n->primary_dev = qdev_device_add(n->primary_device_opts, );
+if (err) {
+qemu_opts_del(n->primary_device_opts);
+}
+if (n->primary_dev) {
+n->primary_bus = n->primary_dev->parent_bus;
+if (err) {
+qdev_unplug(n->primary_dev, );
+qdev_set_id(n->primary_dev, "");
+
+}
+}
+} else {
+error_setg(errp, "Primary device not found");
+error_append_hint(errp, "Virtio-net failover will not work. Make "
+"sure primary device has parameter"
+" net_failover_pair_id=\n");
+}
+if (err) {
+error_propagate(errp, err);
+}
+}
+
+static int is_my_primary(void *opaque, QemuOpts *opts, Error **errp)
+{
+VirtIONet *n = opaque;
+int ret = 0;
+
+const char *standby_id = qemu_opt_get(opts, "net_failover_pair_id");
+
+if (standby_id != NULL && (g_strcmp0(standby_id, n->netclient_name) == 0)) 
{
+n->primary_device_id = g_strdup(opts->id);
+ret = 1;
+}
+
+return ret;
+}
+
+static DeviceState *virtio_net_find_primary(VirtIONet *n, Error **errp)
+{
+DeviceState *dev = NULL;
+Error *err = NULL;
+
+if (qemu_opts_foreach(qemu_find_opts("de

[PATCH v5 05/11] qapi: add unplug primary event

2019-10-23 Thread Jens Freimann
This event is emitted when we sent a request to unplug a
failover primary device from the Guest OS and it includes the
device id of the primary device.

Signed-off-by: Jens Freimann 
---
 qapi/migration.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82feb5bd39..52e69e2868 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1448,3 +1448,22 @@
 # Since: 3.0
 ##
 { 'command': 'migrate-pause', 'allow-oob': true }
+
+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device_id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} }
+#
+##
+{ 'event': 'UNPLUG_PRIMARY',
+  'data': { 'device_id': 'str' } }
-- 
2.21.0




[PATCH v5 06/11] qapi: add failover negotiated event

2019-10-23 Thread Jens Freimann
This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
feature was not negotiated during virtio feature negotiation. If this
event is received it means any primary devices hotplugged before
this were were never really added to QEMU devices.

Signed-off-by: Jens Freimann 
---
 qapi/net.json | 16 
 1 file changed, 16 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..8c5f3f1fb2 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,19 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
+#
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "FAILOVER_NEGOTIATED",
+#  "data": {} }
+#
+##
+{ 'event': 'FAILOVER_NEGOTIATED',
+  'data': {} }
-- 
2.21.0




[PATCH v5 09/11] libqos: tolerate wait-unplug migration state

2019-10-23 Thread Jens Freimann
Signed-off-by: Jens Freimann 
---
 tests/libqos/libqos.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
index d71557c5cb..f229eb2cb8 100644
--- a/tests/libqos/libqos.c
+++ b/tests/libqos/libqos.c
@@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri)
 break;
 }
 
-if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) {
+if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)
+|| (strcmp(st, "wait-unplug") == 0)) {
 qobject_unref(rsp);
 g_usleep(5000);
 continue;
-- 
2.21.0




[PATCH v5 02/11] pci: add option for net failover

2019-10-23 Thread Jens Freimann
This patch adds a net_failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. QEMU will exit with an error message otherwise.

Signed-off-by: Jens Freimann 
---
 hw/pci/pci.c | 17 +
 include/hw/pci/pci.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..fa9b5219f8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
 QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
+net_failover_pair_id),
 DEFINE_PROP_END_OF_LIST()
 };
 
@@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 ObjectClass *klass = OBJECT_CLASS(pc);
 Error *local_err = NULL;
 bool is_default_rom;
+uint16_t class_id;
 
 /* initialize cap_present for pci_is_express() and pci_config_size(),
  * Note that hybrid PCIs are not set automatically and need to manage
@@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
 
+if (pci_dev->net_failover_pair_id) {
+if (!pci_is_express(pci_dev)) {
+error_setg(errp, "failover device is not a PCIExpress device");
+error_propagate(errp, local_err);
+return;
+}
+class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+error_setg(errp, "failover device is not an Ethernet device");
+error_propagate(errp, local_err);
+return;
+}
+}
+
 /* rom loading */
 is_default_rom = false;
 if (pci_dev->romfile == NULL && pc->romfile != NULL) {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3f0ffd5fb..def5435685 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -352,6 +352,9 @@ struct PCIDevice {
 MSIVectorUseNotifier msix_vector_use_notifier;
 MSIVectorReleaseNotifier msix_vector_release_notifier;
 MSIVectorPollNotifier msix_vector_poll_notifier;
+
+/* ID of standby device in net_failover pair */
+char *net_failover_pair_id;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
-- 
2.21.0




[PATCH v5 04/11] pci: mark device having guest unplug request pending

2019-10-23 Thread Jens Freimann
Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.

Signed-off-by: Jens Freimann 
---
 hw/pci/pcie.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 19363ff8ce..08718188bb 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
 if (dev->partially_hotplugged) {
+dev->qdev.pending_deleted_event = false;
 return;
 }
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
@@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+dev->pending_deleted_event = true;
+
 /* In case user cancel the operation of multi-function hot-add,
  * remove the function that is unexposed to guest individually,
  * without interaction with guest.
-- 
2.21.0




[PATCH v5 07/11] migration: allow unplug during migration for failover devices

2019-10-23 Thread Jens Freimann
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann 
---
 hw/core/qdev.c | 1 +
 hw/pci/pci.c   | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c | 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f786650446..dc1289da86 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -995,6 +995,7 @@ static void device_initfn(Object *obj)
 
 dev->instance_id_alias = -1;
 dev->realized = false;
+dev->allow_unplug_during_migration = false;
 
 object_property_add_bool(obj, "realized",
  device_get_realized, device_set_realized, NULL);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fa9b5219f8..8fbf32d68c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2116,6 +2116,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 error_propagate(errp, local_err);
 return;
 }
+qdev->allow_unplug_during_migration = true;
 }
 
 /* rom loading */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 710981af36..1518495b1e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -156,6 +156,7 @@ struct DeviceState {
 bool pending_deleted_event;
 QemuOpts *opts;
 int hotplugged;
+bool allow_unplug_during_migration;
 BusState *parent_bus;
 QLIST_HEAD(, NamedGPIOList) gpios;
 QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 676a759fb4..bc6a41fa37 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 return;
 }
 
-if (!migration_is_idle()) {
+if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
 error_setg(errp, "device_del not allowed while migrating");
 return;
 }
-- 
2.21.0




[PATCH v5 03/11] pci: mark devices partially unplugged

2019-10-23 Thread Jens Freimann
Only the guest unplug request was triggered. This is needed for
the failover feature. In case of a failed migration we need to
plug the device back to the guest.

Signed-off-by: Jens Freimann 
---
 hw/pci/pcie.c| 3 +++
 include/hw/pci/pci.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index a6beb567bd..19363ff8ce 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 {
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
+if (dev->partially_hotplugged) {
+return;
+}
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
 object_unparent(OBJECT(dev));
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index def5435685..7b7eac845c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 
 struct PCIDevice {
 DeviceState qdev;
+bool partially_hotplugged;
 
 /* PCI config space */
 uint8_t *config;
-- 
2.21.0




[PATCH v5 0/11] add failover feature for assigned network devices

2019-10-23 Thread Jens Freimann
   -qmp unix:/tmp/qmp.socket,server,nowait \
-monitor telnet:127.0.0.1:,server,nowait \
-device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
-device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
-device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
-netdev 
tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
-device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on
 \
-device 
vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id =net1 \
/root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 

Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 hw/core/qdev.c |  24 +++
 hw/net/virtio-net.c| 302 +
 hw/pci/pci.c   |  18 ++
 hw/pci/pcie.c  |   6 +
 hw/vfio/pci.c  |  26 ++-
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   4 +
 include/hw/qdev-core.h |  30 
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 include/migration/vmstate.h|   2 +
 migration/migration.c  |  21 +++
 migration/migration.h  |   3 +
 migration/savevm.c |  36 
 migration/savevm.h |   2 +
 qapi/migration.json|  24 ++-
 qapi/net.json  |  16 ++
 qdev-monitor.c |  43 -
 tests/libqos/libqos.c  |   3 +-
 vl.c   |   6 +-
 20 files changed, 565 insertions(+), 15 deletions(-)

-- 
2.21.0




[PATCH v5 01/11] qdev/qbus: add hidden device support

2019-10-23 Thread Jens Freimann
This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.

qdev_device_add() is modified to check for a net_failover_pair_id
argument in the option string. A DeviceListener callback
should_be_hidden() is added. It can be used by a standby device to
inform qdev that this device should not be added now. The standby device
handler can store the device options to plug the device in at a later
point in time.

One reason for hiding the device is that we don't want to expose both
devices to the guest kernel until the respective virtio feature bit
VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be
handled correctly by the guest.

More information on the kernel feature this is using:
 https://www.kernel.org/doc/html/latest/networking/net_failover.html

An example where the primary device is a vfio-pci device and the standby
device is a virtio-net device:

A device is hidden when it has an "net_failover_pair_id" option, e.g.

 -device virtio-net-pci,...,failover=on,...
 -device vfio-pci,...,net_failover_pair_id=net1,...

Signed-off-by: Jens Freimann 
Reviewed-by: Cornelia Huck 
---
 hw/core/qdev.c | 23 +++
 include/hw/qdev-core.h | 29 +
 qdev-monitor.c | 41 +
 vl.c   |  6 --
 4 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cbad6c1d55..f786650446 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -212,6 +212,29 @@ void device_listener_unregister(DeviceListener *listener)
 QTAILQ_REMOVE(_listeners, listener, link);
 }
 
+bool qdev_should_hide_device(QemuOpts *opts)
+{
+int rc = -1;
+DeviceListener *listener;
+
+QTAILQ_FOREACH(listener, _listeners, link) {
+   if (listener->should_be_hidden) {
+/* should_be_hidden_will return
+ *  1 if device matches opts and it should be hidden
+ *  0 if device matches opts and should not be hidden
+ *  -1 if device doesn't match opts
+ */
+rc = listener->should_be_hidden(listener, opts);
+}
+
+if (rc > 0) {
+break;
+}
+}
+
+return rc > 0;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa123f88cb..710981af36 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp);
  * respective parent types.
  *   
  * 
+ *
+ * # Hiding a device #
+ * To hide a device, a DeviceListener function should_be_hidden() needs to
+ * be registered.
+ * It can be used to defer adding a device and therefore hide it from the
+ * guest. The handler registering to this DeviceListener can save the QOpts
+ * passed to it for re-using it later and must return that it wants the device
+ * to be/remain hidden or not. When the handler function decides the device
+ * shall not be hidden it will be added in qdev_device_add() and
+ * realized as any other device. Otherwise qdev_device_add() will return early
+ * without adding the device. The guest will not see a "hidden" device
+ * until it was marked don't hide and qdev_device_add called again.
+ *
  */
 typedef struct DeviceClass {
 /*< private >*/
@@ -154,6 +167,12 @@ struct DeviceState {
 struct DeviceListener {
 void (*realize)(DeviceListener *listener, DeviceState *dev);
 void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+/*
+ * This callback is called upon init of the DeviceState and allows to
+ * inform qdev that a device should be hidden, depending on the device
+ * opts, for example, to hide a standby device.
+ */
+int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts);
 QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
+/**
+ * @qdev_should_hide_device:
+ * @opts: QemuOpts as passed on cmdline.
+ *
+ * Check if a device should be added.
+ * When a device is added via qdev_device_add() this will be called,
+ * and return if the device should be added now or not.
+ */
+bool qdev_should_hide_device(QemuOpts *opts);
+
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 148df9cacf..676a759fb4 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,9 +32,11 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h"
+#incl

Re: [PATCH 02/11] pci: add option for net failover

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 01:01:22PM -0600, Alex Williamson wrote:

On Mon, 21 Oct 2019 20:45:46 +0200
Jens Freimann  wrote:


On Mon, Oct 21, 2019 at 08:58:23AM -0600, Alex Williamson wrote:
>On Fri, 18 Oct 2019 22:20:31 +0200
>Jens Freimann  wrote:
>
>> This patch adds a net_failover_pair_id property to PCIDev which is
>> used to link the primary device in a failover pair (the PCI dev) to
>> a standby (a virtio-net-pci) device.
>>
>> It only supports ethernet devices. Also currently it only supports
>> PCIe devices. QEMU will exit with an error message otherwise.
>
>Doesn't the PCIe device also need to be hotpluggable?  We can have PCIe
>devices attached to the root bus where we don't currently support
>hotplug.  Thanks,

How do I recognize such a device? hotpluggable in DeviceClass?


I wouldn't expect it to be a device property, it's more of a function
of whether the bus that it's attached to supports hotplug, so probably
something related to the bus controller.  Thanks,


IIUC this is checked for in qdev_device_add() by this:

 if (qdev_hotplug && bus && !qbus_is_hotpluggable(bus)) {
   error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);
   return NULL;
 }

So qemu will fail with 'Bus pcie.0 does not allow hotplug' when we try 
to hotplug the device. I tried a primary with bus=pcie.0 and it aborted.

Aborting qemu is not nice so I'll make it print a error message
QERR_BUS_NO_HOTPLUG instead but let it continue. This will be
a change in the virtio-net patch, not in this one. 


regards,
Jens 





Re: [PATCH 11/11] vfio: unplug failover primary device before migration

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 09:19:20AM -0600, Alex Williamson wrote:

On Fri, 18 Oct 2019 22:20:40 +0200
Jens Freimann  wrote:

[...]

+if (!pdev->net_failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+goto error;
+}
+} else {
+pdev->qdev.allow_unplug_during_migration = true;


Why is this unique to vfio-pci, shouldn't any device set as the primary
for a failover allow unplug during migration, and therefore should it
be done in the core code rather than device driver?  With the
net_failover_pair_id set in PCIDevice, I should be able to test with an
e1000e NIC as primary, but this suggests they wouldn't be handled
identically elsewhere.  Thanks,


I assume you're talking about pci core code not qdev core. Failover is
pci specific at this time (only the part to hide devices is more
generic and is in qdev code). I can set the flag to allow migration in
pci_qdev_realize().

regards,
Jens 





Re: [PATCH 02/11] pci: add option for net failover

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 08:58:23AM -0600, Alex Williamson wrote:

On Fri, 18 Oct 2019 22:20:31 +0200
Jens Freimann  wrote:


This patch adds a net_failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. QEMU will exit with an error message otherwise.


Doesn't the PCIe device also need to be hotpluggable?  We can have PCIe
devices attached to the root bus where we don't currently support
hotplug.  Thanks,


How do I recognize such a device? hotpluggable in DeviceClass?

regards,
Jens 





Re: [PATCH 11/11] vfio: unplug failover primary device before migration

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 03:45:04PM +0200, Cornelia Huck wrote:

On Fri, 18 Oct 2019 22:20:40 +0200
Jens Freimann  wrote:


As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann 
---
 hw/vfio/pci.c | 31 +--
 hw/vfio/pci.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..a15b83c6b6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,9 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"

 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2712,12 +2715,26 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 int i, ret;
 bool is_mdev;

+if (!pdev->net_failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+goto error;


This looks wrong, you haven't set up vbasedev.name yet.


You're right.


+}
+} else {
+pdev->qdev.allow_unplug_during_migration = true;
+}
+
 if (!vdev->vbasedev.sysfsdev) {
 if (!(~vdev->host.domain || ~vdev->host.bus ||
   ~vdev->host.slot || ~vdev->host.function)) {
 error_setg(errp, "No provided host device");
 error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F "
   "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
 return;
 }
 vdev->vbasedev.sysfsdev =
@@ -2729,6 +2746,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (stat(vdev->vbasedev.sysfsdev, ) < 0) {
 error_setg_errno(errp, errno, "no such host device");
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.sysfsdev);
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
 return;
 }


Might be a bit easier cleanup-wise if you set up the blocker resp.
allow unplug during migration only here. The only difference is that
you'll get a different error message when trying to set up a
non-failover device with invalid specs on a migratable-only setup.


Yes, so I moved it to this place now. This saves me cleanup up the
migration blocker in the above two cases. I don't jump to error if
adding the migration blocker failed, I just have to free the blocker
Error and can return.




@@ -3008,6 +3027,8 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);


Shouldn't you check whether migration_block has been set up, like in
the finalize routine?


yes, added the same check here.

Thanks Conny!

regards,
Jens 





Re: [PATCH 01/11] qdev/qbus: add hidden device support

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 01:53:58PM +0100, Peter Maydell wrote:

On Fri, 18 Oct 2019 at 21:22, Jens Freimann  wrote:


This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.


In what sense is a hidden device hidden? Is it hidden from
the guest (in what way) ? Is it hidden from the QMP/HMP monitor
commands? Is it hidden from QEMU functions that iterate through
the qbus, or that walk the QOM tree ? What does a hidden device
have to do to be successfully hidden ? Is it completely disconnected
from the rest of the simulation, or does it itself have to avoid
doing things like asserting IRQ lines ?

Expanding the DeviceClass doc comment might be a good place to answer
questions like the above and generally describe the 'hidden device'
mechanism.


ok, I will add to the DeviceClass comment. 




 };

@@ -451,4 +457,6 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);

+bool qdev_should_hide_device(QemuOpts *opts);


New globally visible functions should always have doc-comment
format documentation comments, please.


Will add this too. Thanks for the review!

regards,
Jens 





Re: [PATCH 01/11] qdev/qbus: add hidden device support

2019-10-21 Thread Jens Freimann

On Mon, Oct 21, 2019 at 02:44:08PM +0200, Cornelia Huck wrote:

On Fri, 18 Oct 2019 22:20:30 +0200
Jens Freimann  wrote:


This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.

qdev_device_add() is modified to check for a net_failover_pair_id
argument in the option string. A DeviceListener callback
should_be_hidden() is added. It can be used by a standby device to
inform qdev that this device should not be added now. The standby device
handler can store the device options to plug the device in at a later
point in time.

One reason for hiding the device is that we don't want to expose both
devices to the guest kernel until the respective virtio feature bit
VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be
handled correctly by the guest.

More information on the kernel feature this is using:
 https://www.kernel.org/doc/html/latest/networking/net_failover.html

An example where the primary device is a vfio-pci device and the standby
device is a virtio-net device:

A device is hidden when it has an "net_failover_pair_id" option, e.g.

 -device virtio-net-pci,...,failover=on,...
 -device vfio-pci,...,net_failover_pair_id=net1,...

Signed-off-by: Jens Freimann 
---
 hw/core/qdev.c | 23 +++
 include/hw/qdev-core.h |  8 
 qdev-monitor.c | 36 +---
 vl.c   |  6 --
 4 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cbad6c1d55..89c134ec53 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -212,6 +212,29 @@ void device_listener_unregister(DeviceListener *listener)
 QTAILQ_REMOVE(_listeners, listener, link);
 }

+bool qdev_should_hide_device(QemuOpts *opts)
+{
+int rc;


Initialize to 0?


Yes, that's what the test bot found as well. Fixed it already. 
Though I initialize to -1, otherwise all devices will be hidden.



+DeviceListener *listener;
+
+QTAILQ_FOREACH(listener, _listeners, link) {
+   if (listener->should_be_hidden) {
+/* should_be_hidden_will return
+ *  1 if device matches opts and it should be hidden
+ *  0 if device matches opts and should not be hidden
+ *  -1 if device doesn't match ops
+ */
+rc = listener->should_be_hidden(listener, opts);
+}
+
+if (rc > 0) {
+break;
+}
+}
+
+return rc > 0;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
  int required_for_version)
 {


(...)


+static bool should_hide_device(QemuOpts *opts)
+{
+if (qemu_opt_foreach(opts, is_failover_device, opts, NULL) == 0) {
+return false;
+}
+return true;
+}


I still think you should turn the check around to make it easier to
extend in the future, but this is fine as well.


Sorry, I've gone back and forth on this and ended up with the same.
I'll take another look at it. 


(...)

With the rc thing changed,

Reviewed-by: Cornelia Huck 


Thanks!

regards,
Jens 





Re: [PATCH v4 0/11] add failover feature for assigned network devices

2019-10-21 Thread Jens Freimann

On Sat, Oct 19, 2019 at 08:12:27AM -0700, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20191018202040.30349-1-jfreim...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

 CC  hw/core/or-irq.o
 CC  hw/core/split-irq.o
/tmp/qemu-test/src/hw/core/qdev.c: In function 'qdev_should_hide_device':
/tmp/qemu-test/src/hw/core/qdev.c:235:5: error: 'rc' may be used uninitialized 
in this function [-Werror=maybe-uninitialized]
return rc > 0;


hmpf, always run all tests especially after last minute changes on friday 
afternoon. I'll fix this.

regards,
Jens 



^

cc1: all warnings being treated as errors
make: *** [hw/core/qdev.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
 File "./tests/docker/docker.py", line 662, in 
---
   raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-ozx9dk_0/src/docker-src.2019-10-19-11.10.07.7972:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ozx9dk_0/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m19.619s
user0m8.409s


The full log is available at
http://patchew.org/logs/20191018202040.30349-1-jfreim...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com





Re: [PATCH 05/11] qapi: add unplug primary event

2019-10-21 Thread Jens Freimann

On Fri, Oct 18, 2019 at 03:28:36PM -0500, Eric Blake wrote:

On 10/18/19 3:20 PM, Jens Freimann wrote:

This event is emitted when we sent a request to unplug a
failover primary device from the Guest OS and it includes the
device id of the primary device.

Signed-off-by: Jens Freimann 
---
 qapi/migration.json | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82feb5bd39..52e69e2868 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1448,3 +1448,22 @@
 # Since: 3.0
 ##
 { 'command': 'migrate-pause', 'allow-oob': true }
+
+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device_id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} }


Unless there is a strong reason in favor of 'device_id' (such as 
consistency with a similar event), our naming convention prefers this 
to be 'device-id'.


No reason, I'll change it.

Thanks Eric!

regards,
Jens 





[PATCH 07/11] migration: allow unplug during migration for failover devices

2019-10-18 Thread Jens Freimann
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for vfio-pci devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann 
---
 hw/core/qdev.c | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 89c134ec53..b1be568af3 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -995,6 +995,7 @@ static void device_initfn(Object *obj)
 
 dev->instance_id_alias = -1;
 dev->realized = false;
+dev->allow_unplug_during_migration = false;
 
 object_property_add_bool(obj, "realized",
  device_get_realized, device_set_realized, NULL);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 28f594a47d..6b690e85b1 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -143,6 +143,7 @@ struct DeviceState {
 bool pending_deleted_event;
 QemuOpts *opts;
 int hotplugged;
+bool allow_unplug_during_migration;
 BusState *parent_bus;
 QLIST_HEAD(, NamedGPIOList) gpios;
 QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 508d85df87..265ab4f0d5 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -848,7 +848,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 return;
 }
 
-if (!migration_is_idle()) {
+if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
 error_setg(errp, "device_del not allowed while migrating");
 return;
 }
-- 
2.21.0




[PATCH 10/11] net/virtio: add failover support

2019-10-18 Thread Jens Freimann
This patch adds support to handle failover device pairs of a virtio-net
device and a vfio-pci device, where the virtio-net acts as the standby
device and the vfio-pci device as the primary.

The general idea is that we have a pair of devices, a vfio-pci and a
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows to the emulated device, on the target side
another vfio-pci device is plugged in to take over the data-path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. Evaluate the QOpt
   passed in to check if it is the matching primary device. It returns
   two values:
 - one to signal if the device to be added is the matching
   primary device
 - another one to signal to qdev if it should actually
   continue with adding the device or skip it.

   In the latter case it stores the device options in the VirtioNet
   struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
   negotiated during virtio feature negotiation.

   If the virtio-net devices are not realized at the time the vfio-pci
   devices are realized, we need to connect the devices later. This way
   we make sure primary and standby devices can be specified in any
   order.

2. Register a callback for migration status notifier. When called it
   will unplug its primary device before the migration happens.

3. Register a callback for the migration code that checks if a device
   needs to be unplugged from the guest.

Signed-off-by: Jens Freimann 
---
 hw/net/virtio-net.c| 282 +
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h |   1 +
 3 files changed, 295 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f11422337..afe113f083 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/atomic.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -21,6 +22,10 @@
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
+#include "qemu/config-file.h"
+#include "qapi/qmp/qdict.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/announce.h"
@@ -28,11 +33,15 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-net.h"
 #include "hw/qdev-properties.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
+#include "monitor/qdev.h"
+#include "hw/pci/pci.h"
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -746,9 +755,85 @@ static inline uint64_t 
virtio_net_supported_guest_offloads(VirtIONet *n)
 return virtio_net_guest_offloads_by_features(vdev->guest_features);
 }
 
+static void failover_add_primary(VirtIONet *n)
+{
+Error *err = NULL;
+
+n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
+n->primary_device_id);
+if (n->primary_device_opts) {
+n->primary_dev = qdev_device_add(n->primary_device_opts, );
+if (err) {
+qemu_opts_del(n->primary_device_opts);
+}
+if (n->primary_dev) {
+n->primary_bus = n->primary_dev->parent_bus;
+if (err) {
+qdev_unplug(n->primary_dev, );
+qdev_set_id(n->primary_dev, "");
+
+}
+}
+}
+if (err) {
+error_report_err(err);
+}
+}
+
+static int is_my_primary(void *opaque, QemuOpts *opts, Error **errp)
+{
+VirtIONet *n = opaque;
+int ret = 0;
+
+const char *standby_id = qemu_opt_get(opts, "net_failover_pair_id");
+
+if (standby_id != NULL && (g_strcmp0(standby_id, n->netclient_name) == 0)) 
{
+n->primary_device_id = g_strdup(opts->id);
+ret = 1;
+}
+
+return ret;
+}
+
+static DeviceState *virtio_net_find_primary(VirtIONet *n, Error *err)
+{
+DeviceState *dev = NULL;
+
+if (qemu_opts_foreach(qemu_find_opts("device"),
+ is_my_primary, n, )) {
+if (n->primary_device_id) {
+dev = qdev_find_recursive(sysbus_get_default(),
+n->primary_device_id);
+} else {
+return NULL;
+}
+}
+return dev;
+}
+
+
+
+static Dev

[PATCH 11/11] vfio: unplug failover primary device before migration

2019-10-18 Thread Jens Freimann
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann 
---
 hw/vfio/pci.c | 31 +--
 hw/vfio/pci.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..a15b83c6b6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,9 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2712,12 +2715,26 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 int i, ret;
 bool is_mdev;
 
+if (!pdev->net_failover_pair_id) {
+error_setg(>migration_blocker,
+"VFIO device doesn't support migration");
+ret = migrate_add_blocker(vdev->migration_blocker, );
+if (err) {
+error_propagate(errp, err);
+goto error;
+}
+} else {
+pdev->qdev.allow_unplug_during_migration = true;
+}
+
 if (!vdev->vbasedev.sysfsdev) {
 if (!(~vdev->host.domain || ~vdev->host.bus ||
   ~vdev->host.slot || ~vdev->host.function)) {
 error_setg(errp, "No provided host device");
 error_append_hint(errp, "Use -device vfio-pci,host=:BB:DD.F "
   "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
 return;
 }
 vdev->vbasedev.sysfsdev =
@@ -2729,6 +2746,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (stat(vdev->vbasedev.sysfsdev, ) < 0) {
 error_setg_errno(errp, errno, "no such host device");
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.sysfsdev);
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
 return;
 }
 
@@ -3008,6 +3027,8 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3019,6 +3040,10 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
+if (vdev->migration_blocker) {
+migrate_del_blocker(vdev->migration_blocker);
+error_free(vdev->migration_blocker);
+}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3151,11 +3176,6 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-.name = "vfio-pci",
-.unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3163,7 +3183,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, 
void *data)
 
 dc->reset = vfio_pci_reset;
 dc->props = vfio_pci_dev_properties;
-dc->vmsd = _pci_vmstate;
 dc->desc = "VFIO-based PCI device assignment";
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 pdc->realize = vfio_realize;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..b329d50338 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
+Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0




[PATCH 04/11] pci: mark device having guest unplug request pending

2019-10-18 Thread Jens Freimann
Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.

Signed-off-by: Jens Freimann 
---
 hw/pci/pcie.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 19363ff8ce..08718188bb 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, 
void *opaque)
 HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
 if (dev->partially_hotplugged) {
+dev->qdev.pending_deleted_event = false;
 return;
 }
 hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), _abort);
@@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+dev->pending_deleted_event = true;
+
 /* In case user cancel the operation of multi-function hot-add,
  * remove the function that is unexposed to guest individually,
  * without interaction with guest.
-- 
2.21.0




  1   2   3   4   5   >