Sharing some recent updates on my work on this topic:
Thanks to Erik for his guidance and advice. My PoC patches almost work
now. I will send the RFC soon.
The ideas are mostly based on Alex's idea: a match between a device
state version and a minimum required version.
"Match of versions" in Libvirt
Initialization stage:
- Libvirt would detect whether the "mdev_type" of a mediated device has
a device state version when creating the mdev node in the node device
tree.
- If the "mdev_type" of a mediated device *has* a device state version,
then this mediated device supports migration.
- If not (the compatibility case, mostly for old vendor drivers which
don't support migration), this mediated device doesn't support
migration.
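
A minimal sketch of the detection step above, not the actual libvirt
code. It assumes the version is exposed as a file inside the mdev type
directory; the attribute name "device_state_version" and both function
names are hypothetical and only used for illustration.

```python
import os
from typing import Optional

# Hypothetical attribute name: the thread does not settle how a vendor
# driver would expose the device state version.
VERSION_ATTR = "device_state_version"


def read_state_version(mdev_type_dir: str) -> Optional[str]:
    """Return the device state version of an mdev type, or None if absent."""
    path = os.path.join(mdev_type_dir, VERSION_ATTR)
    try:
        with open(path, "r", encoding="ascii") as f:
            value = f.read().strip()
    except FileNotFoundError:
        # Compatibility case: old vendor driver without a version.
        return None
    # Treat an empty attribute the same as a missing one.
    return value or None


def supports_migration(mdev_type_dir: str) -> bool:
    """An mdev type supports migration only if it has a state version."""
    return read_state_version(mdev_type_dir) is not None
```

The check would run once, when the mdev node is created, so the result
can be stored with the node device and reused at migration time.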
Migration stage:
- Libvirt would put the mdev information inside cookies and send them
between the SRC machine and the DST machine, so a new type of cookie
would be added here.
There are different versions of the migration protocol in libvirt, and
each of them sends cookies in a different sequence. The idea here is to
let the match happen as early as possible. It looks like the QEMU driver
in libvirt only supports the V2/V3 protocols.
V2 proto:
- The match would happen on the SRC machine after the DST machine
transfers the cookies with mdev information back to the SRC machine
during the "preparation" stage. The disadvantage is that the DST virtual
machine has already been created in the "preparation" stage. If the
match fails, the virtual machine on the DST machine has to be killed as
well, which would waste some time.
V3 proto:
- The match would happen on the DST machine after the SRC machine
transfers the cookies to the DST machine during the "begin" stage. As
the DST machine hasn't entered the "preparation" stage at this time,
the virtual machine hasn't been created on the DST machine yet. No
extra VM destroy is needed if the match fails. This would be the ideal
place for a match.
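
To make the difference between the two protocols concrete, here is a
rough sketch of the match and of where it would run. It is not libvirt
code: MdevCookie, versions_match and migration_steps are hypothetical
names, versions are assumed to be plain integers, and the match rule
(source state version at or above the destination's minimum required
version) is an assumption about how the comparison would work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MdevCookie:
    """Hypothetical mdev payload carried in a migration cookie."""
    mdev_type: str
    state_version: Optional[int]         # None: no migration support
    min_required_version: Optional[int]  # oldest state version accepted


def versions_match(src: MdevCookie, dst: MdevCookie) -> bool:
    """Assumed rule: DST accepts SRC state at or above its minimum."""
    if src.state_version is None or dst.min_required_version is None:
        return False
    return src.state_version >= dst.min_required_version


def migration_steps(proto: int, src: MdevCookie, dst: MdevCookie) -> List[str]:
    """List the steps taken, to show where the match runs per protocol."""
    steps: List[str] = []
    if proto == 3:
        steps.append("begin: SRC sends cookie to DST")
        if not versions_match(src, dst):
            # Checked on DST before "preparation": nothing to clean up.
            steps.append("match failed on DST: abort, no VM to destroy")
            return steps
        steps.append("prepare: DST creates VM")
    elif proto == 2:
        steps.append("prepare: DST creates VM, returns cookie to SRC")
        if not versions_match(src, dst):
            # Checked on SRC after "preparation": the DST VM already exists.
            steps.append("match failed on SRC: abort, destroy DST VM")
            return steps
    else:
        raise ValueError("only V2 and V3 are sketched here")
    steps.append("perform: transfer state")
    return steps
```

With a mismatching pair, the V3 path stops before any VM is created on
the DST machine, while the V2 path ends with an extra VM destroy.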
"Match of version" in QEMU level
There are several different types of migration in libvirt. In a
migration with hypervisor-native transport, the target machine might
not even have libvirtd; the migration happens between the device models
directly. So we need a match at the QEMU level as well. We might still
need Kirti's approach as the last-level match.
Thanks,
Zhi.
On 08/11/18 05:28, Zhi Wang wrote:
Hi Alex and Kirti:
Thanks for your reply and discussion. :) Sorry for my late reply; there
was quite a lot of work and email to catch up on after my vacation.
From my point of view, failing the migration because of a version
mismatch at different levels has different pros and cons.
- Match version at the userspace toolkit level, like in QEMU and Libvirt:
Pros: Better responsiveness, since the version match would be figured
out before actually suspending/resuming devices. The userspace toolkits
could provide this information to a UI or other management tools, like
virsh and virt-manager, so it would be helpful for the administrator to
know what's happening through the management interface.
Cons: The vendor driver has to expose the version information. Some
vendor drivers might not wish to expose that explicitly. Since mdev
devices can come from many different vendors and device types, this
might happen in the future as well.
- Match version at the device state level (vendor-specific):
Pros: The vendor driver doesn't need to explain and expose an explicit
version of the device state.
Cons: Waste of bandwidth, since the mismatch is only detected after
device state has been transferred. Poor responsiveness and less
informative errors.
How about we combine the two ideas? The vendor driver could decide
whether to use the device state or not. But still, the error
information could be a problem, since it could be hard for a management
tool like virsh or virt-manager to get an error message from a remote
node.
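
One possible reading of the combined approach, as a sketch only: when
both sides expose a version, the userspace tool decides early; when a
vendor driver chooses not to expose one, the decision is deferred to
the vendor-specific check on the device state itself. The names
Decision and early_decision, the integer versions and the comparison
rule are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT_EARLY = "accept before any device state is transferred"
    REJECT_EARLY = "reject before any device state is transferred"
    DEFER_TO_DEVICE_STATE = "let the vendor driver check the device state"


def early_decision(src_version: Optional[int],
                   dst_min_version: Optional[int]) -> Decision:
    """Decide in userspace when possible, otherwise defer to the vendor."""
    if src_version is None or dst_min_version is None:
        # The vendor driver opted out of exposing an explicit version.
        return Decision.DEFER_TO_DEVICE_STATE
    if src_version >= dst_min_version:
        return Decision.ACCEPT_EARLY
    return Decision.REJECT_EARLY
```

Only the REJECT_EARLY case gives the management tool a clear error
before migration starts; the deferred case still has the remote error
reporting problem described above.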
Let me cook up some RFC patches next week.
Have a great weekend. :)
Thanks,
Zhi.
-----Original Message-----
From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Monday, August 6, 2018 10:22 PM
To: Kirti Wankhede
Cc: Wang, Zhi A; libvir-list@redhat.com
Subject: Re: Matching the type of mediated devices in the migration
On Mon, 6 Aug 2018 23:45:21 +0530
Kirti Wankhede wrote:
On 8/3/2018 11:26 PM, Alex Williamson wrote:
> On Fri, 3 Aug 2018 12:07:58 +
> "Wang, Zhi A" wrote:
>
>> Hi:
>>
>> Thanks for unfolding your idea. The picture is clearer to me now. I
>> didn't realize that you also want to support cross hardware migration.
>> Well, I thought for a while, the cross hardware migration might be not
>> popular in vGPU case but could be quite popular in other mdev cases.
>
> Exactly, we need to think beyond the implementation for a specific
> vendor or class of device.
>
>> Let me continue my summary:
>>
>> Mdev dev type has already included a parent driver name/a group
>> name/physical device version/configuration type. For example
>> i915-GVTg_V5_4. The driver name and the group name could already
>> distinguish the vendor and the product