On 7/28/25 3:35 AM, Jason Wang wrote:
On Mon, Jul 28, 2025 at 3:09 PM Jason Wang <jasow...@redhat.com> wrote:
On Fri, Jul 25, 2025 at 5:33 PM Michael S. Tsirkin <m...@redhat.com> wrote:
On Thu, Jul 24, 2025 at 05:59:20PM -0400, Jonah Palmer wrote:
On 7/23/25 1:51 AM, Jason Wang wrote:
On Tue, Jul 22, 2025 at 8:41 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
This series is an RFC initial implementation of iterative live
migration for virtio-net devices.
The main motivation behind implementing iterative migration for
virtio-net devices is to start on heavy, time-consuming operations
for the destination while the source is still active (i.e. before
the stop-and-copy phase).
It would be better to explain which kinds of operations are heavy and
time-consuming and how iterative migration helps.
You're right. Apologies for being vague here.
I did do some profiling of the virtio_load call for virtio-net to try and
narrow down where exactly most of the downtime is coming from during the
stop-and-copy phase.
Pretty much all of the downtime comes from the vmstate_load_state call
for vmstate_virtio's subsections:
    /* Subsections */
    ret = vmstate_load_state(f, &vmstate_virtio, vdev, 1);
    if (ret) {
        return ret;
    }
More specifically, the vmstate_virtio_virtqueues and
vmstate_virtio_extra_state subsections.
For example, currently (with no iterative migration), the virtio_load call
for a virtio-net device took 13.29ms to finish, and 13.20ms of that time
was spent in vmstate_load_state(f, &vmstate_virtio, vdev, 1).
Of that time, ~6.83ms was spent migrating vmstate_virtio_virtqueues and
~6.33ms migrating the vmstate_virtio_extra_state subsections. I believe
this comes from walking all VIRTIO_QUEUE_MAX virtqueues, twice.
Can we optimize it simply by sending a bitmap of used vqs?
+1.
For example, devices like virtio-net may know exactly how many virtqueues
will be used.
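A minimal sketch of that bitmap idea (hypothetical helper, not from this
series; virtio_put_used_vq_bitmap and the wire format are made up here, and
the matching load side plus endianness/compat handling are left out) could
look something like:

#include "qemu/osdep.h"
#include "qemu/bitops.h"
#include "hw/virtio/virtio.h"
#include "migration/qemu-file-types.h"

/* Hypothetical helper: record which virtqueues are actually in use so the
 * destination can skip the unused ones instead of walking all
 * VIRTIO_QUEUE_MAX entries. */
static void virtio_put_used_vq_bitmap(VirtIODevice *vdev, QEMUFile *f)
{
    unsigned long used[BITS_TO_LONGS(VIRTIO_QUEUE_MAX)] = { 0 };
    int i;

    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        if (virtio_queue_get_num(vdev, i) != 0) {
            set_bit(i, used);
        }
    }

    /* With VIRTIO_QUEUE_MAX == 1024 this is only 128 bytes on the wire. */
    qemu_put_buffer(f, (const uint8_t *)used, sizeof(used));
}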
Ok, I think it comes from the following subsections:
static const VMStateDescription vmstate_virtio_virtqueues = {
    .name = "virtio/virtqueues",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &virtio_virtqueue_needed,
    .fields = (const VMStateField[]) {
        VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
                                            VIRTIO_QUEUE_MAX, 0,
                                            vmstate_virtqueue, VirtQueue),
        VMSTATE_END_OF_LIST()
    }
};
static const VMStateDescription vmstate_virtio_packed_virtqueues = {
    .name = "virtio/packed_virtqueues",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &virtio_packed_virtqueue_needed,
    .fields = (const VMStateField[]) {
        VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
                                            VIRTIO_QUEUE_MAX, 0,
                                            vmstate_packed_virtqueue,
                                            VirtQueue),
        VMSTATE_END_OF_LIST()
    }
};
A rough idea is to disable those subsections and use new subsections
instead (plus the necessary compatibility work), counting only the used
virtqueues the way virtio_save() already does:
    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        if (vdev->vq[i].vring.num == 0) {
            break;
        }
    }

    qemu_put_be32(f, i);
    ....
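To make that concrete, here is a sketch of what such a replacement
subsection could look like for the split layout (packed would need the same
treatment). It assumes a hypothetical new num_active_vqs field on
VirtIODevice, filled in by a pre_save hook, and leans on the existing
VMSTATE_STRUCT_VARRAY_POINTER_UINT32 helper; the old subsections would
still have to be kept for compatibility with older machine types:

/* Sketch only: count the in-use virtqueues into a hypothetical new
 * VirtIODevice field so the subsection below serializes just those. */
static int virtio_active_virtqueues_pre_save(void *opaque)
{
    VirtIODevice *vdev = opaque;
    uint32_t i;

    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        if (vdev->vq[i].vring.num == 0) {
            break;
        }
    }
    vdev->num_active_vqs = i;   /* hypothetical new field */
    return 0;
}

static const VMStateDescription vmstate_virtio_active_virtqueues = {
    .name = "virtio/active_virtqueues",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &virtio_virtqueue_needed,
    .pre_save = virtio_active_virtqueues_pre_save,
    .fields = (const VMStateField[]) {
        /* The count is loaded first, then used to size the array. */
        VMSTATE_UINT32(num_active_vqs, struct VirtIODevice),
        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(vq, struct VirtIODevice,
                                             num_active_vqs, 0,
                                             vmstate_virtqueue, VirtQueue),
        VMSTATE_END_OF_LIST()
    }
};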
Thanks
Right. There are these subsections for split/packed VQs, and then there's
also vmstate_virtio_extra_state, which ends up loading this
virtio_pci/modern_state:
static const VMStateDescription vmstate_virtio_pci_modern_state_sub = {
    .name = "virtio_pci/modern_state",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &virtio_pci_modern_state_needed,
    .fields = (const VMStateField[]) {
        VMSTATE_UINT32(dfselect, VirtIOPCIProxy),
        VMSTATE_UINT32(gfselect, VirtIOPCIProxy),
        VMSTATE_UINT32_ARRAY(guest_features, VirtIOPCIProxy, 2),
        VMSTATE_STRUCT_ARRAY(vqs, VirtIOPCIProxy, VIRTIO_QUEUE_MAX, 0,
                             vmstate_virtio_pci_modern_queue_state,
                             VirtIOPCIQueue),
        VMSTATE_END_OF_LIST()
    }
};
...
vmstate_load_state_end virtio/virtqueues end/0
vmstate_load_state virtio/extra_state v1
vmstate_load_state virtio_pci v1
vmstate_load_state virtio_pci/modern_state v1
vmstate_load_state virtio_pci/modern_queue_state v1
...
I'll take a look at what could be done here and try and get it into the
next series.
vmstate_load_state virtio-net v11
vmstate_load_state PCIDevice v2
vmstate_load_state_end PCIDevice end/0
vmstate_load_state virtio-net-device v11
vmstate_load_state virtio-net-queue-tx_waiting v0
vmstate_load_state_end virtio-net-queue-tx_waiting end/0
vmstate_load_state virtio-net-vnet v0
vmstate_load_state_end virtio-net-vnet end/0
vmstate_load_state virtio-net-ufo v0
vmstate_load_state_end virtio-net-ufo end/0
vmstate_load_state virtio-net-tx_waiting v0
vmstate_load_state virtio-net-queue-tx_waiting v0
vmstate_load_state_end virtio-net-queue-tx_waiting end/0
vmstate_load_state virtio-net-queue-tx_waiting v0
vmstate_load_state_end virtio-net-queue-tx_waiting end/0
vmstate_load_state virtio-net-queue-tx_waiting v0
vmstate_load_state_end virtio-net-queue-tx_waiting end/0
vmstate_load_state_end virtio-net-tx_waiting end/0
vmstate_load_state_end virtio-net-device end/0
vmstate_load_state virtio v1
vmstate_load_state virtio/64bit_features v1
vmstate_load_state_end virtio/64bit_features end/0
vmstate_load_state virtio/virtqueues v1
vmstate_load_state virtqueue_state v1 <--- Queue idx 0
...
vmstate_load_state_end virtqueue_state end/0
vmstate_load_state virtqueue_state v1 <--- Queue idx 1023
vmstate_load_state_end virtqueue_state end/0
vmstate_load_state_end virtio/virtqueues end/0
vmstate_load_state virtio/extra_state v1
vmstate_load_state virtio_pci v1
vmstate_load_state virtio_pci/modern_state v1
vmstate_load_state virtio_pci/modern_queue_state v1 <--- Queue idx 0
vmstate_load_state_end virtio_pci/modern_queue_state end/0
...
vmstate_load_state virtio_pci/modern_queue_state v1 <--- Queue idx 1023
vmstate_load_state_end virtio_pci/modern_queue_state end/0
vmstate_load_state_end virtio_pci/modern_state end/0
vmstate_load_state_end virtio_pci end/0
vmstate_load_state_end virtio/extra_state end/0
vmstate_load_state virtio/started v1
vmstate_load_state_end virtio/started end/0
vmstate_load_state_end virtio end/0
vmstate_load_state_end virtio-net end/0
vmstate_downtime_load type=non-iterable idstr=0000:00:03.0/virtio-net
instance_id=0 downtime=13260
With iterative migration for virtio-net (maybe all virtio devices?), we can
send this early while the source is still running and then only send the
deltas during the stop-and-copy phase. It's likely that the source won't be
using all VIRTIO_QUEUE_MAX virtqueues during the migration period, so this
could remove a large majority of the downtime contributed by virtio-net.
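As a rough illustration of the "send this early" part (hypothetical
function, not what the RFC patches literally do; it assumes the code lives
in hw/net/virtio-net.c next to vmstate_virtio_net):

/* Hypothetical save_setup-style hook: push the full virtio-net vmstate
 * while the source is still running, so the destination can do the
 * expensive virtio_load work before the stop-and-copy window.  A delta
 * (or, as in this RFC, a full re-send) still follows at stop-and-copy. */
static int virtio_net_iterative_save_early(QEMUFile *f, void *opaque)
{
    VirtIONet *n = opaque;

    return vmstate_save_state(f, &vmstate_virtio_net, n, NULL);
}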
This could be one example.
Or, if the system call is expensive, could we try io_uring to mitigate it?
The motivation behind this RFC series specifically is to provide an
initial framework for such an implementation and get feedback on the
design and direction.
-------
This implementation of iterative live migration for a virtio-net device
is enabled via setting the migration capability 'virtio-iterative' to
on for both the source & destination, e.g. (HMP):
(qemu) migrate_set_capability virtio-iterative on
The virtio-net device's SaveVMHandlers hooks are registered/unregistered
during the device's realize/unrealize phase.
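Roughly, that registration has the following shape (a sketch with
placeholder handler names and a hardcoded instance id; the actual hooks are
added in the "virtio-net: Add SaveVMHandlers for iterative migration"
patch):

/* Sketch with placeholder handler names; the real implementations live in
 * the patches themselves. */
static const SaveVMHandlers savevm_virtio_net_iterative = {
    .save_setup                 = virtio_net_iterative_save_setup,
    .save_live_iterate          = virtio_net_iterative_save_iterate,
    .save_live_complete_precopy = virtio_net_iterative_save_complete,
    .load_state                 = virtio_net_iterative_load_state,
    .load_cleanup               = virtio_net_iterative_load_cleanup,
};

static void virtio_net_register_iterative_migration(VirtIONet *n)
{
    /* Called from realize; the matching unregister_savevm() goes in
     * unrealize.  A real implementation would pick a per-device
     * instance id rather than hardcoding 0. */
    register_savevm_live("virtio-net-iterative", 0, 1,
                         &savevm_virtio_net_iterative, n);
}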
I wonder about the plan for libvirt support.
Could you elaborate on this a bit?
I meant how this feature will be supported by libvirt.
Currently, this series only sends and loads the vmstate at the start of
migration. The vmstate is still sent (again) during the stop-and-copy
phase, as it is today, to handle any deltas in the state since it was
initially sent. A future patch in this series could avoid re-sending and
re-loading the entire state and instead focus only on the deltas.
There is a modest improvement in guest-visible downtime from
this series. More specifically, when using iterative live migration with
a virtio-net device, the downtime contributed by migrating a virtio-net
device decreased from ~3.2ms to ~1.4ms on average:
Are you testing this via a software virtio device or hardware one?
Just software (virtio-device, vhost-net) with these numbers. I can run some
tests with vDPA hardware though.
I see. Considering you see a good improvement with software devices, it
should be sufficient.
Those numbers were from a simple 1-queue-pair virtio-net device.
Thanks
Before:
-------
vmstate_downtime_load type=non-iterable idstr=0000:00:03.0/virtio-net
instance_id=0 downtime=3594
After:
------
vmstate_downtime_load type=non-iterable idstr=0000:00:03.0/virtio-net
instance_id=0 downtime=1607
This slight improvement is likely due to the initial vmstate_load_state
call "warming up" pages in memory such that, when it's called a second
time during the stop-and-copy phase, allocation and page-fault latencies
are reduced.
-------
Comments, suggestions, etc. are welcome here.
Jonah Palmer (6):
migration: Add virtio-iterative capability
virtio-net: Reorder vmstate_virtio_net and helpers
virtio-net: Add SaveVMHandlers for iterative migration
virtio-net: iter live migration - migrate vmstate
virtio,virtio-net: skip consistency check in virtio_load for iterative
migration
virtio-net: skip vhost_started assertion during iterative migration
hw/net/virtio-net.c | 246 +++++++++++++++++++++++++++------
hw/virtio/virtio.c | 32 +++--
include/hw/virtio/virtio-net.h | 8 ++
include/hw/virtio/virtio.h | 7 +
migration/savevm.c | 1 +
qapi/migration.json | 7 +-
6 files changed, 247 insertions(+), 54 deletions(-)
--
2.47.1
Thanks