On Mon, Jul 28, 2025 at 6:24 PM Jonah Palmer <jonah.pal...@oracle.com> wrote: > > > > On 7/28/25 11:30 AM, Eugenio Perez Martin wrote: > > On Tue, Jul 22, 2025 at 2:41 PM Jonah Palmer <jonah.pal...@oracle.com> > > wrote: > >> > >> Iterative live migration for virtio-net sends an initial > >> VMStateDescription while the source is still active. Because data > >> continues to flow for virtio-net, the guest's avail index continues to > >> increment after last_avail_idx had already been sent. This causes the > >> destination to often see something like this from virtio_error(): > >> > >> VQ 0 size 0x100 Guest index 0x0 inconsistent with Host index 0xc: delta > >> 0xfff4 > >> > >> This patch suppresses this consistency check if we're loading the > >> initial VMStateDescriptions via iterative migration and unsuppresses > >> it for the stop-and-copy phase when the final VMStateDescriptions > >> (carrying the correct indices) are loaded. > >> > >> A temporary VirtIODevMigration migration data structure is introduced here > >> to > >> represent the iterative migration process for a VirtIODevice. For now it > >> just holds a flag to indicate whether or not the initial > >> VMStateDescription was sent during the iterative live migration process. > >> > >> Signed-off-by: Jonah Palmer <jonah.pal...@oracle.com> > >> --- > >> hw/net/virtio-net.c | 13 +++++++++++++ > >> hw/virtio/virtio.c | 32 ++++++++++++++++++++++++-------- > >> include/hw/virtio/virtio.h | 6 ++++++ > >> 3 files changed, 43 insertions(+), 8 deletions(-) > >> > >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >> index 86a6fe5b91..b7ac5e8278 100644 > >> --- a/hw/net/virtio-net.c > >> +++ b/hw/net/virtio-net.c > >> @@ -3843,12 +3843,19 @@ static void virtio_net_save_cleanup(void *opaque) > >> > >> static int virtio_net_load_setup(QEMUFile *f, void *opaque, Error **errp) > >> { > >> + VirtIONet *n = opaque; > >> + VirtIODevice *vdev = VIRTIO_DEVICE(n); > >> + vdev->migration = g_new0(VirtIODevMigration, 1); > >> + vdev->migration->iterative_vmstate_loaded = false; > >> + > >> return 0; > >> } > >> > >> static int virtio_net_load_state(QEMUFile *f, void *opaque, int > >> version_id) > >> { > >> VirtIONet *n = opaque; > >> + VirtIODevice *vdev = VIRTIO_DEVICE(n); > >> + VirtIODevMigration *mig = vdev->migration; > >> uint64_t flag; > >> > >> flag = qemu_get_be64(f); > >> @@ -3861,6 +3868,7 @@ static int virtio_net_load_state(QEMUFile *f, void > >> *opaque, int version_id) > >> case VNET_MIG_F_INIT_STATE: > >> { > >> vmstate_load_state(f, &vmstate_virtio_net, n, > >> VIRTIO_NET_VM_VERSION); > >> + mig->iterative_vmstate_loaded = true; > > > > This code will need to change if we send the status iteratively more > > than once. For example, if the guest changes the mac address, the > > number of vqs, etc. > > > > Hopefully we can reach a solution where we'd only need to call the full > vmstate_load_state(f, &vmstate_virtio_net, ...) for a virtio-net device > once and then handle any changes afterwards individually. > > Perhaps, maybe for simplicity, we could just send the > sub-states/subsections (instead of the whole state again) iteratively if > there were any changes in the fields that those sub-states/subsections > govern. > > Definitely something I'll keep in mind as this series develops. > > > In my opinion, we should set a flag named "in_iterative_migration" (or > > equivalent) in virtio_net_load_setup and clear it in > > virtio_net_load_cleanup. That's enough to tell in virtio_load if we > > should perform actions like checking for inconsistent indices. > > > > I did actually try something like this but I realized that the > .load_cleanup and .save_cleanup hooks actually fire at the very end of > live migration (e.g. during the stop-and-copy phase). I thought they > fired at the end of the iterative portion of live migration, but this > didn't appear to be the case. >
Ok that makes a lot of sense. What about .switchover_start ? We need the switchover capability though, not sure if it is a good idea to mandate it as a requirement. So yes, maybe this patch is the most reliable way to do so. > >> break; > >> } > >> default: > >> @@ -3875,6 +3883,11 @@ static int virtio_net_load_state(QEMUFile *f, void > >> *opaque, int version_id) > >> > >> static int virtio_net_load_cleanup(void *opaque) > >> { > >> + VirtIONet *n = opaque; > >> + VirtIODevice *vdev = VIRTIO_DEVICE(n); > >> + g_free(vdev->migration); > >> + vdev->migration = NULL; > >> + > >> return 0; > >> } > >> > >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > >> index 5534251e01..68957ee7d1 100644 > >> --- a/hw/virtio/virtio.c > >> +++ b/hw/virtio/virtio.c > >> @@ -3222,6 +3222,7 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int > >> version_id) > >> int32_t config_len; > >> uint32_t num; > >> uint32_t features; > >> + bool inconsistent_indices; > >> BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); > >> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > >> VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); > >> @@ -3365,6 +3366,16 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int > >> version_id) > >> if (vdev->vq[i].vring.desc) { > >> uint16_t nheads; > >> > >> + /* > >> + * Ring indices will be inconsistent during iterative > >> migration. The actual > >> + * indices will be sent later during the stop-and-copy phase. > >> + */ > >> + if (vdev->migration) { > >> + inconsistent_indices = > >> !vdev->migration->iterative_vmstate_loaded; > >> + } else { > >> + inconsistent_indices = false; > >> + } > > > > Nit, "inconsistent_indices = vdev->migration && > > !vdev->migration->iterative_vmstate_loaded" ? I'm happy with the > > current "if else" too, but I think the one line is clearer. Your call > > :). > > > > Ah, nice catch! I like the one-liner more :) Will change this for next > series. > > >> + > >> /* > >> * VIRTIO-1 devices migrate desc, used, and avail ring > >> addresses so > >> * only the region cache needs to be set up. Legacy devices > >> need > >> @@ -3384,14 +3395,19 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int > >> version_id) > >> continue; > >> } > >> > >> - nheads = vring_avail_idx(&vdev->vq[i]) - > >> vdev->vq[i].last_avail_idx; > >> - /* Check it isn't doing strange things with descriptor > >> numbers. */ > >> - if (nheads > vdev->vq[i].vring.num) { > >> - virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x " > >> - "inconsistent with Host index 0x%x: delta > >> 0x%x", > >> - i, vdev->vq[i].vring.num, > >> - vring_avail_idx(&vdev->vq[i]), > >> - vdev->vq[i].last_avail_idx, nheads); > >> + if (!inconsistent_indices) { > >> + nheads = vring_avail_idx(&vdev->vq[i]) - > >> vdev->vq[i].last_avail_idx; > >> + /* Check it isn't doing strange things with descriptor > >> numbers. */ > >> + if (nheads > vdev->vq[i].vring.num) { > >> + virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x " > >> + "inconsistent with Host index 0x%x: > >> delta 0x%x", > >> + i, vdev->vq[i].vring.num, > >> + vring_avail_idx(&vdev->vq[i]), > >> + vdev->vq[i].last_avail_idx, nheads); > >> + inconsistent_indices = true; > >> + } > >> + } > >> + if (inconsistent_indices) { > >> vdev->vq[i].used_idx = 0; > >> vdev->vq[i].shadow_avail_idx = 0; > >> vdev->vq[i].inuse = 0; > >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > >> index 214d4a77e9..06b6e6ba65 100644 > >> --- a/include/hw/virtio/virtio.h > >> +++ b/include/hw/virtio/virtio.h > >> @@ -98,6 +98,11 @@ enum virtio_device_endian { > >> VIRTIO_DEVICE_ENDIAN_BIG, > >> }; > >> > >> +/* VirtIODevice iterative live migration data structure */ > >> +typedef struct VirtIODevMigration { > >> + bool iterative_vmstate_loaded; > >> +} VirtIODevMigration; > >> + > >> /** > >> * struct VirtIODevice - common VirtIO structure > >> * @name: name of the device > >> @@ -151,6 +156,7 @@ struct VirtIODevice > >> bool disable_legacy_check; > >> bool vhost_started; > >> VMChangeStateEntry *vmstate; > >> + VirtIODevMigration *migration; > >> char *bus_name; > >> uint8_t device_endian; > >> /** > >> -- > >> 2.47.1 > >> > > >