On Tue, Jul 22, 2025 at 12:41:26PM +0000, Jonah Palmer wrote:
> Iterative live migration for virtio-net sends an initial
> VMStateDescription while the source is still active. Because data
> continues to flow for virtio-net, the guest's avail index continues to
> increment after last_avail_idx had already been sent. This causes the
> destination to often see something like this from virtio_error():
> 
> VQ 0 size 0x100 Guest index 0x0 inconsistent with Host index 0xc: delta 0xfff4

This is pretty much understandable, as vmstate_save() / vmstate_load() are,
IMHO, not designed to be used while the VM is running.

To me, it's still illegal (as per the previous patch) to use
vmstate_save_state() in a save_setup() phase, while the VM is still running.
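
Just to illustrate the arithmetic behind the error quoted above: below is a
tiny standalone sketch (hypothetical names, not the actual QEMU check) of
how a last_avail_idx snapshotted early, while the ring keeps moving, makes
the unsigned 16-bit delta wrap around to exactly the 0xfff4 in the log:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      /* Values taken from the quoted error message. */
      uint16_t guest_avail_idx = 0x0;   /* avail->idx as the dest sees it  */
      uint16_t last_avail_idx  = 0xc;   /* snapshot sent while the source
                                         * guest was still queuing buffers */
      uint16_t vq_size = 0x100;

      /* Unsigned 16-bit subtraction wraps: 0x0 - 0xc == 0xfff4. */
      uint16_t delta = (uint16_t)(guest_avail_idx - last_avail_idx);

      if (delta > vq_size) {
          printf("Guest index 0x%x inconsistent with Host index 0x%x: "
                 "delta 0x%x\n", guest_avail_idx, last_avail_idx, delta);
      }
      return 0;
  }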

Some very high level questions from migration POV:

- Have we figured out why the downtime can be shrunk just by sending the
  vmstate twice?

  If we suspect it's because the memory got preheated, have we tried other
  ways to simply heat the memory up on the dest side?  For example, some
  form of mlock[all]()?  (A small standalone sketch of what I mean follows
  below.)  IMHO it's pretty important that we figure out where the gain
  from such an optimization really comes from.

  I do remember we had a downtime issue where a large max_vqueues count
  could make post_load() slow; I wonder whether there are other ways to
  improve that instead of doing vmstate_save(), especially in the setup
  phase.
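
  For reference, the preheating I mean could be as simple as pinning the
  guest memory up front on the destination.  A standalone sketch of the
  syscall only (not a suggestion of where to wire it into QEMU, which IIRC
  already exposes a mem-lock option for this):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Fault in and pin current and future mappings, so pages are
         * already allocated before the migrated guest touches them. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }
        printf("memory locked\n");
        return 0;
    }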

- Normally devices need an iterative phase because:

  (a) the device may contain huge amount of data to transfer

      E.g. RAM and VFIO are good examples and fall into this category.

  (b) the device states are conceptually "iterable"

      That's definitely true for RAM.  VFIO somehow mimicked that even
      though it is a streamed binary protocol...

  What's the answer for virtio-net here?  How large is the device state?
  Is this relevant to vDPA and real hardware (so virtio-net can look
  similar to VFIO at some point)?

Thanks,

-- 
Peter Xu

