On 8/25/25 8:44 AM, Markus Armbruster wrote:
Please excuse the delay, I was on vacation.

Jonah Palmer <jonah.pal...@oracle.com> writes:

On 8/8/25 6:48 AM, Markus Armbruster wrote:
I apologize for the lateness of my review.

Late again: I was on vacation.

Jonah Palmer <jonah.pal...@oracle.com> writes:

Adds a new migration capability 'virtio-iterative' that will allow
virtio devices, where supported, to iteratively migrate configuration
changes that occur during the migration process.

Why is that desirable?

To be frank, I wasn't sure whether having a migration capability, or even
having it toggleable at all, would be desirable or not. It appears though
that this might be better off as a per-device feature set via
--device virtio-net-pci,iterative-mig=on,..., for example.

See below.

And by "iteratively migrate configuration changes" I meant more along
the lines of the device's state as it continues running on the source.

Isn't that what migration does always?


Essentially yes, but today all of the state is migrated only at the end, once the source has been paused. So the final, correct state is always sent to the destination.

If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen afterwards are still communicated to the destination.

This RFC handles that by simply re-sending the entire state once the source has been paused. Of course this isn't optimal, and I'm looking into how to better optimize this part.
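For illustration, here is a rough sketch of what "send early, iterate, then finish at stop" could look like when a virtio device plugs into QEMU's iterative-migration hooks. SaveVMHandlers and register_savevm_live() come from migration/register.h, but the exact callback signatures vary between QEMU versions, and the virtio_net_mig_* helpers below are hypothetical placeholders, not what this RFC actually implements:

#include "qemu/osdep.h"
#include "migration/register.h"
#include "hw/virtio/virtio.h"

/*
 * Illustrative sketch only -- not the RFC's actual code.  Callback
 * signatures are approximate; the virtio_net_mig_* helpers are assumed.
 */
static const SaveVMHandlers savevm_virtio_net_iterative = {
    /* Runs at migration setup: send the initial device state early,
     * while the source is still running. */
    .save_setup                 = virtio_net_mig_save_setup,
    /* Called repeatedly during the live phase: send incremental updates
     * for state that changed since the last pass. */
    .save_live_iterate          = virtio_net_mig_save_iterate,
    /* Called once the source is paused: send the final state (this RFC
     * currently just re-sends everything here). */
    .save_live_complete_precopy = virtio_net_mig_save_complete,
    /* Destination side: consume the early and final state chunks. */
    .load_setup                 = virtio_net_mig_load_setup,
    .load_state                 = virtio_net_mig_load_state,
};

/* Registered once per device instance, e.g. from the device's realize(). */
static void virtio_net_register_iterative_migration(VirtIODevice *vdev)
{
    register_savevm_live("virtio-net-iterative", VMSTATE_INSTANCE_ID_ANY, 1,
                         &savevm_virtio_net_iterative, vdev);
}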

But perhaps actual configuration changes (e.g. changing the number of
queue pairs) could also be supported mid-migration like this?

I don't know.

This capability is added to the validated capabilities list to ensure
both the source and destination support it before enabling.

What happens when only one side enables it?

The migration stream breaks if only one side enables it.

How does it break?  Error message pointing out the misconfiguration?


The destination VM is torn down and the source just reports that migration failed.

I don't believe the source/destination could be aware of the misconfiguration. IIUC the destination reads the migration stream and expects certain pieces of data in a certain order. If new data is added to the migration stream, or the order has changed, and the destination isn't expecting it, then the migration fails. It doesn't know exactly why, just that it read in data that it wasn't expecting.

This is poor wording on my part, my apologies. I don't think it's even
possible for the source and destination to know each other's capabilities.

The capability defaults to off to maintain backward compatibility.

To enable the capability via HMP:
(qemu) migrate_set_capability virtio-iterative on

To enable the capability via QMP:
{"execute": "migrate-set-capabilities", "arguments": {
       "capabilities": [
          { "capability": "virtio-iterative", "state": true }
       ]
    }
}

Signed-off-by: Jonah Palmer <jonah.pal...@oracle.com>
---
  migration/savevm.c  | 1 +
  qapi/migration.json | 7 ++++++-
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index bb04a4520d..40a2189866 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
      switch (capability) {
      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
      case MIGRATION_CAPABILITY_MAPPED_RAM:
+    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
          return true;
      default:
          return false;
diff --git a/qapi/migration.json b/qapi/migration.json
index 4963f6ca12..8f042c3ba5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -479,6 +479,11 @@
  #     each RAM page.  Requires a migration URI that supports seeking,
  #     such as a file.  (since 9.0)
  #
+# @virtio-iterative: Enable iterative migration for virtio devices, if
+#     the device supports it. When enabled, and where supported, virtio
+#     devices will track and migrate configuration changes that may
+#     occur during the migration process.  (since 10.1)

When and why should the user enable this?

Well if all goes according to plan, always (at least for virtio-net).
This should improve the overall speed of live migration for a virtio-net
device (and vhost-net/vhost-vdpa).

So the only use for "disabled" would be when migrating to or from an
older version of QEMU that doesn't support this.  Fair?


Correct.

What's the default?


Disabled.

What exactly do you mean by "where supported"?

I meant if both the source's QEMU and the destination's QEMU support it, as
well as for other virtio devices in the future if they decide to implement
iterative migration (e.g. a more general "enable iterative migration for
virtio devices").

But I think for now this is better left as a virtio-net configuration
rather than as a migration capability (e.g. --device
virtio-net-pci,iterative-mig=on/off,...)
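If it does go that way, a minimal sketch of the per-device knob could look like the following. DEFINE_PROP_BOOL is the standard qdev property macro; the iterative_mig field in VirtIONet, its placement in the existing virtio_net_properties[] array in hw/net/virtio-net.c, and the property-array conventions shown (which differ slightly between QEMU versions) are assumptions for illustration only:

/* Sketch only: a per-device boolean toggle on virtio-net instead of a
 * migration capability.  The "iterative_mig" field is hypothetical. */
static const Property virtio_net_properties[] = {
    /* ... existing virtio-net properties ... */
    DEFINE_PROP_BOOL("iterative-mig", VirtIONet, iterative_mig, false),
};

The device's migration code would then check n->iterative_mig before
registering the iterative handlers, and the user would enable it with
-device virtio-net-pci,iterative-mig=on.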

Makes sense to me (but I'm not a migration expert).

[...]


