Hi Jonah,
On 8/15/2025 7:50 AM, Jonah Palmer wrote:
On 8/14/25 5:28 AM, Eugenio Perez Martin wrote:
On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <pet...@redhat.com> wrote:
On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
This effort was started to reduce the guest-visible downtime caused by virtio-net/vhost-net/vhost-vDPA during live migration, especially vhost-vDPA.

The downtime contributed by vhost-vDPA, for example, is not from having to migrate a lot of state but rather from expensive backend control-plane latency, like CVQ configurations (e.g. MQ queue pairs, RSS, MAC/VLAN filters, offload settings, MTU, etc.). Doing this requires kernel/HW NIC operations, which dominate its downtime.
In other words, by migrating the state of virtio-net early (before the stop-and-copy phase), we can also start staging backend configurations, which are the main contributor to downtime when migrating a vhost-vDPA device.

I apologize if this series gives the impression that we're migrating a lot of data here. It's more along the lines of moving control-plane latency out of the stop-and-copy phase.
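(For concreteness, a minimal sketch, not taken from the series, of the kind of CVQ control command meant above, using the structures and constants from linux/virtio_net.h. In the real control virtqueue the header, payload and ack travel in separate descriptors, and little-endian conversion is elided here.)

#include <linux/virtio_net.h>
#include <stdint.h>

/* Illustrative layout of a "set number of queue pairs" control command,
 * one of the CVQ configurations whose backend (kernel/HW NIC) handling
 * dominates the downtime described above. */
struct cvq_mq_cmd {
    struct virtio_net_ctrl_hdr hdr;   /* class/command */
    struct virtio_net_ctrl_mq mq;     /* payload */
    uint8_t ack;                      /* written back by the device */
};

static void build_set_queue_pairs(struct cvq_mq_cmd *cmd, uint16_t pairs)
{
    cmd->hdr.class = VIRTIO_NET_CTRL_MQ;
    cmd->hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
    cmd->mq.virtqueue_pairs = pairs;  /* e.g. 8, as in the setup below */
    cmd->ack = VIRTIO_NET_ERR;        /* device overwrites with VIRTIO_NET_OK */
}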
I see, thanks.
Please add these into the cover letter of the next post. IMHO it's extremely important information to explain the real goal of this work. I bet it is not what most people expect when reading the current cover letter.
Then it could have nothing to do with the iterative phase, am I right?

What are the data needed for the dest QEMU to start staging backend configurations to the HWs underneath? Does dest QEMU already have them in the cmdlines?

Asking this because I want to know whether it can be done completely without src QEMU at all, e.g. when dest QEMU starts.

If src QEMU's data is still needed, please also first consider providing such a facility using an "early VMSD" if it is ever possible: feel free to refer to commit 3b95a71b22827d26178.
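(As a reference for what an "early VMSD" could look like, a minimal sketch assuming the early_setup flag introduced around that commit; the state name and field are purely illustrative, not the actual virtio-net state.)

#include "qemu/osdep.h"
#include "migration/vmstate.h"

/* Hypothetical early-migrated state; real virtio-net would carry the
 * configuration actually needed to pre-stage the backend. */
typedef struct EarlyNetState {
    uint16_t curr_queue_pairs;
} EarlyNetState;

static const VMStateDescription vmstate_net_early = {
    .name = "virtio-net-early-cfg",   /* illustrative name */
    .version_id = 1,
    .minimum_version_id = 1,
    .early_setup = true,              /* sent during setup, before RAM iteration */
    .fields = (const VMStateField[]) {
        VMSTATE_UINT16(curr_queue_pairs, EarlyNetState),
        VMSTATE_END_OF_LIST()
    }
};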
While it works for this series, it does not allow resending the state when the src device changes, for example if the number of virtqueues is modified.
Some explanation on "how syncing the number of vqueues helps downtime" would help. Not "it might preheat things", but exactly why, and how that differs when it's pure software vs. when hardware is involved.
According to Nvidia engineers, configuring the vqs (number, size, RSS, etc.) takes about ~200ms:
https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566...@nvidia.com/T/

Adding Dragos here in case he can provide more details. Maybe the numbers have changed, though.
And I guess the difference with pure SW will always come down to PCI communication, which I assume is slower than configuring the host SW device in RAM or even CPU cache. But I admit that proper profiling is needed before making those claims.
Jonah, can you print the time it takes to configure the vDPA device with traces vs the time it takes to enable the dataplane of the device, so we can get an idea of how much time we save with this?
Let me know if this isn't what you're looking for.

I'm assuming by "configuration time" you mean:
- Time from device startup (entry to vhost_vdpa_dev_start()) to right before we start enabling the vrings (e.g. VHOST_VDPA_SET_VRING_ENABLE in vhost_vdpa_net_cvq_load()).

And by "time taken to enable the dataplane" I'm assuming you mean:
- Time from right before we start enabling the vrings (see above) to right after we enable the last vring (at the end of vhost_vdpa_net_cvq_load()).
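(A rough sketch of how those two intervals could be timed, using GLib's monotonic clock; the probe functions are hypothetical helpers to be dropped into the call sites named above, not existing QEMU code.)

#include <glib.h>

static gint64 cfg_start_us, cfg_end_us, dp_end_us;

/* Call at entry of vhost_vdpa_dev_start(). */
static void probe_config_start(void) { cfg_start_us = g_get_monotonic_time(); }

/* Call right before the first VHOST_VDPA_SET_VRING_ENABLE. */
static void probe_config_end(void) { cfg_end_us = g_get_monotonic_time(); }

/* Call right after the last vring is enabled in vhost_vdpa_net_cvq_load(). */
static void probe_dataplane_end(void)
{
    dp_end_us = g_get_monotonic_time();
    g_message("config: %.3f ms, dataplane enable: %.3f ms",
              (cfg_end_us - cfg_start_us) / 1000.0,
              (dp_end_us - cfg_end_us) / 1000.0);
}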
Guest specs: 128G Mem, SVQ=on, CVQ=on, 8 queue pairs:
I guess what Eugenio may want to see is the config with SVQ=off (i.e. without x-svq=on in the netdev line below). Do you have numbers for that as well? Then, since measuring from vhost_vdpa_dev_start() should exclude the time for pinning, you could easily profile/measure the vq configure time (the CVQ commands to configure vq number, size, RSS, etc.) vs dataplane enablement, the same way as you did for SVQ=on.

Regards,
-Siwei
-netdev type=vhost-vdpa,vhostdev=$VHOST_VDPA_0,id=vhost-vdpa0,queues=8,x-svq=on
-device virtio-net-pci,netdev=vhost-vdpa0,id=vdpa0,bootindex=-1,romfile=,page-per-vq=on,mac=$VF1_MAC,ctrl_vq=on,mq=on,ctrl_vlan=off,vectors=18,host_mtu=9000,disable-legacy=on,disable-modern=off
---
Configuration time: ~31s
Dataplane enable time: ~0.14ms
If it's only about pre-heat, could dest qemu preheat with the max number of vqueues? Is the downtime cost the same when growing the number of queues vs. shrinking the number of queues?
Well, you need to send the vq addresses and properties to preheat these. If the address is invalid, the destination device will interpret the vq address as the avail ring, for example, and will read an invalid avail idx.
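(To make "vq addresses and properties" concrete, a minimal sketch of the per-vq state a preheat would have to program, using the vhost uAPI from linux/vhost.h; the helper name is hypothetical and error handling is trimmed. The vq index travels inside the .index field of both structs.)

#include <linux/vhost.h>
#include <sys/ioctl.h>

/* Program one virtqueue's size and ring addresses on the vhost-vDPA fd.
 * This is the data that would need to travel from src to dst ahead of
 * time; a stale or invalid addr is what leads to the bogus avail idx
 * read mentioned above. */
static int preheat_vq(int vdpa_fd,
                      const struct vhost_vring_state *num,  /* ring size */
                      const struct vhost_vring_addr *addr)  /* desc/avail/used */
{
    if (ioctl(vdpa_fd, VHOST_SET_VRING_NUM, num) < 0) {
        return -1;
    }
    return ioctl(vdpa_fd, VHOST_SET_VRING_ADDR, addr);
}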
For software, is it about memory transaction updates due to the vqueues? If so, have we investigated a more generic approach on the memory side, likely some form of continuation of Chuang's work I previously mentioned?
This work is very interesting, and most of the downtime was indeed because of memory pinning. Thanks for bringing it up! But the downtime is not caused by the individual vq memory config, but by pinning all of the guest's memory so the device can access it.

I think it is worth exploring whether it affects the downtime in the case of HW. I don't see any reason to reject that series other than lack of reviews, right?
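(For reference on the pinning point above, a minimal sketch of the IOTLB update that makes the vhost-vDPA kernel side map, and therefore pin, a chunk of guest memory; one such message is sent per memory section, with structures and constants from linux/vhost_types.h. The helper is illustrative rather than QEMU's actual implementation.)

#include <linux/vhost.h>
#include <linux/vhost_types.h>
#include <stdint.h>
#include <unistd.h>

/* Map (and implicitly pin) one guest memory section for the device.
 * Pinning all guest RAM this way, not programming the individual vqs,
 * is what dominates the pinning-related downtime. */
static int map_and_pin(int vdpa_fd, uint64_t iova, uint64_t size, void *hva)
{
    struct vhost_msg_v2 msg = {
        .type        = VHOST_IOTLB_MSG_V2,
        .iotlb.iova  = iova,
        .iotlb.size  = size,
        .iotlb.uaddr = (uint64_t)(uintptr_t)hva,
        .iotlb.perm  = VHOST_ACCESS_RW,
        .iotlb.type  = VHOST_IOTLB_UPDATE,  /* kernel pins the pages here */
    };

    return write(vdpa_fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}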