On 10.10.25 02:28, Raphael Norwitz wrote:
> Thanks for the detailed response here, it does clear up the intent.
> I agree it's much better to keep the management layer from having to
> make API calls back and forth to the backend so that the migration
> looks like a reconnect from the backend's perspective. I'm not totally
> clear on the fundamental reason why the management layer would have to
> call out to the backend, as opposed to having the vhost-user code in
> the backend figure out that it's a local migration when the new
> destination QEMU tries to connect and respond accordingly.
Handling this in vhost-user-server without the management layer would
actually mean handling two connections in parallel. This doesn't seem
to fit well into the vhost-user protocol.
However, we already have this support in the disk service (we do live
update for VMs with vhost-user-blk today): it accepts a new connection
on an additional Unix socket serving the same disk, but in read-only
mode, until the initial connection terminates. The problem isn't with
the separate socket itself, but with safely switching the disk backend
from one connection to another. We would have to perform this switch
regardless, even if we managed both connections within the context of a
single server or a single Unix socket. The only difference is that this
way, we might avoid communication from the management layer to the disk
service. Instead of saying, "Hey, disk service, we're going to migrate
this QEMU - prepare for an endpoint switch," we'd just proceed with the
migration, and the disk service would detect it when it sees a second
connection to the Unix socket.
But this extra communication isn't the real issue. The real challenge
is that we still have to switch between connections on the backend
side. And we have to account for the possible temporary unavailability
of the disk service (the migration freeze time would just include this
period of unavailability).
With this series, we're saying: "Hold on. We already have everything
working and set up—the backend is ready, the dataplane is out of QEMU,
and the control plane isn't doing anything. And we're migrating to the
same host. Why not just keep everything as is? Just pass the file
descriptors to the new QEMU process and continue execution."
This way, we make the QEMU live-update operation independent of the
disk service's lifecycle, which improves reliability. And we maintain
only one connection instead of two, making the model simpler.
This doesn't even account for the extra time spent reconfiguring the
connection. Setting up mappings isn't free and becomes more costly for
large VMs (with significant RAM), when using hugetlbfs, or when the
system is under memory pressure.
> That said, I haven't followed the work here all that closely. If MST
> or other maintainers have blessed this as the right way I'm ok with
> it.
--
Best regards,
Vladimir