Thanks for the extensive follow-up here. I was hoping there would be
some way to move more of the logic into generic vhost-user code, both
to help other backends support local migration more easily and to
have fewer "if the backend is doing a local migration" checks in the
vhost-user-blk code. As a straw-man design, I would think the backend
could coordinate a handoff by signaling the source QEMU, which would
then stop the device and ACK with a message before the destination
QEMU is allowed to start the device.
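
Concretely, something like the sketch below is what I was picturing.
The message names and framing are entirely made up (nothing like this
exists in the vhost-user spec today); it is only meant to show the
ordering, with the backend asking the source to stop and waiting for
an ACK before letting the destination start the device:

    /* Hypothetical handoff sketch -- the message IDs below are NOT
     * part of the vhost-user protocol, they only illustrate the
     * ordering: the backend asks the source QEMU to stop, waits for
     * an ACK, and only then services the destination QEMU. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <unistd.h>

    enum handoff_msg {                /* made-up message IDs */
        HANDOFF_REQ_STOP = 1,         /* backend -> source QEMU */
        HANDOFF_ACK_STOP = 2,         /* source QEMU -> backend */
    };

    /* Backend side: called when a destination connection shows up
     * for a device that is still owned by the source connection. */
    bool backend_request_handoff(int source_fd)
    {
        uint32_t msg = HANDOFF_REQ_STOP;

        if (write(source_fd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
            return false;
        }
        /* The source QEMU stops the device (quiesces the rings) and
         * replies; only after this ACK may the destination
         * connection be allowed to start the device. */
        if (read(source_fd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
            return false;
        }
        return msg == HANDOFF_ACK_STOP;
    }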

Anyway, it seems like other maintainers have blessed this approach,
so I'll leave it at that.

On Fri, Oct 10, 2025 at 4:47 AM Vladimir Sementsov-Ogievskiy
<[email protected]> wrote:
>
> On 10.10.25 02:28, Raphael Norwitz wrote:
> > Thanks for the detailed response here, it does clear up the intent.
> >
> > I agree it's much better to keep the management layer from having to
> > make API calls back and forth to the backend so that the migration
> > looks like a reconnect from the backend's perspective. I'm not totally
> > clear on the fundamental reason why the management layer would have to
> > call out to the backend, as opposed to having the vhost-user code in
> > the backend figure out that it's a local migration when the new
> > destination QEMU tries to connect and respond accordingly.
> >
>
> Handling this in vhost-user-server without the management layer would
> actually mean handling two connections in parallel. This doesn't seem
> to fit well into the vhost-user protocol.
>
> However, we already have this support in the disk service (since we
> already do live updates of VMs with vhost-user-blk): the service
> accepts a new connection on an additional Unix socket serving the
> same disk, but in read-only mode until the initial connection
> terminates. The problem isn't with the separate socket itself, but
> with safely switching the disk backend from one connection to
> another. We would have to perform this switch
> regardless, even if we managed both connections within the context of a
> single server or a single Unix socket. The only difference is that this
> way, we might avoid communication from the management layer to the disk
> service. Instead of saying, "Hey, disk service, we're going to migrate
> this QEMU - prepare for an endpoint switch," we'd just proceed with the
> migration, and the disk service would detect it when it sees a second
> connection to the Unix socket.
>
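For illustration, the detection described above might look roughly
like this on the backend side (all names are made up; this is not the
actual disk-service code, just a sketch of where the connection
switch still has to happen):

    /* Sketch: treat a second connection on the disk's Unix socket as
     * a local migration.  Keep it read-only until the old connection
     * goes away, then switch the disk over to it. */
    #include <stdbool.h>
    #include <sys/socket.h>
    #include <unistd.h>

    struct disk_export {
        int listen_fd;    /* listening Unix socket for this disk     */
        int active_fd;    /* connection that currently owns the disk */
        int pending_fd;   /* incoming (destination QEMU) connection  */
        bool pending_ro;  /* pending side is read-only until switch  */
    };

    void handle_new_connection(struct disk_export *d)
    {
        int fd = accept(d->listen_fd, NULL, NULL);

        if (fd < 0) {
            return;
        }
        if (d->active_fd < 0) {
            d->active_fd = fd;            /* normal first connection */
            return;
        }
        /* Second connection while the first is alive: assume a local
         * migration and keep the newcomer read-only for now. */
        d->pending_fd = fd;
        d->pending_ro = true;
    }

    void handle_disconnect(struct disk_export *d)
    {
        close(d->active_fd);
        if (d->pending_fd >= 0) {
            /* Old endpoint is gone: promote the pending connection.
             * This is exactly the backend switch that cannot be
             * avoided in this model. */
            d->active_fd = d->pending_fd;
            d->pending_fd = -1;
            d->pending_ro = false;
        } else {
            d->active_fd = -1;
        }
    }
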
> But this extra communication isn't the real issue. The real challenge
> is that we still have to switch between connections on the backend
> side. And we have to account for the possible temporary unavailability
> of the disk service (the migration freeze time would just include this
> period of unavailability).
>
> With this series, we're saying: "Hold on. We already have everything
> working and set up—the backend is ready, the dataplane is out of QEMU,
> and the control plane isn't doing anything. And we're migrating to the
> same host. Why not just keep everything as is? Just pass the file
> descriptors to the new QEMU process and continue execution."
>
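For reference, handing the descriptors to the new QEMU process relies
on standard SCM_RIGHTS passing over a Unix socket; the sketch below
shows the generic mechanism, not QEMU's actual live-update code:

    /* Generic SCM_RIGHTS descriptor passing over a Unix-domain
     * socket.  The receiver gets a fresh fd referring to the same
     * open file description, so the established vhost-user
     * connection (or memory fd, eventfd, ...) keeps working. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd_to_pass)
    {
        char data = 0;
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };
        union {
            struct cmsghdr align;
            char buf[CMSG_SPACE(sizeof(int))];
        } ctrl;
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        memset(&ctrl, 0, sizeof(ctrl));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        /* One byte of ordinary data carries the fd as ancillary data. */
        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }
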
> This way, we make the QEMU live-update operation independent of the
> disk service's lifecycle, which improves reliability. And we maintain
> only one connection instead of two, making the model simpler.
>
> This doesn't even account for the extra time spent reconfiguring the
> connection. Setting up mappings isn't free and becomes more costly for
> large VMs (with significant RAM), when using hugetlbfs, or when the
> system is under memory pressure.
>
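For context, the "setting up mappings" cost here is the backend
mmap()ing each memory-region fd the front end shares on (re)connect;
roughly (illustrative only; MAP_POPULATE is just to make the
page-faulting cost visible):

    /* Per-region work a backend repeats whenever the memory table is
     * re-sent: map the shared region fd into its address space.
     * MAP_POPULATE (Linux) pre-faults the pages, which is cheap for
     * small guests but not for big RAM, hugetlbfs, or a host under
     * memory pressure. */
    #include <sys/mman.h>
    #include <sys/types.h>

    void *map_guest_region(int region_fd, size_t size, off_t offset)
    {
        return mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_POPULATE, region_fd, offset);
    }
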
>
> > That said, I haven't followed the work here all that closely. If MST
> > or other maintainers have blessed this as the right way I'm ok with
> > it.
> >
>
>
>
> --
> Best regards,
> Vladimir
