On Thu, Oct 23, 2025 at 2:44 AM Alexandr Moshkov
<[email protected]> wrote:
>
>
> On 10/23/25 02:33, Raphael Norwitz wrote:
>
> On Wed, Oct 22, 2025 at 3:59 AM Alexandr Moshkov
> <[email protected]> wrote:
>
> Hi!
>
> On 10/21/25 23:54, Raphael Norwitz wrote:
>
> The logic looks ok from the vhost-user-blk side but some comments inline.
>
> On Mon, Oct 20, 2025 at 1:47 AM Alexandr Moshkov
> <[email protected]> wrote:
>
> Hi!
>
> During inter-host migration, waiting for disk requests to be drained
> in the vhost-user backend can incur significant downtime.
>
> This can be avoided if QEMU migrates the inflight region in vhost-user-blk.
> Thus, during the qemu migration, the vhost-user backend can cancel all 
> inflight requests and
> then, after migration, they will be executed on another host.
>
> At first, I tried to implement migration for all vhost-user devices that 
> support inflight at once,
> but this would require a lot of changes both in vhost-user-blk (to transfer 
> it to the base class) and
> in the vhost-user-base base class (inflight implementation and remodeling + a 
> large refactor).
>
> Even if it's a more significant change I'd rather generalize as much
> logic as possible and expose it as a vhost-user protocol feature. IMO
> too much vhost-user device-agnositic code is being pushed into
> vhost-user-blk.
>
> As far as I understand (correct me if I'm wrong), but this feature must be 
> implemented in device itself (along with inflight field location) or in some 
> base class for vhost-user devices. For now vhost-user-blk doesn't have one 
> yet. The closest base class that i could find is vhost-user-base, but for 
> using it, it has to implement all device-agnostic code from vhost-user-blk, 
> and don't break all existing vhost-user-base derived devices. For example, to 
> support inflight migration in base class, there first need to implement 
> inflight field in it, along with the reconnect feature. Then, with a big 
> refactor, somehow inherit vhost-user-blk from it, along with other devices 
> such as vhost-user-fs.
>
> So, IMO it looks like a whole separate track that will need to be tackled.
>
> I get that inflight is inside of struct VhostUserBlk, not in struct
> vhost_user. Even if not all device types support inflight I think that
> could probably be changed? If we do keep things as you have it here
> patch 1 should probably bring the vhost_dev_load_inflight() and
> vhost_dev_save_inflight() into vhost_user.c rather than vhost.c since
> only vhost_user devices seem to use inflight.
>
> Firstly, I tried to implement inflight in the vhost_user structure, but this 
> structure is initialized on every vhost_user_blk_connect() and then destroyed 
> on every vhost_user_blk_disconnect(). But, the inflight region must survive 
> device disconnect.
>
> During active migration, the vhost_user struct will be destroyed before the 
> inflight saving.
>

In struct vhost we should probably add some data which persists
between reconnects. I take your point that we cannot generalize
inflight migration without doing that first.

> And you are right about the location of the functions in patch 1. I will move 
> them to the vhost-user.c file, which will be more appropriate. Thanks!
>
> As Markus noted this also conflicts significantly with Vladimir's
> series so I'd suggest waiting until those are in, or possibly
> attempting to generalize on top of his changes.
>
> Yea, but i think, i just need to perform inflight migration only when some 
> non-local migration flag is set.
>
> I'm not sure about that. While it may be completely safe in your case
> to avoid GET_VRING_BASE/quiescing the backend, it could introduce
> corruption cases on other vhost-user-blk storage backends which work
> differently. I think this would require an "opt-in" migration
> parameter over and above checking that it's not a local migration.
>
> In patch 2 I introduce migration capability for that. Or you think that we 
> need something like per-device parameter?
>

My point here was that you will definitely need a parameter you have.
I'll leave it to others to say if we actually need a per-device
parameter.

> Ideally there would be some way to have the vhost-user backends expose
> support per device with a protocol feature and, if the source and
> destination both advertise support, we perform the inflight migration
> skipping GET_VRING_BASE. I don't know enough about the migration code
> to know how that would be done or if it is practical.
>
> Also (and I think this also applies to Vladimir's series), it should
> be possible to add multiple vhost-user-blk devices to the same Qemu
> instance but have the devices backed by very different backend
> implementations. What happens if some of those backend implementations
> support things like lightweight local migration and/or inflight
> migration but others don't? Ideally you would want the API to be able
> to select target vhost-user-blk devices for accelerated migration
> rather than assuming they will all want to use the same features. I
> don't know if anyone is doing that or if we care about such cases but
> I just want to throw the concern out there.
>
> Hmm, that looks like we need some per-device parameter in order to set this 
> feature on for them.
>
> I guess we need a opinion from migration maintainer about such cases

Reply via email to