On Thu, Oct 23, 2025 at 2:44 AM Alexandr Moshkov <[email protected]> wrote: > > > On 10/23/25 02:33, Raphael Norwitz wrote: > > On Wed, Oct 22, 2025 at 3:59 AM Alexandr Moshkov > <[email protected]> wrote: > > Hi! > > On 10/21/25 23:54, Raphael Norwitz wrote: > > The logic looks ok from the vhost-user-blk side but some comments inline. > > On Mon, Oct 20, 2025 at 1:47 AM Alexandr Moshkov > <[email protected]> wrote: > > Hi! > > During inter-host migration, waiting for disk requests to be drained > in the vhost-user backend can incur significant downtime. > > This can be avoided if QEMU migrates the inflight region in vhost-user-blk. > Thus, during the qemu migration, the vhost-user backend can cancel all > inflight requests and > then, after migration, they will be executed on another host. > > At first, I tried to implement migration for all vhost-user devices that > support inflight at once, > but this would require a lot of changes both in vhost-user-blk (to transfer > it to the base class) and > in the vhost-user-base base class (inflight implementation and remodeling + a > large refactor). > > Even if it's a more significant change I'd rather generalize as much > logic as possible and expose it as a vhost-user protocol feature. IMO > too much vhost-user device-agnositic code is being pushed into > vhost-user-blk. > > As far as I understand (correct me if I'm wrong), but this feature must be > implemented in device itself (along with inflight field location) or in some > base class for vhost-user devices. For now vhost-user-blk doesn't have one > yet. The closest base class that i could find is vhost-user-base, but for > using it, it has to implement all device-agnostic code from vhost-user-blk, > and don't break all existing vhost-user-base derived devices. For example, to > support inflight migration in base class, there first need to implement > inflight field in it, along with the reconnect feature. Then, with a big > refactor, somehow inherit vhost-user-blk from it, along with other devices > such as vhost-user-fs. > > So, IMO it looks like a whole separate track that will need to be tackled. > > I get that inflight is inside of struct VhostUserBlk, not in struct > vhost_user. Even if not all device types support inflight I think that > could probably be changed? If we do keep things as you have it here > patch 1 should probably bring the vhost_dev_load_inflight() and > vhost_dev_save_inflight() into vhost_user.c rather than vhost.c since > only vhost_user devices seem to use inflight. > > Firstly, I tried to implement inflight in the vhost_user structure, but this > structure is initialized on every vhost_user_blk_connect() and then destroyed > on every vhost_user_blk_disconnect(). But, the inflight region must survive > device disconnect. > > During active migration, the vhost_user struct will be destroyed before the > inflight saving. >
In struct vhost we should probably add some data which persists between reconnects. I take your point that we cannot generalize inflight migration without doing that first. > And you are right about the location of the functions in patch 1. I will move > them to the vhost-user.c file, which will be more appropriate. Thanks! > > As Markus noted this also conflicts significantly with Vladimir's > series so I'd suggest waiting until those are in, or possibly > attempting to generalize on top of his changes. > > Yea, but i think, i just need to perform inflight migration only when some > non-local migration flag is set. > > I'm not sure about that. While it may be completely safe in your case > to avoid GET_VRING_BASE/quiescing the backend, it could introduce > corruption cases on other vhost-user-blk storage backends which work > differently. I think this would require an "opt-in" migration > parameter over and above checking that it's not a local migration. > > In patch 2 I introduce migration capability for that. Or you think that we > need something like per-device parameter? > My point here was that you will definitely need a parameter you have. I'll leave it to others to say if we actually need a per-device parameter. > Ideally there would be some way to have the vhost-user backends expose > support per device with a protocol feature and, if the source and > destination both advertise support, we perform the inflight migration > skipping GET_VRING_BASE. I don't know enough about the migration code > to know how that would be done or if it is practical. > > Also (and I think this also applies to Vladimir's series), it should > be possible to add multiple vhost-user-blk devices to the same Qemu > instance but have the devices backed by very different backend > implementations. What happens if some of those backend implementations > support things like lightweight local migration and/or inflight > migration but others don't? Ideally you would want the API to be able > to select target vhost-user-blk devices for accelerated migration > rather than assuming they will all want to use the same features. I > don't know if anyone is doing that or if we care about such cases but > I just want to throw the concern out there. > > Hmm, that looks like we need some per-device parameter in order to set this > feature on for them. > > I guess we need a opinion from migration maintainer about such cases
