On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host
> > > > > devices), as we are moving to another host. So, we don't enable
> > > > > "backend-transfer". We don't transfer the backend, we have to
> > > > > initialize new backend on another host.
> > > > >
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and
> > > > > minimal extra actions: use "backend-transfer", exactly to keep the
> > > > > backends (vhost-user-server, TAP device in kernel, in-kernel vfio
> > > > > device state, etc) as is.
> > > > >
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > >
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > >
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > >
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> >
> > I was not seeing it like that.
> >
> > I was treating the per-device parameter as a flag showing whether the
> > device is capable of passing over FDs, which is more like a device
> > attribute.
> >
> > Those things (once set by the machine type) should never change, and the
> > only thing to be changed is the global "backend-transfer" boolean that can
> > be set in the "migrate" QMP command, and should be decided by the admin
> > when one wants to initiate the migration process.
> >
> > > >
> > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > of individual devices, then instead of a property on the device, we
> > > > should pass a list of device IDs as a parameter to the migrate
> > > > command in QMP.
> >
> > I doubt whether we would really need that in reality.
> >
> > Likely the admin should only worry about whether to set the global
> > "backend-transfer"; the admin may not even need to know which devices,
> > or how many, would benefit from this feature being enabled.
> >
> > It just says, "we're doing local migration and via unix sockets, so
> > whatever devices can try to reuse their backends if possible".
>
> An individual device can only use backend transfer if both the old and
> new QEMU agree that it can be done. At the time we start the origin
> QEMU we know which set of devices are capable of doing an outgoing
> backend transfer, but we don't know what set of devices are capable
> of doing an incoming backend transfer.
>
> If we don't have a per-device toggle at time of migration, then we
> have to assume that the target QEMU can always support at least the
> same set of incoming backends as the src QEMU outgoing backend. This
> feels like a potentially risky assumption.
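For illustration only, a minimal sketch of how versioned machine types
would bound that risk. This is plain Python, not QEMU code; the `Device`
class, the version tuples and `devices_transferring()` are all made up
for the example. The assumption is that the per-device property is
derived from the machine type at boot and never changes afterwards, so
only the global flag varies per migration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device model, for illustration only."""
    dev_id: str
    # First (made-up) machine-type version whose defaults turn the
    # per-device backend-transfer property ON for this device.
    introduced_in: tuple


def devices_transferring(devices, machine_type, global_backend_transfer):
    """Return the IDs of devices whose backend would be transferred.

    The per-device property depends only on the machine type chosen at
    boot; the global flag is the only per-migration input.
    """
    if not global_backend_transfer:
        return set()
    return {d.dev_id for d in devices if machine_type >= d.introduced_in}


# Example data: version numbers are invented.
DEVICES = [
    Device("net0", (10, 2)),
    Device("blk0", (10, 3)),
]
```

With an old machine type such as `(10, 1)` nothing is transferred no
matter how new the binary is, which is what keeps migration back to an
older QEMU safe in this model.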
When using machine properties, these things should already be set by the
machine types. E.g. if this is a new QEMU with an old machine type, we
should have this per-device property set to OFF forever when booting the
VM, and should keep it like that after any number of migrations, because
any VM using the old machine type _might_ be migrated back to an older
QEMU that won't support it.

So IIUC that strictly follows how we use versioned machine types.

What Vladimir mentioned previously would be something very special, but
indeed when there's no machine type versioning we may need to toggle this
before each migration. However, since upstream has been following the
machine type properties way of doing this for N years, do we need to
worry about that?

>
> Another scenario is where you are doing a localhost migration as a
> mechanism to let you change a device backend. In that case you'll
> want to do a backend transfer of all devices, except the one that
> you want to change.

Right, this might be a real need if it exists. That said, it's so special
that I wonder whether the admin could simply migrate with the global
backend-transfer set to OFF in this rare case.

In general, I would prefer to avoid introducing any form of device list
into the migration system if at all possible. I agree that if we must
introduce one, it should at least be a list of IDs rather than an ad hoc
array of strings. However, I still want to see whether we can completely
avoid it.

Thanks,

--
Peter Xu
