On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices),
> > > > > as we are moving to another host. So, we don't enable "backend-transfer".
> > > > > We don't transfer the backend; we have to initialize a new backend on the
> > > > > other host.
> > > > > 
> > > > > 2. Local migration to update QEMU, with minimal freeze time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc.)
> > > > > as is.
> > > > > 
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > > 
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > > 
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loath to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > > 
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> > 
> > I was not seeing it like that.
> > 
> > I was treating the per-device parameter as a flag showing whether the device
> > is capable of passing its FDs over, which is more like a device attribute.
> > 
> > Those attributes (once set by the machine type) should never change. The
> > only thing that changes is the global "backend-transfer" boolean, which can
> > be set in the "migrate" QMP command and should be decided by the admin when
> > initiating the migration.
> > 
> > > > 
> > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > of individual devices, then instead of a property on the device, we
> > > > should pass a list of device IDs as a parameter to the migrate
> > > > command in QMP.
> > 
> > I doubt we would really need that in practice.
> > 
> > Likely the admin only needs to decide whether to set the global
> > "backend-transfer"; the admin may not even need to know which devices, or
> > how many, will benefit from enabling this feature.
> > 
> > It just says, "we're doing local migration and via unix sockets, so
> > whatever devices can try to reuse their backends if possible".
> 
> An individual device can only use backend transfer if both the old and
> new QEMU agree that it can be done. At the time we start the origin
> QEMU we know which set of devices are capable of doing an outgoing
> backend transfer, but we don't know what set of devices are capable
> of doing an incoming backend transfer.
> 
> If we don't have a per-device toggle at time of migration, then we
> have to assume that the target QEMU can always support at least the
> same set of incoming backend transfers as the src QEMU supports outgoing.
> This feels like a potentially risky assumption.

When using machine type properties, these should already be set by the
machine types.

E.g. if this is a new QEMU with an old machine type, this per-device
property should be set to OFF forever when booting the VM, and it should
stay that way across any number of migrations, because any VM using the old
machine type _might_ be migrated back to an older QEMU that doesn't support
it.  So IIUC that strictly follows how we use versioned machine types.

What Vladimir mentioned previously would be something very special, but
indeed, when there's no machine type versioning, we may need to toggle this
before each migration.  However, since upstream has followed the machine
type properties approach for N years, do we need to worry about that?

> 
> Another scenario is where you are doing a localhost migration as a
> mechanism to let you change a device backend. In that case you'll
> want to do a backend transfer of all devices, except the one that
> you want to change.

Right, this might be a real need, if it exists.  That said, it's so special
that I wonder whether the admin could simply migrate with the global
backend-transfer set to OFF in this rare case.

In general, I would prefer to avoid introducing any form of device list
into the migration system if at all possible.  I agree that if we must
introduce one, it should at least be a list of IDs rather than an ad hoc
array of strings.  However, I still want to see whether we can avoid it
completely.

Thanks,

-- 
Peter Xu