On Wed, Jan 28, 2026 at 05:32:26PM +0200, Avihai Horon wrote:
> 
> On 1/28/2026 4:49 PM, Peter Xu wrote:
> > 
> > On Wed, Jan 28, 2026 at 12:03:46PM +0100, Cédric Le Goater wrote:
> > > +Peter, + Fabiano
> > Thanks, this looks benign to me from the migration POV. I have one question
> > though, and I'm not sure if I'm the only one wondering.
> > 
> > > On 1/28/26 11:51, Avihai Horon wrote:
> > > > Currently, the VFIO device migration event is sent after the device state
> > > > transition has been completed. However, it may be useful to additionally
> > > > send a "prepare" event before the state transition, to notify users that
> > > > it's about to happen.
> > > > 
> > > > For example, in some cases with heavy resource utilization, stopping the
> > > > VFIO device may take a long time. In time-sensitive scenarios, the
> > > > management application that consumes the event may be notified about the
> > > > state transition too late.
> > Could you elaborate more on the problem?
> 
> Of course.
> 
> >    For example:
> > 
> > (1) What would the mgmt do when receiving the notification?  What would go
> >      wrong if the state notification arrived very late?
> 
> In our case, upon receiving an event that the VFIO device is stopped (during
> migration switchover), the mgmt app prevents the RDMA connections to the
> migrated VFIO device from timing out.
> This is needed because RDMA connections may have very low timeouts, as low
> as a few tens of ms, which is far below our migration downtime.

Makes sense, thanks.

Could you be explicit about the "mgmt app"?  Is it libvirt, or something else?

When introducing a new API like these events, IMHO it would always be good
to explicitly state the consumers.

> 
> As I wrote in the commit message, if the VFIO device has a lot of resources,
> it may take a long time (even a few hundred ms) to stop it. In that case, by
> the time the event is sent (after the state transition), the RDMA
> connection may already have timed out.
> This is an issue we actually experienced.
> 
> > 
> > (2) Why would a prepare message help this situation?
> 
> Sending the event before the state transition will allow the mgmt app to act
> on time, regardless of how long the VFIO state transition takes.
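To make the timing argument concrete, here is a minimal C sketch comparing the two notification orderings. It is a hypothetical model, not QEMU code: `first_notification_ms()`, `mgmt_acts_in_time()` and both constants are illustrative names, and the 300 ms and 40 ms values are assumed stand-ins for the "few hundred ms" stop time and "few tens of ms" RDMA timeout mentioned above. It further assumes the RDMA timeout clock starts when the stop transition starts and that the mgmt app reacts instantly to an event.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical timeline model (not QEMU code). Times are in ms, measured
 * from the start of the VFIO "stop" transition. Assumptions: the RDMA
 * peer's timeout clock starts when the transition starts, and the mgmt
 * app reacts instantly once it receives an event.
 */
#define STOP_TRANSITION_MS 300 /* assumed: "a few hundred ms" to stop a busy device */
#define RDMA_TIMEOUT_MS     40 /* assumed: "a few tens of ms" RDMA timeout */

/* When does the mgmt app first hear about the stop transition? */
int first_notification_ms(bool send_prepare_event)
{
    /* VFIO_MIGRATION_PREPARE is emitted before the new device state is
     * requested from the device, so it arrives at t = 0. */
    if (send_prepare_event) {
        return 0;
    }
    /* Otherwise only VFIO_MIGRATION is sent, after the transition ends. */
    return STOP_TRANSITION_MS;
}

/* True if the mgmt app can mask the RDMA timeout before it fires. */
bool mgmt_acts_in_time(bool send_prepare_event)
{
    int t = first_notification_ms(send_prepare_event);

    assert(t >= 0);
    return t < RDMA_TIMEOUT_MS;
}
```

Under these assumptions `mgmt_acts_in_time(true)` is true and `mgmt_acts_in_time(false)` is false: without the prepare event, the first notification only arrives once the whole transition has finished, long after the assumed RDMA timeout.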

It might also be good to state explicitly what work is planned when you say
"act on time".  From my reading so far, it seems to be some mechanism in
some "mgmt app" that masks the RDMA timeout, to avoid RDMA retries and,
eventually, the connection being torn down, but maybe I'm wrong.

> 
> > 
> > (3) The doc below says the prepare message does not guarantee that the
> >      announced state transition will happen.  Would that confuse the mgmt?
> 
> The expectation is for the mgmt app to be robust and handle these kinds of
> scenarios.

I wonder if there could be a deterministic way of solving this problem rather
than allowing false positive reports, e.g. attaching one explicit message
to a 100% determined state transition that requires the RDMA timeout
mechanism to be turned off.

It still seems a bit weird to need a prepare event for every state
transition, even for e.g. RUNNING and RESUME.  When talking about a
possible masking of RDMA timeouts, it is really the existing event that
matters for those: only after the device has fully recovered should the
mgmt app re-enable the timeout mechanisms.
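A consumer that stays robust to a prepare event not followed by the announced transition, while keying the re-enable step to the existing completion event, could look roughly like the sketch below. It is a hypothetical policy, not libvirt's or any real mgmt app's behavior; `mgmt_handle_event()`, `struct mgmt_state` and `enum event_kind` are illustrative names, and each QMP event is reduced to its kind plus the `device-state` string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical mgmt-app policy (not libvirt or any real consumer):
 * - VFIO_MIGRATION_PREPARE with device-state "stop": mask RDMA timeouts
 *   early, before the (possibly slow) stop transition starts.
 * - VFIO_MIGRATION with device-state "running": the device has fully
 *   recovered, so re-enable the timeouts.
 * Any other event leaves the current masking decision unchanged.
 */
enum event_kind { EV_PREPARE, EV_DONE };

struct mgmt_state {
    bool rdma_timeouts_masked;
};

void mgmt_handle_event(struct mgmt_state *s, enum event_kind kind,
                       const char *device_state)
{
    assert(s != NULL && device_state != NULL);

    if (kind == EV_PREPARE && strcmp(device_state, "stop") == 0) {
        s->rdma_timeouts_masked = true;
    } else if (kind == EV_DONE && strcmp(device_state, "running") == 0) {
        /* Also covers a false-positive PREPARE: if the stop transition
         * failed and the device ended up running again, unmask. */
        s->rdma_timeouts_masked = false;
    }
    /* A PREPARE for "running" is deliberately ignored. */
}
```

Feeding it PREPARE "stop" followed by a completion event "stop" keeps the timeouts masked; a later completion event "running" unmasks them, while a PREPARE for "running" on its own changes nothing.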

I do not know the VFIO state machine well, and I'm also not familiar with this
specific problem, so please treat these as pure questions.  Anyway, it would
always be nice to attach some more information to the commit log IMHO.

Thanks,

> 
> Hope that clarifies the use case/need.
> 
> Thanks.
> 
> > 
> > Thanks,
> > 
> > > > To overcome this issue, send an additional "prepare" migration event
> > > > before the device state transition.
> > > > 
> > > > Signed-off-by: Avihai Horon <[email protected]>
> > > > ---
> > > >    qapi/vfio.json      | 33 +++++++++++++++++++++++++++++++++
> > > >    hw/vfio/migration.c | 18 +++++++++++++-----
> > > >    2 files changed, 46 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/qapi/vfio.json b/qapi/vfio.json
> > > > index a1a9c5b673..de41211f1d 100644
> > > > --- a/qapi/vfio.json
> > > > +++ b/qapi/vfio.json
> > > > @@ -66,3 +66,36 @@
> > > >          'qom-path': 'str',
> > > >          'device-state': 'QapiVfioMigrationState'
> > > >      } }
> > > > +
> > > > +##
> > > > +# @VFIO_MIGRATION_PREPARE:
> > > > +#
> > > > +# This event is emitted when a VFIO device migration state is about to
> > > > +# be changed.  Note that even if this event is received for state X,
> > > > +# the VFIO device may transition to a different state if the original
> > > > +# state transition to X failed.
> > > > +#
> > > > +# @device-id: The device's id, if it has one.
> > > > +#
> > > > +# @qom-path: The device's QOM path.
> > > > +#
> > > > +# @device-state: The new device migration state that the device is
> > > > +#     about to transition to.
> > > > +#
> > > > +# Since: 11.0
> > > > +#
> > > > +# .. qmp-example::
> > > > +#
> > > > +#     <- { "timestamp": { "seconds": 1713771323, "microseconds": 212268 },
> > > > +#          "event": "VFIO_MIGRATION_PREPARE",
> > > > +#          "data": {
> > > > +#              "device-id": "vfio_dev1",
> > > > +#              "qom-path": "/machine/peripheral/vfio_dev1",
> > > > +#              "device-state": "stop" } }
> > > > +##
> > > > +{ 'event': 'VFIO_MIGRATION_PREPARE',
> > > > +  'data': {
> > > > +      'device-id': 'str',
> > > > +      'qom-path': 'str',
> > > > +      'device-state': 'QapiVfioMigrationState'
> > > > +  } }
> > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > > > index b4695030c7..9f887c148f 100644
> > > > --- a/hw/vfio/migration.c
> > > > +++ b/hw/vfio/migration.c
> > > > @@ -90,9 +90,11 @@ mig_state_to_qapi_state(enum vfio_device_mig_state state)
> > > >        }
> > > >    }
> > > > -static void vfio_migration_send_event(VFIODevice *vbasedev)
> > > > +static void vfio_migration_send_event(VFIODevice *vbasedev,
> > > > +                                      enum vfio_device_mig_state state,
> > > > +                                      bool prep)
> > > >    {
> > > > -    VFIOMigration *migration = vbasedev->migration;
> > > > +    QapiVfioMigrationState qapi_state;
> > > >        DeviceState *dev = vbasedev->dev;
> > > >        g_autofree char *qom_path = NULL;
> > > >        Object *obj;
> > > > @@ -105,9 +107,13 @@ static void vfio_migration_send_event(VFIODevice *vbasedev)
> > > >        obj = vbasedev->ops->vfio_get_object(vbasedev);
> > > >        g_assert(obj);
> > > >        qom_path = object_get_canonical_path(obj);
> > > > +    qapi_state = mig_state_to_qapi_state(state);
> > > > -    qapi_event_send_vfio_migration(
> > > > -        dev->id, qom_path, mig_state_to_qapi_state(migration->device_state));
> > > > +    if (prep) {
> > > > +        qapi_event_send_vfio_migration_prepare(dev->id, qom_path, qapi_state);
> > > > +    } else {
> > > > +        qapi_event_send_vfio_migration(dev->id, qom_path, qapi_state);
> > > > +    }
> > > >    }
> > > >    static void vfio_migration_set_device_state(VFIODevice *vbasedev,
> > > > @@ -119,7 +125,7 @@ static void vfio_migration_set_device_state(VFIODevice *vbasedev,
> > > >                                              mig_state_to_str(state));
> > > >        migration->device_state = state;
> > > > -    vfio_migration_send_event(vbasedev);
> > > > +    vfio_migration_send_event(vbasedev, state, false);
> > > >    }
> > > >    int vfio_migration_set_state(VFIODevice *vbasedev,
> > > > @@ -146,6 +152,8 @@ int vfio_migration_set_state(VFIODevice *vbasedev,
> > > >            return 0;
> > > >        }
> > > > +    vfio_migration_send_event(vbasedev, new_state, true);
> > > > +
> > > >        feature->argsz = sizeof(buf);
> > > >        feature->flags =
> > > >            VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
> > --
> > Peter Xu
> > 
> 

-- 
Peter Xu
