> -----Original Message----- > From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Tuesday, April 17, 2018 4:23 AM > To: Kirti Wankhede <kwankh...@nvidia.com> > Cc: Zhang, Yulei <yulei.zh...@intel.com>; qemu-devel@nongnu.org; Tian, > Kevin <kevin.t...@intel.com>; joonas.lahti...@linux.intel.com; > zhen...@linux.intel.com; Wang, Zhi A <zhi.a.w...@intel.com>; > dgilb...@redhat.com; quint...@redhat.com > Subject: Re: [RFC PATCH V4 2/4] vfio: Add vm status change callback to > stop/restart the mdev device > > On Mon, 16 Apr 2018 20:14:27 +0530 > Kirti Wankhede <kwankh...@nvidia.com> wrote: > > > On 4/10/2018 11:32 AM, Yulei Zhang wrote: > > > VM status change handler is added to change the vfio pci device > > > status during the migration, write the demanded device status > > > to the DEVICE STATUS subregion to stop the device on the source side > > > before fetch its status and start the deivce on the target side > > > after restore its status. > > > > > > Signed-off-by: Yulei Zhang <yulei.zh...@intel.com> > > > --- > > > hw/vfio/pci.c | 20 ++++++++++++++++++++ > > > include/hw/vfio/vfio-common.h | 1 + > > > linux-headers/linux/vfio.h | 6 ++++++ > > > roms/seabios | 2 +- > > > 4 files changed, 28 insertions(+), 1 deletion(-) > > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > > index f98a9dd..13d8c73 100644 > > > --- a/hw/vfio/pci.c > > > +++ b/hw/vfio/pci.c > > > @@ -38,6 +38,7 @@ > > > > > > static void vfio_disable_interrupts(VFIOPCIDevice *vdev); > > > static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); > > > +static void vfio_vm_change_state_handler(void *pv, int running, > RunState state); > > > > > > /* > > > * Disabling BAR mmaping can be slow, but toggling it around INTx can > > > @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error > **errp) > > > vfio_register_err_notifier(vdev); > > > vfio_register_req_notifier(vdev); > > > vfio_setup_resetfn_quirk(vdev); > > > + > qemu_add_vm_change_state_handler(vfio_vm_change_state_handler, > vdev); > > > > > > return; > > > > > > @@ -2982,6 +2984,24 @@ post_reset: > > > vfio_pci_post_reset(vdev); > > > } > > > > > > +static void vfio_vm_change_state_handler(void *pv, int running, > RunState state) > > > +{ > > > + VFIOPCIDevice *vdev = pv; > > > + VFIODevice *vbasedev = &vdev->vbasedev; > > > + uint8_t dev_state; > > > + uint8_t sz = 1; > > > + > > > + dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP; > > > + > > > + if (pwrite(vdev->vbasedev.fd, &dev_state, > > > + sz, vdev->device_state.offset) != sz) { > > > + error_report("vfio: Failed to %s device", running ? "start" : > > > "stop"); > > > + return; > > > + } > > > + > > > + vbasedev->device_state = dev_state; > > > +} > > > + > > > > Is it expected to trap device_state region by vendor driver? > > Can this information be communicated to vendor driver through an ioctl? > > Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would > be providing REGION_INFO for this region, so the vendor driver is > already in full control here using existing ioctls. I don't see that > we need new ioctls, we just need to fully define the API of the > proposed regions here. > If the device state region is mmaped, we may not be able to use region device state offset to convey the running state. It may need a new ioctl to set the device state.
> > Here only device state is conveyed to vendor driver but knowing > > 'RunState' in vendor driver is very useful and vendor driver can take > > necessary action accordingly like RUN_STATE_PAUSED indicating that VM > is > > in paused state, similarly RUN_STATE_SUSPENDED, > > RUN_STATE_FINISH_MIGRATE, RUN_STATE_IO_ERROR. If these states are > > handled properly, all the cases can be supported with same interface > > like VM suspend-resume, VM pause-restore. > > I agree, but let's remember that we're talking about device state, not > VM state. vfio is a userspace device interface, not specifically a > virtual machine interface, so states should be in terms of the device. > The API of this region needs to be clearly defined and using only 1 > byte at the start of the region is not very forward looking. Thanks, > > Alex > > > > static void vfio_instance_init(Object *obj) > > > { > > > PCIDevice *pci_dev = PCI_DEVICE(obj); > > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- > common.h > > > index f3a2ac9..9c14a8f 100644 > > > --- a/include/hw/vfio/vfio-common.h > > > +++ b/include/hw/vfio/vfio-common.h > > > @@ -125,6 +125,7 @@ typedef struct VFIODevice { > > > unsigned int num_irqs; > > > unsigned int num_regions; > > > unsigned int flags; > > > + bool device_state; > > > } VFIODevice; > > > > > > struct VFIODeviceOps { > > > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h > > > index e3380ad..8f02f2f 100644 > > > --- a/linux-headers/linux/vfio.h > > > +++ b/linux-headers/linux/vfio.h > > > @@ -304,6 +304,12 @@ struct vfio_region_info_cap_type { > > > /* Mdev sub-type for device state save and restore */ > > > #define VFIO_REGION_SUBTYPE_DEVICE_STATE (4) > > > > > > +/* Offset in region to save device state */ > > > +#define VFIO_DEVICE_STATE_OFFSET 1 > > > + > > > +#define VFIO_DEVICE_START 0 > > > +#define VFIO_DEVICE_STOP 1 > > > + > > > /** > > > * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9, > > > * struct vfio_irq_info) > > > diff --git a/roms/seabios b/roms/seabios > > > index 63451fc..5f4c7b1 160000 > > > --- a/roms/seabios > > > +++ b/roms/seabios > > > @@ -1 +1 @@ > > > -Subproject commit 63451fca13c75870e1703eb3e20584d91179aebc > > > +Subproject commit 5f4c7b13cdf9c450eb55645f4362ea58fa61b79b > > >