Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/12/2 18:55, Dr. David Alan Gilbert wrote: > * Shenming Lu (lushenm...@huawei.com) wrote: >> Hi, >> >> After reading everyone's opinions, we have a rough idea for this issue. >> >> One key point is whether it is necessary to setup the config space before >> the device can accept further migration data. I think it is decided by >> the vendor driver, so we can simply ask the vendor driver about it in >> .save_setup, which could avoid a lot of unnecessary copies and settings. >> Once we have known the need, we can iterate the config space (before) >> along with the device migration data in .save_live_iterate and >> .save_live_complete_precopy, and if not needed, we can only migrate the >> config space in .save_state. >> >> Another key point is that the interrupt enabling should be after the >> restoring of the interrupt controller (might not only interrupts). >> My solution is to add a subflag at the beginning of the config data >> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered >> actions on the dst (such as whether to enable interrupts). >> >> Below is it's workflow. >> >> On the save path: >> In vfio_save_setup(): >> Ask the vendor driver if it needs the config space setup before it >> can accept further migration data. > > You could argue that you could always send an initial copy of the config > data at the start; and then send new versions at the end anyway. > But is this just about interrupts? In the current implementation, the effect of the config space is mainly about interrupts (SET_IRQS ioctl). > Is the dst going to interpret the > received migration data differently depending on the config data? I also have a little doubt about the dependency... But not sure. > That > would seem dangerous if the config data was to change during the > migration. > > Dave > >> | >> In vfio_save_iterate() (pre-copy): >> If *needed*, save the config space which would be setup on the dst >> before the migration data, but send with a subflag to instruct not >> to (such as) enable interrupts. >> | >> In vfio_save_complete_precopy() (stop-and-copy, iterable process): >> The same as that in vfio_save_iterate(). >> | >> In .save_state (stop-and-copy, non-iterable process): >> If *needed*, only send a subflag to instruct to enable interrupts. >> If *not needed*, save the config space and setup everything on the dst. >> >> Besides the above idea, we might be able to choose to let the vendor driver >> do >> more: qemu just sends and writes the config data (before) along with the >> device >> migration data every time, and it's up to the vendor driver to filter >> out/buffer >> the received data or reorder the settings... >> >> Thanks, >> Shenming >>
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/12/2 6:21, Alex Williamson wrote: > On Tue, 1 Dec 2020 14:37:52 +0800 > Shenming Lu wrote: > >> On 2020/12/1 1:03, Alex Williamson wrote: >>> On Thu, 26 Nov 2020 14:56:17 +0800 >>> Shenming Lu wrote: >>> Hi, After reading everyone's opinions, we have a rough idea for this issue. One key point is whether it is necessary to setup the config space before the device can accept further migration data. I think it is decided by the vendor driver, so we can simply ask the vendor driver about it in .save_setup, which could avoid a lot of unnecessary copies and settings. Once we have known the need, we can iterate the config space (before) along with the device migration data in .save_live_iterate and .save_live_complete_precopy, and if not needed, we can only migrate the config space in .save_state. Another key point is that the interrupt enabling should be after the restoring of the interrupt controller (might not only interrupts). My solution is to add a subflag at the beginning of the config data (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered actions on the dst (such as whether to enable interrupts). Below is it's workflow. On the save path: In vfio_save_setup(): Ask the vendor driver if it needs the config space setup before it can accept further migration data. >>> >>> How does "ask the vendor driver" actually work? >> >> Maybe via a ioctl? >> Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup) >> and send the config data (before) along with the device migration data >> every time?... > > Migration data streams are unidirectional, otherwise we break offline > migration. There are various ways other than an ioctl for a vendor > driver to advertise requirements, ex. a capability on the migration > region info, but it's not clear to me that we really need this info yet. Ok, this is not clear yet. > | In vfio_save_iterate() (pre-copy): If *needed*, save the config space which would be setup on the dst before the migration data, but send with a subflag to instruct not to (such as) enable interrupts. >>> >>> If not for triggering things like MSI/X configuration, isn't config >>> space almost entirely virtual? What visibility does the vendor driver >>> have to the VM machine dependencies regarding device interrupt versus >>> interrupt controller migration? >> >> My thought is that the vendor driver only decides the order of the config >> space setup and the receiving of the migration data, but leaves the VM >> machine dependencies to QEMU. > > But again, config space is largely virtual, the vendor driver doesn't > have access to it, it's only the effects of config space, like > representing the interrupt mode and configuring it on the device or > specific registers that QEMU writes through that the vendor driver > sees. So how is it that the vendor driver decides the order? As you said above, a vendor driver could advertise its requirements via the migration region... > The > vendor driver doesn't have visibility to the VM machine dependencies, > like when the interrupt controller is sufficiently configured to enable > device interrupts. It seems that a vendor driver that depends on QEMU > enabling interrupts at a specific point is inherently dependent on > assumptions in the machine configuration. Yeah, there might be more than one dependency. > | In vfio_save_complete_precopy() (stop-and-copy, iterable process): The same as that in vfio_save_iterate(). | In .save_state (stop-and-copy, non-iterable process): If *needed*, only send a subflag to instruct to enable interrupts. If *not needed*, save the config space and setup everything on the dst. >>> >>> Again, how does the vendor driver have visibility to know when the VM >>> machine can enable interrupts? >> >> It seems troubling if the vendor driver needs the interrupts to be enabled >> first... > > Yes. > Besides the above idea, we might be able to choose to let the vendor driver do more: qemu just sends and writes the config data (before) along with the device migration data every time, and it's up to the vendor driver to filter out/buffer the received data or reorder the settings... >>> >>> There is no vendor driver in QEMU though, so are you suggesting that >>> QEMU follows a standard protocol and the vendor driver chooses when to >>> enable specific features? For instance, QEMU would call SET_IRQS and >>> the driver would return success, but defer that setup if necessary? >>> That seems quite troubling as we then have ioctls that behave >>> differently depending on the device state and we have no error path to >>> userspace should that setup fail later. The vendor driver
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
* Shenming Lu (lushenm...@huawei.com) wrote: > Hi, > > After reading everyone's opinions, we have a rough idea for this issue. > > One key point is whether it is necessary to setup the config space before > the device can accept further migration data. I think it is decided by > the vendor driver, so we can simply ask the vendor driver about it in > .save_setup, which could avoid a lot of unnecessary copies and settings. > Once we have known the need, we can iterate the config space (before) > along with the device migration data in .save_live_iterate and > .save_live_complete_precopy, and if not needed, we can only migrate the > config space in .save_state. > > Another key point is that the interrupt enabling should be after the > restoring of the interrupt controller (might not only interrupts). > My solution is to add a subflag at the beginning of the config data > (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered > actions on the dst (such as whether to enable interrupts). > > Below is it's workflow. > > On the save path: > In vfio_save_setup(): > Ask the vendor driver if it needs the config space setup before it > can accept further migration data. You could argue that you could always send an initial copy of the config data at the start; and then send new versions at the end anyway. But is this just about interrupts? Is the dst going to interpret the received migration data differently depending on the config data? That would seem dangerous if the config data was to change during the migration. Dave > | > In vfio_save_iterate() (pre-copy): > If *needed*, save the config space which would be setup on the dst > before the migration data, but send with a subflag to instruct not > to (such as) enable interrupts. > | > In vfio_save_complete_precopy() (stop-and-copy, iterable process): > The same as that in vfio_save_iterate(). > | > In .save_state (stop-and-copy, non-iterable process): > If *needed*, only send a subflag to instruct to enable interrupts. > If *not needed*, save the config space and setup everything on the dst. > > Besides the above idea, we might be able to choose to let the vendor driver do > more: qemu just sends and writes the config data (before) along with the > device > migration data every time, and it's up to the vendor driver to filter > out/buffer > the received data or reorder the settings... > > Thanks, > Shenming > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Tue, 1 Dec 2020 14:37:52 +0800 Shenming Lu wrote: > On 2020/12/1 1:03, Alex Williamson wrote: > > On Thu, 26 Nov 2020 14:56:17 +0800 > > Shenming Lu wrote: > > > >> Hi, > >> > >> After reading everyone's opinions, we have a rough idea for this issue. > >> > >> One key point is whether it is necessary to setup the config space before > >> the device can accept further migration data. I think it is decided by > >> the vendor driver, so we can simply ask the vendor driver about it in > >> .save_setup, which could avoid a lot of unnecessary copies and settings. > >> Once we have known the need, we can iterate the config space (before) > >> along with the device migration data in .save_live_iterate and > >> .save_live_complete_precopy, and if not needed, we can only migrate the > >> config space in .save_state. > >> > >> Another key point is that the interrupt enabling should be after the > >> restoring of the interrupt controller (might not only interrupts). > >> My solution is to add a subflag at the beginning of the config data > >> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered > >> actions on the dst (such as whether to enable interrupts). > >> > >> Below is it's workflow. > >> > >> On the save path: > >>In vfio_save_setup(): > >>Ask the vendor driver if it needs the config space setup before it > >>can accept further migration data. > > > > How does "ask the vendor driver" actually work? > > Maybe via a ioctl? > Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup) > and send the config data (before) along with the device migration data > every time?... Migration data streams are unidirectional, otherwise we break offline migration. There are various ways other than an ioctl for a vendor driver to advertise requirements, ex. a capability on the migration region info, but it's not clear to me that we really need this info yet. > >>| > >>In vfio_save_iterate() (pre-copy): > >>If *needed*, save the config space which would be setup on the dst > >>before the migration data, but send with a subflag to instruct not > >>to (such as) enable interrupts. > > > > If not for triggering things like MSI/X configuration, isn't config > > space almost entirely virtual? What visibility does the vendor driver > > have to the VM machine dependencies regarding device interrupt versus > > interrupt controller migration? > > My thought is that the vendor driver only decides the order of the config > space setup and the receiving of the migration data, but leaves the VM > machine dependencies to QEMU. But again, config space is largely virtual, the vendor driver doesn't have access to it, it's only the effects of config space, like representing the interrupt mode and configuring it on the device or specific registers that QEMU writes through that the vendor driver sees. So how is it that the vendor driver decides the order? The vendor driver doesn't have visibility to the VM machine dependencies, like when the interrupt controller is sufficiently configured to enable device interrupts. It seems that a vendor driver that depends on QEMU enabling interrupts at a specific point is inherently dependent on assumptions in the machine configuration. > >>| > >>In vfio_save_complete_precopy() (stop-and-copy, iterable process): > >>The same as that in vfio_save_iterate(). > >>| > >>In .save_state (stop-and-copy, non-iterable process): > >>If *needed*, only send a subflag to instruct to enable interrupts. > >>If *not needed*, save the config space and setup everything on the dst. > >> > > > > Again, how does the vendor driver have visibility to know when the VM > > machine can enable interrupts? > > It seems troubling if the vendor driver needs the interrupts to be enabled > first... Yes. > >> Besides the above idea, we might be able to choose to let the vendor > >> driver do > >> more: qemu just sends and writes the config data (before) along with the > >> device > >> migration data every time, and it's up to the vendor driver to filter > >> out/buffer > >> the received data or reorder the settings... > > > > There is no vendor driver in QEMU though, so are you suggesting that > > QEMU follows a standard protocol and the vendor driver chooses when to > > enable specific features? For instance, QEMU would call SET_IRQS and > > the driver would return success, but defer that setup if necessary? > > That seems quite troubling as we then have ioctls that behave > > differently depending on the device state and we have no error path to > > userspace should that setup fail later. The vendor driver does have > > its own data stream for migration, so the vendor driver could tell the > > destination version of itself what type of interrupt to use, which > > might be sufficient if we were to ignore the latency if QEMU were to > > defer interrupt setup until stop-and-copy.
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/12/1 1:03, Alex Williamson wrote: > On Thu, 26 Nov 2020 14:56:17 +0800 > Shenming Lu wrote: > >> Hi, >> >> After reading everyone's opinions, we have a rough idea for this issue. >> >> One key point is whether it is necessary to setup the config space before >> the device can accept further migration data. I think it is decided by >> the vendor driver, so we can simply ask the vendor driver about it in >> .save_setup, which could avoid a lot of unnecessary copies and settings. >> Once we have known the need, we can iterate the config space (before) >> along with the device migration data in .save_live_iterate and >> .save_live_complete_precopy, and if not needed, we can only migrate the >> config space in .save_state. >> >> Another key point is that the interrupt enabling should be after the >> restoring of the interrupt controller (might not only interrupts). >> My solution is to add a subflag at the beginning of the config data >> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered >> actions on the dst (such as whether to enable interrupts). >> >> Below is it's workflow. >> >> On the save path: >> In vfio_save_setup(): >> Ask the vendor driver if it needs the config space setup before it >> can accept further migration data. > > How does "ask the vendor driver" actually work? Maybe via a ioctl? Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup) and send the config data (before) along with the device migration data every time?... > >> | >> In vfio_save_iterate() (pre-copy): >> If *needed*, save the config space which would be setup on the dst >> before the migration data, but send with a subflag to instruct not >> to (such as) enable interrupts. > > If not for triggering things like MSI/X configuration, isn't config > space almost entirely virtual? What visibility does the vendor driver > have to the VM machine dependencies regarding device interrupt versus > interrupt controller migration? My thought is that the vendor driver only decides the order of the config space setup and the receiving of the migration data, but leaves the VM machine dependencies to QEMU. > >> | >> In vfio_save_complete_precopy() (stop-and-copy, iterable process): >> The same as that in vfio_save_iterate(). >> | >> In .save_state (stop-and-copy, non-iterable process): >> If *needed*, only send a subflag to instruct to enable interrupts. >> If *not needed*, save the config space and setup everything on the dst. > > Again, how does the vendor driver have visibility to know when the VM > machine can enable interrupts? It seems troubling if the vendor driver needs the interrupts to be enabled first... > >> >> Besides the above idea, we might be able to choose to let the vendor driver >> do >> more: qemu just sends and writes the config data (before) along with the >> device >> migration data every time, and it's up to the vendor driver to filter >> out/buffer >> the received data or reorder the settings... > > There is no vendor driver in QEMU though, so are you suggesting that > QEMU follows a standard protocol and the vendor driver chooses when to > enable specific features? For instance, QEMU would call SET_IRQS and > the driver would return success, but defer that setup if necessary? > That seems quite troubling as we then have ioctls that behave > differently depending on the device state and we have no error path to > userspace should that setup fail later. The vendor driver does have > its own data stream for migration, so the vendor driver could tell the > destination version of itself what type of interrupt to use, which > might be sufficient if we were to ignore the latency if QEMU were to > defer interrupt setup until stop-and-copy. Did you mean that we could only enable MSI-X during the iterable phase, but leave the setup of these unmasked vectors to the non-iterable phase? It looks good to me. > > Is the question of when to setup device interrupts versus the interrupt > controller state largely a machine issue within QEMU? If so, shouldn't > it be at QEMU's determination when to act on the config space > information on the target? I think it would be simpler if ensuring the proper calling order in QEMU... > IOW, if a vendor driver has a dependency on > interrupt configuration, they need to include it in their own pre-copy > data stream and decouple that dependency from userspace interrupt > configuration via the SET_IRQS ioctl. Is that possible? Thanks, > I don't understand what the decoupling that dependency from userspace interrupt configuration means... Thanks, Shenming > Alex > > . >
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Thu, 26 Nov 2020 14:56:17 +0800 Shenming Lu wrote: > Hi, > > After reading everyone's opinions, we have a rough idea for this issue. > > One key point is whether it is necessary to setup the config space before > the device can accept further migration data. I think it is decided by > the vendor driver, so we can simply ask the vendor driver about it in > .save_setup, which could avoid a lot of unnecessary copies and settings. > Once we have known the need, we can iterate the config space (before) > along with the device migration data in .save_live_iterate and > .save_live_complete_precopy, and if not needed, we can only migrate the > config space in .save_state. > > Another key point is that the interrupt enabling should be after the > restoring of the interrupt controller (might not only interrupts). > My solution is to add a subflag at the beginning of the config data > (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered > actions on the dst (such as whether to enable interrupts). > > Below is it's workflow. > > On the save path: > In vfio_save_setup(): > Ask the vendor driver if it needs the config space setup before it > can accept further migration data. How does "ask the vendor driver" actually work? > | > In vfio_save_iterate() (pre-copy): > If *needed*, save the config space which would be setup on the dst > before the migration data, but send with a subflag to instruct not > to (such as) enable interrupts. If not for triggering things like MSI/X configuration, isn't config space almost entirely virtual? What visibility does the vendor driver have to the VM machine dependencies regarding device interrupt versus interrupt controller migration? > | > In vfio_save_complete_precopy() (stop-and-copy, iterable process): > The same as that in vfio_save_iterate(). > | > In .save_state (stop-and-copy, non-iterable process): > If *needed*, only send a subflag to instruct to enable interrupts. > If *not needed*, save the config space and setup everything on the dst. Again, how does the vendor driver have visibility to know when the VM machine can enable interrupts? > > Besides the above idea, we might be able to choose to let the vendor driver do > more: qemu just sends and writes the config data (before) along with the > device > migration data every time, and it's up to the vendor driver to filter > out/buffer > the received data or reorder the settings... There is no vendor driver in QEMU though, so are you suggesting that QEMU follows a standard protocol and the vendor driver chooses when to enable specific features? For instance, QEMU would call SET_IRQS and the driver would return success, but defer that setup if necessary? That seems quite troubling as we then have ioctls that behave differently depending on the device state and we have no error path to userspace should that setup fail later. The vendor driver does have its own data stream for migration, so the vendor driver could tell the destination version of itself what type of interrupt to use, which might be sufficient if we were to ignore the latency if QEMU were to defer interrupt setup until stop-and-copy. Is the question of when to setup device interrupts versus the interrupt controller state largely a machine issue within QEMU? If so, shouldn't it be at QEMU's determination when to act on the config space information on the target? IOW, if a vendor driver has a dependency on interrupt configuration, they need to include it in their own pre-copy data stream and decouple that dependency from userspace interrupt configuration via the SET_IRQS ioctl. Is that possible? Thanks, Alex
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
Hi, After reading everyone's opinions, we have a rough idea for this issue. One key point is whether it is necessary to setup the config space before the device can accept further migration data. I think it is decided by the vendor driver, so we can simply ask the vendor driver about it in .save_setup, which could avoid a lot of unnecessary copies and settings. Once we have known the need, we can iterate the config space (before) along with the device migration data in .save_live_iterate and .save_live_complete_precopy, and if not needed, we can only migrate the config space in .save_state. Another key point is that the interrupt enabling should be after the restoring of the interrupt controller (might not only interrupts). My solution is to add a subflag at the beginning of the config data (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered actions on the dst (such as whether to enable interrupts). Below is it's workflow. On the save path: In vfio_save_setup(): Ask the vendor driver if it needs the config space setup before it can accept further migration data. | In vfio_save_iterate() (pre-copy): If *needed*, save the config space which would be setup on the dst before the migration data, but send with a subflag to instruct not to (such as) enable interrupts. | In vfio_save_complete_precopy() (stop-and-copy, iterable process): The same as that in vfio_save_iterate(). | In .save_state (stop-and-copy, non-iterable process): If *needed*, only send a subflag to instruct to enable interrupts. If *not needed*, save the config space and setup everything on the dst. Besides the above idea, we might be able to choose to let the vendor driver do more: qemu just sends and writes the config data (before) along with the device migration data every time, and it's up to the vendor driver to filter out/buffer the received data or reorder the settings... Thanks, Shenming
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
* Neo Jia (c...@nvidia.com) wrote: > On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote: > > External email: Use caution opening links or attachments > > > > > > On 2020/11/21 6:01, Alex Williamson wrote: > > > On Fri, 20 Nov 2020 22:05:49 +0800 > > > Shenming Lu wrote: > > > > > >> On 2020/11/20 1:41, Alex Williamson wrote: > > >>> On Thu, 19 Nov 2020 14:13:24 +0530 > > >>> Kirti Wankhede wrote: > > >>> > > On 11/14/2020 2:47 PM, Shenming Lu wrote: > > > When running VFIO migration, I found that the restoring of VFIO PCI > > > device’s > > > config space is before VGIC on ARM64 target. But generally, interrupt > > > controllers > > > need to be restored before PCI devices. > > > > Is there any other way by which VGIC can be restored before PCI device? > > >> > > >> As far as I know, it seems to have to depend on priorities in the > > >> non-iterable process. > > >> > > > > > Besides, if a VFIO PCI device is > > > configured to have directly-injected MSIs (VLPIs), the restoring of > > > its config > > > space will trigger the configuring of these VLPIs (in kernel), where > > > it would > > > return an error as I saw due to the dependency on kvm’s vgic. > > > > > > > Can this be fixed in kernel to re-initialize the kernel state? > > >> > > >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic? > > >> But the fact is that this error is not caused by kernel, it is due to > > >> the incorrect > > >> calling order of qemu... > > >> > > > > > To avoid this, we can move the saving of the config space from the > > > iterable > > > process to the non-iterable process, so that it will be called after > > > VGIC > > > according to their priorities. > > > > > > > With this change, at resume side, pre-copy phase data would reach > > destination without restored config space. VFIO device on destination > > might need it's config space setup and validated before it can accept > > further VFIO device specific migration state. > > > > This also changes bit-stream, so it would break migration with original > > migration patch-set. > > >>> > > >>> Config space can continue to change while in pre-copy, if we're only > > >>> sending config space at the initiation of pre-copy, how are any changes > > >>> that might occur before the VM is stopped conveyed to the target? For > > >>> example the guest might reboot and a device returned to INTx mode from > > >>> MSI during pre-copy. Thanks, > > >> > > >> What I see is that the config space is only saved once in > > >> save_live_complete_precopy > > >> currently... > > >> As you said, a VFIO device might need it's config space setup first, and > > >> the config space can continue to change while in pre-copy, Did you mean > > >> we > > >> have to migrate the config space in save_live_iterate? > > >> However, I still have a little doubt about the restoring dependence > > >> between > > >> the qemu emulated config space and the device data... > > >> > > >> Besides, if we surely can't move the saving of the config space back, > > >> can we > > >> just move some actions which are triggered by the restoring of the > > >> config space > > >> back (such as vfio_msix_enable())? > > > > > > It seems that the significant benefit to enabling interrupts during > > > pre-copy would be to reduce the latency and failure potential during > > > the final phase of migration. Do we have any data for how much it adds > > > to the device contributed downtime to configure interrupts only at the > > > final stage? My guess is that it's a measurable delay on its own. At > > > the same time, we can't ignore the differences in machine specific > > > dependencies and if we don't even sync the config space once the VM is > > > stopped... this all seems not ready to call supported, especially if we > > > have concerns already about migration bit-stream compatibility. > > > > > > > I have another question for this, if we restore the config space while in > > pre-copy > > (include enabling interrupts), does it affect the _RESUMING state (paused) > > of the > > device on the dst host (cause it to send interrupts? which should not be > > allowed > > in this stage). Does the restore sequence need to be further discussed and > > reach > > a consensus(spec) (taking into account other devices and the corresponding > > actions > > of the vendor driver)? > > > > > Given our timing relative to QEMU 5.2, the only path I feel comfortable > > > with is to move forward with downgrading vfio migration support to be > > > enabled via an experimental option. Objections? Thanks, > > > > Alright, but this issue is related to our ARM GICv4.1 migration scheme, > > could you > > give a rough idea about this (where to enable interrupts, we hope it to be > > after > > the restoring of VGIC)? > > I disagree. If this is only specific
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/11/24 3:33, Neo Jia wrote: > On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote: >> External email: Use caution opening links or attachments >> >> >> On 2020/11/21 6:01, Alex Williamson wrote: >>> On Fri, 20 Nov 2020 22:05:49 +0800 >>> Shenming Lu wrote: >>> On 2020/11/20 1:41, Alex Williamson wrote: > On Thu, 19 Nov 2020 14:13:24 +0530 > Kirti Wankhede wrote: > >> On 11/14/2020 2:47 PM, Shenming Lu wrote: >>> When running VFIO migration, I found that the restoring of VFIO PCI >>> device’s >>> config space is before VGIC on ARM64 target. But generally, interrupt >>> controllers >>> need to be restored before PCI devices. >> >> Is there any other way by which VGIC can be restored before PCI device? As far as I know, it seems to have to depend on priorities in the non-iterable process. >> >>> Besides, if a VFIO PCI device is >>> configured to have directly-injected MSIs (VLPIs), the restoring of its >>> config >>> space will trigger the configuring of these VLPIs (in kernel), where it >>> would >>> return an error as I saw due to the dependency on kvm’s vgic. >>> >> >> Can this be fixed in kernel to re-initialize the kernel state? Did you mean to reconfigure these VLPIs when restoring kvm's vgic? But the fact is that this error is not caused by kernel, it is due to the incorrect calling order of qemu... >> >>> To avoid this, we can move the saving of the config space from the >>> iterable >>> process to the non-iterable process, so that it will be called after >>> VGIC >>> according to their priorities. >>> >> >> With this change, at resume side, pre-copy phase data would reach >> destination without restored config space. VFIO device on destination >> might need it's config space setup and validated before it can accept >> further VFIO device specific migration state. >> >> This also changes bit-stream, so it would break migration with original >> migration patch-set. > > Config space can continue to change while in pre-copy, if we're only > sending config space at the initiation of pre-copy, how are any changes > that might occur before the VM is stopped conveyed to the target? For > example the guest might reboot and a device returned to INTx mode from > MSI during pre-copy. Thanks, What I see is that the config space is only saved once in save_live_complete_precopy currently... As you said, a VFIO device might need it's config space setup first, and the config space can continue to change while in pre-copy, Did you mean we have to migrate the config space in save_live_iterate? However, I still have a little doubt about the restoring dependence between the qemu emulated config space and the device data... Besides, if we surely can't move the saving of the config space back, can we just move some actions which are triggered by the restoring of the config space back (such as vfio_msix_enable())? >>> >>> It seems that the significant benefit to enabling interrupts during >>> pre-copy would be to reduce the latency and failure potential during >>> the final phase of migration. Do we have any data for how much it adds >>> to the device contributed downtime to configure interrupts only at the >>> final stage? My guess is that it's a measurable delay on its own. At >>> the same time, we can't ignore the differences in machine specific >>> dependencies and if we don't even sync the config space once the VM is >>> stopped... this all seems not ready to call supported, especially if we >>> have concerns already about migration bit-stream compatibility. >>> >> >> I have another question for this, if we restore the config space while in >> pre-copy >> (include enabling interrupts), does it affect the _RESUMING state (paused) >> of the >> device on the dst host (cause it to send interrupts? which should not be >> allowed >> in this stage). Does the restore sequence need to be further discussed and >> reach >> a consensus(spec) (taking into account other devices and the corresponding >> actions >> of the vendor driver)? >> >>> Given our timing relative to QEMU 5.2, the only path I feel comfortable >>> with is to move forward with downgrading vfio migration support to be >>> enabled via an experimental option. Objections? Thanks, >> >> Alright, but this issue is related to our ARM GICv4.1 migration scheme, >> could you >> give a rough idea about this (where to enable interrupts, we hope it to be >> after >> the restoring of VGIC)? > > I disagree. If this is only specific to Huawei ARM GIC implementation, why do > we want to > make the entire VFIO based migration an experimental feature? It is not specific to Huawei ARM GIC implementation, the error was encountered in general ARM
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Mon, 23 Nov 2020 11:33:37 -0800 Neo Jia wrote: > On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote: > > External email: Use caution opening links or attachments > > > > > > On 2020/11/21 6:01, Alex Williamson wrote: > > > On Fri, 20 Nov 2020 22:05:49 +0800 > > > Shenming Lu wrote: > > > > > >> On 2020/11/20 1:41, Alex Williamson wrote: > > >>> On Thu, 19 Nov 2020 14:13:24 +0530 > > >>> Kirti Wankhede wrote: > > >>> > > On 11/14/2020 2:47 PM, Shenming Lu wrote: > > > When running VFIO migration, I found that the restoring of VFIO PCI > > > device’s > > > config space is before VGIC on ARM64 target. But generally, interrupt > > > controllers > > > need to be restored before PCI devices. > > > > Is there any other way by which VGIC can be restored before PCI > > device? > > >> > > >> As far as I know, it seems to have to depend on priorities in the > > >> non-iterable process. > > >> > > > > > Besides, if a VFIO PCI device is > > > configured to have directly-injected MSIs (VLPIs), the restoring of > > > its config > > > space will trigger the configuring of these VLPIs (in kernel), where > > > it would > > > return an error as I saw due to the dependency on kvm’s vgic. > > > > > > > Can this be fixed in kernel to re-initialize the kernel state? > > >> > > >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic? > > >> But the fact is that this error is not caused by kernel, it is due to > > >> the incorrect > > >> calling order of qemu... > > >> > > > > > To avoid this, we can move the saving of the config space from the > > > iterable > > > process to the non-iterable process, so that it will be called after > > > VGIC > > > according to their priorities. > > > > > > > With this change, at resume side, pre-copy phase data would reach > > destination without restored config space. VFIO device on destination > > might need it's config space setup and validated before it can accept > > further VFIO device specific migration state. > > > > This also changes bit-stream, so it would break migration with original > > migration patch-set. > > >>> > > >>> Config space can continue to change while in pre-copy, if we're only > > >>> sending config space at the initiation of pre-copy, how are any changes > > >>> that might occur before the VM is stopped conveyed to the target? For > > >>> example the guest might reboot and a device returned to INTx mode from > > >>> MSI during pre-copy. Thanks, > > >> > > >> What I see is that the config space is only saved once in > > >> save_live_complete_precopy > > >> currently... > > >> As you said, a VFIO device might need it's config space setup first, and > > >> the config space can continue to change while in pre-copy, Did you mean > > >> we > > >> have to migrate the config space in save_live_iterate? > > >> However, I still have a little doubt about the restoring dependence > > >> between > > >> the qemu emulated config space and the device data... > > >> > > >> Besides, if we surely can't move the saving of the config space back, > > >> can we > > >> just move some actions which are triggered by the restoring of the > > >> config space > > >> back (such as vfio_msix_enable())? > > > > > > It seems that the significant benefit to enabling interrupts during > > > pre-copy would be to reduce the latency and failure potential during > > > the final phase of migration. Do we have any data for how much it adds > > > to the device contributed downtime to configure interrupts only at the > > > final stage? My guess is that it's a measurable delay on its own. At > > > the same time, we can't ignore the differences in machine specific > > > dependencies and if we don't even sync the config space once the VM is > > > stopped... this all seems not ready to call supported, especially if we > > > have concerns already about migration bit-stream compatibility. > > > > > > > I have another question for this, if we restore the config space while in > > pre-copy > > (include enabling interrupts), does it affect the _RESUMING state (paused) > > of the > > device on the dst host (cause it to send interrupts? which should not be > > allowed > > in this stage). Does the restore sequence need to be further discussed and > > reach > > a consensus(spec) (taking into account other devices and the corresponding > > actions > > of the vendor driver)? If a target device generates an interrupt when stopped, I think that would violate the basic notion of running vs stopped in the device state, right? > > > Given our timing relative to QEMU 5.2, the only path I feel comfortable > > > with is to move forward with downgrading vfio migration support to be > > > enabled via an experimental option. Objections? Thanks, > > > > Alright, but this issue is related
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote: > External email: Use caution opening links or attachments > > > On 2020/11/21 6:01, Alex Williamson wrote: > > On Fri, 20 Nov 2020 22:05:49 +0800 > > Shenming Lu wrote: > > > >> On 2020/11/20 1:41, Alex Williamson wrote: > >>> On Thu, 19 Nov 2020 14:13:24 +0530 > >>> Kirti Wankhede wrote: > >>> > On 11/14/2020 2:47 PM, Shenming Lu wrote: > > When running VFIO migration, I found that the restoring of VFIO PCI > > device’s > > config space is before VGIC on ARM64 target. But generally, interrupt > > controllers > > need to be restored before PCI devices. > > Is there any other way by which VGIC can be restored before PCI device? > >> > >> As far as I know, it seems to have to depend on priorities in the > >> non-iterable process. > >> > > > Besides, if a VFIO PCI device is > > configured to have directly-injected MSIs (VLPIs), the restoring of its > > config > > space will trigger the configuring of these VLPIs (in kernel), where it > > would > > return an error as I saw due to the dependency on kvm’s vgic. > > > > Can this be fixed in kernel to re-initialize the kernel state? > >> > >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic? > >> But the fact is that this error is not caused by kernel, it is due to the > >> incorrect > >> calling order of qemu... > >> > > > To avoid this, we can move the saving of the config space from the > > iterable > > process to the non-iterable process, so that it will be called after > > VGIC > > according to their priorities. > > > > With this change, at resume side, pre-copy phase data would reach > destination without restored config space. VFIO device on destination > might need it's config space setup and validated before it can accept > further VFIO device specific migration state. > > This also changes bit-stream, so it would break migration with original > migration patch-set. > >>> > >>> Config space can continue to change while in pre-copy, if we're only > >>> sending config space at the initiation of pre-copy, how are any changes > >>> that might occur before the VM is stopped conveyed to the target? For > >>> example the guest might reboot and a device returned to INTx mode from > >>> MSI during pre-copy. Thanks, > >> > >> What I see is that the config space is only saved once in > >> save_live_complete_precopy > >> currently... > >> As you said, a VFIO device might need it's config space setup first, and > >> the config space can continue to change while in pre-copy, Did you mean we > >> have to migrate the config space in save_live_iterate? > >> However, I still have a little doubt about the restoring dependence between > >> the qemu emulated config space and the device data... > >> > >> Besides, if we surely can't move the saving of the config space back, can > >> we > >> just move some actions which are triggered by the restoring of the config > >> space > >> back (such as vfio_msix_enable())? > > > > It seems that the significant benefit to enabling interrupts during > > pre-copy would be to reduce the latency and failure potential during > > the final phase of migration. Do we have any data for how much it adds > > to the device contributed downtime to configure interrupts only at the > > final stage? My guess is that it's a measurable delay on its own. At > > the same time, we can't ignore the differences in machine specific > > dependencies and if we don't even sync the config space once the VM is > > stopped... this all seems not ready to call supported, especially if we > > have concerns already about migration bit-stream compatibility. > > > > I have another question for this, if we restore the config space while in > pre-copy > (include enabling interrupts), does it affect the _RESUMING state (paused) of > the > device on the dst host (cause it to send interrupts? which should not be > allowed > in this stage). Does the restore sequence need to be further discussed and > reach > a consensus(spec) (taking into account other devices and the corresponding > actions > of the vendor driver)? > > > Given our timing relative to QEMU 5.2, the only path I feel comfortable > > with is to move forward with downgrading vfio migration support to be > > enabled via an experimental option. Objections? Thanks, > > Alright, but this issue is related to our ARM GICv4.1 migration scheme, could > you > give a rough idea about this (where to enable interrupts, we hope it to be > after > the restoring of VGIC)? I disagree. If this is only specific to Huawei ARM GIC implementation, why do we want to make the entire VFIO based migration an experimental feature? Thanks, Neo > > Thanks, > Shenming
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Fri, 20 Nov 2020 15:01:46 -0700 Alex Williamson wrote: > Given our timing relative to QEMU 5.2, the only path I feel comfortable > with is to move forward with downgrading vfio migration support to be > enabled via an experimental option. Objections? Thanks, > > Alex Agreed from my side. It seems better to give it one more release to figure out some issues, and just mark it experimental for now.
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/11/21 6:01, Alex Williamson wrote: > On Fri, 20 Nov 2020 22:05:49 +0800 > Shenming Lu wrote: > >> On 2020/11/20 1:41, Alex Williamson wrote: >>> On Thu, 19 Nov 2020 14:13:24 +0530 >>> Kirti Wankhede wrote: >>> On 11/14/2020 2:47 PM, Shenming Lu wrote: > When running VFIO migration, I found that the restoring of VFIO PCI > device’s > config space is before VGIC on ARM64 target. But generally, interrupt > controllers > need to be restored before PCI devices. Is there any other way by which VGIC can be restored before PCI device? >> >> As far as I know, it seems to have to depend on priorities in the >> non-iterable process. >> > Besides, if a VFIO PCI device is > configured to have directly-injected MSIs (VLPIs), the restoring of its > config > space will trigger the configuring of these VLPIs (in kernel), where it > would > return an error as I saw due to the dependency on kvm’s vgic. > Can this be fixed in kernel to re-initialize the kernel state? >> >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic? >> But the fact is that this error is not caused by kernel, it is due to the >> incorrect >> calling order of qemu... >> > To avoid this, we can move the saving of the config space from the > iterable > process to the non-iterable process, so that it will be called after VGIC > according to their priorities. > With this change, at resume side, pre-copy phase data would reach destination without restored config space. VFIO device on destination might need it's config space setup and validated before it can accept further VFIO device specific migration state. This also changes bit-stream, so it would break migration with original migration patch-set. >>> >>> Config space can continue to change while in pre-copy, if we're only >>> sending config space at the initiation of pre-copy, how are any changes >>> that might occur before the VM is stopped conveyed to the target? For >>> example the guest might reboot and a device returned to INTx mode from >>> MSI during pre-copy. Thanks, >> >> What I see is that the config space is only saved once in >> save_live_complete_precopy >> currently... >> As you said, a VFIO device might need it's config space setup first, and >> the config space can continue to change while in pre-copy, Did you mean we >> have to migrate the config space in save_live_iterate? >> However, I still have a little doubt about the restoring dependence between >> the qemu emulated config space and the device data... >> >> Besides, if we surely can't move the saving of the config space back, can we >> just move some actions which are triggered by the restoring of the config >> space >> back (such as vfio_msix_enable())? > > It seems that the significant benefit to enabling interrupts during > pre-copy would be to reduce the latency and failure potential during > the final phase of migration. Do we have any data for how much it adds > to the device contributed downtime to configure interrupts only at the > final stage? My guess is that it's a measurable delay on its own. At > the same time, we can't ignore the differences in machine specific > dependencies and if we don't even sync the config space once the VM is > stopped... this all seems not ready to call supported, especially if we > have concerns already about migration bit-stream compatibility. > I have another question for this, if we restore the config space while in pre-copy (include enabling interrupts), does it affect the _RESUMING state (paused) of the device on the dst host (cause it to send interrupts? which should not be allowed in this stage). Does the restore sequence need to be further discussed and reach a consensus(spec) (taking into account other devices and the corresponding actions of the vendor driver)? > Given our timing relative to QEMU 5.2, the only path I feel comfortable > with is to move forward with downgrading vfio migration support to be > enabled via an experimental option. Objections? Thanks, Alright, but this issue is related to our ARM GICv4.1 migration scheme, could you give a rough idea about this (where to enable interrupts, we hope it to be after the restoring of VGIC)? Thanks, Shenming
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Fri, 20 Nov 2020 22:05:49 +0800 Shenming Lu wrote: > On 2020/11/20 1:41, Alex Williamson wrote: > > On Thu, 19 Nov 2020 14:13:24 +0530 > > Kirti Wankhede wrote: > > > >> On 11/14/2020 2:47 PM, Shenming Lu wrote: > >>> When running VFIO migration, I found that the restoring of VFIO PCI > >>> device’s > >>> config space is before VGIC on ARM64 target. But generally, interrupt > >>> controllers > >>> need to be restored before PCI devices. > >> > >> Is there any other way by which VGIC can be restored before PCI device? > > As far as I know, it seems to have to depend on priorities in the > non-iterable process. > > >> > >>> Besides, if a VFIO PCI device is > >>> configured to have directly-injected MSIs (VLPIs), the restoring of its > >>> config > >>> space will trigger the configuring of these VLPIs (in kernel), where it > >>> would > >>> return an error as I saw due to the dependency on kvm’s vgic. > >>> > >> > >> Can this be fixed in kernel to re-initialize the kernel state? > > Did you mean to reconfigure these VLPIs when restoring kvm's vgic? > But the fact is that this error is not caused by kernel, it is due to the > incorrect > calling order of qemu... > > >> > >>> To avoid this, we can move the saving of the config space from the > >>> iterable > >>> process to the non-iterable process, so that it will be called after VGIC > >>> according to their priorities. > >>> > >> > >> With this change, at resume side, pre-copy phase data would reach > >> destination without restored config space. VFIO device on destination > >> might need it's config space setup and validated before it can accept > >> further VFIO device specific migration state. > >> > >> This also changes bit-stream, so it would break migration with original > >> migration patch-set. > > > > Config space can continue to change while in pre-copy, if we're only > > sending config space at the initiation of pre-copy, how are any changes > > that might occur before the VM is stopped conveyed to the target? For > > example the guest might reboot and a device returned to INTx mode from > > MSI during pre-copy. Thanks, > > What I see is that the config space is only saved once in > save_live_complete_precopy > currently... > As you said, a VFIO device might need it's config space setup first, and > the config space can continue to change while in pre-copy, Did you mean we > have to migrate the config space in save_live_iterate? > However, I still have a little doubt about the restoring dependence between > the qemu emulated config space and the device data... > > Besides, if we surely can't move the saving of the config space back, can we > just move some actions which are triggered by the restoring of the config > space > back (such as vfio_msix_enable())? It seems that the significant benefit to enabling interrupts during pre-copy would be to reduce the latency and failure potential during the final phase of migration. Do we have any data for how much it adds to the device contributed downtime to configure interrupts only at the final stage? My guess is that it's a measurable delay on its own. At the same time, we can't ignore the differences in machine specific dependencies and if we don't even sync the config space once the VM is stopped... this all seems not ready to call supported, especially if we have concerns already about migration bit-stream compatibility. Given our timing relative to QEMU 5.2, the only path I feel comfortable with is to move forward with downgrading vfio migration support to be enabled via an experimental option. Objections? Thanks, Alex > > The demo patch is as follows (but it seems that .save_state is not a > appropriate > place to do it. -_-): > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 55261562d4..9cf0310148 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -44,6 +44,7 @@ > #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xef12ULL) > #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xef13ULL) > #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL) > +#define VFIO_MIG_FLAG_DEV_STATE_TRIGGER (0xef15ULL) > > static int64_t bytes_transferred; > > @@ -368,6 +369,15 @@ static int vfio_save_device_config_state(QEMUFile *f, > void *opaque) > return qemu_file_get_error(f); > } > > +static void vfio_device_state_trigger(void *opaque) > +{ > +VFIODevice *vbasedev = opaque; > + > +if (vbasedev->ops && vbasedev->ops->vfio_trigger_config) { > +vbasedev->ops->vfio_trigger_config(vbasedev); > +} > +} > + > static int vfio_load_device_config_state(QEMUFile *f, void *opaque) > { > VFIODevice *vbasedev = opaque; > @@ -620,6 +630,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void > *opaque) > return ret; > } > > +static void vfio_save_state_trigger(QEMUFile *f, void *opaque) > +{ > +qemu_put_be64(f,
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 2020/11/20 1:41, Alex Williamson wrote: > On Thu, 19 Nov 2020 14:13:24 +0530 > Kirti Wankhede wrote: > >> On 11/14/2020 2:47 PM, Shenming Lu wrote: >>> When running VFIO migration, I found that the restoring of VFIO PCI device’s >>> config space is before VGIC on ARM64 target. But generally, interrupt >>> controllers >>> need to be restored before PCI devices. >> >> Is there any other way by which VGIC can be restored before PCI device? As far as I know, it seems to have to depend on priorities in the non-iterable process. >> >>> Besides, if a VFIO PCI device is >>> configured to have directly-injected MSIs (VLPIs), the restoring of its >>> config >>> space will trigger the configuring of these VLPIs (in kernel), where it >>> would >>> return an error as I saw due to the dependency on kvm’s vgic. >>> >> >> Can this be fixed in kernel to re-initialize the kernel state? Did you mean to reconfigure these VLPIs when restoring kvm's vgic? But the fact is that this error is not caused by kernel, it is due to the incorrect calling order of qemu... >> >>> To avoid this, we can move the saving of the config space from the iterable >>> process to the non-iterable process, so that it will be called after VGIC >>> according to their priorities. >>> >> >> With this change, at resume side, pre-copy phase data would reach >> destination without restored config space. VFIO device on destination >> might need it's config space setup and validated before it can accept >> further VFIO device specific migration state. >> >> This also changes bit-stream, so it would break migration with original >> migration patch-set. > > Config space can continue to change while in pre-copy, if we're only > sending config space at the initiation of pre-copy, how are any changes > that might occur before the VM is stopped conveyed to the target? For > example the guest might reboot and a device returned to INTx mode from > MSI during pre-copy. Thanks, What I see is that the config space is only saved once in save_live_complete_precopy currently... As you said, a VFIO device might need it's config space setup first, and the config space can continue to change while in pre-copy, Did you mean we have to migrate the config space in save_live_iterate? However, I still have a little doubt about the restoring dependence between the qemu emulated config space and the device data... Besides, if we surely can't move the saving of the config space back, can we just move some actions which are triggered by the restoring of the config space back (such as vfio_msix_enable())? The demo patch is as follows (but it seems that .save_state is not a appropriate place to do it. -_-): diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 55261562d4..9cf0310148 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -44,6 +44,7 @@ #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xef12ULL) #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xef13ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL) +#define VFIO_MIG_FLAG_DEV_STATE_TRIGGER (0xef15ULL) static int64_t bytes_transferred; @@ -368,6 +369,15 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque) return qemu_file_get_error(f); } +static void vfio_device_state_trigger(void *opaque) +{ +VFIODevice *vbasedev = opaque; + +if (vbasedev->ops && vbasedev->ops->vfio_trigger_config) { +vbasedev->ops->vfio_trigger_config(vbasedev); +} +} + static int vfio_load_device_config_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; @@ -620,6 +630,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } +static void vfio_save_state_trigger(QEMUFile *f, void *opaque) +{ +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_STATE_TRIGGER); + +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); +} + static int vfio_load_setup(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; @@ -700,6 +717,18 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) } break; } +case VFIO_MIG_FLAG_DEV_STATE_TRIGGER: +{ +vfio_device_state_trigger(opaque); +data = qemu_get_be64(f); +if (data == VFIO_MIG_FLAG_END_OF_STATE) { +return ret; +} else { +error_report("%s: STATE TRIGGER: EOS not found 0x%"PRIx64, + vbasedev->name, data); +return -EINVAL; +} +} default: error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data); return -EINVAL; @@ -720,6 +749,7 @@ static SaveVMHandlers savevm_vfio_handlers = { .save_live_pending = vfio_save_pending, .save_live_iterate = vfio_save_iterate, .save_live_complete_precopy = vfio_save_complete_precopy, +.save_state = vfio_save_state_trigger, .load_setup =
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On Thu, 19 Nov 2020 14:13:24 +0530 Kirti Wankhede wrote: > On 11/14/2020 2:47 PM, Shenming Lu wrote: > > When running VFIO migration, I found that the restoring of VFIO PCI device’s > > config space is before VGIC on ARM64 target. But generally, interrupt > > controllers > > need to be restored before PCI devices. > > Is there any other way by which VGIC can be restored before PCI device? > > > Besides, if a VFIO PCI device is > > configured to have directly-injected MSIs (VLPIs), the restoring of its > > config > > space will trigger the configuring of these VLPIs (in kernel), where it > > would > > return an error as I saw due to the dependency on kvm’s vgic. > > > > Can this be fixed in kernel to re-initialize the kernel state? > > > To avoid this, we can move the saving of the config space from the iterable > > process to the non-iterable process, so that it will be called after VGIC > > according to their priorities. > > > > With this change, at resume side, pre-copy phase data would reach > destination without restored config space. VFIO device on destination > might need it's config space setup and validated before it can accept > further VFIO device specific migration state. > > This also changes bit-stream, so it would break migration with original > migration patch-set. Config space can continue to change while in pre-copy, if we're only sending config space at the initiation of pre-copy, how are any changes that might occur before the VM is stopped conveyed to the target? For example the guest might reboot and a device returned to INTx mode from MSI during pre-copy. Thanks, Alex > > Signed-off-by: Shenming Lu > > --- > > hw/vfio/migration.c | 22 ++ > > 1 file changed, 6 insertions(+), 16 deletions(-) > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > index 3ce285ea39..028da35a25 100644 > > --- a/hw/vfio/migration.c > > +++ b/hw/vfio/migration.c > > @@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev) > > return 0; > > } > > > > -static int vfio_save_device_config_state(QEMUFile *f, void *opaque) > > +static void vfio_save_device_config_state(QEMUFile *f, void *opaque) > > { > > VFIODevice *vbasedev = opaque; > > > > @@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, > > void *opaque) > > > > trace_vfio_save_device_config_state(vbasedev->name); > > > > -return qemu_file_get_error(f); > > +if (qemu_file_get_error(f)) > > +error_report("%s: Failed to save device config space", > > + vbasedev->name); > > } > > > > static int vfio_load_device_config_state(QEMUFile *f, void *opaque) > > { > > VFIODevice *vbasedev = opaque; > > -uint64_t data; > > > > if (vbasedev->ops && vbasedev->ops->vfio_load_config) { > > int ret; > > @@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, > > void *opaque) > > } > > } > > > > -data = qemu_get_be64(f); > > -if (data != VFIO_MIG_FLAG_END_OF_STATE) { > > -error_report("%s: Failed loading device config space, " > > - "end flag incorrect 0x%"PRIx64, vbasedev->name, data); > > -return -EINVAL; > > -} > > - > > trace_vfio_load_device_config_state(vbasedev->name); > > -return qemu_file_get_error(f); > > +return 0; > > } > > > > static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start) > > @@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, > > void *opaque) > > return ret; > > } > > > > -ret = vfio_save_device_config_state(f, opaque); > > -if (ret) { > > -return ret; > > -} > > - > > ret = vfio_update_pending(vbasedev); > > if (ret) { > > return ret; > > @@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = { > > .save_live_pending = vfio_save_pending, > > .save_live_iterate = vfio_save_iterate, > > .save_live_complete_precopy = vfio_save_complete_precopy, > > +.save_state = vfio_save_device_config_state, > > .load_setup = vfio_load_setup, > > .load_cleanup = vfio_load_cleanup, > > .load_state = vfio_load_state, > > >
Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
On 11/14/2020 2:47 PM, Shenming Lu wrote: When running VFIO migration, I found that the restoring of VFIO PCI device’s config space is before VGIC on ARM64 target. But generally, interrupt controllers need to be restored before PCI devices. Is there any other way by which VGIC can be restored before PCI device? Besides, if a VFIO PCI device is configured to have directly-injected MSIs (VLPIs), the restoring of its config space will trigger the configuring of these VLPIs (in kernel), where it would return an error as I saw due to the dependency on kvm’s vgic. Can this be fixed in kernel to re-initialize the kernel state? To avoid this, we can move the saving of the config space from the iterable process to the non-iterable process, so that it will be called after VGIC according to their priorities. With this change, at resume side, pre-copy phase data would reach destination without restored config space. VFIO device on destination might need it's config space setup and validated before it can accept further VFIO device specific migration state. This also changes bit-stream, so it would break migration with original migration patch-set. Thanks, Kirti Signed-off-by: Shenming Lu --- hw/vfio/migration.c | 22 ++ 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 3ce285ea39..028da35a25 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev) return 0; } -static int vfio_save_device_config_state(QEMUFile *f, void *opaque) +static void vfio_save_device_config_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; @@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque) trace_vfio_save_device_config_state(vbasedev->name); -return qemu_file_get_error(f); +if (qemu_file_get_error(f)) +error_report("%s: Failed to save device config space", + vbasedev->name); } static int vfio_load_device_config_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; -uint64_t data; if (vbasedev->ops && vbasedev->ops->vfio_load_config) { int ret; @@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) } } -data = qemu_get_be64(f); -if (data != VFIO_MIG_FLAG_END_OF_STATE) { -error_report("%s: Failed loading device config space, " - "end flag incorrect 0x%"PRIx64, vbasedev->name, data); -return -EINVAL; -} - trace_vfio_load_device_config_state(vbasedev->name); -return qemu_file_get_error(f); +return 0; } static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start) @@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } -ret = vfio_save_device_config_state(f, opaque); -if (ret) { -return ret; -} - ret = vfio_update_pending(vbasedev); if (ret) { return ret; @@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = { .save_live_pending = vfio_save_pending, .save_live_iterate = vfio_save_iterate, .save_live_complete_precopy = vfio_save_complete_precopy, +.save_state = vfio_save_device_config_state, .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, .load_state = vfio_load_state,
[PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration
When running VFIO migration, I found that the restoring of VFIO PCI device’s config space is before VGIC on ARM64 target. But generally, interrupt controllers need to be restored before PCI devices. Besides, if a VFIO PCI device is configured to have directly-injected MSIs (VLPIs), the restoring of its config space will trigger the configuring of these VLPIs (in kernel), where it would return an error as I saw due to the dependency on kvm’s vgic. To avoid this, we can move the saving of the config space from the iterable process to the non-iterable process, so that it will be called after VGIC according to their priorities. Signed-off-by: Shenming Lu --- hw/vfio/migration.c | 22 ++ 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 3ce285ea39..028da35a25 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev) return 0; } -static int vfio_save_device_config_state(QEMUFile *f, void *opaque) +static void vfio_save_device_config_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; @@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque) trace_vfio_save_device_config_state(vbasedev->name); -return qemu_file_get_error(f); +if (qemu_file_get_error(f)) +error_report("%s: Failed to save device config space", + vbasedev->name); } static int vfio_load_device_config_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; -uint64_t data; if (vbasedev->ops && vbasedev->ops->vfio_load_config) { int ret; @@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) } } -data = qemu_get_be64(f); -if (data != VFIO_MIG_FLAG_END_OF_STATE) { -error_report("%s: Failed loading device config space, " - "end flag incorrect 0x%"PRIx64, vbasedev->name, data); -return -EINVAL; -} - trace_vfio_load_device_config_state(vbasedev->name); -return qemu_file_get_error(f); +return 0; } static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start) @@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } -ret = vfio_save_device_config_state(f, opaque); -if (ret) { -return ret; -} - ret = vfio_update_pending(vbasedev); if (ret) { return ret; @@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = { .save_live_pending = vfio_save_pending, .save_live_iterate = vfio_save_iterate, .save_live_complete_precopy = vfio_save_complete_precopy, +.save_state = vfio_save_device_config_state, .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, .load_state = vfio_load_state, -- 2.19.1