Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-12-04 Thread Shenming Lu



On 2020/12/2 18:55, Dr. David Alan Gilbert wrote:
> * Shenming Lu (lushenm...@huawei.com) wrote:
>> Hi,
>>
>> After reading everyone's opinions, we have a rough idea for this issue.
>>
>> One key point is whether it is necessary to setup the config space before
>> the device can accept further migration data. I think it is decided by
>> the vendor driver, so we can simply ask the vendor driver about it in
>> .save_setup, which could avoid a lot of unnecessary copies and settings.
>> Once we have known the need, we can iterate the config space (before)
>> along with the device migration data in .save_live_iterate and
>> .save_live_complete_precopy, and if not needed, we can only migrate the
>> config space in .save_state.
>>
>> Another key point is that the interrupt enabling should be after the
>> restoring of the interrupt controller (might not only interrupts).
>> My solution is to add a subflag at the beginning of the config data
>> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
>> actions on the dst (such as whether to enable interrupts).
>>
>> Below is it's workflow.
>>
>> On the save path:
>>  In vfio_save_setup():
>>  Ask the vendor driver if it needs the config space setup before it
>>  can accept further migration data.
> 
> You could argue that you could always send an initial copy of the config
> data at the start; and then send new versions at the end anyway.
> But is this just about interrupts?

In the current implementation, the effect of the config space is mainly about
interrupts (SET_IRQS ioctl).

> Is the dst going to interpret the
> received migration data differently depending on the config data?

I also have a little doubt about the dependency... But not sure.

> That
> would seem dangerous if the config data was to change during the
> migration.
> 
> Dave
> 
>>  |
>>  In vfio_save_iterate() (pre-copy):
>>  If *needed*, save the config space which would be setup on the dst
>>  before the migration data, but send with a subflag to instruct not
>>  to (such as) enable interrupts.
>>  |
>>  In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>>  The same as that in vfio_save_iterate().
>>  |
>>  In .save_state (stop-and-copy, non-iterable process):
>>  If *needed*, only send a subflag to instruct to enable interrupts.
>>  If *not needed*, save the config space and setup everything on the dst.
>>
>> Besides the above idea, we might be able to choose to let the vendor driver 
>> do
>> more: qemu just sends and writes the config data (before) along with the 
>> device
>> migration data every time, and it's up to the vendor driver to filter 
>> out/buffer
>> the received data or reorder the settings...
>>
>> Thanks,
>> Shenming
>>



Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-12-03 Thread Shenming Lu
On 2020/12/2 6:21, Alex Williamson wrote:
> On Tue, 1 Dec 2020 14:37:52 +0800
> Shenming Lu  wrote:
> 
>> On 2020/12/1 1:03, Alex Williamson wrote:
>>> On Thu, 26 Nov 2020 14:56:17 +0800
>>> Shenming Lu  wrote:
>>>   
 Hi,

 After reading everyone's opinions, we have a rough idea for this issue.

 One key point is whether it is necessary to setup the config space before
 the device can accept further migration data. I think it is decided by
 the vendor driver, so we can simply ask the vendor driver about it in
 .save_setup, which could avoid a lot of unnecessary copies and settings.
 Once we have known the need, we can iterate the config space (before)
 along with the device migration data in .save_live_iterate and
 .save_live_complete_precopy, and if not needed, we can only migrate the
 config space in .save_state.

 Another key point is that the interrupt enabling should be after the
 restoring of the interrupt controller (might not only interrupts).
 My solution is to add a subflag at the beginning of the config data
 (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
 actions on the dst (such as whether to enable interrupts).

 Below is it's workflow.

 On the save path:
In vfio_save_setup():
Ask the vendor driver if it needs the config space setup before it
can accept further migration data.  
>>>
>>> How does "ask the vendor driver" actually work?  
>>
>> Maybe via a ioctl?
>> Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup)
>> and send the config data (before) along with the device migration data
>> every time?...
> 
> Migration data streams are unidirectional, otherwise we break offline
> migration.  There are various ways other than an ioctl for a vendor
> driver to advertise requirements, ex. a capability on the migration
> region info, but it's not clear to me that we really need this info yet.

Ok, this is not clear yet.

> 
|
In vfio_save_iterate() (pre-copy):
If *needed*, save the config space which would be setup on the dst
before the migration data, but send with a subflag to instruct not
to (such as) enable interrupts.  
>>>
>>> If not for triggering things like MSI/X configuration, isn't config
>>> space almost entirely virtual?  What visibility does the vendor driver
>>> have to the VM machine dependencies regarding device interrupt versus
>>> interrupt controller migration?  
>>
>> My thought is that the vendor driver only decides the order of the config
>> space setup and the receiving of the migration data, but leaves the VM
>> machine dependencies to QEMU.
> 
> But again, config space is largely virtual, the vendor driver doesn't
> have access to it, it's only the effects of config space, like
> representing the interrupt mode and configuring it on the device or
> specific registers that QEMU writes through that the vendor driver
> sees.  So how is it that the vendor driver decides the order?

As you said above, a vendor driver could advertise its requirements via
the migration region...

> The
> vendor driver doesn't have visibility to the VM machine dependencies,
> like when the interrupt controller is sufficiently configured to enable
> device interrupts.  It seems that a vendor driver that depends on QEMU
> enabling interrupts at a specific point is inherently dependent on
> assumptions in the machine configuration.

Yeah, there might be more than one dependency.

> 
|
In vfio_save_complete_precopy() (stop-and-copy, iterable process):
The same as that in vfio_save_iterate().
|
In .save_state (stop-and-copy, non-iterable process):
If *needed*, only send a subflag to instruct to enable interrupts.
If *not needed*, save the config space and setup everything on the dst. 
  
>>>
>>> Again, how does the vendor driver have visibility to know when the VM
>>> machine can enable interrupts?  
>>
>> It seems troubling if the vendor driver needs the interrupts to be enabled
>> first...
> 
> Yes.
> 
 Besides the above idea, we might be able to choose to let the vendor 
 driver do
 more: qemu just sends and writes the config data (before) along with the 
 device
 migration data every time, and it's up to the vendor driver to filter 
 out/buffer
 the received data or reorder the settings...  
>>>
>>> There is no vendor driver in QEMU though, so are you suggesting that
>>> QEMU follows a standard protocol and the vendor driver chooses when to
>>> enable specific features?  For instance, QEMU would call SET_IRQS and
>>> the driver would return success, but defer that setup if necessary?
>>> That seems quite troubling as we then have ioctls that behave
>>> differently depending on the device state and we have no error path to
>>> userspace should that setup fail later.  The vendor driver 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-12-02 Thread Dr. David Alan Gilbert
* Shenming Lu (lushenm...@huawei.com) wrote:
> Hi,
> 
> After reading everyone's opinions, we have a rough idea for this issue.
> 
> One key point is whether it is necessary to setup the config space before
> the device can accept further migration data. I think it is decided by
> the vendor driver, so we can simply ask the vendor driver about it in
> .save_setup, which could avoid a lot of unnecessary copies and settings.
> Once we have known the need, we can iterate the config space (before)
> along with the device migration data in .save_live_iterate and
> .save_live_complete_precopy, and if not needed, we can only migrate the
> config space in .save_state.
> 
> Another key point is that the interrupt enabling should be after the
> restoring of the interrupt controller (might not only interrupts).
> My solution is to add a subflag at the beginning of the config data
> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
> actions on the dst (such as whether to enable interrupts).
> 
> Below is it's workflow.
> 
> On the save path:
>   In vfio_save_setup():
>   Ask the vendor driver if it needs the config space setup before it
>   can accept further migration data.

You could argue that you could always send an initial copy of the config
data at the start; and then send new versions at the end anyway.
But is this just about interrupts? Is the dst going to interpret the
received migration data differently depending on the config data?  That
would seem dangerous if the config data was to change during the
migration.

Dave

>   |
>   In vfio_save_iterate() (pre-copy):
>   If *needed*, save the config space which would be setup on the dst
>   before the migration data, but send with a subflag to instruct not
>   to (such as) enable interrupts.
>   |
>   In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>   The same as that in vfio_save_iterate().
>   |
>   In .save_state (stop-and-copy, non-iterable process):
>   If *needed*, only send a subflag to instruct to enable interrupts.
>   If *not needed*, save the config space and setup everything on the dst.
> 
> Besides the above idea, we might be able to choose to let the vendor driver do
> more: qemu just sends and writes the config data (before) along with the 
> device
> migration data every time, and it's up to the vendor driver to filter 
> out/buffer
> the received data or reorder the settings...
> 
> Thanks,
> Shenming
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-12-01 Thread Alex Williamson
On Tue, 1 Dec 2020 14:37:52 +0800
Shenming Lu  wrote:

> On 2020/12/1 1:03, Alex Williamson wrote:
> > On Thu, 26 Nov 2020 14:56:17 +0800
> > Shenming Lu  wrote:
> >   
> >> Hi,
> >>
> >> After reading everyone's opinions, we have a rough idea for this issue.
> >>
> >> One key point is whether it is necessary to setup the config space before
> >> the device can accept further migration data. I think it is decided by
> >> the vendor driver, so we can simply ask the vendor driver about it in
> >> .save_setup, which could avoid a lot of unnecessary copies and settings.
> >> Once we have known the need, we can iterate the config space (before)
> >> along with the device migration data in .save_live_iterate and
> >> .save_live_complete_precopy, and if not needed, we can only migrate the
> >> config space in .save_state.
> >>
> >> Another key point is that the interrupt enabling should be after the
> >> restoring of the interrupt controller (might not only interrupts).
> >> My solution is to add a subflag at the beginning of the config data
> >> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
> >> actions on the dst (such as whether to enable interrupts).
> >>
> >> Below is it's workflow.
> >>
> >> On the save path:
> >>In vfio_save_setup():
> >>Ask the vendor driver if it needs the config space setup before it
> >>can accept further migration data.  
> > 
> > How does "ask the vendor driver" actually work?  
> 
> Maybe via a ioctl?
> Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup)
> and send the config data (before) along with the device migration data
> every time?...

Migration data streams are unidirectional, otherwise we break offline
migration.  There are various ways other than an ioctl for a vendor
driver to advertise requirements, ex. a capability on the migration
region info, but it's not clear to me that we really need this info yet.

> >>|
> >>In vfio_save_iterate() (pre-copy):
> >>If *needed*, save the config space which would be setup on the dst
> >>before the migration data, but send with a subflag to instruct not
> >>to (such as) enable interrupts.  
> > 
> > If not for triggering things like MSI/X configuration, isn't config
> > space almost entirely virtual?  What visibility does the vendor driver
> > have to the VM machine dependencies regarding device interrupt versus
> > interrupt controller migration?  
> 
> My thought is that the vendor driver only decides the order of the config
> space setup and the receiving of the migration data, but leaves the VM
> machine dependencies to QEMU.

But again, config space is largely virtual, the vendor driver doesn't
have access to it, it's only the effects of config space, like
representing the interrupt mode and configuring it on the device or
specific registers that QEMU writes through that the vendor driver
sees.  So how is it that the vendor driver decides the order?  The
vendor driver doesn't have visibility to the VM machine dependencies,
like when the interrupt controller is sufficiently configured to enable
device interrupts.  It seems that a vendor driver that depends on QEMU
enabling interrupts at a specific point is inherently dependent on
assumptions in the machine configuration.

> >>|
> >>In vfio_save_complete_precopy() (stop-and-copy, iterable process):
> >>The same as that in vfio_save_iterate().
> >>|
> >>In .save_state (stop-and-copy, non-iterable process):
> >>If *needed*, only send a subflag to instruct to enable interrupts.
> >>If *not needed*, save the config space and setup everything on the dst. 
> >>  
> > 
> > Again, how does the vendor driver have visibility to know when the VM
> > machine can enable interrupts?  
> 
> It seems troubling if the vendor driver needs the interrupts to be enabled
> first...

Yes.

> >> Besides the above idea, we might be able to choose to let the vendor 
> >> driver do
> >> more: qemu just sends and writes the config data (before) along with the 
> >> device
> >> migration data every time, and it's up to the vendor driver to filter 
> >> out/buffer
> >> the received data or reorder the settings...  
> > 
> > There is no vendor driver in QEMU though, so are you suggesting that
> > QEMU follows a standard protocol and the vendor driver chooses when to
> > enable specific features?  For instance, QEMU would call SET_IRQS and
> > the driver would return success, but defer that setup if necessary?
> > That seems quite troubling as we then have ioctls that behave
> > differently depending on the device state and we have no error path to
> > userspace should that setup fail later.  The vendor driver does have
> > its own data stream for migration, so the vendor driver could tell the
> > destination version of itself what type of interrupt to use, which
> > might be sufficient if we were to ignore the latency if QEMU were to
> > defer interrupt setup until stop-and-copy.  

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-30 Thread Shenming Lu
On 2020/12/1 1:03, Alex Williamson wrote:
> On Thu, 26 Nov 2020 14:56:17 +0800
> Shenming Lu  wrote:
> 
>> Hi,
>>
>> After reading everyone's opinions, we have a rough idea for this issue.
>>
>> One key point is whether it is necessary to setup the config space before
>> the device can accept further migration data. I think it is decided by
>> the vendor driver, so we can simply ask the vendor driver about it in
>> .save_setup, which could avoid a lot of unnecessary copies and settings.
>> Once we have known the need, we can iterate the config space (before)
>> along with the device migration data in .save_live_iterate and
>> .save_live_complete_precopy, and if not needed, we can only migrate the
>> config space in .save_state.
>>
>> Another key point is that the interrupt enabling should be after the
>> restoring of the interrupt controller (might not only interrupts).
>> My solution is to add a subflag at the beginning of the config data
>> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
>> actions on the dst (such as whether to enable interrupts).
>>
>> Below is it's workflow.
>>
>> On the save path:
>>  In vfio_save_setup():
>>  Ask the vendor driver if it needs the config space setup before it
>>  can accept further migration data.
> 
> How does "ask the vendor driver" actually work?

Maybe via a ioctl?
Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup)
and send the config data (before) along with the device migration data
every time?...

> 
>>  |
>>  In vfio_save_iterate() (pre-copy):
>>  If *needed*, save the config space which would be setup on the dst
>>  before the migration data, but send with a subflag to instruct not
>>  to (such as) enable interrupts.
> 
> If not for triggering things like MSI/X configuration, isn't config
> space almost entirely virtual?  What visibility does the vendor driver
> have to the VM machine dependencies regarding device interrupt versus
> interrupt controller migration?

My thought is that the vendor driver only decides the order of the config
space setup and the receiving of the migration data, but leaves the VM
machine dependencies to QEMU.

> 
>>  |
>>  In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>>  The same as that in vfio_save_iterate().
>>  |
>>  In .save_state (stop-and-copy, non-iterable process):
>>  If *needed*, only send a subflag to instruct to enable interrupts.
>>  If *not needed*, save the config space and setup everything on the dst.
> 
> Again, how does the vendor driver have visibility to know when the VM
> machine can enable interrupts?

It seems troubling if the vendor driver needs the interrupts to be enabled
first...

> 
>>
>> Besides the above idea, we might be able to choose to let the vendor driver 
>> do
>> more: qemu just sends and writes the config data (before) along with the 
>> device
>> migration data every time, and it's up to the vendor driver to filter 
>> out/buffer
>> the received data or reorder the settings...
> 
> There is no vendor driver in QEMU though, so are you suggesting that
> QEMU follows a standard protocol and the vendor driver chooses when to
> enable specific features?  For instance, QEMU would call SET_IRQS and
> the driver would return success, but defer that setup if necessary?
> That seems quite troubling as we then have ioctls that behave
> differently depending on the device state and we have no error path to
> userspace should that setup fail later.  The vendor driver does have
> its own data stream for migration, so the vendor driver could tell the
> destination version of itself what type of interrupt to use, which
> might be sufficient if we were to ignore the latency if QEMU were to
> defer interrupt setup until stop-and-copy.

Did you mean that we could only enable MSI-X during the iterable phase, but
leave the setup of these unmasked vectors to the non-iterable phase?
It looks good to me.

> 
> Is the question of when to setup device interrupts versus the interrupt
> controller state largely a machine issue within QEMU?  If so, shouldn't
> it be at QEMU's determination when to act on the config space
> information on the target?

I think it would be simpler if ensuring the proper calling order in QEMU...

> IOW, if a vendor driver has a dependency on
> interrupt configuration, they need to include it in their own pre-copy
> data stream and decouple that dependency from userspace interrupt
> configuration via the SET_IRQS ioctl.  Is that possible?  Thanks,
> 

I don't understand what the decoupling that dependency from userspace
interrupt configuration means...

Thanks,
Shenming

> Alex
> 
> .
> 



Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-30 Thread Alex Williamson
On Thu, 26 Nov 2020 14:56:17 +0800
Shenming Lu  wrote:

> Hi,
> 
> After reading everyone's opinions, we have a rough idea for this issue.
> 
> One key point is whether it is necessary to setup the config space before
> the device can accept further migration data. I think it is decided by
> the vendor driver, so we can simply ask the vendor driver about it in
> .save_setup, which could avoid a lot of unnecessary copies and settings.
> Once we have known the need, we can iterate the config space (before)
> along with the device migration data in .save_live_iterate and
> .save_live_complete_precopy, and if not needed, we can only migrate the
> config space in .save_state.
> 
> Another key point is that the interrupt enabling should be after the
> restoring of the interrupt controller (might not only interrupts).
> My solution is to add a subflag at the beginning of the config data
> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
> actions on the dst (such as whether to enable interrupts).
> 
> Below is it's workflow.
> 
> On the save path:
>   In vfio_save_setup():
>   Ask the vendor driver if it needs the config space setup before it
>   can accept further migration data.

How does "ask the vendor driver" actually work?

>   |
>   In vfio_save_iterate() (pre-copy):
>   If *needed*, save the config space which would be setup on the dst
>   before the migration data, but send with a subflag to instruct not
>   to (such as) enable interrupts.

If not for triggering things like MSI/X configuration, isn't config
space almost entirely virtual?  What visibility does the vendor driver
have to the VM machine dependencies regarding device interrupt versus
interrupt controller migration?

>   |
>   In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>   The same as that in vfio_save_iterate().
>   |
>   In .save_state (stop-and-copy, non-iterable process):
>   If *needed*, only send a subflag to instruct to enable interrupts.
>   If *not needed*, save the config space and setup everything on the dst.

Again, how does the vendor driver have visibility to know when the VM
machine can enable interrupts?

> 
> Besides the above idea, we might be able to choose to let the vendor driver do
> more: qemu just sends and writes the config data (before) along with the 
> device
> migration data every time, and it's up to the vendor driver to filter 
> out/buffer
> the received data or reorder the settings...

There is no vendor driver in QEMU though, so are you suggesting that
QEMU follows a standard protocol and the vendor driver chooses when to
enable specific features?  For instance, QEMU would call SET_IRQS and
the driver would return success, but defer that setup if necessary?
That seems quite troubling as we then have ioctls that behave
differently depending on the device state and we have no error path to
userspace should that setup fail later.  The vendor driver does have
its own data stream for migration, so the vendor driver could tell the
destination version of itself what type of interrupt to use, which
might be sufficient if we were to ignore the latency if QEMU were to
defer interrupt setup until stop-and-copy.

Is the question of when to setup device interrupts versus the interrupt
controller state largely a machine issue within QEMU?  If so, shouldn't
it be at QEMU's determination when to act on the config space
information on the target?  IOW, if a vendor driver has a dependency on
interrupt configuration, they need to include it in their own pre-copy
data stream and decouple that dependency from userspace interrupt
configuration via the SET_IRQS ioctl.  Is that possible?  Thanks,

Alex




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-25 Thread Shenming Lu
Hi,

After reading everyone's opinions, we have a rough idea for this issue.

One key point is whether it is necessary to setup the config space before
the device can accept further migration data. I think it is decided by
the vendor driver, so we can simply ask the vendor driver about it in
.save_setup, which could avoid a lot of unnecessary copies and settings.
Once we have known the need, we can iterate the config space (before)
along with the device migration data in .save_live_iterate and
.save_live_complete_precopy, and if not needed, we can only migrate the
config space in .save_state.

Another key point is that the interrupt enabling should be after the
restoring of the interrupt controller (might not only interrupts).
My solution is to add a subflag at the beginning of the config data
(right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
actions on the dst (such as whether to enable interrupts).

Below is it's workflow.

On the save path:
In vfio_save_setup():
Ask the vendor driver if it needs the config space setup before it
can accept further migration data.
|
In vfio_save_iterate() (pre-copy):
If *needed*, save the config space which would be setup on the dst
before the migration data, but send with a subflag to instruct not
to (such as) enable interrupts.
|
In vfio_save_complete_precopy() (stop-and-copy, iterable process):
The same as that in vfio_save_iterate().
|
In .save_state (stop-and-copy, non-iterable process):
If *needed*, only send a subflag to instruct to enable interrupts.
If *not needed*, save the config space and setup everything on the dst.

Besides the above idea, we might be able to choose to let the vendor driver do
more: qemu just sends and writes the config data (before) along with the device
migration data every time, and it's up to the vendor driver to filter out/buffer
the received data or reorder the settings...

Thanks,
Shenming




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-24 Thread Dr. David Alan Gilbert
* Neo Jia (c...@nvidia.com) wrote:
> On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On 2020/11/21 6:01, Alex Williamson wrote:
> > > On Fri, 20 Nov 2020 22:05:49 +0800
> > > Shenming Lu  wrote:
> > >
> > >> On 2020/11/20 1:41, Alex Williamson wrote:
> > >>> On Thu, 19 Nov 2020 14:13:24 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>
> >  On 11/14/2020 2:47 PM, Shenming Lu wrote:
> > > When running VFIO migration, I found that the restoring of VFIO PCI 
> > > device’s
> > > config space is before VGIC on ARM64 target. But generally, interrupt 
> > > controllers
> > > need to be restored before PCI devices.
> > 
> >  Is there any other way by which VGIC can be restored before PCI device?
> > >>
> > >> As far as I know, it seems to have to depend on priorities in the 
> > >> non-iterable process.
> > >>
> > 
> > > Besides, if a VFIO PCI device is
> > > configured to have directly-injected MSIs (VLPIs), the restoring of 
> > > its config
> > > space will trigger the configuring of these VLPIs (in kernel), where 
> > > it would
> > > return an error as I saw due to the dependency on kvm’s vgic.
> > >
> > 
> >  Can this be fixed in kernel to re-initialize the kernel state?
> > >>
> > >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
> > >> But the fact is that this error is not caused by kernel, it is due to 
> > >> the incorrect
> > >> calling order of qemu...
> > >>
> > 
> > > To avoid this, we can move the saving of the config space from the 
> > > iterable
> > > process to the non-iterable process, so that it will be called after 
> > > VGIC
> > > according to their priorities.
> > >
> > 
> >  With this change, at resume side, pre-copy phase data would reach
> >  destination without restored config space. VFIO device on destination
> >  might need it's config space setup and validated before it can accept
> >  further VFIO device specific migration state.
> > 
> >  This also changes bit-stream, so it would break migration with original
> >  migration patch-set.
> > >>>
> > >>> Config space can continue to change while in pre-copy, if we're only
> > >>> sending config space at the initiation of pre-copy, how are any changes
> > >>> that might occur before the VM is stopped conveyed to the target?  For
> > >>> example the guest might reboot and a device returned to INTx mode from
> > >>> MSI during pre-copy.  Thanks,
> > >>
> > >> What I see is that the config space is only saved once in 
> > >> save_live_complete_precopy
> > >> currently...
> > >> As you said, a VFIO device might need it's config space setup first, and
> > >> the config space can continue to change while in pre-copy, Did you mean 
> > >> we
> > >> have to migrate the config space in save_live_iterate?
> > >> However, I still have a little doubt about the restoring dependence 
> > >> between
> > >> the qemu emulated config space and the device data...
> > >>
> > >> Besides, if we surely can't move the saving of the config space back, 
> > >> can we
> > >> just move some actions which are triggered by the restoring of the 
> > >> config space
> > >> back (such as vfio_msix_enable())?
> > >
> > > It seems that the significant benefit to enabling interrupts during
> > > pre-copy would be to reduce the latency and failure potential during
> > > the final phase of migration.  Do we have any data for how much it adds
> > > to the device contributed downtime to configure interrupts only at the
> > > final stage?  My guess is that it's a measurable delay on its own.  At
> > > the same time, we can't ignore the differences in machine specific
> > > dependencies and if we don't even sync the config space once the VM is
> > > stopped... this all seems not ready to call supported, especially if we
> > > have concerns already about migration bit-stream compatibility.
> > >
> > 
> > I have another question for this, if we restore the config space while in 
> > pre-copy
> > (include enabling interrupts), does it affect the _RESUMING state (paused) 
> > of the
> > device on the dst host (cause it to send interrupts? which should not be 
> > allowed
> > in this stage). Does the restore sequence need to be further discussed and 
> > reach
> > a consensus(spec) (taking into account other devices and the corresponding 
> > actions
> > of the vendor driver)?
> > 
> > > Given our timing relative to QEMU 5.2, the only path I feel comfortable
> > > with is to move forward with downgrading vfio migration support to be
> > > enabled via an experimental option.  Objections?  Thanks,
> > 
> > Alright, but this issue is related to our ARM GICv4.1 migration scheme, 
> > could you
> > give a rough idea about this (where to enable interrupts, we hope it to be 
> > after
> > the restoring of VGIC)?
> 
> I disagree. If this is only specific 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-24 Thread Shenming Lu
On 2020/11/24 3:33, Neo Jia wrote:
> On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2020/11/21 6:01, Alex Williamson wrote:
>>> On Fri, 20 Nov 2020 22:05:49 +0800
>>> Shenming Lu  wrote:
>>>
 On 2020/11/20 1:41, Alex Williamson wrote:
> On Thu, 19 Nov 2020 14:13:24 +0530
> Kirti Wankhede  wrote:
>
>> On 11/14/2020 2:47 PM, Shenming Lu wrote:
>>> When running VFIO migration, I found that the restoring of VFIO PCI 
>>> device’s
>>> config space is before VGIC on ARM64 target. But generally, interrupt 
>>> controllers
>>> need to be restored before PCI devices.
>>
>> Is there any other way by which VGIC can be restored before PCI device?

 As far as I know, it seems to have to depend on priorities in the 
 non-iterable process.

>>
>>> Besides, if a VFIO PCI device is
>>> configured to have directly-injected MSIs (VLPIs), the restoring of its 
>>> config
>>> space will trigger the configuring of these VLPIs (in kernel), where it 
>>> would
>>> return an error as I saw due to the dependency on kvm’s vgic.
>>>
>>
>> Can this be fixed in kernel to re-initialize the kernel state?

 Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
 But the fact is that this error is not caused by kernel, it is due to the 
 incorrect
 calling order of qemu...

>>
>>> To avoid this, we can move the saving of the config space from the 
>>> iterable
>>> process to the non-iterable process, so that it will be called after 
>>> VGIC
>>> according to their priorities.
>>>
>>
>> With this change, at resume side, pre-copy phase data would reach
>> destination without restored config space. VFIO device on destination
>> might need it's config space setup and validated before it can accept
>> further VFIO device specific migration state.
>>
>> This also changes bit-stream, so it would break migration with original
>> migration patch-set.
>
> Config space can continue to change while in pre-copy, if we're only
> sending config space at the initiation of pre-copy, how are any changes
> that might occur before the VM is stopped conveyed to the target?  For
> example the guest might reboot and a device returned to INTx mode from
> MSI during pre-copy.  Thanks,

 What I see is that the config space is only saved once in 
 save_live_complete_precopy
 currently...
 As you said, a VFIO device might need it's config space setup first, and
 the config space can continue to change while in pre-copy, Did you mean we
 have to migrate the config space in save_live_iterate?
 However, I still have a little doubt about the restoring dependence between
 the qemu emulated config space and the device data...

 Besides, if we surely can't move the saving of the config space back, can 
 we
 just move some actions which are triggered by the restoring of the config 
 space
 back (such as vfio_msix_enable())?
>>>
>>> It seems that the significant benefit to enabling interrupts during
>>> pre-copy would be to reduce the latency and failure potential during
>>> the final phase of migration.  Do we have any data for how much it adds
>>> to the device contributed downtime to configure interrupts only at the
>>> final stage?  My guess is that it's a measurable delay on its own.  At
>>> the same time, we can't ignore the differences in machine specific
>>> dependencies and if we don't even sync the config space once the VM is
>>> stopped... this all seems not ready to call supported, especially if we
>>> have concerns already about migration bit-stream compatibility.
>>>
>>
>> I have another question for this, if we restore the config space while in 
>> pre-copy
>> (include enabling interrupts), does it affect the _RESUMING state (paused) 
>> of the
>> device on the dst host (cause it to send interrupts? which should not be 
>> allowed
>> in this stage). Does the restore sequence need to be further discussed and 
>> reach
>> a consensus(spec) (taking into account other devices and the corresponding 
>> actions
>> of the vendor driver)?
>>
>>> Given our timing relative to QEMU 5.2, the only path I feel comfortable
>>> with is to move forward with downgrading vfio migration support to be
>>> enabled via an experimental option.  Objections?  Thanks,
>>
>> Alright, but this issue is related to our ARM GICv4.1 migration scheme, 
>> could you
>> give a rough idea about this (where to enable interrupts, we hope it to be 
>> after
>> the restoring of VGIC)?
> 
> I disagree. If this is only specific to Huawei ARM GIC implementation, why do 
> we want to
> make the entire VFIO based migration an experimental feature?

It is not specific to Huawei ARM GIC implementation, the error was encountered 
in general
ARM 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-23 Thread Alex Williamson
On Mon, 23 Nov 2020 11:33:37 -0800
Neo Jia  wrote:

> On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On 2020/11/21 6:01, Alex Williamson wrote:  
> > > On Fri, 20 Nov 2020 22:05:49 +0800
> > > Shenming Lu  wrote:
> > >  
> > >> On 2020/11/20 1:41, Alex Williamson wrote:  
> > >>> On Thu, 19 Nov 2020 14:13:24 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>  
> >  On 11/14/2020 2:47 PM, Shenming Lu wrote:  
> > > When running VFIO migration, I found that the restoring of VFIO PCI 
> > > device’s
> > > config space is before VGIC on ARM64 target. But generally, interrupt 
> > > controllers
> > > need to be restored before PCI devices.  
> > 
> >  Is there any other way by which VGIC can be restored before PCI 
> >  device?  
> > >>
> > >> As far as I know, it seems to have to depend on priorities in the 
> > >> non-iterable process.
> > >>  
> >   
> > > Besides, if a VFIO PCI device is
> > > configured to have directly-injected MSIs (VLPIs), the restoring of 
> > > its config
> > > space will trigger the configuring of these VLPIs (in kernel), where 
> > > it would
> > > return an error as I saw due to the dependency on kvm’s vgic.
> > >  
> > 
> >  Can this be fixed in kernel to re-initialize the kernel state?  
> > >>
> > >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
> > >> But the fact is that this error is not caused by kernel, it is due to 
> > >> the incorrect
> > >> calling order of qemu...
> > >>  
> >   
> > > To avoid this, we can move the saving of the config space from the 
> > > iterable
> > > process to the non-iterable process, so that it will be called after 
> > > VGIC
> > > according to their priorities.
> > >  
> > 
> >  With this change, at resume side, pre-copy phase data would reach
> >  destination without restored config space. VFIO device on destination
> >  might need it's config space setup and validated before it can accept
> >  further VFIO device specific migration state.
> > 
> >  This also changes bit-stream, so it would break migration with original
> >  migration patch-set.  
> > >>>
> > >>> Config space can continue to change while in pre-copy, if we're only
> > >>> sending config space at the initiation of pre-copy, how are any changes
> > >>> that might occur before the VM is stopped conveyed to the target?  For
> > >>> example the guest might reboot and a device returned to INTx mode from
> > >>> MSI during pre-copy.  Thanks,  
> > >>
> > >> What I see is that the config space is only saved once in 
> > >> save_live_complete_precopy
> > >> currently...
> > >> As you said, a VFIO device might need it's config space setup first, and
> > >> the config space can continue to change while in pre-copy, Did you mean 
> > >> we
> > >> have to migrate the config space in save_live_iterate?
> > >> However, I still have a little doubt about the restoring dependence 
> > >> between
> > >> the qemu emulated config space and the device data...
> > >>
> > >> Besides, if we surely can't move the saving of the config space back, 
> > >> can we
> > >> just move some actions which are triggered by the restoring of the 
> > >> config space
> > >> back (such as vfio_msix_enable())?  
> > >
> > > It seems that the significant benefit to enabling interrupts during
> > > pre-copy would be to reduce the latency and failure potential during
> > > the final phase of migration.  Do we have any data for how much it adds
> > > to the device contributed downtime to configure interrupts only at the
> > > final stage?  My guess is that it's a measurable delay on its own.  At
> > > the same time, we can't ignore the differences in machine specific
> > > dependencies and if we don't even sync the config space once the VM is
> > > stopped... this all seems not ready to call supported, especially if we
> > > have concerns already about migration bit-stream compatibility.
> > >  
> > 
> > I have another question for this, if we restore the config space while in 
> > pre-copy
> > (include enabling interrupts), does it affect the _RESUMING state (paused) 
> > of the
> > device on the dst host (cause it to send interrupts? which should not be 
> > allowed
> > in this stage). Does the restore sequence need to be further discussed and 
> > reach
> > a consensus(spec) (taking into account other devices and the corresponding 
> > actions
> > of the vendor driver)?

If a target device generates an interrupt when stopped, I think that
would violate the basic notion of running vs stopped in the device
state, right?

> > > Given our timing relative to QEMU 5.2, the only path I feel comfortable
> > > with is to move forward with downgrading vfio migration support to be
> > > enabled via an experimental option.  Objections?  Thanks,  
> > 
> > Alright, but this issue is related 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-23 Thread Neo Jia
On Mon, Nov 23, 2020 at 11:14:38AM +0800, Shenming Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2020/11/21 6:01, Alex Williamson wrote:
> > On Fri, 20 Nov 2020 22:05:49 +0800
> > Shenming Lu  wrote:
> >
> >> On 2020/11/20 1:41, Alex Williamson wrote:
> >>> On Thu, 19 Nov 2020 14:13:24 +0530
> >>> Kirti Wankhede  wrote:
> >>>
>  On 11/14/2020 2:47 PM, Shenming Lu wrote:
> > When running VFIO migration, I found that the restoring of VFIO PCI 
> > device’s
> > config space is before VGIC on ARM64 target. But generally, interrupt 
> > controllers
> > need to be restored before PCI devices.
> 
>  Is there any other way by which VGIC can be restored before PCI device?
> >>
> >> As far as I know, it seems to have to depend on priorities in the 
> >> non-iterable process.
> >>
> 
> > Besides, if a VFIO PCI device is
> > configured to have directly-injected MSIs (VLPIs), the restoring of its 
> > config
> > space will trigger the configuring of these VLPIs (in kernel), where it 
> > would
> > return an error as I saw due to the dependency on kvm’s vgic.
> >
> 
>  Can this be fixed in kernel to re-initialize the kernel state?
> >>
> >> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
> >> But the fact is that this error is not caused by kernel, it is due to the 
> >> incorrect
> >> calling order of qemu...
> >>
> 
> > To avoid this, we can move the saving of the config space from the 
> > iterable
> > process to the non-iterable process, so that it will be called after 
> > VGIC
> > according to their priorities.
> >
> 
>  With this change, at resume side, pre-copy phase data would reach
>  destination without restored config space. VFIO device on destination
>  might need it's config space setup and validated before it can accept
>  further VFIO device specific migration state.
> 
>  This also changes bit-stream, so it would break migration with original
>  migration patch-set.
> >>>
> >>> Config space can continue to change while in pre-copy, if we're only
> >>> sending config space at the initiation of pre-copy, how are any changes
> >>> that might occur before the VM is stopped conveyed to the target?  For
> >>> example the guest might reboot and a device returned to INTx mode from
> >>> MSI during pre-copy.  Thanks,
> >>
> >> What I see is that the config space is only saved once in 
> >> save_live_complete_precopy
> >> currently...
> >> As you said, a VFIO device might need it's config space setup first, and
> >> the config space can continue to change while in pre-copy, Did you mean we
> >> have to migrate the config space in save_live_iterate?
> >> However, I still have a little doubt about the restoring dependence between
> >> the qemu emulated config space and the device data...
> >>
> >> Besides, if we surely can't move the saving of the config space back, can 
> >> we
> >> just move some actions which are triggered by the restoring of the config 
> >> space
> >> back (such as vfio_msix_enable())?
> >
> > It seems that the significant benefit to enabling interrupts during
> > pre-copy would be to reduce the latency and failure potential during
> > the final phase of migration.  Do we have any data for how much it adds
> > to the device contributed downtime to configure interrupts only at the
> > final stage?  My guess is that it's a measurable delay on its own.  At
> > the same time, we can't ignore the differences in machine specific
> > dependencies and if we don't even sync the config space once the VM is
> > stopped... this all seems not ready to call supported, especially if we
> > have concerns already about migration bit-stream compatibility.
> >
> 
> I have another question for this, if we restore the config space while in 
> pre-copy
> (include enabling interrupts), does it affect the _RESUMING state (paused) of 
> the
> device on the dst host (cause it to send interrupts? which should not be 
> allowed
> in this stage). Does the restore sequence need to be further discussed and 
> reach
> a consensus(spec) (taking into account other devices and the corresponding 
> actions
> of the vendor driver)?
> 
> > Given our timing relative to QEMU 5.2, the only path I feel comfortable
> > with is to move forward with downgrading vfio migration support to be
> > enabled via an experimental option.  Objections?  Thanks,
> 
> Alright, but this issue is related to our ARM GICv4.1 migration scheme, could 
> you
> give a rough idea about this (where to enable interrupts, we hope it to be 
> after
> the restoring of VGIC)?

I disagree. If this is only specific to Huawei ARM GIC implementation, why do 
we want to
make the entire VFIO based migration an experimental feature?

Thanks,
Neo

> 
> Thanks,
> Shenming



Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-23 Thread Cornelia Huck
On Fri, 20 Nov 2020 15:01:46 -0700
Alex Williamson  wrote:

> Given our timing relative to QEMU 5.2, the only path I feel comfortable
> with is to move forward with downgrading vfio migration support to be
> enabled via an experimental option.  Objections?  Thanks,
> 
> Alex

Agreed from my side. It seems better to give it one more release to
figure out some issues, and just mark it experimental for now.




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-22 Thread Shenming Lu
On 2020/11/21 6:01, Alex Williamson wrote:
> On Fri, 20 Nov 2020 22:05:49 +0800
> Shenming Lu  wrote:
> 
>> On 2020/11/20 1:41, Alex Williamson wrote:
>>> On Thu, 19 Nov 2020 14:13:24 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 11/14/2020 2:47 PM, Shenming Lu wrote:  
> When running VFIO migration, I found that the restoring of VFIO PCI 
> device’s
> config space is before VGIC on ARM64 target. But generally, interrupt 
> controllers
> need to be restored before PCI devices. 

 Is there any other way by which VGIC can be restored before PCI device?  
>>
>> As far as I know, it seems to have to depend on priorities in the 
>> non-iterable process.
>>
  
> Besides, if a VFIO PCI device is
> configured to have directly-injected MSIs (VLPIs), the restoring of its 
> config
> space will trigger the configuring of these VLPIs (in kernel), where it 
> would
> return an error as I saw due to the dependency on kvm’s vgic.
> 

 Can this be fixed in kernel to re-initialize the kernel state?  
>>
>> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
>> But the fact is that this error is not caused by kernel, it is due to the 
>> incorrect
>> calling order of qemu...
>>
  
> To avoid this, we can move the saving of the config space from the 
> iterable
> process to the non-iterable process, so that it will be called after VGIC
> according to their priorities.
> 

 With this change, at resume side, pre-copy phase data would reach 
 destination without restored config space. VFIO device on destination 
 might need it's config space setup and validated before it can accept 
 further VFIO device specific migration state.

 This also changes bit-stream, so it would break migration with original 
 migration patch-set.  
>>>
>>> Config space can continue to change while in pre-copy, if we're only
>>> sending config space at the initiation of pre-copy, how are any changes
>>> that might occur before the VM is stopped conveyed to the target?  For
>>> example the guest might reboot and a device returned to INTx mode from
>>> MSI during pre-copy.  Thanks,  
>>
>> What I see is that the config space is only saved once in 
>> save_live_complete_precopy
>> currently...
>> As you said, a VFIO device might need it's config space setup first, and
>> the config space can continue to change while in pre-copy, Did you mean we
>> have to migrate the config space in save_live_iterate?
>> However, I still have a little doubt about the restoring dependence between
>> the qemu emulated config space and the device data...
>>
>> Besides, if we surely can't move the saving of the config space back, can we
>> just move some actions which are triggered by the restoring of the config 
>> space
>> back (such as vfio_msix_enable())?
> 
> It seems that the significant benefit to enabling interrupts during
> pre-copy would be to reduce the latency and failure potential during
> the final phase of migration.  Do we have any data for how much it adds
> to the device contributed downtime to configure interrupts only at the
> final stage?  My guess is that it's a measurable delay on its own.  At
> the same time, we can't ignore the differences in machine specific
> dependencies and if we don't even sync the config space once the VM is
> stopped... this all seems not ready to call supported, especially if we
> have concerns already about migration bit-stream compatibility.
> 

I have another question for this, if we restore the config space while in 
pre-copy
(include enabling interrupts), does it affect the _RESUMING state (paused) of 
the
device on the dst host (cause it to send interrupts? which should not be allowed
in this stage). Does the restore sequence need to be further discussed and reach
a consensus(spec) (taking into account other devices and the corresponding 
actions
of the vendor driver)?

> Given our timing relative to QEMU 5.2, the only path I feel comfortable
> with is to move forward with downgrading vfio migration support to be
> enabled via an experimental option.  Objections?  Thanks,

Alright, but this issue is related to our ARM GICv4.1 migration scheme, could 
you
give a rough idea about this (where to enable interrupts, we hope it to be after
the restoring of VGIC)?

Thanks,
Shenming



Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-20 Thread Alex Williamson
On Fri, 20 Nov 2020 22:05:49 +0800
Shenming Lu  wrote:

> On 2020/11/20 1:41, Alex Williamson wrote:
> > On Thu, 19 Nov 2020 14:13:24 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 11/14/2020 2:47 PM, Shenming Lu wrote:  
> >>> When running VFIO migration, I found that the restoring of VFIO PCI 
> >>> device’s
> >>> config space is before VGIC on ARM64 target. But generally, interrupt 
> >>> controllers
> >>> need to be restored before PCI devices. 
> >>
> >> Is there any other way by which VGIC can be restored before PCI device?  
> 
> As far as I know, it seems to have to depend on priorities in the 
> non-iterable process.
> 
> >>  
> >>> Besides, if a VFIO PCI device is
> >>> configured to have directly-injected MSIs (VLPIs), the restoring of its 
> >>> config
> >>> space will trigger the configuring of these VLPIs (in kernel), where it 
> >>> would
> >>> return an error as I saw due to the dependency on kvm’s vgic.
> >>> 
> >>
> >> Can this be fixed in kernel to re-initialize the kernel state?  
> 
> Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
> But the fact is that this error is not caused by kernel, it is due to the 
> incorrect
> calling order of qemu...
> 
> >>  
> >>> To avoid this, we can move the saving of the config space from the 
> >>> iterable
> >>> process to the non-iterable process, so that it will be called after VGIC
> >>> according to their priorities.
> >>> 
> >>
> >> With this change, at resume side, pre-copy phase data would reach 
> >> destination without restored config space. VFIO device on destination 
> >> might need it's config space setup and validated before it can accept 
> >> further VFIO device specific migration state.
> >>
> >> This also changes bit-stream, so it would break migration with original 
> >> migration patch-set.  
> > 
> > Config space can continue to change while in pre-copy, if we're only
> > sending config space at the initiation of pre-copy, how are any changes
> > that might occur before the VM is stopped conveyed to the target?  For
> > example the guest might reboot and a device returned to INTx mode from
> > MSI during pre-copy.  Thanks,  
> 
> What I see is that the config space is only saved once in 
> save_live_complete_precopy
> currently...
> As you said, a VFIO device might need it's config space setup first, and
> the config space can continue to change while in pre-copy, Did you mean we
> have to migrate the config space in save_live_iterate?
> However, I still have a little doubt about the restoring dependence between
> the qemu emulated config space and the device data...
> 
> Besides, if we surely can't move the saving of the config space back, can we
> just move some actions which are triggered by the restoring of the config 
> space
> back (such as vfio_msix_enable())?

It seems that the significant benefit to enabling interrupts during
pre-copy would be to reduce the latency and failure potential during
the final phase of migration.  Do we have any data for how much it adds
to the device contributed downtime to configure interrupts only at the
final stage?  My guess is that it's a measurable delay on its own.  At
the same time, we can't ignore the differences in machine specific
dependencies and if we don't even sync the config space once the VM is
stopped... this all seems not ready to call supported, especially if we
have concerns already about migration bit-stream compatibility.

Given our timing relative to QEMU 5.2, the only path I feel comfortable
with is to move forward with downgrading vfio migration support to be
enabled via an experimental option.  Objections?  Thanks,

Alex

> 
> The demo patch is as follows (but it seems that .save_state is not a 
> appropriate
> place to do it. -_-):
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 55261562d4..9cf0310148 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -44,6 +44,7 @@
>  #define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xef12ULL)
>  #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
>  #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
> +#define VFIO_MIG_FLAG_DEV_STATE_TRIGGER (0xef15ULL)
> 
>  static int64_t bytes_transferred;
> 
> @@ -368,6 +369,15 @@ static int vfio_save_device_config_state(QEMUFile *f, 
> void *opaque)
>  return qemu_file_get_error(f);
>  }
> 
> +static void vfio_device_state_trigger(void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +
> +if (vbasedev->ops && vbasedev->ops->vfio_trigger_config) {
> +vbasedev->ops->vfio_trigger_config(vbasedev);
> +}
> +}
> +
>  static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
>  {
>  VFIODevice *vbasedev = opaque;
> @@ -620,6 +630,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
> *opaque)
>  return ret;
>  }
> 
> +static void vfio_save_state_trigger(QEMUFile *f, void *opaque)
> +{
> +qemu_put_be64(f, 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-20 Thread Shenming Lu
On 2020/11/20 1:41, Alex Williamson wrote:
> On Thu, 19 Nov 2020 14:13:24 +0530
> Kirti Wankhede  wrote:
> 
>> On 11/14/2020 2:47 PM, Shenming Lu wrote:
>>> When running VFIO migration, I found that the restoring of VFIO PCI device’s
>>> config space is before VGIC on ARM64 target. But generally, interrupt 
>>> controllers
>>> need to be restored before PCI devices.   
>>
>> Is there any other way by which VGIC can be restored before PCI device?

As far as I know, it seems to have to depend on priorities in the non-iterable 
process.

>>
>>> Besides, if a VFIO PCI device is
>>> configured to have directly-injected MSIs (VLPIs), the restoring of its 
>>> config
>>> space will trigger the configuring of these VLPIs (in kernel), where it 
>>> would
>>> return an error as I saw due to the dependency on kvm’s vgic.
>>>   
>>
>> Can this be fixed in kernel to re-initialize the kernel state?

Did you mean to reconfigure these VLPIs when restoring kvm's vgic?
But the fact is that this error is not caused by kernel, it is due to the 
incorrect
calling order of qemu...

>>
>>> To avoid this, we can move the saving of the config space from the iterable
>>> process to the non-iterable process, so that it will be called after VGIC
>>> according to their priorities.
>>>   
>>
>> With this change, at resume side, pre-copy phase data would reach 
>> destination without restored config space. VFIO device on destination 
>> might need it's config space setup and validated before it can accept 
>> further VFIO device specific migration state.
>>
>> This also changes bit-stream, so it would break migration with original 
>> migration patch-set.
> 
> Config space can continue to change while in pre-copy, if we're only
> sending config space at the initiation of pre-copy, how are any changes
> that might occur before the VM is stopped conveyed to the target?  For
> example the guest might reboot and a device returned to INTx mode from
> MSI during pre-copy.  Thanks,

What I see is that the config space is only saved once in 
save_live_complete_precopy
currently...
As you said, a VFIO device might need it's config space setup first, and
the config space can continue to change while in pre-copy, Did you mean we
have to migrate the config space in save_live_iterate?
However, I still have a little doubt about the restoring dependence between
the qemu emulated config space and the device data...

Besides, if we surely can't move the saving of the config space back, can we
just move some actions which are triggered by the restoring of the config space
back (such as vfio_msix_enable())?

The demo patch is as follows (but it seems that .save_state is not a appropriate
place to do it. -_-):

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 55261562d4..9cf0310148 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -44,6 +44,7 @@
 #define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xef12ULL)
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
+#define VFIO_MIG_FLAG_DEV_STATE_TRIGGER (0xef15ULL)

 static int64_t bytes_transferred;

@@ -368,6 +369,15 @@ static int vfio_save_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }

+static void vfio_device_state_trigger(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+
+if (vbasedev->ops && vbasedev->ops->vfio_trigger_config) {
+vbasedev->ops->vfio_trigger_config(vbasedev);
+}
+}
+
 static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
@@ -620,6 +630,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }

+static void vfio_save_state_trigger(QEMUFile *f, void *opaque)
+{
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_STATE_TRIGGER);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+}
+
 static int vfio_load_setup(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
@@ -700,6 +717,18 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int 
version_id)
 }
 break;
 }
+case VFIO_MIG_FLAG_DEV_STATE_TRIGGER:
+{
+vfio_device_state_trigger(opaque);
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {
+return ret;
+} else {
+error_report("%s: STATE TRIGGER: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+}
 default:
 error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
 return -EINVAL;
@@ -720,6 +749,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
 .save_live_pending = vfio_save_pending,
 .save_live_iterate = vfio_save_iterate,
 .save_live_complete_precopy = vfio_save_complete_precopy,
+.save_state = vfio_save_state_trigger,
 .load_setup = 

Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-19 Thread Alex Williamson
On Thu, 19 Nov 2020 14:13:24 +0530
Kirti Wankhede  wrote:

> On 11/14/2020 2:47 PM, Shenming Lu wrote:
> > When running VFIO migration, I found that the restoring of VFIO PCI device’s
> > config space is before VGIC on ARM64 target. But generally, interrupt 
> > controllers
> > need to be restored before PCI devices.   
> 
> Is there any other way by which VGIC can be restored before PCI device?
> 
> > Besides, if a VFIO PCI device is
> > configured to have directly-injected MSIs (VLPIs), the restoring of its 
> > config
> > space will trigger the configuring of these VLPIs (in kernel), where it 
> > would
> > return an error as I saw due to the dependency on kvm’s vgic.
> >   
> 
> Can this be fixed in kernel to re-initialize the kernel state?
> 
> > To avoid this, we can move the saving of the config space from the iterable
> > process to the non-iterable process, so that it will be called after VGIC
> > according to their priorities.
> >   
> 
> With this change, at resume side, pre-copy phase data would reach 
> destination without restored config space. VFIO device on destination 
> might need it's config space setup and validated before it can accept 
> further VFIO device specific migration state.
> 
> This also changes bit-stream, so it would break migration with original 
> migration patch-set.

Config space can continue to change while in pre-copy, if we're only
sending config space at the initiation of pre-copy, how are any changes
that might occur before the VM is stopped conveyed to the target?  For
example the guest might reboot and a device returned to INTx mode from
MSI during pre-copy.  Thanks,

Alex


> > Signed-off-by: Shenming Lu 
> > ---
> >   hw/vfio/migration.c | 22 ++
> >   1 file changed, 6 insertions(+), 16 deletions(-)
> > 
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index 3ce285ea39..028da35a25 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev)
> >   return 0;
> >   }
> >   
> > -static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
> > +static void vfio_save_device_config_state(QEMUFile *f, void *opaque)
> >   {
> >   VFIODevice *vbasedev = opaque;
> >   
> > @@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, 
> > void *opaque)
> >   
> >   trace_vfio_save_device_config_state(vbasedev->name);
> >   
> > -return qemu_file_get_error(f);
> > +if (qemu_file_get_error(f))
> > +error_report("%s: Failed to save device config space",
> > + vbasedev->name);
> >   }
> >   
> >   static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
> >   {
> >   VFIODevice *vbasedev = opaque;
> > -uint64_t data;
> >   
> >   if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
> >   int ret;
> > @@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, 
> > void *opaque)
> >   }
> >   }
> >   
> > -data = qemu_get_be64(f);
> > -if (data != VFIO_MIG_FLAG_END_OF_STATE) {
> > -error_report("%s: Failed loading device config space, "
> > - "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
> > -return -EINVAL;
> > -}
> > -
> >   trace_vfio_load_device_config_state(vbasedev->name);
> > -return qemu_file_get_error(f);
> > +return 0;
> >   }
> >   
> >   static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
> > @@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
> > void *opaque)
> >   return ret;
> >   }
> >   
> > -ret = vfio_save_device_config_state(f, opaque);
> > -if (ret) {
> > -return ret;
> > -}
> > -
> >   ret = vfio_update_pending(vbasedev);
> >   if (ret) {
> >   return ret;
> > @@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
> >   .save_live_pending = vfio_save_pending,
> >   .save_live_iterate = vfio_save_iterate,
> >   .save_live_complete_precopy = vfio_save_complete_precopy,
> > +.save_state = vfio_save_device_config_state,
> >   .load_setup = vfio_load_setup,
> >   .load_cleanup = vfio_load_cleanup,
> >   .load_state = vfio_load_state,
> >   
> 




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-19 Thread Kirti Wankhede




On 11/14/2020 2:47 PM, Shenming Lu wrote:

When running VFIO migration, I found that the restoring of VFIO PCI device’s
config space is before VGIC on ARM64 target. But generally, interrupt 
controllers
need to be restored before PCI devices. 


Is there any other way by which VGIC can be restored before PCI device?


Besides, if a VFIO PCI device is
configured to have directly-injected MSIs (VLPIs), the restoring of its config
space will trigger the configuring of these VLPIs (in kernel), where it would
return an error as I saw due to the dependency on kvm’s vgic.



Can this be fixed in kernel to re-initialize the kernel state?


To avoid this, we can move the saving of the config space from the iterable
process to the non-iterable process, so that it will be called after VGIC
according to their priorities.



With this change, at resume side, pre-copy phase data would reach 
destination without restored config space. VFIO device on destination 
might need it's config space setup and validated before it can accept 
further VFIO device specific migration state.


This also changes bit-stream, so it would break migration with original 
migration patch-set.


Thanks,
Kirti


Signed-off-by: Shenming Lu 
---
  hw/vfio/migration.c | 22 ++
  1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3ce285ea39..028da35a25 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev)
  return 0;
  }
  
-static int vfio_save_device_config_state(QEMUFile *f, void *opaque)

+static void vfio_save_device_config_state(QEMUFile *f, void *opaque)
  {
  VFIODevice *vbasedev = opaque;
  
@@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
  
  trace_vfio_save_device_config_state(vbasedev->name);
  
-return qemu_file_get_error(f);

+if (qemu_file_get_error(f))
+error_report("%s: Failed to save device config space",
+ vbasedev->name);
  }
  
  static int vfio_load_device_config_state(QEMUFile *f, void *opaque)

  {
  VFIODevice *vbasedev = opaque;
-uint64_t data;
  
  if (vbasedev->ops && vbasedev->ops->vfio_load_config) {

  int ret;
@@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
  }
  }
  
-data = qemu_get_be64(f);

-if (data != VFIO_MIG_FLAG_END_OF_STATE) {
-error_report("%s: Failed loading device config space, "
- "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
-return -EINVAL;
-}
-
  trace_vfio_load_device_config_state(vbasedev->name);
-return qemu_file_get_error(f);
+return 0;
  }
  
  static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)

@@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  return ret;
  }
  
-ret = vfio_save_device_config_state(f, opaque);

-if (ret) {
-return ret;
-}
-
  ret = vfio_update_pending(vbasedev);
  if (ret) {
  return ret;
@@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
  .save_live_pending = vfio_save_pending,
  .save_live_iterate = vfio_save_iterate,
  .save_live_complete_precopy = vfio_save_complete_precopy,
+.save_state = vfio_save_device_config_state,
  .load_setup = vfio_load_setup,
  .load_cleanup = vfio_load_cleanup,
  .load_state = vfio_load_state,





[PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-14 Thread Shenming Lu
When running VFIO migration, I found that the restoring of VFIO PCI device’s
config space is before VGIC on ARM64 target. But generally, interrupt 
controllers
need to be restored before PCI devices. Besides, if a VFIO PCI device is
configured to have directly-injected MSIs (VLPIs), the restoring of its config
space will trigger the configuring of these VLPIs (in kernel), where it would
return an error as I saw due to the dependency on kvm’s vgic.

To avoid this, we can move the saving of the config space from the iterable
process to the non-iterable process, so that it will be called after VGIC
according to their priorities.

Signed-off-by: Shenming Lu 
---
 hw/vfio/migration.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3ce285ea39..028da35a25 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -351,7 +351,7 @@ static int vfio_update_pending(VFIODevice *vbasedev)
 return 0;
 }
 
-static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
+static void vfio_save_device_config_state(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
 
@@ -365,13 +365,14 @@ static int vfio_save_device_config_state(QEMUFile *f, 
void *opaque)
 
 trace_vfio_save_device_config_state(vbasedev->name);
 
-return qemu_file_get_error(f);
+if (qemu_file_get_error(f))
+error_report("%s: Failed to save device config space",
+ vbasedev->name);
 }
 
 static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
-uint64_t data;
 
 if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
 int ret;
@@ -384,15 +385,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
 }
 }
 
-data = qemu_get_be64(f);
-if (data != VFIO_MIG_FLAG_END_OF_STATE) {
-error_report("%s: Failed loading device config space, "
- "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
-return -EINVAL;
-}
-
 trace_vfio_load_device_config_state(vbasedev->name);
-return qemu_file_get_error(f);
+return 0;
 }
 
 static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
@@ -575,11 +569,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }
 
-ret = vfio_save_device_config_state(f, opaque);
-if (ret) {
-return ret;
-}
-
 ret = vfio_update_pending(vbasedev);
 if (ret) {
 return ret;
@@ -720,6 +709,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
 .save_live_pending = vfio_save_pending,
 .save_live_iterate = vfio_save_iterate,
 .save_live_complete_precopy = vfio_save_complete_precopy,
+.save_state = vfio_save_device_config_state,
 .load_setup = vfio_load_setup,
 .load_cleanup = vfio_load_cleanup,
 .load_state = vfio_load_state,
-- 
2.19.1