Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-12 Thread Alexander Duyck
On Sat, Jun 11, 2016 at 8:03 PM, Zhou Jie  wrote:
> Hi, Alex
>
>
> On 2016/6/9 23:39, Alexander Duyck wrote:
>>
>> On Thu, Jun 9, 2016 at 3:14 AM, Zhou Jie 
>> wrote:
>>>
>>> TO Alex
>>> TO Michael
>>>
>>>    In your solution you add an emulated PCI bridge to act as
>>>    a bridge between directly assigned devices and the host bridge.
>>>    Do you mean putting all directly assigned devices behind
>>>    one emulated PCI bridge?
>>>    If so, this may bring some problems.
>>>
>>>    We are writing a patchset to support the AER feature in QEMU.
>>>    When assigning a vfio device with AER enabled, we must check whether
>>>    the device supports a host bus reset (i.e. hot reset), as this may be
>>>    used by the guest OS in order to recover the device from an AER
>>>    error.
>>>    QEMU must therefore have the ability to perform a physical
>>>    host bus reset using the existing vfio APIs in response to a virtual
>>>    bus reset in the VM.
>>>    A physical bus reset affects all of the devices on the host bus.
>>>    Therefore all physical devices affected by a bus reset must be
>>>    configured on the same virtual bus in the VM,
>>>    and no devices unaffected by the bus reset may
>>>    be configured on that same virtual bus.
>>>
>>>http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html
>>>
>>> Sincerely,
>>> Zhou Jie
>>
>>
>> That makes sense, but I don't think you have to worry much about this
>> at this point at least on my side as this was mostly just theory and I
>> haven't had a chance to put any of it into practice as of yet.
>>
>> My idea has been evolving on this for a while.  One thought I had is
>> that we may want to have something like an emulated IOMMU and if
>> possible we would want to split it up over multiple domains just so we
>> can be certain that the virtual interfaces and the physical ones
>> existed in separate domains.  In regards to your concerns perhaps what
>> we could do is put each assigned device into its own domain to prevent
>> them from affecting each other.  To that end we could probably break
>> things up so that each device effectively lives in its own PCIe slot
>> in the emulated system.  Then when we start a migration of the guest
>> the assigned device domains would then have to be tracked for unmap
>> and sync calls when the direction is from the device.
>>
>> I will keep your concerns in mind in the future when I get some time
>> to look at exploring this solution further.
>>
>> - Alex
>
>
> I am thinking about the practicalities of migrating a passthrough device.
>
> In your solution, you use a vendor-specific configuration space to
> negotiate with the guest.
> If you put each assigned device into its own domain,
> how can QEMU negotiate with the guest?
> By adding the vendor-specific configuration space to every PCI bus that
> has a passthrough device assigned?

This is kind of the direction I was thinking of heading in, so yes.
Basically in my mind we should be emulating a PCIe hierarchy if we
want to support device assignment.  That way we can already make use
of things like hot-plug and AER natively.  So if we have a root port
assigned to each assigned device we should be able to place some extra
logic there to handle things like this.
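
Just to make that concrete, something like the following is roughly what
I have in mind on the QEMU side.  This is not from any existing patch:
the MIG_VSEC_* register layout and names are made up for illustration,
and pci_add_capability()'s exact signature has varied across QEMU
versions, so treat it purely as a sketch.

#include "hw/pci/pci.h"

/* Hypothetical register layout inside a vendor-specific capability. */
#define MIG_VSEC_SIZE    8   /* 2-byte cap header + 2 illustrative registers */
#define MIG_VSEC_CTRL    4   /* host -> guest: start tracking / pause        */
#define MIG_VSEC_STATUS  6   /* guest -> host: acknowledge                   */

/* Add the (made-up) migration negotiation block to an emulated root port. */
static int mig_vsec_init(PCIDevice *pdev)
{
    int offset = pci_add_capability(pdev, PCI_CAP_ID_VNDR, 0, MIG_VSEC_SIZE);

    if (offset < 0) {
        return offset;
    }

    pci_set_word(pdev->config + offset + MIG_VSEC_CTRL, 0);
    pci_set_word(pdev->config + offset + MIG_VSEC_STATUS, 0);
    /* Let the guest write its acknowledgement into the status register. */
    pci_set_word(pdev->wmask + offset + MIG_VSEC_STATUS, 0xffff);

    return 0;
}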

- Alex


Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-11 Thread Zhou Jie

Hi, Alex

On 2016/6/9 23:39, Alexander Duyck wrote:

On Thu, Jun 9, 2016 at 3:14 AM, Zhou Jie  wrote:

TO Alex
TO Michael

   In your solution you add an emulated PCI bridge to act as
   a bridge between directly assigned devices and the host bridge.
   Do you mean putting all directly assigned devices behind
   one emulated PCI bridge?
   If so, this may bring some problems.

   We are writing a patchset to support the AER feature in QEMU.
   When assigning a vfio device with AER enabled, we must check whether
   the device supports a host bus reset (i.e. hot reset), as this may be
   used by the guest OS in order to recover the device from an AER
   error.
   QEMU must therefore have the ability to perform a physical
   host bus reset using the existing vfio APIs in response to a virtual
   bus reset in the VM.
   A physical bus reset affects all of the devices on the host bus.
   Therefore all physical devices affected by a bus reset must be
   configured on the same virtual bus in the VM,
   and no devices unaffected by the bus reset may
   be configured on that same virtual bus.

   http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html
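
For reference, the check we are talking about is basically the existing
VFIO hot reset query.  Very roughly, and not our actual patch code
(error handling and allocation checks omitted), it looks like this:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: ask VFIO which other devices a hot (bus) reset would affect. */
static void print_hot_reset_deps(int device_fd)
{
    struct vfio_pci_hot_reset_info probe = { .argsz = sizeof(probe) };
    struct vfio_pci_hot_reset_info *info;
    size_t sz;
    unsigned int i;

    /*
     * The first call is expected to fail (ENOSPC) but fills in how many
     * devices a hot reset of this device would touch.
     */
    if (ioctl(device_fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, &probe) &&
        errno != ENOSPC)
        return;                 /* no hot reset support for this device */

    sz = sizeof(*info) + probe.count * sizeof(info->devices[0]);
    info = calloc(1, sz);
    info->argsz = sz;
    if (ioctl(device_fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info) == 0) {
        for (i = 0; i < info->count; i++)
            printf("affected: %04x:%02x:%02x.%x (iommu group %u)\n",
                   info->devices[i].segment, info->devices[i].bus,
                   info->devices[i].devfn >> 3, info->devices[i].devfn & 7,
                   info->devices[i].group_id);
    }
    free(info);
}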

Sincerely,
Zhou Jie


That makes sense, but I don't think you have to worry much about this
at this point at least on my side as this was mostly just theory and I
haven't had a chance to put any of it into practice as of yet.

My idea has been evolving on this for a while.  One thought I had is
that we may want to have something like an emulated IOMMU and if
possible we would want to split it up over multiple domains just so we
can be certain that the virtual interfaces and the physical ones
existed in separate domains.  In regards to your concerns perhaps what
we could do is put each assigned device into its own domain to prevent
them from affecting each other.  To that end we could probably break
things up so that each device effectively lives in its own PCIe slot
in the emulated system.  Then when we start a migration of the guest
the assigned device domains would then have to be tracked for unmap
and sync calls when the direction is from the device.

I will keep your concerns in mind in the future when I get some time
to look at exploring this solution further.

- Alex


I am thinking about the practicalities of migrating a passthrough device.

In your solution, you use a vendor-specific configuration space to
negotiate with the guest.
If you put each assigned device into its own domain,
how can QEMU negotiate with the guest?
By adding the vendor-specific configuration space to every PCI bus that
has a passthrough device assigned?

Sincerely
Zhou Jie




Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-09 Thread Alexander Duyck
On Thu, Jun 9, 2016 at 3:14 AM, Zhou Jie  wrote:
> TO Alex
> TO Michael
>
>    In your solution you add an emulated PCI bridge to act as
>    a bridge between directly assigned devices and the host bridge.
>    Do you mean putting all directly assigned devices behind
>    one emulated PCI bridge?
>    If so, this may bring some problems.
>
>    We are writing a patchset to support the AER feature in QEMU.
>    When assigning a vfio device with AER enabled, we must check whether
>    the device supports a host bus reset (i.e. hot reset), as this may be
>    used by the guest OS in order to recover the device from an AER
>    error.
>    QEMU must therefore have the ability to perform a physical
>    host bus reset using the existing vfio APIs in response to a virtual
>    bus reset in the VM.
>    A physical bus reset affects all of the devices on the host bus.
>    Therefore all physical devices affected by a bus reset must be
>    configured on the same virtual bus in the VM,
>    and no devices unaffected by the bus reset may
>    be configured on that same virtual bus.
>
>http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html
>
> Sincerely,
> Zhou Jie

That makes sense, but I don't think you have to worry much about this
at this point at least on my side as this was mostly just theory and I
haven't had a chance to put any of it into practice as of yet.

My idea has been evolving on this for a while.  One thought I had is
that we may want to have something like an emulated IOMMU and if
possible we would want to split it up over multiple domains just so we
can be certain that the virtual interfaces and the physical ones
existed in separate domains.  In regards to your concerns perhaps what
we could do is put each assigned device into its own domain to prevent
them from affecting each other.  To that end we could probably break
things up so that each device effectively lives in its own PCIe slot
in the emulated system.  Then when we start a migration of the guest
the assigned device domains would then have to be tracked for unmap
and sync calls when the direction is from the device.
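
To illustrate what I mean by tracking those calls, the guest-side hook
could look roughly like the sketch below.  This is not code from the
RFC: dirty_page_hook() is just a placeholder name for whatever
notification mechanism we end up with, and it assumes DMA addresses map
1:1 to guest-physical pages (as with swiotlb and no vIOMMU).

#include <linux/dma-direction.h>
#include <linux/pfn.h>
#include <linux/types.h>

/* Hypothetical hook that tells the migration code a guest pfn was written. */
extern void dirty_page_hook(unsigned long pfn);

/*
 * Sketch: called from the unmap/sync paths of an assigned device's
 * domain.  Only device-to-memory mappings can have dirtied pages.
 */
static void track_device_writes(dma_addr_t dma_addr, size_t size,
                                enum dma_data_direction dir)
{
    unsigned long pfn, last;

    if (dir != DMA_FROM_DEVICE && dir != DMA_BIDIRECTIONAL)
        return;

    pfn  = PFN_DOWN(dma_addr);
    last = PFN_DOWN(dma_addr + size - 1);
    for (; pfn <= last; pfn++)
        dirty_page_hook(pfn);
}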

I will keep your concerns in mind in the future when I get some time
to look at exploring this solution further.

- Alex


Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-09 Thread Zhou Jie

TO Alex
TO Michael

   In your solution you add an emulated PCI bridge to act as
   a bridge between directly assigned devices and the host bridge.
   Do you mean putting all directly assigned devices behind
   one emulated PCI bridge?
   If so, this may bring some problems.

   We are writing a patchset to support the AER feature in QEMU.
   When assigning a vfio device with AER enabled, we must check whether
   the device supports a host bus reset (i.e. hot reset), as this may be
   used by the guest OS in order to recover the device from an AER
   error.
   QEMU must therefore have the ability to perform a physical
   host bus reset using the existing vfio APIs in response to a virtual
   bus reset in the VM.
   A physical bus reset affects all of the devices on the host bus.
   Therefore all physical devices affected by a bus reset must be
   configured on the same virtual bus in the VM,
   and no devices unaffected by the bus reset may
   be configured on that same virtual bus.

   http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02989.html

Sincerely,
Zhou Jie

On 2016/6/7 0:04, Alex Duyck wrote:

On Mon, Jun 6, 2016 at 2:18 AM, Zhou Jie  wrote:

Hi Alex,


On 2016/1/6 0:18, Alexander Duyck wrote:


On Tue, Jan 5, 2016 at 1:40 AM, Michael S. Tsirkin  wrote:


On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:


The two mechanisms referenced above would likely require coordination
with QEMU and as such are open to discussion.  I haven't attempted to
address them as I am not sure there is a consensus as of yet.  My
personal preference would be to add a vendor-specific configuration
block to the emulated pci-bridge interfaces created by QEMU that would
allow us to essentially extend shpc to support guest live migration
with pass-through devices.



shpc?



That is kind of what I was thinking.  We basically need some mechanism
to allow for the host to ask the device to quiesce.  It has been
proposed to possibly even look at something like an ACPI interface
since I know ACPI is used by QEMU to manage hot-plug in the standard
case.

- Alex




Start by using hot-unplug for this!

Really use your patch guest side, and write host side
to allow starting migration with the device, but
defer completing it.



Yeah, I'm fully on board with this idea, though I'm not really working
on this right now since last I knew the folks on this thread from
Intel were working on it.  My patches were mostly meant to be a nudge
in this direction so that we could get away from the driver specific
code.



I have seen your email about live migration.

I would summarize the idea you proposed as follows.
1. Extend swiotlb to allow for a page dirtying functionality.
2. Use a PCI Express capability to implement a PCI bridge to act
   as a bridge between directly assigned devices and the host bridge.
3. Use an ACPI event or extend the shpc driver to support device pause.
Is that right?

Will you implement the patches for live migration?


That is pretty much the heart of the proposal I had.  I submitted an
RFC as a proof-of-concept for item 1 in the hopes that someone else
might try tackling items 2 and 3 but I haven't seen any updates since
then.  The trick is to find a way to make it so that item 1 doesn't
slow down standard SWIOTLB when you are not migrating a VM. If nothing
else we would probably just need to add a static key that we could
default to false unless there is a PCI bridge indicating we are
starting a migration.

I haven't had time to really work on this though. In addition I am not
that familiar with QEMU and the internals of live migration so pieces
2 and 3 would take me some additional time to work on.

- Alex








Re: Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-06 Thread Alex Duyck
On Mon, Jun 6, 2016 at 2:18 AM, Zhou Jie  wrote:
> Hi Alex,
>
>
> On 2016/1/6 0:18, Alexander Duyck wrote:
>>
>> On Tue, Jan 5, 2016 at 1:40 AM, Michael S. Tsirkin  wrote:
>>>
>>> On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
>>
>> The two mechanisms referenced above would likely require coordination
>> with QEMU and as such are open to discussion.  I haven't attempted to
>> address them as I am not sure there is a consensus as of yet.  My
>> personal preference would be to add a vendor-specific configuration
>> block to the emulated pci-bridge interfaces created by QEMU that would
>> allow us to essentially extend shpc to support guest live migration
>> with pass-through devices.
>
>
> shpc?


 That is kind of what I was thinking.  We basically need some mechanism
 to allow for the host to ask the device to quiesce.  It has been
 proposed to possibly even look at something like an ACPI interface
 since I know ACPI is used by QEMU to manage hot-plug in the standard
 case.

 - Alex
>>>
>>>
>>>
>>> Start by using hot-unplug for this!
>>>
>>> Really use your patch guest side, and write host side
>>> to allow starting migration with the device, but
>>> defer completing it.
>>
>>
>> Yeah, I'm fully on board with this idea, though I'm not really working
>> on this right now since last I knew the folks on this thread from
>> Intel were working on it.  My patches were mostly meant to be a nudge
>> in this direction so that we could get away from the driver specific
>> code.
>
>
> I have seen your email about live migration.
>
> I would summarize the idea you proposed as follows.
> 1. Extend swiotlb to allow for a page dirtying functionality.
> 2. Use a PCI Express capability to implement a PCI bridge to act
>    as a bridge between directly assigned devices and the host bridge.
> 3. Use an ACPI event or extend the shpc driver to support device pause.
> Is that right?
>
> Will you implement the patches for live migration?

That is pretty much the heart of the proposal I had.  I submitted an
RFC as a proof-of-concept for item 1 in the hopes that someone else
might try tackling items 2 and 3 but I haven't seen any updates since
then.  The trick is to find a way to make it so that item 1 doesn't
slow down standard SWIOTLB when you are not migrating a VM. If nothing
else we would probably just need to add a static key that we could
default to false unless there is a PCI bridge indicating we are
starting a migration.
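
Roughly what I have in mind for the static key, purely as a sketch and
not code from the RFC (mark_page_dirty_for_migration() and the enable
hook are invented names), is something like this, so the normal SWIOTLB
paths pay nothing until the emulated bridge announces a migration:

#include <linux/jump_label.h>
#include <linux/types.h>

/* Invented name for whatever actually records the dirty page. */
extern void mark_page_dirty_for_migration(phys_addr_t paddr, size_t size);

/* Defaults to false, so the branch below is patched out on non-migrating VMs. */
static DEFINE_STATIC_KEY_FALSE(dma_page_dirtying);

/* Would be called when the emulated PCI bridge signals that migration starts. */
void dma_dirty_tracking_enable(void)
{
    static_branch_enable(&dma_page_dirtying);
}

/* Hooked into the SWIOTLB unmap/sync paths for device-writable mappings. */
static inline void swiotlb_note_dirty(phys_addr_t paddr, size_t size)
{
    if (static_branch_unlikely(&dma_page_dirtying))
        mark_page_dirty_for_migration(paddr, size);
}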

I haven't had time to really work on this though. In addition I am not
that familiar with QEMU and the internals of live migration so pieces
2 and 3 would take me some additional time to work on.

- Alex


Re: Re: [Qemu-devel] [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-06-06 Thread Zhou Jie

Hi Alex,

On 2016/1/6 0:18, Alexander Duyck wrote:

On Tue, Jan 5, 2016 at 1:40 AM, Michael S. Tsirkin  wrote:

On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:

The two mechanisms referenced above would likely require coordination with
QEMU and as such are open to discussion.  I haven't attempted to address
them as I am not sure there is a consensus as of yet.  My personal
preference would be to add a vendor-specific configuration block to the
emulated pci-bridge interfaces created by QEMU that would allow us to
essentially extend shpc to support guest live migration with pass-through
devices.


shpc?


That is kind of what I was thinking.  We basically need some mechanism
to allow for the host to ask the device to quiesce.  It has been
proposed to possibly even look at something like an ACPI interface
since I know ACPI is used by QEMU to manage hot-plug in the standard
case.

- Alex



Start by using hot-unplug for this!

Really use your patch guest side, and write host side
to allow starting migration with the device, but
defer completing it.


Yeah, I'm fully on board with this idea, though I'm not really working
on this right now since last I knew the folks on this thread from
Intel were working on it.  My patches were mostly meant to be a nudge
in this direction so that we could get away from the driver specific
code.


I have seen your email about live migration.

I would summarize the idea you proposed as follows.
1. Extend swiotlb to allow for a page dirtying functionality.
2. Use a PCI Express capability to implement a PCI bridge to act
   as a bridge between directly assigned devices and the host bridge.
3. Use an ACPI event or extend the shpc driver to support device pause.
Is that right?

Will you implement the patches for live migration?

Sincerely,
Zhou Jie





So

1.- host tells guest to start tracking memory writes
2.- guest acks
3.- migration starts
4.- most memory is migrated
5.- host tells guest to eject device
6.- guest acks
7.- stop vm and migrate rest of state



Sounds about right.  The only way this differs from what I see as the
final solution is that instead of fully ejecting the device in step 5,
the driver would pause the device and give the host something like 10
seconds to stop the VM and resume with the same device connected if it
is available.  We would probably also need a way to force the device to
be ejected, or to abort, prior to starting the migration if the guest
doesn't give us the ack in step 2.
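
Written down as a guest-visible state machine (not from any patch, and
with made-up state names), the sequence including the
pause-instead-of-eject variant would look something like:

/* One possible way to number the handshake steps discussed above. */
enum mig_handshake_state {
    MIG_IDLE,        /* no migration in progress                              */
    MIG_TRACK_REQ,   /* 1. host asks guest to start dirty-page tracking       */
    MIG_TRACK_ACK,   /* 2. guest acks; 3./4. bulk of memory is migrated       */
    MIG_PAUSE_REQ,   /* 5. host asks guest to pause (rather than eject) the VF */
    MIG_PAUSE_ACK,   /* 6. guest acks; DMA quiesced, ~10s window for the host  */
    MIG_DONE,        /* 7. VM stopped, remaining state migrated                */
};

/*
 * If MIG_TRACK_ACK never arrives, the host would have to fall back to
 * ejecting the device (or aborting) before starting the migration.
 */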


It will already be a win since hot unplug after migration starts and
most memory has been migrated is better than hot unplug before migration
starts.


Right.  Generally, the longer the VF can be kept as a part of the
guest, the longer we retain the network performance advantage over a
purely virtual interface.


Then measure downtime and profile.  Then we can look at ways
to quiesce the device faster, which really means step 5 is replaced
with "host tells guest to quiesce the device and dirty (or just unmap!)
all memory mapped for write by the device".


Step 5 will be the spot where we really need to start modifying
drivers.  Specifically we probably need to go through and clean-up
things so that we can reduce as many of the delays in the driver
suspend/resume path as possible.  I suspect there is quite a bit that
can be done there that would probably also improve boot and shutdown
times since those are also impacted by the devices.

- Alex








