Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-27 Thread Michael S. Tsirkin
On Fri, Dec 25, 2015 at 02:31:14PM -0800, Alexander Duyck wrote:
> The PCI hot-plug specification calls out that the OS can optionally
> implement a "pause" mechanism which is meant to be used for
> high-availability environments.  What I am proposing is basically
> extending the standard SHPC-capable PCI bridge so that we can support
> DMA page dirtying for everything hosted on it, add a vendor-specific
> block to the config space so that the guest can notify the host that
> it will do page dirtying, and add a mechanism to indicate that all
> hot-plug events during the warm-up phase of the migration are pause
> events instead of full removals.

Two comments:

1. A vendor-specific capability will always be problematic.
Better to register a capability ID with the PCI SIG.

2. There are actually several capabilities:

A. support for memory dirtying
if not supported, we must stop the device before migration

This is supported by core guest OS code,
using patches similar to those posted by you
(see the sketch below).


B. support for device replacement
This is a faster form of hotplug, where a device is removed and
later another device using the same driver is inserted in the same slot.

This is a possible optimization, but I am convinced
(A) should be implemented independently of (B).
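
For (A), a rough sketch of what the guest side could look like
(illustrative C only; dma_mark_pages_dirty is a made-up name and this
is not the actual posted patch set):

/*
 * Sketch: make device-written DMA pages visible to the hypervisor's
 * dirty logging by touching them from the CPU.  PAGE_SIZE, READ_ONCE
 * and WRITE_ONCE are the usual Linux kernel helpers; the function
 * itself is hypothetical.  It would be called from the unmap/sync
 * path of device-to-memory DMA mappings.
 */
static void dma_mark_pages_dirty(void *vaddr, size_t size)
{
        unsigned char *p = vaddr;
        size_t off;

        for (off = 0; off < size; off += PAGE_SIZE) {
                /* A plain CPU write marks the page dirty in the log. */
                WRITE_ONCE(p[off], READ_ONCE(p[off]));
        }
}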




> I've been poking around in the kernel and QEMU code and the part I
> have been trying to sort out is how to get the QEMU-based pci-bridge
> to use the SHPC driver, because from what I can tell the driver never
> actually gets loaded on the device, as it is left in the control of
> ACPI hot-plug.
> 
> - Alex

There are ways, but you can just use PCI Express; it's easier.

-- 
MST


Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-27 Thread Alexander Duyck
On Sun, Dec 27, 2015 at 1:21 AM, Michael S. Tsirkin  wrote:
> On Fri, Dec 25, 2015 at 02:31:14PM -0800, Alexander Duyck wrote:
>> The PCI hot-plug specification calls out that the OS can optionally
>> implement a "pause" mechanism which is meant to be used for
>> high-availability environments.  What I am proposing is basically
>> extending the standard SHPC-capable PCI bridge so that we can support
>> DMA page dirtying for everything hosted on it, add a vendor-specific
>> block to the config space so that the guest can notify the host that
>> it will do page dirtying, and add a mechanism to indicate that all
>> hot-plug events during the warm-up phase of the migration are pause
>> events instead of full removals.
>
> Two comments:
>
> 1. A vendor-specific capability will always be problematic.
> Better to register a capability ID with the PCI SIG.
>
> 2. There are actually several capabilities:
>
> A. support for memory dirtying
> if not supported, we must stop the device before migration
>
> This is supported by core guest OS code,
> using patches similar to those posted by you.
>
>
> B. support for device replacement
> This is a faster form of hotplug, where a device is removed and
> later another device using the same driver is inserted in the same slot.
>
> This is a possible optimization, but I am convinced
> (A) should be implemented independently of (B).
>

My thought on this was that we don't need much to implement either
feature; really only a bit or two for each.  I had thought about
extending the PCI Advanced Features capability, but for now it might
make more sense to implement this as a vendor capability for the
QEMU-based bridges instead of trying to make it a true PCI capability,
since I am not sure this would apply to physical hardware in any way.
The fact is the PCI Advanced Features capability is essentially just a
vendor-specific capability with a different ID, so if we were to use
two bits that are currently reserved in that capability we could later
merge the functionality without much overhead.

I fully agree that the two implementations should be separate, but
nothing says we have to implement them completely differently.  If we
are just using three bits for capability, status, and control of each
feature, there is no reason for them to be stored in separate
locations.  A sketch of such a layout follows below.
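
For illustration, such a layout could be as small as one register (bit
names and positions below are hypothetical, not an actual QEMU or
PCI-SIG definition):

/* Hypothetical single-byte register in the vendor-specific (or,
 * later, Advanced Features) capability; three bits per feature. */
#define MIG_CAP_DIRTY   (1 << 0)  /* capability: DMA dirty tracking offered */
#define MIG_CTL_DIRTY   (1 << 1)  /* control: guest will do page dirtying */
#define MIG_STA_DIRTY   (1 << 2)  /* status: dirty tracking in effect */
#define MIG_CAP_PAUSE   (1 << 3)  /* capability: hot-plug "pause" offered */
#define MIG_CTL_PAUSE   (1 << 4)  /* control: treat removals as pauses */
#define MIG_STA_PAUSE   (1 << 5)  /* status: device currently paused */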

>> I've been poking around in the kernel and QEMU code and the part I
>> have been trying to sort out is how to get the QEMU-based pci-bridge
>> to use the SHPC driver, because from what I can tell the driver never
>> actually gets loaded on the device, as it is left in the control of
>> ACPI hot-plug.
>
> There are ways, but you can just use PCI Express; it's easier.

That's true.  I should probably give up on trying to do an
implementation that works with i440fx.  I could move over to q35, and
once that is done we could look at something like the PCI Advanced
Features solution for the PCI-bridge drivers.

- Alex


How to reserve guest physical region for ACPI

2015-12-27 Thread Xiao Guangrong


Hi Michael, Paolo,

Now it is time to return to the challenge of how to reserve a guest
physical region internally used by ACPI.

Igor suggested that:
| An alternative place to allocate the reservation from could be high memory.
| For pc we have "reserved-memory-end" which currently makes sure
| that the hotpluggable memory range isn't used by firmware
(https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)

He also suggested a way to use 64-bit addresses in a rev = 1 DSDT/SSDT:
| when writing ASL one shall make sure that only XP-supported
| features are in global scope, which is evaluated when tables
| are loaded, and features of rev2 and higher are inside methods.
| That way XP doesn't crash as long as it doesn't evaluate unsupported
| opcodes, and one can guard those opcodes by checking the _REV object
| if necessary.
(https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
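
For illustration, a minimal ASL sketch of the guard pattern Igor
describes (method name and return values are hypothetical):

Method (MIG0, 0, NotSerialized)
{
    // A rev1 interpreter (e.g. Windows XP) bails out here and never
    // reaches the opcodes below.
    If (LLess (_REV, 2)) {
        Return (Zero)
    }
    // Only rev2-and-higher interpreters evaluate this part of the
    // method body, so 64-bit wide operations are safe from here on.
    Return (One)
}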

Michael, Paolo, what do you think about these ideas?

Thanks!


Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-27 Thread Alexander Duyck
On Sun, Dec 27, 2015 at 7:20 PM, Dong, Eddie  wrote:
>> >
>> > Even if the device driver doesn't support migration, do you still
>> > want to migrate the VM?  That may be risky, and we should at least
>> > add a "bad path" for the driver.
>>
>> At a minimum we should have support for hot-plug if we are expecting
>> to support migration.  You would simply have to hot-unplug the device
>> before you start migration and then return it afterwards.  That is
>> how the current bonding approach for this works, if I am not mistaken.
>
> Hotplug is good for eliminating device-specific state cloning, but
> the bonding approach is very network-specific; it doesn't work for
> other devices such as FPGA, QAT, and GPU devices, which we plan to
> support gradually :)

Hotplug would be usable for that, assuming the guest supports the
optional "pause" implementation as called out in the PCI hot-plug
spec.  With that, the device can maintain state for some period of
time after the hot-plug remove event has occurred.

The problem is that you have to get the device to quiesce at some
point, as you cannot complete the migration with the device still
active.  The way you were doing it was via the per-device
configuration-space mechanism.  That doesn't scale when you have to
implement it for each and every driver on each and every OS you have
to support.  Using the "pause" implementation for hot-plug would have
a much greater likelihood of scaling: you could either take the fast
path of "pausing" the device and resuming it when migration has
completed, or fall back to removing the device and restarting the
driver on the other side if pause support is not yet implemented.  You
would lose the device state in such a migration, but it is much more
practical than having to implement a per-device solution.  A sketch of
the two paths follows below.
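
As a rough illustration, the two paths could look like this on the
host side (hypothetical C pseudocode; none of these functions exist in
QEMU):

/* Sketch only: fast path via hot-plug "pause" vs. fallback via full
 * removal.  Every name here is made up for illustration. */
int migrate_with_assigned_device(struct vm *vm, struct pci_dev *dev)
{
    if (guest_supports_hotplug_pause(vm)) {
        /* Fast path: device state survives the migration. */
        hotplug_send_pause_event(vm, dev);
        wait_for_device_quiesce(vm, dev);   /* DMA must be stopped */
        live_migrate(vm);
        hotplug_send_resume_event(vm, dev);
    } else {
        /* Fallback: state is lost; the driver reinitializes on the
         * destination after the device is re-added. */
        hotplug_remove_device(vm, dev);
        live_migrate(vm);
        hotplug_add_device(vm, dev);
    }
    return 0;
}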

- Alex


RE: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-27 Thread Dong, Eddie
> >
> > Even if the device driver doesn't support migration, do you still
> > want to migrate the VM?  That may be risky, and we should at least
> > add a "bad path" for the driver.
> 
> At a minimum we should have support for hot-plug if we are expecting
> to support migration.  You would simply have to hot-unplug the device
> before you start migration and then return it afterwards.  That is
> how the current bonding approach for this works, if I am not mistaken.

Hotplug is good for eliminating device-specific state cloning, but the
bonding approach is very network-specific; it doesn't work for other
devices such as FPGA, QAT, and GPU devices, which we plan to support
gradually :)

> 
> The advantage we are looking to gain is to avoid removing/disabling the
> device for as long as possible.  Ideally we want to keep the device active
> through the warm-up period, but if the guest doesn't do that we should still
> be able to fall back on the older approaches if needed.
> 