Re: live migration vs device assignment (motivation)

2015-12-10 Thread Lan, Tianyu

On 12/10/2015 4:38 PM, Michael S. Tsirkin wrote:

> Let's assume you do save state and do have a way to detect
> whether state matches a given hardware. For example,
> driver could store firmware and hardware versions
> in the state, and then on destination, retrieve them
> and compare. It will be pretty common that you have a mismatch,
> and you must not just fail migration. You need a way to recover,
> maybe with more downtime.
>
> Second, you can change the driver but you can not be sure it will have
> the chance to run at all. Host overload is a common reason to migrate
> out of the host.  You also can not trust guest to do the right thing.
> So how long do you want to wait until you decide guest is not
> cooperating and kill it?  Most people will probably experiment a bit and
> then add a bit of a buffer. This is not robust at all.
>
> Again, maybe you ask driver to save state, and if it does
> not respond for a while, then you still migrate,
> and driver has to recover on destination.
>
> With the above in mind, you need to support two paths:
> 1. "good path": driver stores state on source, checks it on destination
> detects a match and restores state into the device
> 2. "bad path": driver does not store state, or detects a mismatch
> on destination. driver has to assume device was lost,
> and reset it
>
> So what I am saying is, implement bad path first. Then good path
> is an optimization - measure whether it's faster, and by how much.
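
The two paths above can be sketched as a small userspace C model. This is only an illustration under stated assumptions, not driver code and not taken from the patches under discussion: the struct layouts, the four-register device and the names `saved_state`, `device_model` and `resume_on_destination()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 4  /* hypothetical: stand-in for real device registers */

/* Hypothetical blob the driver would save on the source. */
struct saved_state {
    bool     valid;       /* false if the driver never got to save anything */
    uint32_t hw_version;
    uint32_t fw_version;
    uint32_t regs[NUM_REGS];
};

/* Hypothetical view of the device found on the destination. */
struct device_model {
    uint32_t hw_version;
    uint32_t fw_version;
    uint32_t regs[NUM_REGS];
};

enum resume_path { PATH_GOOD, PATH_BAD };

/* Bad path: assume the device was lost and bring it up from scratch. */
static void device_reset(struct device_model *dev)
{
    for (size_t i = 0; i < NUM_REGS; i++)
        dev->regs[i] = 0;
}

/* State is usable only if it exists and both versions match. */
static bool state_matches(const struct saved_state *st,
                          const struct device_model *dev)
{
    return st != NULL && st->valid &&
           st->hw_version == dev->hw_version &&
           st->fw_version == dev->fw_version;
}

/* Restore on a match (good path), otherwise reset (bad path). */
enum resume_path resume_on_destination(struct device_model *dev,
                                       const struct saved_state *st)
{
    assert(dev != NULL);

    if (!state_matches(st, dev)) {
        device_reset(dev);
        return PATH_BAD;
    }

    for (size_t i = 0; i < NUM_REGS; i++)
        dev->regs[i] = st->regs[i];
    return PATH_GOOD;
}
```

Passing `NULL`, or a blob with `valid` set to false, models the case where the driver did not respond in time on the source and migration went ahead anyway. Because the reset branch is the fallback for every failure, it is the one that has to work first.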



These sound reasonable. The driver should be able to do such a check
to ensure hardware and firmware coherence after migration, and to reset the
device when migration happens at some unexpected point.

> Also, it would be nice if on the bad path there was a way
> to switch to another driver entirely, even if that means
> a bit more downtime. For example, have a way for driver to
> tell Linux it has to re-do probing for the device.

I just glanced at the device core code. device_reprobe() does what you said.

/**
 * device_reprobe - remove driver for a device and probe for a new  driver
 * @dev: the device to reprobe
 *
 * This function detaches the attached driver (if any) for the given
 * device and restarts the driver probing process.  It is intended
 * to use if probing criteria changed during a devices lifetime and
 * driver attachment should change accordingly.
 */
int device_reprobe(struct device *dev)





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: live migration vs device assignment (motivation)

2015-12-10 Thread Michael S. Tsirkin
On Thu, Dec 10, 2015 at 11:04:54AM +0800, Lan, Tianyu wrote:
> 
> On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:
> >On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> >>On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> >>>I thought about what this is doing at the high level, and I do have some
> >>>value in what you are trying to do, but I also think we need to clarify
> >>>the motivation a bit more.  What you are saying is not really what the
> >>>patches are doing.
> >>>
> >>>And with that clearer understanding of the motivation in mind (assuming
> >>>it actually captures a real need), I would also like to suggest some
> >>>changes.
> >>
> >>Motivation:
> >>Most current solutions for migration with passthrough device are based on
> >>the PCI hotplug but it has side effects and can't work for all devices.
> >>
> >>For NIC device:
> >>PCI hotplug solution can work around Network device migration
> >>via switching VF and PF.
> >
> >This is just more confusion. hotplug is just a way to add and remove
> >devices. switching VF and PF is up to guest and hypervisor.
> 
> This is a combination. Because it's not possible to migrate device state in
> the current world during migration (what we are working on), existing solutions
> for migrating a VM with a passthrough NIC rely on PCI hotplug.

That's where you go wrong I think. This marketing speak about solution
of migrating VM with passthrough is just confusing people.

There's no way to do migration with device passthrough on KVM at the
moment, in particular because of lack of way for host to save and
restore device state, and you do not propose a way either.

So how do people migrate? Stop doing device passthrough.
So what I think your patches do is add ability to do the two things
in parallel: stop doing passthrough and start migration.
You still can not migrate with passthrough.

> Unplug VF
> before starting migration and then switch network from VF NIC to PV NIC
> in order to maintain the network connection.

Again, this is mixing unrelated things.  This switching is not really
related to migration. You can do this at any time for any number of
reasons.  If migration takes a lot of time and if you unplug before
migration, then switching to another interface might make sense.
But it's question of policy.

> Plug VF again after
> migration and then switch from PV back to VF. Bond driver provides a way to
> switch between PV and VF NIC automatically while keeping the IP and MAC, so
> the bond driver is preferred.

Preferred over switching manually? As long as it works well, sure.  But
one can come up with other techniques.  For example, don't switch. Save
ip, mac etc, remove source device and add the destination one.  You were
also complaining that the switch took too long.

> >
> >>But switching network interface will introduce service down time.
> >>
> >>I tested the service down time via putting VF and PV interface
> >>into a bonded interface and ping the bonded interface during plug
> >>and unplug VF.
> >>1) About 100ms when add VF
> >>2) About 30ms when del VF
> >
> >OK and what's the source of the downtime?
> >I'm guessing that's just arp being repopulated.  So simply save and
> >re-populate it.
> >
> >There would be a much cleaner solution.
> >
> >Or maybe there's a timer there that just delays hotplug
> >for no reason. Fix it, everyone will benefit.
> >
> >>It also requires guest to do switch configuration.
> >
> >That's just wrong. if you want a switch, you need to
> >configure a switch.
> 
> I meant the config of switching operation between PV and VF.

I see. So sure, there are many ways to configure networking
on linux. You seem to see this as a downside and so want
to hardcode a single configuration into the driver.

> >
> >>These are hard to
> >>manage and deploy from our customers.
> >
> >So kernel want to remain flexible, and the stack is
> >configurable. Downside: customers need to deploy userspace
> >to configure it. Your solution: a hard-coded configuration
> >within kernel and hypervisor.  Sorry, this makes no sense.
> >If kernel is easier for you to deploy than userspace,
> >you need to rethink your deployment strategy.
> 
> This is one factor.
> 
> >
> >>To maintain PV performance during
> >>migration, host side also needs to assign a VF to PV device. This
> >>affects scalability.
> >
> >No idea what this means.
> >
> >>These factors block SRIOV NIC passthrough usage in the cloud service and
> >>OPNFV which require network high performance and stability a lot.
> >
> >Everyone needs performance and scalability.
> >
> >>
> >>For other kind of devices, it's hard to work.
> >>We are also adding migration support for QAT(QuickAssist Technology) device.
> >>
> >>QAT device use case introduction.
> >>Server, networking, big data, and storage applications use QuickAssist
> >>Technology to offload servers from handling compute-intensive operations,
> >>such as:
> >>1) Symmetric cryptography functions including cipher operations and
> >>authentication op

Re: live migration vs device assignment (motivation)

2015-12-09 Thread Lan, Tianyu



On 12/10/2015 1:14 AM, Alexander Duyck wrote:

> On Wed, Dec 9, 2015 at 8:26 AM, Lan, Tianyu wrote:
>> For other kind of devices, it's hard to work.
>> We are also adding migration support for QAT(QuickAssist Technology) device.
>>
>> QAT device use case introduction.
>> Server, networking, big data, and storage applications use QuickAssist
>> Technology to offload servers from handling compute-intensive operations,
>> such as:
>> 1) Symmetric cryptography functions including cipher operations and
>> authentication operations
>> 2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
>> cryptography
>> 3) Compression and decompression functions including DEFLATE and LZS
>>
>> PCI hotplug will not work for such devices during migration and these
>> operations will fail when unplug device.
>
> I assume the problem is that with a PCI hotplug event you are losing
> the state information for the device, do I have that right?
>
> Looking over the QAT drivers it doesn't seem like any of them support
> the suspend/resume PM calls.  I would imagine it makes it difficult
> for a system with a QAT card in it to be able to drop the system to a
> low power state.  You might want to try enabling suspend/resume
> support for the devices on bare metal before you attempt to take on
> migration as it would provide you with a good testing framework to see
> what needs to be saved/restored within the device and in what order
> before you attempt to do the same while migrating from one system to
> another.

Sure. The suspend/resume work is under way.
Actually, we already have QAT working with migration internally and are
doing more testing and fixing bugs.

> - Alex




Re: live migration vs device assignment (motivation)

2015-12-09 Thread Lan, Tianyu


On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:

> On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
>> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
>>> I thought about what this is doing at the high level, and I do have some
>>> value in what you are trying to do, but I also think we need to clarify
>>> the motivation a bit more.  What you are saying is not really what the
>>> patches are doing.
>>>
>>> And with that clearer understanding of the motivation in mind (assuming
>>> it actually captures a real need), I would also like to suggest some
>>> changes.
>>
>> Motivation:
>> Most current solutions for migration with passthrough device are based on
>> the PCI hotplug but it has side effects and can't work for all devices.
>>
>> For NIC device:
>> PCI hotplug solution can work around Network device migration
>> via switching VF and PF.
>
> This is just more confusion. hotplug is just a way to add and remove
> devices. switching VF and PF is up to guest and hypervisor.

This is a combination. Because it's not possible to migrate device state in
the current world during migration (what we are working on), existing solutions
for migrating a VM with a passthrough NIC rely on PCI hotplug. Unplug VF
before starting migration and then switch network from VF NIC to PV NIC
in order to maintain the network connection. Plug VF again after
migration and then switch from PV back to VF. Bond driver provides a way
to switch between PV and VF NIC automatically while keeping the IP and MAC,
so the bond driver is preferred.

>> But switching network interface will introduce service down time.
>>
>> I tested the service down time via putting VF and PV interface
>> into a bonded interface and ping the bonded interface during plug
>> and unplug VF.
>> 1) About 100ms when add VF
>> 2) About 30ms when del VF
>
> OK and what's the source of the downtime?
> I'm guessing that's just arp being repopulated.  So simply save and
> re-populate it.
>
> There would be a much cleaner solution.
>
> Or maybe there's a timer there that just delays hotplug
> for no reason. Fix it, everyone will benefit.
>
>> It also requires guest to do switch configuration.
>
> That's just wrong. if you want a switch, you need to
> configure a switch.

I meant the config of switching operation between PV and VF.

>> These are hard to
>> manage and deploy from our customers.
>
> So kernel want to remain flexible, and the stack is
> configurable. Downside: customers need to deploy userspace
> to configure it. Your solution: a hard-coded configuration
> within kernel and hypervisor.  Sorry, this makes no sense.
> If kernel is easier for you to deploy than userspace,
> you need to rethink your deployment strategy.

This is one factor.

>> To maintain PV performance during
>> migration, host side also needs to assign a VF to PV device. This
>> affects scalability.
>
> No idea what this means.
>
>> These factors block SRIOV NIC passthrough usage in the cloud service and
>> OPNFV which require network high performance and stability a lot.
>
> Everyone needs performance and scalability.
>
>> For other kind of devices, it's hard to work.
>> We are also adding migration support for QAT(QuickAssist Technology) device.
>>
>> QAT device use case introduction.
>> Server, networking, big data, and storage applications use QuickAssist
>> Technology to offload servers from handling compute-intensive operations,
>> such as:
>> 1) Symmetric cryptography functions including cipher operations and
>> authentication operations
>> 2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
>> cryptography
>> 3) Compression and decompression functions including DEFLATE and LZS
>>
>> PCI hotplug will not work for such devices during migration and these
>> operations will fail when unplug device.
>>
>> So we are trying to implement a new solution which really migrates
>> device state to target machine and won't affect user during migration
>> with low service down time.
>
> Let's assume for the sake of the argument that there's a lot going on
> and removing the device is just too slow (though you should figure out
> what's going on before giving up and just building something new from
> scratch).

No, we can find a PV NIC as a backup for the VF NIC during migration, but
that doesn't work for other kinds of devices since there is no backup for
them. E.g. when migration happens while a user is compressing files via QAT,
it's impossible to remove the QAT device at that point. If we do that, the
compress operation will fail and affect user experience.

> I still don't think you should be migrating state.  That's just too
> fragile, and it also means you depend on driver to be nice and shut down
> device on source, so you can not migrate at will.  Instead, reset device
> on destination and re-initialize it.

Yes, saving and restoring device state relies on the driver, so we are
reworking the driver to make it more migration-friendly.





Re: live migration vs device assignment (motivation)

2015-12-09 Thread Michael S. Tsirkin
On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> >I thought about what this is doing at the high level, and I do have some
> >value in what you are trying to do, but I also think we need to clarify
> >the motivation a bit more.  What you are saying is not really what the
> >patches are doing.
> >
> >And with that clearer understanding of the motivation in mind (assuming
> >it actually captures a real need), I would also like to suggest some
> >changes.
> 
> Motivation:
> Most current solutions for migration with passthrough device are based on
> the PCI hotplug but it has side effects and can't work for all devices.
> 
> For NIC device:
> PCI hotplug solution can work around Network device migration
> via switching VF and PF.

This is just more confusion. hotplug is just a way to add and remove
devices. switching VF and PF is up to guest and hypervisor.

> But switching network interface will introduce service down time.
> 
> I tested the service down time via putting VF and PV interface
> into a bonded interface and ping the bonded interface during plug
> and unplug VF.
> 1) About 100ms when add VF
> 2) About 30ms when del VF

OK and what's the source of the downtime?
I'm guessing that's just arp being repopulated.  So simply save and
re-populate it.

There would be a much cleaner solution.

Or maybe there's a timer there that just delays hotplug
for no reason. Fix it, everyone will benefit.

> It also requires guest to do switch configuration.

That's just wrong. if you want a switch, you need to
configure a switch.

> These are hard to
> manage and deploy from our customers.

So kernel want to remain flexible, and the stack is
configurable. Downside: customers need to deploy userspace
to configure it. Your solution: a hard-coded configuration
within kernel and hypervisor.  Sorry, this makes no sense.
If kernel is easier for you to deploy than userspace,
you need to rethink your deployment strategy.

> To maintain PV performance during
> migration, host side also needs to assign a VF to PV device. This
> affects scalability.

No idea what this means.

> These factors block SRIOV NIC passthrough usage in the cloud service and
> OPNFV which require network high performance and stability a lot.

Everyone needs performance and scalability.

> 
> For other kind of devices, it's hard to work.
> We are also adding migration support for QAT(QuickAssist Technology) device.
> 
> QAT device use case introduction.
> Server, networking, big data, and storage applications use QuickAssist
> Technology to offload servers from handling compute-intensive operations,
> such as:
> 1) Symmetric cryptography functions including cipher operations and
> authentication operations
> 2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
> cryptography
> 3) Compression and decompression functions including DEFLATE and LZS
> 
> PCI hotplug will not work for such devices during migration and these
> operations will fail when unplug device.
> 
> So we are trying to implement a new solution which really migrates
> device state to target machine and won't affect user during migration
> with low service down time.

Let's assume for the sake of the argument that there's a lot going on
and removing the device is just too slow (though you should figure out
what's going on before giving up and just building something new from
scratch).

I still don't think you should be migrating state.  That's just too
fragile, and it also means you depend on driver to be nice and shut down
device on source, so you can not migrate at will.  Instead, reset device
on destination and re-initialize it.

-- 
MST


Re: live migration vs device assignment (motivation)

2015-12-09 Thread Alexander Duyck
On Wed, Dec 9, 2015 at 8:26 AM, Lan, Tianyu  wrote:

> For other kind of devices, it's hard to work.
> We are also adding migration support for QAT(QuickAssist Technology) device.
>
> QAT device use case introduction.
> Server, networking, big data, and storage applications use QuickAssist
> Technology to offload servers from handling compute-intensive operations,
> such as:
> 1) Symmetric cryptography functions including cipher operations and
> authentication operations
> 2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
> cryptography
> 3) Compression and decompression functions including DEFLATE and LZS
>
> PCI hotplug will not work for such devices during migration and these
> operations will fail when unplug device.

I assume the problem is that with a PCI hotplug event you are losing
the state information for the device, do I have that right?

Looking over the QAT drivers it doesn't seem like any of them support
the suspend/resume PM calls.  I would imagine it makes it difficult
for a system with a QAT card in it to be able to drop the system to a
low power state.  You might want to try enabling suspend/resume
support for the devices on bare metal before you attempt to take on
migration as it would provide you with a good testing framework to see
what needs to be saved/restored within the device and in what order
before you attempt to do the same while migrating from one system to
another.

- Alex


Re: live migration vs device assignment (motivation)

2015-12-09 Thread Lan, Tianyu

On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:

> I thought about what this is doing at the high level, and I do have some
> value in what you are trying to do, but I also think we need to clarify
> the motivation a bit more.  What you are saying is not really what the
> patches are doing.
>
> And with that clearer understanding of the motivation in mind (assuming
> it actually captures a real need), I would also like to suggest some
> changes.

Motivation:
Most current solutions for migration with passthrough device are based on
the PCI hotplug but it has side effects and can't work for all devices.

For NIC device:
PCI hotplug solution can work around Network device migration
via switching VF and PF.

But switching network interface will introduce service down time.

I tested the service down time via putting VF and PV interface
into a bonded interface and ping the bonded interface during plug
and unplug VF.
1) About 100ms when add VF
2) About 30ms when del VF
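
The message does not say how these figures were derived from the ping output. One simple way, sketched here purely as an assumption, is to record the arrival time of each echo reply and report the longest gap beyond the send interval. The function name `longest_outage_ms()` and the millisecond units are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * reply_ms:    arrival times of echo replies in milliseconds, ascending
 * n:           number of entries in reply_ms
 * interval_ms: interval at which echo requests were sent
 *
 * Returns the longest stretch with no replies, beyond the normal send
 * interval. Returns 0 when there are fewer than two replies.
 */
long longest_outage_ms(const long *reply_ms, size_t n, long interval_ms)
{
    long worst = 0;

    assert(reply_ms != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        long gap = reply_ms[i] - reply_ms[i - 1] - interval_ms;
        if (gap > worst)
            worst = gap;
    }
    return worst;
}
```

The resolution of such a measurement is bounded by the send interval, so an interval well below the expected downtime is needed for figures in the tens of milliseconds to be meaningful.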

It also requires guest to do switch configuration. These are hard to
manage and deploy from our customers. To maintain PV performance during
migration, host side also needs to assign a VF to PV device. This
affects scalability.

These factors block SRIOV NIC passthrough usage in the cloud service and
OPNFV which require network high performance and stability a lot.

For other kind of devices, it's hard to work.
We are also adding migration support for QAT(QuickAssist Technology) device.

QAT device use case introduction.
Server, networking, big data, and storage applications use QuickAssist
Technology to offload servers from handling compute-intensive operations,
such as:
1) Symmetric cryptography functions including cipher operations and
authentication operations
2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
cryptography
3) Compression and decompression functions including DEFLATE and LZS

PCI hotplug will not work for such devices during migration and these
operations will fail when unplug device.

So we are trying to implement a new solution which really migrates
device state to target machine and won't affect user during migration
with low service down time.