Re: live migration vs device assignment (motivation)
On 12/10/2015 4:38 PM, Michael S. Tsirkin wrote:
> Let's assume you do save state and do have a way to detect whether
> state matches a given hardware. For example, driver could store
> firmware and hardware versions in the state, and then on destination,
> retrieve them and compare. It will be pretty common that you have a
> mismatch, and you must not just fail migration. You need a way to
> recover, maybe with more downtime.
>
> Second, you can change the driver but you can not be sure it will have
> the chance to run at all. Host overload is a common reason to migrate
> out of the host. You also can not trust guest to do the right thing.
> So how long do you want to wait until you decide guest is not
> cooperating and kill it? Most people will probably experiment a bit
> and then add a bit of a buffer. This is not robust at all.
>
> Again, maybe you ask driver to save state, and if it does not respond
> for a while, then you still migrate, and driver has to recover on
> destination.
>
> With the above in mind, you need to support two paths:
> 1. "good path": driver stores state on source, checks it on
>    destination, detects a match and restores state into the device
> 2. "bad path": driver does not store state, or detects a mismatch on
>    destination. driver has to assume device was lost, and reset it
>
> So what I am saying is, implement bad path first. Then good path is an
> optimization - measure whether it's faster, and by how much.

These sound reasonable. The driver should have the ability to do such a
check to ensure hardware and firmware coherence after migration, and to
reset the device when migration happens at some unexpected point.

> Also, it would be nice if on the bad path there was a way to switch to
> another driver entirely, even if that means a bit more downtime. For
> example, have a way for driver to tell Linux it has to re-do probing
> for the device.

I just glanced at the device core code; device_reprobe() does what you
describe:
/**
 * device_reprobe - remove driver for a device and probe for a new driver
 * @dev: the device to reprobe
 *
 * This function detaches the attached driver (if any) for the given
 * device and restarts the driver probing process. It is intended
 * to use if probing criteria changed during a devices lifetime and
 * driver attachment should change accordingly.
 */
int device_reprobe(struct device *dev)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: live migration vs device assignment (motivation)
On Thu, Dec 10, 2015 at 11:04:54AM +0800, Lan, Tianyu wrote:
> On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:
> > On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> > > On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> > > > I thought about what this is doing at the high level, and I do
> > > > see some value in what you are trying to do, but I also think we
> > > > need to clarify the motivation a bit more. What you are saying
> > > > is not really what the patches are doing.
> > > >
> > > > And with that clearer understanding of the motivation in mind
> > > > (assuming it actually captures a real need), I would also like
> > > > to suggest some changes.
> > >
> > > Motivation:
> > > Most current solutions for migration with a passthrough device are
> > > based on PCI hotplug, but that has side effects and can't work for
> > > all devices.
> > >
> > > For NIC devices:
> > > The PCI hotplug solution can work around network device migration
> > > by switching between VF and PF.
> >
> > This is just more confusion. Hotplug is just a way to add and remove
> > devices. Switching between VF and PF is up to the guest and
> > hypervisor.
>
> This is a combination. Because it's not possible to migrate device
> state during migration in the current world (which is what we are
> working on), existing solutions for migrating a VM with a passthrough
> NIC rely on PCI hotplug.

That's where you go wrong, I think. This marketing speak about a
"solution for migrating a VM with passthrough" is just confusing
people. There's no way to do migration with device passthrough on KVM
at the moment, in particular because the host has no way to save and
restore device state, and you do not propose one either.

So how do people migrate? They stop doing device passthrough.

So what I think your patches do is add the ability to do the two
things in parallel: stop doing passthrough and start migration. You
still can not migrate with passthrough.

> Unplug the VF before starting migration and then switch the network
> from the VF NIC to a PV NIC in order to maintain the network
> connection.

Again, this is mixing unrelated things. This switching is not really
related to migration. You can do this at any time for any number of
reasons. If migration takes a lot of time and you unplug before
migration, then switching to another interface might make sense. But
it's a question of policy.

> Plug the VF back in after migration and then switch from PV back to
> VF. The bond driver provides a way to switch between the PV and VF
> NICs automatically while keeping the same IP and MAC, so the bond
> driver is preferred.

Preferred over switching manually? As long as it works well, sure. But
one can come up with other techniques. For example, don't switch: save
the IP, MAC, etc., remove the source device and add the destination
one. You were also complaining that the switch took too long.

> > > But switching network interfaces will introduce service downtime.
> > >
> > > I tested the service downtime by putting the VF and PV interfaces
> > > into a bonded interface and pinging the bonded interface while
> > > plugging and unplugging the VF:
> > > 1) About 100ms when adding the VF
> > > 2) About 30ms when deleting the VF
> >
> > OK, and what's the source of the downtime? I'm guessing that's just
> > ARP being repopulated. So simply save and re-populate it.
> >
> > There would be a much cleaner solution.
> >
> > Or maybe there's a timer there that just delays hotplug for no
> > reason. Fix it, everyone will benefit.
> >
> > > It also requires the guest to do switch configuration.
> >
> > That's just wrong. If you want a switch, you need to configure a
> > switch.
>
> I meant the configuration of the switching operation between PV and
> VF.

I see. So sure, there are many ways to configure networking on Linux.
You seem to see this as a downside and so want to hardcode a single
configuration into the driver.

> > > These are hard to manage and deploy for our customers.
> >
> > So the kernel wants to remain flexible, and the stack is
> > configurable. Downside: customers need to deploy userspace to
> > configure it. Your solution: a hard-coded configuration within the
> > kernel and hypervisor. Sorry, this makes no sense. If the kernel is
> > easier for you to deploy than userspace, you need to rethink your
> > deployment strategy.
>
> This is one factor.
>
> > > To maintain PV performance during migration, the host side also
> > > needs to assign a VF to the PV device. This affects scalability.
> >
> > No idea what this means.
> >
> > > These factors block SR-IOV NIC passthrough usage in cloud services
> > > and OPNFV, which require high network performance and stability.
> >
> > Everyone needs performance and scalability.
> >
> > > For other kinds of devices, it's hard to make this work. We are
> > > also adding migration support for the QAT (QuickAssist Technology)
> > > device.
> > >
> > > QAT device use case introduction:
> > > Server, networking, big data, and storage applications use
> > > QuickAssist Technology to offload servers from handling
> > > compute-intensive operations, such as:
> > > 1) Symmetric cryptography functions including cipher operations
> > > and authentication operations
Re: live migration vs device assignment (motivation)
On 12/10/2015 1:14 AM, Alexander Duyck wrote:
> On Wed, Dec 9, 2015 at 8:26 AM, Lan, Tianyu wrote:
> > For other kinds of devices, it's hard to make this work. We are also
> > adding migration support for the QAT (QuickAssist Technology)
> > device.
> >
> > QAT device use case introduction:
> > Server, networking, big data, and storage applications use
> > QuickAssist Technology to offload servers from handling
> > compute-intensive operations, such as:
> > 1) Symmetric cryptography functions including cipher operations and
> > authentication operations
> > 2) Public key functions including RSA, Diffie-Hellman, and elliptic
> > curve cryptography
> > 3) Compression and decompression functions including DEFLATE and LZS
> >
> > PCI hotplug will not work for such devices during migration, and
> > these operations will fail when the device is unplugged.
>
> I assume the problem is that with a PCI hotplug event you are losing
> the state information for the device, do I have that right?
>
> Looking over the QAT drivers it doesn't seem like any of them support
> the suspend/resume PM calls. I would imagine it makes it difficult
> for a system with a QAT card in it to be able to drop the system to a
> low power state. You might want to try enabling suspend/resume
> support for the devices on bare metal before you attempt to take on
> migration, as it would provide you with a good testing framework to
> see what needs to be saved/restored within the device, and in what
> order, before you attempt to do the same while migrating from one
> system to another.
>
> - Alex

Sure. The suspend/resume work is under way. Actually, we have already
made QAT work for migration internally; we are doing more testing and
fixing bugs.
Re: live migration vs device assignment (motivation)
On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> > On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> > > I thought about what this is doing at the high level, and I do
> > > see some value in what you are trying to do, but I also think we
> > > need to clarify the motivation a bit more. What you are saying is
> > > not really what the patches are doing.
> > >
> > > And with that clearer understanding of the motivation in mind
> > > (assuming it actually captures a real need), I would also like to
> > > suggest some changes.
> >
> > Motivation:
> > Most current solutions for migration with a passthrough device are
> > based on PCI hotplug, but that has side effects and can't work for
> > all devices.
> >
> > For NIC devices:
> > The PCI hotplug solution can work around network device migration
> > by switching between VF and PF.
>
> This is just more confusion. Hotplug is just a way to add and remove
> devices. Switching between VF and PF is up to the guest and
> hypervisor.

This is a combination. Because it's not possible to migrate device
state during migration in the current world (which is what we are
working on), existing solutions for migrating a VM with a passthrough
NIC rely on PCI hotplug. Unplug the VF before starting migration and
then switch the network from the VF NIC to a PV NIC in order to
maintain the network connection. Plug the VF back in after migration
and then switch from PV back to VF. The bond driver provides a way to
switch between the PV and VF NICs automatically while keeping the same
IP and MAC, so the bond driver is preferred.

> > But switching network interfaces will introduce service downtime.
> >
> > I tested the service downtime by putting the VF and PV interfaces
> > into a bonded interface and pinging the bonded interface while
> > plugging and unplugging the VF:
> > 1) About 100ms when adding the VF
> > 2) About 30ms when deleting the VF
>
> OK, and what's the source of the downtime? I'm guessing that's just
> ARP being repopulated. So simply save and re-populate it.
>
> There would be a much cleaner solution.
>
> Or maybe there's a timer there that just delays hotplug for no
> reason. Fix it, everyone will benefit.
>
> > It also requires the guest to do switch configuration.
>
> That's just wrong. If you want a switch, you need to configure a
> switch.

I meant the configuration of the switching operation between PV and VF.

> > These are hard to manage and deploy for our customers.
>
> So the kernel wants to remain flexible, and the stack is
> configurable. Downside: customers need to deploy userspace to
> configure it. Your solution: a hard-coded configuration within the
> kernel and hypervisor. Sorry, this makes no sense. If the kernel is
> easier for you to deploy than userspace, you need to rethink your
> deployment strategy.

This is one factor.

> > To maintain PV performance during migration, the host side also
> > needs to assign a VF to the PV device. This affects scalability.
>
> No idea what this means.
>
> > These factors block SR-IOV NIC passthrough usage in cloud services
> > and OPNFV, which require high network performance and stability.
>
> Everyone needs performance and scalability.
>
> > For other kinds of devices, it's hard to make this work. We are
> > also adding migration support for the QAT (QuickAssist Technology)
> > device.
> >
> > QAT device use case introduction:
> > Server, networking, big data, and storage applications use
> > QuickAssist Technology to offload servers from handling
> > compute-intensive operations, such as:
> > 1) Symmetric cryptography functions including cipher operations and
> > authentication operations
> > 2) Public key functions including RSA, Diffie-Hellman, and elliptic
> > curve cryptography
> > 3) Compression and decompression functions including DEFLATE and LZS
> >
> > PCI hotplug will not work for such devices during migration, and
> > these operations will fail when the device is unplugged.
> >
> > So we are trying to implement a new solution which really migrates
> > device state to the target machine and won't affect the user during
> > migration, with low service downtime.
>
> Let's assume for the sake of the argument that there's a lot going on
> and removing the device is just too slow (though you should figure
> out what's going on before giving up and just building something new
> from scratch).

No, we can find a PV NIC as a backup for the VF NIC during migration,
but that doesn't work for other kinds of devices since there is no
backup for them. E.g. when migration happens while a user is
compressing files via QAT, it's impossible to remove the QAT device at
that point. If we did that, the compression operation would fail and
affect the user experience.

> I still don't think you should be migrating state. That's just too
> fragile, and it also means you depend on the driver to be nice and
> shut down the device on the source, so you can not migrate at will.
> Instead, reset the device on the destination and re-initialize it.

Yes, saving and restoring device state relies on the driver, so we are
reworking the drivers to make them more migration-friendly.
Re: live migration vs device assignment (motivation)
On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> > I thought about what this is doing at the high level, and I do see
> > some value in what you are trying to do, but I also think we need
> > to clarify the motivation a bit more. What you are saying is not
> > really what the patches are doing.
> >
> > And with that clearer understanding of the motivation in mind
> > (assuming it actually captures a real need), I would also like to
> > suggest some changes.
>
> Motivation:
> Most current solutions for migration with a passthrough device are
> based on PCI hotplug, but that has side effects and can't work for
> all devices.
>
> For NIC devices:
> The PCI hotplug solution can work around network device migration by
> switching between VF and PF.

This is just more confusion. Hotplug is just a way to add and remove
devices. Switching between VF and PF is up to the guest and
hypervisor.

> But switching network interfaces will introduce service downtime.
>
> I tested the service downtime by putting the VF and PV interfaces
> into a bonded interface and pinging the bonded interface while
> plugging and unplugging the VF:
> 1) About 100ms when adding the VF
> 2) About 30ms when deleting the VF

OK, and what's the source of the downtime? I'm guessing that's just
ARP being repopulated. So simply save and re-populate it.

There would be a much cleaner solution.

Or maybe there's a timer there that just delays hotplug for no reason.
Fix it, everyone will benefit.

> It also requires the guest to do switch configuration.

That's just wrong. If you want a switch, you need to configure a
switch.

> These are hard to manage and deploy for our customers.

So the kernel wants to remain flexible, and the stack is configurable.
Downside: customers need to deploy userspace to configure it. Your
solution: a hard-coded configuration within the kernel and hypervisor.
Sorry, this makes no sense. If the kernel is easier for you to deploy
than userspace, you need to rethink your deployment strategy.

> To maintain PV performance during migration, the host side also needs
> to assign a VF to the PV device. This affects scalability.

No idea what this means.

> These factors block SR-IOV NIC passthrough usage in cloud services
> and OPNFV, which require high network performance and stability.

Everyone needs performance and scalability.

> For other kinds of devices, it's hard to make this work. We are also
> adding migration support for the QAT (QuickAssist Technology) device.
>
> QAT device use case introduction:
> Server, networking, big data, and storage applications use QuickAssist
> Technology to offload servers from handling compute-intensive
> operations, such as:
> 1) Symmetric cryptography functions including cipher operations and
> authentication operations
> 2) Public key functions including RSA, Diffie-Hellman, and elliptic
> curve cryptography
> 3) Compression and decompression functions including DEFLATE and LZS
>
> PCI hotplug will not work for such devices during migration, and
> these operations will fail when the device is unplugged.
>
> So we are trying to implement a new solution which really migrates
> device state to the target machine and won't affect the user during
> migration, with low service downtime.

Let's assume for the sake of the argument that there's a lot going on
and removing the device is just too slow (though you should figure out
what's going on before giving up and just building something new from
scratch).

I still don't think you should be migrating state. That's just too
fragile, and it also means you depend on the driver to be nice and
shut down the device on the source, so you can not migrate at will.
Instead, reset the device on the destination and re-initialize it.

--
MST
Re: live migration vs device assignment (motivation)
On Wed, Dec 9, 2015 at 8:26 AM, Lan, Tianyu wrote:
> For other kinds of devices, it's hard to make this work. We are also
> adding migration support for the QAT (QuickAssist Technology) device.
>
> QAT device use case introduction:
> Server, networking, big data, and storage applications use QuickAssist
> Technology to offload servers from handling compute-intensive
> operations, such as:
> 1) Symmetric cryptography functions including cipher operations and
> authentication operations
> 2) Public key functions including RSA, Diffie-Hellman, and elliptic
> curve cryptography
> 3) Compression and decompression functions including DEFLATE and LZS
>
> PCI hotplug will not work for such devices during migration, and
> these operations will fail when the device is unplugged.

I assume the problem is that with a PCI hotplug event you are losing
the state information for the device, do I have that right?

Looking over the QAT drivers it doesn't seem like any of them support
the suspend/resume PM calls. I would imagine it makes it difficult for
a system with a QAT card in it to be able to drop the system to a low
power state. You might want to try enabling suspend/resume support for
the devices on bare metal before you attempt to take on migration, as
it would provide you with a good testing framework to see what needs
to be saved/restored within the device, and in what order, before you
attempt to do the same while migrating from one system to another.

- Alex
Re: live migration vs device assignment (motivation)
On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> I thought about what this is doing at the high level, and I do see
> some value in what you are trying to do, but I also think we need to
> clarify the motivation a bit more. What you are saying is not really
> what the patches are doing.
>
> And with that clearer understanding of the motivation in mind
> (assuming it actually captures a real need), I would also like to
> suggest some changes.

Motivation:
Most current solutions for migration with a passthrough device are
based on PCI hotplug, but that has side effects and can't work for all
devices.

For NIC devices:
The PCI hotplug solution can work around network device migration by
switching between VF and PF. But switching network interfaces will
introduce service downtime.

I tested the service downtime by putting the VF and PV interfaces into
a bonded interface and pinging the bonded interface while plugging and
unplugging the VF:
1) About 100ms when adding the VF
2) About 30ms when deleting the VF

It also requires the guest to do switch configuration. These are hard
to manage and deploy for our customers. To maintain PV performance
during migration, the host side also needs to assign a VF to the PV
device. This affects scalability.

These factors block SR-IOV NIC passthrough usage in cloud services and
OPNFV, which require high network performance and stability.

For other kinds of devices, it's hard to make this work. We are also
adding migration support for the QAT (QuickAssist Technology) device.

QAT device use case introduction:
Server, networking, big data, and storage applications use QuickAssist
Technology to offload servers from handling compute-intensive
operations, such as:
1) Symmetric cryptography functions including cipher operations and
authentication operations
2) Public key functions including RSA, Diffie-Hellman, and elliptic
curve cryptography
3) Compression and decompression functions including DEFLATE and LZS

PCI hotplug will not work for such devices during migration, and these
operations will fail when the device is unplugged.

So we are trying to implement a new solution which really migrates
device state to the target machine and won't affect the user during
migration, with low service downtime.