Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
Hi Tianyu,

I am testing your V2 patch set in our environment and am running into two
issues. I have a workaround for the first one and hope you could shed some
light on the second one :-)

1. Mismatch for ram_block (have a workaround)
---
Below is the error message on the destination:

qemu-system-x86_64: Length mismatch: : 0x3000 in != 0x4000: Invalid argument
qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
qemu-system-x86_64: load of migration failed: Invalid argument

with the following command lines on the source and destination respectively:

git/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 4096 -smp 4 --nographic -drive file=/root/nfs/rhel.img,format=raw,cache=none -device vfio-sriov,host=:03:10.0

git/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 4096 -smp 4 --nographic -drive file=/root/nfs/rhel.img,format=raw,cache=none -device vfio-sriov,host=:03:10.0 --incoming tcp:0:

Some debugging shows that the cause of this error is that the
ram_block->idstr of the pass-through MMIO region is not set. My workaround
is to add vmstate_register_ram() in vfio_mmap_region() after
memory_region_init_ram_ptr() returns (a rough sketch follows at the end of
this mail). I don't think this is a good solution, since the
ram_block->idstr is coded with the VF's BDF, so I guess it will not work
when the VF has a different BDF on the source and destination. Or maybe my
test steps are not correct?

2. Failed to migrate the MAC address
---
By adding some code to the VF's driver in the destination guest, I found
that the MAC information has been migrated to the destination in
adapter->hw.mac, but it is then "reset" by the VF's driver when
ixgbevf_migration_task is invoked at the end of the migration process.
Below is what I have printed.

The ifconfig output from the destination:

eth8      Link encap:Ethernet  HWaddr 52:54:00:81:39:F2
          inet addr:9.31.210.106  Bcast:9.31.255.255  Mask:255.255.0.0
          inet6 addr: fe80::5054:ff:fe81:39f2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:66 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21840 (21.3 KiB)  TX bytes:920 (920.0 b)

The log messages I printed in the destination's VF driver:

ixgbevf: migration end --
ixgbevf: original mac:52:54:00:81:39:f2
ixgbevf: after reset mac:52:54:00:92:04:a3
ixgbevf: migration end ==

I didn't take a close look at the "reset" function, but it seems to
retrieve the MAC from the VF hardware. Hmm... is there some possible way to
have the same MAC on both source and destination?

Lastly, I appreciate all your work and help; I have learned a lot from you.

On Tue, Nov 24, 2015 at 09:35:17PM +0800, Lan Tianyu wrote:
> This patchset is to propose a solution for adding live migration
> support for SRIOV NICs.
>
> During migration, Qemu needs to let the VF driver in the VM know about
> migration start and end. Qemu adds a fake PCI migration capability
> to help sync status between the two sides during migration.
>
> Qemu triggers the VF's mailbox irq by sending an MSIX msg when the
> migration status changes. The VF driver tells Qemu its mailbox vector
> index via the new PCI capability. In some cases (NIC is suspended or
> closed), the VF mailbox irq is freed and the VF driver can disable irq
> injection via the new capability.
>
> The VF driver will put down the nic before migration and put it up again
> on the target machine.
>
> Lan Tianyu (10):
>   Qemu/VFIO: Create head file pci.h to share data struct.
>   Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition
>   Qemu/VFIO: Rework vfio_std_cap_max_size() function
>   Qemu/VFIO: Add vfio_find_free_cfg_reg() to find free PCI config space regs
>   Qemu/VFIO: Expose PCI config space read/write and msix functions
>   Qemu/PCI: Add macros for faked PCI migration capability
>   Qemu: Add post_load_state() to run after restoring CPU state
>   Qemu: Add save_before_stop callback to run just before stopping VCPU during migration
>   Qemu/VFIO: Add SRIOV VF migration support
>   Qemu/VFIO: Misc change for enable migration with VFIO
>
>  hw/vfio/Makefile.objs       |   2 +-
>  hw/vfio/pci.c               | 196 +---
>  hw/vfio/pci.h               | 168 +
>  hw/vfio/sriov.c             | 178
>  include/hw/pci/pci_regs.h   |  19 +
>  include/migration/vmstate.h |   5 ++
>  include/sysemu/sysemu.h     |   1 +
>  linux-headers/linux/vfio.h  |  16
>  migration/migration.c       |   3 +-
>  migration/savevm.c          |  28 +++
>  10 files changed, 459 insertions(+), 157 deletions(-)
>  create mode 100644 hw/vfio/pci.h
>  create mode 100644 hw/vfio/sriov.c
>
> --
> 1.9.3
>
>
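For reference, a rough sketch of the workaround described in point 1 above,
assuming the call is added inside vfio_mmap_region() in hw/vfio/pci.c right
after the existing memory_region_init_ram_ptr() call. The local variable
names (submem, vdev, name, size, map) are placeholders for whatever that
function already has in scope; this is not code taken from the patch set:

    /* hw/vfio/pci.c, inside vfio_mmap_region() -- sketch only */
    memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);

    /*
     * Workaround: register the mmapped MMIO block with the migration RAM
     * list so that its ram_block->idstr is set before the destination
     * compares RAM sections.  Note the region name is derived from the
     * VF's BDF, so this likely breaks when the source and destination
     * BDFs differ.
     */
    vmstate_register_ram(submem, DEVICE(vdev));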
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Fri, Dec 04, 2015 at 02:42:36PM +0800, Lan, Tianyu wrote:
>
> On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
> >>> We hope
> >>> to find a better way to make SRIOV NICs work in these cases and this
> >>> is worth doing since an SRIOV NIC provides better network performance
> >>> compared with a PV NIC.
> >
> > If this is a performance optimization as the above implies,
> > you need to include some numbers, and document how you implemented
> > the switch and how you measured the performance.
>
> OK. Some ideas in my patches come from the paper "CompSC: Live Migration
> with Pass-through Devices":
> http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf
>
> It compares performance data between the PV/VF switching solution and VF
> migration (Chapter 7: Discussion).

I haven't read it, but I would like to note that you can't rely on research
papers. If you propose a patch to be merged, you need to measure its actual
effect on modern Linux at the end of 2015.

> >>> Current patches have some issues. I think we can find
> >>> solutions for them and improve them step by step.
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 12/4/2015 4:05 PM, Michael S. Tsirkin wrote:
> I haven't read it, but I would like to note that you can't rely on research
> papers. If you propose a patch to be merged, you need to measure its actual
> effect on modern Linux at the end of 2015.

Sure. Will do that.
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
>> We hope
>> to find a better way to make SRIOV NICs work in these cases and this is
>> worth doing since an SRIOV NIC provides better network performance
>> compared with a PV NIC.
>
> If this is a performance optimization as the above implies,
> you need to include some numbers, and document how you implemented
> the switch and how you measured the performance.

OK. Some ideas in my patches come from the paper "CompSC: Live Migration with
Pass-through Devices":
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compares performance data between the PV/VF switching solution and VF
migration (Chapter 7: Discussion).

>> Current patches have some issues. I think we can find
>> solutions for them and improve them step by step.
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Wed, Dec 2, 2015 at 6:08 AM, Lan, Tianyu wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
>>> But
>>> it requires the guest OS to do specific configuration inside and rely on
>>> the bonding driver, which blocks it from working on Windows.
>>> From the performance side,
>>> putting the VF and virtio NIC under a bonded interface will affect their
>>> performance even when not migrating. These factors block the use of VF
>>> NIC passthrough in some use cases (especially in the cloud) which require
>>> migration.
>>
>> That's really up to the guest. You don't need to do bonding,
>> you can just move the IP and MAC from userspace; that's
>> possible on most OSes.
>>
>> Or write something in the guest kernel that is more lightweight if you are
>> so inclined. What we are discussing here is the host-guest interface,
>> not the in-guest interface.
>>
>>> The current solution we proposed changes the NIC driver and Qemu. The
>>> guest OS doesn't need to do anything special for migration.
>>> It's easy to deploy
>>
>> Except of course these patches don't even work properly yet.
>>
>> And when they do, even minor changes in host-side NIC hardware across
>> migration will break guests in hard-to-predict ways.
>
> Switching between the PV and VF NICs introduces a network outage, and the
> latency of hot-plugging the VF is measurable. For some use cases (cloud
> services and OPNFV) which are sensitive to network stability and
> performance, these drawbacks block SRIOV NIC usage. We hope to find a
> better way to make SRIOV NICs work in these cases and this is worth doing
> since an SRIOV NIC provides better network performance compared with a PV
> NIC. Current patches have some issues. I think we can find solutions for
> them and improve them step by step.

I still believe the concepts being put into use here are deeply flawed. You
are assuming you can somehow complete the migration while the device is
active, and I seriously doubt that is the case. You are going to cause data
corruption, or worse, cause a kernel panic when you end up corrupting the
guest memory. You have to halt the device at some point in order to complete
the migration. Now, I fully agree it is best to do this for as small a window
as possible.

I really think that your best approach would be to embrace and extend the
current solution that makes use of bonding. The first step would be to make
it so that you don't have to hot-plug the VF until just before you halt the
guest, instead of before you start the migration. Just doing that would yield
a significant gain in terms of performance during the migration. In addition,
something like that should be doable without being overly invasive into the
drivers. A few tweaks to the DMA API and you could probably have that
resolved.

As far as avoiding the hot-plug itself, that would be better handled as a
separate follow-up, and really belongs more to the PCI layer than the NIC
device drivers. The device drivers should already have code for handling a
suspend/resume due to a power cycle event. If you could make use of that,
then it is just a matter of implementing something in the hot-plug or PCIe
drivers that would allow QEMU to signal when the device needs to go into D3
and when it can resume normal operation at D0. You could probably use the PCI
Bus Master Enable bit as the test for whether the device is ready for
migration or not: if the bit is set you cannot migrate the VM, and if it is
cleared then you are ready to migrate.
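Not part of the patch set, but the Bus Master Enable test suggested above
could be probed from the host roughly as follows, by reading the VF's config
space through sysfs (the BDF string is only an example):

    /* Hypothetical sketch: report whether Bus Master Enable is set. */
    #include <stdio.h>
    #include <stdint.h>

    #define PCI_COMMAND        0x04  /* command register offset */
    #define PCI_COMMAND_MASTER 0x04  /* bit 2: Bus Master Enable */

    static int bus_master_enabled(const char *bdf)
    {
        char path[256];
        uint8_t cmd[2];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", bdf);
        f = fopen(path, "rb");
        if (!f) {
            return -1;
        }
        /* The command register is a little-endian 16-bit word at offset 4. */
        if (fseek(f, PCI_COMMAND, SEEK_SET) || fread(cmd, 1, 2, f) != 2) {
            fclose(f);
            return -1;
        }
        fclose(f);
        return !!(cmd[0] & PCI_COMMAND_MASTER);
    }

    int main(void)
    {
        int bme = bus_master_enabled("0000:03:10.0");  /* example VF BDF */

        printf("bus master enable: %d\n", bme);
        return bme < 0;
    }

In the scheme described above, a set bit would mean the device is still
active and the VM is not yet ready to migrate; a cleared bit would mean the
migration can proceed.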
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Wed, Dec 02, 2015 at 10:08:25PM +0800, Lan, Tianyu wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
> >> But
> >> it requires the guest OS to do specific configuration inside and rely
> >> on the bonding driver, which blocks it from working on Windows.
> >> From the performance side,
> >> putting the VF and virtio NIC under a bonded interface will affect
> >> their performance even when not migrating. These factors block the use
> >> of VF NIC passthrough in some use cases (especially in the cloud) which
> >> require migration.
> >
> > That's really up to the guest. You don't need to do bonding,
> > you can just move the IP and MAC from userspace; that's
> > possible on most OSes.
> >
> > Or write something in the guest kernel that is more lightweight if you
> > are so inclined. What we are discussing here is the host-guest
> > interface, not the in-guest interface.
> >
> >> The current solution we proposed changes the NIC driver and Qemu. The
> >> guest OS doesn't need to do anything special for migration.
> >> It's easy to deploy
> >
> > Except of course these patches don't even work properly yet.
> >
> > And when they do, even minor changes in host-side NIC hardware across
> > migration will break guests in hard-to-predict ways.
>
> Switching between the PV and VF NICs introduces a network outage, and the
> latency of hot-plugging the VF is measurable.
> For some use cases (cloud services
> and OPNFV) which are sensitive to network stability and performance,
> these drawbacks block SRIOV NIC usage.

I find this hard to credit. Hotplug is not normally a data-path operation.

> We hope
> to find a better way to make SRIOV NICs work in these cases and this is
> worth doing since an SRIOV NIC provides better network performance compared
> with a PV NIC.

If this is a performance optimization as the above implies,
you need to include some numbers, and document how you implemented
the switch and how you measured the performance.

> Current patches have some issues. I think we can find
> solutions for them and improve them step by step.

-- 
MST
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
>> But
>> it requires the guest OS to do specific configuration inside and rely on
>> the bonding driver, which blocks it from working on Windows.
>> From the performance side,
>> putting the VF and virtio NIC under a bonded interface will affect their
>> performance even when not migrating. These factors block the use of VF
>> NIC passthrough in some use cases (especially in the cloud) which require
>> migration.
>
> That's really up to the guest. You don't need to do bonding,
> you can just move the IP and MAC from userspace; that's
> possible on most OSes.
>
> Or write something in the guest kernel that is more lightweight if you are
> so inclined. What we are discussing here is the host-guest interface,
> not the in-guest interface.
>
>> The current solution we proposed changes the NIC driver and Qemu. The
>> guest OS doesn't need to do anything special for migration.
>> It's easy to deploy
>
> Except of course these patches don't even work properly yet.
>
> And when they do, even minor changes in host-side NIC hardware across
> migration will break guests in hard-to-predict ways.

Switching between the PV and VF NICs introduces a network outage, and the
latency of hot-plugging the VF is measurable. For some use cases (cloud
services and OPNFV) which are sensitive to network stability and performance,
these drawbacks block SRIOV NIC usage. We hope to find a better way to make
SRIOV NICs work in these cases and this is worth doing since an SRIOV NIC
provides better network performance compared with a PV NIC. Current patches
have some issues. I think we can find solutions for them and improve them
step by step.
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Tue, Dec 01, 2015 at 02:26:57PM +0800, Lan, Tianyu wrote:
>
> On 11/30/2015 4:01 PM, Michael S. Tsirkin wrote:
> > It is still not very clear what it is you are trying to achieve, and
> > whether your patchset achieves it. You merely say "adding live
> > migration" but it seems pretty clear this isn't about being able to
> > migrate a guest transparently, since you are adding a host/guest
> > handshake.
> >
> > This isn't about functionality either: I think that on KVM, it isn't
> > hard to live migrate if you can do a host/guest handshake, even today,
> > with no kernel changes:
> > 1. before migration, expose a pv nic to guest (can be done directly on
> >    boot)
> > 2. use e.g. a serial connection to move IP from an assigned device to
> >    pv nic
> > 3. maybe move the mac as well
> > 4. eject the assigned device
> > 5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
> >    happens) and start migration
>
> This looks like the bonding driver solution

Why does it? Unlike bonding, this doesn't touch the data path or any kernel
code. Just run a script from the guest agent.

> which puts the pv nic and VF
> in one bonded interface under active-backup mode. The bonding driver
> will switch from the VF to the PV nic automatically when the VF is
> unplugged during migration. This is the only available solution for VF NIC
> migration.

It really isn't. For one, there is also teaming.

> But
> it requires the guest OS to do specific configuration inside and rely on
> the bonding driver, which blocks it from working on Windows.
> From the performance side,
> putting the VF and virtio NIC under a bonded interface will affect their
> performance even when not migrating. These factors block the use of VF
> NIC passthrough in some use cases (especially in the cloud) which require
> migration.

That's really up to the guest. You don't need to do bonding,
you can just move the IP and MAC from userspace; that's
possible on most OSes.

Or write something in the guest kernel that is more lightweight if you are
so inclined. What we are discussing here is the host-guest interface,
not the in-guest interface.

> The current solution we proposed changes the NIC driver and Qemu. The
> guest OS doesn't need to do anything special for migration.
> It's easy to deploy

Except of course these patches don't even work properly yet.

And when they do, even minor changes in host-side NIC hardware across
migration will break guests in hard-to-predict ways.

> and
> all changes are in the NIC driver, so a NIC vendor can implement migration
> support just in their driver.

Kernel code and hypervisor code are not easier to develop and deploy than a
userspace script. If that is all the motivation there is, that's a pretty
small return on investment.

-- 
MST
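To illustrate what "move the IP and MAC from userspace" can amount to inside
the guest, here is a rough, hypothetical sketch using the classic
SIOCSIFHWADDR/SIOCSIFADDR ioctls. The interface name and addresses are
placeholders (taken from the test report earlier in the thread); a real
script or agent would also handle routes, IPv6, and bringing the interfaces
down/up, since many drivers only accept a MAC change while the interface is
down:

    #include <arpa/inet.h>
    #include <net/if.h>
    #include <net/if_arp.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Assign the given MAC and IPv4 address to ifname (e.g. the pv nic). */
    static int move_addrs(const char *ifname, const char *ip,
                          const unsigned char mac[6])
    {
        struct ifreq ifr;
        struct sockaddr_in *sin;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0)
            return -1;

        /* Set the MAC that was previously used by the assigned device. */
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
        memcpy(ifr.ifr_hwaddr.sa_data, mac, 6);
        if (ioctl(fd, SIOCSIFHWADDR, &ifr) < 0)
            goto err;

        /* Set the IPv4 address. */
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        sin = (struct sockaddr_in *)&ifr.ifr_addr;
        sin->sin_family = AF_INET;
        if (inet_pton(AF_INET, ip, &sin->sin_addr) != 1)
            goto err;
        if (ioctl(fd, SIOCSIFADDR, &ifr) < 0)
            goto err;

        close(fd);
        return 0;
    err:
        close(fd);
        return -1;
    }

    int main(void)
    {
        static const unsigned char mac[6] = { 0x52, 0x54, 0x00, 0x81, 0x39, 0xf2 };

        /* Example only: move the address pair onto the pv nic "eth1". */
        return move_addrs("eth1", "9.31.210.106", mac) ? 1 : 0;
    }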
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On 11/30/2015 4:01 PM, Michael S. Tsirkin wrote:
> It is still not very clear what it is you are trying to achieve, and
> whether your patchset achieves it. You merely say "adding live
> migration" but it seems pretty clear this isn't about being able to
> migrate a guest transparently, since you are adding a host/guest
> handshake.
>
> This isn't about functionality either: I think that on KVM, it isn't
> hard to live migrate if you can do a host/guest handshake, even today,
> with no kernel changes:
> 1. before migration, expose a pv nic to guest (can be done directly on
>    boot)
> 2. use e.g. a serial connection to move IP from an assigned device to
>    pv nic
> 3. maybe move the mac as well
> 4. eject the assigned device
> 5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
>    happens) and start migration

This looks like the bonding driver solution, which puts the pv nic and VF in
one bonded interface under active-backup mode. The bonding driver will switch
from the VF to the PV nic automatically when the VF is unplugged during
migration. This is the only available solution for VF NIC migration. But it
requires the guest OS to do specific configuration inside and rely on the
bonding driver, which blocks it from working on Windows. From the performance
side, putting the VF and virtio NIC under a bonded interface will affect
their performance even when not migrating. These factors block the use of VF
NIC passthrough in some use cases (especially in the cloud) which require
migration.

The current solution we proposed changes the NIC driver and Qemu. The guest
OS doesn't need to do anything special for migration. It's easy to deploy,
and all changes are in the NIC driver, so a NIC vendor can implement
migration support just in their driver.
Re: [Qemu-devel] [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC
On Tue, Nov 24, 2015 at 09:35:17PM +0800, Lan Tianyu wrote:
> This patchset is to propose a solution for adding live migration
> support for SRIOV NICs.
>
> During migration, Qemu needs to let the VF driver in the VM know about
> migration start and end. Qemu adds a fake PCI migration capability
> to help sync status between the two sides during migration.
>
> Qemu triggers the VF's mailbox irq by sending an MSIX msg when the
> migration status changes. The VF driver tells Qemu its mailbox vector
> index via the new PCI capability. In some cases (NIC is suspended or
> closed), the VF mailbox irq is freed and the VF driver can disable irq
> injection via the new capability.
>
> The VF driver will put down the nic before migration and put it up again
> on the target machine.

It is still not very clear what it is you are trying to achieve, and
whether your patchset achieves it. You merely say "adding live
migration" but it seems pretty clear this isn't about being able to
migrate a guest transparently, since you are adding a host/guest
handshake.

This isn't about functionality either: I think that on KVM, it isn't
hard to live migrate if you can do a host/guest handshake, even today,
with no kernel changes:

1. before migration, expose a pv nic to guest (can be done directly on
   boot)
2. use e.g. a serial connection to move IP from an assigned device to
   pv nic
3. maybe move the mac as well
4. eject the assigned device
5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
   happens) and start migration

Is this patchset a performance optimization then? If yes, it needs to be
accompanied by some performance numbers.

-- 
MST