Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd.

There should not be a conflict: the guest agent should just give NM the information and have NM do the right thing.

IOW, if you want to do this setup where the guest is given multiple NICs connected to the same host LAN, then I think we should just let the guest admin configure bonding in whatever manner they decide is best for their OS install.

- During migration, unplug the passed-through NIC, then do a native migration.

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Regards, Daniel

Users are actually asking for this functionality. Configuring everything manually is possible but error prone. We probably should leave manual configuration as an option for the 10% of people who want to tweak guest networking config, but this does not mean we shouldn't have it all work out of the box for the 90% of people who just want networking to go fast with no tweaks.
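For illustration, the two-NIC pairing described in the proposal could look roughly like the following fragment of a libvirt domain definition: one SR-IOV VF passed through via interface type='hostdev' and one emulated virtio NIC on the same host LAN, each with a MAC address the guest can use to identify its half of the pair. This is only a sketch; the MAC addresses, bridge name and VF PCI address are hypothetical placeholders, not part of the RFC's actual schema.

<interface type='hostdev' managed='yes'>
  <!-- hypothetical VF on the source host; the guest matches it by MAC -->
  <mac address='52:54:00:11:22:33'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
</interface>
<interface type='bridge'>
  <!-- emulated virtio NIC on the same LAN, kept across the migration -->
  <mac address='52:54:00:11:22:34'/>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>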
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Thu, Apr 23, 2015 at 12:35:28PM -0400, Laine Stump wrote: On 04/22/2015 01:20 PM, Dr. David Alan Gilbert wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd. IOW, if you want to do this setup where the guest is given multiple NICs connected to the same host LAN, then I think we should just let the guest admin configure bonding in whatever manner they decide is best for their OS install.

I disagree; there should be a way for the admin not to have to do this manually; however, it should interact well with existing management stuff. At the simplest, something that marks the two NICs in a discoverable way so that it can be seen that they're part of a set; with just that ID system an installer or setup tool can notice them and offer to put them into a bond automatically. I'd assume it would be possible to add a rule somewhere saying that anything with the same ID would automatically be added to the bond.

I didn't mean the admin would literally configure stuff manually. I really just meant that the guest OS itself should decide how it is done, whether NetworkManager magically does the right thing, or the person building the cloud disk image provides a magic udev rule, or $something else. I just don't think that the QEMU guest agent should be involved, as that will definitely trample all over other things that manage networking in the guest.

OK, good, that's about the same level I was at.
I could see this being solved in the cloud disk images by using cloud-init metadata to mark the NICs as being in a set, or perhaps there is some magic you could define in SMBIOS tables, or something else again. A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS solution might.

Would either of these work with hotplug though? I guess as the VM starts off with the pair of NICs, then when you remove one and add it back after migration you don't need any more information added; so yes, cloud-init or SMBIOS would do it. (I was thinking of SMBIOS stuff in the sense that you get the device/slot numbering that NIC naming is sometimes based off.) What about if we hot-add a new NIC later on (not during migration)? A normal hot-add of a NIC now turns into a hot-add of two new NICs; how do we pass the information at hot-add time to provide that?

Hmm, yes, actually hotplug would be a problem with that. An even simpler idea would be to just keep things real dumb and simply use the same MAC address for both NICs. Once you put them in a bond device, the kernel will be copying the MAC address of the first NIC into the second NIC anyway, so unless I'm missing
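For reference, on a guest that does run NetworkManager, the bond over the two NICs could be set up along these lines. This is only a sketch under stated assumptions: the device names are hypothetical, the bond.options syntax may differ between NetworkManager versions, and fail_over_mac=active is only needed if the two NICs keep distinct MAC addresses (it can be dropped if both NICs share one MAC as suggested above).

# Hypothetical guest-side setup: active-backup bond over the passthrough NIC
# (eth0) and the emulated virtio NIC (eth1); device names are placeholders.
nmcli con add type bond con-name bond0 ifname bond0 \
      bond.options "mode=active-backup,miimon=100,fail_over_mac=active"
nmcli con add type bond-slave con-name bond0-eth0 ifname eth0 master bond0
nmcli con add type bond-slave con-name bond0-eth1 ifname eth1 master bond0
nmcli con up bond0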
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote: On 04/23/2015 04:34 AM, Chen Fan wrote: On 04/20/2015 06:29 AM, Laine Stump wrote: On 04/17/2015 04:53 AM, Chen Fan wrote:

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing this with passed-through non-SRIOV NICs? An SRIOV virtual function gets its MAC address from the libvirt config, so it's very simple to use the same MAC address across the migration. Any network card that would be able to do this on any sort of useful scale will be SRIOV-capable (or should be replaced with one that is - some of them are not that expensive).

Hi Laine, I think supporting migration with SRIOV virtual NICs is a good idea, but there are also passthrough NICs that are not SRIOV-capable. For those NIC devices we can only use hostdev to specify the passthrough function, so I think we should support them too.

As I think you've already discovered, passing through non-SRIOV NICs is problematic. It is effectively impossible for the host to change their MAC address before assigning them to the guest - the guest's driver sees standard netdev hardware and resets it, which resets the MAC address back to the original value burned into the firmware. This makes management more complicated, especially when you get into scenarios such as what we're discussing (i.e. migration), where the actual hardware (and thus MAC address) may be different from one run to the next.

Right, passing through PFs is also insecure. Let's get everything working fine with VFs first, worry about PFs later.

Since libvirt's interface element requires a fixed MAC address in the XML, it's not possible to have an interface that gets the actual device from a network pool (without some serious hacking to that code), and there is no support for plain (non-network) hostdev device pools; there would need to be a separate (nonexistent) driver for that. Since the hostdev element relies on the PCI address of the device (in the source subelement, which also must be fixed) to determine which device to pass through, a domain config with a hostdev that could be run on two different machines would require the device to reside at exactly the same PCI address on both machines, which is a very serious limitation in an environment large enough that migrating domains is a requirement.

Also, non-SRIOV NICs are limited to a single device per physical port, meaning probably at most 4 devices per physical host PCIe slot, and this results in greatly reduced density on the host (and even more so on the switch that connects to the host!) compared to even the old Intel 82576 cards, which have 14 VFs (7 VFs x 2 ethernet ports). Think about it - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch ports, while the same number of guests with non-SRIOV would take 4 PCIe slots and 14(!) switch ports. The difference is even more striking when comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox (also 64?) or SolarFlare (128?) card.
And don't forget that, because you don't have pools of devices to be automatically chosen from, each guest domain that will be migrated requires a reserved NIC on *every* machine it will be migrated to (no other domain can be configured to use that NIC, in order to avoid conflicts).

Of course you could complicate the software by adding a driver that manages pools of generic hostdevs and coordinates MAC address changes with the guest (part of what you're suggesting), but all that extra complexity not only takes a lot of time and effort to develop, it also creates more code that needs to be maintained and tested for regressions at each release. The alternative is to just spend $130 per host for an 82576 or Intel I350 card (these are the cheapest SRIOV options I'm aware of). When compared to the total cost of any hardware installation large enough to support migration and have performance requirements high enough that NIC passthrough is needed, this is a trivial amount.

I guess the bottom line of all this is that (in my opinion, of course :-) supporting useful migration of domains that use passed-through non-SRIOV NICs would be an interesting experiment, but I don't see much utility in it, other than scratching an intellectual itch, and I'm concerned that it would create more long-term maintenance cost than it's worth.

I'm not sure it has no utility, but it's easy to agree that VFs are more important, and focusing on those first is a good idea.
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote: On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd.

There should not be a conflict: the guest agent should just give NM the information and have NM do the right thing.

That assumes the guest will have NM running. Unless you want to severely limit the scope of usefulness, you also need to handle systems that have NM disabled, and among those the different styles of system network config. It gets messy very fast.

Users are actually asking for this functionality. Configuring everything manually is possible but error prone.

Yes, but attempting to do it automatically is also error prone (due to the myriad of different guest network config systems, even just within the seemingly narrow category of Linux guests). Pick your poison :-)
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote: On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote: On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd.

There should not be a conflict: the guest agent should just give NM the information and have NM do the right thing.

That assumes the guest will have NM running. Unless you want to severely limit the scope of usefulness, you also need to handle systems that have NM disabled, and among those the different styles of system network config. It gets messy very fast.

Also, OpenStack already has a way to pass the guest information about the required network setup, via cloud-init, so it would not be interested in anything that used the QEMU guest agent to configure NetworkManager. Which is really just another example of why this does not belong anywhere in libvirt or lower. The decision to use NM is a policy decision that will always be wrong for a non-negligible set of use cases, and as such does not belong in libvirt or QEMU. It is the job of higher level apps to make that kind of policy decision.

Users are actually asking for this functionality. Configuring everything manually is possible but error prone.

Yes, but attempting to do it automatically is also error prone (due to the myriad of different guest network config systems, even just within the seemingly narrow category of Linux guests). Pick your poison :-)

Also note I'm not debating the usefulness of the overall concept or the need for automation.
It simply doesn't belong in libvirt or lower - it is a job for the higher level management applications to define a policy that fits in with the way they are managing the virtual machines and the networking.

Regards, Daniel
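As a concrete illustration of the cloud-init route mentioned above, newer cloud-init releases accept a network configuration (version 1 format) in which a higher-level tool could describe the bond, keyed off the two MAC addresses it already knows from the domain definition. This is a hedged sketch only; the MAC addresses and interface names are hypothetical placeholders.

# Hypothetical cloud-init network-config (version 1) handed to the guest by the
# management layer; the guest OS then builds the bond itself at boot.
version: 1
config:
  - type: physical
    name: pt0            # passed-through VF, matched by MAC
    mac_address: "52:54:00:11:22:33"
  - type: physical
    name: virt0          # emulated virtio NIC, matched by MAC
    mac_address: "52:54:00:11:22:34"
  - type: bond
    name: bond0
    bond_interfaces: [pt0, virt0]
    params:
      bond-mode: active-backup
      bond-miimon: 100
    subnets:
      - type: dhcp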
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Tue, May 19, 2015 at 03:21:49PM +0100, Daniel P. Berrange wrote: On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote: On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote: On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd.

There should not be a conflict: the guest agent should just give NM the information and have NM do the right thing.

That assumes the guest will have NM running. Unless you want to severely limit the scope of usefulness, you also need to handle systems that have NM disabled, and among those the different styles of system network config. It gets messy very fast.

Also, OpenStack already has a way to pass the guest information about the required network setup, via cloud-init, so it would not be interested in anything that used the QEMU guest agent to configure NetworkManager. Which is really just another example of why this does not belong anywhere in libvirt or lower. The decision to use NM is a policy decision that will always be wrong for a non-negligible set of use cases, and as such does not belong in libvirt or QEMU. It is the job of higher level apps to make that kind of policy decision.

Using NM is up to users. On some of my VMs, I bring up links manually after each boot. We can provide the info to the guest and teach NM to use it. If someone wants to write bash scripts that use this info, that's also fine.

Users are actually asking for this functionality. Configuring everything manually is possible but error prone.

Yes, but attempting to do it automatically is also error prone (due to the myriad of different guest network config systems, even just within the seemingly narrow category of Linux guests).
Pick your poison :-)

Also note I'm not debating the usefulness of the overall concept or the need for automation. It simply doesn't belong in libvirt or lower - it is a job for the higher level management applications to define a policy that fits in with the way they are managing the virtual machines and the networking.

Regards, Daniel

Users are asking for this automation, so it's useful to them. We can always tell them no. Saying no because we seem unable to decide where this useful functionality fits does not look like a good reason.
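In the spirit of the "bash scripts" option, a guest image could also react to the hotplug itself, for example with a udev rule that enslaves any newly added NIC carrying a known "pair" MAC into the already-configured bond. The rule, script path, MAC and bond name below are all hypothetical; this is just a sketch of the idea, not an endorsed mechanism.

#!/bin/sh
# Hypothetical helper (/usr/local/sbin/bond-enslave.sh) invoked from a udev rule
# such as:
#   ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="52:54:00:11:22:33", \
#     RUN+="/usr/local/sbin/bond-enslave.sh %k"
# It enslaves the newly hot-added NIC into the existing bond0.
dev="$1"
ip link set "$dev" down            # a slave must be down before it can be enslaved
ip link set "$dev" master bond0
ip link set "$dev" up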
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote: On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote: On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote: On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

I'm not really clear on why libvirt/guest agent needs to be involved in this. I think configuration of networking is really something that must be left to the guest OS admin to control. I don't think the guest agent should be trying to reconfigure guest networking itself, as that is inevitably going to conflict with configuration attempted by things in the guest like NetworkManager or systemd-networkd.

There should not be a conflict: the guest agent should just give NM the information and have NM do the right thing.

That assumes the guest will have NM running. Unless you want to severely limit the scope of usefulness, you also need to handle systems that have NM disabled, and among those the different styles of system network config. It gets messy very fast.

Systems with system network config can just do the configuration manually; they won't be worse off than they are now.

Users are actually asking for this functionality. Configuring everything manually is possible but error prone.

Yes, but attempting to do it automatically is also error prone (due to the myriad of different guest network config systems, even just within the seemingly narrow category of Linux guests). Pick your poison :-)

Make it work well for RHEL guests. Others will work with less integration.

-- MST
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On 04/23/2015 04:34 AM, Chen Fan wrote: On 04/20/2015 06:29 AM, Laine Stump wrote: On 04/17/2015 04:53 AM, Chen Fan wrote:

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing this with passed-through non-SRIOV NICs? An SRIOV virtual function gets its MAC address from the libvirt config, so it's very simple to use the same MAC address across the migration. Any network card that would be able to do this on any sort of useful scale will be SRIOV-capable (or should be replaced with one that is - some of them are not that expensive).

Hi Laine, I think supporting migration with SRIOV virtual NICs is a good idea, but there are also passthrough NICs that are not SRIOV-capable. For those NIC devices we can only use hostdev to specify the passthrough function, so I think we should support them too.

As I think you've already discovered, passing through non-SRIOV NICs is problematic. It is effectively impossible for the host to change their MAC address before assigning them to the guest - the guest's driver sees standard netdev hardware and resets it, which resets the MAC address back to the original value burned into the firmware. This makes management more complicated, especially when you get into scenarios such as what we're discussing (i.e. migration), where the actual hardware (and thus MAC address) may be different from one run to the next.

Since libvirt's interface element requires a fixed MAC address in the XML, it's not possible to have an interface that gets the actual device from a network pool (without some serious hacking to that code), and there is no support for plain (non-network) hostdev device pools; there would need to be a separate (nonexistent) driver for that. Since the hostdev element relies on the PCI address of the device (in the source subelement, which also must be fixed) to determine which device to pass through, a domain config with a hostdev that could be run on two different machines would require the device to reside at exactly the same PCI address on both machines, which is a very serious limitation in an environment large enough that migrating domains is a requirement.

Also, non-SRIOV NICs are limited to a single device per physical port, meaning probably at most 4 devices per physical host PCIe slot, and this results in greatly reduced density on the host (and even more so on the switch that connects to the host!) compared to even the old Intel 82576 cards, which have 14 VFs (7 VFs x 2 ethernet ports). Think about it - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch ports, while the same number of guests with non-SRIOV would take 4 PCIe slots and 14(!) switch ports. The difference is even more striking when comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox (also 64?) or SolarFlare (128?) card.

And don't forget that, because you don't have pools of devices to be automatically chosen from, each guest domain that will be migrated requires a reserved NIC on *every* machine it will be migrated to (no other domain can be configured to use that NIC, in order to avoid conflicts).
Of course you could complicate the software by adding a driver that manages pools of generic hostdevs and coordinates MAC address changes with the guest (part of what you're suggesting), but all that extra complexity not only takes a lot of time and effort to develop, it also creates more code that needs to be maintained and tested for regressions at each release. The alternative is to just spend $130 per host for an 82576 or Intel I350 card (these are the cheapest SRIOV options I'm aware of). When compared to the total cost of any hardware installation large enough to support migration and have performance requirements high enough that NIC passthrough is needed, this is a trivial amount.

I guess the bottom line of all this is that (in my opinion, of course :-) supporting useful migration of domains that use passed-through non-SRIOV NICs would be an interesting experiment, but I don't see much utility in it, other than scratching an intellectual itch, and I'm concerned that it would create more long-term maintenance cost than it's worth.
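For comparison, the "pool of devices" model that does exist today is libvirt's hostdev-forwarding network: libvirt hands out a free VF from a PF for each guest interface that references the network, so the domain XML never names a fixed PCI address. A minimal sketch, assuming a hypothetical PF name and MAC address:

<!-- Host-side pool definition: libvirt picks a free VF of the given PF
     for each guest interface that references this network. -->
<network>
  <name>vf-pool</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='enp3s0f0'/>   <!-- hypothetical PF name -->
  </forward>
</network>

<!-- Guest-side: no fixed PCI address, just a MAC and the pool name. -->
<interface type='network'>
  <source network='vf-pool'/>
  <mac address='52:54:00:11:22:33'/>
</interface>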
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On 04/22/2015 12:22 AM, Chen Fan wrote: Hi Laine, thanks for your review of my patches. Do you know whether SolarFlare's patches have had any updated version since https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html ? If not, I hope to go on and complete this work. ;)

I haven't heard of any updates. Their priorities may have changed.
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On 04/20/2015 06:29 AM, Laine Stump wrote: On 04/17/2015 04:53 AM, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I haven't been able to pursue it much. See this BZ, along with the original patches submitted by Shradha Shah from SolarFlare: https://bugzilla.redhat.com/show_bug.cgi?id=896716 (I was a bit optimistic in my initial review of the patches - there are actually a lot of issues that weren't handled by those patches.)

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd-level enhancement, not necessary initially (and maybe not ever, due to the high possibility of it being extremely difficult to get right in 100% of the cases).

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may not have netcf installed. Anyway, I think it should be up to the guest's own system network config to have the bond already set up. If you try to impose it from outside that infrastructure, you run too much risk of running afoul of something on the guest (e.g. NetworkManager).

- During migration, unplug the passed-through NIC, then do a native migration.

Correct. This is the most important part. But not just unplugging it - you also need to wait until the unplug operation completes (it is asynchronous). (After this point, the emulated NIC that is part of the bond would get all of the traffic.)

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing this with passed-through non-SRIOV NICs? An SRIOV virtual function gets its MAC address from the libvirt config, so it's very simple to use the same MAC address across the migration. Any network card that would be able to do this on any sort of useful scale will be SRIOV-capable (or should be replaced with one that is - some of them are not that expensive).
Hi Laine, I think supporting migration with SRIOV virtual NICs is a good idea, but there are also passthrough NICs that are not SRIOV-capable. For those NIC devices we can only use hostdev to specify the passthrough function, so I think we should support them too. Thanks, Chen

TODO: 1. When hot-adding a new NIC on the destination side after the migration has finished, the NIC device needs to be re-enslaved to the bonding device in the guest; otherwise it stays offline. Maybe we should consider having the bonding driver support adding interfaces dynamically.

I never looked at the details of how SolarFlare's code handled the guest side (they have/had their own patchset they maintained for some older version of libvirt, which integrated with some sort of enhanced bonding driver on the guests). I assumed the bond driver could handle this already, but I have to say I never investigated.

This is an example of how this might work, so I want to hear some voices about this scenario. Thanks, Chen

Chen Fan (7):
- qemu-agent: add agent init callback when detecting guest setup
- qemu: add guest init event callback to do the initialize work for guest
- hostdev: add a 'bond' type element in hostdev element

Putting this into hostdev is the wrong approach, for two reasons: 1) it doesn't account for the device to be used being in a different address on the source and
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
Hi Laine, thanks for your review of my patches. Do you know whether SolarFlare's patches have had any updated version since https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html ? If not, I hope to go on and complete this work. ;) Thanks, Chen

On 04/20/2015 06:29 AM, Laine Stump wrote: On 04/17/2015 04:53 AM, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I haven't been able to pursue it much. See this BZ, along with the original patches submitted by Shradha Shah from SolarFlare: https://bugzilla.redhat.com/show_bug.cgi?id=896716 (I was a bit optimistic in my initial review of the patches - there are actually a lot of issues that weren't handled by those patches.)

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd-level enhancement, not necessary initially (and maybe not ever, due to the high possibility of it being extremely difficult to get right in 100% of the cases).

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may not have netcf installed. Anyway, I think it should be up to the guest's own system network config to have the bond already set up. If you try to impose it from outside that infrastructure, you run too much risk of running afoul of something on the guest (e.g. NetworkManager).

- During migration, unplug the passed-through NIC, then do a native migration.

Correct. This is the most important part. But not just unplugging it - you also need to wait until the unplug operation completes (it is asynchronous). (After this point, the emulated NIC that is part of the bond would get all of the traffic.)

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing this with passed-through non-SRIOV NICs?
An SRIOV virtual function gets its MAC address from the libvirt config, so it's very simple to use the same MAC address across the migration. Any network card that would be able to do this on any sort of useful scale will be SRIOV-capable (or should be replaced with one that is - some of them are not that expensive).

TODO: 1. When hot-adding a new NIC on the destination side after the migration has finished, the NIC device needs to be re-enslaved to the bonding device in the guest; otherwise it stays offline. Maybe we should consider having the bonding driver support adding interfaces dynamically.

I never looked at the details of how SolarFlare's code handled the guest side (they have/had their own patchset they maintained for some older version of libvirt, which integrated with some sort of enhanced bonding driver on the guests). I assumed the bond driver could handle this already, but I have to say I never investigated.

This is an example of how this might work, so I want to hear some voices about this scenario. Thanks, Chen

Chen Fan (7):
- qemu-agent: add agent init callback when detecting guest setup
- qemu: add guest init event callback to do the initialize work for guest
- hostdev: add a 'bond' type element in hostdev element

Putting this into hostdev is the wrong approach, for two reasons: 1) it doesn't account for the device to be used being in a different address on the source and destination
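To illustrate the "migrate --xml" step being discussed, a destination-specific domain definition (pointing the passthrough interface at a VF that exists on the destination host) can be supplied on the virsh command line. This is a sketch only; the domain name, destination URI and file name are hypothetical.

# Hypothetical invocation: migrate live, supplying a destination-side domain XML
# in which the passthrough <interface>/<hostdev> references a VF present on the
# destination host (the MAC can stay the same when SR-IOV VFs are used).
virsh migrate --live --persistent guest1 qemu+ssh://dest-host/system \
      --xml guest1-dest.xml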
Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
On 04/17/2015 04:53 AM, Chen Fan wrote:

Background: Live migration is one of the most important features of virtualization technology. With regard to recent virtualization techniques, performance of network I/O is critical. Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant performance gap compared with native network I/O. Pass-through network devices have near-native performance; however, they have thus far prevented live migration. No existing method solves the problem of live migration with pass-through devices perfectly. There was an idea to solve the problem in this paper: https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I haven't been able to pursue it much. See this BZ, along with the original patches submitted by Shradha Shah from SolarFlare: https://bugzilla.redhat.com/show_bug.cgi?id=896716 (I was a bit optimistic in my initial review of the patches - there are actually a lot of issues that weren't handled by those patches.)

So I think this problem could perhaps be solved by combining existing technologies. These are the steps we are considering for the implementation:

- Before booting the VM, we specify two NICs for creating a bonding device (one passed-through and one virtual NIC) in the XML. Here we can specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd-level enhancement, not necessary initially (and maybe not ever, due to the high possibility of it being extremely difficult to get right in 100% of the cases).

- When qemu-guest-agent starts up in the guest, it sends a notification to libvirt, and libvirt then calls the previously registered initialization callbacks. Through those callback functions we can create the bonding device according to the XML configuration; here we use the netcf tool, which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may not have netcf installed. Anyway, I think it should be up to the guest's own system network config to have the bond already set up. If you try to impose it from outside that infrastructure, you run too much risk of running afoul of something on the guest (e.g. NetworkManager).

- During migration, unplug the passed-through NIC, then do a native migration.

Correct. This is the most important part. But not just unplugging it - you also need to wait until the unplug operation completes (it is asynchronous). (After this point, the emulated NIC that is part of the bond would get all of the traffic.)

- On the destination side, check whether a new NIC needs to be hotplugged according to the specified XML. Usually we use the migrate --xml command option to specify the destination host NIC MAC address to hotplug a new NIC, because the source side's passthrough NIC MAC address is different; then hotplug the device according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing this with passed-through non-SRIOV NICs? An SRIOV virtual function gets its MAC address from the libvirt config, so it's very simple to use the same MAC address across the migration. Any network card that would be able to do this on any sort of useful scale will be SRIOV-capable (or should be replaced with one that is - some of them are not that expensive).
TODO: 1. When hot-adding a new NIC on the destination side after the migration has finished, the NIC device needs to be re-enslaved to the bonding device in the guest; otherwise it stays offline. Maybe we should consider having the bonding driver support adding interfaces dynamically.

I never looked at the details of how SolarFlare's code handled the guest side (they have/had their own patchset they maintained for some older version of libvirt, which integrated with some sort of enhanced bonding driver on the guests). I assumed the bond driver could handle this already, but I have to say I never investigated.

This is an example of how this might work, so I want to hear some voices about this scenario. Thanks, Chen

Chen Fan (7):
- qemu-agent: add agent init callback when detecting guest setup
- qemu: add guest init event callback to do the initialize work for guest
- hostdev: add a 'bond' type element in hostdev element

Putting this into hostdev is the wrong approach, for two reasons: 1) it doesn't account for the device to be used being in a different address on the source and destination hosts, 2) the interface element already has much of the config you need, and an interface type supporting hostdev passthrough. It has been possible to do passthrough of an SRIOV VF via interface type='hostdev' for a long time now and, even better, via an interface
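A rough sketch of the "unplug, wait for completion, then migrate" sequence discussed above, using plain virsh. The polling loop is a simplistic stand-in for listening to the asynchronous device-removed notification, and the domain name, MAC address, URI and file names are hypothetical placeholders.

# Hypothetical sequence on the source host.
virsh detach-device guest1 vf-interface.xml --live   # request hot-unplug of the VF

# The unplug is asynchronous: wait until the device really disappears from the
# live XML before starting the migration (grep for the VF's MAC address).
while virsh dumpxml guest1 | grep -q '52:54:00:11:22:33'; do
    sleep 1
done

virsh migrate --live guest1 qemu+ssh://dest-host/system --xml guest1-dest.xml

# On the destination, the management layer would then hot-add the VF back, e.g.:
#   virsh attach-device guest1 vf-interface-dest.xml --live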