Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Michael S. Tsirkin
On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
 On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
  Background:
  Live migration is one of the most important features of virtualization
  technology. With regard to recent virtualization techniques, performance
  of network I/O is critical. Current network I/O virtualization (e.g.
  para-virtualized I/O, VMDq) has a significant performance gap compared
  with native network I/O. Pass-through network devices have near-native
  performance; however, they have thus far prevented live migration. No
  existing method solves the problem of live migration with pass-through
  devices perfectly.
  
  There was an earlier idea for solving this problem, described at:
  https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
  Please refer to the above document for detailed information.
  
  So I think this problem could be solved by combining existing
  technologies. The following are the steps we are considering for the
  implementation:
  
  -  Before booting the VM, specify in the XML the two NICs that will form
     the bonding device (one passed-through and one virtual NIC). Here we
     can specify the NICs' MAC addresses in the XML, which helps
     qemu-guest-agent find the network interfaces in the guest (see the
     sketch below).
  
  -  When qemu-guest-agent starts up in the guest, it sends a notification
     to libvirt, and libvirt then calls the previously registered
     initialization callbacks. Through those callback functions we can
     create the bonding device according to the XML configuration. Here we
     use the netcf tool, which makes it easy to create the bonding device.
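  For illustration, a domain XML pairing along these lines is what the
  proposal seems to imply (the MAC and PCI addresses here are invented, and
  the exact schema for marking the pair is precisely what the series would
  still have to define):

    <!-- emulated NIC that stays present across the migration -->
    <interface type='network'>
      <mac address='52:54:00:aa:bb:01'/>
      <source network='default'/>
      <model type='virtio'/>
    </interface>

    <!-- passed-through NIC; note that a plain hostdev has no <mac> child,
         which is presumably part of what the proposed extra markup would
         have to add -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
      </source>
    </hostdev>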
 
 I'm not really clear on why libvirt/guest agent needs to be involved in this.
 I think configuration of networking is really something that must be left to
 the guest OS admin to control. I don't think the guest agent should be trying
 to reconfigure guest networking itself, as that is inevitably going to 
 conflict
 with configuration attempted by things in the guest like NetworkManager or
 systemd-networkd.

There should not be a conflict.
The guest agent should just give NM the information, and have NM do
the right thing.

 IOW, if you want to do this setup where the guest is given multiple NICs
 connected to the same host LAN, then I think we should just let the guest
 admin configure bonding in whatever manner they decide is best for their
 OS install.
 
  -  During migration, unplug the passed-through NIC, then do a normal
     migration.
  
  -  On the destination side, check whether a new NIC needs to be
     hotplugged according to the specified XML. Usually we use the
     migrate --xml command option to specify the MAC address of the
     destination host's NIC (because the MAC address of the source
     side's passed-through NIC is different), and then hotplug the
     device according to the destination XML configuration, as in the
     sketch below.
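  As a sketch of that last step (the file name, addresses, and URI are
  invented; this only shows how the existing migrate --xml option would
  carry the destination-side definition, not anything the patches add):

    <!-- fragment of dest.xml: same domain, but the passed-through NIC is
         described as it exists on the destination host, to be hot-added
         once migration completes -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x10' function='0x2'/>
      </source>
    </hostdev>

  The destination definition would then be supplied with something like
  "virsh migrate --live --xml dest.xml guest qemu+ssh://dest/system"; the
  hotplug of the device after migration is the part the proposal would
  have to automate.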
 
 Regards,
 Daniel

Users are actually asking for this functionality.

Configuring everything manually is possible but error
prone. We probably should leave manual configuration
as an option for the 10% of people who want to tweak
guest networking config, but this does not mean we shouldn't
have it all work out of the box for 90% of people that
just want networking to go fast with no tweaks.




 -- 
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Michael S. Tsirkin
On Thu, Apr 23, 2015 at 12:35:28PM -0400, Laine Stump wrote:
 On 04/22/2015 01:20 PM, Dr. David Alan Gilbert wrote:
  * Daniel P. Berrange (berra...@redhat.com) wrote:
  On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote:
  * Daniel P. Berrange (berra...@redhat.com) wrote:
  On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
  * Daniel P. Berrange (berra...@redhat.com) wrote:
  On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
   Background:
   Live migration is one of the most important features of virtualization
   technology. With regard to recent virtualization techniques, performance
   of network I/O is critical. Current network I/O virtualization (e.g.
   para-virtualized I/O, VMDq) has a significant performance gap compared
   with native network I/O. Pass-through network devices have near-native
   performance; however, they have thus far prevented live migration. No
   existing method solves the problem of live migration with pass-through
   devices perfectly.
  
   There was an earlier idea for solving this problem, described at:
   https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
   Please refer to the above document for detailed information.
  
   So I think this problem could be solved by combining existing
   technologies. The following are the steps we are considering for the
   implementation:
  
   -  Before booting the VM, specify in the XML the two NICs that will
      form the bonding device (one passed-through and one virtual NIC).
      Here we can specify the NICs' MAC addresses in the XML, which helps
      qemu-guest-agent find the network interfaces in the guest.
  
   -  When qemu-guest-agent starts up in the guest, it sends a
      notification to libvirt, and libvirt then calls the previously
      registered initialization callbacks. Through those callback
      functions we can create the bonding device according to the XML
      configuration. Here we use the netcf tool, which makes it easy to
      create the bonding device.
  I'm not really clear on why libvirt/guest agent needs to be involved 
  in this.
  I think configuration of networking is really something that must be 
  left to
  the guest OS admin to control. I don't think the guest agent should be 
  trying
  to reconfigure guest networking itself, as that is inevitably going to 
  conflict
  with configuration attempted by things in the guest like 
  NetworkManager or
  systemd-networkd.
 
   IOW, if you want to do this setup where the guest is given multiple
   NICs connected to the same host LAN, then I think we should just let
   the guest admin configure bonding in whatever manner they decide is
   best for their OS install.
  I disagree; there should be a way for the admin not to have to do this 
  manually;
  however it should interact well with existing management stuff.
 
  At the simplest, something that marks the two NICs in a discoverable way
  so that they can be seen that they're part of a set;  with just that ID 
  system
  then an installer or setup tool can notice them and offer to put them 
  into
  a bond automatically; I'd assume it would be possible to add a rule 
  somewhere
  that said anything with the same ID would automatically be added to the 
  bond.
  I didn't mean the admin would literally configure stuff manually. I 
  really
  just meant that the guest OS itself should decide how it is done, whether
  NetworkManager magically does the right thing, or the person building the
  cloud disk image provides a magic udev rule, or $something else. I just
  don't think that the QEMU guest agent should be involved, as that will
  definitely trample all over other things that manage networking in the
  guest.
  OK, good, that's about the same level I was at.
 
  I could see this being solved in the cloud disk images by using
  cloud-init metadata to mark the NICs as being in a set, or perhaps there
  is some magic you could define in SMBIOS tables, or something else again.
  A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
  solution might.
  Would either of these work with hotplug though?   I guess as the VM starts
  off with the pair of NICs, then when you remove one and add it back after
  migration then you don't need any more information added; so yes
  cloud-init or SMBIOS would do it.  (I was thinking SMBIOS stuff
  in the way that you get device/slot numbering that NIC naming is 
  sometimes based
  off).
 
  What about if we hot-add a new NIC later on (not during migration);
  a normal hot-add of a NIC now turns into a hot-add of two new NICs; how
  do we pass the information at hot-add time to provide that?
  Hmm, yes, actually hotplug would be a problem with that.
 
  An even simpler idea would be to just keep things real dumb and simply
  use the same MAC address for both NICs. Once you put them in a bond
  device, the kernel will be copying the MAC address of the first NIC
  into the second NIC anyway, so unless I'm missing 
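  (Purely to illustrate that idea, with invented addresses, the two
  interfaces in the domain XML would simply share one MAC; whether current
  libvirt would accept a duplicate MAC within one domain without changes is
  left open here:)

    <interface type='network'>
      <mac address='52:54:00:aa:bb:01'/>
      <source network='default'/>
      <model type='virtio'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <!-- same MAC marks this as the partner of the emulated NIC -->
      <mac address='52:54:00:aa:bb:01'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
      </source>
    </interface>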

Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Michael S. Tsirkin
On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote:
 On 04/23/2015 04:34 AM, Chen Fan wrote:
 
  On 04/20/2015 06:29 AM, Laine Stump wrote:
  On 04/17/2015 04:53 AM, Chen Fan wrote:
   -  On the destination side, check whether a new NIC needs to be
      hotplugged according to the specified XML. Usually, we use the
      migrate --xml command option to specify the MAC address of the
      destination host's NIC (because the MAC address of the source
      side's passed-through NIC is different), and then hotplug the
      device according to the destination XML configuration.
 
  Why does the MAC address need to be different? Are you suggesting doing
  this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
  its MAC address from the libvirt config, so it's very simple to use the
  same MAC address across the migration. Any network card that would be
  able to do this on any sort of useful scale will be SRIOV-capable (or
  should be replaced with one that is - some of them are not that
  expensive).
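  (For concreteness, and with invented addresses, this is the existing,
  documented way of assigning a VF: libvirt programs the given MAC onto the
  VF via its PF before handing it to the guest, so the same definition can
  be used unchanged on the destination host:)

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:6d:90:02'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
      </source>
    </interface>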
 
  Hi Laine,
 
  I think using SRIOV virtual NICs to support migration is a good idea,
  but some passthrough NICs are not SRIOV-capable. For those NIC devices
  we are only able to use hostdev to specify the passthrough function,
  so I think we should support them too.
 
 As I think you've already discovered, passing through non-SRIOV NICs is
 problematic. It is completely impossible for the host to change their
 MAC address before assigning them to the guest - the guest's driver sees
 standard netdev hardware and resets it, which resets the MAC address to
 the original value burned into the firmware. This makes management more
 complicated, especially when you get into scenarios such as what we're
 discussing (i.e. migration) where the actual hardware (and thus MAC
 address) may be different from one run to the next.

Right, passing through PFs is also insecure.  Let's get
everything working fine with VFs first, and worry about PFs later.


 Since libvirt's interface element requires a fixed MAC address in the
 XML, it's not possible to have an interface that gets the actual
 device from a network pool (without some serious hacking to that code),
 and there is no support for plain (non-network) hostdev device pools;
 there would need to be a separate (nonexistent) driver for that. Since
 the hostdev element relies on the PCI address of the device (in the
 source subelement, which also must be fixed) to determine which device
 to passthrough, a domain config with a hostdev that could be run on
 two different machines would require the device to reside at exactly the
 same PCI address on both machines, which is a very serious limitation to
 have in an environment large enough that migrating domains is a requirement.
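 (To make that constraint concrete, with an invented address: the fixed
 source address below is all a plain hostdev gives libvirt to identify the
 device, so it would have to be valid verbatim on every host the domain
 might run on:)

   <hostdev mode='subsystem' type='pci' managed='yes'>
     <source>
       <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
     </source>
   </hostdev>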
 
 Also, non-SRIOV NICs are limited to a single device per physical port,
 meaning probably at most 4 devices per physical host PCIe slot, and this
 results in a greatly reduced density on the host (and even more so on
 the switch that connects to the host!) compared to even the old Intel
 82576 cards, which have 14 VFs (7VFs x 2 ethernet ports). Think about it
 - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
 ports, while the same number of guests with non-SRIOV would take 4 PCIe
 slots and 14(!) switch ports. The difference is even more striking when
 comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
 (also 64?) or SolarFlare (128?) card. And don't forget that, because you
 don't have pools of devices to be automatically chosen from, each
 guest domain that will be migrated requires a reserved NIC on *every*
 machine it will be migrated to (no other domain can be configured to use
 that NIC, in order to avoid conflicts).
 
 Of course you could complicate the software by adding a driver that
 manages pools of generic hostdevs, and coordinates MAC address changes
 with the guest (part of what you're suggesting), but all that extra
 complexity not only takes a lot of time and effort to develop, it also
 creates more code that needs to be maintained and tested for regressions
 at each release.
 
 The alternative is to just spend $130 per host for an 82576 or Intel
 I350 card (these are the cheapest SRIOV options I'm aware of). When
 compared to the total cost of any hardware installation large enough to
 support migration and have performance requirements high enough that NIC
 passthrough is needed, this is a trivial amount.
 
 I guess the bottom line of all this is that (in my opinion, of course
 :-) supporting useful migration of domains that used passed-through
 non-SRIOV NICs would be an interesting experiment, but I don't see much
 utility to it, other than scratching an intellectual itch, and I'm
 concerned that it would create more long term maintenance cost than it
 was worth.

I'm not sure it has no utility but it's easy to agree that
VFs are more important, and focusing on this first is a good
idea.


Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Laine Stump
On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
 On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
 On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
  Background:
  Live migration is one of the most important features of virtualization
  technology. With regard to recent virtualization techniques, performance
  of network I/O is critical. Current network I/O virtualization (e.g.
  para-virtualized I/O, VMDq) has a significant performance gap compared
  with native network I/O. Pass-through network devices have near-native
  performance; however, they have thus far prevented live migration. No
  existing method solves the problem of live migration with pass-through
  devices perfectly.
 
  There was an earlier idea for solving this problem, described at:
  https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
  Please refer to the above document for detailed information.
 
  So I think this problem could be solved by combining existing
  technologies. The following are the steps we are considering for the
  implementation:
 
  -  Before booting the VM, specify in the XML the two NICs that will form
     the bonding device (one passed-through and one virtual NIC). Here we
     can specify the NICs' MAC addresses in the XML, which helps
     qemu-guest-agent find the network interfaces in the guest.
 
  -  When qemu-guest-agent starts up in the guest, it sends a notification
     to libvirt, and libvirt then calls the previously registered
     initialization callbacks. Through those callback functions we can
     create the bonding device according to the XML configuration. Here we
     use the netcf tool, which makes it easy to create the bonding device.
 I'm not really clear on why libvirt/guest agent needs to be involved in this.
 I think configuration of networking is really something that must be left to
 the guest OS admin to control. I don't think the guest agent should be trying
 to reconfigure guest networking itself, as that is inevitably going to 
 conflict
 with configuration attempted by things in the guest like NetworkManager or
 systemd-networkd.
 There should not be a conflict.
 The guest agent should just give NM the information, and have NM do
 the right thing.

That assumes the guest will have NM running. Unless you want to severely
limit the scope of usefulness, you also need to handle systems that have
NM disabled, and among those the different styles of system network
config. It gets messy very fast.


 Users are actually asking for this functionality.

 Configuring everything manually is possible but error
 prone.

Yes, but attempting to do it automatically is also error prone (due to
the myriad of different guest network config systems, even just within
the seemingly narrow category of Linux guests). Pick your poison :-)



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Daniel P. Berrange
On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
 On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
  On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
  On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
   Background:
   Live migration is one of the most important features of virtualization
   technology. With regard to recent virtualization techniques, performance
   of network I/O is critical. Current network I/O virtualization (e.g.
   para-virtualized I/O, VMDq) has a significant performance gap compared
   with native network I/O. Pass-through network devices have near-native
   performance; however, they have thus far prevented live migration. No
   existing method solves the problem of live migration with pass-through
   devices perfectly.
  
   There was an earlier idea for solving this problem, described at:
   https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
   Please refer to the above document for detailed information.
  
   So I think this problem could be solved by combining existing
   technologies. The following are the steps we are considering for the
   implementation:
  
   -  Before booting the VM, specify in the XML the two NICs that will
      form the bonding device (one passed-through and one virtual NIC).
      Here we can specify the NICs' MAC addresses in the XML, which helps
      qemu-guest-agent find the network interfaces in the guest.
  
   -  When qemu-guest-agent starts up in the guest, it sends a
      notification to libvirt, and libvirt then calls the previously
      registered initialization callbacks. Through those callback
      functions we can create the bonding device according to the XML
      configuration. Here we use the netcf tool, which makes it easy to
      create the bonding device.
  I'm not really clear on why libvirt/guest agent needs to be involved in 
  this.
  I think configuration of networking is really something that must be left 
  to
  the guest OS admin to control. I don't think the guest agent should be 
  trying
  to reconfigure guest networking itself, as that is inevitably going to 
  conflict
  with configuration attempted by things in the guest like NetworkManager or
  systemd-networkd.
   There should not be a conflict.
   The guest agent should just give NM the information, and have NM do
   the right thing.
 
 That assumes the guest will have NM running. Unless you want to severely
 limit the scope of usefulness, you also need to handle systems that have
 NM disabled, and among those the different styles of system network
 config. It gets messy very fast.

Also OpenStack already has a way to pass the guest information about the
required network setup, via cloud-init, so it would not be interested
in anything that used the QEMU guest agent to configure NetworkManager,
which is really just another example of why this does not
belong anywhere in libvirt or lower.  The decision to use NM is a
policy decision that will always be wrong for a non-negligible set
of use cases, and as such does not belong in libvirt or QEMU. It is
the job of higher level apps to make that kind of policy decision.

  Users are actually asking for this functionality.
 
  Configuring everything manually is possible but error
  prone.
 
 Yes, but attempting to do it automatically is also error prone (due to
 the myriad of different guest network config systems, even just within
 the seemingly narrow category of Linux guests). Pick your poison :-)

Also note I'm not debating the usefulness of the overall concept
or the need for automation. It simply doesn't belong in libvirt or
lower - it is a job for the higher level management applications to
define a policy that fits in with the way they are managing the
virtual machines and the networking.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Michael S. Tsirkin
On Tue, May 19, 2015 at 03:21:49PM +0100, Daniel P. Berrange wrote:
 On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
  On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
   On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
   On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
    Background:
    Live migration is one of the most important features of virtualization
    technology. With regard to recent virtualization techniques,
    performance of network I/O is critical. Current network I/O
    virtualization (e.g. para-virtualized I/O, VMDq) has a significant
    performance gap compared with native network I/O. Pass-through network
    devices have near-native performance; however, they have thus far
    prevented live migration. No existing method solves the problem of
    live migration with pass-through devices perfectly.
   
    There was an earlier idea for solving this problem, described at:
    https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
    Please refer to the above document for detailed information.
   
    So I think this problem could be solved by combining existing
    technologies. The following are the steps we are considering for the
    implementation:
   
    -  Before booting the VM, specify in the XML the two NICs that will
       form the bonding device (one passed-through and one virtual NIC).
       Here we can specify the NICs' MAC addresses in the XML, which helps
       qemu-guest-agent find the network interfaces in the guest.
   
    -  When qemu-guest-agent starts up in the guest, it sends a
       notification to libvirt, and libvirt then calls the previously
       registered initialization callbacks. Through those callback
       functions we can create the bonding device according to the XML
       configuration. Here we use the netcf tool, which makes it easy to
       create the bonding device.
   I'm not really clear on why libvirt/guest agent needs to be involved in 
   this.
   I think configuration of networking is really something that must be 
   left to
   the guest OS admin to control. I don't think the guest agent should be 
   trying
   to reconfigure guest networking itself, as that is inevitably going to 
   conflict
   with configuration attempted by things in the guest like NetworkManager 
   or
   systemd-networkd.
    There should not be a conflict.
    The guest agent should just give NM the information, and have NM do
    the right thing.
  
  That assumes the guest will have NM running. Unless you want to severely
  limit the scope of usefulness, you also need to handle systems that have
  NM disabled, and among those the different styles of system network
  config. It gets messy very fast.
 
 Also OpenStack already has a way to pass the guest information about the
 required network setup, via cloud-init, so it would not be interested
 in anything that used the QEMU guest agent to configure NetworkManager,
 which is really just another example of why this does not
 belong anywhere in libvirt or lower.  The decision to use NM is a
 policy decision that will always be wrong for a non-negligible set
 of use cases, and as such does not belong in libvirt or QEMU. It is
 the job of higher level apps to make that kind of policy decision.

Using NM is up to users. On some of my VMs, I bring up links manually
after each boot.  We can provide the info to the guest, and teach NM to
use it.  If someone wants to write bash scripts to use this info, that's
also fine.

   Users are actually asking for this functionality.
  
   Configuring everything manually is possible but error
   prone.
  
  Yes, but attempting to do it automatically is also error prone (due to
  the myriad of different guest network config systems, even just within
  the seemingly narrow category of Linux guests). Pick your poison :-)
 
 Also note I'm not debating the usefulness of the overall concept
 or the need for automation. It simply doesn't belong in libvirt or
 lower - it is a job for the higher level management applications to
 define a policy that fits in with the way they are managing the
 virtual machines and the networking.
 
 Regards,
 Daniel

Users are asking for this automation, so it's useful to them. We can
always tell them no. But saying no because we seem unable to decide
where this useful functionality fits does not look like a good reason.

 -- 
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-05-19 Thread Michael S. Tsirkin
On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
 On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
  On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
  On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
   Background:
   Live migration is one of the most important features of virtualization
   technology. With regard to recent virtualization techniques, performance
   of network I/O is critical. Current network I/O virtualization (e.g.
   para-virtualized I/O, VMDq) has a significant performance gap compared
   with native network I/O. Pass-through network devices have near-native
   performance; however, they have thus far prevented live migration. No
   existing method solves the problem of live migration with pass-through
   devices perfectly.
  
   There was an earlier idea for solving this problem, described at:
   https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
   Please refer to the above document for detailed information.
  
   So I think this problem could be solved by combining existing
   technologies. The following are the steps we are considering for the
   implementation:
  
   -  Before booting the VM, specify in the XML the two NICs that will
      form the bonding device (one passed-through and one virtual NIC).
      Here we can specify the NICs' MAC addresses in the XML, which helps
      qemu-guest-agent find the network interfaces in the guest.
  
   -  When qemu-guest-agent starts up in the guest, it sends a
      notification to libvirt, and libvirt then calls the previously
      registered initialization callbacks. Through those callback
      functions we can create the bonding device according to the XML
      configuration. Here we use the netcf tool, which makes it easy to
      create the bonding device.
  I'm not really clear on why libvirt/guest agent needs to be involved in 
  this.
  I think configuration of networking is really something that must be left 
  to
  the guest OS admin to control. I don't think the guest agent should be 
  trying
  to reconfigure guest networking itself, as that is inevitably going to 
  conflict
  with configuration attempted by things in the guest like NetworkManager or
  systemd-networkd.
   There should not be a conflict.
   The guest agent should just give NM the information, and have NM do
   the right thing.
 
 That assumes the guest will have NM running. Unless you want to severely
 limit the scope of usefulness, you also need to handle systems that have
 NM disabled, and among those the different styles of system network
 config. It gets messy very fast.

Systems with a system network config can just do the configuration
manually; they won't be worse off than they are now.

 
  Users are actually asking for this functionality.
 
  Configuring everything manually is possible but error
  prone.
 
 Yes, but attempting to do it automatically is also error prone (due to
 the myriad of different guest network config systems, even just within
 the seemingly narrow category of Linux guests). Pick your poison :-)

Make it work well for RHEL guests. Others will work with less integration.

-- 
MST



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-04-23 Thread Laine Stump
On 04/23/2015 04:34 AM, Chen Fan wrote:

 On 04/20/2015 06:29 AM, Laine Stump wrote:
 On 04/17/2015 04:53 AM, Chen Fan wrote:
 -  On the destination side, check whether a new NIC needs to be
    hotplugged according to the specified XML. Usually, we use the
    migrate --xml command option to specify the MAC address of the
    destination host's NIC (because the MAC address of the source
    side's passed-through NIC is different), and then hotplug the
    device according to the destination XML configuration.

 Why does the MAC address need to be different? Are you suggesting doing
 this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
 its MAC address from the libvirt config, so it's very simple to use the
 same MAC address across the migration. Any network card that would be
 able to do this on any sort of useful scale will be SRIOV-capable (or
 should be replaced with one that is - some of them are not that
 expensive).

 Hi Laine,

 I think using SRIOV virtual NICs to support migration is a good idea,
 but some passthrough NICs are not SRIOV-capable. For those NIC devices
 we are only able to use hostdev to specify the passthrough function,
 so I think we should support them too.

As I think you've already discovered, passing through non-SRIOV NICs is
problematic. It is completely impossible for the host to change their
MAC address before assigning them to the guest - the guest's driver sees
standard netdev hardware and resets it, which resets the MAC address to
the original value burned into the firmware. This makes management more
complicated, especially when you get into scenarios such as what we're
discussing (i.e. migration) where the actual hardware (and thus MAC
address) may be different from one run to the next.

Since libvirt's interface element requires a fixed MAC address in the
XML, it's not possible to have an interface that gets the actual
device from a network pool (without some serious hacking to that code),
and there is no support for plain (non-network) hostdev device pools;
there would need to be a separate (nonexistent) driver for that. Since
the hostdev element relies on the PCI address of the device (in the
source subelement, which also must be fixed) to determine which device
to passthrough, a domain config with a hostdev that could be run on
two different machines would require the device to reside at exactly the
same PCI address on both machines, which is a very serious limitation to
have in an environment large enough that migrating domains is a requirement.

Also, non-SRIOV NICs are limited to a single device per physical port,
meaning probably at most 4 devices per physical host PCIe slot, and this
results in a greatly reduced density on the host (and even more so on
the switch that connects to the host!) compared to even the old Intel
82576 cards, which have 14 VFs (7VFs x 2 ethernet ports). Think about it
- with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
ports, while the same number of guests with non-SRIOV would take 4 PCIe
slots and 14(!) switch ports. The difference is even more striking when
comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
(also 64?) or SolarFlare (128?) card. And don't forget that, because you
don't have pools of devices to be automatically chosen from, each
guest domain that will be migrated requires a reserved NIC on *every*
machine it will be migrated to (no other domain can be configured to use
that NIC, in order to avoid conflicts).

Of course you could complicate the software by adding a driver that
manages pools of generic hostdevs, and coordinates MAC address changes
with the guest (part of what you're suggesting), but all that extra
complexity not only takes a lot of time and effort to develop, it also
creates more code that needs to be maintained and tested for regressions
at each release.

The alternative is to just spend $130 per host for an 82576 or Intel
I350 card (these are the cheapest SRIOV options I'm aware of). When
compared to the total cost of any hardware installation large enough to
support migration and have performance requirements high enough that NIC
passthrough is needed, this is a trivial amount.

I guess the bottom line of all this is that (in my opinion, of course
:-) supporting useful migration of domains that used passed-through
non-SRIOV NICs would be an interesting experiment, but I don't see much
utility to it, other than scratching an intellectual itch, and I'm
concerned that it would create more long term maintenance cost than it
was worth.



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-04-23 Thread Laine Stump
On 04/22/2015 12:22 AM, Chen Fan wrote:
 Hi Laine,

 Thanks for your review for my patches.

 and do you know whether solarflare's patches have had any updated
 versions since

 https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html

 ?

 If not, I hope to go on and complete this work. ;)


I haven't heard of any updates. Their priorities may have changed.



Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-04-23 Thread Chen Fan


On 04/20/2015 06:29 AM, Laine Stump wrote:

On 04/17/2015 04:53 AM, Chen Fan wrote:

Background:
Live migration is one of the most important features of virtualization
technology. With regard to recent virtualization techniques, performance of
network I/O is critical. Current network I/O virtualization (e.g.
para-virtualized I/O, VMDq) has a significant performance gap compared with
native network I/O. Pass-through network devices have near-native
performance; however, they have thus far prevented live migration. No
existing method solves the problem of live migration with pass-through
devices perfectly.

There was an earlier idea for solving this problem, described at:
https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:

https://bugzilla.redhat.com/show_bug.cgi?id=896716

(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)


So I think this problem could be solved by combining existing
technologies. The following are the steps we are considering for the
implementation:

-  Before booting the VM, specify in the XML the two NICs that will form the
   bonding device (one passed-through and one virtual NIC). Here we can
   specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent
   find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).


-  When qemu-guest-agent starts up in the guest, it sends a notification to
   libvirt, and libvirt then calls the previously registered initialization
   callbacks. Through those callback functions we can create the bonding
   device according to the XML configuration. Here we use the netcf tool,
   which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already setup. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager)



-  During migration, unplug the passed-through NIC, then do a normal migration.

Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).


-  On the destination side, check whether a new NIC needs to be hotplugged
   according to the specified XML. Usually, we use the migrate --xml command
   option to specify the MAC address of the destination host's NIC (because
   the MAC address of the source side's passed-through NIC is different),
   and then hotplug the device according to the destination XML
   configuration.

Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that expensive).

Hi Laine,

I think using SRIOV virtual NICs to support migration is a good idea,
but some passthrough NICs are not SRIOV-capable. For those NIC devices
we are only able to use hostdev to specify the passthrough function,
so I think we should support them too.

Thanks,
Chen





TODO:
   1.  When a new NIC is hot-added on the destination side after migration
       finishes, the NIC device needs to be re-enslaved to the bonding
       device in the guest; otherwise it stays offline. Maybe we should
       consider having the bonding driver support adding interfaces
       dynamically.

I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.



This is an example of how this might work, and I want to hear some opinions
about this scenario.

Thanks,
Chen

Chen Fan (7):
   qemu-agent: add agent init callback when detecting guest setup
   qemu: add guest init event callback to do the initialize work for
 guest
   hostdev: add a 'bond' type element in hostdev element


Putting this into hostdev is the wrong approach, for two reasons: 1)
it doesn't account for the device to be used being in a different
address on the source and 

Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-04-21 Thread Chen Fan

Hi Laine,

Thanks for your review for my patches.

and do you know whether solarflare's patches have had any updated versions
since

https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html

?

If not, I hope to go on and complete this work. ;)

Thanks,
Chen


On 04/20/2015 06:29 AM, Laine Stump wrote:

On 04/17/2015 04:53 AM, Chen Fan wrote:

Background:
Live migration is one of the most important features of virtualization
technology. With regard to recent virtualization techniques, performance of
network I/O is critical. Current network I/O virtualization (e.g.
para-virtualized I/O, VMDq) has a significant performance gap compared with
native network I/O. Pass-through network devices have near-native
performance; however, they have thus far prevented live migration. No
existing method solves the problem of live migration with pass-through
devices perfectly.

There was an earlier idea for solving this problem, described at:
https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:

https://bugzilla.redhat.com/show_bug.cgi?id=896716

(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)


So I think this problem could be solved by combining existing
technologies. The following are the steps we are considering for the
implementation:

-  Before booting the VM, specify in the XML the two NICs that will form the
   bonding device (one passed-through and one virtual NIC). Here we can
   specify the NICs' MAC addresses in the XML, which helps qemu-guest-agent
   find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).


-  When qemu-guest-agent starts up in the guest, it sends a notification to
   libvirt, and libvirt then calls the previously registered initialization
   callbacks. Through those callback functions we can create the bonding
   device according to the XML configuration. Here we use the netcf tool,
   which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already setup. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager)



-  During migration, unplug the passed-through NIC, then do a normal migration.

Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).


-  On the destination side, check whether a new NIC needs to be hotplugged
   according to the specified XML. Usually, we use the migrate --xml command
   option to specify the MAC address of the destination host's NIC (because
   the MAC address of the source side's passed-through NIC is different),
   and then hotplug the device according to the destination XML
   configuration.

Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that expensive).



TODO:
   1.  When a new NIC is hot-added on the destination side after migration
       finishes, the NIC device needs to be re-enslaved to the bonding
       device in the guest; otherwise it stays offline. Maybe we should
       consider having the bonding driver support adding interfaces
       dynamically.

I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.



This is an example of how this might work, and I want to hear some opinions
about this scenario.

Thanks,
Chen

Chen Fan (7):
   qemu-agent: add agent init callback when detecting guest setup
   qemu: add guest init event callback to do the initialize work for
 guest
   hostdev: add a 'bond' type element in hostdev element


Putting this into hostdev is the wrong approach, for two reasons: 1)
it doesn't account for the device to be used being in a different
address on the source and destination 

Re: [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal

2015-04-19 Thread Laine Stump
On 04/17/2015 04:53 AM, Chen Fan wrote:
 Background:
 Live migration is one of the most important features of virtualization
 technology. With regard to recent virtualization techniques, performance of
 network I/O is critical. Current network I/O virtualization (e.g.
 para-virtualized I/O, VMDq) has a significant performance gap compared with
 native network I/O. Pass-through network devices have near-native
 performance; however, they have thus far prevented live migration. No
 existing method solves the problem of live migration with pass-through
 devices perfectly.

 There was an earlier idea for solving this problem, described at:
 https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
 Please refer to the above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:

https://bugzilla.redhat.com/show_bug.cgi?id=896716

(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)


 So I think this problem could be solved by combining existing
 technologies. The following are the steps we are considering for the
 implementation:

 -  Before booting the VM, specify in the XML the two NICs that will form
    the bonding device (one passed-through and one virtual NIC). Here we
    can specify the NICs' MAC addresses in the XML, which helps
    qemu-guest-agent find the network interfaces in the guest.

An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).


 -  When qemu-guest-agent starts up in the guest, it sends a notification
    to libvirt, and libvirt then calls the previously registered
    initialization callbacks. Through those callback functions we can
    create the bonding device according to the XML configuration. Here we
    use the netcf tool, which makes it easy to create the bonding device.

This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already set up. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager)
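For reference, the interface XML that netcf (and virsh iface-define, which
is backed by it) accepts for a bond looks roughly like the following; the
device names and addressing are purely illustrative, and as noted above
such a definition would have to live inside the guest, not on the host:

   <interface type='bond' name='bond0'>
     <start mode='onboot'/>
     <protocol family='ipv4'>
       <dhcp/>
     </protocol>
     <bond mode='active-backup'>
       <miimon freq='100' updelay='10'/>
       <!-- the emulated NIC and the passed-through NIC as seen in the guest -->
       <interface type='ethernet' name='eth0'/>
       <interface type='ethernet' name='eth1'/>
     </bond>
   </interface>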



 -  During migration, unplug the passed-through NIC, then do a normal migration.

Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).
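As an illustration of that step (the device address and file name are
invented): the XML handed to something like virsh detach-device names the
function to unplug, and the caller then waits for libvirt's device-removed
event, or re-checks the live XML, before starting the migration:

   <!-- vf.xml: the passed-through function to remove before migrating -->
   <hostdev mode='subsystem' type='pci' managed='yes'>
     <source>
       <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
     </source>
   </hostdev>

e.g. "virsh detach-device guest vf.xml", then watch for the
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event before issuing "virsh migrate".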


 -  On the destination side, check whether a new NIC needs to be
    hotplugged according to the specified XML. Usually, we use the
    migrate --xml command option to specify the MAC address of the
    destination host's NIC (because the MAC address of the source side's
    passed-through NIC is different), and then hotplug the device
    according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that expensive).



 TODO:
    1.  When a new NIC is hot-added on the destination side after migration
        finishes, the NIC device needs to be re-enslaved to the bonding
        device in the guest; otherwise it stays offline. Maybe we should
        consider having the bonding driver support adding interfaces
        dynamically.

I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.



 This is an example of how this might work, and I want to hear some
 opinions about this scenario.

 Thanks,
 Chen

 Chen Fan (7):
   qemu-agent: add agent init callback when detecting guest setup
   qemu: add guest init event callback to do the initialize work for
 guest
   hostdev: add a 'bond' type element in hostdev element


Putting this into hostdev is the wrong approach, for two reasons: 1)
it doesn't account for the device to be used being in a different
address on the source and destination hosts, 2) the interface element
already has much of the config you need, and an interface type
supporting hostdev passthrough.

It has been possible to do passthrough of an SRIOV VF via interface
type='hostdev' for a long time now and, even better, via an interface