On Thu, Jun 09, 2022 at 07:33:01AM +0000, Het Gala wrote:
>
> As of now, the multi-FD feature supports connection over the default network
> only. This Patchset series is a Qemu side implementation of providing multiple
> interfaces support for multi-FD. This enables us to fully utilize dedicated or
> multiple NICs in case bonding of NICs is not possible.
>
> Introduction
> -------------
> Multi-FD Qemu implementation currently supports connection only on the default
> network. This forbids us from advantages like:
> - Separating VM live migration traffic from the default network.
Perhaps I'm misunderstanding your intent here, but AFAIK it has been
possible to separate VM migration traffic from general host network
traffic essentially forever. If you have two NICs with IP addresses on
different subnets, then the kernel will pick which NIC to use
automatically, based on the IP address of the target matching the
kernel routing table entries. Management apps have long used this
ability in order to control which NIC migration traffic flows over.

> - Fully utilize all NICs’ capacity in cases where creating a LACP bond (Link
>   Aggregation Control Protocol) is not supported.

Can you elaborate on scenarios in which it is impossible to use LACP
bonding at the kernel level?

> Multi-interface with Multi-FD
> -----------------------------
> Multiple-interface support over basic multi-FD has been implemented in the
> patches. Advantages of this implementation are:
> - Able to separate live migration traffic from default network interface by
>   creating multiFD channels on ip addresses of multiple non-default
>   interfaces.
> - Can optimize the number of multi-FD channels on a particular interface
>   depending upon the network bandwidth limit on a particular interface.

Manually assigning individual channels to different NICs is a pretty
inefficient way to optimize traffic. It feels like you could easily get
into a situation where one NIC ends up idle while the other is busy,
especially if the traffic patterns are different. For example, with
post-copy there's an extra channel for OOB async page requests, and it's
far from clear that manually picking NICs per channel upfront is going
to work for that.

The kernel can continually and dynamically balance load on the fly, and
so do much better than any static mapping QEMU tries to apply,
especially if there are multiple distinct QEMUs competing for bandwidth.
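To make the point above concrete: the kernel's choice of NIC is in effect
a longest-prefix match of the destination address against the routing
table. A minimal Python sketch of that lookup (the subnets and NIC names
here are invented for illustration; the kernel's real FIB lookup also
handles metrics, policy routing, etc.):

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing NIC). The subnets and
# interface names are made up; the kernel holds the real table.
routes = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),  # general host traffic
    (ipaddress.ip_network("10.10.0.0/24"), "eth1"),    # dedicated migration subnet
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),       # default route
]

def pick_nic(dest):
    """Longest-prefix match over the routing table, as the kernel does."""
    addr = ipaddress.ip_address(dest)
    matching = [(net, nic) for net, nic in routes if addr in net]
    return max(matching, key=lambda entry: entry[0].prefixlen)[1]

# A migration target addressed on the dedicated subnet goes out of the
# dedicated NIC automatically -- no QEMU-side NIC selection needed.
print(pick_nic("10.10.0.5"))    # eth1
print(pick_nic("192.168.1.7"))  # eth0
```

This is why simply giving the migration target an address on the
dedicated subnet is enough to steer the traffic.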
> Implementation
> --------------
>
> Earlier the 'migrate' qmp command:
> { "execute": "migrate", "arguments": { "uri": "tcp:0:4446" } }
>
> Modified qmp command:
> { "execute": "migrate",
>   "arguments": { "uri": "tcp:0:4446", "multi-fd-uri-list": [ {
>   "source-uri": "tcp::6900", "destination-uri": "tcp:0:4480",
>   "multifd-channels": 4}, { "source-uri": "tcp:10.0.0.0: ",
>   "destination-uri": "tcp:11.0.0.0:7789",
>   "multifd-channels": 5} ] } }
> ------------------------------------------------------------------------------
>
> Earlier the 'migrate-incoming' qmp command:
> { "execute": "migrate-incoming", "arguments": { "uri": "tcp::4446" } }
>
> Modified 'migrate-incoming' qmp command:
> { "execute": "migrate-incoming",
>   "arguments": {"uri": "tcp::6789",
>   "multi-fd-uri-list" : [ {"destination-uri" : "tcp::6900",
>   "multifd-channels": 4}, {"destination-uri" : "tcp:11.0.0.0:7789",
>   "multifd-channels": 5} ] } }
> ------------------------------------------------------------------------------

These examples pretty nicely illustrate my concern with this proposal.
It is making QEMU's configuration of migration massively more
complicated, while duplicating functionality the kernel can provide via
NIC teaming, but without having the ability to balance it on the fly as
the kernel would.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
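For a sense of what a management app would have to construct under the
proposal, here is the quoted modified 'migrate' command rebuilt as a
Python dict (the URIs, ports, and channel counts are copied verbatim
from the example in the cover letter; none of this is an existing QEMU
QMP API). Note that the per-interface channel counts must be fixed up
front, which is exactly the static mapping the reply argues against:

```python
import json

# The proposed 'migrate' command from the patch series, reconstructed as
# a Python dict. Values are copied from the example in the mail; the
# 'multi-fd-uri-list' key is proposed in the series, not upstream QMP.
cmd = {
    "execute": "migrate",
    "arguments": {
        "uri": "tcp:0:4446",
        "multi-fd-uri-list": [
            {"source-uri": "tcp::6900",
             "destination-uri": "tcp:0:4480",
             "multifd-channels": 4},
            {"source-uri": "tcp:10.0.0.0: ",
             "destination-uri": "tcp:11.0.0.0:7789",
             "multifd-channels": 5},
        ],
    },
}

# Total multifd channels the management app must size per interface,
# ahead of time, with no later rebalancing.
total = sum(entry["multifd-channels"]
            for entry in cmd["arguments"]["multi-fd-uri-list"])
print(total)            # 9
print(json.dumps(cmd))  # wire form that would go over the QMP socket
```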