Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote: > Yeah, I can see it now :( I guess that the ship has sailed and we are > stuck with this ugly thing forever... > > Could you at least make some common code that is shared in between > netvsc and virtio_net so this is handled in exacly the same way in both? IMHO netvsc is a vendor specific driver which made a mistake on what behaviour it provides (or tried to align itself with Windows SR-IOV). Let's not make a far, far more commonly deployed and important driver (virtio) bug-compatible with netvsc. To Jiri's initial comments, I feel the same way, in fact I've talked to the NetworkManager guys to get auto-bonding based on MACs handled in user space. I think it may very well get done in next versions of NM, but isn't done yet. Stephen also raised the point that not everybody is using NM. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Tue, Feb 20, 2018 at 12:14 PM, Jiri Pirkowrote: > Tue, Feb 20, 2018 at 06:14:32PM CET, sridhar.samudr...@intel.com wrote: >>On 2/20/2018 8:29 AM, Jiri Pirko wrote: >>> Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: >>> > On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirko wrote: >>> > > Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >>> > > > Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >>> > > > used by hypervisor to indicate that virtio_net interface should act as >>> > > > a backup for another device with the same MAC address. >>> > > > >>> > > > Ppatch 2 is in response to the community request for a 3 netdev >>> > > > solution. However, it creates some issues we'll get into in a moment. >>> > > > It extends virtio_net to use alternate datapath when available and >>> > > > registered. When BACKUP feature is enabled, virtio_net driver creates >>> > > > an additional 'bypass' netdev that acts as a master device and >>> > > > controls >>> > > > 2 slave devices. The original virtio_net netdev is registered as >>> > > > 'backup' netdev and a passthru/vf device with the same MAC gets >>> > > > registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >>> > > > associated with the same 'pci' device. The user accesses the network >>> > > > interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' >>> > > > netdev >>> > > > as default for transmits when it is available with link up and >>> > > > running. >>> > > Sorry, but this is ridiculous. You are apparently re-implemeting part >>> > > of bonding driver as a part of NIC driver. Bond and team drivers >>> > > are mature solutions, well tested, broadly used, with lots of issues >>> > > resolved in the past. What you try to introduce is a weird shortcut >>> > > that already has couple of issues as you mentioned and will certanly >>> > > have many more. Also, I'm pretty sure that in future, someone comes up >>> > > with ideas like multiple VFs, LACP and similar bonding things. >>> > The problem with the bond and team drivers is they are too large and >>> > have too many interfaces available for configuration so as a result >>> > they can really screw this interface up. >>> What? Too large is which sense? Why "too many interfaces" is a problem? >>> Also, team has only one interface to userspace team-generic-netlink. >>> >>> >>> > Essentially this is meant to be a bond that is more-or-less managed by >>> > the host, not the guest. We want the host to be able to configure it >>> How is it managed by the host? In your usecase the guest has 2 netdevs: >>> virtio_net, pci vf. >>> I don't see how host can do any managing of that, other than the >>> obvious. But still, the active/backup decision is done in guest. This is >>> a simple bond/team usecase. As I said, there is something needed to be >>> implemented in userspace in order to handle re-appear of vf netdev. >>> But that should be fairly easy to do in teamd. >> >>The host manages the active/backup decision by >>- assigning the same MAC address to both VF and virtio interfaces >>- setting a BACKUP feature bit on virtio that enables virtio to transparently >>take >> over the VFs datapath. >>- only enable one datapath at anytime so that packets don't get looped back >>- during live migration enable virtio datapth, unplug vf on the source and >>replug >> vf on the destination. >> >>The VM is not expected and doesn't have any control of setting the MAC >>address >>or bringing up/down the links. >> >>This is the model that is currently supported with netvsc driver on Azure. > > Yeah, I can see it now :( I guess that the ship has sailed and we are > stuck with this ugly thing forever... > > Could you at least make some common code that is shared in between > netvsc and virtio_net so this is handled in exacly the same way in both? > > The fact that the netvsc/virtio_net kidnaps a netdev only because it > has the same mac is going to give me some serious nighmares... > I think we need to introduce some more strict checks. In order for that to work we need to settle on a model for these. The issue is that netvsc is using what we refer to as the "2 netdev" model where they don't expose the paravirtual interface as its own netdev. The opinion of Jakub and others has been that we should do a "3 netdev" model in the case of virtio_net since otherwise we will lose functionality such as in-driver XDP and have to deal with an extra set of qdiscs and Tx queue locks on transmit path. Really at this point I am good either way, but we need to probably have Stephen, Jakub, and whoever else had an opinion on the matter sort out the 2 vs 3 argument before we could proceed on that. Most of patch 2 in the set can easily be broken out into a separate file later if we decide to go that route. Thanks. - Alex ___ Virtualization mailing list
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Tue, Feb 20, 2018 at 06:14:32PM CET, sridhar.samudr...@intel.com wrote: >On 2/20/2018 8:29 AM, Jiri Pirko wrote: >> Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: >> > On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirkowrote: >> > > Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >> > > > Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >> > > > used by hypervisor to indicate that virtio_net interface should act as >> > > > a backup for another device with the same MAC address. >> > > > >> > > > Ppatch 2 is in response to the community request for a 3 netdev >> > > > solution. However, it creates some issues we'll get into in a moment. >> > > > It extends virtio_net to use alternate datapath when available and >> > > > registered. When BACKUP feature is enabled, virtio_net driver creates >> > > > an additional 'bypass' netdev that acts as a master device and controls >> > > > 2 slave devices. The original virtio_net netdev is registered as >> > > > 'backup' netdev and a passthru/vf device with the same MAC gets >> > > > registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >> > > > associated with the same 'pci' device. The user accesses the network >> > > > interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' >> > > > netdev >> > > > as default for transmits when it is available with link up and running. >> > > Sorry, but this is ridiculous. You are apparently re-implemeting part >> > > of bonding driver as a part of NIC driver. Bond and team drivers >> > > are mature solutions, well tested, broadly used, with lots of issues >> > > resolved in the past. What you try to introduce is a weird shortcut >> > > that already has couple of issues as you mentioned and will certanly >> > > have many more. Also, I'm pretty sure that in future, someone comes up >> > > with ideas like multiple VFs, LACP and similar bonding things. >> > The problem with the bond and team drivers is they are too large and >> > have too many interfaces available for configuration so as a result >> > they can really screw this interface up. >> What? Too large is which sense? Why "too many interfaces" is a problem? >> Also, team has only one interface to userspace team-generic-netlink. >> >> >> > Essentially this is meant to be a bond that is more-or-less managed by >> > the host, not the guest. We want the host to be able to configure it >> How is it managed by the host? In your usecase the guest has 2 netdevs: >> virtio_net, pci vf. >> I don't see how host can do any managing of that, other than the >> obvious. But still, the active/backup decision is done in guest. This is >> a simple bond/team usecase. As I said, there is something needed to be >> implemented in userspace in order to handle re-appear of vf netdev. >> But that should be fairly easy to do in teamd. > >The host manages the active/backup decision by >- assigning the same MAC address to both VF and virtio interfaces >- setting a BACKUP feature bit on virtio that enables virtio to transparently >take > over the VFs datapath. >- only enable one datapath at anytime so that packets don't get looped back >- during live migration enable virtio datapth, unplug vf on the source and >replug > vf on the destination. > >The VM is not expected and doesn't have any control of setting the MAC >address >or bringing up/down the links. > >This is the model that is currently supported with netvsc driver on Azure. Yeah, I can see it now :( I guess that the ship has sailed and we are stuck with this ugly thing forever... Could you at least make some common code that is shared in between netvsc and virtio_net so this is handled in exacly the same way in both? The fact that the netvsc/virtio_net kidnaps a netdev only because it has the same mac is going to give me some serious nighmares... I think we need to introduce some more strict checks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Tue, Feb 20, 2018 at 06:23:49PM CET, alexander.du...@gmail.com wrote: >On Tue, Feb 20, 2018 at 8:29 AM, Jiri Pirkowrote: >> Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: >>>On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirko wrote: Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >used by hypervisor to indicate that virtio_net interface should act as >a backup for another device with the same MAC address. > >Ppatch 2 is in response to the community request for a 3 netdev >solution. However, it creates some issues we'll get into in a moment. >It extends virtio_net to use alternate datapath when available and >registered. When BACKUP feature is enabled, virtio_net driver creates >an additional 'bypass' netdev that acts as a master device and controls >2 slave devices. The original virtio_net netdev is registered as >'backup' netdev and a passthru/vf device with the same MAC gets >registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >associated with the same 'pci' device. The user accesses the network >interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev >as default for transmits when it is available with link up and running. Sorry, but this is ridiculous. You are apparently re-implemeting part of bonding driver as a part of NIC driver. Bond and team drivers are mature solutions, well tested, broadly used, with lots of issues resolved in the past. What you try to introduce is a weird shortcut that already has couple of issues as you mentioned and will certanly have many more. Also, I'm pretty sure that in future, someone comes up with ideas like multiple VFs, LACP and similar bonding things. >>> >>>The problem with the bond and team drivers is they are too large and >>>have too many interfaces available for configuration so as a result >>>they can really screw this interface up. >> >> What? Too large is which sense? Why "too many interfaces" is a problem? >> Also, team has only one interface to userspace team-generic-netlink. > >Specifically I was working with bond. I had overlooked team for the >most part since it required an additional userspace daemon which >basically broke our requirement of no user-space intervention. Why? That sound artificial. Why the userspace cannot be part of the solution? > >I was trying to focus on just doing an active/backup setup. The >problem is there are debugfs, sysfs, and procfs interfaces exposed >that we don't need and/or want. Adding any sort of interface to >exclude these would just bloat up the bonding driver, and leaving them >in would just be confusing since they would all need to be ignored. In >addition the steps needed to get the name to come out the same as the >original virtio interface would just bloat up bonding. Why to you care about "name"? it's a netdev, isn't it all that matters? The viewpoint of the user inside vm boils down to: 1) I have 2 netdevs 2) One is preferred 3) I setup team on top of them That's should be it. It is the users responsibility to do it this way. > >>> >>>Essentially this is meant to be a bond that is more-or-less managed by >>>the host, not the guest. We want the host to be able to configure it >> >> How is it managed by the host? In your usecase the guest has 2 netdevs: >> virtio_net, pci vf. >> I don't see how host can do any managing of that, other than the >> obvious. But still, the active/backup decision is done in guest. This is >> a simple bond/team usecase. As I said, there is something needed to be >> implemented in userspace in order to handle re-appear of vf netdev. >> But that should be fairly easy to do in teamd. >> >> >>>and have it automatically kick in on the guest. For now we want to >>>avoid adding too much complexity as this is meant to be just the first >> >> That's what I fear, "for now".. > >I used the expression "for now" as I see this being the first stage of >a multi-stage process. That is what I fear... > >Step 1 is to get a basic virtio-bypass driver added to virtio so that >it is at least comparable to netvsc in terms of feature set and >enables basic network live migration. > >Step 2 is adding some sort of dirty page tracking, preferably via >something like a paravirtual iommu interface. Once we have that we can >defer the eviction of the VF until the very last moment of the live >migration. For now I need to work on testing a modification to allow >mapping the entire guest as being pass-through for DMA to the device, >and requiring dynamic for any DMA that is bidirectional or from the >device. That is purely on the host side. Does not really matter if your solution or standard bond/team is in use, right? > >Step 3 will be to start looking at advanced configuration. That is >where we drop the
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Tue, Feb 20, 2018 at 8:29 AM, Jiri Pirkowrote: > Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: >>On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirko wrote: >>> Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be used by hypervisor to indicate that virtio_net interface should act as a backup for another device with the same MAC address. Ppatch 2 is in response to the community request for a 3 netdev solution. However, it creates some issues we'll get into in a moment. It extends virtio_net to use alternate datapath when available and registered. When BACKUP feature is enabled, virtio_net driver creates an additional 'bypass' netdev that acts as a master device and controls 2 slave devices. The original virtio_net netdev is registered as 'backup' netdev and a passthru/vf device with the same MAC gets registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are associated with the same 'pci' device. The user accesses the network interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev as default for transmits when it is available with link up and running. >>> >>> Sorry, but this is ridiculous. You are apparently re-implemeting part >>> of bonding driver as a part of NIC driver. Bond and team drivers >>> are mature solutions, well tested, broadly used, with lots of issues >>> resolved in the past. What you try to introduce is a weird shortcut >>> that already has couple of issues as you mentioned and will certanly >>> have many more. Also, I'm pretty sure that in future, someone comes up >>> with ideas like multiple VFs, LACP and similar bonding things. >> >>The problem with the bond and team drivers is they are too large and >>have too many interfaces available for configuration so as a result >>they can really screw this interface up. > > What? Too large is which sense? Why "too many interfaces" is a problem? > Also, team has only one interface to userspace team-generic-netlink. Specifically I was working with bond. I had overlooked team for the most part since it required an additional userspace daemon which basically broke our requirement of no user-space intervention. I was trying to focus on just doing an active/backup setup. The problem is there are debugfs, sysfs, and procfs interfaces exposed that we don't need and/or want. Adding any sort of interface to exclude these would just bloat up the bonding driver, and leaving them in would just be confusing since they would all need to be ignored. In addition the steps needed to get the name to come out the same as the original virtio interface would just bloat up bonding. >> >>Essentially this is meant to be a bond that is more-or-less managed by >>the host, not the guest. We want the host to be able to configure it > > How is it managed by the host? In your usecase the guest has 2 netdevs: > virtio_net, pci vf. > I don't see how host can do any managing of that, other than the > obvious. But still, the active/backup decision is done in guest. This is > a simple bond/team usecase. As I said, there is something needed to be > implemented in userspace in order to handle re-appear of vf netdev. > But that should be fairly easy to do in teamd. > > >>and have it automatically kick in on the guest. For now we want to >>avoid adding too much complexity as this is meant to be just the first > > That's what I fear, "for now".. I used the expression "for now" as I see this being the first stage of a multi-stage process. Step 1 is to get a basic virtio-bypass driver added to virtio so that it is at least comparable to netvsc in terms of feature set and enables basic network live migration. Step 2 is adding some sort of dirty page tracking, preferably via something like a paravirtual iommu interface. Once we have that we can defer the eviction of the VF until the very last moment of the live migration. For now I need to work on testing a modification to allow mapping the entire guest as being pass-through for DMA to the device, and requiring dynamic for any DMA that is bidirectional or from the device. Step 3 will be to start looking at advanced configuration. That is where we drop the implementation in step 1 and instead look at spawning something that looks more like the team type interface, however instead of working with a user-space daemon we would likely need to work with some sort of mailbox or message queue coming up from the hypervisor. Then we can start looking at doing things like passing up blocks of eBPF code to handle Tx port selection or whatever we need. > >>step. Trying to go in and implement the whole solution right from the >>start based on existing drivers is going to be a massive time sink and >>will likely never get completed due to the fact that there is always >>going to be some other thing that will
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On 2/20/2018 8:29 AM, Jiri Pirko wrote: Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirkowrote: Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be used by hypervisor to indicate that virtio_net interface should act as a backup for another device with the same MAC address. Ppatch 2 is in response to the community request for a 3 netdev solution. However, it creates some issues we'll get into in a moment. It extends virtio_net to use alternate datapath when available and registered. When BACKUP feature is enabled, virtio_net driver creates an additional 'bypass' netdev that acts as a master device and controls 2 slave devices. The original virtio_net netdev is registered as 'backup' netdev and a passthru/vf device with the same MAC gets registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are associated with the same 'pci' device. The user accesses the network interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev as default for transmits when it is available with link up and running. Sorry, but this is ridiculous. You are apparently re-implemeting part of bonding driver as a part of NIC driver. Bond and team drivers are mature solutions, well tested, broadly used, with lots of issues resolved in the past. What you try to introduce is a weird shortcut that already has couple of issues as you mentioned and will certanly have many more. Also, I'm pretty sure that in future, someone comes up with ideas like multiple VFs, LACP and similar bonding things. The problem with the bond and team drivers is they are too large and have too many interfaces available for configuration so as a result they can really screw this interface up. What? Too large is which sense? Why "too many interfaces" is a problem? Also, team has only one interface to userspace team-generic-netlink. Essentially this is meant to be a bond that is more-or-less managed by the host, not the guest. We want the host to be able to configure it How is it managed by the host? In your usecase the guest has 2 netdevs: virtio_net, pci vf. I don't see how host can do any managing of that, other than the obvious. But still, the active/backup decision is done in guest. This is a simple bond/team usecase. As I said, there is something needed to be implemented in userspace in order to handle re-appear of vf netdev. But that should be fairly easy to do in teamd. The host manages the active/backup decision by - assigning the same MAC address to both VF and virtio interfaces - setting a BACKUP feature bit on virtio that enables virtio to transparently take over the VFs datapath. - only enable one datapath at anytime so that packets don't get looped back - during live migration enable virtio datapth, unplug vf on the source and replug vf on the destination. The VM is not expected and doesn't have any control of setting the MAC address or bringing up/down the links. This is the model that is currently supported with netvsc driver on Azure. and have it automatically kick in on the guest. For now we want to avoid adding too much complexity as this is meant to be just the first That's what I fear, "for now".. step. Trying to go in and implement the whole solution right from the start based on existing drivers is going to be a massive time sink and will likely never get completed due to the fact that there is always going to be some other thing that will interfere. "implement the whole solution right from the start based on existing drivers" - what solution are you talking about? I don't understand this para. My personal hope is that we can look at doing a virtio-bond sort of device that will handle all this as well as providing a communication channel, but that is much further down the road. For now we only have a single bit so the goal for now is trying to keep this as simple as possible. Oh. So there is really intention to do re-implementation of bonding in virtio. That is plain-wrong in my opinion. Could you just use bond/team, please, and don't reinvent the wheel with this abomination? What is the reason for this abomination? According to: https://marc.info/?l=linux-virtualization=151189725224231=2 The reason is quite weak. User in the vm sees 2 (or more) netdevices, he puts them in bond/team and that's it. This works now! If the vm lacks some userspace features, let's fix it there! For example the MAC changes is something that could be easily handled in teamd userspace deamon. I think you might have missed the point of this. This is meant to be a simple interface so the guest should not be able to change the MAC address, and it shouldn't require any userspace daemon to setup or tear down. Ideally with this solution the virtio bypass will come up and be assigned the name of the original virtio, and the
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.du...@gmail.com wrote: >On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirkowrote: >> Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >>>Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >>>used by hypervisor to indicate that virtio_net interface should act as >>>a backup for another device with the same MAC address. >>> >>>Ppatch 2 is in response to the community request for a 3 netdev >>>solution. However, it creates some issues we'll get into in a moment. >>>It extends virtio_net to use alternate datapath when available and >>>registered. When BACKUP feature is enabled, virtio_net driver creates >>>an additional 'bypass' netdev that acts as a master device and controls >>>2 slave devices. The original virtio_net netdev is registered as >>>'backup' netdev and a passthru/vf device with the same MAC gets >>>registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >>>associated with the same 'pci' device. The user accesses the network >>>interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev >>>as default for transmits when it is available with link up and running. >> >> Sorry, but this is ridiculous. You are apparently re-implemeting part >> of bonding driver as a part of NIC driver. Bond and team drivers >> are mature solutions, well tested, broadly used, with lots of issues >> resolved in the past. What you try to introduce is a weird shortcut >> that already has couple of issues as you mentioned and will certanly >> have many more. Also, I'm pretty sure that in future, someone comes up >> with ideas like multiple VFs, LACP and similar bonding things. > >The problem with the bond and team drivers is they are too large and >have too many interfaces available for configuration so as a result >they can really screw this interface up. What? Too large is which sense? Why "too many interfaces" is a problem? Also, team has only one interface to userspace team-generic-netlink. > >Essentially this is meant to be a bond that is more-or-less managed by >the host, not the guest. We want the host to be able to configure it How is it managed by the host? In your usecase the guest has 2 netdevs: virtio_net, pci vf. I don't see how host can do any managing of that, other than the obvious. But still, the active/backup decision is done in guest. This is a simple bond/team usecase. As I said, there is something needed to be implemented in userspace in order to handle re-appear of vf netdev. But that should be fairly easy to do in teamd. >and have it automatically kick in on the guest. For now we want to >avoid adding too much complexity as this is meant to be just the first That's what I fear, "for now".. >step. Trying to go in and implement the whole solution right from the >start based on existing drivers is going to be a massive time sink and >will likely never get completed due to the fact that there is always >going to be some other thing that will interfere. "implement the whole solution right from the start based on existing drivers" - what solution are you talking about? I don't understand this para. > >My personal hope is that we can look at doing a virtio-bond sort of >device that will handle all this as well as providing a communication >channel, but that is much further down the road. For now we only have >a single bit so the goal for now is trying to keep this as simple as >possible. Oh. So there is really intention to do re-implementation of bonding in virtio. That is plain-wrong in my opinion. Could you just use bond/team, please, and don't reinvent the wheel with this abomination? > >> What is the reason for this abomination? According to: >> https://marc.info/?l=linux-virtualization=151189725224231=2 >> The reason is quite weak. >> User in the vm sees 2 (or more) netdevices, he puts them in bond/team >> and that's it. This works now! If the vm lacks some userspace features, >> let's fix it there! For example the MAC changes is something that could >> be easily handled in teamd userspace deamon. > >I think you might have missed the point of this. This is meant to be a >simple interface so the guest should not be able to change the MAC >address, and it shouldn't require any userspace daemon to setup or >tear down. Ideally with this solution the virtio bypass will come up >and be assigned the name of the original virtio, and the "backup" >interface will come up and be assigned the name of the original virtio >with an additional "nbackup" tacked on via the phys_port_name, and >then whenever a VF is added it will automatically be enslaved by the >bypass interface, and it will be removed when the VF is hotplugged >out. > >In my mind the difference between this and bond or team is where the >configuration interface lies. In the case of bond it is in the kernel. >If my understanding is correct team is mostly in user space. With this >the configuration interface
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On 2/18/2018 10:11 PM, Jakub Kicinski wrote: On Sat, 17 Feb 2018 09:12:01 -0800, Alexander Duyck wrote: We noticed a couple of issues with this approach during testing. - As both 'bypass' and 'backup' netdevs are associated with the same virtio pci device, udev tries to rename both of them with the same name and the 2nd rename will fail. This would be OK as long as the first netdev to be renamed is the 'bypass' netdev, but the order in which udev gets to rename the 2 netdevs is not reliable. Out of curiosity - why do you link the master netdev to the virtio struct device? The basic idea of all this is that we wanted this to work with an existing VM image that was using virtio. As such we were trying to make it so that the bypass interface takes the place of the original virtio and get udev to rename the bypass to what the original virtio_net was. That makes sense. Is it udev/naming that you're most concerned about here? I.e. what's the user space app that expects the netdev to be linked? This is just out of curiosity, the linking of netdevs to devices is a bit of a PITA in the switchdev eswitch mode world, with libvirt expecting only certain devices to be there.. Right now we're not linking VF reprs, which breaks naming. I wanted to revisit that. For live migration usecase, userspace is only aware of one virtio_net interface and it doesn't expect it to be linked with any lower dev. So it should be fine even if the lower netdev is not present. Only the master netdev should be assigned the same name so that userspace configuration scripts in the VM don't need to change. FWIW two solutions that immediately come to mind is to export "backup" as phys_port_name of the backup virtio link and/or assign a name to the master like you are doing already. I think team uses team%d and bond uses bond%d, soft naming of master devices seems quite natural in this case. I figured I had overlooked something like that.. Thanks for pointing this out. Okay so I think the phys_port_name approach might resolve the original issue. If I am reading things correctly what we end up with is the master showing up as "ens1" for example and the backup showing up as "ens1nbackup". Am I understanding that right? Yes, provided systemd is new enough. Yes. I did a quick test to confirm that adding ndo_phys_port_name() to virtio_net ndo_ops fixes the udev naming issue with 2 virtio netdevs. This is on fedora27. The problem with the team/bond%d approach is that it creates a new netdevice and so it would require guest configuration changes. IMHO phys_port_name == "backup" if BACKUP bit is set on slave virtio link is quite neat. I agree. For non-"backup" virio_net devices would it be okay for us to just return -EOPNOTSUPP? I assume it would be and that way the legacy behavior could be maintained although the function still exists. That's my understanding too. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirkowrote: > Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >>Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >>used by hypervisor to indicate that virtio_net interface should act as >>a backup for another device with the same MAC address. >> >>Ppatch 2 is in response to the community request for a 3 netdev >>solution. However, it creates some issues we'll get into in a moment. >>It extends virtio_net to use alternate datapath when available and >>registered. When BACKUP feature is enabled, virtio_net driver creates >>an additional 'bypass' netdev that acts as a master device and controls >>2 slave devices. The original virtio_net netdev is registered as >>'backup' netdev and a passthru/vf device with the same MAC gets >>registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >>associated with the same 'pci' device. The user accesses the network >>interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev >>as default for transmits when it is available with link up and running. > > Sorry, but this is ridiculous. You are apparently re-implemeting part > of bonding driver as a part of NIC driver. Bond and team drivers > are mature solutions, well tested, broadly used, with lots of issues > resolved in the past. What you try to introduce is a weird shortcut > that already has couple of issues as you mentioned and will certanly > have many more. Also, I'm pretty sure that in future, someone comes up > with ideas like multiple VFs, LACP and similar bonding things. The problem with the bond and team drivers is they are too large and have too many interfaces available for configuration so as a result they can really screw this interface up. Essentially this is meant to be a bond that is more-or-less managed by the host, not the guest. We want the host to be able to configure it and have it automatically kick in on the guest. For now we want to avoid adding too much complexity as this is meant to be just the first step. Trying to go in and implement the whole solution right from the start based on existing drivers is going to be a massive time sink and will likely never get completed due to the fact that there is always going to be some other thing that will interfere. My personal hope is that we can look at doing a virtio-bond sort of device that will handle all this as well as providing a communication channel, but that is much further down the road. For now we only have a single bit so the goal for now is trying to keep this as simple as possible. > What is the reason for this abomination? According to: > https://marc.info/?l=linux-virtualization=151189725224231=2 > The reason is quite weak. > User in the vm sees 2 (or more) netdevices, he puts them in bond/team > and that's it. This works now! If the vm lacks some userspace features, > let's fix it there! For example the MAC changes is something that could > be easily handled in teamd userspace deamon. I think you might have missed the point of this. This is meant to be a simple interface so the guest should not be able to change the MAC address, and it shouldn't require any userspace daemon to setup or tear down. Ideally with this solution the virtio bypass will come up and be assigned the name of the original virtio, and the "backup" interface will come up and be assigned the name of the original virtio with an additional "nbackup" tacked on via the phys_port_name, and then whenever a VF is added it will automatically be enslaved by the bypass interface, and it will be removed when the VF is hotplugged out. In my mind the difference between this and bond or team is where the configuration interface lies. In the case of bond it is in the kernel. If my understanding is correct team is mostly in user space. With this the configuration interface is really down in the hypervisor and requests are communicated up to the guest. I would prefer not to make virtio_net dependent on the bonding or team drivers, or worse yet a userspace daemon in the guest. For now I would argue we should keep this as simple as possible just to support basic live migration. There has already been discussions of refactoring this after it is in so that we can start to combine the functionality here with what is there in bonding/team, but the differences in configuration interface and the size of the code bases will make it challenging to outright merge this into something like that. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[4.4-stable 12/22] virtio_balloon: prevent uninitialized variable use
commit f0bb2d50dfcc519f06f901aac88502be6ff1df2c upstream. The latest gcc-7.0.1 snapshot reports a new warning: virtio/virtio_balloon.c: In function 'update_balloon_stats': virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized] virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized] virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized] virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized] This seems absolutely right, so we should add an extra check to prevent copying uninitialized stack data into the statistics. >From all I can tell, this has been broken since the statistics code was originally added in 2.6.34. Fixes: 9564e138b1f6 ("virtio: Add memory statistics reporting to the balloon driver (V4)") Signed-off-by: Arnd BergmannSigned-off-by: Ladi Prosek Signed-off-by: Michael S. Tsirkin [arnd: backported to 4.4] Signed-off-by: Arnd Bergmann --- drivers/virtio/virtio_balloon.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 01d15dca940e..26d0dff069f0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -239,12 +239,15 @@ static void update_balloon_stats(struct virtio_balloon *vb) all_vm_events(events); si_meminfo(); + +#ifdef CONFIG_VM_EVENT_COUNTERS update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_IN, pages_to_bytes(events[PSWPIN])); update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_OUT, pages_to_bytes(events[PSWPOUT])); update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]); update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]); +#endif update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE, pages_to_bytes(i.freeram)); update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMTOT, -- 2.9.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] iommu: Add virtio-iommu driver
On 19/02/18 12:23, Tomasz Nowicki wrote: [...] >> +static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent, >> + struct list_head *sent) >> +{ >> + >> +unsigned int len; >> +int nr_received = 0; >> +struct viommu_request *req, *pending; >> + >> +pending = list_first_entry_or_null(sent, struct viommu_request, list); >> +if (WARN_ON(!pending)) >> +return 0; >> + >> +while ((req = virtqueue_get_buf(viommu->vq, )) != NULL) { >> +if (req != pending) { >> +dev_warn(viommu->dev, "discarding stale request\n"); >> +continue; >> +} >> + >> +pending->written = len; >> + >> +if (++nr_received == nr_sent) { >> +WARN_ON(!list_is_last(>list, sent)); >> +break; >> +} else if (WARN_ON(list_is_last(>list, sent))) { >> +break; >> +} >> + >> +pending = list_next_entry(pending, list); > > We should remove current element from the pending list. There is no > guarantee we get response for each while loop so when we get back for > more the _viommu_send_reqs_sync() caller will pass pointer to the out of > date head next time. Right, I'll fix this Thanks, Jean ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudr...@intel.com wrote: >Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be >used by hypervisor to indicate that virtio_net interface should act as >a backup for another device with the same MAC address. > >Ppatch 2 is in response to the community request for a 3 netdev >solution. However, it creates some issues we'll get into in a moment. >It extends virtio_net to use alternate datapath when available and >registered. When BACKUP feature is enabled, virtio_net driver creates >an additional 'bypass' netdev that acts as a master device and controls >2 slave devices. The original virtio_net netdev is registered as >'backup' netdev and a passthru/vf device with the same MAC gets >registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are >associated with the same 'pci' device. The user accesses the network >interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev >as default for transmits when it is available with link up and running. Sorry, but this is ridiculous. You are apparently re-implemeting part of bonding driver as a part of NIC driver. Bond and team drivers are mature solutions, well tested, broadly used, with lots of issues resolved in the past. What you try to introduce is a weird shortcut that already has couple of issues as you mentioned and will certanly have many more. Also, I'm pretty sure that in future, someone comes up with ideas like multiple VFs, LACP and similar bonding things. What is the reason for this abomination? According to: https://marc.info/?l=linux-virtualization=151189725224231=2 The reason is quite weak. User in the vm sees 2 (or more) netdevices, he puts them in bond/team and that's it. This works now! If the vm lacks some userspace features, let's fix it there! For example the MAC changes is something that could be easily handled in teamd userspace deamon. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization