On Thu, Mar 21, 2019 at 12:19:22AM +0200, Liran Alon wrote: > > > > On 21 Mar 2019, at 0:10, Michael S. Tsirkin <[email protected]> wrote: > > > > On Wed, Mar 20, 2019 at 11:43:41PM +0200, Liran Alon wrote: > >> > >> > >>> On 20 Mar 2019, at 16:09, Michael S. Tsirkin <[email protected]> wrote: > >>> > >>> On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote: > >>>> > >>>> > >>>>> On 20 Mar 2019, at 12:25, Michael S. Tsirkin <[email protected]> wrote: > >>>>> > >>>>> On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran Alon wrote: > >>>>>> > >>>>>> > >>>>>>> On 19 Mar 2019, at 23:19, Michael S. Tsirkin <[email protected]> wrote: > >>>>>>> > >>>>>>> On Tue, Mar 19, 2019 at 08:46:47AM -0700, Stephen Hemminger wrote: > >>>>>>>> On Tue, 19 Mar 2019 14:38:06 +0200 > >>>>>>>> Liran Alon <[email protected]> wrote: > >>>>>>>> > >>>>>>>>> b.3) cloud-init: If configured to perform network-configuration, it > >>>>>>>>> attempts to configure all available netdevs. It should avoid > >>>>>>>>> however doing so on net-failover slaves. > >>>>>>>>> (Microsoft has handled this by adding a mechanism in cloud-init to > >>>>>>>>> blacklist a netdev from being configured in case it is owned by a > >>>>>>>>> specific PCI driver. Specifically, they blacklist Mellanox VF > >>>>>>>>> driver. However, this technique doesn’t work for the net-failover > >>>>>>>>> mechanism because both the net-failover netdev and the virtio-net > >>>>>>>>> netdev are owned by the virtio-net PCI driver). > >>>>>>>> > >>>>>>>> Cloud-init should really just ignore all devices that have a master > >>>>>>>> device. > >>>>>>>> That would have been more general, and safer for other use cases. > >>>>>>> > >>>>>>> Given lots of userspace doesn't do this, I wonder whether it would be > >>>>>>> safer to just somehow pretend to userspace that the slave links are > >>>>>>> down? And add a special attribute for the actual link state. > >>>>>> > >>>>>> I think this may be problematic as it would also break legit use case > >>>>>> of userspace attempt to set various config on VF slave. > >>>>>> In general, lying to userspace usually leads to problems. > >>>>> > >>>>> I hear you on this. So how about instead of lying, > >>>>> we basically just fail some accesses to slaves > >>>>> unless a flag is set e.g. in ethtool. > >>>>> > >>>>> Some userspace will need to change to set it but in a minor way. > >>>>> Arguably/hopefully failure to set config would generally be a safer > >>>>> failure. > >>>> > >>>> Once userspace will set this new flag by ethtool, all operations done by > >>>> other userspace components will still work. > >>> > >>> Sorry about being unclear, the idea would be to require the flag on each > >>> ethtool operation. > >> > >> Oh. I have indeed misunderstood your previous email then. :) > >> Thanks for clarifying. > >> > >>> > >>>> E.g. Running dhclient without parameters, after this flag was set, will > >>>> still attempt to perform DHCP on it and will now succeed. > >>> > >>> I think sending/receiving should probably just fail unconditionally. > >> > >> You mean that you wish that somehow kernel will prevent Tx on net-failover > >> slave netdev > >> unless skb is marked with some flag to indicate it has been sent via the > >> net-failover master? > > > > We can maybe avoid binding a protocol socket to the device? > > That is indeed another possibility that would work to avoid the DHCP issues. > And will still allow checking connectivity. So it is better. > However, I still think it provides an non-intuitive customer experience. > In addition, I also want to take into account that most customers are > expected a 1:1 mapping between a vNIC and a netdev. > i.e. A cloud instance should show 1-netdev if it has one vNIC attached to it > defined. > Customers usually don’t care how they get accelerated networking. They just > care they do. > > > > >> This indeed resolves the group of userspace issues around performing DHCP > >> on net-failover slaves directly (By dracut/initramfs, dhclient and etc.). > >> > >> However, I see a couple of down-sides to it: > >> 1) It doesn’t resolve all userspace issues listed in this email thread. > >> For example, cloud-init will still attempt to perform network config on > >> net-failover slaves. > >> It also doesn’t help with regard to Ubuntu’s netplan issue that creates > >> udev rules that match only by MAC. > > > > > > How about we fail to retrieve mac from the slave? > > That would work but I think it is cleaner to just not bind PV and VF based on > having the same MAC.
There's a reference to that under "Non-MAC based pairing". I'll look into making it more explicit. > > > >> 2) It brings non-intuitive customer experience. For example, a customer > >> may attempt to analyse connectivity issue by checking the connectivity > >> on a net-failover slave (e.g. the VF) but will see no connectivity when > >> in-fact checking the connectivity on the net-failover master netdev shows > >> correct connectivity. > >> > >> The set of changes I vision to fix our issues are: > >> 1) Hide net-failover slaves in a different netns created and managed by > >> the kernel. But that user can enter to it and manage the netdevs there if > >> wishes to do so explicitly. > >> (E.g. Configure the net-failover VF slave in some special way). > >> 2) Match the virtio-net and the VF based on a PV attribute instead of MAC. > >> (Similar to as done in NetVSC). E.g. Provide a virtio-net interface to get > >> PCI slot where the matching VF will be hot-plugged by hypervisor. > >> 3) Have an explicit virtio-net control message to command hypervisor to > >> switch data-path from virtio-net to VF and vice-versa. Instead of relying > >> on intercepting the PCI master enable-bit > >> as an indicator on when VF is about to be set up. (Similar to as done in > >> NetVSC). > >> > >> Is there any clear issue we see regarding the above suggestion? > >> > >> -Liran > > > > The issue would be this: how do we avoid conflicting with namespaces > > created by users? > > This is kinda controversial, but maybe separate netns names into 2 groups: > hidden and normal. > To reference a hidden netns, you need to do it explicitly. > Hidden and normal netns names can collide as they will be maintained in > different namespaces (Yes I’m overloading the term namespace here…). Maybe it's an unnamed namespace. Hidden until userspace gives it a name? > Does this seems reasonable? > > -Liran Reasonable I'd say yes, easy to implement probably no. But maybe I missed a trick or two. > > > >>> > >>>> Therefore, this proposal just effectively delays when the net-failover > >>>> slave can be operated on by userspace. > >>>> But what we actually want is to never allow a net-failover slave to be > >>>> operated by userspace unless it is explicitly stated > >>>> by userspace that it wishes to perform a set of actions on the > >>>> net-failover slave. > >>>> > >>>> Something that was achieved if, for example, the net-failover slaves > >>>> were in a different netns than default netns. > >>>> This also aligns with expected customer experience that most customers > >>>> just want to see a 1:1 mapping between a vNIC and a visible netdev. > >>>> But of course maybe there are other ideas that can achieve similar > >>>> behaviour. > >>>> > >>>> -Liran > >>>> > >>>>> > >>>>> Which things to fail? Probably sending/receiving packets? Getting MAC? > >>>>> More? > >>>>> > >>>>>> If we reach > >>>>>> to a scenario where we try to avoid userspace issues generically and > >>>>>> not on a userspace component basis, I believe the right path should be > >>>>>> to hide the net-failover slaves such that explicit action is required > >>>>>> to actually manipulate them (As described in blog-post). E.g. > >>>>>> Automatically move net-failover slaves by kernel to a different netns. > >>>>>> > >>>>>> -Liran > >>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> MST _______________________________________________ Virtualization mailing list [email protected] https://lists.linuxfoundation.org/mailman/listinfo/virtualization
