Re: [summary] virtio network device failover writeup

Michael S. Tsirkin Wed, 20 Mar 2019 07:13:08 -0700

On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote:
> 
> 
> > On 20 Mar 2019, at 12:25, Michael S. Tsirkin <[email protected]> wrote:
> > 
> > On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 19 Mar 2019, at 23:19, Michael S. Tsirkin <[email protected]> wrote:
> >>> 
> >>> On Tue, Mar 19, 2019 at 08:46:47AM -0700, Stephen Hemminger wrote:
> >>>> On Tue, 19 Mar 2019 14:38:06 +0200
> >>>> Liran Alon <[email protected]> wrote:
> >>>> 
> >>>>> b.3) cloud-init: If configured to perform network-configuration, it 
> >>>>> attempts to configure all available netdevs. It should avoid however 
> >>>>> doing so on net-failover slaves.
> >>>>> (Microsoft has handled this by adding a mechanism in cloud-init to 
> >>>>> blacklist a netdev from being configured in case it is owned by a 
> >>>>> specific PCI driver. Specifically, they blacklist Mellanox VF driver. 
> >>>>> However, this technique doesn’t work for the net-failover mechanism 
> >>>>> because both the net-failover netdev and the virtio-net netdev are 
> >>>>> owned by the virtio-net PCI driver).
> >>>> 
> >>>> Cloud-init should really just ignore all devices that have a master 
> >>>> device.
> >>>> That would have been more general, and safer for other use cases.
> >>> 
> >>> Given lots of userspace doesn't do this, I wonder whether it would be
> >>> safer to just somehow pretend to userspace that the slave links are
> >>> down? And add a special attribute for the actual link state.
> >> 
> >> I think this may be problematic, as it would also break the legitimate
> >> use case of userspace attempting to set various config on a VF slave.
> >> In general, lying to userspace usually leads to problems.
> > 
> > I hear you on this. So how about instead of lying,
> > we basically just fail some accesses to slaves
> > unless a flag is set e.g. in ethtool.
> > 
> > Some userspace will need to change to set it but in a minor way.
> > Arguably/hopefully failure to set config would generally be a safer
> > failure.
> 
> Once userspace sets this new flag via ethtool, all operations done by 
> other userspace components will still work.


Sorry for being unclear: the idea would be to require the flag on each
ethtool operation, not to set it once.
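
To make the difference concrete, here is a rough userspace model of the
semantics I have in mind. It is only a sketch: NetDev, set_config and
allow_slave are made-up names for illustration, not an existing kernel or
ethtool interface.

```python
class SlaveAccessError(PermissionError):
    """Raised when a failover slave is configured without the per-call flag."""


class NetDev:
    """Toy model of a netdev; is_failover_slave marks a net-failover slave."""

    def __init__(self, name, is_failover_slave=False):
        self.name = name
        self.is_failover_slave = is_failover_slave
        self.settings = {}

    def set_config(self, key, value, allow_slave=False):
        # The flag is checked on every call and never stored on the device,
        # so one tool passing it does not open the slave up for other tools.
        if self.is_failover_slave and not allow_slave:
            raise SlaveAccessError(
                f"{self.name}: failover slave, explicit flag required"
            )
        self.settings[key] = value
```

With this shape, a tool that passes allow_slave=True on its own call
succeeds, while an unmodified tool calling set_config() afterwards still
fails, because nothing sticky was set on the device.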

> E.g. Running dhclient without parameters, after this flag was set, will still 
> attempt to perform DHCP on it and will now succeed.

I think sending/receiving should probably just fail unconditionally.

> Therefore, this proposal just effectively delays when the net-failover slave 
> can be operated on by userspace.
> But what we actually want is to never allow a net-failover slave to be 
> operated by userspace unless it is explicitly stated
> by userspace that it wishes to perform a set of actions on the net-failover 
> slave.
> 
> Something that would be achieved if, for example, the net-failover slaves 
> were in a different netns than the default netns.
> This also aligns with expected customer experience that most customers just 
> want to see a 1:1 mapping between a vNIC and a visible netdev.
> But of course maybe there are other ideas that can achieve similar behaviour.
> 
> -Liran
> 
> > 
> > Which things to fail? Probably sending/receiving packets?  Getting MAC?
> > More?
> > 
> >> If we reach
> >> a scenario where we try to avoid userspace issues generically and
> >> not on a per-component basis, I believe the right path would be
> >> to hide the net-failover slaves such that explicit action is required
> >> to actually manipulate them (as described in the blog post). E.g.
> >> have the kernel automatically move net-failover slaves to a different netns.
> >> 
> >> -Liran
> >> 
> >>> 
> >>> -- 
> >>> MST
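
For reference, the "ignore all devices that have a master device" check
suggested earlier in the thread can be approximated from userspace by
looking for the master symlink that sysfs exposes under an enslaved
device. This is a minimal sketch, not what cloud-init actually does; the
function names and the sysfs_root parameter are assumptions made for
illustration.

```python
import os


def has_master(ifname, sysfs_root="/sys/class/net"):
    """Return True if ifname is enslaved to an upper (master) device."""
    # While a netdev is enslaved (bond, bridge, team, net_failover), its
    # sysfs directory contains a "master" symlink to the upper device.
    return os.path.lexists(os.path.join(sysfs_root, ifname, "master"))


def configurable_netdevs(sysfs_root="/sys/class/net"):
    """List netdevs without a master, i.e. candidates for configuration."""
    return sorted(
        name
        for name in os.listdir(sysfs_root)
        # Skip non-device entries such as the bonding_masters control file.
        if os.path.isdir(os.path.join(sysfs_root, name))
        and not has_master(name, sysfs_root)
    )
```

Note that this skips every enslaved device, including bridge ports and
bond members, which is the more general behaviour that suggestion asks
for. The net-failover master itself has no master and stays in the list.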
_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
