Re: [summary] virtio network device failover writeup

Liran Alon Fri, 19 Apr 2019 16:45:18 -0700


> On 20 Mar 2019, at 16:09, Michael S. Tsirkin <[email protected]> wrote:
> 
> On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote:
>> 
>> 
>>> On 20 Mar 2019, at 12:25, Michael S. Tsirkin <[email protected]> wrote:
>>> 
>>> On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran Alon wrote:
>>>> 
>>>> 
>>>>> On 19 Mar 2019, at 23:19, Michael S. Tsirkin <[email protected]> wrote:
>>>>> 
>>>>> On Tue, Mar 19, 2019 at 08:46:47AM -0700, Stephen Hemminger wrote:
>>>>>> On Tue, 19 Mar 2019 14:38:06 +0200
>>>>>> Liran Alon <[email protected]> wrote:
>>>>>> 
>>>>>>> b.3) cloud-init: If configured to perform network-configuration, it 
>>>>>>> attempts to configure all available netdevs. It should avoid however 
>>>>>>> doing so on net-failover slaves.
>>>>>>> (Microsoft has handled this by adding a mechanism in cloud-init to 
>>>>>>> blacklist a netdev from being configured in case it is owned by a 
>>>>>>> specific PCI driver. Specifically, they blacklist Mellanox VF driver. 
>>>>>>> However, this technique doesn’t work for the net-failover mechanism 
>>>>>>> because both the net-failover netdev and the virtio-net netdev are 
>>>>>>> owned by the virtio-net PCI driver).
>>>>>> 
>>>>>> Cloud-init should really just ignore all devices that have a master 
>>>>>> device.
>>>>>> That would have been more general, and safer for other use cases.
>>>>> 
>>>>> Given lots of userspace doesn't do this, I wonder whether it would be
>>>>> safer to just somehow pretend to userspace that the slave links are
>>>>> down? And add a special attribute for the actual link state.
>>>> 
>>>> I think this may be problematic as it would also break legit use case
>>>> of userspace attempt to set various config on VF slave.
>>>> In general, lying to userspace usually leads to problems.
>>> 
>>> I hear you on this. So how about instead of lying,
>>> we basically just fail some accesses to slaves
>>> unless a flag is set e.g. in ethtool.
>>> 
>>> Some userspace will need to change to set it but in a minor way.
>>> Arguably/hopefully failure to set config would generally be a safer
>>> failure.
>> 
>> Once userspace will set this new flag by ethtool, all operations done by 
>> other userspace components will still work.
> 
> Sorry about being unclear, the idea would be to require the flag on each 
> ethtool operation.


Oh. I have indeed misunderstood your previous email then. :)
Thanks for clarifying.

> 
>> E.g. Running dhclient without parameters, after this flag was set, will 
>> still attempt to perform DHCP on it and will now succeed.
> 
> I think sending/receiving should probably just fail unconditionally.

Do you mean that the kernel would somehow prevent Tx on a net-failover slave 
netdev unless the skb is marked with a flag indicating that it was sent via 
the net-failover master?

This would indeed resolve the group of userspace issues around performing DHCP 
directly on net-failover slaves (by dracut/initramfs, dhclient, etc.).

However, I see a couple of downsides to it:
1) It doesn’t resolve all of the userspace issues listed in this email thread. 
For example, cloud-init will still attempt to perform network configuration on 
net-failover slaves.
It also doesn’t help with the Ubuntu netplan issue, where netplan creates udev 
rules that match only by MAC.
2) It makes for a non-intuitive customer experience. For example, a customer 
may try to analyse a connectivity issue by checking connectivity on a 
net-failover slave (e.g. the VF) and see none, when in fact checking on the 
net-failover master netdev shows that connectivity is fine.
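
To illustrate what downside 1 leaves unsolved: the generic check Stephen 
suggested earlier in the thread (skip any netdev that has a master) could be 
done in userspace roughly as below. This is only a minimal sketch, not what 
cloud-init does today. It assumes enslaved netdevs expose a "master" symlink 
under /sys/class/net/<dev>/, and the function name and the sysfs_net parameter 
are made up for the example.

```python
import os
from typing import List


def netdevs_without_master(sysfs_net: str = "/sys/class/net") -> List[str]:
    """Return the names of netdevs that are not enslaved to a master device.

    Hypothetical helper: a tool that configures "all available netdevs"
    could restrict itself to this list instead.
    """
    names = []
    for name in sorted(os.listdir(sysfs_net)):
        dev_dir = os.path.join(sysfs_net, name)
        if not os.path.isdir(dev_dir):
            # Skip non-device entries such as the "bonding_masters" file.
            continue
        # Assumption: an enslaved netdev has a "master" symlink in its
        # sysfs directory pointing at the master netdev.
        if os.path.lexists(os.path.join(dev_dir, "master")):
            continue
        names.append(name)
    return names


if __name__ == "__main__":
    print(netdevs_without_master())
```

Such a check has to be added to every userspace component separately, which 
is the per-component effort the proposals below try to avoid.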

The set of changes I envision to fix our issues is:
1) Hide net-failover slaves in a different netns created and managed by the 
kernel, which the user can still enter to manage the netdevs there if they 
explicitly wish to do so 
(e.g. to configure the net-failover VF slave in some special way).
2) Match the virtio-net and the VF based on a PV attribute instead of MAC 
(similar to what is done in NetVSC). E.g. provide a virtio-net interface to get 
the PCI slot where the matching VF will be hot-plugged by the hypervisor.
3) Have an explicit virtio-net control message to command the hypervisor to 
switch the data-path from virtio-net to the VF and vice versa, instead of 
relying on intercepting the PCI master enable-bit as an indicator of when the 
VF is about to be set up (similar to what is done in NetVSC).
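
To make point 2 a bit more concrete, here is a toy sketch of why matching on 
an advertised PCI slot is less ambiguous than matching on MAC. Everything in 
it is hypothetical: virtio-net has no such interface today, and the NetDev 
type, the function names and the device names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NetDev:
    name: str
    mac: str
    pci_slot: str  # e.g. "0000:00:05.0"


def candidates_by_mac(standby: NetDev, devs: List[NetDev]) -> List[NetDev]:
    """MAC-based matching: every other netdev sharing the standby's MAC."""
    return [d for d in devs if d.name != standby.name and d.mac == standby.mac]


def primary_by_slot(advertised_slot: str, devs: List[NetDev]) -> Optional[NetDev]:
    """PV-attribute matching: the single netdev in the advertised PCI slot.

    advertised_slot stands in for a value the hypervisor would expose
    through virtio-net (an assumption, no such field exists today).
    """
    matches = [d for d in devs if d.pci_slot == advertised_slot]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    standby = NetDev("virtio0", "52:54:00:12:34:56", "0000:00:03.0")
    vf = NetDev("vf0", "52:54:00:12:34:56", "0000:00:05.0")
    other = NetDev("other0", "52:54:00:12:34:56", "0000:00:06.0")
    devs = [standby, vf, other]

    # Two netdevs share the standby's MAC, so MAC alone cannot pick one.
    print([d.name for d in candidates_by_mac(standby, devs)])
    # The advertised slot identifies exactly one device.
    print(primary_by_slot("0000:00:05.0", devs))
```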

Do you see any clear issue with the above suggestions?

-Liran

> 
>> Therefore, this proposal just effectively delays when the net-failover slave 
>> can be operated on by userspace.
>> But what we actually want is to never allow a net-failover slave to be 
>> operated by userspace unless it is explicitly stated
>> by userspace that it wishes to perform a set of actions on the net-failover 
>> slave.
>> 
>> Something that was achieved if, for example, the net-failover slaves were in 
>> a different netns than default netns.
>> This also aligns with expected customer experience that most customers just 
>> want to see a 1:1 mapping between a vNIC and a visible netdev.
>> But of course maybe there are other ideas that can achieve similar behaviour.
>> 
>> -Liran
>> 
>>> 
>>> Which things to fail? Probably sending/receiving packets?  Getting MAC?
>>> More?
>>> 
>>>> If we reach
>>>> to a scenario where we try to avoid userspace issues generically and
>>>> not on a userspace component basis, I believe the right path should be
>>>> to hide the net-failover slaves such that explicit action is required
>>>> to actually manipulate them (As described in blog-post). E.g.
>>>> Automatically move net-failover slaves by kernel to a different netns.
>>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>> -- 
>>>>> MST

_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
