Hi Michael,

Great blog post, which summarises everything very well!

Some comments I have:

1) I think that when we use the term “1-netdev model” in community 
discussions, we tend to mean what the blog post defines as the “3-device 
model with hidden slaves”.
Therefore, I would suggest simply removing the “1-netdev model” section and 
renaming the “3-device model with hidden slaves” section to “1-netdev model”.

2) The userspace issues result from both the “2-netdev model” and the 
“3-netdev model”. However, the blog post describes them as if they only exist 
in the “3-netdev model”.
The reason these issues are not seen in the Azure environment is that 
Microsoft partially handled them for its specific 2-netdev model.
This leads me to the next comment.

3) I suggest that the blog post also elaborate on exactly which userspace 
issues arise in models other than the “1-netdev model”.
The issues that I’m aware of are (please tell me if you are aware of others!):
(a) udev rename race condition: When the net-failover device is opened, it 
also opens its slaves. However, udev receives the KOBJ_ADD events first for 
the net-failover netdev and only then for the virtio-net netdev. This means 
that if userspace responds to the first event by opening the net-failover 
netdev, then any attempt by userspace to rename the virtio-net netdev in 
response to the second event will fail, because the virtio-net netdev is 
already open. Note that such a udev rename rule is useful because we would 
like to add rules that rename the virtio-net netdev to clearly signal that it 
is used as the standby interface of a net-failover netdev.
The way Microsoft worked around this problem in NetVSC is to delay the open 
of the slave VF relative to the open of the NetVSC netdev. However, this is 
still a race and thus a hacky solution. It was accepted by the community only 
because it is internal to the NetVSC driver; a similar solution was rejected 
by the community for the net-failover driver.
The solution we have currently proposed to address this (patch by Si-Wei) is 
to change the kernel’s rename handling to allow a net-failover slave to be 
renamed even if it is already open. The patch has not been accepted yet.
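To make the ordering problem in (a) concrete, here is a toy model in Python. 
It is only a sketch and not kernel code: the Netdev class, the device names 
(eth0, eth1, eth0sby) and the allow_live_slave_rename switch are made up for 
illustration, and reporting the failure as EBUSY is my assumption about how 
the rename refusal surfaces to userspace.

```python
import errno


class Netdev:
    """Toy stand-in for a kernel netdev (hypothetical, for illustration only)."""

    def __init__(self, name, failover_slave=False):
        self.name = name
        self.up = False
        self.failover_slave = failover_slave
        self.slaves = []

    def open(self):
        # Opening the net-failover netdev also opens its slaves.
        self.up = True
        for slave in self.slaves:
            slave.open()


def rename(dev, new_name, allow_live_slave_rename=False):
    """Refuse to rename an open netdev, unless the proposed exception applies."""
    if dev.up and not (allow_live_slave_rename and dev.failover_slave):
        raise OSError(errno.EBUSY, "netdev is already open", dev.name)
    dev.name = new_name


def simulate(allow_live_slave_rename):
    """Replay the two KOBJ_ADD events in the order described in (a)."""
    failover = Netdev("eth0")
    standby = Netdev("eth1", failover_slave=True)
    failover.slaves.append(standby)

    # Event 1: KOBJ_ADD for the net-failover netdev; userspace opens it.
    failover.open()

    # Event 2: KOBJ_ADD for the virtio-net netdev; the udev rule tries a rename.
    try:
        rename(standby, "eth0sby", allow_live_slave_rename)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    return standby.name
```

Calling simulate(False) models the current behaviour, where the rename is 
refused, and simulate(True) models the behaviour the proposed patch is aiming 
for, where the already-open slave can still be renamed.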
(b) Issues caused by various userspace components running DHCP on the 
net-failover slaves: DHCP should of course only be done on the net-failover 
netdev. Attempting DHCP on the net-failover slaves as well will cause 
networking issues. Therefore, userspace components should be taught to avoid 
doing DHCP on the net-failover slaves. These components include:
b.1) dhclient: If run without parameters, by default it simply enumerates all 
netdevs and attempts DHCP on each of them.
(I don’t think Microsoft has handled this)
b.2) initramfs / dracut: In order to mount the root file-system from iSCSI, 
these components need networking and therefore run DHCP on all netdevs.
(Microsoft hasn’t handled (b.2) because they don’t have images which perform 
iSCSI boot in their Azure setup. Still an open issue)
b.3) cloud-init: If configured to perform network configuration, it attempts 
to configure all available netdevs. It should, however, avoid doing so on 
net-failover slaves.
(Microsoft has handled this by adding a mechanism in cloud-init to blacklist a 
netdev from being configured in case it is owned by a specific PCI driver. 
Specifically, they blacklist the Mellanox VF driver. However, this technique 
doesn’t work for the net-failover mechanism because both the net-failover 
netdev and the virtio-net netdev are owned by the virtio-net PCI driver).
b.4) The network managers of the various distros may also need to be updated 
to avoid DHCP on net-failover slaves? (Not sure. Asking...)
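As a rough sketch of what "avoid DHCP on the slaves" in (b) could look like 
for a userspace component, the Python below enumerates netdevs under sysfs and 
skips any that look enslaved. It rests on an assumption that should be checked 
against the actual kernel: that a net-failover slave shows a "master" entry in 
its /sys/class/net/<dev>/ directory, as enslaved netdevs generally do. The 
function name dhcp_candidates is made up for illustration.

```python
import os


def dhcp_candidates(sysfs_net="/sys/class/net"):
    """Return the netdev names that do not appear to be enslaved.

    Assumption (not verified for net-failover here): an enslaved netdev
    exposes a "master" entry in its sysfs directory.
    """
    candidates = []
    for name in sorted(os.listdir(sysfs_net)):
        dev_dir = os.path.join(sysfs_net, name)
        if not os.path.isdir(dev_dir):
            continue  # skip non-netdev entries such as plain files
        if name == "lo":
            continue  # loopback never needs DHCP
        if os.path.lexists(os.path.join(dev_dir, "master")):
            continue  # looks like a slave: leave it to its master netdev
        candidates.append(name)
    return candidates
```

On a guest one would call dhcp_candidates() with the default path and run 
DHCP only on the returned names. Unlike the PCI-driver blacklist described in 
(b.3), a check of this kind does not depend on which driver owns the netdev.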

4) Another interesting use-case where the net-failover mechanism is useful is 
handling NIC firmware failures or NIC firmware Live-Upgrade.
In both cases, there is a need to perform a full PCIe reset of the NIC, which 
loses all the NIC eSwitch configuration of the various VFs.
To handle these cases gracefully, one could just hot-unplug all VFs from the 
guests running on the host (which makes all guests fall back to the 
virtio-net netdev, which is backed by a netdev that is eventually on top of 
the PF). Networking will then be restored to the guests once the PCIe reset 
is completed and the PF is functional again. To re-accelerate the guests’ 
networking, the hypervisor can just hot-plug new VFs to the guests.
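The sequence in (4) can be sketched as a small host-side routine. This is 
only an outline of the ordering: the three callbacks (unplug_vf, reset_pf, 
plug_vf) are hypothetical placeholders for whatever the hypervisor management 
stack actually provides, and error handling is left out.

```python
def reset_nic_with_failover(guests, unplug_vf, reset_pf, plug_vf):
    """Run the unplug / reset / re-plug sequence and return the steps taken.

    All three callbacks are hypothetical hooks supplied by the caller.
    """
    steps = []

    # 1. Hot-unplug every VF so the guests fall back to virtio-net.
    for guest in guests:
        unplug_vf(guest)
        steps.append(("unplug_vf", guest))

    # 2. Perform the full PCIe reset of the NIC (firmware failure or upgrade).
    reset_pf()
    steps.append(("reset_pf", None))

    # 3. Hot-plug new VFs to re-accelerate the guests' networking.
    for guest in guests:
        plug_vf(guest)
        steps.append(("plug_vf", guest))

    return steps
```

The point of the ordering is that no guest still holds a VF when the reset 
happens, so the lost eSwitch configuration only affects VFs that are about to 
be re-created anyway.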

I would very much appreciate this forum’s help in closing the pending items 
listed in (3), which currently prevent the net-failover mechanism from being 
used in real production use-cases.


> On 17 Mar 2019, at 15:55, Michael S. Tsirkin <m...@redhat.com> wrote:
> Hi all,
> I've put up a blog post with a summary of where network
> device failover stands and some open issues.
> Not sure where best to host it, I just put it up on blogspot:
> https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
> Comments, corrections are welcome!
> -- 

Virtualization mailing list
