Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior
Hello,

I have a similar error (it seems to be related to the ones mentioned above) using bridge-utils but without bonding support. My system runs Debian stretch:

    # uname -a
    Linux server2 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux
    # cat /etc/debian_version
    9.5
    # dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $1, $2, $3}'
    ii bridge-utils 1.5-13+deb9u1
    ii ifupdown 0.8.19

I have a Realtek 8139 based network card, used for teaching system administration:

    # dmesg | grep 8139
    [0.137837] pci :01:0e.0: [10ec:8139] type 00 class 0x02
    [0.704364] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
    [0.704370] 8139cp :01:0e.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip, use 8139too
    [0.704858] 8139too: 8139too Fast Ethernet driver 0.9.28
    [0.705537] 8139too :01:0e.0 eth0: RealTek RTL8139 at 0xa9a8c0339c00, 00:08:54:53:8c:01, IRQ 17
    [0.736030] 8139too :01:0e.0 enp1s14: renamed from eth0
    [1.503524] 8139too :01:0e.0 enp1s14: link up, 100Mbps, full-duplex, lpa 0xC1E1
    [ 846.953912] 8139too :01:0e.0 enp1s14: link up, 100Mbps, full-duplex, lpa 0xC1E1

My network configuration is the following:

    auto lo
    iface lo inet loopback

    iface enp1s14 inet manual

    auto xenbr0
    iface xenbr0 inet dhcp
        bridge_ports enp1s14
        bridge_hw 00:08:54:53:8c:02
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

This configuration (which works fine in Jessie, even without specifying the manual configuration of the enp1s14 interface) results, on reboot and startup, in both enp1s14 and xenbr0 making a DHCP request and getting autoconfigured with an IP address (see below). I changed the MAC address of xenbr0 to make the problem easier to see. If I do not, both DHCP requests are still made, resulting in the same IP configuration for the two devices. Because of this inconsistent state, if the bridge_maxwait option is not used, operations on the networking service are delayed for a long time.

    # ip a s
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp1s14: mtu 1500 qdisc pfifo_fast master xenbr0 state UNKNOWN group default qlen 1000
        link/ether 00:08:54:53:8c:01 brd ff:ff:ff:ff:ff:ff
        inet 192.168.112.239/24 brd 192.168.112.255 scope global enp1s14
           valid_lft forever preferred_lft forever
    3: xenbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 00:08:54:53:8c:02 brd ff:ff:ff:ff:ff:ff
        inet 192.168.112.135/24 brd 192.168.112.255 scope global xenbr0
           valid_lft forever preferred_lft forever
        inet6 fe80::208:54ff:fe53:8c02/64 scope link
           valid_lft forever preferred_lft forever

To work around the problem, every time the system starts I have to:

+ Stop the networking service: service networking stop, or systemctl stop networking.service. After this, the IP address of enp1s14 is still not removed.
+ Manually delete the erroneous configuration of the Ethernet interface (enp1s14): ip a d 192.168.xxx.yyy dev enp1s14
+ Relaunch the networking service: service networking start

After that everything works fine (restarting the service and so on). So the problem seems to occur only at startup.

Feel free to ask me for additional information. However, my technical skills are not enough to suggest a solution for the problem.

Best regards.
-- 
J. Ramón Méndez
University of Vigo (Spain)
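
For readers who want to script the three workaround steps from the message above, here is a minimal sketch. It is not part of the original report. The function names and the DRY_RUN switch are hypothetical, and it swaps the single `ip a d <address>` call for `ip -4 addr flush`, which removes every IPv4 address from the bridge port rather than one named address.

```shell
#!/bin/sh
# Sketch of the manual workaround: stop networking, drop the stray IPv4
# address(es) from the bridge port, then start networking again.
# DRY_RUN is a hypothetical switch: when set to 1, commands are only printed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fix_bridge_port PORT
# PORT is the enslaved Ethernet interface (enp1s14 in the report).
fix_bridge_port() {
    port=$1
    run systemctl stop networking.service &&
    # Assumption: flushing all IPv4 addresses is acceptable, since a bridge
    # port is not expected to carry an address of its own.
    run ip -4 addr flush dev "$port" &&
    run systemctl start networking.service
}
```

Running `DRY_RUN=1; fix_bridge_port enp1s14` first shows what would be executed; without DRY_RUN the function needs root.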
Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior
Hello,

I'd also like to share my experience with this bug, which affects us at work (GRNET) as well. We have the following setup:

    # cat /etc/debian_version
    9.3
    # dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $2, $3}'
    ii bridge-utils 1.5-13+deb9u1
    ii ifupdown 0.8.19
    ii vlan 1.9-3.2+b1
    # uname -a
    Linux foo 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

I have reproduced it by disabling networking.service, not loading the bonding module on boot, with the following configuration:

    # cat /etc/network/interfaces
    auto bond0
    iface bond0 inet static
        mtu 9000
        bond-mode 802.3ad
        bond_xmit_hash_policy layer3+4
        bond-miimon 100
        slaves ens5f0 ens5f1

    auto vlan109
    iface vlan109 inet manual
        bridge_ports bond0.109
        bridge_stp off
        bridge_maxwait 0
        bridge_fd 0
        mtu 9000

    auto vlan110
    iface vlan110 inet manual
        bridge_ports bond0.110
        bridge_stp off
        bridge_maxwait 0
        bridge_fd 0
        mtu 9000

    # cat /etc/modules
    8021q
    bonding

In our case, we noticed the following timeline, which is quite similar to Apollon's:

* The bonding module gets loaded into the kernel way before networking.service gets started (it is defined in /etc/modules, which should be unnecessary tbh)
* Interface bond0 gets created, which triggers a udev 'add' action
* The action calls bridge-network-interface with INTERFACE=bond0
* bridge-network-interface creates interface bond0.109. bond0.109 has MTU 1500 because ifup has not run yet
* The creation of bond0.109 triggers another udev 'add' action (which, I think, should not happen)
* bridge-network-interface tries to run ifup --allow auto vlan109
* The above command fails because it cannot set the MTU of vlan109 to 9000, since bond0.109's MTU is 1500. The vlan109 interface is left in an unconfigured state.
* /lib/udev/bridge-network-interface fails because of set -e
* The second call of bridge-network-interface, with INTERFACE=bond0.109, fails in a similar way. All other interfaces are untouched.
* systemd starts up networking.service and runs ifup --allow=auto -a
* bond0 gets MTU 9000
* ifup tries to bring the vlan109 interface up
* This fails because bond0.109's MTU is still 1500. It seems that ifupdown and/or bridge-utils do not touch it
* ifup for vlan110 runs successfully because it creates a new bond0.110 interface, which inherits the MTU of bond0 (now 9000) and comes up correctly

The above behavior does not always happen: if, for some reason, networking.service gets started before bridge-network-interface does its work, all interfaces come up correctly. Also, this affects only the first interface in /e/n/i that has a bridge_ports stanza defined, because bridge-network-interface fails for the reasons I described above.

I agree with Apollon: I really do not understand what the code is trying to do and why BRIDGE_HOTPLUG defaults to yes. We ran into serious problems with silent packet loss in QEMU VMs which had their tap interfaces, with MTU 9000, bridged to the above vlanXXX interfaces. The only way to mitigate this problem for now is to set BRIDGE_HOTPLUG=no.

Unfortunately, it is not easy for us to suggest a solution, but we can provide more information if needed.

Regards,
Nikos
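
The MTU mismatch described in the timeline above (bond0 at 9000, bond0.109 left at 1500) can be detected after boot by comparing the values the kernel exposes under /sys/class/net/<iface>/mtu. The following is only a sketch under that assumption, not something taken from the report. The function names and the SYSFS override (useful for trying the check against a fake directory tree) are hypothetical.

```shell
#!/bin/sh
# Compare the MTU of a VLAN sub-interface with the MTU of its parent.
# SYSFS is a hypothetical override of the sysfs mount point, for testing.

mtu_of() {
    cat "${SYSFS:-/sys}/class/net/$1/mtu"
}

# check_vlan_mtu PARENT VLAN
# Returns 0 if both MTUs match, 1 on a mismatch, 2 if a value cannot be read.
check_vlan_mtu() {
    parent=$1
    vlan=$2
    p=$(mtu_of "$parent") || return 2
    v=$(mtu_of "$vlan") || return 2
    if [ "$p" -ne "$v" ]; then
        echo "MTU mismatch: $parent=$p $vlan=$v"
        return 1
    fi
    echo "MTU ok: $parent=$p $vlan=$v"
}
```

With the configuration from this message, `check_vlan_mtu bond0 bond0.109` and `check_vlan_mtu bond0 bond0.110` would be the two calls of interest.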
Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior
Package: bridge-utils
Version: 1.5-11
Severity: serious

TL;DR: If you're using bridges, bonds and VLANs together, set BRIDGE_HOTPLUG=no in /etc/default/bridge-utils.

Dear Maintainer,

There are some rather serious race conditions arising from the fact that bridge-utils handles udev events triggered by ifupdown actions and messes with the state of various interfaces while ifupdown is still running. To illustrate why this is happening, take the following e/n/i configuration as an example:

    auto bond0
    iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100
        up ip link set $IFACE mtu 9000

    auto dmz
    iface dmz inet manual
        bridge_ports bond0.200
        bridge_fd 0
        bridge_stp off
        bridge_maxwait 0
        up ip link set $IFACE up

This straightforward configuration worked fine in Jessie, but produces unexpected results on boot since Stretch, which include, among others:

- not setting the bond mode to active-backup, but to round-robin
- creating bond0.200 with MTU 1500 instead of 9000

We have been hit by the above issues on production systems dist-upgraded to Stretch, and it all comes down to the races introduced by the bridge-utils hotplug support (which is now enabled by default).

So, what is actually happening is the following:

1. On boot, networking.service calls `ifup --allow=auto -a`. This starts off by creating bond0. As soon as the ifenslave hooks create the interface, a udev "add" event for bond0 is triggered, *while ifup is still configuring bond0*.

2. /lib/udev/bridge-network-interface is called, with $INTERFACE set to bond0. The script will run `ifquery --list --allow auto` and will look for any interface containing bond0 or bond0.* in its bridge_ports, matching "dmz" in our case.
   It will then go on to:

   a) create_vlan_port: this will run `ip link set bond0 up` and then create the vlan sub-interface on bond0
   b) call `ifup dmz` once the vlan port has been created

All of the above, for all we know, happen while `ifup -a` is *still* configuring bond0 on its own. Step 2 is especially troublesome for the following reasons:

i) create_vlan_port messes with the interface state while ifup is still configuring it. Bonding interfaces, for instance, need to be down to have their mode configured, and create_vlan_port explicitly sets the bond interface up. This causes the bond interface to potentially come up with the default mode (round-robin), making the system unreachable in case e.g. 802.3ad was requested.

ii) create_vlan_port creates the VLAN sub-interface while the underlying device is still being configured. This means that the VLAN interface may inherit the wrong MTU value, if ifup has not yet set the parent interface's MTU to the desired value at the time the VLAN interface is created.

iii) dmz is brought up whenever bond0 is brought up, although this has not necessarily been requested.

iv) dmz is configured twice (once because of `ifup -a` and once because of bridge-utils setting it up).

Note that high-cpu-count SMP systems seem more prone to the races i) and ii).

To be completely honest, I don't know what the hotplugging code is trying to achieve here, especially when it comes to short-circuiting ifupdown's internals. At a bare minimum, it should neither bring up "auto" interfaces that happen to be down, nor touch any interface while ifup might still be configuring it.
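
As an illustration of how race i) could be spotted after boot, the sketch below compares the mode a bond actually came up with against the mode requested in e/n/i. It relies on the bonding driver's sysfs file /sys/class/net/<bond>/bonding/mode, which holds the mode name followed by its number (for example "active-backup 1"). The function names and the SYSFS override are hypothetical and not part of bridge-utils or ifupdown.

```shell
#!/bin/sh
# Report whether a bonding interface runs in the expected mode.
# SYSFS is a hypothetical override of the sysfs mount point, for testing.

# bond_mode BOND: print the mode name, e.g. "active-backup" or "balance-rr".
bond_mode() {
    # The file contains "<name> <number>"; only the name is printed.
    read -r name rest < "${SYSFS:-/sys}/class/net/$1/bonding/mode" &&
        echo "$name"
}

# check_bond_mode BOND EXPECTED_MODE
# Returns 0 on a match, 1 on a mismatch, 2 if the mode cannot be read.
check_bond_mode() {
    bond=$1
    want=$2
    have=$(bond_mode "$bond") || {
        echo "$bond: cannot read bonding mode"
        return 2
    }
    if [ "$have" != "$want" ]; then
        echo "$bond: mode is $have, expected $want"
        return 1
    fi
    echo "$bond: mode $have as expected"
}
```

For the example configuration in this report, `check_bond_mode bond0 active-backup` would flag a bond that came up as balance-rr (round-robin).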

Regards,
Apollon

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (90, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=el_GR.UTF-8, LC_CTYPE=el_GR.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bridge-utils depends on:
ii  libc6  2.25-5

bridge-utils recommends no packages.

Versions of packages bridge-utils suggests:
ii  ifupdown  0.8.29

-- no debconf information