Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior

2018-10-05 Thread José Ramón Méndez Reboredo
Hello,

I have a similar error, which seems to be connected with the ones reported
earlier in this bug, using bridge-utils but without bonding. My setup is:

+ Debian stretch
# uname -a
Linux server2 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux
# cat /etc/debian_version
9.5
# dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $1, $2, $3}'
ii bridge-utils 1.5-13+deb9u1
ii ifupdown 0.8.19

I have a Realtek 8139-based network card, which I use for teaching system
administration:
# dmesg | grep 8139
[0.137837] pci :01:0e.0: [10ec:8139] type 00 class 0x02
[0.704364] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[0.704370] 8139cp :01:0e.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip, use 8139too
[0.704858] 8139too: 8139too Fast Ethernet driver 0.9.28
[0.705537] 8139too :01:0e.0 eth0: RealTek RTL8139 at 0xa9a8c0339c00, 00:08:54:53:8c:01, IRQ 17
[0.736030] 8139too :01:0e.0 enp1s14: renamed from eth0
[1.503524] 8139too :01:0e.0 enp1s14: link up, 100Mbps, full-duplex, lpa 0xC1E1
[  846.953912] 8139too :01:0e.0 enp1s14: link up, 100Mbps, full-duplex, lpa 0xC1E1


My network configuration is the following:
auto lo
iface lo inet loopback
iface enp1s14 inet manual

auto xenbr0
iface xenbr0 inet dhcp
 bridge_ports enp1s14
 bridge_hw 00:08:54:53:8c:02
 bridge_stp off
 bridge_fd 0
 bridge_maxwait 0

This configuration worked fine in Jessie, even without specifying the
manual configuration of the enp1s14 interface. On Stretch, on reboot and
startup, both enp1s14 and xenbr0 make a DHCP request and get autoconfigured
with an IP address (see below). I changed the MAC address of xenbr0 to make
the problem easier to see; if I do not, both DHCP requests are still made and
the two devices end up with the same IP configuration. Because of this
inconsistent state, operations on the networking service are delayed for a
long time unless the bridge_maxwait option is used.

# ip a s
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
2: enp1s14:  mtu 1500 qdisc pfifo_fast master xenbr0 state UNKNOWN group default qlen 1000
link/ether 00:08:54:53:8c:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.112.239/24 brd 192.168.112.255 scope global enp1s14
   valid_lft forever preferred_lft forever
3: xenbr0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:08:54:53:8c:02 brd ff:ff:ff:ff:ff:ff
inet 192.168.112.135/24 brd 192.168.112.255 scope global xenbr0
   valid_lft forever preferred_lft forever
inet6 fe80::208:54ff:fe53:8c02/64 scope link
   valid_lft forever preferred_lft forever


To work around the problem, every time the system starts I have to:
+ Stop the networking service: service networking stop or systemctl stop
networking.service. This does not remove the IP address from enp1s14.
+ Manually delete the erroneous configuration from the Ethernet interface
(enp1s14): ip a d 192.168.xxx.yyy dev enp1s14
+ Restart the networking service: service networking start
After that everything works fine (including later service restarts), so the
problem seems to be at startup.
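
The manual cleanup above could be scripted roughly as follows. This is only a
sketch: stale_addr_cmds is a hypothetical helper name, and it assumes the
one-line output format of `ip -o -4 addr show`, where the fourth field is the
address with its prefix length. It prints the delete commands instead of
running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of ifupdown or bridge-utils).
# Reads `ip -o -4 addr show dev PORT` output on stdin and prints one
# "ip addr del" command per IPv4 address found on PORT.
stale_addr_cmds() {
    sa_port=$1
    awk -v port="$sa_port" \
        '$3 == "inet" { print "ip addr del " $4 " dev " port }'
}

# Possible use on the affected host (as root), mirroring the three steps:
#   systemctl stop networking.service
#   ip -o -4 addr show dev enp1s14 | stale_addr_cmds enp1s14 | sh
#   systemctl start networking.service
```

Printing the commands rather than executing them keeps the helper harmless
when run by mistake; piping its output to sh applies the cleanup.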

Feel free to ask me for additional information. Unfortunately, my technical
skills are not sufficient to suggest a solution to the problem.

Best regards.

-- 
J. Ramón Méndez
University of Vigo (Spain)


Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior

2018-03-08 Thread Nikos Kormpakis

Hello,

I'd also like to share my experience with this bug, which affects us at
work (GRNET) as well. We have the following setup:

# cat /etc/debian_version
9.3
# dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $2, $3}'
ii  bridge-utils 1.5-13+deb9u1
ii  ifupdown 0.8.19
ii  vlan 1.9-3.2+b1
# uname -a
Linux foo 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

I have reproduced it by disabling networking.service, not loading the
bonding module on boot, with the following configuration:

# cat /etc/network/interfaces
auto bond0
iface bond0 inet static
  mtu   9000
  bond-mode 802.3ad
  bond_xmit_hash_policy layer3+4
  bond-miimon   100
  slaves ens5f0 ens5f1

auto vlan109
iface vlan109 inet manual
  bridge_ports   bond0.109
  bridge_stp     off
  bridge_maxwait 0
  bridge_fd      0
  mtu            9000

auto vlan110
iface vlan110 inet manual
  bridge_ports   bond0.110
  bridge_stp     off
  bridge_maxwait 0
  bridge_fd      0
  mtu            9000

# cat /etc/modules
8021q
bonding

In our case, we noticed the following timeline, which is quite similar to
Apollon's:

* bonding module gets loaded into the kernel, way before
  networking.service gets started (it is listed in /etc/modules, which
  should be unnecessary tbh)
* Interface bond0 gets created, which triggers a udev 'add' action
* The action calls bridge-network-interface with INTERFACE=bond0
* bridge-network-interface creates interface bond0.109. bond0.109 has
  MTU 1500 because ifup has not run yet
* The creation of bond0.109 triggers another udev 'add' action (which, I
  think, should not happen)
* bridge-network-interface tries to run ifup --allow auto vlan109
* The above command fails because it cannot set the MTU of vlan109 to
  9000, because bond0.109's MTU is 1500. vlan109 interface is left in an
  unconfigured state.
* /lib/udev/bridge-network-interface fails because of set -e
* The second call of bridge-network-interface with INTERFACE=bond0.109
  fails in a similar way. All other interfaces are untouched.
* systemd starts up networking.service and runs ifup --allow=auto -a
* bond0 gets MTU 9000
* ifup tries to get vlan109 interface up
* This fails because bond0.109's MTU is 1500. It seems that ifupdown
  and/or bridge-utils do not touch it
* ifup for vlan110 runs successfully because it creates a new bond0.110
  interface, which inherits the MTU of bond0, which is now 9000 and gets
  up correctly
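
The MTU mismatch in the timeline above can be checked with a small script
like the following. It is a rough sketch: check_vlan_mtu is a hypothetical
name, and the optional third argument (an alternative sysfs root) exists only
so the function can be tried against a scratch directory. On a real system it
reads /sys/class/net/<interface>/mtu.

```shell
#!/bin/sh
# Hypothetical helper: compare the MTU of a VLAN port with its parent's.
# Returns 0 when they match, 1 (with a message) when they differ,
# 2 when either MTU cannot be read.
check_vlan_mtu() {
    cv_parent=$1
    cv_port=$2
    cv_root=${3:-/sys/class/net}
    cv_parent_mtu=$(cat "$cv_root/$cv_parent/mtu") || return 2
    cv_port_mtu=$(cat "$cv_root/$cv_port/mtu") || return 2
    if [ "$cv_parent_mtu" != "$cv_port_mtu" ]; then
        echo "$cv_port has MTU $cv_port_mtu but $cv_parent has MTU $cv_parent_mtu"
        return 1
    fi
    return 0
}

# Possible use after boot:
#   check_vlan_mtu bond0 bond0.109
#   check_vlan_mtu bond0 bond0.110
```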

The above behavior does not always happen: if, for some reason,
networking.service gets started before bridge-network-interface runs,
all interfaces will come up correctly. Also, this affects only the
first interface in /e/n/i that has bridge_ports defined, because
bridge-network-interface fails for the reasons I described above.

I agree with Apollon, I really do not understand what the code is trying
to do and why BRIDGE_HOTPLUG defaults to yes. We ran into serious
problems with silent packet loss in QEMU VMs, which had their tap
interfaces bridged to the above vlanXXX interfaces and MTU 9000 and the
only way to mitigate this problem for now is to set BRIDGE_HOTPLUG=no.
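
For reference, the mitigation can be applied with something like the sketch
below. set_bridge_hotplug_no is a hypothetical helper name, and the sketch
assumes /etc/default/bridge-utils consists of shell-style VAR=value lines and
that GNU sed (with -i) is available, as on Debian.

```shell
#!/bin/sh
# Hypothetical helper: make sure the given defaults file contains
# BRIDGE_HOTPLUG=no. An existing uncommented BRIDGE_HOTPLUG= line is
# rewritten in place; otherwise the setting is appended.
set_bridge_hotplug_no() {
    bh_conf=${1:-/etc/default/bridge-utils}
    if grep -q '^BRIDGE_HOTPLUG=' "$bh_conf" 2>/dev/null; then
        sed -i 's/^BRIDGE_HOTPLUG=.*/BRIDGE_HOTPLUG=no/' "$bh_conf"
    else
        echo 'BRIDGE_HOTPLUG=no' >> "$bh_conf"
    fi
}

# Possible use (as root):
#   set_bridge_hotplug_no
```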

Unfortunately, it's not quite easy for us to suggest a solution but we
can provide more information if needed.

Regards,
Nikos



Bug#892277: bridge-utils: hotplugging interferes with ifupdown resulting in unpredictable behavior

2018-03-07 Thread Apollon Oikonomopoulos
Package: bridge-utils
Version: 1.5-11
Severity: serious

TL;DR: If you're using bridges, bonds and VLANs together, set 
   BRIDGE_HOTPLUG=no in /etc/default/bridge-utils.

Dear Maintainer,

There are some rather serious race conditions arising from the fact that 
bridge-utils handles udev events triggered by ifupdown actions and  
messes with the state of various interfaces while ifupdown is still 
running. To illustrate why this is happening, take the following e/n/i 
configuration as an example:

auto bond0
iface bond0 inet manual
  bond-slaves eth0 eth1
  bond-mode   active-backup
  bond-miimon 100
  up ip link set $IFACE mtu 9000

auto dmz
iface dmz inet manual
  bridge_ports  bond0.200
  bridge_fd   0
  bridge_stp  off
  bridge_maxwait  0
  up ip link set $IFACE up

This straightforward configuration worked fine in Jessie, but since 
Stretch it produces unexpected results on boot, which include, among 
others:

 - not setting the bond mode to active-backup, but to round-robin
 - creating bond0.200 with MTU 1500 instead of 9000

We have been hit by the above issues on production systems dist-upgraded 
to Stretch, and it all comes down to the races introduced by the 
bridge-utils hotplug support (which is now enabled by default).

So, what is actually happening is the following:

 1. On boot, networking.service calls `ifup --allow=auto -a`. This 
    starts off by creating bond0. As soon as the ifenslave hooks create 
    the interface, a udev "add" event for bond0 is triggered, *while 
    ifup is still configuring bond0*.

 2. /lib/udev/bridge-network-interface is called, with $INTERFACE set to 
    bond0. The script will run `ifquery --list --allow auto` and will 
    look for any interface containing bond0 or bond0.* in its 
    bridge_ports, matching "dmz" in our case. It will then go on to:
     a) create_vlan_port: this will run `ip link set bond0 up` and then 
        create the vlan sub-interface on bond0
     b) call `ifup dmz` once the vlan port has been created

    All of the above - for all we know - happen while `ifup -a` is 
    *still* configuring bond0 on its own.
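
To make the matching rule in step 2 concrete, here is a minimal sketch of it.
This is not the actual bridge-network-interface code; port_belongs_to and
affected_ports are hypothetical names, and the rule implemented is only the
one described above (a bridge port is affected if it is the interface itself
or one of its dotted VLAN sub-interfaces).

```shell
#!/bin/sh
# Hypothetical helper, not taken from bridge-utils.
# Succeeds when PORT is IFACE itself or a sub-interface named IFACE.<something>.
port_belongs_to() {
    pb_iface=$1
    pb_port=$2
    case "$pb_port" in
        "$pb_iface"|"$pb_iface".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every bridge port from the argument list that an "add" event
# for IFACE would match.
affected_ports() {
    ap_iface=$1
    shift
    for ap_port in "$@"; do
        if port_belongs_to "$ap_iface" "$ap_port"; then
            echo "$ap_port"
        fi
    done
}

# Example: affected_ports bond0 eth1 bond0.200 bond1.5
```

With the example configuration, an "add" event for bond0 matches the
bond0.200 port of dmz, which is why dmz gets touched while bond0 is still
being set up.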

Step 2 is especially troublesome for the following reasons:

 i) create_vlan_port messes with the interface state while ifup is still 
    configuring it. Bonding interfaces - for instance - need to be down 
    to have their mode configured, and create_vlan_port explicitly sets 
    the bond interface up. This causes the bond interface to potentially 
    come up with the default mode (round-robin), making the system 
    unreachable in case e.g. 802.3ad was requested.

 ii) create_vlan_port creates the VLAN sub-interface while the 
     underlying device is still being configured. This means that the 
     VLAN interface may inherit the wrong MTU value, if ifup has not 
     yet set the parent interface's MTU to the desired value at the time 
     the VLAN interface is created.

 iii) dmz is brought up whenever bond0 is brought up, although this has 
      not necessarily been requested.

 iv) dmz is configured twice (once because of `ifup -a` and once because 
     of bridge-utils setting it up).

Note that high-cpu-count SMP systems seem more prone to the races i) and 
ii).
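
Whether race i) has hit a given boot can be checked by reading the bond's 
mode back from sysfs. The following is a sketch under assumptions: 
bond_mode_is is a hypothetical helper name, the optional third argument is 
an alternative sysfs root for experimenting, and the mode file is assumed to 
hold the mode name followed by its number (round-robin is reported as 
"balance-rr 0").

```shell
#!/bin/sh
# Hypothetical helper: succeed only if BOND currently runs in mode WANT.
# Reads <root>/<bond>/bonding/mode, e.g. "active-backup 1".
# Returns 2 when the mode file cannot be read.
bond_mode_is() {
    bm_bond=$1
    bm_want=$2
    bm_root=${3:-/sys/class/net}
    read -r bm_mode bm_rest < "$bm_root/$bm_bond/bonding/mode" || return 2
    [ "$bm_mode" = "$bm_want" ]
}

# Possible use after boot:
#   bond_mode_is bond0 active-backup || echo "bond0 is in the wrong mode"
```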

To be completely honest, I don't know what the hotplugging code is 
trying to achieve here, especially when it comes to short-circuiting 
ifupdown's internals. At a bare minimum, it should neither bring up 
"auto" interfaces that happen to be down, nor touch any interface while 
ifup might be still configuring it.

Regards,
Apollon

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 
'testing'), (500, 'stable'), (90, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=el_GR.UTF-8, LC_CTYPE=el_GR.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bridge-utils depends on:
ii  libc6  2.25-5

bridge-utils recommends no packages.

Versions of packages bridge-utils suggests:
ii  ifupdown  0.8.29

-- no debconf information