Adding SRU proposal for wily.

** Description changed:

+ [Impact]
+ 
+  * A lack of proper synchronization in ifupdown causes a race condition
+ resulting in occasional incorrect network interface initialization (e.g.
+ in the bonding case: wrong bonding settings, or the network being
+ unavailable because the slave<->master interfaces were initialized in
+ the wrong order).
+ 
+  * This is very annoying in large deployments (e.g. when bringing up
+ 1000 machines, it is almost guaranteed that at least a few of them will
+ end up with the network down).
+ 
+  * It has been fixed by introducing a hierarchical, per-interface
+ locking mechanism that ensures the right initialization order (along
+ with the correct ordering in the /etc/network/interfaces file).
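A minimal sketch of the per-interface part of that idea, assuming flock(1) from util-linux. LOCKDIR, STATEDIR and ifup_locked are hypothetical names chosen for illustration, not what the Debian patch uses. The point is that the state check and the state update happen while the same per-interface lock is held; the hierarchical ordering between master and slave locks is not modeled here.

```sh
#!/bin/sh
# Sketch only: per-interface locking with flock(1). LOCKDIR, STATEDIR and
# ifup_locked are hypothetical names, not the ones used by the Debian patch.
LOCKDIR=${LOCKDIR:-/tmp/ifup-sketch/lock}
STATEDIR=${STATEDIR:-/tmp/ifup-sketch/state}

ifup_locked() {
    iface=$1
    mkdir -p "$LOCKDIR" "$STATEDIR"
    (
        # Block until this interface's lock is ours. Other interfaces use
        # other lock files, so they can still be configured in parallel.
        flock 9
        # Check and update while holding the same lock: no window in between.
        if [ -e "$STATEDIR/$iface" ]; then
            echo "$iface already configured"
            exit 0
        fi
        # ... real interface configuration would go here ...
        echo "$iface=up" >> "$STATEDIR/$iface"
        echo "$iface configured"
    ) 9>"$LOCKDIR/$iface.lock"
}
```

With this, two concurrent `ifup_locked eth0` calls configure eth0 once; the second caller only reports that it is already configured.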
+ 
+ [Test Case]
+ 
+  1. Create a VM with bonding configured with at least 2 slave interfaces.
+  2. Reboot.
+  3. If all interfaces are up, go to step 2.
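The step 3 check can be scripted. A rough sketch, assuming the standard bonding sysfs files (`operstate`, `bonding/slaves`) and that an enslaved NIC with link reports operstate "up"; `bond_ok` and the SYSFS_NET override are hypothetical names used only for illustration:

```sh
#!/bin/sh
# Sketch of the "are all interfaces up?" check from step 3. SYSFS_NET is
# overridable only so the function can be exercised against a fake tree.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Usage: bond_ok BOND SLAVE...
# Returns 0 if BOND is up and every SLAVE is listed in BOND/bonding/slaves
# and is itself up (assumption: enslaved NICs with link report "up").
bond_ok() {
    bond=$1; shift
    [ "$(cat "$SYSFS_NET/$bond/operstate" 2>/dev/null)" = up ] || return 1
    slaves=$(cat "$SYSFS_NET/$bond/bonding/slaves" 2>/dev/null)
    for s in "$@"; do
        # Pad with spaces so "eth1" does not match "eth10".
        case " $slaves " in
            *" $s "*) ;;
            *) return 1 ;;
        esac
        [ "$(cat "$SYSFS_NET/$s/operstate" 2>/dev/null)" = up ] || return 1
    done
    return 0
}
```

Hooked into boot (the hook itself is an assumption about the test setup), `bond_ok bond0 eth1 eth2 && reboot` keeps looping until a boot leaves the bond incomplete.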
+ 
+ [Regression Potential]
+ 
+  * This change has been introduced upstream in Debian.
+  * It does not require any config changes to existing installations.
+ 
+ [Other Info]
+  
+ Original bug description:
+ 
  * Please note that my bonding examples use eth1 and eth2 as slave
+  interfaces.
  
  ifupdown has some race conditions, explained below. ifenslave does not
  behave well when the sysv networking and upstart network-interface
  scripts run together.
  
  !!!!
  case 1)
  (a) ifup eth0 (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
+                                   1-1. Wait to lock ifstate.lock
+                                       file.
  1-2. Read ifstate file to check
+      the target NIC.
  1-3. close(=release) ifstate.lock
+      file.
  1-4. Conclude that the target NIC
+      has not been processed yet.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Conclude that the target NIC
+                                        has not been processed yet.
  2. Lock and update ifstate file.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
  !!!
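Case 1 is a check-then-act race: the lock covers the read (steps 1-1 to 1-3) and the update (step 2) separately, but not the decision in step 1-4 between them. A hedged sketch that reproduces this interleaving with flock(1); STATE and unsafe_ifup are hypothetical, and the sleep only widens the window:

```sh
#!/bin/sh
# Sketch of the case 1 race (steps 1-1 .. 2 above). STATE and unsafe_ifup
# are hypothetical; the state file holds one "iface=up" line per ifup.
STATE=${STATE:-/tmp/ifstate-sketch}

unsafe_ifup() {
    iface=$1
    touch "$STATE"
    # Steps 1-1 to 1-3: lock, read the state file, release the lock.
    count=$(flock "$STATE.lock" grep -c "^$iface=" "$STATE")
    # Step 1-4: decide "not processed yet" WITHOUT holding the lock.
    if [ "$count" -eq 0 ]; then
        sleep 1   # widen the window so the race is easy to hit
        # Step 2: lock, update the state file, release the lock.
        flock "$STATE.lock" sh -c 'echo "$1=up" >> "$2"' sh "$iface" "$STATE"
    fi
}
```

Running `unsafe_ifup eth0 & unsafe_ifup eth0 & wait` leaves two eth0 entries in $STATE: both callers decided to configure the same interface. Holding one lock across the read, the decision and the update closes that window.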
  
  to be explained
  
  !!!
  case 2)
  (a) ifenslave of eth0                  (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.  3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
+    /sys/class/net/bond0/bonding/slaves;
+    then the NIC comes up.
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/slaves
+                                      because it is already written.
  !!!
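Case 2 is the same pattern applied to the bonding sysfs file: each run takes the NIC down before finding out whether it is already enslaved. A sketch of a serialized, idempotent enslave step, assuming the kernel's `echo +SLAVE > /sys/class/net/BOND/bonding/slaves` interface; enslave_once, LOCKDIR and the SYSFS_NET override are hypothetical, and ifenslave's link-down step is left out:

```sh
#!/bin/sh
# Sketch: serialized, idempotent enslave step for case 2. enslave_once,
# LOCKDIR and the SYSFS_NET override are hypothetical names.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
LOCKDIR=${LOCKDIR:-/run/lock}

enslave_once() {
    bond=$1
    slave=$2
    (
        # One lock per bond: concurrent runs for the same bond take turns
        # instead of interleaving steps 3-5.
        flock 9
        # Re-check under the lock; another run may already have done it.
        case " $(cat "$SYSFS_NET/$bond/bonding/slaves") " in
            *" $slave "*)
                echo "$slave already enslaved to $bond"
                exit 0
                ;;
        esac
        # ifenslave would set the link down here; omitted in this sketch.
        echo "+$slave" > "$SYSFS_NET/$bond/bonding/slaves" || exit 1
        echo "enslaved $slave to $bond"
    ) 9>"$LOCKDIR/ifenslave-$bond.lock"
}
```

With this ordering the second run sees eth0 already listed and leaves it alone instead of taking it down.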
  
  #####################################################################
  
  #### My setup:
  
  root@provisioned:~# cat /etc/modprobe.d/bonding.conf
  alias bond0 bonding
  options bonding mode=1 arp_interval=2000
  
  Both /etc/init.d/networking and the upstart network-interface job are
  enabled.
  
  #### Beginning:
  
  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces
  
  auto lo
  iface lo inet loopback
  
  auto eth0
  iface eth0 inet dhcp
  
  I'm able to boot with both scripts (networking and network-interface
+ enabled) with no problem. I can also boot with only "networking"
  script enabled:
  
  ---
  root@provisioned:~# initctl list | grep network
  network-interface stop/waiting
  networking start/running
  ---
  
  OR only the script "network-interface" enabled:
  
  ---
  root@provisioned:~# initctl list | grep network
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running
  ---
  
  #### Enabling bonding:
  
  Following ifenslave configuration example (/usr/share/doc/ifenslave/
+ examples/two_hotplug_ethernet), my /etc/network/interfaces has to
  look like this:
  
  ---
  auto eth1
  iface eth1 inet manual
+     bond-master bond0
  
  auto eth2
  iface eth2 inet manual
+     bond-master bond0
  
  auto bond0
  iface bond0 inet static
+     bond-mode 1
+     bond-miimon 100
+     bond-primary eth1 eth2
+     address 192.168.169.1
+     netmask 255.255.255.0
+     broadcast 192.168.169.255
  ---
  
  Having both scripts running does not make any difference here: we are
  missing the "bond-slaves" keyword that ifenslave needs to handle the
  slave interfaces, and the slaves are set to "manual".
  
  Ifenslave code:
  
  """
  for slave in $BOND_SLAVES ; do
  ...
  # Ensure $slave is down.
  ip link set "$slave" down 2>/dev/null
  if ! sysfs_add slaves "$slave" 2>/dev/null ; then
+     echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER
+         ready and a bonding interface ?" >&2
  else
+     # Bring up slave if it is the target of an allow-bondX stanza.
+     # This is usefull to bring up slaves that need extra setup.
+     if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\"
+             --list | grep -q $slave; then
+         ifup $v --allow "$BOND_MASTER" "$slave"
+     fi
  """
  
  Without the keyword "bond-slaves" on the master interface declaration,
+ ifenslave will NOT bring any slave interface up on the "master"
+ interface ifup invocation.
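To make the two conditions explicit, here is the per-slave decision of the quoted loop written as a pure function. This is a sketch: plan_slaves is a hypothetical name, its second argument stands in for the output of `ifquery --allow bondX --list`, and both the "ifquery not installed" branch and sysfs_add failures are ignored.

```sh
#!/bin/sh
# Sketch of the decision made by the quoted ifenslave loop.
# plan_slaves is hypothetical; it prints what ifenslave would do.
plan_slaves() {
    slaves=$1   # value of bond-slaves on the master (may be empty)
    allowed=$2  # interfaces that carry an allow-bondX stanza
    for slave in $slaves; do
        # ifenslave sets the link down and writes it to bonding/slaves.
        echo "enslave $slave"
        # It only calls ifup for slaves named in an allow-bondX stanza.
        case " $allowed " in
            *" $slave "*) echo "ifup $slave" ;;
        esac
    done
}
```

With an empty bond-slaves nothing is printed at all, and with "bond-slaves eth1 eth2" but no allow-bond0 stanza only the enslave lines appear, so nothing in ifenslave itself brings the slaves up.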
  
  *********** Part 1
  
  So, with the sysv networking init script AND the upstart
  network-interface job running together, the following example works:
  
  ---
  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces
  
  auto lo
  iface lo inet loopback
  
  auto eth0
  iface eth0 inet dhcp
  
  auto eth1
  iface eth1 inet manual
+     bond-master bond0
  
  auto eth2
  iface eth2 inet manual
+     bond-master bond0
  
  auto bond0
  iface bond0 inet static
+     bond-mode 1
+     bond-miimon 100
+     bond-primary eth1
+     bond-slaves eth1 eth2
+     address 192.168.169.1
+     netmask 255.255.255.0
+     broadcast 192.168.169.255
+ ---
+ 
+ The ifenslave script sets the link down on all slave interfaces declared
+ by the "bond-slaves" keyword and assigns them to the correct bond. It
+ ONLY makes a reentrant call to ifupdown if the slave interfaces have an
+ "allow-bondX" stanza (not our case).
  
  So this should not work: when the master bonding interface (bond0) is
  brought up, ifenslave does not bring up slaves that lack an
  "allow-bondX" stanza. What is happening? Why does it work?
  
  If we disable the upstart "network-interface" job, our bonding stops
+ working at boot. This is because upstart, and not the sysv networking
  script, was the one bringing the slave interfaces up (with the
+ configuration above).
+ 
+ Clearly, ifenslave invoked from the sysv script can set the slave
+ interface down at any time (even while the upstart job is running),
  so it might work and it might not:
  
  """
  ip link set "$slave" down 2>/dev/null
  """
  
  root@provisioned:~# initctl list | grep network-interface
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (bond0) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running
  
+ Since the interface must be down to be enslaved, running both
+ scripts together (upstart and sysv) can create a situation where
  upstart brings the slave interface up, but ifenslave from the sysv
  script takes it down and never brings it up again (because the slave
+ does not have an "allow-bondX" stanza).
  
  *********** Part 2
  
  What if I disable the upstart "network-interface" job, keep only the
+ sysv script, and add the "allow-bondX" stanza to the slave interfaces?
  
  This is where the fun begins. Without upstart, ifupdown calls
  ifenslave for the bond0 interface, and ifenslave runs these lines:
  
  """
  for slave in $BOND_SLAVES ; do
  ...
+     if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\"
+             --list | grep -q $slave; then
+         ifup $v --allow "$BOND_MASTER" "$slave"
+     fi
  """
  
  But ifenslave then waits forever for the bond0 interface to come
  online. We now have a chicken-and-egg situation:
  
+ * ifupdown tries to bring the bond0 interface online.
  * we are not running the upstart network-interface job.
  * ifupdown for bond0 calls ifenslave.
  * ifenslave looks for interfaces with an "allow-bondX" stanza.
  * ifenslave tries to ifup the slave interfaces with that stanza.
  * the slave interfaces wait forever for the master.
  * the master is waiting for the slave interfaces.
  * the slave interfaces are waiting for the master interface.
  ... :D
  
  And ifenslave ends up in an infinite loop:
  
+ """
  # Wait for the master to be ready
+ [ ! -f /run/network/ifenslave.$BOND_MASTER ] &&
+     echo "Waiting for bond master $BOND_MASTER to be ready"
  while :; do
+     if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
+         break
+     fi
+     sleep 0.1
  done
  """
  
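For comparison, a hypothetical bounded variant of that wait loop (this is not what ifenslave does): it polls the marker file every 0.1 s like the original but gives up after a number of tries, so the chicken-and-egg case above would at least show up as an error instead of hanging the boot. wait_for_marker and its default of 300 tries are illustrative choices.

```sh
#!/bin/sh
# Hypothetical bounded version of the quoted wait loop: poll for a marker
# file every 0.1 s, but give up after $2 tries instead of looping forever.
wait_for_marker() {
    marker=$1
    tries=${2:-300}   # 300 x 0.1 s = about 30 s
    while [ "$tries" -gt 0 ]; do
        [ -f "$marker" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    echo "gave up waiting for $marker" >&2
    return 1
}
```

Used as e.g. `wait_for_marker /run/network/ifenslave.$BOND_MASTER || exit 1`, a master/slave deadlock ends with a logged error after about 30 s.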
  *********** Conclusion
  
  That can happen if the corresponding triggers are set (like the ones I
+ just showed): not having parallel ifupdown executions (sysv and upstart,
  for example) can cause an infinite loop during boot.
  
  Having parallel ifupdown executions can trigger race conditions
  between:
  
  1) ifupdown itself (case 1 in the bug description).
  2) ifupdown and the ifenslave script (case 2 in the bug description).

** Patch added: "wily_ifupdown_0.7.54ubuntu2.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873/+attachment/4501797/+files/wily_ifupdown_0.7.54ubuntu2.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition
