Adding SRU proposal for wily.

** Description changed:

+ [Impact]
+
+  * A lack of proper synchronization in ifupdown causes a race condition
+    resulting in occasional incorrect network interface initialization
+    (e.g. in bonding case - wrong bonding settings, network unavailable
+    because slave<->master interfaces initialization order was wrong).
+
+  * This is very annoying in case of large deployments (e.g. when
+    bringing up 1000 machines it is almost guaranteed that at least a
+    few of them will end up with network down).
+
+  * It has been fixed by introducing hierarchical and per-interface
+    locking mechanism ensuring the right order (along with the correct
+    order in the /e/n/interfaces file) of initialization.
+
+ [Test Case]
+
+  1. Create a VM with bonding configured with at least 2 slave interfaces.
+  2. Reboot.
+  3. If all interfaces are up - go to 2.
+
+ [Regression Potential]
+
+  * This change has been introduced upstream in Debian.
+  * It does not require any config changes to existing installations.
+
+ [Other Info]
+
+ Original bug description:
+
  * please consider my bonding examples are using eth1 and eth2 as slave
    interfaces.

  ifupdown has some race conditions, explained below. ifenslave does not
  behave well with sysv networking and upstart network-interface scripts
  running together.

  !!!!
  case 1)
  (a) ifup eth0                             (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
                                            1-1. Wait for locking ifstate.lock
                                                 file.
  1-2. Read ifstate file to check
       the target NIC.
  1-3. close(=release) ifstate.lock
       file.
  1-4. Judge that the target NIC
       isn't processed.
                                            1-2. Read ifstate file to check
                                                 the target NIC.
                                            1-3. close(=release) ifstate.lock
                                                 file.
                                            1-4. Judge that the target NIC
                                                 isn't processed.
  2. Lock and update ifstate file.
     Release the lock.
                                            2. Lock and update ifstate file.
                                               Release the lock.
  !!!

  to be explained

  !!!
  case 2)
  (a) ifenslave of eth0                     (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.             3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
     /sys/class/net/bond0/bonding/slaves,
     then NIC gets up.
                                            4. Link down the target NIC.
                                            5. Fails to write NIC id to
                                               /sys/class/net/bond0/bonding/slaves:
                                               it is already written.
  !!!

  #####################################################################

  #### My setup:

  root@provisioned:~# cat /etc/modprobe.d/bonding.conf
  alias bond0 bonding
  options bonding mode=1 arp_interval=2000

  Both /etc/init.d/networking and upstart network-interface being
  enabled.

  #### Beginning:

  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces

  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet dhcp

  I'm able to boot with both scripts (networking and network-interface
  enabled) with no problem. I can also boot with only "networking"
  script enabled:

  ---
  root@provisioned:~# initctl list | grep network
  network-interface stop/waiting
  networking start/running
  ---

  OR only the script "network-interface" enabled:

  ---
  root@provisioned:~# initctl list | grep network
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running
  ---

  #### Enabling bonding:

  Following the ifenslave configuration example
  (/usr/share/doc/ifenslave/examples/two_hotplug_ethernet), my
  /etc/network/interfaces has to look like this:

  ---
  auto eth1
  iface eth1 inet manual
      bond-master bond0

  auto eth2
  iface eth2 inet manual
      bond-master bond0

  auto bond0
  iface bond0 inet static
      bond-mode 1
      bond-miimon 100
      bond-primary eth1 eth2
      address 192.168.169.1
      netmask 255.255.255.0
      broadcast 192.168.169.255
  ---

  Having both scripts running does not make any difference since we are
  missing the "bond-slaves" keyword on slave interfaces, for ifenslave
  to work, and they are set to "manual".

  Ifenslave code:

  """
  for slave in $BOND_SLAVES ; do
  ...
  # Ensure $slave is down.
  ip link set "$slave" down 2>/dev/null
  if ! sysfs_add slaves "$slave" 2>/dev/null ; then
      echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
  else
      # Bring up slave if it is the target of an allow-bondX stanza.
      # This is usefull to bring up slaves that need extra setup.
      if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
          ifup $v --allow "$BOND_MASTER" "$slave"
      fi
  """

  Without the keyword "bond-slaves" on the master interface declaration,
  ifenslave will NOT bring any slave interface up on the "master"
  interface ifup invocation.

  *********** Part 1

  So, having networking sysv init script AND upstart network-interface
  script running together... the following example works:

  ---
  root@provisioned:~# cat /etc/network/interfaces
  # /etc/network/interfaces

  auto lo
  iface lo inet loopback

  auto eth0
  iface eth0 inet dhcp

  auto eth1
  iface eth1 inet manual
      bond-master bond0

  auto eth2
  iface eth2 inet manual
      bond-master bond0

  auto bond0
  iface bond0 inet static
      bond-mode 1
      bond-miimon 100
      bond-primary eth1
      bond-slaves eth1 eth2
      address 192.168.169.1
      netmask 255.255.255.0
      broadcast 192.168.169.255
  ---

  Ifenslave script sets link down to all slave interfaces, declared by
  "bond-slaves" keyword, and assigns them to correct bonding. Ifenslave
  script ONLY tries to make a reentrant call to ifupdown if the slave
  interfaces have "allow-bondX" stanza (not our case).

  So this should not work, since when the master bonding interface
  (bond0) is called, ifenslave does not configure slaves without
  "allow-bondX" stanza. What is happening, why is it working ?

  If we disable upstart "network-interface" script.. our bonding stops
  working at boot. This is because upstart was the one setting
  the slave interfaces up (with the configuration above) and not
  sysv networking scripts.

  It is clear that ifenslave from sysv script invocation can set the
  slave interface down anytime (even during upstart script execution)
  so it might work and might not:

  """
  ip link set "$slave" down 2>/dev/null
  """

  root@provisioned:~# initctl list | grep network-interface
  network-interface (eth2) start/running
  network-interface (lo) start/running
  network-interface (bond0) start/running
  network-interface (eth0) start/running
  network-interface (eth1) start/running

  Since having the interface down is a requirement to slave it,
  running both scripts together (upstart and sysv) could create a
  situation where upstart puts slave interface online but ifenslave
  from sysv script puts it down and never brings it up again (because
  it does not have "allow-bondX" stanza).

  *********** Part 2

  What if I disable upstart "network-interface", stay only with the sysv
  script but introduce the "allow-bondX" stanza to slave interfaces ?

  The funny part begins... without upstart, the ifupdown tool calls
  ifenslave, for bond0 interface, and ifenslave calls this line:

  """
  for slave in $BOND_SLAVES ; do
  ...
      if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
          ifup $v --allow "$BOND_MASTER" "$slave"
      fi
  """

  But ifenslave stays waiting for the bond0 interface to be online
  forever. We do have a chicken-and-egg situation now:

  * ifupdown tries to put bond0 interface online.
  * we are not running upstart network-interface script.
  * ifupdown for bond0 calls ifenslave.
  * ifenslave tries to find interfaces with "allow-bondX" stanza
  * ifenslave tries to ifup slave interfaces with that stanza
  * slave interfaces keep forever waiting for the master
  * master is waiting for the slave interface
  * slave interface is waiting for the master interface
  ... :D

  And we have an infinite loop for ifenslave:

  """
  # Wait for the master to be ready
  [ ! -f /run/network/ifenslave.$BOND_MASTER ] &&
      echo "Waiting for bond master $BOND_MASTER to be ready"
  while :; do
      if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
          break
      fi
      sleep 0.1
  done
  """

  *********** Conclusion

  That can be achieved if correct triggers are set (like the ones I just
  showed). Not having ifupdown parallel executions (sysv and upstart,
  for example) can cause an infinite loop during boot.

  Having parallel ifupdown executions can trigger race conditions
  between:

  1) ifupdown itself (case a on the bug description).
  2) ifupdown and ifenslave script (case b on the bug description).

** Patch added: "wily_ifupdown_0.7.54ubuntu2.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873/+attachment/4501797/+files/wily_ifupdown_0.7.54ubuntu2.debdiff

-- 
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition
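
For illustration only: case 1 in the description is a check-then-act race, because each ifup releases the lock between reading ifstate and updating it. A minimal shell sketch of keeping the check and the update under a single lock hold could look like the following. This is not the code from the attached debdiff. The names claim_iface, STATE_DIR, STATE_FILE and LOCK_DIR, the temporary state file and the mkdir-based lock are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; not the ifupdown patch.
# STATE_DIR, STATE_FILE, LOCK_DIR and claim_iface are hypothetical names.

STATE_DIR=$(mktemp -d)
STATE_FILE="$STATE_DIR/ifstate"
LOCK_DIR="$STATE_DIR/ifstate.lock.d"
: > "$STATE_FILE"

# claim_iface IFACE
# Returns 0 if this caller recorded IFACE in the state file,
# 1 if IFACE was already recorded by someone else.
claim_iface() {
    iface=$1
    # mkdir is atomic: exactly one caller creates the directory,
    # everybody else retries until it is removed again.
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        sleep 0.1
    done
    if grep -qxF "$iface" "$STATE_FILE"; then
        rc=1                            # already processed
    else
        echo "$iface" >> "$STATE_FILE"  # record it while still locked
        rc=0
    fi
    rmdir "$LOCK_DIR"                   # release only after check AND update
    return "$rc"
}

# Two concurrent callers race for eth0; only one of them may configure it.
(claim_iface eth0 && echo "caller A configures eth0") &
(claim_iface eth0 && echo "caller B configures eth0") &
wait
```

Whichever caller loses sees eth0 already recorded and backs off, which is the step that steps 1-3 and 2 of case 1 leave unprotected.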

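
The wait loop quoted in Part 2 polls forever if /run/network/ifenslave.$BOND_MASTER never appears. Purely as a sketch of the alternative, a bounded poll might look like this. wait_for_file and its tries argument are hypothetical and nothing here is taken from the ifenslave package; the fractional sleep mirrors the quoted script and assumes a sleep that accepts 0.1.

```shell
#!/bin/sh
# Illustrative sketch only; wait_for_file is a hypothetical helper.

# wait_for_file PATH TRIES
# Polls for PATH up to TRIES times, 0.1s apart.
# Returns 0 as soon as PATH exists, 1 if it never showed up.
wait_for_file() {
    path=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if [ -f "$path" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1
}

# Demo: a temporary marker stands in for the master's flag file.
tmpdir=$(mktemp -d)
marker="$tmpdir/ifenslave.bond0"
(sleep 0.3; : > "$marker") &
if wait_for_file "$marker" 50; then
    echo "master ready"
else
    echo "gave up waiting for master" >&2
fi
wait
rm -rf "$tmpdir"
```

A timeout only turns the hang into a visible failure; it does not fix the ordering problem between master and slave described above.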