On Tue, May 9, 2017 at 10:32 AM, Dimitri John Ledkov <[email protected] > wrote:
> Maybe this is easier to read. > > root@xnox-iad-nr5:~# journalctl -o short-monotonic -u [email protected] > -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09 > 15:22:34 UTC. -- > [ 6.740201] xnox-iad-nr5 systemd[1]: Started ifup for bond0. > [ 6.750333] xnox-iad-nr5 sh[1184]: Waiting for a slave to join bond0 > (will timeout after 60s) > [ 6.987241] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for > interface bond0 in pre-up phase > [ 6.987341] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for > parent interface bond0 in pre-up phase > [ 6.987425] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for > parent interface bond0 in pre-up phase > root@xnox-iad-nr5:~# journalctl -o short-monotonic -u > [email protected] > -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09 > 15:22:34 UTC. -- > [ 6.755723] xnox-iad-nr5 systemd[1]: Started ifup for bond0.101. > [ 6.757056] xnox-iad-nr5 sh[1286]: ifup: waiting for lock on > /run/network/ifstate.bond0 > [ 7.227572] xnox-iad-nr5 sh[1286]: Set name-type for VLAN subsystem. > Should be visible in /proc/net/vlan/config > root@xnox-iad-nr5:~# journalctl -o short-monotonic -u > [email protected] > -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09 > 15:22:34 UTC. -- > [ 6.760568] xnox-iad-nr5 systemd[1]: Started ifup for bond0.401. > [ 6.761920] xnox-iad-nr5 sh[1290]: ifup: waiting for lock on > /run/network/ifstate.bond0 > [ 7.197983] xnox-iad-nr5 sh[1290]: Set name-type for VLAN subsystem. > Should be visible in /proc/net/vlan/config > root@xnox-iad-nr5:~# journalctl -o short-monotonic -u networking.service > -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09 > 15:22:34 UTC. -- > [ 6.645323] xnox-iad-nr5 systemd[1]: Starting Raise network > interfaces... > [ 6.692530] xnox-iad-nr5 ifup[992]: Waiting for bonding kernel module > to be ready (will timeout after 5s) > [ 6.693104] xnox-iad-nr5 ifup[992]: Waiting for bond master bond0 to be > ready > [ 7.221867] xnox-iad-nr5 ifup[992]: /sbin/ifup: waiting for lock on > /run/network/ifstate.bond0 > [ 7.263179] xnox-iad-nr5 systemd[1]: Started Raise network interfaces. > > What is the status/timestamp of the physical devices (enspf90 and 91) ? Note that networking, [email protected], and [email protected] all hit locks that are held by bond0. > Note that [email protected] beats networking.service to acquire the lock. > hrm, I didn't think we got ifup events for non-physical devices; this is certainly problematic; the design, IIUC, is for physical devices to get [email protected] triggers via udev events; see the the BindsTo=sys-subsystem-net-devices-%i.device The result in this boot is that networking.service (which starts at 6.64 seconds calls ifup -a (pid 992) which is racing with the ifup@bond0 (started at 6.74, pid 1184); so we have *two* ifups locking bond0; I think this is the real deadlock. I'm not sure why we're getting an ifup@bond0 unit execution, the udev scripts are only triggered for physical devices; Looks like when the module is loaded (modprobe bonding) that we get an event for bond0 ... which may be triggering the hotplug udev-rule which queues a ifup@bond0 job... This is strange (to me): # modprobe -vr bonding rmmod bonding root@x1:~# ifconfig -a ens3 Link encap:Ethernet HWaddr 52:54:00:87:54:74 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:2 errors:0 dropped:0 overruns:0 frame:0 TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:104 (104.0 B) TX bytes:184 (184.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:320 errors:0 dropped:0 overruns:0 frame:0 TX packets:320 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1 RX bytes:23680 (23.6 KB) TX bytes:23680 (23.6 KB) root@x1:~# cat /etc/issue Ubuntu 16.04.2 LTS \n \l root@x1:~# lsb_release -rd Description: Ubuntu 16.04.2 LTS Release: 16.04 root@x1:~# ls -al /sys/class/net/ total 0 drwxr-xr-x 2 root root 0 May 9 16:36 . drwxr-xr-x 63 root root 0 May 9 16:35 .. lrwxrwxrwx 1 root root 0 May 9 16:35 ens3 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/ens3 lrwxrwxrwx 1 root root 0 May 9 16:35 lo -> ../../devices/virtual/net/lo root@x1:~# modprobe -v bonding insmod /lib/modules/4.4.0-75-generic/kernel/drivers/net/bonding/bonding.ko root@x1:~# ls -al /sys/class/net/ total 0 drwxr-xr-x 2 root root 0 May 9 16:36 . drwxr-xr-x 63 root root 0 May 9 16:36 .. lrwxrwxrwx 1 root root 0 May 9 16:36 bond0 -> ../../devices/virtual/net/bond0 -rw-r--r-- 1 root root 4096 May 9 16:36 bonding_masters lrwxrwxrwx 1 root root 0 May 9 16:36 ens3 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/ens3 lrwxrwxrwx 1 root root 0 May 9 16:36 lo -> ../../devices/virtual/net/lo root@x1:~# uname -r 4.4.0-75-generic Why would bond0 get created when loading the bonding module? @xnox, if you change your interfaces from bond0, to bond1 globally, does your configuration work ? > -- > You received this bug notification because you are subscribed to the bug > report. > https://bugs.launchpad.net/bugs/1636708 > > Title: > ifup -a does not start dependants last, causes deadlocks with > vlans/bonding > > To manage notifications about this bug go to: > https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/ > 1636708/+subscriptions > -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1636708 Title: ifup -a does not start dependants last, causes deadlocks with vlans/bonding To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1636708/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
