My analysis so far of the problem:

1. container A has an outstanding TCP connection (and thus a socket
and dst entry which hold a reference on the container's "lo"
interface). When the container is stopped, the TCP connection takes ~2
minutes to time out (with default settings).
2. when container A is being removed, its net namespace is torn down by
cleanup_net() in net/core/net_namespace.c. This takes two locks - first
the net_mutex, then the rtnl mutex (via rtnl_lock()). It then cleans up
the net ns and calls rtnl_unlock(). However, rtnl_unlock() internally
waits for all of the namespace's interfaces to be freed, which requires
all of their references to drop to 0 - and that cannot happen until the
TCP connection from #1 times out and releases its reference. So at this
point the container teardown is hung inside rtnl_unlock(), still
holding the net_mutex; it does not release the net_mutex until its lo
interface is finally destroyed after the TCP connection times out.

3. When a new container is started, part of its startup is a call to
copy_net_ns() in net/core/net_namespace.c to copy the caller's net
namespace into the new container's net namespace. However, this
function also takes the net_mutex. Since the previous container is
still holding the net_mutex as explained in #2 above, creation of the
new container blocks until #2 releases the mutex (a simplified sketch
of this lock ordering follows).
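
To make the ordering concrete, here is a heavily condensed sketch of
the two paths involved. This is not the upstream source - the function
and lock names (cleanup_net, copy_net_ns, net_mutex,
rtnl_lock/rtnl_unlock, netdev_run_todo, netdev_wait_allrefs) are real,
but the bodies are reduced to just the locking shape described above:

    /* sketch only - not the actual kernel code */
    static void cleanup_net(struct work_struct *work)
    {
            mutex_lock(&net_mutex);         /* step 2: taken first */

            rtnl_lock();
            /* ... unregister the namespace's devices, including lo ... */
            rtnl_unlock();                  /* runs netdev_run_todo(), which
                                             * calls netdev_wait_allrefs() and
                                             * blocks - printing
                                             * "unregister_netdevice: waiting
                                             * for lo to become free" - until
                                             * the TCP connection from step 1
                                             * drops its reference */

            mutex_unlock(&net_mutex);       /* only reached ~2 minutes later */
    }

    struct net *copy_net_ns(/* args condensed */)
    {
            mutex_lock(&net_mutex);         /* step 3: a new container blocks
                                             * here until cleanup_net() above
                                             * finally drops net_mutex */
            /* ... set up the new namespace ... */
            mutex_unlock(&net_mutex);
    }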


There are a few ways to possibly address this: 

a) when a container is being removed, all of its TCP connections
should abort themselves. Currently, TCP connections don't directly
register for interface-unregister events - they explicitly want to
stick around, so that if an interface is taken down and then brought
back up, the TCP connection remains and the communication riding on
top of it isn't interrupted. The way TCP achieves this is to move all
of its dst references off the interface that is unregistering and onto
the loopback interface (see the sketch below). This works for the
initial network namespace, where the loopback interface is always
present and never removed. However, it doesn't work for non-default
network namespaces - like containers - where the loopback interface
itself is unregistered when the container is being removed. So this
aspect of TCP may need to change to correctly handle containers. It
also may not cover all causes of this hang, since sockets carry more
than just TCP connections.
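
My recollection is that the relevant logic is dst_ifdown() in
net/core/dst.c; simplified, and treat the details as approximate:

    /* On unregister, a dst still pointing at the dying device gets
     * re-pointed at that namespace's loopback device and takes a
     * reference on it.  In a dying container netns, "lo" itself is
     * about to be unregistered, so this only moves the long-lived
     * reference onto lo. */
    if (unregister) {
            dst->dev = dev_net(dst->dev)->loopback_dev;
            dev_hold(dst->dev);     /* new ref pins the netns's lo */
            dev_put(dev);           /* drop the ref on the dying device */
    }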

b) when a container is being removed, instead of holding the net_mutex
across the whole cleanup (including the call to rtnl_unlock),
cleanup_net() could release the net_mutex first (after removing all
netns marked for cleanup from the pernet list), and only then call
rtnl_unlock, roughly as sketched below. This needs examination to make
sure it would not introduce any races or other problems.
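
A rough sketch of that reordering, purely illustrative - the actual
cleanup steps are elided and I'm not claiming it is race-free:

    static void cleanup_net(struct work_struct *work)
    {
            mutex_lock(&net_mutex);
            /* ... remove the dying netns from the pernet list, do the
             * work that genuinely needs net_mutex ... */
            mutex_unlock(&net_mutex);       /* release early, so that
                                             * copy_net_ns() in a new
                                             * container can proceed */

            rtnl_lock();
            /* ... unregister the namespace's devices ... */
            rtnl_unlock();                  /* the long wait for device refs
                                             * still happens, but no longer
                                             * while holding net_mutex */
    }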

c) rtnl_unlock could be simplified - currently it has significant side
effects, including the long wait for all references to the namespace's
interfaces (including loopback) to actually be dropped. Instead of
blocking inside rtnl_unlock() to do all of this cleanup, the cleanup
could be deferred; what rtnl_unlock() does today is sketched below.
This also would need investigation to make sure no caller expects to
be able to free resources that might later be accessed from the
deferred cleanup (which I believe is the case).
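
For context, rtnl_unlock() itself is tiny and the heavy lifting happens
in netdev_run_todo(); roughly, from memory of that era's
net/core/rtnetlink.c and net/core/dev.c:

    void rtnl_unlock(void)
    {
            /* drops the rtnl mutex, then finishes device unregistration */
            netdev_run_todo();
    }

    /* netdev_run_todo() walks the devices queued for unregistration and
     * calls netdev_wait_allrefs() on each, which loops printing
     *
     *   unregister_netdevice: waiting for lo to become free. Usage count = N
     *
     * until the device's refcount reaches zero.  Option (c) would move
     * that wait out of the rtnl_unlock() path into deferred work, so
     * callers of rtnl_unlock() return immediately. */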

As this is a complex problem, there are likely other options to fix it
as well. As far as I can tell, this issue has existed ever since
network namespaces were introduced to the kernel, but it isn't commonly
seen because in most cases socket connections are shut down before the
container is stopped. The socket causing the problem here is a kernel
socket, which is different from a normal userspace-created socket; that
may make a difference, though I haven't investigated that angle yet.

