[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-08-08 Thread Gonzalo Servat
Dan,

We had been running 80-100 instances on the hotfix kernel and I was
calling victory on the workaround as it has been a month without this
issue popping up, but I just had an instance lockup on, say, "docker ps"
and dmesg shows:

[2924480.806202] unregister_netdevice: waiting for vethd078a31 (netns 
81efab00) to become free. Usage count = 2
[2924490.870209] unregister_netdevice: waiting for vethd078a31 (netns 
81efab00) to become free. Usage count = 2
[2924501.006177] unregister_netdevice: waiting for vethd078a31 (netns 
81efab00) to become free. Usage count = 2
[2924511.134204] unregister_netdevice: waiting for vethd078a31 (netns 
81efab00) to become free. Usage count = 2
[2924521.218189] unregister_netdevice: waiting for vethd078a31 (netns 
81efab00) to become free. Usage count = 2

:-(

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-08-08 Thread Gonzalo Servat
Using:

linux-image-4.4.0-127-generic   4.4.0-127.153+hf1711407v20180524b3

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-08-05 Thread TomaszChmielewski
Also seeing it on Ubuntu 18.04 on AWS:

# uname -a
Linux uni09.sys.timedoctor.com 4.15.0-1016-aws #16-Ubuntu SMP Wed Jul 18 
09:20:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-08-01 Thread TomaszChmielewski
I'm hitting this bug frequently on AWS with Ubuntu 16.04.5 LTS:

Linux uni02.sys.timedoctor.com 4.4.0-1063-aws #72-Ubuntu SMP Fri Jul 13
07:23:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-06-11 Thread anuj
I'm using kernel version 4.9.73
4.9.73 #0 SMP PREEMPT Wed May 30 13:39:24 2018 aarch64 GNU/Linux
The problem reproduces for me as soon as I do ftp(vsftpd 3.0.3 server on DUT) 
to the DUT. After that it just hangs and keeps giving the msg every 10 secs 
waiting for lo to become free. Usage count=1.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-06-04 Thread Dan Streetman
> is it possibility that sk->sk_dst_cache is overwritten?

maybe ;-)

> like in __sk_dst_check,
> when tcp timer tries to resend a packet, at the same time, tcp_close is called

tcp_close() locks the sock; tcp_retransmit_timer() is also called with
the sock locked.  So they should not be operating on the same sock
concurrently.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-06-04 Thread Dan Streetman
> So they should not be operating on the same sock concurrently.

But, of course, all this is caused by bugs, so yes it's certainly
possible a bug allowed what you said to happen, but you'll need to dig
in deeper to find where such a bug is (if any), since the locking design
is correct as far as I can tell.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-30 Thread lirongqing
is it possibility that sk->sk_dst_cache is overwritten? like in __sk_dst_check,
when tcp timer tries to resend a packet, at the same time, tcp_close is called, 
and a reset packet will send, and ip_queue_xmit will be called concurrent; 

cpu 1   cpu 2   
tcp_close
   tcp_send_active_reset
  ip_queue_xmit
 __sk_dst_check {
dst = __sk_dst_get(sk);
 tcp timer fire
 tcp_retransmit_timer
   __tcp_retransmit_skb
  ip_queue_xmit
 sk_setup_caps
sk->sk_dst_cache is 
changed, like dst222 

RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
dst222 is leaked
dst_release(dst);

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-29 Thread lirongqing
I have the same issue, and I dump the vmcore, and find dst cache, hope it has 
some help.
this leaked dst is in dst_busy_list, except dst_busy_list, nowhere I can find 
it in
==
First case:

crash> rtable 0x880036fbba00 -x
struct rtable {
  dst = {
callback_head = {
  next = 0x880036fbb300, 
  func = 0x0
}, 
child = 0x0, 
dev = 0x8817f15c3000, 
ops = 0x8191f2c0 , 
_metrics = 0x815ef7d1, 
expires = 0x0, 
path = 0x880036fbba00, 
from = 0x0, 
xfrm = 0x0, 
input = 0x8149f777 , 
{
  output = 0x8149f762 , 
  __UNIQUE_ID_rh_kabi_hide29 = {
output = 0x8149f762 
  }, 
  {}
}, 
flags = 0x0, 
pending_confirm = 0x0, 
error = 0x0, 
obsolete = 0x2, 
header_len = 0x0, 
trailer_len = 0x0, 
tclassid = 0x0, 
__pad_to_align_refcnt = {0xe, 0x80060}, 
__refcnt = {
  counter = 0x1
}, 
__use = 0x0, 
lastuse = 0x1000860b0, 
{
  next = 0x882fdaaa9800, 
  rt_next = 0x882fdaaa9800, 
  rt6_next = 0x882fdaaa9800, 
  dn_next = 0x882fdaaa9800
}, 
rh_reserved1 = 0x2, 
rh_reserved2 = 0x85987845, 
rh_reserved3 = 0x0, 
rh_reserved4 = 0x0
  }, 
  rt_genid = 0x11, 
  rt_flags = 0x8000, 
  rt_type = 0x2, 
  rt_is_input = 0x1, 
  rt_uses_gateway = 0x0, 
  rt_iif = 0x0, 
  rt_gateway = 0x0, 
  rt_pmtu = 0x0, 
  rt_uncached = {
next = 0x880036fbbac0, 
prev = 0x880036fbbac0
  }
}
crash> 
=
seconds case:
crash> rtable  8807ec0d9a00 -x
struct rtable {
  dst = {
callback_head = {
  next = 0x8807ec0db900, 
  func = 0x8119e250 
}, 
child = 0x0, 
dev = 0x8817ed384800, 
ops = 0x81b18bc0 , 
_metrics = 0x817ca681, 
expires = 0x0, 
path = 0x8807ec0d9a00, 
from = 0x0, 
xfrm = 0x0, 
input = 0x8162b200 , 
{
  output = 0x8162b1e0 , 
  __UNIQUE_ID_rh_kabi_hide29 = {
output = 0x8162b1e0 
  }, 
  {}
}, 
flags = 0x6, 
pending_confirm = 0x0, 
error = 0x0, 
obsolete = 0x2, 
header_len = 0x0, 
trailer_len = 0x0, 
tclassid = 0x0, 
__pad_to_align_refcnt = {0x880f6df8fec0, 0x0}, 
__refcnt = {
  counter = 0x1
}, 
__use = 0x0, 
lastuse = 0x1fcce9056, 
{
  next = 0x880404f49400, 
  rt_next = 0x880404f49400, 
  rt6_next = 0x880404f49400, 
  dn_next = 0x880404f49400
}, 
rh_reserved1 = 0x0, 
rh_reserved2 = 0x0, 
rh_reserved3 = 0x0, 
rh_reserved4 = 0x0
  }, 
  rt_genid = 0x16, 
  rt_flags = 0x8000, 
  rt_type = 0x2, 
  rt_is_input = 0x0, 
  rt_uses_gateway = 0x0, 
  rt_iif = 0x0, 
  rt_gateway = 0x0, 
  rt_pmtu = 0x0, 
  rt_uncached = {
next = 0x8807ec0d9ac0, 
prev = 0x8807ec0d9ac0
  }
}
crash>

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-27 Thread Gonzalo Servat
Many thanks, Dan! Going to roll it out to a few test boxes and try it
out over the next few days. Will report back as soon as I have some
feedback.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-25 Thread Dan Streetman
> We're using a newer kernel:
> 4.4.0-109-generic

ok, i backported the workaround to the xenial 4.4 kernel, and built it at the 
same ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

it's version 4.4.0-127.153+hf1711407v20180524b3

can you test with that and let me know if it works around the hangs?
Note that it is still expected to show errors in the logs when the
problem is reproduced; this workaround is strictly to avoid internal
hangs when the error happens, so you should be able to still
create/destroy containers (net namespaces).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Re: [Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-19 Thread Gonzalo Servat
Thanks for the ping, Dan. No worries. Looking forward to it as this issue
seems to be biting us more frequently nowadays so really interested to see
if this kernel helps.

On Sat, May 19, 2018 at 7:19 AM, Dan Streetman 
wrote:

> > Would you be able to build a test kernel against this version?
>
> very sorry, i've been quite busy the last couple weeks.  I'll get a test
> kernel built for you next week.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1711407
>
> Title:
>   unregister_netdevice: waiting for lo to become free
>
> Status in linux package in Ubuntu:
>   In Progress
> Status in linux source package in Trusty:
>   In Progress
> Status in linux source package in Xenial:
>   In Progress
> Status in linux source package in Zesty:
>   Won't Fix
> Status in linux source package in Artful:
>   In Progress
> Status in linux source package in Bionic:
>   In Progress
>
> Bug description:
>   This is a "continuation" of bug 1403152, as that bug has been marked
>   "fix released" and recent reports of failure may (or may not) be a new
>   bug.  Any further reports of the problem should please be reported
>   here instead of that bug.
>
>   --
>
>   [Impact]
>
>   When shutting down and starting containers the container network
>   namespace may experience a dst reference counting leak which results
>   in this message repeated in the logs:
>
>   unregister_netdevice: waiting for lo to become free. Usage count =
>   1
>
>   This can cause issues when trying to create net network namespace and
>   thus block a user from creating new containers.
>
>   [Test Case]
>
>   See comment 16, reproducer provided at https://github.com/fho/docker-
>   samba-loop
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
> 1711407/+subscriptions
>

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-18 Thread Dan Streetman
> Would you be able to build a test kernel against this version?

very sorry, i've been quite busy the last couple weeks.  I'll get a test
kernel built for you next week.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Re: [Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-05-06 Thread Gonzalo Servat
Hi Dan,

We're using a newer kernel:

4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64
x86_64 GNU/Linux

Would you be able to build a test kernel against this version?

Thanks very much, Dan.

On Tue, May 1, 2018 at 12:14 AM, Dan Streetman 
wrote:

> > The symptoms seen are a number of docker commands that just hang
> indefinitely (e.g.
> > docker ps, spinning up new containers). Occasionally, after a long time,
> we have
> > seen it come back and continue working OK (like whatever was holding the
> lock
> > released it finally), but most of the time it hangs indefinitely
>
> ok, great.  You didn't mention what release you're using on gcp, but I
> have a 4.13 test kernel in this ppa, with patches to specifically
> fix/workaround the hanging problem:
> https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407
>
> Note that *only* version 4.13.0-38.43+hf1711407v20180413b1 in that ppa
> has the patches to workaround the hanging.  If you need a different
> version, let me know and I can apply the patches and build for whatever
> release you need (just give me the 'uname -a' output).
>
> The details of the cause of this problem is back in comment 72, if you
> want that info.  I'm preparing to send the workaround patch upstream;
> after that, there will be (likely ongoing) work to debug the various
> dst/object leaks and kernel socket "leaks" that are actually causing the
> symptoms.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1711407
>
> Title:
>   unregister_netdevice: waiting for lo to become free
>
> Status in linux package in Ubuntu:
>   In Progress
> Status in linux source package in Trusty:
>   In Progress
> Status in linux source package in Xenial:
>   In Progress
> Status in linux source package in Zesty:
>   Won't Fix
> Status in linux source package in Artful:
>   In Progress
> Status in linux source package in Bionic:
>   In Progress
>
> Bug description:
>   This is a "continuation" of bug 1403152, as that bug has been marked
>   "fix released" and recent reports of failure may (or may not) be a new
>   bug.  Any further reports of the problem should please be reported
>   here instead of that bug.
>
>   --
>
>   [Impact]
>
>   When shutting down and starting containers the container network
>   namespace may experience a dst reference counting leak which results
>   in this message repeated in the logs:
>
>   unregister_netdevice: waiting for lo to become free. Usage count =
>   1
>
>   This can cause issues when trying to create net network namespace and
>   thus block a user from creating new containers.
>
>   [Test Case]
>
>   See comment 16, reproducer provided at https://github.com/fho/docker-
>   samba-loop
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
> 1711407/+subscriptions
>

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-30 Thread Dan Streetman
> The symptoms seen are a number of docker commands that just hang indefinitely 
> (e.g.
> docker ps, spinning up new containers). Occasionally, after a long time, we 
> have
> seen it come back and continue working OK (like whatever was holding the lock
> released it finally), but most of the time it hangs indefinitely

ok, great.  You didn't mention what release you're using on gcp, but I have a 
4.13 test kernel in this ppa, with patches to specifically fix/workaround the 
hanging problem:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

Note that *only* version 4.13.0-38.43+hf1711407v20180413b1 in that ppa
has the patches to workaround the hanging.  If you need a different
version, let me know and I can apply the patches and build for whatever
release you need (just give me the 'uname -a' output).

The details of the cause of this problem is back in comment 72, if you
want that info.  I'm preparing to send the workaround patch upstream;
after that, there will be (likely ongoing) work to debug the various
dst/object leaks and kernel socket "leaks" that are actually causing the
symptoms.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-30 Thread Gonzalo Servat
Hi Dan,

> There's actually been quite a lot of discussion/work in this bug since 
> comment 2, so I don't 
> actually need that info anymore (except for reproduction steps, that's always 
> welcome).

Fair enough. I wish I had some reproduction steps but unfortunately it
just happens randomly and not because of any one action.

> Are you having any problems with the system *other* than just the log
messages?

Apologies here. I should have mentioned how it affects us. The symptoms
seen are a number of docker commands that just hang indefinitely (e.g.
docker ps, spinning up new containers). Occasionally, after a long time,
we have seen it come back and continue working OK (like whatever was
holding the lock released it finally), but most of the time it hangs
indefinitely. On occasion we can also use the -n parameter (e.g. docker
ps -n #) and it may or may not return a partial listing of running
containers. We've also found (again, not always) that we might be able
to stop a container if we use the container name or ID. When we strace
the process, we just see:

write(5, "GET /v1.24/containers/json?limit"..., 97) = 97
futex(0xc8200fa590, FUTEX_WAKE, 1)  = 1
futex(0x21fce50, FUTEX_WAIT, 0, NULL

... and that's as far as it goes. I do have a Docker stacktrace if you
want to see that.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-30 Thread Dan Streetman
Hi @gservat,

> To answer your questions:

There's actually been quite a lot of discussion/work in this bug since
comment 2, so I don't actually need that info anymore (except for
reproduction steps, that's always welcome).

> It just happens on its own after 1/2 weeks. The following shows in
dmesg:

Are you having any problems with the system *other* than just the log
messages?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-29 Thread Gonzalo Servat
Hi Dan,

We run a bunch of instances both on AWS and GCP, and we run a
significant number of containers on both. We've only ever seen this
problem on GCP and never on AWS (it's baffling!). The kernels are as-
close-as-possible and the rest (Ubuntu version / Docker version / etc)
are identical. To answer your questions:

-what release are you using (trusty/xenial/zesty)?

Xenial

-what kernel version are you using?

4.13.0-1008-gcp

-do you have specific steps to reproduce the problem?

Unfortunately we don't. It just happens on its own after 1/2 weeks. The
following shows in dmesg:

[1015401.681728] unregister_netdevice: waiting for veth251ecfe to become free. 
Usage count = 1
[1015411.761772] unregister_netdevice: waiting for veth251ecfe to become free. 
Usage count = 1
[1015421.841740] unregister_netdevice: waiting for veth251ecfe to become free. 
Usage count = 1
[1015431.953729] unregister_netdevice: waiting for veth251ecfe to become free. 
Usage count = 1

etc etc

We've tried a few things including upgrading kernel and disabling IPv6
to no avail.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-23 Thread Dan Streetman
> the CPU usage is zero, there is nothing popping up. The tasks causing the 
> high load
> value are in D state

ok that seems expected then.  This doesn't 'fix' the problem of the
leaked dst (and namespace), just prevents blocking further net namespace
creation/deletion.

> if I can't run out of some resources (kworker threads?)

there is a limit, yes, and once it's hit namespace cleanup will
essentially stop, and all further net namespaces will be leaked.
However, there's really just no way to avoid that without fixing the
actual dst leaks, which I'd still like to try to add debug to help
identify.

I've been busy elsewhere last week but I'll prepare the patch to send
upstream this week, and then look at adding debug to actually find the
dst leak.

Thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-18 Thread Jiri Horky
Hi Dan,

the CPU usage is zero, there is nothing popping up. The tasks causing
the high load value are in D state. If I do "echo w > /proc/sysrq-
trigger", I will get for every leaked NS:

[392149.562095] kworker/u81:40  D0  4290  2 0x8000
[392149.562112] Workqueue: netns cleanup_net
[392149.562125] Call Trace:
[392149.562135]  __schedule+0x291/0x8a0
[392149.562146]  schedule+0x2c/0x80
[392149.562157]  schedule_timeout+0x182/0x360
[392149.562167]  ? call_timer_fn+0x130/0x130
[392149.562177]  msleep+0x2d/0x40
[392149.562188]  ? msleep+0x2d/0x40
[392149.562199]  netdev_run_todo+0x15e/0x5d0
[392149.562210]  rtnl_unlock+0xe/0x10
[392149.562221]  default_device_exit_batch+0x156/0x180
[392149.562233]  ? do_wait_intr_irq+0x90/0x90
[392149.562245]  cleanup_net+0x14e/0x2a0
[392149.562257]  process_one_work+0x1ec/0x410
[392149.562270]  worker_thread+0x32/0x410
[392149.562281]  kthread+0x128/0x140
[392149.562292]  ? process_one_work+0x410/0x410
[392149.562300]  ? kthread_create_on_node+0x70/0x70
[392149.562305]  ? do_syscall_64+0x67/0x130
[392149.562310]  ? SyS_exit_group+0x14/0x20
[392149.562316]  ret_from_fork+0x35/0x40


I am not sure how fine is to have e.g. thousands of such leaked namespaces - if 
I can't run out of some resources (kworker threads?). I am going to keep the 
machine running + generate even higher load there to see if I hit any such 
limit.

Thanks
JH

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-16 Thread Dan Streetman
> The machine continue to run and the docker is still able to create new 
> instances,
> i.e. there is no complete lock out.

Great!

> The machine continued to run further, and after an ~hour , 3 namespaces 
> leaked.
> The machine now has load "3", which seems to be an artifact of this. Other 
> than that,
> it seems to continue to run!

Hmm, can you check 'top' or 'htop' or similar to see if the kworker
thread(s) are actually using cpu time?  They should be sleeping almost
all the time, not spinning on anything.

I'll work on getting that patch upstream (but likely not nearly as
verbose, probably just a single WARNING for leaked namespace).

After that, I'll keep working on adding dst leak debugging to actually
try to find where your dst leak is coming from.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-14 Thread Jiri Horky
Hi Dan,

so here are the information for the latest kernel
(4.13.0-38.43+hf1711407v20180413b1):


The bug got reproduced:

[ 1085.626663] unregister_netdevice: possible namespace leak (netns 
8d2dc3d34800), have been waiting for 60 seconds
[ 1085.626696]  (netns 8d2dc3d34800): dst 8d2dd006db00 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0x9c0

If I grep the netns: 8d2dc3d34800, it is visible that during first
10 minutes, it outputs more info, then until ~ 1 hour, it says the same
message more frequently and then only outputs it ever hour.

[  901.854919] net_namespace: alloced net 8d2dc3d34800
[ 1085.626663] unregister_netdevice: possible namespace leak (netns 
8d2dc3d34800), have been waiting for 60 seconds
[ 1085.626696]  (netns 8d2dc3d34800): dst 8d2dd006db00 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0x9c0
[ 1095.706659] unregister_netdevice: possible namespace leak (netns 
8d2dc3d34800), have been waiting for 70 seconds
[ 1095.706692]  (netns 8d2dc3d34800): dst 8d2dd006db00 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0x9c0
[ 1105.786651] unregister_netdevice: possible namespace leak (netns 
8d2dc3d34800), have been waiting for 80 seconds
[ 1105.786669]  (netns 8d2dc3d34800): dst 8d2dd006db00 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0x9c0
[ 1115.866654] unregister_netdevice: possible namespace leak (netns 
8d2dc3d34800), have been waiting for 90 seconds

[ 1631.003906] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 605 seconds
[ 1641.147950] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 616 seconds
[ 1651.291981] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 626 seconds
[ 1661.372025] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 636 seconds
[ 1671.452071] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 646 seconds
[ 1681.564110] unregister_netdevice: likely namespace leak (netns 
8d2dc3d34800), have been waiting for 656 seconds
...

[ 4628.301622] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 3603 seconds
[ 8252.383668] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 7227 seconds
[11877.333159] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 10852 seconds
[15502.281254] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 14477 seconds
[19127.230600] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 18102 seconds
[22752.179625] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 21727 seconds
[26377.129256] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 25352 seconds
[30002.078342] unregister_netdevice: namespace leak (netns 8d2dc3d34800), 
have been waiting for 28977 seconds

The machine continue to run and the docker is still able to create new
instances, i.e. there is no complete lock out.

The machine continued to run further, and after an ~hour , 3 namespaces
leaked. The machine now has load "3", which seems to be an artifact of
this. Other than that, it seems to continue to run!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-13 Thread Dan Streetman
> I have just reproduced it using the previous #43+hf1711407v20180412b2"
kernel:

Excellent.  And just to clarify again, I do *not* expect the kernel to
fix/avoid the problem - I fully expect it to happen - the change in
these kernels is strictly to make sure that the system is still usable
after the problem happens.  Can you verify that after reproducing?  I
assume that before, when you reproduced the problem, you were no longer
able to create new (or destroy old) containers (i.e., new net
namespaces) - that should now be fixed, and you should be able to keep
creating new containers with no issues.

> The seconds seems to be rather milliseconds.

yep sorry, kernel building now should fix that to print seconds instead
of ms.

> Waiting here for the newest kernel and will try it right away.

thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-13 Thread Jiri Horky
Hi there, great to hear the progress!

I have just reproduced it using the previous #43+hf1711407v20180412b2"
kernel:

Apr 13 19:12:14 prg41-004 kernel: [13621.398319] unregister_netdevice: likely 
namespace leak (netns 8c7a056f), have been waiting for 1973452 seconds
Apr 13 19:12:24 prg41-004 kernel: [13631.478266] unregister_netdevice: likely 
namespace leak (netns 8c7a056f), have been waiting for 1983532 seconds

The seconds seems to be rather milliseconds.

Waiting here for the newest kernel and will try it right away.

Jiri Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-13 Thread Dan Streetman
Jiri,

I just added a new artful version to the ppa, version
4.13.0-38.43+hf1711407v20180413b1; it's building right now.  It should
avoid the hang just like the last version from yesterday; but the last
one blocked freeing any future net namespaces after one hangs; this one
fixes that so even after a net namespace hangs, future ones will
correctly clean up and get freed.

I also added debug to this new build that will show each net namespace
alloc and free, like:

[  207.625857] net_namespace: alloced net 8d2eb3cae000
...
[  208.152882] net_namespace: freeing net 8d2eb3cae000

and it has the same dbg during reproduction of this bug, where it hangs,
like:

[ 1363.377355] unregister_netdevice: possible namespace leak (netns 
96ddf1c1), have been waiting for 131032 seconds
[ 1363.377393]  (netns 96ddf1c1): open sk 96ddf1e77380 family 2 
state 4 dst_cache   (null) rx_dst 96ddf532d800 creator 
generic_ip_connect+0x257/0x4f0 [cifs]
[ 1363.377406]  (netns 96ddf1c1): dst 96ddf532d800 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_input_rcu+0x6b3/0xd80


If you're able to test with today's build, let me know if you are able to 
reproduce the problem (see the 'namespace leak' message) and if it has worked 
around the container hang correctly.  Thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-13 Thread Dan Streetman
Just to re-summarize this:

The problem here is when a net namespace is being cleaned up, the thread
cleaning it up gets hung inside net/core/dev.c netdev_wait_allrefs()
function.  This is called every time anything calls rtnl_unlock(), which
happens a lot, including during net namespace cleanup.  This function
waits for all netdev device references to be released.

The thing(s) holding dev references here are dst objects from sockets
(at least, that's what is failing to release the dev refs, in my
debugging).  When a net namespace is cleaned up, all its sockets' call
dst_dev_put() for their dsts, which moves the netdev refs from the
actual net namespace interfaces to the netns loopback device (this is
why the message is typically 'waiting for lo to become free', but
sometimes it waits for non-lo to become free if the dst_dev_put hasn't
been called for a dst).

The dsts are then put, which should reduce their ref count to 0 and free
them, which then also reduces the lo device's ref count.  In correct
operation, this reduces the lo device's ref count to 0, which allows
netdev_wait_allrefs() to complete.

This is where the problem arrives:


1. the first possibility a kernel socket that hasn't closed, and still has 
dst(s).  This is possible because while normal (user) sockets take a reference 
to their net namespace, and the net namespace does not begin cleanup until all 
its references are released (meaning, no user sockets are still open), kernel 
sockets do not take a reference to their net namespace.  So, net namespace 
cleanup begins while kernel sockets are still open.  While the net namespace is 
cleaning up, it ifdown's all its interfaces, which cause the kernel sockets to 
close, and free their dsts which releases all netdev interface refs, which 
allows the net namespace to close.  If, however, one of the kernel sockets 
doesn't pay attention to the net namespace cleaning up, it may hang around, 
failing to free its dsts, and failing to release the netdev interface 
reference(s).  That causes this bug.  In some cases (maybe most or all, I'm not 
sure) the socket eventually times out, and does release its dsts, allowing the 
net namespace to finish cleaning up and exiting.

My recent patch 4ee806d51176ba7b8ff1efd81f271d7252e03a1d fixes *one*
case of this happening - a TCP kernel socket remaining open, trying in
vain to complete its TCP FIN sequence (which will never complete since
all net namespace interfaces are down and will never come back up, since
the net namespace is cleaning up).  The TCP socket eventually would time
out the FIN sequence failure, and close, allowing the net namespace to
finish cleaning up, but it takes ~2 minutes or so.  There very well
could be more cases of this happening with other kernel sockets.


2. alternately, even if all kernel sockets behave correctly and cleanup 
immediately during net namespace cleanup, it's possible for a dst object to 
leak a reference.  This has happened before, there are likely existing dst 
leaks, and there will likely be dst leaks accidentally introduced in the 
future.  When a dst leaks, its refcount never reaches 0, and so it never 
releases its reference on its interface (which remember, is moved to the net 
namespace loopback interface when its socket is closing).  Unfortunately, a dst 
that has leaked a reference will never free itself, and will never release its 
reference to the loopback interface.  Thus, netdev_wait_allrefs() will hang 
forever.


Actually fixing either cause above is the correct thing to do, of course, but 
that's a long-term thing, as even if all causes currently in the kernel are 
fixed, more leaks probably will be accidentally introduced in the future.  And 
while mem leaks are not good, the more serious problem caused by this is that 
the netdev_wait_allrefs() hang happens while the thread is holding the global 
net mutex - that prevents the creation of any new net namespaces, and prevents 
the cleanup of any existing net namespaces (it also blocks 
registering/unregistering any pernet subsystem or pernet device).  This can 
render the entire system unusable, and requires a reboot to correct.


Starting from the assumption that it's unsafe to ever ignore the interface's 
refcount, and just free it anyway after some time period, I have to allow that 
when a dst leaks, its net namespace will also have to leak, since the dst will 
always hold a reference to the net namespace loopback (or other) interface.  
Given that, what we need to do is move the call to netdev_wait_allrefs() that 
is hanging, to outside of the net mutex locked section.  I'm working on a patch 
to do just that currently.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-12 Thread Dan Streetman
Hi Jiri,

no the "Won't Fix" is only for the Zesty release, as that's EOL now.

I actually have a test kernel building in the ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407/+packages

it's version 4.13.0-38.43+hf1711407v20180412b2; it hasn't finished
building yet, probably will take another ~4 hours or so (building the
kernel in ppas is sloow)

this patch isn't quite finished, I still have to redo the netns cleanup
workqueue because the cleanup work can now hang (indefinitely) so it
can't be a single-threaded workqueue with the same work queued.  But, it
should at least verify if it works around the container create/destroy
hangs.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-12 Thread Jiri Horky
Hi Dan,

how should I read that it got to "Won't fix" state? No more time to
debug it?

Thanks
Jirka H.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-12 Thread Dan Streetman
** Changed in: linux (Ubuntu Zesty)
   Status: In Progress => Won't Fix

** Changed in: linux (Ubuntu Zesty)
 Assignee: Dan Streetman (ddstreet) => (unassigned)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-02 Thread Dan Streetman
Jiri,

I've looked at several possible ways to just work around the main issue,
of hanging netns creation/destruction, and just allow the netns to leak
when a dst leaks.  However, none of the approaches I tried were usable.
I'm going to keep looking at it this week and I'll let you know when I
have antoher debug and/or test kernel.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-04-01 Thread Jiri Horky
Hi Dan,

any luck in preparing anothe kernel with more debugging info? Looking
forward to try another one.

Thanks for your effort!

Jiri Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-03-22 Thread Dan Streetman
> Hi, might be useful to find root-cause

"root-cause" for this is a dst object leak, and there have been many
kernel dst leak patches, there almost certainly are still dst leak(s) in
the kernel, and more dst leaks will be accidentally added later.
Unfortunately, the dst leaks all lead to an identical end, with the
'waiting for...' error message printing in a loop, and blocking creation
of new net namespaces.  So, there is no single "root cause" for this.
Disabling ipv6 may help in your situation, and may help others as well,
but will not work for everyone.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-03-22 Thread Ham Tum
Hi, might be useful to find root-cause: we kept running into this Kernel bug 
triggered by docker somehow until deactivating IPv6. I have not seen it since 
then.
Regards, Hamza

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-03-10 Thread Jiri Horky
Hi Dan, thanks for commenting. I started to lose hope already ;) Looking
forward for next kernel, will test it right away.

Jiri Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-03-07 Thread Dan Streetman
Sorry for the long delay - I haven't forgot about this bug, but I have
to think about how best to add debug to track the dst leak(s) in a
generic way.  I'll try to get back to this bug sometime this or next
week.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-13 Thread Jiri Horky
Hi again,

so for the second kernel (), was able to reproduce it in ~same time. The
messages are:

Feb 13 23:04:23 prg41-004 kernel: [  650.285711] unregister_netdevice: waiting 
for lo (netns 943cfe8ce000) to become free. Usage count = 1
Feb 13 23:04:23 prg41-004 kernel: [  650.285736](netns 
943cfe8ce000): dst 943d18ac4200 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20

root@prg41-004:~# grep 943d18ac4200 /var/log/syslog
Feb 13 23:02:14 prg41-004 kernel: [  521.458077] dst_init: dst 943d18ac4200 
refcnt 1 from __dst_alloc+0x7e/0x100
Feb 13 23:02:14 prg41-004 kernel: [  521.458085] dst_hold: dst 943d18ac4200 
refcnt 2 from rt_cache_route+0x54/0xe0
Feb 13 23:02:14 prg41-004 kernel: [  521.634088] dst_hold_safe: dst 
943d18ac4200 refcnt 3 from ip_route_output_key_hash_rcu+0x8a1/0xa20
Feb 13 23:02:14 prg41-004 kernel: [  521.778104] dst_release_immediate: dst 
943d18ac4200 refcnt 2 from free_fib_info_rcu+0x146/0x1c0
Feb 13 23:04:23 prg41-004 kernel: [  650.285736](netns 
943cfe8ce000): dst 943d18ac4200 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 23:04:33 prg41-004 kernel: [  660.525715](netns 
943cfe8ce000): dst 943d18ac4200 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 23:04:43 prg41-004 kernel: [  670.669641](netns 
943cfe8ce000): dst 943d18ac4200 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20

etc.

I hope it helps!

Regards
Jirka H.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-13 Thread Jiri Horky
Forgot to paste the second kernel version. The previous message was
obtained using 4.13.13-00941-gf382397cf315 kernel (the one with
ipsec/xfrm dst leak patch).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-13 Thread Jiri Horky
Hi Dan,

so for the first kernel (linux-
image-4.13.13-00940-gf16e2bbbddee_4.13.13-00940-gf16e2bbbddee-
24_amd64.deb), it failed after some 10 minutes. The outputs are:

Feb 13 22:35:54 prg41-004 kernel: [  736.399342] unregister_netdevice: waiting 
for lo (netns 9adbfbebb000) to become free. Usage count = 1
Feb 13 22:35:54 prg41-004 kernel: [  736.399376](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20

And then:
root@prg41-004:~# grep 9ae3fee51f00 /var/log/syslog
Feb 13 22:33:33 prg41-004 kernel: [  595.493003] dst_init: dst 9ae3fee51f00 
refcnt 1 from __dst_alloc+0x7e/0x100
Feb 13 22:33:33 prg41-004 kernel: [  595.493009] dst_hold: dst 9ae3fee51f00 
refcnt 2 from rt_cache_route+0x54/0xe0
Feb 13 22:33:33 prg41-004 kernel: [  595.648996] dst_hold_safe: dst 
9ae3fee51f00 refcnt 3 from ip_route_output_key_hash_rcu+0x8a1/0xa20
Feb 13 22:33:33 prg41-004 kernel: [  595.689067] dst_release_immediate: dst 
9ae3fee51f00 refcnt 2 from free_fib_info_rcu+0x146/0x1c0
Feb 13 22:35:54 prg41-004 kernel: [  736.399376](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:04 prg41-004 kernel: [  746.447322](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:14 prg41-004 kernel: [  756.527237](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:24 prg41-004 kernel: [  766.639159](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:34 prg41-004 kernel: [  776.815056](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:45 prg41-004 kernel: [  786.958967](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20
Feb 13 22:36:55 prg41-004 kernel: [  797.070875](netns 
9adbfbebb000): dst 9ae3fee51f00 expires 0 error 0 obsolete 2 flags 0 
refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0xa20

(and obviously more of such messages every 10 seconds)

BTW from the timing it seems that the docker image who produced this
error might be interrupted with timeout (normally, the dockers images we
run on the machine live for few seconds only).

Will post the second kernel soon.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-13 Thread Dan Streetman
Can you also test with this kernel please:
http://people.canonical.com/~ddstreet/lp1711407/linux-image-4.13.13-00941-gf382397cf315_4.13.13-00941-gf382397cf315-26_amd64.deb

It's the same as the last one, with a ipsec/xfrm dst leak patch.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-13 Thread Dan Streetman
Ok, here is a new debug kernel:
http://people.canonical.com/~ddstreet/lp1711407/linux-image-4.13.13-00940-gf16e2bbbddee_4.13.13-00940-gf16e2bbbddee-24_amd64.deb

I didn't use the PPA as this one will dump out quite a lot of debug in
the logs all the time.  When you reproduce it (which hopefully you can
still do, even with the added debug msgs), please check the output to
find the address of the leaked dst, e.g.:

[ 2833.421135]  (netns 9f4d77278000): dst 9f4d743e9a00 expires 0
error 0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator
ip_route_input_rcu+0x6b0/0xd80

would be dst address "9f4d743e9a00".  Then, please grep dmesg for
that, so you get output similar to:

$ dmesg | grep 9f4d743e9a00
[ 2823.048309] dst_init: dst 9f4d743e9a00 refcnt 1 from 
__dst_alloc+0x7e/0x100
[ 2823.048312] dst_hold: dst 9f4d743e9a00 refcnt 2 from 
rt_cache_route+0x54/0xe0
[ 2823.048327] dst_hold_safe: dst 9f4d743e9a00 refcnt 3 from 
inet_sk_rx_dst_set+0x7f/0xc0
[ 2823.048453] dst_release: dst 9f4d743e9a00 refcnt 2 from 
skb_release_head_state+0x96/0xa0
[ 2823.048572] dst_hold_safe: dst 9f4d743e9a00 refcnt 3 from 
tcp_add_backlog+0x12f/0x170
[ 2823.048580] dst_release: dst 9f4d743e9a00 refcnt 2 from 
skb_release_head_state+0x96/0xa0
[ 2823.231547] dst_release_immediate: dst 9f4d743e9a00 refcnt 1 from 
free_fib_info_rcu+0x173/0x1c0
[ 2833.421088]  (netns 9f4d77278000): open sk 9f4d766aa100 family 2 
state 4 dst_cache   (null) rx_dst 9f4d743e9a00 creator 
generic_ip_connect+0x24b/0x4f0 [cifs]
[ 2833.421135]  (netns 9f4d77278000): dst 9f4d743e9a00 expires 0 error 
0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_input_rcu+0x6b0/0xd80


Paste that output here, so I can see the history of the dst refcnt.  If you 
don't see the initial dst_init in the trace, try using /var/log/kern.log 
instead of dmesg, as dmesg may have rolled over the older logs if you have a 
lot of output going on.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-06 Thread Jiri Horky
Hi Dan,

Glad to help, I thank you for digging into that! Here is the latest
output:

[  912.489822] unregister_netdevice: waiting for lo (netns 8de5c10f3000) to 
become free. Usage count = 1
[  912.489927]  (netns 8de5c10f3000): dst 8de5ba0ced00 expires 0 error 
0 obsolete 2 flags 0 refcnt 2 ops ipv4_dst_ops+0x0/0xc0 creator 
ip_route_output_key_hash_rcu+0x355/0x9c0

Waiting for another image ;-)

Regards
Jiri Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-05 Thread Dan Streetman
Here is another Artful debug kernel, with just slightly more debug info; it may 
help a little while I work on adding more significant debug to track where the 
dst refcnt is getting leaked.  Same PPA:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

New kernel version is 4.13.0-32.35+hf1711407v20180205b1.

Thanks again for testing these debug kernels!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-02 Thread Dan Streetman
And we're on our way down the rabbit hole...

netns exit is blocked by reference from interface...
interface exit is blocked by reference from dst object...
dst object free is blocked by reference from ???...

I'll set up more dbg and have a new test kernel, probably by Monday.

Thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-02 Thread Jiri Horky
Hi Dan,

here is the output:

Feb  2 18:50:14 prg41-004 kernel: [  482.151773] unregister_netdevice: waiting 
for lo (netns 8de11803e000) to become free. Usage count = 1
Feb  2 18:50:14 prg41-004 kernel: [  482.151876](netns 
8de11803e000): dst 8dd905360300 expires 0 error 0 obsolete 2 refcnt 1 
ops ipv4_dst_ops+0x0/0xc0 creator rt_dst_alloc+0x54/0xe0

It keeps to be output every 10s for over 500s now.

I hope it helps.

Jirka Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-02-02 Thread Dan Streetman
Hi Jirka,

I have a new debug kernel building (will take a few hours as usual);
please test when you have a chance.  New kernel is still for Artful and
version is linux-4.13.0-32.35+hf1711407v20180202b1.

https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-31 Thread Jiri Horky
Hi Dan,

to answer your questions:
1) Yes, it was full output before next "waiting for..." message
2) If I leave the box running (and stops generating the load), it actually does 
not recover itself at all. It keeps outputing the "waiting for..." forever with 
the same device:

# dmesg | grep waiti
[35684.449735] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35694.529742] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35704.609742] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35714.689748] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35724.769751] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35734.849759] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35744.929763] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35755.009758] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35765.089759] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35775.169756] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35785.249761] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35795.329763] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35805.409758] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35815.489753] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35825.569749] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35835.649747] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35845.729752] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35855.809741] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35865.889736] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35875.969733] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35886.049726] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1
[35896.129717] unregister_netdevice: waiting for lo (netns 88cd026e) to 
become free. Usage count = 1

I am happy to continue testing with a new kernel.
Cheers
Jirka H.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-31 Thread Dan Streetman
> Here is the output of the kernel when the "unregister_netdevice" bug
was hit

Thanks!  And just to verify, that's the full output of all those lines
before the next "waiting..." line, right?  If so, it looks like the
problem in your case is not a kernel socket holding the netns open, it
looks like a good-old dst leak, or something else holding a dst, so I'll
need to add debug tracking those now...

Also, when you reproduce the problem, if you leave it for a while (like,
5 minutes or so), does it resolve itself?  Or stay hung and printing the
"waiting..." messages forever?  Make sure to check the netns addr in the
"waiting..." message to verify it's stuck waiting for the *same* netns
to close forever, not just reproducing the problem over and over with
different netns.

I'll get another kernel ready with additional dst debug.  I think there
is at least one dst leak patch upstream too, I'll check for those.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-31 Thread Jiri Horky
Hi Dan,

Here is the output of the kernel when the "unregister_netdevice" bug was
hit for the first time on the reproducer machine.

I hope it helps. Let me know if I can provide you more debug output or
try to reproduce it once more.

Thanks
Jiri Horky

Jan 31 08:42:26 prg41-004 kernel: [  663.049461] unregister_netdevice: waiting 
for lo (netns 88cd026e) to become free. Usage count = 1
Jan 31 08:42:26 prg41-004 kernel: [  663.049477](netns 
88cd026e): open sk 88cd0120a000 family 16 state 7 dst_cache 
  (null) rx_dst   (null) creator __netlink_kernel_create+0x6c/0x260
Jan 31 08:42:26 prg41-004 kernel: [  663.049481](netns 
88cd026e): open sk 88cd0120d800 family 16 state 7 dst_cache 
  (null) rx_dst   (null) creator __netlink_kernel_create+0x6c/0x260
Jan 31 08:42:26 prg41-004 kernel: [  663.049486](netns 
88cd026e): open sk 88cd0120d000 family 16 state 7 dst_cache 
  (null) rx_dst   (null) creator __netlink_kernel_create+0x6c/0x260
Jan 31 08:42:26 prg41-004 kernel: [  663.049490](netns 
88cd026e): open sk 88cd0120c000 family 16 state 7 dst_cache 
  (null) rx_dst   (null) creator __netlink_kernel_create+0x6c/0x260
Jan 31 08:42:26 prg41-004 kernel: [  663.049495](netns 
88cd026e): open sk 88cd18dc2200 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049498](netns 
88cd026e): open sk 88cd10e58f00 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049501](netns 
88cd026e): open sk 88cd10e5da00 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049504](netns 
88cd026e): open sk 88cd10e5b0c0 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049506](netns 
88cd026e): open sk 88cd10e5c380 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049509](netns 
88cd026e): open sk 88cd10e5b480 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049512](netns 
88cd026e): open sk 88cd10e5a1c0 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049514](netns 
88cd026e): open sk 88cd10e5b840 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049517](netns 
88cd026e): open sk 88cd10e5cb00 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049520](netns 
88cd026e): open sk 88cd10e5f440 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049523](netns 
88cd026e): open sk 88cd10e58000 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049526](netns 
88cd026e): open sk 88cd10e5a580 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049529](netns 
88cd026e): open sk 88cd10e58b40 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049531](netns 
88cd026e): open sk 88cd10e59e00 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049534](netns 
88cd026e): open sk 88cd10e5bc00 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049537](netns 
88cd026e): open sk 88cd10e5c740 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  663.049539](netns 
88cd026e): open sk 88cd10e5d280 family 2 state 7 dst_cache  
 (null) rx_dst   (null) creator inet_ctl_sock_create+0x37/0x80
Jan 31 08:42:26 prg41-004 kernel: [  

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-30 Thread Dan Streetman
I uploaded a new debug kernel to the ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

I've only updated the 4.13 kernel currently, on Artful, so if you need a
Xenial HWE 4.13 kernel let me know and I can build that as well.  It
will likely take hours to finish building but should be ready tomorrow.

With the debug kernel, when you hit the bug, you should see additional
debug output, like:

[   87.028053] unregister_netdevice: waiting for lo (netns 9af471fa8000) to 
become free. Usage count = 1
[   87.031461]  (netns 9af471fa8000): open sk 9af479983000 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031464]  (netns 9af471fa8000): open sk 9af479982000 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031467]  (netns 9af471fa8000): open sk 9af479982800 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031470]  (netns 9af471fa8000): open sk 9af479985000 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031475]  (netns 9af471fa8000): open sk 9af476541540 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031479]  (netns 9af471fa8000): open sk 9af471096940 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031482]  (netns 9af471fa8000): open sk 9af471094f00 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031485]  (netns 9af471fa8000): open sk 9af4710970c0 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031488]  (netns 9af471fa8000): open sk 9af471094000 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031491]  (netns 9af471fa8000): open sk 9af4710961c0 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031494]  (netns 9af471fa8000): open sk 9af471095680 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031497]  (netns 9af471fa8000): open sk 9af471096580 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031501]  (netns 9af471fa8000): open sk 9af4710943c0 family 2 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031504]  (netns 9af471fa8000): open sk 9af479987000 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031507]  (netns 9af471fa8000): open sk 9af479981800 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031510]  (netns 9af471fa8000): open sk 9af475703f00 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031513]  (netns 9af471fa8000): open sk 9af475704380 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031516]  (netns 9af471fa8000): open sk 9af475707080 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031520]  (netns 9af471fa8000): open sk 9af475705580 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031523]  (netns 9af471fa8000): open sk 9af475703a80 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031526]  (netns 9af471fa8000): open sk 9af475702400 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031529]  (netns 9af471fa8000): open sk 9af475701f80 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031532]  (netns 9af471fa8000): open sk 9af475702880 family 10 
state 7 dst_cache   (null) rx_dst   (null) creator 
inet_ctl_sock_create+0x37/0x80
[   87.031535]  (netns 9af471fa8000): open sk 9af471fc0800 family 16 
state 7 dst_cache   (null) rx_dst   (null) creator 
__netlink_kernel_create+0x6c/0x260
[   87.031567]  (netns 9af471fa8000): open sk 9af473ffa100 family 2 
state 4 dst_cache   (null) rx_dst 9af471ec4a00 creator 
generic_ip_connect+0x24b/0x4f0 [cifs]


That will include all open 

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-29 Thread Jiri Horky
Ok, waiting for the kernel with dbg messages. I should be able to test
it right away.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-29 Thread Dan Streetman
> was able to get "unregister_netdevice: waiting for lo to become free. Usage 
> count = 1"
> in less than 5 minutes :-(

well, that's unfortunate that you still see it but fortunate that you're
able to reproduce it quickly.  The issue that I fixed - TCP kernel
socket staying open - is certainly not the only way to encounter the
problem.

> Our environment spawns about 1 container a second that does either openvpn 
> (both udp and tcp)
> or ipsec tunnel testing.

That's good to know; my guess would be ipsec, but of course will require
investigation.  I'll try reproducing separately with openvpn and ipsec
also.

> PLease let me know how to enable the debug messages

the previous builds did have some dbg messages, but I removed them for
the last build, so I'll need to add dbg again and spin new kernel
builds.  I'll update this bug once dbg kernel(s) are ready.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-28 Thread JiriHorky
Hi Dan,

first of all thanks for looking into this long lasting, irritating
issue.

I have tried your kernel (4.13.0-30-generic #33+hf1711407v20180118b2-Ubuntu 
SMP) on Ubuntu 17.10 and was able to get "unregister_netdevice: waiting for lo 
to become free. Usage count = 1" in less than 5 minutes :-( 
Our environment spawns about 1 container a second that does either openvpn 
(both udp and tcp) or ipsec tunnel testing. 
I don't see any debug output in the kernel log, the lines are just old 
ordinary: 

[  910.690908] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  920.806382] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  930.945869] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  941.025341] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  951.104797] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  961.184286] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[  971.263749] unregister_netdevice: waiting for lo to become free. Usage count 
= 1

I don't even see any "net_free:" or "net_alloc:" messages in the log
(and I checked the referrenced git and your recent patch and it does not
seem to be there either).

PLease let me know how to enable the debug messages, so we can finally
trace it down.

Thanks
Jiri Horky

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2018-01-19 Thread Dan Streetman
I've updated the patch and submitted upstream:
https://marc.info/?l=linux-netdev=151631015108584=2

The PPA also contains the latest ubuntu kernels patched for the issue, if 
anyone has time to test:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407

> How do we know when we hit the bug with your patched kernel?

I didn't include any output in the latest kernels to indicate this issue
had been detected and fixed.  If anyone is able to reproduce it with one
of the test kernels, I can provide a debug kernel to gather more details
of why it's still failing, though.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-21 Thread Daniel Graña
Hi Dan,

How do we know when we hit the bug with your patched kernel? is it still
logging "waiting for lo to become free" or is there any other signature
we can use to detect the original problem and still say "Wow! it works,
Dan rocks"

I have been scavenging the internet looking for your linux git fork and
the commits for this ticket without luck, nothing in your github fork
neither http://kernel.ubuntu.com/git/ nor https://launchpad.net/linux.
Is there a public place where we can take a look at the patch?

thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-21 Thread Dan Streetman
One note on the test kernels, I'm still updating the patch as the change
in the current test kernels will leave sockets open which leaks memory
due to the netns not fully closing/exiting.  I'll update this bug when I
have an updated patch and test kernels.

I would not recommend the current test kernels for long-term use on
systems where containers (or namespaces) are created/destroyed
regularly.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-21 Thread Stefan Rinkes
Looks good so far. Docker is still usable after 5000 runs, no "waiting
for lo to become free" in syslog.

Linux docker 4.4.0-104-generic #127+hf1711407v20171220b1-Ubuntu SMP Wed
Dec 20 12:31:13 UTC 201 x86_64 x86_64 x86_64 GNU/Linux

$ docker version
Client:
 Version:  17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:Thu Dec  7 22:24:23 2017
 OS/Arch:  linux/amd64

Server:
 Version:  17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:Thu Dec  7 22:23:00 2017
 OS/Arch:  linux/amd64
 Experimental: false

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-20 Thread Stefan Rinkes
$ sudo add-apt-repository ppa:ddstreet/lp1711407
Cannot add PPA: 'ppa:~ddstreet/ubuntu/lp1711407'.
ERROR: '~ddstreet' user or team does not exist.

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-20 Thread Stefan Rinkes
nvm, forgot to configure https_proxy :)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-20 Thread Stefan Rinkes
Ok, waiting for the build to finish and then will test ASAP.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-20 Thread Dan Streetman
> the test kernel you were using had this one [1] included? It's
upstream commit f186ce61bb82

no that commit doesn't help, because the issue isn't one of dst refcount
leaks or delayed dst gc, the problem is kernel sockets that haven't
closed yet holding a reference to the netns lo interface device via
their dsts.

I do think I have a possible fix for this though, and I have some test kernels 
building in this ppa (please note they are currently building and will take a 
few more hours to finish):
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1711407


can anyone please test with any of those kernels (they all contain the patch to 
maybe fix this)?  In my testing it eliminated the problem completely, but I'd 
like to get some more diverse real-world testing before starting an upstream 
discussion on the patch.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-20 Thread Daniel Borkmann
Hi Dan,

the test kernel you were using had this one [1] included? It's upstream
commit f186ce61bb82 ("Fix an intermittent pr_emerg warning about lo
becoming free."). Meaning, it's a rather a dst leak somewhere in latest
upstream as well?

Thanks,
Daniel

  [1] https://patchwork.ozlabs.org/patch/773480/

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-13 Thread Dan Streetman
> So if there is something to test or debug, I would be happy to help
fixing this issue.

yes, I'm hoping to have a kernel build with some added debug to
track/list exactly what socket(s) are still holding lo device references
when the container is killed/shutdown, so testing with that would help
clarify the specifics of this problem, once I have the debug kernel
ready and built.  I will update this bug once I have something ready to
test.

> Also already tried the latest kernel from http://kernel.ubuntu.com
/~kernel-ppa/mainline/:

and I assume that did not help, correct?  This problem still exists
upstream (assuming we're all encountering the same problem that I'm
debugging) and will require an upstream patch.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-13 Thread Stefan Rinkes
Hi,

In a test-setup I can easily reproduce this issue very quickly.
At my scenario it looks like the issue is just triggered if I change the IP 
inside the docker container
(Needed to test StrongSwan's MOBIKE feature), send some data (ICMP) and then 
kill the docker container.

So if there is something to test or debug, I would be happy to help
fixing this issue.

Also already tried the latest kernel from 
http://kernel.ubuntu.com/~kernel-ppa/mainline/:
Linux docker 4.15.0-041500rc3-generic #201712110230 SMP Mon Dec 11 02:31:13 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-07 Thread Dan Streetman
simpler reproducer for this using LXD instead of Docker:

1. install/setup LXD
$ sudo apt install lxd
$ sudo lxd init

2. create two containers
$ lxc launch ubuntu:xenial server
$ lxc launch ubuntu:xenial client

3. make the client privileged (so it can mount smb/cifs)
$ lxc config set client security.privileged true
$ lxc stop client
$ lxc start client

4. set up samba server in container
$ lxc exec server apt update
$ lxc exec server apt install cifs-utils samba
$ lxc exec server mkdir /dummy
$ lxc exec server touch /dummy/emptyfile
$ lxc exec server vim /etc/samba/smb.conf

add this section at the end of the container's smb.conf:

[dummy]
  path = /dummy
  browseable = yes
  read only = yes
  guest ok = yes

$ lxc stop server
$ lxc start server

5. note server container's ip
$ lxc list server

e.g.
| server | RUNNING | 10.109.201.68 (eth0) 

6. setup client container
$ lxc exec client mkdir /dummy
$ lxc exec apt update
$ lxc exec apt install cifs-utils

7. stop client container
$ lxc stop client

8. full cycle of client container start through stop
$ lxc start client ; sleep 2 ; lxc exec client -- mount -o guest 
//10.109.201.68/dummy /dummy ; lxc exec client -- ls -l /dummy ; lxc exec 
client umount /dummy ; lxc stop -f client

9. repeat client cycle; it should hang due to this bug
$ lxc start client ; sleep 2 ; lxc exec client -- mount -o guest 
//10.109.201.68/dummy /dummy ; lxc exec client -- ls -l /dummy ; lxc exec 
client umount /dummy ; lxc stop -f client


Note this force-stops the container, instead of normally stopping it; if the 
container is allowed to exit normally, the problem isn't reproduced (with these 
steps, at least).  Also note the 'sleep 2' is there to give enough time for the 
client container network to come up, it may need to be increased if your client 
container takes longer to assign its ip address (you can edit the client 
container nw config to use a static address if you want, which should be 
faster).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-12-07 Thread Dan Streetman
> In both of our reproducers, we added "ip route flush table all;

that won't help

> ifconfig down;

that's not a valid cmdline

> sleep 10" before existing from containers.

sleeping without closing existing open sockets doesn't help

what are you using for container mgmt?  docker?  lxc?  lxd?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-27 Thread JiriHorky
his information might be relevant.

We are able to reproduce the problem with unregister_netdevice: waiting
for lo to become free. Usage count = 1 with 4.14.0-rc3 kernel with
CONFIG_PREEMPT_NONE=y and running only on one CPU with following boot
kernel options:

BOOT_IMAGE=/boot/vmlinuz-4.14.0-rc3 root=/dev/mapper/vg0-root ro quiet 
vsyscall=emulate nosmp
Once we hit this state, it stays in this state and reboot is needed. No more 
containers can be spawned. We reproduce it by running images doing 
ipsec/openvpn connections + downloading a small file inside the tunnels. Then 
the instances exist (usually they run < 10s). We run 10s of such containers a 
minute on one machine. With the abovementioned settings (only 1cpu), the 
machine hits it in ~2 hours.

Another reproducer with the same kernel, but without limiting number of
CPUs, is to jus run iperf in UDP mode for 3 seconds inside the container
(so there is no TCP communication at all). If we run 10 of such
containers in parallel, wait for all of them to finish and do it again,
we hit the trouble in less than 10 minutes (on 40 cores machine).

In both of our reproducers, we added "ip route flush table all; ifconfig
down; sleep 10" before existing from containers. It does not seem to
have any effect.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-09 Thread Dan Streetman
** Changed in: linux (Ubuntu Artful)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Zesty)
   Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu Zesty)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Trusty)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: linux (Ubuntu Zesty)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Dan Streetman
> Should separate Ubuntu bug reports be created (if they don't exist already) 
> regarding
> the kernel crashes

This bug is only to fix the hang/delay in new container creation.
Please open a new bug for kernel crashes.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Fabian Holler
Ok, seems that I'm than in the wrong ticket with my issues. :-)
Should separate Ubuntu bug reports be created (if they don't exist already) 
regarding the kernel 
crashes?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Dan Streetman
My analysis so far of the problem:

1. container A has an outstanding TCP connection (thus, a socket and dst
which hold reference on the "lo" interface from the container). When the
container is stopped, the TCP connection takes ~2 minutes to timeout
(with default settings).

2. when container A is being removed, its net namespace is removed via
net/core/net_namespace.c function cleanup_net(). This takes two locks -
first, the net_mutex, and then the rtnl mutex (i.e. rtnl_lock). It then
cleans up the net ns and calls rtnl_unlock(). However, rtnl_unlock()
internally waits for all the namespace's interfaces to be freed, which
requires all their references to reach 0. This must wait until the above
TCP connection times out, before it releases its reference. So, at this
point the container is hung inside rtnl_unlock() waiting, and it still
holds the net_mutex. It doesn't release the net_mutex until its lo
interface finally is destroyed after the TCP connection times out.

3. When a new container is started, part of its startup is to call
net/core/net_namespace.c function copy_net_ns() to copy the caller's net
namespace to the new container net namespace. However, this function
locks the net_mutex. Since the previous container still is holding the
net_mutex as explaned in #2 above, this new container creation blocks
until #2 releases the mutex.


There are a few ways to possibly address this: 

a) when a container is being removed, all its TCP connections should
abort themselves. Currently, TCP connections don't directly register for
interface unregister events - they explictly want to stick around, so if
an interface is taken down and then brought back up, the TCP connection
remains, and the communication riding on top of the connection isn't
interrupted. However, the way TCP does this is to move all its dst
references off the interface that is unregistering, to the loopback
interface. This works for the initial network namespace, where the
loopback interface is always present and never removed. However, this
doesn't work for non-default network namespaces - like containers -
where the loopback interface is unregistered when the container is being
removed. So this aspect of TCP may need to change, to correctly handle
containers.  This also may not cover all causes of this hang, since
sockets handle more than just tcp connections.

b) when a container is being removed, instead of holding the net_mutex
while the cleaning up (and calling rtnl_unlock), it could release the
net_mutex first (after removing all netns marked for cleanup from the
pernet list), then call rtnl_unlock. This needs examination to make sure
it would not introduce any races or other problems.

c) rtnl_unlock could be simplified - currently, it includes significant
side-effects, which include the long delay while waiting to actually
remove all references to the namespace's interfaces (including
loopback). Instead of blocking rtnl_unlock() to do all this cleanup, the
cleanup could be deferred. This also would need investigation to make
sure no caller is expecting to be able to free resources that may be
accessed later from the cleanup action (which I believe is the case).

As this is a complex problem there are likely other options to fix it as
well.  This issue also has been around ever since namespaces were
introduced to the kernel, as far as I can tell, but it's not a commonly
seen issue because in most cases socket connections are shut down before
stopping the container.  The socket connection causing the problem here
is a kernel socket, which is different than normal userspace-created
sockets; that may make a difference though I haven't investigated that
angle yet.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Dan Streetman
> Is there any way for a non-kernel-dev type to see exactly which resource 
> unregister_netdevice is waiting for? (Or does it only keep track of usage 
> count?)

not really, no.  It's simply waiting for its reference count to drop to
0, and just broadcasts unregister events periodically hoping that
everyone with a reference is listening and does whatever they need to do
to drop their references.  In this case the socket's dst actually is
listening, but the way the dst handles it assumes that the loopback
interface will never, ever unregister - the dst shifts its reference
count from the interface going down, to its netns's loopback interface.
If the interface going down is the netns's loopback interface, then that
reference shift doesn't help, of course.

So no, there is no simple way for you to tell what it's waiting for.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Dan Streetman
> That the startup of containers is delayed is annoying.
> The much bigger issue is that it can reproducible cause a kernel Oops and
> crash a whole machine.

Yes as you found this uncovers other bugs in the kernel, but those are
separate and looks like they are getting fixed.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-11-02 Thread Fabian Holler
According to https://github.com/moby/moby/issues/35068 the crash is
fixed by:

https://patchwork.ozlabs.org/patch/801533/
https://patchwork.ozlabs.org/patch/778449/

** Bug watch added: github.com/moby/moby/issues #35068
   https://github.com/moby/moby/issues/35068

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-10-25 Thread Fabian Holler
Hello Dan,

thanks for the analysis!

That the startup of containers is delayed is annoying. 
The much bigger issue is that it can reproducible cause a kernel Oops and crash 
a whole machine.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-10-24 Thread Charles
Is there any way for a non-kernel-dev type to see exactly which resource
unregister_netdevice is waiting for? (Or does it only keep track of
usage count?)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-10-11 Thread Eric Desrochers
** Also affects: linux (Ubuntu Artful)
   Importance: Medium
 Assignee: Dan Streetman (ddstreet)
   Status: Confirmed

** Also affects: linux (Ubuntu Bb-series)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Zesty)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-10-11 Thread Dan Streetman
Hi everyone,

so this is an interesting problem.  There are 2 parts to this:

1. in the container being destroyed, some socket(s) remain open for a
period of time, which prevents the container from fully exits until all
its sockets have exited.  While this happens you will see the 'waiting
for lo to become free' message repeated.

2. while the previous container is waiting for its sockets to exit, so
it can exit, any new containers are blocked from starting.

For issue #2, I'm not clear just yet on what is blocking the new
containers from starting.  I thought it could be the rtnl_lock since the
exiting container does loop in the rtnl_unlock path, however while it's
looping it also calls __rtnl_unlock/rtnl_lock during its periods of
waiting, so the mutex should not be held the entire time.  I'm still
investigating that part of this issue to find what exactly is blocking
the new containers.

For issue #1, the sockets are lingering because - at least for the
reproducer case from comment 16 - there is a TCP connection that never
is closed, so the kernel continues to probe the other end until it times
out, which takes around 2 minutes.  For this specific reproducer, there
are 2 easy ways to work around this.  First for background, the
reproducer uses docker to create a CIFS mount, which uses a TCP
connection.  This is the script run in the 'client' side, where the hang
happens:

date ; \
mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o 
vers=3.0,user=nobody,password= ; \
date ; \
ls -la /mnt ; \
umount /mnt ; \
echo "umount ok"

this is repeated 50 times in the script, but the 2nd (and later) calls
hang (for 2 minutes each, if you wait that long).

A. The reason the TCP connection to the CIFS server lingers is beacuse
it is never closed; the reason it isn't closed is because the script
exits the container immediately after unmounting.  The kernel cifs
driver, during fs unmount, closes the TCP socket to the CIFS server,
which queues a TCP FIN to the server.  Normally, this FIN will reach the
server who will respond and the TCP connection will close.  However,
since the container starts exiting immediately, that FIN never reaches
the CIFS server - the interface is taken down too quickly.  So, the TCP
connection remains.  To avoid this, simply add a short sleep between
unmounting and exiting the container, e.g. (new line prefixed with +):

date ; \
mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o 
vers=3.0,user=nobody,password= ; \
date ; \
ls -la /mnt ; \
umount /mnt ; \
+sleep 1 ; \
echo "umount ok"

that avoids the problem, and allows the container to exit immediately
and the next container to start immediately.

B. Instead of delaying enough to allow the TCP connection to exit, the
kernel's TCP configuration can be changed to more quickly timeout.  The
specific setting to change is tcp_orphan_retries, which controls how
many times the kernel keeps trying to talk to a remote host over a
closed socket.  Normally, this defaults to 0 - this will cause the
kernel to actually use a value of 8 (retries).  So, change it from 0 to
1, to actually reduce it from 8 to 1 (confusing, I know).  Like (new
lines prefixed with +):

date ; \
+echo 1 > /proc/sys/net/ipv4/tcp_orphan_retries ; \
+grep -H . /proc/sys/net/ipv4/tcp_orphan_retries ; \
mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o 
vers=3.0,user=nobody,password= ; \
date ; \
ls -la /mnt ; \
umount /mnt ; \
echo "umount ok"


the 'grep' added there is just so you can verify the value was correctly 
changed.  Note this method will take slightly longer than method A, since this 
does perform 1 retry, while method A closes the TCP connection correctly and 
does no retries.


To extrapolate this generally to situations beside this specific reproducer, 
here are some suggestions:

1. close all your connections before closing the container.  Especially
kernel connections (like the CIFS TCP connection).

2. if you can't close the connections, add a delay as the last action
the container takes, after bringing down all its interfaces (except lo)
but before exiting - like sleep for up to 2 minutes.  The container will
take up a bit more resources during that time but it should allow enough
time for all TCP connections to close.

3. if you can't do either of those workarounds, reduce the number of TCP
retries that will be done after the container starts exiting by changing
the tcp_orphan_retries param (from inside the container - don't change
this value in your host system).  Change the parameter right before you
close the container, if possible.

Those are simply workarounds - I'm still investigating what is blocking
new containers from starting, as well as looking at ways to forcibly
destroy a container's sockets when the container is exiting (e.g. maybe
sock_diag_destroy)

Please let me know if any of those, or similar, workarounds helps anyone
who is seeing this problem.

-- 

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-10-02 Thread Jens Elkner
Since xenial we have this very annoying behavior (unregister_netdevice:
waiting for lo to become free. Usage count = 1) as well, which in turn
makes a container reboot taking several minutes instead of seconds e.g.
on vivid (lxc-ls -f also hangs for that time). E.g.:

 [ +10.244888] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
[Oct 2 17:16] INFO: task lxc-autostart:5321 blocked for more than 120 seconds.
[  +0.007147]   Tainted: P   OE   4.4.0-96-generic #119-Ubuntu
[  +0.006796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  +0.007913] lxc-autostart   D 883fb99fbcb8 0  5321  1 0x
[  +0.07]  883fb99fbcb8  883ff2a3aa00 
881fed2fe200
[  +0.04]  883fb99fc000 81ef79a4 881fed2fe200 

[  +0.04]  81ef79a8 883fb99fbcd0 8183f165 
81ef79a0
[  +0.04] Call Trace:
[  +0.14]  [] schedule+0x35/0x80
[  +0.05]  [] schedule_preempt_disabled+0xe/0x10
[  +0.16]  [] __mutex_lock_slowpath+0xb9/0x130
[  +0.14]  [] mutex_lock+0x1f/0x30
[  +0.09]  [] copy_net_ns+0x6e/0x120
[  +0.09]  [] create_new_namespaces+0x11b/0x1d0
[  +0.03]  [] copy_namespaces+0x6d/0xa0
[  +0.07]  [] copy_process+0x8e2/0x1b30
[  +0.11]  [] _do_fork+0x80/0x360
[  +0.15]  [] ? exit_to_usermode_loop+0xa6/0xd0
[  +0.04]  [] SyS_clone+0x19/0x20
[  +0.09]  [] entry_SYSCALL_64_fastpath+0x16/0x71
[  +4.882232] unregister_netdevice: waiting for lo to become free. Usage count 
= 1
...

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-09-29 Thread Charles
We're seeing this on our build servers. These are VMWare virtual
machines which run docker containers in privileged mode. Inside the
containers are Jenkins agents and a build environment. Part of the build
process uses unshare to create these namespaces: ipc uts mount pid net.
If you have any questions about this arrangement, I can try to provide
more detail and maybe example code.

The kernel is 4.4.0-79-generic x86_64. We are using docker 17.05.0-ce.

In one case, the whole VM hung when this was encountered.

Are there any workarounds for 16.04 users beyond rebooting the host?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-09-12 Thread Dan Streetman
Thank you for the reproducer, I have been able to reproduce this using
docker-samba-loop scripts.  This should help me to dig into the kernel
to see where the dst leak is coming from.

** Description changed:

  This is a "continuation" of bug 1403152, as that bug has been marked
  "fix released" and recent reports of failure may (or may not) be a new
  bug.  Any further reports of the problem should please be reported here
  instead of that bug.
  
  --
  
  [Impact]
  
  When shutting down and starting containers the container network
  namespace may experience a dst reference counting leak which results in
  this message repeated in the logs:
  
- unregister_netdevice: waiting for lo to become free. Usage count = 1
+ unregister_netdevice: waiting for lo to become free. Usage count = 1
  
  This can cause issues when trying to create net network namespace and
  thus block a user from creating new containers.
  
  [Test Case]
  
- ?
+ See comment 16, reproducer provided at https://github.com/fho/docker-
+ samba-loop

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-09-07 Thread Fabian Holler
I could not reproduce the bug with the described method with kernel
4.4.0-81-generic and neither with 4.13.0-041300rc7-generic. 4.4.0-81
logged a hung tasks but does not Oops.

So the bug might have been reintroduced between 4.4.0-82 and 4.4.0-93
and 4.13 seems to contain a fix.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-09-07 Thread Thomas Morin
bug 1715660 seems related

( Lots of "waiting for lo to become free" message, then a "task ip:1358
blocked for more than 120 seconds." with a backtrace )

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-08-31 Thread Fabian Holler
Attached are the logs for an Oops on Ubuntu 14.04 on kernel linux-
image-4.4.0-93-generic=4.4.0-93.116~14.04.1

** Attachment added: "kernoops-4.4.0-93.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+attachment/4941963/+files/kernoops-4.4.0-93.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-08-31 Thread Fabian Holler
With https://github.com/fho/docker-samba-loop I was able to reproduce
kernel Oopses on a clean Ubuntu 16.0.4 installation with:

- linux-image-4.10.0-32-generic=4.10.0-32.36~16.04.1
- linux-image-4.11.0-14-generic=4.11.0-14.20~16.04.1

On 4.11.0-14 it was much harder to reproduce. Sometimes only a warning
happened, 1x time I was able to produce an Oops with
https://github.com/fho/docker-samba-loop/commits/more_stress
But I don't know if the changes in the branch made it more likely to happen or
if it was only a coincidence.


The kernel message:
  unregister_netdevice: waiting for lo to become free. Usage count = 1
shows up, some minutes later the oops and/or warning happens.


In a different scenario were the Ubuntu 16.04 servers were running multiple 
docker containers with Nginx or small network applications in parallel, I was 
also able to reproduce the kernel Oopses also on:

- linux-image-4.10.0-1004-gcp
- linux-image-4.4.0-93-generic
- linux-image-4.11.0-14-generic
- linux-image-4.12.10-041210-generic=4.12.10-041210.20170830

I haven't tried again to reproduce it with those kernels on a clean Ubuntu
installation and unfortunately didn't kept the kernel logs.


Kernel logs can be found in the attached file.

** Attachment added: "kernel-logs.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+attachment/4941935/+files/kernel-logs.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-08-17 Thread Andrew Reis
Distributor ID: Ubuntu
Description:Ubuntu 16.04.3 LTS
Release:16.04
Codename:   xenial

cat /etc/apt/sources.list.d/docker.list
deb [arch=amd64] https://apt.dockerproject.org/repo ubuntu-xenial main
# deb-src [arch=amd64] https://apt.dockerproject.org/repo ubuntu-xenial main

cat /etc/apt/sources.list.d/runner_gitlab-ci-multi-runner.list
# this file was generated by packages.gitlab.com for
# the repository at https://packages.gitlab.com/runner/gitlab-ci-multi-runner

deb https://packages.gitlab.com/runner/gitlab-ci-multi-runner/ubuntu/ xenial 
main
deb-src https://packages.gitlab.com/runner/gitlab-ci-multi-runner/ubuntu/ 
xenial main


dpkg -l | grep gitlab
ii  gitlab-ci-multi-runner   9.4.2  
amd64GitLab Runner

dpkg -l | grep docker
ii  docker-engine17.05.0~ce-0~ubuntu-xenial 
amd64Docker: the open-source application container engine

Seems to only occur when running docker with privileged inside of
gitlab-runner. On "close" of the docker container, the process fails
with the following:

Aug 17 17:53:04 WIS-DOCKERPHY kernel: [ 1579.245473] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:53:15 WIS-DOCKERPHY kernel: [ 1589.356182] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:53:25 WIS-DOCKERPHY kernel: [ 1599.466898] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:53:35 WIS-DOCKERPHY kernel: [ 1609.577597] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:53:45 WIS-DOCKERPHY kernel: [ 1619.692335] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:53:54 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:53:54.994924776-05:00" level=error msg="containerd: start 
container" error="containerd: container did not start before the specified 
timeout" id=2ef6677cc840f3597e5b38024cfb337856adfaccdcc1a2739f78413979c0fa2b
Aug 17 17:53:54 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:53:54.995769254-05:00" level=error msg="Create container 
failed with error: containerd: container did not start before the specified 
timeout"
Aug 17 17:53:55 WIS-DOCKERPHY kernel: [ 1629.366310] docker0: port 
1(veth9f03d7e) entered disabled state
Aug 17 17:53:55 WIS-DOCKERPHY kernel: [ 1629.367889] device veth9f03d7e left 
promiscuous mode
Aug 17 17:53:55 WIS-DOCKERPHY kernel: [ 1629.367891] docker0: port 
1(veth9f03d7e) entered disabled state
Aug 17 17:53:55 WIS-DOCKERPHY kernel: [ 1629.449744] auplink[25254]: segfault 
at 7ffdd8ddb0d8 ip 7facdfff1579 sp 7ffdd8ddb0e0 error 6 in 
libc-2.23.so[7facdfef8000+1c]
Aug 17 17:53:55 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:53:55.215290996-05:00" level=warning msg="Couldn't run 
auplink before unmount 
/var/lib/docker/aufs/mnt/90bdc718c5e518ef7b03df0e1769ab4638642ad0e847a38dbf16c6e600783f00:
 signal: segmentation fault (core dumped)"
Aug 17 17:53:55 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:53:55.215372764-05:00" level=error msg="Error unmounting 
container 2ef6677cc840f3597e5b38024cfb337856adfaccdcc1a2739f78413979c0fa2b: 
device or resource busy"
Aug 17 17:53:55 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:53:55.243281045-05:00" level=error msg="Handler for POST 
/v1.18/containers/2ef6677cc840f3597e5b38024cfb337856adfaccdcc1a2739f78413979c0fa2b/start
 returned error: containerd: container did not start before the specified 
timeout"
Aug 17 17:53:55 WIS-DOCKERPHY kernel: [ 1629.862984] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:54:05 WIS-DOCKERPHY kernel: [ 1640.005751] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:54:15 WIS-DOCKERPHY kernel: [ 1650.084470] unregister_netdevice: 
waiting for lo to become free. Usage count = 1
Aug 17 17:54:25 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:54:25.123304254-05:00" level=error msg="containerd: start 
container" error="containerd: container did not start before the specified 
timeout" id=891adb44118b9114d73a600748e52160a250ec0dd7f94a7ad1fc2352d492252d
Aug 17 17:54:25 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:54:25.124077481-05:00" level=error msg="Create container 
failed with error: containerd: container did not start before the specified 
timeout"
Aug 17 17:54:25 WIS-DOCKERPHY kernel: [ 1659.490959] docker0: port 
2(vetha61fb77) entered disabled state
Aug 17 17:54:25 WIS-DOCKERPHY kernel: [ 1659.492532] device vetha61fb77 left 
promiscuous mode
Aug 17 17:54:25 WIS-DOCKERPHY kernel: [ 1659.492534] docker0: port 
2(vetha61fb77) entered disabled state
Aug 17 17:54:25 WIS-DOCKERPHY dockerd[18386]: 
time="2017-08-17T17:54:25.249728323-05:00" level=error msg="Error unmounting 
container 891adb44118b9114d73a600748e52160a250ec0dd7f94a7ad1fc2352d492252d: 
device or resource busy"
Aug 17 17:54:25 WIS-DOCKERPHY dockerd[18386]: 

[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free

2017-08-17 Thread Dan Streetman
Can anyone experiencing this issue please provide details such as:

-what release are you using (trusty/xenial/zesty)?

-what kernel version are you using?

-do you have specific steps to reproduce the problem?

** Description changed:

  This is a "continuation" of bug 1403152, as that bug has been marked
  "fix released" and recent reports of failure may (or may not) be a new
  bug.  Any further reports of the problem should please be reported here
  instead of that bug.
+ 
+ --
+ 
+ [Impact]
+ 
+ When shutting down and starting containers the container network
+ namespace may experience a dst reference counting leak which results in
+ this message repeated in the logs:
+ 
+ unregister_netdevice: waiting for lo to become free. Usage count = 1
+ 
+ This can cause issues when trying to create net network namespace and
+ thus block a user from creating new containers.
+ 
+ [Test Case]
+ 
+ ?

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
   Status: Incomplete => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs