[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-03-27 Thread Jordan Curzon
An addition to my comment #18 above: I forgot that I had to remove
logging on calls from skb_release_head_state to dst_release because of
the performance impact of the sheer number of calls. In both failure and
success scenarios, calls to dst_release by skb_release_head_state are
also present.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1403152

Title:
  unregister_netdevice: waiting for lo to become free. Usage count

Status in The Linux Kernel:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Trusty:
  Confirmed
Status in linux source package in Utopic:
  Confirmed

Bug description:
  I am currently running trusty with the latest patches, and I get this
  on the following hardware and software:

  Ubuntu 3.13.0-43.72-generic 3.13.11.11

  processor : 7
  vendor_id : GenuineIntel
  cpu family: 6
  model : 77
  model name: Intel(R) Atom(TM) CPU  C2758  @ 2.40GHz
  stepping  : 8
  microcode : 0x11d
  cpu MHz   : 2400.000
  cache size: 1024 KB
  physical id   : 0
  siblings  : 8
  core id   : 7
  cpu cores : 8
  apicid: 14
  initial apicid: 14
  fpu   : yes
  fpu_exception : yes
  cpuid level   : 11
  wp: yes
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm 
sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch 
arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
  bogomips  : 4799.48
  clflush size  : 64
  cache_alignment   : 64
  address sizes : 36 bits physical, 48 bits virtual
  power management:

  The error in the subject is somewhat reproducible. lxc still works but
  is no longer manageable until a reboot.

  "Not manageable" means every command hangs.

  I saw there are a lot of bugs like this, but they seem to relate to
  older versions and are closed, so I decided to file a new one.

  I run a lot of machines with trusty and lxc containers, but only this
  kind of machine produces these errors; all the others don't show this
  odd behavior.

  thx in advance

  meno

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1403152/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-03-24 Thread Jordan Curzon
To add to my comment just above, we have found a workload that is not
under our own control (which limits what we can do with it, including
sharing it) but which exacerbates the issue on our systems. This
workload causes the issue less than 5% of the time, and that gives us
the chance to look at which callers are using dst_release on the dst in
question (the one that meets conditions A, B, and C; the "ABC dst"). To
clarify, failure at this point for us is when dst->ref_cnt=1 during
free_fib_info_rcu; success is when dst->ref_cnt=0 and free_fib_info_rcu
calls dst_destroy on its nh_rth_input member.

In both failure and non-failure scenarios, the only callers we see are
a single call to ipv4_pktinfo_prepare and many calls to
inet_sock_destruct and tcp_data_queue. That likely rules out a unique
call site of dst_release that can be identified as missing in the
failure scenarios. My interpretation is that a single call to
inet_sock_destruct or tcp_data_queue is failing to occur in the
failure scenario.
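
The comparison described here can be sketched in a few lines of Python.
This is only an illustration of the reasoning, not our systemtap
script: the traces are synthetic, the "hold_site" caller name and the
helper names (summarize, compare, synthetic_trace) are made up, and it
assumes both runs do the same amount of work so that raw counts are
comparable, which is not true of a real workload.

```python
from collections import Counter


def summarize(events):
    """Return (final refcount, dst_release calls per caller) for one run.

    events is an iterable of (op, caller) pairs, op being "hold" or "release".
    """
    refcnt = 0
    releases = Counter()
    for op, caller in events:
        if op == "hold":
            refcnt += 1
        elif op == "release":
            refcnt -= 1
            releases[caller] += 1
        else:
            raise ValueError(f"unknown op: {op!r}")
    return refcnt, releases


def compare(success, failure):
    """Compare release callers between a successful and a failed teardown."""
    _, ok_rel = summarize(success)
    bad_ref, bad_rel = summarize(failure)
    callers = set(ok_rel) | set(bad_rel)
    # Callers that appear in only one of the two runs (a "unique" call site).
    only_in_one = {c for c in callers if (c in ok_rel) != (c in bad_rel)}
    # Callers seen in both runs but with fewer releases in the failed run.
    deficit = {c: ok_rel[c] - bad_rel[c] for c in callers if ok_rel[c] > bad_rel[c]}
    return only_in_one, deficit, bad_ref


def synthetic_trace(tcp_releases):
    """Made-up trace: 5 holds, then releases from the three callers we saw."""
    events = [("hold", "hold_site")] * 5  # "hold_site" is a placeholder name
    events += [("release", "ipv4_pktinfo_prepare")]
    events += [("release", "inet_sock_destruct")] * 2
    events += [("release", "tcp_data_queue")] * tcp_releases
    return events


if __name__ == "__main__":
    print(compare(synthetic_trace(2), synthetic_trace(1)))
```

With these synthetic traces no caller is unique to either run, yet the
failed run ends with a leftover reference, which is the pattern that
leads to the interpretation above.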



[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-03-23 Thread Jordan Curzon
I'm working with Rodrigo Vaz and we've found some details about our
occurrence of this issue using systemtap.

rt_cache_route places a dst_entry struct into a fib_nh struct as the
nh_rth_input. Occasionally the reference counter on that dst has not
dropped to zero by the time free_fib_info_rcu is called on the fib
during container teardown. In that case free_fib_info_rcu doesn't call
dst_destroy, and dev_put is not called on the lo interface of the
container. The only situation where we've seen this is when A)
fib_nh->nh_dev points to the eth0 interface of the container, B) the dst
is part of an rtable struct where rt->rt_is_input==1, and C) the dst
points to the lo interface of the container. The dst is cached in the
fib only once and never replaced, and then thousands of
dst_hold/dst_release calls are made on the dst for connections and
socket buffers.

We have only seen this so far on containers making lots of outbound
connections. It doesn't appear to depend on the lifetime of the
container: some are alive for only 30 minutes and others for 24 hours.
The issue shows up when you try to destroy the container because that
is when the fib is freed. We don't know when or where the dst ref_cnt
becomes incorrect.

We don't know how to reproduce the issue.
