[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-12-15 Thread Rodrigo Vaz
FWIW, here is an update on what I've tried over the last couple of months to fix this problem (unsuccessfully):
- We tried to deny packets to the container's network before we destroy the namespace (a rough sketch of this follows below).
- Backported the patch mentioned in the previous comment to the Ubuntu 3.19 and 4.2 kernels.
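A minimal sketch of the first mitigation, assuming an iptables-based setup; the container name and subnet below are hypothetical placeholders, not our actual configuration:

  # Drop all forwarded traffic to/from the container's subnet (placeholder
  # 10.0.3.0/24) so no in-flight packets hold a reference on the namespace
  iptables -I FORWARD -d 10.0.3.0/24 -j DROP
  iptables -I FORWARD -s 10.0.3.0/24 -j DROP
  # ...then tear the container (and its network namespace) down
  lxc-stop -n mycontainer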

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-02-09 Thread Rodrigo Vaz
I left a couple of instances running with the mainline kernel (http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-rc7-vivid/) over the weekend. It took longer to see the bug on the mainline kernel, but this morning one out of ten instances had the same problem, so I'm assuming mainline is also

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-02-09 Thread Rodrigo Vaz
Meno, I just tried your testcase where you described adding an ipv6gre device to the container and rebooting it, but I couldn't reproduce the netdev hang so far. Do you mind sharing specific details, or even a script, that will reproduce the problem? Mounting an NFS share on my containers is not a
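For reference, roughly what I tried; the container name and tunnel endpoints are placeholders (the kernel's tunnel type is ip6gre):

  # Inside the container: add an ip6gre tunnel (addresses are placeholders)
  lxc-attach -n test -- ip link add gre1 type ip6gre \
      local 2001:db8::1 remote 2001:db8::2
  lxc-attach -n test -- ip link set gre1 up
  # Reboot the container, then watch for the hang in the kernel log
  lxc-stop -n test -r
  dmesg | grep unregister_netdevice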

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-02-05 Thread Rodrigo Vaz
Some additional info:
- The stack trace is always the same as the one posted above, and the point where it blocks seems to be copy_net_ns every time.
- The process that hangs is always lxc-start, in every occurrence that I was able to check.
Rodrigo.

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-02-03 Thread Rodrigo Vaz
Just got an instance with kernel 3.16.0-29-generic #39-Ubuntu (linux-lts-utopic) hitting this bug in production. We don't have a reliable reproducer, so the only way for me to validate is to boot this kernel in production and wait for the bug to happen. Is there anything I can get from an instance

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-01-15 Thread Rodrigo Vaz
Hi, we're hitting this bug on the latest trusty kernel in a context similar to this docker issue. We also had this problem on lucid with a custom 3.8.11 kernel, where it seems to be more aggressive than on trusty, but it still happens: https://github.com/docker/docker/issues/5618 In this issue an upstream

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-01-15 Thread Rodrigo Vaz
From the docker issue it seems that someone couldn't reproduce the bug after downgrading to kernel 3.13.0-32-generic. I can't validate this statement because kernels prior to 3.13.0-35-generic have a regression that crashes my EC2 instances. People testing kernel 3.14.0 also couldn't reproduce the bug.

[Kernel-packages] [Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

2015-01-15 Thread Rodrigo Vaz
Kernel stack trace when the lxc-start process hangs:
[27211131.602770] INFO: task lxc-start:25977 blocked for more than 120 seconds.
[27211131.602785] Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
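For anyone who wants the full stacks of all blocked tasks rather than just this summary, a quick sketch using the standard sysrq interface (assuming sysrq is enabled on the instance):

  # Dump stack traces of all uninterruptible (blocked) tasks to the kernel log
  echo w > /proc/sysrq-trigger
  # Then pull the traces out of the ring buffer
  dmesg | tail -n 200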

[Kernel-packages] [Bug 1391339] Re: Trusty kernel inbound network performance regression when GRO is enabled

2014-11-14 Thread Rodrigo Vaz
Hi Stefan, OK, I think I've figured out why you were unable to reproduce the slowness. As I mentioned earlier, we use the m2 instance type, which runs on underlying Xen 3.4, whereas the t1.micro is probably running on newer infrastructure, so I decided to give m3 a try, and its Xen is indeed newer (4.2)

[Kernel-packages] [Bug 1391339] Re: Trusty kernel inbound network performance regression when GRO is enabled

2014-11-12 Thread Rodrigo Vaz
Hi Stefan, interesting finding indeed. It looks like the author only tested on Xen 4.4, while the EC2 instance type that we use reports Xen 3.4, as the log line below shows: Xen version: 3.4.3.amazon (preserve-AD) I'm using a common S3 URL in my test cases and the results are the same either using
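For anyone who wants to check which Xen version their own instance runs on, a quick sketch (the version is printed at boot by the kernel's Xen guest code; the sysfs path is only there when the hypervisor sysfs interface is available):

  dmesg | grep -i 'xen version'
  # Alternatively, if /sys/hypervisor is populated:
  cat /sys/hypervisor/version/major /sys/hypervisor/version/minor 2>/dev/null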

[Kernel-packages] [Bug 1391339] Re: Trusty kernel inbound network performance regression when GRO is enabled

2014-11-11 Thread Rodrigo Vaz
Hi Kamal, thanks for putting together a PPA with the test kernel, but unfortunately I had the same results:
Linux runtime-common 3.13.0-40-generic #68+g73d3fe6-Ubuntu SMP Tue Nov 11 16:39:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
GRO enabled:
root@runtime-common.23 ~# for i in {1..3}; do ab -n
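The command above is truncated in the archive; the shape of the benchmark loop is roughly the following, with a placeholder URL and request count (not the actual test case):

  # Hypothetical reconstruction: three ApacheBench runs against the same
  # object, reporting only the transfer rate of each run
  for i in {1..3}; do
      ab -n 100 -c 1 http://s3.amazonaws.com/some-bucket/testfile | grep 'Transfer rate'
  done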

[Kernel-packages] [Bug 1391339] [NEW] Trusty kernel inbound network performance regression when GRO is enabled

2014-11-10 Thread Rodrigo Vaz
Public bug reported: After upgrading our EC2 instances from Lucid to Trusty, we noticed an increase in download times; Lucid instances were able to download twice as fast as Trusty ones. After some investigation and testing of older kernels (precise, raring and saucy), we confirmed that this only happens
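Since the regression only appears with GRO on, the two states can be compared with ethtool; a minimal sketch, assuming the instance's interface is eth0:

  # Check whether generic receive offload is currently enabled
  ethtool -k eth0 | grep generic-receive-offload
  # Disable GRO, re-run the download benchmark, then re-enable it
  ethtool -K eth0 gro off
  # ...run benchmark...
  ethtool -K eth0 gro on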

[Kernel-packages] [Bug 1391339] Re: Trusty kernel inbound network performance regression when GRO is enabled

2014-11-10 Thread Rodrigo Vaz
FWIW, here is the speed for the Lucid instances with a custom 3.8.11 kernel, which I've used as a baseline:
Transfer rate: 93501.64 [Kbytes/sec] received
Transfer rate: 84949.88 [Kbytes/sec] received
Transfer rate: 84795.65 [Kbytes/sec] received
Rodrigo.

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-07-10 Thread Rodrigo Vaz
I verified the kernel in -proposed (3.13.0-32-generic) and could not reproduce the bug using our test case. No crashes.

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-06-30 Thread Rodrigo Vaz
I can confirm the test kernels are good; I couldn't reproduce the bug in our environment.

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-06-11 Thread Rodrigo Vaz
Chris, I've tested this patch too and it prevents the crash in our test case as well. The new patch applied cleanly on the Ubuntu kernel.

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-06-09 Thread Rodrigo Vaz
I still haven't had any luck generating a crashdump, but with a small change to the patch posted on the upstream bug I can confirm the crash doesn't happen anymore; tested on the Ubuntu trusty kernel 3.13.0-24-generic. ** Patch added: upstream workaround patch

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-05-18 Thread Rodrigo Vaz
I still wasn't able to get kdump loaded for a crashdump on this EC2 instance, although I was able to capture lockdep output with the container running and when it gets killed, which is just before the crash happens. ** Attachment added: lockdep.txt
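For reference, the kdump setup I've been attempting is the standard Ubuntu tooling; a rough sketch (whether the capture kernel actually loads on this EC2 instance type is exactly the problem):

  # Install the crashdump tooling; it reserves memory for a capture kernel
  apt-get install linux-crashdump
  # After rebooting, verify the capture kernel is loaded
  kdump-config show
  # If it reports ready, a panic should write a dump under /var/crash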

[Kernel-packages] [Bug 1314274] Re: BUG in nf_nat_cleanup_conntrack

2014-05-06 Thread Rodrigo Vaz
Chris, I work with Steve and was able to reproduce with lockdep debugging; the output is as follows:
[18075576.538133] BUG: unable to handle kernel paging request at c900038ebac8
[18075576.538153] IP: [a013d1a1] nf_nat_cleanup_conntrack+0x41/0x70 [nf_nat]
[18075576.538166] PGD