[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
[Expired for linux (Ubuntu) because there has been no activity for 60 days.] ** Changed in: linux (Ubuntu) Status: Incomplete => Expired -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1781601 Title: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1781601/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
[Expired for linux (Ubuntu Bionic) because there has been no activity for 60 days.] ** Changed in: linux (Ubuntu Bionic) Status: Incomplete => Expired
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
The system is largely unresponsive. I got this though:

ubuntu@myserver:~$ free
              total        used        free      shared  buff/cache   available
Mem:       65685948    48896836      803588      469172    15985524    15572340
Swap:       1996796       51712     1945084
ubuntu@myserver:~$

It appears that the memory has been exhausted. Do kernel errors from resource exhaustion count as bugs? I am changing from CONFIRMED to NEW.

** Changed in: linux (Ubuntu)
   Status: Confirmed => New

** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => New
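As a sanity check on the `free` output above, here is a minimal sketch that reads the six Mem: columns as total, used, free, shared, buff/cache and available (all in kB) and verifies that they add up. It assumes the procps convention that used = total - free - buff/cache; the dictionary keys and the helper name `accounted` are made up for illustration.

```python
# Values in kB, copied from the `free` output quoted in this comment.
mem = {
    "total": 65685948,
    "used": 48896836,
    "free": 803588,
    "shared": 469172,
    "buff_cache": 15985524,
    "available": 15572340,
}
swap = {"total": 1996796, "used": 51712, "free": 1945084}


def accounted(row):
    """Sum the columns that should add up to the row's total.

    Assumption: used + free + buff/cache == total (the Swap row has no
    buff/cache column, so it defaults to 0 there).
    """
    return row["used"] + row["free"] + row.get("buff_cache", 0)


mem_consistent = accounted(mem) == mem["total"]
swap_consistent = accounted(swap) == swap["total"]

# Share of RAM the kernel estimates could still be handed out without swapping.
available_fraction = mem["available"] / mem["total"]

print(f"Mem columns consistent:  {mem_consistent}")
print(f"Swap columns consistent: {swap_consistent}")
print(f"available / total:       {available_fraction:.1%}")
```

This only checks that the columns are self-consistent and reports the available share; it does not by itself say whether the hang was caused by memory pressure.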
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
I tried the following command:

ubuntu@myserver:~$ lxd-benchmark launch --count 900 --parallel 24 ubuntu:18.04
Test environment:
  Server backend: lxd
  Server version: 3.0.2
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.15.0-36-generic
  Storage backend: zfs
  Storage version: 0.7.5-1ubuntu16.3
  Container backend: lxc
  Container version: 3.0.2

Test variables:
  Container count: 900
  Container mode: unprivileged
  Startup mode: normal startup
  Image: ubuntu:18.04
  Batches: 37
  Batch size: 24
  Remainder: 12

[Oct 3 21:46:55.617] Found image in local store: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
[Oct 3 21:46:55.617] Batch processing start
[Oct 3 21:47:09.310] Processed 24 containers in 13.693s (1.753/s)
[Oct 3 21:47:26.739] Processed 48 containers in 31.122s (1.542/s)
[Oct 3 21:48:06.052] Processed 96 containers in 70.435s (1.363/s)
[Oct 3 21:49:27.340] Processed 192 containers in 151.723s (1.265/s)
^C

I interrupted the benchmark because it got stuck.

Note:
1. I am running Ubuntu 18.04.
2. With the updated kernel 4.15.0-36. I did not try the proposed kernel.
3. With LXD 3.0.2 (from bionic/proposed)
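For reference, the batch figures and per-second rates in the benchmark output above can be reproduced with simple arithmetic. The sketch below is not lxd-benchmark's own code; the function names `batch_plan` and `rate` are hypothetical, and it only assumes that the tool splits the container count into full batches of the --parallel size plus a remainder, and reports containers processed divided by elapsed seconds.

```python
def batch_plan(count, parallel):
    """Return (full batches, remainder) for `count` containers in batches of `parallel`."""
    return divmod(count, parallel)


def rate(processed, seconds):
    """Containers processed per second since the start of the run."""
    return processed / seconds


# --count 900 --parallel 24, as in the command above.
batches, remainder = batch_plan(900, 24)
print(f"Batches: {batches}, Remainder: {remainder}")

# (containers processed, elapsed seconds) taken from the progress lines above.
progress = [(24, 13.693), (48, 31.122), (96, 70.435), (192, 151.723)]
rates = [round(rate(n, t), 3) for n, t in progress]
for (n, t), r in zip(progress, rates):
    print(f"Processed {n} containers in {t}s ({r}/s)")
```

The rate falls at every reported step (from about 1.75/s to about 1.27/s) before the run stalls, so the slowdown is visible well before the hang.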
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
I got the same bug again. Here are the kernel messages:

[ 1450.993972] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1451.000279] Tainted: P O 4.15.0-36-generic #39-Ubuntu
[ 1451.007094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.014957] systemd D0 1 0 0x
[ 1451.014960] Call Trace:
[ 1451.014969] __schedule+0x291/0x8a0
[ 1451.014971] schedule+0x2c/0x80
[ 1451.014973] schedule_preempt_disabled+0xe/0x10
[ 1451.014974] __mutex_lock.isra.2+0x18c/0x4d0
[ 1451.014976] __mutex_lock_slowpath+0x13/0x20
[ 1451.014978] ? __mutex_lock_slowpath+0x13/0x20
[ 1451.014979] mutex_lock+0x2f/0x40
[ 1451.014982] proc_cgroup_show+0x4c/0x2a0
[ 1451.014985] proc_single_show+0x56/0x80
[ 1451.014988] seq_read+0xe5/0x430
[ 1451.014990] __vfs_read+0x1b/0x40
[ 1451.014991] vfs_read+0x8e/0x130
[ 1451.014992] SyS_read+0x55/0xc0
[ 1451.014995] do_syscall_64+0x73/0x130
[ 1451.014997] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.014999] RIP: 0033:0x7fc9a5300081
[ 1451.015000] RSP: 002b:7ffcdf16ab48 EFLAGS: 0246 ORIG_RAX:
[ 1451.015002] RAX: ffda RBX: 55c5a2612290 RCX: 7fc9a5300081
[ 1451.015003] RDX: 0400 RSI: 55c5a269b4d0 RDI: 0026
[ 1451.015004] RBP: 0d68 R08: 0001 R09:
[ 1451.015004] R10: R11: 0246 R12: 7fc9a55d7760
[ 1451.015005] R13: 7fc9a55d82a0 R14: 55c5a2612290 R15: 07ff
[ 1451.015077] INFO: task systemd-journal:811 blocked for more than 120 seconds.
[ 1451.022239] Tainted: P O 4.15.0-36-generic #39-Ubuntu
[ 1451.029073] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.036938] systemd-journal D0 811 1 0x0120
[ 1451.036942] Call Trace:
[ 1451.036950] __schedule+0x291/0x8a0
[ 1451.036954] ? ___slab_alloc+0x20a/0x4b0
[ 1451.036956] schedule+0x2c/0x80
[ 1451.036957] schedule_preempt_disabled+0xe/0x10
[ 1451.036958] __mutex_lock.isra.2+0x18c/0x4d0
[ 1451.036960] __mutex_lock_slowpath+0x13/0x20
[ 1451.036962] ? __mutex_lock_slowpath+0x13/0x20
[ 1451.036963] mutex_lock+0x2f/0x40
[ 1451.036966] proc_cgroup_show+0x4c/0x2a0
[ 1451.036969] proc_single_show+0x56/0x80
[ 1451.036972] seq_read+0xe5/0x430
[ 1451.036975] __vfs_read+0x1b/0x40
[ 1451.036978] vfs_read+0x8e/0x130
[ 1451.036981] SyS_read+0x55/0xc0
[ 1451.036985] do_syscall_64+0x73/0x130
[ 1451.036988] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.036991] RIP: 0033:0x7f331df97081
[ 1451.036993] RSP: 002b:7ffd719a04c8 EFLAGS: 0246 ORIG_RAX:
[ 1451.036997] RAX: ffda RBX: 56046305a6c0 RCX: 7f331df97081
[ 1451.037000] RDX: 0400 RSI: 560463012a10 RDI: 001f
[ 1451.037001] RBP: 0d68 R08: 0001 R09:
[ 1451.037003] R10: R11: 0246 R12: 7f331e26e760
[ 1451.037005] R13: 7f331e26f2a0 R14: 56046305a6c0 R15: 07ff
[ 1451.037029] INFO: task lxcfs:39982 blocked for more than 120 seconds.
[ 1451.043498] Tainted: P O 4.15.0-36-generic #39-Ubuntu
[ 1451.050308] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.058165] lxcfs D0 39982 1 0x
[ 1451.058167] Call Trace:
[ 1451.058172] __schedule+0x291/0x8a0
[ 1451.058174] schedule+0x2c/0x80
[ 1451.058175] schedule_preempt_disabled+0xe/0x10
[ 1451.058176] __mutex_lock.isra.2+0x18c/0x4d0
[ 1451.058178] __mutex_lock_slowpath+0x13/0x20
[ 1451.058179] ? __mutex_lock_slowpath+0x13/0x20
[ 1451.058180] mutex_lock+0x2f/0x40
[ 1451.058182] proc_cgroup_show+0x4c/0x2a0
[ 1451.058184] proc_single_show+0x56/0x80
[ 1451.058185] seq_read+0xe5/0x430
[ 1451.058187] __vfs_read+0x1b/0x40
[ 1451.058188] vfs_read+0x8e/0x130
[ 1451.058189] SyS_read+0x55/0xc0
[ 1451.058191] do_syscall_64+0x73/0x130
[ 1451.058192] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.058194] RIP: 0033:0x7fe0461c70b4
[ 1451.058194] RSP: 002b:7fe036ffc870 EFLAGS: 0246 ORIG_RAX:
[ 1451.058196] RAX: ffda RBX: 001a RCX: 7fe0461c70b4
[ 1451.058196] RDX: 0400 RSI: 7fdfb00231e0 RDI: 001a
[ 1451.058197] RBP: 7fdfb00231e0 R08: 0001 R09:
[ 1451.058198] R10: R11: 0246 R12: 0400
[ 1451.058198] R13: 7fe04649f2a0 R14: R15:
[ 1451.058200] INFO: task lxcfs:118730 blocked for more than 120 seconds.
[ 1451.064746] Tainted: P O 4.15.0-36-generic #39-Ubuntu
[ 1451.071559] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.079411] lxcfs D0 118730 1 0x
[ 1451.079415] Call Trace:
[ 1451.079418] __schedule+0x291/0x8a0
[ 1451.079421] ?
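To summarise which tasks the hung-task detector reported, something along these lines could be run over saved kernel messages. It is a rough sketch only: the regular expression is written against the "INFO: task NAME:PID blocked for more than N seconds" lines quoted above, the helper name `hung_tasks` is hypothetical, and task names containing spaces would not be matched.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task lxcfs:39982 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(log_text):
    """Return a (task name, pid, timeout in seconds) tuple per hung-task report."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_TASK.finditer(log_text)
    ]


# The four report headers from the kernel messages above.
SAMPLE = """\
[ 1450.993972] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1451.015077] INFO: task systemd-journal:811 blocked for more than 120 seconds.
[ 1451.037029] INFO: task lxcfs:39982 blocked for more than 120 seconds.
[ 1451.058200] INFO: task lxcfs:118730 blocked for more than 120 seconds.
"""

reports = hung_tasks(SAMPLE)
per_task = Counter(name for name, _pid, _timeout in reports)

for name, pid, timeout in reports:
    print(f"{name} (pid {pid}) blocked for more than {timeout}s")
print(dict(per_task))
```

In the complete traces above (systemd, systemd-journal and the first lxcfs), each task is waiting in mutex_lock called from proc_cgroup_show, so they appear to be blocked on the same lock; the last trace is cut off. The script only lists the tasks and does not identify which task holds the lock.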
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
I have been stress-testing LXD on a freshly installed 18.04. The Linux kernel was the standard 18.04 kernel. LXD, though, was compiled from master, and ZFS was also compiled from master. I performed the stress-testing by running the command

lxd-benchmark --count 384 --parallel 24

This launches 384 Ubuntu 16.04 containers in batches of 24 containers. LXD does not run well on the mainline Linux kernel because, I think, some necessary patches have not been upstreamed yet. I plan to do more stress testing, and when I hit the same issue, I'll run apport to retrieve information from the system to attach here. Obviously, when I deploy the server, I'll prepare it for apport beforehand.
[Bug 1781601] Re: Stress-testing LXD causes kernel hung in cgroups (cgroup_destroy css_killed_work_fn)
Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem? Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18 kernel[0]. If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'. If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'. Once testing of the upstream kernel is complete, please mark this bug as "Confirmed". Thanks in advance. [0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc5 ** Changed in: linux (Ubuntu) Importance: Undecided => High ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Bionic) Importance: Undecided => High ** Changed in: linux (Ubuntu Bionic) Status: New => Incomplete