Public bug reported:
Hi guys,
For background: I'm running a container with an NFS filesystem bind
mounted into it. The workload I'm running is iozone, a filesystem
benchmarking tool. While running this workload, I attempt to freeze the
container, which gets stuck in the FREEZING state. After a while, I get:
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.104156] INFO: task iozone:20035 blocked for more than 120 seconds.
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.111056] Tainted: P O 4.4.0-24-generic #43-Ubuntu
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.118053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126110] iozone D ffff880015673e18 0 20035 20005 0x00000104
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126116] ffff880015673e18 ffff880000000010 ffff880045a21b80 ffff880037776e00
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126118] ffff880015674000 ffff8800179d6e54 ffff880037776e00 00000000ffffffff
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126120] ffff8800179d6e58 ffff880015673e30 ffffffff81821b15 ffff8800179d6e50
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126121] Call Trace:
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126129] [<ffffffff81821b15>] schedule+0x35/0x80
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126131] [<ffffffff81821dbe>] schedule_preempt_disabled+0xe/0x10
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126134] [<ffffffff818239f9>] __mutex_lock_slowpath+0xb9/0x130
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126136] [<ffffffff81823a8f>] mutex_lock+0x1f/0x30
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126139] [<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126142] [<ffffffff8121dc16>] SyS_unlink+0x16/0x20
Jul 1 01:45:14 juju-19f8e3-15 kernel: [206520.126146] [<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71
It looks like the task is blocked on a mutex in do_unlinkat, i.e. in
generic fs code rather than anything NFS-specific, though the NFS mount
may still be a relevant detail. The task's kernel stack shows the same:
ubuntu@juju-19f8e3-15:~$ sudo cat /proc/20035/stack
[<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
[<ffffffff8121dc16>] SyS_unlink+0x16/0x20
[<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
The container and host are both xenial:
ubuntu@juju-19f8e3-15:~$ uname -a
Linux juju-19f8e3-15 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Finally, I don't have a good reproducer for this. It's pretty rare, as
I'm running this benchmark in a loop, and over thousands of runs I've
seen this exactly once.
I'll leave these hosts up for a bit in case there are any other
interesting bits of info to collect.
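For anyone trying to catch the same state on another host, here is a
minimal sketch of how the check could be automated. It assumes the
cgroup v1 freezer, whose freezer.state file reads THAWED, FREEZING or
FROZEN, and that /proc/<pid>/stack is readable (root). The function
names and the paths passed to them are illustrative assumptions, not
part of any existing tool.

```python
import time
from pathlib import Path


def wait_for_frozen(state_file, timeout=10.0, interval=0.5):
    """Poll a cgroup v1 freezer.state file until it reads FROZEN.

    Returns the last state seen, so the caller can tell a completed
    freeze (FROZEN) from one still stuck in FREEZING at the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = Path(state_file).read_text().strip()
        if state == "FROZEN" or time.monotonic() >= deadline:
            return state
        time.sleep(interval)


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack of a task as a list of lines.

    proc_root is a parameter only so the function can be exercised
    against a fake directory tree; on a real host it stays "/proc".
    """
    return (Path(proc_root) / str(pid) / "stack").read_text().splitlines()
```

If wait_for_frozen() returns "FREEZING", calling read_kernel_stack() on
each PID listed in the cgroup's tasks file would show which task is
holding up the freeze. The freezer.state path depends on how the
container's cgroup is laid out, so it has to be supplied by the caller.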
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1598285
Title:
possible deadlock while using the cgroup freezer on a container with
NFS-based workload