Public bug reported:

I'm using docker-compose to wrangle a bunch of containers, and quite
often one of my containers hangs.

When I inspect the process with strace, I see it's blocked on write(2,
"...").

/proc/$container_pid/fd/2 is a pipe, the other end of which is managed
by a containerd-shim process.
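
For anyone wanting to repeat that check, here is a minimal sketch of how the other end of the pipe can be located: every /proc/PID/fd/N symlink that points at a pipe reads "pipe:[INODE]", so any process holding a descriptor with the same inode shares that pipe. The function names are hypothetical (they come from no existing tool), and inspecting other users' processes generally needs root.

```python
import os
import re

# /proc/<pid>/fd/<n> symlinks to pipes look like "pipe:[12345]".
PIPE_RE = re.compile(r'^pipe:\[(\d+)\]$')


def pipe_inode(link_target):
    """Return the pipe inode encoded in a /proc fd symlink target, or None."""
    m = PIPE_RE.match(link_target)
    return int(m.group(1)) if m else None


def find_pipe_holders(inode, proc_root='/proc'):
    """Return sorted (pid, fd) pairs for every readable fd that refers to the pipe."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, 'fd')
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if pipe_inode(target) == inode:
                holders.append((int(entry), int(fd)))
    return sorted(holders)
```

Usage would be along the lines of find_pipe_holders(pipe_inode(os.readlink("/proc/%d/fd/2" % container_pid))), where container_pid is the PID of the hung process.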

strace -p $containerd_shim_pid shows it blocked in futex(0xad3848,
FUTEX_WAIT_PRIVATE, 0, NULL).

After enough time passes (around 5 or 10 minutes?), containerd-shim
crashes with a SIGABRT.  This time I still had strace attached:


strace: Process 861273 attached
futex(0xad3848, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGABRT {si_signo=SIGABRT, si_code=SI_USER, si_pid=867057, si_uid=0} ---
nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
write(2, "SIGABRT: abort", 14)          = 14
write(2, "\n", 1)                       = 1
write(2, "PC=", 3)                      = 3
write(2, "0x45c791", 8)                 = 8
write(2, " m=", 3)                      = 3
write(2, "0", 1)                        = 1
write(2, " sigcode=", 9)                = 9
write(2, "0", 1)                        = 1
write(2, "\n", 1)                       = 1
write(2, "\n", 1)                       = 1
write(2, "goroutine ", 10)              = 10
write(2, "0", 1)                        = 1
write(2, " [", 2)                       = 2
write(2, "idle", 4)                     = 4
write(2, "]:\n", 3)                     = 3
write(2, "runtime.futex", 13)           = 13
...
write(2, "rflags ", 7)                  = 7
write(2, "0x286", 5)                    = 5
write(2, "\n", 1)                       = 1
write(2, "cs     ", 7)                  = 7
write(2, "0x33", 4)                     = 4
write(2, "\n", 1)                       = 1
write(2, "fs     ", 7)                  = 7
write(2, "0x0", 3)                      = 3
write(2, "\n", 1)                       = 1
write(2, "gs     ", 7)                  = 7
write(2, "0x0", 3)                      = 3
write(2, "\n", 1)                       = 1
exit_group(2)                           = ?
+++ exited with 2 +++


(Full strace log attached, unless I forget)

It would be nice if I could read that Go traceback somewhere instead of
piecing it together from truncated strace writes, but I don't know where
it gets logged.
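
As a stopgap, the traceback text can be stitched back together from the strace log itself. The sketch below is only an illustration under a few assumptions: the log uses strace's default line format as in the excerpt above, the helper names are hypothetical, and any string that strace truncated (it prints only the first 32 bytes of a string by default; the -s option raises that limit) stays truncated in the result.

```python
import re

# Matches e.g.:  write(2, "SIGABRT: abort", 14)          = 14
# An optional "[pid N] " or "N " prefix (strace -f / -o output) is tolerated.
WRITE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'write\((\d+), "((?:[^"\\]|\\.)*)"(?:\.\.\.)?, \d+\)\s*=\s*\d+'
)
# C-style escapes as printed by strace: \xHH, octal \NNN, or a single character.
ESCAPE_RE = re.compile(r'\\(x[0-9a-fA-F]{2}|[0-7]{1,3}|.)')
SIMPLE = {'n': b'\n', 't': b'\t', 'r': b'\r', 'v': b'\v', 'f': b'\f'}


def decode_strace_string(s):
    """Turn the quoted payload of a strace string argument back into bytes."""
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(s):
        out += s[pos:m.start()].encode('utf-8')
        tok = m.group(1)
        if len(tok) == 3 and tok[0] == 'x':
            out.append(int(tok[1:], 16))
        elif tok[0] in '01234567':
            out.append(int(tok, 8) & 0xFF)
        else:
            # \n, \t, ... or an escaped literal such as \" and \\
            out += SIMPLE.get(tok, tok.encode('utf-8'))
        pos = m.end()
    out += s[pos:].encode('utf-8')
    return bytes(out)


def reconstruct_writes(lines, fd=2):
    """Concatenate the payloads of all successful write(fd, ...) calls."""
    chunks = []
    for line in lines:
        m = WRITE_RE.match(line)
        if m and int(m.group(1)) == fd:
            chunks.append(decode_strace_string(m.group(2)))
    return b''.join(chunks).decode('utf-8', errors='replace')
```

Feeding it the lines of the attached log, e.g. reconstruct_writes(open("containerd-shim-strace.log")), should give back whatever the shim wrote to stderr, up to strace's truncation.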

journalctl -u containerd shows only this:

rugs. 15 12:10:32 blynas containerd[1133]: time="2020-09-15T12:10:32.805932666+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/a65286bda8fa7242d30c2a351a60d90c344ee6d8af60a7487b1efe75014914c3.sock debug=false pid=861273
rugs. 15 12:10:33 blynas containerd[1133]: time="2020-09-15T12:10:33.489963805+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/863c8b7d5dc91b24ae633dbf6efc21b2d9f4973ee6dd4bdf2cb855556868f9de.sock debug=false pid=861500
rugs. 15 12:10:34 blynas containerd[1133]: time="2020-09-15T12:10:34.197921760+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/08c9c71a81352069f02b14389fb0a636484071e76a443665cb3eefa781a86f3a.sock debug=false pid=861671
rugs. 15 12:11:03 blynas containerd[1133]: time="2020-09-15T12:11:03.521908472+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/32b7b8e90d39f2cc59125f5072f7b6012a2e784dc79101694fa749bdfedee8bc.sock debug=false pid=862495
rugs. 15 12:20:09 blynas containerd[1133]: time="2020-09-15T12:20:09.424130178+03:00" level=info msg="shim reaped" id=b76c63912cae0a668aa1f0b9baa2bd74a6c5bbb6a32c1bcae351028cfb101f78
rugs. 15 12:20:09 blynas containerd[1133]: time="2020-09-15T12:20:09.424177796+03:00" level=warning msg="cleaning up after shim dead" id=b76c63912cae0a668aa1f0b9baa2bd74a6c5bbb6a32c1bcae351028cfb101f78 namespace=moby
rugs. 15 12:20:12 blynas containerd[1133]: time="2020-09-15T12:20:12.765586972+03:00" level=info msg="shim reaped" id=4ea9c0c055e7df514cb4cdb7b6aae7db2ecfea299b712907a229e464827ef219

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: containerd 1.3.3-0ubuntu2
ProcVersionSignature: Ubuntu 5.4.0-47.51-generic 5.4.55
Uname: Linux 5.4.0-47-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.8
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue Sep 15 12:20:56 2020
EcryptfsInUse: Yes
InstallationDate: Installed on 2019-06-12 (460 days ago)
InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
SourcePackage: containerd
UpgradeStatus: Upgraded to focal on 2020-04-24 (143 days ago)

** Affects: containerd (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug focal wayland-session

** Attachment added: "strace output during the hang and subsequent SIGABRT"
   https://bugs.launchpad.net/bugs/1895647/+attachment/5410954/+files/containerd-shim-strace.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1895647

Title:
  containerd-shim deadlocks, then crashes

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1895647/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs