[ https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone reassigned MESOS-9334:
---------------------------------

    Shepherd: Gilbert Song
    Assignee: Qian Zhang
      Sprint: Mesosphere RI-6 Sprint 2018-31
Story Points: 5

> Container stuck at ISOLATING state due to libevent poll never returns
> ---------------------------------------------------------------------
>
>                 Key: MESOS-9334
>                 URL: https://issues.apache.org/jira/browse/MESOS-9334
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Critical
>
> We found that a UCR container may get stuck in the `ISOLATING` state:
> {code:java}
> 2018-10-03 09:13:23: I1003 09:13:23.274561 2355 containerizer.cpp:3122] Transitioning the state of container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 from PREPARING to ISOLATING
> 2018-10-03 09:13:23: I1003 09:13:23.279223 2354 cni.cpp:962] Bind mounted '/proc/5244/ns/net' to '/run/mesos/isolators/network/cni/1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54/ns' for container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54
> 2018-10-03 09:23:22: I1003 09:23:22.879868 2354 containerizer.cpp:2459] Destroying container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 in ISOLATING state
> {code}
> In the above logs, container `1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54`
> transitioned to `ISOLATING` at 09:13:23 but did not transition to any other
> state until it was destroyed due to the executor registration timeout
> (10 minutes). The destroy can never complete, because it has to wait for the
> container to finish isolating.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)