[jira] [Assigned] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well

2019-09-16 Thread Gilbert Song (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-9966:
---

Assignee: Qian Zhang (was: Gilbert Song)

> Agent crashes when trying to destroy orphaned nested container if root 
> container is orphaned as well
> 
>
> Key: MESOS-9966
> URL: https://issues.apache.org/jira/browse/MESOS-9966
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.3
> Reporter: Jan Schlicht
> Assignee: Qian Zhang
> Priority: Major
>
> Noticed an agent crash-looping while trying to recover. It recognized a
> container and its nested container as orphaned. When it tries to destroy the
> nested container, the agent crashes, probably while trying to [get the sandbox
> path of the root
> container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat}
> 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] 
> Recovering Linux launcher
> 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] 
> Recovered container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] 
> Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] 
> Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] 
> Recovered container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] 
> Recovering isolators
> 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started 
> listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started 
> listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] 
> Recovering provisioner
> 2019-09-09 05:04:26: I0909 05:04:26.388226 90010 metadata_manager.cpp:286] 
> Successfully loaded 64 Docker images
> 2019-09-09 05:04:26: I0909 05:04:26.388420 89932 provisioner.cpp:494] 
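
The launcher lines in the log above appear to pair each cgroup path with a dotted container ID: a path such as mesos/<root>/mesos/<child> shows up as container <root>.<child>, while a path that ends in the "mesos" separator is reported as "Not recovering cgroup". The Python sketch below only illustrates that naming pattern as inferred from this log. It is not Mesos code, and the names SEPARATOR, container_id_from_cgroup and root_container_id are hypothetical.

```python
from typing import Optional

# Assumption taken from the log: "mesos" is both the cgroup root and the
# separator placed between a parent container and its nested container.
SEPARATOR = "mesos"


def container_id_from_cgroup(cgroup: str, root: str = "mesos") -> Optional[str]:
    """Return the dotted container ID for a cgroup path, or None when the
    path does not name a container (for example a bare separator cgroup)."""
    parts = cgroup.strip("/").split("/")
    if parts[0] != root:
        return None
    parts = parts[1:]
    # Expected shape: id, SEPARATOR, id, ... ending on an id (odd length).
    if len(parts) % 2 == 0:
        return None
    ids = parts[0::2]
    separators = parts[1::2]
    if any(s != SEPARATOR for s in separators) or any(not i for i in ids):
        return None
    return ".".join(ids)


def root_container_id(container_id: str) -> str:
    """Return the outermost (root) ID of a dotted container ID."""
    return container_id.split(".", 1)[0]
```

For the first nested container in the log, root_container_id yields a127917b-96fe-4100-b73d-5f876ce9ffc1, which is the root container whose sandbox path the report suspects is being looked up when the agent crashes.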

[jira] [Assigned] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well

2019-09-16 Thread Carl Dellar (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Dellar reassigned MESOS-9966:
--

Assignee: Gilbert Song

> Agent crashes when trying to destroy orphaned nested container if root 
> container is orphaned as well
> 
>
> Key: MESOS-9966
> URL: https://issues.apache.org/jira/browse/MESOS-9966
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.3
> Reporter: Jan Schlicht
> Assignee: Gilbert Song
> Priority: Major
>
> Noticed an agent crash-looping while trying to recover. It recognized a
> container and its nested container as orphaned. When it tries to destroy the
> nested container, the agent crashes, probably while trying to [get the sandbox
> path of the root
> container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat}
> 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] 
> Recovering Linux launcher
> 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] 
> Recovered container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] 
> Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] 
> Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] 
> Recovered container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] 
> Recovering isolators
> 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started 
> listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started 
> listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] 
> Recovering provisioner
> 2019-09-09 05:04:26: I0909 05:04:26.388226 90010 metadata_manager.cpp:286] 
> Successfully loaded 64 Docker images
> 2019-09-09 05:04:26: I0909 05:04:26.388420 89932 provisioner.cpp:494] 
> Provisioner