[ https://issues.apache.org/jira/browse/MESOS-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mansheng Yang updated MESOS-5632:
---------------------------------
    Description:
[This ticket|https://issues.apache.org/jira/browse/MESOS-3573] is marked as resolved but it was only partially fixed.
As mentioned in that ticket, if you start a docker container and then kill the docker-executor process, a new container is started but the old one is still there.

Some logs:
{noformat}
I0617 15:01:22.851604 7285 docker.cpp:877] Recovering container '71695f70-afad-421d-8636-deb6724ecaca' for executor 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework '317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000'
I0617 15:01:22.853303 7285 docker.cpp:2107] Executor for container '71695f70-afad-421d-8636-deb6724ecaca' has exited
I0617 15:01:22.853327 7285 docker.cpp:1826] Destroying container '71695f70-afad-421d-8636-deb6724ecaca'
I0617 15:01:22.853575 7285 docker.cpp:1954] Running docker stop on container '71695f70-afad-421d-8636-deb6724ecaca'
I0617 15:01:22.853607 7285 docker.cpp:1956] Running docker stop on container 'mesos-cbb3d52c-b6dd-4b7e-864d-705fc2fab983-S4.71695f70-afad-421d-8636-deb6724ecaca'0
I0617 15:01:22.854801 7283 slave.cpp:4767] Sending reconnect request to executor 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 at executor(1)@127.0.1.1:56304
E0617 15:01:22.855870 7283 process.cpp:2040] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0617 15:01:22.855974 7283 slave.cpp:4118] Termination of executor 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 failed: Unknown container: 71695f70-afad-421d-8636-deb6724ecaca
I0617 15:01:22.857015 7283 slave.cpp:3257] Handling status update TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 from @0.0.0.0:0
W0617 15:01:22.858330 7288 docker.cpp:1403] Ignoring updating unknown container: 71695f70-afad-421d-8636-deb6724ecaca
I0617 15:01:22.858819 7288 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000
I0617 15:01:22.858986 7288 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000
W0617 15:01:22.920336 7289 slave.cpp:3601] Dropping status update TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 sent by status update manager because the agent is in RECOVERING state
{noformat}

> Orphaned docker container not killed if executor has exited
> -----------------------------------------------------------
>
>                 Key: MESOS-5632
>                 URL: https://issues.apache.org/jira/browse/MESOS-5632
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker, slave
>            Reporter: Mansheng Yang
>
> [This ticket|https://issues.apache.org/jira/browse/MESOS-3573] is marked as
> resolved but it was only partially fixed.
> As mentioned in that ticket, if you start a docker container and then kill
> the docker-executor process, a new container is started but the old one is
> still there.
> Some logs:
> {noformat}
> I0617 15:01:22.851604 7285 docker.cpp:877] Recovering container
> '71695f70-afad-421d-8636-deb6724ecaca' for executor
> 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework
> '317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000'
> I0617 15:01:22.853303 7285 docker.cpp:2107] Executor for container
> '71695f70-afad-421d-8636-deb6724ecaca' has exited
> I0617 15:01:22.853327 7285 docker.cpp:1826] Destroying container
> '71695f70-afad-421d-8636-deb6724ecaca'
> I0617 15:01:22.853575 7285 docker.cpp:1954] Running docker stop on container
> '71695f70-afad-421d-8636-deb6724ecaca'
> I0617 15:01:22.853607 7285 docker.cpp:1956] Running docker stop on container
> 'mesos-cbb3d52c-b6dd-4b7e-864d-705fc2fab983-S4.71695f70-afad-421d-8636-deb6724ecaca'0
> I0617 15:01:22.854801 7283 slave.cpp:4767] Sending reconnect request to
> executor 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 at executor(1)@127.0.1.1:56304
> E0617 15:01:22.855870 7283 process.cpp:2040] Failed to shutdown socket with
> fd 10: Transport endpoint is not connected
> E0617 15:01:22.855974 7283 slave.cpp:4118] Termination of executor
> 'kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d' of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 failed: Unknown container:
> 71695f70-afad-421d-8636-deb6724ecaca
> I0617 15:01:22.857015 7283 slave.cpp:3257] Handling status update
> TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task
> kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 from @0.0.0.0:0
> W0617 15:01:22.858330 7288 docker.cpp:1403] Ignoring updating unknown
> container: 71695f70-afad-421d-8636-deb6724ecaca
> I0617 15:01:22.858819 7288 status_update_manager.cpp:320] Received status
> update TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task
> kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000
> I0617 15:01:22.858986 7288 status_update_manager.cpp:824] Checkpointing
> UPDATE for status update TASK_FAILED (UUID:
> b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task
> kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000
> W0617 15:01:22.920336 7289 slave.cpp:3601] Dropping status update
> TASK_FAILED (UUID: b5dfa1dc-62db-4fb5-93c8-958d22f930df) for task
> kafka2.3802f3c9-3459-11e6-bf06-6e0c5199624d of framework
> 317ab6ce-d599-4ad4-bae2-eb74a6c42d87-0000 sent by status update manager
> because the agent is in RECOVERING state
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)