Hello,

I'm trying to debug a TASK_LOST that is generated on the agent and that I see on rare occasions.
Below is a log that I'm trying to understand. It was produced after driver.sendStatusUpdate() was called with a task state of TASK_FINISHED from a Java executor. It looks to me as though the container has already exited before the TASK_FINISHED update is processed. Is there a timing issue in this version of Mesos that causes this?

The effect of this problem is that, even though the executor's work is complete and the executor calls sendStatusUpdate() with TASK_FINISHED, the task is marked as LOST and the actual TASK_FINISHED update is ignored.

I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because: Failed to get usage: No process found at 28952
I0108 10:16:52.899657 37278 slave.cpp:2898] Executor 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework 20171208-050805-140555025-5050-3470-0000 exited with status 0
I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0
I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task ff631ad1-cfab-493e-be18-961581abcf3d
W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000
I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000 to the slave
I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000 to [email protected]:5050
I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager successfully handled status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000
I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000
I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000
I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager successfully handled status update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task ff631ad1-cfab-493e-be18-961581abcf3d of framework 20171208-050805-140555025-5050-3470-0000
I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task ff631ad1-cfab-493e-be18-961581abcf3d
I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework 20171208-050805-140555025-5050-3470-0000

Regards,
Ajay

