0.21 is really old and not supported. I highly recommend you upgrade to 1.3+.
Regarding what you are seeing, we definitely had issues in the past where the command executor didn't stay up long enough to guarantee that TASK_FINISHED was delivered to the agent; so races like above were possible. On Tue, Jan 9, 2018 at 5:33 PM, Ajay V <ajayv...@gmail.com> wrote: > Hello, > > I'm trying to debug a TASK_LOST thats generated on the agent that I see on > rare occasions. > > Following is a log that I'm trying to understand. This is happening after > the driver.sendStatusUpdate() has been called with a task state of > TASK_FINISHED from a java executor. It looks to me like the container is > already exited before the TASK_FINISHED is processed. Is there a timing > issue here in this version of mesos that is causing this? The effect of > this problem is that, even though the work of the executor is complete and > the executor calls the sendStatusUpdate with a TASK_FINISHED, the task is > marked as LOST and the actual update of TASK_FINISHED is ignored. > > I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container > 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited > > I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container > 'bb0e5f2d-4bdb-479c-b829-4741993c4109' > > W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown > container 'bb0e5f2d-4bdb-479c-b829-4741993c4109' > > W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource > statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because: > Failed to get usage: No process found at 28952 > > I0108 10:16:52.899657 37278 slave.cpp:2898] Executor > 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework > 20171208-050805-140555025-5050-3470-0000 exited with status 0 > > I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update > TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task > ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0 > > I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task > ff631ad1-cfab-493e-be18-961581abcf3d > > W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for > unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109 > > I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status > update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task > ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 > > I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding > update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task > ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 to the slave > > I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update > TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task > ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 to master@17.179.96.8:5050 > > I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager > successfully handled status update TASK_LOST (UUID: > f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) > for task ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 > > I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status > update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for > task ff631ad1-cfab-493e-be18-961581abcf3d of framework > 20171208-050805-140555025-5050-3470-0000 > > I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up > status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of > framework 20171208-050805-140555025-5050-3470-0000 > > I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager > successfully handled status update acknowledgement (UUID: > f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task > ff631ad1-cfab-493e-be18-961581abcf3d > of framework 20171208-050805-140555025-5050-3470-0000 > > I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task > ff631ad1-cfab-493e-be18-961581abcf3d > > I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor > 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework > 20171208-050805-140555025-5050-3470-0000 > > Regards, > Ajay >