0.21 is really old and not supported. I highly recommend you upgrade to
1.3+.

Regarding what you are seeing, we definitely had issues in the past where
the command executor didn't stay up long enough to guarantee that
TASK_FINISHED was delivered to the agent; so races like above were possible.

On Tue, Jan 9, 2018 at 5:33 PM, Ajay V <ajayv...@gmail.com> wrote:

> Hello,
>
> I'm trying to debug a TASK_LOST thats generated on the agent that I see on
> rare occasions.
>
> Following is a log that I'm trying to understand. This is happening after
> the driver.sendStatusUpdate() has been called with a task state of
> TASK_FINISHED from a java executor. It looks to me like the container is
> already exited before the TASK_FINISHED  is processed. Is there a timing
> issue here in this version of mesos that is causing this? The effect of
> this problem is that, even though the work of the executor is complete and
> the executor calls the sendStatusUpdate with a TASK_FINISHED, the task is
> marked as LOST and the actual update of TASK_FINISHED is ignored.
>
> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>
> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
> Failed to get usage: No process found at 28952
>
> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470-0000 exited with status 0
>
> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0
>
> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>
> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000
>
> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000 to the slave
>
> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000 to master@17.179.96.8:5050
>
> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
> successfully handled status update TASK_LOST (UUID: 
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5)
> for task ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000
>
> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status
> update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
> task ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-0000
>
> I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
> status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
> framework 20171208-050805-140555025-5050-3470-0000
>
> I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
> successfully handled status update acknowledgement (UUID:
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task 
> ff631ad1-cfab-493e-be18-961581abcf3d
> of framework 20171208-050805-140555025-5050-3470-0000
>
> I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470-0000
>
> Regards,
> Ajay
>

Reply via email to