Thanks for getting back, Vinod. So does that mean that these race conditions
(where the command executor doesn't stay up long enough) existed even in
v1.2, and that the 1.3 versions fix them? I ask because I did try an
upgrade to v1.2 and still found very similar issues.

Regards,
Ajay

On Tue, Jan 9, 2018 at 6:48 PM, Vinod Kone <[email protected]> wrote:

> 0.21 is really old and not supported. I highly recommend you upgrade to
> 1.3+.
>
> Regarding what you are seeing, we definitely had issues in the past where
> the command executor didn't stay up long enough to guarantee that
> TASK_FINISHED was delivered to the agent; so races like above were possible.
>
> On Tue, Jan 9, 2018 at 5:33 PM, Ajay V <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to debug a TASK_LOST that's generated on the agent, which I
>> see on rare occasions.
>>
>> Following is a log that I'm trying to understand. This happens after
>> driver.sendStatusUpdate() has been called with a task state of
>> TASK_FINISHED from a Java executor. It looks to me like the container has
>> already exited before the TASK_FINISHED is processed. Is there a timing
>> issue in this version of Mesos that is causing this? The effect of
>> this problem is that, even though the executor's work is complete and
>> the executor calls sendStatusUpdate() with TASK_FINISHED, the task is
>> marked as LOST and the actual TASK_FINISHED update is ignored.
>>
>> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for
>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>>
>> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
>> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>
>> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>
>> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
>> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
>> Failed to get usage: No process found at 28952
>>
>> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
>> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
>> 20171208-050805-140555025-5050-3470-0000 exited with status 0
>>
>> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0
>>
>> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
>> ff631ad1-cfab-493e-be18-961581abcf3d
>>
>> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
>> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>>
>> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received
>> status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
>> task ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
>> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 to the slave
>>
>> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 to [email protected]:5050
>>
>> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
>> successfully handled status update TASK_LOST (UUID:
>> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received
>> status update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5)
>> for task ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
>> status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
>> framework 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
>> successfully handled status update acknowledgement (UUID:
>> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
>> ff631ad1-cfab-493e-be18-961581abcf3d
>>
>> I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
>> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> Regards,
>> Ajay
>>
>
>
