Thanks for getting back, Vinod. Does that mean these race conditions (where the command executor doesn't stay up long enough) still existed in v1.2, and that v1.3 fixes them? I ask because I did try an upgrade to v1.2 and still saw very similar issues.

Regards,
Ajay

On Tue, Jan 9, 2018 at 6:48 PM, Vinod Kone <[email protected]> wrote:

> 0.21 is really old and not supported. I highly recommend you upgrade to
> 1.3+.
>
> Regarding what you are seeing, we definitely had issues in the past where
> the command executor didn't stay up long enough to guarantee that
> TASK_FINISHED was delivered to the agent; so races like above were possible.
>
> On Tue, Jan 9, 2018 at 5:33 PM, Ajay V <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to debug a TASK_LOST that's generated on the agent that I see
>> on rare occasions.
>>
>> Following is a log that I'm trying to understand. This is happening after
>> the driver.sendStatusUpdate() has been called with a task state of
>> TASK_FINISHED from a Java executor. It looks to me like the container has
>> already exited before the TASK_FINISHED is processed. Is there a timing
>> issue here in this version of Mesos that is causing this? The effect of
>> this problem is that, even though the work of the executor is complete and
>> the executor calls the sendStatusUpdate with a TASK_FINISHED, the task is
>> marked as LOST and the actual update of TASK_FINISHED is ignored.
>>
>> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for
>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>>
>> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
>> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>
>> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>
>> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
>> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
>> Failed to get usage: No process found at 28952
>>
>> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
>> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
>> 20171208-050805-140555025-5050-3470-0000 exited with status 0
>>
>> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0
>>
>> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
>> ff631ad1-cfab-493e-be18-961581abcf3d
>>
>> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
>> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>>
>> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received
>> status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
>> task ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
>> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 to the slave
>>
>> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000 to [email protected]:5050
>>
>> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
>> successfully handled status update TASK_LOST (UUID:
>> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received
>> status update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5)
>> for task ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
>> status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
>> framework 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
>> successfully handled status update acknowledgement (UUID:
>> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
>> ff631ad1-cfab-493e-be18-961581abcf3d
>>
>> I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
>> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
>> 20171208-050805-140555025-5050-3470-0000
>>
>> Regards,
>> Ajay
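
To see how often this pattern occurs, the agent log can be scanned for a TASK_LOST that is generated right after the same executor exited with status 0. The following is only a minimal sketch, not a Mesos tool: the function name `lost_after_clean_exit` is hypothetical, the regular expressions are modelled on the wording of the two log lines quoted above, and it assumes that each log entry is on a single unwrapped line and that the executor ID equals the task ID, as it does in this log.

```python
import re

# Patterns modelled on the two agent log messages quoted in this thread.
# Other Mesos versions may word these messages differently (assumption).
EXITED = re.compile(
    r"Executor '([^']+)' of framework (\S+) exited with status (\d+)")
LOST = re.compile(
    r"Handling status update TASK_LOST \(UUID: [^)]+\) "
    r"for task (\S+) of framework (\S+)")


def lost_after_clean_exit(lines):
    """Return (task_id, framework_id) pairs where TASK_LOST was handled
    after an executor with the same ID exited with status 0."""
    clean_exits = set()
    hits = []
    for line in lines:
        m = EXITED.search(line)
        if m:
            if m.group(3) == "0":
                clean_exits.add((m.group(1), m.group(2)))
            continue
        m = LOST.search(line)
        # Assumes executor ID == task ID, as in the log above.
        if m and (m.group(1), m.group(2)) in clean_exits:
            hits.append((m.group(1), m.group(2)))
    return hits


# The two relevant entries from the log above, joined onto single lines.
sample = [
    "I0108 10:16:52.899657 37278 slave.cpp:2898] Executor "
    "'ff631ad1-cfab-493e-be18-961581abcf3d' of framework "
    "20171208-050805-140555025-5050-3470-0000 exited with status 0",
    "I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update "
    "TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task "
    "ff631ad1-cfab-493e-be18-961581abcf3d of framework "
    "20171208-050805-140555025-5050-3470-0000 from @0.0.0.0:0",
]

if __name__ == "__main__":
    for task_id, framework_id in lost_after_clean_exit(sample):
        print(f"TASK_LOST after clean exit: task {task_id} "
              f"(framework {framework_id})")
```

A match only shows that the agent generated TASK_LOST after a clean executor exit. It does not prove that a TASK_FINISHED update was sent and dropped, so any hits would still need to be checked against the executor's own logs.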

