-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71343/
-----------------------------------------------------------
Review request for mesos, Gilbert Song, Greg Mann, and Qian Zhang.
Bugs: MESOS-9887
https://issues.apache.org/jira/browse/MESOS-9887
Repository: mesos
Description
-------
Previously, the Mesos agent could send a TASK_FAILED status update on
executor termination while a TASK_FINISHED status update for the same
task was still being processed. Processing a task status update involves
sending requests to the containerizer (e.g., `MesosContainerizer::status`),
which might complete these requests out of order. Also, once a terminal
status update is stored in `terminatedTasks`, the agent does not
overwrite it. Hence, there was a race condition between the two terminal
status updates: whichever finished processing first was the one kept.
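To make the race concrete, here is a minimal, hypothetical C++ model (not Mesos code). It assumes the only relevant agent behavior is that the first terminal state stored for a task is never overwritten. `TerminatedTasks`, `recordTerminal` and `simulate` are invented names, and `std::map::emplace` stands in for the agent's no-overwrite rule on `terminatedTasks`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not Mesos code: task ID -> stored terminal state.
using TerminatedTasks = std::map<std::string, std::string>;

// Assumed no-overwrite rule: std::map::emplace does nothing if the task
// already has a stored terminal state.
void recordTerminal(TerminatedTasks& terminated,
                    const std::string& taskId,
                    const std::string& state) {
  terminated.emplace(taskId, state);
}

// `updates` are terminal updates for one task, in the order received.
// `completionOrder` lists indices into `updates` in the order their
// (asynchronous) containerizer requests complete.
TerminatedTasks simulate(const std::vector<std::string>& updates,
                         const std::vector<std::size_t>& completionOrder) {
  TerminatedTasks terminated;
  for (std::size_t i : completionOrder) {
    assert(i < updates.size());
    recordTerminal(terminated, "task-1", updates[i]);
  }
  return terminated;
}
```

With updates received as TASK_FINISHED then TASK_FAILED, completion order {0, 1} keeps TASK_FINISHED, while {1, 0} keeps TASK_FAILED even though TASK_FINISHED was received first.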
Note that V1 executors are not affected by this problem because they
wait for the agent to acknowledge the terminal status update before
terminating.
This patch introduces a new data structure, `pendingStatusUpdates`,
which holds the list of status updates currently being processed. This
data structure allows the agent to validate the order in which status
updates are processed.
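One possible way to use such a list to enforce receipt order is sketched below in C++. This is an assumption about the approach, not the patch itself: each update is queued per task when received, containerizer work may complete in any order, and updates are released only as a fully processed prefix of the queue. `StatusUpdateTracker`, `PendingUpdate`, `received`, `processed` and `hasPending` are hypothetical names; only `pendingStatusUpdates` comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not the Mesos implementation.
struct PendingUpdate {
  std::uint64_t id;   // assigned when the agent receives the update
  std::string state;  // e.g. "TASK_FINISHED"
  bool processed;     // set once the (async) containerizer work completes
};

class StatusUpdateTracker {
 public:
  // Called when the agent receives an update; returns its id.
  std::uint64_t received(const std::string& taskId, const std::string& state) {
    std::uint64_t id = nextId_++;
    pendingStatusUpdates_[taskId].push_back(PendingUpdate{id, state, false});
    return id;
  }

  // Called when processing of update `id` completes, possibly out of order.
  // Returns the updates that may now be forwarded, in receipt order.
  // Throws std::out_of_range if `taskId` has no pending updates.
  std::vector<std::string> processed(const std::string& taskId,
                                     std::uint64_t id) {
    std::deque<PendingUpdate>& queue = pendingStatusUpdates_.at(taskId);
    bool found = false;
    for (PendingUpdate& update : queue) {
      if (update.id == id) {
        update.processed = true;
        found = true;
        break;
      }
    }
    assert(found && "completion for an update that is not pending");

    // Release only the processed prefix so earlier updates go out first.
    std::vector<std::string> ready;
    while (!queue.empty() && queue.front().processed) {
      ready.push_back(queue.front().state);
      queue.pop_front();
    }
    if (queue.empty()) {
      pendingStatusUpdates_.erase(taskId);  // `queue` is not used after this
    }
    return ready;
  }

  bool hasPending(const std::string& taskId) const {
    return pendingStatusUpdates_.count(taskId) > 0;
  }

 private:
  std::uint64_t nextId_ = 0;
  std::map<std::string, std::deque<PendingUpdate>> pendingStatusUpdates_;
};
```

In the scenario described above, completing TASK_FAILED first releases nothing; completing TASK_FINISHED then releases both, in receipt order, so a store that never overwrites a terminal state would keep TASK_FINISHED.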
Diffs
-----
src/slave/slave.hpp a17bbee13cb8291ad694f1520b613764b57b046b
src/slave/slave.cpp 1d0ec9d2428c3ffa28ad3e960b74f171013cf0c2
Diff: https://reviews.apache.org/r/71343/diff/1/
Testing
-------
1. Manual testing as described in MESOS-9887
2. Internal CI
Thanks,
Andrei Budnik