-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71343/#review217419
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 10775-10776 (patched)
<https://reviews.apache.org/r/71343/#comment304700>

    It does not seem likely to crash, but to be safe, could we consider
relaxing these CHECKs? E.g., just log a warning and return?
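
Roughly what I have in mind is sketched below. This is only an illustration under assumptions: `PendingUpdates`, `removeStrict` and `removeRelaxed` are hypothetical names, `assert` stands in for glog's `CHECK`, and `std::cerr` stands in for `LOG(WARNING)`. The actual invariants checked at the lines above are not reproduced here.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <string>

// Hypothetical stand-in for the agent state touched by the patch; the real
// fields and invariants in src/slave/slave.cpp are not shown in this review.
struct PendingUpdates
{
  std::set<std::string> ids;
};

// Strict variant: aborts the process if the invariant does not hold,
// which is what a CHECK does.
void removeStrict(PendingUpdates& pending, const std::string& id)
{
  assert(pending.ids.count(id) == 1); // Stand-in for CHECK(...).
  pending.ids.erase(id);
}

// Relaxed variant: logs a warning and returns instead of aborting.
// Returns true if the update was found and removed.
bool removeRelaxed(PendingUpdates& pending, const std::string& id)
{
  if (pending.ids.count(id) == 0) {
    // Stand-in for LOG(WARNING).
    std::cerr << "WARNING: status update '" << id
              << "' is not pending; ignoring" << std::endl;
    return false;
  }

  pending.ids.erase(id);
  return true;
}
```

With the relaxed variant, an unexpected state leaves a trace in the log but does not bring down the agent.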


- Gilbert Song


On Aug. 21, 2019, 10:53 a.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71343/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2019, 10:53 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Qian Zhang.
> 
> 
> Bugs: MESOS-9887
>     https://issues.apache.org/jira/browse/MESOS-9887
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the Mesos agent could send a TASK_FAILED status update on
> executor termination while processing of a TASK_FINISHED status update
> was still in progress. Processing a task status update involves sending
> requests to the containerizer (e.g. `MesosContainerizer::status`), which
> might finish processing these requests out of order. Also, the agent
> does not overwrite a terminal status update once it is stored in
> `terminatedTasks`. Hence, there was a race condition between the two
> terminal status updates.
> 
> Note that V1 Executors are not affected by this problem because they
> wait for an acknowledgement of the terminal status update by the agent
> before terminating.
> 
> This patch introduces a new data structure `pendingStatusUpdates`,
> which holds a list of status updates that are being processed. This
> data structure allows validating the order of processing of status
> updates by the agent.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp a17bbee13cb8291ad694f1520b613764b57b046b 
>   src/slave/slave.cpp 1d0ec9d2428c3ffa28ad3e960b74f171013cf0c2 
> 
> 
> Diff: https://reviews.apache.org/r/71343/diff/2/
> 
> 
> Testing
> -------
> 
> 1. manual testing described in MESOS-9887
> 2. internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
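
For readers of the quoted description, the idea behind `pendingStatusUpdates` can be sketched as follows. This is a minimal illustration under assumptions, not the code from the patch: the class layout, the method names (`started`, `finished`, `hasPending`), and the use of plain strings for task and update IDs are all hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical sketch; names and layout are assumptions, not the patch's code.
class PendingStatusUpdates
{
public:
  // Record that processing of `updateId` for `taskId` has started.
  void started(const std::string& taskId, const std::string& updateId)
  {
    pending[taskId].push_back(updateId);
  }

  // Called when processing of `updateId` finishes. Returns true and removes
  // the update if it was the oldest pending one for the task (in-order
  // completion); returns false and leaves the state untouched otherwise.
  bool finished(const std::string& taskId, const std::string& updateId)
  {
    auto it = pending.find(taskId);
    if (it == pending.end() || it->second.empty() ||
        it->second.front() != updateId) {
      return false;
    }

    it->second.pop_front();
    if (it->second.empty()) {
      pending.erase(it);
    }
    return true;
  }

  // True if any update for `taskId` is still being processed, e.g. to
  // avoid generating TASK_FAILED while TASK_FINISHED is in flight.
  bool hasPending(const std::string& taskId) const
  {
    return pending.count(taskId) > 0;
  }

private:
  std::map<std::string, std::deque<std::string>> pending;
};
```

In this sketch, a caller would consult `hasPending` on executor termination before generating a second terminal update, and `finished` would flag any update whose processing completed out of order.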
