> On Jan. 29, 2020, 6:28 p.m., Qian Zhang wrote:
> > The commit message does not seem accurate to me:
> > > This could lead to termination of the executor before receiving all 
> > > status update acknowledgments from the agent.
> > 
> > I think the issue we wanted to mitigate is that the executor may shut
> > itself down before the terminal status update (rather than the acks) is
> > sent to the agent.
> 
> Andrei Budnik wrote:
>     Updated the description.
> 
> Qian Zhang wrote:
>     > This could lead to termination of the executor before processing of a
>     > terminal status update by the agent.
>     
>     What do you mean by `before processing of a terminal status update by the agent`?
>     Does the executor process a terminal status update sent by the agent? I
>     think it should be `before the terminal status update is sent to the agent`.
> 
> Andrei Budnik wrote:
>     In the case of MESOS-9847, a terminal status update acknowledgment was
>     delivered to the agent, but the executor had been terminated before the
>     agent processed the status update in `Slave::statusUpdate`.

OK, then I think we need to mention both cases in the commit message:
1. The executor terminates before it sends the terminal status update to the
   agent. This may lead to a wrong terminal status update.
2. The executor terminates before the agent finishes processing the terminal
   status update. This may lead to two terminal status updates.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72055/#review219410
-----------------------------------------------------------


On Jan. 30, 2020, 12:23 a.m., Andrei Budnik wrote:
> 
> (Updated Jan. 30, 2020, 12:23 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko, Greg Mann, Qian Zhang, and
> Vinod Kone.
> 
> 
> Bugs: MESOS-9847
>     https://issues.apache.org/jira/browse/MESOS-9847
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the Docker executor terminated itself after a task's
> container had terminated. This could lead to termination of the
> executor before processing of a terminal status update by the agent.
> In order to mitigate this issue, the executor slept for one second to
> give it a chance to send all status updates and receive all status update
> acknowledgments before terminating itself. This might have led to
> various race conditions in some circumstances (e.g., on a slow host).
> This patch terminates the Docker executor after receiving a terminal
> status update acknowledgment. Also, this patch increases the timeout
> from one second to one minute for fail-safety.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp 132f42bfa42c846fc5dc40f7763aa0b5d12a7798 
>   src/exec/exec.cpp 69e5e24b248c7c913421de5e42713c34fd79ad46 
> 
> 
> Diff: https://reviews.apache.org/r/72055/diff/2/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
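For illustration only, here is a minimal standalone sketch of the termination logic the description proposes: the executor blocks until the acknowledgment for the terminal status update arrives, with the one-minute timeout acting as a fail-safe. It assumes plain standard-library threading; `TerminalAckGate`, `onTerminalAck`, and `waitForTerminalAck` are hypothetical names and do not appear in the patch. The real Docker executor is built on libprocess rather than `std::condition_variable`.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of Mesos: lets an executor wait for the
// agent's acknowledgment of the terminal status update before exiting,
// bounded by a fail-safe timeout.
class TerminalAckGate {
public:
  // Call from the handler that receives the acknowledgment for the
  // terminal status update.
  void onTerminalAck() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acknowledged_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the terminal ack has been received or `timeout` elapses.
  // Returns true if the ack arrived, false if the fail-safe timeout fired.
  template <typename Rep, typename Period>
  bool waitForTerminalAck(std::chrono::duration<Rep, Period> timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate guards against spurious wakeups and against an ack
    // that arrived before the wait started.
    return cv_.wait_for(lock, timeout, [this] { return acknowledged_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool acknowledged_ = false;
};
```

An executor following this pattern would call `onTerminalAck()` from its acknowledgment handler and `waitForTerminalAck(std::chrono::minutes(1))` before exiting. A `false` return means the fail-safe fired before the ack arrived, which is the case the one-minute timeout is meant to bound.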
