> On Jan. 29, 2020, 6:28 p.m., Qian Zhang wrote:
> > The commit message does not seem accurate to me:
> > > This could lead to termination of the executor before receiving all
> > > status update acknowledgments from the agent.
> >
> > I think the issue that we wanted to mitigate is that the executor may shut
> > itself down before the terminal status update (rather than the acks) is
> > sent to the agent.
>
> Andrei Budnik wrote:
>     Updated the description.
>
> Qian Zhang wrote:
>     > This could lead to termination of the executor before processing of a
>     > terminal status update by the agent.
>
>     What do you mean by `before processing of a terminal status update by
>     the agent`? Does the executor process a terminal status update sent by
>     the agent? I think it should be `before the terminal status update is
>     sent to the agent`.
>
> Andrei Budnik wrote:
>     In the case of MESOS-9847, a terminal status update acknowledgment was
>     delivered to the agent, but the executor had been terminated before the
>     agent processed the status update in `Slave::statusUpdate`.

OK, then I think we need to mention both cases in the commit message:

1. The executor terminates before it sends the terminal status update to the
   agent. This may lead to a wrong terminal status update.
2. The executor terminates before the agent finishes processing the terminal
   status update. This may lead to two terminal status updates.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72055/#review219410
-----------------------------------------------------------


On Jan. 30, 2020, 12:23 a.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72055/
> -----------------------------------------------------------
>
> (Updated Jan. 30, 2020, 12:23 a.m.)
>
>
> Review request for mesos, Andrei Sekretenko, Greg Mann, Qian Zhang, and Vinod
> Kone.
>
>
> Bugs: MESOS-9847
>     https://issues.apache.org/jira/browse/MESOS-9847
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Previously, the Docker executor terminated itself after a task's
> container had terminated. This could lead to termination of the
> executor before processing of a terminal status update by the agent.
> In order to mitigate this issue, the executor slept for one second to
> give a chance to send all status updates and receive all status update
> acknowledgments before terminating itself. This might have led to
> various race conditions in some circumstances (e.g., on a slow host).
> This patch terminates the Docker executor after receiving a terminal
> status update acknowledgment. Also, this patch increases the timeout
> from one second to one minute for fail-safety.
>
>
> Diffs
> -----
>
>   src/docker/executor.cpp 132f42bfa42c846fc5dc40f7763aa0b5d12a7798
>   src/exec/exec.cpp 69e5e24b248c7c913421de5e42713c34fd79ad46
>
> Diff: https://reviews.apache.org/r/72055/diff/2/
>
>
> Testing
> -------
>
> internal CI
>
>
> Thanks,
>
> Andrei Budnik
>
>
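For reference, the shutdown ordering the patch describes (the executor exits only after the terminal status update acknowledgment arrives, with a one-minute fail-safe timeout) can be sketched as below. This is a minimal, hypothetical sketch, not the code in `src/docker/executor.cpp`: `TerminalAckGate` and `simulateShutdown` are illustrative names, and it uses standard-library threads and a condition variable rather than libprocess.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch, not the Mesos implementation: the executor must not
// exit until the agent has acknowledged its terminal status update.
class TerminalAckGate {
public:
  // Invoked when the acknowledgment for the terminal status update arrives.
  void onTerminalAck() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acknowledged_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the acknowledgment has arrived or `timeout` elapses.
  // Returns true if the acknowledgment was received, false on timeout.
  bool waitForAck(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate guards against spurious wakeups and against an
    // acknowledgment that arrived before we started waiting.
    return cv_.wait_for(lock, timeout, [this] { return acknowledged_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool acknowledged_ = false;
};

// Hypothetical driver: a simulated agent acknowledges the terminal update
// after `ackDelay`, while the executor waits at most `timeout` before it
// shuts down. Returns true if shutdown was triggered by the acknowledgment,
// false if it was triggered by the fail-safe timeout.
bool simulateShutdown(std::chrono::milliseconds ackDelay,
                      std::chrono::milliseconds timeout) {
  TerminalAckGate gate;
  std::thread agent([&gate, ackDelay] {
    std::this_thread::sleep_for(ackDelay);
    gate.onTerminalAck();
  });
  const bool acked = gate.waitForAck(timeout);
  agent.join();  // Keep `gate` alive until the simulated agent is done.
  return acked;
}
```

For example, `simulateShutdown(std::chrono::milliseconds(10), std::chrono::minutes(1))` returns as soon as the acknowledgment arrives, instead of after a fixed sleep. If the acknowledgment never comes, the one-minute timeout still bounds how long the executor lingers, which is the fail-safety the description mentions.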