-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72055/#review219410
-----------------------------------------------------------

The commit message does not seem accurate to me:
> This could lead to termination of the executor before receiving all status
> update acknowledgments from the agent.

I think the issue we wanted to mitigate is that the executor may shut itself
down before the terminal status update (rather than the acks) is sent to the
agent.


src/docker/executor.cpp
Lines 786-787 (original)
<https://reviews.apache.org/r/72055/#comment307583>

    We have a fail-safe in the command executor:
    https://github.com/apache/mesos/blob/1.9.0/src/launcher/executor.cpp#L1060:L1062
    Do we want to do something similar in the Docker executor, to ensure it
    can still terminate itself if the agent does not send an ACK for the
    terminal update for some reason?


src/exec/exec.cpp
Line 420 (original), 426 (patched)
<https://reviews.apache.org/r/72055/#comment307578>

    Why do we remove the task only when a terminal status update is acked?
    That differs from our previous implementation, where we always removed
    the task regardless of whether the acked status update was terminal, and
    it is also inconsistent with what the command executor does:
    https://github.com/apache/mesos/blob/1.9.0/src/launcher/executor.cpp#L246:L248


src/exec/exec.cpp
Lines 435 (patched)
<https://reviews.apache.org/r/72055/#comment307584>

    Do we want a `return;` after this code?


- Qian Zhang


On Jan. 28, 2020, 10:13 p.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72055/
> -----------------------------------------------------------
>
> (Updated Jan. 28, 2020, 10:13 p.m.)
>
>
> Review request for mesos, Andrei Sekretenko, Greg Mann, Qian Zhang, and Vinod
> Kone.
>
>
> Bugs: MESOS-9847
>     https://issues.apache.org/jira/browse/MESOS-9847
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Previously, the Docker executor terminated itself after a task's
> container had terminated. This could lead to termination of the
> executor before receiving all status update acknowledgments from
> the agent. In order to mitigate this issue, the executor slept for
> one second to give a chance to send all status updates and receive
> all status update acknowledgments before terminating itself. This
> might have led to various race conditions in some circumstances
> (e.g., on a slow host). This patch terminates the Docker executor
> after receiving a terminal status update acknowledgment. Also,
> this patch removes the unnecessary call of sleep.
>
>
> Diffs
> -----
>
>   src/docker/executor.cpp 132f42bfa42c846fc5dc40f7763aa0b5d12a7798
>   src/exec/exec.cpp 69e5e24b248c7c913421de5e42713c34fd79ad46
>
>
> Diff: https://reviews.apache.org/r/72055/diff/1/
>
>
> Testing
> -------
>
> internal CI
>
>
> Thanks,
>
> Andrei Budnik
>
>
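
For illustration, the shutdown policy discussed in this thread (terminate once the terminal status update is acknowledged, with a timeout as a fail-safe in case the acknowledgment never arrives) might be sketched as below. This is a minimal, self-contained C++ sketch and not Mesos code: `TerminalAckTracker`, its members, and `runExample` are hypothetical names, the fail-safe duration is an arbitrary value, and the current time is passed in explicitly as an assumption so the logic can be exercised without real sleeps.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for an executor's shutdown decision.
// None of these names come from the Mesos code base.
class TerminalAckTracker {
public:
  using Clock = std::chrono::steady_clock;

  explicit TerminalAckTracker(std::chrono::milliseconds failSafe)
    : failSafe_(failSafe) {}

  // Record a status update that was sent to the agent. The first terminal
  // update arms the fail-safe deadline.
  void sent(const std::string& uuid, bool terminal, Clock::time_point now) {
    unacked_[uuid] = terminal;
    if (terminal && !armed_) {
      armed_ = true;
      deadline_ = now + failSafe_;
    }
  }

  // Record an acknowledgment from the agent. Every acknowledged update is
  // dropped from the pending set; only a terminal one requests shutdown.
  void acked(const std::string& uuid) {
    auto it = unacked_.find(uuid);
    if (it == unacked_.end()) {
      return;  // Unknown or duplicate acknowledgment: nothing to do.
    }
    const bool terminal = it->second;
    unacked_.erase(it);
    if (terminal) {
      terminalAcked_ = true;
    }
  }

  // True once the terminal update was acknowledged, or once the fail-safe
  // deadline has passed without such an acknowledgment.
  bool shouldTerminate(Clock::time_point now) const {
    return terminalAcked_ || (armed_ && now >= deadline_);
  }

  std::size_t pending() const { return unacked_.size(); }

private:
  std::chrono::milliseconds failSafe_;
  std::map<std::string, bool> unacked_;  // uuid -> is the update terminal?
  bool armed_ = false;
  bool terminalAcked_ = false;
  Clock::time_point deadline_{};
};

// Usage example: the acknowledgment for the terminal update is lost, so the
// fail-safe deadline is what eventually allows termination.
inline bool runExample() {
  using std::chrono::milliseconds;
  const TerminalAckTracker::Clock::time_point start{};

  TerminalAckTracker tracker(milliseconds(500));
  tracker.sent("running", false, start);
  tracker.sent("finished", true, start);
  tracker.acked("running");

  assert(!tracker.shouldTerminate(start + milliseconds(100)));
  return tracker.shouldTerminate(start + milliseconds(500));
}
```

Keeping the deadline check separate from the acknowledgment handler means a lost acknowledgment cannot keep the process alive indefinitely, while the normal path still waits for the agent instead of sleeping for a fixed interval.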
