> On April 15, 2016, 12:49 a.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > <https://reviews.apache.org/r/46187/diff/1/?file=1343828#file1343828line749>
> >
> >     Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up never terminating.
> >     
> >     Can you do the following:
> >     
> >     --> Enqueue a message in the queue to self-terminate after some timeout 
> > (you can use the delay() function) to be safe.
> >     
> >     --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> >     
> >     sounds good?
> 
> Vinod Kone wrote:
>     @Qian, any update on this? If this particular review is going to take 
> some time, I think it is still useful to commit the other 2 reviews in this 
> chain. AFAICT, they are independent of this review?
> 
> Qian Zhang wrote:
>     @Vinod, sorry for the late reply. I have filed a ticket 
> (https://issues.apache.org/jira/browse/MESOS-5262) for enhancing 
> `slave::statusUpdate()` to always ACK the status update sent by the executor.
>     
>     And can you please elaborate on the specific scenarios in which this 
> executor would never terminate? Originally I thought the scenario was: the 
> executor sends a terminal status update to the slave while the corresponding 
> framework is in the `TERMINATING` state (e.g., the operator tears down the 
> framework), so `Slave::statusUpdate()` ignores the status update and the 
> executor never gets the ACK. But after testing, I found that the executor 
> still terminates in this case, because the container corresponding to this 
> executor is destroyed by 
> `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()` once 
> `--executor_shutdown_grace_period` has elapsed. 
> So I am not sure in which case the executor will never terminate.
>     
>     And yes, the other 2 patches are independent of this one. I will remove 
> their dependency on this one in Review Board, thanks!

After more thought, I see one scenario in which the executor could never 
terminate: the agent goes down right after it sends the SHUTDOWN event to the 
executor. In this case, when handling the SHUTDOWN event, the executor kills 
the task and sends a TASK_KILLED status update to the agent, but it does not 
get the ACK since the agent is already down, so the executor keeps running. I 
think that once the agent is started again, the executor will receive the ACK 
and then terminate. I am not sure whether this behavior is OK, or whether we 
want the executor to terminate once it receives the SHUTDOWN event rather than 
wait for the agent to be started again?
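
To make the trade-off concrete, here is a minimal sketch of the "terminate on 
ACK, or after a timeout to be safe" idea from the earlier comment. It is 
written in plain standard C++ rather than libprocess, so it is not the actual 
patch: `TerminalUpdateWaiter`, `acknowledged()` and `waitForAck()` are 
hypothetical names, and the real executor would presumably use `delay()` 
instead of a condition variable.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the executor's shutdown path; not Mesos code.
class TerminalUpdateWaiter
{
public:
  // Called when the agent acknowledges the terminal status update.
  void acknowledged()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acked_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the ACK arrives or `timeout` elapses, whichever is first.
  // Returns true if the ACK arrived, false if the timeout fired.
  bool waitForAck(std::chrono::milliseconds timeout)
  {
    assert(timeout.count() >= 0);

    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return acked_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool acked_ = false;
};
```

With something like this, the executor would terminate in both cases and could 
log which path was taken, so it would no longer depend on the agent coming 
back up to deliver the ACK.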


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
-----------------------------------------------------------


On April 14, 2016, 1:17 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> -----------------------------------------------------------
> 
> (Updated April 14, 2016, 1:17 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
>     https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -----
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>
