> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > <https://reviews.apache.org/r/46187/diff/5/?file=1488422#file1488422line682>
> >
> >     hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
>     If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
>     Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
>     Yes, it makes sense. So what we need to do is:
>     1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
>     2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>        2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>        2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
>     
>     Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
>     Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
>     For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.
> 
> Qian Zhang wrote:
>     > what if agent goes down right after executor checks for connectedness 
> in reaped()?
>     
>     Yes, I thought about this case as well, to handle it, it will make the 
> logic here even more complex though the logic of what I posted above is 
> already complex and not that readable. So I agree we should simplify it, what 
> about the following?
>     1. For adapter based executor, in `reaped()` do the self termination with 
> 0 as exit code after 1s.
>     2. For HTTP based executor, in `reaped()`, do the self termination with 1 
> as exit code after 60s (@huadongliu, this should satisfy your requirement :-).
>     Or you think we should always wait for 60s for both adapter based 
> executor and HTTP based executor?
> 
> Vinod Kone wrote:
>     your plan sounds good to me.
> 
> Huadong Liu wrote:
>     Thanks Qian. I am a bit confused with the terminology here. 
>     
>     +Is HTTP command executor the same thing as Mesos command executor in 
> master, i.e. 
> https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp? 
> Chronos uses the Mesos command executor.
>     +Is the "docker executor" one of the adapter based executors? It uses the 
> ExecutorDriver C++ interface. However, 
> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L451 
> comments about HTTP API. Will the comments simply get removed for "1. For 
> adapter based executor, in reaped() do the self termination with 0 as exit 
> code after 1s."?
> 
> Qian Zhang wrote:
>     Huadong, please see my comments below:
>     
>     * https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp 
> is the Mesos command executor which is used by Mesos containerizer, it 
> actually can run in two modes (controlled by the agent flag 
> `--http_command_executor`):
>        1. HTTP based mode (`--http_command_executor=true`): In this mode, it 
> will use the HTTP based executor library to interact with Mesos agent.
>        2. Adapter based mode (`--http_command_executor=false`): In this mode, 
> it will use an adapter (`V0ToV1Adapter`) to call the old `ExecutorDriver` to 
> interact with Mesos agent.
>     * https://github.com/apache/mesos/blob/master/src/docker/executor.cpp is 
> the Docker command executor which is used by Docker containerizer, it 
> currently still uses the old `ExecutorDriver` to interact with Mesos agent. 
> But I think in future, we will also update it just like what we did for the 
> Mesos command executor, see MESOS-5227 for more details.

Thank you Qian.


- Huadong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
-----------------------------------------------------------


On Sept. 15, 2016, 8:57 a.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> -----------------------------------------------------------
> 
> (Updated Sept. 15, 2016, 8:57 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
>     https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

Reply via email to