Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-22 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Sept. 23, 2016, 8:17 a.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Rebased


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-19 Thread Qian Zhang


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> >
> 
> Vinod Kone wrote:
> also, can you add a test for this?
> 
> Qian Zhang wrote:
> Sure, I can add a test. I am just wondering how to use a test to verify 
> the logic we added in this patch, maybe we can just add a test to launch a 
> short task and check `ContainerTermination.status` is 0 when 
> `Slave::executorTerminated()` is call which means the HTTP command executor 
> has received the ACK from agent for the terminal status update. Any comments?
> 
> Vinod Kone wrote:
> That sounds like a good test. Can you add that?

Posted a patch for the test:
https://reviews.apache.org/r/52075/


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 15, 2016, 4:57 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 15, 2016, 4:57 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Vinod Kone


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> >
> 
> Vinod Kone wrote:
> also, can you add a test for this?
> 
> Qian Zhang wrote:
> Sure, I can add a test. I am just wondering how to use a test to verify 
> the logic we added in this patch, maybe we can just add a test to launch a 
> short task and check `ContainerTermination.status` is 0 when 
> `Slave::executorTerminated()` is call which means the HTTP command executor 
> has received the ACK from agent for the terminal status update. Any comments?

That sounds like a good test. Can you add that?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 15, 2016, 8:57 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 15, 2016, 8:57 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review149146
---


Ship it!




Ship It!

- Vinod Kone


On Sept. 15, 2016, 8:57 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 15, 2016, 8:57 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Huadong Liu


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
> For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.
> 
> Qian Zhang wrote:
> > what if agent goes down right after executor checks for connectedness 
> in reaped()?
> 
> Yes, I thought about this case as well, to handle it, it will make the 
> logic here even more complex though the logic of what I posted above is 
> already complex and not that readable. So I agree we should simplify it, what 
> about the following?
> 1. For adapter based executor, in `reaped()` do the self termination with 
> 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, do the self termination with 1 
> as exit code after 60s (@huadongliu, this should satisfy your requirement :-).
> Or you think we should always wait for 60s for both adapter based 
> executor and HTTP based executor?
> 
> Vinod Kone wrote:
> your plan sounds good to me.
> 
> Huadong Liu wrote:
> Thanks Qian. I am a bit confused with the terminology here. 
> 
> +Is HTTP command executor the same thing as Mesos command executor in 
> master, i.e. 
> https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp? 
> Chronos uses the Mesos command executor.
> +Is the "docker executor" one of the adapter based executors? It uses the 
> ExecutorDriver C++ interface. However, 
> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L451 
> comments about HTTP API. Will the comments simply get removed for "1. For 
> adapter based executor, in reaped() do the self termination with 0 as exit 
> code after 1s."?
> 
> Qian Zhang wrote:
> Huadong, please see my comments below:
> 
> * https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp 
> is the Mesos command executor which is used by Mesos containerizer, it 
> actually can run in two modes (controlled by the agent flag 
> `--http_command_executor`):
>1. HTTP based mode (`--http_command_executor=true`): In this mode, it 
> will use the HTTP based executor library to interact with Mesos agent.
>2. Adapter based mode (`--http_command_executor=false`): In this mode, 
> it will use an adapter (`V0ToV1Adapter`) to call the old `ExecutorDriver` to 
> interact with Mesos agent.
> 

Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review149057
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Sept. 15, 2016, 8:57 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 15, 2016, 8:57 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Qian Zhang


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
> For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.
> 
> Qian Zhang wrote:
> > what if agent goes down right after executor checks for connectedness 
> in reaped()?
> 
> Yes, I thought about this case as well, to handle it, it will make the 
> logic here even more complex though the logic of what I posted above is 
> already complex and not that readable. So I agree we should simplify it, what 
> about the following?
> 1. For adapter based executor, in `reaped()` do the self termination with 
> 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, do the self termination with 1 
> as exit code after 60s (@huadongliu, this should satisfy your requirement :-).
> Or you think we should always wait for 60s for both adapter based 
> executor and HTTP based executor?
> 
> Vinod Kone wrote:
> your plan sounds good to me.
> 
> Huadong Liu wrote:
> Thanks Qian. I am a bit confused with the terminology here. 
> 
> +Is HTTP command executor the same thing as Mesos command executor in 
> master, i.e. 
> https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp? 
> Chronos uses the Mesos command executor.
> +Is the "docker executor" one of the adapter based executors? It uses the 
> ExecutorDriver C++ interface. However, 
> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L451 
> comments about HTTP API. Will the comments simply get removed for "1. For 
> adapter based executor, in reaped() do the self termination with 0 as exit 
> code after 1s."?

Huadong, please see my comments below:

* https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp is the 
Mesos command executor which is used by Mesos containerizer, it actually can 
run in two modes (controlled by the agent flag `--http_command_executor`):
   1. HTTP based mode (`--http_command_executor=true`): In this mode, it will 
use the HTTP based executor library to interact with Mesos agent.
   2. Adapter based mode (`--http_command_executor=false`): In this mode, it 
will use an adapter (`V0ToV1Adapter`) to call the old `ExecutorDriver` to 
interact with Mesos agent.
* 

Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-15 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Sept. 15, 2016, 4:57 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Addressed @Vinod's comments.


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-14 Thread Huadong Liu


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
> For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.
> 
> Qian Zhang wrote:
> > what if agent goes down right after executor checks for connectedness 
> in reaped()?
> 
> Yes, I thought about this case as well, to handle it, it will make the 
> logic here even more complex though the logic of what I posted above is 
> already complex and not that readable. So I agree we should simplify it, what 
> about the following?
> 1. For adapter based executor, in `reaped()` do the self termination with 
> 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, do the self termination with 1 
> as exit code after 60s (@huadongliu, this should satisfy your requirement :-).
> Or you think we should always wait for 60s for both adapter based 
> executor and HTTP based executor?
> 
> Vinod Kone wrote:
> your plan sounds good to me.

Thanks Qian. I am a bit confused with the terminology here. 

+Is HTTP command executor the same thing as Mesos command executor in master, 
i.e. https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp? 
Chronos uses the Mesos command executor.
+Is the "docker executor" one of the adapter based executors? It uses the 
ExecutorDriver C++ interface. However, 
https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L451 
comments about HTTP API. Will the comments simply get removed for "1. For 
adapter based executor, in reaped() do the self termination with 0 as exit code 
after 1s."?


- Huadong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos

Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-13 Thread Vinod Kone


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
> For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.
> 
> Qian Zhang wrote:
> > what if agent goes down right after executor checks for connectedness 
> in reaped()?
> 
> Yes, I thought about this case as well, to handle it, it will make the 
> logic here even more complex though the logic of what I posted above is 
> already complex and not that readable. So I agree we should simplify it, what 
> about the following?
> 1. For adapter based executor, in `reaped()` do the self termination with 
> 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, do the self termination with 1 
> as exit code after 60s (@huadongliu, this should satisfy your requirement :-).
> Or you think we should always wait for 60s for both adapter based 
> executor and HTTP based executor?

your plan sounds good to me.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-13 Thread Qian Zhang


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?
> 
> Vinod Kone wrote:
> For 2.2) what if agent goes down right after executor checks for 
> connectedness in `reaped()`? seems like it would only wait 3s which seems 
> non-ideal. Maybe this logic is becoming too complex? To simplify, how about 
> having the executor always wait for a resonable amount of time (60s) in 
> `reaped()`? HTTP based executor would exit early if they get an ACK for 
> terminal update. How does that sound? We would need to make sure that our 
> unit tests (that use command executor) are not slowed down because of this.

> what if agent goes down right after executor checks for connectedness in 
> reaped()?

Yes, I thought about this case as well, to handle it, it will make the logic 
here even more complex though the logic of what I posted above is already 
complex and not that readable. So I agree we should simplify it, what about the 
following?
1. For adapter based executor, in `reaped()` do the self termination with 0 as 
exit code after 1s.
2. For HTTP based executor, in `reaped()`, do the self termination with 1 as 
exit code after 60s (@huadongliu, this should satisfy your requirement :-).
Or you think we should always wait for 60s for both adapter based executor and 
HTTP based executor?


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 10:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 10:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-13 Thread Vinod Kone


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.
> 
> Huadong Liu wrote:
> Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It 
> helped at some extent, but seemed not enough for all the cases. Since the 
> executor will terminate once an ACK is received, why not delay for an 
> extended period of time, say 10s, or even 30s for rare cases (e.g. agent 
> being extermely busy)?

For 2.2) what if agent goes down right after executor checks for connectedness 
in `reaped()`? seems like it would only wait 3s which seems non-ideal. Maybe 
this logic is becoming too complex? To simplify, how about having the executor 
always wait for a resonable amount of time (60s) in `reaped()`? HTTP based 
executor would exit early if they get an ACK for terminal update. How does that 
sound? We would need to make sure that our unit tests (that use command 
executor) are not slowed down because of this.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-13 Thread Huadong Liu


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?
> 
> Qian Zhang wrote:
> Yes, it makes sense. So what we need to do is:
> 1. For adapter based executor, keep what I am doing now, i.e., in 
> `reaped()` do the self termination with 0 as exit code after 1s.
> 2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is 
> set:
>2.1 If it is not set, do the self termination with 1 as exit code 
> after 3s (1s is too short).
>2.2 If it is set, check if the executor is disconnected from agent, if 
> yes, do the self termination with 1 as exit code after 
> `MESOS_RECOVERY_TIMEOUT`, if no, do the self termination with 1 as exit code 
> after 3s.
> 
> Please let me know if you have further comments, I will update the patch 
> soon.

Hi Qian, Vinod, I have tried 3s in Yelp's mesos/chronos environments. It helped 
at some extent, but seemed not enough for all the cases. Since the executor 
will terminate once an ACK is received, why not delay for an extended period of 
time, say 10s, or even 30s for rare cases (e.g. agent being extermely busy)?


- Huadong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-13 Thread Qian Zhang


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?
> 
> Vinod Kone wrote:
> Executor exiting with -1 code when an agent is down or restarting 
> (probably for an upgrade) seems unfortunate. Since we allow agents to be down 
> for upto "MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is 
> set, maybe the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it 
> is disconnected? If it is connected then yes, it probably should wait for 
> less time (1s is too short?) and then exit because that's seems like there is 
> something wrong with the agent. Does that make sense?

Yes, it makes sense. So what we need to do is:
1. For adapter based executor, keep what I am doing now, i.e., in `reaped()` do 
the self termination with 0 as exit code after 1s.
2. For HTTP based executor, in `reaped()`, check if `MESOS_CHECKPOINT` is set:
   2.1 If it is not set, do the self termination with 1 as exit code after 3s 
(1s is too short).
   2.2 If it is set, check if the executor is disconnected from agent, if yes, 
do the self termination with 1 as exit code after `MESOS_RECOVERY_TIMEOUT`, if 
no, do the self termination with 1 as exit code after 3s.

Please let me know if you have further comments, I will update the patch soon.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 10:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 10:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-09 Thread Vinod Kone


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
> If executor does not receives an ACK from agent within 1 second, that 
> means there should be something wrong in agent, so with this code, the 
> executor will exit with -1 and a message so that we can catch such situation. 
> Maybe we should enlarge this timeout a bit (e.g., 3 seconds) to be safer?

Executor exiting with -1 code when an agent is down or restarting (probably for 
an upgrade) seems unfortunate. Since we allow agents to be down for upto 
"MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is set, maybe 
the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it is 
disconnected? If it is connected then yes, it probably should wait for less 
time (1s is too short?) and then exit because that's seems like there is 
something wrong with the agent. Does that make sense?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-07 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review148042
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-07 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Sept. 7, 2016, 10:03 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Addressed Vinod's comments.


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-07 Thread Qian Zhang


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> >
> 
> Vinod Kone wrote:
> also, can you add a test for this?

Sure, I can add a test. I am just wondering how to use a test to verify the 
logic we added in this patch, maybe we can just add a test to launch a short 
task and check `ContainerTermination.status` is 0 when 
`Slave::executorTerminated()` is call which means the HTTP command executor has 
received the ACK from agent for the terminal status update. Any comments?


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, lines 521-522
> > 
> >
> > why this change?

Sorry, it was introduced by mistake, will remove it.


> On Sept. 6, 2016, 7:33 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > 
> >
> > hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?

If executor does not receives an ACK from agent within 1 second, that means 
there should be something wrong in agent, so with this code, the executor will 
exit with -1 and a message so that we can catch such situation. Maybe we should 
enlarge this timeout a bit (e.g., 3 seconds) to be safer?


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Aug. 30, 2016, 2:01 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 30, 2016, 2:01 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-06 Thread Vinod Kone


> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> >

also, can you add a test for this?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---


On Aug. 30, 2016, 6:01 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 30, 2016, 6:01 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-09-06 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
---




src/launcher/executor.cpp (lines 226 - 231)


move this #220.

just look inside `updates` map. see comment below.



src/launcher/executor.cpp (lines 227 - 228)


How about:

// NOTE: The executor receives an ACK iff it uses the HTTP library. No ACK
// will be received if V0ToV1Adapter is used.



src/launcher/executor.cpp (lines 521 - 522)


why this change?



src/launcher/executor.cpp (line 674)


s/driver/adapter/



src/launcher/executor.cpp (line 675)


s/sent/is sent/



src/launcher/executor.cpp (lines 678 - 681)


// 2. For HTTP based executor, this is a fail safe in case the
// agent doesn't send an ACK for the terminal update for some reason.



src/launcher/executor.cpp (line 682)


hmm. what's the guarantee that an HTTP based executor receives an ACK 
within a second? what if the agent is down?



src/launcher/executor.cpp (line 751)


don't think you need `terminalStatusUUID`. you can just look in the 
`updates` map.



src/launcher/executor.cpp (lines 810 - 811)


see above. kill this.


- Vinod Kone


On Aug. 30, 2016, 6:01 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 30, 2016, 6:01 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-30 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147277
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 30, 2016, 6:01 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 30, 2016, 6:01 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-30 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Aug. 30, 2016, 2:01 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Renamed `SelfTerminate()` to `selfTerminate()`.


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-28 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147115
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 29, 2016, 3:06 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 29, 2016, 3:06 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 71ede1ea4f4e97fe94bd2bd136f17f231cedbce6 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-28 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Aug. 29, 2016, 11:06 a.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Rebase


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/executor.cpp 71ede1ea4f4e97fe94bd2bd136f17f231cedbce6 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-26 Thread Qian Zhang


> On Aug. 26, 2016, 1:59 a.m., Vinod Kone wrote:
> > qian, sorry for the delay on getting back on this one. are you still 
> > interested in seeing this through? if yes, can you please rebase?

Sure Vinod, I will do the rebase later.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review146821
---


On Aug. 26, 2016, 4:43 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 26, 2016, 4:43 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> 0b4136c040ec9cf585c0d38f8471495a855cd640 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-25 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review146903
---



Bad patch!

Reviews applied: [46187]

Failed command: ./support/apply-review.sh -n -r 46187

Error:
2016-08-26 00:58:46 URL:https://reviews.apache.org/r/46187/diff/raw/ 
[2687/2687] -> "46187.patch" [1]
error: src/launcher/http_command_executor.cpp: does not exist in index

Full log: https://builds.apache.org/job/mesos-reviewbot/14985/console

- Mesos ReviewBot


On Aug. 25, 2016, 8:43 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated Aug. 25, 2016, 8:43 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
> https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> 0b4136c040ec9cf585c0d38f8471495a855cd640 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-25 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated Aug. 25, 2016, 8:43 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

fixed bug id -- @vinodkone


Bugs: MESOS-5276
https://issues.apache.org/jira/browse/MESOS-5276


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs
-

  src/launcher/http_command_executor.cpp 
0b4136c040ec9cf585c0d38f8471495a855cd640 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-25 Thread Vinod Kone


> On April 14, 2016, 4:49 p.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > 
> >
> > Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up not terminating forever.
> > 
> > Can you do the following:
> > 
> > --> Enque a message in the queue to self terminate after some timeout 
> > (you can use the delay() function) to be safe.
> > 
> > --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> > 
> > sounds good?
> 
> Vinod Kone wrote:
> @Qian, any update on this? If this particular review is going to take 
> some time, I think it is still useful two commit the other 2 reviews in this 
> chain. AFAICT, they are independent of this review?
> 
> Qian Zhang wrote:
> @Vinod, sorry for the late. I have filed a ticket 
> (https://issues.apache.org/jira/browse/MESOS-5262) for enhancing 
> `slave::statusUpdate()` to always ACK the status update sent by executor.
> 
> And can you please elaborate about the specific scenarios this executor 
> could not terminate forever. Originially I thought the scenario should be: 
> executor sends a terminal status upate to slave when the corresponding 
> framework is in `TERMINATING` state (e.g., operator tears down the 
> framework), then in `Slave::statusUpdate()`, this status update will be 
> ignored, so the executor will not get the ACK. But after testing, I found in 
> this case the executor can still terminate, because the container 
> corresponded to this executor will be destroyed by 
> `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()`, so 
> after `--executor_shutdown_grace_period`, the executor can still terminate. 
> So I am not in which case the executor will never terminate.
> 
> And yes, the other 2 patches are independent of this one, I will make 
> them not depending on this one in the review board, thanks!
> 
> Qian Zhang wrote:
> After more thinking, I see one scenario the executor could never 
> terminate is: agent is down right after it sends SHUTDOWN event to executor. 
> In this case, when handling the SHUTDOWN event, executor will kill the task 
> and send TASK_KILLED status update to agent, but it will not get ACK since 
> agent is already down, so the executor will still run. But I think once agent 
> is started again, executor will receive the ACK and then terminate. I am not 
> sure if this behavior is OK, or we want executor to terminate once it 
> receives the SHUTDOWN event rather than wait for agent is started again?
> 
> Vinod Kone wrote:
> I think it is the right thing for the executor to stay up until the agent 
> comes back up and sends an ACK. Otherwise, which is currently the case, the 
> agent has no idea about the exit status of the executor.
> 
> So the cases where I see no ACKs from the agent are:
> 
> 1) `update` doesn't have UUID.
>   --> Is this possible with the HTTP executor? If not, we should probably 
> shutdown the executor instead of just ignoring.
> 
> 2) Framework is unknown.
>   --> Don't think we can do much here because we don't have access to 
> Executor object?
> 
> 3) Framework is terminating.
>   --> Per your observation the agent is guaranteed to destroy the 
> container. We need to add a comment here explainining this.
> 
> 4) Executor is unknown. While we forward the status update to the 
> framework, the ACK is dropped by `_statusUpdate` and `___statusUpdate()`
>--> If the update is generated by the agent (`killTask()`, 
> `_runTask()`) we should be fine because there is no executor.
>--> If the update is sent by an executor for a task belonging to 
> another executor, that another executor is hopefully already dead. So we 
> should be fine.
>--> Is it possible for the update to be sent by an executor that is 
> not known to the agent? Don't think we can do much here since we don't have 
> access to the Executor object.
> 
> Qian Zhang wrote:
> > I think it is the right thing for the executor to stay up until the 
> agent comes back up and sends an ACK. Otherwise, which is currently the case, 
> the agent has no idea about the exit status of the executor.
> 
> But that seems conflict with your original comment. My understanding to 
> your original comment is, to avoid executor not terminating forever, executor 
> needs to enque a message to self terminate after some timeout (I think we can 
> do it via delay() in `reaped()` after terminal status update is sent), if so, 
> then when executor handles SHUTDOWN event, it will self terminate anyway 

Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-08-25 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review146821
---



qian, sorry for the delay on getting back on this one. are you still interested 
in seeing this through? if yes, can you please rebase?

- Vinod Kone


On April 27, 2016, 1:23 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 27, 2016, 1:23 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-4855
> https://issues.apache.org/jira/browse/MESOS-4855
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> 0b4136c040ec9cf585c0d38f8471495a855cd640 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-26 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated April 27, 2016, 9:23 a.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Rebased


Bugs: MESOS-4855
https://issues.apache.org/jira/browse/MESOS-4855


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/http_command_executor.cpp 
0b4136c040ec9cf585c0d38f8471495a855cd640 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-25 Thread Vinod Kone


> On April 14, 2016, 4:49 p.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > 
> >
> > Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up not terminating forever.
> > 
> > Can you do the following:
> > 
> > --> Enque a message in the queue to self terminate after some timeout 
> > (you can use the delay() function) to be safe.
> > 
> > --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> > 
> > sounds good?
> 
> Vinod Kone wrote:
> @Qian, any update on this? If this particular review is going to take 
> some time, I think it is still useful two commit the other 2 reviews in this 
> chain. AFAICT, they are independent of this review?
> 
> Qian Zhang wrote:
> @Vinod, sorry for the late. I have filed a ticket 
> (https://issues.apache.org/jira/browse/MESOS-5262) for enhancing 
> `slave::statusUpdate()` to always ACK the status update sent by executor.
> 
> And can you please elaborate about the specific scenarios this executor 
> could not terminate forever. Originially I thought the scenario should be: 
> executor sends a terminal status upate to slave when the corresponding 
> framework is in `TERMINATING` state (e.g., operator tears down the 
> framework), then in `Slave::statusUpdate()`, this status update will be 
> ignored, so the executor will not get the ACK. But after testing, I found in 
> this case the executor can still terminate, because the container 
> corresponded to this executor will be destroyed by 
> `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()`, so 
> after `--executor_shutdown_grace_period`, the executor can still terminate. 
> So I am not in which case the executor will never terminate.
> 
> And yes, the other 2 patches are independent of this one, I will make 
> them not depending on this one in the review board, thanks!
> 
> Qian Zhang wrote:
> After more thinking, I see one scenario the executor could never 
> terminate is: agent is down right after it sends SHUTDOWN event to executor. 
> In this case, when handling the SHUTDOWN event, executor will kill the task 
> and send TASK_KILLED status update to agent, but it will not get ACK since 
> agent is already down, so the executor will still run. But I think once agent 
> is started again, executor will receive the ACK and then terminate. I am not 
> sure if this behavior is OK, or we want executor to terminate once it 
> receives the SHUTDOWN event rather than wait for agent is started again?

I think it is the right thing for the executor to stay up until the agent comes 
back up and sends an ACK. Otherwise, which is currently the case, the agent has 
no idea about the exit status of the executor.

So the cases where I see no ACKs from the agent are:

1) `update` doesn't have UUID.
  --> Is this possible with the HTTP executor? If not, we should probably 
shutdown the executor instead of just ignoring.

2) Framework is unknown.
  --> Don't think we can do much here because we don't have access to Executor 
object?

3) Framework is terminating.
  --> Per your observation the agent is guaranteed to destroy the container. We 
need to add a comment here explainining this.

4) Executor is unknown. While we forward the status update to the framework, 
the ACK is dropped by `_statusUpdate` and `___statusUpdate()`
   --> If the update is generated by the agent (`killTask()`, `_runTask()`) we 
should be fine because there is no executor.
   --> If the update is sent by an executor for a task belonging to another 
executor, that another executor is hopefully already dead. So we should be fine.
   --> Is it possible for the update to be sent by an executor that is not 
known to the agent? Don't think we can do much here since we don't have access 
to the Executor object.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
---


On April 25, 2016, 8:34 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 25, 2016, 8:34 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-4855
> https://issues.apache.org/jira/browse/MESOS-4855
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the 

Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-25 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated April 25, 2016, 8:34 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Updated the bug id -- @vinodkone


Bugs: MESOS-4855
https://issues.apache.org/jira/browse/MESOS-4855


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs
-

  src/launcher/http_command_executor.cpp 
0b4136c040ec9cf585c0d38f8471495a855cd640 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review130285
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On April 24, 2016, 12:18 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 24, 2016, 12:18 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> 0b4136c040ec9cf585c0d38f8471495a855cd640 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-23 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

(Updated April 24, 2016, 8:18 a.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
---

Made sure the executor self terminates if the task has not been launched when 
receiving SHUTDOWN event.


Bugs: MESOS-3558
https://issues.apache.org/jira/browse/MESOS-3558


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs (updated)
-

  src/launcher/http_command_executor.cpp 
0b4136c040ec9cf585c0d38f8471495a855cd640 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-23 Thread Qian Zhang


> On April 15, 2016, 12:49 a.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > 
> >
> > Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up not terminating forever.
> > 
> > Can you do the following:
> > 
> > --> Enque a message in the queue to self terminate after some timeout 
> > (you can use the delay() function) to be safe.
> > 
> > --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> > 
> > sounds good?
> 
> Vinod Kone wrote:
> @Qian, any update on this? If this particular review is going to take 
> some time, I think it is still useful two commit the other 2 reviews in this 
> chain. AFAICT, they are independent of this review?
> 
> Qian Zhang wrote:
> @Vinod, sorry for the late. I have filed a ticket 
> (https://issues.apache.org/jira/browse/MESOS-5262) for enhancing 
> `slave::statusUpdate()` to always ACK the status update sent by executor.
> 
> And can you please elaborate about the specific scenarios this executor 
> could not terminate forever. Originially I thought the scenario should be: 
> executor sends a terminal status upate to slave when the corresponding 
> framework is in `TERMINATING` state (e.g., operator tears down the 
> framework), then in `Slave::statusUpdate()`, this status update will be 
> ignored, so the executor will not get the ACK. But after testing, I found in 
> this case the executor can still terminate, because the container 
> corresponded to this executor will be destroyed by 
> `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()`, so 
> after `--executor_shutdown_grace_period`, the executor can still terminate. 
> So I am not in which case the executor will never terminate.
> 
> And yes, the other 2 patches are independent of this one, I will make 
> them not depending on this one in the review board, thanks!

After more thinking, I see one scenario the executor could never terminate is: 
agent is down right after it sends SHUTDOWN event to executor. In this case, 
when handling the SHUTDOWN event, executor will kill the task and send 
TASK_KILLED status update to agent, but it will not get ACK since agent is 
already down, so the executor will still run. But I think once agent is started 
again, executor will receive the ACK and then terminate. I am not sure if this 
behavior is OK, or we want executor to terminate once it receives the SHUTDOWN 
event rather than wait for agent is started again?


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
---


On April 14, 2016, 1:17 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 14, 2016, 1:17 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-23 Thread Qian Zhang


> On April 15, 2016, 12:49 a.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > 
> >
> > Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up not terminating forever.
> > 
> > Can you do the following:
> > 
> > --> Enque a message in the queue to self terminate after some timeout 
> > (you can use the delay() function) to be safe.
> > 
> > --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> > 
> > sounds good?
> 
> Vinod Kone wrote:
> @Qian, any update on this? If this particular review is going to take 
> some time, I think it is still useful two commit the other 2 reviews in this 
> chain. AFAICT, they are independent of this review?

@Vinod, sorry for the late. I have filed a ticket 
(https://issues.apache.org/jira/browse/MESOS-5262) for enhancing 
`slave::statusUpdate()` to always ACK the status update sent by executor.

And can you please elaborate about the specific scenarios this executor could 
not terminate forever. Originially I thought the scenario should be: executor 
sends a terminal status upate to slave when the corresponding framework is in 
`TERMINATING` state (e.g., operator tears down the framework), then in 
`Slave::statusUpdate()`, this status update will be ignored, so the executor 
will not get the ACK. But after testing, I found in this case the executor can 
still terminate, because the container corresponded to this executor will be 
destroyed by `Slave::shutdownExecutorTimeout()` -> 
`MesosContainerizer::destroy()`, so after `--executor_shutdown_grace_period`, 
the executor can still terminate. So I am not in which case the executor will 
never terminate.

And yes, the other 2 patches are independent of this one, I will make them not 
depending on this one in the review board, thanks!


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
---


On April 14, 2016, 1:17 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 14, 2016, 1:17 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-21 Thread Vinod Kone


> On April 14, 2016, 4:49 p.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > 
> >
> > Looking at slave::statusUpdate() code there are several scenarios where 
> > the slave ignores a status update sent by the executor; this means this 
> > executor could end up not terminating forever.
> > 
> > Can you do the following:
> > 
> > --> Enque a message in the queue to self terminate after some timeout 
> > (you can use the delay() function) to be safe.
> > 
> > --> Add a TODO that we do this to be safe and also because slave 
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the 
> > slave status update semantics to always ACK a status update sent by an 
> > executor.
> > 
> > sounds good?

@Qian, any update on this? If this particular review is going to take some 
time, I think it is still useful two commit the other 2 reviews in this chain. 
AFAICT, they are independent of this review?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
---


On April 14, 2016, 5:17 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 14, 2016, 5:17 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-14 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
---




src/launcher/http_command_executor.cpp 


Looking at slave::statusUpdate() code there are several scenarios where the 
slave ignores a status update sent by the executor; this means this executor 
could end up not terminating forever.

Can you do the following:

--> Enque a message in the queue to self terminate after some timeout (you 
can use the delay() function) to be safe.

--> Add a TODO that we do this to be safe and also because slave sometimes 
doesn't ACK a status update. Link to a ticket that fixes the slave status 
update semantics to always ACK a status update sent by an executor.

sounds good?


- Vinod Kone


On April 14, 2016, 5:17 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 14, 2016, 5:17 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-14 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128848
---



Patch looks great!

Reviews applied: [46187]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On April 14, 2016, 5:17 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> ---
> 
> (Updated April 14, 2016, 5:17 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-3558
> https://issues.apache.org/jira/browse/MESOS-3558
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -
> 
>   src/launcher/http_command_executor.cpp 
> ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Review Request 46187: Terminate when receiving the ACK of terminal status update.

2016-04-13 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/
---

Review request for mesos, Anand Mazumdar and Vinod Kone.


Bugs: MESOS-3558
https://issues.apache.org/jira/browse/MESOS-3558


Repository: mesos


Description
---

Terminate when receiving the ACK of terminal status update.


Diffs
-

  src/launcher/http_command_executor.cpp 
ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac 

Diff: https://reviews.apache.org/r/46187/diff/


Testing
---

make check


Thanks,

Qian Zhang