Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-25 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/
---

(Updated Nov. 25, 2016, 3:10 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and 
haosdent huang.


Bugs: MESOS-5963
https://issues.apache.org/jira/browse/MESOS-5963


Repository: mesos


Description
---

Prior to this patch, HealthChecker would stop performing health
checks after it marks the task for kill. Since tasks' lifecycle
is managed by scheduler-executor, HealthChecker should never stop
health checking on its own.

Allowing health checks to run forever enables the scheduler
make the decision about how to deal with unhealthy tasks.


Diffs
-

  src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
  src/health-check/health_checker.hpp 837d1358e418d21536da488e4a23cbfa41db6060 
  src/health-check/health_checker.cpp af5500be249c74a4a5e64bf38dea607173e2f998 
  src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
  src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 

Diff: https://reviews.apache.org/r/52865/diff/


Testing
---

See https://reviews.apache.org/r/52873/.


Thanks,

Alexander Rukletsov



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-25 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/
---

(Updated Nov. 25, 2016, 11:49 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and 
haosdent huang.


Bugs: MESOS-5963
https://issues.apache.org/jira/browse/MESOS-5963


Repository: mesos


Description
---

Prior to this patch, HealthChecker would stop performing health
checks after it marks the task for kill. Since tasks' lifecycle
is managed by scheduler-executor, HealthChecker should never stop
health checking on its own.

Allowing health checks to run forever enables the scheduler
make the decision about how to deal with unhealthy tasks.


Diffs (updated)
-

  src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
  src/health-check/health_checker.hpp 837d1358e418d21536da488e4a23cbfa41db6060 
  src/health-check/health_checker.cpp af5500be249c74a4a5e64bf38dea607173e2f998 
  src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
  src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 

Diff: https://reviews.apache.org/r/52865/diff/


Testing
---

See https://reviews.apache.org/r/52873/.


Thanks,

Alexander Rukletsov



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-24 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review156845
---


Ship it!




Ship It!

- Gastón Kleiman


On Nov. 23, 2016, 12:50 a.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Nov. 23, 2016, 12:50 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> Allowing health checks to run forever enables the scheduler
> make the decision about how to deal with unhealthy tasks.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
>   src/health-check/health_checker.hpp 
> 837d1358e418d21536da488e4a23cbfa41db6060 
>   src/health-check/health_checker.cpp 
> af5500be249c74a4a5e64bf38dea607173e2f998 
>   src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
>   src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-22 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/
---

(Updated Nov. 23, 2016, 12:50 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and 
haosdent huang.


Bugs: MESOS-5963
https://issues.apache.org/jira/browse/MESOS-5963


Repository: mesos


Description (updated)
---

Prior to this patch, HealthChecker would stop performing health
checks after it marks the task for kill. Since tasks' lifecycle
is managed by scheduler-executor, HealthChecker should never stop
health checking on its own.

Allowing health checks to run forever enables the scheduler
make the decision about how to deal with unhealthy tasks.


Diffs
-

  src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
  src/health-check/health_checker.hpp 837d1358e418d21536da488e4a23cbfa41db6060 
  src/health-check/health_checker.cpp af5500be249c74a4a5e64bf38dea607173e2f998 
  src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
  src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 

Diff: https://reviews.apache.org/r/52865/diff/


Testing
---

See https://reviews.apache.org/r/52873/.


Thanks,

Alexander Rukletsov



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-22 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/
---

(Updated Nov. 23, 2016, 12:28 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and 
haosdent huang.


Bugs: MESOS-5963
https://issues.apache.org/jira/browse/MESOS-5963


Repository: mesos


Description
---

Prior to this patch, HealthChecker would stop performing health
checks after it marks the task for kill. Since tasks' lifecycle
is managed by scheduler-executor, HealthChecker should never stop
health checking on its own.


Diffs (updated)
-

  src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
  src/health-check/health_checker.hpp 837d1358e418d21536da488e4a23cbfa41db6060 
  src/health-check/health_checker.cpp af5500be249c74a4a5e64bf38dea607173e2f998 
  src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
  src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 

Diff: https://reviews.apache.org/r/52865/diff/


Testing
---

See https://reviews.apache.org/r/52873/.


Thanks,

Alexander Rukletsov



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-17 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review156280
---


Ship it!




We might want to update the description of the commit to mention where we're 
going with this change (i.e. allow health checks to run forever and let the 
scheduler make the decision about how to deal with unhealthy tasks).


src/health-check/health_checker.hpp (line 71)


How about '`initialize`'?

Later, when you introduce pause/resume functionality, we could call them 
start/stop and avoid the need for an initialize function, right?


- Benjamin Mahler


On Nov. 14, 2016, 10:27 a.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Nov. 14, 2016, 10:27 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
>   src/health-check/health_checker.hpp 
> a1dc72493ff31b87068d5691f4d5b794392caf76 
>   src/health-check/health_checker.cpp 
> e2b32e2d57515202f547d12ba492ad8eb633641b 
>   src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
>   src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-15 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review155934
---


Ship it!




Ship It!

- haosdent huang


On Nov. 14, 2016, 10:27 a.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Nov. 14, 2016, 10:27 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
>   src/health-check/health_checker.hpp 
> a1dc72493ff31b87068d5691f4d5b794392caf76 
>   src/health-check/health_checker.cpp 
> e2b32e2d57515202f547d12ba492ad8eb633641b 
>   src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
>   src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-14 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/
---

(Updated Nov. 14, 2016, 10:27 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and 
haosdent huang.


Bugs: MESOS-5963
https://issues.apache.org/jira/browse/MESOS-5963


Repository: mesos


Description
---

Prior to this patch, HealthChecker would stop performing health
checks after it marks the task for kill. Since tasks' lifecycle
is managed by scheduler-executor, HealthChecker should never stop
health checking on its own.


Diffs (updated)
-

  src/docker/executor.cpp eefbc0c2936eb93044c2d8a1b50155171f398562 
  src/health-check/health_checker.hpp a1dc72493ff31b87068d5691f4d5b794392caf76 
  src/health-check/health_checker.cpp e2b32e2d57515202f547d12ba492ad8eb633641b 
  src/launcher/default_executor.cpp f4e1ea4816fee9a2f6d59bb534933b66013f11cf 
  src/launcher/executor.cpp ce0b199551447504bb95743df4ce9ec4a0443cd4 

Diff: https://reviews.apache.org/r/52865/diff/


Testing
---

See https://reviews.apache.org/r/52873/.


Thanks,

Alexander Rukletsov



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-11 Thread haosdent huang


> On Oct. 17, 2016, 6:53 a.m., haosdent huang wrote:
> > src/health-check/health_checker.cpp, lines 206-217
> > 
> >
> > After we never stop health check, `consecutiveFailures` may become to 0 
> > after success again. Then `killTask` would transform from `true` to `false` 
> > here. Is it a expected bahaviour?
> 
> Alexander Rukletsov wrote:
> Very good point, Haosdent.
> 
> The problem here is that **one entity decides** when a task should be 
> killed, but **another entity enforce** this. The first one cannot really 
> enforce what the second does. What is the least surpising behaviour is that 
> unfortunate architecture? My opinion is to reset if the second entity, i.e. 
> executor, does not comply.
> 
> A better architecture would be to separate "health checker" from 
> "unhealthy policy enforcer". As we've already agreed, we need a "global" 
> health check policy, see 
> [MESOS-6171](https://issues.apache.org/jira/browse/MESOS-6171). With two 
> "unhealthy policies", local and global, the health checker library should 
> simply report the health status, while the executor will apply one of the 
> policies (that may still be implemented in a health checker library for code 
> reuse). If you think this makes sense, do you mind filing a ticket about this?

Got it, create at https://issues.apache.org/jira/browse/MESOS-6578


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152827
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-11 Thread Alexander Rukletsov


> On Oct. 14, 2016, 1:53 p.m., Gastón Kleiman wrote:
> > src/tests/health_check_tests.cpp, line 997
> > 
> >
> > The changes to this file don't seem to be related to the rest of the 
> > patch.

It is indeed not needed anymore once you fixed `TASK_KILLING`->`TASK_RUNNING` 
transition. Before it was possible to send a health update due to a `1s` after 
`TASK_KILLED`.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152670
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-11 Thread Alexander Rukletsov


> On Oct. 21, 2016, 7:35 p.m., Benjamin Mahler wrote:
> > src/launcher/executor.cpp, lines 300-314
> > 
> >
> > This now looks incorrect, we will send a TASK_RUNNING update whenever 
> > we get a task health message but we may have already transitioned the task 
> > to TASK_KILLING, right?
> > 
> > Is there something I'm missing that prevents this from sending 
> > out-of-order status updates now that health messages are generated after 
> > the task kill is initiated?
> > 
> > Speaking of which, is this a problem more generally? If an external 
> > kill request arrives but we get a task health message, can we generate a 
> > TASK_RUNNING here after a TASK_KILLING?
> 
> Gastón Kleiman wrote:
> When the health of a task changes, Mesos will currently generate a 
> {{TASK_RUNNING}} even if it has already generated a {{TASK_KILLING}}, see 
> https://issues.apache.org/jira/browse/MESOS-6457.

Ben, we've fixed that as part of 
[MESOS-6457](https://issues.apache.org/jira/browse/MESOS-6457). Dropping the 
issue.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review153583
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-11 Thread Alexander Rukletsov


> On Oct. 17, 2016, 6:57 a.m., haosdent huang wrote:
> > src/tests/health_check_tests.cpp, lines 1214-1217
> > 
> >
> > Should add
> > 
> > ```
> > .WillRepeatedly(Return()); // Ignore subsequent updates.
> > ```
> > 
> > here?

I don't think so, because this is supposed to be the last update. In contrast 
for example to other tests in this file, where an extra `TASK_RUNNING` may be 
delivered.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152828
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-11-11 Thread Alexander Rukletsov


> On Oct. 17, 2016, 6:53 a.m., haosdent huang wrote:
> > src/health-check/health_checker.cpp, lines 206-217
> > 
> >
> > After we never stop health check, `consecutiveFailures` may become to 0 
> > after success again. Then `killTask` would transform from `true` to `false` 
> > here. Is it a expected bahaviour?

Very good point, Haosdent.

The problem here is that **one entity decides** when a task should be killed, 
but **another entity enforce** this. The first one cannot really enforce what 
the second does. What is the least surpising behaviour is that unfortunate 
architecture? My opinion is to reset if the second entity, i.e. executor, does 
not comply.

A better architecture would be to separate "health checker" from "unhealthy 
policy enforcer". As we've already agreed, we need a "global" health check 
policy, see [MESOS-6171](https://issues.apache.org/jira/browse/MESOS-6171). 
With two "unhealthy policies", local and global, the health checker library 
should simply report the health status, while the executor will apply one of 
the policies (that may still be implemented in a health checker library for 
code reuse). If you think this makes sense, do you mind filing a ticket about 
this?


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152827
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-24 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review153685
---




src/health-check/health_checker.cpp (lines 204 - 215)


Do you mean that if `consecutiveFailures` is set to `UINT_MAX `, then the 
health check will never fail?


- Gastón Kleiman


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-24 Thread Gastón Kleiman


> On Oct. 21, 2016, 7:35 p.m., Benjamin Mahler wrote:
> > src/launcher/executor.cpp, lines 300-314
> > 
> >
> > This now looks incorrect, we will send a TASK_RUNNING update whenever 
> > we get a task health message but we may have already transitioned the task 
> > to TASK_KILLING, right?
> > 
> > Is there something I'm missing that prevents this from sending 
> > out-of-order status updates now that health messages are generated after 
> > the task kill is initiated?
> > 
> > Speaking of which, is this a problem more generally? If an external 
> > kill request arrives but we get a task health message, can we generate a 
> > TASK_RUNNING here after a TASK_KILLING?

When the health of a task changes, Mesos will currently generate a 
{{TASK_RUNNING}} even if it has already generated a {{TASK_KILLING}}, see 
https://issues.apache.org/jira/browse/MESOS-6457.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review153583
---


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review153583
---




src/launcher/executor.cpp (lines 300 - 314)


This now looks incorrect, we will send a TASK_RUNNING update whenever we 
get a task health message but we may have already transitioned the task to 
TASK_KILLING, right?

Is there something I'm missing that prevents this from sending out-of-order 
status updates now that health messages are generated after the task kill is 
initiated?

Speaking of which, is this a problem more generally? If an external kill 
request arrives but we get a task health message, can we generate a 
TASK_RUNNING here after a TASK_KILLING?


- Benjamin Mahler


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-17 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152828
---




src/tests/health_check_tests.cpp (lines 1212 - 1215)


Should add

```
.WillRepeatedly(Return()); // Ignore subsequent updates.
```

here?


- haosdent huang


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-17 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152827
---




src/health-check/health_checker.cpp (lines 204 - 215)


After we never stop health check, `consecutiveFailures` may become to 0 
after success again. Then `killTask` would transform from `true` to `false` 
here. Is it a expected bahaviour?


- haosdent huang


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 52865: Refactored HealthChecker to never stop health checking.

2016-10-14 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152670
---


Ship it!





src/tests/health_check_tests.cpp (line 997)


The changes to this file don't seem to be related to the rest of the patch.


- Gastón Kleiman


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> ---
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, 
> and haosdent huang.
> 
> 
> Bugs: MESOS-5963
> https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 
> 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 
> 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>