Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-25 Thread Alexander Rukletsov


> On Nov. 25, 2016, 3:09 p.m., haosdent huang wrote:
> > src/tests/health_check_tests.cpp, line 842
> > 
> >
> > Changed this to make it consistent with
> > 
> > ```
> > // This test creates a task whose health flaps, and verifies that the
> > // health status updates are sent to the framework.
> > 
> > TEST_F(HealthCheckTest, HealthStatusChange)
> > ```

Framework scheduler is probably the best name : )


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review156922
---


On Nov. 25, 2016, 3:08 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated Nov. 25, 2016, 3:08 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In HealthStatusChange test cases, we launch a task that toggles between
> healthy and unhealthy, and will never be killed because no consecutive
> health failures occur. We need to ignore subsequent status updates: it
> is possible to continue to receive status updates before we stop the
> driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp a4436bdb70ca988106742dadb0762c99a4ebe369 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> I still could not reproduce the problem in the old code after repeated test 
> runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-25 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review156925
---


Ship it!





- Alexander Rukletsov





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-25 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review156922
---




src/tests/health_check_tests.cpp (line 842)


Changed this to make it consistent with

```
// This test creates a task whose health flaps, and verifies that the
// health status updates are sent to the framework.

TEST_F(HealthCheckTest, HealthStatusChange)
```


- haosdent huang





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-25 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated Nov. 25, 2016, 3:08 p.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, Neil 
Conway, and Timothy Chen.


Changes
---

Rebase.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description
---

In HealthStatusChange test cases, we launch a task that toggles between
healthy and unhealthy, and will never be killed because no consecutive
health failures occur. We need to ignore subsequent status updates: it
is possible to continue to receive status updates before we stop the
driver.
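
A minimal sketch of the "ignore subsequent status updates" idea, written without gmock so that it is self-contained. `StatusUpdate` and `UpdateRecorder` are hypothetical names used only for illustration; they are not part of Mesos or of this patch. In gmock, this kind of behaviour is commonly expressed by ending an expectation with a catch-all action such as `WillRepeatedly(Return())`; whether the patch uses exactly that form is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified status update: only the health bit matters here.
struct StatusUpdate
{
  bool healthy;
};

// Captures the first `expected` updates and silently drops the rest, so
// that updates arriving late (before the driver is stopped) cannot make
// the test fail.
class UpdateRecorder
{
public:
  explicit UpdateRecorder(std::size_t expected) : expected_(expected)
  {
    assert(expected_ > 0);
  }

  void statusUpdate(const StatusUpdate& update)
  {
    if (captured_.size() < expected_) {
      captured_.push_back(update);
    } else {
      ++ignored_; // Subsequent update: counted, but otherwise ignored.
    }
  }

  const std::vector<StatusUpdate>& captured() const { return captured_; }
  std::size_t ignored() const { return ignored_; }

private:
  std::size_t expected_;
  std::size_t ignored_ = 0;
  std::vector<StatusUpdate> captured_;
};
```

With a recorder constructed for three updates, a flapping task that sends ten updates leaves three captured updates for the test to inspect, and the remaining seven are dropped.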


Diffs (updated)
-

  src/tests/health_check_tests.cpp a4436bdb70ca988106742dadb0762c99a4ebe369 

Diff: https://reviews.apache.org/r/46307/diff/


Testing
---

I still could not reproduce the problem in the old code after repeated test 
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-24 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review156873
---




src/tests/health_check_tests.cpp (line 779)


Since you're touching this test, let's mark this variable `const` and rename 
it to `healthCheckCmd`.


- Alexander Rukletsov


On Nov. 24, 2016, 10:51 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated Nov. 24, 2016, 10:51 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In HealthStatusChange test cases, we launch a task that toggles between
> healthy and unhealthy, and will never be killed because no consecutive
> health failures occur. We need to ignore subsequent status updates: it
> is possible to continue to receive status updates before we stop the
> driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> I still could not reproduce the problem in the old code after repeated test 
> runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-24 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated Nov. 24, 2016, 10:51 p.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, Neil 
Conway, and Timothy Chen.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description (updated)
---

In HealthStatusChange test cases, we launch a task that toggles between
healthy and unhealthy, and will never be killed because no consecutive
health failures occur. We need to ignore subsequent status updates: it
is possible to continue to receive status updates before we stop the
driver.


Diffs
-

  src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 

Diff: https://reviews.apache.org/r/46307/diff/


Testing
---

I still could not reproduce the problem in the old code after repeated test 
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-24 Thread Alexander Rukletsov


> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2 
> > iterations, then again after 77 iterations. Verbose test log here: 
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
> 
> haosdent huang wrote:
> Thank you very much for testing! I see you are using `vagrant@archlinux`; 
> could you share your Vagrantfile with me so that I can try to reproduce this 
> locally?
> 
> haosdent huang wrote:
> ```
> I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 
> 10.0.2.15:44478
> Received task health update, healthy: true
> I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- 
> from executor(1)@10.0.2.15:37107
> I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received 
> status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for 
> task 1 in health state healthy of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-
> I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- 
> to master@10.0.2.15:41408
> I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for 
> task 1 in health state healthy of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765- to executor(1)@10.0.2.15:37107
> I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING 
> (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state 
> healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- from agent 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 
> (archlinux.vagrant.vm)
> I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-
> I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 
> of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- (latest state: 
> TASK_RUNNING, status update state: TASK_RUNNING)
> I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call 
> e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765- (default) at 
> scheduler-5bd5e446-a017-45d9-8193-be7d23002487@10.0.2.15:41408 on agent 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
> I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) 
> for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-
> rm: cannot remove '/tmp/1NKfr1': No such file or directory
> I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state 
> from 10.0.2.15:44482
> ../../mesos/src/tests/health_check_tests.cpp:647: Failure
> Value of: (find).get()
>   Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
> Expected: false
> Which is: false
> *** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you 
> are using GNU date ***
> PC: @  0x1899ba0 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; 
> stack trace: ***
> 
> ```
> It looks like we get `true` here. Let me look into how to fix this.

There were at least two different issues in this test (see MESOS-1802), and 
this patch fixes just one. The one you see will be addressed in the next review 
in the chain.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
---


On May 17, 2016, 4:46 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated May 17, 2016, 4:46 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In HealthStatusChange test cases, we launch a task that toggles between
> healthy and unhealthy, and will never be killed because no consecutive
> health failures occur. We need to ignore subsequent status updates: it
> is possible to continue to receive status updates 

Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-11-24 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review156871
---


Ship it!




LGTM, modulo rebase.


src/tests/health_check_tests.cpp (line 561)


Looks like this has been fixed in another patch : ).



src/tests/health_check_tests.cpp (line 801)


But this is not!


- Alexander Rukletsov





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-05-17 Thread haosdent huang


> On May 13, 2016, 8:50 p.m., Benjamin Mahler wrote:
> > This looks good but when you mentioned the consecutive failures in the 
> > description I was confused. The test should probably just say that we 
> > launch a task that toggles between healthy and unhealthy, and will never be 
> > killed because no consecutive health failures occur. That will make it 
> > clear that we need to ignore subsequent status updates because we'll 
> > continue to receive healthy/unhealthy updates.
> > 
> > Could you update the description to clarify this?

Thank you for the clarification; I just updated the description.


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review133198
---





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-05-17 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated May 17, 2016, 4:46 p.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, Neil 
Conway, and Timothy Chen.


Changes
---

Address @bmahler's comments.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description (updated)
---

In HealthStatusChange test cases, we launch a task that toggles between
healthy and unhealthy, and will never be killed because no consecutive
health failures occur. We need to ignore subsequent status updates: it
is possible to continue to receive status updates before we stop the
driver.


Diffs (updated)
-

  src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 

Diff: https://reviews.apache.org/r/46307/diff/


Testing
---

I still could not reproduce the problem in the old code after repeated test 
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-05-17 Thread haosdent huang


> On May 13, 2016, 8:50 p.m., Benjamin Mahler wrote:
> > src/tests/health_check_tests.cpp, line 504
> > 
> >
> > Why did this change?

We need this so that we can check stdout/stderr from the console if the test 
is still flaky next time.


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review133198
---


On May 7, 2016, 8:59 a.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated May 7, 2016, 8:59 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We need to ignore subsequent status updates in HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> HealthCheck message definition, but the `consecutiveFailures` counter
> in `mesos-health-check` is reset to 0 after a successful check. It is
> therefore possible to continue to receive status updates before we
> stop the driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> I still could not reproduce the problem in the old code after repeated test 
> runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-05-13 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review133198
---



This looks good but when you mentioned the consecutive failures in the 
description I was confused. The test should probably just say that we launch a 
task that toggles between healthy and unhealthy, and will never be killed 
because no consecutive health failures occur. That will make it clear that we 
need to ignore subsequent status updates because we'll continue to receive 
healthy/unhealthy updates.

Could you update the description to clarify this?


src/tests/health_check_tests.cpp (line 504)


Why did this change?


- Benjamin Mahler





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-05-07 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated May 7, 2016, 8:59 a.m.)


Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
Conway, and Timothy Chen.


Changes
---

Rebase.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description
---

We need to ignore subsequent status updates in HealthStatusChange
tests. In our test cases, we set `consecutive_failures` to 3 in the
HealthCheck message definition, but the `consecutiveFailures` counter
in `mesos-health-check` is reset to 0 after a successful check. It is
therefore possible to continue to receive status updates before we
stop the driver.
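
To illustrate the reasoning, here is a rough sketch of a consecutive-failure counter that resets on success. `HealthTracker` and `wouldBeKilled` are hypothetical helpers, not the actual `mesos-health-check` code, and the assumption that a task is killed once the counter reaches the configured limit is mine rather than something stated in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the checker's bookkeeping.
struct HealthTracker
{
  explicit HealthTracker(int maxConsecutiveFailures)
    : limit(maxConsecutiveFailures)
  {
    assert(limit > 0);
  }

  // Records one check result. Returns true if the task should be killed
  // (assumed here to happen when the counter reaches the limit).
  bool record(bool healthy)
  {
    if (healthy) {
      consecutiveFailures = 0; // A successful check resets the counter.
      return false;
    }
    ++consecutiveFailures;
    return consecutiveFailures >= limit;
  }

  int limit;
  int consecutiveFailures = 0;
};

// Feeds a sequence of check results and reports whether the limit was hit.
bool wouldBeKilled(const std::vector<bool>& results, int limit)
{
  HealthTracker tracker(limit);
  for (bool healthy : results) {
    if (tracker.record(healthy)) {
      return true;
    }
  }
  return false;
}
```

Under these assumptions, a task that alternates between healthy and unhealthy never accumulates more than one consecutive failure, so with a limit of 3 it keeps running and keeps producing status updates until the driver is stopped.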


Diffs (updated)
-

  src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 

Diff: https://reviews.apache.org/r/46307/diff/


Testing
---

I still could not reproduce the problem in the old code after repeated test 
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-19 Thread haosdent huang


> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2 
> > iterations, then again after 77 iterations. Verbose test log here: 
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
> 
> haosdent huang wrote:
> Thank you very much for testing! I see you are using `vagrant@archlinux`; 
> could you share your Vagrantfile with me so that I can try to reproduce this 
> locally?

```
I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 
10.0.2.15:44478
Received task health update, healthy: true
I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update TASK_RUNNING 
(UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- from 
executor(1)@10.0.2.15:37107
I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received status 
update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-
I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update TASK_RUNNING 
(UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- to master@10.0.2.15:41408
I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for status 
update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- to 
executor(1)@10.0.2.15:37107
I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING (UUID: 
e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of 
framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- from agent 
7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 
(archlinux.vagrant.vm)
I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update 
TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health 
state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-
I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 of 
framework 7cf5923c-3d03-4ed6-826a-efa97f54e765- (latest state: 
TASK_RUNNING, status update state: TASK_RUNNING)
I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call 
e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 
7cf5923c-3d03-4ed6-826a-efa97f54e765- (default) at 
scheduler-5bd5e446-a017-45d9-8193-be7d23002487@10.0.2.15:41408 on agent 
7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received status 
update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-
rm: cannot remove '/tmp/1NKfr1': No such file or directory
I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state from 
10.0.2.15:44482
../../mesos/src/tests/health_check_tests.cpp:647: Failure
Value of: (find).get()
  Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
Expected: false
Which is: false
*** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you are 
using GNU date ***
PC: @  0x1899ba0 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; stack 
trace: ***

```
It looks like we get `true` here. Let me look into how to fix this.


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
---


On April 17, 2016, 5:15 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated April 17, 2016, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
> Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We need to ignore subsequent status updates in HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> HealthCheck message definition, but the `consecutiveFailures` counter
> in `mesos-health-check` is reset to 0 after a successful check. It is
> therefore possible to continue to receive status updates before we
> stop the driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> I still could not reproduce the problem in the old code after repeated test 
> runs, so there seems to be no way to verify whether my assumption is 

Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-19 Thread haosdent huang


> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2 
> > iterations, then again after 77 iterations. Verbose test log here: 
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9

Thank you very much for testing! I see you are using `vagrant@archlinux`; could 
you share your Vagrantfile with me so that I can try to reproduce this locally?


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
---





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-19 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
---



This patch does not solve the flakiness for me: failed once after 2 iterations, 
then again after 77 iterations. Verbose test log here: 
https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9

- Neil Conway





Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-17 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129272
---



Patch looks great!

Reviews applied: [46307]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On April 17, 2016, 5:15 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated April 17, 2016, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
> Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We need to ignore subsequent status updates in HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> HealthCheck message definition, but the `consecutiveFailures` counter
> in `mesos-health-check` is reset to 0 after every successful check.
> Because the task's health flaps, the limit is never reached, so we
> may continue to receive status updates before we stop the driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> # I still could not reproduce the problem in the old code after repeated test
> runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-17 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated April 17, 2016, 5:15 p.m.)


Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
Conway, and Timothy Chen.


Changes
---

Update description.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description (updated)
---

We need to ignore subsequent status updates in HealthStatusChange
tests. In our test cases, we set `consecutive_failures` to 3 in the
HealthCheck message definition, but the `consecutiveFailures` counter
in `mesos-health-check` is reset to 0 after every successful check.
Because the task's health flaps, the limit is never reached, so we
may continue to receive status updates before we stop the driver.


Diffs (updated)
-

  src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 

Diff: https://reviews.apache.org/r/46307/diff/


Testing
---

# I still could not reproduce the problem in the old code after repeated test
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-17 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129265
---



Bad patch!

Reviews applied: [46307]

Failed command: ./support/apply-review.sh -n -r 46307

Error:
2016-04-17 16:51:54 URL:https://reviews.apache.org/r/46307/diff/raw/ 
[1446/1446] -> "46307.patch" [1]
Total errors found: 0
Checking 1 files
Error: No line in the commit message summary may exceed 72 characters.

Full log: https://builds.apache.org/job/mesos-reviewbot/12574/console

- Mesos ReviewBot


On April 17, 2016, 3:15 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> ---
> 
> (Updated April 17, 2016, 3:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
> Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
> https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We need to ignore subsequent status updates in HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> HealthCheck message definition, but the `consecutiveFailures` counter
> in `mesos-health-check` is reset to 0 after every successful check.
> Because the task's health flaps, the limit is never reached, so we
> may continue to receive status updates before we stop the driver.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> ---
> 
> # I still could not reproduce the problem in the old code after repeated test
> runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

2016-04-17 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/
---

(Updated April 17, 2016, 3:11 p.m.)


Review request for mesos, Ben Mahler, Neil Conway, and Timothy Chen.


Bugs: MESOS-1802
https://issues.apache.org/jira/browse/MESOS-1802


Repository: mesos


Description
---

We need to ignore subsequent status updates in HealthStatusChange
tests. In our test cases, we set `consecutive_failures` to 3 in the
HealthCheck message definition, but the `consecutiveFailures` counter
in `mesos-health-check` is reset to 0 after every successful check.
Because the task's health flaps, the limit is never reached, so we
may continue to receive status updates before we stop the driver.


Diffs
-

  src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 

Diff: https://reviews.apache.org/r/46307/diff/


Testing (updated)
---

# I still could not reproduce the problem in the old code after repeated test
runs, so there seems to be no way to verify whether my assumption is correct.


Thanks,

haosdent huang


