Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Ben Mahler


> On Dec. 10, 2015, 2:35 a.m., Artem Harutyunyan wrote:
> > src/health-check/main.cpp, line 120
> > 
> > Do we need to create a JIRA to eventually get rid of the hack?

Good idea. I filed MESOS-4111 and will reference it in a TODO. I will also add
a reference next to the sleep in the command executor.


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109667
---


On Dec. 10, 2015, 2:01 a.m., Ben Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41178/
> ---
> 
> (Updated Dec. 10, 2015, 2:01 a.m.)
> 
> 
> Review request for mesos, Artem Harutyunyan and Timothy Chen.
> 
> 
> Bugs: MESOS-1613 and MESOS-4106
> https://issues.apache.org/jira/browse/MESOS-1613
> https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Much like in the command executor, we need to sleep after we send
> the final message in the health checker. Otherwise, we may exit
> before libprocess is able to finish sending the message over the
> local network.
> 
> This led to the following issues:
> https://issues.apache.org/jira/browse/MESOS-1613
> https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Diffs
> ---
> 
>   src/health-check/main.cpp 83ee38cd853325b3adc7cb6bc2d1d67b343037f5 
>   src/tests/health_check_tests.cpp b1454b085b36bb7c4d8ef012c764cd8466b4fb02 
> 
> Diff: https://reviews.apache.org/r/41178/diff/
> 
> 
> Testing
> ---
> 
> Running the `HealthCheckTest.DISABLED_ConsecutiveFailures` test repeatedly 
> on a machine loaded with many `openssl speed` commands in the background 
> reproduces the flakiness. With this patch applied, the test is no longer 
> flaky in this setup.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>



Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Artem Harutyunyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109668
---

Ship it!

- Artem Harutyunyan





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Artem Harutyunyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109667
---



src/health-check/main.cpp (line 120)


Do we need to create a JIRA to eventually get rid of the hack?


- Artem Harutyunyan





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Ben Mahler


> On Dec. 10, 2015, 2:10 a.m., Neil Conway wrote:
> > src/tests/health_check_tests.cpp, line 633
> > 
> >
> > Comment needs updating.

Thanks for catching this!


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109664
---





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109664
---



src/tests/health_check_tests.cpp (line 633)


Comment needs updating.


- Neil Conway





Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/
---

Review request for mesos, Artem Harutyunyan and Timothy Chen.


Bugs: MESOS-1613 and MESOS-4106
https://issues.apache.org/jira/browse/MESOS-1613
https://issues.apache.org/jira/browse/MESOS-4106


Repository: mesos


Description
---

Much like in the command executor, we need to sleep after we send
the final message in the health checker. Otherwise, we may exit
before libprocess is able to finish sending the message over the
local network.

This led to the following issues:
https://issues.apache.org/jira/browse/MESOS-1613
https://issues.apache.org/jira/browse/MESOS-4106
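
To illustrate the race described above, here is a minimal, self-contained
sketch. It does not use libprocess: `asyncSend` and `deliveredBeforeExit` are
hypothetical names that only model an asynchronous send with a background
thread, and the latency and grace values are arbitrary assumptions, not the
values used in the patch.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for an asynchronous send (not the libprocess API):
// the message only counts as delivered once `latency` has elapsed on a
// background thread.
std::shared_ptr<std::atomic<bool>> asyncSend(std::chrono::milliseconds latency)
{
  auto delivered = std::make_shared<std::atomic<bool>>(false);

  // The thread shares ownership of the flag, so it can safely outlive
  // the caller.
  std::thread([delivered, latency]() {
    std::this_thread::sleep_for(latency);
    delivered->store(true);
  }).detach();

  return delivered;
}

// Sends a message, waits for `grace`, and reports whether the message had
// been delivered at the point where the process would have exited.
bool deliveredBeforeExit(
    std::chrono::milliseconds latency,
    std::chrono::milliseconds grace)
{
  auto delivered = asyncSend(latency);
  std::this_thread::sleep_for(grace);
  return delivered->load();
}
```

Under this model, calling `deliveredBeforeExit` with a zero grace period
reports that the message was still in flight at exit, while a grace period
comfortably longer than the latency lets it through. A fixed sleep only makes
the drop unlikely, which is why it remains a hack rather than a guarantee.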


Diffs
---

  src/health-check/main.cpp 83ee38cd853325b3adc7cb6bc2d1d67b343037f5 
  src/tests/health_check_tests.cpp b1454b085b36bb7c4d8ef012c764cd8466b4fb02 

Diff: https://reviews.apache.org/r/41178/diff/


Testing
---

Running the `HealthCheckTest.DISABLED_ConsecutiveFailures` test repeatedly 
on a machine loaded with many `openssl speed` commands in the background 
reproduces the flakiness. With this patch applied, the test is no longer flaky
in this setup.


Thanks,

Ben Mahler