Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/
---

(Updated May 6, 2019, 10:04 p.m.)


Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.


Changes
---

Renamed the test to NoHang


Bugs: MESOS-9766
https://issues.apache.org/jira/browse/MESOS-9766


Repository: mesos


Description
---

This test fails on master prior to applying the fix for MESOS-9766.
It attempts to ensure that processes are terminated after the
/__processes__ handler dispatches to them.


Diffs (updated)
-

  3rdparty/libprocess/src/tests/process_tests.cpp 
60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 


Diff: https://reviews.apache.org/r/70595/diff/4/

Changes: https://reviews.apache.org/r/70595/diff/3-4/


Testing
---

Ran in repetition, although it appears to consistently fail on master without 
repetition needed.


Thanks,

Benjamin Mahler



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215075
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On May 6, 2019, 10:04 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 6, 2019, 10:04 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/4/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/
---

(Updated May 6, 2019, 10:02 p.m.)


Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.


Changes
---

Use a managed process and avoid an additional race.


Bugs: MESOS-9766
https://issues.apache.org/jira/browse/MESOS-9766


Repository: mesos


Description
---

This test fails on master prior to applying the fix for MESOS-9766.
It attempts to ensure that processes are terminated after the
/__processes__ handler dispatches to them.


Diffs (updated)
-

  3rdparty/libprocess/src/tests/process_tests.cpp 
60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 


Diff: https://reviews.apache.org/r/70595/diff/3/

Changes: https://reviews.apache.org/r/70595/diff/2-3/


Testing
---

Ran in repetition, although it appears to consistently fail on master without 
repetition needed.


Thanks,

Benjamin Mahler



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215074
---


Fix it, then Ship it!




Thanks for making this test more deterministic! Got some comments, but IMO this 
test is robust enough in practice so please feel free to drop my issues.


3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2122 (patched)


If you want libprocess to manager this actor, you can do the following:
```
PID pid = spawn(new TestProcess(), true);

Future waited = dispatch(pid, ::wait_for_terminate);
```
Then you won't need to do `process::wait` and `delete` later.

If you do so, you don't even need `waited`.



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2138 (patched)


Although highly unlikely, it is theoratically possible that the terminate 
event is enqueued before any event of `process` got dequeued. If this happen, 
this await would fail even w/ the fix.

A possible way to fix this is to set up some promise in 
`wait_for_terminate`, and then wait for the corresponding future before calling 
`terminate` in this test.

That said, fixing this flake that's unlikely to happen might not worth it, 
so please feel free to drop this issue.


- Chun-Hung Hsiao


On May 6, 2019, 6:34 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 6, 2019, 6:34 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/2/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215070
---



Patch looks great!

Reviews applied: [70594, 70595]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On May 6, 2019, 6:34 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 6, 2019, 6:34 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/2/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/
---

(Updated May 6, 2019, 6:34 p.m.)


Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.


Changes
---

Made the test not rely on racing behavior.


Bugs: MESOS-9766
https://issues.apache.org/jira/browse/MESOS-9766


Repository: mesos


Description
---

This test fails on master prior to applying the fix for MESOS-9766.
It attempts to ensure that processes are terminated after the
/__processes__ handler dispatches to them.


Diffs (updated)
-

  3rdparty/libprocess/src/tests/process_tests.cpp 
60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 


Diff: https://reviews.apache.org/r/70595/diff/2/

Changes: https://reviews.apache.org/r/70595/diff/1-2/


Testing
---

Ran in repetition, although it appears to consistently fail on master without 
repetition needed.


Thanks,

Benjamin Mahler



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Benjamin Mahler


> On May 3, 2019, 11:31 p.m., Chun-Hung Hsiao wrote:
> > 3rdparty/libprocess/src/tests/process_tests.cpp
> > Lines 2092-2108 (patched)
> > 
> >
> > I tried this unit test w/o the fix in the following settings:
> > 1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
> > 2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
> > passes and 40 failures.
> > 
> > So it seems to me that this test passes fairly easily if it only runs 
> > once.
> > 
> > Then I replace this snippet with the following code, which just used 1 
> > actor but failed every time in the above two settings:
> > ```
> >   UPID pid = spawn(new ProcessBase(), true);
> > 
> >   dispatch(pid, [] { os::sleep(Milliseconds(50)); });
> > 
> >   http::URL url = http::URL(
> >   "http",
> >   process::address().ip,
> >   process::address().port,
> >   "/__processes__");
> > 
> >   Future response = http::get(url);
> > 
> >   terminate(pid, true);
> > ```
> 
> Alexander Rukletsov wrote:
> Agree with Chun, we should avoid races as much as we can. Several 
> suggestions to further improve Chun's version:
> - Pause the clock in the beginnig. We can then sleep for 42 minutes to 
> make sure the first message is still being processes when `http::get()` is 
> invoked (will of course require `settle()` before).
> - `settle()` after `http::get` to ensure `/__processes__` handler is put 
> into the `pid`'s mailbaox.
> - Check that the resulted JSON does not include the entry for `pid`, 
> i.e., is empty.
> 
> Chun-Hung Hsiao wrote:
> I'm not sure if adding settles works:
> 1. Adding a `settle` before `http::get` makes it wait for the completion 
> of the sleep.
> 2. Adding a `settle` after `http::get` makes `terminate` wait for the 
> completion of the sleep.
> IIUC both will increase the likelihood of the test passing w/o che fix.

Thanks guys, I will upload a test that doesn't rely on racing, we could also go 
with Chun's simpler test which relies on racing but will only rarely not 
exercise what we want.


> On May 3, 2019, 11:31 p.m., Chun-Hung Hsiao wrote:
> > 3rdparty/libprocess/src/tests/process_tests.cpp
> > Lines 2110-2112 (patched)
> > 
> >
> > `AWAIT_EXPECT_RESPONSE_STATUS_EQ(http::OK().status, response);`

Now that I look at it, it looks less clear/readable with the long macro.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao


> On May 3, 2019, 11:31 p.m., Chun-Hung Hsiao wrote:
> > 3rdparty/libprocess/src/tests/process_tests.cpp
> > Lines 2092-2108 (patched)
> > 
> >
> > I tried this unit test w/o the fix in the following settings:
> > 1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
> > 2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
> > passes and 40 failures.
> > 
> > So it seems to me that this test passes fairly easily if it only runs 
> > once.
> > 
> > Then I replace this snippet with the following code, which just used 1 
> > actor but failed every time in the above two settings:
> > ```
> >   UPID pid = spawn(new ProcessBase(), true);
> > 
> >   dispatch(pid, [] { os::sleep(Milliseconds(50)); });
> > 
> >   http::URL url = http::URL(
> >   "http",
> >   process::address().ip,
> >   process::address().port,
> >   "/__processes__");
> > 
> >   Future response = http::get(url);
> > 
> >   terminate(pid, true);
> > ```
> 
> Alexander Rukletsov wrote:
> Agree with Chun, we should avoid races as much as we can. Several 
> suggestions to further improve Chun's version:
> - Pause the clock in the beginnig. We can then sleep for 42 minutes to 
> make sure the first message is still being processes when `http::get()` is 
> invoked (will of course require `settle()` before).
> - `settle()` after `http::get` to ensure `/__processes__` handler is put 
> into the `pid`'s mailbaox.
> - Check that the resulted JSON does not include the entry for `pid`, 
> i.e., is empty.

I'm not sure if adding settles works:
1. Adding a `settle` before `http::get` makes it wait for the completion of the 
sleep.
2. Adding a `settle` after `http::get` makes `terminate` wait for the 
completion of the sleep.
IIUC both will increase the likelihood of the test passing w/o che fix.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Alexander Rukletsov


> On May 3, 2019, 11:31 p.m., Chun-Hung Hsiao wrote:
> > 3rdparty/libprocess/src/tests/process_tests.cpp
> > Lines 2092-2108 (patched)
> > 
> >
> > I tried this unit test w/o the fix in the following settings:
> > 1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
> > 2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
> > passes and 40 failures.
> > 
> > So it seems to me that this test passes fairly easily if it only runs 
> > once.
> > 
> > Then I replace this snippet with the following code, which just used 1 
> > actor but failed every time in the above two settings:
> > ```
> >   UPID pid = spawn(new ProcessBase(), true);
> > 
> >   dispatch(pid, [] { os::sleep(Milliseconds(50)); });
> > 
> >   http::URL url = http::URL(
> >   "http",
> >   process::address().ip,
> >   process::address().port,
> >   "/__processes__");
> > 
> >   Future response = http::get(url);
> > 
> >   terminate(pid, true);
> > ```

Agree with Chun, we should avoid races as much as we can. Several suggestions 
to further improve Chun's version:
- Pause the clock in the beginnig. We can then sleep for 42 minutes to make 
sure the first message is still being processes when `http::get()` is invoked 
(will of course require `settle()` before).
- `settle()` after `http::get` to ensure `/__processes__` handler is put into 
the `pid`'s mailbaox.
- Check that the resulted JSON does not include the entry for `pid`, i.e., is 
empty.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-04 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215050
---



Patch looks great!

Reviews applied: [70594, 70595]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-03 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---




3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2086 (patched)


Should `/__processes__` be quoted by backticks? Ditto below.



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2090 (patched)


How about `NoProcessesEndpointHang`?



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2092-2108 (patched)


I tried this unit test w/o the fix in the following settings:
1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
passes and 40 failures.

So it seems to me that this test passes fairly easily if it only runs once.

Then I replace this snippet with the following code, which just used 1 
actor but failed every time in the above two settings:
```
  UPID pid = spawn(new ProcessBase(), true);

  dispatch(pid, [] { os::sleep(Milliseconds(50)); });

  http::URL url = http::URL(
  "http",
  process::address().ip,
  process::address().port,
  "/__processes__");

  Future response = http::get(url);

  terminate(pid, true);
```



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2110-2112 (patched)


`AWAIT_EXPECT_RESPONSE_STATUS_EQ(http::OK().status, response);`


- Chun-Hung Hsiao


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>