Re: Review Request 40849: Fix flaky MemoryPressureMesosTests

2015-12-09 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40849/#review109494
---

Ship it!


Ship It!

- Benjamin Bannier


On Dec. 3, 2015, 7:01 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40849/
> ---
> 
> (Updated Dec. 3, 2015, 7:01 p.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Greg Mann, Artem Harutyunyan, Jan 
> Schlicht, and Till Toenshoff.
> 
> 
> Bugs: MESOS-3586
> https://issues.apache.org/jira/browse/MESOS-3586
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The existing tests will check that "low" pressure events occur at least as 
> often as "medium" pressure events (this is the documented behavior).  
> However, the order of events and the order in which we process said events is 
> not guaranteed.  When we collect the pressure events via a counter, there may 
> be some events that are in-flight, and thereby not accounted for in the 
> counters.
> 
> This patch modifies MemoryPressureMesosTests to wait for memory pressure 
> events to stop before checking the counts for correctness.
> The tests now stop the memory-pressure-triggering task and then wait for all 
> events to be processed before checking the counters.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/memory_pressure_tests.cpp 
> e18b971c4df26c9b9c103ca73bdad4fd400d6c02 
> 
> Diff: https://reviews.apache.org/r/40849/diff/
> 
> 
> Testing
> ---
> 
> On Debian 8, Ubuntu 14, Centos 7 (Thanks Jan!):
> `make check`
> `sudo bin/mesos-tests.sh --gtest_filter="*MemoryPressureMesosTest*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> ^ Ran the above for a couple minutes or ~100 times.  It was previously 
> failing ~1/5 times.
> 
> ---
> Note on Centos 6 (Thanks Greg!):
> ```
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> ```
> 
> ---
> Note on Ubuntu 14:
> There is some other flakiness (in Agent recovery).  This will be tracked in 
> https://issues.apache.org/jira/browse/MESOS-4047
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 40849: Fix flaky MemoryPressureMesosTests

2015-12-03 Thread Bernd Mathiske

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40849/#review108829
---

Ship it!



src/tests/containerizer/memory_pressure_tests.cpp (line 140)


Consider moving this comment to line 143, before "break;". It makes more 
sense to me to talk about what you are not doing once you have actually reached 
the statement where you might do something.


- Bernd Mathiske


On Dec. 2, 2015, 11:25 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40849/
> ---
> 
> (Updated Dec. 2, 2015, 11:25 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Greg Mann, Artem Harutyunyan, Jan 
> Schlicht, and Till Toenshoff.
> 
> 
> Bugs: MESOS-3586
> https://issues.apache.org/jira/browse/MESOS-3586
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The existing tests will check that "low" pressure events occur at least as 
> often as "medium" pressure events (this is the documented behavior).  
> However, the order of events and the order in which we process said events is 
> not guaranteed.  When we collect the pressure events via a counter, there may 
> be some events that are in-flight, and thereby not accounted for in the 
> counters.
> 
> This patch modifies MemoryPressureMesosTests to wait for memory pressure 
> events to stop before checking the counts for correctness.
> The tests now stop the memory-pressure-triggering task and then wait for all 
> events to be processed before checking the counters.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/memory_pressure_tests.cpp 
> e18b971c4df26c9b9c103ca73bdad4fd400d6c02 
> 
> Diff: https://reviews.apache.org/r/40849/diff/
> 
> 
> Testing
> ---
> 
> On Debian 8, Ubuntu 14, Centos 7 (Thanks Jan!):
> `make check`
> `sudo bin/mesos-tests.sh --gtest_filter="*MemoryPressureMesosTest*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> ^ Ran the above for a couple minutes or ~100 times.  It was previously 
> failing ~1/5 times.
> 
> ---
> Note on Centos 6 (Thanks Greg!):
> ```
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> ```
> 
> ---
> Note on Ubuntu 14:
> There is some other flakiness (in Agent recovery).  This will be tracked in 
> https://issues.apache.org/jira/browse/MESOS-4047
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 40849: Fix flaky MemoryPressureMesosTests

2015-12-03 Thread Joseph Wu


> On Dec. 3, 2015, 6:24 a.m., Bernd Mathiske wrote:
> > src/tests/containerizer/memory_pressure_tests.cpp, line 140
> > 
> >
> > Consider moving this comment to line 143, before "break;". It makes 
> > more sense to me to talk about what you are not doing once you have 
> > actually reached the statement where you might do something.

Done!  
Also improved the wording a bit :)


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40849/#review108829
---


On Dec. 3, 2015, 11:01 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40849/
> ---
> 
> (Updated Dec. 3, 2015, 11:01 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Greg Mann, Artem Harutyunyan, Jan 
> Schlicht, and Till Toenshoff.
> 
> 
> Bugs: MESOS-3586
> https://issues.apache.org/jira/browse/MESOS-3586
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The existing tests will check that "low" pressure events occur at least as 
> often as "medium" pressure events (this is the documented behavior).  
> However, the order of events and the order in which we process said events is 
> not guaranteed.  When we collect the pressure events via a counter, there may 
> be some events that are in-flight, and thereby not accounted for in the 
> counters.
> 
> This patch modifies MemoryPressureMesosTests to wait for memory pressure 
> events to stop before checking the counts for correctness.
> The tests now stop the memory-pressure-triggering task and then wait for all 
> events to be processed before checking the counters.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/memory_pressure_tests.cpp 
> e18b971c4df26c9b9c103ca73bdad4fd400d6c02 
> 
> Diff: https://reviews.apache.org/r/40849/diff/
> 
> 
> Testing
> ---
> 
> On Debian 8, Ubuntu 14, Centos 7 (Thanks Jan!):
> `make check`
> `sudo bin/mesos-tests.sh --gtest_filter="*MemoryPressureMesosTest*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> ^ Ran the above for a couple minutes or ~100 times.  It was previously 
> failing ~1/5 times.
> 
> ---
> Note on Centos 6 (Thanks Greg!):
> ```
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> ```
> 
> ---
> Note on Ubuntu 14:
> There is some other flakiness (in Agent recovery).  This will be tracked in 
> https://issues.apache.org/jira/browse/MESOS-4047
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 40849: Fix flaky MemoryPressureMesosTests

2015-12-03 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40849/
---

(Updated Dec. 3, 2015, 11:01 a.m.)


Review request for mesos, Bernd Mathiske, Greg Mann, Artem Harutyunyan, Jan 
Schlicht, and Till Toenshoff.


Changes
---

Moved and reworded a comment.


Bugs: MESOS-3586
https://issues.apache.org/jira/browse/MESOS-3586


Repository: mesos


Description
---

The existing tests will check that "low" pressure events occur at least as 
often as "medium" pressure events (this is the documented behavior).  However, 
the order of events and the order in which we process said events is not 
guaranteed.  When we collect the pressure events via a counter, there may be 
some events that are in-flight, and thereby not accounted for in the counters.

This patch modifies MemoryPressureMesosTests to wait for memory pressure events 
to stop before checking the counts for correctness.
The tests now stop the memory-pressure-triggering task and then wait for all 
events to be processed before checking the counters.


Diffs (updated)
-

  src/tests/containerizer/memory_pressure_tests.cpp 
e18b971c4df26c9b9c103ca73bdad4fd400d6c02 

Diff: https://reviews.apache.org/r/40849/diff/


Testing
---

On Debian 8, Ubuntu 14, Centos 7 (Thanks Jan!):
`make check`
`sudo bin/mesos-tests.sh --gtest_filter="*MemoryPressureMesosTest*" 
--gtest_repeat=-1 --gtest_break_on_failure`

^ Ran the above for a couple minutes or ~100 times.  It was previously failing 
~1/5 times.

---
Note on Centos 6 (Thanks Greg!):
```
[ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
../../src/tests/mesos.cpp:849: Failure
Value of: _baseHierarchy.get()
  Actual: "/cgroup"
Expected: baseHierarchy
Which is: "/tmp/mesos_test_cgroup"
-
Multiple cgroups base hierarchies detected:
  '/tmp/mesos_test_cgroup'
  '/cgroup'
Mesos does not support multiple cgroups base hierarchies.
Please unmount the corresponding (or all) subsystems.
-
../../src/tests/mesos.cpp:932: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
```

---
Note on Ubuntu 14:
There is some other flakiness (in Agent recovery).  This will be tracked in 
https://issues.apache.org/jira/browse/MESOS-4047


Thanks,

Joseph Wu



Re: Review Request 40849: Fix flaky MemoryPressureMesosTests

2015-12-02 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40849/
---

(Updated Dec. 2, 2015, 11:25 a.m.)


Review request for mesos, Bernd Mathiske, Greg Mann, Artem Harutyunyan, Jan 
Schlicht, and Till Toenshoff.


Changes
---

Fixed typo in comment.  Tested on a few more platforms.


Summary (updated)
-

Fix flaky MemoryPressureMesosTests


Bugs: MESOS-3586
https://issues.apache.org/jira/browse/MESOS-3586


Repository: mesos


Description
---

The existing tests will check that "low" pressure events occur at least as 
often as "medium" pressure events (this is the documented behavior).  However, 
the order of events and the order in which we process said events is not 
guaranteed.  When we collect the pressure events via a counter, there may be 
some events that are in-flight, and thereby not accounted for in the counters.

This patch modifies MemoryPressureMesosTests to wait for memory pressure events 
to stop before checking the counts for correctness.
The tests now stop the memory-pressure-triggering task and then wait for all 
events to be processed before checking the counters.


Diffs (updated)
-

  src/tests/containerizer/memory_pressure_tests.cpp 
e18b971c4df26c9b9c103ca73bdad4fd400d6c02 

Diff: https://reviews.apache.org/r/40849/diff/


Testing (updated)
---

On Debian 8, Ubuntu 14, Centos 7 (Thanks Jan!):
`make check`
`sudo bin/mesos-tests.sh --gtest_filter="*MemoryPressureMesosTest*" 
--gtest_repeat=-1 --gtest_break_on_failure`

^ Ran the above for a couple minutes or ~100 times.  It was previously failing 
~1/5 times.

---
Note on Centos 6 (Thanks Greg!):
```
[ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
../../src/tests/mesos.cpp:849: Failure
Value of: _baseHierarchy.get()
  Actual: "/cgroup"
Expected: baseHierarchy
Which is: "/tmp/mesos_test_cgroup"
-
Multiple cgroups base hierarchies detected:
  '/tmp/mesos_test_cgroup'
  '/cgroup'
Mesos does not support multiple cgroups base hierarchies.
Please unmount the corresponding (or all) subsystems.
-
../../src/tests/mesos.cpp:932: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
```

---
Note on Ubuntu 14:
There is some other flakiness (in Agent recovery).  This will be tracked in 
https://issues.apache.org/jira/browse/MESOS-4047


Thanks,

Joseph Wu