Re: Review Request 66269: End to end tests misc. fixes

2018-03-25 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66269/#review199938
---


Ship it!




Master (03eb337) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing
"@ReviewBot retry"

- Aurora ReviewBot


On March 25, 2018, 7:59 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66269/
> ---
> 
> (Updated March 25, 2018, 7:59 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Stephan Erb.
> 
> 
> Bugs: AURORA-1974
> https://issues.apache.org/jira/browse/AURORA-1974
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Excluding kerberos unit file from being copied on provision as it's later 
> copied and deleted by the end to end test.
> 
> Bypass leader redirect changed from upstart to systemd. This test wasn't 
> being run because the kerberos test was failing.
> 
> Fixing kerberos end to end test. Previous version had its signing key 
> revoked, resulting in the test failing.
> 
> Changing docker image to slim-stretch in docker aurora tests to address 
> AURORA-1974.
> 
> Added daemon-reload to aurorabuild whenever the daemons are restarted.
> 
> 
> Diffs
> -
> 
>   examples/jobs/hello_docker_engine.aurora 
> 99d99a26844f2f2f473626b16cfbf91aa70031ff 
>   examples/jobs/hello_docker_image.aurora 
> 049a147749876f795636827ea5e5485fa72a0930 
>   examples/vagrant/aurorabuild.sh c39388f46ea4718117889a5c67aec9afcc7f5d2e 
>   examples/vagrant/provision-dev-cluster.sh 
> fe3281f6b1f6adee021e534b230221efb86a5d3c 
>   examples/vagrant/systemd/aurora-scheduler-kerberos.service 
> 10e4f2c355c10b8204518af0a49c9d90be4f6ef8 
>   src/test/sh/org/apache/aurora/e2e/test_bypass_leader_redirect_end_to_end.sh 
> 5c0f12b56a30eef35c1903d5f4a96591d3c74471 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> 646c213ea105e32f2d37df29832aa1009481b6d1 
> 
> 
> Diff: https://reviews.apache.org/r/66269/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Review Request 66269: End to end tests misc. fixes

2018-03-25 Thread Renan DelValle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66269/
---

Review request for Aurora, Jordan Ly and Stephan Erb.


Bugs: AURORA-1974
https://issues.apache.org/jira/browse/AURORA-1974


Repository: aurora


Description
---

Excluding kerberos unit file from being copied on provision as it's later 
copied and deleted by the end to end test.

Bypass leader redirect changed from upstart to systemd. This test wasn't being 
run because the kerberos test was failing.

Fixing kerberos end to end test. Previous version had its signing key revoked, 
resulting in the test failing.

Changing docker image to slim-stretch in docker aurora tests to address 
AURORA-1974.

Added daemon-reload to aurorabuild whenever the daemons are restarted.


Diffs
-

  examples/jobs/hello_docker_engine.aurora 
99d99a26844f2f2f473626b16cfbf91aa70031ff 
  examples/jobs/hello_docker_image.aurora 
049a147749876f795636827ea5e5485fa72a0930 
  examples/vagrant/aurorabuild.sh c39388f46ea4718117889a5c67aec9afcc7f5d2e 
  examples/vagrant/provision-dev-cluster.sh 
fe3281f6b1f6adee021e534b230221efb86a5d3c 
  examples/vagrant/systemd/aurora-scheduler-kerberos.service 
10e4f2c355c10b8204518af0a49c9d90be4f6ef8 
  src/test/sh/org/apache/aurora/e2e/test_bypass_leader_redirect_end_to_end.sh 
5c0f12b56a30eef35c1903d5f4a96591d3c74471 
  src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
646c213ea105e32f2d37df29832aa1009481b6d1 


Diff: https://reviews.apache.org/r/66269/diff/1/


Testing
---

./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Renan DelValle



Re: Review Request 66103: Introduce mesos disk collector

2018-03-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199934
---


Ship it!




Ship It!

- Stephan Erb


On March 23, 2018, 6:39 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 23, 2018, 6:39 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal to 
> delegate the disk monitoring task to the agent so that it is done only once. 
> With this approach, when disk collection improves in the agent (for instance 
> by implementing XFS isolation), we benefit from it without any code change. 
> More information about the problem is provided in AURORA-1918.
> 
> This patch introduces `MesosDiskCollector`, which queries the agent's API 
> endpoint to look up disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
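
A minimal sketch of the lookup described above, not the code from this patch. It assumes the agent returns a JSON list of containers, each with a `container_id` and a `statistics` object that holds `disk_used_bytes`; that layout, the sample payload, and the helper name `disk_used_bytes` are hypothetical and should be checked against the agent's actual response. Fetching the payload over HTTP is left out so the example stays self-contained.

```python
import json

# Hypothetical sample of an agent response. The field layout is an
# assumption for illustration and is not taken from the patch.
SAMPLE_RESPONSE = json.dumps([
    {
        "container_id": "container-a",
        "executor_id": "thermos-task-1",
        "statistics": {"disk_used_bytes": 1048576, "disk_limit_bytes": 8388608},
    },
    {
        "container_id": "container-b",
        "executor_id": "thermos-task-2",
        "statistics": {"mem_rss_bytes": 2048},  # no disk statistics reported
    },
])


def disk_used_bytes(response_body, container_id, default=0):
    """Return disk_used_bytes for container_id, or default if unavailable."""
    for container in json.loads(response_body):
        if container.get("container_id") == container_id:
            # Fall back to default when the agent reports no disk statistics.
            return container.get("statistics", {}).get("disk_used_bytes", default)
    return default


if __name__ == "__main__":
    print(disk_used_bytes(SAMPLE_RESPONSE, "container-a"))
    print(disk_used_bytes(SAMPLE_RESPONSE, "container-b"))
```

Returning a default instead of raising keeps a monitoring loop running when a container is missing or disk isolation is not enabled; whether the actual collector behaves this way is not stated in the description.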
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py 
> fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/10/
> 
> 
> Testing
> ---
> 
> - I added unit tests.
> - Tested in vagrant and it works as intended.
> - I also built and deployed in our test environment. To measure the 
> improved performance I created jobs with nested folders and noticed a 
> reduction in CPU utilization of the Observer process of at least 60% (from 
> 1.5 CPU cores to 0.4 CPU cores).
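
As a quick arithmetic check of the figures quoted above (a sketch only; the helper name `relative_reduction` is hypothetical): dropping from 1.5 to 0.4 CPU cores is a reduction of roughly 73%, which is consistent with the "at least 60%" claim.

```python
def relative_reduction(before, after):
    """Fractional drop from before to after (0.25 means a 25% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # 1.5 and 0.4 CPU cores are the figures quoted in the testing notes.
    print(round(relative_reduction(1.5, 0.4) * 100, 1))
```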
> 
> Here is one specific test setup: on two hosts I created two tasks. Each 
> task creates identical nested directory structures and files in them. The 
> overall size is 30GB. test_host_1 runs the current version of the Observer, 
> and test_host_2 runs the Observer with this patch and has 
> mesos_disk_collection enabled. The results are as follows:
> 
> ```
> rezam[7]TEST_HOST_1 ~ $ while true; do echo `date`; curl localhost:1338/vars 
> -s | grep cpu; sleep 10; done
> Thu Mar 22 04:36:17 UTC 2018
> observer.observer_cpu 108.9
> Thu Mar 22 04:36:27 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:38 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:48 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:58 UTC 2018
> observer.observer_cpu 111.0
> Thu Mar 22 04:37:08 UTC 2018
> observer.observer_cpu 111.0
> Thu Mar 22 04:37:18 UTC 2018
> observer.observer_cpu 111.0
> 
> 
> rezam[7]TEST_HOST_2 ~ $ while true; do echo `date`; curl