-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199815
-----------------------------------------------------------



Master (f32086d) is red with this patch.
  ./build-support/jenkins/build.sh

src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[add_instances] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 46%]
src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[create_job] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 53%]
src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[kill_job] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 60%]
src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[restart] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 66%]
src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[start_cronjob] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 73%]
src/test/python/apache/aurora/client/hooks/test_hooked_api.py::test_api_methods_params[start_job_update] <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_hooked_api.py PASSED [ 80%]
src/test/python/apache/aurora/client/hooks/test_non_hooked_api.py::TestNonHookedAuroraClientAPI::test_kill_job_discards_config <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_non_hooked_api.py PASSED [ 86%]
src/test/python/apache/aurora/client/hooks/test_non_hooked_api.py::TestNonHookedAuroraClientAPI::test_restart_discards_config <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_non_hooked_api.py PASSED [ 93%]
src/test/python/apache/aurora/client/hooks/test_non_hooked_api.py::TestNonHookedAuroraClientAPI::test_start_cronjob_discards_config <- .pants.d/pyprep/sources/c19e1cfebce41d1e9b9c5fa55be409ab288ab83d/apache/aurora/client/hooks/test_non_hooked_api.py PASSED [100%]

generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/.pants.d/test/pytest/src.test.python.apache.aurora.client.hooks.hooks/junitxml/TEST-src.test.python.apache.aurora.client.hooks.hooks.xml

=========== 15 passed in 0.31 seconds ============

                   src.test.python.apache.aurora.admin.admin
                   .....   SUCCESS
                   src.test.python.apache.aurora.client.client                  
                   .....   SUCCESS
                   src.test.python.apache.aurora.client.api.api                 
                   .....   SUCCESS
                   src.test.python.apache.aurora.client.cli.cli                 
                   .....   SUCCESS
                   src.test.python.apache.aurora.client.docker.docker           
                   .....   SUCCESS
                   src.test.python.apache.aurora.client.hooks.hooks             
                   .....   SUCCESS
                   src.test.python.apache.aurora.common.common                  
                   .....   SUCCESS
                   src.test.python.apache.aurora.common.health_check.health_check
                   .....   SUCCESS
                   src.test.python.apache.aurora.config.config                  
                   .....   SUCCESS
                   src.test.python.apache.aurora.executor.executor              
                   .....   FAILURE
                   src.test.python.apache.aurora.executor.bin.bin               
                   .....   SUCCESS
                   src.test.python.apache.aurora.executor.common.common         
                   .....   SUCCESS
                   src.test.python.apache.aurora.tools.tools                    
                   .....   SUCCESS
                   src.test.python.apache.thermos.cli.cli                       
                   .....   SUCCESS
                   src.test.python.apache.thermos.cli.commands.commands         
                   .....   SUCCESS
                   src.test.python.apache.thermos.common.common                 
                   .....   SUCCESS
                   src.test.python.apache.thermos.config.config                 
                   .....   SUCCESS
                   src.test.python.apache.thermos.core.core                     
                   .....   SUCCESS
                   src.test.python.apache.thermos.monitoring.monitoring         
                   .....   SUCCESS
                   src.test.python.apache.thermos.observer.observer             
                   .....   SUCCESS
                   src.test.python.apache.thermos.observer.http.http            
                   .....   SUCCESS
FAILURE


               Waiting for background workers to finish.
22:32:24 06:24   [complete]
               FAILURE


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On March 22, 2018, 9:52 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> -----------------------------------------------------------
> 
> (Updated March 22, 2018, 9:52 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> When disk isolation is enabled, the Mesos agent calculates the disk usage of
> each container. The Thermos Observer also monitors disk usage, using
> `twitter.common.dirutil`, which essentially repeats work the agent already
> does. In practice, we see that disk monitoring is one of the most expensive
> resource monitoring tasks. For instance, when there are deeply nested
> directories, the CPU utilization of the Observer process can easily reach
> 1.5 CPUs. Ideally, we would delegate disk monitoring to the agent so that the
> work is done only once. With this approach, whenever disk collection improves
> in the agent (for instance, by implementing XFS isolation), we benefit from it
> without any code change. More information about the problem is available in
> AURORA-1918.
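For illustration, the kind of recursive walk that makes `du`-style collection expensive can be sketched as follows. `du_bytes` is a hypothetical helper, not the `twitter.common.dirutil` code the Observer actually runs, and it sums apparent file sizes (`st_size`) rather than allocated blocks. The point it shows is that the cost grows with the number of directory entries, not with the bytes used: each directory needs a listing and each file an `lstat()` call, however small the file is.

```python
import os


def du_bytes(root):
    """Hypothetical du-style helper: sum apparent file sizes under `root`.

    Each directory visited by os.walk() costs a listing, and each file costs
    one lstat() call, so deeply nested trees with many small files are
    expensive to scan even when their total size is modest.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() so a symlink is measured as the link itself,
                # not as the file it points to.
                total += os.lstat(path).st_size
            except OSError:
                # Files can disappear while a sandbox is being walked.
                pass
    return total
```

Running this periodically for every task sandbox on a host is the repeated work that the patch proposes to hand off to the agent.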
> 
> This patch introduces `MesosDiskCollector`, which queries the agent's API
> endpoint to look up disk_used_bytes. Note that the Thermos executor also
> performs resource monitoring. For now, I left the disk collector there using
> the `du` implementation; that can be changed in a later patch.
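A minimal sketch of the agent-side lookup, assuming the agent serves its `/containers` endpoint on the default port 5051 and that an entry's `executor_id` identifies the task being monitored. `fetch_containers` and `disk_used_bytes` are hypothetical names, not the `MesosDiskCollector` API from this patch, and the sketch uses Python 3's `urllib.request`.

```python
import json
from urllib.request import urlopen

# Assumption: agent on localhost with the default Mesos agent port.
DEFAULT_CONTAINERS_URL = "http://localhost:5051/containers"


def fetch_containers(url=DEFAULT_CONTAINERS_URL, timeout=5.0):
    """Return the decoded JSON list served by the agent's /containers endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def disk_used_bytes(containers, executor_id):
    """Return disk_used_bytes reported for `executor_id`, or None if absent.

    The statistic is only populated when a disk isolator is enabled on the
    agent, so callers must handle a missing value.
    """
    for container in containers:
        if container.get("executor_id") == executor_id:
            return container.get("statistics", {}).get("disk_used_bytes")
    return None
```

Keeping the parsing separate from the HTTP call makes it easy to unit test against a canned payload, and a single request returns statistics for every container on the agent, instead of one filesystem walk per sandbox.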
> 
> I modified some Vagrant config files, including `aurora-executor.service` and
> `etc_mesos-slave/isolation`, for testing. They can be left as is; I included
> them in this patch to show how this works end to end.
> 
> 
> Diffs
> -----
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252
>   docs/reference/observer-configuration.md 8a443c94f7f37f9454989781f722101a97c99f15
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
>   examples/vagrant/systemd/thermos.service 01925bcd2ae44f100df511f3c3951c3f5a1a72aa
>   src/main/python/apache/aurora/tools/thermos_observer.py dd9f0c46ceac9e939b1b763073314161de0ea614
>   src/main/python/apache/thermos/monitoring/BUILD 65ba7088f65e7baa5d30744736ba456b46a55e86
>   src/main/python/apache/thermos/monitoring/disk.py 986d33a5000f8d5db15cb639c81f8b1d756ffa05
>   src/main/python/apache/thermos/monitoring/resource.py adcdc751c03460dc801a18278faa96d6bd64722b
>   src/main/python/apache/thermos/observer/task_observer.py a6870d48bddf2a2ccede7bb68195f2baae1d0e47
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py fe74bd1d36666ecd89fca1b5b2251202cbbc0f24
>   src/test/python/apache/thermos/monitoring/BUILD 8f2b39336dce6c7b580e6ba0009f60afdcb89179
>   src/test/python/apache/thermos/monitoring/test_disk.py 362393bfd1facf3198e2d438d0596b16700b72b8
>   src/test/python/apache/thermos/monitoring/test_resource.py e577e552d4ee1807096a15401851bb9fd95fa426
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/7/
> 
> 
> Testing
> -------
> 
> - I added unit tests.
> - I tested in Vagrant, and it works as intended.
> - I also built and deployed it in our test environment. To measure the
> performance improvement, I created jobs with nested folders and observed a
> reduction of at least 60% in the CPU utilization of the Observer process
> (from 1.5 CPU cores to 0.4 CPU cores).
> 
> Here is one specific test setup: on two hosts, I created two tasks. Each task
> creates identical nested directory structures and files in them. The overall
> size is 30GB. test_host_1 runs the current version of the Observer, and
> test_host_2 runs the Observer with this patch and also has
> mesos_disk_collection enabled. The results are as follows:
> 
> ```
> rezam[7]TEST_HOST_1 ~ $ while true; do echo `date`; curl localhost:1338/vars -s | grep cpu; sleep 10; done
> Thu Mar 22 04:36:17 UTC 2018
> observer.observer_cpu 108.9
> Thu Mar 22 04:36:27 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:38 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:48 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:58 UTC 2018
> observer.observer_cpu 111.0
> Thu Mar 22 04:37:08 UTC 2018
> observer.observer_cpu 111.0
> Thu Mar 22 04:37:18 UTC 2018
> observer.observer_cpu 111.0
> 
> 
> rezam[7]TEST_HOST_2 ~ $ while true; do echo `date`; curl localhost:1338/vars -s | grep cpu; sleep 10; done
> Thu Mar 22 04:36:20 UTC 2018
> observer.observer_cpu 1.3
> Thu Mar 22 04:36:30 UTC 2018
> observer.observer_cpu 1.3
> Thu Mar 22 04:36:40 UTC 2018
> observer.observer_cpu 1.3
> Thu Mar 22 04:36:50 UTC 2018
> observer.observer_cpu 1.2
> Thu Mar 22 04:37:00 UTC 2018
> observer.observer_cpu 1.2
> Thu Mar 22 04:37:10 UTC 2018
> observer.observer_cpu 1.2
> Thu Mar 22 04:37:20 UTC 2018
> observer.observer_cpu 1.8
> ```
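To compare the two runs above, a small helper can pull the `observer.observer_cpu` values out of the polling loop's output and average them. `observer_cpu_samples` and `mean` are hypothetical helpers; the values are averaged exactly as `/vars` prints them, without assuming a unit.

```python
def observer_cpu_samples(log_text):
    """Extract numeric values from 'observer.observer_cpu <value>' lines.

    Other lines, such as the timestamps printed by `date`, are skipped.
    """
    samples = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "observer.observer_cpu":
            samples.append(float(parts[1]))
    return samples


def mean(values):
    """Arithmetic mean; refuses an empty sample instead of dividing by zero."""
    if not values:
        raise ValueError("no samples")
    return sum(values) / len(values)


if __name__ == "__main__":
    host_1 = "Thu Mar 22 04:36:17 UTC 2018\nobserver.observer_cpu 108.9\n"
    print(mean(observer_cpu_samples(host_1)))
```

Fed the seven samples from each host above, this gives means of roughly 115.9 for TEST_HOST_1 and 1.3 for TEST_HOST_2.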
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>
