-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60748/
-----------------------------------------------------------

(Updated July 10, 2017, 10:03 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
-------

# Prototype using cgroups for monitoring Thermos Process resource consumption (CPU and memory)
The idea behind this prototype is to track the resource consumption of Thermos Tasks and Processes with kernel cgroups instead of per-PID monitoring.
This [document](https://docs.google.com/a/twitter.com/document/d/16JFIqY2ftvNNXxYf6jQwO6EXPajCKp7kPJRAQSsaPko/edit?usp=sharing) describes the problem this prototype tries to solve in more detail.
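For illustration, here is a minimal sketch of what reading consumption from a cgroup (rather than walking every PID) could look like. It assumes cgroup v1 with the `cpuacct` and `memory` controllers. `cpuacct.usage` (cumulative CPU time of every task in the cgroup, in nanoseconds) and `memory.usage_in_bytes` are standard cgroup v1 control files, but the helper names and the directories passed in are hypothetical and are not taken from the patch's `cgroup.py` or `process_collector_cgroup.py`.

```python
import os


def read_cgroup_int(cgroup_dir, filename):
  """Read a single integer from a cgroup v1 control file such as cpuacct.usage."""
  with open(os.path.join(cgroup_dir, filename)) as f:
    return int(f.read().strip())


def sample_cgroup(cpuacct_dir, memory_dir):
  """Return (cumulative CPU nanoseconds, current memory bytes) for one cgroup.

  Both values cover every process in the cgroup, so no per-PID walk is needed.
  """
  cpu_ns = read_cgroup_int(cpuacct_dir, 'cpuacct.usage')
  mem_bytes = read_cgroup_int(memory_dir, 'memory.usage_in_bytes')
  return cpu_ns, mem_bytes


def cpu_cores_used(cpu_ns_start, cpu_ns_end, wall_seconds):
  """Average number of cores used between two cpuacct.usage samples."""
  if wall_seconds <= 0:
    raise ValueError('wall_seconds must be positive')
  return (cpu_ns_end - cpu_ns_start) / (wall_seconds * 1e9)
```

Sampling `cpuacct.usage` twice, a few seconds apart, and passing both readings to `cpu_cores_used` gives the average number of cores the whole cgroup used over that interval, however many processes it contains.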

__Note:__ Since I am piggybacking on the cgroup clean-up implemented in Mesos, I only create cgroups when Mesos's memory and CPU isolation are enabled; otherwise the observer falls back to the old per-PID monitoring scheme.
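A hedged sketch of that fallback decision: it assumes Mesos's default cgroup v1 layout (`--cgroups_hierarchy=/sys/fs/cgroup`, `--cgroups_root=mesos`) and treats the presence of the container's Mesos-managed cgroups as the signal that isolation is enabled. `container_cgroup_dirs` and `choose_collector` are hypothetical names, not functions from the patch.

```python
import os


def container_cgroup_dirs(container_id, root='/sys/fs/cgroup', prefix='mesos'):
  """Hypothetical: where Mesos's cgroups isolators would place this container.

  Assumes the cgroup v1 layout <root>/<subsystem>/<prefix>/<container_id>.
  """
  return (os.path.join(root, 'cpuacct', prefix, container_id),
          os.path.join(root, 'memory', prefix, container_id))


def choose_collector(container_id, root='/sys/fs/cgroup'):
  """Return 'cgroup' when both Mesos-managed cgroups exist, else 'per-pid'.

  Without Mesos CPU/memory isolation there is no Mesos-managed parent cgroup,
  so nothing would clean up child cgroups; keep the old per-PID scheme.
  """
  cpuacct_dir, memory_dir = container_cgroup_dirs(container_id, root)
  if os.path.isdir(cpuacct_dir) and os.path.isdir(memory_dir):
    return 'cgroup'
  return 'per-pid'
```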

# Notes on Performance:

I used `top -p <thermos-pid> -bc -n 10 | grep 'python'` to monitor the CPU usage of the Thermos observer on my Vagrant box, running 7 Tasks with 3 Processes each.
> Stock Thermos Observer
```
21641 root      20   0 1351200  44448   4088 S   6.6  1.4   0:35.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44448   4088 S   2.7  1.4   0:35.77 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44448   4088 S   3.3  1.4   0:35.87 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44448   4088 S   2.3  1.4   0:35.94 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44448   4088 S   4.3  1.4   0:36.07 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44448   4088 S   3.6  1.4   0:36.18 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351204  44616   4088 S  11.6  1.4   0:36.53 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44552   4088 S  39.6  1.4   0:37.72 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44552   4088 S   2.7  1.4   0:37.80 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root      20   0 1351200  44552   4088 S   7.6  1.4   0:38.03 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
```
> Thermos Observer using cgroup monitoring
```
15203 root      20   0 1367828  45344   4088 S   6.6  1.5   0:55.37 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1367828  45344   4088 S   2.0  1.5   0:55.43 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   4.3  1.5   0:55.56 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   2.3  1.5   0:55.63 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   2.0  1.5   0:55.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   3.3  1.5   0:55.79 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   2.3  1.5   0:55.86 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   1.0  1.5   0:55.89 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   2.3  1.5   0:55.96 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root      20   0 1351436  45308   4088 S   3.3  1.5   0:56.06 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
```
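To compare the two runs numerically, a small helper like the following could reduce the raw `top -b` rows to a mean %CPU and the CPU time consumed over the sampling window. It is a hypothetical helper, not part of the patch, and it assumes top's default column order (`PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND`).

```python
def summarize_top(lines):
  """Summarize `top -b` rows for one PID: (mean %CPU, CPU seconds consumed).

  Assumes the default column layout, so %CPU is field 8 and TIME+ is field 10.
  """
  cpu = []
  times = []
  for line in lines:
    fields = line.split()
    if len(fields) < 12 or not fields[0].isdigit():
      continue  # skip header and blank lines
    cpu.append(float(fields[8]))
    minutes, seconds = fields[10].split(':')  # TIME+ is minutes:seconds.hundredths
    times.append(int(minutes) * 60 + float(seconds))
  if not cpu:
    raise ValueError('no top rows found')
  return sum(cpu) / len(cpu), times[-1] - times[0]
```

Applied to the ten samples above, this gives a mean of about 8.4% CPU and 2.34 s of CPU time for the stock observer (including one 39.6% spike), versus about 2.9% and 0.69 s with cgroup monitoring. The two runs also used different `--log_to_disk` settings (`NONE` vs `DEBUG`).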


Diffs (updated)
-----

  examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
  src/main/python/apache/aurora/executor/thermos_task_runner.py 8f88af4c24ddc603fa12587741af56a6c711e420
  src/main/python/apache/thermos/core/cgroup.py PRE-CREATION
  src/main/python/apache/thermos/core/process.py 4a4678ff39c84cb87836aca19365c5b2aabc4fa4
  src/main/python/apache/thermos/monitoring/process_collector_cgroup.py PRE-CREATION
  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
  src/main/python/apache/thermos/observer/http/templates/task.tpl f3e06985eb3c05572aa4389d97da575b1179f616


Diff: https://reviews.apache.org/r/60748/diff/2/

Changes: https://reviews.apache.org/r/60748/diff/1-2/


Testing
-------

This patch is a prototype. Note that I had to enable Mesos's CPU and memory isolation (see the change to `examples/vagrant/mesos_config/etc_mesos-slave/isolation`).
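The Vagrant file above presumably sets the Mesos agent's `--isolation` flag, a comma-separated list of isolators. Its exact contents in the diff are not reproduced here; as a hedged sketch, a check like the one below could validate such a value before relying on cgroup monitoring. `cgroups/cpu` and `cgroups/mem` are Mesos's cgroup-based CPU and memory isolators; the helper name is hypothetical.

```python
# Isolators assumed necessary for the Mesos-managed cgroups this prototype reuses.
REQUIRED_ISOLATORS = ('cgroups/cpu', 'cgroups/mem')


def missing_isolators(isolation_flag):
  """Return the required isolators absent from a Mesos --isolation value."""
  enabled = set(part.strip() for part in isolation_flag.split(',') if part.strip())
  return [name for name in REQUIRED_ISOLATORS if name not in enabled]
```

For example, the POSIX isolators (`posix/cpu,posix/mem`) report both cgroup isolators as missing, in which case the observer would stay on per-PID monitoring.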

The existing tests pass. Before adding more tests, I would like to get the community's general feedback on this approach.


Thanks,

Reza Motamedi
