-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60748/
-----------------------------------------------------------

(Updated July 10, 2017, 6:30 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
-------

# Prototype using cgroups for monitoring Thermos Process resource consumption (CPU and memory)

The idea behind this prototype is to use kernel cgroups instead of per-PID monitoring of Thermos Tasks and Processes. This [document](https://docs.google.com/a/twitter.com/document/d/16JFIqY2ftvNNXxYf6jQwO6EXPajCKp7kPJRAQSsaPko/edit?usp=sharing) describes the problem that this prototype tries to solve in more detail.

__Note:__ Since I am piggybacking on the cgroup clean-up implemented in Mesos, if Mesos's memory and CPU isolation are not enabled, I will not create cgroups and will simply revert to the old monitoring scheme.

# Notes on Performance:

I used `top -p <thermos-pid> -bc -n 10 | grep 'python'` to monitor the CPU usage of the Thermos observer in my Vagrant environment. I had 7 Tasks, each with 3 Processes.

> Stock Thermos Observer

```
21641 root 20 0 1351200 44448 4088 S 6.6 1.4 0:35.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44448 4088 S 2.7 1.4 0:35.77 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44448 4088 S 3.3 1.4 0:35.87 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44448 4088 S 2.3 1.4 0:35.94 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44448 4088 S 4.3 1.4 0:36.07 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44448 4088 S 3.6 1.4 0:36.18 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351204 44616 4088 S 11.6 1.4 0:36.53 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44552 4088 S 39.6 1.4 0:37.72 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44552 4088 S 2.7 1.4 0:37.80 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
21641 root 20 0 1351200 44552 4088 S 7.6 1.4 0:38.03 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
```

> Thermos Observer using CGROUP monitoring

```
15203 root 20 0 1367828 45344 4088 S 6.6 1.5 0:55.37 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1367828 45344 4088 S 2.0 1.5 0:55.43 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 4.3 1.5 0:55.56 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.63 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 2.0 1.5 0:55.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:55.79 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.86 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 1.0 1.5 0:55.89 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.96 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:56.06 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
```


Diffs
-----

  examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
  src/main/python/apache/aurora/executor/thermos_task_runner.py 8f88af4c24ddc603fa12587741af56a6c711e420
  src/main/python/apache/thermos/core/cgroup.py PRE-CREATION
  src/main/python/apache/thermos/core/process.py 4a4678ff39c84cb87836aca19365c5b2aabc4fa4
  src/main/python/apache/thermos/monitoring/process_collector_cgroup.py PRE-CREATION
  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
  src/main/python/apache/thermos/observer/http/templates/task.tpl f3e06985eb3c05572aa4389d97da575b1179f616


Diff: https://reviews.apache.org/r/60748/diff/1/


Testing (updated)
-------

This patch is mostly a prototype. Note that I had to enable Mesos's CPU and memory isolation. The current tests pass. I first want to see how the community feels about this approach in general, and then I will add additional tests.


Thanks,

Reza Motamedi
