-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60748/#review180698
-----------------------------------------------------------

I don't have the bandwidth to review this. Could you remove me from the reviewers? I do support the idea, though; it is probably far better than the `/proc` approach we have today.


- Zameer Manji


On July 11, 2017, 1:12 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60748/
> -----------------------------------------------------------
>
> (Updated July 11, 2017, 1:12 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham,
> Stephan Erb, and Zameer Manji.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> # Prototype using cgroups for monitoring Thermos Process resource consumption (CPU and memory)
> The idea behind this prototype is to use kernel cgroups instead of per-pid
> monitoring of Thermos Tasks and Processes. This
> [document](https://docs.google.com/document/d/1i5GY8cK_KZ_ebG8V2FLXeu0waRqzSHz82bAHCud_yoQ/edit?usp=sharing)
> describes the problem this prototype tries to solve in more detail.
>
> __Note:__ Since I am piggybacking on the cgroup clean-up implemented in
> Mesos, if Mesos's memory and CPU isolation are not enabled, I do not create
> cgroups and simply fall back to the old monitoring scheme.
>
> __Important Compatibility:__ I also noticed that this kind of memory
> monitoring only works when the `memory.use_hierarchy` flag is enabled.
> At least in my Vagrant environment this does not seem to be the case, so
> some support on the Mesos side is needed first.
>
>
> # Notes on Performance:
>
> I used `top -p <thermos-pid> -bc -n 10 | grep 'python'` to monitor the CPU
> usage of the Thermos observer in my Vagrant VM. I had 7 Tasks, each with
> 3 Processes.
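The description proposes reading a task's CPU and memory usage from its cgroup rather than polling each pid under `/proc`, and falling back to the old scheme when Mesos isolation is off or `memory.use_hierarchy` is disabled. A minimal sketch of that flow follows (Python 3), assuming cgroup v1 controllers mounted under `/sys/fs/cgroup/<controller>` and Mesos's default cgroups root `mesos`; all function names are hypothetical and are not taken from the patch's `cgroup.py` or `process_collector_cgroup.py`.

```python
import os
import time

# Assumptions (not taken from the patch): cgroup v1 controllers are mounted
# under CGROUP_MOUNT and Mesos uses its default cgroups root, "mesos".
CGROUP_MOUNT = "/sys/fs/cgroup"
MESOS_CGROUPS_ROOT = "/mesos/"


def parse_proc_cgroup(text):
    """Map each controller in /proc/<pid>/cgroup to the process's cgroup path.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    """
    paths = {}
    for line in text.splitlines():
        if line.strip():
            _, controllers, path = line.split(":", 2)
            for controller in filter(None, controllers.split(",")):
                paths[controller] = path
    return paths


def read_flat_int(cgroup_dir, filename):
    """Read a single integer from a cgroup v1 control file."""
    with open(os.path.join(cgroup_dir, filename)) as f:
        return int(f.read().strip())


def cgroup_dirs(proc_cgroup_text, mount=CGROUP_MOUNT):
    """Return (cpuacct_dir, memory_dir), or None to fall back to /proc polling.

    Falls back when Mesos isolation did not place the task in cpuacct and
    memory cgroups, or when memory.use_hierarchy is off (usage of child
    cgroups would then not be charged to the task's cgroup).
    """
    paths = parse_proc_cgroup(proc_cgroup_text)
    if not all(paths.get(c, "").startswith(MESOS_CGROUPS_ROOT)
               for c in ("cpuacct", "memory")):
        return None
    cpuacct_dir = os.path.join(mount, "cpuacct", paths["cpuacct"].lstrip("/"))
    memory_dir = os.path.join(mount, "memory", paths["memory"].lstrip("/"))
    try:
        if read_flat_int(memory_dir, "memory.use_hierarchy") != 1:
            return None
    except (OSError, ValueError):
        return None
    return cpuacct_dir, memory_dir


def sample(cpuacct_dir, memory_dir, interval=1.0):
    """Return (cpu_percent, memory_bytes) for the cgroup over `interval` seconds.

    cpuacct.usage is cumulative CPU time in nanoseconds for all tasks in the
    cgroup and its descendants; 100.0 in the result means one full core.
    memory.usage_in_bytes is the memory currently charged, including page cache.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    cpu_start = read_flat_int(cpuacct_dir, "cpuacct.usage")
    wall_start = time.monotonic()
    time.sleep(interval)
    cpu_end = read_flat_int(cpuacct_dir, "cpuacct.usage")
    wall = time.monotonic() - wall_start
    memory = read_flat_int(memory_dir, "memory.usage_in_bytes")
    return 100.0 * (cpu_end - cpu_start) / (wall * 1e9), memory
```

A caller would pass the contents of `/proc/<pid>/cgroup` for the task's root process to `cgroup_dirs` and, if it returns `None`, keep using the existing per-pid collector.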
>
> Stock Thermos Observer
> ```
> 21641 root 20 0 1351200 44448 4088 S 6.6 1.4 0:35.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 2.7 1.4 0:35.77 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 3.3 1.4 0:35.87 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 2.3 1.4 0:35.94 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 4.3 1.4 0:36.07 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 3.6 1.4 0:36.18 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351204 44616 4088 S 11.6 1.4 0:36.53 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 39.6 1.4 0:37.72 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 2.7 1.4 0:37.80 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 7.6 1.4 0:38.03 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> ```
>
> Thermos Observer using CGROUP monitoring
> ```
> 15203 root 20 0 1367828 45344 4088 S 6.6 1.5 0:55.37 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1367828 45344 4088 S 2.0 1.5 0:55.43 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 4.3 1.5 0:55.56 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.63 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.0 1.5 0:55.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:55.79 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.86 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 1.0 1.5 0:55.89 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.96 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:56.06 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> ```
>
>
> Diffs
> -----
>
> examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
> src/main/python/apache/aurora/executor/thermos_task_runner.py 8f88af4c24ddc603fa12587741af56a6c711e420
> src/main/python/apache/thermos/core/cgroup.py PRE-CREATION
> src/main/python/apache/thermos/core/process.py 4a4678ff39c84cb87836aca19365c5b2aabc4fa4
> src/main/python/apache/thermos/monitoring/process_collector_cgroup.py PRE-CREATION
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/main/python/apache/thermos/observer/http/templates/task.tpl f3e06985eb3c05572aa4389d97da575b1179f616
>
> Diff: https://reviews.apache.org/r/60748/diff/3/
>
>
> Testing
> -------
>
> This patch is mostly a prototype. Note that I had to enable Mesos's CPU and
> memory isolation.
>
> The current tests pass. I first want to see how the community feels about
> this approach in general, and then I will add more tests.
>
>
> Thanks,
>
> Reza Motamedi
>
>
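To compare the two `top` runs in the performance notes, the `%CPU` and `TIME+` columns can be summarized with a short script. The values are copied from the quoted output; the helper names are hypothetical and not part of the patch.

```python
import statistics

# %CPU column of the ten `top -b` samples quoted above, in order.
STOCK_CPU = [6.6, 2.7, 3.3, 2.3, 4.3, 3.6, 11.6, 39.6, 2.7, 7.6]
CGROUP_CPU = [6.6, 2.0, 4.3, 2.3, 2.0, 3.3, 2.3, 1.0, 2.3, 3.3]

# TIME+ column of the first and last sample of each run.
STOCK_TIME = ("0:35.69", "0:38.03")
CGROUP_TIME = ("0:55.37", "0:56.06")


def time_plus_seconds(value):
    """Convert top's TIME+ field (minutes:seconds.hundredths) to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def summarize(cpu_samples, time_range):
    """Return (mean %CPU, median %CPU, CPU seconds used between first and last sample)."""
    first, last = (time_plus_seconds(t) for t in time_range)
    return statistics.mean(cpu_samples), statistics.median(cpu_samples), last - first


for label, cpu, times in (("stock", STOCK_CPU, STOCK_TIME),
                          ("cgroup", CGROUP_CPU, CGROUP_TIME)):
    mean, median, cpu_seconds = summarize(cpu, times)
    print("%-6s mean=%.2f%% median=%.2f%% cpu_time=%.2fs"
          % (label, mean, median, cpu_seconds))
```

For these samples it reports a mean of 8.43% (median 3.95%) and 2.34 s of CPU time for the stock observer, versus a mean of 2.94% (median 2.30%) and 0.69 s with cgroup monitoring. The stock mean is dominated by a single 39.6% sample, and the two runs also differ in `--log_to_disk` (`NONE` vs `DEBUG`), so ten samples per run are indicative at best.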
