-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61016/
-----------------------------------------------------------
Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.

Repository: aurora

Description
-------

# lock psutil's oneshot

TLDR; psutil's `oneshot` is not threadsafe.

After a lot of testing on busy machines, I realized that psutil's `oneshot` is not threadsafe. I contacted the developer; however, I have not received a concrete fix. Please read https://issues.apache.org/jira/browse/AURORA-1939 and https://github.com/giampaolo/psutil/issues/1110 for more information.

Diffs
-----

  src/main/python/apache/thermos/monitoring/process_collector_psutil.py 3594955c68b45ab65c01426ba0a18ec8a132a27f

Diff: https://reviews.apache.org/r/61016/diff/1/

Testing
-------

The following test was done by adding extra logging to the current code:

```
...
  cpu_times = process.cpu_times()
+ log.debug("process:{} cpu times {}".format(process, cpu_times))
  user, system = cpu_times.user, cpu_times.system
  memory_info = p
...
```

```
$ grep '36350' thermos-observer.atla-btm-09-sr1.prod.twttr.net.root.log.DEBUG.20170721-163950.9421
D0721 16:55:28.242974 9421 process_collector_psutil.py:40] process:psutil.Process(pid=36350, name='mesos-slave') cpu times pcputimes(user=2500.95, system=4487.06, children_user=0.0, children_system=0.0)
D0721 17:11:21.940462 9421 process_collector_psutil.py:40] process:psutil.Process(pid=36350, name='bash') cpu times pcputimes(user=0.0, system=0.03, children_user=0.0, children_system=0.0)
D0721 17:11:22.247414 9421 process_collector_psutil.py:111] Calculated rate for pid=34339 and children: -7.32560348996 (old: 6988.040000, new: 0.060000)
{34339: 1498166704.32, 36350: 1498166720.51} -> {34339: 1498166704.32, 36350: 1498166720.51}
[{34339: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=2777088, vms=11919360, nice=0, status='sleeping', threads=1), 36350: ProcessSample(rate=0.0, user=2500.95, system=4487.06, rss=41906176, vms=1601019904, nice=0, status='sleeping', threads=20)}]
[{34339: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=2777088, vms=11919360, nice=0, status='sleeping', threads=1), 36350: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=41906176, vms=1601019904, nice=0, status='sleeping', threads=20)}]
```

In the second sample, pid 36350 is reported with the name `bash` and the same CPU times as pid 34339 (user=0.0, system=0.03), while its creation time and memory figures are unchanged, so the calculated rate is negative. These inconsistencies disappear after removing `oneshot`.

Thanks,

Reza Motamedi
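
For reviewers who want a concrete picture of what "locking `oneshot`" could mean, here is a minimal sketch. It is not taken from the diff under review: `CpuMemSample`, `_ONESHOT_LOCK` and `sample_process` are hypothetical names, and the single module-level lock is an assumption. The function accepts any object that offers `oneshot()`, `cpu_times()` and `memory_info()` the way `psutil.Process` does, so the snippet itself does not need to import psutil.

```python
import threading
from collections import namedtuple

# Hypothetical container for illustration; not the ProcessSample used in Aurora.
CpuMemSample = namedtuple("CpuMemSample", ["user", "system", "rss", "vms"])

# Assumption: one module-level lock shared by every thread that samples processes.
_ONESHOT_LOCK = threading.Lock()


def sample_process(process):
    """Read CPU and memory figures from a psutil.Process-like object.

    The lock lets only one thread at a time be inside a oneshot() block.
    """
    with _ONESHOT_LOCK:
        with process.oneshot():
            cpu_times = process.cpu_times()
            memory_info = process.memory_info()
    return CpuMemSample(
        user=cpu_times.user,
        system=cpu_times.system,
        rss=memory_info.rss,
        vms=memory_info.vms,
    )
```

With psutil installed, a call would look like `sample_process(psutil.Process(pid))`. A single global lock serializes all sampling, which trades some throughput for consistent reads. Dropping `oneshot` entirely, as tested above, avoids the lock at the cost of more system calls per sample.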
