-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61016/
-----------------------------------------------------------

(Updated July 21, 2017, 9:12 p.m.)


Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.


Repository: aurora


Description
-------

# lock psutil's oneshot

TL;DR: psutil's `oneshot` is not thread-safe.

After a lot of testing on busy machines, I realized that psutil's oneshot is 
not thread-safe. I contacted the developer but have not received a concrete 
fix.

Please read https://issues.apache.org/jira/browse/AURORA-1939 and 
https://github.com/giampaolo/psutil/issues/1110 for more information.


Diffs
-----

  src/main/python/apache/thermos/monitoring/process_collector_psutil.py 
3594955c68b45ab65c01426ba0a18ec8a132a27f 


Diff: https://reviews.apache.org/r/61016/diff/1/


Testing (updated)
-------

The following test was done by adding extra logging to the current code:


```
... 
     cpu_times = process.cpu_times()
+    log.debug("process:{} cpu times {}".format(process, cpu_times))
     user, system = cpu_times.user, cpu_times.system
     memory_info = p
...      
```

```
$ grep '36350' thermos-observer.XXXX.prod.twttr.net.root.log.DEBUG.20170721-163950.9421
D0721 16:55:28.242974 9421 process_collector_psutil.py:40] 
process:psutil.Process(pid=36350, name='mesos-slave') cpu times 
pcputimes(user=2500.95, system=4487.06, children_user=0.0, children_system=0.0)
D0721 17:11:21.940462 9421 process_collector_psutil.py:40] 
process:psutil.Process(pid=36350, name='bash') cpu times pcputimes(user=0.0, 
system=0.03, children_user=0.0, children_system=0.0)
D0721 17:11:22.247414 9421 process_collector_psutil.py:111] Calculated rate for 
pid=34339 and children: -7.32560348996 (old: 6988.040000, new: 0.060000) 
{34339: 1498166704.32, 36350: 1498166720.51} -> {34339: 1498166704.32, 36350: 
1498166720.51} [{34339: ProcessSample(rate=0.0, user=0.0, system=0.03, 
rss=2777088, vms=11919360, nice=0, status='sleeping', threads=1), 36350: 
ProcessSample(rate=0.0, user=2500.95, system=4487.06, rss=41906176, 
vms=1601019904, nice=0, status='sleeping', threads=20)}] [{34339: 
ProcessSample(rate=0.0, user=0.0, system=0.03, rss=2777088, vms=11919360, 
nice=0, status='sleeping', threads=1), 36350: ProcessSample(rate=0.0, user=0.0, 
system=0.03, rss=41906176, vms=1601019904, nice=0, status='sleeping', 
threads=20)}]
```

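In this log, pid 36350 is first reported as 'mesos-slave' with user=2500.95 
and system=4487.06, and later as 'bash' with user=0.0 and system=0.03, the 
same CPU times as pid 34339. The calculated rate therefore becomes negative. 
These inconsistencies disappear after removing oneshot.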
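
To make the negative rate in the log easier to follow, here is a rough sketch 
of the arithmetic. The function names are hypothetical, and the elapsed time 
of 954 seconds is an assumption taken from the approximate gap between the 
two log timestamps; the real collector may compute it differently.

```python
def total_cpu_seconds(samples):
    """Sum user + system CPU seconds over a pid -> sample mapping."""
    return sum(s["user"] + s["system"] for s in samples.values())


def cpu_rate(old_samples, new_samples, elapsed_seconds):
    """CPU seconds consumed per wall-clock second between two samplings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = total_cpu_seconds(new_samples) - total_cpu_seconds(old_samples)
    return delta / elapsed_seconds


# CPU times copied from the ProcessSample entries in the log.
old = {
    34339: {"user": 0.0, "system": 0.03},
    36350: {"user": 2500.95, "system": 4487.06},
}
new = {
    34339: {"user": 0.0, "system": 0.03},
    36350: {"user": 0.0, "system": 0.03},  # values that belong to pid 34339
}

# Assumed elapsed time (roughly 16:55:28 to 17:11:22 in the log).
rate = cpu_rate(old, new, 954.0)
```

The totals match the `old: 6988.040000, new: 0.060000` values in the log, and 
because cumulative CPU time appears to go backwards, `rate` is negative.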


Thanks,

Reza Motamedi
