-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61016/#review181152
-----------------------------------------------------------



Master (8f5a591) is red with this patch.
  ./build-support/jenkins/build.sh

                     
                         def test_runner_state_reconstruction(self):
                     >     assert self.state == self.runner.reconstructed_state
                     E     assert RunnerState(header=None, processes=None, statuses=None) == None
                     E      +  where RunnerState(header=None, processes=None, statuses=None) = <test_finalization.TestRegularFinalizingTask object at 0x7f88745da090>.state
                     E      +  and   None = None
                     E      +    where None = <apache.thermos.testing.runner.Runner object at 0x7f88745daa10>.reconstructed_state
                     E      +      where <apache.thermos.testing.runner.Runner object at 0x7f88745daa10> = <test_finalization.TestRegularFinalizingTask object at 0x7f88745da090>.runner
                     
                     
.pants.d/python-setup/chroots/6108b131782500e43b1f032e7433d264e763b3e9/apache/thermos/testing/runner.py:212: AssertionError
                     ------------- Captured stderr setup --------------
                     ERROR:root:Failed to recover from /tmp/tmpPdZxLt/checkpoints/1500672624993483-runner-base/runner: [Errno 2] No such file or directory: '/tmp/tmpPdZxLt/checkpoints/1500672624993483-runner-base/runner'
                     __ TestRegularFinalizingTask.test_runner_state ___
                     
                     self = <test_finalization.TestRegularFinalizingTask object at 0x7f88745b7e10>
                     
                         def test_runner_state(self):
                     >     assert self.state.statuses[-1].state == TaskState.SUCCESS
                     E     TypeError: 'NoneType' object has no attribute '__getitem__'
                     
                     
src/test/python/apache/thermos/core/test_finalization.py:30: TypeError
                      
TestRegularFinalizingTask.test_runner_process_in_expected_states 
                     
                     self = <test_finalization.TestRegularFinalizingTask object at 0x7f88745aea50>
                     
                         def test_runner_process_in_expected_states(self):
                           history = self.state.processes
                           for process in ('main', 'finalizer'):
                     >       assert len(history[process]) == 1
                     E       TypeError: 'NoneType' object has no attribute '__getitem__'
                     
                     
src/test/python/apache/thermos/core/test_finalization.py:35: TypeError
                      generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/aaf4d108c31293299a0839bdc404a91802f80937.xml
                      3 failed, 794 passed, 6 skipped, 1 warnings in 298.43 seconds
                     
FAILURE


21:33:41 05:34   [complete]
               FAILURE


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On July 21, 2017, 9:12 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61016/
> -----------------------------------------------------------
> 
> (Updated July 21, 2017, 9:12 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # lock psutil's oneshot
> 
> TL;DR: psutil's `oneshot` is not threadsafe.
> 
> After a lot of testing on busy machines, I realized that psutil's oneshot is 
> not threadsafe. I contacted the developer but have not received a 
> concrete fix.
> 
> Please read https://issues.apache.org/jira/browse/AURORA-1939 and 
> https://github.com/giampaolo/psutil/issues/1110 for more information.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/process_collector_psutil.py 
> 3594955c68b45ab65c01426ba0a18ec8a132a27f 
> 
> 
> Diff: https://reviews.apache.org/r/61016/diff/1/
> 
> 
> Testing
> -------
> 
> The following test was done by adding extra logging to the current code:
> 
> 
> ```
> ... 
>      cpu_times = process.cpu_times()
> +    log.debug("process:{} cpu times {}".format(process, cpu_times))
>      user, system = cpu_times.user, cpu_times.system
>      memory_info = p
> ...      
> ```
> 
> ```
> $ grep '36350' thermos-observer.XXXX.prod.twttr.net.root.log.DEBUG.20170721-163950.9421
> D0721 16:55:28.242974 9421 process_collector_psutil.py:40] process:psutil.Process(pid=36350, name='mesos-slave') cpu times pcputimes(user=2500.95, system=4487.06, children_user=0.0, children_system=0.0)
> D0721 17:11:21.940462 9421 process_collector_psutil.py:40] process:psutil.Process(pid=36350, name='bash') cpu times pcputimes(user=0.0, system=0.03, children_user=0.0, children_system=0.0)
> D0721 17:11:22.247414 9421 process_collector_psutil.py:111] Calculated rate for pid=34339 and children: -7.32560348996 (old: 6988.040000, new: 0.060000) {34339: 1498166704.32, 36350: 1498166720.51} -> {34339: 1498166704.32, 36350: 1498166720.51} [{34339: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=2777088, vms=11919360, nice=0, status='sleeping', threads=1), 36350: ProcessSample(rate=0.0, user=2500.95, system=4487.06, rss=41906176, vms=1601019904, nice=0, status='sleeping', threads=20)}] [{34339: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=2777088, vms=11919360, nice=0, status='sleeping', threads=1), 36350: ProcessSample(rate=0.0, user=0.0, system=0.03, rss=41906176, vms=1601019904, nice=0, status='sleeping', threads=20)}]
> ```
> 
> These inconsistencies disappear after removing oneshot.
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>
