[ https://issues.apache.org/jira/browse/AURORA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836806#comment-15836806 ]
Stephan Erb commented on AURORA-1809: ------------------------------------- This test fails due to the recent introduction of {{PR_SET_CHILD_SUBREAPER}}. Only in the full suite {{setup_child_subreaping()}} is called before the above mentioned test case is run. If we additionally call {{setup_child_subreaping()}} from within the testcase it fails all the time. > Investigate flaky test TestRunnerKillProcessGroup.test_pg_is_killed > ------------------------------------------------------------------- > > Key: AURORA-1809 > URL: https://issues.apache.org/jira/browse/AURORA-1809 > Project: Aurora > Issue Type: Bug > Reporter: Zameer Manji > Fix For: 0.17.0 > > > If you run it apart of the full test suite it fails like this: > {noformat} > ==================== FAILURES ==================== > __ TestRunnerKillProcessGroup.test_pg_is_killed __ > > self = <test_staged_kill.TestRunnerKillProcessGroup > object at 0x7f0c79893e10> > > [1m def test_pg_is_killed(self):[0m > [1m runner = self.start_runner()[0m > [1m tm = TaskMonitor(runner.tempdir, > runner.task_id)[0m > [1m self.wait_until_running(tm)[0m > [1m process_state, run_number = > tm.get_active_processes()[0][0m > [1m assert process_state.process == 'process'[0m > [1m assert run_number == 0[0m > [1m [0m > [1m child_pidfile = os.path.join(runner.sandbox, > runner.task_id, 'child.txt')[0m > [1m while not os.path.exists(child_pidfile):[0m > [1m time.sleep(0.1)[0m > [1m parent_pidfile = os.path.join(runner.sandbox, > runner.task_id, 'parent.txt')[0m > [1m while not os.path.exists(parent_pidfile):[0m > [1m time.sleep(0.1)[0m > [1m with open(child_pidfile) as fp:[0m > [1m child_pid = int(fp.read().rstrip())[0m > [1m with open(parent_pidfile) as fp:[0m > [1m parent_pid = int(fp.read().rstrip())[0m > [1m [0m > [1m ps = ProcessProviderFactory.get()[0m > [1m ps.collect_all()[0m > [1m assert parent_pid in ps.pids()[0m > [1m assert child_pid in ps.pids()[0m > [1m assert child_pid in > ps.children_of(parent_pid)[0m > [1m [0m > [1m with open(os.path.join(runner.sandbox, > runner.task_id, 'exit.txt'), 'w') as fp:[0m > [1m fp.write('go away!')[0m > [1m [0m > [1m while tm.task_state() is not > TaskState.SUCCESS:[0m > [1m time.sleep(0.1)[0m > [1m [0m > [1m state = tm.get_state()[0m > [1m assert state.processes['process'][0].state == > ProcessState.SUCCESS[0m > [1m [0m > [1m ps.collect_all()[0m > [1m assert parent_pid not in ps.pids()[0m > [1m> assert child_pid not in ps.pids()[0m > [1m[31mE assert 30475 not in set([1, 2, 3, 5, 7, > 8, ...])[0m > [1m[31mE + where set([1, 2, 3, 5, 7, 8, ...]) = > <bound method ProcessProvider_Procfs.pids of > <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object > at 0x7f0c798b1990>>()[0m > [1m[31mE + where <bound method > ProcessProvider_Procfs.pids of > <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object > at 0x7f0c798b1990>> = > <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object > at 0x7f0c798b1990>.pids[0m > > > src/test/python/apache/thermos/core/test_staged_kill.py:287: AssertionError > -------------- Captured stderr call -------------- > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > WARNING:root:Could not read from checkpoint > /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner > generated xml file: > /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml > > [1m[31m 1 failed, 719 passed, 6 skipped, 1 warnings in > 206.00 seconds [0m > > FAILURE > {noformat} > If you run the test as a one off you see this: > {noformat} > 00:45:32 00:00 [main] > (To run a reporting server: ./pants server) > 00:45:32 00:00 [setup] > 00:45:32 00:00 [parse]fatal: Not a git repository (or any of the parent > directories): .git > Executing tasks in goals: test > 00:45:32 00:00 [test] > 00:45:32 00:00 [test-prep-command] > 00:45:32 00:00 [test] > 00:45:32 00:00 [pytest] > 00:45:32 00:00 [run] > ============== test session starts =============== > platform linux2 -- Python 2.7.6 -- py-1.4.31 -- > pytest-2.6.4 -- /usr/bin/python2.7 > plugins: cov, timeout > collected 83 items > > src/test/python/apache/thermos/core/test_staged_kill.py::TestRunnerKillProcessGroup::test_pg_is_killed > PASSED > generated xml file: > /home/vagrant/aurora/dist/test-results/src.test.python.apache.thermos.core.core.xml > 82 tests deselected by '-kTestRunnerKillProcessGroup' > 1 passed, 82 deselected, 1 warnings in 2.49 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)