-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63443/#review189731
-----------------------------------------------------------

Master (9e646ae) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Oct. 31, 2017, 4:17 p.m., Stephan Erb wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63443/
> -----------------------------------------------------------
>
> (Updated Oct. 31, 2017, 4:17 p.m.)
>
>
> Review request for Aurora, Bill Farner and Zameer Manji.
>
>
> Bugs: AURORA-1955
>     https://issues.apache.org/jira/browse/AURORA-1955
>
>
> Repository: aurora
>
>
> Description
> -------
>
> This commit consists of two independent parts:
>
> a) ensure we interrupt the main thread when there are unhandled exceptions
> b) ensure the main thread of the executor can be interrupted
>
>
> Diffs
> -----
>
>   src/main/python/apache/aurora/executor/bin/thermos_executor_main.py a191cf9eec844035c0f6aa5aed3731a06024c0df
>   src/main/python/apache/aurora/tools/thermos.py de20c06cea5bbb45c7a6f5acfeee69289f8e6ad8
>   src/main/python/apache/aurora/tools/thermos_observer.py 0318f990ac003c0b8925b7eb7359431cdee34f05
>   src/main/python/apache/thermos/common/excepthook.py PRE-CREATION
>   src/main/python/apache/thermos/runner/thermos_runner.py 847f51ed2c0e003f1325aa903bd0f0b760acb365
>
>
> Diff: https://reviews.apache.org/r/63443/diff/1/
>
>
> Testing
> -------
>
> This bug is pretty hard to reproduce and test. I therefore opted for manual
> verification and injected an exception shortly before the last statement of
> the `AuroraExecutor._shutdown` method. Without this patch, this resulted in
> hanging executors on the host. With this patch, everything is terminated as
> expected.
>
> For details of the successful run, please see the executor logs below.
> Please note that the `apport.fileutils` import error is due to Ubuntu messing
> with its Python installation. This is not critical.
>
> ```
> twitter.common.app debug: Initializing: apache.thermos.common.excepthook (Exception termination handler.)
> I1031 15:59:37.188621 25437 exec.cpp:162] Version: 1.2.0
> I1031 15:59:37.192201 25429 exec.cpp:237] Executor registered on agent 93259518-14f4-4956-a39c-aa615bff9a5e-S0
> Writing log files to disk in /var/lib/mesos/slaves/93259518-14f4-4956-a39c-aa615bff9a5e-S0/frameworks/7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000/executors/thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c/runs/54a5ed51-aa9b-476f-9f75-0b42bd6dfa8d
>
> ERROR] Unhandled error in <StatusManager(Thread-7 [TID=25450], started daemon 139968452134656)>. Interrupting main thread.
> Traceback (most recent call last):
>   File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
>     self.__real_run(*args, **kw)
>   File "apache/aurora/executor/status_manager.py", line 62, in run
>   File "apache/aurora/executor/aurora_executor.py", line 236, in _shutdown
> RuntimeError: Woops!
> Exception in thread Thread-7 [TID=25450]:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
>     self.run()
>   File "/root/.pex/install/twitter.common.decorators-0.3.7-py2-none-any.whl.b23f2874a4392741fca582d9e0528c08e0335c68/twitter.common.decorators-0.3.7-py2-none-any.whl/twitter/common/decorators/threads.py", line 115, in identified
>     return instancemethod(self, *args, **kwargs)
>   File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 130, in _excepting_run
>     sys.excepthook(*sys.exc_info())
>   File "apache/thermos/common/excepthook.py", line 41, in teardown_handler
>     self._former_hook()(exc_type, value, trace)
>   File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
>     from apport.fileutils import likely_packaged, get_recent_crashes
> ImportError: No module named apport.fileutils
>
> twitter.common.app debug: main exited with ^C
> twitter.common.app debug: Shutting application down.
> twitter.common.app debug: Running exit function for apache.thermos.common.excepthook (Exception termination handler.)
> twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
> twitter.common.app debug: Finishing up module teardown.
> twitter.common.app debug: Active thread: <_MainThread(MainThread, started 139968622749504)>
> twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c] [TID=25449], started daemon 139967951009536)>
> twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-13, started daemon 139968485705472)>
> twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-9, started daemon 139967934224128)>
> twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-12, started daemon 139967942616832)>
> twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-3, started daemon 139968510883584)>
> twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-11, started daemon 139967925831424)>
> twitter.common.app debug: Exiting cleanly.
> ```
>
> Corresponding agent logs, indicating that Mesos knows about the crash on
> teardown:
> ```
> I1031 15:59:54.692739 1956 slave.cpp:4769] Executor 'thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c' of framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000 exited with status 130
> I1031 15:59:54.692834 1956 slave.cpp:4869] Cleaning up executor 'thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c' of framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000 at executor(1)@192.168.33.7:48931
> I1031 15:59:54.692996 1956 slave.cpp:4957] Cleaning up framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000
> ```
>
>
> Thanks,
>
> Stephan Erb
>
>
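
Part (a) of the quoted description (interrupt the main thread when a worker thread dies with an unhandled exception) can be sketched roughly as follows. This is a minimal Python 3 illustration under stated assumptions, not the contents of the patch's `excepthook.py`: it hooks `threading.excepthook` (Python 3.8+), whereas the quoted logs show the Python 2.7 executor reaching `sys.excepthook` through `twitter.common.exceptions`. The names `make_thread_excepthook`, `run_failing_worker`, and `_fail` are hypothetical and exist only for this example.

```python
import _thread
import sys
import threading
import traceback


def make_thread_excepthook(interrupt=_thread.interrupt_main):
    """Build a threading.excepthook handler that logs, then calls `interrupt`.

    `interrupt` is injectable (an assumption of this sketch) so the behaviour
    can be exercised without actually interrupting the main thread.
    """
    def hook(args):
        # `args` carries exc_type, exc_value, exc_traceback and thread.
        sys.stderr.write(
            'Unhandled error in %r. Interrupting main thread.\n' % (args.thread,))
        traceback.print_exception(
            args.exc_type, args.exc_value, args.exc_traceback)
        interrupt()
    return hook


def _fail():
    # Stand-in for the exception injected into the shutdown path during testing.
    raise RuntimeError('Woops!')


def run_failing_worker(hook):
    """Run a worker that raises with `hook` installed, then restore the old hook."""
    previous = threading.excepthook
    threading.excepthook = hook
    try:
        worker = threading.Thread(target=_fail, name='StatusManagerStandIn')
        worker.start()
        # A bounded join keeps the waiting thread from blocking forever.
        worker.join(timeout=5)
    finally:
        threading.excepthook = previous


if __name__ == '__main__':
    calls = []
    run_failing_worker(make_thread_excepthook(interrupt=lambda: calls.append('interrupted')))
    print(calls)
```

With the default `interrupt`, `_thread.interrupt_main()` simulates a SIGINT arriving in the main thread, which by default surfaces there as `KeyboardInterrupt`. That only helps if the main thread is waiting in a way that can be interrupted, which is the concern of part (b).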
