> On Nov. 19, 2015, 6:16 p.m., Neil Conway wrote: > > Good find. I wonder: > > > > (a) is there some general advice we should give to people implementing > > Processes (e.g., "always provide a destructor that does terminate/wait" -- > > that is probably too broad though). Would be nice to add this to the > > libprocess README. > > (b) does this problem occur anywhere else? and/or is there a way to detect > > it? > > Bernd Mathiske wrote: > [~neilc] a) I agree that we should provide documentation in this regard. > This kind of pattern is too easily overlooked, also by reviewers, as > exemplified when the bug got introduced: https://reviews.apache.org/r/27483/ > b) For test code (such as is the case here), we could put something into > the inherited fixture that scans for orphaned processes at TearDown. > > Benjamin Bannier wrote: > Note that there is some assymmetry in the API: when *not giving* a > `manage` arg to `spawn`. you get the default value `false`; but you then *do > need to add code* to terminate and wait for the process. > > I personally would naively have expected an API to either (i) require me > to be explicitly in both places (explictly set `manage=false`, and to manual > `terminate` and `wait`), or (ii) it just do The Right Thing if I was brief (I > called `spawn` with default args, and do not have to worry about cleanup). > > I.e., wouldn't it in the long run make more sense the change `spawn` to > `PID<T> spawn(T& t, bool manage = true)` than to add more documentation?
[~neilc] a) I agree that documentation is a good direction. (I'll draft something.) [~bernd-mesos] b) We already have an orphaned process detector in `src/tests/environment.cpp` `Environment::TearDown`. However, this only catches **orphaned** processes (whose parents have been killed :( ). This HTTP process's parent is the test process. There is an endpoint `/__processes__` which we might be able to use to check for extraneous processes. But this might be messy, since we have a great deal of global processes. [~bbannier] The `manage` argument will only change ownership of the process pointer. If you dig into the `GarbageCollector`, you'll see that all it does is `delete` the pointer when the process terminates. The original process is still in charge of its own termination. - Joseph ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40501/#review107290 ----------------------------------------------------------- On Nov. 19, 2015, 1:51 p.m., Joseph Wu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/40501/ > ----------------------------------------------------------- > > (Updated Nov. 19, 2015, 1:51 p.m.) > > > Review request for mesos, Bernd Mathiske, Artem Harutyunyan, and Joris Van > Remoortere. > > > Bugs: MESOS-3753 > https://issues.apache.org/jira/browse/MESOS-3753 > > > Repository: mesos > > > Description > ------- > > Two of the fetcher tests will spawn a process which is stored in the stack > (i.e. local variable in the test). `spawn` will store a pointer to the > process in libprocess's `ProcessManager`. When the test finishes, the > process goes out of scope and is therefore lost. However, the process is > **not** terminated. > > Failing to terminate this process will lead to an infinite loop in > `~ProcessManager`, which is called in `process::finalize`. In > `ProcessManager` 's destructor, we will loop and try to kill all processes. > The process spawned in the test will be running. However, since the pointer > lives in the stack, the `ProcessManager` will be unable to find the process > and will thereby be stuck trying to kill a process it cannot find. > > > Diffs > ----- > > src/tests/fetcher_tests.cpp 04079964b3539f555351d1444f3635c64700a1a8 > > Diff: https://reviews.apache.org/r/40501/diff/ > > > Testing > ------- > > `make check` > > Additional testing: > > Insert a `process::finalize` in `src/test/main.cpp`. i.e. > ``` > // Replace `return RUN_ALL_TESTS();` with this: > int ret = RUN_ALL_TESTS(); > process::finalize(); > return ret; > ``` > > Then `make check > GTEST_FILTER="*FetcherTest.OSNetUriTest*:*FetcherTest.OSNetUriSpaceTest*"`. > The test program should not stall or segfault or abort in some weird way. > > > Thanks, > > Joseph Wu > >