> On Dec. 2, 2015, 8:08 a.m., Benjamin Bannier wrote: > > 3rdparty/libprocess/src/process.cpp, line 2200 > > <https://reviews.apache.org/r/40266/diff/1/?file=1134919#file1134919line2200> > > > > I find adding another manual iteration index manipulation here makes > > this even harder to read (e.g., do we really iterate over all elements? Are > > there assumptions about ordering (hopefully not)?, ...). > > > > You could e.g., factor out a `synchronized` helper to get the next > > not-ignored element (or a `nullptr` if nothing is left); the whole existing > > loop could then collapse to > > > > while (true) { > > ProcessBase* process = next_cleanup(processes)); > > if (!process) { > > break; > > } > > process::terminate(process, false); > > process::wait(process); > > } > > > > This would also make it clear that we intent over all not-ignored > > processes (which currently is implicit through `wait` removing processes > > one after the other, and `processes` not containing any `nullptr` elements). > > > > The helper `next_cleanup` can be implemented without querying size > > information, e.g., it could iterate `processes` until the process doesn't > > pattern-match with ignored processes.
This comment actually helped me uncover another edge case in the finalization code :) As it turns out, the current patch **is dependent** on ordering. --- The existing `process::finalize` (prior to this patch) ends up dereferencing a terminated process. But we don't segfault because the terminated process's pointer is never deleted: 1) Run a test like `ProcessTest.Http1`. 2) This leaves two processes behind (bad test cleanup?), a client `ConnectionProcess` and an `HttpProxy` on the server side. 3) After the libprocess tests, we call `process::finalize`. 4) We delete in alphabetical order, because `std::map` sorts the processes alphabetically. 4) It just so happens that `__gc__` is always the first to die. This means no processes spawned via `spawn(..., true)` will be deleted. 5) The `HttpProxy` (named `__http__`) is deleted next. We leak this pointer :( 6) The `ConnectionProcess` (named `__http_connection__`) is deleted next. This also fires `process::internal::decode_recv`, which cleans up the socket. 7) During socket cleanup, we terminate the associated `HttpProxy` (which was terminated in step 5). Termination actually requires a dereference (i.e. process->pid). This works because we leaked the pointer. - Joseph ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40266/#review108650 ----------------------------------------------------------- On Dec. 2, 2015, 4:50 p.m., Joseph Wu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/40266/ > ----------------------------------------------------------- > > (Updated Dec. 2, 2015, 4:50 p.m.) > > > Review request for mesos, Artem Harutyunyan and Joris Van Remoortere. > > > Bugs: MESOS-3910 > https://issues.apache.org/jira/browse/MESOS-3910 > > > Repository: mesos > > > Description > ------- > > The `SocketManager` and `ProcessManager` are highly inter-dependent, which > requires some untangling in `process::finalize`. > > * Logic originally found in `~ProcessManager` has been split into > `ProcessManager::finalize` due to what happens during cleanup. > * Some additional cleanup was added for dangling pointers. > * The future from `__s__->accept()` must be explicitly discarded as libevent > does not detect a locally closed socket. > > > Diffs > ----- > > 3rdparty/libprocess/src/process.cpp > a7af4671efa2f370137ed4a749e5247c91e6f95e > > Diff: https://reviews.apache.org/r/40266/diff/ > > > Testing > ------- > > `make check` (libev) > `make check` (--enable-libevent --enable-ssl) > > > Thanks, > > Joseph Wu > >