-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55313/
-----------------------------------------------------------
(Updated Jan. 18, 2017, 8:49 p.m.)


Review request for mesos, Andrew Schwartzmeyer, Daniel Pravat, and Joseph Wu.


Changes
-------

Address Joseph's comments.


Bugs: MESOS-6698, MESOS-6839 and MESOS-6870
    https://issues.apache.org/jira/browse/MESOS-6698
    https://issues.apache.org/jira/browse/MESOS-6839
    https://issues.apache.org/jira/browse/MESOS-6870


Repository: mesos


Description
-------

MESOS-6839 tracks a bug that makes the current implementation of the default executor unable to delete any of the processes associated with a task. Understanding why requires some knowledge of how the Windows and Unix process models differ.

In Unix, there is a robust notion of a process tree, with well-defined process groups, sessions, signal delivery on the tree, and so on. Windows lacks a robust notion of a process hierarchy, and therefore largely has no equivalents to these constructs (including, notably, signal semantics).

One of the problems this mismatch causes Mesos is that it complicates killing a task, which is at its core a group of processes. On Windows, the easiest way to make a process and all its descendants killable as a unit is to track them in a Job Object, a Windows kernel construct similar in principle to Linux's control groups (though with different ideas of process namespacing).

There is some subtlety in making sure that _all_ processes associated with a task are captured inside a Job Object. The most important consideration is that each process must be added to the Job Object before it has a chance to create any child processes; if we fail to do this, those children will not be captured in the Job Object.

The solution to this is fairly simple on Windows. The process creation API lets users create a process in a suspended state, so that the Windows kernel scheduler does not run the process until the user explicitly resumes its main thread.
Creating the process suspended allows us to add it to a Job Object before it has a chance to create children, and only then start it.

This commit accomplishes this by changing `PosixLauncher::fork` to use the Subprocess parent hooks API, which implements exactly these semantics. Essentially, the launcher launches the containerizer process, which inspects the TaskInfo or the environment for a task to launch, and then launches it. Using the parent hooks API, Subprocess creates the containerizer process on Windows in a suspended state; the parent hook supplied by the launcher then adds that process to a Job Object before it has a chance to run. Finally, Subprocess marks the process as runnable and returns.

This commit resolves MESOS-6839. We also light up (i.e., enable) the executor tests, so it also resolves MESOS-6870 and MESOS-6698.


Diffs (updated)
-----

  src/slave/containerizer/mesos/launcher.cpp a6a8c01cb39f35f8174fcb5af0ef18de2da5ee78
  src/tests/command_executor_tests.cpp 4d5c21ec427ebaac053e56ae554cb466dfeb0b8b
  src/tests/default_executor_tests.cpp ec3e854ed58a0fbb3bfad0bd21eb0e2974548865

Diff: https://reviews.apache.org/r/55313/diff/


Testing
-------


Thanks,

Alex Clemmer