> On Feb. 1, 2018, 10:32 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/main.cpp
> > Lines 40-50 (patched)
> > <https://reviews.apache.org/r/65465/diff/1/?file=1951378#file1951378line40>
> >
> >     Flying by. Why this logic is not in launch.cpp? Sounds to me it's
> >     unrelated to, for example, Mount below?
>
> Andrew Schwartzmeyer wrote:
>     Where in `launch.cpp` would you put it? The handle needs to exist for
>     exactly as long as the process exists (or as close as we can get, which
>     putting it here gets it really close).

Well, I don't think putting it here or in launch.cpp makes any noticeable difference in terms of "closeness" (probably a dozen instructions?). My question is: is this logic related only to the launch of a container or not? If yes, it should be moved to launch.cpp (i.e., `MesosContainerizerLaunch::execute()`).

- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65465/#review196662
-----------------------------------------------------------


On Feb. 1, 2018, 7:57 p.m., Andrew Schwartzmeyer wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65465/
> -----------------------------------------------------------
>
> (Updated Feb. 1, 2018, 7:57 p.m.)
>
>
> Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.
>
>
> Bugs: MESOS-8519
>     https://issues.apache.org/jira/browse/MESOS-8519
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The Windows OS deletes the job object created in the agent process when
> the agent dies, because no other process holds a handle to it (despite
> processes being assigned to the job object). While this is
> counter-intuitive, it is the observed behavior. So in order for recovery
> to succeed, the containerizer must also hold an otherwise unused handle
> to its job object to keep it alive in the kernel, and available for
> recovery to find.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/mesos/main.cpp a53ccd68bf975d919f9d1f920cf3fa74d4e43f24
>
>
> Diff: https://reviews.apache.org/r/65465/diff/1/
>
>
> Testing
> -------
>
> ```
> [----------] Global test environment tear-down
> [==========] 874 tests from 85 test cases ran. (253311 ms total)
> [ PASSED ] 874 tests.
>
> I0201 12:46:58.159368 3116 slave.cpp:6921] Recovering framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.159368 3116 slave.cpp:8543] Recovering executor
> 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.162847 9456 task_status_update_manager.cpp:207] Recovering
> task status update manager
> I0201 12:46:58.162847 9456 task_status_update_manager.cpp:215] Recovering
> executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.166851 7344 containerizer.cpp:674] Recovering containerizer
> I0201 12:46:58.167351 7344 containerizer.cpp:731] Recovering container
> 69cefa53-61e0-444b-a808-e38ffb4cb18f for executor
> 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.183379 17088 provisioner.cpp:493] Provisioner recovery complete
> I0201 12:46:58.186367 16792 slave.cpp:6695] Sending reconnect request to
> executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:52591
> I0201 12:46:58.194370 7344 slave.cpp:4519] Received re-registration message
> from executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:47:00.193958 16792 slave.cpp:4737] Cleaning up un-reregistered
> executors
> I0201 12:47:00.193958 16792 slave.cpp:6824] Finished recovery
> I0201 12:47:00.200943 9456 task_status_update_manager.cpp:181] Pausing
> sending task status updates
> I0201 12:47:00.200943 3116 slave.cpp:1146] New master detected at
> master@10.123.6.78:5050
> I0201 12:47:00.200943 3116 slave.cpp:1190] No credentials provided.
> Attempting to register without authentication
> I0201 12:47:00.200943 3116 slave.cpp:1201] Detecting new master
> I0201 12:47:00.214944 16792 slave.cpp:1471] Re-registered with master
> master@10.123.6.78:5050
> I0201 12:47:00.214944 13180 task_status_update_manager.cpp:188] Resuming
> sending task status updates
> I0201 12:47:00.215942 16792 slave.cpp:1516] Forwarding agent update
> {"operations":{},"resource_version_uuid"
> {"value":"jLIL1d\/PQnuwmFxpMf8CLQ=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4S3"},"update_oversubscribed_resources":true}
> I0201 12:47:00.219952 3116 slave.cpp:3625] Updating info for framework
> eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to
> scheduler-aaa62980-8b1b-4775-b8bb-c6890b41941e@10.123.6.78:45907
> I0201 12:47:00.233942 7344 task_status_update_manager.cpp:188] Resuming
> sending task status updates
> ```
>
>
> Thanks,
>
> Andrew Schwartzmeyer
>
>
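
For reference, the idea in the review description (hold an otherwise unused handle to the job object for as long as the process lives) could be sketched roughly as below. This is not the code from the patch: the class name `JobObjectKeepAlive` and the choice of `JOB_OBJECT_QUERY` access are assumptions made for illustration, and how the job object is named is left to the caller. Only `OpenJobObjectW` and `CloseHandle` are real Win32 calls. On non-Windows platforms the wrapper compiles but holds nothing.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII wrapper (not taken from the patch): opens an existing
// named job object and keeps the handle until the wrapper is destroyed.
class JobObjectKeepAlive {
public:
  explicit JobObjectKeepAlive(const std::wstring& name) {
#ifdef _WIN32
    // OpenJobObjectW returns NULL if the job object cannot be opened.
    // JOB_OBJECT_QUERY is an assumed minimal access right for this sketch.
    handle_ = ::OpenJobObjectW(JOB_OBJECT_QUERY, FALSE, name.c_str());
#else
    // Job objects are Windows-only; elsewhere the wrapper holds nothing.
    (void)name;
#endif
  }

  ~JobObjectKeepAlive() {
#ifdef _WIN32
    if (handle_ != nullptr) {
      ::CloseHandle(handle_);
    }
#endif
  }

  // A handle has a single owner, so copying is disabled.
  JobObjectKeepAlive(const JobObjectKeepAlive&) = delete;
  JobObjectKeepAlive& operator=(const JobObjectKeepAlive&) = delete;

  // True while a job object handle is held.
  bool held() const {
#ifdef _WIN32
    return handle_ != nullptr;
#else
    return false;
#endif
  }

private:
#ifdef _WIN32
  HANDLE handle_ = nullptr;
#endif
};
```

Whether such an object is constructed in `main.cpp` or inside `MesosContainerizerLaunch::execute()` only changes where its lifetime starts. In both cases the handle is released when the process exits, which is the property the description relies on.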