-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65465/
-----------------------------------------------------------
(Updated Feb. 8, 2018, 11:54 a.m.)
Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.
Bugs: MESOS-8519
https://issues.apache.org/jira/browse/MESOS-8519
Repository: mesos
Description
-------
Windows deletes the job object created in the agent process when the
agent dies, because no other process holds a handle to it (even though
processes are still assigned to the job object). While this is
counter-intuitive, it is the observed behavior. So for recovery to
succeed, the containerizer must also hold an otherwise unused handle to
its job object. That handle keeps the object alive in the kernel and
available for recovery to find.
Diffs (updated)
-----
src/slave/containerizer/mesos/launch.cpp
91016ed417428e3a5b21a132a96b9d7760d13aa3
Diff: https://reviews.apache.org/r/65465/diff/2/
Changes: https://reviews.apache.org/r/65465/diff/1-2/
Testing
-------
```
[----------] Global test environment tear-down
[==========] 874 tests from 85 test cases ran. (253311 ms total)
[ PASSED ] 874 tests.
I0201 12:46:58.159368 3116 slave.cpp:6921] Recovering framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.159368 3116 slave.cpp:8543] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.162847 9456 task_status_update_manager.cpp:207] Recovering task status update manager
I0201 12:46:58.162847 9456 task_status_update_manager.cpp:215] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.166851 7344 containerizer.cpp:674] Recovering containerizer
I0201 12:46:58.167351 7344 containerizer.cpp:731] Recovering container 69cefa53-61e0-444b-a808-e38ffb4cb18f for executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.183379 17088 provisioner.cpp:493] Provisioner recovery complete
I0201 12:46:58.186367 16792 slave.cpp:6695] Sending reconnect request to executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:52591
I0201 12:46:58.194370 7344 slave.cpp:4519] Received re-registration message from executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:47:00.193958 16792 slave.cpp:4737] Cleaning up un-reregistered executors
I0201 12:47:00.193958 16792 slave.cpp:6824] Finished recovery
I0201 12:47:00.200943 9456 task_status_update_manager.cpp:181] Pausing sending task status updates
I0201 12:47:00.200943 3116 slave.cpp:1146] New master detected at [email protected]:5050
I0201 12:47:00.200943 3116 slave.cpp:1190] No credentials provided. Attempting to register without authentication
I0201 12:47:00.200943 3116 slave.cpp:1201] Detecting new master
I0201 12:47:00.214944 16792 slave.cpp:1471] Re-registered with master [email protected]:5050
I0201 12:47:00.214944 13180 task_status_update_manager.cpp:188] Resuming sending task status updates
I0201 12:47:00.215942 16792 slave.cpp:1516] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"jLIL1d\/PQnuwmFxpMf8CLQ=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4S3"},"update_oversubscribed_resources":true}
I0201 12:47:00.219952 3116 slave.cpp:3625] Updating info for framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to [email protected]:45907
I0201 12:47:00.233942 7344 task_status_update_manager.cpp:188] Resuming sending task status updates
```
Thanks,
Andrew Schwartzmeyer