-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65409/
-----------------------------------------------------------
(Updated Feb. 1, 2018, 4:15 p.m.)


Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.


Changes
-------

Rebased.


Bugs: MESOS-6713
    https://issues.apache.org/jira/browse/MESOS-6713


Repository: mesos


Description
-------

Because it is not possible to delete a file (or a folder recursively)
with open handles on Windows, we have to explicitly `reset()` the agent
before removing the framework meta directory. Otherwise, the task status
update manager will be destructed too late, and so an open handle for
`task.updates` will cause the `os::rmdir` to fail.

This is safe because we previously destructed the agent anyway, just
later in the test when it was reassigned.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 77aa60c953bd0769eaba05f001755e4cec9ba028


Diff: https://reviews.apache.org/r/65409/diff/2/

Changes: https://reviews.apache.org/r/65409/diff/1-2/


Testing
-------

make check on CentOS 7, all passed
ctest on Windows, all passed including new SlaveRecoveryTests

Note that while this chain enables recovery of Docker tasks on Windows,
it explicitly does not fix MESOS-8519 (recovery of job object tasks).

```
I0131 11:52:01.545505 8316 docker.cpp:898] Recovering Docker containers
I0131 11:52:01.546005 660 containerizer.cpp:674] Recovering containerizer
I0131 11:52:01.546505 660 containerizer.cpp:725] Skipping recovery of executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 because it was not launched from mesos containerizer
I0131 11:52:01.557006 11272 provisioner.cpp:493] Provisioner recovery complete
I0131 11:52:02.521003 8720 docker.cpp:1008] Recovering container 'f7978e90-32f5-458d-ad4e-3ffa25a7b190' for executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0131 11:52:02.530527 8316 slave.cpp:6695] Sending reconnect request to executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:63903
I0131 11:52:02.549062 8720 slave.cpp:4519] Received re-registration message from executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0131 11:52:04.548064 10556 slave.cpp:4737] Cleaning up un-reregistered executors
I0131 11:52:04.548064 10556 slave.cpp:6824] Finished recovery
I0131 11:52:04.566066 660 task_status_update_manager.cpp:181] Pausing sending task status updates
I0131 11:52:04.567059 14636 slave.cpp:1146] New master detected at master@10.123.6.78:5050
I0131 11:52:04.567059 14636 slave.cpp:1190] No credentials provided. Attempting to register without authentication
I0131 11:52:04.568047 14636 slave.cpp:1201] Detecting new master
I0131 11:52:04.604035 8720 slave.cpp:1471] Re-registered with master master@10.123.6.78:5050
I0131 11:52:04.605060 660 task_status_update_manager.cpp:188] Resuming sending task status updates
I0131 11:52:04.606036 8720 slave.cpp:1516] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"mzwol7M6SrGxOml4zYlA8Q=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4-S0"},"update_oversubscribed_resources":true}
I0131 11:52:04.612036 8720 slave.cpp:3625] Updating info for framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to scheduler-aaa62980-8b1b-4775-b8bb-c6890b41941e@10.123.6.78:45907
I0131 11:52:04.636543 13468 task_status_update_manager.cpp:188] Resuming sending task status updates
```


Thanks,

Andrew Schwartzmeyer
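
The ordering constraint in the Description (release the owner of the open handle, then remove the directory) can be illustrated outside of Mesos. What follows is only a minimal standalone sketch, not the code from `slave_recovery_tests.cpp`: `UpdateLog` and `resetThenRemove` are hypothetical names, and `std::filesystem::remove_all` stands in for `os::rmdir`. On POSIX systems the removal would likely succeed even without the `reset()`; the sketch assumes the Windows behavior described above, where an open handle blocks deletion.

```cpp
#include <filesystem>
#include <fstream>
#include <memory>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for a component that keeps a file open for its
// whole lifetime (loosely modeled on the task status update manager
// holding `task.updates`).
class UpdateLog {
public:
  explicit UpdateLog(const fs::path& path) : stream_(path, std::ios::app) {}

  bool isOpen() const { return stream_.is_open(); }

private:
  std::ofstream stream_;  // Closed when the object is destroyed.
};

// Creates `dir`, opens a file inside it through an owning pointer, then
// destroys the owner before removing the directory.
// Returns true if `dir` no longer exists afterwards.
bool resetThenRemove(const fs::path& dir) {
  fs::create_directories(dir);

  auto owner = std::make_unique<UpdateLog>(dir / "task.updates");
  if (!owner->isOpen()) {
    return false;
  }

  // Destroy the owner first so that its file handle is closed. Without
  // this step the handle would still be open during the removal below.
  owner.reset();

  std::error_code ec;
  fs::remove_all(dir, ec);  // Non-throwing overload; errors land in `ec`.
  return !ec && !fs::exists(dir);
}
```

Calling `resetThenRemove(fs::temp_directory_path() / "reset_then_remove_demo")` exercises the sequence against a scratch directory; the directory name is arbitrary.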