----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64940/#review195241 -----------------------------------------------------------
LGTM modulo Vinod's comments. Sorry we have gone back and forth on this but this time I think we are close to reach what's shippable to unblock 1.5. src/master/http.cpp Lines 360-362 (original) <https://reviews.apache.org/r/64940/#comment274470> So, as reminded by Vinod, for backward compatibility let's keep the checks. However here we should change the condition to ``` if (task->state() == TASK_UNREACHABLE) { continue; } ``` because other terminal tasks that are not TASK_LOST could also be in this map... src/master/master.hpp Lines 2326-2327 (original) <https://reviews.apache.org/r/64940/#comment274417> The following CHECK looks correct? ``` CHECK(Master::isRemovable(task->state()) << task->state(); ``` Although failing CHECKs are annoying, it helped us uncover the inconsistency in our task state transitions so it's better to correct it? src/master/master.cpp Lines 11023-11036 (original), 11021-11034 (patched) <https://reviews.apache.org/r/64940/#comment274415> I was going to suggest that we remove the `if (task->state() == TASK_UNREACHABLE)` check but per Vinod's comment we shouldn't. src/tests/mesos.hpp Lines 3511 (patched) <https://reviews.apache.org/r/64940/#comment274471> s/a/an/? src/tests/mesos.hpp Line 3511 (original), 3518 (patched) <https://reviews.apache.org/r/64940/#comment274472> s/a/an/? src/tests/partition_tests.cpp Lines 2401 (patched) <https://reviews.apache.org/r/64940/#comment274477> We have to s/unreachable/completed/ and s/but/and/ ? src/tests/partition_tests.cpp Lines 2402 (patched) <https://reviews.apache.org/r/64940/#comment274478> With all the discussions about teminology it seems s/Completed/Terminal/ is more accurate? src/tests/partition_tests.cpp Lines 2440-2442 (patched) <https://reviews.apache.org/r/64940/#comment274473> Nit: declare them in the order we are expecting on them? i.e., starting -> running -> finished. src/tests/partition_tests.cpp Lines 2474-2475 (patched) <https://reviews.apache.org/r/64940/#comment274474> Without a paused clock, there is a chance of race here right? If the same status update is resent after a timeout before we acknowledge, it's possible that the resent update is going to be caught by the 2nd expectation? Would it be possible to hoist the Clock::pause() statement? src/tests/partition_tests.cpp Lines 2527 (patched) <https://reviews.apache.org/r/64940/#comment274475> Remove this debugging log. src/tests/partition_tests.cpp Lines 2532-2534 (patched) <https://reviews.apache.org/r/64940/#comment274476> Sorry now you have to expect it to be in the completed_task section... - Jiang Yan Xu On Jan. 5, 2018, 11:37 a.m., James Peach wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64940/ > ----------------------------------------------------------- > > (Updated Jan. 5, 2018, 11:37 a.m.) > > > Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod > Kone, and Jiang Yan Xu. > > > Bugs: MESOS-8337 > https://issues.apache.org/jira/browse/MESOS-8337 > > > Repository: mesos > > > Description > ------- > > If an agent is lost, we try to remove all the tasks that might have > been lost. If a task is already terminal but has unacknowleged status > updates, it is expected that we track it in the unreachable tasks list > so we should remove the CHECK that prevents this. This also backs out > changes to how unreachable tasks are presented in the HTTP endpoints to > restore compatibility with previous Mesos releases. > > > Diffs > ----- > > src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 > src/master/master.hpp 5e6ba53c075174a1e514a395ceb17c26201ec470 > src/master/master.cpp 6fc5de89e54ba0b9ae2c4fb475be9878910820d3 > src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 > src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a > > > Diff: https://reviews.apache.org/r/64940/diff/6/ > > > Testing > ------- > > make check (Fedora 27) > > > Thanks, > > James Peach > >