> On Jan. 4, 2018, 2:09 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp > > Lines 10061 (patched) > > <https://reviews.apache.org/r/64940/diff/1/?file=1930130#file1930130line10062> > > > > Perhaps move all the logic around determining the unreachable task down > > here. > > > > ``` > > // We need to inform `removeTask` whether the task became unreachable > > due to the agent being removed. Note that terminal tasks are not considered > > unreachable. We can not rely on the task's state because it may have been > > set to `TASK_LOST` for backwards-compatibility. > > bool unreachable = > > unreachableTime.isSome() && !protobuf::isTerminalState(task->state()); > > ``` > > > > Inlining `unreachable` is probably fine too but separating it helps > > clarify it with comments? > > James Peach wrote: > You have to sample the task state before the update because it might > transition to a terminal state at that time (and LOST is terminal).
Right, so put the line above `updateTask`? The main thing I am commenting on is that we can consolidate the logic without first defining `bool unreachable = true;` much far above? > On Jan. 4, 2018, 2:09 p.m., Jiang Yan Xu wrote: > > src/tests/master_tests.cpp > > Lines 7637 (patched) > > <https://reviews.apache.org/r/64940/diff/1/?file=1930131#file1930131line7637> > > > > You could just wait for the the status update (being droppped) instead > > of capaturing `Slave::executorTerminated` and risk the race condition of > > the master not receiving the status update before the agent is deemed > > partitioned? > > James Peach wrote: > Dealing with all the status updates was less reliable. Do you see the race condition I was trying to point out? > On Jan. 4, 2018, 2:09 p.m., Jiang Yan Xu wrote: > > src/tests/master_tests.cpp > > Lines 7686 (patched) > > <https://reviews.apache.org/r/64940/diff/1/?file=1930131#file1930131line7686> > > > > No need to settle? > > James Peach wrote: > This is settling to ensure that events following from marking the agent > down are fully processed. Oh ok that's right. - Jiang Yan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64940/#review194777 ----------------------------------------------------------- On Jan. 5, 2018, 11:06 a.m., James Peach wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64940/ > ----------------------------------------------------------- > > (Updated Jan. 5, 2018, 11:06 a.m.) > > > Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod > Kone, and Jiang Yan Xu. > > > Bugs: MESOS-8337 > https://issues.apache.org/jira/browse/MESOS-8337 > > > Repository: mesos > > > Description > ------- > > If an agent is lost, we try to remove all the tasks that might > have been lost. However, if a task is already terminal, it hasn't > really been lost so we should not be tracking it in the framework's > unreachable tasks list. > > > Diffs > ----- > > src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 > src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 > src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a > > > Diff: https://reviews.apache.org/r/64940/diff/4/ > > > Testing > ------- > > make check (Fedora 27) > > > Thanks, > > James Peach > >