----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40429/#review107042 -----------------------------------------------------------
Thanks for working on this. This functionality has been missing for some-time now. Just some minor comments around style and an alternate suggestion for the method name to be on parity with the existing `lostSlave` method. src/sched/sched.cpp (line 594) <https://reviews.apache.org/r/40429/#comment165938> We generally avoid redundant/self-explanatory comments. Remove this: s/Notify scheduler for exited executor./ src/sched/sched.cpp (line 595) <https://reviews.apache.org/r/40429/#comment165939> s/executorExited/lostExecutor Let's name this method `lostExecutor` to be on parity with the already existing `lostSlave` method. Also, let's change the signature of the function to: ``` void executorExited( const UPID& from, const SlaveID& slaveId, const ExecutorID& executorId, int32_t status) { ``` src/sched/sched.cpp (line 598) <https://reviews.apache.org/r/40429/#comment165958> Note that the `status` field is an optional now in the new `V1` Call/Event API: https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler/scheduler.proto#L126 If it's not set, let's set it to `-1` i.e. a special status when the `status` is unknown. This is similar to how the Slave does it. https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L3544 ``` int32_t status = -1; if (event.failure().has_status()) { status = event.failure().status(); } lostExecutor( from, event.failure().slave_id(), event.failure().executor_id(), status); ``` src/sched/sched.cpp (line 1051) <https://reviews.apache.org/r/40429/#comment165940> We prefer braces on new-line for function definitions: ``` void lostExecutor( const UPID& from, const SlaveID& slaveId, const ExecutorID& executorId, int32_t status) { } ``` src/sched/sched.cpp (line 1053) <https://reviews.apache.org/r/40429/#comment165942> s/Ignoring framework message/Ignoring lost executor message src/sched/sched.cpp (line 1056) <https://reviews.apache.org/r/40429/#comment165944> Let's have another check to see if the driver is not disconnected here: ``` if (!connected) { VLOG(1) << "Ignoring lost executor message because the driver is disconnected!"; return; } ``` ^^ My bad for the bad indent. RB was not letting me indent this properly. Also, let's add another check to see if the message was received from the leading master: ``` CHECK_SOME(master); if (from != master.get().pid()) { VLOG(1) << "Ignoring lost executor message because it was sent " << "from '" << from << "' instead of the leading master '" << master.get().pid() << "'"; return; } ``` src/sched/sched.cpp (line 1069) <https://reviews.apache.org/r/40429/#comment165947> s/executorExited/executorLost src/tests/scheduler_event_call_tests.cpp (line 592) <https://reviews.apache.org/r/40429/#comment165948> New line before this line. src/tests/scheduler_event_call_tests.cpp (line 593) <https://reviews.apache.org/r/40429/#comment165950> You can just do: `ExecutorID executorID = DEFAULT_EXECUTOR_ID;` src/tests/scheduler_event_call_tests.cpp (line 595) <https://reviews.apache.org/r/40429/#comment165949> `const int32_t status = 255;` src/tests/scheduler_event_call_tests.cpp (line 608) <https://reviews.apache.org/r/40429/#comment165951> New line before this line - Anand Mazumdar On Nov. 18, 2015, 7:01 a.m., Zhitao Li wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/40429/ > ----------------------------------------------------------- > > (Updated Nov. 18, 2015, 7:01 a.m.) > > > Review request for mesos, Adam B and Vinod Kone. > > > Bugs: MESOS-313 > https://issues.apache.org/jira/browse/MESOS-313 > > > Repository: mesos > > > Description > ------- > > Report executor exit to framework schedulers. This is a MVP to start the work > of notifying scheduler on scheduler refresh. > > Next step would be sending this message reliabily, and/or splitting > Event::FAILURE for slave failure and executor termination. > > > Diffs > ----- > > src/sched/sched.cpp a6faf92ff99cd79c3817684581862fecd1608048 > src/tests/scheduler_event_call_tests.cpp > 39f67a8243db8073d1c9c92c7aeb71854143131d > > Diff: https://reviews.apache.org/r/40429/diff/ > > > Testing > ------- > > Modified test for SchedulerDriverEventTest.Failure, which verifies that > MockScheduler::executorLost is invoked. > > > Thanks, > > Zhitao Li > >