-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48453/#review136777
-----------------------------------------------------------


src/sched/sched.cpp (line 1001)
<https://reviews.apache.org/r/48453/#comment201868>

    What happens in the following scenario:

    * framework launches task with executor (=> add UPID to `taskPids`)
    * agent where the task is running fails health checks (=> framework receives `TASK_LOST`, which is considered a terminal state per `isTerminalState()`, so we remove the UPID from `taskPids`)
    * master fails over and we reregister with a new master
    * agent reregisters with the master; this is allowed, per non-strict registry
    * we get `TASK_RUNNING` for the task

    ISTM we won't track the executor in `executorPids`, although we should.

    In general, the logic here seems pretty complicated and a little arbitrary...


src/sched/sched.cpp (line 1134)
<https://reviews.apache.org/r/48453/#comment201866>

    Can this actually occur?


src/sched/sched.cpp (line 1669)
<https://reviews.apache.org/r/48453/#comment201867>

    Is `taskPids` the best name here? It seems we use it only to store task PIDs in the transient period between launching a task and getting a `TASK_RUNNING` update for it.


- Neil Conway


On June 9, 2016, 1:08 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48453/
> -----------------------------------------------------------
> 
> (Updated June 9, 2016, 1:08 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
>     https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Since UPIDs are tracked in the scheduler driver to be able to directly
> send FrameworkMessage to an executor, we now track UPIDs for an executor
> running on an agent (instead of for a slave). We track this mapping only
> for the life of the executor (instead of the life of the agent). This
> enables us to send the lost slave message only to the relevant
> frameworks (instead of to all frameworks).
> 
> 
> Diffs
> -----
> 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
> 
> Diff: https://reviews.apache.org/r/48453/diff/
> 
> 
> Testing
> -------
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>
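
To make the scenario from the comment on line 1001 concrete, here is a minimal sketch that replays it against a toy model of the bookkeeping. It is not the code from the patch: `PidTracker`, `launch()`, `statusUpdate()`, and `replayScenario()` are hypothetical names, IDs and PIDs are plain strings rather than `TaskID`, `ExecutorID`, and `UPID`, and the promotion rule (move the PID from `taskPids` to `executorPids` on the first non-terminal update, drop it on a terminal one) is an assumption based on how the review describes the logic.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for the Mesos task states relevant to the scenario.
enum class TaskState { TASK_RUNNING, TASK_FINISHED, TASK_LOST };

// Assumption taken from the review: TASK_LOST counts as terminal.
inline bool isTerminalState(TaskState state) {
  return state == TaskState::TASK_FINISHED || state == TaskState::TASK_LOST;
}

// Hypothetical model of the two maps discussed in the review.
struct PidTracker {
  // Task ID -> executor PID, held only until the first status update.
  std::map<std::string, std::string> taskPids;
  // Executor ID -> executor PID, for executors believed to be alive.
  std::map<std::string, std::string> executorPids;

  void launch(const std::string& taskId, const std::string& pid) {
    taskPids[taskId] = pid;
  }

  void statusUpdate(const std::string& taskId,
                    const std::string& executorId,
                    TaskState state) {
    auto it = taskPids.find(taskId);
    if (it == taskPids.end()) {
      return;  // No pending PID for this task, so nothing can be promoted.
    }
    if (!isTerminalState(state)) {
      executorPids[executorId] = it->second;  // Promote to executor tracking.
    }
    taskPids.erase(it);  // Either way, the transient entry is dropped.
  }
};

// Replays the steps listed in the review comment.
inline PidTracker replayScenario() {
  PidTracker tracker;
  tracker.launch("task-1", "executor(1)@10.0.0.1:5051");
  // Agent fails health checks: the framework sees TASK_LOST.
  tracker.statusUpdate("task-1", "exec-1", TaskState::TASK_LOST);
  // Master fails over, the agent reregisters, and TASK_RUNNING arrives.
  tracker.statusUpdate("task-1", "exec-1", TaskState::TASK_RUNNING);
  return tracker;
}
```

Under these assumptions, `replayScenario()` returns a tracker whose `executorPids` is empty even though the executor is running again, which is the gap the comment points at. A launch followed directly by `TASK_RUNNING` does populate `executorPids`.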
