> On June 12, 2015, 6:43 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 4383-4400
> > <https://reviews.apache.org/r/34720/diff/4/?file=983235#file983235line4383>
> >
> > First of all, we don't need to check if 'executor' is NULL or not here.
> > If you looked at the callers, it's guaranteed that 'executor' will be
> > non-NULL. So you need a CHECK here.
> >
> > Also, I thought about what the correct logic should be in this funtion
> > (see below). Let me know your thoughts. (we have too many tech debts in
> > this function, so we better fix this asap, instead of waiting further).
> >
> > ```
> > TaskState state;
> > TaskStatus::Reason reason;
> > string message;
> >
> > // Determine the state and reason for the status update.
> > if (termination.isReady() && termination.get().has_reason()) {
> > // This is the case where the resource limit has been
> > // reached and the container destroyed itself.
> > state = TASK_FAILED;
> > reason = termination.get().reason();
> > } else if (executor->reason.isSome()) {
> > // This is the case where the slave initiates the
> > // termination of the container.
> >
> > // NOTE: For command executors, previously we send TASK_FAILED
> > // and REASON_COMMAND_EXECUTOR_FAILED when the command executor
> > // failed to launch (e.g., invalid executor path).
> > if (executor->isCommandExecutor() &&
> > executor->reason.get() == REASON_EXECUTOR_REGISTRATION_TIMEOUT) {
> > state = TASK_FAILED;
> > reason = REASON_COMMAND_EXECUTOR_FAILED;
> > } else {
> > state = TASK_LOST;
> > reason = executor->reason.get();
> > }
> > } else {
> > state = TASK_LOST;
> > reason = TaskStatus::REASON_EXECUTOR_TERMINATED
> > }
> >
> > // Determine the message for the status update.
> > // TODO(nnielsen): Consider adding a message from the slave
> > // as well, and concatenating the messages.
> > if (termination.isReady()) {
> > message = termination.get().message();
> > } else {
> > // This is case where the containerizer fails
> > // to destroy the container.
> > message = "Abnormal executor termination";
> > }
> >
> > statusUpdate(...);
> > ```
>
> Jie Yu wrote:
> Forgot to mention that you need to set executor->reason accordingly when
> slave initiates a destroy of the container (e.g., containerizer->update
> fails, executor registration timeout, etc.).
Changing the command task state from TASK_FAILED to TASK_LOST is an API change!
It should've a deprecation cycle with a headsup given on the dev list. I would
recommend doing:
```
// Determine the state and reason for the status update.
if (termination.isReady() && termination.get().has_reason()) {
// This is the case where the resource limit has been
// reached and the container destroyed itself.
state = TASK_FAILED;
reason = termination.get().reason();
} else if (executor->isCommandExecutor()) {
// TODO(jieyu): .....
state = TASK_FAILED;
reason = REASON_COMMAND_EXECUTOR_FAILED;
} else if (executor->reason.isSome()) {
// This is the case where the slave initiates the
// termination of the container.
state = TASK_LOST;
reason = executor->reason.get();
} else {
state = TASK_LOST;
reason = TaskStatus::REASON_EXECUTOR_TERMINATED
}
```
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87734
-----------------------------------------------------------
On June 11, 2015, 10:58 p.m., Niklas Nielsen wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
>
> (Updated June 11, 2015, 10:58 p.m.)
>
>
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and
> Vinod Kone.
>
>
> Bugs: MESOS-2653
> https://issues.apache.org/jira/browse/MESOS-2653
>
>
> Repository: mesos
>
>
> Description
> -------
>
> See summary
>
>
> Diffs
> -----
>
> src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7
> src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4
>
> Diff: https://reviews.apache.org/r/34720/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Niklas Nielsen
>
>