Re: Review Request 55157: Fixed a bug in the default executor around not committing suicide.

Anand Mazumdar Sun, 08 Jan 2017 18:36:43 -0800


> On Jan. 9, 2017, 2:10 a.m., Vinod Kone wrote:
> > src/launcher/default_executor.cpp, lines 663-683
> > <https://reviews.apache.org/r/55157/diff/1/?file=1596482#file1596482line663>
> >
> >     How about
> >     
> >     ```
> >     // Check to see if the executor needs to shutdown.
> >     
> >     // If the executor is already in the process of shutting down, return.
> >     if (shuttingDown) {
> >       return;
> >     }
> >     
> >     // If there are no active containers, shutdown the executor.
> >     if (containers.empty()) {
> >       shutdown();
> >       return;
> >     }
> >     
> >     // If this container exited with non-zero status, kill all the other 
> > containers,
> >     // per the default policy.
> >     if (taskState == TASK_FAILED) {
> >     
> >       // TODO(Anand): ...
> >       shutdown();
> >     }
> >     
> >     
> >     ```


hmm,  I had refrained from splitting the no active containers check and moving 
it above was because I still wanted to log why the single task pod failed. With 
the above suggested snippet, we would need to do the logging for both the cases 
where we return early if the task failed? Another alternative can be that I 
return early when the executor is shutting down while keeping the rest of the 
logic same:

```cpp
    // Ignore if the executor is already in the process of shutting down.
    if (shuttingDown) {
      return;
    }

    // The default restart policy for a task group is to kill all the
    // remaining child containers if one of them terminated with a
    // non-zero exit code.
    if (taskState == TASK_FAILED) {
      LOG(ERROR)
        << "Child container " << containerId << " terminated with status "
        << (status.isSome() ? WSTRINGIFY(status.get()) : "unknown");

      // Kill all the other active containers.
      //
      // TODO(anand): Invoke `kill()` once per active container
      // instead of directly invoking `shutdown()`.
      if (!containers.empty()) {
        shutdown();
        return;
      }
    }

    // Shutdown the executor if all the active child containers have terminated.
    if (containers.empty()) {
      __shutdown();
    }
```

What do you think?


- Anand


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55157/#review160849
-----------------------------------------------------------


On Jan. 4, 2017, 12:38 a.m., Anand Mazumdar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55157/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2017, 12:38 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6848
>     https://issues.apache.org/jira/browse/MESOS-6848
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This bug is only observed when the task group contains a single task.
> The default executor was not committing suicide when this single task
> used to exit with a non-zero status code as per the default restart
> policy.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp fd9963996c82461b60888397989e309d51b60aaa 
>   src/tests/default_executor_tests.cpp 
> 340e8c8b36a6a3cc6e5bae021e19dfa7a54852c3 
> 
> Diff: https://reviews.apache.org/r/55157/diff/
> 
> 
> Testing
> -------
> 
> make check + added a test
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>

Re: Review Request 55157: Fixed a bug in the default executor around not committing suicide.

Reply via email to