> On Jan. 9, 2017, 2:10 a.m., Vinod Kone wrote:
> > src/launcher/default_executor.cpp, lines 663-683
> > <https://reviews.apache.org/r/55157/diff/1/?file=1596482#file1596482line663>
> >
> >     How about
> >     
> >     ```
> >     // Check to see if the executor needs to shutdown.
> >     
> >     // If the executor is already in the process of shutting down, return.
> >     if (shuttingDown) {
> >       return;
> >     }
> >     
> >     // If there are no active containers, shutdown the executor.
> >     if (containers.empty()) {
> >       shutdown();
> >       return;
> >     }
> >     
> >     // If this container exited with non-zero status, kill all the other 
> > containers,
> >     // per the default policy.
> >     if (taskState == TASK_FAILED) {
> >     
> >       // TODO(Anand): ...
> >       shutdown();
> >     }
> >     
> >     
> >     ```
> 
> Anand Mazumdar wrote:
>     hmm,  I had refrained from splitting the no active containers check and 
> moving it above was because I still wanted to log why the single task pod 
> failed. With the above suggested snippet, we would need to do the logging for 
> both the cases where we return early if the task failed? Another alternative 
> can be that I return early when the executor is shutting down while keeping 
> the rest of the logic same:
>     
>     ```cpp
>         // Ignore if the executor is already in the process of shutting down.
>         if (shuttingDown) {
>           return;
>         }
>     
>         // The default restart policy for a task group is to kill all the
>         // remaining child containers if one of them terminated with a
>         // non-zero exit code.
>         if (taskState == TASK_FAILED) {
>           LOG(ERROR)
>             << "Child container " << containerId << " terminated with status "
>             << (status.isSome() ? WSTRINGIFY(status.get()) : "unknown");
>     
>           // Kill all the other active containers.
>           //
>           // TODO(anand): Invoke `kill()` once per active container
>           // instead of directly invoking `shutdown()`.
>           if (!containers.empty()) {
>             shutdown();
>             return;
>           }
>         }
>     
>         // Shutdown the executor if all the active child containers have 
> terminated.
>         if (containers.empty()) {
>           __shutdown();
>         }
>     ```
>     
>     What do you think?
> 
> Vinod Kone wrote:
>     I think we can print the exit code/reason in the LOG(INFO) message at 
> #656. It is useful irrespective of task status?
>     
>     I moved up `if (containers.empty())` check because it was not very 
> intuitive to me why you had a `!containers.empty()` check inside that if 
> statement. I had to read the `shutdown()` code to understand that. As a side 
> note, it sounds like `shutdown()` should have the logic to call 
> `__shutdown()` if containers is empty.

hmm, I couldn't think of any possible other task states where it might be 
useful other than `TASK_FAILED` to log the exit status. There might be some 
`TASK_KILLED` instances (OOM) where we might still be able to retrieve the 
status? But, neverthless it seemed useful to log the exit status reason.

Can you take a look at the review again?


- Anand


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55157/#review160849
-----------------------------------------------------------


On Jan. 9, 2017, 7:37 p.m., Anand Mazumdar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55157/
> -----------------------------------------------------------
> 
> (Updated Jan. 9, 2017, 7:37 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6848
>     https://issues.apache.org/jira/browse/MESOS-6848
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This bug is only observed when the task group contains a single task.
> The default executor was not committing suicide when this single task
> used to exit with a non-zero status code as per the default restart
> policy.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp fd9963996c82461b60888397989e309d51b60aaa 
>   src/tests/default_executor_tests.cpp 
> 340e8c8b36a6a3cc6e5bae021e19dfa7a54852c3 
> 
> Diff: https://reviews.apache.org/r/55157/diff/
> 
> 
> Testing
> -------
> 
> make check + added a test
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>

Reply via email to