-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54355/#review158038
-----------------------------------------------------------




src/slave/containerizer/mesos/io/switchboard.cpp (line 153)
<https://reviews.apache.org/r/54355/#comment228709>

    Can we also log the error case? Like:
    ```
    if (pid.isError()) {
      LOG(ERROR) << ...
    } else if (pid.None()) {
      // Add a comment about when this will happen.
      // E.g., container that do not enable io switchboard server.
    } else {
      ...
    }
    ```



src/slave/containerizer/mesos/io/switchboard.cpp (lines 154 - 162)
<https://reviews.apache.org/r/54355/#comment228714>

    orphans will be killed by containerizer. So we don't have to kill the io 
switchboard here. Just recover the pid and rely on 'cleanup' to clean it up.



src/slave/containerizer/mesos/io/switchboard.cpp (lines 167 - 171)
<https://reviews.apache.org/r/54355/#comment228720>

    I think this flag should apply to containers that are about to be launched 
next.
    
    So let's still recover Info for those containers that have io switchboard 
server even if this flag has been turned off after agent restart.
    
    What do you think?



src/slave/containerizer/mesos/io/switchboard.cpp (lines 536 - 546)
<https://reviews.apache.org/r/54355/#comment228725>

    It's likely that the io switchboard server has been forked, but the agent 
crashes before it was able to checkpoint the pid.
    
    If that happens, during recovery, we will not maintain Info for that 
container. As a result, we won't try to cleanup the socket file potentially 
created?
    
    I think we probably need to createa directory for io switchboard related 
files (sock and pid files). When we create the directory, it indicates that the 
io switchboard server might or might not be created. During recovery, if we 
find the directory exists, but pid file does not exist, we should still create 
the Info with pid set to None(), and cleanup the socket file in 'cleanup' 
method.
    
    Thoughts?


- Jie Yu


On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54355/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:46 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6688
>     https://issues.apache.org/jira/browse/MESOS-6688
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added implementation of `recover()` to the IOSwitchboard isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp 
> 839665a22aca9b1c1c1cf4992406bc924ee2b065 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> d5211b98616e72a27ca6b472a5ee83505c227f22 
> 
> Diff: https://reviews.apache.org/r/54355/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> 
> Test added in follow-on patch.
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

Reply via email to