> On Aug. 28, 2018, 1:16 a.m., Alexander Rukletsov wrote:
> > src/checks/checker_process.cpp
> > Lines 878-889 (original), 890-901 (patched)
> > <https://reviews.apache.org/r/68495/diff/1/?file=2077041#file2077041line890>
> >
> >     It looks like we should always call `waitNestedContainer()` after we 
> > said `previousCheckContainerId = checkContainerId;`. For example here.
> >     
> >     Maybe it makes sense to call `waitNestedContainer()` right in the 
> > beginning? We can end up calling it twice, but I think it's fine?
> 
> Qian Zhang wrote:
>     > Maybe it makes sense to call waitNestedContainer() right in the 
> beginning? We can end up calling it twice, but I think it's fine?
>     
>     That means we will call agent API `WAIT_NESTED_CONTAINER` twice for each 
> successful launch of check container, I think that might be a burden for 
> agent in a large scale env. So I'd still prefer to call it only in the places 
> where we have to do it.

And for the case (L879:L888, i.e., the connection to agent failed) that you 
pointed out above, I think when the connection to agent is back (e.g., agent 
starts up again), the check container will be treated as orphan container and 
destroyed by agent, and then we will remove it here: 
https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L616:L638.
 However I am going to post another patch to change these codes 
(https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L660:L664)
 to something like:
```
          promise->discard();
        } else {
          previousCheckContainerId = None();
          _nestedCommandCheck(promise, cmd, nested);
        }
```
In this way, if we fail to remove the check container (e.g., due to agent has 
not finished recovery, or the check container is still in `DESTROYING` state), 
we will try to remove it again.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68495/#review207984
-----------------------------------------------------------


On Aug. 24, 2018, 5:54 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68495/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2018, 5:54 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Gastón Kleiman, 
> and Gilbert Song.
> 
> 
> Bugs: MESOS-8568
>     https://issues.apache.org/jira/browse/MESOS-8568
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made command check always waits before removing the nested container.
> 
> 
> Diffs
> -----
> 
>   src/checks/checker_process.cpp 77a76f465fe57eab89f027b5acb74c2339551678 
> 
> 
> Diff: https://reviews.apache.org/r/68495/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

Reply via email to