> On Aug. 28, 2018, 1:16 a.m., Alexander Rukletsov wrote:
> > src/checks/checker_process.cpp
> > Lines 878-889 (original), 890-901 (patched)
> > <https://reviews.apache.org/r/68495/diff/1/?file=2077041#file2077041line890>
> >
> >     It looks like we should always call `waitNestedContainer()` after we
> >     said `previousCheckContainerId = checkContainerId;`. For example here.
> >
> >     Maybe it makes sense to call `waitNestedContainer()` right in the
> >     beginning? We can end up calling it twice, but I think it's fine?
>
> Qian Zhang wrote:
>     > Maybe it makes sense to call waitNestedContainer() right in the
>     > beginning? We can end up calling it twice, but I think it's fine?
>
>     That means we will call the agent API `WAIT_NESTED_CONTAINER` twice for
>     each successful launch of a check container, which I think might be a
>     burden for the agent in a large-scale environment. So I'd still prefer
>     to call it only in the places where we have to.
>
> Qian Zhang wrote:
>     And for the case (L879:L888, i.e., the connection to the agent failed)
>     that you pointed out above, I think when the connection to the agent is
>     back (e.g., the agent starts up again), the check container will be
>     treated as an orphan container and destroyed by the agent, and then we
>     will remove it here:
>     https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L616:L638.
>     However, I am going to post another patch to change this code
>     (https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L660:L664)
>     to something like:
>     ```
>             promise->discard();
>           } else {
>             previousCheckContainerId = None();
>             _nestedCommandCheck(promise, cmd, nested);
>           }
>     ```
>     In this way, if we fail to remove the check container (e.g., because
>     the agent has not finished recovery, or the check container is still in
>     the `DESTROYING` state), we will try to remove it again.
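
To make the retry behavior described above easier to follow, here is a minimal standalone sketch. It is not the Mesos code: `CheckerSketch`, `runCheck`, `removeContainer` and `launched` are hypothetical names, `std::optional` stands in for `Option`, and for simplicity every launched container is recorded as needing removal. It only illustrates the idea that `previousCheckContainerId` is cleared when removal succeeds and kept otherwise, so that the next check attempt tries the removal again.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the checker's bookkeeping; not the Mesos class.
struct CheckerSketch
{
  // ID of the previous check container that still has to be removed.
  std::optional<std::string> previousCheckContainerId;

  // Hypothetical callback: returns true if the agent removed the container.
  std::function<bool(const std::string&)> removeContainer;

  // Number of checks that were actually launched.
  int launched = 0;

  // Returns true if a new check was launched, false if this attempt was
  // discarded because the previous container could not be removed yet.
  bool runCheck(const std::string& newContainerId)
  {
    assert(removeContainer != nullptr);

    if (previousCheckContainerId.has_value()) {
      if (!removeContainer(*previousCheckContainerId)) {
        // Removal failed: keep the ID so the next attempt retries it.
        return false;
      }

      // Removal succeeded: nothing left to clean up.
      previousCheckContainerId = std::nullopt;
    }

    // "Launch" the new check and remember its container for later cleanup.
    previousCheckContainerId = newContainerId;
    ++launched;
    return true;
  }
};
```

With a `removeContainer` callback that fails while the agent is still recovering, the second call to `runCheck` is discarded and the old container ID is kept; once the callback starts succeeding, the next call removes the old container and launches the new check.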

I posted another patch, https://reviews.apache.org/r/68555/, as I mentioned above.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68495/#review207984
-----------------------------------------------------------


On Aug. 24, 2018, 5:54 p.m., Qian Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68495/
> -----------------------------------------------------------
>
> (Updated Aug. 24, 2018, 5:54 p.m.)
>
>
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Gastón Kleiman,
> and Gilbert Song.
>
>
> Bugs: MESOS-8568
>     https://issues.apache.org/jira/browse/MESOS-8568
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Made the command check always wait before removing the nested container.
>
>
> Diffs
> -----
>
>   src/checks/checker_process.cpp 77a76f465fe57eab89f027b5acb74c2339551678
>
>
> Diff: https://reviews.apache.org/r/68495/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Qian Zhang
>
>