-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49650/#review141029
-----------------------------------------------------------
Fix it, then Ship it!

Hmm... it surprises me that the FETCHING state went unset for so long. Thanks for fixing it.


src/slave/containerizer/mesos/containerizer.cpp (line 1054)
<https://reviews.apache.org/r/49650/#comment206404>

    Please add a check on the `DESTROYING` state; otherwise there may be a race if the container is destroyed while it is fetching.

    ```
    if (containers_[containerId]->state == DESTROYING) {
      return Failure("Container is currently being destroyed");
    }
    ```


- Gilbert Song


On July 6, 2016, 10:07 a.m., Jiang Yan Xu wrote:
> 
> (Updated July 6, 2016, 10:07 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Vinod Kone.
> 
> 
> Bugs: MESOS-5763
>     https://issues.apache.org/jira/browse/MESOS-5763
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If the container state is not properly set to FETCHING, the Mesos agent
> cannot detect the terminated executor when the fetcher times out.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp f53b01b0eef8dd24db28d9dbd86bcbd40dc8d17f
> 
> Diff: https://reviews.apache.org/r/49650/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> Also tested with an experimental setup: mesos-execute against an agent with a fake
> hadoop binary that sleeps forever. With the patch, the task transitions to LOST when
> executor fetching times out; without it, the task is stuck in STAGING.
> 
> Megha will submit a test for this soon.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
>
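As a rough illustration of the `DESTROYING` check requested in the review comment above, here is a minimal, self-contained C++ sketch. It does not use the real Mesos types: `State`, `Container`, `Containers` and `beginFetch` are hypothetical, simplified stand-ins for the containerizer's per-container bookkeeping, and an error string takes the place of libprocess's `Failure`. It shows the ordering the review asks for: refuse to start fetching if a destroy is already in progress, and only otherwise record the FETCHING state that the patch description says the agent needs in order to notice a fetcher timeout.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical, simplified model of the containerizer's per-container state.
// The real Mesos code uses ContainerID, process::Future and Failure instead.
enum class State { PROVISIONING, PREPARING, ISOLATING, FETCHING, RUNNING, DESTROYING };

struct Container {
  State state;
};

using Containers = std::map<std::string, std::unique_ptr<Container>>;

// Hypothetical helper: moves a container into FETCHING before the fetcher
// runs. Returns an error message instead of starting the fetch if the
// container is unknown or already being destroyed; std::nullopt on success.
std::optional<std::string> beginFetch(Containers& containers, const std::string& containerId)
{
  auto it = containers.find(containerId);
  if (it == containers.end()) {
    return std::string("Container not found");
  }

  assert(it->second != nullptr);  // This sketch never stores null entries.

  // The check suggested in the review: without it, a destroy that is
  // already in progress would be overwritten by the FETCHING state below.
  if (it->second->state == State::DESTROYING) {
    return std::string("Container is currently being destroyed");
  }

  // Record FETCHING so that a later fetcher timeout can be attributed to
  // this container (per the patch description).
  it->second->state = State::FETCHING;
  return std::nullopt;
}
```

Returning an error value rather than throwing loosely mirrors the `return Failure(...)` in the suggested snippet; in the real containerizer the failure would propagate through a `process::Future` instead.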