> On Aug. 10, 2017, 9 p.m., Jiang Yan Xu wrote:
> > Some of the comments below were made before I started to feel that we are
> > probably doing too many conversions to justify storing these tasks in
> > TASK_UNREACHABLE. Perhaps we can just store them in
> > `Framework.unreachableTasks` but in TASK_LOST state?
> >
> > It's possible that we can add another map `BoundedHashMap<TaskID,
> > process::Owned<Task>> unreachableNonPartitionAwareTasks;` for these tasks,
> > but it's clunky in the sense that you have to clarify that
> > `unreachableTasks` is only for partition-aware tasks, when in fact all of
> > these tasks belong to the same framework, which is either partition-aware
> > or not (though with the possibility of changing capability). So it's
> > probably easier to describe things if we just put all of them in
> > `unreachableTasks` and simply say that "if the framework is not
> > partition-aware, the tasks stored in `unreachableTasks` may be in
> > `TASK_LOST`".
> >
> > If we do that, then some of the comments below don't apply any more, but I
> > am keeping them just for posterity (some styling issues etc).

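A rough sketch of the proposal quoted above, for reference only. The types here are simplified, hypothetical stand-ins (a plain `std::map` instead of `BoundedHashMap`, a `bool` instead of the framework's capability set), and `markUnreachable` is an illustrative name, not a function from the patch under review:

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the Mesos types.
enum class TaskState { TASK_RUNNING, TASK_LOST, TASK_UNREACHABLE };

struct Task {
  std::string id;
  TaskState state;
};

struct Framework {
  // Stands in for the PARTITION_AWARE framework capability.
  bool partitionAware;

  // One map for all tasks on unreachable agents, regardless of capability.
  // (Assumption: the real member is a bounded map keyed by TaskID.)
  std::map<std::string, Task> unreachableTasks;
};

// Picks the state a task is recorded in when its agent becomes unreachable:
// TASK_UNREACHABLE for partition-aware frameworks, TASK_LOST otherwise.
inline TaskState unreachableState(const Framework& framework) {
  return framework.partitionAware ? TaskState::TASK_UNREACHABLE
                                  : TaskState::TASK_LOST;
}

// Stores the task in `unreachableTasks` in the state chosen above.
inline void markUnreachable(Framework& framework, Task task) {
  task.state = unreachableState(framework);
  framework.unreachableTasks[task.id] = task;
}
```

With this shape there is a single map to describe, and the only caveat to document is that its entries may be in TASK_LOST when the framework is not partition-aware.
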
Discussed in person, +1 for keeping the non-partition-aware but unreachable tasks in Framework.unreachableTasks in state TASK_LOST.

- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61473/#review182605
-----------------------------------------------------------


On Aug. 10, 2017, 4:07 p.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61473/
> -----------------------------------------------------------
>
> (Updated Aug. 10, 2017, 4:07 p.m.)
>
>
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
>
>
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Master will not kill the tasks of non-partition-aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessage to the agent
> to kill the tasks of non-partition-aware frameworks, including the
> ones that are still registered. This was problematic because offers
> from this agent could still go to the same framework, which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down, of course, so from the master's
> and the scheduler's perspective the task is pending in STAGING forever
> until the next agent re-registration, which could happen much later.
> This commit fixes the problem by not shutting down the
> non-partition-aware frameworks on such an agent.
>
>
> Diffs
> -----
>
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668
>
>
> Diff: https://reviews.apache.org/r/61473/diff/3/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Megha Sharma
>
>
