> On Oct. 28, 2019, 7:07 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 7848 (patched)
> > <https://reviews.apache.org/r/71641/diff/2/?file=2170613#file2170613line7848>
> >
> >     Hm.. don't we enforce agent removal by not allowing the agent to
> >     re-register?
> >
> >     In the framework removal case, I guess we're not enforcing it?
> >
> >     Having the task transition out of terminal seems a bit strange for
> >     those two cases (are there other cases?)

One scenario where this can happen is maintenance: an agent goes `down` and then comes `up` again after an agent failover. The master transitions the tasks without waiting for task status updates from the agent. This patch adds a test for that scenario (the test fails without the patch).

I could also imagine scenarios involving framework teardown, agent failover, and framework registration with the old `FrameworkID` after the master has already forgotten the ID.

This patch merely works around possible inconsistencies that follow from the design; we should fix the design as well, see e.g., MESOS-9940, which addresses one framework teardown edge case.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71641/#review218422
-----------------------------------------------------------


On Oct. 28, 2019, 6:53 p.m., Benjamin Bannier wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71641/
> -----------------------------------------------------------
>
> (Updated Oct. 28, 2019, 6:53 p.m.)
>
>
> Review request for mesos, Benno Evers, Benjamin Mahler, and Greg Mann.
>
>
> Bugs: MESOS-10018
>     https://issues.apache.org/jira/browse/MESOS-10018
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Under certain conditions tasks which were previously `TASK_LOST` and
> completed can reappear in non-terminal states, e.g., if the agent on
> which they were running reconnects.
>
> This patch adds garbage collection of such completed tasks so that users
> do not see tasks twice when obtaining task information from the master
> API. This change does not affect task status updates, where we already
> correctly reported a previous `TASK_LOST` state as superseded by, e.g.,
> `TASK_RUNNING`.
>
>
> Diffs
> -----
>
>   src/master/master.cpp 351823e69f14dbb5eb1ea2b108c42e93722f1eff
>   src/tests/master_tests.cpp 5486e23ce146eda9191e081a48c1f3fcb52a7569
>
>
> Diff: https://reviews.apache.org/r/71641/diff/2/
>
>
> Testing
> -------
>
> `make check`
>
>
> Thanks,
>
> Benjamin Bannier
>
>
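
For illustration only, the garbage collection described in the review request could be sketched roughly as below. This is a minimal model under stated assumptions, not the code in the patch: `Task`, `TaskState`, `FrameworkTasks`, `markLost`, `readd`, and `occurrences` are hypothetical names and do not mirror the real definitions in `src/master/master.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-ins for master state; none of
// these types mirror the real definitions in src/master/master.cpp.
enum class TaskState { RUNNING, LOST };

struct Task {
  std::string id;
  TaskState state;
};

struct FrameworkTasks {
  std::map<std::string, Task> active;  // Non-terminal tasks keyed by ID.
  std::deque<Task> completed;          // History of terminal tasks.

  // Marks an active task as lost and moves it to the completed history.
  void markLost(const std::string& id) {
    auto it = active.find(id);
    if (it == active.end()) {
      return;
    }
    Task task = it->second;
    task.state = TaskState::LOST;
    completed.push_back(task);
    active.erase(it);
  }

  // Called when a task is reported as non-terminal again, e.g., after its
  // agent reconnected. Drops stale completed entries with the same ID so
  // that the task is listed only once.
  void readd(const Task& task) {
    completed.erase(
        std::remove_if(
            completed.begin(),
            completed.end(),
            [&task](const Task& t) { return t.id == task.id; }),
        completed.end());
    active[task.id] = task;
  }

  // Number of times a task ID shows up across both collections.
  std::size_t occurrences(const std::string& id) const {
    std::size_t n = active.count(id);
    n += static_cast<std::size_t>(std::count_if(
        completed.begin(),
        completed.end(),
        [&id](const Task& t) { return t.id == id; }));
    return n;
  }
};
```

In this model, a client listing both `active` and `completed` sees a given task ID once, both before and after the task reappears. Whether the actual patch removes entries at the same point in the re-registration path is not something this sketch claims.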
