> On Dec. 1, 2017, 9:29 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Line 6808 (original), 6808 (patched)
> > <https://reviews.apache.org/r/64098/diff/5/?file=1905844#file1905844line6808>
> >
> >     When considering the comment by Ilya in MESOS-6406 (i.e., what if
> >     agents GCed from the unreachable or gone list reregister?), it seems
> >     like we can do this:
> >
> >     1. Move the line `slaves.recovered.erase(slaveInfo.id());` down to
> >        after we process `recoveredTasks`.
> >     2. Instead of checking `slaves.unreachable.contains(slaveInfo.id())`
> >        we could check `!slaves.recovered.contains(slaveInfo.id())`.
> >     3. Now we are sending status updates for two cases: reregistering
> >        unreachable agents or unknown agents (which could have been marked
> >        either unreachable or gone, but we can't distinguish).
> >        - We can distinguish unreachable and unknown in the task status
> >          message.
> >        - We can probably log a warning about tasks from unknown agents.

Sounds good, thanks for formalizing.


> On Dec. 1, 2017, 9:29 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6818 (patched)
> > <https://reviews.apache.org/r/64098/diff/5/?file=1905844#file1905844line6818>
> >
> >     s/REASON_AGENT_REREGISTERED/REASON_SLAVE_REREGISTERED/

Thanks!


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192557
-----------------------------------------------------------


On Nov. 28, 2017, 12:55 a.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> -----------------------------------------------------------
>
> (Updated Nov. 28, 2017, 12:55 a.m.)
>
>
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
>
>
> Bugs: MESOS-6406
>     https://issues.apache.org/jira/browse/MESOS-6406
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> re-registers.
>
>
> Diffs
> -----
>
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e
>   src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d
>   src/tests/persistent_volume_tests.cpp 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c
>
>
> Diff: https://reviews.apache.org/r/64098/diff/6/
>
>
> Testing
> -------
>
> with make check
>
>
> Thanks,
>
> Megha Sharma
>
>
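
For reference, here is a minimal standalone sketch of the three-step reordering suggested in the comment on line 6808. It is not the patch itself and does not use the real Mesos types: `Slaves`, `AgentOrigin`, `classify`, `reregister`, the string agent/task IDs, and the message texts are all hypothetical stand-ins chosen for illustration. It only shows why the erase from `recovered` has to happen after `recoveredTasks` is processed, and how unreachable and unknown agents could get different status messages.

```cpp
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the master's bookkeeping; not Mesos types.
enum class AgentOrigin { RECOVERED, UNREACHABLE, UNKNOWN };

struct Slaves {
  std::unordered_set<std::string> recovered;
  std::unordered_set<std::string> unreachable;
};

// Must be called while the agent is still in `recovered`, otherwise a
// recovered agent would be indistinguishable from an unknown one.
AgentOrigin classify(const Slaves& slaves, const std::string& id) {
  if (slaves.recovered.count(id) > 0) return AgentOrigin::RECOVERED;
  if (slaves.unreachable.count(id) > 0) return AgentOrigin::UNREACHABLE;
  return AgentOrigin::UNKNOWN;
}

// Returns the (illustrative) status update messages that would be sent
// to frameworks when agent `id` reregisters with `recoveredTasks`.
std::vector<std::string> reregister(
    Slaves& slaves,
    const std::string& id,
    const std::vector<std::string>& recoveredTasks) {
  std::vector<std::string> updates;
  const AgentOrigin origin = classify(slaves, id);

  // Step 2: check "not recovered" instead of "is unreachable".
  if (origin != AgentOrigin::RECOVERED) {
    for (const std::string& task : recoveredTasks) {
      if (origin == AgentOrigin::UNKNOWN) {
        // Step 3: unknown agents may have been GCed from the
        // unreachable or gone list, so log a warning.
        std::cerr << "WARNING: task " << task
                  << " recovered from unknown agent " << id << std::endl;
        updates.push_back(task + ": unknown agent has reregistered");
      } else {
        updates.push_back(
            task + ": agent was unreachable and has reregistered");
      }
    }
  }

  // Step 1: erase only after `recoveredTasks` has been processed.
  slaves.recovered.erase(id);
  slaves.unreachable.erase(id);
  return updates;
}
```

With this ordering, an agent that is still in `recovered` produces no updates, while an agent found in `unreachable` or in neither set produces one update per recovered task, with the message text telling the two cases apart.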
