-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121643
-----------------------------------------------------------

src/master/http.cpp (lines 1999 - 2000)
<https://reviews.apache.org/r/44258/#comment183392>

I think we can just remove `master->updateUnavailability(id, updated[id]);` here, so the other machines will be set to `UP` and the scheduler will be updated in the next loop. Furthermore, we can avoid the separate loop that fills `updated`. The draft code would be:

```
hashmap<MachineID, Unavailability> updated;

// delete the loop for "updated[id] = window.unavailability();"

foreach (const mesos::maintenance::Window& window, schedule.windows()) {
  foreach (const MachineID& id, window.machine_ids()) {
    ...
    master->updateUnavailability(id, window.unavailability());
    updated[id] = window.unavailability();
  }
}

foreachkey (const MachineID& id, utils::copy(master->machines)) {
  if (updated.contains(id)) {
    continue;
  }

  // Transition each removed machine back to the `UP` mode and remove the
  // unavailability.
  master->machines[id].info.set_mode(MachineInfo::UP);
  master->updateUnavailability(id, None());
}
```

- Klaus Ma


On March 2, 2016, 3:45 p.m., Guangya Liu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> -----------------------------------------------------------
>
> (Updated March 2, 2016, 3:45 p.m.)
>
>
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
>
>
> Bugs: MESOS-4831
>     https://issues.apache.org/jira/browse/MESOS-4831
>
>
> Repository: mesos
>
>
> Description
> -------
>
> There is a bug when scheduling host maintenance through the HTTP endpoint:
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as follows:
> 1) Get the list of all hosts from the maintenance windows and put it into
> the updated hashmap.
> 2) If a machine in updated is also in master->machines, call master
> updateUnavailability to trigger recoverResources, updateUnavailability, etc.
> in the allocator.
> 3) Otherwise, clear the unavailability time window for the machine.
> 4) For each new machine in updated, call master updateUnavailability.
>
> But step 4) iterates over all machines from the schedule windows, not only
> the machines that are new to the cluster, which causes the master to receive
> two updateUnavailability calls for a machine in the updated hashmap.
>
> The fix is to filter the machines in the updated hashmap when handling new
> machines.
>
>
> Diffs
> -----
>
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74
>   src/tests/master_maintenance_tests.cpp 3faa8136cf57276295553910319480028f433e4c
>
> Diff: https://reviews.apache.org/r/44258/diff/
>
>
> Testing
> -------
>
> make
> make check
> ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
>
>
> Thanks,
>
> Guangya Liu
>
>
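
For illustration only, here is a minimal standalone C++ sketch of the double call described in the quoted description, next to the ordering from the draft code above. It is not Mesos code: `FakeMaster`, `applyOriginal`, and `applyDraft` are hypothetical stand-ins, machine IDs are plain strings, and the assumption that a scheduled machine gets added to the master's machine set stands in for the elided `...` in the draft.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the master: it only counts how often
// updateUnavailability is called for each machine.
struct FakeMaster {
  std::set<std::string> machines;          // machines already known
  std::map<std::string, int> updateCalls;  // per-machine call counter

  void updateUnavailability(const std::string& id) {
    assert(!id.empty());
    ++updateCalls[id];
  }
};

// Flow as described in the review request: every known machine is updated
// (or cleared) first, then every scheduled machine is updated again.
void applyOriginal(FakeMaster& master,
                   const std::vector<std::string>& scheduled) {
  const std::set<std::string> updated(scheduled.begin(), scheduled.end());

  // Iterate over a copy, since the second loop inserts into the set.
  const std::set<std::string> known = master.machines;
  for (const std::string& id : known) {
    // Steps 2) and 3): new window if scheduled, cleared otherwise.
    master.updateUnavailability(id);
  }

  for (const std::string& id : updated) {
    // Step 4): runs for all scheduled machines, not only new ones.
    master.machines.insert(id);  // assumption, see lead-in
    master.updateUnavailability(id);
  }
}

// Ordering from the draft: update scheduled machines once, then clear only
// the known machines that are absent from the schedule.
void applyDraft(FakeMaster& master,
                const std::vector<std::string>& scheduled) {
  std::set<std::string> updated;
  for (const std::string& id : scheduled) {
    master.machines.insert(id);  // assumption, see lead-in
    master.updateUnavailability(id);
    updated.insert(id);
  }

  const std::set<std::string> known = master.machines;
  for (const std::string& id : known) {
    if (updated.count(id) > 0) {
      continue;  // already handled in the schedule loop
    }
    master.updateUnavailability(id);  // corresponds to the None() case
  }
}
```

With known machines `m1` and `m2` and a schedule containing `m2` and `m3`, `applyOriginal` calls `updateUnavailability` twice for `m2`, while `applyDraft` calls it once for each machine.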