-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122146
-----------------------------------------------------------
src/master/http.cpp (line 2016)
<https://reviews.apache.org/r/44258/#comment184046>

Can we rename this to `unavailabilities`? `updated` suggests only the ones that have been modified, which may be why we missed this problem before.


src/master/http.cpp (lines 2026 - 2027)
<https://reviews.apache.org/r/44258/#comment184045>

Can you explain how machines going from `UP` to `DOWN` are handled in the next loop? I only see logic for `UP` to `DRAINING` there.
Also, a backtick is missing after `UP`.


src/master/http.cpp (line 2028)
<https://reviews.apache.org/r/44258/#comment184044>

Comments should be in sentence form: `merge` -> `Merge`.


src/master/http.cpp (lines 2031 - 2033)
<https://reviews.apache.org/r/44258/#comment184047>

For some of these early exit conditions, does it make sense to add `CHECK`s (and maybe even comments) to document why we are exiting? Stating *that* we are exiting is less helpful to readers than stating *why*.
I think the implied invariant here (which we should call out explicitly) is that any machine should only be "touched" by 1 of the 2 loops here. Are the exit conditions between them meant to enforce this exclusion?


- Joris Van Remoortere


On March 4, 2016, 2:10 a.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> -----------------------------------------------------------
> 
> (Updated March 4, 2016, 2:10 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
>     https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> There is a bug when setting host maintenance through the HTTP endpoint:
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as follows:
> 1) Get the list of all hosts from the maintenance windows and put it into the updated hashmap.
> 2) If a machine in the updated hashmap is also in master->machines, call the master's
> updateUnavailability to trigger recoverResources, updateUnavailability, etc. in
> the allocator.
> 3) Otherwise, clear the unavailability time window for the machine.
> 4) For each new machine in the updated hashmap, call the master's updateUnavailability.
> 
> But the logic in step 4) iterates over all machines from the schedule windows
> rather than only the machines that are new to the cluster, so the master gets two
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is to filter the machines in the updated hashmap when handling new machines.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd
>   src/tests/master_maintenance_tests.cpp 3faa8136cf57276295553910319480028f433e4c
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
> 
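
For reference, the exclusion invariant discussed in this review (each machine is touched by exactly one of the two loops) could be sketched roughly as below. This is a standalone illustration, not the code in src/master/http.cpp: `Master`, `Unavailability`, `applySchedule`, and the `updateCalls` counter are hypothetical stand-ins, and a plain `assert` stands in for a `CHECK`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for the master's state; not the real Mesos types.
using MachineId = std::string;

struct Unavailability {
  long start;     // Start of the window, in arbitrary units.
  long duration;  // Length of the window, in arbitrary units.
};

inline bool operator==(const Unavailability& a, const Unavailability& b) {
  return a.start == b.start && a.duration == b.duration;
}

struct Master {
  std::set<MachineId> machines;                 // Machines the master knows.
  std::map<MachineId, Unavailability> windows;  // Current windows, if any.
  std::map<MachineId, int> updateCalls;         // Calls received per machine.

  // Sets the window for a machine, or clears it when `window` is null.
  void updateUnavailability(const MachineId& id, const Unavailability* window) {
    machines.insert(id);
    if (window != nullptr) {
      windows[id] = *window;
    } else {
      windows.erase(id);
    }
    ++updateCalls[id];
  }
};

// Applies a new schedule so that every machine is handled by exactly one loop.
void applySchedule(
    Master& master,
    const std::map<MachineId, Unavailability>& unavailabilities) {
  // Snapshot the known machines, because loop 2 adds to `master.machines`.
  const std::set<MachineId> known = master.machines;
  std::set<MachineId> touched;

  // Loop 1: machines the master already knows about.
  for (const MachineId& id : known) {
    auto next = unavailabilities.find(id);
    auto current = master.windows.find(id);
    const bool hasNext = next != unavailabilities.end();
    const bool hasCurrent = current != master.windows.end();

    // Exit early when nothing changed: no window before and none now,
    // or the same window before and now.
    if (!hasNext && !hasCurrent) continue;
    if (hasNext && hasCurrent && next->second == current->second) continue;

    master.updateUnavailability(id, hasNext ? &next->second : nullptr);
    touched.insert(id);
  }

  // Loop 2: only machines that are new to the master.
  for (const auto& entry : unavailabilities) {
    // Known machines were already handled (or deliberately skipped) by
    // loop 1; visiting them again would cause a second update call.
    if (known.count(entry.first) > 0) continue;

    // Invariant: loop 1 never touched this machine.
    assert(touched.count(entry.first) == 0);
    master.updateUnavailability(entry.first, &entry.second);
  }
}
```

With this shape, dropping the `known.count(...)` filter in the second loop would reproduce the double call from the description for any machine handled by the first loop.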
