> On March 4, 2016, 8:28 p.m., Joris Van Remoortere wrote: > > src/master/http.cpp, lines 2026-2027 > > <https://reviews.apache.org/r/44258/diff/3/?file=1280368#file1280368line2026> > > > > Can you explain how machines going from `UP` to `DOWN` are handled in > > the next loop? > > I see logic for `UP` to `DRAINING` in the next loop. > > > > Also missing a backtick after `UP`
My bad, it is from `UP` to `DRAINING`. > On March 4, 2016, 8:28 p.m., Joris Van Remoortere wrote: > > src/master/http.cpp, lines 2031-2033 > > <https://reviews.apache.org/r/44258/diff/3/?file=1280368#file1280368line2031> > > > > For some of these early exit conditions, does it make sense to add > > `CHECK`s (and maybe event comments) to document why we are exiting? > > Stating *that* we are exiting less helpful to readers than *why*. > > > > I think the implied invariant here (which we should call out > > explicitly) is that any machine should only be "touched" by 1 of the 2 > > loops here. The exit conditions between them are meant to enforce this > > exclusion? I was adding some comments to address any machine should only be "touched" by 1 of the 2 loops here. Can you please show more for how we can add `CHECK` here? I did not found a good way to add `CHECK` for here but only by checking via some `if` conditions. - Guangya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44258/#review122146 ----------------------------------------------------------- On March 4, 2016, 2:10 a.m., Guangya Liu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44258/ > ----------------------------------------------------------- > > (Updated March 4, 2016, 2:10 a.m.) > > > Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu. > > > Bugs: MESOS-4831 > https://issues.apache.org/jira/browse/MESOS-4831 > > > Repository: mesos > > > Description > ------- > > There is a bug when setting host maintain with http endpoint: > https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021 > The logic is as this: > 1) Get all host list from maintain window and put it to updated hashmap. > 2) If the machine in was in updated was also in master->machines, call master > updateUnavailability to trigger recoverResources, updateUnavailability etc in > allocator > 3) Otherwise, clear the unavailabity time window for the machine. > 4) Update each new machines in updated to call master updateUnavailability > > But the logic in step 4) is getting all machines from the schedule windows > but not the machines that is new to the cluster, this caused master get two > updateUnavailability calls for a machine in the updated hashmap. > > The fix is filter machines in updated hashmap when handling new machines. > > > Diffs > ----- > > src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd > src/tests/master_maintenance_tests.cpp > 3faa8136cf57276295553910319480028f433e4c > > Diff: https://reviews.apache.org/r/44258/diff/ > > > Testing > ------- > > make > make check > ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose > > > Thanks, > > Guangya Liu > >