-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122146
-----------------------------------------------------------




src/master/http.cpp (line 2016)
<https://reviews.apache.org/r/44258/#comment184046>

    Can we rename this to `unavailabilities`?
    `updated` suggests only the entries that have been modified, which may be
    why we missed this problem before.



src/master/http.cpp (lines 2026 - 2027)
<https://reviews.apache.org/r/44258/#comment184045>

    Can you explain how machines going from `UP` to `DOWN` are handled in the
    next loop?
    I see logic for `UP` to `DRAINING` in the next loop.
    
    Also, there is a missing backtick after `UP`.



src/master/http.cpp (line 2028)
<https://reviews.apache.org/r/44258/#comment184044>

    Comments should be in sentence form: `merge` -> `Merge`.



src/master/http.cpp (lines 2031 - 2033)
<https://reviews.apache.org/r/44258/#comment184047>

    For some of these early exit conditions, does it make sense to add `CHECK`s
    (and maybe even comments) to document why we are exiting?
    Stating *that* we are exiting is less helpful to readers than stating *why*.
    
    I think the implied invariant here (which we should call out explicitly) is
    that any machine should only be "touched" by 1 of the 2 loops here. Are the
    exit conditions between them meant to enforce this exclusion?
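
A rough, self-contained sketch of that invariant, to make the suggestion concrete. This is not the actual `http.cpp` code: `FakeMaster`, `applySchedule`, `update`, and `clear` are hypothetical stand-ins, and plain `assert` stands in for `CHECK`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration; none of these are Mesos types.
typedef std::string MachineId;
typedef int Unavailability;  // Stand-in for a maintenance time window.

struct FakeMaster
{
  std::set<MachineId> machines;      // Machines the master already knows.
  std::map<MachineId, int> touches;  // Calls received per machine.

  // Stand-ins for "update the window" and "clear the window".
  void update(const MachineId& id) { ++touches[id]; }
  void clear(const MachineId& id) { ++touches[id]; }
};

void applySchedule(
    FakeMaster& master,
    const std::map<MachineId, Unavailability>& unavailabilities)
{
  // Snapshot the known machines, because the second loop adds new ones.
  const std::set<MachineId> known = master.machines;
  std::set<MachineId> touched;

  // Loop 1: machines the master already knows about.
  for (const MachineId& id : known) {
    if (unavailabilities.count(id) > 0) {
      master.update(id);  // Still scheduled: refresh its window.
    } else {
      master.clear(id);   // No longer scheduled: clear its window.
    }
    touched.insert(id);
  }

  // Loop 2: machines that appear in the schedule for the first time.
  for (const auto& entry : unavailabilities) {
    const MachineId& id = entry.first;

    // Exit early *because* loop 1 already handled every known machine.
    // Falling through here would issue a second update for it.
    if (known.count(id) > 0) {
      continue;
    }

    // Invariant: each machine is touched by exactly one of the two loops.
    const bool firstTouch = touched.insert(id).second;
    assert(firstTouch);
    (void)firstTouch;  // Avoid an unused-variable warning under NDEBUG.

    master.machines.insert(id);
    master.update(id);
  }
}
```

With known machines `a` and `b` and a schedule containing `a` and `c`, each of the three machines should receive exactly one call in this sketch: `a` and `b` from the first loop, `c` from the second.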


- Joris Van Remoortere


On March 4, 2016, 2:10 a.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> -----------------------------------------------------------
> 
> (Updated March 4, 2016, 2:10 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
>     https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> There is a bug when setting host maintenance with the HTTP endpoint:
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as follows:
> 1) Get all hosts from the maintenance windows and put them into the updated
> hashmap.
> 2) If a machine in updated is also in master->machines, call the master's
> updateUnavailability to trigger recoverResources, updateUnavailability, etc. in
> the allocator.
> 3) Otherwise, clear the unavailability time window for the machine.
> 4) For each new machine in updated, call the master's updateUnavailability.
> 
> But the logic in step 4) iterates over all machines from the schedule
> windows, not only the machines that are new to the cluster. This causes the
> master to get two updateUnavailability calls for a machine in the updated
> hashmap.
> 
> The fix is to filter the machines in the updated hashmap when handling new
> machines.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>