-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35702/
-----------------------------------------------------------

(Updated July 31, 2015, 9:56 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, Joris 
Van Remoortere, and Vinod Kone.


Changes
-------

Rebased.


Bugs: MESOS-2600
    https://issues.apache.org/jira/browse/MESOS-2600


Repository: mesos


Description
-------

This involved a lot more challenges than I anticipated, I've captured the 
various approaches and limitations and deal-breakers of those approaches here: 
[Master Endpoint Implementation 
Challenges](https://docs.google.com/document/d/1cwVz4aKiCYP9Y4MOwHYZkyaiuEv7fArCye-vPvB2lAI/edit#)

Key points:

* This is a stop-gap solution until we shift the offer creation/management 
logic from the master to the allocator.
* `updateAvailable` and `updateSlave` are kept separate because
  (1) `updateAvailable` is allowed to fail whereas `updateSlave` must not.
  (2) `updateAvailable` returns a `Future` whereas `updateSlave` does not.
  (3) `updateAvailable` never leaves the allocator in an over-allocated state 
and must not, whereas `updateSlave` does, and can.
* The algorithm:
    * Initially, the master pessimistically assume that what seems like 
"available" resources will be gone.
      This is due to the race between the allocator scheduling an `allocate` 
call to itself vs master's `allocator->updateAvailable` invocation.
      As such, we first try to satisfy the request only with the offered 
resources.
    * We greedily rescind one offer at a time until we've rescinded 
sufficiently many offers.
      IMPORTANT: We perform `recoverResources(..., Filters())` rather than 
`recoverResources(..., None())` so that we can pretty much always win the race 
against `allocate`.
                 In the case that we lose, no disaster occurs. We simply fail 
to satisfy the request.
    * If we still don't have enough resources after resciding all offers, be 
optimistic and forward the request to the allocator since there may be 
available resources to satisfy the request.
    * If the allocator returns a failure, report the error to the user with 
`PreconditionFailed`. This could be updated to be `Forbidden`, or `Conflict` 
maybe as well. We'll pick one eventually.

This approach is clearly not ideal, since we would prefer to rescind as little 
offers as possible.
The challenges of implementing the ideal solution in the current state is 
described in the document above.

TODO(mpark): Add more comments and test cases.


Diffs (updated)
-----

  src/master/http.cpp 3772e39015a22655dcad00ad844dc5ddc90db43f 
  src/master/master.hpp ea18c4e0bb0743747401b9cd5ea14ae9b56ae3cc 
  src/master/master.cpp 351a3c2b5f551ad065682cea601d2436258e4544 
  src/master/validation.hpp 43b8d84556e7f0a891dddf6185bbce7ca50b360a 
  src/master/validation.cpp ffb7bf07b8a40d6e14f922eabcf46045462498b5 

Diff: https://reviews.apache.org/r/35702/diff/


Testing
-------

`make check`


Thanks,

Michael Park

Reply via email to