Yan Xu created MESOS-6774:
-----------------------------

             Summary: Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.
                 Key: MESOS-6774
                 URL: https://issues.apache.org/jira/browse/MESOS-6774
             Project: Mesos
          Issue Type: Improvement
          Components: allocation
            Reporter: Yan Xu


Shared resource support in the allocator works by allocating multiple copies
of the shared resources so that multiple frameworks can receive them.
Multiple copies of the same shared resource don't affect the quantity of the
sorter's allocations or its total pool, so they have no impact on DRF.
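As a rough illustration of why extra copies leave DRF unaffected, here is a minimal C++ sketch of a pool in which each shared resource contributes its size once to the pool's quantity, however many copies the pool holds. The `Pool` type, its methods, and the resource names are hypothetical; they are not part of the Mesos `Resources` or `Sorter` API.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified model of a sorter's resource pool. This is NOT the
// Mesos Resources/Sorter API, only an illustration of the accounting idea.
struct Pool {
  struct Entry {
    double size;  // scalar size of one copy (e.g. disk size)
    bool shared;  // whether this is a shared resource
    int copies;   // number of copies held in this pool
  };

  std::map<std::string, Entry> entries;

  // Adds one copy of the resource `name` to the pool.
  void add(const std::string& name, double size, bool shared) {
    auto it = entries.find(name);
    if (it == entries.end()) {
      entries.emplace(name, Entry{size, shared, 1});
    } else if (it->second.shared) {
      ++it->second.copies;      // another copy of the same shared resource
    } else {
      it->second.size += size;  // non-shared scalars simply accumulate
    }
  }

  // Number of copies of `name` in the pool (0 if absent).
  int copies(const std::string& name) const {
    auto it = entries.find(name);
    return it == entries.end() ? 0 : it->second.copies;
  }

  // Quantity seen by DRF: each resource contributes its size once, so extra
  // copies of a shared resource do not change the result.
  double quantity() const {
    double total = 0.0;
    for (const auto& kv : entries) {
      total += kv.second.size;
    }
    return total;
  }
};
```

For example, adding two copies of a shared disk of size 1024 leaves `copies("shared-disk")` at 2 while `quantity()` stays at 1024.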

To make resource accounting work, though, when copies of the same resource
are added to a framework's allocation, we also add them to the sorter's total
pool (again, adding these copies doesn't affect the quantity) so that the
*allocations in a sorter are always bounded by the total pool in the sorter*.
This invariant is required for the following logic in the allocator to work:

{code:title=Remove the resources from the framework sorter when they are unallocated from the framework}
      frameworkSorters[role]->unallocated(
          frameworkId.value(), slaveId, resources);
      frameworkSorters[role]->remove(slaveId, resources);
{code}

For example, if 2 copies of a shared disk are allocated to framework1, the
sorter's total pool contains 2 copies of the disk as well.
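The accounting above can be sketched with a toy sorter that counts copies per resource. It assumes, as described above, that copies are added to the sorter's total pool whenever they are added to a framework's allocation, and taken out again with the unallocated-then-remove sequence from the code block. `ToySorter`, `allocateCopy`, and `recoverCopy` are hypothetical names; this is not the Mesos `Sorter` class.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy sorter (not the Mesos Sorter class) that tracks how many
// copies of each named resource are in the total pool and in each client's
// allocation.
class ToySorter {
 public:
  void add(const std::string& resource, int copies) {
    total_[resource] += copies;
  }

  // Loosely mirrors Sorter::remove(): fails if the total pool does not hold
  // enough copies, which is exactly what the invariant prevents.
  void remove(const std::string& resource, int copies) {
    if (total_[resource] < copies) {
      throw std::logic_error("removing more copies than the total pool holds");
    }
    total_[resource] -= copies;
  }

  void allocated(const std::string& client, const std::string& resource,
                 int copies) {
    allocations_[client][resource] += copies;
  }

  void unallocated(const std::string& client, const std::string& resource,
                   int copies) {
    allocations_[client][resource] -= copies;
  }

  int totalCopies(const std::string& resource) const {
    auto it = total_.find(resource);
    return it == total_.end() ? 0 : it->second;
  }

  // The invariant: for every resource, the copies allocated across all
  // clients never exceed the copies in the total pool.
  bool allocationsBoundedByTotal() const {
    std::map<std::string, int> allocatedCopies;
    for (const auto& client : allocations_) {
      for (const auto& entry : client.second) {
        allocatedCopies[entry.first] += entry.second;
      }
    }
    for (const auto& entry : allocatedCopies) {
      if (entry.second > totalCopies(entry.first)) {
        return false;
      }
    }
    return true;
  }

 private:
  std::map<std::string, int> total_;
  std::map<std::string, std::map<std::string, int>> allocations_;
};

// Hypothetical helpers: grow the total pool before recording the allocation,
// and shrink it after recording the recovery.
void allocateCopy(ToySorter& sorter, const std::string& framework,
                  const std::string& resource) {
  sorter.add(resource, 1);
  sorter.allocated(framework, resource, 1);
}

void recoverCopy(ToySorter& sorter, const std::string& framework,
                 const std::string& resource) {
  sorter.unallocated(framework, resource, 1);
  sorter.remove(resource, 1);
}
```

Allocating two copies of the same shared disk to framework1 leaves two copies in the total pool, and recovering both brings it back to zero. Calling `remove()` without a matching copy in the total pool throws, which is the failure the invariant guards against.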

However, we currently do this only for the framework sorters below a role,
because the allocator (implicitly) assumes that the role sorter, being the
root-level sorter, has a total pool that is unchanged during allocation or
resource recovery. This is not a problem right now because, for this reason,
{{Sorter::add(const SlaveID& slaveId, const Resources& resources)/remove(const SlaveID& slaveId, const Resources& resources)}}
are not called on the role sorters during allocation or resource recovery.
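To make the role-sorter gap concrete, the following sketch checks the invariant by counting copies per resource. It assumes a hypothetical scenario in which the role sorter's total pool holds the single physical copy of a shared disk while role1's allocation in that sorter holds two copies (for example, one for each of two frameworks in the role), and the framework sorter below role1 holds both copies in its total pool. `boundedByTotal` and `Copies` are illustrative names, not Mesos code.

```cpp
#include <map>
#include <string>

// Resource name -> number of copies (hypothetical representation).
using Copies = std::map<std::string, int>;

// Checks the invariant "allocations in a sorter are always bounded by the
// total pool in the sorter", counting copies per resource across all clients.
bool boundedByTotal(const Copies& total,
                    const std::map<std::string, Copies>& allocations) {
  Copies allocated;
  for (const auto& client : allocations) {
    for (const auto& entry : client.second) {
      allocated[entry.first] += entry.second;
    }
  }
  for (const auto& entry : allocated) {
    auto it = total.find(entry.first);
    int available = it == total.end() ? 0 : it->second;
    if (entry.second > available) {
      return false;  // more copies allocated than the total pool holds
    }
  }
  return true;
}
```

In that scenario the role sorter fails the check while the framework sorter passes, which stays harmless only as long as `Sorter::remove()` is never called on the role sorter during allocation or recovery.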

This will likely change with MESOS-6375, when role sorters become hierarchical
and not all of them are bound to the physical size of the cluster. We should
revisit the shared resource allocation logic then to make sure the invariant
*allocations in a sorter are always bounded by the total pool in the sorter*
holds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
