> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 757
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line757>
> >
> >     So if `operation` contians multiple copies, the result will end up 
> > having multiple copies right?

This piece of code is executed for non LAUNCH operations only. So we should 
have at most a single copy of shared resources at this point for a specific 
operation.


> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 752-755
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line752>
> >
> >     This seems the wrong place to do it, multiple copies of the same shared 
> > resources are added to the allocation of the role (in roleSorter and 
> > quotaRoleSorter) from multiple places:
> >     
> >     - updateAllocation
> >     - addFramework
> >     - addSlave
> >     - allocate
> >     
> >     If the rule is to not have more than one copy the shared resources to 
> > roleSorter and quotaRoleSorter, they should be invariants, i.e., we should 
> > prevent them from being added instead of deduping them here.
> >     
> >     Let's chat about the design.

That is indeed the case. We can have multiple copies of shared resources in 
allocations for role sorter and quota sorter; but only a single copy of shared 
resources in the total maintained in role sorter and quota sorter. The issue 
here is that we use the shared count in allocations in framework sorter to 
remove appropriate resources in the role and quota sorter's total resources. 
Since total resources in role sorter and quota sorter is atmost 1, but 
allocations in framework sorter can be more which results in 
`CHECK(total_.resources[slaveId].contains(resources))` failing.

So, this fix seems to take care of the inconsistencies in the shared count. 
However, as discussed it would be better to use the agent's total resources 
(before and after applying the appropriate `operation`) to account for the 
total resources in the role and quota sorter, i.e. something similar to the 
approach in `HierarchicalAllocatorProcess::updateAvailable()`, as follows:

```
  // Update the total resources.
  Try<Resources> updatedTotal = slaves[slaveId].total.apply(operations);
  CHECK_SOME(updatedTotal);

  const Resources oldTotal = slaves[slaveId].total;
  slaves[slaveId].total = updatedTotal.get();

  // Now, update the total resources in the role sorters by removing
  // the previous resources at this slave and adding the new resources.
  roleSorter->remove(slaveId, oldTotal);
  roleSorter->add(slaveId, updatedTotal.get());

  // See comment at `quotaRoleSorter` declaration regarding non-revocable.
  quotaRoleSorter->remove(slaveId, oldTotal.nonRevocable());
  quotaRoleSorter->add(slaveId, updatedTotal.get().nonRevocable());
```


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53096/#review156326
-----------------------------------------------------------


On Nov. 18, 2016, 5:14 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53096/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2016, 5:14 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6444
>     https://issues.apache.org/jira/browse/MESOS-6444
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We maintain a single copy of shared resource in the role and quota
> sorter's total resources. So, when we update these resources, we ensure
> that we only count a single copy even though the framework sorter
> may be returned multiple copies of a shared resource.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> c8f9492ee1b69e125a1e841116d22a578a9b524e 
>   src/tests/persistent_volume_tests.cpp 
> 8651b2c5455089041f16d091c90a7e0d948191b8 
> 
> Diff: https://reviews.apache.org/r/53096/diff/
> 
> 
> Testing
> -------
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to