> On Oct. 26, 2015, 1:49 p.m., Qian Zhang wrote:
> > For this patch, it seems that we add the quota-related code inside the 
> > per-slave foreach loop in the HierarchicalAllocatorProcess::allocate(const 
> > hashset<SlaveID>& slaveIds_) method, which means that for **each slave** we 
> > handle quota first and then the existing DRF fair share. I think there 
> > might be an issue with this approach: say for the first slave, its 
> > available unreserved non-revocable resources cannot satisfy a role's quota 
> > because the framework in this role has a filter for this slave, and we then 
> > lay aside the filtered resources of this slave for this role immediately. I 
> > think it might be too early to do this, since the other slaves may have 
> > resources which can satisfy this role's quota. But if we lay aside this 
> > slave's resources for this role at this point, the result is that the 
> > framework of this role will not use these resources (due to the filter) AND 
> > all other roles' frameworks cannot be offered these resources either, which 
> > wastes resources.
> > 
> > I think maybe we can handle quota support in this way: in 
> > HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>& slaveIds_), 
> > leave the existing three nested foreach loops (slave/role/framework) as 
> > they are, and add the quota-related code separately before them: traverse 
> > all quota'ed roles, and for each of them traverse all the slaves, 
> > allocating each slave's available unreserved non-revocable resources to the 
> > role's frameworks (taking filters and suppression into account) until the 
> > role's quota is satisfied. After all the quota'ed roles have been 
> > traversed, if some roles' quotas are still not satisfied, then lay aside 
> > resources (the resources that were filtered or suppressed) for them. This 
> > way, before laying aside resources, we have tried our best to satisfy the 
> > quotas with the available resources of all slaves, so fewer resources 
> > should be wasted.
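
For illustration, here is a rough, self-contained sketch of the loop structure 
described above; the `Agent`/`Framework` types, the single quota'ed role, and 
the memory-only model are simplified stand-ins for this discussion, not the 
actual allocator code:

```
// A rough sketch only: "satisfy quotas across all agents first, then run
// the fair-share loops". Memory is modelled as a single double (GB) and
// filters as a per-framework set of agent ids; none of these names come
// from the real allocator.
#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Agent {
  std::string id;
  double freeMem;  // unreserved, non-revocable memory in GB
};

struct Framework {
  std::string id;
  std::string role;
  std::set<std::string> filteredAgents;  // agents this framework declines
};

int main() {
  std::vector<Agent> agents = {{"a1", 4.0}, {"a2", 4.0}};
  std::vector<Framework> frameworks = {{"f1", "r1", {"a1"}},
                                       {"f2", "r2", {}}};
  std::map<std::string, double> quota = {{"r1", 4.0}};  // r1 needs 4GB
  std::map<std::string, double> allocated;              // per framework

  // Pass 1 (the proposal): for each quota'ed role, walk *all* agents
  // before deciding to lay anything aside.
  for (const auto& [role, limit] : quota) {
    double satisfied = 0.0;
    for (Agent& agent : agents) {
      if (satisfied >= limit) {
        break;
      }
      for (const Framework& framework : frameworks) {
        if (framework.role != role ||
            framework.filteredAgents.count(agent.id) > 0) {
          continue;
        }
        double grant = std::min(agent.freeMem, limit - satisfied);
        agent.freeMem -= grant;
        allocated[framework.id] += grant;
        satisfied += grant;
      }
    }
    // Only here, if `satisfied < limit`, would resources be laid aside
    // for the role (omitted in this sketch).
  }

  // Pass 2: the existing slave/role/framework fair-share loops, reduced
  // here to "hand the remainder to the first framework that accepts it".
  for (Agent& agent : agents) {
    for (const Framework& framework : frameworks) {
      if (framework.filteredAgents.count(agent.id) > 0) {
        continue;
      }
      allocated[framework.id] += agent.freeMem;
      agent.freeMem = 0.0;
    }
  }

  for (const auto& [framework, mem] : allocated) {
    std::cout << framework << " got " << mem << " GB" << std::endl;
  }

  return 0;
}
```

With two 4GB agents, a quota'ed role whose only framework filters `a1`, and an 
unconstrained framework in a non-quota'ed role, this prints 4GB for both 
frameworks within a single pass, which is the outcome the proposal aims for.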
> Alexander Rukletsov wrote:
>     I'm not sure I got your point. If my mental compiler is correct, when a 
> framework in a quota'ed role opts out, we do not immediately lay aside 
> resources. We do that only after we have checked all the frameworks in the 
> role, in a separate loop.
> Qian Zhang wrote:
>     Let me clarify my point with an example:
>     Say in the Mesos cluster there are 2 agents, a1 and a2, each with 4GB of 
> memory. And there are 2 roles, r1 and r2; r1 has a quota set (4GB) and r2 
> has no quota set. r1 has a framework f1 which currently has no allocation 
> but has a filter (4GB memory on a1); r2 also has a framework f2 which 
> currently has no allocation and has no filter. And there are no 
> static/dynamic reservations and no revocable resources. Now with the logic 
> in this patch, for a1, in the quotaRoleSorter foreach loop, when we handle 
> the quota for r1 we will not allocate a1's resources to f1 because f1 has a 
> filter, so a1's 4GB of memory will be laid aside to satisfy r1's quota. And 
> then in the roleSorter foreach loop, we will NOT allocate a1's resources to 
> f2 either, since a1 currently has no available resources because all of its 
> 4GB of memory has been laid aside for r1. And then when we handle a2, its 
> 4GB of memory will be allocated to f1, so f2 will not get anything in the 
> end. So the result is: a1's 4GB of memory is laid aside to satisfy r1's 
> quota, a2's 4GB is allocated to f1, and f2 gets nothing. But I think for 
> this example the expected result should be: f1 gets a2's 4GB (r1's quota is 
> also satisfied) and f2 gets a1's 4GB.
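
As a rough trace of the scenario above under the per-agent ordering (quota 
stage, then fair-share stage, for each agent in turn), here is a small 
stand-alone sketch; it only mirrors the outcome described in the example, not 
the real quotaRoleSorter/roleSorter code:

```
// Sketch only: per-agent ordering, i.e. for each agent handle quota first
// (laying resources aside if the quota'ed framework declines), then fair
// share. Memory is a single double (GB); names are illustrative.
#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> agents = {"a1", "a2"};
  std::map<std::string, double> freeMem = {{"a1", 4.0}, {"a2", 4.0}};

  std::set<std::string> f1Filters = {"a1"};  // f1 (role r1) filters a1
  double r1QuotaUnallocated = 4.0;           // r1's quota: 4GB, none allocated
  double laidAside = 0.0;
  std::map<std::string, double> allocated = {{"f1", 0.0}, {"f2", 0.0}};

  for (const std::string& agent : agents) {
    // Quota stage for r1: if f1 declines this agent (filter), the agent's
    // resources are laid aside immediately and stay unavailable for the
    // rest of this cycle.
    if (r1QuotaUnallocated > 0.0) {
      double amount = std::min(freeMem[agent], r1QuotaUnallocated);
      if (f1Filters.count(agent) > 0) {
        laidAside += amount;
        freeMem[agent] -= amount;
      } else {
        allocated["f1"] += amount;
        freeMem[agent] -= amount;
        r1QuotaUnallocated -= amount;
      }
    }

    // Fair-share stage: f2 (role r2, no filters) takes whatever is left.
    allocated["f2"] += freeMem[agent];
    freeMem[agent] = 0.0;
  }

  // Prints: f1=4 GB, f2=0 GB, laid aside=4 GB -- the outcome described above.
  std::cout << "f1=" << allocated["f1"] << " GB, f2=" << allocated["f2"]
            << " GB, laid aside=" << laidAside << " GB" << std::endl;

  return 0;
}
```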
> Alexander Rukletsov wrote:
>     This can happen during the allocation cycle, but we do not persist 
> laid-aside resources between allocation cycles. Without refactoring 
> `allocate()` we do not know whether we will get a suitable agent, hence we 
> have to lay resources aside. But at the next allocation cycle, `r1`'s quota 
> is satisfied and `f2` will get `a1`'s 4GB, which is OK in my opinion.
> Qian Zhang wrote:
>     Yes, I understand ```f2``` will get 4GB at the ***next*** allocation 
> cycle. But with the proposal in my first post, in a ***single*** allocation 
> cycle both ```f1``` and ```f2``` can get 4GB, because when we find we cannot 
> allocate ```a1```'s 4GB to ```f1``` due to the filter, we will NOT lay aside 
> resources at that point; instead we will try ```a2``` and allocate 
> ```a2```'s 4GB to ```f1```, and then when we handle the fair share we will 
> allocate ```a1```'s 4GB to ```f2```. I think this proposal is also aligned 
> with the principle mentioned in the design doc: quota first, fair share 
> second. My understanding of this design principle is that we handle all 
> roles' quota for all slaves first, and then handle all roles' fair share for 
> all slaves (my proposal), rather than, for ***each slave***, handling all 
> roles' quota and then all roles' fair share (this patch).

The design doc addresses the semantics, not the actual implementation. The 
important thing is to ensure that resources needed to satisfy outstanding 
quotas are not allocated to non-quota'ed roles. I think the difference between 
the two approaches is not that big. I tend to agree that mine can be slightly 
more confusing, but I would argue it's less intrusive (especially given we 
plan an allocator refactor).
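
To make the cycle argument concrete, here is a minimal sketch under the same 
simplified memory-only model as above: the laid-aside amount is recomputed 
from the still-unsatisfied quota inside each cycle and is not carried over, so 
the second cycle hands `a1`'s 4GB to `f2`. This is an illustration of the 
behaviour described here, not the allocator implementation:

```
// Sketch only: laid-aside resources are not persisted between allocation
// cycles; they are recomputed from the unsatisfied quota each time the
// allocation runs. Two cycles over the a1/a2 example.
#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
  const double r1Quota = 4.0;
  std::map<std::string, double> capacity = {{"a1", 4.0}, {"a2", 4.0}};
  std::map<std::string, double> agentAllocated = {{"a1", 0.0}, {"a2", 0.0}};
  std::set<std::string> f1Filters = {"a1"};  // f1 (role r1) filters a1
  std::map<std::string, double> allocated = {{"f1", 0.0}, {"f2", 0.0}};

  for (int cycle = 1; cycle <= 2; ++cycle) {
    for (const auto& [agent, total] : capacity) {
      double freeOnAgent = total - agentAllocated[agent];

      // Quota stage for r1: only the still-unallocated part of the quota
      // matters; nothing laid aside in an earlier cycle is remembered.
      double quotaGap = std::max(0.0, r1Quota - allocated["f1"]);
      if (quotaGap > 0.0) {
        double amount = std::min(freeOnAgent, quotaGap);
        if (f1Filters.count(agent) > 0) {
          freeOnAgent -= amount;  // laid aside for this cycle only
        } else {
          allocated["f1"] += amount;
          agentAllocated[agent] += amount;
          freeOnAgent -= amount;
        }
      }

      // Fair-share stage: f2 takes whatever is still free on this agent.
      allocated["f2"] += freeOnAgent;
      agentAllocated[agent] += freeOnAgent;
    }

    // Cycle 1: f1=4, f2=0 (a1 laid aside). Cycle 2: f1=4, f2=4.
    std::cout << "after cycle " << cycle << ": f1=" << allocated["f1"]
              << " GB, f2=" << allocated["f2"] << " GB" << std::endl;
  }

  return 0;
}
```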

- Alexander


On Oct. 27, 2015, 7:27 p.m., Alexander Rukletsov wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39401/
> -----------------------------------------------------------
> (Updated Oct. 27, 2015, 7:27 p.m.)
> Review request for mesos, Bernd Mathiske, Joerg Schad, and Joris Van 
> Remoortere.
> Bugs: MESOS-3718
>     https://issues.apache.org/jira/browse/MESOS-3718
> Repository: mesos
> Description
> -------
> See summary.
> Diffs
> -----
>   src/master/allocator/mesos/hierarchical.cpp 
> f4e4a123d3da0442e8b0b0ad14d1ee760752ba36 
> Diff: https://reviews.apache.org/r/39401/diff/
> Testing
> -------
> make check (Mac OS X 10.10.4)
> Thanks,
> Alexander Rukletsov
