> On Oct. 26, 2015, 1:49 p.m., Qian Zhang wrote:
> > For this patch, it seems that we add the quota-related code inside the
> > per-slave foreach loop in
> > `HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>& slaveIds_)`,
> > which means that for **each slave** we handle quota first and then the
> > existing DRF fair share. I think there might be an issue with this
> > approach: say for the first slave, its available unreserved non-revocable
> > resources cannot satisfy a role's quota because the framework in this role
> > has a filter for this slave, and we then lay aside the filtered resources
> > of this slave for this role immediately. I think it might be too early to
> > do this, since the other slaves may have resources which can satisfy this
> > role's quota. But if we lay aside this slave's resources for this role at
> > this point, the result is that the framework of this role will not use
> > these resources (due to the filter) AND no other role's frameworks can be
> > offered these resources either, which is a kind of resource waste.
> >
> > I think maybe we can handle quota support in this way: in
> > `HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>& slaveIds_)`,
> > leave the existing 3 levels of foreach loops (slave/role/framework) as
> > they are, and add the quota-related code separately before them: traverse
> > all quota'ed roles, and for each of them traverse all the slaves,
> > allocating each slave's available unreserved non-revocable resources to
> > the role's frameworks (taking filters and suppression into account) until
> > the role's quota is satisfied. After all quota'ed roles have been
> > traversed, if some roles' quotas are still unsatisfied, then lay aside
> > resources (the resources that were filtered or suppressed) for them.
> > In this way, before laying aside resources, we have tried our best to use
> > all slaves' available resources to satisfy the quotas first, so fewer
> > resources should be wasted.
>
> Alexander Rukletsov wrote:
>     I'm not sure I got your point. If my mental compiler is correct, if a
>     framework in a quota'ed role opts out, we do not immediately lay aside
>     resources. We do that after we have checked all the frameworks in the
>     role, in a separate loop.
>
> Qian Zhang wrote:
>     Let me clarify my point with an example:
>     Say the Mesos cluster has 2 agents, a1 and a2, each with 4GB of memory,
>     and 2 roles, r1 and r2; r1 has a quota set (4GB) and r2 has no quota
>     set. r1 has a framework f1 which currently has no allocation but has a
>     filter (4GB memory on a1); r2 has a framework f2 which also has no
>     allocation and no filter. There are no static/dynamic reservations and
>     no revocable resources. Now with the logic in this patch, for a1, in
>     the quotaRoleSorter foreach loop, when we handle the quota for r1, we
>     will not allocate a1's resources to f1 because f1 has a filter, so a1's
>     4GB of memory will be laid aside to satisfy r1's quota. Then in the
>     roleSorter foreach loop, we will NOT allocate a1's resources to f2
>     either, since a1 now has no available resources: all of its 4GB has
>     been laid aside for r1. Then when we handle a2, its 4GB will be
>     allocated to f1, so f2 gets nothing in the end. So the result is: a1's
>     4GB is laid aside to satisfy r1's quota, a2's 4GB is allocated to f1,
>     and f2 gets nothing. But I think for this example the expected result
>     should be: f1 gets a2's 4GB (r1's quota is also satisfied) and f2 gets
>     a1's 4GB.
>
> Alexander Rukletsov wrote:
>     This can happen during an allocation cycle, but we do not persist laid
>     aside resources between allocation cycles.
>     Without refactoring `allocate()` we do not know whether we will get a
>     suitable agent, hence we have to lay aside. But at the next allocation
>     cycle, `r1`'s quota is satisfied and `f2` will get `a1`'s 4GB, which is
>     OK in my opinion.
>
> Qian Zhang wrote:
>     Yes, I understand `f2` will get 4GB at the ***next*** allocation cycle.
>     But with the proposal in my first post, in a ***single*** allocation
>     cycle, both `f1` and `f2` can get 4GB each: when we find we cannot
>     allocate `a1`'s 4GB to `f1` due to the filter, we will NOT lay aside
>     resources at that point; instead we will try `a2` and allocate `a2`'s
>     4GB to `f1`, and then when we handle fair share, we will allocate
>     `a1`'s 4GB to `f2`. I think this proposal is also aligned with the
>     principle mentioned in the design doc: quota first, fair share second.
>     My understanding of this design principle is that we handle all roles'
>     quotas for all slaves first, and then handle all roles' fair share for
>     all slaves (my proposal), rather than handling, for ***each slave***,
>     all roles' quotas and then all roles' fair share (this patch).
>
> Alexander Rukletsov wrote:
>     The design doc addresses the semantics, not the actual implementation.
>     An important thing is to ensure that no allocations to non-quota'ed
>     roles are made while there exist unsatisfied quotas. I think the
>     difference between the two approaches is not that big. I tend to agree
>     that mine can be slightly more confusing, but I would argue it's less
>     intrusive (especially given that we plan an allocator refactoring).
We had an offline discussion with Joris and BenM regarding your proposal and
agreed it has less impact on non-quota'ed frameworks. I will update the
request soon.

- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39401/#review104012
-----------------------------------------------------------


On Nov. 9, 2015, 10:01 p.m., Alexander Rukletsov wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39401/
> -----------------------------------------------------------
>
> (Updated Nov. 9, 2015, 10:01 p.m.)
>
>
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van
> Remoortere, and Joseph Wu.
>
>
> Bugs: MESOS-3718
>     https://issues.apache.org/jira/browse/MESOS-3718
>
>
> Repository: mesos
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/master/allocator/mesos/hierarchical.cpp 14fef63714721fcda7cea3c28704766efda6d007
>
> Diff: https://reviews.apache.org/r/39401/diff/
>
>
> Testing
> -------
>
> make check (Mac OS X 10.10.4)
>
>
> Thanks,
>
> Alexander Rukletsov
>
