----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66984/#review202673 -----------------------------------------------------------
The description could use some more explanation around allocating up to quota guarantee vs above guarantee up to limit, and how these pertain to exhaustiveness. This also seems to have an implication on the code (e.g. if quota check for candidates). I already forgot what the thinking is there so I certainly won't remember a week from now :) src/master/allocator/mesos/hierarchical.hpp Lines 440-445 (patched) <https://reviews.apache.org/r/66984/#comment284619> I think we can just update the above paragraph to clarify it's per role, rather than having it as a separate note here. src/master/allocator/mesos/hierarchical.hpp Lines 447-448 (patched) <https://reviews.apache.org/r/66984/#comment284620> Let's clearly specify the conditions under which it gets cleared, is this the only event outside of the set filling up? I'm inclined to clear filters for the agent attribute change case but keep it in the exclusion set, so that we don't hit the starvation problem this feature aims to address. src/master/allocator/mesos/hierarchical.hpp Lines 450-451 (patched) <https://reviews.apache.org/r/66984/#comment284621> Let's note here that if introduce cases such that we clear the set too frequently we run into the problem this feature aims to address. src/master/allocator/mesos/hierarchical.hpp Lines 452 (patched) <https://reviews.apache.org/r/66984/#comment284623> hashset? If not, can you clarify where we rely on the ordering? src/master/allocator/mesos/hierarchical.cpp Lines 652-654 (patched) <https://reviews.apache.org/r/66984/#comment284626> See my comment above. src/master/allocator/mesos/hierarchical.cpp Lines 1215-1217 (patched) <https://reviews.apache.org/r/66984/#comment284638> It's a little odd that it's not cleared here but it's cleared on other insertions. If we make the framework candidates a function can we then do the clearing here as well? Or is it too expensive to compute on each recovery of resources? src/master/allocator/mesos/hierarchical.cpp Lines 1216 (patched) <https://reviews.apache.org/r/66984/#comment284628> Just curious, what would it mean if it were already in the set? Would it be a bug? Do you want to check for that? src/master/allocator/mesos/hierarchical.cpp Lines 2067-2087 (patched) <https://reviews.apache.org/r/66984/#comment284645> Can you contain this in a labmda? ``` hashmap<string, set<FrameworkID>> roleFrameworkCandidates = [&]() { ... return candidates; }(); ``` src/master/allocator/mesos/hierarchical.cpp Lines 2084 (patched) <https://reviews.apache.org/r/66984/#comment284646> std::move it into the map src/master/allocator/mesos/hierarchical.cpp Lines 2089-2107 (patched) <https://reviews.apache.org/r/66984/#comment284648> Do you actually need subset? The way this is used it seems like you just need to use equality? src/master/allocator/mesos/hierarchical.cpp Line 2061 (original) <https://reviews.apache.org/r/66984/#comment284643> The removal of this looks like an optimization to avoid re-lookup of the agent, so we could pull that out in front of this change to get it quickly committed and minimize this diff. src/master/allocator/mesos/hierarchical.cpp Lines 2145 (patched) <https://reviews.apache.org/r/66984/#comment284649> Can you use .at for read-only access here and elsewhere? src/master/allocator/mesos/hierarchical.cpp Lines 2154-2160 (patched) <https://reviews.apache.org/r/66984/#comment284622> > In this case, the agent will keep offering resources to the frameworks that are omitted in the exclusion set, starving others. Hm.. is this right? It looks more like no one would be getting the offer? This invariant looks like: exclusion set for role != framework candidate set for role. I guess the more difficult invariant to check is: how do we avoid getting "stuck" with a partially full exclusion set? In some cases, it seems like this is actually the implemented behavior: if a framework keeps accepting and lauching small very short-lived tasks on an agent, another framework that declined it might never get it back. src/master/allocator/mesos/hierarchical.cpp Lines 2169-2180 (patched) <https://reviews.apache.org/r/66984/#comment284650> Is this required for correctness or is this an optimization? src/master/allocator/mesos/hierarchical.cpp Lines 2188-2199 (patched) <https://reviews.apache.org/r/66984/#comment284651> Is this required for correctness or is this an optimization? src/master/allocator/mesos/hierarchical.cpp Lines 2104-2151 (original), 2230-2282 (patched) <https://reviews.apache.org/r/66984/#comment284631> Moving this up seems like an optimization that we could pull out in front of this change to minimize this diff? src/master/allocator/mesos/hierarchical.cpp Lines 2287-2299 (patched) <https://reviews.apache.org/r/66984/#comment284634> Is this required for correctness? Or is this an optimization? src/master/allocator/mesos/hierarchical.cpp Lines 2315 (patched) <https://reviews.apache.org/r/66984/#comment284652> See my comment above, is subset actually needed here or is equality sufficient? src/master/allocator/mesos/hierarchical.cpp Lines 2343-2349 (patched) <https://reviews.apache.org/r/66984/#comment284642> Rather than the bool indirection, is it possible to check the two sets we care about? That would be more readable. - Benjamin Mahler On May 8, 2018, 4:55 p.m., Meng Zhu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66984/ > ----------------------------------------------------------- > > (Updated May 8, 2018, 4:55 p.m.) > > > Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and > Vinod Kone. > > > Repository: mesos > > > Description > ------- > > Schedulers that are below their fair share will continue to get > allocated resources even if they don't need those resources. The > expected behavior here is that schedulers will decline those resources > for a long time (or forever). Not all schedulers do this, however, > which means that some schedulers might get _starved_ of > resources. Technically these schedulers are already at or above their > fair share, but some operators feel that this is keeping the cluster > underutilized. > > Rather than guarantee that all schedulers will decline with large > timeouts this patch ensures that all other schedulers will see > resources from the declined agent before they see resources from that > agent again, even if there are now more resources available on that > agent. > > To this end, each agent maintains an exclusion set of frameworks > that declined its allocation earlier and skips them during the > allocation. Note, a framework here is associated with each of > its roles. If a framework declined resources allocated to one > of its roles, it can still get allocations for its other roles. > > Note, an agent's exclusion set will be cleared if its attribute > changes. > > > Diffs > ----- > > src/master/allocator/mesos/hierarchical.hpp > 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 > src/master/allocator/mesos/hierarchical.cpp > 1000968be6a2935a4cac571414d7f06d7df7acf0 > > > Diff: https://reviews.apache.org/r/66984/diff/5/ > > > Testing > ------- > > make check > Dedicated test in #66994 > > > Thanks, > > Meng Zhu > >
