> On Nov. 18, 2015, 3:51 p.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
>     Why do we want to rescind the offers that do not contribute to 
> > satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request 
> (remember minimal agent count). If we have a check in place, this will 
> effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request is for 20GB of disk for a role, and there 
> is an offer which only includes 2 CPUs & 2GB memory and has no disk resources 
> at all, so we will rescind this offer too? This seems a little unfair to me.
>     And can you please clarify a little more about why we want to rescind 
> offers from at least `numF` agents? The reason is that we want to ensure each 
> framework in that role will have a chance to get an offer in next allocation 
> cycle?
> 
> Alexander Rukletsov wrote:
>     That's correct, we will rescind that offer and yes, it's a bit unfair. 
> Let me explain why I decided to remove this check. Suppose a quota request 
> is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 
> CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but 
> we would like to rescind offers from at least 2 more agents. Having a check 
> in place will prevent us from doing so. Do you think greedy rescinding can be 
> a problem?
>     
>     Yes, we would like to facilitate allocation for each framework in the 
> role, for which quota is set.
> 
> Qian Zhang wrote:
>     What is most unclear to me is why we need to rescind offers from at 
> least numF agents, i.e., in your example above, why do we want to rescind 
> offers from at least 2 more agents after quota has been satisfied? Can you 
> please let me know the motivation behind it? I think quota is a kind of 
> global concept which should not have a direct relation to agents and 
> frameworks; it should stay at the role level. So I am not sure why we want to 
> facilitate allocation for each framework in the role. Is that something that 
> we mentioned in the design doc? Maybe I forgot ... :-)
> 
> Alexander Rukletsov wrote:
>     Nope, it wasn't in the design doc, that's something we decided recently. 
> The main motivation is to improve user experience and simplify debugging. 
> Because the built-in allocator is used in 99% of clusters, it makes sense to 
> exploit some knowledge about how it works. Because of coarse-grained 
> allocations, to facilitate fairness we may want to rescind from more agents 
> than necessary to satisfy quota numbers.
> 
> Joris Van Remoortere wrote:
>     `why do we want to rescind offers from at least 2 more agents after quota 
> has been satisfied?`
>     Just to be clear: it's not numF or more agents *on top of* quota. It's at 
> least numF agents in case the quota itself doesn't already rescind offers 
> from that many.
>     
>     I'm not sure this is really "un-fair", as these are *offers*, and not 
> *allocations*. We are not pre-empting tasks. If the resources in the offers 
> that are rescinded are not needed for quota, then they will be re-offered 
> using the same fair-sharing logic that they were before. In fact, this is 
> *more* fair, as we might end up making better offers due to information that 
> has changed in the cluster.
>     
>     The argument for the `numF` condition that Alex is making is one I pushed 
> for. We often end up debugging clusters around new features, even not so new 
> features. Although the `numF` condition by no means guarantees that every 
> framework in the role will receive an offer, it does increase the chances 
> greatly. The fact that they will receive any offer at all means we will see 
> messages flowing to the framework, and hopefully log lines at the framework 
> after receiving the offer. If the offer is still too small to launch a task, 
> at least we will see a message at the framework level to that effect. **What 
> we are optimizing for** is the ability to eliminate quickly (in most cases) 
> the possibility that there is a bug in quota because the framework didn't 
> receive any offers.
>     
>     Please let me know if this is not clear, as I believe it is very 
> important. The more of us understand why this extra condition is here, the 
> fewer framework writers and cluster operators will be coming on IRC / dev 
> list with debug logs that don't allow us to easily eliminate quota as the 
> source of the problem.

Thanks Joris. So the motivation for this extra condition is to improve 
debuggability, right? But I am still not clear on why increasing the chances 
for frameworks to receive offers will improve the debuggability of Mesos. Can 
you please clarify?
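
For reference, the rescinding condition discussed above can be sketched as 
follows. This is a minimal illustration, not the code in 
`src/master/quota_handler.cpp`: the `Offer` struct, the 
`selectOffersToRescind` function, and the CPU-only quota are hypothetical 
simplifications. It assumes offers are rescinded in order until the quota is 
covered *and* offers from at least `numF` agents have been rescinded, with no 
check that an individual offer contributes to the quota.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified offer: one agent and a CPU count only.
struct Offer {
  std::string agentId;
  double cpus;
};

// Returns the offers that would be rescinded for a CPU-only quota request.
// Offers are taken in order until BOTH conditions hold:
//   (1) the rescinded CPUs cover `quotaCpus`, and
//   (2) offers from at least `numF` distinct agents have been rescinded.
// There is deliberately no check that an offer contributes to the quota.
std::vector<Offer> selectOffersToRescind(
    const std::vector<Offer>& offers,
    double quotaCpus,
    std::size_t numF)
{
  std::vector<Offer> rescinded;
  std::set<std::string> visitedAgents;
  double rescindedCpus = 0.0;

  for (const Offer& offer : offers) {
    // Stop only once the quota is covered and enough agents were visited.
    if (rescindedCpus >= quotaCpus && visitedAgents.size() >= numF) {
      break;
    }

    rescinded.push_back(offer);
    visitedAgents.insert(offer.agentId);
    rescindedCpus += offer.cpus;
  }

  return rescinded;
}
```

With offers of 10, 0, 1 and 4 CPUs from four different agents, a 6 CPU quota 
and `numF` = 3, this sketch selects the first three offers: the first one 
already covers the quota, and the next two are taken only to reach three 
agents.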


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


On Nov. 25, 2015, 12:29 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40351/
> -----------------------------------------------------------
> 
> (Updated Nov. 25, 2015, 12:29 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, 
> Joseph Wu, and Qian Zhang.
> 
> 
> Bugs: MESOS-3912
>     https://issues.apache.org/jira/browse/MESOS-3912
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp e5e0ed01a56d869cc535687c8dbb6b99f6295b66 
>   src/master/quota_handler.cpp b8e501be43de6bc02aebfa5bd415b4212a96da31 
> 
> Diff: https://reviews.apache.org/r/40351/diff/
> 
> 
> Testing
> -------
> 
> make check (Mac OS X 10.10.4)
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>
