Hi guys, I need to model the Capacity Scheduler's behavior as accurately as possible and I have some questions; I hope someone can help me with this (in the following I will consider all containers to be equal, for simplicity):

1) Let's say I have queues A and B. A is configured to get 20% of the total cluster capacity (20 out of 100 containers), B gets 80% (80 containers). The Capacity Scheduler gives available resources first to the most under-served queue. If A is using 10 containers and B is using 20, which queue gets the first available container? A is already using 50% of its assigned capacity and B only 25%, but A holds fewer containers than B in absolute terms... which one is considered more under-served?
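To make my question concrete, here is a toy Python sketch (not real CapacityScheduler code) of what I mean by "relative usage"; whether the scheduler really ranks queues this way is exactly what I'm asking:

queues = {
    "A": {"guaranteed": 20, "used": 10},   # 10/20 = 0.50
    "B": {"guaranteed": 80, "used": 20},   # 20/80 = 0.25
}

def relative_usage(q):
    return q["used"] / q["guaranteed"]

# Under this assumption the queue with the lowest relative usage is the most
# under-served, so B would get the next container even though it already holds
# more containers than A in absolute terms.
most_under_served = min(queues, key=lambda name: relative_usage(queues[name]))
print(most_under_served)  # -> "B" (if my assumption is right)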

2) Does the previous question make sense at all? My doubt is that, whenever free containers are available, the scheduler might simply serve requests as they arrive, possibly over-provisioning a queue (that is: if I get a container request for an app in A, I give it a container, since I don't know that a few (milli)seconds later a new request from B will arrive, or vice versa). The previous question would only make sense if there were some sort of buffer that fills up with incoming requests (because they can't all be served in real time), letting the scheduler pick the request from the most under-served queue. Is this what happens?
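Just to show the two alternatives I have in mind, here is a small sketch (both variants are pure guesses on my part, and the request tuples are made up):

pending = [("A", "req-1"), ("B", "req-2")]      # hypothetical pending requests

# (a) purely first-come-first-served: grant in arrival order
grant_fcfs = pending[0]

# (b) buffered: among the pending requests, grant the one coming from the most
#     under-served queue (relative usage values taken from question 1)
relative_usage = {"A": 0.50, "B": 0.25}
grant_buffered = min(pending, key=lambda req: relative_usage[req[0]])

print(grant_fcfs, grant_buffered)               # ('A', 'req-1') ('B', 'req-2')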

3) Now let's consider a new configuration:
We have a cluster hosting a total of 40 containers and 3 queues: A is configured to get 39% of the cluster capacity, B also gets 39%, and C gets 22%. The guaranteed numbers of containers would then be 15.6, 15.6 and 8.8 for A, B and C. Since we can't split a container, how does the Capacity Scheduler round these values in a real cluster? Who gets the two contended containers? My guess is that they are treated as extra containers, shared among the three queues on demand. Is this correct?
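Here is the arithmetic I'm referring to, as a small sketch (just my own calculation, not YARN code); the rounding and the ownership of the leftover containers are exactly my open questions:

total_containers = 40
capacities = {"A": 39, "B": 39, "C": 22}        # configured capacity in percent

guaranteed = {q: total_containers * pct / 100 for q, pct in capacities.items()}
print(guaranteed)                     # {'A': 15.6, 'B': 15.6, 'C': 8.8}

# Flooring each share leaves 2 containers whose ownership is unclear to me.
floored = {q: int(v) for q, v in guaranteed.items()}
leftover = total_containers - sum(floored.values())
print(floored, leftover)              # {'A': 15, 'B': 15, 'C': 8} 2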

4) According to the example presented in "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2" about resource allocation with the Capacity Scheduler, what I understood is that the amount of resources a leaf queue can get above its assigned capacity is always capped by the fraction of cluster capacity assigned to its immediate parent queue. That is: if I am a leaf queue A1, I can get at most the resources dedicated to my parent A, while I can never take those of B, A's sibling, even if B has no running applications. At first I actually thought this over-provisioning was not limited, and that regardless of the queue configuration a single application could end up taking the whole cluster (excluding per-application limits). Did I misunderstand the example?
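To show how I read the example, here is a sketch with made-up numbers (the hierarchy and the percentages are mine, not the book's):

#   root
#   +-- A  (40% of the cluster)   with children A1 (60% of A) and A2 (40% of A)
#   +-- B  (60% of the cluster)
cluster = 100  # containers

parent_A_share = 40 * cluster // 100          # 40 containers guaranteed to A
leaf_A1_share = 60 * parent_A_share // 100    # 24 containers guaranteed to A1

# What I understood: even if B is completely idle, A1 can grow at most up to its
# parent's share of the cluster, never up to the whole cluster (ignoring
# maximum-capacity and per-application limits).
a1_upper_bound = parent_A_share               # 40, not 100
print(leaf_A1_share, a1_upper_bound)          # 24 40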

Thanks a lot

Fabio
