Bibin A Chundatt created YARN-4618:
--------------------------------------

             Summary: RM Stops allocating containers if large number pending 
containers
                 Key: YARN-4618
                 URL: https://issues.apache.org/jira/browse/YARN-4618
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Bibin A Chundatt
            Assignee: Bibin A Chundatt
            Priority: Critical


In one of our tests, we found that when the RM has a very large number of 
pending container requests to serve, it stops assigning containers.

Root total = 6 lakh (600,000) containers
Queue 1 = 3 lakh containers = 1328800000 MB
Queue 2 = 3+ lakh containers = 1428800000 MB
Each container request is for 4 GB.


{{ParentQueue#assignContainers}} contains the following check:
{noformat}
    // Check if this queue need more resource, simply skip allocation if this
    // queue doesn't need more resources.
    if (!super.hasPendingResourceRequest(node.getPartition(),
        clusterResource, schedulingMode)) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skip this queue=" + getQueuePath()
            + ", because it doesn't need more resource, schedulingMode="
            + schedulingMode.name() + " node-partition=" + node.getPartition());
      }
      return CSAssignment.NULL_ASSIGNMENT;
    }
{noformat}

When the pending resource exceeds the integer MAX VALUE, it overflows and 
becomes *negative* ({{-167XXXXXXX MB}}), so NULL_ASSIGNMENT is always returned.
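
The wrap-around can be reproduced in isolation. The following is a minimal sketch, assuming the pending memory is accumulated in a 32-bit int; the class and method names are hypothetical and are not YARN code. It uses the two queue totals listed above, so the exact negative value differs from the one observed in the test, which depends on the actual pending totals at that moment.

```java
// Hypothetical demo class; not part of YARN.
final class PendingMemoryOverflowDemo {

    // Sums per-queue pending memory (MB) in int arithmetic.
    // Java int addition wraps silently when the result exceeds Integer.MAX_VALUE.
    static int sumAsInt(int... pendingMb) {
        int total = 0;
        for (int mb : pendingMb) {
            total += mb;
        }
        return total;
    }

    // Same sum in long arithmetic, which has enough range for these values.
    static long sumAsLong(int... pendingMb) {
        long total = 0L;
        for (int mb : pendingMb) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        int queue1 = 1_328_800_000; // Queue 1 pending memory in MB (from the report)
        int queue2 = 1_428_800_000; // Queue 2 pending memory in MB (from the report)

        System.out.println("int sum  = " + sumAsInt(queue1, queue2));
        System.out.println("long sum = " + sumAsLong(queue1, queue2));
    }
}
```

With the int sum negative, any "pending resource > 0" style comparison reports that the queue needs nothing, which matches the skipped allocation described here.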

Tool used for testing: SLS (Scheduler Load Simulator).

When checking for a pending resource request, we should first check whether 
any pending containers (from getMetrics()) remain to be served. If there are 
pending containers, return true; otherwise fall back to the other check, which 
covers increase requests.
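
The proposed ordering can be sketched as below. This is only an illustration under stated assumptions: the class, method, and parameter names are hypothetical, the pending container count stands in for whatever getMetrics() exposes, and the fallback is modelled as a plain "pending memory > 0" comparison rather than the actual YARN resource check.

```java
// Hypothetical sketch of the proposed check order; not the YARN implementation.
final class PendingCheckSketch {

    // pendingContainers: count of outstanding container requests (assumed to come from queue metrics).
    // pendingMemoryMb:   aggregated pending memory in MB, which may have wrapped to a negative int.
    static boolean hasPendingResourceRequest(int pendingContainers, int pendingMemoryMb) {
        // Step 1: the container count stays small enough not to overflow, so check it first.
        if (pendingContainers > 0) {
            return true;
        }
        // Step 2: fall back to the resource-based check (assumed here to cover increase requests).
        return pendingMemoryMb > 0;
    }

    public static void main(String[] args) {
        // Overflowed memory total, but containers are still pending: allocation should proceed.
        System.out.println(hasPendingResourceRequest(600_000, -1_537_367_296));
        // Nothing pending at all: the queue can be skipped.
        System.out.println(hasPendingResourceRequest(0, 0));
    }
}
```

Checking the count first sidesteps the overflow without changing how memory is stored; widening the memory field itself would be a separate, larger change.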

Thoughts?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
