[ 
https://issues.apache.org/jira/browse/YARN-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595500#comment-13595500
 ] 

Karthik Kambatla commented on YARN-454:
---------------------------------------

The code potentially causing that issue is:

{code}
    RMContainer reservedContainer = node.getReservedContainer();
    if (reservedContainer != null) {
      // schedule for reservedContainer
    }

    // Otherwise, schedule at queue which is furthest below fair share
    else {
      while (node.getReservedContainer() == null) {
        // allocate based on fairshare
      }
    }
{code}

It looks like the fair-share loop exits as soon as a container is reserved on 
the node, even though it could still schedule more. We need a test to confirm 
this, and a fix if there is indeed an issue.
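
To make the suspected behaviour concrete, here is a minimal, self-contained 
sketch of the control flow above. It is not the FairScheduler code: Node, 
Request, nodeUpdate() and the memory-only capacity model are hypothetical 
stand-ins, assumed purely for illustration. A real test would have to drive 
the scheduler itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NodeUpdateSketch {

    /** Hypothetical container request: a name and a memory demand. */
    static final class Request {
        final String name;
        final int memory;

        Request(String name, int memory) {
            this.name = name;
            this.memory = memory;
        }
    }

    /** Hypothetical stand-in for a scheduler node (not the real YARN class). */
    static final class Node {
        int freeMemory;
        Request reserved; // null when nothing is reserved on this node

        Node(int freeMemory) {
            this.freeMemory = freeMemory;
        }
    }

    /**
     * One NODE_UPDATE pass shaped like the snippet above: serve an existing
     * reservation first; otherwise allocate until a reservation appears.
     * Returns a log of the actions taken in this pass.
     */
    static List<String> nodeUpdate(Node node, Deque<Request> pending) {
        List<String> log = new ArrayList<>();
        Request reserved = node.reserved;
        if (reserved != null) {
            // Reserved branch: allocate only if the reservation now fits.
            if (reserved.memory <= node.freeMemory) {
                node.freeMemory -= reserved.memory;
                node.reserved = null;
                log.add("allocated reserved " + reserved.name);
            }
        } else {
            // Fair-share branch: the loop ends as soon as something is reserved,
            // even if later requests would still fit on the node.
            while (node.reserved == null && !pending.isEmpty()) {
                Request next = pending.poll();
                if (next.memory <= node.freeMemory) {
                    node.freeMemory -= next.memory;
                    log.add("allocated " + next.name);
                } else {
                    node.reserved = next;
                    log.add("reserved " + next.name);
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Node node = new Node(3);
        Deque<Request> pending = new ArrayDeque<>();
        pending.add(new Request("A", 4)); // does not fit, so it gets reserved
        pending.add(new Request("B", 1)); // would fit, but the loop has exited

        System.out.println("first update:  " + nodeUpdate(node, pending));
        node.freeMemory = 4; // assume a container finished between events
        System.out.println("second update: " + nodeUpdate(node, pending));
    }
}
```

In this toy model, the first pass reserves A and stops, leaving B pending 
although it fits; the reservation is only served by a later pass. A test 
against the actual scheduler would need to check whether the same happens 
there.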
                
> FS could wait until next NODE_UPDATE event to schedule a reserved container
> ---------------------------------------------------------------------------
>
>                 Key: YARN-454
>                 URL: https://issues.apache.org/jira/browse/YARN-454
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> FS#nodeUpdate() allocates reserved containers first. However, it seems (from 
> code observation): if an app reserves a container on a node while FS is 
> scheduling a task on that node from the non-reserved pool, the request is 
> skipped in that NODE_UPDATE event. It is addressed on the next event.
