[
https://issues.apache.org/jira/browse/YARN-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595500#comment-13595500
]
Karthik Kambatla commented on YARN-454:
---------------------------------------
The code potentially causing that issue is:
{code}
RMContainer reservedContainer = node.getReservedContainer();
if (reservedContainer != null) {
  // schedule for reservedContainer
} else {
  // Otherwise, schedule at queue which is furthest below fair share
  while (node.getReservedContainer() == null) {
    // allocate based on fairshare
  }
}
{code}
It looks like the fair-share loop exits as soon as a container is reserved on
the node, even though more could still be scheduled. We need a test to confirm
this, and a fix if there is indeed an issue.
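To make the suspected behavior concrete, here is a minimal, self-contained toy model of one NODE_UPDATE pass. It is only a sketch, not the FairScheduler code: the Node class, its free and reserved fields, the pending queue of request sizes, and the nodeUpdate() helper are all hypothetical names invented for illustration. It assumes a request that does not fit on the node gets reserved there, which is what ends the loop.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class NodeUpdateSketch {
    /** Hypothetical stand-in for a scheduler node; not the real YARN class. */
    static class Node {
        int free;          // free capacity, in arbitrary units
        Integer reserved;  // size of the reserved request, or null if none
        Node(int free) { this.free = free; }
    }

    /**
     * One simulated NODE_UPDATE pass with the same branch structure as the
     * snippet above. Returns the number of containers allocated in this pass.
     */
    static int nodeUpdate(Node node, Deque<Integer> pending) {
        int allocated = 0;
        if (node.reserved != null) {
            // Reserved branch: only the reserved request is considered.
            if (node.reserved <= node.free) {
                node.free -= node.reserved;
                node.reserved = null;
                allocated++;
            }
        } else {
            // Non-reserved branch: stops as soon as a reservation appears.
            while (node.reserved == null && !pending.isEmpty()) {
                int request = pending.poll();
                if (request <= node.free) {
                    node.free -= request;
                    allocated++;
                } else {
                    node.reserved = request; // assumption: no fit means reserve
                }
            }
        }
        return allocated;
    }

    public static void main(String[] args) {
        Node node = new Node(4);
        Deque<Integer> pending = new ArrayDeque<>(Arrays.asList(2, 3, 1));

        int first = nodeUpdate(node, pending);
        System.out.println("update 1: allocated=" + first
            + " reserved=" + node.reserved + " stillPending=" + pending);

        node.free += 2; // pretend a running container finished
        int second = nodeUpdate(node, pending);
        System.out.println("update 2: allocated=" + second
            + " reserved=" + node.reserved + " stillPending=" + pending);
    }
}
```

In this toy run, the first pass allocates the request of size 2, reserves the one of size 3, and then leaves the loop with the size-1 request still pending even though it would fit. The reservation is only looked at again on the following pass, which is the shape of the behavior a real test would need to confirm or rule out.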
> FS could wait until next NODE_UPDATE event to schedule a reserved container
> ---------------------------------------------------------------------------
>
> Key: YARN-454
> URL: https://issues.apache.org/jira/browse/YARN-454
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 2.0.3-alpha
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
>
> FS#nodeUpdate() allocates reserved containers first. However, it seems (from
> code observation): if an app reserves a container on a node while FS is
> scheduling a task on that node from the non-reserved pool, the request is
> skipped in that NODE_UPDATE event. It is addressed on the next event.