[ https://issues.apache.org/jira/browse/YARN-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718504#comment-13718504 ]
Omkar Vinit Joshi commented on YARN-389:
----------------------------------------
[~zjshen] [~bikassaha] I think we should reject problematic requests at the
allocate call, not after they have been accepted; rejecting them later would
be a problem.
* In the allocate call today we only reject a request if it asks for more
than the cluster has; we don't validate how much a single container needs in
order to run. I think we should add that check:
SchedulerUtils#validateResourceRequest(). Thoughts?
* We cannot reject requests once they are accepted. How would the AM come to
know which requests were rejected later? Is there any way to inform the AM
about requests that were accepted earlier but are now rejected? One more
thing to consider: a NodeManager with a large amount of resources may go
down and come back within a short span (a node reconnect, or a node removed
and added back after a very short time). In either case we should not reject
a request that was already accepted. Large jobs will definitely suffer if a
few nodes restart within a short span. Thoughts?
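To make the two points above concrete, here is a rough sketch of the
behaviour I have in mind. It is not the real scheduler code and does not
reproduce SchedulerUtils#validateResourceRequest(); every name in it
(AllocateValidationSketch, Request, maxContainerMb, onNodeRemoved) is
hypothetical, and the per-container ceiling is assumed to be supplied by
configuration. Memory is the only resource modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these names are YARN classes.
class AllocateValidationSketch {

    /** Memory-only stand-in for a resource request. */
    static final class Request {
        final int memoryMb;      // memory needed by ONE container
        final int numContainers; // how many such containers are wanted

        Request(int memoryMb, int numContainers) {
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // Assumed ceiling on the size of a single container.
    private final int maxContainerMb;
    // Requests that passed validation and are waiting for containers.
    private final List<Request> pending = new ArrayList<>();

    AllocateValidationSketch(int maxContainerMb) {
        this.maxContainerMb = maxContainerMb;
    }

    /** Validate at allocate time, before the request is accepted. */
    void allocate(Request req) {
        // Per-container check: a container that can never fit is rejected
        // up front, so the caller sees the failure on this very call.
        if (req.memoryMb <= 0 || req.memoryMb > maxContainerMb) {
            throw new IllegalArgumentException(
                "Invalid container size " + req.memoryMb
                + " MB; allowed range is 1.." + maxContainerMb + " MB");
        }
        pending.add(req);
    }

    /** A node leaving the cluster does not revisit accepted requests. */
    void onNodeRemoved() {
        // Intentionally leaves 'pending' untouched: the node may reconnect
        // shortly, and the AM has no way to learn about a late rejection.
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        AllocateValidationSketch scheduler = new AllocateValidationSketch(1024);
        scheduler.allocate(new Request(512, 2));
        try {
            scheduler.allocate(new Request(2048, 1));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected at allocate: " + e.getMessage());
        }
        scheduler.onNodeRemoved();
        System.out.println("Pending requests: " + scheduler.pendingCount());
    }
}
```

The point of the sketch is only the ordering: the size check happens once,
inside allocate, and nothing after acceptance removes a request.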
> Infinitely assigning containers when the required resource exceeds the
> cluster's absolute capacity
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-389
> URL: https://issues.apache.org/jira/browse/YARN-389
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhijie Shen
> Assignee: Omkar Vinit Joshi
>
> I've run the wordcount example on branch-2 and trunk, with
> yarn.nodemanager.resource.memory-mb set to 1G and
> yarn.app.mapreduce.am.resource.mb set to 1.5G. The ResourceManager therefore
> tries to assign a 2G container for the AM, but the NodeManager doesn't have
> enough memory for it. The problem is that the assignment operation is
> repeated indefinitely when it cannot be accomplished. Logs
> follow.