[
https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923897#comment-13923897
]
Thomas Graves commented on YARN-1127:
-------------------------------------
The capacity scheduler should have eventually looked at the second node even
with the first one being reserved. There is a formula for this where it is
biased against really large requests. What were your minimum allocation size
and your maximum allocation size? Can you still reproduce this on 2.3.0 or
newer?
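
For reference, the check is roughly the sketch below, paraphrased from
LeafQueue's shouldAllocOrReserveNewContainer in the 2.x code. Method names,
fields, and the exact arithmetic here are from memory, so treat this as
illustrative rather than the authoritative implementation:

{code:java}
// Illustrative sketch of the re-reservation bias check, paraphrased from
// CapacityScheduler's LeafQueue (2.x line). Names and arithmetic are
// approximate, not the exact shipped code.
public class ReservationBiasSketch {

  // How close the request is to a full node; 1.0 means "needs the whole node".
  static float nodeFactor(int requiredMB, int maxAllocationMB) {
    return (float) requiredMB / maxAllocationMB;
  }

  // (max - min) / max, used to cap nodeFactor so that even a whole-node
  // request can eventually stop re-reserving and look elsewhere.
  static float minAllocationFactor(int minAllocationMB, int maxAllocationMB) {
    return (float) (maxAllocationMB - minAllocationMB) / maxAllocationMB;
  }

  // Decide whether to allocate or reserve a *new* container rather than
  // keep re-reserving: large requests (high nodeFactor) need more
  // re-reservations before the scheduler will consider another node.
  static boolean shouldAllocOrReserve(int reReservations, int reservedContainers,
      int requiredContainers, int requiredMB, int minMB, int maxMB) {
    int starvation = 0;
    if (reservedContainers > 0) {
      float factor = Math.min(nodeFactor(requiredMB, maxMB),
          minAllocationFactor(minMB, maxMB));
      starvation = (int) ((reReservations / (float) reservedContainers)
          * (1.0f - factor));
    }
    return (starvation + requiredContainers) - reservedContainers > 0;
  }

  public static void main(String[] args) {
    // E.g. with min=1024 MB and max=2048 MB, a 2048 MB request has
    // factor 0.5, so it takes two re-reservations before the scheduler
    // is willing to try another node.
    for (int reRes = 0; reRes <= 4; reRes++) {
      System.out.println("re-reservations=" + reRes + " -> "
          + shouldAllocOrReserve(reRes, 1, 1, 2048, 1024, 2048));
    }
  }
}
{code}

The takeaway is that the closer a request is to the maximum allocation, the
longer the scheduler sits on its reservation before looking at other nodes,
which is why the min/max allocation sizes matter for this report.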
Also note that this should be superseded by
https://issues.apache.org/jira/browse/YARN-1769, which makes it so that
reservations will continue to look at other heartbeating nodes.
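
If I remember correctly, that behavior is guarded by a capacity-scheduler
switch roughly like the one below; check the exact property name against the
YARN-1769 patch:

{code:xml}
<!-- capacity-scheduler.xml: let the scheduler keep looking at other
     heartbeating nodes while an application holds a reservation.
     Property name quoted from memory; verify against YARN-1769. -->
<property>
  <name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
  <value>true</value>
</property>
{code}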
> reservation exchange and excess reservation is not working for capacity
> scheduler
> ---------------------------------------------------------------------------------
>
> Key: YARN-1127
> URL: https://issues.apache.org/jira/browse/YARN-1127
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.1-beta
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Priority: Blocker
>
> I have 2 node managers:
> * one with 1024 MB of memory (nm1)
> * a second with 2048 MB of memory (nm2)
> I am submitting a simple map-reduce application with 1 mapper and 1 reducer
> of 1024 MB each. The steps to reproduce this are:
> * Stop nm2, the node with 2048 MB of memory. (I am doing this to make sure
> that its heartbeat doesn't reach the RM first.)
> * Now submit the application. As soon as the RM receives the first node's
> (nm1) heartbeat, it will try to reserve memory for the AM container
> (2048 MB). However, nm1 has only 1024 MB of memory.
> * Now start nm2 with its 2048 MB of memory.
> It hangs forever... This exposes two potential issues:
> * Say 2048 MB is reserved on nm1, but nm2 comes back with 2048 MB of
> available memory. In this case, if the original request was made without
> any locality, the scheduler should unreserve the memory on nm1 and allocate
> the requested 2048 MB container on nm2.
> * We support a notion where, say, we have 5 nodes and 4 AMs; all node
> managers have 8 GB each and each AM takes 2 GB, while each AM is requesting
> 8 GB. Now, to avoid deadlock, the AM will make an excess reservation. By
> doing this we would never hit the deadlock situation.
--
This message was sent by Atlassian JIRA
(v6.2#6252)