Omkar Vinit Joshi created YARN-1127:
---------------------------------------

             Summary: reservation exchange and excess reservation is not 
working for capacity scheduler
                 Key: YARN-1127
                 URL: https://issues.apache.org/jira/browse/YARN-1127
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.1.1-beta
            Reporter: Omkar Vinit Joshi
            Assignee: Omkar Vinit Joshi
            Priority: Blocker


I have two node managers:
* nm1, with 1024 MB of memory.
* nm2, with 2048 MB of memory.
I am submitting a simple MapReduce application with one mapper and one reducer, 
1024 MB each. The steps to reproduce this are:
* Stop nm2 (2048 MB). I do this to make sure that this node's heartbeat does 
not reach the RM first.
* Submit the application. As soon as the RM receives the heartbeat from the 
first node (nm1), it tries to reserve memory for the AM container (2048 MB) on 
that node. However, nm1 has only 1024 MB of memory.
* Start nm2 (2048 MB) again.

The application then hangs forever. There are two potential issues here:

* Say 2048 MB is reserved on nm1, but nm2 comes back with 2048 MB of available 
memory. In this case, if the original request was made without any locality 
constraint, the scheduler should unreserve the memory on nm1 and allocate the 
requested 2048 MB container on nm2.
* We support a notion of excess reservation. Say we have 5 nodes and 4 AMs, 
where every node manager has 8 GB and every AM occupies 2 GB, and each AM is 
requesting an 8 GB container. To avoid deadlock, an AM will make an extra 
reservation. By doing this we should never hit the deadlock situation.
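
To illustrate the first point, here is a minimal toy model of the expected 
"reservation exchange" behaviour. It is not the CapacityScheduler code: the 
class ReservationExchangeSketch, its Node and Request types, and the 
onHeartbeat method are all hypothetical names invented for this sketch, and it 
assumes a single pending request with no locality constraint.

```java
public class ReservationExchangeSketch {

    /** Hypothetical stand-in for a node manager as seen by the scheduler. */
    static final class Node {
        final String name;
        final int totalMb;
        int usedMb = 0;
        int reservedMb = 0; // memory reserved (not yet allocated) on this node

        Node(String name, int totalMb) {
            this.name = name;
            this.totalMb = totalMb;
        }

        int availableMb() {
            return totalMb - usedMb;
        }
    }

    /** Hypothetical pending container request without locality. */
    static final class Request {
        final int memoryMb;
        Node reservedOn;  // node currently holding the reservation, or null
        Node allocatedOn; // node where the container was placed, or null

        Request(int memoryMb) {
            this.memoryMb = memoryMb;
        }
    }

    /** Handles one node heartbeat for the single pending request. */
    static String onHeartbeat(Node node, Request req) {
        if (req.allocatedOn != null) {
            return "already allocated";
        }
        if (node.availableMb() >= req.memoryMb) {
            // Reservation exchange: drop any reservation held elsewhere
            // before allocating on the node that can actually fit the request.
            if (req.reservedOn != null) {
                req.reservedOn.reservedMb = 0;
                req.reservedOn = null;
            }
            node.usedMb += req.memoryMb;
            req.allocatedOn = node;
            return "allocated on " + node.name;
        }
        if (req.reservedOn == null) {
            // Not enough room here and nothing reserved yet: reserve.
            node.reservedMb = req.memoryMb;
            req.reservedOn = node;
            return "reserved on " + node.name;
        }
        return "waiting";
    }

    public static void main(String[] args) {
        Node nm1 = new Node("nm1", 1024);
        Node nm2 = new Node("nm2", 2048);
        Request am = new Request(2048); // AM container from the report

        System.out.println(onHeartbeat(nm1, am)); // nm1 heartbeats first
        System.out.println(onHeartbeat(nm2, am)); // nm2 is started later
    }
}
```

In this model the first heartbeat reserves on nm1 and the second one releases 
that reservation and allocates on nm2, which is the behaviour the report 
expects instead of the hang.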
