[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4148:
-----------------------------
    Attachment: YARN-4148.002.patch

Sorry for the delay.  I rebased the patch on trunk and added a unit test.

We've been running with this patch on our production clusters for quite some 
time now, and it works well for us.  It simply tracks what the node has 
reported as running and does not allow the space on the node to be freed up 
until the node has reported the container as completed.  It _does_ free up the 
space in the scheduler queue sense, just not on that specific node.  Therefore, 
if there is sufficient space elsewhere in the cluster for containers, the user 
limit won't artificially slow down allocation.
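
As a rough illustration of the approach, here is a minimal sketch in the spirit 
of the patch.  The class and method names are made up for the example (this is 
not the real SchedulerNode/RMContainer code): allocation still charges the node 
immediately, but a release only frees the node's space right away if the NM 
never reported the container as running; otherwise the space stays held until 
the NM reports the container completed.

{code:java}
// Minimal sketch only -- illustrative names, not YARN's real SchedulerNode API.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NodeTracker {
  // Containers currently holding space on this node, with their size in
  // abstract "units" (GPUs, MB of memory, ...).
  private final Map<String, Integer> holdingSpace = new HashMap<>();
  // Containers the NM has reported as running on this node.
  private final Set<String> reportedRunning = new HashSet<>();
  private int availableUnits;

  NodeTracker(int capacityUnits) {
    this.availableUnits = capacityUnits;
  }

  // RM allocates a container on this node: the space is taken right away.
  void onAllocated(String containerId, int units) {
    holdingSpace.put(containerId, units);
    availableUnits -= units;
  }

  // NM heartbeat: the container is running on this node.
  void onNodeReportedRunning(String containerId) {
    reportedRunning.add(containerId);
  }

  // Scheduler releases the container (app killed, preempted, ...).  The
  // queue-level accounting can be credited immediately elsewhere; here we only
  // decide whether the node's own space can be reused yet.
  void onSchedulerReleased(String containerId) {
    if (reportedRunning.contains(containerId)) {
      return; // node still holds it; wait for the completed report
    }
    // Never seen running on this node, so free immediately.  This branch is
    // also where the remaining ACQUIRED-state race lives.
    Integer units = holdingSpace.remove(containerId);
    if (units != null) {
      availableUnits += units;
    }
  }

  // NM heartbeat: the container has actually completed on this node.
  void onNodeReportedCompleted(String containerId) {
    reportedRunning.remove(containerId);
    Integer units = holdingSpace.remove(containerId);
    if (units != null) {
      availableUnits += units; // only now is the node's space reusable
    }
  }

  int getAvailableUnits() {
    return availableUnits;
  }
}
{code}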

This patch does not address the race condition discussed above, so the RM can 
still over-allocate a node if a container is released by the RM while it is in 
the ACQUIRED state.  The node may be running the container but has not yet 
heartbeated to the RM to report it, so we immediately free the space on the 
node since we never saw the container running there.  In practice this hasn't 
been a significant problem for us, and the patch fixes the most common case 
where over-allocation occurs (i.e., a container has been running for a while, 
is then released by the RM, and its space is quickly re-allocated to something 
else).
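
To make the remaining race concrete, here is an illustrative replay against the 
sketch above, using the 6-GPU numbers from the original report (the container 
ids and sizes are made up):

{code:java}
// Illustrative replay of the ACQUIRED-state race against the NodeTracker
// sketch above; the ids and the 6-GPU sizing are just the numbers from the
// original report, not real cluster state.
public class AcquiredRaceDemo {
  public static void main(String[] args) {
    NodeTracker node = new NodeTracker(6);

    // App A is allocated all 6 GPUs on the node.
    node.onAllocated("container_a", 6);

    // The NM may already be launching the container, but its "running"
    // heartbeat has not reached the RM yet (still ACQUIRED on the RM side).

    // App A is killed: the RM never saw the container running on this node,
    // so the node's 6 units are freed immediately.
    node.onSchedulerReleased("container_a");
    System.out.println("available after release: " + node.getAvailableUnits()); // 6

    // The RM can now hand those 6 units to app B...
    node.onAllocated("container_b", 6);

    // ...and only then does the NM heartbeat in with A's container running:
    // the node is physically over-committed until A's container finishes.
    node.onNodeReportedRunning("container_a");
    System.out.println("available after B's allocation: " + node.getAvailableUnits()); // 0
  }
}
{code}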


> When killing app, RM releases app's resource before they are released by NM
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4148
>                 URL: https://issues.apache.org/jira/browse/YARN-4148
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jun Gong
>            Assignee: Jason Lowe
>         Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.wip.patch, free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible and might then allocate those resources to new requests, but the NM 
> has not released them yet at that point.
> The problem was found when we added support for GPU as a resource (YARN-4122).  
> Test environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was 
> requesting 3 GPUs. App A was killed, the RM released A's 6 GPUs and allocated 
> 3 GPUs to B, but when B tried to start its container on the NM, the NM found 
> it did not have 3 GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause an OOM when 
> memory is over-committed.



