[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

Jason Lowe (JIRA) Mon, 09 Jan 2017 14:47:30 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813092#comment-15813092
 ]


Jason Lowe commented on YARN-4148:
----------------------------------

The unit test failures appear to be unrelated.  They pass for me locally with 
the patch applied, and there are JIRAs that are tracking those failures.  The 
TestDelegationTokenRenewer failure is being tracked by YARN-5816 and the 
TestRMRestart failure is tracked by YARN-5548.

Thanks for the review, [~djp]!  If you agree the failures are unrelated then 
feel free to commit, or I'll do so in a few days unless I hear otherwise.

> When killing app, RM releases app's resource before they are released by NM
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4148
>                 URL: https://issues.apache.org/jira/browse/YARN-4148
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jun Gong
>            Assignee: Jason Lowe
>         Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.003.patch, YARN-4148.wip.patch, 
> free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible and may then allocate those resources to new requests, even though 
> the NM has not yet released them.
> The problem was found when we added support for GPU as a resource 
> (YARN-4122).  Test environment: an NM had 6 GPUs, app A used all 6 GPUs, and 
> app B was requesting 3 GPUs. When app A was killed, the RM released A's 6 
> GPUs and allocated 3 GPUs to B. But when B tried to start a container on the 
> NM, the NM found it did not have 3 GPUs to allocate because it had not yet 
> released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when 
> memory is overused.
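
The ordering problem in the description above can be reproduced with a small toy model. This is only an illustrative sketch: ToyNode, ToyScheduler, and simulate are hypothetical names, not YARN classes, and the "wait for the node" ordering is just one way to avoid the race, not a description of what the attached patches do.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for an NM: tracks GPUs physically in use."""
    total_gpus: int
    in_use: dict = field(default_factory=dict)  # container id -> GPUs held

    def free_gpus(self) -> int:
        return self.total_gpus - sum(self.in_use.values())

    def start_container(self, cid: str, gpus: int) -> bool:
        # The node refuses to start a container it cannot actually back.
        if gpus > self.free_gpus():
            return False
        self.in_use[cid] = gpus
        return True

    def finish_cleanup(self, cid: str) -> None:
        # Resources are only really free once the node has cleaned up.
        self.in_use.pop(cid, None)


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the RM scheduler's view of one node."""
    total_gpus: int
    allocated: dict = field(default_factory=dict)  # container id -> GPUs

    def available(self) -> int:
        return self.total_gpus - sum(self.allocated.values())

    def allocate(self, cid: str, gpus: int) -> bool:
        if gpus > self.available():
            return False
        self.allocated[cid] = gpus
        return True

    def release(self, cid: str) -> None:
        self.allocated.pop(cid, None)


def simulate(wait_for_node: bool) -> bool:
    """Return True if app B's container starts after app A is killed."""
    node = ToyNode(total_gpus=6)
    sched = ToyScheduler(total_gpus=6)

    # App A holds all 6 GPUs in both the scheduler's and the node's view.
    sched.allocate("A", 6)
    node.start_container("A", 6)

    # App A is killed.
    if wait_for_node:
        # Scheduler releases only after the node has freed the GPUs.
        node.finish_cleanup("A")
        sched.release("A")
    else:
        # Scheduler releases immediately; the node still holds A's GPUs.
        sched.release("A")

    # App B asks for 3 GPUs and tries to start on the node.
    granted = sched.allocate("B", 3)
    return granted and node.start_container("B", 3)
```

With simulate(wait_for_node=False) the scheduler grants B's request but the node rejects the container, which matches the reported GPU scenario. With simulate(wait_for_node=True) the two views stay consistent and B starts.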



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
