Jian He commented on YARN-4000:

bq. In recoverContainersOnNode, we check if application is present in the 
scheduler or not, which will not be there.
Ah, right, I missed this part. Thanks for pointing this out.
bq. we consider them as orphan containers and in the next HB from NM, report 
these containers as the ones to be cleaned up by NM.
Is this the case? I think in the current code, the RM is still ignoring these orphan 

> RM crashes with NPE if leaf queue becomes parent queue during restart
> ---------------------------------------------------------------------
>                 Key: YARN-4000
>                 URL: https://issues.apache.org/jira/browse/YARN-4000
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues, then 
> the RM will crash with a NullPointerException.

