[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-7591:
---------------------------
    Attachment: YARN-7591.002.patch

Thanks [~leftnoteasy] for your suggestion.
Attaching the v2 patch with comments for the #3 check. Please help review it again.

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-7591.001.patch, YARN-7591.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, an NPE may be 
> raised in the special scenarios below.
> (1) A user is removed after its last application finishes, so an NPE may 
> be raised if async-scheduling threads read from the user object without a 
> null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished 
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
>     RMContainer reservedContainer = node.getReservedContainer();
>     if (reservedContainer != null) {
>       FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>           reservedContainer.getContainerId());
>       // NPE here: reservedApplication could be null after this application
>       // finished
>       // Try to fulfill the reservation
>       LOG.info(
>           "Trying to fulfill reservation for application "
>               + reservedApplication.getApplicationId()
>               + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve 
> containerY on node1) are generated by different async-scheduling threads 
> at around the same time and proposal2 is submitted ahead of proposal1, an 
> NPE is raised when trying to submit proposal2 in 
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
>     if (reservedContainerOnNode != null) {
>       // NPE here: allocation.getAllocateFromReservedContainer() should be
>       // null for proposal2 in this case
>       RMContainer fromReservedContainer =
>           allocation.getAllocateFromReservedContainer().getRmContainer();
>       if (fromReservedContainer != reservedContainerOnNode) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug(
>               "Try to allocate from a non-existed reserved container");
>         }
>         return false;
>       }
>     }
> {code}
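
For illustration only, scenarios (2) and (3) above could be guarded roughly as in the sketch below. This is not the content of the attached patch. The class NullGuardSketch and its nested types (App, RmContainer, ReservedRef, Proposal) are hypothetical stand-ins for the real scheduler classes, and skipping or rejecting on null is an assumed policy rather than the confirmed fix.

```java
// Hypothetical stand-ins for the scheduler types; these are NOT the real YARN classes.
final class NullGuardSketch {

  /** Stand-in for the application attempt looked up from a reserved container. */
  static final class App {
    final String applicationId;
    App(String applicationId) { this.applicationId = applicationId; }
  }

  /** Stand-in for a container known to the resource manager. */
  static final class RmContainer {
    final String id;
    RmContainer(String id) { this.id = id; }
  }

  /** Stand-in for the object returned by getAllocateFromReservedContainer(). */
  static final class ReservedRef {
    final RmContainer rmContainer;
    ReservedRef(RmContainer rmContainer) { this.rmContainer = rmContainer; }
  }

  /** Stand-in for a proposal; allocateFromReserved is null for a plain reserve proposal. */
  static final class Proposal {
    final ReservedRef allocateFromReserved;
    Proposal(ReservedRef allocateFromReserved) {
      this.allocateFromReserved = allocateFromReserved;
    }
  }

  // Scenario (2): the application may already have finished, so the lookup
  // result can be null. Assumed policy: skip instead of dereferencing.
  static String fulfillMessage(App reservedApp, String nodeId) {
    if (reservedApp == null) {
      return "Skipping reservation on node " + nodeId
          + ": application already finished";
    }
    return "Trying to fulfill reservation for application "
        + reservedApp.applicationId + " on node: " + nodeId;
  }

  // Scenario (3): the node already holds a reservation, but the proposal may
  // not come from any reserved container. Assumed policy: reject the proposal.
  static boolean commonCheck(RmContainer reservedOnNode, Proposal proposal) {
    if (reservedOnNode != null) {
      ReservedRef ref = proposal.allocateFromReserved;
      if (ref == null) {
        return false; // would have been the NPE site without this guard
      }
      if (ref.rmContainer != reservedOnNode) {
        return false; // allocating from a different reserved container
      }
    }
    return true;
  }

  public static void main(String[] args) {
    RmContainer containerY = new RmContainer("containerY");
    System.out.println(fulfillMessage(null, "node1"));
    System.out.println(commonCheck(containerY, new Proposal(null)));
    System.out.println(commonCheck(containerY, new Proposal(new ReservedRef(containerY))));
  }
}
```

In this sketch, a proposal that carries no reserved-container reference is rejected when the node already holds a reservation, and a reservation whose application has finished is skipped with a message instead of being dereferenced.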



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
