[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276346#comment-16276346
 ] 

Tao Yang commented on YARN-7591:
--------------------------------

Thanks [~leftnoteasy] for the review.
{quote}
I'm still not sure why commit reserved container before allocate container can 
result in NPE? And could you add such explanations to the code as well so we 
can understand why the additional null check is required.
{quote}
For proposal1 (allocate containerX on node1), the 
allocateFromReservedContainer field of the ContainerAllocationProposal object is 
null because {{csAssignment.getFulfilledReservedContainer()}} returns null in the 
following code segment.
{noformat} 
        allocated = new ContainerAllocationProposal<>(
            getSchedulerContainer(rmContainer, true),
            getSchedulerContainersToRelease(csAssignment),
            // allocateFromReservedContainer is set to null for proposal1 here
            getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
                false), csAssignment.getType(),
            csAssignment.getRequestLocalityType(),
            csAssignment.getSchedulingMode() != null ?
                csAssignment.getSchedulingMode() :
                SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
            csAssignment.getResource());
{noformat}
When proposal1 is committed, {{FiCaSchedulerApp#commonCheckContainerAllocation}} 
checks that the node is not reserved by anyone else. If the node already has a 
reserved container (because of proposal2), the check fetches 
fromReservedContainer through 
{{allocation.getAllocateFromReservedContainer().getRmContainer()}}, and an NPE 
is raised because {{allocation.getAllocateFromReservedContainer()}} is null.
Please correct me if I have misunderstood something.
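
The null check discussed above can be sketched as follows. This is only a minimal illustration, not the actual patch: every class here (RmContainerStub, SchedulerContainerStub, AllocationProposalStub, ReservationCheckSketch) is a hypothetical stand-in for the real YARN types, and it assumes that a plain allocation should simply be rejected when the node is already reserved by another container.

```java
// Hypothetical stand-ins for the YARN types named in this comment; not the real classes.
final class RmContainerStub {
  final String id;

  RmContainerStub(String id) {
    this.id = id;
  }
}

final class SchedulerContainerStub {
  private final RmContainerStub rmContainer;

  SchedulerContainerStub(RmContainerStub rmContainer) {
    this.rmContainer = rmContainer;
  }

  RmContainerStub getRmContainer() {
    return rmContainer;
  }
}

final class AllocationProposalStub {
  // Null for a plain allocation (like proposal1); non-null only when the
  // proposal fulfils an existing reservation.
  private final SchedulerContainerStub allocateFromReservedContainer;

  AllocationProposalStub(SchedulerContainerStub allocateFromReservedContainer) {
    this.allocateFromReservedContainer = allocateFromReservedContainer;
  }

  SchedulerContainerStub getAllocateFromReservedContainer() {
    return allocateFromReservedContainer;
  }
}

final class ReservationCheckSketch {
  // Returns true when the node's reservation state permits committing the proposal.
  static boolean nodeReservationAllowsCommit(
      RmContainerStub reservedContainerOnNode, AllocationProposalStub allocation) {
    if (reservedContainerOnNode == null) {
      return true; // nobody has reserved the node
    }
    SchedulerContainerStub from = allocation.getAllocateFromReservedContainer();
    if (from == null) {
      // Assumption: a plain allocation on a node reserved by someone else is
      // rejected instead of dereferencing null.
      return false;
    }
    // Same identity comparison as the original check.
    return from.getRmContainer() == reservedContainerOnNode;
  }

  public static void main(String[] args) {
    RmContainerStub containerY = new RmContainerStub("containerY");
    AllocationProposalStub proposal1 = new AllocationProposalStub(null);
    System.out.println(nodeReservationAllowsCommit(containerY, proposal1));
  }
}
```

With this guard, committing proposal1 after proposal2 has reserved node1 is refused rather than throwing.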

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-7591.001.patch
>
>
> Currently in async-scheduling mode of CapacityScheduler, an NPE may be raised in 
> the special scenarios below.
> (1) The user is removed after its last application finishes, so an NPE may 
> be raised if async-scheduling threads read from the user object without a 
> null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished 
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
>     RMContainer reservedContainer = node.getReservedContainer();
>     if (reservedContainer != null) {
>       FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>           reservedContainer.getContainerId());
>       // NPE here: reservedApplication could be null after this application finished
>       // Try to fulfill the reservation
>       LOG.info(
>           "Trying to fulfill reservation for application " + reservedApplication
>               .getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve 
> containerY on node1) were generated by different async-scheduling threads 
> around the same time and proposal2 was submitted ahead of proposal1, an NPE 
> is raised when trying to submit proposal1 in 
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
>     if (reservedContainerOnNode != null) {
>       // NPE here: allocation.getAllocateFromReservedContainer() is null for proposal1 in this case
>       RMContainer fromReservedContainer =
>           allocation.getAllocateFromReservedContainer().getRmContainer();
>       if (fromReservedContainer != reservedContainerOnNode) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug(
>               "Try to allocate from a non-existed reserved container");
>         }
>         return false;
>       }
>     }
> {code}
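
Scenario (2) in the description above can be illustrated with a small guard. This is a rough sketch under stated assumptions, not the real CapacityScheduler code: FulfillReservationSketch and its map of live attempts are hypothetical, and the lookup returning null stands in for getCurrentAttemptForContainer returning null after the application has finished.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map models "container id -> application id" for live attempts.
final class FulfillReservationSketch {
  private final Map<String, String> liveAttempts = new HashMap<>();

  void register(String containerId, String applicationId) {
    liveAttempts.put(containerId, applicationId);
  }

  // Models an application finishing: its attempt is no longer resolvable.
  void finish(String containerId) {
    liveAttempts.remove(containerId);
  }

  // Returns the log message that would be emitted, or null when the
  // reservation is skipped because its application has already finished.
  String tryFulfillReservation(String reservedContainerId, String nodeId) {
    if (reservedContainerId == null) {
      return null; // nothing reserved on this node
    }
    String reservedApplication = liveAttempts.get(reservedContainerId);
    if (reservedApplication == null) {
      // Guard against the NPE: the application finished between the
      // reservation and this scheduling pass.
      return null;
    }
    return "Trying to fulfill reservation for application "
        + reservedApplication + " on node: " + nodeId;
  }
}
```

The point of the sketch is only the order of operations: resolve the application first, check it for null, and touch it afterwards.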



