[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akira Ajisaka updated YARN-8233:
--------------------------------
Fix Version/s: 3.2.1, 3.3.0
> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal
> whose allocatedOrReservedContainer is null
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.patch, YARN-8233.002.patch,
> YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} on
> an allocate/reserve proposal: allocatedOrReservedContainer was null, so an
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal<FiCaSchedulerApp, FiCaSchedulerNode> c =
>       request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>       c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>           .getApplicationAttemptId(); // NPE happens here
> } else { ...
> {code}
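A minimal, self-contained sketch of the failure mode above. {{NpeRepro}}, {{Proposal}}, {{SchedulerContainer}}, {{App}} and {{AttemptId}} are hypothetical stand-ins, not the real Hadoop types; they only mirror the shape of the call chain, which throws as soon as the proposal's container is null.

```java
// Hypothetical, heavily simplified stand-ins for the scheduler types in the
// snippet above; these are NOT the real Hadoop classes or signatures.
final class NpeRepro {
  static final class AttemptId {
    final String id;
    AttemptId(String id) { this.id = id; }
  }

  static final class App {
    final AttemptId attemptId;
    App(AttemptId attemptId) { this.attemptId = attemptId; }
    AttemptId getApplicationAttemptId() { return attemptId; }
  }

  static final class SchedulerContainer {
    final App app;
    SchedulerContainer(App app) { this.app = app; }
    App getSchedulerApplicationAttempt() { return app; }
  }

  static final class Proposal {
    // May be null when the proposal was built for a lost node or finished app.
    final SchedulerContainer allocatedOrReserved;
    Proposal(SchedulerContainer c) { this.allocatedOrReserved = c; }
    SchedulerContainer getAllocatedOrReservedContainer() { return allocatedOrReserved; }
  }

  // Mirrors the unguarded chain in tryCommit: throws NullPointerException
  // when getAllocatedOrReservedContainer() returns null.
  static AttemptId attemptIdUnguarded(Proposal c) {
    return c.getAllocatedOrReservedContainer()
        .getSchedulerApplicationAttempt()
        .getApplicationAttemptId();
  }

  // Returns true if the unguarded lookup throws an NPE for this proposal.
  static boolean throwsNpe(Proposal c) {
    try {
      attemptIdUnguarded(c);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```

With a fully populated proposal the lookup succeeds; {{attemptIdUnguarded(new Proposal(null))}} fails at the first link of the chain, as in the report.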
> The proposal was constructed in
> {{CapacityScheduler#createResourceCommitRequest}}, and
> allocatedOrReservedContainer can be null during asynchronous scheduling
> when the node has been lost or the application has finished (details in
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
> // Allocated something
> List<AssignmentInformation.AssignmentDetails> allocations =
>     csAssignment.getAssignmentInformation().getAllocationDetails();
> if (!allocations.isEmpty()) {
>   RMContainer rmContainer = allocations.get(0).rmContainer;
>   allocated = new ContainerAllocationProposal<>(
>       getSchedulerContainer(rmContainer, true), // possibly null
>       getSchedulerContainersToRelease(csAssignment),
>       getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
>           false), csAssignment.getType(),
>       csAssignment.getRequestLocalityType(),
>       csAssignment.getSchedulingMode() != null ?
>           csAssignment.getSchedulingMode() :
>           SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
>       csAssignment.getResource());
> }
> {code}
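A hypothetical sketch of why the lookup can yield null: the scheduler's view of live applications and nodes can change between the moment an asynchronous allocation is computed and the moment its proposal is built. The class, the two sets and the stub types below are illustrative assumptions modeled on the description above, not the real {{CapacityScheduler#getSchedulerContainer}}.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the scheduler's live state; NOT the real YARN classes.
final class SchedulerContainerLookup {
  // Live application attempts and nodes. Entries are removed when an
  // application finishes or a node is lost, which can happen after an
  // asynchronous allocation has already been computed.
  final Set<String> liveAppAttempts = new HashSet<>();
  final Set<String> liveNodes = new HashSet<>();

  // Stand-in for RMContainer: only the fields the lookup needs.
  static final class RMContainerStub {
    final String appAttemptId;
    final String nodeId;

    RMContainerStub(String appAttemptId, String nodeId) {
      this.appAttemptId = appAttemptId;
      this.nodeId = nodeId;
    }
  }

  // Stand-in for SchedulerContainer: a container resolved against live state.
  static final class SchedulerContainerStub {
    final String appAttemptId;
    final String nodeId;
    final boolean allocated;

    SchedulerContainerStub(String appAttemptId, String nodeId, boolean allocated) {
      this.appAttemptId = appAttemptId;
      this.nodeId = nodeId;
      this.allocated = allocated;
    }
  }

  // Returns null for a null input, a finished app, or a lost node; any caller
  // that builds a proposal from the result must handle the null case.
  SchedulerContainerStub getSchedulerContainer(RMContainerStub rmContainer, boolean allocated) {
    if (rmContainer == null) {
      return null;
    }
    if (!liveAppAttempts.contains(rmContainer.appAttemptId)) {
      return null; // application already finished
    }
    if (!liveNodes.contains(rmContainer.nodeId)) {
      return null; // node already lost
    }
    return new SchedulerContainerStub(rmContainer.appAttemptId, rmContainer.nodeId, allocated);
  }
}
```

Passing such a null straight into the proposal constructor, as the reference code does, is what later surfaces as the NPE in {{tryCommit}}.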
> I think we should add a null check for allocatedOrReservedContainer before
> creating allocate/reserve proposals. Besides, the allocation process has
> already increased the app's unconfirmed resource when creating an allocate
> assignment, so if the check finds null, we should decrease the unconfirmed
> resource of the live app.
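A minimal sketch of the proposed fix under simplifying assumptions: resources are plain {{long}} values (e.g. MB), per-app unconfirmed resource is a map keyed by a hypothetical attempt-id string, and a null {{resolvedContainerId}} stands in for {{getSchedulerContainer}} returning null. None of these names are the real YARN API. When the container cannot be resolved, no proposal is created and the unconfirmed resource added for the assignment is rolled back, but only if the app is still live.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "null check + roll back unconfirmed resource";
// names and types are illustrative assumptions, not the real YARN API.
final class GuardedProposalBuilder {
  // Unconfirmed resource per live app attempt. The allocation step has
  // already added the assignment's resource here before a proposal is built.
  final Map<String, Long> unconfirmedByApp = new HashMap<>();

  static final class Proposal {
    final String appAttemptId;
    final String containerId;
    final long resource;

    Proposal(String appAttemptId, String containerId, long resource) {
      this.appAttemptId = appAttemptId;
      this.containerId = containerId;
      this.resource = resource;
    }
  }

  // resolvedContainerId is null when the scheduler container could not be
  // resolved (node lost or application finished). In that case no proposal
  // is created, so a commit step never sees a null container.
  Proposal createAllocateProposal(String appAttemptId, String resolvedContainerId,
      long assignedResource) {
    if (resolvedContainerId == null) {
      // Undo the unconfirmed increase for a live app; computeIfPresent does
      // nothing when the app has already been removed.
      unconfirmedByApp.computeIfPresent(appAttemptId, (id, v) -> v - assignedResource);
      return null;
    }
    return new Proposal(appAttemptId, resolvedContainerId, assignedResource);
  }
}
```

The design choice here is to stop the bad proposal at creation time rather than guarding every consumer of the proposal, which is what the suggestion above calls for.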
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)