[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677562#comment-16677562 ]
Hudson commented on YARN-8233:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15378 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15378/])
YARN-8233. NPE in CapacityScheduler#tryCommit when handling (aajisaka: rev 951c98f89059d64fda8456366f680eff4a7a6785)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAsyncScheduling.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal
> whose allocatedOrReservedContainer is null
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-8233.001.patch, YARN-8233.002.patch,
> YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when it tried to find
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} on
> an allocate/reserve proposal: allocatedOrReservedContainer was null, so the
> call threw an NPE.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal<FiCaSchedulerApp, FiCaSchedulerNode> c =
>       request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>       c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>           .getApplicationAttemptId();  //NPE happens here
> } else { ...
> {code}
> The proposal is constructed in
> {{CapacityScheduler#createResourceCommitRequest}}, and
> allocatedOrReservedContainer can be null during asynchronous scheduling
> when the node was lost or the application finished (see
> {{CapacityScheduler#getSchedulerContainer}} for details).
> Reference code:
> {code:java}
> // Allocated something
> List<AssignmentInformation.AssignmentDetails> allocations =
>     csAssignment.getAssignmentInformation().getAllocationDetails();
> if (!allocations.isEmpty()) {
>   RMContainer rmContainer = allocations.get(0).rmContainer;
>   allocated = new ContainerAllocationProposal<>(
>       getSchedulerContainer(rmContainer, true),  //possibly null
>       getSchedulerContainersToRelease(csAssignment),
>       getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
>           false), csAssignment.getType(),
>       csAssignment.getRequestLocalityType(),
>       csAssignment.getSchedulingMode() != null ?
>           csAssignment.getSchedulingMode() :
>           SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
>       csAssignment.getResource());
> }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before
> creating allocate/reserve proposals. Moreover, the allocation process has
> already increased the app's unconfirmed resource when it created the
> allocate assignment, so if this container is null we should also decrease
> the unconfirmed resource of the app, provided the app is still alive.
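A minimal, self-contained sketch of the fix proposed in the description. It does not use the real Hadoop classes: {{AllocationSketch}}, {{SchedulerContainer}}, {{Proposal}}, {{toSchedulerContainer}}, {{allocate}} and {{attemptIdToCommit}} are hypothetical stand-ins, and "unconfirmed resource" is assumed to be a per-attempt memory counter. It shows the null check before building a proposal, with the unconfirmed resource rolled back for a still-live app, plus an extra defensive guard on the commit side that the report itself does not ask for.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the YARN-8233 race; these are NOT the real
// Hadoop classes. Unconfirmed resource is modeled as memory (MB) per attempt.
final class AllocationSketch {

    /** Stand-in for the scheduler-side container wrapper. */
    static final class SchedulerContainer {
        final String appAttemptId;
        final String nodeId;
        final long memoryMb;

        SchedulerContainer(String appAttemptId, String nodeId, long memoryMb) {
            this.appAttemptId = appAttemptId;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    /** Stand-in for an allocate/reserve proposal. */
    static final class Proposal {
        final SchedulerContainer allocatedOrReserved;

        Proposal(SchedulerContainer allocatedOrReserved) {
            this.allocatedOrReserved = allocatedOrReserved;
        }
    }

    // Live application attempts -> unconfirmed (allocated, not yet committed) memory.
    final Map<String, Long> unconfirmedMb = new HashMap<>();
    // Node id -> whether the node is still registered.
    final Map<String, Boolean> liveNodes = new HashMap<>();

    // Similar in spirit to getSchedulerContainer: null if the node was lost or
    // the application attempt finished while the allocation was in flight.
    SchedulerContainer toSchedulerContainer(String appAttemptId, String nodeId,
                                            long memoryMb) {
        boolean nodeAlive = liveNodes.getOrDefault(nodeId, false);
        boolean appAlive = unconfirmedMb.containsKey(appAttemptId);
        if (!nodeAlive || !appAlive) {
            return null;
        }
        return new SchedulerContainer(appAttemptId, nodeId, memoryMb);
    }

    // Allocation step: the app's unconfirmed resource is increased first, then
    // a proposal is built only if the scheduler container is non-null.
    Proposal allocate(String appAttemptId, String nodeId, long memoryMb) {
        unconfirmedMb.computeIfPresent(appAttemptId, (id, mb) -> mb + memoryMb);
        SchedulerContainer container =
            toSchedulerContainer(appAttemptId, nodeId, memoryMb);
        if (container == null) {
            // Proposed fix: create no proposal and roll back the unconfirmed
            // resource if the app is still alive (no-op if it has finished).
            unconfirmedMb.computeIfPresent(appAttemptId, (id, mb) -> mb - memoryMb);
            return null;
        }
        return new Proposal(container);
    }

    // Extra defensive guard on the commit side (not part of the report's
    // proposal): return null instead of dereferencing a missing container.
    String attemptIdToCommit(Proposal proposal) {
        if (proposal == null || proposal.allocatedOrReserved == null) {
            return null;
        }
        return proposal.allocatedOrReserved.appAttemptId;
    }
}
```

With a live node and a live app, {{allocate}} returns a proposal and the app's unconfirmed memory grows by the requested amount. If the node is marked lost, it returns null and the counter ends up where it started, so nothing is leaked and the commit side never sees a null container.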