[
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284430#comment-16284430
]
Hudson commented on YARN-7591:
------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13350 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/13350/])
YARN-7591. NPE in async-scheduling mode of CapacityScheduler. (Tao Yang
(wangda: rev adca1a72e4eca2ea634551e9fb8e9b878c36cb5c)
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
> Key: YARN-7591
> URL: https://issues.apache.org/jira/browse/YARN-7591
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.0.0-alpha4, 2.9.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-7591.001.patch, YARN-7591.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, an NPE may be
> raised in the following scenarios.
> (1) A user is removed after its last application finishes, so an NPE may be
> raised if an async-scheduling thread reads from the user object without a
> null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
> RMContainer reservedContainer = node.getReservedContainer();
> if (reservedContainer != null) {
>   FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>       reservedContainer.getContainerId());
>   // NPE here: reservedApplication could be null after this application finished
>   // Try to fulfill the reservation
>   LOG.info(
>       "Trying to fulfill reservation for application "
>           + reservedApplication.getApplicationId()
>           + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve
> containerY on node1) are generated by different async-scheduling threads
> at around the same time and proposal2 is submitted before proposal1, an NPE
> is raised when trying to submit proposal2 in
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
> if (reservedContainerOnNode != null) {
>   // NPE here: allocation.getAllocateFromReservedContainer() should be null
>   // for proposal2 in this case
>   RMContainer fromReservedContainer =
>       allocation.getAllocateFromReservedContainer().getRmContainer();
>   if (fromReservedContainer != reservedContainerOnNode) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(
>           "Try to allocate from a non-existed reserved container");
>     }
>     return false;
>   }
> }
> {code}
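
The three scenarios above share one pattern: an async-scheduling thread
dereferences an object that another thread may already have removed. The
sketch below illustrates the corresponding null guards. It is not the
committed patch. AsyncSchedulingNullGuards, App, Container and every method
name in it are hypothetical stand-ins for the real YARN classes, and the
reserved-container wrapper from scenario (3) is collapsed into a plain
Container reference for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins only; none of these are the real YARN classes.
final class AsyncSchedulingNullGuards {

  /** Stand-in for FiCaSchedulerApp. */
  static final class App {
    final String applicationId;
    App(String applicationId) { this.applicationId = applicationId; }
  }

  /** Stand-in for RMContainer. */
  static final class Container {
    final String containerId;
    Container(String containerId) { this.containerId = containerId; }
  }

  // Entries disappear when a user's last application, or an application
  // attempt, finishes on another thread.
  private final Map<String, Integer> activeAppsByUser = new ConcurrentHashMap<>();
  private final Map<String, App> attemptByContainer = new ConcurrentHashMap<>();

  void addUser(String user, int activeApps) { activeAppsByUser.put(user, activeApps); }
  void removeUser(String user) { activeAppsByUser.remove(user); }
  void register(Container c, App app) { attemptByContainer.put(c.containerId, app); }
  void finishApp(Container c) { attemptByContainer.remove(c.containerId); }

  // Scenario (1): the user may already be gone, so fall back to a default
  // instead of dereferencing a null user object.
  int activeAppsOf(String user) {
    Integer count = activeAppsByUser.get(user);
    return count == null ? 0 : count;
  }

  // Scenario (2): the application that owns the reservation may have
  // finished, so check the lookup result before logging or allocating.
  boolean tryFulfillReservation(Container reservedContainer) {
    if (reservedContainer == null) {
      return false;
    }
    App app = attemptByContainer.get(reservedContainer.containerId);
    if (app == null) {
      return false; // application finished; nothing to fulfill
    }
    System.out.println(
        "Trying to fulfill reservation for application " + app.applicationId);
    return true;
  }

  // Scenario (3): a proposal that does not come from a reservation carries
  // no "allocate from reserved container" reference. Reject it when the node
  // already holds a reservation instead of dereferencing null.
  static boolean checkAllocation(Container reservedOnNode, Container allocateFrom) {
    if (reservedOnNode != null) {
      if (allocateFrom == null || allocateFrom != reservedOnNode) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    AsyncSchedulingNullGuards s = new AsyncSchedulingNullGuards();
    Container c = new Container("container_1");
    s.register(c, new App("application_1"));
    System.out.println(s.tryFulfillReservation(c)); // app still running
    s.finishApp(c);
    System.out.println(s.tryFulfillReservation(c)); // app finished
    System.out.println(checkAllocation(c, null));   // node reserved, plain proposal
  }
}
```

In each guard the shared state is read once into a local variable and the
null check is made on that local, so a concurrent removal between the check
and the use cannot reintroduce the NPE.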