[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284620#comment-16284620 ]

Wangda Tan commented on YARN-7591:
----------------------------------

[~subru], cherry-picked to branch-3.0 / branch-2.9 / branch-2. Updated fix version. Thanks for the reminder!

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.0.0, 2.9.1
>
>         Attachments: YARN-7591.001.patch, YARN-7591.002.patch
>
>
> Currently, in async-scheduling mode of CapacityScheduler, an NPE may be raised in the special scenarios below.
> (1) A user is removed after its last application finishes; an NPE may be raised if async-scheduling threads read fields of the user object without a null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished application in {{CapacityScheduler#allocateContainerOnSingleNode}}:
> {code}
> RMContainer reservedContainer = node.getReservedContainer();
> if (reservedContainer != null) {
>   FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>       reservedContainer.getContainerId());
>   // NPE here: reservedApplication could be null after this application finished
>   // Try to fulfill the reservation
>   LOG.info("Trying to fulfill reservation for application "
>       + reservedApplication.getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on node1) were generated by different async-scheduling threads around the same time and proposal2 was submitted before proposal1, an NPE is raised when trying to submit proposal2 in {{FiCaSchedulerApp#commonCheckContainerAllocation}}:
> {code}
> if (reservedContainerOnNode != null) {
>   // NPE here: allocation.getAllocateFromReservedContainer() should be
>   // null for proposal2 in this case
>   RMContainer fromReservedContainer =
>       allocation.getAllocateFromReservedContainer().getRmContainer();
>   if (fromReservedContainer != reservedContainerOnNode) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Try to allocate from a non-existed reserved container");
>     }
>     return false;
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
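Scenario (2) above reduces to a time-of-check gap: the application can finish between {{node.getReservedContainer()}} returning a container and the attempt lookup. A minimal sketch of the guarded pattern; the types here are simplified stand-ins, not the real FiCaSchedulerApp/CapacityScheduler classes:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the scheduler's container -> application-attempt
// lookup; the real Hadoop classes are far richer.
public class ReservationGuardSketch {
    static final Map<String, String> attemptByContainer = new HashMap<>();

    // Mirrors getCurrentAttemptForContainer(): returns null once the
    // application has finished and its attempt was removed.
    static String currentAttemptForContainer(String containerId) {
        return attemptByContainer.get(containerId);
    }

    // Guarded version of the reservation-fulfilling path: skip the node
    // instead of dereferencing a null application attempt.
    static boolean tryFulfillReservation(String reservedContainerId) {
        String appAttempt = currentAttemptForContainer(reservedContainerId);
        if (appAttempt == null) {
            // App finished between reserving and this scheduling pass.
            return false;
        }
        System.out.println(
            "Trying to fulfill reservation for application " + appAttempt);
        return true;
    }
}
```

The guard simply abandons the reservation for this pass; a later pass (or reservation cleanup) handles the stale container.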
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284441#comment-16284441 ]

Subru Krishnan commented on YARN-7591:
--------------------------------------

Thanks [~Tao Yang] for the contribution and [~leftnoteasy] for the review/commit. [~leftnoteasy], I see the commit in trunk but not in branch-2/2.9; are you planning to cherry-pick it down?
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284430#comment-16284430 ]

Hudson commented on YARN-7591:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13350 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13350/])
YARN-7591. NPE in async-scheduling mode of CapacityScheduler. (Tao Yang (wangda: rev adca1a72e4eca2ea634551e9fb8e9b878c36cb5c)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282233#comment-16282233 ]

genericqa commented on YARN-7591:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 21m 5s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 6s | trunk passed |
| +1 | compile | 0m 37s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 9m 21s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 59s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 32s | the patch passed |
| +1 | javac | 0m 32s | the patch passed |
| -0 | checkstyle | 0m 23s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 138 unchanged - 0 fixed = 139 total (was 138) |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 4s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 63m 48s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 125m 15s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7591 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901050/YARN-7591.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a39a1f190cc7 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 67b2661 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18828/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resou |
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282398#comment-16282398 ]

Wangda Tan commented on YARN-7591:
----------------------------------

Patch looks good, thanks [~Tao Yang]; will commit tomorrow if no objections.
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277731#comment-16277731 ]

Wangda Tan commented on YARN-7591:
----------------------------------

Thanks [~Tao Yang], makes sense to me. Only one minor suggestion:
{code}
323      if (allocation.getAllocateFromReservedContainer() == null) {
324        return false;
325      }
{code}
Could you add a comment above this {{if}} check so that in the future we can more easily remember why this check is needed?
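The suggested comment could look something like the following sketch. The class and field names are illustrative placeholders, not the actual patch contents:

```java
// Illustrative placeholder for the allocation proposal; the real
// ContainerAllocationProposal/SchedulerContainer types carry much more state.
public class AllocationCheckSketch {
    Object allocateFromReservedContainer; // null for a plain allocate proposal

    boolean commonCheckContainerAllocation(Object reservedContainerOnNode) {
        if (reservedContainerOnNode != null) {
            // A proposal generated by another async-scheduling thread before
            // this node was reserved carries no source reserved container;
            // reject it here instead of dereferencing null below.
            if (allocateFromReservedContainer == null) {
                return false;
            }
            // Only the proposal that fulfills this exact reservation may pass.
            return allocateFromReservedContainer == reservedContainerOnNode;
        }
        return true;
    }
}
```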
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276346#comment-16276346 ]

Tao Yang commented on YARN-7591:
--------------------------------

Thanks [~leftnoteasy] for the review.
{quote}
I'm still not sure why commit reserved container before allocate container can result in NPE? And could you add such explanations to the code as well so we can understand why the additional null check is required.
{quote}
For proposal1 (allocate containerX on node1), the allocateFromReservedContainer field of the ContainerAllocationProposal object is null, because {{csAssignment.getFulfilledReservedContainer()}} is null in the following code segment:
{noformat}
allocated = new ContainerAllocationProposal<>(
    getSchedulerContainer(rmContainer, true),
    getSchedulerContainersToRelease(csAssignment),
    getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
        false),  // sets allocateFromReservedContainer=null for proposal1
    csAssignment.getType(),
    csAssignment.getRequestLocalityType(),
    csAssignment.getSchedulingMode() != null ?
        csAssignment.getSchedulingMode() :
        SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
    csAssignment.getResource());
{noformat}
When trying to commit proposal1, {{FiCaSchedulerApp#commonCheckContainerAllocation}} makes sure the node is not reserved by anyone else. Since the node already has a reserved container (because of proposal2), it dereferences {{allocation.getAllocateFromReservedContainer().getRmContainer()}} for this check, and an NPE is raised because {{allocation.getAllocateFromReservedContainer()}} is null. Please correct me if I misunderstand something.
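The ordering described above can be replayed in a toy model (all names are illustrative, not the actual Hadoop classes): proposal1 is generated while node1 is still unreserved, so its allocate-from-reserved-container field is null; once proposal2's reservation commits first, committing proposal1 must be rejected rather than dereferencing that null field:

```java
// Toy model of the two-proposal race; Node and Proposal are illustrative only.
public class CommitRaceSketch {
    static class Node {
        Object reservedContainer; // set when a reserve proposal commits
    }

    static class Proposal {
        final Object allocateFromReservedContainer; // null unless fulfilling a reservation
        Proposal(Object from) { this.allocateFromReservedContainer = from; }
    }

    // Guarded commit check: a proposal that does not come from the node's
    // current reserved container is rejected instead of raising an NPE.
    static boolean commit(Node node, Proposal p) {
        if (node.reservedContainer != null) {
            if (p.allocateFromReservedContainer == null) {
                return false; // stale allocate proposal produced before the reservation
            }
            return p.allocateFromReservedContainer == node.reservedContainer;
        }
        return true;
    }
}
```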
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274842#comment-16274842 ]

genericqa commented on YARN-7591:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 5s | trunk passed |
| +1 | compile | 0m 36s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 40s | trunk passed |
| +1 | shadedclient | 10m 14s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 58s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 32s | the patch passed |
| +1 | javac | 0m 32s | the patch passed |
| -0 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 138 unchanged - 0 fixed = 139 total (was 138) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 22s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 3s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 60m 7s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 101m 35s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7591 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12900160/YARN-7591.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux abbf38396c4f 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 556aea3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18753/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resou |
[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274710#comment-16274710 ]

Wangda Tan commented on YARN-7591:
----------------------------------

Thanks [~Tao Yang]. The rest of the fixes look good except #3:
bq. (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on node1) were generated by different async-scheduling threads around the same time and proposal2 was submitted in front of proposal1, NPE is raised when trying to submit proposal2 in FiCaSchedulerApp#commonCheckContainerAllocation.
When I check the implementation:
{code}
if (null == rmContainer) {
  return null;
}

FiCaSchedulerApp app = getApplicationAttempt(
    rmContainer.getApplicationAttemptId());
if (null == app) {
  return null;
}

NodeId nodeId;
// Get nodeId
if (rmContainer.getState() == RMContainerState.RESERVED) {
  nodeId = rmContainer.getReservedNode();
} else {
  nodeId = rmContainer.getNodeId();
}

FiCaSchedulerNode node = getNode(nodeId);
if (null == node) {
  return null;
}
{code}
I'm still not sure why committing the reserved container before the allocate container can result in an NPE. Could you add such explanations to the code as well, so we can understand why the additional null check is required?