[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284620#comment-16284620
 ] 

Wangda Tan commented on YARN-7591:
--

[~subru], cherry-picked to branch-3.0 / branch-2.9 / branch-2. Updated fix 
version. Thanks for the reminder!

> NPE in async-scheduling mode of CapacityScheduler
> -
>
> Key: YARN-7591
> URL: https://issues.apache.org/jira/browse/YARN-7591
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0-alpha4, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.0.0, 2.9.1
>
> Attachments: YARN-7591.001.patch, YARN-7591.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, an NPE may be 
> raised in the special scenarios below.
> (1) A user should be removed after its last application finishes; an NPE may 
> be raised if something is read from the user object without a null check in 
> the async-scheduling threads.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished 
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
> RMContainer reservedContainer = node.getReservedContainer();
> if (reservedContainer != null) {
>   FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>       reservedContainer.getContainerId());
>   // NPE here: reservedApplication could be null after this application finished
>   // Try to fulfill the reservation
>   LOG.info("Trying to fulfill reservation for application "
>       + reservedApplication.getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve 
> containerY on node1) were generated by different async-scheduling threads at 
> around the same time and proposal2 was committed before proposal1, an NPE is 
> raised when trying to commit proposal1 in 
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
> if (reservedContainerOnNode != null) {
>   // NPE here: allocation.getAllocateFromReservedContainer() is null for
>   // proposal1 in this case
>   RMContainer fromReservedContainer =
>       allocation.getAllocateFromReservedContainer().getRmContainer();
>   if (fromReservedContainer != reservedContainerOnNode) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Try to allocate from a non-existed reserved container");
>     }
>     return false;
>   }
> }
> {code}
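All three scenarios boil down to the same race: a scheduler object looked up by one async thread can become null before another thread dereferences it. The defensive pattern the description calls for can be sketched as a standalone snippet. All types and names below are simplified stubs for illustration, not the real YARN classes or the actual patch:

```java
// Minimal sketch of the missing null checks, using stub types.
class RMContainer {
    private final String containerId;
    RMContainer(String id) { this.containerId = id; }
    String getContainerId() { return containerId; }
}

class FiCaSchedulerApp {
    private final String appId;
    FiCaSchedulerApp(String appId) { this.appId = appId; }
    String getApplicationId() { return appId; }
}

public class NullCheckSketch {
    // Stands in for getCurrentAttemptForContainer(); returns null once the
    // application has finished, which is exactly the race in scenario (2).
    static FiCaSchedulerApp lookupAttempt(String containerId) {
        return null; // simulate: the application finished before the lookup
    }

    // Scenario (2): bail out instead of dereferencing a null attempt.
    static boolean tryFulfillReservation(RMContainer reserved) {
        if (reserved == null) {
            return false; // nothing reserved on this node
        }
        FiCaSchedulerApp app = lookupAttempt(reserved.getContainerId());
        if (app == null) {
            // Application finished between proposal generation and commit:
            // skip the stale reservation rather than throw an NPE.
            return false;
        }
        System.out.println("Fulfilling reservation for " + app.getApplicationId());
        return true;
    }

    public static void main(String[] args) {
        // Without the null check this would throw NullPointerException.
        boolean fulfilled = tryFulfillReservation(new RMContainer("container_1"));
        System.out.println(fulfilled); // prints "false": skipped safely
    }
}
```

The point of the sketch is that every cross-thread lookup result must be re-validated at commit time; the value that was non-null when the proposal was generated carries no guarantee by the time it is dereferenced.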



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-08 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284441#comment-16284441
 ] 

Subru Krishnan commented on YARN-7591:
--

Thanks [~Tao Yang] for the contribution and [~leftnoteasy] for the 
review/commit. 

[~leftnoteasy], I see the commit in trunk but not in branch-2/2.9, so are you 
planning to cherry-pick it down?



[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284430#comment-16284430
 ] 

Hudson commented on YARN-7591:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13350 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13350/])
YARN-7591. NPE in async-scheduling mode of CapacityScheduler. (Tao Yang via 
wangda: rev adca1a72e4eca2ea634551e9fb8e9b878c36cb5c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java




[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282233#comment-16282233
 ] 

genericqa commented on YARN-7591:
-

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 21m 5s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 6s | trunk passed |
| +1 | compile | 0m 37s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 9m 21s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 59s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 32s | the patch passed |
| +1 | javac | 0m 32s | the patch passed |
| -0 | checkstyle | 0m 23s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 138 unchanged - 0 fixed = 139 total (was 138) |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 4s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 63m 48s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 125m 15s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7591 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12901050/YARN-7591.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a39a1f190cc7 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 67b2661 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/18828/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resou

[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282398#comment-16282398
 ] 

Wangda Tan commented on YARN-7591:
--

Patch looks good, thanks [~Tao Yang], will commit tomorrow if no objections.



[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277731#comment-16277731
 ] 

Wangda Tan commented on YARN-7591:
--

Thanks [~Tao Yang], makes sense to me.

Only one minor suggestion:
{code}
if (allocation.getAllocateFromReservedContainer() == null) {
  return false;
}
{code}

Could you add a comment above this {{if}} check so that in the future we can 
more easily remember why this check is needed.



[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-03 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276346#comment-16276346
 ] 

Tao Yang commented on YARN-7591:


Thanks [~leftnoteasy] for the review.
{quote}
I'm still not sure why committing the reserved container before allocating the 
container can result in an NPE. Could you also add such explanations to the 
code, so we can understand why the additional null check is required?
{quote}
For proposal1 (allocate containerX on node1), the allocateFromReservedContainer 
field in the ContainerAllocationProposal object is null, because 
{{csAssignment.getFulfilledReservedContainer()}} is null in the following code 
segment.
{noformat}
allocated = new ContainerAllocationProposal<>(
    getSchedulerContainer(rmContainer, true),
    getSchedulerContainersToRelease(csAssignment),
    getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
        false),   // sets allocateFromReservedContainer field = null for proposal1
    csAssignment.getType(),
    csAssignment.getRequestLocalityType(),
    csAssignment.getSchedulingMode() != null ?
        csAssignment.getSchedulingMode() :
        SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
    csAssignment.getResource());
{noformat}
When trying to commit proposal1, {{FiCaSchedulerApp#commonCheckContainerAllocation}} 
makes sure the node is not reserved by anyone else. Since the node already has a 
reserved container at that point (because of proposal2), it dereferences 
{{allocation.getAllocateFromReservedContainer().getRmContainer()}} for this 
check, and an NPE is raised because 
{{allocation.getAllocateFromReservedContainer()}} is null.
Please correct me if I misunderstand something.
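The race described above can be reproduced outside the scheduler with a few stub types. The snippet below is a simplified illustration of the guard, not the actual YARN classes or the committed patch; all names are hypothetical:

```java
// Sketch of the scenario-(3) guard: a plain allocation proposal carries no
// allocate-from-reserved container, so the commit-time check must tolerate
// null instead of dereferencing it. Stub types, not the real YARN classes.
class RMContainer {}

class ContainerAllocationProposal {
    private final RMContainer allocateFromReserved; // null for plain allocations
    ContainerAllocationProposal(RMContainer fromReserved) {
        this.allocateFromReserved = fromReserved;
    }
    RMContainer getAllocateFromReservedContainer() { return allocateFromReserved; }
}

public class ReservationGuardSketch {
    // Returns false (reject the proposal) when the node already holds a
    // reservation that this proposal does not fulfill.
    static boolean commonCheck(ContainerAllocationProposal allocation,
            RMContainer reservedContainerOnNode) {
        if (reservedContainerOnNode != null) {
            RMContainer from = allocation.getAllocateFromReservedContainer();
            if (from == null) {
                // proposal1's race: the node was reserved by proposal2 first;
                // reject the proposal instead of throwing NullPointerException.
                return false;
            }
            if (from != reservedContainerOnNode) {
                return false; // reserved for a different container
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RMContainer reservedByProposal2 = new RMContainer();
        ContainerAllocationProposal proposal1 = new ContainerAllocationProposal(null);
        // proposal1 must be rejected, not crash, once the node is reserved.
        System.out.println(commonCheck(proposal1, reservedByProposal2)); // false
        System.out.println(commonCheck(proposal1, null)); // true: no reservation
    }
}
```

Rejecting the proposal is safe here because the async thread will simply regenerate it against the node's new state on the next scheduling cycle.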



[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-01 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274842#comment-16274842
 ] 

genericqa commented on YARN-7591:
-

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 5s | trunk passed |
| +1 | compile | 0m 36s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 40s | trunk passed |
| +1 | shadedclient | 10m 14s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 58s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 32s | the patch passed |
| +1 | javac | 0m 32s | the patch passed |
| -0 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 138 unchanged - 0 fixed = 139 total (was 138) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 22s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 3s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 60m 7s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 101m 35s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7591 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12900160/YARN-7591.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux abbf38396c4f 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 556aea3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/18753/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resou

[jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler

2017-12-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274710#comment-16274710
 ] 

Wangda Tan commented on YARN-7591:
--

Thanks [~Tao Yang], 

The rest of the fixes look good, except for #3:

bq. (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve 
containerY on node1) were generated by different async-scheduling threads 
around the same time and proposal2 was submitted in front of proposal1, NPE is 
raised when trying to submit proposal2 in 
FiCaSchedulerApp#commonCheckContainerAllocation.

When I check the implementation:

{code}
if (null == rmContainer) {
  return null;
}

FiCaSchedulerApp app = getApplicationAttempt(
rmContainer.getApplicationAttemptId());
if (null == app) {
  return null;
}

NodeId nodeId;
// Get nodeId
if (rmContainer.getState() == RMContainerState.RESERVED) {
  nodeId = rmContainer.getReservedNode();
} else{
  nodeId = rmContainer.getNodeId();
}

FiCaSchedulerNode node = getNode(nodeId);
if (null == node) {
  return null;
} 
{code}

I'm still not sure why committing the reserved container before allocating the 
container can result in an NPE. Could you also add such explanations to the 
code, so we can understand why the additional null check is required?
