[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255134#comment-14255134 ] Hudson commented on YARN-2975: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #48 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/48/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
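The locking pattern referenced above — YARN-2910's explicit locked access to FSLeafQueue's runnable and non-runnable app lists, which this fix extends to the remaining call sites — might look roughly like the sketch below. The field names (readLock, runnableApps) are assumptions for illustration; the committed change may be structured differently.

{code:title=FSLeafQueue (illustrative sketch)}
// Readers take the queue's read lock so they never observe the app list
// while a writer (holding the write lock) is adding or removing entries.
public int getNumRunnableApps() {
  readLock.lock();
  try {
    return runnableApps.size();
  } finally {
    readLock.unlock();
  }
}
{code}

Call sites such as MaxRunningAppsEnforcer and FairSchedulerLeafQueueInfo (both touched by the commit above) would then go through accessors like this instead of iterating the raw lists returned by the getters.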
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255135#comment-14255135 ] Hudson commented on YARN-2977: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #48 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/48/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient on a slow testbed. As noted in my comments on YARN-2148, the container could finish before CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and add more messages to the test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
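A minimal sketch of the relaxed check described above, assuming it lands in TestNMClient's container-status assertions; the exact constants and messages in YARN-2977.patch may differ.

{code:title=TestNMClient (illustrative sketch)}
// On a slow testbed the container may already have exited cleanly (code 0)
// before the test's stop/cleanup request takes effect, so 0 is accepted
// alongside the forced-termination status, and the message reports the
// actual value to make future failures easier to diagnose.
int exitStatus = status.getExitStatus();
assertTrue("Unexpected exit status: " + exitStatus,
    exitStatus == 0
        || exitStatus == ContainerExitStatus.KILLED_BY_APPMASTER);
{code}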
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255144#comment-14255144 ] Hudson commented on YARN-2977: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #782 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/782/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient in slow testbed. Like my comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add more message for test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255143#comment-14255143 ] Hudson commented on YARN-2975: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #782 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/782/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
Varun Saxena created YARN-2983: -- Summary: NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena While going through the code for YARN-2978, I found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from the scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although each operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a series of multiple {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic as a whole. For instance, in the code below we call rmContext.getRMApps()#get in a loop. A ConcurrentHashMap#get can return null if the key doesn't exist, but there is no null check inside this for loop before the returned value, i.e. rmApp, is dereferenced. Although all the application attempts have just been fetched for the queue above the for loop, this block of code is not synchronized, so another thread may delete the RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and the number of completed apps exceeds the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise an NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { ... if (request.getIncludeApplications()) { List<ApplicationAttemptId> apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayList<ApplicationReport>(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } ... } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
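A minimal sketch of the null check proposed above, following the loop quoted in the description; whether YARN-2983.patch takes exactly this form is an assumption.

{code:title=ClientRMService#getQueueInfo with null check (illustrative sketch)}
for (ApplicationAttemptId app : apps) {
  RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
  // The app may have been removed from the ConcurrentHashMap between the
  // scheduler lookup and this get (e.g. completed apps trimmed once
  // yarn.resourcemanager.max-completed-applications is exceeded), so guard
  // before dereferencing.
  if (rmApp != null) {
    appReports.add(rmApp.createAndGetApplicationReport(null, true));
  }
}
{code}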
[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2983: --- Description: While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) is not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} was: While going through code for checking YARN-2978 , found one issue. During construction GetQueueInfoResponse in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a ConcurrentHashMap in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple ConcurrentHashMap#get (say, in a for loop) is not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. 
} {code} NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) is not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2983: --- Description: While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} was: While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) is not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. 
} {code} NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2983: --- Attachment: YARN-2983.patch NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2983: --- Attachment: (was: YARN-2983.patch) NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2983: --- Attachment: YARN-2983.patch NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-2983.patch While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255156#comment-14255156 ] Hadoop QA commented on YARN-2983: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688574/YARN-2983.patch against trunk revision 8f5522e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6163//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6163//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6163//console This message is automatically generated. NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-2983.patch While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . 
if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
[ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255157#comment-14255157 ] Varun Saxena commented on YARN-2983: Findbugs to be addressed by YARN-2937 to YARN-2940 NPE possible in ClientRMService#getQueueInfo Key: YARN-2983 URL: https://issues.apache.org/jira/browse/YARN-2983 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-2983.patch While going through code for checking YARN-2978 , found one issue. During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) are not. For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But there is no null check inside this for loop before dereferencing the value returned i.e. rmApp. Although all the applicationattempts have been fetched for the queue just above the for loop, but as this block of code is not synchronized, there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at the same time. This can happen when an app finishes/completes and number of completed apps exceed the config {{yarn.resourcemanager.max-completed-applications}}. I think there should be a null check inside this for loop, otherwise a NPE can occur. {code:title=ClientRMService#getQueueInfo} public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request) throws YarnException { . if (request.getIncludeApplications()) { ListApplicationAttemptId apps = scheduler.getAppsInQueue(request.getQueueName()); appReports = new ArrayListApplicationReport(apps.size()); for (ApplicationAttemptId app : apps) { RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId()); appReports.add(rmApp.createAndGetApplicationReport(null, true)); } } .. } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255158#comment-14255158 ] Hudson commented on YARN-2975: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1980 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1980/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255159#comment-14255159 ] Hudson commented on YARN-2977: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1980 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1980/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient in slow testbed. Like my comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add more message for test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255160#comment-14255160 ] Hudson commented on YARN-2975: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/45/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255161#comment-14255161 ] Hudson commented on YARN-2977: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/45/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient in slow testbed. Like my comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add more message for test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255180#comment-14255180 ] Hudson commented on YARN-2977: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #49 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/49/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient in slow testbed. Like my comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add more message for test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255179#comment-14255179 ] Hudson commented on YARN-2975: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #49 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/49/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks
[ https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255187#comment-14255187 ] Hudson commented on YARN-2975: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1999 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1999/]) YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) (kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FSLeafQueue app lists are accessed without required locks - Key: YARN-2975 URL: https://issues.apache.org/jira/browse/YARN-2975 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.7.0 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch YARN-2910 adds explicit locked access to runnable and non-runnable apps in FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without locks in other places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently
[ https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255188#comment-14255188 ] Hudson commented on YARN-2977: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1999 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1999/]) YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) (ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7) * hadoop-yarn-project/CHANGES.txt CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 8f5522ed9913ab175c422cbf89928742243c207e) * hadoop-yarn-project/CHANGES.txt TestNMClient get failed intermittently --- Key: YARN-2977 URL: https://issues.apache.org/jira/browse/YARN-2977 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Fix For: 2.7.0 Attachments: YARN-2977.patch There are still some test failures for TestNMClient in slow testbed. Like my comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add more message for test case. The failure stack: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2980: --- Attachment: YARN-2980.001.patch Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255232#comment-14255232 ] Hadoop QA commented on YARN-2980: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688579/YARN-2980.001.patch against trunk revision 8f5522e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 21 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6164//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6164//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6164//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6164//console This message is automatically generated. Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2980: --- Attachment: YARN-2980.002.patch Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch, YARN-2980.002.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255253#comment-14255253 ] Hadoop QA commented on YARN-2980: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688584/YARN-2980.002.patch against trunk revision 8f5522e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 21 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.ha.TestZKFailoverController Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6165//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6165//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6165//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6165//console This message is automatically generated. Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch, YARN-2980.002.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255296#comment-14255296 ] Varun Saxena commented on YARN-2980: The test failure is unrelated; it passes locally. The Findbugs warnings are to be addressed by other JIRAs. Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch, YARN-2980.002.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255421#comment-14255421 ] Hadoop QA commented on YARN-2939: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch against trunk revision 7bc0a6d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6166//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6166//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6166//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-common --- Key: YARN-2939 URL: https://issues.apache.org/jira/browse/YARN-2939 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Li Lu Labels: findbugs Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255484#comment-14255484 ] Varun Saxena commented on YARN-2980: [~mingma], kindly take a look at the patch. Going by the patch for HDFS-7400, it seems LocalDirsHandlerService is not required there, hence it would remain part of the NodeManager. Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Attachments: YARN-2980.001.patch, YARN-2980.002.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255497#comment-14255497 ] Varun Saxena commented on YARN-2962: [~rakeshr], thanks for your input. The ApplicationId in YARN is of the format {noformat}application_[cluster timestamp]_[sequence number]{noformat} Here the sequence number has 4 digits and is therefore in the range 0000-9999. Going along the lines of what you are saying, I think we can break up the sequence number part of the ApplicationId, as the cluster timestamp will probably be the same for most of the application IDs. My suggestion is to have it as {noformat}(app_root)\application_[cluster timestamp]_\[first 2 digits of sequence number]\[last 2 digits]{noformat} We can view it as below:
{noformat}
|--- RM_APP_ROOT
|     |--- (application_{cluster timestamp}_)
|     |      |--- (00 to 99)
|     |      |      |--- (00 to 99)
|     |      |      |      |--- (#ApplicationAttemptIds)
{noformat}
[~rakeshr] and [~kasha], kindly comment on the approach. One constraint is that this would entail a larger number of calls to ZK when the RM is recovering. I am not sure how many znodes it takes to reach the 1 MB limit. We could also split the sequence number into the first 1 digit and the last 3 digits. Moreover, I don't see much of an issue with application attempt znodes, as max-attempts is limited to 2 by default. ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
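To make the proposed layout concrete, the sketch below maps an ApplicationId onto the two extra znode levels, assuming the 4-digit sequence number discussed above. It is purely illustrative; the helper name and root path are made up and this is not ZKRMStateStore code.
{code:java}
// Illustrative helper for the proposed YARN-2962 layout: split the 4-digit
// ApplicationId sequence number into two 2-digit znode levels so that no
// single parent znode accumulates thousands of children.
public class AppIdZNodePath {

  static String hierarchicalPath(String appRoot, String appId) {
    int lastUnderscore = appId.lastIndexOf('_');
    String prefix = appId.substring(0, lastUnderscore + 1); // application_<cluster timestamp>_
    String seq = appId.substring(lastUnderscore + 1);       // 4-digit sequence number
    return appRoot + "/" + prefix + "/" + seq.substring(0, 2) + "/" + seq.substring(2);
  }

  public static void main(String[] args) {
    // Hypothetical timestamp and root path; only the shape of the result matters.
    System.out.println(hierarchicalPath("/rmstore/RMAppRoot", "application_1419252540193_0137"));
    // prints: /rmstore/RMAppRoot/application_1419252540193_/01/37
  }
}
{code}
On recovery the RM would then have to list the two intermediate levels before reaching the per-application znodes, which is the extra round-trip cost to ZK mentioned above.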
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255533#comment-14255533 ] Anubhav Dhoot commented on YARN-2664: - Overall this is a very natural way to visualize reservations, great job! It maps to the mental model of skylines. High-level comments/questions: a) This is a view of reservations and does not indicate actual allocations, right? But the legend for the y-axis says Utilization GB. Allocation would be a great addition (knowing how much is left of my reservation, etc.). b) This shows everything in terms of memory but not CPU, right? Should we add a switch to show both, and in future other resource types? Showing them together is more correct but harder to visualize. c) Should we also show the total plan capacity as the end of the y-axis or as an explicit ceiling line? Minor usability issues: a) How are the time window for the slider and the time window selected in the slider chosen? Sometimes it keeps the slider at some point before the current time; at other times it shows future time as part of the view. Also, if there are no reservations it does not advance to the current time until a new reservation shows up? b) Related to the previous point, why does refreshing the page allow me to move the chosen time window forward but not the refresh button? Maybe rename the refresh button to refresh queues? Also provide a refresh time button if c) below cannot be solved? c) Is there a query parameter or some other way to get back to a specific queue? That would avoid having to use the drop-down every time I refresh the page. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides new functionality in the RM to ask for reservations on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)