[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255134#comment-14255134
 ] 

Hudson commented on YARN-2975:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #48 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/48/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 
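
 A minimal sketch (added for illustration, not the committed FSLeafQueue code) 
 of the kind of locked access described here: callers receive a copy of the 
 app list taken under a read lock, so the live list is never iterated without 
 holding the lock. The class, field, and method names below are hypothetical.
 {code:title=copy-under-read-lock sketch (illustrative)}
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.locks.ReadWriteLock;
 import java.util.concurrent.locks.ReentrantReadWriteLock;

 class AppListHolder<T> {
   private final List<T> runnableApps = new ArrayList<T>();
   private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

   // Snapshot taken under the read lock; callers never see the live list.
   List<T> getCopyOfRunnableApps() {
     rwLock.readLock().lock();
     try {
       return new ArrayList<T>(runnableApps);
     } finally {
       rwLock.readLock().unlock();
     }
   }

   // Mutations take the write lock so the snapshots above stay consistent.
   void addRunnableApp(T app) {
     rwLock.writeLock().lock();
     try {
       runnableApps.add(app);
     } finally {
       rwLock.writeLock().unlock();
     }
   }
 }
 {code}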



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255135#comment-14255135
 ] 

Hudson commented on YARN-2977:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #48 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/48/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)
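
 A minimal sketch (illustrative only, not the attached patch) of the kind of 
 assertion change described above: accept exit code 0 as well, and attach a 
 descriptive message to the assert so failures are easier to diagnose. The 
 class name, method name, and the second accepted exit code are assumptions.
 {code:title=exit-status assertion sketch (illustrative)}
 import static org.junit.Assert.assertTrue;

 class ExitStatusCheck {
   // 0 covers a container that already finished cleanly on a slow testbed
   // before CLEANUP_CONTAINER was processed; 143 (SIGTERM) is an assumed
   // stand-in for the normal kill path.
   static void assertExitStatus(int exitStatus) {
     assertTrue("Unexpected container exit status: " + exitStatus,
         exitStatus == 0 || exitStatus == 143);
   }
 }
 {code}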



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255144#comment-14255144
 ] 

Hudson commented on YARN-2977:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #782 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/782/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255143#comment-14255143
 ] 

Hudson commented on YARN-2975:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #782 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/782/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-2983:
--

 Summary: NPE possible in ClientRMService#getQueueInfo
 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena


While going through the code for YARN-2978, I found one issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, 
we first collect application attempts from the scheduler and then fetch apps 
from a {{ConcurrentHashMap}} in {{RMContext}}. Although each individual 
operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a 
series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic.

For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
null check inside the for loop before dereferencing the returned value, 
i.e. rmApp. Although all the application attempts for the queue are fetched 
just above the for loop, this block of code is not synchronized, so another 
thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
can happen when an app finishes and the number of completed apps exceeds the 
config {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE 
can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
    throws YarnException {
  .
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ..
}
{code}
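
One possible shape of the null check suggested above, continuing the snippet 
(a sketch only, not necessarily what the attached patch does):

{code:title=null-check sketch (illustrative)}
for (ApplicationAttemptId app : apps) {
  RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
  if (rmApp != null) {
    // Skip apps whose RMApp was removed concurrently, e.g. once completed
    // apps exceed yarn.resourcemanager.max-completed-applications.
    appReports.add(rmApp.createAndGetApplicationReport(null, true));
  }
}
{code}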



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2983:
---
Description: 
While going through the code for YARN-2978, I found one issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, 
we first collect application attempts from the scheduler and then fetch apps 
from a {{ConcurrentHashMap}} in {{RMContext}}. Although each individual 
operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a 
series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic.

For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
null check inside the for loop before dereferencing the returned value, 
i.e. rmApp. Although all the application attempts for the queue are fetched 
just above the for loop, this block of code is not synchronized, so another 
thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
can happen when an app finishes and the number of completed apps exceeds the 
config {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE 
can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
    throws YarnException {
  .
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ..
}
{code}

  was:
While going through the code for YARN-2978, I found one issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, 
we first collect application attempts from the scheduler and then fetch apps 
from a {{ConcurrentHashMap}} in {{RMContext}}. Although each individual 
operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a 
series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic.

For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
null check inside the for loop before dereferencing the returned value, 
i.e. rmApp. Although all the application attempts for the queue are fetched 
just above the for loop, this block of code is not synchronized, so another 
thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
can happen when an app finishes and the number of completed apps exceeds the 
config {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE 
can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
    throws YarnException {
  .
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ..
}
{code}


 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena

 While going through the code for YARN-2978, I found one issue.
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete 

[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2983:
---
Description: 
While going through the code for YARN-2978, I found one issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, 
we first collect application attempts from the scheduler and then fetch apps 
from a {{ConcurrentHashMap}} in {{RMContext}}. Although each individual 
operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a 
series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic.

For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
null check inside the for loop before dereferencing the returned value, 
i.e. rmApp. Although all the application attempts for the queue are fetched 
just above the for loop, this block of code is not synchronized, so another 
thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
can happen when an app finishes and the number of completed apps exceeds the 
config {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE 
can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
    throws YarnException {
  .
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ..
}
{code}

  was:
While going through the code for YARN-2978, I found one issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, 
we first collect application attempts from the scheduler and then fetch apps 
from a {{ConcurrentHashMap}} in {{RMContext}}. Although each individual 
operation (get/put/remove, etc.) on a ConcurrentHashMap is thread-safe, a 
series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic.

For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
null check inside the for loop before dereferencing the returned value, 
i.e. rmApp. Although all the application attempts for the queue are fetched 
just above the for loop, this block of code is not synchronized, so another 
thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
can happen when an app finishes and the number of completed apps exceeds the 
config {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE 
can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
    throws YarnException {
  .
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ..
}
{code}


 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena

 While going through the code for YARN-2978, I found one issue.
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 

[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2983:
---
Attachment: YARN-2983.patch

 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena

 While going through the code for YARN-2978, I found one issue. 
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
 can happen when an app finishes and the number of completed apps exceeds the 
 config {{yarn.resourcemanager.max-completed-applications}}.
 I think there should be a null check inside this for loop; otherwise an NPE 
 can occur.
 {code:title=ClientRMService#getQueueInfo}
 public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
     throws YarnException {
   .
   if (request.getIncludeApplications()) {
     List<ApplicationAttemptId> apps =
         scheduler.getAppsInQueue(request.getQueueName());
     appReports = new ArrayList<ApplicationReport>(apps.size());
     for (ApplicationAttemptId app : apps) {
       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
       appReports.add(rmApp.createAndGetApplicationReport(null, true));
     }
   }
   ..
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2983:
---
Attachment: (was: YARN-2983.patch)

 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena

 While going through the code for YARN-2978, I found one issue. 
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
 can happen when an app finishes and the number of completed apps exceeds the 
 config {{yarn.resourcemanager.max-completed-applications}}.
 I think there should be a null check inside this for loop; otherwise an NPE 
 can occur.
 {code:title=ClientRMService#getQueueInfo}
 public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
     throws YarnException {
   .
   if (request.getIncludeApplications()) {
     List<ApplicationAttemptId> apps =
         scheduler.getAppsInQueue(request.getQueueName());
     appReports = new ArrayList<ApplicationReport>(apps.size());
     for (ApplicationAttemptId app : apps) {
       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
       appReports.add(rmApp.createAndGetApplicationReport(null, true));
     }
   }
   ..
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2983:
---
Attachment: YARN-2983.patch

 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2983.patch


 While going through the code for YARN-2978, I found one issue. 
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
 can happen when an app finishes and the number of completed apps exceeds the 
 config {{yarn.resourcemanager.max-completed-applications}}.
 I think there should be a null check inside this for loop; otherwise an NPE 
 can occur.
 {code:title=ClientRMService#getQueueInfo}
 public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
     throws YarnException {
   .
   if (request.getIncludeApplications()) {
     List<ApplicationAttemptId> apps =
         scheduler.getAppsInQueue(request.getQueueName());
     appReports = new ArrayList<ApplicationReport>(apps.size());
     for (ApplicationAttemptId app : apps) {
       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
       appReports.add(rmApp.createAndGetApplicationReport(null, true));
     }
   }
   ..
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255156#comment-14255156
 ] 

Hadoop QA commented on YARN-2983:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688574/YARN-2983.patch
  against trunk revision 8f5522e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6163//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6163//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6163//console

This message is automatically generated.

 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2983.patch


 While going through the code for YARN-2978, I found one issue. 
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
 can happen when an app finishes and the number of completed apps exceeds the 
 config {{yarn.resourcemanager.max-completed-applications}}.
 I think there should be a null check inside this for loop; otherwise an NPE 
 can occur.
 {code:title=ClientRMService#getQueueInfo}
 public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
     throws YarnException {
   .
   if (request.getIncludeApplications()) {
     List<ApplicationAttemptId> apps =
         scheduler.getAppsInQueue(request.getQueueName());
     appReports = new ArrayList<ApplicationReport>(apps.size());
     for (ApplicationAttemptId app : apps) {
       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
       appReports.add(rmApp.createAndGetApplicationReport(null, true));
     }
   }
   ..
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2983) NPE possible in ClientRMService#getQueueInfo

2014-12-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255157#comment-14255157
 ] 

Varun Saxena commented on YARN-2983:


The Findbugs warnings are to be addressed by YARN-2937 through YARN-2940.

 NPE possible in ClientRMService#getQueueInfo
 

 Key: YARN-2983
 URL: https://issues.apache.org/jira/browse/YARN-2983
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2983.patch


 While going through the code for YARN-2978, I found one issue. 
 During construction of {{GetQueueInfoResponse}} in 
 {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
 the scheduler and then fetch apps from a {{ConcurrentHashMap}} in 
 {{RMContext}}. Although each individual operation (get/put/remove, etc.) on a 
 ConcurrentHashMap is thread-safe, a series of {{ConcurrentHashMap#get}} calls 
 (say, in a for loop) is not atomic. 
 For instance, in the code below we call rmContext.getRMApps()#get in a loop. 
 ConcurrentHashMap#get returns null if the key doesn't exist, but there is no 
 null check inside the for loop before dereferencing the returned value, 
 i.e. rmApp. Although all the application attempts for the queue are fetched 
 just above the for loop, this block of code is not synchronized, so another 
 thread may delete the RMApp from the ConcurrentHashMap at the same time. This 
 can happen when an app finishes and the number of completed apps exceeds the 
 config {{yarn.resourcemanager.max-completed-applications}}.
 I think there should be a null check inside this for loop; otherwise an NPE 
 can occur.
 {code:title=ClientRMService#getQueueInfo}
 public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
     throws YarnException {
   .
   if (request.getIncludeApplications()) {
     List<ApplicationAttemptId> apps =
         scheduler.getAppsInQueue(request.getQueueName());
     appReports = new ArrayList<ApplicationReport>(apps.size());
     for (ApplicationAttemptId app : apps) {
       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
       appReports.add(rmApp.createAndGetApplicationReport(null, true));
     }
   }
   ..
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255158#comment-14255158
 ] 

Hudson commented on YARN-2975:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1980 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1980/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255159#comment-14255159
 ] 

Hudson commented on YARN-2977:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1980 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1980/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255160#comment-14255160
 ] 

Hudson commented on YARN-2975:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/45/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255161#comment-14255161
 ] 

Hudson commented on YARN-2977:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/45/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255180#comment-14255180
 ] 

Hudson commented on YARN-2977:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #49 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/49/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255179#comment-14255179
 ] 

Hudson commented on YARN-2975:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #49 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/49/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255187#comment-14255187
 ] 

Hudson commented on YARN-2975:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1999/])
YARN-2975. FSLeafQueue app lists are accessed without required locks. (kasha) 
(kasha: rev 24ee9e3431d27811530ffa01d8d241133fd643fe)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255188#comment-14255188
 ] 

Hudson commented on YARN-2977:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1999/])
YARN-2977. Fixed intermittent TestNMClient failure. (Contributed by Junping Du) 
(ozawa: rev cf7fe583d14ebb16fc1b6e29dc2afbf67d24b9d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java
CHANGES.txt: add YARN-2977 (ozawa: rev 76b0370a27c482caff9498e15ef889d37f413ce7)
* hadoop-yarn-project/CHANGES.txt
CHANGES.txt: move YARN-2977 from 2.6.1 to 2.7.0 (ozawa: rev 
8f5522ed9913ab175c422cbf89928742243c207e)
* hadoop-yarn-project/CHANGES.txt


 TestNMClient get failed intermittently 
 ---

 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.7.0

 Attachments: YARN-2977.patch


 There are still some test failures for TestNMClient on slow testbeds. As 
 noted in my comments on YARN-2148, the container can finish before 
 CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
 add more messages to the test case.
 The failure stack:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2980:
---
Attachment: YARN-2980.001.patch

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch


 HDFS might want to leverage the health check functionality available in YARN 
 for both the namenode (https://issues.apache.org/jira/browse/HDFS-7400) and 
 the datanode (https://issues.apache.org/jira/browse/HDFS-7441).
 We can move the health check functionality, including the protocol between 
 Hadoop daemons and the health check script, to hadoop-common. That will 
 simplify development and maintenance for both the Hadoop source code and the 
 health check script.
 Thoughts?
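
 A minimal sketch of the script contract this proposal would move to 
 hadoop-common: in YARN's node health check, any line of script output 
 beginning with "ERROR" marks the node unhealthy. The class and method names 
 below are illustrative assumptions, not an existing hadoop-common API.
 {code:title=health-script output check (illustrative)}
 class HealthScriptOutputCheck {
   // Unhealthy if any output line begins with "ERROR", mirroring the
   // NodeManager health-script convention.
   static boolean isHealthyOutput(String scriptOutput) {
     for (String line : scriptOutput.split("\n")) {
       if (line.trim().startsWith("ERROR")) {
         return false;
       }
     }
     return true;
   }
 }
 {code}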



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255232#comment-14255232
 ] 

Hadoop QA commented on YARN-2980:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688579/YARN-2980.001.patch
  against trunk revision 8f5522e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 21 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6164//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6164//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6164//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6164//console

This message is automatically generated.

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch


 HDFS might want to leverage the health check functionality available in YARN 
 for both the namenode (https://issues.apache.org/jira/browse/HDFS-7400) and 
 the datanode (https://issues.apache.org/jira/browse/HDFS-7441).
 We can move the health check functionality, including the protocol between 
 Hadoop daemons and the health check script, to hadoop-common. That will 
 simplify development and maintenance for both the Hadoop source code and the 
 health check script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2980:
---
Attachment: YARN-2980.002.patch

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch, YARN-2980.002.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255253#comment-14255253
 ] 

Hadoop QA commented on YARN-2980:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688584/YARN-2980.002.patch
  against trunk revision 8f5522e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 21 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.ha.TestZKFailoverController

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6165//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6165//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6165//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6165//console

This message is automatically generated.

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch, YARN-2980.002.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255296#comment-14255296
 ] 

Varun Saxena commented on YARN-2980:


The test failure is unrelated; the test passes locally.
The Findbugs warnings are to be addressed by other JIRAs.

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch, YARN-2980.002.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255421#comment-14255421
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision 7bc0a6d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6166//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6166//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6166//console

This message is automatically generated.

 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255484#comment-14255484
 ] 

Varun Saxena commented on YARN-2980:


[~mingma], kindly take a look at the patch. Going by the patch for HDFS-7400, it seems 
LocalDirsHandlerService is not required there, so it would remain part of the 
nodemanager.


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Attachments: YARN-2980.001.patch, YARN-2980.002.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255497#comment-14255497
 ] 

Varun Saxena commented on YARN-2962:


[~rakeshr], thanks for your input. An ApplicationId in YARN is of the format 
{noformat}application_[cluster timestamp]_[sequence number]{noformat}
Here the sequence number has 4 digits and is in the range 0000-9999.
Going along the lines of what you are saying, I think we can split on the sequence 
number part of the ApplicationId, as the cluster timestamp will probably be the same 
for most application IDs. My suggestion is to lay it out as 
{noformat}(app_root)/application_[cluster timestamp]_/[first 2 digits of 
sequence number]/[last 2 digits]{noformat}
We can view it as follows:
{noformat}
|--- RM_APP_ROOT
|    |--- application_[cluster timestamp]_
|         |--- [first 2 digits of sequence number]  (00 to 99)
|              |--- [last 2 digits of sequence number]  (00 to 99)
|                   |--- #ApplicationAttemptIds
{noformat}

[~rakeshr] and [~kasha], kindly comment on the approach. One constraint is that 
this would entail a larger number of round trips to ZK while the RM is recovering.
I am not sure how many znodes it takes to reach the 1 MB limit. We could also split 
the sequence number as the first digit and the last 3 digits.
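To make the proposed layout concrete, here is a small sketch of how the bucketed 
znode path could be derived from an ApplicationId (the method name and root path are 
hypothetical, not taken from any attached patch):
{noformat}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative sketch of the two-level bucketing described above.
public class BucketedAppPathSketch {

  /** Splits the 4-digit sequence number into two 2-digit znode levels. */
  static String bucketedAppPath(String rmAppRoot, ApplicationId appId) {
    // Assumes the 0000-9999 sequence range discussed above; longer sequence
    // numbers would simply widen the first bucket.
    String padded = String.format("%04d", appId.getId());
    String firstBucket = padded.substring(0, padded.length() - 2);
    String secondBucket = padded.substring(padded.length() - 2);
    return rmAppRoot
        + "/application_" + appId.getClusterTimestamp() + "_"
        + "/" + firstBucket
        + "/" + secondBucket;
  }

  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1419120000000L, 123);
    // Prints .../application_1419120000000_/01/23
    System.out.println(bucketedAppPath("/rmstore/ZKRMStateRoot/RMAppRoot", appId));
  }
}
{noformat}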

Moreover, I don't see much of an issue with application attempt znodes, as 
max-attempts is limited to 2 by default. 

 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical

 We ran into this issue when we hit the default ZK server message size 
 limits, primarily because the message contained too many znodes, even though 
 individually they were all small.
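For context, the roughly 1 MB limit comes from ZooKeeper's jute.maxbuffer Java 
system property (default just under 1 MB), which has to be set to the same value on 
servers and clients if it is changed. Raising it is only a workaround; the real fix is 
to keep individual requests small, e.g.:
{noformat}
# Workaround only, assuming a 4 MB limit is acceptable for the deployment.
# ZooKeeper server JVM options:
-Djute.maxbuffer=4194304
# Matching value on the ResourceManager (ZK client) side, e.g. in yarn-env.sh:
YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=4194304"
{noformat}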



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.

2014-12-21 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255533#comment-14255533
 ] 

Anubhav Dhoot commented on YARN-2664:
-

Overall this is a very natural way to visualize reservations, great job! It 
maps to the mental model of skylines.

High-level comments/questions:
a) This is a view of reservations and does not indicate actual allocations, 
right? The legend for the y-axis says Utilization GB, though. Allocation would be a 
great addition (knowing how much is left of my reservation, etc.).
b) This shows everything in terms of memory but not CPU, right? Should we add a 
switch to show both, and in future other resource types? Showing them together 
is more accurate but harder to visualize.
c) Should we also show the total plan capacity as the end of the y-axis, or as an 
explicit ceiling line?

Minor usability issues:
a) How are the time window for the slider and the time window selected within 
the slider chosen? Sometimes the slider would stay at a point before the current 
time, while at other times future time would be shown as part of the view. Also, 
if there were no reservations, it would not advance to the current time until a new 
reservation showed up?
b) Related to the previous point: why does refreshing the page allow me to move the 
chosen time window forward, while the refresh button does not? Maybe rename the 
refresh button to "refresh queues"? Also, provide a "refresh time" button if c) below 
cannot be solved?
c) Is there a query parameter or some other way to get back to a specific 
queue? That would avoid having to use the drop-down every time I refresh the page.


 Improve RM webapp to expose info about reservations.
 

 Key: YARN-2664
 URL: https://issues.apache.org/jira/browse/YARN-2664
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Matteo Mazzucchelli
 Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, 
 YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, 
 YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.patch, legal.patch, 
 screenshot_reservation_UI.pdf


 YARN-1051 provides a new functionality in the RM to ask for reservation on 
 resources. Exposing this through the webapp GUI is important.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)