[jira] [Created] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval

2015-02-25 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3259:
---

 Summary: FairScheduler: Update to fairShare could be triggered 
early on node events instead of waiting for update interval 
 Key: YARN-3259
 URL: https://issues.apache.org/jira/browse/YARN-3259
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Instead of waiting unconditionally for the update interval, we can trigger early 
updates on important events, e.g., node join and leave.
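
A minimal, self-contained sketch of the idea (hypothetical class and method names; this is not the actual FairScheduler code or the attached patch): an update loop that normally wakes on a fixed interval, but can be signalled early when a node joins or leaves.
{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: recompute fair shares every updateIntervalMs,
// but allow node join/leave events to trigger the recomputation early.
public class EagerUpdateLoop {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition updateRequested = lock.newCondition();
  private final long updateIntervalMs;
  private volatile boolean running = true;

  public EagerUpdateLoop(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
  }

  // Runs on a dedicated update thread.
  public void run() throws InterruptedException {
    while (running) {
      lock.lock();
      try {
        // Wake up after the normal interval OR when signalled early.
        updateRequested.await(updateIntervalMs, TimeUnit.MILLISECONDS);
      } finally {
        lock.unlock();
      }
      recomputeFairShares();
    }
  }

  // Called from node-added / node-removed event handlers.
  public void onNodeEvent() {
    lock.lock();
    try {
      updateRequested.signal();
    } finally {
      lock.unlock();
    }
  }

  public void stop() {
    running = false;
    onNodeEvent();
  }

  private void recomputeFairShares() {
    // Placeholder for the fair-share recomputation a real scheduler performs.
  }
}
{code}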





[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-25 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3248:

Attachment: apache-yarn-3248.0.patch
Screenshot.jpg

Uploaded patch with fix and screenshot showing the new apps UI

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screenshot.jpg, apache-yarn-3248.0.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.





[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336534#comment-14336534
 ] 

Hudson commented on YARN-3247:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #106 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/106/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}
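
The direction of the fix is to pin the scheduler in the test configuration rather than relying on the cluster default. A hedged sketch of that configuration step (the attached patch may differ in its details; the helper class below is purely illustrative):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

// Illustrative helper: build a configuration that explicitly selects
// CapacityScheduler, so the test cannot pick up FairScheduler from the
// cluster-wide default scheduler setting.
public class QueueMappingTestConf {
  static CapacitySchedulerConfiguration newConf() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}
The resulting configuration can then be handed to the MockRM the test already creates, as seen in the stack trace above.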





[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336526#comment-14336526
 ] 

Hudson commented on YARN-2980:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #106 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/106/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?





[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336452#comment-14336452
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700724/YARN-3249.3.patch
  against trunk revision ad8ed3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6731//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6731//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6731//console

This message is automatically generated.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.patch, killapp-failed.log, screenshot.png


 We want to be able to kill applications from the web UI, similar to the JobTracker web UI.





[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336451#comment-14336451
 ] 

Tsuyoshi Ozawa commented on YARN-1809:
--

[~xgong] The failure of TestRMWebAppFairScheduler looks related - could you 
double check?

Minor nits: ApplicationHistoryClientService.java: is cancelDelegationToken's 
comment correct?
{code}
+  @Override
+  public CancelDelegationTokenResponse cancelDelegationToken(
+  CancelDelegationTokenRequest request) throws YarnException, IOException {
+// TODO Auto-generated method stub
+return null;
+  }
{code}

TestAHSWebServices.java: "Noothing" should be "Nothing".
{code}
+  protected void serviceStart() throws Exception {
+// Do Noothing
+  }
{code}



 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.11.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, 
 YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, 
 YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers.
 It would be good to provide similar web UIs but retrieve the data from separate sources, i.e., the RM cache and the history store, respectively.





[jira] [Commented] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336502#comment-14336502
 ] 

Hadoop QA commented on YARN-3258:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700747/YARN-3258.001.patch
  against trunk revision ad8ed3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6732//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6732//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6732//console

This message is automatically generated.

 FairScheduler: Need to add more logging to investigate allocations
 --

 Key: YARN-3258
 URL: https://issues.apache.org/jira/browse/YARN-3258
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Minor
 Attachments: YARN-3258.001.patch


 It's hard to investigate allocation failures without any logging.





[jira] [Created] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations

2015-02-25 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3258:
---

 Summary: FairScheduler: Need to add more logging to investigate 
allocations
 Key: YARN-3258
 URL: https://issues.apache.org/jira/browse/YARN-3258
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Minor


It's hard to investigate allocation failures without any logging.
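
As a rough illustration of the kind of logging being asked for (hypothetical log points, not the attached patch), verbose per-node and per-app messages can be guarded with isDebugEnabled() so they stay cheap when debug logging is off:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch of debug logging around an allocation attempt.
public class AllocationLoggingSketch {
  private static final Log LOG = LogFactory.getLog(AllocationLoggingSketch.class);

  void tryAllocate(String appId, String nodeId,
                   long requestedMb, long availableMb) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Trying allocation: app=" + appId + " node=" + nodeId
          + " requestedMB=" + requestedMb + " availableMB=" + availableMb);
    }
    if (requestedMb > availableMb) {
      // Log why the allocation was skipped; this is the information that is
      // currently missing when investigating allocation failures.
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skipping allocation on " + nodeId + " for " + appId
            + ": request exceeds available memory");
      }
      return;
    }
    // ... perform the actual allocation ...
  }
}
{code}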





[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336413#comment-14336413
 ] 

Hudson commented on YARN-3247:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #849 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/849/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}





[jira] [Updated] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations

2015-02-25 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3258:

Attachment: YARN-3258.001.patch

 FairScheduler: Need to add more logging to investigate allocations
 --

 Key: YARN-3258
 URL: https://issues.apache.org/jira/browse/YARN-3258
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Minor
 Attachments: YARN-3258.001.patch


 It's hard to investigate allocation failures without any logging.





[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-25 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3249:
-
Attachment: killapp-failed2.log

[~ryu_kobayashi] I faced another 500 error. Please check the attached log. BTW, is 
it possible to add a test to TestRMWebServicesApps? It would be good to avoid 
this kind of error.
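
For reference, a standalone sketch of exercising the kill path over the RM REST API (this is not a TestRMWebServicesApps test; the RM address and application id below are placeholders, and it assumes the standard PUT /ws/v1/cluster/apps/{appid}/state endpoint):
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Standalone sketch: ask the RM to kill an application through its REST API.
// The RM address and application id here are placeholders.
public class KillAppViaRestSketch {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm.example.com:8088";
    String appId = "application_0000000000000_0001";
    URL url = new URL(rm + "/ws/v1/cluster/apps/" + appId + "/state");

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    byte[] body = "{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    // A 2xx response means the kill was accepted; a 500 here would surface
    // the kind of failure attached as killapp-failed2.log.
    System.out.println("Response code: " + conn.getResponseCode());
  }
}
{code}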

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png


 We want to be able to kill applications from the web UI, similar to the JobTracker web UI.





[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336513#comment-14336513
 ] 

Hudson commented on YARN-3247:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2047 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2047/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java
* hadoop-yarn-project/CHANGES.txt


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}





[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336505#comment-14336505
 ] 

Hudson commented on YARN-2980:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2047 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2047/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?





[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336524#comment-14336524
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700767/killapp-failed2.log
  against trunk revision ad8ed3e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6735//console

This message is automatically generated.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png


 We want to be able to kill applications from the web UI, similar to the JobTracker web UI.





[jira] [Commented] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336459#comment-14336459
 ] 

Hadoop QA commented on YARN-3259:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700748/YARN-3259.001.patch
  against trunk revision ad8ed3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6733//console

This message is automatically generated.

 FairScheduler: Update to fairShare could be triggered early on node events 
 instead of waiting for update interval 
 --

 Key: YARN-3259
 URL: https://issues.apache.org/jira/browse/YARN-3259
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3259.001.patch


 Instead of waiting unconditionally for the update interval, we can trigger early 
 updates on important events, e.g., node join and leave.





[jira] [Commented] (YARN-2776) In HA mode, can't set ip but hostname to yarn.resourcemanager.webapp.address.*

2015-02-25 Thread Mukesh Jha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336540#comment-14336540
 ] 

Mukesh Jha commented on YARN-2776:
--

I'm facing this exact issue as described in 
https://issues.apache.org/jira/browse/SPARK-5837 (comments)

From the Driver logs [3] I can see that the SparkUI started on a specific port, 
and my YARN app tracking URL [1] points to that port, which in turn gets 
redirected to the proxy URL [2], which gives me java.net.BindException: Cannot 
assign requested address.
If there were a port conflict, the SparkUI startup would have had issues, but 
that is not the case.
[1] YARN: 
application_1424814313649_0006  spark-realtime-MessageStoreWriter   SPARK   
ciuser  root.ciuser RUNNING UNDEFINED   10% 
http://host21.cloud.com:44648
[2] ProxyURL: 
http://host28.cloud.com:8088/proxy/application_1424814313649_0006/
[3] LOGS:
15/02/25 04:25:02 INFO util.Utils: Successfully started service 'SparkUI' on 
port 44648.
15/02/25 04:25:02 INFO ui.SparkUI: Started SparkUI at 
http://host21.cloud.com:44648
15/02/25 04:25:02 INFO cluster.YarnClusterScheduler: Created 
YarnClusterScheduler
15/02/25 04:25:02 INFO netty.NettyBlockTransferService: Server created on 41518

I am running a Spark Streaming app inside YARN. I have the Spark History Server 
running as well (do we need it running to access the UI?).
The app is running fine as expected, but Spark's web UI is not accessible.
When I try to access the ApplicationMaster of the YARN application, I get the 
error below.
java.net.BindException: Cannot assign requested address
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
at java.net.Socket.bind(Socket.java:631)
at java.net.Socket.init(Socket.java:423)
at java.net.Socket.init(Socket.java:280)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:555)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 

[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336405#comment-14336405
 ] 

Hudson commented on YARN-2980:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #849 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/849/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?





[jira] [Updated] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval

2015-02-25 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3259:

Attachment: YARN-3259.001.patch

 FairScheduler: Update to fairShare could be triggered early on node events 
 instead of waiting for update interval 
 --

 Key: YARN-3259
 URL: https://issues.apache.org/jira/browse/YARN-3259
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3259.001.patch


 Instead of waiting unconditionally for the update interval, we can trigger early 
 updates on important events, e.g., node join and leave.





[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336595#comment-14336595
 ] 

Hudson commented on YARN-3247:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/115/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java
* hadoop-yarn-project/CHANGES.txt


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}





[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-25 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336641#comment-14336641
 ] 

Varun Vasudev commented on YARN-3248:
-

Test case failure and findbugs warnings are not related to this patch. There 
are no tests since it's a UI change.

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screenshot.jpg, apache-yarn-3248.0.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.





[jira] [Created] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-25 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3260:


 Summary: NPE if AM attempts to register before RM processes launch 
event
 Key: YARN-3260
 URL: https://issues.apache.org/jira/browse/YARN-3260
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe


The RM on one of our clusters was running behind on processing AsyncDispatcher 
events, and this caused AMs to fail to register due to an NPE.  The AM was 
launched and attempting to register before the RMAppAttemptImpl had processed 
the LAUNCHED event, and the client to AM token had not been generated yet.  The 
NPE occurred because the ApplicationMasterService tried to encode the missing 
token.
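
A hedged sketch of the kind of guard that would avoid the NPE (the type and method names below are stand-ins, not the actual ApplicationMasterService code): only encode the client-to-AM token if it has actually been generated.
{code}
import java.nio.ByteBuffer;

// Hypothetical sketch: build the register response without assuming the
// client-to-AM token already exists.
public class RegisterResponseSketch {

  // Stand-in for the real token type; null until the attempt's LAUNCHED
  // event has been processed.
  static final class ClientToAMToken {
    byte[] getPassword() { return new byte[] {1, 2, 3}; }
  }

  static final class RegisterResponse {
    ByteBuffer clientToAMTokenMasterKey; // may legitimately be absent
  }

  RegisterResponse build(ClientToAMToken token) {
    RegisterResponse response = new RegisterResponse();
    if (token != null) {
      // Encoding a token that has not been generated yet is what triggered
      // the NPE described in this issue, so guard (or retry later) instead.
      response.clientToAMTokenMasterKey = ByteBuffer.wrap(token.getPassword());
    }
    return response;
  }
}
{code}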





[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336634#comment-14336634
 ] 

Hadoop QA commented on YARN-3248:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700766/apache-yarn-3248.0.patch
  against trunk revision ad8ed3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6734//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6734//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6734//console

This message is automatically generated.

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screenshot.jpg, apache-yarn-3248.0.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.





[jira] [Assigned] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3260:
---

Assignee: Naganarasimha G R

 NPE if AM attempts to register before RM processes launch event
 ---

 Key: YARN-3260
 URL: https://issues.apache.org/jira/browse/YARN-3260
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R

 The RM on one of our clusters was running behind on processing 
 AsyncDispatcher events, and this caused AMs to fail to register due to an 
 NPE.  The AM was launched and attempting to register before the 
 RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
 had not been generated yet.  The NPE occurred because the 
 ApplicationMasterService tried to encode the missing token.





[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336621#comment-14336621
 ] 

Hudson commented on YARN-3247:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2065 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2065/])
YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. 
Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java
* hadoop-yarn-project/CHANGES.txt


 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec   ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}





[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336587#comment-14336587
 ] 

Hudson commented on YARN-2980:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/115/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?





[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336647#comment-14336647
 ] 

Jason Lowe commented on YARN-3260:
--

Sample stack trace from the AM side when it fails to register:

{noformat}
2015-02-25 06:26:26,908 ERROR [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while 
registering
java.lang.NullPointerException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:294)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2079)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2075)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2073)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy80.registerApplicationMaster(Unknown Source)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:161)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:241)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:819)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1087)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:294)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
at 

[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336678#comment-14336678
 ] 

Jason Lowe commented on YARN-3239:
--

Committing this.

 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336613#comment-14336613
 ] 

Hudson commented on YARN-2980:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2065 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2065/])
YARN-2980. Move health check script related functionality to hadoop-common 
(Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java


 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena
 Fix For: 3.0.0

 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
 YARN-2980.003.patch, YARN-2980.004.patch


 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move health check functionality including the protocol between hadoop 
 daemons and health check script to hadoop-common. That will simplify the 
 development and maintenance for both hadoop source code and health check 
 script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336665#comment-14336665
 ] 

Naganarasimha G R commented on YARN-3260:
-

Hi [~jlowe], I would like to work on this issue but if you are already planning 
to fix this please feel free to reassign.

 NPE if AM attempts to register before RM processes launch event
 ---

 Key: YARN-3260
 URL: https://issues.apache.org/jira/browse/YARN-3260
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R

 The RM on one of our clusters was running behind on processing 
 AsyncDispatcher events, and this caused AMs to fail to register due to an 
 NPE.  The AM was launched and attempting to register before the 
 RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
 had not been generated yet.  The NPE occurred because the 
 ApplicationMasterService tried to encode the missing token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336693#comment-14336693
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7198 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7198/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-25 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336697#comment-14336697
 ] 

Ming Ma commented on YARN-3231:
---

LGTM.

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit. We want to increase this property on the fly to make 
 some of the pending job active. However, once we increase the limit, all 
 pending jobs were not assigned any resource, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336724#comment-14336724
 ] 

Jason Lowe commented on YARN-2902:
--

We still need to do this for APPLICATION resources.  It is true that those 
resources will be cleaned up when the application finishes, but that could be 
hours or days later.  And as for PUBLIC resources, Sangjin confirmed earlier 
he's seen the orphaning occur with those resources as well, so it must be 
occurring somehow even for those.  [~sjlee0] do you have any ideas on how 
PUBLIC resources ended up hung in a DOWNLOADING state?  I'm wondering if this 
is specific to the shared cache setup or if there's a code path we're missing.

I don't think we should special case the resource types to fix this.  Again I 
think the cleanest approach is to make sure we send an event to the 
LocalizedResource when a container localizer (or maybe just the container 
itself) is killed, and let that state machine handle it appropriately (e.g.: 
try to remove the _tmp file if the resource was in the downloading state, 
ignore it if it's already localized, etc.).
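
For illustration, a rough sketch of letting the resource's own state decide what 
a kill means; the enum and class names below are hypothetical and are not the 
actual LocalizedResource state machine:
{code}
import java.io.File;

// Hypothetical sketch only: a "localizer killed" event is routed to the
// resource, and the resource reacts based on its current state.
enum ResourceStateSketch { DOWNLOADING, LOCALIZED, FAILED }

class LocalizedResourceSketch {
  private ResourceStateSketch state = ResourceStateSketch.DOWNLOADING;

  void onLocalizerKilled(File tmpDownloadDir) {
    switch (state) {
      case DOWNLOADING:
        // A partial download may have left a *_tmp directory behind; remove it.
        scheduleDeletion(tmpDownloadDir);
        state = ResourceStateSketch.FAILED;
        break;
      default:
        // Already LOCALIZED (or FAILED): nothing to clean up.
        break;
    }
  }

  private void scheduleDeletion(File dir) {
    // Stand-in for handing the path to the NM DeletionService.
  }
}
{code}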

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2902.002.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans since it will 
 never delete resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly

2015-02-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336747#comment-14336747
 ] 

zhihai xu commented on YARN-3247:
-

Thanks [~ozawa] for reviewing and committing the patch! 
Greatly appreciated.
zhihai

 TestQueueMappings should use CapacityScheduler explicitly
 -

 Key: YARN-3247
 URL: https://issues.apache.org/jira/browse/YARN-3247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3247.000.patch


 TestQueueMappings is only supported by CapacityScheduler.
 We should configure CapacityScheduler for this test. Otherwise if the default 
 scheduler is set to FairScheduler, the test will fail with the following 
 message:
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
 testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings)
   Time elapsed: 2.202 sec <<< ERROR!
 java.lang.ClassCastException: 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot 
 be cast to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143)
 {code}
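
 For illustration, a minimal sketch of pinning the scheduler in such a test, 
 assuming the usual YarnConfiguration/MockRM test utilities (the helper class 
 name is made up; this is a sketch only, not the attached patch):
 {code}
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

 class CapacitySchedulerTestSetup {
   // Pin the scheduler explicitly so the test does not depend on the
   // cluster-wide default scheduler setting.
   static MockRM createCapacitySchedulerRM() {
     YarnConfiguration conf = new YarnConfiguration();
     conf.setClass(YarnConfiguration.RM_SCHEDULER,
         CapacityScheduler.class, ResourceScheduler.class);
     return new MockRM(conf);
   }
 }
 {code}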



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336763#comment-14336763
 ] 

Hadoop QA commented on YARN-2902:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685607/YARN-2902.002.patch
  against trunk revision 1a68fc4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6736//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6736//console

This message is automatically generated.

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2902.002.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans since it will 
 never delete resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336770#comment-14336770
 ] 

Tsuyoshi Ozawa commented on YARN-3248:
--

[~vvasudev] Thanks for taking this issue. This change looks very useful. I have 
some comments on the 1st patch:

The blacklist is an instance of HashSet, so it can throw 
ConcurrentModificationException when the blacklist is modified in another thread. 
One alternative is to use Collections.newSetFromMap(new 
ConcurrentHashMap<Object, Boolean>()) instead of HashSet.
{code}
+  public Set<String> getBlacklistedNodes() {
+return this.appSchedulingInfo.getBlackList();
+  }
{code}
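
For illustration, a minimal sketch of that alternative; the class and method 
names below are made up for this example and are not taken from the patch:
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a thread-safe Set<String> backed by a ConcurrentHashMap.
// Iteration is weakly consistent, so concurrent adds never throw
// ConcurrentModificationException.
class BlacklistHolder {
  private final Set<String> blacklistedNodes =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  void addNode(String host) {
    blacklistedNodes.add(host);
  }

  Set<String> getBlacklistedNodes() {
    return blacklistedNodes; // safe to iterate while another thread adds
  }
}
{code}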

If AbstractYarnScheduler#getApplicationAttempt() can be used, I think it's more 
straightforward and simple. What do you think?
{code}
+  private CapacityScheduler scheduler = null;
{code}

Could you add tests to TestRMWebServicesApps?


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screenshot.jpg, apache-yarn-3248.0.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336691#comment-14336691
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], looked into it. I was able to simulate the issue as well for PRIVATE 
resources.
I think we need to handle this only for PRIVATE resources. APPLICATION resources 
will be cleaned up when the application finishes. And PUBLIC resources should not 
remain orphaned, as we do not kill or stop the PublicLocalizer in between.

To download a resource, FSDownload appends a _tmp suffix to the directory 
to which the resource will be downloaded.
And while processing a heartbeat from the container localizer, the NM sends the 
destination path for the resource to be downloaded in its response. 
We also download one resource at a time.

So, we can store this destination path in a queue in LocalizerRunner whenever 
we send a new path for download, and remove it when the fetch succeeds. 
When the container is killed (which causes the LocalizerRunner to be cleaned up) 
we can take the path from the front of the queue and submit the associated temp 
path to the DeletionService for deletion, if the ref count for the resource is 0.

We cannot do this cleanup in ContainerLocalizer, as the LCE launches it as a new 
process and kills it when the LocalizerRunner is interrupted.
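
For illustration, a rough sketch of that bookkeeping; the class and method names 
below are hypothetical and not the actual LocalizerRunner code:
{code}
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

import org.apache.hadoop.fs.Path;

// Hypothetical sketch of tracking the destination paths handed to the
// container localizer, oldest first.
class PendingDownloadTracker {
  private final Deque<Path> pendingPaths = new ConcurrentLinkedDeque<Path>();

  void pathHandedOut(Path destination) {   // on localizer heartbeat response
    pendingPaths.addLast(destination);
  }

  void fetchSucceeded() {                  // resource reported as localized
    pendingPaths.pollFirst();
  }

  Path pathToCleanUp() {                   // on container kill / localizer cleanup
    // The head of the queue is the download that may have left a _tmp directory
    // behind; the caller would hand it to the DeletionService if the resource
    // ref count is 0.
    return pendingPaths.peekFirst();
  }
}
{code}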

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2902.002.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans since it will 
 never delete resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336707#comment-14336707
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~Naganarasimha] and [~rkanter] for review and comments!

bq. I feel AM should be informed of AggregatorAddr as early as register itself 
than currently being done in ApplicationMasterService.allocate().
That's a good point. Another idea (from Vinod in an offline discussion) is to add 
a blocking call in AMRMClient to get the aggregator address directly from the RM. 
AMRMClient can be wrapped by TimelineClient so that a missing aggregator address or 
an aggregator failure can be handled transparently. Thoughts?
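
For illustration, a hypothetical shape of such a blocking lookup; these names do 
not exist in AMRMClient/TimelineClient today and are only for discussion:
{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical API shape for discussion only.
interface AggregatorDiscovery {
  // Blocking call: ask the RM for the current app-level aggregator address,
  // retrying internally if the aggregator has failed over.
  String getAggregatorAddress(ApplicationId appId) throws IOException;
}
{code}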

bq. For NM's too, would it be better to update during registering itself (may 
be recovered during recovery, not sure though) thoughts?
I think the NM case is slightly different here: the NM needs this knowledge whenever 
the first container of the app gets allocated/launched, so getting things updated in 
the heartbeat sounds good enough, doesn't it? In addition, if adding a new API in 
AMRMClient is acceptable, the NM will use TimelineClient too and so can handle 
service discovery automatically.


bq. Was not clear about source of RMAppEventType.AGGREGATOR_UPDATE. Based on 
YARN-3030 (Aggregators collection through NM's Aux service), 
PerNodeAggregatorServer(Aux service) launches AppLevelAggregatorService, so 
will AppLevelAggregatorService inform RM about the aggregator for the 
application? and then RM will inform NM about the appAggregatorAddr as part of 
heart beat response ? if this is the flow will there be chances of race 
condition where in before NM gets appAggregatorAddr from RM, NM might require 
to post some AM container Entities/events?
I think we can discuss this flow in two scenarios: the first-time launch of the app 
aggregator, and the app aggregator failing over to another NM.
For the first-time launch of the app aggregator, the NM aux service will bind the app 
aggregator to the perNodeAggregator when the AM container gets allocated (per 
YARN-3030). The NM will notify the RM that this new appAggregator is ready for use in 
the next heartbeat to the RM (missing in this patch). After receiving this message from 
the NM, the RM will update its aggregator list and send 
RMAppEventType.AGGREGATOR_UPDATE to trigger persisting the updated aggregator list 
in the RMStateStore (for RM failover).
For the app aggregator failing over, the AM or NMs (whoever called putEntities with 
timelineClient) will notify the RM of the failure; the RM first verifies that this app 
aggregator is out of service and then kicks off rebinding the appAggregator to another 
NM's perNodeAggregatorService when the next heartbeat comes. When it hears back from 
this new NM, the RM does the same thing as in the 1st case.
One gap here today is that we launch the appAggregatorService (via the NM's auxiliary 
service) whenever the AM container gets launched, no matter whether it is the first 
launch or a reschedule after a failure. As in my earlier comments above, an AM container 
that fails over and is rescheduled to another NM may not have to cause a rebind of the 
aggregator service, just like the app's aggregator going out of service may not have to 
cause the AM container to get killed. So I think the appAggregatorService should get 
launched by the NM automatically only for the first attempt and be taken care of by the 
RM for subsequent attempts. 
About the race condition between the NM heartbeat and posting entities, I don't think 
posting entities should block any major logic, especially the NM heartbeat. In 
addition, if we make TimelineClient handle service discovery automatically, 
this will never happen. What do you think?

bq. Sorry for not commenting earlier. Thanks for taking this up Junping Du.
No worries. Thanks!

bq. Not using YARN-913 is fine if it's not going to make sense. I haven't 
looked too closely at it either; it just sounded like it might be helpful here.
Agree. My feeling now is that service discovery here is coupled tightly with service 
lifecycle management. Our app aggregator service does not run inside a 
dedicated container but has many deployment options, and its consumers include YARN 
components, not only the AM. So I think YARN-913 may not be the best fit at this 
moment.
 [~ste...@apache.org] is the main author of YARN-913. Steve, do you have any 
comments here?

bq. Given that a particular NM is only interested in the Applications that are 
running on it, is there some way to have it only receive the aggregator info 
for those apps? This would decrease the amount of throw away data that gets 
sent.
In the current patch, the RM only sends the NM the aggregator list for the active apps 
on that NM. Please check the code in ResourceTrackerService:  
{code}
+ConcurrentMap<ApplicationId, String> liveAppAggregatorsMap = new 
+ConcurrentHashMap<ApplicationId, String>();
+List<ApplicationId> keepAliveApps = 
remoteNodeStatus.getKeepAliveApplications();
+if (keepAliveApps != null) {
+  ConcurrentMap<ApplicationId, RMApp> rmApps = rmContext.getRMApps();
+  for (ApplicationId appId : keepAliveApps) {
+

[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336828#comment-14336828
 ] 

Hadoop QA commented on YARN-1853:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12640461/YARN-1853-trunk.patch
  against trunk revision 5731c0e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6737//console

This message is automatically generated.

 Allow containers to be ran under real user even in insecure mode
 

 Key: YARN-1853
 URL: https://issues.apache.org/jira/browse/YARN-1853
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.3.0
Reporter: Andrey Stepachev
 Attachments: YARN-1853-trunk.patch, YARN-1853.patch


 Currently an unsecure cluster runs all containers under one user (typically 
 nobody). That is not appropriate, because YARN applications don't play well 
 with HDFS when permissions are enabled. YARN applications try to write data (as 
 expected) into /user/nobody regardless of the user who launched the application.
 Another side effect is that it is not possible to configure cgroups for 
 particular users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode

2015-02-25 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336805#comment-14336805
 ] 

Ravi Prakash commented on YARN-1853:


This seems like a dupe of YARN-2424 . Andrey! Could you please confirm?

 Allow containers to be ran under real user even in insecure mode
 

 Key: YARN-1853
 URL: https://issues.apache.org/jira/browse/YARN-1853
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.3.0
Reporter: Andrey Stepachev
 Attachments: YARN-1853-trunk.patch, YARN-1853.patch


 Currently an unsecure cluster runs all containers under one user (typically 
 nobody). That is not appropriate, because YARN applications don't play well 
 with HDFS when permissions are enabled. YARN applications try to write data (as 
 expected) into /user/nobody regardless of the user who launched the application.
 Another side effect is that it is not possible to configure cgroups for 
 particular users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336811#comment-14336811
 ] 

Zhijie Shen commented on YARN-3240:
---

Thanks for committing the patch.

PS: .keep seems to have been missed; I added it in a follow-up commit.

 [Data Mode] Implement client API to put generic entities
 

 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-2928

 Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-977) Interface for users/AM to know actual usage by the container

2015-02-25 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336808#comment-14336808
 ] 

Ravi Prakash commented on YARN-977:
---

Is this related to YARN-1856? 

 Interface for users/AM to know actual usage by the container
 

 Key: YARN-977
 URL: https://issues.apache.org/jira/browse/YARN-977
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Omkar Vinit Joshi

 Today we allocate resources (memory and cpu) and the node manager starts the 
 container with the requested resources [I am assuming they are using cgroups]. But 
 there is definitely a possibility of users requesting more than what they 
 actually may need during the execution of their container/job-task. If we add 
 a way for users/AM to know the actual usage of the requested/completed 
 container then they may optimize it for the next run.
 This will be helpful for the AM to optimize cpu/memory resource requests by 
 querying the NM/RM for the avg/max cpu/memory usage of the container, or maybe 
 of the containers belonging to the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode

2015-02-25 Thread Andrey Stepachev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336880#comment-14336880
 ] 

Andrey Stepachev commented on YARN-1853:


[~raviprak] Not exactly. Basically it does the same, but this patch also adds a 
check that the user actually exists, and sends a reject if not. Without that check the 
RM will fail with an exception and the user will not know that the request failed due 
to a misconfiguration in the user/group mapping.
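
For illustration, a hypothetical sketch of such an existence check; this is not 
the code from the attached patch:
{code}
import java.io.IOException;

import org.apache.hadoop.util.Shell;

// Hypothetical sketch: treat a non-zero exit code from "id <user>" as
// "this local user does not exist on the node".
class LocalUserCheck {
  static boolean localUserExists(String user) {
    try {
      Shell.execCommand("id", user);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
{code}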

 Allow containers to be ran under real user even in insecure mode
 

 Key: YARN-1853
 URL: https://issues.apache.org/jira/browse/YARN-1853
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.3.0
Reporter: Andrey Stepachev
 Attachments: YARN-1853-trunk.patch, YARN-1853.patch


 Currently an unsecure cluster runs all containers under one user (typically 
 nobody). That is not appropriate, because YARN applications don't play well 
 with HDFS when permissions are enabled. YARN applications try to write data (as 
 expected) into /user/nobody regardless of the user who launched the application.
 Another side effect is that it is not possible to configure cgroups for 
 particular users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.

2015-02-25 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved YARN-1943.

Resolution: Duplicate

Marking as dupe of YARN-2424. Please reopen if my understanding is incorrect

 Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
 -

 Key: YARN-1943
 URL: https://issues.apache.org/jira/browse/YARN-1943
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
  Labels: linux
 Fix For: 2.3.0


 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser 
 replaces the user who submits a job if security is disabled: 
 {noformat}
  return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
 {noformat}
 However, the only way to enable security is to NOT use SIMPLE authentication 
 mode:
 {noformat}
   public static boolean isSecurityEnabled() {
 return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
   }
 {noformat}
  
 Thus, the framework ENFORCES that SIMPLE login security --> nonSecureuser 
 for submission of LinuxExecutorContainer.
 This results in a confusing issue, wherein we submit a job as sally and 
 then get an exception that user nobody is not whitelisted and has UID < 
 MAX_ID.
 My proposed solution is that we should be able to leverage 
 LinuxContainerExector regardless of hadoop's view of the security settings on 
 the cluster, i.e. decouple LinuxContainerExecutor logic from the 
 isSecurityEnabled return value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337092#comment-14337092
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700829/YARN-1809.12.patch
  against trunk revision 5731c0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilter
  
org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6739//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6739//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6739//console

This message is automatically generated.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
 YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
 YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web-UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web-UIs, but retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: (was: YARN-3031.03.patch)

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-02-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337011#comment-14337011
 ] 

Allen Wittenauer commented on YARN-3168:


some nits:

* table of contents have extra spacing (global)
* RM restart doc has broken rendering for containerid
* docker: job config has broken rendering
* yarn secure containers: link title is broken for nested jobs




 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
 YARN-3168.20150225.2.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336923#comment-14336923
 ] 

Wangda Tan commented on YARN-3197:
--

There's one difference: when the container id and app id are null, it is
{{containerId=xx completed with status=yyy from completed or unknown 
application id=zzz}}

And when the RMContainer is not null, but the app id cannot be found, it should be:
{{containerId=xx completed with status=yyy from completed application 
id=zzz}}.

The 2nd one's application shouldn't be indicated as unknown.

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits

2015-02-25 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-3261:
--

 Summary: rewrite resourcemanager restart doc to remove roadmap 
bits 
 Key: YARN-3261
 URL: https://issues.apache.org/jira/browse/YARN-3261
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer


Another mixture of roadmap and instruction manual that seems to be ever present 
in a lot of the recently written documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: YARN-3031.03.patch

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337008#comment-14337008
 ] 

Hadoop QA commented on YARN-3031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700828/YARN-3031.03.patch
  against trunk revision 5731c0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6738//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6738//console

This message is automatically generated.

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: YARN-3031.03.patch


Uploading a new patch with Identifier instead of ApplicationId as per 
[~zjshen]'s review suggestion. 

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: YARN-3031.03.patch

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: (was: YARN-3031.03.patch)

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-25 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1809:

Attachment: YARN-1809.12.patch

Got rid of the changes for FairSchedulerAppsBlock; let us file a separate 
ticket for the FairScheduler changes.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
 YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
 YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web-UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It's good to provide similar web-UIs, but retrieve the data from 
 separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3197:
---
Attachment: YARN-3197.003.patch

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, 
 YARN-3197.003.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-02-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3039:
-
Summary: [Aggregator wireup] Implement ATS app-appgregator service 
discovery  (was: [Aggregator wireup] Implement ATS writer service discovery)

 [Aggregator wireup] Implement ATS app-appgregator service discovery
 ---

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, YARN-3039-no-test.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336994#comment-14336994
 ] 

Junping Du commented on YARN-3031:
--

Hi [~vrushalic], thanks for updating the patch. A quick question here: 
it looks like we only specify three tracks for an entity: FLOW, USER and QUEUE
{code}
+public enum AggregateUpTo {
+   FLOW,
+   USER,
+   QUEUE
+}
{code}
But we have more tracks (types) in YARN-3041.
{code}
+public enum TimelineEntityType {
+  YARN_CLUSTER,
+  YARN_FLOW,
+  YARN_APPLICATION,
+  YARN_APPLICATION_ATTEMPT,
+  YARN_CONTAINER,
+  YARN_USER,
+  YARN_QUEUE;
{code}
Maybe we should add more here?
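
Just to make the question concrete, a hypothetical expansion of the enum so the
tracks line up with the YARN-3041 types (illustration only, not something taken
from the patch):
{code}
// Hypothetical only: track names mirroring TimelineEntityType without the YARN_ prefix.
public enum AggregateUpTo {
  CLUSTER,
  FLOW,
  APPLICATION,
  USER,
  QUEUE
}
{code}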

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337144#comment-14337144
 ] 

Vinod Kumar Vavilapalli commented on YARN-3197:
---

You can infer the ApplicationID from the ContainerID, so there is no need to print both.
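
For reference, a small self-contained sketch of that inference (the container id
string is only an example; assumes the yarn-api and yarn-common jars on the
classpath):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class ContainerIdToAppId {
  public static void main(String[] args) {
    // Parse a container id and walk back up to its application id.
    ContainerId containerId =
        ConverterUtils.toContainerId("container_1421723685222_0008_01_000002");
    ApplicationId appId = containerId.getApplicationAttemptId().getApplicationId();
    System.out.println(appId); // application_1421723685222_0008
  }
}
{code}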

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, 
 YARN-3197.003.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337148#comment-14337148
 ] 

Vinod Kumar Vavilapalli commented on YARN-3031:
---

Quick comment: Following the proposal at YARN-3166, let's put the storage APIs 
in hadoop-yarn-server/hadoop-yarn-server-timelineservice under package 
o.a.h.yarn.server.timelineservice.storage? We shouldn't be exposing these to 
application writers as part of the hadoop-yarn-api records.

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3254) HealthReport should include disk full information

2015-02-25 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3254:

Attachment: YARN-3254-002.patch

v2 patch fixes log formatting.

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png, 
 YARN-3254-001.patch, YARN-3254-002.patch


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3254) HealthReport should include disk full information

2015-02-25 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3254:

Attachment: YARN-3254-001.patch

Attaching a patch to add a new public method {{getDisksHealthReport(boolean, 
boolean)}}, and deprecate the existing method {{getDisksHealthReport(boolean)}} 
for backward compatibility. I'll attach a screen shot later.
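
The shape of the change is the usual deprecate-and-forward overload. A standalone
sketch for illustration (the class, the extra flag name and the report strings here
are placeholders, not the actual NodeManager code):
{code}
public class DisksHealthReportSketch {

  /** @deprecated kept for callers of the old signature; forwards to the new variant. */
  @Deprecated
  public String getDisksHealthReport(boolean listGoodDirs) {
    return getDisksHealthReport(listGoodDirs, false);
  }

  /** New variant: optionally appends why a dir is bad, e.g. disk-full details. */
  public String getDisksHealthReport(boolean listGoodDirs, boolean includeDiskFullInfo) {
    StringBuilder report = new StringBuilder("1/1 local-dirs are bad: /hadoop/yarn/local");
    if (includeDiskFullInfo) {
      report.append("; used space above threshold of 90.0%");
    }
    return report.toString();
  }
}
{code}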

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png, 
 YARN-3254-001.patch


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Description: It would be useful to surface the resource requests table on 
the application web page to facilitate  scheduling analysis and debugging.  
(was: It would be useful to surface the resource requests table on the web page 
to facilitate  scheduling analysis and debugging.)

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He

 It would be useful to surface the resource requests table on the application 
 web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)
Jian He created YARN-3262:
-

 Summary: Surface application resource requests table
 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He


It would be useful to surface the resource requests table on the web page to 
facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-02-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337301#comment-14337301
 ] 

zhihai xu commented on YARN-2893:
-

I think I know what causes this issue.
The issue is most likely in the following code:
{code}
Credentials credentials = new Credentials();
DataInputByteBuffer dibb = new DataInputByteBuffer();
if (container.getTokens() != null) {
  // TODO: Don't do this kind of checks everywhere.
  dibb.reset(container.getTokens());
  credentials.readTokenStorageStream(dibb);
}
{code}
We didn't rewind the buffer after credentials.readTokenStorageStream(dibb).
I checked the code in DataInputByteBuffer: DataInputByteBuffer.read moves the 
position of the ByteBuffer (container.getTokens()), and a HeapByteBuffer is 
used for container.getTokens().
{code}
public int read(byte[] b, int off, int len) {
  if (bidx >= buffers.length) {
    return -1;
  }
  int cur = 0;
  do {
    int rem = Math.min(len, buffers[bidx].remaining());
    buffers[bidx].get(b, off, rem);
    cur += rem;
    off += rem;
    len -= rem;
  } while (len > 0 && ++bidx < buffers.length);
  pos += cur;
  return cur;
}
{code}
So if an exception happens in AMLauncher.setupTokens before the ByteBuffer is 
replaced by container.setTokens, the position of the tokens ByteBuffer will be 
at the end, and we will see this issue the next time we retry.
So I think it would be good to add container.getTokens().rewind() after 
credentials.readTokenStorageStream(dibb).
I will create a patch for this.
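
To make the buffer behaviour concrete, here is a minimal self-contained sketch
(assumes hadoop-common on the classpath; the class name and the empty Credentials
object are only for illustration, this is not the patch itself):
{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;

public class TokenBufferRewindSketch {
  public static void main(String[] args) throws Exception {
    // Serialize an (empty) Credentials object the way an AM launch context carries tokens.
    Credentials creds = new Credentials();
    DataOutputBuffer dob = new DataOutputBuffer();
    creds.writeTokenStorageToStream(dob);
    ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

    // First read consumes the buffer and leaves its position at the end.
    DataInputByteBuffer dibb = new DataInputByteBuffer();
    dibb.reset(tokens);
    new Credentials().readTokenStorageStream(dibb);

    // Without this rewind, a retry that re-reads `tokens` hits EOFException.
    tokens.rewind();

    DataInputByteBuffer dibb2 = new DataInputByteBuffer();
    dibb2.reset(tokens);
    new Credentials().readTokenStorageStream(dibb2); // succeeds because we rewound
    System.out.println("second read succeeded");
  }
}
{code}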


 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu

 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Attachment: (was: YARN-3251.trunk.1.patch)

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Attachment: YARN-3251.trunk.1.patch

Attached ver.1 patch against trunk (YARN-3251.trunk.1.patch)

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337305#comment-14337305
 ] 

Wangda Tan commented on YARN-3251:
--

Found a few typos; removed the patch and will upload it again.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337311#comment-14337311
 ] 

Jason Lowe commented on YARN-3262:
--

Duplicate of YARN-451?

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He

 It would be useful to surface the resource requests table on the application 
 web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Attachment: YARN-3251.trunk.1.patch

Attached ver.1 patch against trunk

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3254) HealthReport should include disk full information

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337317#comment-14337317
 ] 

Hadoop QA commented on YARN-3254:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700863/YARN-3254-001.patch
  against trunk revision 5731c0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6741//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6741//console

This message is automatically generated.

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png, 
 YARN-3254-001.patch, YARN-3254-002.patch


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337319#comment-14337319
 ] 

Jian He commented on YARN-3262:
---

Not exactly; this jira is to grab AppSchedulingInfo#requests and dump them on 
the UI. I have a patch almost ready; attaching the screenshot.

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He

 It would be useful to surface the resource requests table on the application 
 web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3263) ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream

2015-02-25 Thread zhihai xu (JIRA)
zhihai xu created YARN-3263:
---

 Summary: ContainerManagerImpl#parseCredentials doesn't rewind the 
ByteBuffer after credentials.readTokenStorageStream
 Key: YARN-3263
 URL: https://issues.apache.org/jira/browse/YARN-3263
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after 
credentials.readTokenStorageStream. So the next time we access the tokens, we 
will get an EOFException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3263) ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream

2015-02-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3263:

Description: 
ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after 
credentials.readTokenStorageStream. So the next time we access the tokens, we 
will get an EOFException.
The following is the code for parseCredentials in ContainerManagerImpl.
{code}
  private Credentials parseCredentials(ContainerLaunchContext launchContext)
      throws IOException {
    Credentials credentials = new Credentials();
    // Parse credentials
    ByteBuffer tokens = launchContext.getTokens();

    if (tokens != null) {
      DataInputByteBuffer buf = new DataInputByteBuffer();
      tokens.rewind();
      buf.reset(tokens);
      credentials.readTokenStorageStream(buf);
      if (LOG.isDebugEnabled()) {
        for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) {
          LOG.debug(tk.getService() + " = " + tk.toString());
        }
      }
    }
    // End of parsing credentials
    return credentials;
  }
{code}

  was:ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after 
credentials.readTokenStorageStream. So the next time we access the tokens, we 
will get an EOFException.


 ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after 
 credentials.readTokenStorageStream
 --

 Key: YARN-3263
 URL: https://issues.apache.org/jira/browse/YARN-3263
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu

 ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after 
 credentials.readTokenStorageStream. So the next time we access the tokens, we 
 will get an EOFException.
 The following is the code for parseCredentials in ContainerManagerImpl.
 {code}
   private Credentials parseCredentials(ContainerLaunchContext launchContext)
       throws IOException {
     Credentials credentials = new Credentials();
     // Parse credentials
     ByteBuffer tokens = launchContext.getTokens();
     if (tokens != null) {
       DataInputByteBuffer buf = new DataInputByteBuffer();
       tokens.rewind();
       buf.reset(tokens);
       credentials.readTokenStorageStream(buf);
       if (LOG.isDebugEnabled()) {
         for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) {
           LOG.debug(tk.getService() + " = " + tk.toString());
         }
       }
     }
     // End of parsing credentials
     return credentials;
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Attachment: resource requests.png

attached a screenshot to demonstrate the idea

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: resource requests.png


 It would be useful to surface the resource requests table on the application 
 web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337348#comment-14337348
 ] 

Vinod Kumar Vavilapalli commented on YARN-3251:
---

bq. Since the target of your patch is to make a quick fix for the old version, it's 
better to create the patch on branch-2.6, and the patch you created will be 
committed to branch-2.6 as well. I noticed some functionality and interfaces 
used in your patch are not part of 2.6. And the patch I'm working on now will 
remove CSQueueUtils.computeMaxAvailResource, so there is no need to add an 
intermediate fix in branch-2.
How about we have a separate JIRA solely focused on 2.6.1, as we have two 
separate patches and two different contributors?

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3254) HealthReport should include disk full information

2015-02-25 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3254:

Attachment: Screen Shot 2015-02-25 at 14.38.10.png

Attaching a screenshot after applying v2 patch.

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot 
 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Description: It would be useful to surface the outstanding resource 
requests table on the application web page to facilitate  scheduling analysis 
and debugging.  (was: It would be useful to surface the resource requests table 
on the application web page to facilitate  scheduling analysis and debugging.)

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3262.1.patch, resource requests.png


 It would be useful to surface the outstanding resource requests table on the 
 application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337459#comment-14337459
 ] 

Zhijie Shen commented on YARN-3125:
---

Hi Junping, thanks for updating the patch. Some more comments:

1. One corresponding change was missed around the following code.
{code}
if(timelineClient != null) {
  publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(),
  DSEvent.DS_APP_ATTEMPT_END, domainId, appSubmitterUgi);
}
{code}

2. The Client needs to be changed too: take the CLI option and append it to the 
args of the command that runs the AM.
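
For (2), a rough sketch of the kind of client-side change being described; the
option name here is purely an assumption for illustration, not what the patch uses:
{code}
import java.util.List;
import org.apache.commons.cli.CommandLine;

public class AmArgForwardingSketch {
  // Take a CLI option given to the distributed shell Client and append it to the
  // command-line args that will be used to launch the AM.
  static void forwardTimelineOption(CommandLine cliParser, List<CharSequence> vargs) {
    if (cliParser.hasOption("timeline_service_version")) {
      vargs.add("--timeline_service_version "
          + cliParser.getOptionValue("timeline_service_version"));
    }
  }
}
{code}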

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Junping Du
 Attachments: YARN-3125.patch, YARN-3125v2.patch


 We can start with changing distributed shell to use new timeline service once 
 the framework is completed, in which way we can quickly verify the next gen 
 is working fine end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337670#comment-14337670
 ] 

Vinod Kumar Vavilapalli commented on YARN-2423:
---

Sorry, I misspoke. Yes, without this patch, they are the REST APIs; that is all 
we have.

bq. Are they declared as InterfaceAudience(Public)? If they're not, we're not 
willing to use those in Spark, because there's no commitment from Yarn to keep 
them stable.
In my opinion, the APIs being added in this JIRA are no more stable than the 
REST APIs. They will possibly change the moment our data-model via YARN-2928 
changes and the REST APIs change correspondingly. So, if Spark doesn't want to 
code against this in a shimmed layer, I am not sure I can be of much help. 
FWIW, I am trying as much as possible to maintain compatibility with what exists so far.

I'm not trying to block this patch, just pointing out potential breakages in 
the near future given YARN-2928.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337672#comment-14337672
 ] 

Zhijie Shen commented on YARN-1809:
---

[~xgong], thanks for moving this work forward. I've tried the patch. It seems 
to work overall, but a couple of links are broken. The reason is that the old 
patch is about a year old, while the RM and AHS web UIs have changed to some 
extent since then.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Xuan Gong
 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
 YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
 YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
 YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch


 After YARN-953, the web-UI of the generic history service provides more 
 information than that of the RM: the details about app attempts and containers. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337701#comment-14337701
 ] 

Hadoop QA commented on YARN-3265:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700919/YARN-3265.1.patch
  against trunk revision d140d76.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6748//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6748//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6748//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity (fix 
 for trunk/branch-2)
 --

 Key: YARN-3265
 URL: https://issues.apache.org/jira/browse/YARN-3265
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3265.1.patch


 This patch is trying to solve the same problem described in YARN-3251, but 
 this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3173) start-yarn.sh script isn't aware of how many RMs need to be started.

2015-02-25 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-3173.

Resolution: Duplicate

I'm going to close this as a dupe of HADOOP-11590.  I've got the requested 
functionality of starting HA RM nodes as part of that patch already.

 start-yarn.sh script isn't aware of how many RMs need to be started.
 -

 Key: YARN-3173
 URL: https://issues.apache.org/jira/browse/YARN-3173
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: BOB
Priority: Minor

 When more than one RM is configured, for example in an HA cluster, using the 
 start-yarn.sh script to start the yarn cluster only starts one resourcemanager, 
 on the node where start-yarn.sh is executed. I think yarn should detect how 
 many RMs are configured at the beginning and start them all in the 
 start-yarn.sh script. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337756#comment-14337756
 ] 

Hadoop QA commented on YARN-3262:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700928/YARN-3262.2.patch
  against trunk revision d140d76.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6750//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6750//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6750//console

This message is automatically generated.

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource 
 requests.png


 It would be useful to surface the outstanding resource requests table on the 
 application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-25 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337682#comment-14337682
 ] 

Marcelo Vanzin commented on YARN-2423:
--

Hi Vinod,

I think the point that Robert was trying to make is that adding these APIs 
might force Yarn to maintain compatibility for it. So it would allow clients to 
code against the public API and have a reasonable expectation that it wouldn't 
break.

But I understand that with the redesign it might be hard to maintain 
compatibility. I guess it's a choice you guys have to make, but the lack of a 
public, stable read API is definitely a barrier for Spark adopting this 
feature. (I understand we could write code to talk to the REST server directly, 
but you seem to imply that this approach would also run into compatibility 
issues after the redesign.)

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337443#comment-14337443
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700874/YARN-3251.trunk.1.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6743//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6743//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6743//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Target Version/s: 2.6.1  (was: 2.7.0, 2.6.1)

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-25 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337630#comment-14337630
 ] 

Marcelo Vanzin commented on YARN-2423:
--

Hi [~vinodkv],

What are these APIs? Are they declared as InterfaceAudience(Public)? If 
they're not, we're not willing to use those in Spark, because there's no 
commitment from Yarn to keep them stable.

The only TimelineClient I see on branch-2.6 has no methods to read data from 
the ATS.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337354#comment-14337354
 ] 

Jason Lowe commented on YARN-3131:
--

bq. We can just simply check for failToSubmitStates? Why do we also need to 
check for waitingStates?

If we only check for failToSubmitStates then we'll continue to loop waiting for 
those failed states.  We need to check for waitingStates because that's the 
typical looping condition.  We need to keep polling for a new application 
report as long as the app is in the NEW, NEW_SAVING, or SUBMITTED state since 
those states indicate the RM hasn't finished accepting the app yet.  When it's 
not in one of the waiting states, we need to check if it's one of the failed 
states to decide to throw rather than just return indicating it was successful.
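
In code form, the loop shape described above looks roughly like this (the helper
signature and variable names are assumptions for illustration, not the actual
YarnClientImpl change):
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitPollSketch {
  // Poll while the app is still being accepted; once it leaves the waiting states,
  // treat FAILED/KILLED as a submission failure instead of returning quietly.
  static void waitForAcceptance(YarnClient client, ApplicationId appId, long pollMs)
      throws Exception {
    EnumSet<YarnApplicationState> waitingStates = EnumSet.of(
        YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED);
    EnumSet<YarnApplicationState> failToSubmitStates = EnumSet.of(
        YarnApplicationState.FAILED, YarnApplicationState.KILLED);
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (!waitingStates.contains(state)) {
        if (failToSubmitStates.contains(state)) {
          throw new YarnException("Failed to submit " + appId + " : "
              + report.getDiagnostics());
        }
        return; // accepted (or further along): submission succeeded
      }
      Thread.sleep(pollMs);
    }
  }
}
{code}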

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch


 Just ran into an issue where submitting a job to a non-existent queue raises 
 no exception from YarnClient. Though the job does get submitted successfully 
 and just fails immediately after, it would be better if YarnClient could 
 handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3254) HealthReport should include disk full information

2015-02-25 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337376#comment-14337376
 ] 

Akira AJISAKA commented on YARN-3254:
-

I reconsidered; after all, the issue is not a problem because
* Admin can read NodeManager's log and find the message as follows:
{code}
2015-02-26 07:34:22,485 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: 
Directory /usr/local/20150225-YARN-3254-2/logs/userlogs error, used 
space above threshold of 90.0%, removing from list of valid directories
{code}
* This patch is still incompatible as jmx information is actually changed.

 HealthReport should include disk full information
 -

 Key: YARN-3254
 URL: https://issues.apache.org/jira/browse/YARN-3254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot 
 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch


 When a NodeManager's local disk gets almost full, the NodeManager sends a 
 health report to ResourceManager that local/log dir is bad and the message 
 is displayed on ResourceManager Web UI. It's difficult for users to detect 
 why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-25 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337470#comment-14337470
 ] 

Chang Li commented on YARN-3131:


Thanks [~jlowe] for commenting. [~vinodkv] I have fixed the space issue. 
Jason's explanation is reasonable and I also agree that the check for 
waitingStates is logical and necessary. Do you have any other concern? Thanks.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
 yarn_3131_v6.patch, yarn_3131_v7.patch


 Just ran into an issue where submitting a job to a non-existent queue raises 
 no exception from YarnClient. Though the job does get submitted successfully 
 and just fails immediately after, it would be better if YarnClient could 
 handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Attachment: (was: YARN-3251.trunk.1.patch)

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-02-25 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3080:
--
Attachment: YARN-3080.patch

Fixed the findbug errors.

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} 
 container_1421723685222_0008_01_02` > 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so docker 
 inspect can't get the right pid for the docker container, and signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337571#comment-14337571
 ] 

Wangda Tan commented on YARN-3251:
--

Removed the trunk patch and uploaded the same one to YARN-3265; reassigned this 
to [~cwelch].

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-25 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337401#comment-14337401
 ] 

Vrushali C commented on YARN-3031:
--


Hi [~djp]

The AggregateUpTo enum has the tracks to aggregate along, while the 
TimelineEntityType enum has the types of entities that can exist. There may not 
be aggregations along all entity types. 
For example, a query can be: give all the apps run by this user in the last 
week. This will read the data that is aggregated along the USER track. I think 
I can rename those with a YARN prefix, though.

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337465#comment-14337465
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700875/YARN-3251.trunk.1.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6744//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6744//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6744//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Attachment: YARN-3262.2.patch

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource 
 requests.png


 It would be useful to surface the outstanding resource requests table on the 
 application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application resource requests table

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337464#comment-14337464
 ] 

Hadoop QA commented on YARN-3262:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700884/YARN-3262.1.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1151 javac 
compiler warnings (more than the trunk's current 208 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
47 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6746//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-httpfs.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6746//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6746//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6746//console

This message is automatically generated.

 Surface application resource requests table
 --

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3262.1.patch, resource requests.png


 It would be useful to surface the outstanding resource requests table on the 
 application web page to facilitate scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337463#comment-14337463
 ] 

Hadoop QA commented on YARN-3131:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700886/yarn_3131_v7.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6747//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6747//console

This message is automatically generated.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
 yarn_3131_v6.patch, yarn_3131_v7.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raised no exception. Although the job did get submitted 
 successfully and simply failed immediately afterwards, it would be better if 
 YarnClient handled the immediate-failure case the way YarnRunner does.
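
As a hedged sketch of the behaviour being asked for (not the attached yarn_3131 patches themselves), a client could poll the application report right after submission and fail fast once the application is already FAILED or KILLED; the polling interval and exception message below are illustrative choices.

{code}
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch of the desired fail-fast check, not the committed change.
public class SubmitAndVerifySketch {

  private static final EnumSet<YarnApplicationState> FAILED_OR_KILLED =
      EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);

  static ApplicationId submitAndVerify(YarnClient client,
      ApplicationSubmissionContext context)
      throws YarnException, IOException, InterruptedException {
    ApplicationId appId = client.submitApplication(context);

    // Keep polling while the app is still being accepted; bail out with a
    // diagnostic message if it died immediately (e.g. bad queue name).
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (FAILED_OR_KILLED.contains(state)) {
        throw new YarnException("Application " + appId + " ended in state "
            + state + ": " + report.getDiagnostics());
      }
      if (state != YarnApplicationState.NEW
          && state != YarnApplicationState.NEW_SAVING
          && state != YarnApplicationState.SUBMITTED) {
        return appId; // ACCEPTED, RUNNING, or already FINISHED
      }
      Thread.sleep(200); // illustrative polling interval
    }
  }
}
{code}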



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3265:
-
Priority: Blocker  (was: Major)

 CapacityScheduler deadlock when computing absolute max avail capacity (fix 
 for trunk/branch-2)
 --

 Key: YARN-3265
 URL: https://issues.apache.org/jira/browse/YARN-3265
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker

 This patch is trying to solve the same problem described in YARN-3251, but 
 this is a longer term fix for trunk and branch-2.
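
Without speculating on what the attached patch actually does, a common shape for a longer-term fix of this class of deadlock is to stop reaching up the tree while a child lock is held, for example by having the parent publish the value to a field the leaf can read without taking the parent's monitor. A purely hypothetical sketch (names made up, not YARN code):

{code}
// Hypothetical illustration of breaking the child -> parent lock dependency.
public class SnapshotSketch {

  static class ParentQueue {
    // Published by the parent, readable without taking the parent's monitor.
    private volatile float absoluteMaxAvailCapacity = 1.0f;

    // Recomputed under the parent's own lock, e.g. on node add/remove.
    synchronized void updateClusterResource(float newValue) {
      absoluteMaxAvailCapacity = newValue;
    }

    float absoluteMaxAvailCapacitySnapshot() {
      return absoluteMaxAvailCapacity;
    }
  }

  static class LeafQueue {
    private final ParentQueue parent;

    LeafQueue(ParentQueue parent) {
      this.parent = parent;
    }

    // Safe while holding only the leaf's lock: no parent monitor needed.
    synchronized float computeHeadroom(float usedCapacity) {
      return Math.max(0f,
          parent.absoluteMaxAvailCapacitySnapshot() - usedCapacity);
    }
  }
}
{code}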



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3265:
-
Attachment: YARN-3265.1.patch

 CapacityScheduler deadlock when computing absolute max avail capacity (fix 
 for trunk/branch-2)
 --

 Key: YARN-3265
 URL: https://issues.apache.org/jira/browse/YARN-3265
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3265.1.patch


 This patch is trying to solve the same problem described in YARN-3251, but 
 this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

