[jira] [Commented] (YARN-3424) Reduce log for ContainerMonitorImpl resoure monitoring from info to debug

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390052#comment-14390052
 ] 

Hadoop QA commented on YARN-3424:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708505/YARN-3424.001.patch
  against trunk revision 2daa478.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7186//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7186//console

This message is automatically generated.

 Reduce log for ContainerMonitorImpl resoure monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level
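
A minimal sketch of what such a change typically looks like (illustrative only, not the committed YARN-3424 patch; the class and method names here are made up, while the real change lives in ContainersMonitorImpl): guard the per-container message behind a debug-level check so it is only built and emitted when debug logging is enabled.

{noformat}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical stand-in for the monitoring thread's logging call.
public class ContainerUsageLogSketch {
  private static final Log LOG = LogFactory.getLog(ContainerUsageLogSketch.class);

  static void reportUsage(String pId, String containerId, String usage) {
    // Only emit the per-container message when debug logging is on, instead
    // of flooding the NodeManager log at INFO level.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Memory usage of ProcessTree " + pId + " for container-id "
          + containerId + ": " + usage);
    }
  }
}
{noformat}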



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-04-01 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390076#comment-14390076
 ] 

Arun Suresh commented on YARN-2962:
---

.. alternative to starting the index from the front.

 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-2962.01.patch


 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390156#comment-14390156
 ] 

Hadoop QA commented on YARN-3429:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708614/YARN-3429.000.patch
  against trunk revision 2daa478.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7187//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7187//console

This message is automatically generated.

 TestAMRMTokens.testTokenExpiry fails Intermittently with error 
 message:Invalid AMRMToken
 

 Key: YARN-3429
 URL: https://issues.apache.org/jira/browse/YARN-3429
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3429.000.patch


 TestAMRMTokens.testTokenExpiry fails Intermittently with error 
 message:Invalid AMRMToken from appattempt_1427804754787_0001_01
 The error logs is at 
 https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3424:
-
Summary: Change logs for ContainerMonitorImpl's resourse monitoring from 
info to debug  (was: Reduce log for ContainerMonitorImpl resoure monitoring 
from info to debug)

 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken

2015-04-01 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3429:

Attachment: YARN-3429.000.patch

 TestAMRMTokens.testTokenExpiry fails Intermittently with error 
 message:Invalid AMRMToken
 

 Key: YARN-3429
 URL: https://issues.apache.org/jira/browse/YARN-3429
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3429.000.patch


 TestAMRMTokens.testTokenExpiry fails Intermittently with error 
 message:Invalid AMRMToken from appattempt_1427804754787_0001_01
 The error logs is at 
 https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore

2015-04-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390128#comment-14390128
 ] 

Rohith commented on YARN-3410:
--

Just like YARN-2131 was handled, I think there is a choice between a start-up 
option and admin support. It would be better if the two are kept in sync.


 YARN admin should be able to remove individual application records from 
 RMStateStore
 

 Key: YARN-3410
 URL: https://issues.apache.org/jira/browse/YARN-3410
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical

 When the RM state store enters an unexpected state (one example is YARN-2340, 
 where an attempt is not in a final state but the app has already completed), the 
 RM can never come up unless the RMStateStore is formatted.
 I think we should support removing individual application records from the 
 RMStateStore, so that the RM admin can choose between waiting for a fix and 
 formatting the state store.
 In addition, the RM should be able to report all fatal errors (which will 
 shut down the RM) during app recovery; this can save the admin some time when 
 removing apps in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3424:
-
Affects Version/s: 2.7.0

 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2015-04-01 Thread vishal.rajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishal.rajan updated YARN-2624:
---
 Target Version/s:   (was: 2.6.0)
Affects Version/s: 2.6.0

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0, 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found resource localization fails on a cluster with following error 
 in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}
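
For context, here is a minimal standalone sketch (paths and class name are made up; this is not the localizer code) that reproduces the same failure mode as the stack trace above: a FileContext rename with OVERWRITE refuses to replace a destination directory that is not empty.

{noformat}
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RenameOntoNonEmptyDir {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getLocalFSFileContext();
    Path src = new Path("/tmp/localizer-download-tmp");
    Path dst = new Path("/tmp/filecache-leftover");  // stands in for a stale cache dir
    fc.mkdir(src, FsPermission.getDirDefault(), true);
    fc.mkdir(dst, FsPermission.getDirDefault(), true);
    // Leave a file behind in dst, mimicking a cache directory that was not
    // cleaned up before being reused.
    fc.create(new Path(dst, "stale-file"), EnumSet.of(CreateFlag.CREATE)).close();
    // FSDownload does an equivalent rename of its temp dir onto the final
    // cache path; with a non-empty destination this throws
    // "Rename cannot overwrite non empty destination directory ...".
    fc.rename(src, dst, Options.Rename.OVERWRITE);
  }
}
{noformat}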



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2015-04-01 Thread vishal.rajan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390319#comment-14390319
 ] 

vishal.rajan commented on YARN-2624:


Please verify and reopen the JIRA.

 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0, 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found resource localization fails on a cluster with following error 
 in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3424:
-
Issue Type: Improvement  (was: Bug)

 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-04-01 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3225:

Attachment: YARN-3225-3.patch

 New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
 ---

 Key: YARN-3225
 URL: https://issues.apache.org/jira/browse/YARN-3225
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Devaraj K
 Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, 
 YARN-3225.patch, YARN-914.patch


 A new CLI (or an existing CLI with new parameters) should put each node on the 
 decommission list into decommissioning status, and track a timeout to terminate 
 the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2015-04-01 Thread vishal.rajan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390305#comment-14390305
 ] 

vishal.rajan commented on YARN-2624:


It seems this issue still persists in YARN 2.6.0 under certain conditions.

Dump of the log relating to this issue:

15/04/01 12:13:20 ERROR test.Job: Task error: Rename cannot overwrite non empty 
destination directory /grid/6/yarn/local/usercache/azkaban/filecache/344860
java.io.IOException: Rename cannot overwrite non empty destination directory 
/grid/6/yarn/local/usercache/azkaban/filecache/344860
at 
org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
at 
org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
at org.apache.hadoop.fs.FileContext.rename(FileContext.java:909)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
=
yarn version : hadoop-2-2-0-0-2041-yarn   2.6.0.2.2.0.0-2041
=

This node was taken OOR for maintenance, and when it was added back to the 
cluster, it seems the 344860 directory was not removed before being assigned 
to the new container.




 Resource Localization fails on a cluster due to existing cache directories
 --

 Key: YARN-2624
 URL: https://issues.apache.org/jira/browse/YARN-2624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2624.001.patch, YARN-2624.001.patch


 We have found resource localization fails on a cluster with following error 
 in certain cases.
 {noformat}
 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Failed to download rsrc { { 
 hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
  1412027745352, FILE, null 
 },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
 java.io.IOException: Rename cannot overwrite non empty destination directory 
 /data/yarn/nm/filecache/27
   at 
 org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits

2015-04-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390258#comment-14390258
 ] 

Rohith commented on YARN-3261:
--

Thanks [~gururaj] for the patch.
+1 (non-binding) for the change.


 rewrite resourcemanager restart doc to remove roadmap bits 
 ---

 Key: YARN-3261
 URL: https://issues.apache.org/jira/browse/YARN-3261
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3261.01.patch


 Another mixture of roadmap and instruction manual that seems to be ever 
 present in a lot of the recently written documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390243#comment-14390243
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7482 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7482/])
YARN-3424. Change logs for ContainerMonitorImpl's resourse monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3286) Cleanup RMNode#ReconnectNodeTransition

2015-04-01 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3286:
-
Attachment: 0001-YARN-3286.patch

 Cleanup RMNode#ReconnectNodeTransition
 --

 Key: YARN-3286
 URL: https://issues.apache.org/jira/browse/YARN-3286
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
 Attachments: 0001-YARN-3286.patch, YARN-3286-test-only.patch


 RMNode#ReconnectNodeTransition is exercised for every ReconnectedEvent and has 
 become messy. This part of the code can be cleaned up so that we do not need to 
 remove the node and add a new node every time.
 Supporting the above point, see the YARN-3222 discussion in the comments 
 [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
  and 
 [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]
 The cleanup can do the following:
 # It always removes an old node and adds a new node. This is not really 
 required; instead the old node can be updated with the new values.
 # RMNode#totalCapability holds a stale capability after the NM is reconnected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-3430:
---

Assignee: Xuan Gong

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3430:

Attachment: YARN-3430.1.patch

Trivial patch without a test case.

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3286) Cleanup RMNode#ReconnectNodeTransition

2015-04-01 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3286:
-
 Description: 
RMNode#ReconnectNodeTransition is exercised for every ReconnectedEvent and has 
become messy. This part of the code can be cleaned up so that we do not need to 
remove the node and add a new node every time.

Supporting the above point, see the YARN-3222 discussion in the comments 
[link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
 and 
[link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]

The cleanup can do the following:
# It always removes an old node and adds a new node. This is not really required; 
instead the old node can be updated with the new values.
# RMNode#totalCapability holds a stale capability after the NM is reconnected.

  was:
This is found while fixing YARN-3222 mentioned in the comment 
[link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
 and 
[link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]

And RMNode#ReconnectNodeTransition clean up : It always remove an old node and 
add a new node. This need to be examined whether this is really required.

Target Version/s: 2.8.0
  Issue Type: Improvement  (was: Bug)
 Summary: Cleanup RMNode#ReconnectNodeTransition  (was: 
RMNode#totalCapability has stale capability after NM is reconnected.)

 Cleanup RMNode#ReconnectNodeTransition
 --

 Key: YARN-3286
 URL: https://issues.apache.org/jira/browse/YARN-3286
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-3286-test-only.patch


 RMNode#ReconnectNodeTransition is exercised for every ReconnectedEvent and has 
 become messy. This part of the code can be cleaned up so that we do not need to 
 remove the node and add a new node every time.
 Supporting the above point, see the YARN-3222 discussion in the comments 
 [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
  and 
 [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]
 The cleanup can do the following:
 # It always removes an old node and adds a new node. This is not really 
 required; instead the old node can be updated with the new values.
 # RMNode#totalCapability holds a stale capability after the NM is reconnected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-04-01 Thread mai shurong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mai shurong updated YARN-3416:
--
Attachment: queue_with_max333cores.png
queue_with_max263cores.png
queue_with_max163cores.png

queue_with_max163cores.png : submit a job to a queue with max 163 cores
queue_with_max263cores.png : submit a job to a queue with max 263 cores
queue_with_max333cores.png : submit a job to a queue with max 333 cores

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical
 Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
 queue_with_max163cores.png, queue_with_max263cores.png, 
 queue_with_max333cores.png


 I submit a big job, which has 500 maps and 350 reduces, to a 
 queue (fairscheduler) with a maximum of 300 cores. When the big mapreduce job is 
 running 100% of its maps, the 300 reduces have occupied the 300 max cores in the 
 queue. Then a map fails and retries, waiting for a core, while the 300 reduces 
 are waiting for the failed map to finish. So a deadlock occurs. As a result, the 
 job is blocked, and later jobs in the queue cannot run because there are no 
 available cores in the queue.
 I think there is a similar issue for the memory of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-04-01 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390466#comment-14390466
 ] 

mai shurong commented on YARN-3416:
---

I found a new case today. I submitted a larger job with 5800 maps and 380 
reduces to a queue which has a maximum of 263 cores. Even though no map failed, a 
deadlock between map and reduce cores allocation always occurred when I tried 
several times. I also tried submitting to other queues; as long as a job has more 
reduces than the queue's max cores, the deadlock always happened.
I attach the screenshots of the deadlocked jobs, and attach the head 10 lines 
(AM_log_head10.txt.gz) and tail 10 lines (AM_log_tail10.txt.gz) of the 
AM log of one deadlocked job.

The parameter mapreduce.job.reduce.slowstart.completedmaps is 0.5.
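
For reference, a hedged sketch of how that slow-start knob is set on a job configuration (the property name is standard MapReduce configuration; the value shown is just the one reported above):

{noformat}
import org.apache.hadoop.conf.Configuration;

public class SlowstartSetting {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // 0.5 means the AM may start scheduling reduces once 50% of the maps have
    // completed, so reduces can hold cores while many maps are still pending.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.5f);
    System.out.println(conf.get("mapreduce.job.reduce.slowstart.completedmaps"));
  }
}
{noformat}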

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical
 Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
 queue_with_max163cores.png, queue_with_max263cores.png, 
 queue_with_max333cores.png


 I submit a big job, which has 500 maps and 350 reduces, to a 
 queue (fairscheduler) with a maximum of 300 cores. When the big mapreduce job is 
 running 100% of its maps, the 300 reduces have occupied the 300 max cores in the 
 queue. Then a map fails and retries, waiting for a core, while the 300 reduces 
 are waiting for the failed map to finish. So a deadlock occurs. As a result, the 
 job is blocked, and later jobs in the queue cannot run because there are no 
 available cores in the queue.
 I think there is a similar issue for the memory of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3430:
---

 Summary: RMAppAttempt headroom data is missing in RM Web UI
 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-04-01 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390074#comment-14390074
 ] 

Arun Suresh commented on YARN-2962:
---

Yup.. agreed, starting the index from the end is a better alternative.
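
A hypothetical illustration of the two bucketing choices being discussed (this is not the actual ZKRMStateStore layout or the patch; the helper names are made up): taking the parent bucket from the front of the application's sequence number groups consecutive app ids under the same parent znode, while taking it from the end spreads them across different parents.

{noformat}
public class ZnodeBucketSketch {
  // Bucket by the first n digits of the app's sequence number.
  static String bucketFromFront(String appId, int n) {
    String seq = appId.substring(appId.lastIndexOf('_') + 1);
    return seq.substring(0, n);
  }

  // Bucket by the last n digits of the app's sequence number.
  static String bucketFromEnd(String appId, int n) {
    String seq = appId.substring(appId.lastIndexOf('_') + 1);
    return seq.substring(seq.length() - n);
  }

  public static void main(String[] args) {
    String[] ids = {"application_1427462602546_0001",
                    "application_1427462602546_0002",
                    "application_1427462602546_0003"};
    for (String id : ids) {
      // Consecutive ids all land in bucket "00" from the front, but in
      // distinct buckets "01", "02", "03" from the end.
      System.out.println(id + " -> front: " + bucketFromFront(id, 2)
          + ", end: " + bucketFromEnd(id, 2));
    }
  }
}
{noformat}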

 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-2962.01.patch


 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running

2015-04-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390122#comment-14390122
 ] 

Rohith commented on YARN-2268:
--

Thinking about this JIRA, I have a couple of questions.
# How do we identify that an RM is running, since the store can be formatted from 
anywhere in the cluster?
# In HA, each of the rm-ids has to be checked for its serviceState. This would be 
time consuming, since each host's retry takes time. If a switchover happens in the 
middle of checking the rm-ids, it would give the wrong result that all RMs are in 
standby.

I think if admin support is there, the 1st point can be solved easily.
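
As a small illustration of point 2 (a sketch only, not a proposed implementation; it stops short of actually querying service state), the set of RM ids that would each need to be checked comes from the HA configuration:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListRmIdsSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // yarn.resourcemanager.ha.rm-ids, e.g. "rm1,rm2"; every id listed here
    // would have to be contacted (and possibly retried on timeout) before a
    // format could be safely allowed.
    String[] rmIds = conf.getStrings(YarnConfiguration.RM_HA_IDS);
    if (rmIds == null) {
      System.out.println("HA not configured; only a single RM to check.");
      return;
    }
    for (String id : rmIds) {
      System.out.println("RM id to check for service state: " + id);
    }
  }
}
{noformat}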




 Disallow formatting the RMStateStore when there is an RM running
 

 Key: YARN-2268
 URL: https://issues.apache.org/jira/browse/YARN-2268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Rohith

 YARN-2131 adds a way to format the RMStateStore. However, it can be a problem 
 if we format the store while an RM is actively using it. It would be nice to 
 fail the format if there is an RM running and using this store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Reduce log for ContainerMonitorImpl resoure monitoring from info to debug

2015-04-01 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390226#comment-14390226
 ] 

Tsuyoshi Ozawa commented on YARN-3424:
--

+1 with minor indentation fix on my local.  Committing this shortly.

 Reduce log for ContainerMonitorImpl resoure monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case

2015-04-01 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390471#comment-14390471
 ] 

Peng Zhang commented on YARN-3405:
--

bq. 
2. if parent's usage reached its fair share, it will not propagate preemption 
request upside again. So preemption request in parent queue means preemption 
needed between its children.

To make the above statement clearer:
If a child's request added to the current usage is still below the fair share, the 
parent queue will propagate the request upward. This means the current queue is 
under its fair share and needs to preempt from a sibling that is over-scheduled. 
Once the amount reaches the current queue's fair share, the remaining request 
amount is stored on the current queue. This means that amount needs to be 
preempted among the current queue's children.
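
Restating that rule as a tiny hedged sketch (the names and structure here are illustrative, not the FairScheduler implementation):

{noformat}
public class PreemptionPropagationSketch {
  /** Portion of a child's preemption request forwarded to the parent queue. */
  static long forwardToParent(long request, long usage, long fairShare) {
    return Math.min(request, Math.max(0L, fairShare - usage));
  }

  /** Portion retained at this queue, to be preempted among its own children. */
  static long keepForChildren(long request, long usage, long fairShare) {
    return request - forwardToParent(request, usage, fairShare);
  }

  public static void main(String[] args) {
    // Numbers from the YARN-3405 description: queue-1 has fair share 50 and
    // usage 50, and queue-1-2 asks for 25 -> nothing is forwarded to root, so
    // the 25 should be preempted from queue-1-1 rather than from queue-2.
    System.out.println(forwardToParent(25, 50, 50)); // 0
    System.out.println(keepForChildren(25, 50, 50)); // 25
  }
}
{noformat}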

 FairScheduler's preemption cannot happen between sibling in some case
 -

 Key: YARN-3405
 URL: https://issues.apache.org/jira/browse/YARN-3405
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical

 Queue hierarchy described as below:
 {noformat}
            root
           /    \
      queue-1   queue-2
       /    \
  queue-1-1  queue-1-2
 {noformat}
 Assume the cluster resource is 100.
 # queue-1-1 and queue-2 each have an app. Each gets 50 usage and 50 fair share. 
 # When queue-1-2 becomes active, it causes a new preemption request for its 
 fair share of 25.
 # When preempting from root, it is possible that the preemption candidate found 
 is queue-2. If so, preemptContainerPreCheck for queue-2 returns false because 
 queue-2 is equal to its fair share.
 # Finally queue-1-2 will be waiting for a resource release from queue-1-1 
 itself.
 What I expect here is that queue-1-2 preempts from queue-1-1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390407#comment-14390407
 ] 

Xuan Gong commented on YARN-3248:
-

+1 lgtm. Will commit

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390442#comment-14390442
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #884 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/884/])
YARN-3424. Change logs for ContainerMonitorImpl's resourse monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390436#comment-14390436
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #884 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/884/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the 
 unavailable case while other resource metrics return 0 in the same case, 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390443#comment-14390443
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #884 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/884/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed TestZKRMStateStore and TestMoveApplication fail when running on a 
 mac, due to not being able to start the webapp. There are a few other tests 
 that could use MockRM. 
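
For readers unfamiliar with it, a minimal hedged sketch of the MockRM pattern the tests are being converted to (MockRM lives in the resourcemanager test sources; the setup each real test needs will differ):

{noformat}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;

public class MockRmUsageSketch {
  public static void main(String[] args) throws Exception {
    // Start a lightweight in-process RM instead of wiring up a full
    // ResourceManager (and its webapp) for each test.
    MockRM rm = new MockRM(new YarnConfiguration());
    rm.start();
    try {
      // A real test would register MockNM node managers and submit apps here.
    } finally {
      rm.stop();
    }
  }
}
{noformat}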



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-04-01 Thread mai shurong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mai shurong updated YARN-3416:
--
Attachment: AM_log_head10.txt.gz
AM_log_tail10.txt.gz

head 10 lines and tail 10 lines of AM log of a deadlock job.

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical
 Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz


 I submit a big job, which has 500 maps and 350 reduces, to a 
 queue (fairscheduler) with a maximum of 300 cores. When the big mapreduce job is 
 running 100% of its maps, the 300 reduces have occupied the 300 max cores in the 
 queue. Then a map fails and retries, waiting for a core, while the 300 reduces 
 are waiting for the failed map to finish. So a deadlock occurs. As a result, the 
 job is blocked, and later jobs in the queue cannot run because there are no 
 available cores in the queue.
 I think there is a similar issue for the memory of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI

2015-04-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3301:

Target Version/s: 2.8.0

 Fix the format issue of the new RM web UI and AHS web UI
 

 Key: YARN-3301
 URL: https://issues.apache.org/jira/browse/YARN-3301
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390431#comment-14390431
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/150/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed TestZKRMStateStore and TestMoveApplication fail when running on a 
 mac, due to not being able to start the webapp. There are a few other tests 
 that could use MockRM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390424#comment-14390424
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/150/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the 
 unavailable case while other resource metrics return 0 in the same case, 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resourse monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390430#comment-14390430
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/150/])
YARN-3424. Change logs for ContainerMonitorImpl's resourse monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/CHANGES.txt


 Change logs for ContainerMonitorImpl's resourse monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390379#comment-14390379
 ] 

Hadoop QA commented on YARN-3225:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708647/YARN-3225-3.patch
  against trunk revision c69ba81.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7188//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7188//console

This message is automatically generated.

 New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
 ---

 Key: YARN-3225
 URL: https://issues.apache.org/jira/browse/YARN-3225
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Devaraj K
 Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, 
 YARN-3225.patch, YARN-914.patch


 A new CLI (or an existing CLI with new parameters) should put each node on the 
 decommission list into decommissioning status, and track a timeout to terminate 
 the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390412#comment-14390412
 ] 

Xuan Gong commented on YARN-3248:
-

Committed into trunk/branch-2

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390422#comment-14390422
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7483 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7483/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3428) Debug log resources to be localized for a container

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390428#comment-14390428
 ] 

Hudson commented on YARN-3428:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/150/])
YARN-3428. Debug log resources to be localized for a container. (kasha) (kasha: 
rev 2daa478a6420585dc13cea2111580ed5fe347bc1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


 Debug log resources to be localized for a container
 ---

 Key: YARN-3428
 URL: https://issues.apache.org/jira/browse/YARN-3428
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.8.0

 Attachments: yarn-3428-1.patch


 For each container, we log the resources going through the INIT -> LOCALIZING -> 
 DOWNLOADED transitions. These logs do not include the container-id itself. It would 
 be nice to add debug logs to capture the resources being localized for a 
 container.
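 A minimal sketch of the kind of debug logging proposed, with hypothetical class 
 and method names rather than the actual ResourceLocalizationService change:
{code:java}
import java.util.Collection;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: a debug log that ties a container id to the resources it is
// about to localize; not the actual ResourceLocalizationService code.
public class LocalizationDebugLog {
  private static final Log LOG = LogFactory.getLog(LocalizationDebugLog.class);

  void logResourcesToLocalize(String containerId, Collection<String> resourcePaths) {
    if (LOG.isDebugEnabled()) {
      for (String resource : resourcePaths) {
        LOG.debug("Resource " + resource + " is to be localized for container "
            + containerId);
      }
    }
  }
}
{code}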



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390452#comment-14390452
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7484 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7484/])
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case

2015-04-01 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390455#comment-14390455
 ] 

Peng Zhang commented on YARN-3405:
--

I have a primitive idea to fix this and YARN-3414 under the current preemption 
architecture.

1. When calculating a preemption request, also update the parent's preemption request.
2. If the parent's usage has reached its fair share, it does not propagate the 
preemption request upward again. So a preemption request in a parent queue means 
preemption is needed among its children.
3. During the preempting phase, walk from the root downward (see the sketch below):
  a. If a parent queue has a preemption request, it preempts among its own children 
for that request (the process works as it does now: find the child most over its 
fair share and preempt recursively).
  b. Then (both after doing 3.a and in the case where no preemption is needed among 
the children), traverse its children and repeat 3.a.

This process introduces a traversal of the tree, but I think it will not affect 
performance severely because there is usually only a small number of queues.
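Below is a rough sketch of the walk described in step 3, just to make the traversal 
order concrete. The queue model and method names are hypothetical, not the real 
FairScheduler classes, and the "preempt among children" step is left as the existing 
find-most-over-fair-share logic.
{code:java}
import java.util.List;

// Hypothetical queue model used only to illustrate the proposed walk; the real
// FairScheduler classes (FSQueue, FSParentQueue, FSLeafQueue) look different.
class QueueNode {
  long preemptionRequest;          // unmet demand below fair share, aggregated upward
  List<QueueNode> children;

  boolean isLeaf() {
    return children == null || children.isEmpty();
  }
}

public class PreemptionWalk {

  // Walk from the root downward: if a parent has a pending preemption request,
  // resolve it among its own children first, then recurse into each child (step 3).
  void preemptFromRoot(QueueNode queue) {
    if (queue.isLeaf()) {
      return;
    }
    if (queue.preemptionRequest > 0) {
      preemptAmongChildren(queue, queue.preemptionRequest);   // step 3.a
    }
    for (QueueNode child : queue.children) {                  // step 3.b
      preemptFromRoot(child);
    }
  }

  void preemptAmongChildren(QueueNode parent, long toPreempt) {
    // As today: pick the child most over its fair share and preempt recursively.
    // Omitted; this sketch only fixes the traversal order.
  }
}
{code}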

 FairScheduler's preemption cannot happen between sibling in some case
 -

 Key: YARN-3405
 URL: https://issues.apache.org/jira/browse/YARN-3405
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical

 Queue hierarchy described as below:
 {noformat}
          root
         /    \
    queue-1    queue-2
     /    \
 queue-1-1  queue-1-2
 {noformat}
 Assume the cluster resource is 100.
 # queue-1-1 and queue-2 each have an app. Each one gets 50 usage and a 50 fair share.
 # When queue-1-2 becomes active, it causes a new preemption request for its fair 
 share of 25.
 # When preempting from the root, it is possible that the preemption candidate found 
 is queue-2. If so, preemptContainerPreCheck for queue-2 returns false because its 
 usage is equal to its fair share.
 # As a result, queue-1-2 ends up waiting for queue-1-1 itself to release resources.
 What I expect here is that queue-1-2 preempts from queue-1-1 (see the sketch below).
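 A minimal sketch of the pre-check behavior described in step 3, using the numbers 
 from the scenario above; the real preemptContainerPreCheck operates on scheduler 
 objects and has a different signature:
{code:java}
// Illustrative pre-check only, with raw longs standing in for queue state.
public class PreemptCheck {

  /** Only allow preemption from a queue whose usage is strictly above its fair share. */
  static boolean preemptContainerPreCheck(long queueUsage, long queueFairShare) {
    return queueUsage > queueFairShare;
  }

  public static void main(String[] args) {
    // queue-2: usage 50, fair share 50 -> never a valid preemption source ...
    System.out.println(preemptContainerPreCheck(50, 50));  // false
    // ... while queue-1-1: usage 50, fair share 25 (after queue-1-2 activates)
    // is the sibling that should actually give resources back.
    System.out.println(preemptContainerPreCheck(50, 25));  // true
  }
}
{code}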



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390542#comment-14390542
 ] 

Rohith commented on YARN-3430:
--

+1 lgtm (non-binding)

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3286) Cleanup RMNode#ReconnectNodeTransition

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390553#comment-14390553
 ] 

Hadoop QA commented on YARN-3286:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708671/0001-YARN-3286.patch
  against trunk revision 2e79f1c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7189//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7189//console

This message is automatically generated.

 Cleanup RMNode#ReconnectNodeTransition
 --

 Key: YARN-3286
 URL: https://issues.apache.org/jira/browse/YARN-3286
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Rohith
Assignee: Rohith
 Attachments: 0001-YARN-3286.patch, YARN-3286-test-only.patch


 RMNode#ReconnectNodeTransition is messy for every ReconnectedEvent. This 
 part of the code can be cleaned up so that we do not need to remove the node and 
 add a new node every time.
 Supporting the point above, the YARN-3222 discussion mentions this in 
 [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
  and 
 [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739]
 The cleanup can do the following (a sketch follows this list):
 # It currently always removes the old node and adds a new node. This is not really 
 required; instead, the old node can be updated with the new values.
 # RMNode#totalCapability holds a stale capability after the NM reconnects.
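 A sketch of the "update in place" idea from item 1, using hypothetical field and 
 class names rather than the real RMNodeImpl:
{code:java}
// Hypothetical sketch only; the real RMNodeImpl fields, locking, and event
// types are different.
class NodeRecord {
  String httpAddress;
  int totalMemoryMB;
  int totalVCores;
}

public class ReconnectHandler {

  // Instead of removing the old node and registering a brand-new one on every
  // reconnect, refresh the fields that may have changed, so the stored
  // capability (item 2) never goes stale.
  void onReconnect(NodeRecord existing, NodeRecord reported) {
    existing.httpAddress = reported.httpAddress;
    existing.totalMemoryMB = reported.totalMemoryMB;
    existing.totalVCores = reported.totalVCores;
  }
}
{code}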



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI

2015-04-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3301:

Attachment: YARN-3301.1.patch

Simple fix

 Fix the format issue of the new RM web UI and AHS web UI
 

 Key: YARN-3301
 URL: https://issues.apache.org/jira/browse/YARN-3301
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3301.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390616#comment-14390616
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2082/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the 
 unavailable case, while the other resource metrics return 0 in the same case, 
 which sounds inconsistent.
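 A small illustration of the mismatch with a hypothetical class, not the real 
 ResourceCalculatorProcessTree; the addendum patch referenced above removes this 
 inconsistency in the default values:
{code:java}
// Hypothetical getters that mimic the mismatch: CPU reports -1 when the value
// cannot be measured, memory reports 0, so callers cannot tell "zero" from "unknown".
public class ProcessTreeMetrics {
  static final float CPU_UNAVAILABLE = -1.0f;

  float getCpuUsagePercent(boolean available, float measured) {
    return available ? measured : CPU_UNAVAILABLE;
  }

  long getRssMemorySize(boolean available, long measured) {
    return available ? measured : 0L;
  }

  public static void main(String[] args) {
    ProcessTreeMetrics m = new ProcessTreeMetrics();
    System.out.println(m.getCpuUsagePercent(false, 42.0f));   // -1.0
    System.out.println(m.getRssMemorySize(false, 1L << 20));  // 0
  }
}
{code}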



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390622#comment-14390622
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2082/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resource monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390623#comment-14390623
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2082/])
YARN-3424. Change logs for ContainerMonitorImpl's resource monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resource monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of a process at INFO level, which spams the log 
 with hundreds of log lines:
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing to change this to the DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3428) Debug log resources to be localized for a container

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390620#comment-14390620
 ] 

Hudson commented on YARN-3428:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2082/])
YARN-3428. Debug log resources to be localized for a container. (kasha) (kasha: 
rev 2daa478a6420585dc13cea2111580ed5fe347bc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Debug log resources to be localized for a container
 ---

 Key: YARN-3428
 URL: https://issues.apache.org/jira/browse/YARN-3428
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.8.0

 Attachments: yarn-3428-1.patch


 For each container, we log the resources going through the INIT -> LOCALIZING -> 
 DOWNLOADED transitions. These logs do not include the container-id itself. It would 
 be nice to add debug logs to capture the resources being localized for a 
 container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390624#comment-14390624
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2082/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* hadoop-yarn-project/CHANGES.txt


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed TestZKRMStateStore and TestMoveApplication fail when running on a 
 mac, due to not being able to start the webapp. There are a few other tests 
 that could use MockRM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390638#comment-14390638
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/150/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3428) Debug log resources to be localized for a container

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390636#comment-14390636
 ] 

Hudson commented on YARN-3428:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/150/])
YARN-3428. Debug log resources to be localized for a container. (kasha) (kasha: 
rev 2daa478a6420585dc13cea2111580ed5fe347bc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Debug log resources to be localized for a container
 ---

 Key: YARN-3428
 URL: https://issues.apache.org/jira/browse/YARN-3428
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.8.0

 Attachments: yarn-3428-1.patch


 For each container, we log the resources going through the INIT -> LOCALIZING -> 
 DOWNLOADED transitions. These logs do not include the container-id itself. It would 
 be nice to add debug logs to capture the resources being localized for a 
 container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390632#comment-14390632
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/150/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the 
 unavailable case, while the other resource metrics return 0 in the same case, 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390640#comment-14390640
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/150/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed TestZKRMStateStore and TestMoveApplication fail when running on a 
 mac, due to not being able to start the webapp. There are a few other tests 
 that could use MockRM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resource monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390639#comment-14390639
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #150 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/150/])
YARN-3424. Change logs for ContainerMonitorImpl's resource monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resource monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of a process at INFO level, which spams the log 
 with hundreds of log lines:
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing to change this to the DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390662#comment-14390662
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #141 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/141/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed TestZKRMStateStore and TestMoveApplication fail when running on a 
 mac, due to not being able to start the webapp. There are a few other tests 
 that could use MockRM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390660#comment-14390660
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #141 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/141/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3428) Debug log resources to be localized for a container

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390658#comment-14390658
 ] 

Hudson commented on YARN-3428:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #141 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/141/])
YARN-3428. Debug log resources to be localized for a container. (kasha) (kasha: 
rev 2daa478a6420585dc13cea2111580ed5fe347bc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Debug log resources to be localized for a container
 ---

 Key: YARN-3428
 URL: https://issues.apache.org/jira/browse/YARN-3428
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.8.0

 Attachments: yarn-3428-1.patch


 For each container, we log the resources going through the INIT -> LOCALIZING -> 
 DOWNLOADED transitions. These logs do not include the container-id itself. It would 
 be nice to add debug logs to capture the resources being localized for a 
 container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390654#comment-14390654
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #141 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/141/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the 
 unavailable case, while the other resource metrics return 0 in the same case, 
 which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resource monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390661#comment-14390661
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #141 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/141/])
YARN-3424. Change logs for ContainerMonitorImpl's resource monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resource monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of a process at INFO level, which spams the log 
 with hundreds of log lines:
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing to change this to the DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2467) Add SpanReceiverHost to ResourceManager

2015-04-01 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-2467:
---
Summary: Add SpanReceiverHost to ResourceManager  (was: Add 
SpanReceiverHost to YARN daemons )

 Add SpanReceiverHost to ResourceManager
 ---

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2467) Add SpanReceiverHost to ResourceManager

2015-04-01 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-2467:
---
Component/s: (was: nodemanager)

 Add SpanReceiverHost to ResourceManager
 ---

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390766#comment-14390766
 ] 

Hadoop QA commented on YARN-3430:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708678/YARN-3430.1.patch
  against trunk revision 2e79f1c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7190//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7190//console

This message is automatically generated.

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v5.patch

Uploaded a v5 patch addressing all of the review comments above.
When using ContainerEntity to replace TimelineEntity, there is a bug where an 
UnrecognizedPropertyException is thrown while serializing/deserializing the 
children element when the entity is consumed as the base class (TimelineEntity). 
I commented that element's annotation out until we find a better solution (this 
will not be addressed in this JIRA). 
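For reference, a minimal, self-contained Jackson example of the failure mode 
described above: JSON written from a subclass carries an extra field, and reading it 
back as the base class throws UnrecognizedPropertyException unless unknown 
properties are ignored. The entity names here are made up; this is not the 
YARN-3334 code.
{code:java}
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UnknownPropertyDemo {

  // Base type used by readers; ignoring unknown fields is one way to avoid
  // UnrecognizedPropertyException when the payload came from a subclass.
  @JsonIgnoreProperties(ignoreUnknown = true)
  public static class BaseEntity {
    public String id;
  }

  public static class ChildEntity extends BaseEntity {
    public String children;   // extra field not present on BaseEntity
  }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    ChildEntity child = new ChildEntity();
    child.id = "container_1";
    child.children = "c1,c2";

    String json = mapper.writeValueAsString(child);
    // Without ignoreUnknown (or disabling FAIL_ON_UNKNOWN_PROPERTIES), this
    // readValue would throw UnrecognizedPropertyException for "children".
    BaseEntity base = mapper.readValue(json, BaseEntity.class);
    System.out.println(base.id);
  }
}
{code}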

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2467) Add SpanReceiverHost to ResourceManager

2015-04-01 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-2467:
---
Attachment: YARN-2467.001.patch

I would like to narrow down the focus of this sub-task to ResourceManager only. 
Attached patch adds SpanReceiverHost to RM and moves some testing utils from 
hadoop-hdfs to hadoop-common.

 Add SpanReceiverHost to ResourceManager
 ---

 Key: YARN-2467
 URL: https://issues.apache.org/jira/browse/YARN-2467
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: YARN-2467.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390900#comment-14390900
 ] 

Sangjin Lee commented on YARN-3391:
---

Hi [~djp],

The flow id identifies a distinct flow application that can be run repeatedly 
over time. The flow run id identifies one instance (or specific execution) of 
that flow. Finally, the flow version keeps track of the changes made to the 
flow (e.g. changes to the source code).

Let me give you a concrete example. Suppose you have a pig script you run 
repeatedly, named tracking.pig. The flow id in this case may be 
tracking.pig (or al...@tracking.pig to denote the fact that user alice 
runs this script).

The tracking.pig script will be run repeatedly many times. If I run it today, 
that specific run may have the flow run id of 1427846400 (timestamp when the 
pig script started). If I run it again tomorrow, the run id of that run would 
be 1427932800, and so on. Multiple run ids for the same flow id form a series 
of runs of the same script.

The flow version identifies changes made to the flow (user application). One 
scheme may be to use some kind of a hash of the pig script. Another scheme may 
be to use the git commit hash. Or some real versions if the user application 
has well-defined versions.

A flow run is *NOT* a subset of YARN apps run inside a flow. A flow is a 
template of runs, if you will, and a flow run is an actual run instance of that 
flow. These are described in some detail in the original design doc in 
YARN-2928.

I hope this helps.
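To make the three concepts concrete, here is a tiny, hypothetical value object 
holding the example values above (the flow id tracking.pig, a start-timestamp run 
id, and a version string such as a commit hash); the real YARN-2928 APIs differ.
{code:java}
// Illustrative only: not an actual timeline-service class.
public class FlowContext {
  final String flowId;      // identifies the flow, e.g. the recurring pig script
  final long flowRunId;     // one execution of the flow, e.g. its start timestamp
  final String flowVersion; // identifies changes to the flow, e.g. a commit hash

  FlowContext(String flowId, long flowRunId, String flowVersion) {
    this.flowId = flowId;
    this.flowRunId = flowRunId;
    this.flowVersion = flowVersion;
  }

  public static void main(String[] args) {
    // Today's run and tomorrow's run share the flow id but have different run ids.
    FlowContext today = new FlowContext("tracking.pig", 1427846400L, "hash-of-script-v1");
    FlowContext tomorrow = new FlowContext("tracking.pig", 1427932800L, "hash-of-script-v1");
    System.out.println(today.flowId.equals(tomorrow.flowId));   // true
    System.out.println(today.flowRunId == tomorrow.flowRunId);  // false
  }
}
{code}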

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390791#comment-14390791
 ] 

Hadoop QA commented on YARN-3301:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708686/YARN-3301.1.patch
  against trunk revision 2e79f1c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7191//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7191//console

This message is automatically generated.

 Fix the format issue of the new RM web UI and AHS web UI
 

 Key: YARN-3301
 URL: https://issues.apache.org/jira/browse/YARN-3301
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3301.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390844#comment-14390844
 ] 

Junping Du commented on YARN-3391:
--

Thanks [~zjshen] for delivering the patch! 
To be honest, the discussion above has left me more confused about these 
concepts.
From what I understood, a flow is a group of applications that get run 
(sequentially or in parallel) as a batch, and a flow_run is one run branch over 
a subset of the flow's applications (apps in a flow_run only get run in 
sequence, while different flow_runs under one flow could run in parallel). Does 
flow version then sound like a timestamp concept (from an HBase perspective) 
that represents a specific run time of the flow?
I quickly went through the attached patch and didn't find an answer there. I 
think we should document the concept/definition of flow, flow run and flow 
version clearly in the Javadoc (the web doc can come later, once we finish the 
feature), which would help reviewers and developers understand them better. 

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390941#comment-14390941
 ] 

Zhijie Shen commented on YARN-3391:
---

I'll put some description in the javadoc somewhere, but I think eventually we 
need to describe it clearly in the documentation of YTS v2.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-04-01 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390670#comment-14390670
 ] 

Devaraj K commented on YARN-3225:
-

This failed test is not related to the patch.
{code:xml}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
{code}

 New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
 ---

 Key: YARN-3225
 URL: https://issues.apache.org/jira/browse/YARN-3225
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Devaraj K
 Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, 
 YARN-3225.patch, YARN-914.patch


 A new CLI (or an existing CLI with new parameters) should put each node on 
 the decommission list into decommissioning status and track a timeout to 
 terminate the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390936#comment-14390936
 ] 

Junping Du commented on YARN-3391:
--

Thanks [~sjlee0] for the quick reply! That helps a lot. I initially thought a 
flow run was a run instance (perhaps from the YARN-2928 design doc or somewhere 
else) but got confused into thinking otherwise when I saw flow version. Thanks 
for bringing me back. :)
Given that other contributors could miss the discussion here, I would suggest we 
add Javadoc to explain these concepts somewhere, e.g. in TimelineCollectorContext.java.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-01 Thread Rohit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391738#comment-14391738
 ] 

Rohit Agarwal commented on YARN-3415:
-

+1

 Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
 queue
 -

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2015-04-01 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3432:
--

Assignee: Brahma Reddy Battula

 Cluster metrics have wrong Total Memory when there is reserved memory on CS
 ---

 Key: YARN-3432
 URL: https://issues.apache.org/jira/browse/YARN-3432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Brahma Reddy Battula

 I noticed that when reservations happen while using the Capacity Scheduler, 
 the UI and web services report the wrong total memory.
 For example, I have 300GB of total memory in my cluster. I allocate 50GB 
 and reserve 10GB, and the cluster metrics report the total memory as 290GB.
 This was broken by https://issues.apache.org/jira/browse/YARN-656, so perhaps 
 there is a difference between the fair scheduler and the capacity scheduler.
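 (A quick arithmetic sketch of the numbers above, assuming the reported total drops the reserved portion; this is only one reading of the symptom, not a confirmed root cause:)
{code}
// Cluster from the example: 300 GB total, 50 GB allocated, 10 GB reserved.
public final class ReservedMemoryMetricSketch {
  public static void main(String[] args) {
    long allocatedGB = 50;
    long reservedGB = 10;
    long availableGB = 300 - allocatedGB - reservedGB; // 240 GB left to hand out

    long reportedTotal = allocatedGB + availableGB;               // 290 GB, as observed
    long expectedTotal = allocatedGB + availableGB + reservedGB;  // 300 GB, as expected

    System.out.println("reported total = " + reportedTotal + " GB");
    System.out.println("expected total = " + expectedTotal + " GB");
  }
}
{code}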



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2015-04-01 Thread Kareem El Gebaly (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kareem El Gebaly updated YARN-1572:
---
Attachment: YARN-1572-branch-2.3.0.001.patch

 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-1572-branch-2.3.0.001.patch, YARN-1572-log.tar.gz, 
 conf.tar.gz, log.tar.gz


 We have a low chance of hitting an NPE in allocateNodeLocal when running 
 benchmarks (hit 4 times in 20 runs).
 {code}
 2014-07-31 04:18:19,653 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
 Assigned container container_1406794589275_0001_01_21 of capacity 
 memory:1024, vCores:1 on host datanode10:57281, which has 6 containers, 
 memory:6144, vCores:6 used and memory:2048, vCores:2 available after 
 allocation
 2014-07-31 04:18:19,654 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:662)
 2014-07-31 04:18:19,655 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391864#comment-14391864
 ] 

Zhijie Shen commented on YARN-3391:
---

Sangjin, thanks for your comments, too. Based on your and Joep's comments, 
I can see the benefit of showing aggregated information by 
application (type). However, IMHO, that is orthogonal to the flow definition. Isn't 
the more straightforward approach to provide this feature by aggregating on the 
application name/type dimension, rather than letting flow name = application name?

On the other side, flow should semantically stand for *workflow* (correct me if 
I'm wrong about the flow concept), which contains a group of applications that work 
together to solve a problem. Making flow name == application name changes that 
semantics: it would mean a flow of applications is just the set of applications of 
the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as 
different instances of the same thing.
{quote}

I guess the same thing you had in mind is not the same workflow, but the same 
application type, right? How about we decouple the two concepts? Taking one step 
back, when users set the flow explicitly, are they going to tell the 
application that it belongs to workflow abc, or that it belongs to job type 
xyz? I think it will be the former.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3413) Node label attributes (like exclusive or not) should be able to set when addToClusterNodeLabels and shouldn't be changed during runtime

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391866#comment-14391866
 ] 

Hadoop QA commented on YARN-3413:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708821/YARN-3413.1.patch
  against trunk revision c94d594.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7194//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7194//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7194//console

This message is automatically generated.

 Node label attributes (like exclusive or not) should be able to set when 
 addToClusterNodeLabels and shouldn't be changed during runtime
 ---

 Key: YARN-3413
 URL: https://issues.apache.org/jira/browse/YARN-3413
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3413.1.patch


 As mentioned in : 
 https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947.
 Changing node label exclusivity and/or other attributes may not be a real use 
 case, and we should also support setting node label attributes while adding 
 them to the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391921#comment-14391921
 ] 

Hadoop QA commented on YARN-2729:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708788/YARN-2729.20150402-1.patch
  against trunk revision c94d594.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.TestLeaseRecovery2

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7193//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7193//console

This message is automatically generated.

 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
 YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
 YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
 YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
 YARN-2729.20150402-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392066#comment-14392066
 ] 

Naganarasimha G R commented on YARN-2740:
-

Thanks for the review [~wangda],

 bq. Beyond CommonNodeLabelsManager shouldn't persist labels on nodes when NM 
do heartbeat., it shouldn't recover labels on nodes when RM restart. This is 
because RM configured centralized config, add some labels to nodes and change 
config to distributed then restart.
Good catch! I can achieve this in a couple of ways:
* Modify {{NodeLabelsStore.recover()}} to accept a boolean parameter like 
{{boolean skipNodeToLabelsMappings}} and leave the responsibility to the store 
(FileSystemNodeLabelsStore needs to take care of the skipping).
* Add a method in CommonNodeLabelsManager like {{recoverLabelsOnNode}} and let 
the store use it instead of {{replaceLabelsOnNode}}; we can then handle the 
skipping in the new method, i.e. {{CommonNodeLabelsManager.recoverLabelsOnNode}} 
(a rough sketch of this option follows below). If we need to further ensure 
that NodeLabelsStore does not call replaceLabelsOnNode, we can extract an 
interface for the methods used by the NodeLabelsStore and make 
CommonNodeLabelsManager implement it.

Please share your opinion on the suggested approaches, and also any other 
alternatives you have in mind.

I will handle the 2nd point in the next patch.
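For illustration, a rough sketch of the 2nd option under the assumptions above (the class and flag names here are placeholders, not the final patch):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Sketch only: a recovery-specific entry point that the store would call instead
// of replaceLabelsOnNode(), so that node-to-labels mappings can be skipped when
// distributed node label configuration is enabled.
public abstract class NodeLabelsRecoverySketch {

  // Assumed flag; the real manager would derive this from configuration.
  protected boolean distributedNodeLabelConfigEnabled;

  // Called by the store during recovery (keys are node ids, values are labels).
  public void recoverLabelsOnNode(Map<String, Set<String>> nodeToLabels)
      throws IOException {
    if (distributedNodeLabelConfigEnabled) {
      // Labels will be reported by the NMs at heartbeat time,
      // so persisted node-to-labels mappings are ignored.
      return;
    }
    replaceLabelsOnNode(nodeToLabels);
  }

  // Existing replace semantics, unchanged.
  public abstract void replaceLabelsOnNode(Map<String, Set<String>> nodeToLabels)
      throws IOException;
}
{code}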

 ResourceManager side should properly handle node label modifications when 
 distributed node label configuration enabled
 --

 Key: YARN-2740
 URL: https://issues.apache.org/jira/browse/YARN-2740
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
 YARN-2740.20150327-1.patch


 According to YARN-2495, when distributed node label configuration is enabled:
  - RMAdmin / REST API should reject change-labels-on-node operations.
  - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs 
  heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-04-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391872#comment-14391872
 ] 

Vinod Kumar Vavilapalli commented on YARN-2261:
---

MAPREDUCE-4099 originally facilitated this for MapReduce in a not so ideal way.

 YARN should have a way to run post-application cleanup
 --

 Key: YARN-2261
 URL: https://issues.apache.org/jira/browse/YARN-2261
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 See MAPREDUCE-5956 for context. Specific options are at 
 https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2015-04-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391875#comment-14391875
 ] 

Brahma Reddy Battula commented on YARN-3432:


Reverting YARN-656 should be fine, I think.

 Cluster metrics have wrong Total Memory when there is reserved memory on CS
 ---

 Key: YARN-3432
 URL: https://issues.apache.org/jira/browse/YARN-3432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Brahma Reddy Battula

 I noticed that when reservations happen while using the Capacity Scheduler, 
 the UI and web services report the wrong total memory.
 For example, I have 300GB of total memory in my cluster. I allocate 50GB 
 and reserve 10GB, and the cluster metrics report the total memory as 290GB.
 This was broken by https://issues.apache.org/jira/browse/YARN-656, so perhaps 
 there is a difference between the fair scheduler and the capacity scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391959#comment-14391959
 ] 

Vrushali C commented on YARN-3391:
--


For default values, workflow = app name is much more user friendly and 
intuitive than workflow name = flow_number_number.

Setting the flow name to flow_number_number per run means the UI will show a 
lengthy list of flow_number_number entries (similar to the JT/RM). That would 
not be a step up from the current JT/RM UI experience.


 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391972#comment-14391972
 ] 

Hadoop QA commented on YARN-3415:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708850/YARN-3415.002.patch
  against trunk revision 4d14816.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7196//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7196//console

This message is automatically generated.

 Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
 queue
 -

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391976#comment-14391976
 ] 

Hadoop QA commented on YARN-1572:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708878/YARN-1572-branch-2.3.0.001.patch
  against trunk revision f383fd9.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7197//console

This message is automatically generated.

 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-1572-branch-2.3.0.001.patch, YARN-1572-log.tar.gz, 
 conf.tar.gz, log.tar.gz


 We have a low chance of hitting an NPE in allocateNodeLocal when running 
 benchmarks (hit 4 times in 20 runs).
 {code}
 2014-07-31 04:18:19,653 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
 Assigned container container_1406794589275_0001_01_21 of capacity 
 memory:1024, vCores:1 on host datanode10:57281, which has 6 containers, 
 memory:6144, vCores:6 used and memory:2048, vCores:2 available after 
 allocation
 2014-07-31 04:18:19,654 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:662)
 2014-07-31 04:18:19,655 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore

2015-04-01 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392173#comment-14392173
 ] 

Rohith commented on YARN-3410:
--

For the state store format in YARN-2131, there was a discussion about whether 
to format the state using the admin service or ResourceManager startup options [comment 
link|https://issues.apache.org/jira/browse/YARN-2131?focusedCommentId=14032694page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032694].
 Similarly, I am thinking about these options for application state deletion (a 
small parsing sketch follows below):
# ./yarn resourcemanager -delete-from-state-store app-id OR
# ./yarn rmadmin -delete-from-state-store app-id
The 1st choice is a pretty straightforward deletion, regardless of whether the 
app state is finished or running. I would like to choose the 2nd option.
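For illustration only, a tiny sketch of parsing a -delete-from-state-store app-id style option (the option name comes from the list above; everything else is assumed):

{code}
// Illustration: parsing a "-delete-from-state-store <app-id>" style option.
// A real implementation would live in the admin CLI and call into the
// configured RMStateStore instead of printing.
public final class DeleteFromStateStoreSketch {
  public static void main(String[] args) {
    if (args.length == 2 && "-delete-from-state-store".equals(args[0])) {
      String appId = args[1];
      System.out.println("Would remove application " + appId + " from the RM state store");
    } else {
      System.err.println("Usage: -delete-from-state-store <app-id>");
    }
  }
}
{code}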


 YARN admin should be able to remove individual application records from 
 RMStateStore
 

 Key: YARN-3410
 URL: https://issues.apache.org/jira/browse/YARN-3410
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical

 When the RM state store enters an unexpected state (one example is YARN-2340, 
 where an attempt is not in a final state but the app has already completed), 
 the RM can never come up unless the RMStateStore is formatted.
 I think we should support removing individual application records from the 
 RMStateStore, so the RM admin can choose between waiting for a fix and 
 formatting the state store.
 In addition, the RM should be able to report all fatal errors (which will 
 shut down the RM) during app recovery; this can save the admin some time when 
 removing apps in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2015-04-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391710#comment-14391710
 ] 

Hadoop QA commented on YARN-1572:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708801/0001-Fix-for-YARN-1572.patch
  against trunk revision 3c7adaa.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7195//console

This message is automatically generated.

 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: 0001-Fix-for-YARN-1572.patch, YARN-1572-log.tar.gz, 
 conf.tar.gz, log.tar.gz


 We have a low chance of hitting an NPE in allocateNodeLocal when running 
 benchmarks (hit 4 times in 20 runs).
 {code}
 2014-07-31 04:18:19,653 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
 Assigned container container_1406794589275_0001_01_21 of capacity 
 memory:1024, vCores:1 on host datanode10:57281, which has 6 containers, 
 memory:6144, vCores:6 used and memory:2048, vCores:2 available after 
 allocation
 2014-07-31 04:18:19,654 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:662)
 2014-07-31 04:18:19,655 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391728#comment-14391728
 ] 

zhihai xu commented on YARN-3415:
-

[~ragarwal], thanks for the review. I uploaded a new patch YARN-3415.002.patch 
which addressed your comment.


 Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
 queue
 -

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal

2015-04-01 Thread Kareem El Gebaly (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kareem El Gebaly updated YARN-1572:
---
Attachment: (was: 0001-Fix-for-YARN-1572.patch)

 Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
 --

 Key: YARN-1572
 URL: https://issues.apache.org/jira/browse/YARN-1572
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz


 We have a low chance of hitting an NPE in allocateNodeLocal when running 
 benchmarks (hit 4 times in 20 runs).
 {code}
 2014-07-31 04:18:19,653 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
 Assigned container container_1406794589275_0001_01_21 of capacity 
 memory:1024, vCores:1 on host datanode10:57281, which has 6 containers, 
 memory:6144, vCores:6 used and memory:2048, vCores:2 available after 
 allocation
 2014-07-31 04:18:19,654 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:662)
 2014-07-31 04:18:19,655 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2015-04-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391719#comment-14391719
 ] 

Jian Fang commented on YARN-796:


JIRA MAPREDUCE-6304 has been created for this purpose.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, 
 Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.14.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics

2015-04-01 Thread Rohit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Agarwal updated YARN-3382:

 Target Version/s: 2.7.0
Affects Version/s: 2.2.0
   2.3.0
   2.4.0
   2.5.0
   2.6.0

 Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
 -

 Key: YARN-3382
 URL: https://issues.apache.org/jira/browse/YARN-3382
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
 Attachments: YARN-3382.patch


 {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in 
 {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of 
 the user's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391864#comment-14391864
 ] 

Zhijie Shen edited comment on YARN-3391 at 4/2/15 12:39 AM:


Sangjin, thanks for your comments, too. Based on your and Joep's comments, 
I can see the benefit of showing aggregated information by 
application (type). However, IMHO, that is orthogonal to the flow definition. Isn't 
the more straightforward approach to provide this feature by aggregating on the 
application name/type dimension, rather than letting flow name = application name?

On the other side, flow should semantically stand for *workflow* (correct me if 
I'm wrong about the flow concept), which contains a group of applications that work 
together to solve a problem. Making flow name == application name changes that 
semantics: it would mean a flow of applications is just the set of applications of 
the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as 
different instances of the same thing.
{quote}

I guess the same thing you had in mind is not the same workflow, but the same 
application type, right? And back to Joep's web UI example, it is better 
described as getting sum(cost) from apps where app_name(type) = sleep. 
Therefore, how about we decouple the two concepts? Taking one step back, when users 
set the flow explicitly, are they going to tell the application that it belongs 
to workflow ABC, or that it belongs to job type XYZ? I think it will be the 
former.


was (Author: zjshen):
Sangjin, thanks for your comments, too. According to your and Joep's comments, 
I can see the benefit to show application aggregation information by 
application (type). However, IMHO, it's orthogonal to flow definition. Isn't 
the straightforward approach to provide this feature via aggregating on 
application name/type dimension instead of let flow name = application name.

On the other side, flow should semantically stand for *workflow* (correct me if 
I'm wrong about flow concept), which contains a group of applications that work 
together to resolve a problem. Making flow name == application name changes the 
semantics That said, a flow of applications means the applications of the same 
type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as 
different instances of the same thing.
{quote}

I guess the same thing you had in mind is not the same workflow, but the same 
application type, right? How about we decoupling the two concepts? One step 
back, when users set the flow explicitly, are they going to tell the 
application that you belong to workflow abc, or that you belong to job type 
xyz? I think it will be the former.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392025#comment-14392025
 ] 

Naganarasimha G R commented on YARN-2729:
-

The test case failures are not related to this JIRA.

 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
 YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
 YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
 YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
 YARN-2729.20150402-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-01 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3415:

Attachment: YARN-3415.002.patch

 Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
 queue
 -

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-01 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391711#comment-14391711
 ] 

Sangjin Lee commented on YARN-3391:
---

OK, just to clarify, we're talking about a case where one flow (run) is one 
YARN app. The only debate is whether the repeated runs of the (essentially) 
same YARN app should be grouped as different runs of the same flow, or all 
different flows altogether. In other words, *if it ran 100 times, should we 
have 100 flow runs of one flow, or 100 flows each of which has exactly one flow 
run?*

To me it seems a no brainer (thanks [~vrushalic] for reminding me) that we do 
want to group the runs of the same YARN app. If a user is running TestDFSIO 
over and over, they should be recognized as different instances of the same 
thing.

One mitigating factor is we would modify the mapreduce code to provide the flow 
name/id in case it's not set. Then the default behavior won't kick in for the 
most part. But I think it is important enough to group them and surface them as 
instances of the same flow.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391794#comment-14391794
 ] 

Zhijie Shen commented on YARN-3430:
---

I temporarily removed this commit from branch-2.7 to keep the branch compilable. 
It's pending on whether we can pull YARN-3273 into 2.7.

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended

2015-04-01 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated YARN-2369:
--
Attachment: YARN-2369-1.patch

I like the second idea, where the user should explicitly append to the 
variable.  I think we can do this simply by removing the append logic and 
replacing the entire variable every time we get an update.  I'm going to try 
this out, but figured I'd attach the code change in case I'm missing something 
obvious.
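
A minimal, standalone sketch of the behavioral difference, using plain Java 
maps rather than the actual YARN code; the method names are illustrative.

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class EnvHandlingSketch {
  // Current behavior: the new value is appended to any pre-existing value.
  static void appendEnv(Map<String, String> env, String key, String value) {
    String existing = env.get(key);
    env.put(key, existing == null ? value : existing + File.pathSeparator + value);
  }

  // Proposed behavior: the update replaces the variable; users who want
  // appending say so explicitly, e.g. PATH=$PATH:/extra/bin.
  static void replaceEnv(Map<String, String> env, String key, String value) {
    env.put(key, value);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("JAVA_OPTS", "-Xmx1g");
    appendEnv(env, "JAVA_OPTS", "-Xmx2g");
    System.out.println(env.get("JAVA_OPTS"));  // "-Xmx1g:-Xmx2g" -- harmful for a non-path variable

    env.put("JAVA_OPTS", "-Xmx1g");
    replaceEnv(env, "JAVA_OPTS", "-Xmx2g");
    System.out.println(env.get("JAVA_OPTS"));  // "-Xmx2g"
  }
}
{code}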

 Environment variable handling assumes values should be appended
 ---

 Key: YARN-2369
 URL: https://issues.apache.org/jira/browse/YARN-2369
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jason Lowe
Assignee: Dustin Cote
 Attachments: YARN-2369-1.patch


 When processing environment variables for a container context, the code 
 assumes that the value should be appended to any pre-existing value in the 
 environment.  This may be desired behavior for path-like environment 
 variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc., but it is a 
 non-intuitive and harmful way to handle any variable that does not have 
 path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3428) Debug log resources to be localized for a container

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391186#comment-14391186
 ] 

Hudson commented on YARN-3428:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2100/])
YARN-3428. Debug log resources to be localized for a container. (kasha) (kasha: 
rev 2daa478a6420585dc13cea2111580ed5fe347bc1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Debug log resources to be localized for a container
 ---

 Key: YARN-3428
 URL: https://issues.apache.org/jira/browse/YARN-3428
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.8.0

 Attachments: yarn-3428-1.patch


 For each container, we log the resources going through the INIT - LOCALIZING - 
 DOWNLOADED transitions, but these logs do not include the container-id itself. 
 It would be nice to add debug logs that capture the resources being localized 
 for a container.
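
As a rough illustration of the kind of guarded, per-container debug line being 
asked for; the message text, variable names, and the commons-logging dependency 
below are assumptions, not the actual ResourceLocalizationService code.

{code:java}
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LocalizationDebugLogSketch {
  private static final Log LOG =
      LogFactory.getLog(LocalizationDebugLogSketch.class);

  // Emits one debug line per resource, tagged with the container id, so the
  // localization activity of a single container can be traced.
  void logResourcesToLocalize(String containerId, List<String> resourcePaths) {
    if (LOG.isDebugEnabled()) {          // avoid building messages when disabled
      for (String path : resourcePaths) {
        LOG.debug("Localizing " + path + " for container " + containerId);
      }
    }
  }
}
{code}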



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3412) RM tests should use MockRM where possible

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391190#comment-14391190
 ] 

Hudson commented on YARN-3412:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2100/])
YARN-3412. RM tests should use MockRM where possible. (kasha) (kasha: rev 
79f7f2aabfd7a69722748850f4d3b1ff54af7556)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerEventLog.java


 RM tests should use MockRM where possible
 -

 Key: YARN-3412
 URL: https://issues.apache.org/jira/browse/YARN-3412
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.8.0

 Attachments: yarn-3412-1.patch


 Noticed that TestZKRMStateStore and TestMoveApplication fail when running on a 
 Mac because the webapp cannot be started. There are a few other tests that 
 could use MockRM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391188#comment-14391188
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2100/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI

2015-04-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391012#comment-14391012
 ] 

Junping Du commented on YARN-3301:
--

Thanks [~xgong] for delivering a patch. Is the test failure here related to 
your patch?

 Fix the format issue of the new RM web UI and AHS web UI
 

 Key: YARN-3301
 URL: https://issues.apache.org/jira/browse/YARN-3301
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3301.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3424) Change logs for ContainerMonitorImpl's resource monitoring from info to debug

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391189#comment-14391189
 ] 

Hudson commented on YARN-3424:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2100/])
YARN-3424. Change logs for ContainerMonitorImpl's resource monitoring from info 
to debug. Contributed by Anubhav Dhoot. (ozawa: rev 
c69ba81497ae4da329ddb34ba712a64a7eec479f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java


 Change logs for ContainerMonitorImpl's resource monitoring from info to debug
 -

 Key: YARN-3424
 URL: https://issues.apache.org/jira/browse/YARN-3424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3424.001.patch


 Today we log the memory usage of process at info level which spams the log 
 with hundreds of log lines 
 {noformat}
 2015-03-27 09:32:48,905 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Memory usage of ProcessTree 9215 for container-id 
 container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
 used; 2.6 GB of 2.1 GB virtual memory used
 {noformat}
 Proposing changing this to debug level
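
The shape of the proposed change, sketched with placeholder variable names 
rather than the exact ContainersMonitorImpl code; the point is demoting the 
per-container line to debug and guarding it so the message is not even built 
at the default info level.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class MonitorLogLevelSketch {
  private static final Log LOG =
      LogFactory.getLog(MonitorLogLevelSketch.class);

  void reportUsage(String pid, String containerId,
                   long physUsedMb, long physLimitMb,
                   long virtUsedMb, long virtLimitMb) {
    // Before: LOG.info(...) on every monitoring interval for every container.
    // After: only emitted when debug logging is enabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Memory usage of ProcessTree " + pid + " for container-id "
          + containerId + ": " + physUsedMb + " MB of " + physLimitMb
          + " MB physical memory used; " + virtUsedMb + " MB of "
          + virtLimitMb + " MB virtual memory used");
    }
  }
}
{code}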



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2015-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391181#comment-14391181
 ] 

Hudson commented on YARN-3304:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2100/])
YARN-3304. Addendum patch. Cleaning up ResourceCalculatorProcessTree APIs for 
public use and removing inconsistencies in the default values. (Junping Du and 
Karthik Kambatla via vinodkv) (vinodkv: rev 
7610925e90155dfe5edce05da31574e4fb81b948)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java


 ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
 inconsistent with other getters
 

 Key: YARN-3304
 URL: https://issues.apache.org/jira/browse/YARN-3304
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3304-appendix-v2.patch, 
 YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
 YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
 YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
 YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
 YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
 YARN-3304.patch, yarn-3304-5.patch


 Per discussions in YARN-3296, getCpuUsagePercent() returns -1 in the 
 unavailable case while the other resource metrics return 0 in the same case, 
 which is inconsistent.
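
A standalone sketch (not the actual ResourceCalculatorProcessTree API) of the 
consistency being asked for: a single shared sentinel for "could not be 
measured" instead of some getters returning 0 and others -1.

{code:java}
public class ProcessTreeMetricsSketch {
  // One sentinel used by every getter when the metric could not be measured.
  public static final int UNAVAILABLE = -1;

  private final boolean metricsAvailable;
  private final long rssMemoryBytes;
  private final float cpuUsagePercent;

  ProcessTreeMetricsSketch(boolean metricsAvailable,
                           long rssMemoryBytes, float cpuUsagePercent) {
    this.metricsAvailable = metricsAvailable;
    this.rssMemoryBytes = rssMemoryBytes;
    this.cpuUsagePercent = cpuUsagePercent;
  }

  public long getRssMemorySize() {
    return metricsAvailable ? rssMemoryBytes : UNAVAILABLE;
  }

  public float getCpuUsagePercent() {
    return metricsAvailable ? cpuUsagePercent : UNAVAILABLE;
  }

  public static void main(String[] args) {
    ProcessTreeMetricsSketch unknown = new ProcessTreeMetricsSketch(false, 0, 0);
    // Both getters report the same sentinel when nothing was measured.
    System.out.println(unknown.getRssMemorySize());    // -1
    System.out.println(unknown.getCpuUsagePercent());  // -1.0
  }
}
{code}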



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391059#comment-14391059
 ] 

Zhijie Shen commented on YARN-3334:
---

bq. For using ContainerEntity to replace TimelineEntity, there is a bug where 
UnrecognizedPropertyException gets thrown when serializing/deserializing the 
children element while consuming it as the base class (TimelineEntity). 

I think I know the problem. I'll fix it separately: YARN-3431. Let's leave 
this issue in this JIRA.
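
For anyone reproducing the symptom, a self-contained Jackson sketch; the 
Base/Sub classes are stand-ins for TimelineEntity/ContainerEntity, and ignoring 
unknown properties is just one possible workaround, not necessarily what 
YARN-3431 will do.

{code:java}
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UnknownPropertySketch {
  public static class Base {
    public String id;
  }

  public static class Sub extends Base {
    public String extraField;  // written by the subclass, unknown to Base
  }

  public static void main(String[] args) throws Exception {
    Sub sub = new Sub();
    sub.id = "container_1";
    sub.extraField = "child-info";
    String json = new ObjectMapper().writeValueAsString(sub);

    // Reading the subclass JSON back as the base class fails by default with
    // UnrecognizedPropertyException on "extraField"; disabling
    // FAIL_ON_UNKNOWN_PROPERTIES makes the extra field get skipped instead.
    ObjectMapper lenient = new ObjectMapper()
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    Base base = lenient.readValue(json, Base.class);
    System.out.println(base.id);  // container_1
  }
}
{code}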

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch


 After YARN-3039, we have a service discovery mechanism to pass the 
 app-collector service address among collectors, NMs, and the RM. In this 
 JIRA, we will handle the service address setting for TimelineClients in the 
 NodeManager and post container metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >