[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219119#comment-14219119 ] Hadoop QA commented on YARN-2679: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682592/YARN-2679.001.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5884//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5884//console This message is automatically generated. add container launch prepare time metrics to NM. Key: YARN-2679 URL: https://issues.apache.org/jira/browse/YARN-2679 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2679.000.patch, YARN-2679.001.patch add metrics in NodeManagerMetrics to get prepare time to launch container. The prepare time is the duration between sending ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
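A minimal sketch of how such a prepare-time metric could be recorded with Hadoop's metrics2 library is shown below, assuming a MutableRate field on NodeManagerMetrics; the class, field and method names are illustrative assumptions, not the contents of the attached patch.
{code}
// Minimal sketch, assuming a metrics2 MutableRate on NodeManagerMetrics;
// names are illustrative and not taken from YARN-2679.001.patch.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "NodeManager metrics sketch", context = "yarn")
public class NodeManagerMetricsSketch {
  // Tracks the time between sending LAUNCH_CONTAINER and receiving CONTAINER_LAUNCHED.
  @Metric("Container launch prepare time in ms")
  MutableRate containerLaunchPrepareTime;

  public void addContainerLaunchPrepareDuration(long durationMs) {
    containerLaunchPrepareTime.add(durationMs);
  }
}
{code}
The caller would stamp the clock when ContainersLauncherEventType.LAUNCH_CONTAINER is dispatched and add the delta when ContainerEventType.CONTAINER_LAUNCHED is handled.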
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2404: - Attachment: YARN-2404.5.patch Refreshed the patch. I found that TestRMRestart#testAppRecoveredInOrderOnRMRestart fails after the refactoring, since recoverApplication loads data from RMStateStore#RMState#appState, which is created as an instance of HashMap. We should make it a TreeMap to preserve the restore order by key, so I fixed that in this patch. [~jianhe], could you take a look? Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch We can remove the ApplicationState and ApplicationAttemptState classes in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. We may just replace ApplicationState with ApplicationStateData, and similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
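To illustrate the ordering point above, here is a small self-contained example (generic String keys stand in for the real state-store keys): HashMap gives no iteration-order guarantee, while TreeMap iterates in ascending key order, so recovery replays entries deterministically.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class RecoveryOrderDemo {
  public static void main(String[] args) {
    Map<String, String> hashState = new HashMap<>(); // iteration order unspecified
    Map<String, String> treeState = new TreeMap<>(); // iterates in ascending key order
    for (String id : new String[] {"app_003", "app_001", "app_002"}) {
      hashState.put(id, "state");
      treeState.put(id, "state");
    }
    System.out.println("HashMap order: " + hashState.keySet()); // implementation-dependent
    System.out.println("TreeMap order: " + treeState.keySet()); // [app_001, app_002, app_003]
  }
}
{code}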
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.004.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metric is not updated when the container is killed during localization. We should add the KILLING state to the finished handling in ContainerImpl.java so that killedContainer is updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
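A simplified, hypothetical sketch of the fix being described, i.e. counting a container as killed when it finishes out of the KILLING state; the enum and counter below are stand-ins for the real ContainerImpl state machine and NodeManagerMetrics, not the attached patch.
{code}
public class KilledContainerMetricSketch {
  // Stand-in for the relevant terminal/near-terminal ContainerImpl states.
  enum FinalState { EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL }

  static int containersKilled = 0;

  static void recordFinished(FinalState state) {
    switch (state) {
      case KILLING:                        // previously missed: kill arrived during localization
      case CONTAINER_CLEANEDUP_AFTER_KILL: // kill after the container was already running
        containersKilled++;                // real code would call metrics.killedContainer()
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) {
    recordFinished(FinalState.KILLING);
    System.out.println("containersKilled = " + containersKilled); // 1
  }
}
{code}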
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219141#comment-14219141 ] zhihai xu commented on YARN-2675: - Hi [~kasha], Good suggestion, I added unit tests to exercise all the newly added transitions in the new patch YARN-2675.004.patch. thanks zhihai the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-2243: Attachment: YARN-2243.patch Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull("RMContext should not be null", rmContext); {code} The order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
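Guava's Preconditions.checkNotNull(T reference, Object errorMessage) takes the reference first and the message second, which is exactly what the code above gets backwards. A minimal demonstration:
{code}
import com.google.common.base.Preconditions;

public class CheckNotNullOrderDemo {
  public static void main(String[] args) {
    Object rmContext = null;

    // Wrong order: the message String is treated as the reference being checked,
    // so this silently passes even though rmContext is null.
    Preconditions.checkNotNull("RMContext should not be null", rmContext);

    // Correct order: reference first, message second -> throws NullPointerException.
    try {
      Preconditions.checkNotNull(rmContext, "RMContext should not be null");
    } catch (NullPointerException expected) {
      System.out.println("Correct order throws: " + expected.getMessage());
    }
  }
}
{code}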
[jira] [Assigned] (YARN-2267) Auxiliary Service support in RM
[ https://issues.apache.org/jira/browse/YARN-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2267: Assignee: Rohith Auxiliary Service support in RM --- Key: YARN-2267 URL: https://issues.apache.org/jira/browse/YARN-2267 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Naganarasimha G R Assignee: Rohith Currently the RM does not have a provision to run any auxiliary services. For health/monitoring in the RM, it is better to have a plugin mechanism in the RM itself, similar to the NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219196#comment-14219196 ] Hadoop QA commented on YARN-2243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682603/YARN-2243.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5886//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5886//console This message is automatically generated. Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull(RMContext should not be null, rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219195#comment-14219195 ] Hadoop QA commented on YARN-2675: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682601/YARN-2675.004.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 19 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5885//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5885//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5885//console This message is automatically generated. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.004.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: (was: YARN-2675.004.patch) the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219221#comment-14219221 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219222#comment-14219222 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219218#comment-14219218 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
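A hedged sketch of how the two delays described above are typically measured and published via metrics2 MutableRate gauges; the class, field and method names are assumptions for illustration, not the committed ClusterMetrics code.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative sketch only; names are assumptions, not the committed code.
@Metrics(about = "AM delay metrics sketch", context = "yarn")
public class AmDelayMetricsSketch {
  @Metric("AM container launch delay in ms") MutableRate aMLaunchDelay;
  @Metric("AM register delay in ms")         MutableRate aMRegisterDelay;

  private long launchSentTime; // stamped when AMLauncherEventType.LAUNCH is sent
  private long launchedTime;   // stamped when RMAppAttemptEventType.LAUNCHED is received

  public void onLaunchSent() {
    launchSentTime = System.currentTimeMillis();
  }

  public void onLaunched() {
    launchedTime = System.currentTimeMillis();
    aMLaunchDelay.add(launchedTime - launchSentTime);
  }

  public void onRegistered() {
    aMRegisterDelay.add(System.currentTimeMillis() - launchedTime);
  }
}
{code}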
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219224#comment-14219224 ] Rohith commented on YARN-2880: -- Hi Wangda Tan, I am trying to write test cases for node label recovery. IIUC, as of now recovery is not supported until YARN-2800 is committed. I just started using the NodeLabel feature (it is still in development) and I am stuck with several doubts about how to use it. Is any documentation available? 1. How can I configure node labels? Is it only via rmadmin as of now? 2. I set labels on NMs from rmadmin, but how do I make use of these labels? If you don't mind, please give me crisp details. Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have such a test to make sure there will be no regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219220#comment-14219220 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
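To make the distinction concrete, here is a small sketch using the public QueueInfo record: setCapacity carries the configured/fair share, while setCurrentCapacity carries the currently used share. The numbers and the use of Records.newRecord are stand-ins for the FairScheduler internals, not the actual FSQueue code.
{code}
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.util.Records;

public class QueueCapacitySketch {
  public static void main(String[] args) {
    // Stand-in numbers: fair share 4096 MB, current usage 1024 MB, cluster 16384 MB.
    float fairShareMb = 4096f, usedMb = 1024f, clusterMb = 16384f;

    QueueInfo queueInfo = Records.newRecord(QueueInfo.class);
    queueInfo.setCapacity(fairShareMb / clusterMb);   // configured share: 0.25
    queueInfo.setCurrentCapacity(usedMb / clusterMb); // used share: 0.0625 (this overwrote capacity before the fix)

    System.out.println("capacity=" + queueInfo.getCapacity()
        + " currentCapacity=" + queueInfo.getCurrentCapacity());
  }
}
{code}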
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219226#comment-14219226 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219230#comment-14219230 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/CHANGES.txt Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219229#comment-14219229 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219231#comment-14219231 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/CHANGES.txt Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219244#comment-14219244 ] Hadoop QA commented on YARN-2675: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682609/YARN-2675.004.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5887//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5887//console This message is automatically generated. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2301: Attachment: YARN-2301.20141120-1.patch bq. NM can setup SSL and so the port can also be https port. Ok. I cross-checked the code; the http/https port is set in RMNode.httpport based on the configuration, so there should not be any issues. bq. I meant Times.format is internally doing the check. Ok, corrected. bq. we may set the conf object in the rmContext and get it from context Ok, corrected. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2303.patch While running the yarn container -list <Application Attempt ID> command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format. 3) finish-time is 0 if the container is not yet finished; may be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, it is easier for the user to just copy the appId and run it; this may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
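Observations 2 and 3 above amount to the formatting shown below. This is a plain-JDK illustration of the intended CLI output (the comment says the real code relies on Times.format); the zero-means-N/A convention is an assumption taken from the issue description, not the patch itself.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerTimeFormatDemo {
  // Render an epoch-millis timestamp for CLI output: a readable date,
  // or "N/A" when the container has not finished yet (finish-time == 0).
  static String format(long epochMillis) {
    if (epochMillis == 0) {
      return "N/A";
    }
    return new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy").format(new Date(epochMillis));
  }

  public static void main(String[] args) {
    System.out.println(format(1405540544844L)); // human-readable instead of raw millis
    System.out.println(format(0L));             // "N/A"
  }
}
{code}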
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219287#comment-14219287 ] Naganarasimha G R commented on YARN-2495: - The following 3 test failures do not seem to be introduced by my modifications: {quote} TestApplicationClientProtocolOnHA.testGetContainersOnHA:154 TestApplicationClientProtocolOnHA.testSubmitApplicationOnHA:173 TestApplicationClientProtocolOnHA.testGetClusterMetricsOnHA:85 {quote} Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow the admin to specify labels on each NM. This covers: - User can set labels on each NM (by setting yarn-site.xml or using a script suggested by [~aw]) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219307#comment-14219307 ] Rohith commented on YARN-2599: -- I do agree that the standby RM should expose jmx and metrics. I did some analysis by comparing /jmx for the active RM and the standby RM (with a private patch). All the metrics listed in /jmx are consistent between the active and standby details. But I could not find any details on the /metrics page when I tried from browsers (IE, Chrome and Firefox); it displayed an empty page. I think the /metrics details are embedded in /jmx only. Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out the metrics displayed so the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219322#comment-14219322 ] Hadoop QA commented on YARN-2243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682603/YARN-2243.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5888//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5888//console This message is automatically generated. Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull(RMContext should not be null, rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219324#comment-14219324 ] Hadoop QA commented on YARN-2404: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682598/YARN-2404.5.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5889//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5889//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5889//console This message is automatically generated. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2881) Implement PlanFollower for FairScheduler
Anubhav Dhoot created YARN-2881: --- Summary: Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasanth kumar RJ updated YARN-2165: --- Attachment: YARN-2165.3.patch [~zjshen] Implemented your suggestion. Kindly review. Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Assignee: Vasanth kumar RJ Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.3.patch, YARN-2165.patch The Timeline Server should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if you set yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400, the Timeline Server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup the Timeline Server should check that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the discard-old-entities timestamp will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
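A minimal sketch of the requested validation, failing fast at startup instead of spawning a deletion thread with a non-positive TTL; the configuration key is the real one, while the class, default value and method are stand-ins and not the attached patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineTtlValidationSketch {
  static final String TTL_MS = "yarn.timeline-service.ttl-ms";
  static final long DEFAULT_TTL_MS = 7L * 24 * 60 * 60 * 1000; // stand-in default (one week)

  // Fail fast instead of silently starting the deletion thread with a negative TTL.
  static long validatedTtl(Configuration conf) {
    long ttl = conf.getLong(TTL_MS, DEFAULT_TTL_MS);
    if (ttl <= 0) {
      throw new IllegalArgumentException(TTL_MS + " must be greater than zero, got " + ttl);
    }
    return ttl;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setLong(TTL_MS, -86400L);
    System.out.println(validatedTtl(conf)); // throws IllegalArgumentException
  }
}
{code}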
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219406#comment-14219406 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219407#comment-14219407 ] Hudson commented on YARN-2878: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/CHANGES.txt Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219405#comment-14219405 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219413#comment-14219413 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1939/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219414#comment-14219414 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1939/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219417#comment-14219417 ] Hadoop QA commented on YARN-2165: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682646/YARN-2165.3.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5890//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5890//console This message is automatically generated. Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Assignee: Vasanth kumar RJ Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.3.patch, YARN-2165.patch Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero Currently if set yarn.timeline-service.ttl-ms=0 Or yarn.timeline-service.ttl-ms=-86400 Timeline server start successfully with complaining {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At starting timelinserver should that yarn.timeline-service-ttl-ms 0 otherwise specially for -ive value discard oldvalues timestamp will be set future value. Which may lead to inconsistancy in behavior {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219468#comment-14219468 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219470#comment-14219470 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219469#comment-14219469 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219466#comment-14219466 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
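As a rough illustration of how such delay metrics are typically declared with the Hadoop metrics2 library (the class and field names below are assumptions, not the committed code):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative sketch only: two MutableRate metrics that record the delays in ms.
// In real code the instance must be registered with DefaultMetricsSystem so that
// the @Metric fields are initialized.
@Metrics(context = "yarn")
public class AMDelayMetricsSketch {
  @Metric("AM container launch delay") MutableRate aMLaunchDelay;
  @Metric("AM register delay") MutableRate aMRegisterDelay;

  public void addAMLaunchDelay(long delayMs) { aMLaunchDelay.add(delayMs); }
  public void addAMRegisterDelay(long delayMs) { aMRegisterDelay.add(delayMs); }
}
{code}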
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219482#comment-14219482 ] Hudson commented on YARN-2878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219478#comment-14219478 ] Hudson commented on YARN-2802: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/CHANGES.txt ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219481#comment-14219481 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219480#comment-14219480 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2404: - Attachment: YARN-2404.6.patch Fixed warnings by findbugs. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2727) In RMAdminCLI usage display, instead of yarn.node-labels.fs-store.root-dir, yarn.node-labels.fs-store.uri is being displayed
[ https://issues.apache.org/jira/browse/YARN-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219491#comment-14219491 ] Naganarasimha G R commented on YARN-2727: - Hi [~wangda], As discussed, shall I close this issue, as you have handled it as part of another JIRA? In RMAdminCLI usage display, instead of yarn.node-labels.fs-store.root-dir, yarn.node-labels.fs-store.uri is being displayed Key: YARN-2727 URL: https://issues.apache.org/jira/browse/YARN-2727 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-2727.20141023.1.patch In the org.apache.hadoop.yarn.client.cli.RMAdminCLI usage display, yarn.node-labels.fs-store.uri is being used instead of yarn.node-labels.fs-store.root-dir. Some modifications to the description are also needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Attachment: YARN-2375.patch Thanks for taking a look [~zjshen] and [~jeagles]. Attaching updated patch to address Zhijie's comments. *Additions:* # In ApplicationMaster.finish(), we now stop the timeline client if the timelineClient instance is not null. # Fixed the indent issue in TimelineClientImpl#serviceInit(). # LOG.info(Timeline server is (not) enabled) changed to LOG.info(Timeline service is (not) enabled); to be consistent with the log statements in other places. I still have not added the test case for testing the scenario if MAPREDUCE_JOB_EMIT_TIMELINE_DATA = true and TIMELINE_SERVICE_ENABLED = false; MiniMRYarnCluster doesn't start the timeline server. MiniMRYarnCluster seems to follow a little different path for starting up the timeline service. I am investigating that currently. I propose to address that in a followup jira. That way we can have the important fix checked in. If you guys are ok with that, I will file a jira. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
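The first point above can be pictured with a small fragment like the following; this is only a sketch of the guard, assuming a timelineClient field, not the exact patch code.
{code}
// Sketch: only stop the timeline client if it was actually created, e.g. when the
// timeline service is disabled the field may stay null.
if (timelineClient != null) {
  timelineClient.stop();
}
{code}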
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219546#comment-14219546 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682667/YARN-2375.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5891//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5891//console This message is automatically generated. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2517: - Attachment: YARN-2517.2.patch Sorry for the delay. Attached a Future-based implementation for simplicity. I think this design is one of the best ways to go. [~vinodkv], [~zjshen], should we add read APIs in another JIRA? And do you have any opinions about the Future-based design? Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
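A rough shape of a Future-based asynchronous client, for discussion only; this is not the attached patch and the interface and method names are assumptions.
{code}
import java.util.concurrent.Future;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;

// Illustrative interface: callers get a Future instead of blocking on the put.
public interface TimelineClientAsyncSketch {
  Future<TimelinePutResponse> putEntitiesAsync(TimelineEntity... entities);
}
{code}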
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219554#comment-14219554 ] Tsuyoshi OZAWA commented on YARN-2517: -- [~mitdesai] I think you're one of the users of TimelineClient. If you have any feedbacks about the interface, please let me know. Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219581#comment-14219581 ] Hadoop QA commented on YARN-2404: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682668/YARN-2404.6.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5892//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5892//console This message is automatically generated. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219587#comment-14219587 ] Hadoop QA commented on YARN-2517: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682679/YARN-2517.2.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5893//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5893//console This message is automatically generated. Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219598#comment-14219598 ] Allen Wittenauer commented on YARN-2786: bq. Does this make sense? No, it doesn't. It completely ignores 20+ years of industry experience towards operations and configuration management of large scale installations. This is mostly exemplified by this comment: bq. I'd rather have the tools call an API instead of 'automatically' sshing into 1000 machines and changing labels. I'm completely stunned and saddened by this ignorance. I suspect that there is a corporate mandate to get Ambari working as a third tier scheduling system by dictating where services run. But that mandate (and its likely required deliverable time) has put blinders on the architecture and may very well cause long term pain and could potentially prevent other, more complex needs from being met. The only silver linings I'm seeing are thus: * We still have time to undo the damage either now or in 3.x. * Selfishly, this will give me years of material about how not to design a system to be operationally friendly. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2356: -- Attachment: 0002-YARN-2356.patch bq.I thought it would be good if we avoid rehandling the same exception Yes [~devaraj.k]. I also feel that is more better. Double throwing of exception can be removed. I also updated the test case as mentioned. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at 
$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
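One possible shape of the client-side handling (printing a short message instead of the remote stack trace) is sketched below; the surrounding names such as client, sysout, and appId are assumptions about the CLI, not the attached patch.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch: catch the not-found case and report it concisely instead of letting
// the full RemoteException stack trace reach the console.
ApplicationReport appReport;
try {
  appReport = client.getApplicationReport(appId);
} catch (ApplicationNotFoundException e) {
  sysout.println("Application with id '" + appId + "' doesn't exist in RM.");
  return -1;
}
{code}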
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20141120-1.patch Patch for documentation issues in the timeline server. [~zjshen], can you please review? The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-2854.20141120-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-2877: -- Assignee: Carlo Curino Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao Assignee: Carlo Curino This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2877: --- Assignee: (was: Carlo Curino) Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219723#comment-14219723 ] Hadoop QA commented on YARN-2356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682693/0002-YARN-2356.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5894//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5894//console This message is automatically generated. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[jira] [Created] (YARN-2882) Introducing container types
Konstantinos Karanasos created YARN-2882: Summary: Introducing container types Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219763#comment-14219763 ] Carlo Curino commented on YARN-2882: To help understand this notion, think of containers types as a priority. The *guaranteed-start* containers, have higher priority, and are never over-booked (i.e., when they show up in the NM they are started instantaneously). By contrast the *queueable* containers are sent to the NM, and will be started only when there is room in the node. Also if *guaranteed-start* containers show up in a node that was completely utilized running *queueable* containers, the *queueable* containers are preempted/killed, to guarantee the start of the higher priority containers. _(The rest of the comment below is covered in other sub-JIRAs of YARN-2877, adding here some hints to the ideas for context)_ By having an explicit notion of container types, the AM can control when to use one type vs the other. For example, one can use *queueable* containers for tasks that are not yet on the critical path, and/or for short-running tasks (higher chance to complete). One important use of *queueable* containers is to allow us to boost utilization of the nodes (having a queue of work, minimize the times in which the NM is idle). Introducing container types --- Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
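As a minimal way to picture the proposal (the actual API introduced by this work may differ from this sketch):
{code}
// Illustrative only: the two container types proposed in this JIRA.
public enum ContainerTypeSketch {
  // Allocated by the central RM; starts as soon as it reaches the NM.
  GUARANTEED_START,
  // May be queued at the NM; started when room is available, and may be
  // preempted to make room for guaranteed-start containers.
  QUEUEABLE
}
{code}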
[jira] [Commented] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219769#comment-14219769 ] Anubhav Dhoot commented on YARN-2773: - getAdmissionPolicy uses the planQueuePath (fully qualified) while rest of the methods (e.g. getPlanQueueCapacity) uses planQueueName (just leaf queue name). ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem - Key: YARN-2773 URL: https://issues.apache.org/jira/browse/YARN-2773 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Priority: Minor Reservation system requires use the ReservationDefinition to use a queue name to choose which reservation queue is being used. CapacityScheduler does not allow duplicate leaf queue names. Because of this we can refer to a unique leaf queue by simply using its name and not full path (which includes parentName + .). FairScheduler allows duplicate leaf queue names because of which one needs to refer to the full queue name to identify a queue uniquely. This is inconsistent for the implementation of the AbstractReservationSystem where one implementation of getQueuePath will do conversion (CapacityReservationSystem) while the FairReservationSystem will return the same value back -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2883) Queuing of container requests in the NM
Konstantinos Karanasos created YARN-2883: Summary: Queuing of container requests in the NM Key: YARN-2883 URL: https://issues.apache.org/jira/browse/YARN-2883 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We propose to add a queue in each NM, where queueable container requests can be held. Based on the available resources in the node and the containers in the queue, the NM will decide when to allow the execution of a queued container. In order to ensure the instantaneous start of a guaranteed-start container, the NM may decide to pre-empt/kill running queueable containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219783#comment-14219783 ] Konstantinos Karanasos commented on YARN-2882: -- The queuing of containers is discussed in YARN-2883. Introducing container types --- Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2884) Proxying all AM-RM communications
Carlo Curino created YARN-2884: -- Summary: Proxying all AM-RM communications Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2885) LocalRM: distributed scheduling decisions for queueable containers
Konstantinos Karanasos created YARN-2885: Summary: LocalRM: distributed scheduling decisions for queueable containers Key: YARN-2885 URL: https://issues.apache.org/jira/browse/YARN-2885 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We propose to add a Local ResourceManager (LocalRM) to the NM in order to support distributed scheduling decisions. Architecturally we leverage the RMProxy, introduced in YARN-2884. The LocalRM makes distributed decisions for queuable containers requests. Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219844#comment-14219844 ] Karthik Kambatla commented on YARN-2877: +1 to the idea, particularly to reduce the allocation latency. I definitely see Impala wanting to use this in the future. Though not mentioned in the description, I believe scale is probably another big reason for distributed scheduling. bq. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. A centralized RM could schedule tasks opportunistically too? Is the intention to quickly adapt to changing resource usage on the node, and the latency due to NM-RM-NM communication being too long to lose this window of opportunity? Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219850#comment-14219850 ] Karthik Kambatla commented on YARN-2884: Given we already have an RMProxy, can we go with LocalRM as Sriram suggested on YARN-2877? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2886) Estimating waiting time in NM container queues
Konstantinos Karanasos created YARN-2886: Summary: Estimating waiting time in NM container queues Key: YARN-2886 URL: https://issues.apache.org/jira/browse/YARN-2886 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos This JIRA is about estimating the waiting time of each NM queue. Having these estimates is crucial for the distributed scheduling of container requests, as it allows the LocalRM to decide in which NMs to queue the queuable container requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2887) AM policies for choosing type of containers
Konstantinos Karanasos created YARN-2887: Summary: AM policies for choosing type of containers Key: YARN-2887 URL: https://issues.apache.org/jira/browse/YARN-2887 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Each AM can employ policies that determine what type of container (guaranteed-start or queueable) should be requested for each task. An example policy may be to use only guaranteed-start or only queueable containers, or to randomly pick a percentage of the requests to be queueable, or to choose the container type based on the characteristics of the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
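The "random percentage" policy mentioned above could look roughly like this toy sketch; the enum and class below are assumptions for illustration, not a proposed API.
{code}
import java.util.concurrent.ThreadLocalRandom;

// Toy policy: request a queueable container for roughly the given fraction of tasks.
public final class RandomQueueablePolicy {
  public enum ContainerType { GUARANTEED_START, QUEUEABLE }

  private final double queueableFraction;

  public RandomQueueablePolicy(double queueableFraction) {
    this.queueableFraction = queueableFraction;
  }

  public ContainerType chooseType() {
    return ThreadLocalRandom.current().nextDouble() < queueableFraction
        ? ContainerType.QUEUEABLE : ContainerType.GUARANTEED_START;
  }
}
{code}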
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219871#comment-14219871 ] Carlo Curino commented on YARN-2877: Karthik, you are correct... Karthik, glad you like the idea, and you ask good questions... This could be relevant to lower the load on the central RM (hence help with scale), in particular if we have a vast number of short-lived tasks (heavy scheduling cost for little work). (However, we have other ongoing work towards that, which we will post soon, hence the focus on utilization) What takes care of the fast adaption to node conditions is having a local queue (from which to pick more work if I am idle), and the notion of different containers types (i.e., I can kick out the optimistic containers if I am overbooked). With this in mind, the RM could be the one making scheduling decisions for queueable/optimistic containers as well, as you pointed out. What is constant (whether you make the scheduling decisions centrally or distributed), is the notion of different container types (see YARN-2882). This should be exposed to the AM, as it comes with very different level of guarantees on the container start/completion. Thus the AM need to know which type of containers to use for different tasks (e.g., short lived or non-critical-path containers can be optimistic). Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
Konstantinos Karanasos created YARN-2888: Summary: Corrective mechanisms for rebalancing NM container queues Key: YARN-2888 URL: https://issues.apache.org/jira/browse/YARN-2888 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of the scheduling decisions or due to having a stale image of the system) may lead to an imbalance in the waiting times of the NM container queues. This can in turn have an impact in job execution times and cluster utilization. To this end, we introduce corrective mechanisms that may remove (whenever needed) container requests from overloaded queues, adding them to less-loaded ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219874#comment-14219874 ] Sriram Rao commented on YARN-2877: -- [~kasha] (1) Yes, the central RM can allocate optimistic containers, however, as you note it introduces extra latency. (2) Scaling the RM's allocation particularly when you have small tasks is another motivation as well. Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219881#comment-14219881 ] Carlo Curino commented on YARN-2884: I agree we should give it another name... but the LocalRM is a slightly different concept YARN-2885, i.e., it is the logic making distributed scheduling decisions. The Proxy itself is just the mechanics to hijack the connection between AM-RM, which we will need for some more work on federating multiple RMs (JIRAs coming soon). Hence the need to call out separately the architectural piece (proxy) and the distributed scheduling logic (LocalRM). Any name suggestion? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219894#comment-14219894 ] Subru Krishnan commented on YARN-2884: -- What about RMAgent ? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219898#comment-14219898 ] Konstantinos Karanasos commented on YARN-2884: -- Karthik, just a clarification: what is the current RMProxy responsible for? As Carlo says, the functionality needed for the distributed scheduling is explained in more detail in YARN-2885, where we introduce the LocalRM. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The new patch addresses Karthik's 2nd suggestion. That actually made it so that we didn't need to exclude anything from findbugs, making the 1st suggestion moot now. I spoke to Karthik offline; the 3rd suggestion does not apply because we're not looking for the node to remove it; we're looking for the new largest node. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2889) Limit in the number of queueable container requests per AM
Konstantinos Karanasos created YARN-2889: Summary: Limit in the number of queueable container requests per AM Key: YARN-2889 URL: https://issues.apache.org/jira/browse/YARN-2889 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We introduce a way to limit the number of queueable requests that each AM can submit to the LocalRM. This way we can restrict the number of queueable containers handed out by the system, as well as throttle down misbehaving AMs (asking for too many queueable containers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219914#comment-14219914 ] Anubhav Dhoot commented on YARN-2738: - The issue is that the only configuration in the system is at the per-queue level. I can add a new configuration level for global defaults in addition to this if needed in the future. I have opened YARN-2881 for the FairSchedulerPlanFollower. Add FairReservationSystem for FairScheduler --- Key: YARN-2738 URL: https://issues.apache.org/jira/browse/YARN-2738 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2738.001.patch, YARN-2738.002.patch, YARN-2738.003.patch Need to create a FairReservationSystem that will implement ReservationSystem for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219930#comment-14219930 ] Jonathan Eagles commented on YARN-2375: --- I think creating a separate ticket for enabling timeline server in the mini MR cluster is a good idea. changes look good to me. [~zjshen], any additional feedback before this goes in? Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2375: -- Attachment: YARN-2375.1.patch +1. LGTM. I uploaded a new patch to just fix the indent issue for one line. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220030#comment-14220030 ] Hadoop QA commented on YARN-2604: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682729/YARN-2604.patch against trunk revision eb4045e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5895//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5895//console This message is automatically generated. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
Mit Desai created YARN-2890: --- Summary: MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Fix For: 2.6.1 Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220047#comment-14220047 ] Arun C Murthy commented on YARN-2139: - Sorry, been busy with 2.6.0 - just coming up for air. What are we modeling with vdisk again? What is the metric? Is it directly the blkio parameter? If so, that is my biggest concern. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220045#comment-14220045 ] Mit Desai commented on YARN-2375: - typo YARN-2890 Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220044#comment-14220044 ] Mit Desai commented on YARN-2375: - Thanks [~zjshen] for picking that indenting issue. I have filed YARH-2890 for addressing the test case scenario. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220077#comment-14220077 ] Karthik Kambatla commented on YARN-2139: It is very similar to vcores. vdisks is the number of virtual disks, no metric, just a number. If we want to allow up to 'n' tasks to share a disk, {{vdisks = n * num-disks}}. For cases with n > 1, spindle locality will help with ensuring all the 'n' vdisks correspond to the same spindle(s). [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
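As a quick illustration of the sizing rule above, here is a toy calculation of {{vdisks = n * num-disks}}. The class and variable names are invented purely for this example; they are not YARN APIs, configuration keys, or part of any patch on this JIRA.
{code:java}
// Toy illustration of the vdisks sizing rule described in the comment above.
public final class VdisksExample {
  public static void main(String[] args) {
    int numDisks = 8;           // physical disks on the node
    int n = 4;                  // 'n': tasks allowed to share one disk
    int vdisks = n * numDisks;  // vdisks = n * num-disks => 32 virtual disks advertised
    System.out.println("Node advertises " + vdisks + " vdisks");
  }
}
{code}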
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220091#comment-14220091 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682743/YARN-2375.1.patch against trunk revision eb4045e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5896//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5896//console This message is automatically generated. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220199#comment-14220199 ] Karthik Kambatla commented on YARN-2604: Thanks Robert. One more thing I missed - we need to handle vcores in addition to memory. I was hoping this would come for free with Resource suggestion, but from looking at the code, I think we should handle vcores alongside memory the way the patch does now. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
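As a side note for readers following the thread, here is a minimal sketch of the clamping idea being discussed, i.e. bounding the configured max-allocation by what the largest registered node offers for both memory and vcores. The helper class and method names are invented for illustration; this is not the actual YARN-2604 patch.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical helper: the effective max-allocation is the configured value,
// capped by the resources of the largest node currently registered.
class MaxAllocationSketch {
  static Resource effectiveMaxAllocation(Resource configuredMax, Resource largestNode) {
    int memoryMb = Math.min(configuredMax.getMemory(), largestNode.getMemory());
    int vcores = Math.min(configuredMax.getVirtualCores(), largestNode.getVirtualCores());
    return Resource.newInstance(memoryMb, vcores);
  }
}
{code}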
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220203#comment-14220203 ] Karthik Kambatla commented on YARN-2884: RMAgent seems okay to me. RMProxy is responsible for creating a proxy for whichever protocol the client wants to use to converse with the RM. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle mis-behaving AMs, 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220214#comment-14220214 ] Eric Wohlstadter commented on YARN-2806: Looking at scheduler.AppSchedulingInfo (lines 141-146, trunk): What is the significance of ResourceRequest.ANY, in terms of determining whether to LOG a ResourceRequest? Why only ResourceRequest.ANY? Why is the ANY location the only one which can determine that updatePendingResources = true? Are all updated resource requests from the AM initiated with a ResourceRequest at the ANY location? Can all allocate calls from the AM which do not include a ResourceRequest.ANY be considered followup requests to a previous initial request for those resources (e.g. by asking for fewer containers in the followup or by modifying preferred locations in the followup)?
{code:title=AppSchedulingInfo(141-146) |borderStyle=solid}
if (resourceName.equals(ResourceRequest.ANY)) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("update:" + " application=" + applicationId + " request=" + request);
  }
  updatePendingResources = true;
{code}
log container allocation requests - Key: YARN-2806 URL: https://issues.apache.org/jira/browse/YARN-2806 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Eric Wohlstadter Attachments: YARN-2806.patch I might have missed it, but I don't see where we log application container requests outside of the DEBUG context. Without this being logged, we have no idea, on a per-application basis, of the lag an application might be having in the allocation system. We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Mazzucchelli updated YARN-2664: -- Attachment: YARN-2664.3.patch Hi, Submitted the new version of the patch. The patch includes unit tests and corrections of the previous errors. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220239#comment-14220239 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682778/YARN-2664.3.patch against trunk revision 90194ca. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5897//console This message is automatically generated. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2188) Client service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220256#comment-14220256 ] Karthik Kambatla commented on YARN-2188: Sorry, didn't realize this patch was updated. Patch looks mostly good. Some minor comments: # Rename yarn.sharedcache.client.server.* to yarn.sharedcache.client-server.*? # ClientSCMProtocolPBClientImpl#close should set this.proxy to null. # In the test, cleanUp() should set variables to null after stopping them. Client service for cache manager Key: YARN-2188 URL: https://issues.apache.org/jira/browse/YARN-2188 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch, YARN-2188-trunk-v3.patch, YARN-2188-trunk-v4.patch Implement the client service for the shared cache manager. This service is responsible for handling client requests to use and release resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
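For context on review comment 2 above, here is a minimal sketch of the close() pattern being asked for; the field name, the stand-in type, and the RPC.stopProxy reference follow the usual Hadoop PB-client layout and are assumptions, not the actual ClientSCMProtocolPBClientImpl code.
{code:java}
import java.io.Closeable;

// Sketch only: after stopping the proxy, drop the reference so a closed client
// cannot accidentally reuse it (the point of review comment 2 above).
class PbClientCloseSketch implements Closeable {
  private Object proxy = new Object();  // stand-in for the protobuf RPC proxy

  @Override
  public void close() {
    if (proxy != null) {
      // In the real client this would be RPC.stopProxy(proxy) (assumption).
      proxy = null;
    }
  }
}
{code}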
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220269#comment-14220269 ] Vinod Kumar Vavilapalli commented on YARN-2786: --- Let me conclude this; it has gone on for far too long. We should agree to disagree. It's ops _and_ developers, not _or_. If you want to configure manually through scripts, you will get that in the distributed-config setup. The rest of us want to configure programmatically through APIs, and we will have that as an option. I don't see a technical argument against what is done so far, only opinions on which is the right approach. This JIRA is not the place for this discussion; if you have more qualms about this, you should comment on YARN-796. Suggesting design changes in a leaf JIRA is not constructive. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough; we should be able to: 1) list the node labels collection. The command should start with "yarn cluster ..."; in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: YARN-2664.4.patch simply adding a new-line at the end to get patch to apply Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220389#comment-14220389 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682782/YARN-2664.4.patch against trunk revision 90194ca. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 5 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5898//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5898//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5898//console This message is automatically generated. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220395#comment-14220395 ] Carlo Curino commented on YARN-2664: I tried to test this, but: # the patch was missing a new-line at the end to go through with patch... I added it. # It is missing a couple of .js files (I guess imported by other .js)... I think d3.js, possibly more. That was true of your previous patch as well (I manually fixed it in my previous tests). You should definitely include those files in the patch, and please make sure to test this on a clean machine with a clean browser. I am happy to try out a new patch once that is done. (In fact, I would deploy it in a research cluster where we are using this stuff actively, so we get some foot-traffic on the UI). Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The new patch adds similar code for vcores. I made some other minor changes and updated the unit tests. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220510#comment-14220510 ] Hadoop QA commented on YARN-2604: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682805/YARN-2604.patch against trunk revision 90194ca. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5899//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5899//console This message is automatically generated. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220551#comment-14220551 ] Hudson commented on YARN-2375: -- FAILURE: Integrated in Hadoop-trunk-Commit #6584 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6584/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220580#comment-14220580 ] Sunil G commented on YARN-2356: --- Test case failure of TestApplicationClientProtocolOnHA is not related to this patch. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at $Proxy12.getApplicationReport(Unknown Source) at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
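To make the requested behaviour concrete, here is a hedged sketch of catching the not-found exception in a CLI-style helper and printing a one-line message instead of the stack trace above. The helper class and method are invented for illustration; this is not the actual ApplicationCLI change.
{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative helper: suppress the verbose stack trace and emit a short message.
class StatusSketch {
  static int printStatus(YarnClient client, ApplicationId appId) throws IOException, YarnException {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println("Application " + appId + " is in state " + report.getYarnApplicationState());
      return 0;
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application with id '" + appId + "' doesn't exist in RM.");
      return -1;
    }
  }
}
{code}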
[jira] [Updated] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2881: Attachment: YARN-2881.prelim.patch Based on YARN-2738. FairSchedulerPlanFollower with unit tests. Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.6.patch Updated patch which passes existing unit tests in the resourcemanager/capacity scheduler area. Still has extra debug logging and needs unit tests specific to the change. Setting patch available to see if unit tests outside what I have checked are impacted/etc. maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, YARN-2637.6.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only a max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
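To spell out the arithmetic in the example above, here is a toy calculation (memory only, values in MB, class name invented, not YARN code) showing how admitting AMs by minimum_allocation lets actual AM usage blow well past the max_am_resource cap:
{code:java}
// Toy arithmetic for the description's example: 1G queue, 20% AM cap,
// 1M minimum allocation, 5M actual AM size.
public final class AmLimitExample {
  public static void main(String[] args) {
    int queueCapacityMb = 1000;
    double maxAmResourcePercent = 0.2;
    int minimumAllocationMb = 1;
    int actualAmSizeMb = 5;

    double maxAmResourceMb = queueCapacityMb * maxAmResourcePercent;  // 200M cap for AMs
    int maxAmNumber = (int) (maxAmResourceMb / minimumAllocationMb);  // 200 AMs admitted
    int actualAmUsageMb = maxAmNumber * actualAmSizeMb;               // 1000M actually used by AMs

    System.out.println("AM cap = " + maxAmResourceMb + "M, actual AM usage = " + actualAmUsageMb + "M");
  }
}
{code}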