[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864008#comment-13864008
 ] 

Hadoop QA commented on YARN-1482:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621760/YARN-1482.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2811//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2811//console

This message is automatically generated.

 WebApplicationProxy should be always-on w.r.t HA even if it is embedded in 
 the RM
 -

 Key: YARN-1482
 URL: https://issues.apache.org/jira/browse/YARN-1482
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
 Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, 
 YARN-1482.4.patch, YARN-1482.4.patch, YARN-1482.5.patch, YARN-1482.5.patch, 
 YARN-1482.6.patch


 This way, even if an RM goes to standby mode, we can effect a redirect to the 
 active. And more importantly, users will not suddenly see all their links 
 stop working.
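 A minimal sketch of that behaviour (hypothetical filter code: the standby 
 check and activeRMWebAppURL are assumptions, not the actual 
 WebAppProxyServlet logic):
 {code}
 // Hedged sketch, not the real proxy code: if this RM is standby,
 // bounce the request to the proxy on the active RM so old links keep working.
 if (rmContext.getHAServiceState() == HAServiceState.STANDBY) {
   response.sendRedirect(activeRMWebAppURL + request.getRequestURI());
   return;
 }
 // ...otherwise serve the proxied app page as usual.
 {code}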



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1560) TestYarnClient#testAMMRTokens fails with null AMRM token

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864101#comment-13864101
 ] 

Hudson commented on YARN-1560:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/445/])
YARN-1560. Fixed TestYarnClient#testAMMRTokens failure with null AMRM token. 
(Contributed by Ted Yu) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555975)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 TestYarnClient#testAMMRTokens fails with null AMRM token
 

 Key: YARN-1560
 URL: https://issues.apache.org/jira/browse/YARN-1560
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-1560-v1.txt, yarn-1560-v2.txt


 The following can be reproduced locally:
 {code}
 testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient)  Time 
 elapsed: 3.341 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertNotNull(Assert.java:218)
   at junit.framework.Assert.assertNotNull(Assert.java:211)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382)
 {code}
 This test didn't appear in 
 https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864106#comment-13864106
 ] 

Hudson commented on YARN-1029:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/445/])
YARN-1029. Added embedded leader election in the ResourceManager. Contributed 
by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556103)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, 
 yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-10.patch, yarn-1029-2.patch, 
 yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, 
 yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, 
 

[jira] [Commented] (YARN-1559) Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864103#comment-13864103
 ] 

Hudson commented on YARN-1559:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/445/])
YARN-1559. Race between ServerRMProxy and ClientRMProxy setting 
RMProxy#INSTANCE. (kasha and vinodkv via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555970)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ServerRMProxy.java


 Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE
 -

 Key: YARN-1559
 URL: https://issues.apache.org/jira/browse/YARN-1559
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1559-20140105.txt, yarn-1559-1.patch, 
 yarn-1559-2.patch, yarn-1559-3.patch


 RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and 
 ClientRMProxy set it. This leads to races, as witnessed on YARN-1482.
 Sample trace:
 {noformat}
 java.lang.IllegalArgumentException: RM does not support this client protocol
 at 
 com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
 at 
 org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
 at 
 org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
 at 
 org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
 {noformat}
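 For illustration, a stripped-down sketch of the racy shape (class names from 
 the report, bodies invented):
 {code}
 abstract class RMProxy {
   protected static RMProxy INSTANCE;   // non-final, shared by both subclasses

   protected abstract void checkAllowedProtocols(Class<?> protocol);

   protected static void createRMProxy(Class<?> protocol) {
     // Uses whichever INSTANCE was written last; if the other subclass ran
     // in between, the wrong check fires and throws the exception above.
     INSTANCE.checkAllowedProtocols(protocol);
   }
 }

 class ClientRMProxy extends RMProxy {
   static void create(Class<?> protocol) {
     INSTANCE = new ClientRMProxy();    // thread A writes...
     createRMProxy(protocol);           // ...but may read thread B's write
   }
   @Override
   protected void checkAllowedProtocols(Class<?> protocol) { /* client check */ }
 }

 class ServerRMProxy extends RMProxy {
   static void create(Class<?> protocol) {
     INSTANCE = new ServerRMProxy();    // thread B overwrites
     createRMProxy(protocol);
   }
   @Override
   protected void checkAllowedProtocols(Class<?> protocol) { /* server check */ }
 }
 {code}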



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Sebastian Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Wong updated YARN-1287:
-

Attachment: (was: YARN-1287-2.patch)

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.
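 A consolidated clock could be as small as the following sketch (assuming the 
 existing org.apache.hadoop.yarn.util.Clock interface with its single 
 getTime() method):
 {code}
 import org.apache.hadoop.yarn.util.Clock;

 /** One shared mock clock for tests: time only moves when told to. */
 public class MockClock implements Clock {
   private long time = 0;

   @Override
   public long getTime() {
     return time;
   }

   /** Advance the clock by the given number of milliseconds. */
   public void tick(long ms) {
     time += ms;
   }
 }
 {code}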



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Sebastian Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Wong updated YARN-1287:
-

Attachment: YARN-1287-3.patch

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Sebastian Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Wong updated YARN-1287:
-

Attachment: (was: YARN-1287-3.patch)

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie

 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Sebastian Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Wong updated YARN-1287:
-

Attachment: YARN-1287-3.patch

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864161#comment-13864161
 ] 

Hadoop QA commented on YARN-1287:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621778/YARN-1287-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2812//console

This message is automatically generated.

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1287) Consolidate MockClocks

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864170#comment-13864170
 ] 

Hadoop QA commented on YARN-1287:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621781/YARN-1287-3.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2813//console

This message is automatically generated.

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN

2014-01-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864209#comment-13864209
 ] 

Steve Loughran commented on YARN-896:
-

Link to YARN-1489: umbrella JIRA for work-preserving AM restart

 Roll up for long-lived services in YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2014-01-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864212#comment-13864212
 ] 

Steve Loughran commented on YARN-1489:
--

regarding the rebinding problem, YARN-913 proposes some registry where we 
restrict the names of services and apps, and require uniqueness. This lets us 
register something like (hoya, stevel, accumulo5) and then let a client app 
look it up.

Today we have the list of running apps, and you can find and bind to one, but
# there's nothing to stop a single user having >1 instance of the same name
# there's no way for an AM to enumerate this, as the list operation isn't in the 
AMRM protocol



 [Umbrella] Work-preserving ApplicationMaster restart
 

 Key: YARN-1489
 URL: https://issues.apache.org/jira/browse/YARN-1489
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Work preserving AM restart.pdf


 Today if AMs go down,
  - RM kills all the containers of that ApplicationAttempt
  - New ApplicationAttempt doesn't know where the previous containers are 
 running
  - Old running containers don't know where the new AM is running.
 We need to fix this to enable work-preserving AM restart. The latter two 
 potentially can be done at the app level, but it is good to have a common 
 solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1560) TestYarnClient#testAMMRTokens fails with null AMRM token

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864218#comment-13864218
 ] 

Hudson commented on YARN-1560:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/])
YARN-1560. Fixed TestYarnClient#testAMMRTokens failure with null AMRM token. 
(Contributed by Ted Yu) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555975)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 TestYarnClient#testAMMRTokens fails with null AMRM token
 

 Key: YARN-1560
 URL: https://issues.apache.org/jira/browse/YARN-1560
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-1560-v1.txt, yarn-1560-v2.txt


 The following can be reproduced locally:
 {code}
 testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient)  Time 
 elapsed: 3.341 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertNotNull(Assert.java:218)
   at junit.framework.Assert.assertNotNull(Assert.java:211)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382)
 {code}
 This test didn't appear in 
 https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1559) Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864220#comment-13864220
 ] 

Hudson commented on YARN-1559:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/])
YARN-1559. Race between ServerRMProxy and ClientRMProxy setting 
RMProxy#INSTANCE. (kasha and vinodkv via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555970)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ServerRMProxy.java


 Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE
 -

 Key: YARN-1559
 URL: https://issues.apache.org/jira/browse/YARN-1559
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1559-20140105.txt, yarn-1559-1.patch, 
 yarn-1559-2.patch, yarn-1559-3.patch


 RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and 
 ClientRMProxy set it. This leads to races, as witnessed on YARN-1482.
 Sample trace:
 {noformat}
 java.lang.IllegalArgumentException: RM does not support this client protocol
 at 
 com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
 at 
 org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
 at 
 org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
 at 
 org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864223#comment-13864223
 ] 

Hudson commented on YARN-1029:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/])
YARN-1029. Added embedded leader election in the ResourceManager. Contributed 
by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556103)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, 
 yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-10.patch, yarn-1029-2.patch, 
 yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, 
 yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, 

[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-01-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864237#comment-13864237
 ] 

Steve Loughran commented on YARN-1490:
--

How will the AM get notified of its existing containers? I can't seem to see 
this in the code.

I can see the AM needing to know the following:
# that it has been restarted with containers retained
# the list of the container allocations: {{List<Container> liveContainers}}
# the list of containers that failed during the outage: {{List<Container> 
completedContainers}}

From that I can rebuild my model of the world (using container priorities to 
map to allocated roles), as in the sketch below.
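A hypothetical sketch of that rebuild ({{Role}} and {{roleForPriority}} are 
invented names; only {{Container#getPriority}} is real API):
{code}
// Hypothetical rebuild step -- Role/roleForPriority are invented.
void rebuildModel(List<Container> liveContainers,
                  List<Container> completedContainers) {
  for (Container c : liveContainers) {
    // Priorities were assigned per-role at request time, so they map back.
    roleForPriority(c.getPriority()).recordLiveContainer(c);
  }
  for (Container c : completedContainers) {
    roleForPriority(c.getPriority()).recordFailedDuringOutage(c);
  }
}
{code}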


 RM should optionally not kill all containers when an ApplicationMaster exits
 

 Key: YARN-1490
 URL: https://issues.apache.org/jira/browse/YARN-1490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch


 This is needed to enable work-preserving AM restart. Some apps can chose to 
 reconnect with old running containers, some may not want to. This should be 
 an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1560) TestYarnClient#testAMMRTokens fails with null AMRM token

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864282#comment-13864282
 ] 

Hudson commented on YARN-1560:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/])
YARN-1560. Fixed TestYarnClient#testAMMRTokens failure with null AMRM token. 
(Contributed by Ted Yu) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555975)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 TestYarnClient#testAMMRTokens fails with null AMRM token
 

 Key: YARN-1560
 URL: https://issues.apache.org/jira/browse/YARN-1560
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-1560-v1.txt, yarn-1560-v2.txt


 The following can be reproduced locally:
 {code}
 testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient)  Time 
 elapsed: 3.341 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertNotNull(Assert.java:218)
   at junit.framework.Assert.assertNotNull(Assert.java:211)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382)
 {code}
 This test didn't appear in 
 https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1559) Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864284#comment-13864284
 ] 

Hudson commented on YARN-1559:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/])
YARN-1559. Race between ServerRMProxy and ClientRMProxy setting 
RMProxy#INSTANCE. (kasha and vinodkv via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555970)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ServerRMProxy.java


 Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE
 -

 Key: YARN-1559
 URL: https://issues.apache.org/jira/browse/YARN-1559
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1559-20140105.txt, yarn-1559-1.patch, 
 yarn-1559-2.patch, yarn-1559-3.patch


 RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and 
 ClientRMProxy set it. This leads to races, as witnessed on YARN-1482.
 Sample trace:
 {noformat}
 java.lang.IllegalArgumentException: RM does not support this client protocol
 at 
 com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
 at 
 org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
 at 
 org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
 at 
 org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
 at 
 org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864287#comment-13864287
 ] 

Hudson commented on YARN-1029:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/])
YARN-1029. Added embedded leader election in the ResourceManager. Contributed 
by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556103)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, 
 yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-10.patch, yarn-1029-2.patch, 
 yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, 
 yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, 

[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2014-01-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864365#comment-13864365
 ] 

Steve Loughran commented on YARN-1489:
--

Actually, the simplest way for an AM to work with a restarted cluster would be 
if there were a blocking operation to list active containers. At startup it 
could get that list and use it to init its data structures; on a first start 
the list would be empty.

Alternatively, the restart information could be passed down in 
{{RegisterApplicationMasterResponse}}, which would avoid adding any new RPC 
calls.
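A sketch of how that could look to the AM (the 
{{getContainersFromPreviousAttempts()}} getter is an assumption here, not part 
of the current API):
{code}
// Hedged sketch: the registration response carries the retained containers.
RegisterApplicationMasterResponse resp =
    amRmClient.registerApplicationMaster(appHost, appPort, trackingUrl);
List<Container> retained = resp.getContainersFromPreviousAttempts(); // assumed
// On a first start this list is simply empty; on restart, rebuild from it.
rebuildModel(retained);
{code}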

 [Umbrella] Work-preserving ApplicationMaster restart
 

 Key: YARN-1489
 URL: https://issues.apache.org/jira/browse/YARN-1489
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Work preserving AM restart.pdf


 Today if AMs go down,
  - RM kills all the containers of that ApplicationAttempt
  - New ApplicationAttempt doesn't know where the previous containers are 
 running
  - Old running containers don't know where the new AM is running.
 We need to fix this to enable work-preserving AM restart. The latter two 
 potentially can be done at the app level, but it is good to have a common 
 solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2014-01-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864382#comment-13864382
 ] 

Bikas Saha commented on YARN-1489:
--

The plan of record (POR) is for the attempt's AMRM register RPC to return the 
currently running containers for that app. So when the attempt makes its 
initial sync with the RM, it will get all that info.

 [Umbrella] Work-preserving ApplicationMaster restart
 

 Key: YARN-1489
 URL: https://issues.apache.org/jira/browse/YARN-1489
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Work preserving AM restart.pdf


 Today if AMs go down,
  - RM kills all the containers of that ApplicationAttempt
  - New ApplicationAttempt doesn't know where the previous containers are 
 running
  - Old running containers don't know where the new AM is running.
 We need to fix this to enable work-preserving AM restart. The latter two 
 potentially can be done at the app level, but it is good to have a common 
 solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) Update yarn command document

2014-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864390#comment-13864390
 ] 

Karthik Kambatla commented on YARN-1531:


[~ajisakaa], thanks for taking this up. Even though formatting the code to 80 
chars per line is a good thing, it is probably better to limit those formatting 
changes to the actual text being changed. We can create a separate JIRA just 
for the formatting.

 Update yarn command document
 

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Attachments: YARN-1531.patch


 There are some options which are not written to Yarn Command document.
 For example, yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1409) NonAggregatingLogHandler can throw RejectedExecutionException

2014-01-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864392#comment-13864392
 ] 

Jason Lowe commented on YARN-1409:
--

+1 lgtm, committing this.  A minor nit is that the org.junit.Assert import that 
was added to the test is unnecessary.  Will clean this up during the commit.

 NonAggregatingLogHandler can throw RejectedExecutionException
 -

 Key: YARN-1409
 URL: https://issues.apache.org/jira/browse/YARN-1409
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1409.1.patch, YARN-1409.2.patch, YARN-1409.3.patch


 This problem is caused by handling APPLICATION_FINISHED events after calling 
 sched.shutdown() in NonAggregatingLogHandler#serviceStop(). 
 org.apache.hadoop.mapred.TestJobCleanup can fail because of 
 RejectedExecutionException by NonAggregatingLogHandler.
 {code}
 2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in 
 dispatcher thread
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 
 rejected from 
 java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool 
 size = 4, active threads = 0, queued tasks = 7, completed tasks = 0]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95)
 at java.lang.Thread.run(Thread.java:724)
 {code}
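 One way to guard the handler (a sketch only; {{logDeleter}} and the delay 
 variable are placeholders, and the committed patch may differ):
 {code}
 // Sketch: tolerate rejections that happen only because serviceStop()
 // already shut the executor down; rethrow anything else.
 try {
   sched.schedule(logDeleter, deleteDelaySeconds, TimeUnit.SECONDS);
 } catch (RejectedExecutionException e) {
   if (sched.isShutdown()) {
     LOG.info("Log deletion not scheduled; executor already shut down");
   } else {
     throw e;
   }
 }
 {code}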



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1409) NonAggregatingLogHandler can throw RejectedExecutionException

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864417#comment-13864417
 ] 

Hudson commented on YARN-1409:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4967 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4967/])
YARN-1409. NonAggregatingLogHandler can throw RejectedExecutionException. 
Contributed by Tsuyoshi OZAWA (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556282)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/TestNonAggregatingLogHandler.java


 NonAggregatingLogHandler can throw RejectedExecutionException
 -

 Key: YARN-1409
 URL: https://issues.apache.org/jira/browse/YARN-1409
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.4.0

 Attachments: YARN-1409.1.patch, YARN-1409.2.patch, YARN-1409.3.patch


 This problem is caused by handling APPLICATION_FINISHED events after calling 
 sched.shutdown() in NonAggregatingLogHandler#serviceStop(). 
 org.apache.hadoop.mapred.TestJobCleanup can fail because of 
 RejectedExecutionException by NonAggregatingLogHandler.
 {code}
 2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in 
 dispatcher thread
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 
 rejected from 
 java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool 
 size = 4, active threads = 0, queued tasks = 7, completed tasks = 0]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95)
 at java.lang.Thread.run(Thread.java:724)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1409) NonAggregatingLogHandler can throw RejectedExecutionException

2014-01-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864447#comment-13864447
 ] 

Tsuyoshi OZAWA commented on YARN-1409:
--

Thank you, Jason!

 NonAggregatingLogHandler can throw RejectedExecutionException
 -

 Key: YARN-1409
 URL: https://issues.apache.org/jira/browse/YARN-1409
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.4.0

 Attachments: YARN-1409.1.patch, YARN-1409.2.patch, YARN-1409.3.patch


 This problem is caused by handling APPLICATION_FINISHED events after calling 
 sched.shutdown() in NonAggregatingLogHandler#serviceStop(). 
 org.apache.hadoop.mapred.TestJobCleanup can fail because of 
 RejectedExecutionException by NonAggregatingLogHandler.
 {code}
 2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in 
 dispatcher thread
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 
 rejected from 
 java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool 
 size = 4, active threads = 0, queued tasks = 7, completed tasks = 0]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95)
 at java.lang.Thread.run(Thread.java:724)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864481#comment-13864481
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

Thank you for the comment, Akira. [~jianhe], can you merge the latest patch?

 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
 <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time

2014-01-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864473#comment-13864473
 ] 

Tsuyoshi OZAWA commented on YARN-1326:
--

I assume that a user can forget to configure the RMStateStore and end up 
using an unexpected RMStateStore, because MemoryRMStateStore is the default 
value in RMStateStoreFactory#getStore():

{code}
public class RMStateStoreFactory {

  public static RMStateStore getStore(Configuration conf) {
    RMStateStore store = ReflectionUtils.newInstance(
        conf.getClass(YarnConfiguration.RM_STORE,
            MemoryRMStateStore.class, RMStateStore.class),
        conf);
    return store;
  }
}
{code}
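
Given that, the log line this JIRA asks for could be as simple as the 
following sketch (placement in the RM startup path assumed):
{code}
RMStateStore store = RMStateStoreFactory.getStore(conf);
// Make the chosen implementation visible at startup, defaulted or not.
LOG.info("Using RMStateStore implementation: " + store.getClass().getName());
{code}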

 RM should log using RMStore at startup time
 ---

 Key: YARN-1326
 URL: https://issues.apache.org/jira/browse/YARN-1326
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1326.1.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Currently there is no way to know which RMStateStore the RM uses. It's 
 useful to log this information at the RM's startup time.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864516#comment-13864516
 ] 

Xuan Gong commented on YARN-1410:
-

Had an offline discussion with Bikas and Vinod. The approach we will use is to 
make the RM accept the appId in the context. 
Assume that RM1 assigns an applicationId, say Application_12345_1. Before the 
app is accepted, the failover happens. Now RM2 becomes active, and it will 
re-use the same applicationId, Application_12345_1, for submitApplication 
instead of assigning a new appId. See the sketch below.
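
A hedged sketch of the client-side flow this enables (standard YarnClient 
calls; the failover itself happens inside the RPC retry):
{code}
// The appId minted by RM1 rides in the submission context, so the retried
// submit presents the same id to RM2 instead of a new one.
YarnClientApplication app = yarnClient.createApplication();  // step 1, on RM1
ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
// ... RM1 fails over; the failover proxy provider retries against RM2 ...
yarnClient.submitApplication(ctx);  // step 2, RM2 accepts Application_12345_1
{code}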

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410.1.patch


 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 the app id), the new RM may reject the app submission, resulting in 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864530#comment-13864530
 ] 

Jian He commented on YARN-1293:
---

Thanks Akira for verifying. +1, committing it.

 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
  <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864540#comment-13864540
 ] 

Karthik Kambatla commented on YARN-1410:


That makes sense. I am on board too. 

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410.1.patch


 App submission involves
 1) creating an appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of the cluster timestamp (used to 
 create app ids), the new RM may reject the app submission, resulting in an 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-01-07 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864557#comment-13864557
 ] 

Sangjin Lee commented on YARN-1492:
---

Thanks for the comments [~kkambatl]!

bq. In the client protocol, if a cleaner instance (or run) starts after R2 and 
before R2', the client wouldn't know of this cleaner's existence.

That's why step R1 exists. Since the client lock is dropped *before* the client 
inspects the cleaner lock, even if the cleaner starts between R2 and R2', the 
cleaner simply skips this entry in favor of the client.

Having said that, we are currently looking at the design again to better 
address security and other aspects, so it is likely that some of these design 
choices will be revisited.

 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of "bringing compute to where data is". This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864560#comment-13864560
 ] 

Jian He commented on YARN-1293:
---

Committed to trunk and branch-2. Thanks, Tsuyoshi!

 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
  <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1520) update capacity scheduler docs to include necessary parameters

2014-01-07 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1520:
--

Attachment: yarn-1520

 update capacity scheduler docs to include necessary parameters
 --

 Key: YARN-1520
 URL: https://issues.apache.org/jira/browse/YARN-1520
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Chen He
Assignee: Chen He
  Labels: documentation, newbie
 Attachments: yarn-1520






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864566#comment-13864566
 ] 

Hudson commented on YARN-1293:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4968/])
YARN-1293. Fixed TestContainerLaunch#testInvalidEnvSyntaxDiagnostics failure 
caused by non-English system locale. Contributed by Tsuyoshi OZAWA. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556318)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
  <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864571#comment-13864571
 ] 

Jian He commented on YARN-1490:
---

bq. the list of containers that failed during the outage. List<Container> 
completedContainers.
The RMAppImpl.AttemptFailedTransition transition is retrieving those.
bq. the list of the container allocations. List<Container> liveContainers.
SchedulerApplicationAttempt.recover()

Beyond this patch, there is a further AM protocol change patch; I have it 
locally and will upload it once this one gets in. 

 RM should optionally not kill all containers when an ApplicationMaster exits
 

 Key: YARN-1490
 URL: https://issues.apache.org/jira/browse/YARN-1490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch


 This is needed to enable work-preserving AM restart. Some apps can chose to 
 reconnect with old running containers, some may not want to. This should be 
 an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864574#comment-13864574
 ] 

Tsuyoshi OZAWA commented on YARN-1293:
--

Thanks Jian!

 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
  <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864578#comment-13864578
 ] 

Jian He commented on YARN-1410:
---

Is it possible for RM2 to already have an existing conflicting applicationId 
compared to the one from RM1?  

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410.1.patch


 App submission involves
 1) creating an appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of the cluster timestamp (used to 
 create app ids), the new RM may reject the app submission, resulting in an 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1520) update capacity scheduler docs to include necessary parameters

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864590#comment-13864590
 ] 

Hadoop QA commented on YARN-1520:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621835/yarn-1520
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2814//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2814//console

This message is automatically generated.

 update capacity scheduler docs to include necessary parameters
 --

 Key: YARN-1520
 URL: https://issues.apache.org/jira/browse/YARN-1520
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Chen He
Assignee: Chen He
  Labels: documentation, newbie
 Attachments: yarn-1520






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs

2014-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864733#comment-13864733
 ] 

Karthik Kambatla commented on YARN-1461:


Thanks for taking a look, [~zjshen].

bq. How about making the two constants configurable?
As discussed earlier on YARN-1399, I think we should leave them as constants 
for now and create configs when we really think we need them. 

bq. Should ApplicationSubmissionContext#newInstance have String[] tags as well? 
Same for ApplicationReport and GetApplicationsRequest. Or you didn't do it on 
purpose for sake of compatibility? If so, I'm just feeling we're going to have 
more newInstance methods that cannot cover all the fields the objects should 
have.
Intentionally left them out. IMO, there should be a single newInstance method 
to create the instance, and setters should then be used to actually set the 
fields - the builder pattern.
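
For example, the setter-based usage would look roughly like this (a sketch; 
setApplicationTags is the new setter added in this patch series, and the tag 
values are made up):

{code}
ApplicationSubmissionContext context =
    Records.newRecord(ApplicationSubmissionContext.class);
context.setApplicationId(appId);
context.setApplicationName("my-app");
// Tags go through a setter rather than yet another newInstance overload.
context.setApplicationTags(new HashSet<String>(Arrays.asList("etl", "hourly")));
{code}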

bq. Should we consider both case-sensitive and -insensitive, and both AND and 
OR logic?
That would be unnecessarily complicating things. Again, as people have 
suggested on YARN-1399, case-insensitive matching and OR logic should address 
most cases, at least as a first cut; users can handle the AND themselves. We 
can support AND in the future. 

 RM API and RM changes to handle tags for running jobs
 -

 Key: YARN-1461
 URL: https://issues.apache.org/jira/browse/YARN-1461
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, 
 yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, 
 yarn-1461-7.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs

2014-01-07 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1461:
---

Attachment: yarn-1461-8.patch

The new patch includes a new field in GetApplicationsRequest - Scope - to 
capture the scope of apps to be returned, the default value being the user's 
OWN apps. For compatibility with 2.2, I have updated the newInstance() methods 
to set it to ALL apps.
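
Roughly, the compatibility behavior would be (a sketch; the Scope type and 
setter names are whatever the patch introduces and are assumed here):

{code}
// newInstance() preserves 2.2 semantics by defaulting the scope to ALL apps;
// new callers can narrow it to their own apps explicitly.
GetApplicationsRequest request = GetApplicationsRequest.newInstance();
request.setScope(ApplicationsRequestScope.OWN);
{code}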

 RM API and RM changes to handle tags for running jobs
 -

 Key: YARN-1461
 URL: https://issues.apache.org/jira/browse/YARN-1461
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, 
 yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, 
 yarn-1461-7.patch, yarn-1461-8.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864748#comment-13864748
 ] 

Alejandro Abdelnur commented on YARN-888:
-

[~rvs], mind if I take this JIRA from you? (Besides it being a cool JIRA number 
to own,) the current dependencies in the YARN POMs are breaking IntelliJ 
integration, which is kind of driving me crazy; I took a stab at it this 
morning and have a working patch.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik

 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM

2014-01-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864754#comment-13864754
 ] 

Vinod Kumar Vavilapalli commented on YARN-1482:
---

+1, looks good. Checking this in.

 WebApplicationProxy should be always-on w.r.t HA even if it is embedded in 
 the RM
 -

 Key: YARN-1482
 URL: https://issues.apache.org/jira/browse/YARN-1482
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
 Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, 
 YARN-1482.4.patch, YARN-1482.4.patch, YARN-1482.5.patch, YARN-1482.5.patch, 
 YARN-1482.6.patch


 This way, even if an RM goes to standby mode, we can affect a redirect to the 
 active. And more importantly, users will not suddenly see all their links 
 stop working.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864759#comment-13864759
 ] 

Jian He commented on YARN-1506:
---

- Instead of the check here, I think we can send the event and have the 
RMNode transition ignore it. This prevents the case where isUnusable() returns 
true right before the node is about to become usable, since the events are 
processed sequentially.
{code}
else if (node.getState().isUnusable()) {
  LOG.warn("Resource update get failed on an unusable node: " + nodeId);
{code}
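
In other words, something along these lines (a sketch; the event class and 
type names are assumptions for illustration):

{code}
// Always dispatch; let the RMNode state machine decide what to do with it.
this.rmContext.getDispatcher().getEventHandler().handle(
    new RMNodeResourceUpdateEvent(nodeId, resourceOption));
// In RMNodeImpl, the unusable states would register a transition that simply
// ignores RMNodeEventType.RESOURCE_UPDATE, so the isUnusable() check above
// becomes unnecessary and the race is avoided by sequential event processing.
{code}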
- Did we have an overall test that exercises AdminService sending the request 
and verifies that the RMNode and SchedulerNode are changed accordingly?

The patch looks mostly good to me; [~bikassaha] / [~vinodkv], you may also want 
to take a look.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864763#comment-13864763
 ] 

Hudson commented on YARN-1482:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4970/])
YARN-1482. Modified WebApplicationProxy to make it work across ResourceManager 
fail-over. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556380)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java


 WebApplicationProxy should be always-on w.r.t HA even if it is embedded in 
 the RM
 -

 Key: YARN-1482
 URL: https://issues.apache.org/jira/browse/YARN-1482
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, 
 YARN-1482.4.patch, YARN-1482.4.patch, YARN-1482.5.patch, YARN-1482.5.patch, 
 YARN-1482.6.patch


 This way, even if an RM goes to standby mode, we can affect a redirect to the 
 active. And more importantly, users will not suddenly see all their links 
 stop working.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2014-01-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864769#comment-13864769
 ] 

Sandy Ryza commented on YARN-1399:
--

Adding a field to GetApplicationsRequest whose default limits what's returned 
by GetApplicationsRequest would be an incompatible change. -1 to that.

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching for it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music sites, etc.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864775#comment-13864775
 ] 

Sandy Ryza commented on YARN-1568:
--

+1

 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues

2014-01-07 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1496:
-

Attachment: YARN-1496-4.patch

 Protocol additions to allow moving apps between queues
 --

 Key: YARN-1496
 URL: https://issues.apache.org/jira/browse/YARN-1496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, 
 YARN-1496-4.patch, YARN-1496.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864806#comment-13864806
 ] 

Jian He commented on YARN-1506:
---

bq. REBOOT -> RUNNING for a rebooted node to come back as running for accepting 
RECONNECTED/CLEAN_CONTAINER/AP
Not sure about this. A restarted node seems to only trigger the RECONNECT event 
on register, and the RMNode stays on RUNNING when receiving this event.
bq. DECOMMISSIONED -> RUNNING for a decommissioned node to be recommissioned again 
Simply because we are not supporting recommission?
bq. LOST -> NEW/UNHEALTHY/DECOMMISSIONED for an expired node to heartbeat again
From the code, I can see the node is actually gone from the RM's point of view 
once the node expires.
bq. UNHEALTHY -> RUNNING for an unhealthy node to report as healthy again 
This is handled.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864822#comment-13864822
 ] 

Hadoop QA commented on YARN-1461:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621867/yarn-1461-8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2815//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2815//console

This message is automatically generated.

 RM API and RM changes to handle tags for running jobs
 -

 Key: YARN-1461
 URL: https://issues.apache.org/jira/browse/YARN-1461
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, 
 yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, 
 yarn-1461-7.patch, yarn-1461-8.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues

2014-01-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864849#comment-13864849
 ] 

Sandy Ryza commented on YARN-1496:
--

bq. 'Move' doesn't seem informative enough.
Good point. How does ChangeApplicationQueue sound to you?

bq. Also, we should not mark APIs stable till they are, well, stable. Let's 
mark them unstable to begin with.
APIs marked stable can still change before they are included in a release, 
right? By marking them stable, I mean that once we include them in a release 
they shouldn't be able to change. I am only committing to trunk at this time to 
ensure they're not included in a release accidentally.
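
For reference, the kind of annotation change being discussed (illustrative 
only; the method name follows the ChangeApplicationQueue suggestion above and 
is not the final API):

{code}
// org.apache.hadoop.classification.InterfaceAudience.Public and
// InterfaceStability.Unstable
@Public
@Unstable  // unstable to begin with; promoted once it ships in a release
ChangeApplicationQueueResponse changeApplicationQueue(
    ChangeApplicationQueueRequest request) throws YarnException, IOException;
{code}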

 Protocol additions to allow moving apps between queues
 --

 Key: YARN-1496
 URL: https://issues.apache.org/jira/browse/YARN-1496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, 
 YARN-1496-4.patch, YARN-1496.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned YARN-888:
---

Assignee: Alejandro Abdelnur  (was: Roman Shaposhnik)

Thanks Roman, I'll be posting the patch momentarily. If you have time to review 
it, it would be great.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-888:


Attachment: YARN-888.patch

The patch moves all the dependencies to the leaf projects, declaring explicitly 
what each module needs (I used the dependency:analyze plugin to zero in on 
that, and commented in the POMs the dependencies not caught by the plugin as 
used).

I also did a DIST build and verified that the JARs in the DIST are all the same 
(with the exception of the yarn-site JAR, which is no more; that project is now 
of type 'pom').

I also verified that IntelliJ now works fine compiling and running test cases.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1033) Expose RM active/standby state to web UI and metrics

2014-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864892#comment-13864892
 ] 

Karthik Kambatla commented on YARN-1033:


Hey [~nemon]. Are you still planning to work on this? Otherwise, I would like 
to take a stab at it.

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Nemon Lou

 Both the active and standby RM shall expose a web server and show the current 
  state (active or standby) on the web page.
  Cluster metrics also need this state for monitoring.
  Standby RM web services shall refuse client requests unless querying for the 
  RM state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864902#comment-13864902
 ] 

Hadoop QA commented on YARN-1568:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621871/yarn-1568-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2817//console

This message is automatically generated.

 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1568:
---

Attachment: yarn-1568-1.patch

Re-submitting patch. 

 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-1568-1.patch, yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864909#comment-13864909
 ] 

Hadoop QA commented on YARN-888:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621881/YARN-888.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2816//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2816//console

This message is automatically generated.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1410:
--

Remaining Estimate: 48h
 Original Estimate: 48h

 Handle client failover during 2 step client API's like app submission
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410.1.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating an appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of the cluster timestamp (used to 
 create app ids), the new RM may reject the app submission, resulting in an 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1041) RM to bind and notify a restarted AM of existing containers

2014-01-07 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1041:
--

Attachment: YARN-1041.1.patch

Uploaded a patch that changes the AM protocols to get the previously running 
containers on registration.
The uploaded patch is based on YARN-1490 and may not apply cleanly for now; it 
is just to give an early view of the patch.

 RM to bind and notify a restarted AM of existing containers
 ---

 Key: YARN-1041
 URL: https://issues.apache.org/jira/browse/YARN-1041
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-1041.1.patch


 For long-lived containers we don't want the AM to be a SPOF.
 When the RM restarts a (failed) AM, it should be given the list of containers 
 it had already been allocated. The AM should then be able to contact the NMs 
 to get details on them. NMs would also need to do any binding of the 
 containers needed to handle a moved/restarted AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1041) Protocol changes for RM to bind and notify a restarted AM of existing containers

2014-01-07 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1041:
--

Summary: Protocol changes for RM to bind and notify a restarted AM of 
existing containers  (was: RM to bind and notify a restarted AM of existing 
containers)

 Protocol changes for RM to bind and notify a restarted AM of existing 
 containers
 

 Key: YARN-1041
 URL: https://issues.apache.org/jira/browse/YARN-1041
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-1041.1.patch


 For long-lived containers we don't want the AM to be a SPOF.
 When the RM restarts a (failed) AM, it should be given the list of containers 
 it had already been allocated. The AM should then be able to contact the NMs 
 to get details on them. NMs would also need to do any binding of the 
 containers needed to handle a moved/restarted AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864934#comment-13864934
 ] 

Hadoop QA commented on YARN-1568:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621887/yarn-1568-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2818//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2818//console

This message is automatically generated.

 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-1568-1.patch, yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics

2014-01-07 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated YARN-1033:


Assignee: Karthik Kambatla  (was: Nemon Lou)

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Karthik Kambatla

 Both the active and standby RM shall expose a web server and show the current 
  state (active or standby) on the web page.
  Cluster metrics also need this state for monitoring.
  Standby RM web services shall refuse client requests unless querying for the 
  RM state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1033) Expose RM active/standby state to web UI and metrics

2014-01-07 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864948#comment-13864948
 ] 

Nemon Lou commented on YARN-1033:
-

Hi Karthik, feel free to take it. :)
Thanks

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Nemon Lou

 Both the active and standby RM shall expose a web server and show the current 
  state (active or standby) on the web page.
  Cluster metrics also need this state for monitoring.
  Standby RM web services shall refuse client requests unless querying for the 
  RM state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864969#comment-13864969
 ] 

Jian He commented on YARN-1166:
---

The patch looks good overall.
- In FairScheduler, log if application == null?
{code}
  private synchronized void removeApplication(ApplicationId applicationId,
      RMAppState finalState) {
    SchedulerApplication application = applications.get(applicationId);
    if (application == null) {
      return;
    }
{code}
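
i.e., something like (a sketch; the exact message is up to you):

{code}
if (application == null) {
  // Log instead of silently returning, so unexpected lookups are visible.
  LOG.info("Unknown application " + applicationId
      + " has completed; nothing to remove");
  return;
}
{code}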
- There are things other than queue metrics. For example, 
LeafQueue.activeApplications and pendingApplications. These two are actually 
recording the attempts, but I remember they are exposed on the scheduler UI as 
schedulable and non-schedulable apps. Can you check whether these two 
collections also need to be associated with the application?

 YARN 'appsFailed' metric should be of type 'counter'
 

 Key: YARN-1166
 URL: https://issues.apache.org/jira/browse/YARN-1166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, 
 YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.patch


 Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of 
 type 'gauge' - which means the exact value will be reported. 
 All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) 
 are of type 'counter' - meaning Ganglia will use the slope to provide deltas 
 between time-points.
 To be consistent, the AppsFailed metric should also be of type 'counter'. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864987#comment-13864987
 ] 

Karthik Kambatla commented on YARN-1568:


Didn't add tests, as the patch just changes a field name.

Thanks for the review, Sandy. Will commit this later today. 

 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-1568-1.patch, yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864989#comment-13864989
 ] 

Junping Du commented on YARN-1506:
--

Hi [~jianhe], thanks again for your review and comments:
bq. Instead of the check here, I think we can send the event and have the 
RMNode transition ignore it. This prevents the case where isUnusable returns 
true right before the node is about to become usable, since the events will be 
processed sequentially.
Good point. We should just let the event mechanism handle this concurrency 
issue.
bq. Did we have an overall test for testing AdminService to send the request 
and verify RMNode and schedulerNode are changed accordingly?
No system test yet with this patch, just some unit tests. However, I did 
some integration tests on previous patches in YARN-291 with a raw patch of 
YARN-313 (the patch with the admin CLI) and found it works well. More 
integration tests will come with YARN-313 (the next and last patch on YARN-291, 
targeted for the 2.4 branch). Makes sense?
bq. [REBOOT -> RUNNING] not sure about this. A restarted node seems to only 
trigger the RECONNECT event on register, and the RMNode stays on RUNNING when 
receiving this event.
The interesting thing here is that DeactivateNodeTransition will be triggered 
on RUNNING -> REBOOT, so the node will be removed from RMContext.nodes and put 
into RMContext.inactiveNodes. So on the next registration, the event is sent as 
START instead of RECONNECT, and nothing happens because we have no state-machine 
transition from REBOOT on a START event. We should fix it, shouldn't we?
bq. [DECOMMISSIONED -> RUNNING] simply because we are not supporting 
recommission?
Yes. IMO, recommission is a *must-have* if we claim YARN supports 
decommission.
bq. [LOST -> NEW/UNHEALTHY/DECOMMISSIONED] from the code, I can see the node is 
actually gone from the RM's point of view once the node expires
The node just goes to RMContext.inactiveNodes. But it is possible for the node 
to heartbeat with a status update again (in cases like a network outage that 
recovers, a node VM that was suspended or frozen, unsynchronized clocks, etc.) 
after its status is put into LOST, and we don't have any code to handle this. 
We should fix it, shouldn't we?
It seems to me that many state transitions are missing in the cases discussed 
above; we can file a separate JIRA to address this. Thoughts?
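
For the missing transitions above, the fix would roughly mean registering 
additional transitions in the RMNodeImpl state machine, along these lines (a 
sketch; the exact state, event, and transition-class names are assumptions):

{code}
// e.g. let a rebooted node that registers again become RUNNING instead of
// having its START event silently dropped.
.addTransition(NodeState.REBOOTED, NodeState.RUNNING,
    RMNodeEventType.STARTED, new AddNodeTransition())
{code}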

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting

2014-01-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1569:
-

Description: 
Following http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
always check for the appropriate type before casting. 
handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check 
so far (no bug there now), but it should be improved to match FairScheduler.

  was:As following: http://wiki.apache.org/hadoop/CodeReviewChecklist, we 
should always check appropriate type before casting. handle(SchedulerEvent) in 
FifoScheduler and CapacityScheduler didn't check so far (no bug there now) but 
should be improved as FairScheduler.


 For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, 
 SchedulerEvent should get checked (instanceof) for appropriate type before 
 casting
 -

 Key: YARN-1569
 URL: https://issues.apache.org/jira/browse/YARN-1569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Priority: Minor
  Labels: newbie

 Following http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
 always check for the appropriate type before casting. 
 handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check 
 so far (no bug there now), but it should be improved to match FairScheduler.
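For reference, a minimal sketch of the suggested guard, modeled on 
FairScheduler's handle() (the exception message is illustrative):
{code}
@Override
public void handle(SchedulerEvent event) {
  switch (event.getType()) {
  case NODE_ADDED:
    // verify the concrete type before casting, as FairScheduler does
    if (!(event instanceof NodeAddedSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    NodeAddedSchedulerEvent nodeAddedEvent = (NodeAddedSchedulerEvent) event;
    addNode(nodeAddedEvent.getAddedRMNode());
    break;
  default:
    break;
  }
}
{code}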



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting

2014-01-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1569:
-

Summary: For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, 
SchedulerEvent should get checked (instanceof) for appropriate type before 
casting  (was: For handl(SchedulerEvent) in FifoScheduler and 
CapacityScheduler, SchedulerEvent should get checked (instanceof) for 
appropriate type before casting)

 For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, 
 SchedulerEvent should get checked (instanceof) for appropriate type before 
 casting
 -

 Key: YARN-1569
 URL: https://issues.apache.org/jira/browse/YARN-1569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Priority: Minor
  Labels: newbie

 Following http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
 always check for the appropriate type before casting. 
 handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check 
 so far (no bug there now), but it should be improved to match FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1569) For handl(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting

2014-01-07 Thread Junping Du (JIRA)
Junping Du created YARN-1569:


 Summary: For handl(SchedulerEvent) in FifoScheduler and 
CapacityScheduler, SchedulerEvent should get checked (instanceof) for 
appropriate type before casting
 Key: YARN-1569
 URL: https://issues.apache.org/jira/browse/YARN-1569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Priority: Minor


Following http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
always check for the appropriate type before casting. 
handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check 
so far (no bug there now), but it should be improved to match FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1568) Rename clusterid to clusterId in ActiveRMInfoProto

2014-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865034#comment-13865034
 ] 

Hudson commented on YARN-1568:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4974 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4974/])
YARN-1568. Rename clusterid to clusterId in ActiveRMInfoProto (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556435)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java


 Rename clusterid to clusterId in ActiveRMInfoProto 
 ---

 Key: YARN-1568
 URL: https://issues.apache.org/jira/browse/YARN-1568
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.4.0

 Attachments: yarn-1568-1.patch, yarn-1568-1.patch


 YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field 
 clusterid, which is inconsistent with other fields. Better to fix it 
 immediately than leave the inconsistency. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) Update yarn command document

2014-01-07 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865050#comment-13865050
 ] 

Akira AJISAKA commented on YARN-1531:
-

[~kkambatl], thanks for your comment! I'll split the patch.

 Update yarn command document
 

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Attachments: YARN-1531.patch


 There are some options that are not documented in the YARN Commands 
 document. For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.
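As a starting point for documenting the HA options, a usage sketch (the 
service id rm1 is a placeholder for an id configured via 
yarn.resourcemanager.ha.rm-ids; the output shown is illustrative):
{code}
$ yarn rmadmin -getServiceState rm1
standby
$ yarn rmadmin -transitionToActive rm1
$ yarn rmadmin -getServiceState rm1
active
{code}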



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1531) Update yarn command document

2014-01-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-1531:


Attachment: YARN-1531.2.patch

Attaching a patch that excludes the formatting changes.

 Update yarn command document
 

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Attachments: YARN-1531.2.patch, YARN-1531.patch


 There are some options that are not documented in the YARN Commands 
 document. For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) Update yarn command document

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865083#comment-13865083
 ] 

Hadoop QA commented on YARN-1531:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621919/YARN-1531.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2819//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2819//console

This message is automatically generated.

 Update yarn command document
 

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Attachments: YARN-1531.2.patch, YARN-1531.patch


 There are some options that are not documented in the YARN Commands 
 document. For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1293) TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

2014-01-07 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865107#comment-13865107
 ] 

Akira AJISAKA commented on YARN-1293:
-

Thanks [~jianhe] and [~ozawa]!

 TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk
 --

 Key: YARN-1293
 URL: https://issues.apache.org/jira/browse/YARN-1293
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: linux
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.3.0

 Attachments: YARN-1293.1.patch


 {quote}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
 testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)
   Time elapsed: 0.114 sec   FAILURE!
 junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:48)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm

2014-01-07 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-1570:
---

 Summary: Formatting the lines within 80 chars in 
YarnCommands.apt.vm
 Key: YARN-1570
 URL: https://issues.apache.org/jira/browse/YARN-1570
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Priority: Minor
 Fix For: 2.4.0


In YarnCommands.apt.vm, there are some lines longer than 80 characters.
For example:
{code}
  Yarn commands are invoked by the bin/yarn script. Running the yarn script 
without any arguments prints the description for all commands.
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) Update yarn command document

2014-01-07 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865114#comment-13865114
 ] 

Akira AJISAKA commented on YARN-1531:
-

Created YARN-1570 for formatting.

 Update yarn command document
 

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Attachments: YARN-1531.2.patch, YARN-1531.patch


 There are some options that are not documented in the YARN Commands 
 document. For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1041) Protocol changes for RM to bind and notify a restarted AM of existing containers

2014-01-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865123#comment-13865123
 ] 

Sandy Ryza commented on YARN-1041:
--

Took a look at the Fair Scheduler changes. A couple of nits:
{code}
+ContainerId amContainerId =
+this.rmContext.getRMApps().get(applicationId).getCurrentAppAttempt()
+  .getMasterContainer().getId();
{code}
Other references to rmContext in this file do not use the this. qualifier.
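i.e., presumably just:
{code}
ContainerId amContainerId =
    rmContext.getRMApps().get(applicationId).getCurrentAppAttempt()
        .getMasterContainer().getId();
{code}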

{code}
+
   private SchedulerApplicationAttempt getCurrentAttemptForContainer(
   ContainerId containerId) {
 SchedulerApplication app =
@@ -1361,5 +1384,4 @@ public void onReload(AllocationConfiguration queueInfo) {
 queue.collectSchedulerApplications(apps);
 return apps;
   }
-
{code}
Spurious whitespace changes.

 Protocol changes for RM to bind and notify a restarted AM of existing 
 containers
 

 Key: YARN-1041
 URL: https://issues.apache.org/jira/browse/YARN-1041
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Jian He
 Attachments: YARN-1041.1.patch


 For long-lived containers we don't want the AM to be a SPOF.
 When the RM restarts a (failed) AM, the AM should be given the list of 
 containers it had already been allocated. The AM should then be able to 
 contact the NMs to get details on them. NMs would also need to do any 
 re-binding of containers required to handle a moved/restarted AM.
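A hedged sketch of how a restarted AM might consume such a list on 
re-registration (assuming an AMRMClient instance and its registration 
parameters; getContainersFromPreviousAttempts is illustrative here, taken 
from the eventual RegisterApplicationMasterResponse API):
{code}
// Illustrative AM-side sketch: on re-registration, recover containers that
// survived from the previous attempt instead of requesting new ones.
RegisterApplicationMasterResponse response =
    amRMClient.registerApplicationMaster(appHostName, appHostPort, appTrackingUrl);
for (Container c : response.getContainersFromPreviousAttempts()) {
  LOG.info("Recovered running container " + c.getId() + " on " + c.getNodeId());
}
{code}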



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'

2014-01-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1166:
--

Attachment: YARN-1166.7.patch

bq. In FairScheduler, log if application == null ?

Added the log not only in FairScheduler#removeApplication but also in 
CapacityScheduler#doneApplication and FifoScheduler#doneApplication.

bq. There are things other than queue metrics. For example, 
LeafQueue.activeApplications and PendingApplications. These two are actually 
recording the attempts. But I remember those two are exposed on scheduler UI as 
schedulable and non-schedulable apps. Can you check if these two collections 
are also needed be associated with application ?

As mentioned in my last comment, the active-apps and pending-apps metrics 
change on app-attempt triggers. The two metrics may increase and decrease 
during the life cycle of an application, given that there can be multiple 
attempts.

 YARN 'appsFailed' metric should be of type 'counter'
 

 Key: YARN-1166
 URL: https://issues.apache.org/jira/browse/YARN-1166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, 
 YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.patch


 Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of 
 type 'gauge', which means the exact value is reported. 
 All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) 
 are of type 'counter', meaning Ganglia will use the slope to provide deltas 
 between time points.
 To be consistent, the AppsFailed metric should also be of type 'counter'. 
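In metrics2 terms, the change amounts to swapping the annotated field type. 
A minimal before/after sketch, assuming the @Metric-annotated fields that 
QueueMetrics uses:
{code}
// before: a gauge reports the current value as-is
@Metric("# of apps failed") MutableGaugeInt appsFailed;

// after: a counter only increments, so Ganglia can derive deltas from its slope
@Metric("# of apps failed") MutableCounterInt appsFailed;

// in both cases a failure is recorded with appsFailed.incr()
{code}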



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'

2014-01-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865136#comment-13865136
 ] 

Jian He commented on YARN-1166:
---

bq. As mentioned in my last comment, the active-apps and pending-apps 
metrics change on app-attempt triggers
What I meant is the activeApplications and pendingApplications collections 
inside LeafQueue; these two also end up showing as metrics on the scheduler 
UI, and they are different from the pending/running metrics of QueueMetrics.

 YARN 'appsFailed' metric should be of type 'counter'
 

 Key: YARN-1166
 URL: https://issues.apache.org/jira/browse/YARN-1166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, 
 YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.patch


 Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of 
 type 'gauge', which means the exact value is reported. 
 All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) 
 are of type 'counter', meaning Ganglia will use the slope to provide deltas 
 between time points.
 To be consistent, the AppsFailed metric should also be of type 'counter'. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-07 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865140#comment-13865140
 ] 

Shinichi Yamashita commented on YARN-321:
-

I reviewed the attached design document, and I have two questions about 
FileSystemApplicationHistoryStore.

1. Does it provide a way to set the maximum number of files and the maximum 
retention period for ApplicationHistory stored in HDFS?
2. When there are many ApplicationHistory entries in HDFS, does it limit how 
many ApplicationHistory entries are read?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) trusted servers 
 (where T is the number of application types and V is the number of 
 application versions) is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865145#comment-13865145
 ] 

Hadoop QA commented on YARN-1166:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621931/YARN-1166.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2820//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2820//console

This message is automatically generated.

 YARN 'appsFailed' metric should be of type 'counter'
 

 Key: YARN-1166
 URL: https://issues.apache.org/jira/browse/YARN-1166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, 
 YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.patch


 Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of 
 type 'gauge', which means the exact value is reported. 
 All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) 
 are of type 'counter', meaning Ganglia will use the slope to provide deltas 
 between time points.
 To be consistent, the AppsFailed metric should also be of type 'counter'. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1571) Don't allow periods in Fair Scheduler queue names

2014-01-07 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1571:


 Summary: Don't allow periods in Fair Scheduler queue names
 Key: YARN-1571
 URL: https://issues.apache.org/jira/browse/YARN-1571
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza


Periods can't be used in Fair Scheduler queue names because they're used as 
delimiters between queues and their parents. Maybe we should replace them 
with underscores or something.
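A hypothetical sketch of the kind of check that could enforce this (the 
method name and message are illustrative):
{code}
// Hypothetical: reject '.' in a leaf queue name, since '.' is the
// hierarchy delimiter (e.g. "root.parent.child").
static void checkQueueName(String name) {
  if (name.contains(".")) {
    throw new IllegalArgumentException("Queue name '" + name
        + "' must not contain '.', which delimits parent and child queues");
  }
}
{code}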



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'

2014-01-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865154#comment-13865154
 ] 

Zhijie Shen commented on YARN-1166:
---

bq. What I meant is the activeApplications and pendingApplications 
collections inside LeafQueue; these two also end up showing as metrics on 
the scheduler UI, and they are different from the pending/running metrics of 
QueueMetrics.

Those two metrics change as application attempts are added/activated/removed, 
which is similar to those in QueueMetrics. IMHO, it is reasonable for the 
pending/active metrics (whether in LeafQueue or QueueMetrics) to be bound to 
the application attempt, given that one application can have at most one 
attempt at any time.

 YARN 'appsFailed' metric should be of type 'counter'
 

 Key: YARN-1166
 URL: https://issues.apache.org/jira/browse/YARN-1166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, 
 YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.patch


 Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of 
 type 'gauge', which means the exact value is reported. 
 All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) 
 are of type 'counter', meaning Ganglia will use the slope to provide deltas 
 between time points.
 To be consistent, the AppsFailed metric should also be of type 'counter'. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1506:
-

Attachment: YARN-1506-v6.patch

Addressed [~jianhe]'s comments in the v6 patch.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865168#comment-13865168
 ] 

Hadoop QA commented on YARN-1506:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621936/YARN-1506-v6.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2822//console

This message is automatically generated.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865185#comment-13865185
 ] 

Hadoop QA commented on YARN-1506:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621936/YARN-1506-v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2821//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2821//console

This message is automatically generated.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)