[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013356#comment-14013356
 ] 

Jian He commented on YARN-1366:
---

The bulk of the patch here is MR changes. I think we should have an MR jira to 
track the MR changes. Both patches are closely related and the patch size seems 
reasonable to consolidate, so it's fine to leave as-is; it's just easier for the 
reviewer to have more context.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC sequence 
 number to 0, and the AM should send its entire outstanding request to the 
 RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
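
 As an illustration of the behavior described above, here is a minimal, hedged 
 sketch of an AM-side allocate loop that resyncs instead of shutting down. The 
 Rm/Response types and method names are stand-ins invented for this sketch, not 
 the real AMRMClient/AllocateResponse API, and the resync-detection detail is 
 an assumption.
 {code}
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 public class ResyncSketch {

   /** Illustrative stand-in for an allocate response. */
   static class Response {
     boolean resync;                     // RM asked the AM to resync
     List<String> completedContainers;   // container ids completed since last sync
   }

   /** Illustrative stand-in for the allocate RPC. */
   interface Rm {
     Response allocate(int responseId, List<String> outstandingAsks);
   }

   private int responseId = 0;                          // allocate RPC sequence number
   private final Set<String> seenCompletions = new HashSet<String>();

   void heartbeat(Rm rm, List<String> outstandingAsks) {
     Response r = rm.allocate(responseId, outstandingAsks);
     if (r.resync) {
       // Resync: reset the sequence number to 0 and re-send the entire
       // outstanding request, instead of shutting the AM down.
       responseId = 0;
       r = rm.allocate(responseId, outstandingAsks);
     }
     responseId++;
     // The RM may report completions that were already delivered before the
     // resync, so de-duplicate them.
     for (String containerId : r.completedContainers) {
       if (seenCompletions.add(containerId)) {
         System.out.println("container completed: " + containerId);
       }
     }
   }
 }
 {code}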



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013398#comment-14013398
 ] 

Sandy Ryza commented on YARN-2010:
--

Agree with Vinod that going from a non-secure cluster to a secure cluster is 
not currently supported and is bound to have tons of issues.  I've come across 
other bugs that have turned out to stem from this.  If this is the only 
situation where we could conceivably face this issue, I'm somewhat dubious 
about whether it needs to be fixed.  On the other hand, in general, being 
defensive and allowing the transition to active even when an app recovery 
fails makes sense to me.

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}
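
 To make the "be defensive" suggestion concrete, here is a hedged sketch (not 
 the attached patch) of recovering each stored application individually and 
 skipping failures, so one bad app does not block the transition to active. The 
 AppRecoverer interface and method names are invented for this sketch.
 {code}
 import java.util.Map;

 public class DefensiveRecoverySketch {

   /** Illustrative stand-in for per-application recovery logic. */
   interface AppRecoverer {
     void recover(String appId, byte[] storedState) throws Exception;
   }

   /** Recover every stored app; return how many failed instead of aborting. */
   static int recoverAll(Map<String, byte[]> storedApps, AppRecoverer recoverer) {
     int failed = 0;
     for (Map.Entry<String, byte[]> e : storedApps.entrySet()) {
       try {
         recoverer.recover(e.getKey(), e.getValue());
       } catch (Exception ex) {
         // One unrecoverable app (e.g. submitted before security was enabled)
         // should not keep the RM from becoming active; record it and move on.
         failed++;
         System.err.println("Failed to recover " + e.getKey() + ": " + ex);
       }
     }
     return failed;
   }
 }
 {code}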



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-05-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022-DesignDraft.docx

Hi [~curino]

I have attached a Design Draft document in which I tried to capture the corner 
cases. The draft also includes the approach to handle them. Please review it 
and share your thoughts.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch


 Cluster Size = 16GB [2 NMs]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider 3 applications running in Queue A which together have taken the full 
 cluster capacity.
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be 
 preempted instead.
 Later, when the cluster is free, maps can be allocated to these jobs again.
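
 A hedged sketch of the "AM last" idea in this description (illustrative only, 
 not the attached design or patch; the Candidate type is invented here): order 
 the preemption candidates so that non-AM containers are taken first, and an AM 
 container is preempted only when nothing else is left.
 {code}
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;

 public class AmLastPreemptionSketch {

   /** Illustrative stand-in for a container that is a preemption candidate. */
   static class Candidate {
     final String id;
     final boolean isAmContainer;
     Candidate(String id, boolean isAmContainer) {
       this.id = id;
       this.isAmContainer = isAmContainer;
     }
   }

   // Non-AM containers sort first; AM containers sort last.
   static final Comparator<Candidate> AM_LAST = new Comparator<Candidate>() {
     public int compare(Candidate a, Candidate b) {
       return Boolean.compare(a.isAmContainer, b.isAmContainer);
     }
   };

   public static void main(String[] args) {
     List<Candidate> candidates = new ArrayList<Candidate>();
     candidates.add(new Candidate("J3-AM", true));
     candidates.add(new Candidate("J3-map-1", false));
     candidates.add(new Candidate("J2-map-4", false));
     candidates.sort(AM_LAST);               // maps first, AM last
     for (Candidate c : candidates) {
       System.out.println(c.id);
     }
   }
 }
 {code}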



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013509#comment-14013509
 ] 

Hadoop QA commented on YARN-2022:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12647576/YARN-2022-DesignDraft.docx
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3865//console

This message is automatically generated.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch


 Cluster Size = 16GB [2 NMs]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider 3 applications running in Queue A which together have taken the full 
 cluster capacity.
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps].
 Currently in this scenario, job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be 
 preempted instead.
 Later, when the cluster is free, maps can be allocated to these jobs again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2112) Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013520#comment-14013520
 ] 

Hudson commented on YARN-2112:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #568 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/568/])
YARN-2112. Fixed yarn-common's pom.xml to include jackson dependencies so that 
both Timeline Server and client can access them. Contributed by Zhijie Shen. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598373)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml


 Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml
 -

 Key: YARN-2112
 URL: https://issues.apache.org/jira/browse/YARN-2112
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2112.1.patch


 Now YarnClient is using TimelineClient, which has a dependency on the jackson 
 libs. However, the current dependency configurations make the hadoop-client 
 artifact miss 2 jackson libs, such that applications that depend on 
 hadoop-client will see the following exception:
 {code}
 java.lang.NoClassDefFoundError: 
 org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.init(TimelineClientImpl.java:92)
   at 
 org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:44)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:149)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:94)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.init(ResourceMgrDelegate.java:88)
   at org.apache.hadoop.mapred.YARNRunner.init(YARNRunner.java:111)
   at 
 org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:394)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
   at 

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-30 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013607#comment-14013607
 ] 

Rohith commented on YARN-1366:
--

Let this jira cover only the YarnClient changes. I created MAPREDUCE-5910 to 
handle the MR side.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC sequence 
 number to 0, and the AM should send its entire outstanding request to the 
 RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.4.patch

Attached an updated patch that addresses all of Anubhav's comments. This patch 
contains only the YarnClient changes.
The changes are:
1. Added a test that covers the scenario.
2. Added a core-site.xml for the test, with the IP-based check disabled.
3. Modified the yarn-client pom.xml so that yarn-common-test is on the test 
classpath.

I am not changing the status to Patch Available since the test requires the 
patch from YARN-1365.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC sequence 
 number to 0, and the AM should send its entire outstanding request to the 
 RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2112) Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013639#comment-14013639
 ] 

Hudson commented on YARN-2112:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1759 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1759/])
YARN-2112. Fixed yarn-common's pom.xml to include jackson dependencies so that 
both Timeline Server and client can access them. Contributed by Zhijie Shen. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598373)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml


 Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml
 -

 Key: YARN-2112
 URL: https://issues.apache.org/jira/browse/YARN-2112
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2112.1.patch


 Now YarnClient is using TimelineClient, which has a dependency on the jackson 
 libs. However, the current dependency configurations make the hadoop-client 
 artifact miss 2 jackson libs, such that applications that depend on 
 hadoop-client will see the following exception:
 {code}
 java.lang.NoClassDefFoundError: 
 org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.init(TimelineClientImpl.java:92)
   at 
 org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:44)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:149)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:94)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.init(ResourceMgrDelegate.java:88)
   at org.apache.hadoop.mapred.YARNRunner.init(YARNRunner.java:111)
   at 
 org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:394)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
   at 

[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2014-05-30 Thread Dave Disser (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013710#comment-14013710
 ] 

Dave Disser commented on YARN-800:
--

As a follow-up, I also notice that the proxying works correctly while the 
tracking URL is UNASSIGNED (the first couple of seconds after the AM container 
launches), but then the HTTP 500 occurs shortly after.


 Clicking on an AM link for a running app leads to a HTTP 500
 

 Key: YARN-800
 URL: https://issues.apache.org/jira/browse/YARN-800
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta
Priority: Minor

 Clicking the AM link tries to open up a page with url like
 http://hostname:8088/proxy/application_1370886527995_0645/
 and this leads to an HTTP 500



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013734#comment-14013734
 ] 

Hadoop QA commented on YARN-1338:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647161/YARN-1338v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3866//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3866//console

This message is automatically generated.

 Recover localized resource cache state upon nodemanager restart
 ---

 Key: YARN-1338
 URL: https://issues.apache.org/jira/browse/YARN-1338
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1338.patch, YARN-1338v2.patch, 
 YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch, 
 YARN-1338v6.patch


 Today when the node manager restarts, we clean up all the distributed cache 
 files from disk. This is definitely not ideal for 2 reasons:
 * For a work-preserving restart, we definitely want to keep them since running 
 containers are using them.
 * Even for a non-work-preserving restart, keeping them is useful in the sense 
 that we don't have to download them again if they are needed by future tasks.
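
 As a rough, hedged sketch of the recovery direction (not the NM state-store 
 code in the patch; the LocalizedRecord type and field names are invented 
 here), the NM could rebuild its localized-resource cache from records 
 persisted at localization time instead of wiping the directories:
 {code}
 import java.io.File;
 import java.util.HashMap;
 import java.util.Map;

 public class LocalizerRecoverySketch {

   /** Illustrative record persisted when a resource finishes localizing. */
   static class LocalizedRecord {
     final String resourceKey;   // e.g. remote path + timestamp
     final String localPath;     // where it was localized on disk
     LocalizedRecord(String resourceKey, String localPath) {
       this.resourceKey = resourceKey;
       this.localPath = localPath;
     }
   }

   /** Rebuild the in-memory cache from persisted records on NM restart. */
   static Map<String, File> recover(Iterable<LocalizedRecord> persisted) {
     Map<String, File> cache = new HashMap<String, File>();
     for (LocalizedRecord rec : persisted) {
       File f = new File(rec.localPath);
       if (f.exists()) {
         cache.put(rec.resourceKey, f);  // reuse: no need to download again
       }
       // Records whose files are gone are simply dropped; those resources
       // will be localized again on demand.
     }
     return cache;
   }
 }
 {code}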



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) Use scheduling policies throughout the queue hierarchy to decide which containers to preempt

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013757#comment-14013757
 ] 

Hudson commented on YARN-596:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1786 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1786/])
YARN-596. Use scheduling policies throughout the queue hierarchy to decide 
which containers to preempt (Wei Yan via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598197)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java


 Use scheduling policies throughout the queue hierarchy to decide which 
 containers to preempt
 

 Key: YARN-596
 URL: https://issues.apache.org/jira/browse/YARN-596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Wei Yan
 Fix For: 2.5.0

 Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
 YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch


 In the fair scheduler, containers are chosen for preemption in the following 
 way:
 All containers for all apps that are in queues that are over their fair share 
 are put in a list.
 The list is sorted by the priority at which each container was requested.
 This means that an application can shield itself from preemption by 
 requesting its containers at higher priorities, which doesn't really make 
 sense.
 Also, an application that is not over its fair share, but that is in a queue 
 that is over its fair share, is just as likely to have containers preempted 
 as an application that is over its fair share.
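
 A hedged sketch of the approach in the summary (illustrative only, not Wei 
 Yan's patch; the Schedulable interface here is a simplified stand-in): instead 
 of one flat, priority-sorted list, descend the queue hierarchy and let each 
 level's scheduling policy pick the child that is most over its fair share.
 {code}
 import java.util.Comparator;
 import java.util.List;

 public class HierarchicalPreemptionSketch {

   /** Simplified stand-in for a queue or application in the hierarchy. */
   interface Schedulable {
     boolean isLeafApp();
     List<Schedulable> children();
     String pickContainerToPreempt();   // only meaningful for leaf apps
   }

   /**
    * policyComparator orders children so that the child most over its fair
    * share is the maximum; recurse into it until an application is reached.
    */
   static String chooseContainer(Schedulable node,
                                 Comparator<Schedulable> policyComparator) {
     if (node.isLeafApp()) {
       return node.pickContainerToPreempt();
     }
     Schedulable mostOverFairShare = null;
     for (Schedulable child : node.children()) {
       if (mostOverFairShare == null
           || policyComparator.compare(child, mostOverFairShare) > 0) {
         mostOverFairShare = child;
       }
     }
     return mostOverFairShare == null
         ? null : chooseContainer(mostOverFairShare, policyComparator);
   }
 }
 {code}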



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013761#comment-14013761
 ] 

Hudson commented on YARN-2107:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1786 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1786/])
YARN-2107. Refactored timeline classes into o.a.h.y.s.timeline package. 
Contributed by Vinod Kumar Vavilapalli. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityIdentifier.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/NameValuePair.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineReader.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineWriter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/package-info.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineClientAuthenticationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp
* 

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013777#comment-14013777
 ] 

Karthik Kambatla commented on YARN-2010:


Let me clarify a couple of things. It is true that the first time we 
encountered this was during an upgrade from a non-secure to a secure cluster. 
However, as I mentioned earlier in the JIRA, it is possible to run into this in 
other situations.

Even in the case of upgrading from a non-secure to a secure cluster, I totally 
understand we can't support recovering running/completed applications. However, 
one shouldn't have to explicitly nuke the ZK store (which, by the way, is 
involved due to the ACLs magic and lacks an rmadmin command) to be able to 
start the RM.

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 

[jira] [Commented] (YARN-2112) Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013766#comment-14013766
 ] 

Hudson commented on YARN-2112:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1786 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1786/])
YARN-2112. Fixed yarn-common's pom.xml to include jackson dependencies so that 
both Timeline Server and client can access them. Contributed by Zhijie Shen. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598373)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml


 Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml
 -

 Key: YARN-2112
 URL: https://issues.apache.org/jira/browse/YARN-2112
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2112.1.patch


 Now YarnClient is using TimelineClient, which has a dependency on the jackson 
 libs. However, the current dependency configurations make the hadoop-client 
 artifact miss 2 jackson libs, such that applications that depend on 
 hadoop-client will see the following exception:
 {code}
 java.lang.NoClassDefFoundError: 
 org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.init(TimelineClientImpl.java:92)
   at 
 org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:44)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:149)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:94)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.init(ResourceMgrDelegate.java:88)
   at org.apache.hadoop.mapred.YARNRunner.init(YARNRunner.java:111)
   at 
 org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82)
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:394)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
   at 

[jira] [Commented] (YARN-2054) Better defaults for YARN ZK configs for retries and retry-inteval when HA is enabled

2014-05-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013788#comment-14013788
 ] 

Karthik Kambatla commented on YARN-2054:


Saw this late - thanks for the review, [~ozawa] :)

 Better defaults for YARN ZK configs for retries and retry-inteval when HA is 
 enabled
 

 Key: YARN-2054
 URL: https://issues.apache.org/jira/browse/YARN-2054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.5.0

 Attachments: yarn-2054-1.patch, yarn-2054-2.patch, yarn-2054-3.patch, 
 yarn-2054-4.patch


 Currently, we have the following default values:
 # yarn.resourcemanager.zk-num-retries - 500
 # yarn.resourcemanager.zk-retry-interval-ms - 2000
 This leads to a cumulative 1000 seconds before the RM gives up trying to 
 connect to ZK.
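
 For reference, the 1000 seconds follows directly from those two defaults. A 
 small hedged check (assuming only that hadoop-common's Configuration is on the 
 classpath; keys and defaults are taken from the description above):
 {code}
 import org.apache.hadoop.conf.Configuration;

 public class ZkRetryBudget {
   public static void main(String[] args) {
     Configuration conf = new Configuration();
     long retries = conf.getLong("yarn.resourcemanager.zk-num-retries", 500);
     long intervalMs = conf.getLong("yarn.resourcemanager.zk-retry-interval-ms", 2000);
     // 500 retries * 2000 ms per retry = 1,000,000 ms = 1,000 seconds
     System.out.println("RM keeps retrying ZK for " + (retries * intervalMs) / 1000 + " s");
   }
 }
 {code}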



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1877) Document yarn.resourcemanager.zk-auth and its scope

2014-05-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1877:
---

Summary: Document yarn.resourcemanager.zk-auth and its scope  (was: ZK 
store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node 
auth)

 Document yarn.resourcemanager.zk-auth and its scope
 ---

 Key: YARN-1877
 URL: https://issues.apache.org/jira/browse/YARN-1877
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: YARN-1877.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth

2014-05-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013793#comment-14013793
 ] 

Karthik Kambatla commented on YARN-1877:


Thanks for investigating this, Robert. 

+1 - the description is missing a closing ), I'll add it at commit time.

 ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root 
 node auth
 ---

 Key: YARN-1877
 URL: https://issues.apache.org/jira/browse/YARN-1877
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: YARN-1877.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1877) Document yarn.resourcemanager.zk-auth and its scope

2014-05-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1877:
--

Assignee: Robert Kanter  (was: Karthik Kambatla)

 Document yarn.resourcemanager.zk-auth and its scope
 ---

 Key: YARN-1877
 URL: https://issues.apache.org/jira/browse/YARN-1877
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: YARN-1877.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2054) Better defaults for YARN ZK configs for retries and retry-inteval when HA is enabled

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013812#comment-14013812
 ] 

Hudson commented on YARN-2054:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5631 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5631/])
YARN-2054. Better defaults for YARN ZK configs for retries and retry-inteval 
when HA is enabled. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598630)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java


 Better defaults for YARN ZK configs for retries and retry-inteval when HA is 
 enabled
 

 Key: YARN-2054
 URL: https://issues.apache.org/jira/browse/YARN-2054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.5.0

 Attachments: yarn-2054-1.patch, yarn-2054-2.patch, yarn-2054-3.patch, 
 yarn-2054-4.patch


 Currently, we have the following default values:
 # yarn.resourcemanager.zk-num-retries - 500
 # yarn.resourcemanager.zk-retry-interval-ms - 2000
 This leads to a cumulative 1000 seconds before the RM gives up trying to 
 connect to ZK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013833#comment-14013833
 ] 

Hudson commented on YARN-1338:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5632 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5632/])
YARN-1338. Recover localized resource cache state upon nodemanager restart 
(Contributed by Jason Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598640)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceRecoveredEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* 

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013834#comment-14013834
 ] 

Hudson commented on YARN-2010:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5632 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5632/])
YARN-2010. Document yarn.resourcemanager.zk-auth and its scope. (Robert Kanter 
via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598636)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013876#comment-14013876
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Thank you for the suggestion, Sandy. 

{quote}
ContainerExitStatus should stay an int. While ContainerStatus.getExitStatus is 
technically marked Unstable, I'm sure changing this would break some 
applications.
{quote}

I agree with this. In the latest patch, ContainerExitStatus stays an int.

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013880#comment-14013880
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], while reading YARN-2022, I wasn't sure whether you considered recovering 
the masterContainer in this patch. I haven't found it in the patch; if it's not 
covered, I think we need to consider it. For the preemption policy, it's important 
to identify AM containers before making preemption decisions.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013890#comment-14013890
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Changelogs in v4:
* Added KILL_EXCEEDED_PMEM, KILL_EXCEEDED_VMEM to ContainerExitStatus.
* Updated ContainersMonitorImpl for dispatching 
KILL_EXCEEDED_VMEM/KILL_EXCEEDED_PMEM.
* If the exit reason is AM-aware ({{ContainerExitStatus#isAMAware()}}), pass it 
to app masters. Otherwise, the exit reason is converted into 
ExitCode.TERMINATED.getExitCode() for backward compatibility. The AM-aware events 
are currently DISKS_FAILED, KILL_EXCEEDED_PMEM, and KILL_EXCEEDED_VMEM.
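
For illustration, AM-side handling of the new statuses might look roughly like the 
fragment below. This is not part of the patch; it assumes the KILL_EXCEEDED_PMEM / 
KILL_EXCEEDED_VMEM constants proposed above plus the usual 
org.apache.hadoop.yarn.api.records imports.

{code}
// Hypothetical AM callback for completed containers reported back by the RM.
void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    int exit = status.getExitStatus();
    if (exit == ContainerExitStatus.KILL_EXCEEDED_PMEM
        || exit == ContainerExitStatus.KILL_EXCEEDED_VMEM) {
      // Killed for exceeding its memory limit: the AM could, for example,
      // retry the work in a larger container instead of treating it as a
      // generic failure.
    } else if (exit == ContainerExitStatus.DISKS_FAILED) {
      // Local disks failed on the node: the AM could avoid that node.
    }
  }
}
{code}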

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013926#comment-14013926
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Jenkins passed last night. It's ready for review. 

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2099) Preemption in fair scheduler should consider app priorities

2014-05-30 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2099:
--

Attachment: YARN-2099.patch

Uploaded an initial patch to capture what we discussed.
Need to add test cases once YARN-2098 is resolved.

 Preemption in fair scheduler should consider app priorities
 ---

 Key: YARN-2099
 URL: https://issues.apache.org/jira/browse/YARN-2099
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2099.patch


 Fair scheduler should take app priorities into account while
 preempting containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013998#comment-14013998
 ] 

Jian He commented on YARN-1368:
---

Wangda, AM container is just one type of container and should be covered 
already in the patch.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2072) RM/NM UIs and webservices are missing vcore information

2014-05-30 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-2072:
-

Attachment: YARN-2072.patch

 RM/NM UIs and webservices are missing vcore information
 ---

 Key: YARN-2072
 URL: https://issues.apache.org/jira/browse/YARN-2072
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 3.0.0, 2.4.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-2072.patch


 Change RM and NM UIs and webservices to include virtual cores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1868:


Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this.

 YARN status web ui does not show correctly in IE 11
 ---

 Key: YARN-1868
 URL: https://issues.apache.org/jira/browse/YARN-1868
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
  Labels: yxls123123
 Attachments: YARN-1868.1.patch, YARN-1868.2.patch, YARN-1868.patch, 
 YARN_status.png


 The YARN status web ui does not show correctly in IE 11. The drop down menu 
 for app entries are not shown. Also the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-05-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014043#comment-14014043
 ] 

Zhijie Shen commented on YARN-1702:
---

[~vvasudev], thanks for the big patch! I've looked through it, and below are 
some high-level comments:

1. There are a lot of formatting changes in TestRMWebServicesApps which don't 
seem necessary and make the review harder.

2. getAppState doesn't seem necessary, as we already have getApp, which returns 
a full report including the state.

3. I'm not sure it is a good idea to have an updateAppState API that only 
allows changing the state to KILLED. Why not have a killApp endpoint directly, 
accepting an appId? (A usage sketch of the proposed endpoint follows these 
comments.)
{code}
+  @PUT
+  @Path("/apps/{appid}/state")
+  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
+  @Consumes({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
+  public Response updateAppState(AppState targetState,
+  @Context HttpServletRequest hsr, @PathParam("appid") String appId)
+  throws AuthorizationException, YarnException, InterruptedException,
+  IOException {
{code}

4. We should make killApp work in insecure mode as well, as we can do it via 
RPC.

5. In YarnClientImpl, we have implemented the logic to keep sending the kill 
request until we get confirmation that the app is killed. IMHO, since the user 
of the REST API should be a thin client, we may want to implement this logic on 
the server side, blocking the response until we confirm that the app is killed. 
In RPC we have limited the number of concurrent threads; however, on the web 
side we don't have this limitation, right?

6. As to the authentication filter, I think it's not just a problem for 
killApp; the whole RM web interface is unprotected, but we can handle that 
issue separately. Some lessons from implementing security for the timeline 
server:

a) It's better to have separate configs for the RM only, and to load the 
authentication filter for the RM daemon only instead of for all daemons.
b) The RM may also want the Kerberos + DT authentication style.
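
For comment 3, a rough usage sketch of the proposed endpoint; the /ws/v1/cluster 
prefix, the JSON body, and the RM address are assumptions based on the patch 
snippet above, not a released API.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KillAppViaRest {
  public static void main(String[] args) throws Exception {
    String appId = args[0];  // e.g. an application id string
    // Hypothetical RM address and path, assumed from the patch under review.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/" + appId + "/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      // Ask the RM to move the application to the KILLED state.
      out.write("{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP response: " + conn.getResponseCode());
  }
}
{code}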

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014050#comment-14014050
 ] 

Hudson commented on YARN-1868:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5634 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5634/])
YARN-1868. YARN status web ui does not show correctly in IE 11. Contributed by 
Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598686)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/HtmlPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/TestSubViews.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestHtmlPage.java


 YARN status web ui does not show correctly in IE 11
 ---

 Key: YARN-1868
 URL: https://issues.apache.org/jira/browse/YARN-1868
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
  Labels: yxls123123
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1868.1.patch, YARN-1868.2.patch, YARN-1868.patch, 
 YARN_status.png


 The YARN status web ui does not show correctly in IE 11. The drop down menu 
 for app entries are not shown. Also the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014079#comment-14014079
 ] 

Bikas Saha commented on YARN-2091:
--

Why is isAMAware needed? All values in ContainerExitStatus are public, and hence 
user code should already be aware of them.

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2072) RM/NM UIs and webservices are missing vcore information

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014100#comment-14014100
 ] 

Hadoop QA commented on YARN-2072:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647655/YARN-2072.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3867//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3867//console

This message is automatically generated.

 RM/NM UIs and webservices are missing vcore information
 ---

 Key: YARN-2072
 URL: https://issues.apache.org/jira/browse/YARN-2072
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 3.0.0, 2.4.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-2072.patch


 Change RM and NM UIs and webservices to include virtual cores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014106#comment-14014106
 ] 

Tsuyoshi OZAWA commented on YARN-2103:
--

[~decster], thank you for the update! I think some test cases are missing, such 
as calling functions before {{init()}} and calling {{deSerialize()}}. Do you mind 
adding these tests to your patch? They would cover the overall functions in 
SerializedExceptionPBImpl.

{code}
  @Test
  public void testDeserialize() throws Exception {
SerializedExceptionProto defaultProto =
SerializedExceptionProto.newBuilder().build();
Exception ex = new Exception("test exception");
SerializedExceptionPBImpl pb = new SerializedExceptionPBImpl();

try {
  pb.deSerialize();
  Assert.fail("deSerialize should throw YarnRuntimeException");
} catch (YarnRuntimeException e) {
  Assert.assertEquals(ClassNotFoundException.class,
  e.getCause().getClass());
}

pb.init(ex);
Assert.assertEquals(ex.toString(), pb.deSerialize().toString());
  }

  @Test
  public void testBeforeInit() throws Exception {
SerializedExceptionProto defaultProto =
SerializedExceptionProto.newBuilder().build();

SerializedExceptionPBImpl pb1 = new SerializedExceptionPBImpl();
Assert.assertNull(pb1.getCause());

SerializedExceptionPBImpl pb2 = new SerializedExceptionPBImpl();
Assert.assertEquals(defaultProto, pb2.getProto());

SerializedExceptionPBImpl pb3 = new SerializedExceptionPBImpl();
Assert.assertEquals(defaultProto.getTrace(), pb3.getRemoteTrace());
  }
{code}

 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
       .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize the builder rather than the proto.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records do; 
 since this class is used in other records, it may affect their behavior.
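
For bug 2, the usual pattern followed by other PBImpl records looks roughly like 
the sketch below (a sketch only, not taken from the attached patches):

{code}
@Override
public int hashCode() {
  return getProto().hashCode();
}

@Override
public boolean equals(Object other) {
  if (other == null) {
    return false;
  }
  if (other.getClass().isAssignableFrom(this.getClass())) {
    // Two SerializedExceptionPBImpl instances are equal iff their protos are.
    return this.getProto().equals(this.getClass().cast(other).getProto());
  }
  return false;
}
{code}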



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014124#comment-14014124
 ] 

Hadoop QA commented on YARN-1702:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12644316/apache-yarn-1702.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3868//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3868//console

This message is automatically generated.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2115:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-556

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 ContainerRecoveryReport
 -

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new ContainerRecoveryReport to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014245#comment-14014245
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Makes sense. I found some test cases that check the exit code against 
ExitCode.TERMINATED.getExitCode() and thought we needed to preserve those 
semantics. These should be fixed, right? Thanks for the clarification. 

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2115:
--

Attachment: YARN-2115.2.patch

Thanks Vinod for the review.
Addressed the comments accordingly.

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 ContainerRecoveryReport
 -

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new ContainerRecoveryReport to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus

2014-05-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2115:
--

Description: This jira is protocol changes only to replace the 
ContainerStatus sent across via NM register call with a new NMContainerStatus 
to include all the necessary information for container recovery.  (was: This 
jira is protocol changes only to replace the ContainerStatus sent across via NM 
register call with a new ContainerRecoveryReport to include all the necessary 
information for container recovery.)

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 NMContainerStatus
 ---

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new NMContainerStatus to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus

2014-05-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2115:
--

Summary: Replace RegisterNodeManagerRequest#ContainerStatus with a new 
NMContainerStatus  (was: Replace RegisterNodeManagerRequest#ContainerStatus 
with a new ContainerRecoveryReport)

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 NMContainerStatus
 ---

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new ContainerRecoveryReport to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014284#comment-14014284
 ] 

Bikas Saha commented on YARN-2091:
--

We can check all cases of ContainerKillEvent and add new ExitStatus values 
where it makes sense or use some good default value. If a test needs to change 
to account for a new value then we should change the test. There may be other 
cases of exit status being set or tested which are unrelated to container kill 
event. Those can stay out of the scope of this jira.

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014304#comment-14014304
 ] 

Sandy Ryza commented on YARN-1913:
--

Thanks for the updated patch Wei.  For queues, maxAMShare should be defined as 
a fraction of the queue's fair share, not maxShare.  The majority of queues are 
configured with infinite maxResources.  We need to be careful with this, as 
fair shares can change when queues are created dynamically.

I think it might make sense to only allow the queue-level maxAMShare on leaf 
queues for the moment.  I can't think of a strong reason somebody would want to 
set it on a parent queue, and doing this would allow us to avoid the complex 
logic in MaxRunningAppsEnforcer, and merely enforce the AM max share by 
checking in AppSchedulable.assignContainer.  This is also what the Capacity 
Scheduler has at the moment.
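
For illustration, the leaf-queue check described above could look roughly like 
the sketch below; the helper name and the fields it reads (amResourceUsage, 
maxAMShare) are assumptions, not taken from the attached patches.

{code}
// Hypothetical FSLeafQueue helper: only let an application's AM container run
// if the queue's total AM usage stays within maxAMShare of its fair share.
boolean canRunAppAM(Resource amResource) {
  Resource maxAMResource = Resources.multiply(getFairShare(), maxAMShare);
  Resource ifRunAM = Resources.add(amResourceUsage, amResource);
  return ifRunAM.getMemory() <= maxAMResource.getMemory()
      && ifRunAM.getVirtualCores() <= maxAMResource.getVirtualCores();
}
{code}

AppSchedulable.assignContainer would then skip scheduling the AM container when 
this check fails, which is the simpler alternative to threading the limit 
through MaxRunningAppsEnforcer.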


 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014308#comment-14014308
 ] 

Wei Yan commented on YARN-1913:
---

Thanks, Sandy. Will update a patch.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-05-30 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1550:


Attachment: YARN-1550.002.patch

Added tests

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, YARN-1550.patch


 Three steps:
 1、debug at RMAppManager#submitApplication after the code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application: hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014328#comment-14014328
 ] 

Hadoop QA commented on YARN-2115:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647696/YARN-2115.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3869//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3869//console

This message is automatically generated.

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 NMContainerStatus
 ---

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new NMContainerStatus to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014357#comment-14014357
 ] 

Ashwin Shankar commented on YARN-1913:
--

Hey [~sandyr], quick comment 
bq.I think it might make sense to only allow the queue-level maxAMShare on leaf 
queues for the moment. I can't think of a strong reason somebody would want to 
set it on a parent queue
For NestedUserQueue rule, user queues would be created dynamically under a 
parent. For this use case,
maxAMShare at the parent would be useful, since leaf user queues are not 
configured in the alloc xml. 
I see your point that it would complicate the logic in MaxRunningAppsEnforcer, 
but I just wanted to bring this up in case you didn't consider this use case.


 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014370#comment-14014370
 ] 

Hadoop QA commented on YARN-1550:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647711/YARN-1550.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3870//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3870//console

This message is automatically generated.

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, YARN-1550.patch


 Three steps:
 1、debug at RMAppManager#submitApplication after the code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application: hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2116) TestRMAdminCLI#testTransitionToActive and testHelp fail on trunk

2014-05-30 Thread Jian He (JIRA)
Jian He created YARN-2116:
-

 Summary: TestRMAdminCLI#testTransitionToActive and testHelp fail 
on trunk
 Key: YARN-2116
 URL: https://issues.apache.org/jira/browse/YARN-2116
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


Two tests fail as follows
{code}
testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
elapsed: 0.105 sec  <<< ERROR!
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.remove(AbstractList.java:144)
at java.util.AbstractList$Itr.remove(AbstractList.java:360)
at java.util.AbstractCollection.remove(AbstractCollection.java:252)
at 
org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)

testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.091 sec 
 <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)


Results :
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus

2014-05-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014407#comment-14014407
 ] 

Vinod Kumar Vavilapalli commented on YARN-2115:
---

Looks good, +1. Checking this in.

 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 NMContainerStatus
 ---

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new NMContainerStatus to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1368:
--

Attachment: YARN-1368.7.patch

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2117) Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in finally block

2014-05-30 Thread Ted Yu (JIRA)
Ted Yu created YARN-2117:


 Summary: Close of Reader in 
TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in 
finally block
 Key: YARN-2117
 URL: https://issues.apache.org/jira/browse/YARN-2117
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
Reader reader = new FileReader(signatureSecretFile);
int c = reader.read();
while (c > -1) {
  secret.append((char) c);
  c = reader.read();
}
reader.close();
{code}
If IOException is thrown out of reader.read(), reader would be left unclosed.
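
One straightforward way to address this (a sketch, not necessarily the committed 
fix) is to let try-with-resources close the reader even when read() throws:

{code}
StringBuilder secret = new StringBuilder();
try (Reader reader = new FileReader(signatureSecretFile)) {
  int c = reader.read();
  while (c > -1) {
    secret.append((char) c);
    c = reader.read();
  }
}  // the reader is closed here even if read() throws an IOException
{code}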



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus

2014-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014419#comment-14014419
 ] 

Hudson commented on YARN-2115:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5639 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5639/])
YARN-2115. Replaced RegisterNodeManagerRequest's ContainerStatus with a new 
NMContainerStatus which has more information that is needed for work-preserving 
RM-restart. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598790)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NMContainerStatus.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Replace RegisterNodeManagerRequest#ContainerStatus with a new 
 NMContainerStatus
 ---

 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2115.1.patch, YARN-2115.2.patch


 This jira is protocol changes only to replace the ContainerStatus sent across 
 via NM register call with a new NMContainerStatus to include all the 
 necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Ted Yu (JIRA)
Ted Yu created YARN-2118:


 Summary: Type mismatch in contains() check of 
TimelineWebServices#injectOwnerInfo()
 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
if (timelineEntity.getPrimaryFilters() != null &&
timelineEntity.getPrimaryFilters().containsKey(
TimelineStore.SystemFilter.ENTITY_OWNER)) {
  throw new YarnException(
{code}
getPrimaryFilters() returns a Map keyed by String.
However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
Their types don't match.
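
A possible fix (a sketch; whether the store really keys this filter by the enum's 
string form is an assumption) is to look the entry up under a String key:

{code}
if (timelineEntity.getPrimaryFilters() != null
    && timelineEntity.getPrimaryFilters().containsKey(
        TimelineStore.SystemFilter.ENTITY_OWNER.toString())) {
  // Placeholder message for illustration only.
  throw new YarnException("The entity owner filter is reserved");
}
{code}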



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014422#comment-14014422
 ] 

Wei Yan commented on YARN-1913:
---

Updated the patch to address Sandy's comments.
[~ashwinshankar77], if the leaf queue is not configured, the default AM 
resource limit is (leaf_queue_fair_share * 1.0f), i.e. it is still bounded by 
the queue's fair share.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014426#comment-14014426
 ] 

Jian He commented on YARN-1368:
---

bq. Kill container? Same for the following too?
Good point, fixed.
bq. Instead we should use getCurrentAttemptForContainer(ContainerId 
containerId)?
I think the RMContainer should be created with the original attempt Id. The 
containerId-to-attemptId routing will happen automatically.
bq. ContainerRecoveredTransition: Missing other transitions that a regular 
container goes through?
I checked the code; we only need to send an event to update the ranNodes, which 
is added here. Eventually, YARN-1885 should fix the ranNodes handling on 
recovery.
bq. Kill the container when the following happens?
I added a comment saying this condition can never happen.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.
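
For illustration, a hedged sketch of a node-added scheduler event that also carries the recovered container reports; the class and method names here are assumptions for this discussion, not the exact code in the patch.

{code}
import java.util.List;
import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;

// Hypothetical sketch: a node-added event that carries the containers the NM
// reported at registration, so the scheduler can rebuild its allocations.
public class NodeAddedEventSketch {
  private final List<NMContainerStatus> containerReports;

  public NodeAddedEventSketch(List<NMContainerStatus> containerReports) {
    this.containerReports = containerReports;
  }

  public List<NMContainerStatus> getContainerReports() {
    return containerReports;
  }
}
{code}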



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-05-30 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-2080:
---

Attachment: YARN-2080.patch

Attaching a patch that wires the reservation APIs into the existing YARN APIs.

It introduces a new component, *ReservationSystem*, that essentially manages all 
the _Plans_ (YARN-1709) configured in the ResourceSchedulers. The 
ReservationSystem is bootstrapped by the ResourceManager if it is enabled in 
the configuration.

The ClientRMService provides an implementation of the reservation APIs, which 
are additionally exposed via the YarnClient.
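
For context, a hypothetical sketch of how a client might exercise such a reservation API through the YarnClient; the record and method names below are assumptions (roughly what later Hadoop releases exposed), not necessarily the exact API in this patch.

{code}
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.ReservationSubmissionRequest;
import org.apache.hadoop.yarn.api.records.ReservationDefinition;
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
import org.apache.hadoop.yarn.api.records.ReservationRequests;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Hypothetical usage sketch only: reserve 10 x 1GB/1-vcore containers for five
// minutes, to be placed anywhere within the next hour, on the "default" queue.
public class ReservationClientSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      ReservationRequest ask = ReservationRequest.newInstance(
          Resource.newInstance(1024, 1), 10, 1, 5 * 60 * 1000L);
      ReservationRequests asks = ReservationRequests.newInstance(
          Collections.singletonList(ask), ReservationRequestInterpreter.R_ALL);
      long now = System.currentTimeMillis();
      ReservationDefinition definition = ReservationDefinition.newInstance(
          now, now + 60 * 60 * 1000L, asks, "reservation-sketch");
      ReservationSubmissionRequest request =
          ReservationSubmissionRequest.newInstance(definition, "default");
      client.submitReservation(request);  // assumed YarnClient entry point
    } finally {
      client.stop();
    }
  }
}
{code}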


 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014439#comment-14014439
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], what I mean is: after RM restart and recovery, will 
RMAppAttempt.getMasterContainer return the correct master container or not?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014442#comment-14014442
 ] 

Jian He commented on YARN-1368:
---

bq.  RMAppAttempt.getMasterContainer will return correct master container or 
not?
Yes, RMAppAttemptImpl.recover does that.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014445#comment-14014445
 ] 

Hadoop QA commented on YARN-1368:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647729/YARN-1368.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3871//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3871//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014447#comment-14014447
 ] 

Hadoop QA commented on YARN-1913:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647733/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3872//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3872//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014481#comment-14014481
 ] 

Zhijie Shen commented on YARN-2118:
---

Ted, good catch! Do you want to pick up this issue?

 Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()
 --

 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu

 {code}
 if (timelineEntity.getPrimaryFilters() != null
     && timelineEntity.getPrimaryFilters().containsKey(
         TimelineStore.SystemFilter.ENTITY_OWNER)) {
   throw new YarnException(
 {code}
 getPrimaryFilters() returns a Map keyed by String.
 However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
 Their types don't match.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2118:
--

Priority: Major  (was: Minor)

 Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()
 --

 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu

 {code}
 if (timelineEntity.getPrimaryFilters() != null
     && timelineEntity.getPrimaryFilters().containsKey(
         TimelineStore.SystemFilter.ENTITY_OWNER)) {
   throw new YarnException(
 {code}
 getPrimaryFilters() returns a Map keyed by String.
 However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
 Their types don't match.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned YARN-2118:


Assignee: Ted Yu

 Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()
 --

 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-2118-v1.txt


 {code}
 if (timelineEntity.getPrimaryFilters() != null
     && timelineEntity.getPrimaryFilters().containsKey(
         TimelineStore.SystemFilter.ENTITY_OWNER)) {
   throw new YarnException(
 {code}
 getPrimaryFilters() returns a Map keyed by String.
 However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
 Their types don't match.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2118:
-

Attachment: yarn-2118-v1.txt

 Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()
 --

 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
 Attachments: yarn-2118-v1.txt


 {code}
 if (timelineEntity.getPrimaryFilters() != null
     && timelineEntity.getPrimaryFilters().containsKey(
         TimelineStore.SystemFilter.ENTITY_OWNER)) {
   throw new YarnException(
 {code}
 getPrimaryFilters() returns a Map keyed by String.
 However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
 Their types don't match.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2118) Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014494#comment-14014494
 ] 

Hadoop QA commented on YARN-2118:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647748/yarn-2118-v1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3873//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3873//console

This message is automatically generated.

 Type mismatch in contains() check of TimelineWebServices#injectOwnerInfo()
 --

 Key: YARN-2118
 URL: https://issues.apache.org/jira/browse/YARN-2118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-2118-v1.txt


 {code}
 if (timelineEntity.getPrimaryFilters() != null 
 timelineEntity.getPrimaryFilters().containsKey(
 TimelineStore.SystemFilter.ENTITY_OWNER)) {
   throw new YarnException(
 {code}
 getPrimaryFilters() returns a Map keyed by String.
 However, TimelineStore.SystemFilter.ENTITY_OWNER is an enum.
 Their types don't match.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014529#comment-14014529
 ] 

Jian He commented on YARN-1367:
---

Thanks for working on the patch.
The patch needs an update; could you please update it?
A few initial comments:
- Let's leave the containerId handling to YARN-2052 separately.
- The extra ContainerReport in RegisterNodeManagerRequest is not needed any 
more.
- The NM side may not need a config for whether work-preserving restart is 
enabled. Given the RM already has this config, the RM should be able to 
instruct the NM to keep_containers_on_resync in the work-preserving case and 
kill_containers_on_resync in the non-work-preserving case. This also avoids 
config overhead on each NM; a rough sketch of this idea follows.
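
A rough sketch of that idea, with purely hypothetical names: the RM derives the instruction from its own work-preserving-restart setting and ships it to the NM in the resync/registration response.

{code}
// Hypothetical sketch: the RM tells the NM what to do with running containers
// on resync, based solely on the RM-side work-preserving-restart setting, so
// the NM needs no config of its own.
public class ResyncPolicySketch {

  enum ContainerAction { KEEP_CONTAINERS_ON_RESYNC, KILL_CONTAINERS_ON_RESYNC }

  // RM side: decide once from the RM's configuration.
  static ContainerAction actionForNodes(boolean rmWorkPreservingRestartEnabled) {
    return rmWorkPreservingRestartEnabled
        ? ContainerAction.KEEP_CONTAINERS_ON_RESYNC
        : ContainerAction.KILL_CONTAINERS_ON_RESYNC;
  }

  // NM side: obey the instruction received in the resync response.
  static void onResync(ContainerAction action) {
    if (action == ContainerAction.KILL_CONTAINERS_ON_RESYNC) {
      // kill and clean up all running containers, then re-register
    } else {
      // keep containers running and report them on re-registration
    }
  }
}
{code}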

 After restart NM should resync with the RM without killing containers
 -

 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1367.prototype.patch


 After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
  Upon receiving the resync response, the NM kills all containers and 
 re-registers with the RM. The NM should be changed to not kill the containers 
 and instead inform the RM about all currently running containers, including 
 their allocations etc. After the re-register, the NM should send all pending 
 container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)