[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805076#comment-13805076 ] Hadoop QA commented on YARN-1172: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610275/YARN-1172.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 21 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2279//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2279//console This message is automatically generated. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch, YARN-1172.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: patch Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
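For context on the runtime exception described above: java.util.Formatter only accepts integral arguments for %d, so formatting a Resource-like object with %d throws IllegalFormatConversionException, while %s falls back to toString(). A minimal sketch, with FormatBugDemo and FakeResource as hypothetical stand-ins (FakeResource is not YARN's Resource class):
{code}
import java.util.IllegalFormatConversionException;

// Minimal illustration of the bug: %d requires an integral argument, so
// formatting an arbitrary object (like YARN's Resource) fails at runtime.
public class FormatBugDemo {
  // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.Resource.
  static class FakeResource {
    private final int memory, vcores;
    FakeResource(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }
    @Override
    public String toString() { return "<memory:" + memory + ", vCores:" + vcores + ">"; }
  }

  public static void main(String[] args) {
    FakeResource max = new FakeResource(1024, 1);
    FakeResource min = new FakeResource(2048, 2);
    try {
      // What the original warning did: %d with Resource arguments.
      System.out.println(String.format(
          "Queue %s has max resources %d less than min resources %d", "root.default", max, min));
    } catch (IllegalFormatConversionException e) {
      System.out.println("Original format string fails: " + e);
    }
    // The fix: %s uses toString(), so both memory and vcores show up in the warning.
    System.out.println(String.format(
        "Queue %s has max resources %s less than min resources %s", "root.default", max, min));
  }
}
{code}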
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805103#comment-13805103 ] Hadoop QA commented on YARN-1351: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610284/patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2280//console This message is automatically generated. Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805105#comment-13805105 ] Bikas Saha commented on YARN-1307: -- After looking at the code I understand what you mean by the suffix numbers. It's the _1, _2 in the tree above for token and key. I think it's fine to use the current approach that uses the sequence number for tokens and the key id for secret keys. Or we can name them serially as 1, 2, 3, etc., as you describe above. Either is fine. We will be batching them into znodes in the near future anyway. Above looks good. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805111#comment-13805111 ] Tsuyoshi OZAWA commented on YARN-1307: -- Yes, you're right. Thanks for your feedback! Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805114#comment-13805114 ] Tsuyoshi OZAWA commented on YARN-1172: -- Thank you for the review, Karthik. Yes, the approach I suggested is more complicated than converting SecretManager to extend AbstractService. Let's discuss it on HADOOP-10043. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch, YARN-1172.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
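For background on the AbstractService conversion mentioned above: Hadoop's org.apache.hadoop.service.AbstractService exposes serviceInit/serviceStart/serviceStop lifecycle hooks, and because a SecretManager subclass cannot also extend AbstractService, one possible shape is a thin service wrapper that drives the secret manager from those hooks. A minimal sketch, assuming a made-up DummySecretManager; this is not the YARN-1172 patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Hypothetical wrapper illustrating the AbstractService lifecycle hooks.
// DummySecretManager is a made-up stand-in, not a YARN class.
public class SecretManagerService extends AbstractService {

  static class DummySecretManager {
    void start() { /* e.g. generate the first master key, start a roll timer */ }
    void stop()  { /* e.g. cancel timers, clear cached keys */ }
  }

  private final DummySecretManager secretManager = new DummySecretManager();

  public SecretManagerService() {
    super("SecretManagerService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Read key-roll intervals etc. from conf here.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    secretManager.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    secretManager.stop();
    super.serviceStop();
  }
}
{code}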
[jira] [Updated] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1333: - Attachment: YARN-1333.2.patch * Fixed the method name from testBlackListNodes to testBlacklistNodes. * Fixed the test to use the resource manager created in the setup method. * Fixed the test to create AppAddedSchedulerEvent from createSchedulingRequest()'s return value. * Fixed indentation. * Deleted the needless Assert. prefix because of the static import. * Changed the code to call scheduler.applications.get(). * Added a test to verify that a container does not actually get placed on the blacklisted host. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805198#comment-13805198 ] Hadoop QA commented on YARN-1333: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610296/YARN-1333.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2281//console This message is automatically generated. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805221#comment-13805221 ] Hudson commented on YARN-1335: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #373 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/373/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1340) MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space
[ https://issues.apache.org/jira/browse/YARN-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Bergland updated YARN-1340: --- Description: We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. was: We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to create a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space Key: YARN-1340 URL: https://issues.apache.org/jira/browse/YARN-1340 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.6-alpha Environment: Mac OS X 10.8.5, CentOS 6.3 Reporter: Per Bergland We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1340) MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space
[ https://issues.apache.org/jira/browse/YARN-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805289#comment-13805289 ] Per Bergland commented on YARN-1340: prepareDirs should do: new File(path).toURI().toURL() on the paths if the NM_LOCAL_DIR is supposed to be URIs MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space Key: YARN-1340 URL: https://issues.apache.org/jira/browse/YARN-1340 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.6-alpha Environment: Mac OS X 10.8.5, CentOS 6.3 Reporter: Per Bergland We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. -- This message was sent by Atlassian JIRA (v6.1#6144)
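To illustrate the escaping behavior behind the suggestion above: parsing a raw local path that contains a space as a java.net.URI fails, while File.toURI().toURL() produces a properly escaped file: URL. A minimal sketch with a hypothetical path; the failing parse in LocalDirsHandlerService is analogous in spirit:
{code}
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class LocalDirUrlDemo {
  public static void main(String[] args) throws Exception {
    String rawPath = "/tmp/jenkins job/nm-local-dir";   // hypothetical path containing a space

    try {
      new URI(rawPath);                                 // treating the dir string as a URI as-is
    } catch (URISyntaxException e) {
      System.out.println("Raw path is not a valid URI: " + e.getMessage());
    }

    // The suggested fix: let File do the escaping before handing the value to the config.
    System.out.println(new File(rawPath).toURI().toURL());
    // Prints something like file:/tmp/jenkins%20job/nm-local-dir
  }
}
{code}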
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805292#comment-13805292 ] Hudson commented on YARN-1335: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1563 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1563/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805315#comment-13805315 ] Hudson commented on YARN-1335: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1589 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1589/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805465#comment-13805465 ] Sandy Ryza commented on YARN-1333: -- Thanks Tsuyoshi. Just a couple more things: {code} +resourceManager.getRMContext().getAMFinishingMonitor(); {code} Is this line necessary? {code} +ApplicationAttemptId appAttemptId = createSchedulingRequest(1024, root.default, user, 1); {code} This looks like more than 80 characters {code} +if (SchedulerAppUtils.isBlacklisted(application, node, LOG)) { + return null; +} {code} Can this be moved to the equivalent of where it is in the capacity scheduler, i.e. FSLeafQueue? Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805491#comment-13805491 ] Arun C Murthy commented on YARN-1042: - [~djp] Do you mind if I take this over? I can do this concurrently with YARN-796 (for which I already have a patch). Tx! add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Junping Du Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805494#comment-13805494 ] Zhijie Shen commented on YARN-674: -- I've had a quick look at the patch. Here are my comments: 1. It seems that the change in RMAppManager is not necessary, because the current logic is to reject the app in the secure case when parsing the credentials or adding the app to DelegationTokenRenewer goes wrong; otherwise, the app will be accepted. Though there's no obvious if... else... structure, it achieves the same logic control via: {code} throw RPCUtil.getRemoteException(ie); {code} I think the exception needs to be thrown, which is missing in your patch. The exception will notify the client that the app submission failed; otherwise, the client will think the submission succeeded. If I'm missing some ideas here, please let me know. 2. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact be if the application is already accepted and starts its life cycle while DelegationTokenRenewer is slow to process the DelegationTokenRenewerAppSubmitEvent? Will the application fail somewhere else because the fresh token is unavailable? 3. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission? Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1349: Attachment: YARN-1349.1.patch Attaching patch to fix yarn.cmd. I also needed to add a special case for the classpath sub-command so that it wouldn't try to dispatch to java. This is identical to how it's handled in hadoop.cmd. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1352) Recover LogAggregationService upon nodemanager restart
Jason Lowe created YARN-1352: Summary: Recover LogAggregationService upon nodemanager restart Key: YARN-1352 URL: https://issues.apache.org/jira/browse/YARN-1352 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe LogAggregationService state needs to be recovered as part of the work-preserving nodemanager restart feature. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: (was: patch) Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805512#comment-13805512 ] Steve Loughran commented on YARN-941: - Ignoring HDFS, updated Yarn RM tokens (and the RM-assigned AM RPC token) could be passed to the AM by killing the container and creating a new one, once YARN-1041 handles AM restart better. This may seem brutal, but it stops your code getting complacent about not having to handle AM failure -and it means the current token retrieval process is all that is needed RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: fixprnt.patch Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1353) Containers not being killed on Linux after application is killed
Bikas Saha created YARN-1353: Summary: Containers not being killed on Linux after application is killed Key: YARN-1353 URL: https://issues.apache.org/jira/browse/YARN-1353 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Running application on a Linux cluster where setsid is available. After killing the application via yarn application -kill we see that containers for that application are still hanging around for up to 30 mins after the application kill. The NM log says that the container was killed with code 143 but it seems that only the shell launcher is killed. uname -a output Linux ZZZ.com 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805515#comment-13805515 ] Hadoop QA commented on YARN-1349: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610350/YARN-1349.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2283//console This message is automatically generated. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805518#comment-13805518 ] Chuan Liu commented on YARN-1349: - Are 'proxyserver' and 'node' missing from 'yarncommands'? Otherwise +1 from me. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1353) Containers not being killed on Linux after application is killed
[ https://issues.apache.org/jira/browse/YARN-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1353: - Attachment: container_1382388401549_1060_01_000227.log Logs for the container attached. Containers not being killed on Linux after application is killed Key: YARN-1353 URL: https://issues.apache.org/jira/browse/YARN-1353 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Attachments: container_1382388401549_1060_01_000227.log Running application on a Linux cluster where setsid is available. After killing the application via yarn application -kill we see that containers for that application are still hanging around for up to 30 mins after the application kill. The NM log says that the container was killed with code 143 but it seems that only the shell launcher is killed. uname -a output Linux ZZZ.com 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805531#comment-13805531 ] Junping Du commented on YARN-1042: -- Sure. Arun, thanks for working on this. Please go ahead! add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Junping Du Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805522#comment-13805522 ] Bikas Saha commented on YARN-311: - Can we please double check and assure ourselves that this is deadlock free. {code} +// Update resource if any change +synchronized(nm) { + SchedulerUtils.updateResourceIfChanged(node, nm, clusterResource, LOG); +} {code} Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1354) Recover applications upon nodemanager restart
Jason Lowe created YARN-1354: Summary: Recover applications upon nodemanager restart Key: YARN-1354 URL: https://issues.apache.org/jira/browse/YARN-1354 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The set of active applications in the nodemanager context need to be recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1355) Recover application ACLs upon nodemanager restart
Jason Lowe created YARN-1355: Summary: Recover application ACLs upon nodemanager restart Key: YARN-1355 URL: https://issues.apache.org/jira/browse/YARN-1355 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The ACLs for applications need to be recovered for work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-1042: Assignee: Arun C Murthy (was: Junping Du) add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805535#comment-13805535 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610181/YARN-674.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2282//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2282//console This message is automatically generated. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1355) Recover application ACLs upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805537#comment-13805537 ] Jason Lowe commented on YARN-1355: -- One idea is to persist the ACLs for an application underneath the application directory in the log directory tree. That has the benefit of automatically removing the persisted ACL data when an application's logs are removed (and thus ACLs are no longer needed). Restoring of application ACLs potentially could be lazily performed as well if it isn't cached in memory. Recover application ACLs upon nodemanager restart - Key: YARN-1355 URL: https://issues.apache.org/jira/browse/YARN-1355 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The ACLs for applications need to be recovered for work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805545#comment-13805545 ] Hadoop QA commented on YARN-891: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610216/YARN-891.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2285//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when application/attempt completes. This solves some problems like finished application get lost after RM restart and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805549#comment-13805549 ] Chris Nauroth commented on YARN-1349: - {quote} -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {quote} All changes are in scripts, so there are no new tests. I manually tested this patch in a running Windows cluster by running all yarn.cmd sub-commands and one additional direct class to cover the passthrough case. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1349: Attachment: YARN-1349.2.patch node was already in there, but proxyserver was missing. Thanks for catching that. Here is a new patch. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805552#comment-13805552 ] Chuan Liu commented on YARN-1349: - +1 yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1380#comment-1380 ] Junping Du commented on YARN-311: - Thanks for the comments, Bikas! The synchronization here is to make sure the read of the nm (rmNode) resource is thread-safe while another thread does a write (nm.setTotalCapacity()) triggered from AdminService (an implementation of RMAdminProtocol). Given that SchedulerUtils.updateResourceIfChanged() itself is lock-free and nm.setTotalCapacity() is also lock-free, execution passes through quickly once the nm synchronization lock is acquired, so it is deadlock-free. Does that make sense? Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
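A minimal sketch of the locking pattern being discussed, with hypothetical class and field names (not the YARN-311 patch): reader and writer synchronize on the same node object, and neither critical section acquires any other lock, so there is no lock-ordering cycle and therefore no deadlock:
{code}
// Hypothetical sketch: both reader and writer synchronize on the same node
// object, and each critical section is short and takes no nested locks.
public class DummyNode {
  private int totalMemoryMB = 8192;   // stands in for the RMNode's total capability

  // Admin thread (e.g. handling an RMAdminProtocol call): the write side.
  public synchronized void setTotalCapability(int memoryMB) {
    this.totalMemoryMB = memoryMB;
  }

  // Scheduler thread, on a node heartbeat: the read side.
  public int readTotalCapability() {
    synchronized (this) {            // same monitor as the writer
      return totalMemoryMB;          // lock-free body: no nested locks, so no deadlock
    }
  }
}
{code}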
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805557#comment-13805557 ] Hadoop QA commented on YARN-1349: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610357/YARN-1349.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2286//console This message is automatically generated. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the pre-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805560#comment-13805560 ] Hadoop QA commented on YARN-1351: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610355/fixprnt.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2284//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2284//console This message is automatically generated. Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. 
Following the patch that fixes the issue:

Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
===
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy)
@@ -471,7 +471,7 @@
     if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName)
         && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) {
-      LOG.warn(String.format("Queue %s has max resources %d less than min resources %d",
+      LOG.warn(String.format("Queue %s has max resources %s less than min resources %s",
           queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName)));
     }
   }

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805565#comment-13805565 ] Chris Nauroth commented on YARN-1349: - bq. -1 javac. The patch appears to cause the build to fail. It looks like the Jenkins box is overloaded. It can't fork a new thread. This patch only contains cmd script changes, so there is no way it can make javac fail.
{code}
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:640)
	at java.lang.ref.Reference.<clinit>(Reference.java:145)
{code}
yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the pre-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1318: - Priority: Critical (was: Major) Promote AdminService to an Always-On service Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805581#comment-13805581 ] Bikas Saha commented on YARN-1068: -- [~vinodkv] Does the new patch address your concerns? I have marked YARN-1318 as a blocker for YARN-149. We must fix that before failover is available. Karthik, in your final patch can you please include clear comments pointing to YARN-1318 near the @Private annotations for RMHAProtocolService. Thanks! Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-14.patch Updated patch to add a comment on why RMHAProtocolService is Private-Unstable and a pointer to YARN-1318. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1356) Typo in MergeManagerImpl.java
Efe Gencer created YARN-1356: Summary: Typo in MergeManagerImpl.java Key: YARN-1356 URL: https://issues.apache.org/jira/browse/YARN-1356 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: all Reporter: Efe Gencer Priority: Trivial There's a typo (Invlaid which should be Invalid) in line 199 of MergeManagerImpl.java. Currently:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invlaid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

should be:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invalid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805598#comment-13805598 ] Karthik Kambatla commented on YARN-1318: To move forward on this, I propose RMContext should be more along the lines of a Builder: a default constructor (no arguments) and set* methods to set the internal fields. [~vinodkv], does this sound reasonable? If yes, would it make sense to open a new JIRA for that? Promote AdminService to an Always-On service Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
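A rough sketch of the builder-like shape proposed above, with invented field and method names purely for illustration (this is not the actual RMContext API): a no-argument constructor plus set* methods, so always-on services such as AdminService can be wired into the context incrementally as they are created.
{code}
// Hypothetical simplification of RMContext as a mutable holder populated via setters
// instead of a long constructor argument list.
class RMContextSketch {
  private Object dispatcher;
  private Object adminService;
  private Object stateStore;

  RMContextSketch() { }                        // default constructor, no arguments

  RMContextSketch setDispatcher(Object d)   { this.dispatcher = d;   return this; }
  RMContextSketch setAdminService(Object a) { this.adminService = a; return this; }
  RMContextSketch setStateStore(Object s)   { this.stateStore = s;   return this; }

  Object getDispatcher()   { return dispatcher; }
  Object getAdminService() { return adminService; }
  Object getStateStore()   { return stateStore; }
}

// Usage: services that come up early (e.g. an always-on AdminService) can be registered
// as soon as they exist, without re-creating the context.
class Wiring {
  static RMContextSketch build() {
    return new RMContextSketch()
        .setDispatcher(new Object())
        .setAdminService(new Object())
        .setStateStore(new Object());
  }
}
{code}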
[jira] [Updated] (YARN-1356) Typo in MergeManagerImpl.java
[ https://issues.apache.org/jira/browse/YARN-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Efe Gencer updated YARN-1356: - Attachment: MergeManagerImpl.java Typo fixed in attached file Typo in MergeManagerImpl.java - Key: YARN-1356 URL: https://issues.apache.org/jira/browse/YARN-1356 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: all Reporter: Efe Gencer Priority: Trivial Attachments: MergeManagerImpl.java Original Estimate: 1m Remaining Estimate: 1m There's a typo (Invlaid which should be Invalid) in line 199 of MergeManagerImpl.java. Currently:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invlaid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

should be:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invalid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805599#comment-13805599 ] Sandy Ryza commented on YARN-1351: -- +1 Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue:

Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
===
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy)
@@ -471,7 +471,7 @@
     if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName)
         && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) {
-      LOG.warn(String.format("Queue %s has max resources %d less than min resources %d",
+      LOG.warn(String.format("Queue %s has max resources %s less than min resources %s",
           queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName)));
     }
   }

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805618#comment-13805618 ] Robert Joseph Evans commented on YARN-941: -- That sounds like a great default. I would like to also have a way for an AM to say I can handle updating tokens without being shot, but that may be something that shows up in a follow on JIRA. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.1#6144)
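The flow described in this issue (the client fetches fresh tokens via Kerberos, hands them to the AM, and the AM pushes them to the RM) could be captured by something along the lines of the hypothetical sketch below; this is not an existing YARN API, only an illustration of the proposed call direction.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

// Purely hypothetical protocol sketch: the AM, not the client, tells the RM about
// replacement tokens so the RM can renew them and use them for any future AM launch.
interface ApplicationTokenUpdateProtocol {
  // tokens: serialized Credentials containing the refreshed delegation tokens.
  void updateApplicationTokens(String applicationId, ByteBuffer tokens) throws IOException;
}
{code}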
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805631#comment-13805631 ] Vinod Kumar Vavilapalli commented on YARN-956: -- Seems like you missed the NPE issue with getAMContainer(). Please check all the methods once again. Tx. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805636#comment-13805636 ] Hadoop QA commented on YARN-1068: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610366/yarn-1068-14.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2287//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2287//console This message is automatically generated. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805639#comment-13805639 ] Hadoop QA commented on YARN-956: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610204/YARN-956.8.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2288//console This message is automatically generated. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1333: - Attachment: YARN-1333.3.patch Thanks for the review again, Sandy! Updated the patch. * Fixed the compilation. * Removed a needless line from a test. * Checked indentation. * Moved SchedulerAppUtils.isBlacklisted() to FSLeafQueue. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
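For context on what the blacklisting change amounts to, the core guard is simply: skip assigning containers to an application on a node (or rack) that the application has blacklisted. A name-hedged, simplified version of that check (the real patch routes it through SchedulerAppUtils and FSLeafQueue):
{code}
import java.util.Set;

// Illustrative only; not the actual scheduler types.
class BlacklistGuard {
  // Returns true when the app must not be assigned containers on this node.
  static boolean isBlacklisted(Set<String> appBlacklist, String nodeName, String rackName) {
    return appBlacklist.contains(nodeName) || appBlacklist.contains(rackName);
  }
}
{code}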
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805649#comment-13805649 ] Sandy Ryza commented on YARN-1333: -- +1 pending jenkins Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-956: - Attachment: YARN-956.9.patch Fixed the NPE for getAMContainer(). Reviewed the Memory implementation again, and simplified the NPE check in applicationAttemptFinish and containerFinish, because getSubMap always returns a non-null value. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
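A small sketch of the two points in that comment, using invented map and type names rather than the actual history-storage classes: guard getAMContainer() so a missing attempt or missing AM container yields null instead of a NullPointerException, and drop the extra null check after subMap() because TreeMap.subMap() returns an empty view, never null.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class InMemoryHistorySketch {
  static class Attempt { String amContainerId; }      // minimal stand-in type

  private final Map<String, Attempt> attempts = new HashMap<>();
  private final TreeMap<Long, String> containersByStartTime = new TreeMap<>();

  // Guarded lookup: a missing attempt or missing AM container yields null
  // instead of throwing NullPointerException.
  String getAMContainer(String attemptId) {
    Attempt attempt = attempts.get(attemptId);
    return (attempt == null) ? null : attempt.amContainerId;
  }

  // TreeMap.subMap() returns an empty view rather than null, so iterating or
  // counting it directly is safe; no separate null check is needed.
  int countContainersStartedBetween(long from, long to) {
    return containersByStartTime.subMap(from, to).size();
  }
}
{code}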
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805669#comment-13805669 ] Hadoop QA commented on YARN-956: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610382/YARN-956.9.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2290//console This message is automatically generated. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805672#comment-13805672 ] Hadoop QA commented on YARN-1333: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610381/YARN-1333.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2289//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2289//console This message is automatically generated. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jackie Chang updated YARN-1321: --- Summary: NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly (was: NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Attachments: YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805697#comment-13805697 ] Omkar Vinit Joshi commented on YARN-674: Thanks [~zjshen] for reviewing my patch. bq. I think the exception needs to be thrown, which is missing in your patch. The exception will notify the client that the app submission fails; otherwise, the client will think the submission succeeds? Yes, I have removed the error purposefully. Here are the thoughts: * Once the client submits the application, it should check the app status and will come to know about the failing app from it. ** Either when parsing credentials fails. ** OR when initial token renewal fails. bq. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will be the impact when the application is already accepted and starts its life cycle while DelegationTokenRenewer is slow to handle the DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else because the fresh token is unavailable? The logic here is modified a bit. The app is submitted to the scheduler only if token renewal succeeds, not before that. Today too it is the same case; the only problem is that we are holding the client request while doing this. With the change this will become async. bq. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission? No. Now there is no problem w.r.t. concurrent app submission as we are anyway funneling it through the event handler. This test is no longer required, so I removed it completely. * Fixing findbugs warnings... * Fixing the failed test case... Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
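A compressed, name-hedged sketch of the asynchronous flow described above (the real code uses DelegationTokenRenewer events and the RM dispatcher, not a raw queue): the RPC handler only enqueues the submission, a renewer thread performs the renewal, and the app reaches the scheduler only if renewal succeeded; otherwise it is marked failed and the client learns the outcome from the app status.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncRenewalSketch {
  static class AppSubmitEvent { String appId; /* tokens elided */ }

  private final BlockingQueue<AppSubmitEvent> queue = new LinkedBlockingQueue<>();

  // RPC handler path: cheap, never blocks on the NameNode.
  void submitApplication(AppSubmitEvent event) {
    queue.add(event);                 // return to the client immediately
  }

  // Renewer thread path: a slow or failing renewal only affects this one app.
  void renewLoop() throws InterruptedException {
    while (true) {
      AppSubmitEvent event = queue.take();
      try {
        renewTokens(event);           // may be slow or throw
        forwardToScheduler(event);    // app enters the scheduler only on success
      } catch (Exception e) {
        markAppFailed(event, e);      // client sees the failure via app status
      }
    }
  }

  private void renewTokens(AppSubmitEvent e) throws Exception { /* renew against the NN */ }
  private void forwardToScheduler(AppSubmitEvent e) { }
  private void markAppFailed(AppSubmitEvent e, Exception cause) { }
}
{code}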
[jira] [Updated] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated YARN-1350: - Attachment: NodeState.txt I have attached detailed information. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-674: --- Attachment: YARN-674.2.patch Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805757#comment-13805757 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610392/YARN-674.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2291//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2291//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2291//console This message is automatically generated. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-415: - Attachment: YARN-415--n9.patch Updated the patch moving tracking logic into Scheduler: - AppSchedulingInfo tracks resource usage. Existing methods are reused and overall it seems more like the right place to have this logic. - When the app is finished and the Scheduler evicts it from its cache, it sends a new type of event (RMAppAttemptAppFinishedEvent) to the attempt, attaching usage stats to the event. - RMAppAttemptImpl test is modified accordingly - a new test is added to verify resource tracking in AppSchedulingInfo Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
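The metric being tracked is just the sum spelled out in the issue description (reserved resources of each container multiplied by its lifetime). A minimal accumulator in the spirit of the AppSchedulingInfo change, with invented names and units (MB-seconds and vcore-seconds); not the patch itself:
{code}
class AppResourceUsageSketch {
  private long memoryMbSeconds;   // sum over finished containers
  private long vcoreSeconds;

  // Called when a container completes: reserved resources * container lifetime.
  void containerFinished(int reservedMb, int reservedVcores,
                         long allocatedAtMillis, long finishedAtMillis) {
    long lifetimeSeconds = (finishedAtMillis - allocatedAtMillis) / 1000;
    memoryMbSeconds += (long) reservedMb * lifetimeSeconds;
    vcoreSeconds   += (long) reservedVcores * lifetimeSeconds;
  }

  long getMemoryMbSeconds() { return memoryMbSeconds; }
  long getVcoreSeconds()    { return vcoreSeconds; }
}
{code}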
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805804#comment-13805804 ] Andrey Klochkov commented on YARN-415: -- This scheme has a downside that the stats would be incorrect between 2 events: 1) Scheduler evicting the app from the cache and sending an event and 2) RMAppAttemptImpl receiving the event and updating its internal stats. The only idea I have is to add an additional roundtrip extending this scheme to: 1. When the app is finished, the Scheduler sends an RMAppAttemptAppFinishedEvent instance and does not evict the app from the cache yet 2. RMAppAttemptImpl receives the event, updates its internal fields finalMemorySeconds and finalVcoreSeconds and sends a new type of event to the Scheduler allowing it to evict the app. 3. Scheduler gets the event and evicts the app. Thoughts? Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1307: - Attachment: YARN-1307.3.patch This is a first patch for review. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch Rethinking the znode structure for RM HA has been proposed in some JIRAs (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805817#comment-13805817 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610399/YARN-415--n9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2292//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2292//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2292//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805825#comment-13805825 ] Omkar Vinit Joshi commented on YARN-1350: - [~sinchii] I have a basic question: why is your nodeId changing every time? Have you configured your nodemanager with an ephemeral port (0)? What is NM_ADDRESS? The RM will consider this as the same node only when your newly restarted node manager reports with the same node id, i.e. host-name:port. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-1252: --- Assignee: Omkar Vinit Joshi Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805827#comment-13805827 ] Omkar Vinit Joshi commented on YARN-1252: - taking it over.. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805828#comment-13805828 ] Omkar Vinit Joshi commented on YARN-1252: - YARN-674 should solve this problem. Now that token renewal is asynchronous in nature, if the token is unknown or the external system (token renewing system) is down, then the application for which this token was submitted will be marked as failed without crashing the RM. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805832#comment-13805832 ] Jason Lowe commented on YARN-415: - I haven't fully digested the latest patch yet, but here are some initial impressions: I believe Sandy's intention was to remove the need for a separate runningContainers map, but that map still exists in the patch and has simply been moved from RMAppAttemptImpl to SchedulerAppInfo. This necessitated a new event and added a new race condition, so I'm not sure this is a better overall approach. To remove the need for a separate runningContainers map we need to reuse the place in the code where the schedulers are already tracking the active containers for an application, and that's in SchedulerApplication.liveContainers. We could extend RMContainer to add the ability to obtain an allocation start time, and now we can compute the resource consumption for the active containers in SchedulerApplication and roll them up into a usage total when the containers complete and are removed from liveContainers. Then at least we're eliminating an extra map to track active containers. As for the race condition, how about requiring schedulers to retain app attempts in their cache until signaled by RMAppAttemptImpl that it can be flushed? RMAppAttemptImpl already knows (eventually) when an application completes, and it can grab the latest app report with the rollup of resource usage from the scheduler, cache that usage locally into a total, then tell the scheduler via a new scheduler event that it can release the app attempt from its cache. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
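Jason's alternative, restated as a sketch with hypothetical names (the real objects are RMContainer and SchedulerApplication.liveContainers): give each container an allocation timestamp and fold its usage into a running total at the single point where the scheduler already removes it from liveContainers, so no parallel container map is needed.
{code}
import java.util.HashMap;
import java.util.Map;

class LiveContainerRollupSketch {
  static class Container {
    final int memoryMb;
    final int vcores;
    final long allocatedAtMillis;   // the allocation start time Jason suggests exposing
    Container(int memoryMb, int vcores, long allocatedAtMillis) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
      this.allocatedAtMillis = allocatedAtMillis;
    }
  }

  private final Map<String, Container> liveContainers = new HashMap<>();
  private long memoryMbSeconds;
  private long vcoreSeconds;

  void containerAllocated(String id, Container c) {
    liveContainers.put(id, c);
  }

  // Usage is rolled up exactly where the scheduler already removes the container,
  // so no separate "runningContainers" map has to be maintained.
  void containerCompleted(String id, long finishedAtMillis) {
    Container c = liveContainers.remove(id);
    if (c == null) {
      return;
    }
    long seconds = (finishedAtMillis - c.allocatedAtMillis) / 1000;
    memoryMbSeconds += (long) c.memoryMb * seconds;
    vcoreSeconds   += (long) c.vcores * seconds;
  }
}
{code}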
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805830#comment-13805830 ] Omkar Vinit Joshi commented on YARN-1252: - [~vinodkv] [~jianhe] if you agree then we can close this. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805860#comment-13805860 ] Vinod Kumar Vavilapalli commented on YARN-956: -- Okay, looks good. Will check it in. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805862#comment-13805862 ] Shinichi Yamashita commented on YARN-1350: -- In other words, you are saying that the problem goes away if I fix the port number in the yarn.nodemanager.address property, and that it will then no longer occur. But in that case, shouldn't yarn-default.xml ship a fixed, appropriate default port number, as it does for yarn.resourcemanager.address? Why is the default 0? Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805867#comment-13805867 ] Omkar Vinit Joshi commented on YARN-1350: - That is mainly for single-node clusters, to avoid port clashing. For a real cluster you should define a port there. If you agree, I will close this as invalid. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-1350. - Resolution: Invalid Assignee: Omkar Vinit Joshi Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805885#comment-13805885 ] Omkar Vinit Joshi commented on YARN-674: * The recent test failure doesn't seem to be related to the code. The test passes locally. Should I open a ticket for this? * I don't understand how to fix that findbugs warning .. should I add it to exclude-findbug.xml as well? I tried this, and even eclipse doesn't complain:
{code}
@Override
@SuppressWarnings("unchecked")
public void handle(DelegationTokenRenewerEvent event) {
  if (event.getType().equals(
      DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) event;
    handleDTRenewerEvent(appSubmitEvt);
  } else if (event.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    rmContext.getDelegationTokenRenewer().applicationFinished(event);
  }
}
{code}
Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805887#comment-13805887 ] Hudson commented on YARN-1333: -- FAILURE: Integrated in Hadoop-trunk-Commit #4657 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4657/]) YARN-1333. Support blacklisting in the Fair Scheduler (Tsuyoshi Ozawa via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535899) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805892#comment-13805892 ] Tsuyoshi OZAWA commented on YARN-1333: -- Thanks Sandy! Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805900#comment-13805900 ] Shinichi Yamashita commented on YARN-1350: -- Why would ports clash? For example, does it run multiple NodeManagers on one server? Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805899#comment-13805899 ] Hudson commented on YARN-1333: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4658 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4658/]) YARN-1333: Add missing file SchedulerAppUtils (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535900) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805907#comment-13805907 ] Vinod Kumar Vavilapalli commented on YARN-975: -- Had a look at the patch, some comments:
- HDFS jar is not needed as a test-dependency in hadoop-yarn-server-applicationhistoryservice/pom.xml
- You should make HistoryFileReader and HistoryFileWriter as static private inner classes and avoid sharing state altogether.
- Wrap new ApplicationStartDataPBImpl(ApplicationStartDataProto.parseFrom(entry.value))) into a method in ApplicationStartData. Similarly others.
- getApplicationAttempts(): If there is no history-file, we should throw a valid exception?
- finishDtata: Typo
- There is no limit on outstandingWriters. If RM runs 1K applications in parallel, we'll have 1K writers - RM can thus potentially go out of file handles. We need to limit this (configurable?) and queue any more writes into a limited number of threads. Can do in a follow up JIRA, please file one.
- appId + START_DATA_SUFFIX: Instead of strings and appending, you can write a complex key which has an ApplicationId and the start marker and convert them to bytes when storing via a getBytes() method.
- Similarly for ApplicationAttempt and Container suffixes.
- When a HistoryFile exists, HistoryFileWriter should open it in append mode.
- In both the reader and the writer, you should use IOUtils.cleanup() instead of explicitly calling close on each stream yourselves everywhere.
- Don't think we should do this. Any retries should be inside FileSystemHistoryStore. We should close the writer in a finally block.
{code}
+// Not put close() in finally block in case callers want to retry writing
+// the data. On the other hand, the file will anyway be close when the
+// store is stopped.
{code}
- Dismantle retriveStartFinishData() into two methods - one for start and one for finish.
- TestApplicationHistoryStore was renamed in YARN-956, please update the patch
- Test: A single file will only have data about a single application. So testWriteHistoryData() should not have multiple applications. Similarly ApplicationAttempt finish to follow after container-finish.
- Test: We should NOT have this dependency. Java 7 reorders tests in some cases.
{code}
+ // The order of the test cases matters
{code}
Add a file-system implementation for history-storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.10.patch, YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch, YARN-975.6.patch, YARN-975.7.patch, YARN-975.8.patch, YARN-975.9.patch HDFS implementation should be a standard persistence strategy of history storage -- This message was sent by Atlassian JIRA (v6.1#6144)
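To make the close-in-finally and IOUtils.cleanup() suggestion above concrete, here is a minimal sketch under assumed names: HistoryWriteExample and writeEntry() are placeholders rather than the HistoryFileWriter under review; only the org.apache.hadoop.io.IOUtils.cleanup() call and the FileSystem API are existing Hadoop calls.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Placeholder class illustrating the review suggestion; not the patch code.
class HistoryWriteExample {
  private final FileSystem fs;
  private final Path historyFile;

  HistoryWriteExample(FileSystem fs, Path historyFile) {
    this.fs = fs;
    this.historyFile = historyFile;
  }

  void writeEntry(byte[] data) throws IOException {
    FSDataOutputStream out = null;
    try {
      out = fs.create(historyFile, true);
      out.write(data);
      out.hflush();
    } finally {
      // Always release the stream, even if the write throws; IOUtils.cleanup
      // logs and swallows close() failures instead of masking the original
      // exception.
      IOUtils.cleanup(null, out);
    }
  }
}
{code}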
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805908#comment-13805908 ] Omkar Vinit Joshi commented on YARN-1350: - you should check out MiniYarnCluster. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805921#comment-13805921 ] Akira AJISAKA commented on YARN-1350: - IMO, there are two options to avoid this problem. * Document that the port number should be fixed in a real cluster. * Change yarn-default.xml to use a fixed port number and change MiniYarnCluster to use 0. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
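For reference, the first option above amounts to pinning yarn.nodemanager.address in yarn-site.xml on each node, roughly as in the sketch below; the host and port shown are example values only, not recommended defaults from any of the patches discussed here.
{code}
<!-- Example only: with a fixed port, a restarted NodeManager re-registers at
     the same address instead of the old entry being left behind as LOST. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
{code}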
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805932#comment-13805932 ] Shinichi Yamashita commented on YARN-1350: -- Thank you for the additional information. I understand that the port number is set to 0 so that MiniYARNCluster can run multiple NodeManagers in tests. It would also be easier to understand if the yarn-default.xml description noted that the port should be fixed in a real cluster. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805978#comment-13805978 ] Hadoop QA commented on YARN-1307: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610404/YARN-1307.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2293//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2293//console This message is automatically generated. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)