[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805076#comment-13805076 ] Hadoop QA commented on YARN-1172: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610275/YARN-1172.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 21 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2279//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2279//console This message is automatically generated. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch, YARN-1172.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: patch Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
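For context on the runtime exception described above: java.util.Formatter only accepts integral arguments for %d, so formatting a Resource-like object with %d throws IllegalFormatConversionException, while %s falls back to toString(). A minimal sketch, with FormatBugDemo and FakeResource as hypothetical stand-ins (FakeResource is not YARN's Resource class):
{code}
import java.util.IllegalFormatConversionException;

// Minimal illustration of the bug: %d requires an integral argument, so
// formatting an arbitrary object (like YARN's Resource) fails at runtime.
public class FormatBugDemo {
  // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.Resource.
  static class FakeResource {
    private final int memory, vcores;
    FakeResource(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }
    @Override
    public String toString() { return "<memory:" + memory + ", vCores:" + vcores + ">"; }
  }

  public static void main(String[] args) {
    FakeResource max = new FakeResource(1024, 1);
    FakeResource min = new FakeResource(2048, 2);
    try {
      // What the original warning did: %d with Resource arguments.
      System.out.println(String.format(
          "Queue %s has max resources %d less than min resources %d", "root.default", max, min));
    } catch (IllegalFormatConversionException e) {
      System.out.println("Original format string fails: " + e);
    }
    // The fix: %s uses toString(), so both memory and vcores show up in the warning.
    System.out.println(String.format(
        "Queue %s has max resources %s less than min resources %s", "root.default", max, min));
  }
}
{code}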
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805103#comment-13805103 ] Hadoop QA commented on YARN-1351: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610284/patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2280//console This message is automatically generated. Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805105#comment-13805105 ] Bikas Saha commented on YARN-1307: -- After looking at the code I understand what you mean by the suffix numbers. It's the _1, _2 in the tree above for token and key. I think it's fine to use the current approach that uses the sequence number for tokens and the key id for secret keys. Or we can name them serially as 1, 2, 3, etc., as you describe above. Either is fine. We will be batching them into znodes in the near future anyway. Above looks good. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805111#comment-13805111 ] Tsuyoshi OZAWA commented on YARN-1307: -- Yes, you're right. Thanks for your feedback! Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805114#comment-13805114 ] Tsuyoshi OZAWA commented on YARN-1172: -- Thank you for the review, Karthik. Yes, the approach I suggested is more complicated than converting SecretManager to extend AbstractService. Let's discuss it on HADOOP-10043. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch, YARN-1172.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
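For background on the AbstractService conversion mentioned above: Hadoop's org.apache.hadoop.service.AbstractService exposes serviceInit/serviceStart/serviceStop lifecycle hooks, and because a SecretManager subclass cannot also extend AbstractService, one possible shape is a thin service wrapper that drives the secret manager from those hooks. A minimal sketch, assuming a made-up DummySecretManager; this is not the YARN-1172 patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Hypothetical wrapper illustrating the AbstractService lifecycle hooks.
// DummySecretManager is a made-up stand-in, not a YARN class.
public class SecretManagerService extends AbstractService {

  static class DummySecretManager {
    void start() { /* e.g. generate the first master key, start a roll timer */ }
    void stop()  { /* e.g. cancel timers, clear cached keys */ }
  }

  private final DummySecretManager secretManager = new DummySecretManager();

  public SecretManagerService() {
    super("SecretManagerService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Read key-roll intervals etc. from conf here.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    secretManager.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    secretManager.stop();
    super.serviceStop();
  }
}
{code}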
[jira] [Updated] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1333: - Attachment: YARN-1333.2.patch * Fixed the method name from testBlackListNodes to testBlacklistNodes. * Fixed the test to use the resource manager created in the setup method. * Fixed the test to create AppAddedSchedulerEvent from createSchedulingRequest()'s return value. * Fixed indentation. * Deleted the needless Assert. prefix because of the static import. * Changed the code to call scheduler.applications.get(). * Added a test to verify that a container does not actually get placed on the blacklisted host. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805198#comment-13805198 ] Hadoop QA commented on YARN-1333: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610296/YARN-1333.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2281//console This message is automatically generated. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805221#comment-13805221 ] Hudson commented on YARN-1335: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #373 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/373/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1340) MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space
[ https://issues.apache.org/jira/browse/YARN-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Bergland updated YARN-1340: --- Description: We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. was: We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to create a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space Key: YARN-1340 URL: https://issues.apache.org/jira/browse/YARN-1340 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.6-alpha Environment: Mac OS X 10.8.5, CentOS 6.3 Reporter: Per Bergland We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1340) MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space
[ https://issues.apache.org/jira/browse/YARN-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805289#comment-13805289 ] Per Bergland commented on YARN-1340: prepareDirs should do: new File(path).toURI().toURL() on the paths if the NM_LOCAL_DIR is supposed to be URIs MiniYARNCluster generates wrong style directories in YarnConfiguration.NM_LOCAL_DIR, causes tests to fail if path contains space Key: YARN-1340 URL: https://issues.apache.org/jira/browse/YARN-1340 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.6-alpha Environment: Mac OS X 10.8.5, CentOS 6.3 Reporter: Per Bergland We found that our tests based on the ClusterMapReduceTestCase class failed when the jenkins job contained spaces and were able to reproduce the error by just renaming the project directory to contain a space character. The failure happens when validatePaths method in LocalDirsHandlerService tries to interpret the paths as URLs new URL(dir) and this fails. https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java The code in the MiniYARNCluster.prepareDirs method needs to be modified to create properly escaped file://-based URLs instead of raw file paths OR the receiving end in LocalDirsHandlerService needs to stop interpreting the directories as urls. Since MiniYARNCluster is a test class I suspect that the former needs to be done. -- This message was sent by Atlassian JIRA (v6.1#6144)
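To illustrate the escaping behavior behind the suggestion above: parsing a raw local path that contains a space as a java.net.URI fails, while File.toURI().toURL() produces a properly escaped file: URL. A minimal sketch with a hypothetical path; the failing parse in LocalDirsHandlerService is analogous in spirit:
{code}
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class LocalDirUrlDemo {
  public static void main(String[] args) throws Exception {
    String rawPath = "/tmp/jenkins job/nm-local-dir";   // hypothetical path containing a space

    try {
      new URI(rawPath);                                 // treating the dir string as a URI as-is
    } catch (URISyntaxException e) {
      System.out.println("Raw path is not a valid URI: " + e.getMessage());
    }

    // The suggested fix: let File do the escaping before handing the value to the config.
    System.out.println(new File(rawPath).toURI().toURL());
    // Prints something like file:/tmp/jenkins%20job/nm-local-dir
  }
}
{code}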
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805292#comment-13805292 ] Hudson commented on YARN-1335: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1563 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1563/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1335) Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
[ https://issues.apache.org/jira/browse/YARN-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805315#comment-13805315 ] Hudson commented on YARN-1335: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1589 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1589/]) YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535582) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication -- Key: YARN-1335 URL: https://issues.apache.org/jira/browse/YARN-1335 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.1 Attachments: YARN-1335-1.patch, YARN-1335.patch FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805465#comment-13805465 ] Sandy Ryza commented on YARN-1333: -- Thanks Tsuyoshi. Just a couple more things: {code} +resourceManager.getRMContext().getAMFinishingMonitor(); {code} Is this line necessary? {code} +ApplicationAttemptId appAttemptId = createSchedulingRequest(1024, root.default, user, 1); {code} This looks like more than 80 characters {code} +if (SchedulerAppUtils.isBlacklisted(application, node, LOG)) { + return null; +} {code} Can this be moved to the equivalent of where it is in the capacity scheduler, i.e. FSLeafQueue? Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805491#comment-13805491 ] Arun C Murthy commented on YARN-1042: - [~djp] Do you mind if I take this over? I can do this concurrently with YARN-796 (for which I already have a patch). Tx! add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Junping Du Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805494#comment-13805494 ] Zhijie Shen commented on YARN-674: -- I've had a quick look at the patch. Here are my comments: 1. It seems that the change in RMAppManager is not necessary, because the current logic is to reject the app in the secure case when parsing the credentials or adding the app to DelegationTokenRenewer goes wrong; otherwise, the app will be accepted. Though there's no obvious if... else... structure, it achieves the same logic control via: {code} throw RPCUtil.getRemoteException(ie); {code} I think the exception needs to be thrown, which is missing in your patch. The exception will notify the client that the app submission failed; otherwise, the client will think the submission succeeded. If I'm missing some ideas here, please let me know. 2. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact be if the application is already accepted and starts its life cycle while DelegationTokenRenewer is slow to process the DelegationTokenRenewerAppSubmitEvent? Will the application fail somewhere else because the fresh token is unavailable? 3. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission? Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1349: Attachment: YARN-1349.1.patch Attaching patch to fix yarn.cmd. I also needed to add a special case for the classpath sub-command so that it wouldn't try to dispatch to java. This is identical to how it's handled in hadoop.cmd. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1352) Recover LogAggregationService upon nodemanager restart
Jason Lowe created YARN-1352: Summary: Recover LogAggregationService upon nodemanager restart Key: YARN-1352 URL: https://issues.apache.org/jira/browse/YARN-1352 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe LogAggregationService state needs to be recovered as part of the work-preserving nodemanager restart feature. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: (was: patch) Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805512#comment-13805512 ] Steve Loughran commented on YARN-941: - Ignoring HDFS, updated Yarn RM tokens (and the RM-assigned AM RPC token) could be passed to the AM by killing the container and creating a new one, once YARN-1041 handles AM restart better. This may seem brutal, but it stops your code getting complacent about not having to handle AM failure -and it means the current token retrieval process is all that is needed RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Weitz updated YARN-1351: --- Attachment: fixprnt.patch Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue: Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java === --- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589) +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy) @@ -471,7 +471,7 @@ if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName) && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) { - LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", + LOG.warn(String.format("Queue %s has max resources %s less than min resources %s", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName))); } } -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1353) Containers not being killed on Linux after application is killed
Bikas Saha created YARN-1353: Summary: Containers not being killed on Linux after application is killed Key: YARN-1353 URL: https://issues.apache.org/jira/browse/YARN-1353 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Running application on a Linux cluster where setsid is available. After killing the application via yarn application -kill we see that containers for that application are still hanging around for up to 30 mins after the application kill. The NM log says that the container was killed with code 143 but it seems that only the shell launcher is killed. uname -a output Linux ZZZ.com 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805515#comment-13805515 ] Hadoop QA commented on YARN-1349: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610350/YARN-1349.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2283//console This message is automatically generated. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805518#comment-13805518 ] Chuan Liu commented on YARN-1349: - Are 'proxyserver' and 'node' missing from 'yarncommands'? Otherwise +1 from me. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1353) Containers not being killed on Linux after application is killed
[ https://issues.apache.org/jira/browse/YARN-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1353: - Attachment: container_1382388401549_1060_01_000227.log Logs for the container attached. Containers not being killed on Linux after application is killed Key: YARN-1353 URL: https://issues.apache.org/jira/browse/YARN-1353 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Attachments: container_1382388401549_1060_01_000227.log Running application on a Linux cluster where setsid is available. After killing the application via yarn application -kill we see that containers for that application are still hanging around for up to 30 mins after the application kill. The NM log says that the container was killed with code 143 but it seems that only the shell launcher is killed. uname -a output Linux ZZZ.com 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805531#comment-13805531 ] Junping Du commented on YARN-1042: -- Sure. Arun, thanks for working on this. Please go ahead! add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Junping Du Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805522#comment-13805522 ] Bikas Saha commented on YARN-311: - Can we please double check and assure ourselves that this is deadlock free. {code} +// Update resource if any change +synchronized(nm) { + SchedulerUtils.updateResourceIfChanged(node, nm, clusterResource, LOG); +} {code} Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1354) Recover applications upon nodemanager restart
Jason Lowe created YARN-1354: Summary: Recover applications upon nodemanager restart Key: YARN-1354 URL: https://issues.apache.org/jira/browse/YARN-1354 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The set of active applications in the nodemanager context need to be recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1355) Recover application ACLs upon nodemanager restart
Jason Lowe created YARN-1355: Summary: Recover application ACLs upon nodemanager restart Key: YARN-1355 URL: https://issues.apache.org/jira/browse/YARN-1355 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The ACLs for applications need to be recovered for work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-1042: Assignee: Arun C Murthy (was: Junping Du) add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may be able to want to specify affinity to same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805535#comment-13805535 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610181/YARN-674.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2282//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2282//console This message is automatically generated. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1355) Recover application ACLs upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805537#comment-13805537 ] Jason Lowe commented on YARN-1355: -- One idea is to persist the ACLs for an application underneath the application directory in the log directory tree. That has the benefit of automatically removing the persisted ACL data when an application's logs are removed (and thus ACLs are no longer needed). Restoring of application ACLs potentially could be lazily performed as well if it isn't cached in memory. Recover application ACLs upon nodemanager restart - Key: YARN-1355 URL: https://issues.apache.org/jira/browse/YARN-1355 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe The ACLs for applications need to be recovered for work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805545#comment-13805545 ] Hadoop QA commented on YARN-891: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610216/YARN-891.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2285//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when application/attempt completes. This solves some problems like finished application get lost after RM restart and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805549#comment-13805549 ] Chris Nauroth commented on YARN-1349: - {quote} -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {quote} All changes are in scripts, so there are no new tests. I manually tested this patch in a running Windows cluster by running all yarn.cmd sub-commands and one additional direct class to cover the passthrough case. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1349: Attachment: YARN-1349.2.patch node was already in there, but proxyserver was missing. Thanks for catching that. Here is a new patch. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805552#comment-13805552 ] Chuan Liu commented on YARN-1349: - +1 yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1380#comment-1380 ] Junping Du commented on YARN-311: - Thanks for the comments, Bikas! The synchronization here is to make sure the read of the nm (rmNode) resource is thread-safe while another thread does a write (nm.setTotalCapacity()) triggered from AdminService (an implementation of RMAdminProtocol). Given that SchedulerUtils.updateResourceIfChanged() itself is lock-free and nm.setTotalCapacity() is also lock-free, execution passes through quickly once the nm synchronization lock is acquired, so it is deadlock-free. Does that make sense? Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
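A minimal sketch of the locking pattern being discussed, with hypothetical class and field names (not the YARN-311 patch): reader and writer synchronize on the same node object, and neither critical section acquires any other lock, so there is no lock-ordering cycle and therefore no deadlock:
{code}
// Hypothetical sketch: both reader and writer synchronize on the same node
// object, and each critical section is short and takes no nested locks.
public class DummyNode {
  private int totalMemoryMB = 8192;   // stands in for the RMNode's total capability

  // Admin thread (e.g. handling an RMAdminProtocol call): the write side.
  public synchronized void setTotalCapability(int memoryMB) {
    this.totalMemoryMB = memoryMB;
  }

  // Scheduler thread, on a node heartbeat: the read side.
  public int readTotalCapability() {
    synchronized (this) {            // same monitor as the writer
      return totalMemoryMB;          // lock-free body: no nested locks, so no deadlock
    }
  }
}
{code}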
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805557#comment-13805557 ] Hadoop QA commented on YARN-1349: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610357/YARN-1349.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2286//console This message is automatically generated. yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the pre-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805560#comment-13805560 ] Hadoop QA commented on YARN-1351: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610355/fixprnt.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2284//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2284//console This message is automatically generated. Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. 
Following the patch that fixes the issue:

Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
===
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy)
@@ -471,7 +471,7 @@
     if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName)
         && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) {
-      LOG.warn(String.format("Queue %s has max resources %d less than min resources %d",
+      LOG.warn(String.format("Queue %s has max resources %s less than min resources %s",
           queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName)));
     }
   }

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1349) yarn.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/YARN-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805565#comment-13805565 ] Chris Nauroth commented on YARN-1349: - bq. -1 javac. The patch appears to cause the build to fail. It looks like the Jenkins box is overloaded. It can't fork a new thread. This patch only contains cmd script changes, so there is no way it can make javac fail.
{code}
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:640)
	at java.lang.ref.Reference.<clinit>(Reference.java:145)
{code}
yarn.cmd does not support passthrough to any arbitrary class. - Key: YARN-1349 URL: https://issues.apache.org/jira/browse/YARN-1349 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-1349.1.patch, YARN-1349.2.patch The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the pre-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1318: - Priority: Critical (was: Major) Promote AdminService to an Always-On service Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805581#comment-13805581 ] Bikas Saha commented on YARN-1068: -- [~vinodkv] Does the new patch address your concerns? I have marked YARN-1318 as a blocker for YARN-149. We must fix that before failover is available. Karthik, in your final patch can you please include clear comments pointing to YARN-1318 near the @Private annotations for RMHAProtocolService. Thanks! Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-14.patch Updated patch to add a comment on why RMHAProtocolService is Private-Unstable and a pointer to YARN-1318. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1356) Typo in MergeManagerImpl.java
Efe Gencer created YARN-1356: Summary: Typo in MergeManagerImpl.java Key: YARN-1356 URL: https://issues.apache.org/jira/browse/YARN-1356 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: all Reporter: Efe Gencer Priority: Trivial There's a typo (Invlaid which should be Invalid) in line 199 of MergeManagerImpl.java. Currently:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invlaid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

should be:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invalid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805598#comment-13805598 ] Karthik Kambatla commented on YARN-1318: To move forward on this, I propose RMContext should be more along the lines of a Builder: a default constructor (no arguments) and set* methods to set the internal fields. [~vinodkv], does this sound reasonable? If yes, would it make sense to open a new JIRA for that? Promote AdminService to an Always-On service Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
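A rough sketch of the builder-like shape proposed above, with invented field and method names purely for illustration (this is not the actual RMContext API): a no-argument constructor plus set* methods, so always-on services such as AdminService can be wired into the context incrementally as they are created.
{code}
// Hypothetical simplification of RMContext as a mutable holder populated via setters
// instead of a long constructor argument list.
class RMContextSketch {
  private Object dispatcher;
  private Object adminService;
  private Object stateStore;

  RMContextSketch() { }                        // default constructor, no arguments

  RMContextSketch setDispatcher(Object d)   { this.dispatcher = d;   return this; }
  RMContextSketch setAdminService(Object a) { this.adminService = a; return this; }
  RMContextSketch setStateStore(Object s)   { this.stateStore = s;   return this; }

  Object getDispatcher()   { return dispatcher; }
  Object getAdminService() { return adminService; }
  Object getStateStore()   { return stateStore; }
}

// Usage: services that come up early (e.g. an always-on AdminService) can be registered
// as soon as they exist, without re-creating the context.
class Wiring {
  static RMContextSketch build() {
    return new RMContextSketch()
        .setDispatcher(new Object())
        .setAdminService(new Object())
        .setStateStore(new Object());
  }
}
{code}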
[jira] [Updated] (YARN-1356) Typo in MergeManagerImpl.java
[ https://issues.apache.org/jira/browse/YARN-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Efe Gencer updated YARN-1356: - Attachment: MergeManagerImpl.java Typo fixed in attached file Typo in MergeManagerImpl.java - Key: YARN-1356 URL: https://issues.apache.org/jira/browse/YARN-1356 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: all Reporter: Efe Gencer Priority: Trivial Attachments: MergeManagerImpl.java Original Estimate: 1m Remaining Estimate: 1m There's a typo (Invlaid which should be Invalid) in line 199 of MergeManagerImpl.java. Currently:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invlaid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

should be:

if (this.maxSingleShuffleLimit >= this.mergeThreshold) {
  throw new RuntimeException("Invalid configuration: "
      + "maxSingleShuffleLimit should be less than mergeThreshold "
      + "maxSingleShuffleLimit: " + this.maxSingleShuffleLimit
      + "mergeThreshold: " + this.mergeThreshold);
}

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1351) Invalid string format in Fair Scheduler log warn message
[ https://issues.apache.org/jira/browse/YARN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805599#comment-13805599 ] Sandy Ryza commented on YARN-1351: -- +1 Invalid string format in Fair Scheduler log warn message Key: YARN-1351 URL: https://issues.apache.org/jira/browse/YARN-1351 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Konstantin Weitz Attachments: fixprnt.patch While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory. Following the patch that fixes the issue:

Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
===
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (revision 1535589)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java (working copy)
@@ -471,7 +471,7 @@
     if (maxQueueResources.containsKey(queueName) && minQueueResources.containsKey(queueName)
         && !Resources.fitsIn(minQueueResources.get(queueName), maxQueueResources.get(queueName))) {
-      LOG.warn(String.format("Queue %s has max resources %d less than min resources %d",
+      LOG.warn(String.format("Queue %s has max resources %s less than min resources %s",
           queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName)));
     }
   }

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805618#comment-13805618 ] Robert Joseph Evans commented on YARN-941: -- That sounds like a great default. I would like to also have a way for an AM to say I can handle updating tokens without being shot, but that may be something that shows up in a follow on JIRA. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.1#6144)
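The flow described in this issue (the client fetches fresh tokens via Kerberos, hands them to the AM, and the AM pushes them to the RM) could be captured by something along the lines of the hypothetical sketch below; this is not an existing YARN API, only an illustration of the proposed call direction.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

// Purely hypothetical protocol sketch: the AM, not the client, tells the RM about
// replacement tokens so the RM can renew them and use them for any future AM launch.
interface ApplicationTokenUpdateProtocol {
  // tokens: serialized Credentials containing the refreshed delegation tokens.
  void updateApplicationTokens(String applicationId, ByteBuffer tokens) throws IOException;
}
{code}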
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805631#comment-13805631 ] Vinod Kumar Vavilapalli commented on YARN-956: -- Seems like you missed the NPE issue with getAMContainer(). Please check all the methods once again. Tx. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805636#comment-13805636 ] Hadoop QA commented on YARN-1068: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610366/yarn-1068-14.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2287//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2287//console This message is automatically generated. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805639#comment-13805639 ] Hadoop QA commented on YARN-956: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610204/YARN-956.8.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2288//console This message is automatically generated. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1333: - Attachment: YARN-1333.3.patch Thanks for the review again, Sandy! Updated the patch. * Fixed the compilation. * Removed a needless line from a test. * Checked indentation. * Moved SchedulerAppUtils.isBlacklisted() to FSLeafQueue. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
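For context on what the blacklisting change amounts to, the core guard is simply: skip assigning containers to an application on a node (or rack) that the application has blacklisted. A name-hedged, simplified version of that check (the real patch routes it through SchedulerAppUtils and FSLeafQueue):
{code}
import java.util.Set;

// Illustrative only; not the actual scheduler types.
class BlacklistGuard {
  // Returns true when the app must not be assigned containers on this node.
  static boolean isBlacklisted(Set<String> appBlacklist, String nodeName, String rackName) {
    return appBlacklist.contains(nodeName) || appBlacklist.contains(rackName);
  }
}
{code}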
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805649#comment-13805649 ] Sandy Ryza commented on YARN-1333: -- +1 pending jenkins Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-956: - Attachment: YARN-956.9.patch Fixed the NPE for getAMContainer(). Reviewed the Memory implementation again, and simplified the NPE check in applicationAttemptFinish and containerFinish, because getSubMap always returns a non-null value. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
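A small sketch of the two points in that comment, using invented map and type names rather than the actual history-storage classes: guard getAMContainer() so a missing attempt or missing AM container yields null instead of a NullPointerException, and drop the extra null check after subMap() because TreeMap.subMap() returns an empty view, never null.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class InMemoryHistorySketch {
  static class Attempt { String amContainerId; }      // minimal stand-in type

  private final Map<String, Attempt> attempts = new HashMap<>();
  private final TreeMap<Long, String> containersByStartTime = new TreeMap<>();

  // Guarded lookup: a missing attempt or missing AM container yields null
  // instead of throwing NullPointerException.
  String getAMContainer(String attemptId) {
    Attempt attempt = attempts.get(attemptId);
    return (attempt == null) ? null : attempt.amContainerId;
  }

  // TreeMap.subMap() returns an empty view rather than null, so iterating or
  // counting it directly is safe; no separate null check is needed.
  int countContainersStartedBetween(long from, long to) {
    return containersByStartTime.subMap(from, to).size();
  }
}
{code}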
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805669#comment-13805669 ] Hadoop QA commented on YARN-956: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610382/YARN-956.9.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2290//console This message is automatically generated. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805672#comment-13805672 ] Hadoop QA commented on YARN-1333: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610381/YARN-1333.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2289//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2289//console This message is automatically generated. Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jackie Chang updated YARN-1321: --- Summary: NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly (was: NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Attachments: YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805697#comment-13805697 ] Omkar Vinit Joshi commented on YARN-674: Thanks [~zjshen] for reviewing my patch. bq. I think the exception needs to be thrown, which is missing in your patch. The exception will notify the client that the app submission fails; otherwise, the client will think the submission succeeds? Yes, I have removed the error purposefully. Here are the thoughts: * Once the client submits the application, it should check the app status and will come to know about the failing app from it. ** Either when parsing credentials fails. ** OR when initial token renewal fails. bq. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will be the impact when the application is already accepted and starts its life cycle while DelegationTokenRenewer is slow to handle the DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else because the fresh token is unavailable? The logic here is modified a bit. The app is submitted to the scheduler only if token renewal succeeds, not before that. Today too it is the same case; the only problem is that we are holding the client request while doing this. With the change this will become async. bq. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission? No. Now there is no problem w.r.t. concurrent app submission as we are anyway funneling it through the event handler. This test is no longer required, so I removed it completely. * Fixing findbugs warnings... * Fixing the failed test case... Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
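A compressed, name-hedged sketch of the asynchronous flow described above (the real code uses DelegationTokenRenewer events and the RM dispatcher, not a raw queue): the RPC handler only enqueues the submission, a renewer thread performs the renewal, and the app reaches the scheduler only if renewal succeeded; otherwise it is marked failed and the client learns the outcome from the app status.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncRenewalSketch {
  static class AppSubmitEvent { String appId; /* tokens elided */ }

  private final BlockingQueue<AppSubmitEvent> queue = new LinkedBlockingQueue<>();

  // RPC handler path: cheap, never blocks on the NameNode.
  void submitApplication(AppSubmitEvent event) {
    queue.add(event);                 // return to the client immediately
  }

  // Renewer thread path: a slow or failing renewal only affects this one app.
  void renewLoop() throws InterruptedException {
    while (true) {
      AppSubmitEvent event = queue.take();
      try {
        renewTokens(event);           // may be slow or throw
        forwardToScheduler(event);    // app enters the scheduler only on success
      } catch (Exception e) {
        markAppFailed(event, e);      // client sees the failure via app status
      }
    }
  }

  private void renewTokens(AppSubmitEvent e) throws Exception { /* renew against the NN */ }
  private void forwardToScheduler(AppSubmitEvent e) { }
  private void markAppFailed(AppSubmitEvent e, Exception cause) { }
}
{code}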
[jira] [Updated] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated YARN-1350: - Attachment: NodeState.txt I have attached detailed information. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-674: --- Attachment: YARN-674.2.patch Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805757#comment-13805757 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610392/YARN-674.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2291//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2291//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2291//console This message is automatically generated. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-415: - Attachment: YARN-415--n9.patch Updated the patch moving tracking logic into Scheduler: - AppSchedulingInfo tracks resource usage. Existing methods are reused and overall it seems more like the right place to have this logic. - When the app is finished and the Scheduler evicts it from its cache, it sends a new type of event (RMAppAttemptAppFinishedEvent) to the attempt, attaching usage stats to the event. - RMAppAttemptImpl test is modified accordingly - a new test is added to verify resource tracking in AppSchedulingInfo Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
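The metric being tracked is just the sum spelled out in the issue description (reserved resources of each container multiplied by its lifetime). A minimal accumulator in the spirit of the AppSchedulingInfo change, with invented names and units (MB-seconds and vcore-seconds); not the patch itself:
{code}
class AppResourceUsageSketch {
  private long memoryMbSeconds;   // sum over finished containers
  private long vcoreSeconds;

  // Called when a container completes: reserved resources * container lifetime.
  void containerFinished(int reservedMb, int reservedVcores,
                         long allocatedAtMillis, long finishedAtMillis) {
    long lifetimeSeconds = (finishedAtMillis - allocatedAtMillis) / 1000;
    memoryMbSeconds += (long) reservedMb * lifetimeSeconds;
    vcoreSeconds   += (long) reservedVcores * lifetimeSeconds;
  }

  long getMemoryMbSeconds() { return memoryMbSeconds; }
  long getVcoreSeconds()    { return vcoreSeconds; }
}
{code}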
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805804#comment-13805804 ] Andrey Klochkov commented on YARN-415: -- This scheme has a downside that the stats would be incorrect between 2 events: 1) Scheduler evicting the app from the cache and sending an event and 2) RMAppAttemptImpl receiving the event and updating its internal stats. The only idea I have is to add an additional roundtrip extending this scheme to: 1. When the app is finished, the Scheduler sends an RMAppAttemptAppFinishedEvent instance and does not evict the app from the cache yet 2. RMAppAttemptImpl receives the event, updates its internal fields finalMemorySeconds and finalVcoreSeconds and sends a new type of event to the Scheduler allowing it to evict the app. 3. Scheduler gets the event and evicts the app. Thoughts? Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1307: - Attachment: YARN-1307.3.patch This is a first patch for review. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch Rethinking the znode structure for RM HA has been proposed in some JIRAs (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805817#comment-13805817 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610399/YARN-415--n9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2292//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2292//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2292//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805825#comment-13805825 ] Omkar Vinit Joshi commented on YARN-1350: - [~sinchii] I have a basic question: why is your nodeId changing every time? Have you configured your nodemanager with an ephemeral port (0)? What is NM_ADDRESS? The RM will consider this as the same node only when your newly restarted node manager reports with the same node id, i.e. host-name:port. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-1252: --- Assignee: Omkar Vinit Joshi Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805827#comment-13805827 ] Omkar Vinit Joshi commented on YARN-1252: - taking it over.. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805828#comment-13805828 ] Omkar Vinit Joshi commented on YARN-1252: - YARN-674 should solve this problem. Now that token renewal is asynchronous in nature, if the token is unknown or the external system (token renewing system) is down, then the application for which this token was submitted will be marked as failed without crashing the RM. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805832#comment-13805832 ] Jason Lowe commented on YARN-415: - I haven't fully digested the latest patch yet, but here are some initial impressions: I believe Sandy's intention was to remove the need for a separate runningContainers map, but that map still exists in the patch and has simply been moved from RMAppAttemptImpl to SchedulerAppInfo. This necessitated a new event and added a new race condition, so I'm not sure this is a better overall approach. To remove the need for a separate runningContainers map we need to reuse the place in the code where the schedulers are already tracking the active containers for an application, and that's in SchedulerApplication.liveContainers. We could extend RMContainer to add the ability to obtain an allocation start time, and now we can compute the resource consumption for the active containers in SchedulerApplication and roll them up into a usage total when the containers complete and are removed from liveContainers. Then at least we're eliminating an extra map to track active containers. As for the race condition, how about requiring schedulers to retain app attempts in their cache until signaled by RMAppAttemptImpl that it can be flushed? RMAppAttemptImpl already knows (eventually) when an application completes, and it can grab the latest app report with the rollup of resource usage from the scheduler, cache that usage locally into a total, then tell the scheduler via a new scheduler event that it can release the app attempt from its cache. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
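Jason's alternative, restated as a sketch with hypothetical names (the real objects are RMContainer and SchedulerApplication.liveContainers): give each container an allocation timestamp and fold its usage into a running total at the single point where the scheduler already removes it from liveContainers, so no parallel container map is needed.
{code}
import java.util.HashMap;
import java.util.Map;

class LiveContainerRollupSketch {
  static class Container {
    final int memoryMb;
    final int vcores;
    final long allocatedAtMillis;   // the allocation start time Jason suggests exposing
    Container(int memoryMb, int vcores, long allocatedAtMillis) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
      this.allocatedAtMillis = allocatedAtMillis;
    }
  }

  private final Map<String, Container> liveContainers = new HashMap<>();
  private long memoryMbSeconds;
  private long vcoreSeconds;

  void containerAllocated(String id, Container c) {
    liveContainers.put(id, c);
  }

  // Usage is rolled up exactly where the scheduler already removes the container,
  // so no separate "runningContainers" map has to be maintained.
  void containerCompleted(String id, long finishedAtMillis) {
    Container c = liveContainers.remove(id);
    if (c == null) {
      return;
    }
    long seconds = (finishedAtMillis - c.allocatedAtMillis) / 1000;
    memoryMbSeconds += (long) c.memoryMb * seconds;
    vcoreSeconds   += (long) c.vcores * seconds;
  }
}
{code}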
[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception
[ https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805830#comment-13805830 ] Omkar Vinit Joshi commented on YARN-1252: - [~vinodkv] [~jianhe] if you agree then we can close this. Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception --- Key: YARN-1252 URL: https://issues.apache.org/jira/browse/YARN-1252 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi {code} 2013-09-26 08:15:20,507 INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException as:rm/host@realm (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal request for unknown token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805860#comment-13805860 ] Vinod Kumar Vavilapalli commented on YARN-956: -- Okay, looks good. Will check it in. [YARN-321] Add a testable in-memory HistoryStorage --- Key: YARN-956 URL: https://issues.apache.org/jira/browse/YARN-956 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: YARN-321 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, YARN-956.8.patch, YARN-956.9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805862#comment-13805862 ] Shinichi Yamashita commented on YARN-1350: -- In other words, you are saying that the problem goes away if I fix the port number in the yarn.nodemanager.address property, and that it will then no longer occur. But in that case, shouldn't yarn-default.xml ship a fixed, appropriate default port number, as it does for yarn.resourcemanager.address? Why is the default 0? Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805867#comment-13805867 ] Omkar Vinit Joshi commented on YARN-1350: - That is mainly for single-node clusters, to avoid port clashing. For a real cluster you should define a port there. If you agree, I will close this as invalid. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-1350. - Resolution: Invalid Assignee: Omkar Vinit Joshi Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805885#comment-13805885 ] Omkar Vinit Joshi commented on YARN-674: * The recent test failure doesn't seem to be related to the code. The test passes locally. Should I open a ticket for this? * I don't understand how to fix that findbugs warning .. should I add it to exclude-findbug.xml as well? I tried this, and even eclipse doesn't complain:
{code}
@Override
@SuppressWarnings("unchecked")
public void handle(DelegationTokenRenewerEvent event) {
  if (event.getType().equals(
      DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) event;
    handleDTRenewerEvent(appSubmitEvt);
  } else if (event.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    rmContext.getDelegationTokenRenewer().applicationFinished(event);
  }
}
{code}
Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805887#comment-13805887 ] Hudson commented on YARN-1333: -- FAILURE: Integrated in Hadoop-trunk-Commit #4657 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4657/]) YARN-1333. Support blacklisting in the Fair Scheduler (Tsuyoshi Ozawa via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535899) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805892#comment-13805892 ] Tsuyoshi OZAWA commented on YARN-1333: -- Thanks Sandy! Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805900#comment-13805900 ] Shinichi Yamashita commented on YARN-1350: -- Why would ports clash? For example, does it run multiple NodeManagers on one server? Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1333) Support blacklisting in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805899#comment-13805899 ] Hudson commented on YARN-1333: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4658 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4658/]) YARN-1333: Add missing file SchedulerAppUtils (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1535900) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java Support blacklisting in the Fair Scheduler -- Key: YARN-1333 URL: https://issues.apache.org/jira/browse/YARN-1333 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Fix For: 2.2.1 Attachments: YARN-1333.1.patch, YARN-1333.2.patch, YARN-1333.3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805907#comment-13805907 ] Vinod Kumar Vavilapalli commented on YARN-975: -- Had a look at the patch, some comments:
- HDFS jar is not needed as a test-dependency in hadoop-yarn-server-applicationhistoryservice/pom.xml
- You should make HistoryFileReader and HistoryFileWriter as static private inner classes and avoid sharing state altogether.
- Wrap new ApplicationStartDataPBImpl(ApplicationStartDataProto.parseFrom(entry.value))) into a method in ApplicationStartData. Similarly others.
- getApplicationAttempts(): If there is no history-file, we should throw a valid exception?
- finishDtata: Typo
- There is no limit on outstandingWriters. If RM runs 1K applications in parallel, we'll have 1K writers - RM can thus potentially go out of file handles. We need to limit this (configurable?) and queue any more writes into a limited number of threads. Can do in a follow up JIRA, please file one.
- appId + START_DATA_SUFFIX: Instead of strings and appending, you can write a complex key which has an ApplicationId and the start marker and convert them to bytes when storing via a getBytes() method.
- Similarly for ApplicationAttempt and Container suffixes.
- When a HistoryFile exists, HistoryFileWriter should open it in append mode.
- In both the reader and the writer, you should use IOUtils.cleanup() instead of explicitly calling close on each stream yourselves everywhere.
- Don't think we should do this. Any retries should be inside FileSystemHistoryStore. We should close the writer in a finally block.
{code}
+// Not put close() in finally block in case callers want to retry writing
+// the data. On the other hand, the file will anyway be close when the
+// store is stopped.
{code}
- Dismantle retriveStartFinishData() into two methods - one for start and one for finish.
- TestApplicationHistoryStore was renamed in YARN-956, please update the patch
- Test: A single file will only have data about a single application. So testWriteHistoryData() should not have multiple applications. Similarly ApplicationAttempt finish to follow after container-finish.
- Test: We should NOT have this dependency. Java 7 reorders tests in some cases.
{code}
+ // The order of the test cases matters
{code}
Add a file-system implementation for history-storage Key: YARN-975 URL: https://issues.apache.org/jira/browse/YARN-975 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-975.10.patch, YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch, YARN-975.6.patch, YARN-975.7.patch, YARN-975.8.patch, YARN-975.9.patch HDFS implementation should be a standard persistence strategy of history storage -- This message was sent by Atlassian JIRA (v6.1#6144)
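To make the close-in-finally and IOUtils.cleanup() suggestion above concrete, here is a minimal sketch under assumed names: HistoryWriteExample and writeEntry() are placeholders rather than the HistoryFileWriter under review; only the org.apache.hadoop.io.IOUtils.cleanup() call and the FileSystem API are existing Hadoop calls.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Placeholder class illustrating the review suggestion; not the patch code.
class HistoryWriteExample {
  private final FileSystem fs;
  private final Path historyFile;

  HistoryWriteExample(FileSystem fs, Path historyFile) {
    this.fs = fs;
    this.historyFile = historyFile;
  }

  void writeEntry(byte[] data) throws IOException {
    FSDataOutputStream out = null;
    try {
      out = fs.create(historyFile, true);
      out.write(data);
      out.hflush();
    } finally {
      // Always release the stream, even if the write throws; IOUtils.cleanup
      // logs and swallows close() failures instead of masking the original
      // exception.
      IOUtils.cleanup(null, out);
    }
  }
}
{code}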
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805908#comment-13805908 ] Omkar Vinit Joshi commented on YARN-1350: - you should check out MiniYarnCluster. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805921#comment-13805921 ] Akira AJISAKA commented on YARN-1350: - IMO, there are two options to avoid this problem. * Document that the port number should be fixed in a real cluster. * Change yarn-default.xml to use a fixed port number and change MiniYarnCluster to use 0. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
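For reference, the first option above amounts to pinning yarn.nodemanager.address in yarn-site.xml on each node, roughly as in the sketch below; the host and port shown are example values only, not recommended defaults from any of the patches discussed here.
{code}
<!-- Example only: with a fixed port, a restarted NodeManager re-registers at
     the same address instead of the old entry being left behind as LOST. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
{code}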
[jira] [Commented] (YARN-1350) Should not add Lost Node by NodeManager reboot
[ https://issues.apache.org/jira/browse/YARN-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805932#comment-13805932 ] Shinichi Yamashita commented on YARN-1350: -- Thank you for the additional information. I understand that the port number is set to 0 so that MiniYARNCluster can run multiple NodeManagers in tests. It would also be easier to understand if the yarn-default.xml description noted that the port should be fixed in a real cluster. Should not add Lost Node by NodeManager reboot -- Key: YARN-1350 URL: https://issues.apache.org/jira/browse/YARN-1350 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Omkar Vinit Joshi Attachments: NodeState.txt In current trunk, when NodeManager reboots, the node information before the reboot is treated as LOST. This occurs to confirm only Inactive node information at the time of reboot. Therefore Lost Node will exist even if NodeManager works in all nodes. We should change it not to register Lost Node by the NodeManager reboot. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805978#comment-13805978 ] Hadoop QA commented on YARN-1307: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610404/YARN-1307.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2293//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2293//console This message is automatically generated. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)