[jira] [Created] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
Chuan Liu created YARN-1078: --- Summary: TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
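The failures come down to whether the loopback address reverse-resolves to the literal name localhost. As a minimal illustration (a hypothetical test helper, not part of the YARN-1078 patch), a test could derive the expected node address from whatever the local resolver actually returns for 127.0.0.1 instead of hard-coding "localhost":
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical test helper: build the expected "host:port" string from the
// name the local resolver returns for 127.0.0.1. On most Linux hosts this
// yields "localhost:12345"; on the Windows hosts described above it yields
// "127.0.0.1:12345", so an assertion built from it passes on both.
public final class TestHostNames {
  private TestHostNames() {}

  public static String expectedNodeAddress(int port) throws UnknownHostException {
    // getHostName() performs the reverse lookup whose result differs by platform.
    String host = InetAddress.getByName("127.0.0.1").getHostName();
    return host + ":" + port;
  }
}
{code}
A test could then compare the registered node-ID string against expectedNodeAddress(12345) rather than against a hard-coded "localhost:12345".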
[jira] [Updated] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1078: Attachment: YARN-1078.patch Attach a patch. The fixes are quite straight forward. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743619#comment-13743619 ] Hadoop QA commented on YARN-1078: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598714/YARN-1078.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1738//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1738//console This message is automatically generated. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! 
org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743717#comment-13743717 ] Hudson commented on YARN-643: - SUCCESS: Integrated in Hadoop-Yarn-trunk #306 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/306/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743795#comment-13743795 ] Hudson commented on YARN-643: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1496 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1496/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743819#comment-13743819 ] Robert Joseph Evans commented on YARN-896: -- [~criccomini], That is a great point. To do this we need the application to somehow inform YARN that it is a long lived application. We could do this either through some sort of metadata that is submitted with the application to YARN, possibly through the service registry, or even perhaps just setting the progress to a special value like -1. I think I would prefer the first one, because then YARN could use that metadata later on for other things. After that the UI change should not be too difficult. If you want to file a JIRA for it, either as a sub task or just link it in, that would be great. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
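To make the special-progress-value idea concrete, a minimal sketch follows (an assumption about how an AM might use it, not an agreed YARN contract): a long-lived application master could return a sentinel from the existing progress callback, and the RM web UI would then need to recognize that sentinel instead of rendering a percentage bar.
{code}
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// Sketch only: report a sentinel progress value over the AM-RM heartbeat.
// The -1 convention is the idea floated in this thread, not an existing API.
public abstract class LongLivedCallbackHandler
    implements AMRMClientAsync.CallbackHandler {

  // Hypothetical sentinel meaning "progress is not meaningful for this app".
  public static final float LONG_LIVED_PROGRESS = -1.0f;

  @Override
  public float getProgress() {
    return LONG_LIVED_PROGRESS;
  }
}
{code}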
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743868#comment-13743868 ] Hudson commented on YARN-643: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1523 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1523/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-194) Log handling in case of NM restart.
[ https://issues.apache.org/jira/browse/YARN-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743902#comment-13743902 ] Jason Lowe commented on YARN-194: - The NM waits not only for the container to complete but for the entire application to complete -- see YARN-219. Holding long-lived leases on many files in HDFS puts a lot of load on the namenode. It also cannot append on the fly since all the logs for all containers for an application on the node are in a single file in HDFS with the data for each log being contiguous within that file. Adding the ability to append to multiple log streams simultaneously is not possible in the current aggregated log format. It would be nice to have some mechanism to get the NM to clean up logs, as currently each time the NM restarts log files are being leaked. This has been fixed for container local directories and the distributed cache via YARN-71, but logs have been ignored. Seems like we should be consistent about these two. If the application is still running, isn't YARN-71 already deleting the app's current working directory and distcache files out from underneath it? Log handling in case of NM restart. --- Key: YARN-194 URL: https://issues.apache.org/jira/browse/YARN-194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.4 Reporter: Siddharth Seth Assignee: Omkar Vinit Joshi Currently, if an NM restarts - existing logs will be left around till they're manually cleaned up. The NM could be improved to handle these files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1020) Resource Localization using Groups as a new Localization Type
[ https://issues.apache.org/jira/browse/YARN-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743946#comment-13743946 ] Sangjin Lee commented on YARN-1020: --- This is an interesting problem/challenge. I kind of like [~jlowe]'s idea to make these files owned by the NM user. To me it seems consistent with the fact that these files are really owned and manipulated by the NM user. Resource Localization using Groups as a new Localization Type - Key: YARN-1020 URL: https://issues.apache.org/jira/browse/YARN-1020 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi The scenario is as follows.. * We definitely will have multiple applications running on top of yarn. These applications whenever run by users will need resources to be localized. Now the options what application-users will have for localizing resources are:- ** APPLICATION ... these files will be available for only that instance of the application and only for that single user. If we talk in terms of MR then for single job. ** PRIVATE ... available only for that user only for multiple runs of that application. Other users clearly will not be able to take advantage of that. So ideally will be wasting space (local resource cache) by replicating the same file again and again. ** PUBLIC... there will be only one copy of individual files of the application say APP_1..GOOD ..in the sense it will be accessible to all the users...But for secured clusters; users of different application (say APP_2) containers can then gain easy access to this applications (APP_1) private files and potentially may modify it. So clearly we don't have any solution today to solve the above problem with existing RESOURCE_LOCALIZATION_TYPES without effectively using space. Therefore we need something like GROUP to address this scenario. Thoughts?? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1079) Fix progress bar for long-lived services in YARN
Chris Riccomini created YARN-1079: - Summary: Fix progress bar for long-lived services in YARN Key: YARN-1079 URL: https://issues.apache.org/jira/browse/YARN-1079 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Chris Riccomini YARN currently shows a progress bar for jobs in its web UI. This is non-sensical for long-lived services, which have no concept of progress. For example, with Samza, we have stream processors which run for an indefinite amount of time (sometimes forever). YARN should support jobs without a concept of progress. Some discussion about this is on YARN-896. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler
[ https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743986#comment-13743986 ] Roger Hoover commented on YARN-466: --- @[~zjshen], yes, I'm referring to the MapReduce Application Master. Thanks for looking into this and for sharing what you've found so far. Slave hostname mismatches in ResourceManager/Scheduler -- Key: YARN-466 URL: https://issues.apache.org/jira/browse/YARN-466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Roger Hoover Assignee: Zhijie Shen The problem is that the ResourceManager learns the hostname of a slave node when the NodeManager registers itself, and it seems the node manager is getting the hostname by asking the OS. When a job is submitted, I think the ApplicationMaster learns the hostname by doing a reverse DNS lookup based on the slaves file. Therefore, the ApplicationMaster submits requests for containers using the fully qualified domain name (node1.foo.com) but the scheduler uses the OS hostname (node1) when checking to see if any requests are node-local. The result is that node-local requests are never found using this method of searching for node-local requests: ResourceRequest request = application.getResourceRequest(priority, node.getHostName()); I think it's unfriendly to ask users to make sure they configure hostnames to match fully qualified domain names. There should be a way for the ApplicationMaster and NodeManager to agree on the hostname. Steps to Reproduce: 1) Configure the OS hostname on slaves to differ from the fully qualified domain name. For example, if the FQDN for the slave is node1.foo.com, set the hostname on the node to be just node1. 2) On submitting a job, observe that the AM submits resource requests using the FQDN (e.g. node1.foo.com). You can add logging to the allocate() method of whatever scheduler you're using: for (ResourceRequest req: ask) { LOG.debug(String.format("Request %s for %d containers on %s", req, req.getNumContainers(), req.getHostName())); } 3) Observe that when the scheduler checks for node locality (in the handle() method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g. node1). NOTE: if you're using FifoScheduler, this bug needs to be fixed first (https://issues.apache.org/jira/browse/YARN-412). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
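For illustration only (this is not the YARN-466 patch), the mismatch disappears if both sides normalize names to a canonical form before comparing; the hypothetical helper below shows the idea using standard Java resolution.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical normalizer: if both the NodeManager registration path and the
// scheduler's locality lookup canonicalized names this way, "node1" and
// "node1.foo.com" would compare equal when DNS is configured for the host.
public final class HostNameNormalizer {
  private HostNameNormalizer() {}

  public static String canonicalize(String host) {
    try {
      // Forward + reverse lookup; returns the fully qualified name,
      // e.g. "node1.foo.com", when DNS is set up correctly.
      return InetAddress.getByName(host).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return host; // fall back to the name we were given
    }
  }
}
{code}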
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744003#comment-13744003 ] Chuan Liu commented on YARN-1078: - bq. -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch seems regressing on Linux. I will investigate the failure. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1077) TestContainerLaunch fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744024#comment-13744024 ] Hadoop QA commented on YARN-1077: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598710/YARN-1077.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1739//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1739//console This message is automatically generated. TestContainerLaunch fails on Windows Key: YARN-1077 URL: https://issues.apache.org/jira/browse/YARN-1077 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1077.2.patch, YARN-1077.patch Several cases in this unit tests fail on Windows. (Append error log at the end.) testInvalidEnvSyntaxDiagnostics fails because the difference between cmd and bash script error handling. If some command fails in the cmd script, cmd will continue execute the the rest of the script command. Error handling needs to be explicitly carried out in the script file. The error code of the last command will be returned as the error code of the whole script. In this test, some error happened in the middle of the cmd script, the test expect an exception and non-zero error code. In the cmd script, the intermediate errors are ignored. The last command call succeeded and there is no exception. testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test. testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906. {noformat} --- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch --- Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec FAILURE! testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269) ... testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec FAILURE! 
junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314) ... testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec FAILURE! junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744025#comment-13744025 ] Jian He commented on YARN-881: -- Hi [~sandyr], do you have more comments ? Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
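For reference, the ordering the description asks about would look like the sketch below, a simplified stand-in class rather than the actual org.apache.hadoop.yarn.api.records.Priority or the committed patch, assuming a numerically smaller value means a higher priority:
{code}
// Simplified stand-in for illustration; the real Priority is an abstract YARN record.
public class SimplePriority implements Comparable<SimplePriority> {
  private final int priority;

  public SimplePriority(int priority) {
    this.priority = priority;
  }

  public int getPriority() {
    return priority;
  }

  @Override
  public int compareTo(SimplePriority other) {
    // Reversed subtraction: a smaller number (higher priority) compares as greater.
    return other.getPriority() - this.getPriority();
  }
}
{code}
Plain subtraction is fine for the small non-negative values priorities take; a general-purpose comparator would guard against integer overflow instead.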
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744029#comment-13744029 ] Steve Loughran commented on YARN-896: - Chris, I use the bar today as a measure of expected nodes vs. actual, i.e. what percentage of the goal of work has been met, which is free to vary up and down with node failures; the percent bar is free to go in both directions. YARN-1039 already proposes adding a flag to say long-lived, so that future versions of YARN can behave differently. This could do more than the GUI: in particular, YARN-3 cgroup limits would be something you may want to turn on for services, to limit their RAM and CPU to exactly what they asked for. If a long-lived service underestimates its requirements, the impact on the node is worse than if a short-lived container does it; for the latter you may want to be more forgiving. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1080) Standardize help message for required parameter of $ yarn logs
Tassapol Athiapinya created YARN-1080: - Summary: Standardize help message for required parameter of $ yarn logs Key: YARN-1080 URL: https://issues.apache.org/jira/browse/YARN-1080 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
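Assuming the CLI builds its options with Apache Commons CLI, one hedged sketch of the proposed behavior (hypothetical class and strings, not the actual LogsCLI change) is to keep -applicationId out of the general options and show it in the usage syntax line instead:
{code}
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

// Hypothetical sketch: print the usage proposed above, with the required
// -applicationId in the syntax line rather than listed as an optional flag.
public final class LogsCliHelp {
  private LogsCliHelp() {}

  public static void printUsage() {
    Options opts = new Options();
    opts.addOption(new Option("appOwner", true,
        "AppOwner (assumed to be current user if not specified)"));
    opts.addOption(new Option("containerId", true,
        "ContainerId (must be specified if node address is specified)"));
    opts.addOption(new Option("nodeAddress", true,
        "NodeAddress in the format nodename:port (must be specified if container id is specified)"));
    new HelpFormatter().printHelp(
        "yarn logs -applicationId <application ID> [OPTIONS]", opts);
  }
}
{code}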
[jira] [Commented] (YARN-1077) TestContainerLaunch fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744049#comment-13744049 ] Chuan Liu commented on YARN-1077: - bq. -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: I will take a look at the failure. TestContainerLaunch fails on Windows Key: YARN-1077 URL: https://issues.apache.org/jira/browse/YARN-1077 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1077.2.patch, YARN-1077.patch Several cases in this unit tests fail on Windows. (Append error log at the end.) testInvalidEnvSyntaxDiagnostics fails because the difference between cmd and bash script error handling. If some command fails in the cmd script, cmd will continue execute the the rest of the script command. Error handling needs to be explicitly carried out in the script file. The error code of the last command will be returned as the error code of the whole script. In this test, some error happened in the middle of the cmd script, the test expect an exception and non-zero error code. In the cmd script, the intermediate errors are ignored. The last command call succeeded and there is no exception. testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test. testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906. {noformat} --- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch --- Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec FAILURE! testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269) ... testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314) ... testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec FAILURE! junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500) ... testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec FAILURE! 
junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601) ... {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1080) Improve help message for $ yarn logs
[ https://issues.apache.org/jira/browse/YARN-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tassapol Athiapinya updated YARN-1080: -- Description: There are 2 parts I am proposing in this jira. They can be fixed together in one patch. 1. Standardize help message for required parameter of $ yarn logs YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} 2. Add description for help command. As far as I know, a user cannot get logs for running job. Since I spent some time trying to get logs of running applications, it should be nice to say this in command description. {code:title=proposed help} Retrieve logs for completed/killed YARN application usage: general options are... {code} was: YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} Summary: Improve help message for $ yarn logs (was: Standardize help message for required parameter of $ yarn logs) Improve help message for $ yarn logs Key: YARN-1080 URL: https://issues.apache.org/jira/browse/YARN-1080 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta There are 2 parts I am proposing in this jira. 
They can be fixed together in one patch. 1. Standardize help message for required parameter of $ yarn logs YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node
[jira] [Commented] (YARN-49) Improve distributed shell application to work on a secure cluster
[ https://issues.apache.org/jira/browse/YARN-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744054#comment-13744054 ] Omkar Vinit Joshi commented on YARN-49: --- Yes, it is not working because of the missing token propagation. I thought it had been fixed, but it has not. Improve distributed shell application to work on a secure cluster - Key: YARN-49 URL: https://issues.apache.org/jira/browse/YARN-49 Project: Hadoop YARN Issue Type: Sub-task Components: applications/distributed-shell Reporter: Hitesh Shah Assignee: Omkar Vinit Joshi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
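The usual shape of that propagation on a secure cluster looks roughly like the sketch below (an assumed pattern, not taken from a YARN-49 patch): serialize the submitter's credentials and attach them to each ContainerLaunchContext so the launched containers can authenticate to HDFS.
{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Sketch of delegation-token propagation: copy the current user's tokens into
// the launch context so the launched container inherits them.
public final class TokenPropagation {
  private TokenPropagation() {}

  public static void attachTokens(ContainerLaunchContext ctx) throws Exception {
    Credentials credentials =
        UserGroupInformation.getCurrentUser().getCredentials();
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    ctx.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
  }
}
{code}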
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744061#comment-13744061 ] Chris Riccomini commented on YARN-896: -- [~stev...@iseran.com] I've linked the JIRAs as relates to. The progress behavior you're describing is somewhat reasonable, but a bit unintuitive. Still feels like a hack. If that's the route we want to go, we should change the UI accordingly. If you think YARN-1079 is a dupe, feel free to close and update YARN-1039 with UI notes. Regarding CGroup limits, have a look at YARN-810. Might be related to what you're saying. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1081) Minor improvement to output header for $ yarn node -list
Tassapol Athiapinya created YARN-1081: - Summary: Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta The output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding. {code:title=current output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code} {code:title=proposed output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744084#comment-13744084 ] Sandy Ryza commented on YARN-881: - Lgtm, +1 Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744114#comment-13744114 ] Jian He commented on YARN-881: -- [~sandyr], can you commit this also ? thanks! Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
Arpit Gupta created YARN-1082: - Summary: Secure RM with recovery enabled and rm state store on hdfs fails with gss exception Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744121#comment-13744121 ] Arpit Gupta commented on YARN-1082: --- Here are the logs {code} 2013-08-17 17:32:08,272 INFO resourcemanager.ResourceManager (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT] 2013-08-17 17:32:08,544 DEBUG service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state INITED 2013-08-17 17:32:08,683 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service Dispatcher 2013-08-17 17:32:08,685 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:rollMasterKey(105)) - Rolling master-key for amrm-tokens 2013-08-17 17:32:08,690 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer 2013-08-17 17:32:08,691 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service AMLivelinessMonitor 2013-08-17 17:32:08,691 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service AMLivelinessMonitor 2013-08-17 17:32:08,694 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer 2013-08-17 17:32:08,699 INFO security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:init(75)) - ContainerTokenKeyRollingInterval: 8640ms and ContainerTokenKeyActivationDelay: 90ms 2013-08-17 17:32:08,704 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:init(77)) - NMTokenKeyRollingInterval: 8640ms and NMTokenKeyActivationDelay: 90ms 2013-08-17 17:32:08,738 DEBUG service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state INITED 2013-08-17 17:32:08,847 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(157)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler 2013-08-17 17:32:08,848 DEBUG service.AbstractService (AbstractService.java:start(197)) - Service Dispatcher is started 2013-08-17 17:32:09,084 DEBUG security.Groups (Groups.java:getUserToGroupsMappingService(180)) - Creating new Groups object 2013-08-17 17:32:09,088 DEBUG util.NativeCodeLoader (NativeCodeLoader.java:clinit(46)) - Trying to load the custom-built native-hadoop library... 
2013-08-17 17:32:09,089 DEBUG util.NativeCodeLoader (NativeCodeLoader.java:clinit(50)) - Loaded the native-hadoop library 2013-08-17 17:32:09,089 DEBUG security.JniBasedUnixGroupsMapping (JniBasedUnixGroupsMapping.java:clinit(50)) - Using JniBasedUnixGroupsMapping for Group resolution 2013-08-17 17:32:09,090 DEBUG security.JniBasedUnixGroupsMappingWithFallback (JniBasedUnixGroupsMappingWithFallback.java:init(44)) - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping 2013-08-17 17:32:09,090 DEBUG security.Groups (Groups.java:init(66)) - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=30 2013-08-17 17:32:09,097 DEBUG security.UserGroupInformation (UserGroupInformation.java:login(176)) - hadoop login 2013-08-17 17:32:09,097 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(125)) - hadoop login commit 2013-08-17 17:32:09,098 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(139)) - using kerberos user:null 2013-08-17 17:32:09,099 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(155)) - using local user:UnixPrincipal: yarn 2013-08-17 17:32:09,101 DEBUG security.UserGroupInformation (UserGroupInformation.java:getLoginUser(696)) - UGI loginUser:yarn (auth:KERBEROS) 2013-08-17 17:32:09,216 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(326)) - dfs.client.use.legacy.blockreader.local = false 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(329)) - dfs.client.read.shortcircuit = true 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(332)) - dfs.client.domain.socket.data.traffic = false 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(335)) - dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket 2013-08-17 17:32:09,234 DEBUG hdfs.HAUtil (HAUtil.java:cloneDelegationTokenForLogicalUri(276)) - No HA service delegation token found for logical URI hdfs://host/apps/yarn/recovery 2013-08-17 17:32:09,235 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(326)) - dfs.client.use.legacy.blockreader.local = false 2013-08-17 17:32:09,235 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(329)) - dfs.client.read.shortcircuit = true 2013-08-17
[jira] [Created] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
yeshavora created YARN-1083: --- Summary: ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
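As a hedged illustration of the fail-fast validation proposed above (a minimal sketch, assuming the check runs while the ResourceManager initializes; the class name, the heartbeat property key, and the default values are illustrative assumptions, not code from YARN):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper; a real check would live inside the RM init path.
public class NodeLivenessConfigValidator {
  public static void validate(Configuration conf) {
    // Property names follow this issue; the heartbeat key and defaults are assumed.
    long expiryMs = conf.getLong("yarn.nm.liveness-monitor.expiry-interval-ms", 600000L);
    long heartbeatMs = conf.getLong("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000L);
    if (expiryMs < heartbeatMs) {
      // Fail fast at startup instead of silently marking every NM as a lost node.
      throw new IllegalArgumentException("yarn.nm.liveness-monitor.expiry-interval-ms ("
          + expiryMs + " ms) must not be smaller than the NM heartbeat interval ("
          + heartbeatMs + " ms)");
    }
  }
}
{code}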
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yeshavora updated YARN-1083: Affects Version/s: 2.1.0-beta ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: yeshavora If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744154#comment-13744154 ] Arpit Gupta commented on YARN-1082: --- It looks like we try to interact with hdfs before the rm has logged in using the keytab. Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1083: Labels: newbie (was: ) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: yeshavora Labels: newbie If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1083: Component/s: resourcemanager ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: yeshavora Labels: newbie If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744252#comment-13744252 ] Jian He commented on YARN-1082: --- Yes, specifically, we should create the state store base directories after doSecureLogin() inside ResourceManager.serviceStart() has been called. So I propose to augment RMStateStore to extend the service model, where creating the base dirs can be performed inside serviceStart(). Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
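As a hedged sketch of the direction described in this comment (not the committed patch), the store could become a Service whose serviceStart() creates the base directories, i.e. after the RM has already performed doSecureLogin(); the class name and constructor are assumptions for illustration:
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.service.AbstractService;

// Hypothetical Service-backed store; only the lifecycle wiring is shown.
public abstract class ServiceBackedStateStore extends AbstractService {
  private final Path baseDir; // e.g. a path under hdfs://host/apps/yarn/recovery

  protected ServiceBackedStateStore(String name, Path baseDir) {
    super(name);
    this.baseDir = baseDir;
  }

  @Override
  protected void serviceStart() throws Exception {
    // Runs after ResourceManager.serviceStart() has called doSecureLogin(),
    // so these HDFS calls carry the RM's Kerberos credentials.
    FileSystem fs = baseDir.getFileSystem(getConfig());
    fs.mkdirs(baseDir);
    super.serviceStart();
  }
}
{code}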
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744306#comment-13744306 ] Vinod Kumar Vavilapalli commented on YARN-1006: --- +1, the patch looks good to me. Tested this on a single node and the bug's gone. Checking this in. Nodes list web page on the RM web UI is broken -- Key: YARN-1006 URL: https://issues.apache.org/jira/browse/YARN-1006 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Attachments: YARN-1006.1.patch The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not showing in the correct format/style. 2. If we restart the NM, the node list is not refreshed; it just adds the newly started NM to the list. The old NMs' information still remains. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-905: - Attachment: YARN-905.patch retrigger the QA server Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744346#comment-13744346 ] Hudson commented on YARN-1006: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4292 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4292/]) YARN-1006. Fixed broken rendering in the Nodes list web page on the RM web UI. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515629) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java Nodes list web page on the RM web UI is broken -- Key: YARN-1006 URL: https://issues.apache.org/jira/browse/YARN-1006 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1006.1.patch The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not showing in the correct format/style. 2. If we restart the NM, the node list is not refreshed; it just adds the newly started NM to the list. The old NMs' information still remains. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744391#comment-13744391 ] Hadoop QA commented on YARN-905: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598850/YARN-905.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1740//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1740//console This message is automatically generated. Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744493#comment-13744493 ] Carlo Curino commented on YARN-1021: Sorry for the delay. I went over the patch today together with Chris Douglas and here is some input from both of us. I generally like the effort, and the live visualization is really neat. Also, making it into a completely separate tool is convenient/safe. The main limitations I see in this simulator are:
* it only simulates the Scheduler code, mocking out most of the RM, and all AM and NM communication, submissions...
* if I am not mistaken, it runs at wall-clock time (not faster)
* it does not run the monitors which are needed for simulating preemption in the CapacityScheduler
An alternative approach that we explored was to hijack the Clocks around the RM and drive them using a discrete event simulation, thus exercising more of the RM code, protocols, etc., and enabling faster-than-wall-clock speeds (though not trivial to achieve). We have some working but not polished code in this space, which we could probably provide if you think it might be integrated/leveraged. Ignoring the alternative approaches and broader spectrum we mentioned above, there are a few issues with the current patch:
* It should be possible to consistently replay (seed RANDOM)
* Using the Rumen reader (JobProducer, etc.) instead of parsing json directly seems cleaner. Also, we have a synth load generator which we will release soon that implements the JobProducer/JobStory interface (might be nice to use that to drive your simulations)
* LICENSE/NOTICE should be updated to include the BSD-like licenses you bring in with the new libraries
* It seems somewhat hard to detect regressions w/ trunk since it:
** mocks away much of the AM/NM/RM
** has few unit tests
** does not simulate important behaviors in the AM (no slow start, headroom, etc.)
** does not exercise failures, timeouts, etc.
Smaller issues:
* some javadoc @param unpopulated
* why a dependency on another metrics package, instead of Hadoop's?
* why NodeUpdateSchedulerEventWrapper? Doesn't seem to add anything...
* use ResourceCalculator instead of manually adjusting Resources from RR
* initMetrics is a very large method...
* SLSWebApp is a wall of string appends. I am not very web savvy, but I believe there should be cleaner ways to generate this. This seems hard to maintain/evolve.
I hope this helps. I will be traveling abroad for a couple of weeks so I might be slow/unresponsive. Altogether, since it is rather on the side, I am not too concerned about it; the suggestions are mostly to make sure it is really useful and that people can use it / maintain it over time. If committed as is it will do no harm, but I think it risks being dropped in, used twice for FairScheduler work, and then losing relevance and getting out of sync with trunk.
Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads.
Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating it in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep tracking of scheduler
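As a hedged illustration of the clock-hijacking alternative mentioned in the review above, here is a minimal sketch of a manually driven Clock that a discrete-event harness could advance instead of running at wall-clock speed; the class name and the advancing method are assumptions, not code from the simulator patch or from YARN:
{code:java}
import org.apache.hadoop.yarn.util.Clock;

// Hypothetical controllable clock for driving RM components in simulated time.
public class SteppableClock implements Clock {
  private volatile long now;

  public SteppableClock(long startMillis) {
    this.now = startMillis;
  }

  @Override
  public long getTime() {
    return now; // callers see simulated time, not System.currentTimeMillis()
  }

  // The simulation loop calls this to jump straight to the next scheduled event.
  public void advance(long deltaMillis) {
    now += deltaMillis;
  }
}
{code}
Components that accept a Clock could then be driven as fast as events can be processed, which is the faster-than-wall-clock behavior the comment refers to.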
[jira] [Updated] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1082: Priority: Blocker (was: Major) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1082: Target Version/s: 2.1.1-beta Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1081) Minor improvement to output header for $ yarn node -list
[ https://issues.apache.org/jira/browse/YARN-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1081: Labels: newbie (was: ) Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Priority: Minor Labels: newbie Fix For: 2.1.0-beta Output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought that this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
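A hedged sketch of what the proposed header change could look like in the CLI code (illustrative only; this is not the actual NodeCLI implementation, and the column widths are made up):
{code:java}
import java.io.PrintWriter;

// Hypothetical snippet showing only the header line being discussed.
public class NodeListHeaderExample {
  public static void printHeader(PrintWriter out) {
    // "Number-of-Running-Containers" makes it clear the column is a count,
    // not a container ID that could be reused in other YARN commands.
    out.printf("%16s\t%15s\t%17s\t%28s%n",
        "Node-Id", "Node-State", "Node-Http-Address", "Number-of-Running-Containers");
  }
}
{code}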
[jira] [Updated] (YARN-1081) Minor improvement to output header for $ yarn node -list
[ https://issues.apache.org/jira/browse/YARN-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1081: Priority: Minor (was: Major) Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Priority: Minor Fix For: 2.1.0-beta Output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought that this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-1082: - Assignee: Vinod Kumar Vavilapalli (was: Jian He) Taking this over for a quick fix. Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Vinod Kumar Vavilapalli Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744583#comment-13744583 ] Omkar Vinit Joshi commented on YARN-1076: - Hi [~maysamyabandeh] did you find this issue by code walk-through, or did you face it in your cluster? Related: YARN-957? RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
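To make the stuck condition concrete, here is a hedged, self-contained illustration of the predicate quoted in the description (the numbers are made up; this is not the LeafQueue code itself):
{code:java}
// With all 3 required containers already reserved and no re-reservation
// attempts (so starvation stays 0), the predicate is false and remains false,
// which is the stuck state this issue describes.
public class NeedContainersExample {
  static boolean needContainers(int starvation, int requiredContainers, int reservedContainers) {
    return ((starvation + requiredContainers) - reservedContainers) > 0;
  }

  public static void main(String[] args) {
    System.out.println(needContainers(0, 3, 3)); // false -> newly free nodes keep being skipped
  }
}
{code}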
[jira] [Created] (YARN-1084) RM restart does not work for map only job
yeshavora created YARN-1084: --- Summary: RM restart does not work for map only job Key: YARN-1084 URL: https://issues.apache.org/jira/browse/YARN-1084 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora A map-only job (randomwriter, randomtextwriter) restarts from scratch [0% map 0% reduce] after an RM restart. It should resume from its last state when the RM restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1085) Yarn and MRv2 should do HTTP client authentication in kerberos setup.
Jaimin D Jetly created YARN-1085: Summary: Yarn and MRv2 should do HTTP client authentication in kerberos setup. Key: YARN-1085 URL: https://issues.apache.org/jira/browse/YARN-1085 Project: Hadoop YARN Issue Type: Task Components: nodemanager, resourcemanager Reporter: Jaimin D Jetly In a kerberos setup, an HTTP client is expected to authenticate to kerberos before the user is allowed to browse any information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1085) Yarn and MRv2 should do HTTP client authentication in kerberos setup.
[ https://issues.apache.org/jira/browse/YARN-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-1085: --- Assignee: Omkar Vinit Joshi Yarn and MRv2 should do HTTP client authentication in kerberos setup. - Key: YARN-1085 URL: https://issues.apache.org/jira/browse/YARN-1085 Project: Hadoop YARN Issue Type: Task Components: nodemanager, resourcemanager Reporter: Jaimin D Jetly Assignee: Omkar Vinit Joshi Labels: security In a kerberos setup, an HTTP client is expected to authenticate to kerberos before the user is allowed to browse any information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744683#comment-13744683 ] Maysam Yabandeh commented on YARN-1076: --- Hi [~ojoshi]. I am observing the problem with a unit test using MiniYarnCluster. The explanation, however, is based solely on a code walk-through. I did not submit the test case since the problem did not always show up, due to the non-determinism in MiniYarnCluster. Anyway, I see that you have already covered that in the objectives of YARN-957: | Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744690#comment-13744690 ] Omkar Vinit Joshi commented on YARN-1076: - If it is similar then we can close this as a duplicate. Can you try the YARN-957 patch locally and see if it fixes your problem too? Thanks. RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-879) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-879: Attachment: YARN-879-v2.patch The v2 patch fixes the unit tests in TestCapacityScheduler and TestResourceManager and illustrates how test/resourcemanager.Application works. Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources() - Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch getResources() is supposed to return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, it will definitely cause an NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
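As a hedged illustration of the failure mode described above (the names and the logging flag are made up; this is not the actual test code): returning null and then dereferencing the result in a debug-only path fails only when debug logging is on, which is why the NPE is easy to miss.
{code:java}
import java.util.List;

public class GetResourcesNpeExample {
  // Stand-in for a getResources() that returns null instead of the allocated containers.
  static List<String> getResources() {
    return null; // should be the containers allocated by the RM
  }

  public static void main(String[] args) {
    boolean debugEnabled = true; // stands in for LOG.isDebugEnabled()
    List<String> resources = getResources();
    if (debugEnabled) {
      // NullPointerException here, but only when debug logging is enabled.
      System.out.println("Got " + resources.size() + " resources");
    }
  }
}
{code}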