[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757503#comment-13757503 ] Konstantin Boudnik commented on YARN-696: - #1 - good one: makes code way more readable. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
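For illustration, a minimal Java client for the proposed call might look like the sketch below. The comma-separated {{states}} parameter, the {{rmhost:8088}} address and the raw JSON printing are assumptions made for this example, not the exact API of the attached patch.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical single call asking for FINISHED and KILLED apps together,
    // instead of issuing one REST call per state.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=FINISHED,KILLED");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // raw JSON listing of the matching applications
      }
    } finally {
      in.close();
      conn.disconnect();
    }
  }
}
{code}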
[jira] [Moved] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved HDFS-5161 to YARN-1142: Component/s: (was: test) Fix Version/s: (was: 2.1.1-beta) 2.1.1-beta Target Version/s: 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: (was: 2.1.0-beta) 2.1.0-beta Key: YARN-1142 (was: HDFS-5161) Project: Hadoop YARN (was: Hadoop HDFS) MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when RM NMs run in the same process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757597#comment-13757597 ] Steve Loughran commented on YARN-1001: -- As long as everyone is happy with the URL structure, yes. If it were just one app type at a time I'd push for having the app in the URL. One thing I'd like to see is tests to verify that it works for app types containing the following characters: space, ?, &, ',', % and \. There are no restrictions on the type of an app (unless someone wants to add those restrictions right now, *which may not be a bad idea*), so you need to make sure the unusual characters make it all the way to this API and back. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757599#comment-13757599 ] J.Andreina commented on YARN-842: - Unable to view the jobs in IE9 both at the RM and JHS UI. Currently using the hadoop-2.1.0-beta version. IE version: 9.0.8112.16421. Attached screenshots of the RM and JHS UI. Resource Manager Node Manager UI's doesn't work with IE - Key: YARN-842 URL: https://issues.apache.org/jira/browse/YARN-842 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Devaraj K Assignee: Devaraj K
{code:xml}
Webpage error details

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC

Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM and NM UIs are not working with IE, showing the above error for every link on the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1143) Restrict the names that apps and types can have
Steve Loughran created YARN-1143: Summary: Restrict the names that apps and types can have Key: YARN-1143 URL: https://issues.apache.org/jira/browse/YARN-1143 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor YARN-1001 is an example of a RESTy API to the RM's list of apps and app types -and it shows that we may want to add some restrictions on the characters allowed in an app name or type (or at least forbid some) -before it is too late. If we don't do that, then tests should verify that you can have apps with high-unicode names as well as other troublesome characters -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
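A restriction like the one proposed could be enforced with a single pattern check at submission time. The sketch below is purely hypothetical — the allowed character set and length limit are placeholders, not anything agreed on this JIRA.

{code:java}
import java.util.regex.Pattern;

public final class AppTypeValidator {
  // Hypothetical rule: letters, digits, '.', '_' and '-' only, 1-64 characters.
  // The real restriction (if any) would be decided on YARN-1143.
  private static final Pattern ALLOWED = Pattern.compile("^[A-Za-z0-9._-]{1,64}$");

  private AppTypeValidator() {
  }

  public static boolean isValidNameOrType(String value) {
    return value != null && ALLOWED.matcher(value).matches();
  }
}
{code}

Anything rejected by such a check (high-unicode names, '&', '%', spaces) is exactly what tests would need to exercise if no restriction is added.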
[jira] [Assigned] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-842: -- Assignee: (was: Devaraj K) Resource Manager Node Manager UI's doesn't work with IE - Key: YARN-842 URL: https://issues.apache.org/jira/browse/YARN-842 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Devaraj K
{code:xml}
Webpage error details

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC

Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM and NM UIs are not working with IE, showing the above error for every link on the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757652#comment-13757652 ] Hudson commented on YARN-1074: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #322 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/322/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757654#comment-13757654 ] Hudson commented on YARN-1124: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #322 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/322/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we list only applications in the RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted, accepted or running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
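The same default set can also be obtained programmatically. The following is only a sketch using the public YarnClient/ApplicationReport API and filtering on the client side; it mirrors the new/submitted/accepted/running default described above rather than using any new CLI option.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListActiveApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // "Active" here means not yet FINISHED/FAILED/KILLED, matching the
      // default set discussed in YARN-1124.
      EnumSet<YarnApplicationState> active = EnumSet.of(
          YarnApplicationState.NEW, YarnApplicationState.SUBMITTED,
          YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
      for (ApplicationReport report : yarnClient.getApplications()) {
        if (active.contains(report.getYarnApplicationState())) {
          System.out.println(report.getApplicationId() + "\t"
              + report.getName() + "\t" + report.getYarnApplicationState());
        }
      }
    } finally {
      yarnClient.stop();
    }
  }
}
{code}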
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757710#comment-13757710 ] Junping Du commented on YARN-311: - Thanks for the review! [~tucu00]
bq. If we make totalCapability volatile then we don't need to use a read/write lock.
Yes. Making it volatile sounds better, as locking the whole object is not necessary. Will update the patch soon.
bq. Does this mean that if the node is restarted we lose the capacity correction done thru the RM admin API for that node?
Yes and no. It is correct that this patch does not guarantee that the capacity correction persists through an NM restart, but the other JIRA (YARN-998) under the same umbrella will address the persistence issue. My current thinking is that we can cache a mapping in the RM of NodeID -> updatedResource, which is updated by the RM admin call, and the NM's registration after restart will check whether an updated resource is there before registering the node's resource. Does that make sense to you? Maybe we can discuss more options in YARN-998. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this JIRA we will only contain changes in the scheduler. For design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757749#comment-13757749 ] Hudson commented on YARN-1074: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1512 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1512/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Attachment: YARN-311-v6.2.patch Address recent comments from Alejandro on replacing read/write lock with volatile on setCapability in RMNodeImpl. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this jira, we will only contain changes in scheduler. For design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
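As a simplified illustration of the volatile-versus-lock point above (not the actual RMNodeImpl code), replacing a read/write lock with a volatile reference for a single field looks like this:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch only: a volatile reference makes a single-field read/write safe
// without taking a read/write lock around it.
public class NodeCapabilityHolder {
  private volatile Resource totalCapability;

  public Resource getTotalCapability() {
    return totalCapability;            // always sees the latest published reference
  }

  public void setTotalCapability(Resource capability) {
    this.totalCapability = capability; // single reference write, visible to all readers
  }
}
{code}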
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757751#comment-13757751 ] Hudson commented on YARN-1124: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1512 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1512/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we are just listing application in RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted/accepted/running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757787#comment-13757787 ] Thomas Graves edited comment on YARN-1106 at 9/4/13 2:03 PM: - Commented on that jira, I don't see how that fixes the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. was (Author: tgraves): Commented on that jira, I don't see how that fixing the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757791#comment-13757791 ] Hadoop QA commented on YARN-311:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601377/YARN-311-v6.2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1829//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1829//console
This message is automatically generated. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this JIRA we will only contain changes in the scheduler. For design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff No problem Zhijie, they are great comments, thanks. I have applied the changes and broken the lines at 80 characters where I can. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757787#comment-13757787 ] Thomas Graves commented on YARN-1106: - Commented on that jira, I don't see how that fixing the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1140) Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation
[ https://issues.apache.org/jira/browse/YARN-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757788#comment-13757788 ] Thomas Graves commented on YARN-1140: - I understand always linking to the per-app page to try to make it more consistent, but at the same time I don't like that power users will have to click one more time. I also don't see how this solves the issue of the tracking url being a bad link, unless you are also proposing to handle that better on the app page? If an app finishes and doesn't set the history link (for instance a non-mapreduce app), or crashes before it can set it, the tracking url link is still going to go to a bad page. Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation --- Key: YARN-1140 URL: https://issues.apache.org/jira/browse/YARN-1140 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Today, there are so many corner cases, specifically when the AM fails to start, when users will see that the tracking URL is broken or redirected to the per-app page. I am thinking of removing the tracking URL completely from the landing web-page and always forcing users to first jump on to the application page. That way, there is consistency and there will always be one page that users can go to for their app information and then subsequently navigate to the AM page if all went well. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757811#comment-13757811 ] Hudson commented on YARN-1124: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1539 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1539/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we are just listing application in RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted/accepted/running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757809#comment-13757809 ] Hudson commented on YARN-1074: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1539 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1539/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1106: Attachment: YARN-1106.patch The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
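The behaviour asked for in the summary can be pictured as a small fallback check when the AM's tracking URL is recorded. This is only a sketch — the class and method names are invented here and the attached patch may implement it differently:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustration only, not the YARN-1106 patch: if the AM registers with a
// null or empty tracking URL, fall back to the RM's own per-app page.
public class TrackingUrlDefaulter {
  public static String trackingUrlOrAppPage(String trackingUrlFromAM,
      ApplicationId appId, String rmWebAppAddress) {
    if (trackingUrlFromAM == null || trackingUrlFromAM.trim().isEmpty()) {
      return "http://" + rmWebAppAddress + "/cluster/app/" + appId;
    }
    return trackingUrlFromAM;
  }
}
{code}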
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757866#comment-13757866 ] Hadoop QA commented on YARN-1106: -
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601390/YARN-1106.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1831//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1831//console
This message is automatically generated. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757839#comment-13757839 ] Jason Lowe commented on YARN-540: - Sorry for arriving late, but why wouldn't we want to implement choice (1) above? (i.e.: block until store confirms app state is removed). From an AM's perspective, that's the simplest solution. Returning control to the AM early from the unregister is inviting the AM to do bad things wrt. a potential restart (e.g.: MR AM will remove its staging directory, effectively preventing the restart from succeeding and leading the RM to believe the app failed). The unregister call is a terminal call in the AM-RM protocol, so I think it's appropriate for that to not return until the app truly is unregistered. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
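Choice (1) described above amounts to making the store removal synchronous before acknowledging the unregister. The interfaces below are stand-ins invented for this sketch; they are not the real RMStateStore API:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch of the "block until removed" idea, with made-up interfaces.
public class BlockingUnregister {
  /** Minimal stand-in for the RM state store used in this illustration. */
  public interface AppStateStore {
    /** Blocks until the application's saved state has really been removed. */
    void removeApplicationStateSync(ApplicationId appId) throws Exception;
  }

  private final AppStateStore stateStore;

  public BlockingUnregister(AppStateStore stateStore) {
    this.stateStore = stateStore;
  }

  public void finishApplicationMaster(ApplicationId appId) throws Exception {
    // Returning before this completes invites the AM to clean up (e.g. delete
    // its staging dir) while the RM could still relaunch the app after a restart.
    stateStore.removeApplicationStateSync(appId);
  }
}
{code}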
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757904#comment-13757904 ] Daryn Sharp commented on YARN-707: -- Ug, the RM and AM are abusing the same secret manager impl. The RM wants the secret key to be generated, whereas the AM really wants to verify it. 2.x fixed this. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757909#comment-13757909 ] Jason Lowe commented on YARN-707: - bq. Ug, the RM and AM are abusing the same secret manager impl. The RM wants the secret key to be generated, whereas the AM really wants to verify it. 2.x fixed this. Right, this condition as well as the fact that the RM leaks keys in the secret manager for each app (no way to remove them) is not new with this patch as it was already pre-existing in 0.23. IMO those issues should be fixed in another JIRA since they're not introduced by this change. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757890#comment-13757890 ] Daryn Sharp commented on YARN-707: -- Still reviewing, but an initial observation is {{ClientToAMSecretManager#getMasterKey}} is fabricating a new secret key if there is no pre-existing key for the appId. This should be an error condition. The secret manager knows the secret key for the specific app so there's no need to ever generate a secret key, right? Else I can flood the AM with invalid appIds to make it go OOM from generating secret keys for invalid appIds. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
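The point about {{getMasterKey}} can be illustrated with a lookup that refuses to mint keys for unknown applications. This is illustrative only and is not the actual ClientToAMSecretManager code:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.SecretKey;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

// Sketch: on the AM side, a lookup for an unknown appId should fail rather
// than fabricate a new key.
class StrictKeyLookup {
  private final Map<String, SecretKey> masterKeys =
      new ConcurrentHashMap<String, SecretKey>();

  SecretKey getMasterKey(String appId) throws InvalidToken {
    SecretKey key = masterKeys.get(appId);
    if (key == null) {
      // Refuse unknown appIds instead of generating a key, so a flood of
      // bogus appIds cannot grow the map without bound.
      throw new InvalidToken("No master key registered for " + appId);
    }
    return key;
  }
}
{code}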
[jira] [Created] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
Alejandro Abdelnur created YARN-1144: Summary: Unmanaged AMs registering a tracking URI should not be proxy-fied Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757925#comment-13757925 ] Srimanth Gunturi commented on YARN-1001: Ambari's primary use-case is to show in YARN UI the app-type/state distribution as a time-series graph. For this we would make periodic calls to {{/ws/v1/cluster/appscount}}. Apart from that we need similar information for MR2 UI, where a call to {{/ws/v1/cluster/appscount?types=mapreduce}} would be made. For now having these calls should suffice. I am hoping that these calls include both the current/real-time app-type counts, as well as historical information (atleast until last RM restart)? YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
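The periodic-poll use case described above could look roughly like the sketch below. The {{/ws/v1/cluster/appscount}} path is quoted from the comment and is still only a proposal; the host, interval and raw JSON handling are assumptions made for the example:

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AppCountPoller {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Poll the proposed statistics endpoint once a minute, the way a
    // dashboard plotting a time series would.
    scheduler.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          URL url = new URL("http://rmhost:8088/ws/v1/cluster/appscount?types=mapreduce");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestProperty("Accept", "application/json");
          InputStream in = conn.getInputStream();
          try {
            String body = new Scanner(in, "UTF-8").useDelimiter("\\A").next();
            System.out.println(System.currentTimeMillis() + " " + body);
          } finally {
            in.close();
            conn.disconnect();
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }, 0, 60, TimeUnit.SECONDS);
  }
}
{code}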
[jira] [Moved] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe moved MAPREDUCE-5486 to YARN-1145: - Component/s: (was: jobhistoryserver) Assignee: (was: Rohith Sharma K S) Target Version/s: 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: (was: 2.1.1-beta) (was: 2.0.5-alpha) 2.1.1-beta 2.0.5-alpha Key: YARN-1145 (was: MAPREDUCE-5486) Project: Hadoop YARN (was: Hadoop Map/Reduce) Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757926#comment-13757926 ] Daryn Sharp commented on YARN-707: -- Minor:
# {{ClientToAMTokenIdentifier#getUser()}} doesn't do a null check on the client name (because it can't be null) but should perhaps still check isEmpty()?
# Is {{ResourceManager#clientToAMSecretManager}} still needed now that it's in the context?
# Now that the client token is generated in {{RMAppAttemptImpl}} - should it contain the attemptId, not the appId?
Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-1145: Assignee: Rohith Sharma K S Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757824#comment-13757824 ] Hadoop QA commented on YARN-696:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601382/YARN-696.diff against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1830//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1830//console
This message is automatically generated. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757932#comment-13757932 ] Hadoop QA commented on YARN-1145: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600541/MAPREDUCE-5486.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1832//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1832//console
This message is automatically generated. Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757973#comment-13757973 ] Zhijie Shen commented on YARN-1001: --- [~srimanth.gunturi], for /ws/v1/cluster/appscount, does Ambari want to specify multiple states and multiple application-types in the params, and get the per application-type and state count for every combination of one application-type and one state? Or is Ambari fine with making multiple calls, each of which specifies just one (or zero) application-type and state? The two cases make a big difference to the results: the former needs to return a table containing the counts for all application-type and state combinations, while the latter simply returns one number.
bq. I am hoping that these calls include both the current/real-time app-type counts, as well as historical information (at least until last RM restart)?
A count at every constant time interval? Would you please say more about the requirement?
bq. There are no restrictions on the type of an app (unless someone wants to add those restrictions right now, which may not be a bad idea), so you need to make sure the unusual characters make it all the way to this API and back
Sounds like a good idea. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
[ https://issues.apache.org/jira/browse/YARN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757983#comment-13757983 ] Daryn Sharp commented on YARN-1146: --- Note that bug #2 will not self-correct if the following sequence occurs:
# Issue token 1, 2, 3, 4 (seq=4)
# Renew token 2 (seq=2)
# Cancel token 3, 4 (seq=2)
# Stop RM
# Start RM (seq=2) and will issue token 3 and 4 again
The issue is _probably_ benign given the current implementation, but is a bug if anything relies on the sequence number. RM DTSM and RMStateStore mismanage sequence number -- Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems:
# The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect.
# A token renewal will reset the state's sequence number to _that token's_ sequence number.
Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
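One way to picture a fix for bug #2 (illustrative only, not a committed patch) is to treat the stored sequence number as a high-water mark so that a renewal can never move it backwards:

{code:java}
// Sketch: keep the stored sequence number as a high-water mark. Store and
// renew events may arrive out of order with an async store, so never
// overwrite the number with a smaller value.
class SequenceNumberTracker {
  private int latestSequenceNumber = 0;

  synchronized void onTokenStoredOrRenewed(int tokenSequenceNumber) {
    latestSequenceNumber = Math.max(latestSequenceNumber, tokenSequenceNumber);
  }

  synchronized int getLatestSequenceNumber() {
    return latestSequenceNumber;
  }
}
{code}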
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1145: - Target Version/s: 0.23.10, 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: 0.23.9 Summary: Potential file handle leak in aggregated logs web ui (was: Potential file handler leak in JobHistoryServer web ui.) +1 lgtm, will commit this shortly. I noticed this affects 0.23 as well. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757991#comment-13757991 ] Zhijie Shen commented on YARN-696: -- Thanks, Trevor. +1, the patch looks good to me. Enable multiple states to to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed), if no state parameter is specified all states are returned, however if a sub-set of states is required then multiple REST calls are required (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
Daryn Sharp created YARN-1146: - Summary: RM DTSM and RMStateStore mismanage sequence number Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems: # The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect. # A token renewal will reset the state's sequence number to _that token's_ sequence number. Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757994#comment-13757994 ] Bikas Saha commented on YARN-1063: -- Will this be used in secure and non-secure clusters? I dont think I fully understood the privileges of the launcher. By that do we mean the TaskTracker/NodeManager or the winutils process that is launched by the TT/NM. If its the TT/NM then do we end up having a long-running Hadoop service with elevated privileges? Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: trunk-win Environment: Windows Reporter: Kyle Leckie Labels: security Fix For: trunk-win Attachments: YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security tone of a sandboxed process launch be a web browser. Typically the launched process will have a fully restricted token and need to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The Container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched without granting rights to other processes launched on the same machine but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT based executables. This method was ruled out due to the lack of official support for standard windows APIs. At some point in the future windows may support functionality similar to BSD jails or Linux containers, at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub command was added to the set of task commands. Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. 
These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new environment for the new logon. # Launch the new process in a job with the task name specified and using the created logon. # Wait for the JOB to exit. h2. Future work: The following work was scoped out of this check-in: * Support for non-domain users or machines that are not domain joined. * Support for privilege isolation by running the task launcher in a high privilege service with access over an ACLed named pipe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
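For readers unfamiliar with the syntax given in the usage section above, a concrete invocation might look like the following. The task name, account, and command line are made-up values, and the machine must already satisfy the domain-join and privilege requirements listed above:
{noformat}
# Hypothetical invocation following the documented syntax (task name, account and command are illustrative)
winutils task createAsUser task_001 jobuser@EXAMPLE.COM "cmd /c echo hello"
{noformat}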
[jira] [Created] (YARN-1147) Add end-to-end tests for HA
Karthik Kambatla created YARN-1147: -- Summary: Add end-to-end tests for HA Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Fix For: 2.3.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758007#comment-13758007 ] Vinod Kumar Vavilapalli commented on YARN-1145: --- Reader.close() calls BCFile.Reader.close(), which isn't doing anything. Am I missing something? Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1140) Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation
[ https://issues.apache.org/jira/browse/YARN-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758019#comment-13758019 ] Vinod Kumar Vavilapalli commented on YARN-1140: --- bq. I understand always linking to the per app page to try to make it more consistent but at the same time I don't like that power users will have to click one more time. Yeah, but it's a trade off for consistency. I've seen people struggling when failures happen and that is a bigger pain than clicking through one more link. bq. I also don't see how this solves the issue with the tracking url being a bad link, unless you are also proposing to handle that better on the app page? if an app finishes and doesn't set the history link (for instance a non-mapreduce app) or crashes before they can set it, the tracking url link is still going to to go a bad page. We should do a better job fixing such bad links, so yeah +1 for YARN-1106 and the likes. But even without that, this is still fine. Without my proposed change, users will hit bad links and then have no clue of what happened. With the change, they'll land up on the app-page, learn *something* about their apps, and then click the bad link. Net-net, they get more info than they do now. Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation --- Key: YARN-1140 URL: https://issues.apache.org/jira/browse/YARN-1140 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Today, there are so many corner cases, specifically when the AM fails to start, when users will see that the tracking URL is broken or redirected to the per-app page. I am thinking of removing the tracking URL completely from the landing web-page and always force users to first jump on to the application-page. That way, there is consistency and there will always be one page that users can go to for their app information and then subsequently navigate to the AM page if all went well. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758037#comment-13758037 ] Bikas Saha commented on YARN-540: - 1) or 2) are basically the same thing. 1) will block the unregister call until it succeeds. 2) requires the AM to keep looping on unregister until it succeeds. 2) just enables the RM to make the store operation asynchronously and prevent RPC threads from getting blocked. The core issue is that the RM can crash before removing the app from the store. Thus when it restarts it thinks that the app is still running and tries to re-launch it. This is the core issue in this jira and should be a rare event. The MR app master sleeps for 5s before unregistering with the RM and reports success meanwhile to the client. This exacerbates the above rare issue and makes it possible to repro it more often. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758044#comment-13758044 ] Omkar Vinit Joshi commented on YARN-957: Thanks vinod. addressed the comments. bq. Use Resource.newInstance instead of RecordFactory. Fixed bq. The Log message in LeafQueue should be at WARN level fixed bq. The test looks good, but let's not have hard-coded waits like the following in the test Yes changed it. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
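For context on the "Use Resource.newInstance instead of RecordFactory" review comment above, a minimal before/after sketch follows. The values are illustrative and the surrounding class exists only for this example:
{noformat}
// Minimal sketch of the change the review comment asks for (illustrative values only).
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;

public class ResourceNewInstanceSketch {
  public static void main(String[] args) {
    // old style: go through the RecordFactory and set fields one by one
    RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
    Resource viaFactory = recordFactory.newRecordInstance(Resource.class);
    viaFactory.setMemory(2048);
    viaFactory.setVirtualCores(1);

    // preferred: the static factory, memory in MB plus vCores in one call
    Resource viaNewInstance = Resource.newInstance(2048, 1);

    System.out.println(viaFactory + " vs " + viaNewInstance);
  }
}
{noformat}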
[jira] [Updated] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-957: --- Attachment: YARN-957-20130904.1.patch Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758047#comment-13758047 ] Jason Lowe commented on YARN-1145: -- I think the logic behind calling close on the TFile.Reader is consistency -- if an object has a close() method, probably prudent to call as it may not always do nothing in the future. The real fix with this patch is in AggregatedLogsBlock which will call close() on the LogReader which will in turn close the data stream and release the associated socket. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
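The fix pattern Jason describes, closing the LogReader so the underlying stream and its socket are released even when rendering fails, amounts to the usual try/finally idiom. A simplified sketch, assuming a conf and remote log path are in scope; renderLogsFrom() stands in for the real AggregatedLogsBlock rendering code:
{noformat}
// Simplified sketch of the fix pattern described above, not the actual AggregatedLogsBlock code.
AggregatedLogFormat.LogReader reader = null;
try {
  reader = new AggregatedLogFormat.LogReader(conf, remoteAppLogFile);
  renderLogsFrom(reader);                  // any failure here used to leak the stream
} finally {
  if (reader != null) {
    reader.close();                        // releases the underlying data stream and socket
  }
}
{noformat}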
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758096#comment-13758096 ] Jason Lowe commented on YARN-540: - Yes, I realize that 1) and 2) are at a high level accomplishing the same thing. However 2) requires cooperation from the AM which is user code and therefore harder to control while 1) does not. There is the issue of RPC threads getting blocked which may necessitate 2), but otherwise 1) would be preferable since it requires less cooperation/coordination with the AMs. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758115#comment-13758115 ] Hadoop QA commented on YARN-957: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601417/YARN-957-20130904.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1833//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1833//console This message is automatically generated. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758209#comment-13758209 ] Jason Lowe commented on YARN-707: - Thanks for the review, Daryn. bq. ClientToAMTokenIdentifier#getUser() doesn't do a null check on the client name (because it can't be null) but should perhaps still check isEmpty()? Will fix that. bq. Is ResourceManager#clientToAMSecretManager still needed now that it's in the context? Technically no, but all the other pieces of the context are also fields of ResourceManager so it's consistent with those. bq. Now that the client token is generated in RMAppAttemptImpl - should it contain the attemptId, not the appId? The original client tokens in 0.23 were per-app and not per-app-attempt, and I didn't want to change that association as part of this change. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
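A rough sketch of the getUser() guard being discussed above. This is simplified and assumes the identifier stores the client name in a field called clientName; the exact structure of the real patch may differ:
{noformat}
// Simplified sketch of the isEmpty() guard discussed above (not the exact patch).
@Override
public UserGroupInformation getUser() {
  String client = this.clientName.toString();
  if (client.isEmpty()) {          // the field cannot be null, but it may be empty
    return null;
  }
  return UserGroupInformation.createRemoteUser(client);
}
{noformat}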
[jira] [Updated] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-707: Attachment: YARN-707-20130904.branch-0.23.txt Updated patch for branch-0.23 to add isEmpty() check on client token username. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758231#comment-13758231 ] Hadoop QA commented on YARN-1065: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601436/YARN-1065.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1834//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1834//console This message is automatically generated. NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758182#comment-13758182 ] Srimanth Gunturi commented on YARN-1001: [~zjshen], we are expecting {{/ws/v1/cluster/appscount}} to provide all app-types/state-counts in 1 call. We are expecting {{/ws/v1/cluster/appscount?types=mapreduce}} to provide all mapreduce state-counts in 1 call. Apart from that, we need {{/ws/v1/cluster/appscount}} information pushed to Ganglia. Or else Ambari will not be able to show various graphs which are important. We currently populate {{/etc/hadoop/conf/hadoop-metrics2.properties}} file telling RM to push to Ganglia (resourcemanager.sink.ganglia.servers). YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
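For reference, the kind of hadoop-metrics2.properties stanza Srimanth refers to looks roughly like the following. The host and port are placeholders, and the sink class shown is the Ganglia 3.1 sink shipped with the Hadoop metrics2 framework:
{noformat}
# Illustrative snippet of /etc/hadoop/conf/hadoop-metrics2.properties (placeholder host/port)
resourcemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
resourcemanager.sink.ganglia.servers=ganglia-host.example.com:8649
resourcemanager.sink.ganglia.period=10
{noformat}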
[jira] [Commented] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
[ https://issues.apache.org/jira/browse/YARN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758184#comment-13758184 ] Daryn Sharp commented on YARN-1146: --- [~vinodkv] I'm desynch'ing the ADTSM on HADOOP-9930. Is it OK for me to exacerbate this seq number handling? RM DTSM and RMStateStore mismanage sequence number -- Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems: # The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect. # A token renewal will reset the state's sequence number to _that token's_ sequence number. Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1065: Attachment: YARN-1065.8.patch create function getPrefixServiceName in AuxiliaryServiceHelper to eliminate duplicate code NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
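For context on the helper being refactored above: the feature passes auxiliary-service data to the container through its environment, and a launched container would read it back roughly as follows. The service name is illustrative, and this assumes the AuxiliaryServiceHelper.getServiceDataFromEnv(String, Map) signature introduced by this patch:
{noformat}
// Rough sketch of how a launched container could read auxiliary service data from its
// environment via AuxiliaryServiceHelper (the service name is illustrative).
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.util.AuxiliaryServiceHelper;

public class ReadAuxServiceData {
  public static void main(String[] args) {
    ByteBuffer shuffleMeta =
        AuxiliaryServiceHelper.getServiceDataFromEnv("mapreduce_shuffle", System.getenv());
    System.out.println(shuffleMeta == null ? "no data" : shuffleMeta.remaining() + " bytes");
  }
}
{noformat}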
[jira] [Updated] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-957: - Attachment: YARN-957-20130904.2.patch Same patch with trailing white spaces removed. Will commit when Jenkins says okay. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119-v1-b23.patch Patch posted for branch 0.23. Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
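The kind of check being added is along these lines: drive a RUNNING node through a transition and assert the ClusterMetrics counters moved exactly once. This is a hypothetical fragment for illustration, not the attached patch, and it assumes a RUNNING RMNodeImpl named node is already set up in the test:
{noformat}
// Hypothetical example of a ClusterMetrics assertion of the kind described above.
int activeBefore = ClusterMetrics.getMetrics().getNumActiveNMs();
node.handle(new RMNodeEvent(node.getNodeID(), RMNodeEventType.EXPIRE));   // RUNNING -> LOST
Assert.assertEquals(activeBefore - 1, ClusterMetrics.getMetrics().getNumActiveNMs());
Assert.assertEquals(1, ClusterMetrics.getMetrics().getNumLostNMs());
{noformat}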
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758268#comment-13758268 ] Hadoop QA commented on YARN-957: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601443/YARN-957-20130904.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1835//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1835//console This message is automatically generated. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Summary: Add admin support for HA operations (was: Implement YarnHAAdmin for HA specific admin operations) Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Implement YarnHAAdmin along the lines of DFSHAAdmin for HA-specific admin operations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Target Version/s: 2.3.0 (was: 2.1.1-beta) Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla To transitionTo{Active,Standby} etc. we should support admin operations the same way DFS does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
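For comparison, HDFS already exposes these operations through a CLI; something analogous for the RM might eventually look like the second line below. The YARN command name and RM identifiers are purely speculative at this point, only the HDFS line is an existing command:
{noformat}
# Existing HDFS analogue
hdfs haadmin -transitionToActive nn1

# Speculative YARN counterpart (command and RM ids are placeholders, not an existing CLI)
yarn rmadmin -transitionToActive rm1
{noformat}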
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758303#comment-13758303 ] Omkar Vinit Joshi commented on YARN-1107: - Thanks vinod.. bq. In DelegationTokenRenewer, you are breaking the following assumption, which was put in via YARN-280 yeah fixed it. bq. Leave a comment in DelegationTokenRenewer.serviceStart() as to what we are really doing w.r.t pendingTokenForRenewal. Yes added one. bq. Not just in the test-code, can you move the token-short-circuit setting from ClientRMService into DelegationTokenRenewer? fixed. moved the code from ClientRMService to DelegationTokenRenewer. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1148) NM should only return requested auxillary service data to the AM
Xuan Gong created YARN-1148: --- Summary: NM should only return requested auxillary service data to the AM Key: YARN-1148 URL: https://issues.apache.org/jira/browse/YARN-1148 Project: Hadoop YARN Issue Type: Task Reporter: Xuan Gong Right now, StartContainer returns all auxiliary service data to the AM. The AM can set a request through ContainerLaunchContext, and the NM should only return the requested auxiliary service data to the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1148) NM should only return requested auxillary service data to the AM
[ https://issues.apache.org/jira/browse/YARN-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1148: --- Assignee: Xuan Gong NM should only return requested auxillary service data to the AM Key: YARN-1148 URL: https://issues.apache.org/jira/browse/YARN-1148 Project: Hadoop YARN Issue Type: Task Reporter: Xuan Gong Assignee: Xuan Gong Right now, Start container returns all auxillary service data to the AM. AM can set request through ContainerLauchContext, and NM should only return the request auxillary service data to AM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758348#comment-13758348 ] Hudson commented on YARN-1065: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4367 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4367/]) YARN-1065. NM should provide AuxillaryService data to the container (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520135) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758375#comment-13758375 ] Daryn Sharp commented on YARN-707: -- +1 Looks good enough to me. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758308#comment-13758308 ] Bikas Saha commented on YARN-1098: -- Not a big fan of anonymous classes. We should probably create an ActiveServices that extends CompositeService. We can later add transitionToActive() and transitionToStandby() method to this object. Dispatcher can actually also go into ActiveServices for now. We can move it into the main service later on because it looks like that the HAProtocol service will be the only always on service to start with. Jenkins is not happy with the patch. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
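A minimal sketch of the structure Bikas suggests, assuming the eventual patch follows the CompositeService pattern; the names and contents below are illustrative only:
{noformat}
// Minimal sketch of the suggested structure (illustrative, not the eventual patch).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

public class ResourceManager extends CompositeService {

  public ResourceManager() {
    super("ResourceManager");
  }

  /** Services that should only run while this RM is Active. */
  public static class ActiveServices extends CompositeService {
    ActiveServices() {
      super("RMActiveServices");
    }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      // addService(dispatcher); addService(scheduler); ... stateful services are added here
      super.serviceInit(conf);
    }
  }

  // Later, transitionToActive() would start an ActiveServices instance and
  // transitionToStandby() would stop it, while always-on services keep running.
}
{noformat}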
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130904.1.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
Ramya Sunil created YARN-1149: - Summary: NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Fix For: 2.1.1-beta When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-890: -- Assignee: Xuan Gong The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758419#comment-13758419 ] Hadoop QA commented on YARN-1107: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601476/YARN-1107.20130904.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1836//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1836//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1836//console This message is automatically generated. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-707: Fix Version/s: 0.23.10 I committed this to branch-0.23. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 0.23.10, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-prelim.patch I am uploading a preliminary patch that adds admin support for HA operations, for feedback on the approach. The patch is very much along the lines of the HDFS admin implementation and reuses the common code. Outline: # RMHAProtocolService starts an RPC server for HA commands. # yarn rmhaadmin command invokes RMHAdminCLI which extends HAAdmin. I haven't figured out how to use the ClientRMProxy while using HAAdmin yet. Would love to hear any thoughts/inputs on that. Pending tasks: (1) yarn-site, (2) RPC server instantiation through YARNRPC calls like in AdminService. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1068-prelim.patch To transitionTo{Active,Standby} etc. we should support admin operations the same way DFS does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
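For illustration, here is a rough sketch of what the RMHAdminCLI outlined above could look like. It assumes HAAdmin exposes resolveTarget() for subclasses (as the HDFS admin CLI does); the class body is a placeholder and none of it is taken from the attached patch.

{code:java}
import org.apache.hadoop.ha.HAAdmin;
import org.apache.hadoop.ha.HAServiceTarget;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: reuse HAAdmin's command parsing (-transitionToActive,
// -transitionToStandby, -getServiceState) and only teach it how to resolve
// an RM id. The resolveTarget body is a placeholder, not the real patch.
public class RMHAdminCLI extends HAAdmin {

  @Override
  protected HAServiceTarget resolveTarget(String rmId) {
    // Would map an RM id from yarn-site.xml to its HA service address;
    // intentionally left unimplemented in this sketch.
    throw new UnsupportedOperationException("sketch only: cannot resolve " + rmId);
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new YarnConfiguration(), new RMHAdminCLI(), args));
  }
}
{code}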
[jira] [Commented] (YARN-1134) Add support for zipping/unzipping logs while in transit for the NM logs web-service
[ https://issues.apache.org/jira/browse/YARN-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758494#comment-13758494 ] Chris Nauroth commented on YARN-1134: - Is the intent to serve actual compressed files (i.e. it has a .gz extension), or is the intent to layer compression over the HTTP transfer (i.e. the Transfer-Encoding: gzip HTTP header)? The original comment about how it will take a long time to download makes me think the latter is appropriate. Add support for zipping/unzipping logs while in transit for the NM logs web-service --- Key: YARN-1134 URL: https://issues.apache.org/jira/browse/YARN-1134 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli As [~zjshen] pointed out at [YARN-649|https://issues.apache.org/jira/browse/YARN-649?focusedCommentId=13698415page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698415], {quote} For the long running applications, they may have a big log file, such that it will take a long time to download the log file via the RESTful API. Consequently, the HTTP connection may time out before downloading a complete log file. Maybe it is good to zip the log file before sending it, and unzip it after receiving it. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1134) Add support for zipping/unzipping logs while in transit for the NM logs web-service
[ https://issues.apache.org/jira/browse/YARN-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758497#comment-13758497 ] Chris Nauroth commented on YARN-1134: - Also, if the latter is appropriate, then you may want to test the existing code by sending an HTTP request with the header Accept-Encoding: gzip to see if the response comes back compressed. Many web servers support this out of the box, though I'm not sure about Jetty specifically. Add support for zipping/unzipping logs while in transit for the NM logs web-service --- Key: YARN-1134 URL: https://issues.apache.org/jira/browse/YARN-1134 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli As [~zjshen] pointed out at [YARN-649|https://issues.apache.org/jira/browse/YARN-649?focusedCommentId=13698415page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698415], {quote} For the long running applications, they may have a big log file, such that it will take a long time to download the log file via the RESTful API. Consequently, the HTTP connection may time out before downloading a complete log file. Maybe it is good to zip the log file before sending it, and unzip it after receiving it. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
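One quick way to try what is suggested above, sketched with plain java.net; the NM host, port, and log path are placeholders, not taken from the JIRA.

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipLogProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder URL: point this at any NM logs web-service endpoint.
    URL url = new URL("http://nm-host:8042/node/containerlogs/container_x/user/syslog");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept-Encoding", "gzip");

    // If the server (or a filter in front of it) compressed the body, this is "gzip".
    String encoding = conn.getContentEncoding();
    System.out.println("Content-Encoding: " + encoding);

    InputStream body = "gzip".equals(encoding)
        ? new GZIPInputStream(conn.getInputStream())
        : conn.getInputStream();
    body.close();
  }
}
{code}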
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758499#comment-13758499 ] Karthik Kambatla commented on YARN-1098: Thanks Bikas. Will create an ActiveServices inner class and move everything there. TestRMRestart#testAppAttemptTokensRestoredOnRMRestart is flakey with the patch - will investigate further. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
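For reference, a minimal sketch of the separation being discussed, assuming the ActiveServices name from the comment above; the wiring is illustrative, not the attached patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

// Sketch: the stateful services live in an inner composite that only runs while
// the RM is Active; the Always On services stay in the outer service.
public class HaAwareRmSketch extends CompositeService {

  class ActiveServices extends CompositeService {
    ActiveServices() {
      super("RMActiveServices");
      // scheduler, ApplicationMasterService, resource tracker, etc. would be
      // addService()'d here in the real ResourceManager
    }
  }

  private ActiveServices activeServices;

  public HaAwareRmSketch() {
    super("HaAwareRmSketch");
  }

  synchronized void transitionToActive() {
    activeServices = new ActiveServices();
    activeServices.init(new Configuration());
    activeServices.start();
  }

  synchronized void transitionToStandby() {
    if (activeServices != null) {
      activeServices.stop();
      activeServices = null;
    }
  }
}
{code}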
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758585#comment-13758585 ] Xuan Gong commented on YARN-890: When we get totalMemory from ClusterMetricInfo, it has already been rounded up, so we use clusterResource from the ResourceScheduler to get the real cluster memory. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
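The rounded-up total the reporter sees (presumably 5 GB rather than 5 MB) is consistent with rounding 4192 MB up to the next multiple of the 1024 MB minimum allocation; the arithmetic below is an illustration of that assumption, not something taken from the patch.

{code:java}
public class RoundupExample {
  public static void main(String[] args) {
    int nodeMemoryMb = 4192;   // yarn.nodemanager.resource.memory-mb
    int minAllocMb = 1024;     // yarn.scheduler.minimum-allocation-mb

    // Round the node capacity up to the next multiple of the minimum allocation.
    int roundedMb = ((nodeMemoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    System.out.println(roundedMb + " MB, i.e. " + (roundedMb / 1024) + " GB"); // 5120 MB, 5 GB
  }
}
{code}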
[jira] [Updated] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-890: --- Attachment: YARN-890.1.patch The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1149: --- Assignee: Xuan Gong NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.1-beta When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758611#comment-13758611 ] Jian He commented on YARN-540: -- Finally came to the conclusion that we should call removeApplicationState immediately after the attempt unregisters. Combined with MAPREDUCE-5497, this can significantly reduce the race here. Once work-preserving restart is implemented, this jira should not be a problem, as there's no notion of relaunching a new AM in work-preserving restart; the old AM will just spin and resync with the RM after the RM restarts. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.3.patch upload a new patch that - removeApplicationState in RMAppAttempt.AMUnregisteredTransistion and RMApp.FinalTransition - rename RMAppEventType.ATTEMPT_FINISHING to ATTEMPT_UNREGISTERED Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1070: -- Attachment: YARN-1070.3.patch Thanks Vinod for your review. I've updated the patch accordingly. The important change in this patch is that I removed the logic of canceling ContainerLaunch.call(); in call(), I check the container state first, return immediately if the container is not at LOCALIZED, and send CONTAINER_KILLED_ON_REQUEST if necessary. The rationale for checking the container state is that the thread of ContainerLaunch.call() is scheduled and should be executed after the container enters LOCALIZED. As this thread can run in parallel with the thread of ContainerImpl, the container is free to move on to some other state, which can be RUNNING, EXIT_WITH_FAILURE or KILLING. The first two should be triggered by the event sent from ContainerLaunch.call(), while KILLING is caused by a kill event. Therefore, when ContainerLaunch.call() is started, we check the container state. If it is KILLING, ContainerLaunch.call() can stop immediately, which is equivalent to the cancel operation that was removed from ContainersLauncher. Actually, it should be even better, because Future.cancel will not terminate call() immediately. On the other hand, if at this point the container state is still LOCALIZED, call() will move on. Then, if the container state changes to KILLING midway, we just ignore it and let call() finish as usual. It does no harm because when the container reaches KILLING, CLEANUP_CONTAINER is scheduled or has started. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
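The remark about Future.cancel can be seen with plain java.util.concurrent, independent of the NM classes; the snippet below is a generic illustration, not the NM code.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancelSemantics {
  public static void main(String[] args) throws Exception {
    ExecutorService launcher = Executors.newSingleThreadExecutor();

    Future<Integer> launch = launcher.submit(new Callable<Integer>() {
      @Override
      public Integer call() {
        // Stand-in for ContainerLaunch.call(): once the body is running,
        // Future.cancel(true) only sets the interrupt flag; unless the code
        // checks it, the task keeps going, which is why checking the container
        // state inside call() is the more reliable guard.
        long busyUntil = System.currentTimeMillis() + 500;
        while (System.currentTimeMillis() < busyUntil) {
          // busy work that never checks Thread.interrupted()
        }
        return 0;
      }
    });

    Thread.sleep(100);                       // let call() start
    boolean cancelled = launch.cancel(true); // returns true, yet the task runs on
    System.out.println("cancel() returned " + cancelled
        + "; the task body kept running to completion");
    launcher.shutdown();
  }
}
{code}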
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.1.patch Add a new AppShutDownTransition to handle ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1149.1.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758637#comment-13758637 ] Hadoop QA commented on YARN-1070: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601531/YARN-1070.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1840//console This message is automatically generated. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758642#comment-13758642 ] Hudson commented on YARN-957: - SUCCESS: Integrated in Hadoop-trunk-Commit #4369 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4369/]) YARN-957. Fixed a bug in CapacityScheduler because of which requests that need more than a node's total capability were incorrectly allocated on that node causing apps to hang. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520187) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758651#comment-13758651 ] Hadoop QA commented on YARN-540: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601527/YARN-540.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1838//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1838//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758666#comment-13758666 ] Vinod Kumar Vavilapalli commented on YARN-1070: --- The argument is reasonable. bq. On the other side, if at this point the container state is still LOCALIZED, call() will move on. Then, if the container state changes to KILLING in the midway, we just ignore it let call() finish as usual. It does no harm because when the container reaches KILLING, CLEANUP_CONTAINER is scheduled or is started. We do have one more check just before we launch the process. We should do the same stack-check there too. Also, as part of ContainerLaunch.cleanupContainer(), we should try to cancel the Callable. Taking a step back, this approach will work, though the code is hard to read for me. A very simple state machine should make this code a lot cleaner. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758734#comment-13758734 ] Vinod Kumar Vavilapalli commented on YARN-292: -- Thanks for the logs Nemon. Looked at the logs. We were so focused on removals that we forgot the puts. And as the logs clearly pointed out, another app was getting added at (almost) the same point of time as the get, and since this is a TreeMap (or even HashMap), there are structural changes even with a put :) The patch isn't applying anymore, can you please update? Also, can you try to write a simple test, with one thread putting lots of apps and the other trying to allocate the AM? Not a very useful test, but it can give us a little confidence. ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen Attachments: ArrayIndexOutOfBoundsException.log, YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
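A stripped-down shape of the test being asked for, with a plain TreeMap standing in for the scheduler's application map; everything below is illustrative and not RM code.

{code:java}
import java.util.Map;
import java.util.TreeMap;

public class TreeMapRaceSketch {
  public static void main(String[] args) throws Exception {
    final Map<Integer, String> apps = new TreeMap<Integer, String>();

    // Stand-in for app submissions: every put can rebalance the tree.
    Thread putter = new Thread(new Runnable() {
      @Override
      public void run() {
        for (int i = 0; i < 1000000; i++) {
          apps.put(i, "app-" + i);
        }
      }
    });

    // Stand-in for the allocation path: concurrent reads of the same map.
    Thread getter = new Thread(new Runnable() {
      @Override
      public void run() {
        for (int i = 0; i < 1000000; i++) {
          apps.get(i);
        }
      }
    });

    putter.start();
    getter.start();
    putter.join();
    getter.join();
    // Without external synchronization the reader may observe a half-rebalanced
    // tree; synchronizing both sides (or using a concurrent map) removes the race.
  }
}
{code}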
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.patch Thank you Vinod Kumar Vavilapalli and Jason Lowe for reviewing the patch :-) I have addressed Vinod's comments and attached an updated patch. Please review the updated patch. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
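The shape of the fix is the usual close-on-all-paths pattern; below is a minimal sketch, with the reader construction and the rendering step assumed rather than taken from the attached patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat;

public class LogReaderCloseSketch {
  // Sketch: whatever happens while rendering, close the reader so the DataNode
  // connection is not left behind in CLOSE_WAIT.
  static void renderAggregatedLog(Configuration conf, Path remoteLogFile) throws Exception {
    AggregatedLogFormat.LogReader reader = null;
    try {
      reader = new AggregatedLogFormat.LogReader(conf, remoteLogFile);
      // ... read the containers' logs and write them to the web UI response ...
    } finally {
      if (reader != null) {
        reader.close();
      }
    }
  }
}
{code}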
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.1.patch Handled cleanup during reader creation. The previous patch missed this cleanup. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.2.patch Please ignore YARN-1145.1.patch. All the comments have been fixed in YARN-1145.2.patch. Please consider this patch for review. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, YARN-1145.2.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira