[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602080#comment-13602080 ] Vinod Kumar Vavilapalli commented on YARN-71:

Some more comments:
- In case of errors, can you say "need to be manually deleted" instead of just "need to be deleted"?
- Please add tests to
-- verify fileCache and NM_PRIVATE_DIR deletion
-- verify deleteHistoricalLocalDirs by rebooting the NM multiple times while a previous deletion was in progress

Ensure/confirm that the NodeManager cleans up local-dirs on restart
Key: YARN-71
URL: https://issues.apache.org/jira/browse/YARN-71
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch

We have to make sure that NodeManagers clean up their local files on restart. It may already work this way, in which case we should have tests validating it.
[jira] [Resolved] (YARN-141) NodeManager shuts down if it can't find the ResourceManager
[ https://issues.apache.org/jira/browse/YARN-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-141.

Resolution: Duplicate
Duplicated by YARN-196.

NodeManager shuts down if it can't find the ResourceManager
Key: YARN-141
URL: https://issues.apache.org/jira/browse/YARN-141
Project: Hadoop YARN
Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan

When starting YARN services, if the NodeManager is started but the ResourceManager is not, the NodeManager tries 10 times (the default setting) and then shuts down. I understand that this default setting can be changed to wait for a longer period, but I think it is better to keep the NodeManager trying without shutting down. This accommodates cases where the ResourceManager is late to start for whatever reason, and it matches what the DataNode does when it cannot find the NameNode at startup: it keeps trying without shutting down.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602139#comment-13602139 ] Zhijie Shen commented on YARN-378:

{quote} Env vars are brittle.. {quote}
Does the env method work for other applications? The merit I see in embedding maxAppAttempts in the AM registration response is that AMs of other applications can read the number in the same way. I think we can start discussing how to inform the AM of maxAppAttempts in MAPREDUCE-5062.

ApplicationMaster retry times should be set by Client
Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch

We should support different ApplicationMaster retry counts for different clients or users. In other words, yarn.resourcemanager.am.max-retries should be settable by the client.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602154#comment-13602154 ] Hadoop QA commented on YARN-378:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573698/YARN-378_7.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/512//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/512//console
This message is automatically generated.
[jira] [Created] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log
Aleksey Gorshkov created YARN-478:

Summary: fix coverage org.apache.hadoop.yarn.webapp.log
Key: YARN-478
URL: https://issues.apache.org/jira/browse/YARN-478
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
Attachments: YARN-478-trunk.patch

fix coverage org.apache.hadoop.yarn.webapp.log
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.webapp.log
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468:

Summary: coverage fix for org.apache.hadoop.yarn.webapp.log (was: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter)

coverage fix for org.apache.hadoop.yarn.webapp.log
Key: YARN-468
URL: https://issues.apache.org/jira/browse/YARN-468
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
Attachments: YARN-468-trunk.patch

coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.webapp.log
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468:

Description: coverage fix for org.apache.hadoop.yarn.webapp.log
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23
(was: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23)
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468:

Summary: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter (was: coverage fix for org.apache.hadoop.yarn.webapp.log)
[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-468:

Description: coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23
(was: coverage fix for org.apache.hadoop.yarn.webapp.log
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23)
[jira] [Commented] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602307#comment-13602307 ] Hadoop QA commented on YARN-468:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573308/YARN-468-trunk.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/513//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/513//console
This message is automatically generated.
[jira] [Updated] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log
[ https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-478:

Attachment: (was: YARN-478-trunk.patch)

fix coverage org.apache.hadoop.yarn.webapp.log
one patch for trunk, branch-2, branch-0.23
[jira] [Updated] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log
[ https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-478:

Attachment: YARN-478-trunk.patch
[jira] [Commented] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log
[ https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602319#comment-13602319 ] Hadoop QA commented on YARN-478:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573718/YARN-478-trunk.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/514//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/514//console
This message is automatically generated.
[jira] [Commented] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter
[ https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602326#comment-13602326 ] Hudson commented on YARN-468:

Integrated in Hadoop-trunk-Commit #3469 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3469/])
YARN-468. coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter (Aleksey Gorshkov via bobby) (Revision 1456458)
Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1456458
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java

Fix For: 3.0.0, 0.23.7, 2.0.5-beta
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602333#comment-13602333 ] Robert Joseph Evans commented on YARN-378:

Using the environment variables works for other applications too. That is the only way to get some pieces of critical information that are needed for registration with the RM. On Windows there are limits (http://msdn.microsoft.com/en-us/library/windows/desktop/ms682653%28v=vs.85%29.aspx), but they should not cause too much of an issue on Windows Server 2008 and above. I would prefer for us to return the information to the AM only one way, either through thrift or through the environment variable, just so there is less confusion, but I am not adamant about it.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602341#comment-13602341 ] Robert Joseph Evans commented on YARN-378:

Looking at the code, I am fine with renaming retries to attempts too. But we need to mark this JIRA as an incompatible change or put in a deprecated config mapping. We are early enough in YARN that deprecating it seems like a waste.
[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602435#comment-13602435 ] jian he commented on YARN-237:

Thanks, Robert!

Refreshing the RM page forgets how many rows I had in my Datatables
Key: YARN-237
URL: https://issues.apache.org/jira/browse/YARN-237
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0
Reporter: Ravi Prakash
Assignee: jian he
Labels: usability
Fix For: 3.0.0, 2.0.5-beta
Attachments: YARN-237.patch, YARN-237.v2.patch, YARN-237.v3.patch, YARN-237.v4.patch

If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows. This user preference should be stored in a cookie.
[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602437#comment-13602437 ] Hudson commented on YARN-237:

Integrated in Hadoop-trunk-Commit #3471 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3471/])
YARN-237. Refreshing the RM page forgets how many rows I had in my Datatables (jian he via bobby) (Revision 1456536)
Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1456536
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/CountersBlock.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsTasksBlock.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsTasksPage.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602481#comment-13602481 ] Bikas Saha commented on YARN-378:

Env vars are brittle from an API point of view. Windows supports such use cases fine. The point is that, for application developers, the information should come from the API, and not from a combination of API and env. Env requires an agent on the other side to set the env, apart from the info coming from the API itself; here it works because the agent on the other side happens to be the NM, which is in our control. To summarize, let's agree to keep this in the API as it exists in the patch. For the MR AM's sake, we could additionally add the information to the env as well.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602516#comment-13602516 ] Vinod Kumar Vavilapalli commented on YARN-378:

Bikas, as of today, env is also part of the API; see the env vars in the public class ApplicationConstants. The correct way to avoid env vars, if at all, is to pass in another named file/resource before container launch, so that AMs/containers can load it for initial settings. We need that anyway, so let's continue to put it in env for now (and not introduce multiple ways of access), and fix it (if need be) separately.
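For context, the env-var route being debated above would look roughly like the following from the AM side. This is a hedged sketch: the variable name MAX_APP_ATTEMPTS and the fallback default are illustrative assumptions, not constants confirmed by the patch under discussion.
{code:java}
// Hedged sketch: how an ApplicationMaster might read the max-attempts value
// if the NM exported it via an environment variable. The variable name
// "MAX_APP_ATTEMPTS" is an assumption for illustration only.
public final class AmEnvExample {
  public static int readMaxAppAttempts() {
    String raw = System.getenv("MAX_APP_ATTEMPTS");
    if (raw == null || raw.isEmpty()) {
      // Assumed fallback when the variable is absent (e.g. an older RM/NM).
      return 1;
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      // One of the brittle spots Bikas alludes to: a malformed value degrades silently.
      return 1;
    }
  }
}
{code}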
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602542#comment-13602542 ] Vinod Kumar Vavilapalli commented on YARN-378:

bq. But we need to mark this JIRA as an incompatible change or put in a deprecated config mapping. We are early enough in YARN that deprecating it seems like a waste.
+1. Unfortunately the YARN JIRA setup is messed up, so I cannot set the incompatible field for now; I will file an INFRA ticket. Will put this in the INCOMPATIBLE section of CHANGES.txt.
[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602614#comment-13602614 ] Eli Reisman commented on YARN-226:

Hmm, that stirs up some trouble. Giraph tasks may or may not need contiguous IDs, but we will need a task 0 for at least one of the reserved containers (so container 2 is the one right now) in order to bootstrap our master election process. I am translating container IDs into Giraph task IDs right now by just subtracting two from the container ID! It works in all my tests, but the reservation thing could kick in on big asks (a 1000-container ask, etc.) - is that what you're saying? How big can the ask be? Perhaps I can move this bootstrap stuff from Giraph into my app master if this is a big problem. Good to know, thanks!

Log aggregation should not assume an AppMaster will have containerId 1
Key: YARN-226
URL: https://issues.apache.org/jira/browse/YARN-226
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Siddharth Seth

In case of reservations, etc., AppMasters may not get container id 1. We likely need additional info in the CLC / tokens indicating whether a container is an AM or not.
[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602661#comment-13602661 ] Robert Joseph Evans commented on YARN-226:

Big means the amount of memory/CPU relative to the minimum allocation size. For example, you ask for a 4 GB container with a min allocation size of 500 MB.
[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602667#comment-13602667 ] Hitesh Shah commented on YARN-226:

@Sid, never mind - reservations could effectively increment the id and never assign 1 to anything.
@Eli, this will occur on clusters when the AM resource ask is greater than a single slot and it requires multiple scheduling cycles before a large enough free slot is available to launch the AM.
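To make the fragility concrete, below is a small illustrative sketch of the mapping Eli describes (Giraph task ID = container ID minus 2). The class name and the numbers are hypothetical; the sketch only shows how a reservation-induced gap in container IDs leaves no task 0, which is exactly the master-election bootstrap problem discussed above.
{code:java}
// Hedged illustration only: maps container IDs to task IDs by arithmetic,
// the way the Giraph prototype reportedly does. It breaks as soon as the RM
// skips IDs (e.g. because of reservations before the AM container is placed).
public final class ContainerIdMappingExample {
  // Assumes the AM got container 1 and workers are contiguous from 2 upward.
  static int taskIdFromContainerId(int containerId) {
    return containerId - 2;
  }

  public static void main(String[] args) {
    int[] happyPath = {2, 3, 4};        // AM was container 1: tasks 0, 1, 2
    int[] withReservations = {7, 8, 9}; // AM was container 6: tasks 5, 6, 7
    for (int id : happyPath) {
      System.out.println("container " + id + " -> task " + taskIdFromContainerId(id));
    }
    for (int id : withReservations) {
      // No task 0 is ever produced, so master election never bootstraps.
      System.out.println("container " + id + " -> task " + taskIdFromContainerId(id));
    }
  }
}
{code}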
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-378:

Attachment: YARN-378_8.patch

Restricted the patch to the scope of YARN only. In addition, maxAppRetries is set in the environment when launching the AM. The global maxAppRetries is validated when the RM is initialized; if it is non-positive, the RM will crash. Note that MRAppMaster and TestStagingCleanup reference YarnConfiguration.RM_AM_MAX_RETRIES, which has been changed to YarnConfiguration.RM_AM_MAX_ATTEMPTS. Therefore, the mapreduce build will be broken temporarily.
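The fail-fast validation described here would, in spirit, look something like the sketch below. It is a minimal sketch under assumptions: the config key string and the default value are illustrative stand-ins, and the real patch may surface the failure through YARN's service-initialization machinery rather than a plain exception.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch of fail-fast validation of the global max-attempts setting
// at RM initialization, as described in the comment above. The key name and
// default below are assumptions for illustration.
public final class MaxAttemptsValidationExample {
  static final String RM_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.max-attempts"; // assumed key
  static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;                                 // assumed default

  static int validateMaxAttempts(Configuration conf) {
    int globalMax = conf.getInt(RM_AM_MAX_ATTEMPTS, DEFAULT_RM_AM_MAX_ATTEMPTS);
    if (globalMax <= 0) {
      // "RM will crash": refuse to come up with a nonsensical global limit.
      throw new IllegalArgumentException(
          "Invalid global max attempts; it should be a positive integer: " + globalMax);
    }
    return globalMax;
  }
}
{code}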
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602716#comment-13602716 ] Hadoop QA commented on YARN-378:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573761/YARN-378_8.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:red}-1 javac{color}. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/515//console
This message is automatically generated.
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602719#comment-13602719 ] Hitesh Shah commented on YARN-196:

@Xuan, latest patch looks good. Addressing a very minor nit and uploading it. Will commit as soon as jenkins does a +1.

Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM.
Key: YARN-196
URL: https://issues.apache.org/jira/browse/YARN-196
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, YARN-196.11.patch, YARN-196.12.patch, YARN-196.1.patch, YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch

If the NM is started before the RM, the NM shuts down with the following error:
{code}
ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
  at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
  at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
  at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
  at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
Caused by: java.lang.reflect.UndeclaredThrowableException
  at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
  at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
  at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
  ... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
  at $Proxy23.registerNodeManager(Unknown Source)
  at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
  ... 5 more
Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
  at org.apache.hadoop.ipc.Client.call(Client.java:1141)
  at org.apache.hadoop.ipc.Client.call(Client.java:1100)
  at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
  ... 7 more
Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
  at org.apache.hadoop.ipc.Client.call(Client.java:1117)
  ... 9 more
2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
java.lang.InterruptedException
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
  at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
  at java.lang.Thread.run(Thread.java:619)
{code}
[jira] [Updated] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM.
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-196:

Attachment: YARN-196.12.1.patch
[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-473:

Labels: usability (was: )

Capacity Scheduler webpage and REST API not showing correct number of pending applications
Key: YARN-473
URL: https://issues.apache.org/jira/browse/YARN-473
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Labels: usability

The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps.
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602767#comment-13602767 ] Hadoop QA commented on YARN-196:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573764/YARN-196.12.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/516//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/516//console
This message is automatically generated.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602788#comment-13602788 ] Bikas Saha commented on YARN-378:

I am in favor of setting the value in env in addition to the API. I want it in the API to encourage other app developers to do the desired thing and obtain such (and other) information from the RM upon registration. This is different from the use case of the application attempt id, where we need something before contacting the RM. I also took a quick look at the MR AM code. It's currently reading the value from config, and the only use is setting the isLastAMRetry value. The isLastAMRetry value is later used during job shutdown. Job shutdown will happen after services.start(), so it should not be a terribly large change to get and use the retry value after registration; registration happens during services.start().
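As a rough illustration of the change Bikas sketches, the AM could derive isLastAMRetry from its own attempt number once the max-attempts value is known after registration. This is a hedged sketch only: how maxAttempts reaches the AM (registration response vs. env var) is exactly what is being debated, so it is passed in as a plain parameter here.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

// Hedged sketch: deciding whether the current attempt is the last one,
// once the max-attempts value has been obtained (however it is delivered).
public final class LastAttemptExample {
  static boolean isLastAttempt(ApplicationAttemptId attemptId, int maxAttempts) {
    // Attempt IDs are 1-based, so the final allowed attempt equals maxAttempts.
    return attemptId.getAttemptId() >= maxAttempts;
  }
}
{code}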
[jira] [Created] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
Hitesh Shah created YARN-479:

Summary: NM retry behavior for connection to RM should be similar for lost heartbeats
Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah

Regardless of connection loss at the start or at an intermediate point, the NM's retry behavior toward the RM should follow the same flow.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602817#comment-13602817 ] Hadoop QA commented on YARN-378:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573777/YARN-378_9.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:red}-1 javac{color}. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/517//console
This message is automatically generated.
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-474:

Assignee: Zhijie Shen

CapacityScheduler does not activate applications when configuration is refreshed
Key: YARN-474
URL: https://issues.apache.org/jira/browse/YARN-474
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen

Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify the capacity scheduler config to increase the value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in the running state do not get launched even though the limits are increased.
[jira] [Resolved] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.
[ https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-275. -- Resolution: Duplicate Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not. -- Key: YARN-275 URL: https://issues.apache.org/jira/browse/YARN-275 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: Prototype.txt, YARN-270.1.patch Update HeartBeat info on RMNode Side, and CS read the info directly from each RMNode -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job
[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-477: - Assignee: Zhijie Shen When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job --- Key: YARN-477 URL: https://issues.apache.org/jira/browse/YARN-477 Project: Hadoop YARN Issue Type: Bug Reporter: Eli Reisman Assignee: Zhijie Shen I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch my App Master, if the container command line runs it successfully, any failure in the App Master or my launched Giraph Tasks promptly reports to Client and ends my job run. However, if the command line sent to the app master container fails to launch it at all, the error exit code is not propagating. My client hangs with the job at containersUsed == 1 and state == ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way out. Disclaimer: this could be my fault. But I wanted to throw it out there in case its not. I also (when this happens) not getting error logs since the app master never launched, so I really have no visibility into why it failed to launch. I am sure its not launching, but the client IS sending the app request, getting a container for my AM, and I see the command line run on the container in my logs. Thats all. Thanks! If this is a dup or won't fix for some reason, let me know and sorry for wasting your time! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job
[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602892#comment-13602892 ] Eli Reisman commented on YARN-477: -- Duh. Too many issues fixed all at once, they are all running together in my mind. OK, going over this again: this is happening during my integration tests with MiniYARNCluster, not on the real cluster. So perhaps the real YARN implementation handles propagating the error to the client and RM (etc.) when the command line the client tries to use to launch the container for the AM fails. I think it's the MiniYARNCluster that is not handling this situation correctly. Again, the issue is: Client starts fine. It creates the AMContainerSpec and tries to request the AM container. This request includes the shell command to launch our AM in the container. The container shows up as being granted and provisioned by the RM, but from there the client hangs waiting for job success/fail, saying it has 1 container used the whole time (the failed AM container). What seems to be happening is that this shell script fails to launch the AM in its container, so the container just sits there forever. Let's check this in MiniYARNCluster and see. I will try to break the Giraph MiniYARNCluster test again, recreate some decent log traces leading up to the event, and post them here. Thanks! When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job --- Key: YARN-477 URL: https://issues.apache.org/jira/browse/YARN-477 Project: Hadoop YARN Issue Type: Bug Reporter: Eli Reisman Assignee: Zhijie Shen I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch my App Master, if the container command line runs it successfully, any failure in the App Master or my launched Giraph Tasks promptly reports to Client and ends my job run. However, if the command line sent to the app master container fails to launch it at all, the error exit code is not propagating. My client hangs with the job at containersUsed == 1 and state == ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way out. Disclaimer: this could be my fault. But I wanted to throw it out there in case its not. I also (when this happens) not getting error logs since the app master never launched, so I really have no visibility into why it failed to launch. I am sure its not launching, but the client IS sending the app request, getting a container for my AM, and I see the command line run on the container in my logs. Thats all. Thanks! If this is a dup or won't fix for some reason, let me know and sorry for wasting your time! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
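For reference, an illustrative client-side sketch of the flow described in the comment (not Giraph's actual client code; API names follow the present-day org.apache.hadoop.yarn.client.api.YarnClient and may differ slightly in 2.0.3-alpha). The point of interest is the polling loop at the end, which is where the client ends up waiting indefinitely when the AM command never launches:
{code}
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class AmSubmitSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

    // The shell command that launches the AM; if this command fails outright,
    // no AM ever registers, yet nothing below notices.
    ContainerLaunchContext amSpec = Records.newRecord(ContainerLaunchContext.class);
    amSpec.setCommands(Collections.singletonList(
        "java org.apache.giraph.yarn.GiraphApplicationMaster"));
    appContext.setAMContainerSpec(amSpec);
    appContext.setResource(Resource.newInstance(512, 1));
    appContext.setApplicationName("giraph-am-launch-sketch");

    ApplicationId appId = yarnClient.submitApplication(appContext);

    // This is where the reported hang shows up: the state stays ACCEPTED,
    // containersUsed stays at 1, and the loop never exits.
    ApplicationReport report = yarnClient.getApplicationReport(appId);
    while (report.getYarnApplicationState() == YarnApplicationState.ACCEPTED
        || report.getYarnApplicationState() == YarnApplicationState.RUNNING) {
      Thread.sleep(1000);
      report = yarnClient.getApplicationReport(appId);
    }
    System.out.println("Final status: " + report.getFinalApplicationStatus());
    yarnClient.stop();
  }
}
{code}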
[jira] [Commented] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job
[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602895#comment-13602895 ] Eli Reisman commented on YARN-477: -- The nodemanager log for the MiniYARNCluster DID get a log report for the app master that could only come from the shell command failing:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.giraph.yarn.GiraphApplicationMaster
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{code}
So that's good. But I don't think it's propagating this to the MiniYARNCluster's RM or my Client. From my Client's end, the logs are endless heartbeat messages with a -1000 exitCode until I Ctrl-C out of the test suite. FYI, this is not a priority or blocker for my Giraph on YARN, it all works now (including the test) in case I wasn't clear. But it should probably get investigated/fixed soon if I've really found something here ;) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job --- Key: YARN-477 URL: https://issues.apache.org/jira/browse/YARN-477 Project: Hadoop YARN Issue Type: Bug Reporter: Eli Reisman Assignee: Zhijie Shen I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch my App Master, if the container command line runs it successfully, any failure in the App Master or my launched Giraph Tasks promptly reports to Client and ends my job run. However, if the command line sent to the app master container fails to launch it at all, the error exit code is not propagating. My client hangs with the job at containersUsed == 1 and state == ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way out. Disclaimer: this could be my fault. But I wanted to throw it out there in case its not. I also (when this happens) not getting error logs since the app master never launched, so I really have no visibility into why it failed to launch. I am sure its not launching, but the client IS sending the app request, getting a container for my AM, and I see the command line run on the container in my logs. Thats all. Thanks! If this is a dup or won't fix for some reason, let me know and sorry for wasting your time! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602898#comment-13602898 ] omkar vinit joshi commented on YARN-109: [~mayank_bansal] Are you still looking into this issue? Otherwise, I would like to take it over. .tmp file is not deleted for localized archives --- Key: YARN-109 URL: https://issues.apache.org/jira/browse/YARN-109 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Mayank Bansal When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
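A generic illustration, in plain Java rather than the NM's actual localizer code, of the cleanup step the issue says is missing: once an archive has been unpacked from its temporary download file, the .tmp file itself has to be removed. All names here are made up for the sketch.
{code}
import java.io.File;
import java.io.IOException;

public class TmpCleanupSketch {

  /** Localize an archive: unpack from the .tmp download, then remove the .tmp. */
  static void localizeArchive(File tmpDownload, File unpackDir) throws IOException {
    unpack(tmpDownload, unpackDir);
    // The step YARN-109 reports as missing: delete the temporary file once its
    // contents have been unpacked.
    if (!tmpDownload.delete()) {
      throw new IOException("Could not delete temporary file " + tmpDownload);
    }
  }

  private static void unpack(File archive, File destDir) throws IOException {
    // Placeholder for the real unarchive logic (tar/zip/jar handling).
  }
}
{code}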
[jira] [Updated] (YARN-477) MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job
[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated YARN-477: - Summary: MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job (was: When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job) MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job -- Key: YARN-477 URL: https://issues.apache.org/jira/browse/YARN-477 Project: Hadoop YARN Issue Type: Bug Reporter: Eli Reisman Assignee: Zhijie Shen I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch my App Master, if the container command line runs it successfully, any failure in the App Master or my launched Giraph Tasks promptly reports to Client and ends my job run. However, if the command line sent to the app master container fails to launch it at all, the error exit code is not propagating. My client hangs with the job at containersUsed == 1 and state == ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way out. Disclaimer: this could be my fault. But I wanted to throw it out there in case its not. I also (when this happens) not getting error logs since the app master never launched, so I really have no visibility into why it failed to launch. I am sure its not launching, but the client IS sending the app request, getting a container for my AM, and I see the command line run on the container in my logs. Thats all. Thanks! If this is a dup or won't fix for some reason, let me know and sorry for wasting your time! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602904#comment-13602904 ] Xuan Gong commented on YARN-71: --- Uploaded new patch: 1. Moved out the timestamp, so all local dirs use the same timestamp. 2. Rewrote the rename and deletion block into two new functions: renameLocalDir() to rename the dirs and deleteLocalDir() to delete them. 3. Changed the unit test to cover: a. verify the correct user is used by the DeletionService; b. verify fileCache and NM_PRIVATE_DIR deletion. 4. Used Records instead of RecordFactory. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
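A rough sketch of the rename-then-delete pattern the comment describes, using plain java.io for illustration; the real patch works through the NM's FileContext and DeletionService, and the "_DEL_" suffix here is just a made-up marker. Renaming first hides the stale contents from the restarted NM immediately, so the actual deletion can proceed in the background.
{code}
import java.io.File;

public class LocalDirCleanupSketch {

  /** One timestamp shared by all local dirs, as point 1 of the comment describes. */
  static void cleanUpOnRestart(File[] localDirs) {
    long ts = System.currentTimeMillis();
    for (File dir : localDirs) {
      File renamed = renameLocalDir(dir, ts);
      if (renamed != null) {
        deleteLocalDir(renamed);
      }
    }
  }

  /** Rename a stale local dir out of the way; returns null if there is nothing to do. */
  static File renameLocalDir(File dir, long ts) {
    if (!dir.exists()) {
      return null;
    }
    File target = new File(dir.getParentFile(), dir.getName() + "_DEL_" + ts);
    return dir.renameTo(target) ? target : null;
  }

  /** Recursively delete a renamed dir; the real patch hands this to the DeletionService. */
  static void deleteLocalDir(File dir) {
    File[] children = dir.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteLocalDir(child);
      }
    }
    dir.delete();
  }
}
{code}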
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602909#comment-13602909 ] Hadoop QA commented on YARN-71: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573803/YARN-71.8.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/518//console This message is automatically generated. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-71: -- Attachment: YARN-71.9.patch Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch, YARN-71.9.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602916#comment-13602916 ] Hadoop QA commented on YARN-71: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573806/YARN-71.9.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/519//console This message is automatically generated. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch, YARN-71.9.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602972#comment-13602972 ] Xuan Gong commented on YARN-71: --- Tested the patch on a single-node cluster running CentOS 6: 1. configured the LinuxContainerExecutor 2. started the namenode, datanode, resourcemanager, and nodemanager 3. ran a pi example 4. manually killed the nodemanager 5. found the local files under the local dirs that needed to be deleted 6. restarted the nodemanager 7. verified the local files had been deleted. Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch, YARN-71.9.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603164#comment-13603164 ] Bikas Saha commented on YARN-417: - What is the deadlock here? It's late night and I can't see it :P Is it related to the exception being thrown when stop() is called on the handler thread? Is this guaranteed bad behavior, and so we need to throw a runtime exception immediately? I think we need to call client.stop() after the heartbeat thread has stopped. Otherwise, the heartbeat thread can call client.allocate() in between the current client.stop() and keepRunning=false, right?
{code}
+  /**
+   * Tells the heartbeat and handler threads to stop and waits for them to
+   * terminate. Calling this method from the callback handler thread would cause
+   * deadlock, and thus should be avoided.
+   */
+  @Override
+  public void stop() {
+    if (Thread.currentThread() == handlerThread) {
+      throw new YarnException("Cannot call stop from callback handler thread!");
+    }
+    client.stop();
+    keepRunning = false;
+    try {
+      heartbeatThread.join();
{code}
Didn't quite get the assert inside the loop. Perhaps you meant takeCompletedContainers()?
{code}
+    // wait for the allocated containers from the first heartbeat's response
+    while (callbackHandler.takeAllocatedContainers() == null) {
+      Assert.assertEquals(null, callbackHandler.takeAllocatedContainers());
+      Thread.sleep(10);
+    }
{code}
I think updating progress needs to be its own callback, since it's possible that no container allocations and completions happen for a long time and thus the heartbeats show no progress to the RM. Add a poller that allows the AM to receive notifications when it is assigned containers --- Key: YARN-417 URL: https://issues.apache.org/jira/browse/YARN-417 Project: Hadoop YARN Issue Type: Sub-task Components: api, applications Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
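A small sketch of the shutdown ordering suggested above: flip keepRunning, join the heartbeat thread, and only then stop the underlying client, so no allocate() call can race with client.stop(). Class and field names are illustrative, not the actual AMRMClientAsync code.
{code}
public class HeartbeaterStopSketch {

  private final Thread heartbeatThread;
  private final Thread handlerThread;
  private final StoppableClient client;
  private volatile boolean keepRunning = true;

  HeartbeaterStopSketch(Thread heartbeatThread, Thread handlerThread,
      StoppableClient client) {
    this.heartbeatThread = heartbeatThread;
    this.handlerThread = handlerThread;
    this.client = client;
  }

  /** Body of the heartbeat thread: exits once keepRunning flips to false. */
  void heartbeatLoop() {
    while (keepRunning) {
      // client.allocate(...) and callback dispatch would happen here in the
      // real implementation, followed by a heartbeat-interval sleep.
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  public void stop() {
    if (Thread.currentThread() == handlerThread) {
      // Calling stop() from the callback handler would deadlock on the join below.
      throw new IllegalStateException("Cannot call stop from callback handler thread!");
    }
    keepRunning = false;          // 1. tell the heartbeat loop to exit
    try {
      heartbeatThread.join();     // 2. after this, no further allocate() can happen
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    client.stop();                // 3. only now stop the underlying client
  }

  /** Minimal stand-in for the synchronous client used underneath. */
  interface StoppableClient {
    void stop();
  }
}
{code}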