[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644695#comment-13644695 ] Xuan Gong commented on YARN-513: bq:We could try to reuse existing RetryPolicy etc inside RMClient as long as we maintain the RMClient abstraction. Reuse the RetryPolicy in new patch. The RetryInvocationHandler provides the retry logic in its invoke method. We can reuse that bq:Are we not missing an RMClient.disconnect()? This one would internally stop the proxy? Yes, we need that. Adding the disconnect code in the new patch bq:Looks like NMStatusUpdater.getRMClient() can be removed because createRMClient() is being overridden by all tests. Removed from the new patch bq:Why are we throwing YARNException? Original code throws the YarnException, now i want to keep consistant. And I think we will change the exception thru YARN-142. bq:Is any test explicitly testing the new code with a real RM? How about manually doing it? Tested the new code in single node cluster Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644891#comment-13644891 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581001/YARN-513.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1366 javac compiler warnings (more than the trunk's current 1365 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/838//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/838//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/838//console This message is automatically generated. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645042#comment-13645042 ] Hadoop QA commented on YARN-513: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581065/YARN-513.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/843//console This message is automatically generated. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644122#comment-13644122 ] Vinod Kumar Vavilapalli commented on YARN-513: -- See HDFS-3504 for related discussion. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644174#comment-13644174 ] Bikas Saha commented on YARN-513: - If this jira is going to address NM then how about we change the title to reflect that and open jiras for the other clients? Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644176#comment-13644176 ] Bikas Saha commented on YARN-513: - Overall approach looks good. We could try to reuse existing RetryPolicy etc inside RMClient as long as we maintain the RMClient abstraction. Detailed review comments after we decide on what we want to resuse. Are we not missing an RMClient.disconnect()? This one would internally stop the proxy? Looks like NMStatusUpdater.getRMClient() can be removed because createRMClient() is being overridden by all tests. Why are we throwing YARNException? Is any test explicitly testing the new code with a real RM? How about manually doing it? Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644186#comment-13644186 ] Xuan Gong commented on YARN-513: After reading the code in HDFS-3504, we can definitely reuse the existing RetryPolicy code. We can use RetryPolicies.retryUpToMaximumTimeWithFixedSleep as defaultPolicy. We can also use RetryPolicies.retryByException to set a default policy with some explicit handlers for specific exceptions. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643551#comment-13643551 ] Hadoop QA commented on YARN-513: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580819/YARN-513.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/833//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/833//console This message is automatically generated. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643521#comment-13643521 ] Xuan Gong commented on YARN-513: This patch changes NodeStatusUpdaterImpl.java only. We can do for YarnClient, AppMaster And RMAdminClient separately by calling the same functions. The idea is When the RM is restarting, the NM, AM, RMAdminClient and YarnClient should wait for some time for the RM to be back up. The problem is we will apply the similar wait and retry logic to all of them, so there will be lots of duplicate code. We try to find out if we can write the retry logic into a single place that all of them can reuse the code. 1. Created RMClient.java. In this java file, there are two generic functions, one is for creating the proxy object, the other one is for calling the APIs(The retry-sleep logic is in it). By give different type of parameters, we can get the different objects and call different apis. 2. Created LocalRMClient.java. This java file is only for the testing purpose. 3. Changed registerWithRM() and startStatusUpdater() from NodeStatusUpdaterImpl by calling generic function from RMClient 4. fix all the test cases Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643526#comment-13643526 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580818/YARN-513.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/832//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/832//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/832//console This message is automatically generated. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640150#comment-13640150 ] nemon lou commented on YARN-513: What about admin client? refreshQueues,refreshNodes,etc. These will be needed in HA. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640574#comment-13640574 ] Vinod Kumar Vavilapalli commented on YARN-513: -- Good catch Nemon! Yes, we should do this for RMAdminClient too. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630971#comment-13630971 ] Xuan Gong commented on YARN-513: Oh, yes, you are right. We need to handle more cases. For Client: getNewApplication(),submitApplication(),killApplication(),getApplicationReport(),getApplicationList(),getYarnClusterMetrics(),getNodeReports(),getRMDelegationToken(),getQueueInfo(),getQueueAclsInfo(),getAllQueues(),getRootQueueInfos() and getChildQueueInfos() For ApplicationMaster: registerApplicationMaster(),allocate() and unregisterApplicationMaster() Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630672#comment-13630672 ] Xuan Gong commented on YARN-513: From ApplicationMaster perspective: 1. The very first communication it will have with the RM is for Register itself with RM which is from AMRMClientImpl::registerApplicationMaster(), so we can add waitting logic here, to try several times until it is accepted or throw out the exceptions From Client Perspective: 1. The very first communication it will have with the RM is getNewApplication(), which is in YarnClientImpl::getNewApplication(request), we can add waitting logic here. In order to do that, we need add several const and variables to YarnConfiguration, such as AM_RM_CONNECTION_RETRY_INTERVAL_SECS, AM_RM_CONNECT_WAIT_SECS, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS and CLIENT_RM_CONNECTION_WAIT_SECS. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630677#comment-13630677 ] Bikas Saha commented on YARN-513: - What about other interactions with the RM such as allocate() or finishApplicationMaster() Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira