[
https://issues.apache.org/jira/browse/YARN-11509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733292#comment-17733292
]
ASF GitHub Bot commented on YARN-11509:
---------------------------------------
slfan1989 commented on code in PR #5727:
URL: https://github.com/apache/hadoop/pull/5727#discussion_r1231725983
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java:
##########
@@ -1432,4 +1432,53 @@ private void finishApplication() throws IOException, YarnException {
Assert.assertNotNull(finishResponse);
Assert.assertTrue(finishResponse.getIsUnregistered());
}
+
+ @Test
+ public void testLaunchUAMAndRegisterApplicationMasterRetry() throws Exception {
+
+ UserGroupInformation ugi = interceptor.getUGIWithToken(interceptor.getAttemptId());
+ interceptor.setRetryCount(2);
+
+ ugi.doAs((PrivilegedExceptionAction<Object>) () -> {
+ // Register the application
+ RegisterApplicationMasterRequest registerReq =
+ Records.newRecord(RegisterApplicationMasterRequest.class);
+ registerReq.setHost(Integer.toString(testAppId));
+ registerReq.setRpcPort(0);
+ registerReq.setTrackingUrl("");
+
+ RegisterApplicationMasterResponse registerResponse =
+ interceptor.registerApplicationMaster(registerReq);
+ Assert.assertNotNull(registerResponse);
+ lastResponseId = 0;
+
+ Assert.assertEquals(0, interceptor.getUnmanagedAMPoolSize());
+
+ // Allocate the first batch of containers, with sc1 active
+ registerSubCluster(SubClusterId.newInstance("SC-1"));
+
+ int numberOfContainers = 3;
+ List<Container> containers = getContainersAndAssert(numberOfContainers, numberOfContainers);
+ Assert.assertEquals(1, interceptor.getUnmanagedAMPoolSize());
+
+ // Release all containers
+ releaseContainersAndAssert(containers);
+
+ // Finish the application
+ FinishApplicationMasterRequest finishReq =
+ Records.newRecord(FinishApplicationMasterRequest.class);
+ finishReq.setDiagnostics("");
+ finishReq.setTrackingUrl("");
+ finishReq.setFinalApplicationStatus(FinalApplicationStatus.SUCCEEDED);
+
+ FinishApplicationMasterResponse finishResponse =
+ interceptor.finishApplicationMaster(finishReq);
+ Assert.assertNotNull(finishResponse);
+ Assert.assertEquals(true, finishResponse.getIsUnregistered());
Review Comment:
I will modify the code.
> The FederationInterceptor#launchUAM Added retry logic.
> ------------------------------------------------------
>
> Key: YARN-11509
> URL: https://issues.apache.org/jira/browse/YARN-11509
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: amrmproxy
> Affects Versions: 3.4.0
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Minor
> Labels: pull-request-available
>
> There is a "todo" in the
> FederationInterceptor#registerAndAllocateWithNewSubClusters method. According
> to the "todo" description, the request should be retried against other
> subclusters, but modifying the requests parameter inside
> registerAndAllocateWithNewSubClusters is not a good approach. It is better
> to add retry logic here instead.
> We don't need to worry about losing requests: when a request cannot be
> satisfied, the application's AM will keep asking for it, and these requests
> will then be routed to other subclusters for execution.
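
For readers following the description above, here is a minimal sketch of the kind of bounded retry it proposes. It is not the code from PR #5727. The class RetrySketch and the method callWithRetry are hypothetical names, and treating the retry count as "extra attempts after the first failure" is an assumption; the patch may count attempts differently.

```java
import java.util.concurrent.Callable;

/** Illustrative only: a generic bounded retry, not Hadoop code. */
public final class RetrySketch {

  private RetrySketch() {
  }

  /**
   * Runs the action once, then retries up to retryCount more times if it
   * throws. Returns the first successful result, or rethrows the last
   * failure once all attempts are used up.
   */
  public static <T> T callWithRetry(Callable<T> action, int retryCount)
      throws Exception {
    if (retryCount < 0) {
      throw new IllegalArgumentException("retryCount must be >= 0");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= retryCount; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        // Remember the failure and try again if attempts remain.
        last = e;
      }
    }
    // The loop ran at least once, so last is non-null here.
    throw last;
  }
}
```

With a retry count of 2, as set in the test above, an action that fails twice and then succeeds would be invoked three times in total under this assumption. If every attempt fails, the caller sees the last exception, which matches the description's point that an unsatisfied request is simply asked for again by the AM.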