[
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552091#comment-14552091
]
Rohith commented on YARN-3646:
------------------------------
Thanks for updating the patch. Some comments on the tests:
# I think we can remove the tests added in the hadoop-common project, since the
yarn-client test already verifies the required functionality. The hadoop-common
test only mocked the RMProxy functionality, so it passed even without the
RMProxy fix.
# The code never reaches {{Assert.fail("");}}, so it is better to remove it.
# Catch ApplicationNotFoundException instead of catching Throwable. I think
you can add {{expected = ApplicationNotFoundException.class}} to the @Test
annotation, like below.
{code}
@Test(timeout = 30000, expected = ApplicationNotFoundException.class)
public void testClientWithRetryPolicyForEver() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
  ResourceManager rm = null;
  YarnClient yarnClient = null;
  try {
    // start rm
    rm = new ResourceManager();
    rm.init(conf);
    rm.start();
    yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // create invalid application id
    ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
    // RM should throw ApplicationNotFoundException
    yarnClient.getApplicationReport(appId);
  } finally {
    if (yarnClient != null) {
      yarnClient.stop();
    }
    if (rm != null) {
      rm.stop();
    }
  }
}
{code}
# Can you rename the test to reflect the functionality it actually tests, for
example {{testShouldNotRetryForeverForNonNetworkExceptions}}?
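
To illustrate the behavior such a test name describes, here is a minimal sketch in plain Java with no Hadoop dependencies. It is not the RMProxy change from the patch. {{RetryForeverSketch}}, {{callWithForeverRetry}} and {{AppNotFoundException}} are hypothetical names introduced only for this example, and {{java.net.ConnectException}} is assumed to stand in for a connection failure.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a server-side error that is not a connection problem. */
class AppNotFoundException extends Exception {
  AppNotFoundException(String msg) {
    super(msg);
  }
}

class RetryForeverSketch {
  /** Attempts made by the most recent call; kept only so the example can report it. */
  static int attempts;

  /**
   * Retries without limit while the call fails with ConnectException.
   * Any other exception is rethrown to the caller on its first occurrence.
   */
  static <T> T callWithForeverRetry(Callable<T> call, long sleepMs) throws Exception {
    attempts = 0;
    while (true) {
      attempts++;
      try {
        return call.call();
      } catch (ConnectException e) {
        // Connection failure: the server may come back, so wait and try again.
        Thread.sleep(sleepMs);
      }
      // Exceptions of any other type are not caught here and propagate immediately.
    }
  }

  public static void main(String[] args) throws Exception {
    try {
      callWithForeverRetry(() -> {
        throw new AppNotFoundException("Application doesn't exist in RM.");
      }, 10);
    } catch (AppNotFoundException e) {
      // Prints "failed fast after 1 attempt(s)".
      System.out.println("failed fast after " + attempts + " attempt(s)");
    }
  }
}
```

With this shape, a test for the non-network case only needs to assert that the exception reaches the caller, which is what {{expected = ApplicationNotFoundException.class}} does in the suggestion above.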
> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Raju Bairishetti
> Attachments: YARN-3646.001.patch, YARN-3646.patch
>
>
> We have set *yarn.resourcemanager.connect.max-wait.ms* to -1 to use the
> FOREVER retry policy.
> The YARN client retries infinitely on exceptions from the RM because it
> uses the FOREVER retry policy. The problem is that it retries on all kinds
> of exceptions (such as ApplicationNotFoundException), even when the exception
> is not a connection failure. Because of this, my application makes no progress.
> *The YARN client should not retry infinitely on non-connection failures.*
> We have written a simple yarn-client that tries to get an application
> report for an invalid or older appId. The ResourceManager throws an
> ApplicationNotFoundException because the appId is invalid or older. But
> because of the FOREVER retry policy, the client keeps retrying the request
> for the application report, and the ResourceManager keeps throwing
> ApplicationNotFoundException.
> {code}
> private void testYarnClientRetryPolicy() throws Exception {
>   YarnConfiguration conf = new YarnConfiguration();
>   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>   YarnClient yarnClient = YarnClient.createYarnClient();
>   yarnClient.init(conf);
>   yarnClient.start();
>   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>   ApplicationReport report = yarnClient.getApplicationReport(appId);
> }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
> from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application
> with id 'application_1430126768987_10645' doesn't exist in RM.
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> ....
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
> from 10.14.120.231:61621 Call#875163 Retry#0
> ....
> {noformat}