[
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj K updated YARN-3646:
----------------------------
Hadoop Flags: Reviewed
+1, latest patch looks good to me.
Thanks [~raju.bairishetti] for reporting the issue and contributing the patch,
and thanks [~rohithsharma] for the review.
> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Raju Bairishetti
> Assignee: Raju Bairishetti
> Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch
>
>
> We have set *yarn.resourcemanager.connect.max-wait.ms* to -1 to use the
> FOREVER retry policy.
> With the FOREVER retry policy, the YARN client retries indefinitely on any
> exception from the RM. The problem is that it retries on all kinds of
> exceptions (such as ApplicationNotFoundException), even when the failure is
> not a connection failure. As a result, my application stops making progress.
> *The YARN client should not retry indefinitely on non-connection failures.*
> We wrote a simple YARN client that tries to get an application report for an
> invalid or old appId. The ResourceManager throws an
> ApplicationNotFoundException because the appId is invalid or old. But because
> of the FOREVER retry policy, the client keeps retrying the request for the
> application report, and the ResourceManager keeps throwing
> ApplicationNotFoundException.
> {code}
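> import org.apache.hadoop.yarn.api.records.ApplicationId;
> import org.apache.hadoop.yarn.api.records.ApplicationReport;
> import org.apache.hadoop.yarn.client.api.YarnClient;
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
> private void testYarnClientRetryPolicy() throws Exception {
>   YarnConfiguration conf = new YarnConfiguration();
>   // -1 selects the FOREVER retry policy.
>   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>   YarnClient yarnClient = YarnClient.createYarnClient();
>   yarnClient.init(conf);
>   yarnClient.start();
>   // The RM does not know this appId, so it throws ApplicationNotFoundException.
>   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>   // With the FOREVER policy this call keeps retrying and does not return.
>   ApplicationReport report = yarnClient.getApplicationReport(appId);
> }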
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> ....
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
> ....
> ....
> {noformat}
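
For illustration, the behaviour requested in the description (retry indefinitely on connection failures, fail immediately on anything else) can be sketched in plain Java. This is not the attached patch and it does not use Hadoop's retry classes: `SelectiveRetry`, `callWithRetry`, and `AppNotFoundException` are hypothetical names, and `java.net.ConnectException` is only an assumed stand-in for whatever set of connection errors the real client treats as retriable.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch, not Hadoop code: retry forever only on connection failures.
final class SelectiveRetry {

    /** Hypothetical stand-in for YARN's ApplicationNotFoundException. */
    static final class AppNotFoundException extends Exception {
        AppNotFoundException(String message) {
            super(message);
        }
    }

    /**
     * Invokes the call until it succeeds, sleeping between attempts, but only
     * while it fails with ConnectException. Any other exception propagates
     * to the caller on the first occurrence.
     */
    static <T> T callWithRetry(Callable<T> call, long sleepMs) throws Exception {
        while (true) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // Connection failure: the server may come back, so wait and retry.
                Thread.sleep(sleepMs);
            }
            // Everything else (for example AppNotFoundException) is not caught here.
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Two simulated connection failures, then a successful response.
        String result = callWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("RM unreachable");
            }
            return "report";
        }, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");

        try {
            // A server-side application error must not be retried.
            callWithRetry(() -> {
                throw new AppNotFoundException("unknown appId");
            }, 10);
        } catch (AppNotFoundException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

Running `main` prints `report after 3 attempts` followed by `failed fast: unknown appId`. The point of the sketch is the split between the two cases: the exception type, not the configured wait time, decides whether another attempt is made.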
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)