[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547529#comment-14547529
 ] 

Raju Bairishetti commented on YARN-3646:
----------------------------------------

bq. Setting RetryPolicies.RETRY_FOREVER for exceptionToPolicyMap as default 
policy is not sufficient, but also RetryPolicies.RetryForever.shouldRetry() 
should check for Connect exceptions and handle it. Otherwise shouldRetry always 
return RetryAction.RETRY action.

 Do we need to catch exception in shouldRetry if we have separate 
exceptionToPolicy map  which contains only connectionException entry. ( like 
exceptiontoPolicyMap.put(connectionException, FOREVER polcicy))

Seems we do not even require exceptionToPolicy for FOREVER policy if we catch 
the exception in shouldRetry method.

thoughts?

> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
>                 Key: YARN-3646
>                 URL: https://issues.apache.org/jira/browse/YARN-3646
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Raju Bairishetti
>
> We have set  *yarn.resourcemanager.connect.wait-ms* to -1 to use  FOREVER 
> retry policy.
> Yarn client is infinitely retrying in case of exceptions from the RM as it is 
> using retrying policy as FOREVER. The problem is it is retrying for all kinds 
> of exceptions (like ApplicationNotFoundException), even though it is not a 
> connection failure. Due to this my application is not progressing further.
> *Yarn client should not retry infinitely in case of non connection failures.*
> We have written a simple yarn-client which is trying to get an application 
> report for an invalid  or older appId. ResourceManager is throwing an 
> ApplicationNotFoundException as this is an invalid or older appId.  But 
> because of retry policy FOREVER, client is keep on retrying for getting the 
> application report and ResourceManager is throwing 
> ApplicationNotFoundException continuously.
> {code}
> private void testYarnClientRetryPolicy() throws  Exception{
>         YarnConfiguration conf = new YarnConfiguration();
>         conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 
> -1);
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         ApplicationId appId = ApplicationId.newInstance(1430126768987L, 
> 10645);
>         ApplicationReport report = yarnClient.getApplicationReport(appId);
>     }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1430126768987_10645' doesn't exist in RM.
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> ....
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.14.120.231:61621 Call#875163 Retry#0
> ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to