[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544815#comment-14544815
 ] 

Raju Bairishetti commented on YARN-3646:
----------------------------------------

Thanks for the quick response.

I have reproduced it with apache 2.6.0 release (HDP 2.2.4 distribution). We are 
using 2.5.0 version.

We are not having *exceptionToPolicyMap* for FOREVER retrypolicy. Updating the 
exceptionToPolicyMap only for other retry policies.

*RetryPolicies.java*
{code}
static class RetryForever implements RetryPolicy {
    @Override
    public RetryAction shouldRetry(Exception e, int retries, int failovers,
        boolean isIdempotentOrAtMostOnce) throws Exception {
      return RetryAction.RETRY;
    }
  }
{code}

*RMProxy.java*
{code}
if (waitForEver) {
      return RetryPolicies.RETRY_FOREVER;
    }

...

    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
{code}

> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
>                 Key: YARN-3646
>                 URL: https://issues.apache.org/jira/browse/YARN-3646
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Raju Bairishetti
>
> We have set  *yarn.resourcemanager.connect.wait-ms* to -1 to use  FOREVER 
> retry policy.
> Yarn client is infinitely retrying in case of exceptions from the RM as it is 
> using retrying policy as FOREVER. The problem is it is retrying for all kinds 
> of exceptions (like ApplicationNotFoundException), even though it is not a 
> connection failure. Due to this my application is not progressing further.
> *Yarn client should not retry infinitely in case of non connection failures.*
> We have written a simple yarn-client which is trying to get an application 
> report for an invalid  or older appId. ResourceManager is throwing an 
> ApplicationNotFoundException as this is an invalid or older appId.  But 
> because of retry policy FOREVER, client is keep on retrying for getting the 
> application report and ResourceManager is throwing 
> ApplicationNotFoundException continuously.
> {code}
> private void testYarnClientRetryPolicy() throws  Exception{
>         YarnConfiguration conf = new YarnConfiguration();
>         conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 
> -1);
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         ApplicationId appId = ApplicationId.newInstance(1430126768987L, 
> 10645);
>         ApplicationReport report = yarnClient.getApplicationReport(appId);
>     }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1430126768987_10645' doesn't exist in RM.
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> ....
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.14.120.231:61621 Call#875163 Retry#0
> ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to