Raju Bairishetti created YARN-3646:
--------------------------------------

             Summary: Applications are getting stuck some times in case of 
retry policy forever
                 Key: YARN-3646
                 URL: https://issues.apache.org/jira/browse/YARN-3646
             Project: Hadoop YARN
          Issue Type: Bug
          Components: client
            Reporter: Raju Bairishetti


We have set *yarn.resourcemanager.connect.max-wait.ms* to -1 to use the FOREVER 
retry policy.

The YARN client retries indefinitely on exceptions from the RM because it uses 
the FOREVER retry policy. The problem is that it retries on every kind of 
exception (such as ApplicationNotFoundException), even when the failure is not a 
connection failure. As a result, my application makes no further progress.

*The YARN client should not retry indefinitely on non-connection failures.*
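
The requested behaviour can be sketched in plain Java, independent of the Hadoop retry classes. This is only an illustration under stated assumptions, not the YARN client's implementation or a proposed patch: the class ConnectionOnlyRetry, the helper retryOnConnectFailure and the NotFoundException stand-in are hypothetical names, and java.net.ConnectException is assumed to represent a connection failure.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry forever on connection failures only.
class ConnectionOnlyRetry {

    // Stand-in for a non-connection failure such as ApplicationNotFoundException.
    // This is NOT the Hadoop class, just a local placeholder.
    static class NotFoundException extends Exception {
        NotFoundException(String message) {
            super(message);
        }
    }

    // Calls the action until it succeeds, sleeping between attempts.
    // Only ConnectException triggers a retry; any other exception
    // propagates to the caller on the first occurrence.
    static <T> T retryOnConnectFailure(Callable<T> action, long sleepMs) throws Exception {
        while (true) {
            try {
                return action.call();
            } catch (ConnectException e) {
                Thread.sleep(sleepMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Connection failure: retried until the call succeeds.
        int[] attempts = {0};
        String result = retryOnConnectFailure(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("RM unreachable");
            }
            return "report";
        }, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");

        // Non-connection failure: surfaces immediately instead of looping.
        try {
            retryOnConnectFailure(() -> {
                throw new NotFoundException("no such application");
            }, 10);
        } catch (NotFoundException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With such a policy, the first call returns after the simulated outage ends, while the second fails on the first attempt, so the caller can handle the error instead of hanging.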

We wrote a simple YARN client that requests an application report for an 
invalid or old appId. The ResourceManager throws an 
ApplicationNotFoundException because the appId is invalid or old. But because 
of the FOREVER retry policy, the client keeps retrying the request and the 
ResourceManager keeps throwing ApplicationNotFoundException.

{code}
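import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

private void testYarnClientRetryPolicy() throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // -1 selects the FOREVER retry policy.
    conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // An appId the RM does not know about (invalid or old).
    ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
    // Does not return: the client keeps retrying on ApplicationNotFoundException.
    ApplicationReport report = yarnClient.getApplicationReport(appId);
}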
{code}


*RM logs:*

{noformat}

15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

....

15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
....

{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)