[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556332#comment-14556332 ]

Hudson commented on YARN-3646:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java

> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Raju Bairishetti
> Assignee: Raju Bairishetti
> Fix For: 2.7.1
>
> Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch
>
>
> We have set *yarn.resourcemanager.connect.max-wait.ms* to -1 to use the FOREVER retry policy.
> With this policy the YARN client retries indefinitely on any exception from the RM. The problem is that it retries on all kinds of exceptions (such as ApplicationNotFoundException), even when the failure is not a connection failure. As a result, my application does not make progress.
> *The YARN client should not retry indefinitely on non-connection failures.*
> We have written a simple YARN client that requests an application report for an invalid or old appId. The ResourceManager throws an ApplicationNotFoundException because the appId is invalid or old, but because of the FOREVER retry policy the client keeps retrying the request and the ResourceManager keeps throwing ApplicationNotFoundException.
> {code}
> private void testYarnClientRetryPolicy() throws Exception {
>     YarnConfiguration conf = new YarnConfiguration();
>     conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>     YarnClient yarnClient = YarnClient.createYarnClient();
>     yarnClient.init(conf);
>     yarnClient.start();
>     ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>     ApplicationReport report = yarnClient.getApplicationReport(appId);
> }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
>     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
>
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
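
The behaviour requested in the issue description (retry indefinitely on connection failures, fail fast on everything else) can be sketched in plain Java without any Hadoop dependency. This is only a minimal illustration under stated assumptions, not the RMProxy change from the attached patches: the class RetryOnConnectFailureSketch, its helper methods, and the stand-in AppNotFoundException are all hypothetical names, and treating SocketException and UnknownHostException as "connection failures" is an assumption made for the example.

```java
import java.net.ConnectException;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical sketch; not the RMProxy code from the YARN-3646 patches. */
final class RetryOnConnectFailureSketch {

    /** Stand-in for an application-level error such as ApplicationNotFoundException. */
    static final class AppNotFoundException extends Exception {
        AppNotFoundException(String message) {
            super(message);
        }
    }

    /**
     * Assumption for this sketch: only network-level errors count as
     * connection failures. ConnectException and NoRouteToHostException
     * are subclasses of SocketException, so they are covered here too.
     */
    static boolean isConnectionFailure(Exception e) {
        return e instanceof SocketException || e instanceof UnknownHostException;
    }

    /**
     * Retries the call without an attempt limit while it fails with a
     * connection failure; any other exception is rethrown immediately.
     */
    static <T> T callWithRetry(Callable<T> call, long sleepMillis) throws Exception {
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isConnectionFailure(e)) {
                    throw e; // not a connection problem: retrying cannot help
                }
                Thread.sleep(sleepMillis); // wait before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated server that becomes reachable on the third attempt.
        int[] attempts = {0};
        String report = callWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("RM not reachable yet");
            }
            return "report";
        }, 10L);
        System.out.println(report + " after " + attempts[0] + " attempts");

        // Application-level error: surfaces after a single attempt.
        try {
            callWithRetry(() -> {
                throw new AppNotFoundException("no such application");
            }, 10L);
        } catch (AppNotFoundException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With a policy shaped like this, the getApplicationReport call from the reproduction above would surface the not-found error to the caller after one attempt instead of looping, while a client started before the RM is reachable would still wait for it.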
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556297#comment-14556297 ]

Hudson commented on YARN-3646:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #203 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/203/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556226#comment-14556226 ]

Hudson commented on YARN-3646:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556199#comment-14556199 ]

Hudson commented on YARN-3646:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556093#comment-14556093 ]

Hudson commented on YARN-3646:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556079#comment-14556079 ]

Hudson commented on YARN-3646:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555168#comment-14555168 ]

Junping Du commented on YARN-3646:
----------------------------------

Also, congratulations to [~raju.bairishetti] for contributing the first patch to Apache Hadoop project! :)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554510#comment-14554510 ]

Junping Du commented on YARN-3646:
----------------------------------

Filed YARN-3695 to continue the discussion.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554486#comment-14554486 ]

Junping Du commented on YARN-3646:
----------------------------------

Sure. I will file a separate JIRA to discuss this.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554387#comment-14554387 ]

Devaraj K commented on YARN-3646:
---------------------------------

Thanks [~djp] for the comment. I had just committed this patch before noticing your comment. Can we handle adding EOFException to the list as part of another issue/improvement? Thanks.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554385#comment-14554385 ]

Hudson commented on YARN-3646:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7882 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7882/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554373#comment-14554373 ]

Junping Du commented on YARN-3646:
----------------------------------

The patch LGTM overall too. Just one minor issue:
{code}
exceptionToPolicyMap.put(EOFException.class, retryPolicy);
{code}
Do we need to apply RetryPolicies.RETRY_FOREVER to EOFException too? I don't think so. [~jianhe] and [~xgong], any comments here?
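To illustrate the idea behind the per-exception mapping discussed in this thread, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop's RMProxy or RetryPolicies classes; the class name `RetryByExceptionSketch`, the `Decision` enum, and the particular list of exceptions treated as connection failures are all assumptions made for illustration. `IllegalStateException` stands in for an application-level error such as ApplicationNotFoundException.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: decide per exception type whether a call is retried. */
final class RetryByExceptionSketch {

  /** Simplified stand-in for a retry policy. */
  enum Decision { RETRY, FAIL }

  // Assumed list of exception types treated as connection failures.
  // Whether EOFException belongs here is exactly the open question above.
  private static final Map<Class<? extends Exception>, Decision> POLICY_BY_EXCEPTION =
      new LinkedHashMap<>();

  static {
    POLICY_BY_EXCEPTION.put(ConnectException.class, Decision.RETRY);
    POLICY_BY_EXCEPTION.put(NoRouteToHostException.class, Decision.RETRY);
    POLICY_BY_EXCEPTION.put(UnknownHostException.class, Decision.RETRY);
    POLICY_BY_EXCEPTION.put(EOFException.class, Decision.RETRY);
  }

  /** Exact-class lookup; anything not listed fails immediately instead of retrying. */
  static Decision decide(Exception e) {
    return POLICY_BY_EXCEPTION.getOrDefault(e.getClass(), Decision.FAIL);
  }

  public static void main(String[] args) {
    // A connection failure is retried.
    System.out.println(decide(new ConnectException("RM unreachable")));
    // An application-level error is surfaced to the caller.
    System.out.println(decide(new IllegalStateException("application not found")));
  }
}
```

With this shape, the default for unlisted exceptions is to fail, so an error returned by a reachable server is not retried indefinitely; only the listed exception types keep the retry behaviour.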
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552404#comment-14552404 ]

Hadoop QA commented on YARN-3646:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 6s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 6m 51s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. |
| | | 45m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734115/YARN-3646.002.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 4aa730c |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8023/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8023/console |

This message was automatically generated.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552225#comment-14552225 ]

Rohith commented on YARN-3646:
------------------------------

+1 LGTM (non-binding). Wait for the Jenkins report!
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552091#comment-14552091 ]

Rohith commented on YARN-3646:
------------------------------

Thanks for updating the patch. Some comments on the tests:
# I think we can remove the tests added in the hadoop-common project, since the yarn-client test verifies the required functionality. The hadoop-common test was basically mocking the RMProxy functionality, and it passed even without the RMProxy fix.
# The code never reaches {{Assert.fail("");}}, so it is better to remove it.
# Catch ApplicationNotFoundException instead of catching Throwable. I think you can add {{expected = ApplicationNotFoundException.class}} to the @Test annotation, like below.
{code}
@Test(timeout = 3, expected = ApplicationNotFoundException.class)
public void testClientWithRetryPolicyForEver() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);

  ResourceManager rm = null;
  YarnClient yarnClient = null;
  try {
    // start rm
    rm = new ResourceManager();
    rm.init(conf);
    rm.start();

    yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // create invalid application id
    ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);

    // RM should throw ApplicationNotFoundException exception
    yarnClient.getApplicationReport(appId);
  } finally {
    if (yarnClient != null) {
      yarnClient.stop();
    }
    if (rm != null) {
      rm.stop();
    }
  }
}
{code}
# Can you rename the test after the functionality it actually tests, e.g. {{testShouldNotRetryForeverForNonNetworkExceptions}}?
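The test suggested in this review relies on two checks at once: the call must throw the expected exception, and it must do so within a time limit rather than retrying forever. As a rough illustration of that combined check outside JUnit, here is a small plain-Java sketch. The class `FailFastCheckSketch` and the method `failsFastWith` are hypothetical names, and `IllegalStateException` stands in for ApplicationNotFoundException; nothing here is taken from the actual patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: verify that a call fails quickly with an expected exception. */
final class FailFastCheckSketch {

  /**
   * Returns true only if the call throws an exception of the expected type
   * within the time limit. Returns false if the call returns normally,
   * throws a different exception, or does not finish in time.
   */
  static boolean failsFastWith(Callable<?> call,
                               Class<? extends Exception> expected,
                               long timeoutMillis) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> future = executor.submit(call);
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return false; // the call returned normally
    } catch (ExecutionException e) {
      // The call threw; check the type of the underlying exception.
      return expected.isInstance(e.getCause());
    } catch (TimeoutException e) {
      return false; // the call is stuck, e.g. retrying forever
    } finally {
      executor.shutdownNow(); // interrupt the worker if it is still running
    }
  }

  public static void main(String[] args) throws Exception {
    Callable<Object> notFound = () -> {
      throw new IllegalStateException("application not found");
    };
    Callable<Object> stuck = () -> {
      Thread.sleep(60_000); // simulates a client that keeps retrying
      return null;
    };
    System.out.println(failsFastWith(notFound, IllegalStateException.class, 1000));
    System.out.println(failsFastWith(stuck, IllegalStateException.class, 200));
  }
}
```

A call that surfaces the error passes the check, while a call that hangs is reported as a failure once the timeout expires, which is the behaviour the `timeout` and `expected` attributes express together in the JUnit version.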
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551895#comment-14551895 ]

Hadoop QA commented on YARN-3646:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 44s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 23m 54s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 6m 54s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| | | 73m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734062/YARN-3646.001.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / ce53c8e |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8017/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8017/console |

This message was automatically generated.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550288#comment-14550288 ]

Raju Bairishetti commented on YARN-3646:

Thanks [~rohithsharma] for the review. Looks like it is mainly an issue with retry policy.

> Applications are getting stuck some times in case of retry policy forever
> -
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Raju Bairishetti
> Attachments: YARN-3646.patch
>
>
> We have set *yarn.resourcemanager.connect.max-wait.ms* to -1 to use the FOREVER
> retry policy.
> The YARN client retries infinitely on exceptions from the RM because it is
> using the FOREVER retry policy. The problem is that it retries for all kinds
> of exceptions (like ApplicationNotFoundException), even when the failure is not a
> connection failure. Because of this, my application is not progressing further.
> *The YARN client should not retry infinitely in case of non-connection failures.*
> We have written a simple YARN client that tries to get an application
> report for an invalid or older appId. The ResourceManager throws an
> ApplicationNotFoundException because the appId is invalid or older. But
> because of the FOREVER retry policy, the client keeps retrying to get the
> application report and the ResourceManager keeps throwing
> ApplicationNotFoundException.
> {code}
> private void testYarnClientRetryPolicy() throws Exception {
>   YarnConfiguration conf = new YarnConfiguration();
>   // -1 selects the FOREVER retry policy
>   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>   YarnClient yarnClient = YarnClient.createYarnClient();
>   yarnClient.init(conf);
>   yarnClient.start();
>   // An appId that the RM does not know about
>   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>   // Never returns: the client retries on ApplicationNotFoundException
>   ApplicationReport report = yarnClient.getApplicationReport(appId);
> }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
> from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application
> with id 'application_1430126768987_10645' doesn't exist in RM.
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
> from 10.14.120.231:61621 Call#875163 Retry#0
>
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550258#comment-14550258 ]

Rohith commented on YARN-3646:
--

I also verified this on a one-node cluster by enabling and disabling the retry-forever policy.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550256#comment-14550256 ]

Rohith commented on YARN-3646:
--

Thanks for working on this issue. The patch overall looks good to me.

nit: Can the test be moved to the YARN package, since the issue is in YARN? Otherwise, if there is any change in RMProxy, the test will not run.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550233#comment-14550233 ]

Rohith commented on YARN-3646:
--

bq. Seems we do not even require exceptionToPolicy for FOREVER policy if we catch the exception in shouldRetry method.

Makes sense to me. I will review the patch, thanks.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550092#comment-14550092 ]

Hadoop QA commented on YARN-3646:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 17s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| | | 63m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733743/YARN-3646.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 93972a3 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7994/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7994/console |

This message was automatically generated.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547537#comment-14547537 ]

Raju Bairishetti commented on YARN-3646:

[~vinodkv] I will provide a patch shortly. I am not able to assign myself. Can anyone help me in assigning myself?
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547529#comment-14547529 ]

Raju Bairishetti commented on YARN-3646:

bq. Setting RetryPolicies.RETRY_FOREVER for exceptionToPolicyMap as default policy is not sufficient, but also RetryPolicies.RetryForever.shouldRetry() should check for Connect exceptions and handle it. Otherwise shouldRetry always return RetryAction.RETRY action.

Do we need to catch the exception in shouldRetry if we have a separate exceptionToPolicy map that contains only the connection exception entry (like exceptionToPolicyMap.put(connectionException, FOREVER policy))?

It seems we do not even require exceptionToPolicy for the FOREVER policy if we catch the exception in the shouldRetry method. Thoughts?
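The trade-off discussed in the comment above can be sketched in plain Java. This is only an illustration under stated assumptions: `SketchRetryPolicy`, `AppNotFoundSketchException` and `invoke` are hypothetical names invented here, not Hadoop's `RetryPolicy` API and not the code from the attached patch. It shows a retry-forever policy whose retry decision itself checks for connection failures, so that a server-side answer such as an application-not-found error reaches the caller instead of being retried.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical stand-in types; these are NOT Hadoop's RetryPolicy classes.
final class ConnectionOnlyRetrySketch {

  /** Decides whether a failed call should be attempted again. */
  interface SketchRetryPolicy {
    boolean shouldRetry(Exception e);
  }

  /** Retries forever, but only for failures that look like connection problems. */
  static final SketchRetryPolicy FOREVER_ON_CONNECT_FAILURE = e ->
      e instanceof ConnectException
          || e instanceof NoRouteToHostException
          || e instanceof UnknownHostException
          || e instanceof SocketTimeoutException;

  /** Stand-in for a server-side answer such as ApplicationNotFoundException. */
  static final class AppNotFoundSketchException extends Exception {
    AppNotFoundSketchException(String msg) {
      super(msg);
    }
  }

  /** Invokes the call until it succeeds or the policy refuses to retry. */
  static <T> T invoke(Callable<T> call, SketchRetryPolicy policy, long sleepMs)
      throws Exception {
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        if (!policy.shouldRetry(e)) {
          throw e; // non-connection failure: surface it to the caller
        }
        Thread.sleep(sleepMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] attempts = {0};
    // Simulated RM: unreachable for the first two attempts, then answers.
    String report = invoke(() -> {
      if (attempts[0]++ < 2) {
        throw new ConnectException("RM not reachable");
      }
      return "report";
    }, FOREVER_ON_CONNECT_FAILURE, 10L);
    System.out.println(report + " after " + attempts[0] + " attempts");

    try {
      invoke(() -> {
        throw new AppNotFoundSketchException("unknown appId");
      }, FOREVER_ON_CONNECT_FAILURE, 10L);
    } catch (AppNotFoundSketchException e) {
      System.out.println("not retried: " + e.getMessage());
    }
  }
}
```

With the check inside the retry decision, no separate exception-to-policy map is needed for the forever case, which is the simplification the comment asks about. Which exception types count as connection failures is an assumption of this sketch.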
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545702#comment-14545702 ]

Devaraj K commented on YARN-3646:
---------------------------------

bq. Would this not cause the client to wait for too long (well after the rm has come back online)

"yarn.resourcemanager.connect.max-wait.ms" is the maximum time to wait to establish a connection to the RM. If the RM comes online before this time elapses, the client will connect immediately. The IPC client internally retries connecting to the RM every "yarn.resourcemanager.connect.retry-interval.ms" (default value 30 * 1000), and an exception is thrown if it cannot connect within "yarn.resourcemanager.connect.max-wait.ms".

> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Raju Bairishetti
>
> We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER
> retry policy.
> Yarn client is infinitely retrying in case of exceptions from the RM as it is
> using retrying policy as FOREVER. The problem is it is retrying for all kinds
> of exceptions (like ApplicationNotFoundException), even though it is not a
> connection failure. Due to this my application is not progressing further.
> *Yarn client should not retry infinitely in case of non connection failures.*
> We have written a simple yarn-client which is trying to get an application
> report for an invalid or older appId. ResourceManager is throwing an
> ApplicationNotFoundException as this is an invalid or older appId. But
> because of retry policy FOREVER, client is keep on retrying for getting the
> application report and ResourceManager is throwing
> ApplicationNotFoundException continuously.
> {code}
> private void testYarnClientRetryPolicy() throws Exception{
>   YarnConfiguration conf = new YarnConfiguration();
>   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>   YarnClient yarnClient = YarnClient.createYarnClient();
>   yarnClient.init(conf);
>   yarnClient.start();
>   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>   ApplicationReport report = yarnClient.getApplicationReport(appId);
> }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
>
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
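The relationship between the two settings discussed in this comment can be sketched without any Hadoop dependency. The class below is a hypothetical model, not RMProxy code: it assumes the retry budget is simply the maximum wait divided by the retry interval, that -1 means an unlimited budget (as described in this issue), and that the client notices the RM on the first attempt made after it is back. The class and method names, and the 15-minute maximum wait used in the usage note, are illustrative assumptions.

```java
/** Hypothetical model of a fixed-interval connect retry bounded by a maximum wait. */
class BoundedRetrySketch {

    /** Retry budget implied by the two settings; a max wait of -1 is treated as unbounded. */
    static long retryBudget(long maxWaitMs, long retryIntervalMs) {
        if (retryIntervalMs <= 0) {
            throw new IllegalArgumentException("retry interval must be positive");
        }
        if (maxWaitMs == -1) {
            return Long.MAX_VALUE; // FOREVER: no upper bound on retries
        }
        return maxWaitMs / retryIntervalMs;
    }

    /**
     * Simulates connecting to an RM that becomes reachable at rmBackOnlineAtMs.
     * Returns the simulated time (ms) of the successful attempt,
     * or -1 if the retry budget runs out first.
     */
    static long simulateConnect(long maxWaitMs, long retryIntervalMs, long rmBackOnlineAtMs) {
        long budget = retryBudget(maxWaitMs, retryIntervalMs);
        long now = 0;
        long retries = 0;
        while (now < rmBackOnlineAtMs) {   // RM still down at this attempt
            if (retries >= budget) {
                return -1;                 // give up: an exception would be thrown here
            }
            now += retryIntervalMs;        // sleep for one retry interval
            retries++;
        }
        return now;
    }
}
```

Under these assumptions, `BoundedRetrySketch.simulateConnect(15 * 60 * 1000, 30 * 1000, 45_000)` succeeds on the first attempt after the RM returns rather than waiting out the full maximum, which is the point made in the reply above.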
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545548#comment-14545548 ]

Srikanth Sundarrajan commented on YARN-3646:
--------------------------------------------

{quote}
You can probably avoid this situation by setting a bigger value
{quote}
Would this not cause the client to wait too long (well after the RM has come back online)?
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545264#comment-14545264 ]

Devaraj K commented on YARN-3646:
---------------------------------

You can probably avoid this situation by setting a bigger value for "yarn.resourcemanager.connect.max-wait.ms" (as below) if you want to keep retrying for a long time to establish a connection to the RM.

{code:java}
conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, Integer.MAX_VALUE);
{code}

Anyway, it seems this issue needs to be fixed.
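To put the Integer.MAX_VALUE workaround in perspective, the arithmetic below shows how long such a maximum wait actually is. This is a stand-alone sketch; the class name is hypothetical, and the 30 * 1000 ms retry interval is the default quoted elsewhere in this thread.

```java
import java.time.Duration;

/** Hypothetical helper: how long is a max wait of Integer.MAX_VALUE milliseconds? */
class MaxWaitArithmetic {

    /** Converts a max-wait value in milliseconds to a Duration. */
    static Duration asDuration(int maxWaitMs) {
        return Duration.ofMillis(maxWaitMs);
    }

    /** Number of whole retry intervals that fit into the max wait. */
    static long retriesWithin(int maxWaitMs, int retryIntervalMs) {
        return maxWaitMs / retryIntervalMs;
    }

    public static void main(String[] args) {
        Duration wait = asDuration(Integer.MAX_VALUE);
        System.out.println("whole days:  " + wait.toDays());
        System.out.println("whole hours: " + wait.toHours());
        System.out.println("retries at 30 s: " + retriesWithin(Integer.MAX_VALUE, 30 * 1000));
    }
}
```

So the workaround is a long but finite wait of a little over three weeks, not a true FOREVER policy.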
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544988#comment-14544988 ]

Rohith commented on YARN-3646:
------------------------------

Setting RetryPolicies.RETRY_FOREVER as the default policy for exceptionToPolicyMap is not sufficient; {{RetryPolicies.RetryForever.shouldRetry()}} should also check for connect exceptions and handle them. Otherwise shouldRetry always returns the RetryAction.RETRY action.
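The idea of a retry-forever policy that inspects the exception can be sketched as follows. This is a minimal stand-alone model, not the Hadoop RetryPolicies code: the Action enum, the AppNotFoundException class (a stand-in for ApplicationNotFoundException) and the callWithRetry helper are all hypothetical, and a real client would sleep between attempts.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry forever, but only for connection failures. */
class ConnectOnlyRetrySketch {

    enum Action { RETRY, FAIL }

    /** Stand-in for an application-level error such as ApplicationNotFoundException. */
    static class AppNotFoundException extends Exception {
        AppNotFoundException(String message) {
            super(message);
        }
    }

    /** Retries only when the failure is a connect exception; anything else fails at once. */
    static Action shouldRetry(Exception e) {
        return (e instanceof ConnectException) ? Action.RETRY : Action.FAIL;
    }

    /** Invokes the call until it succeeds or fails with a non-retriable exception. */
    static <T> T callWithRetry(Callable<T> call) throws Exception {
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                if (shouldRetry(e) == Action.FAIL) {
                    throw e; // surface the error instead of looping forever
                }
                // A real client would sleep for the retry interval here.
            }
        }
    }
}
```

With this shape, a call that fails with AppNotFoundException is reported to the caller after a single attempt, while a call that fails with ConnectException keeps being retried until the server is reachable.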
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544959#comment-14544959 ]

Rohith commented on YARN-3646:
------------------------------

I copied *yarn.resourcemanager.connect.wait-ms* from the description, but the actual configuration is *yarn.resourcemanager.connect.max-wait.ms*.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544947#comment-14544947 ]

Rohith commented on YARN-3646:
------------------------------

RetryPolicies.RETRY_FOREVER should also use exceptionToPolicyMap. [~raju.bairishetti] Feel free to take up this JIRA.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544938#comment-14544938 ]

Rohith commented on YARN-3646:
------------------------------

Thanks for the explanation. I can reproduce the problem on my machines too. When I tested last time, my configuration settings were wrong.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544815#comment-14544815 ]

Raju Bairishetti commented on YARN-3646:
----------------------------------------

Thanks for the quick response.

I have reproduced it with the Apache 2.6.0 release (HDP 2.2.4 distribution). We are using version 2.5.0.

There is no *exceptionToPolicyMap* for the FOREVER retry policy; the exceptionToPolicyMap is populated only for the other retry policies.

*RetryPolicies.java*
{code}
static class RetryForever implements RetryPolicy {
  @Override
  public RetryAction shouldRetry(Exception e, int retries, int failovers,
      boolean isIdempotentOrAtMostOnce) throws Exception {
    // Retries unconditionally, whatever the exception type is.
    return RetryAction.RETRY;
  }
}
{code}

*RMProxy.java*
{code}
if (waitForEver) {
  // Returns before the exception-to-policy map below is ever consulted.
  return RetryPolicies.RETRY_FOREVER;
}
...
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
    new HashMap<Class<? extends Exception>, RetryPolicy>();
{code}
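One way to picture the behaviour this thread is asking for is an exception-to-policy map that is consulted for the FOREVER case as well, with a fail-fast default for everything else. The sketch below is stand-alone and hypothetical: the Policy interface, the byException and createPolicy helpers, and the particular exception classes placed in the map are assumptions for illustration, not the code in RMProxy.java or the committed patch.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of dispatching retry decisions by exception class. */
class ExceptionPolicyMapSketch {

    enum Action { RETRY, FAIL }

    interface Policy {
        Action shouldRetry(Exception e, int retries);
    }

    static final Policy RETRY_FOREVER = (e, retries) -> Action.RETRY;
    static final Policy TRY_ONCE_THEN_FAIL = (e, retries) -> Action.FAIL;

    /** Looks up the policy by the exact exception class, falling back to the default. */
    static Policy byException(Policy defaultPolicy,
                              Map<Class<? extends Exception>, Policy> exceptionToPolicyMap) {
        return (e, retries) ->
            exceptionToPolicyMap.getOrDefault(e.getClass(), defaultPolicy).shouldRetry(e, retries);
    }

    /** Builds a policy in which only connection-type failures are retried. */
    static Policy createPolicy(boolean waitForEver, int maxRetries) {
        Policy connectPolicy;
        if (waitForEver) {
            connectPolicy = RETRY_FOREVER;
        } else {
            connectPolicy = (e, retries) -> retries < maxRetries ? Action.RETRY : Action.FAIL;
        }
        Map<Class<? extends Exception>, Policy> exceptionToPolicyMap = new HashMap<>();
        exceptionToPolicyMap.put(ConnectException.class, connectPolicy);
        exceptionToPolicyMap.put(EOFException.class, connectPolicy);
        exceptionToPolicyMap.put(NoRouteToHostException.class, connectPolicy);
        // Any exception not in the map (for example an application-not-found error) fails fast.
        return byException(TRY_ONCE_THEN_FAIL, exceptionToPolicyMap);
    }
}
```

In this model the FOREVER setting changes only how long connection failures are retried; it no longer decides which exceptions are retried at all.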
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543776#comment-14543776 ]

Rohith commented on YARN-3646:
------------------------------

Which version of Hadoop are you using? I don't see this problem in trunk or branch-2.