[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726435#comment-16726435 ] Hadoop QA commented on HDFS-14161: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} HDFS-13891 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 58s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} HDFS-13891 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 45s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 73m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterAllResolver | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HDFS-14161 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952617/HDFS-14161-HDFS-13891.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cf9d68c7fa79 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-13891 / c9ebaf2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/25837/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/25837/testReport/ | | Max. process+thread count | 1337 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output |
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726406#comment-16726406 ] Fei Hui commented on HDFS-14161: [~elgoiri] Upload v003 patch, improve the message in RouterRpcClient#420 > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161-HDFS-13891.001.patch, > HDFS-14161-HDFS-13891.002.patch, HDFS-14161-HDFS-13891.003.patch, > HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726375#comment-16726375 ] Íñigo Goiri commented on HDFS-14161: OK, now I remember. I tried wrapping StandbyException and report something like RouterOverloadedException but that didn't work because the client check specifically for StandbyException and not a child of it. So, let's return StandbyException for both. The onyl thing is that we should improve the message in RouterRpcClient#420 to distinguish between a NameNode really being in Standby or us issuing a StandbyException to retry in another Router. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161-HDFS-13891.001.patch, > HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726360#comment-16726360 ] Fei Hui commented on HDFS-14161: [~elgoiri] The purpose of throwing Exception is that client have chance to retry and not fail. Throwing RetriableExceptio makes client access the same router again and throwing StandbyException makes client access another router. Maybe Router reporting Standby state is reasonable when it could not service.On the other hand if throwing RetriableException, client will retry seconds later. Both methods are good, do you think which one is better? > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161-HDFS-13891.001.patch, > HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726077#comment-16726077 ] Íñigo Goiri commented on HDFS-14161: Thanks [~ferhui] for the patch. Coincidentally, yesterday I found an issue with throwing a {{StandbyException}}. In {{RouterRpcClient#420}}, we report all {{StandbyException}} as the Namenode was in Standby while this is not true in the overload case anymore. Will the {{RetriableException}} have the same effect of it trying into a different Router? If so, we may want to make both {{RetriableException}}. Another option is to improve {{RouterRpcClient#420}} to report the exception or verify if it's a real {{StandbyException}} or not. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161-HDFS-13891.001.patch, > HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725896#comment-16725896 ] Hadoop QA commented on HDFS-14161: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} HDFS-13891 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 29s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} HDFS-13891 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 14s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HDFS-14161 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952527/HDFS-14161-HDFS-13891.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ab3868b6a261 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-13891 / c9ebaf2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/25835/testReport/ | | Max. process+thread count | 952 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25835/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725812#comment-16725812 ] Fei Hui commented on HDFS-14161: Upload v002 path, fix checkstyle > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161-HDFS-13891.001.patch, > HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725792#comment-16725792 ] Hadoop QA commented on HDFS-14161: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} HDFS-13891 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 50s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} HDFS-13891 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 16s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 4s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HDFS-14161 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952516/HDFS-14161-HDFS-13891.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d403cec83784 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-13891 / c9ebaf2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/25834/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/25834/testReport/ | | Max. process+thread count | 1233 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output |
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725743#comment-16725743 ] Fei Hui commented on HDFS-14161: [~elgoiri] Maybe throw StandbyException is reasonable. Add a test case and rebase code on branch HDFS-13891. Test cases of TestRouterClientRejectOverload aim to test ThreadPoolExecutor from RouterRpcClient. This issue is different, it occurs when the ConnectionManager returns a null Connection. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725228#comment-16725228 ] Íñigo Goiri commented on HDFS-14161: Moved this as part of HDFS-13891. The patch should also target the branch. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725201#comment-16725201 ] Íñigo Goiri commented on HDFS-14161: Well, the same test that I added for {{TestRouterClientRejectOverload}} would apply here. Create a mini cluster with two Routers and trigger a method that uses {{invokeSequential()}} which makes it run out of connections in one Namenode and make sure it fails over to the other Router. So pretty much the same but with a method that uses {{invokeSequential()}}; maybe get file info? You even have the tooling to simulate slow NNs. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at >
[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection
[ https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724776#comment-16724776 ] Fei Hui commented on HDFS-14161: [~elgoiri] Maybe DFS_ROUTER_CLIENT_REJECT_OVERLOAD affects invokeConcurrent. Posted case calls invokeSequential Try to present a test case. > RBF: Throw RetriableException instead of IOException so that client can retry > when can not get connection > - > > Key: HDFS-14161 > URL: https://issues.apache.org/jira/browse/HDFS-14161 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.1, 2.9.2, 3.0.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14161.001.patch > > > Hive Client may hang when get IOException, stack follows > {code:java} > Exception in thread "Thread-150" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a > connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74) > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot > get a connection to bigdata-nn20.g01:8020 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > at org.apache.hadoop.ipc.Client.call(Client.java:1503) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >