[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726435#comment-16726435
 ] 

Hadoop QA commented on HDFS-14161:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-13891 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
58s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} HDFS-13891 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 45s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterAllResolver |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14161 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12952617/HDFS-14161-HDFS-13891.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cf9d68c7fa79 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-13891 / c9ebaf2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25837/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25837/testReport/ |
| Max. process+thread count | 1337 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726406#comment-16726406
 ] 

Fei Hui commented on HDFS-14161:


[~elgoiri] Uploaded v003 patch; it improves the message in RouterRpcClient#420.

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161-HDFS-13891.001.patch, 
> HDFS-14161-HDFS-13891.002.patch, HDFS-14161-HDFS-13891.003.patch, 
> HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726375#comment-16726375
 ] 

Íñigo Goiri commented on HDFS-14161:


OK, now I remember.
I tried wrapping StandbyException and reporting something like 
RouterOverloadedException, but that didn't work because the client checks 
specifically for StandbyException and not for a subclass of it.
So, let's return StandbyException for both.

The only thing is that we should improve the message in RouterRpcClient#420 to 
distinguish between a NameNode really being in Standby and us issuing a 
StandbyException to retry on another Router.
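
A minimal sketch of a message that tells the two cases apart (the helper below is hypothetical, not the actual {{RouterRpcClient}} code; only the String constructor of org.apache.hadoop.ipc.StandbyException is assumed):
{code:java}
import org.apache.hadoop.ipc.StandbyException;

public class ConnectionUnavailableSketch {
  // Hypothetical helper: issue a StandbyException whose message makes it
  // clear the Router (not the NameNode) generated it, so logs can
  // distinguish a real Standby NameNode from an overloaded Router.
  static void checkConnection(Object connection, String nsId, String rpcAddress)
      throws StandbyException {
    if (connection == null) {
      throw new StandbyException("Router cannot get a connection to "
          + nsId + " at " + rpcAddress
          + " (issued by the Router to trigger failover;"
          + " the NameNode may not actually be in Standby)");
    }
  }
}
{code}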

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161-HDFS-13891.001.patch, 
> HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726360#comment-16726360
 ] 

Fei Hui commented on HDFS-14161:


[~elgoiri] The purpose of throwing an exception is to give the client a chance 
to retry instead of failing. Throwing RetriableException makes the client 
access the same Router again, while throwing StandbyException makes the client 
access another Router. Maybe it is reasonable for a Router to report the 
Standby state when it cannot serve requests; on the other hand, if we throw 
RetriableException, the client will retry a few seconds later. Both methods 
are good; which one do you think is better?
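
To make the trade-off concrete, a small illustrative sketch (the method and flag are hypothetical; only the two exception classes from org.apache.hadoop.ipc are assumed):
{code:java}
import org.apache.hadoop.ipc.RetriableException;
import org.apache.hadoop.ipc.StandbyException;

public class RetryOptionsSketch {
  // Hypothetical decision point for when the Router fails to get a
  // connection: the exception type determines the client's retry target.
  static void onNoConnection(String nnAddress, boolean preferFailover)
      throws RetriableException, StandbyException {
    if (preferFailover) {
      // The client's failover proxy treats StandbyException as "try
      // another Router", so the request moves immediately.
      throw new StandbyException("Cannot get a connection to " + nnAddress);
    }
    // RetriableException makes the client retry the same Router after a
    // backoff of a few seconds.
    throw new RetriableException("Cannot get a connection to " + nnAddress);
  }
}
{code}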

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161-HDFS-13891.001.patch, 
> HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726077#comment-16726077
 ] 

Íñigo Goiri commented on HDFS-14161:


Thanks [~ferhui] for the patch.
Coincidentally, yesterday I found an issue with throwing a {{StandbyException}}.
In {{RouterRpcClient#420}}, we report every {{StandbyException}} as the NameNode 
being in Standby, which is no longer true in the overload case.
Will {{RetriableException}} have the same effect of making the client try a 
different Router?
If so, we may want to make both cases throw {{RetriableException}}.
Another option is to improve {{RouterRpcClient#420}} to report the exception or 
to verify whether it is a real {{StandbyException}}.
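
A hedged sketch of that last option (the helper and state enum here are hypothetical placeholders, not the real membership-store API):
{code:java}
public class StandbyCheckSketch {
  enum NamenodeState { ACTIVE, STANDBY, UNAVAILABLE }

  // Consult the state we last recorded for the NameNode so an
  // overload-induced StandbyException is reported differently from a
  // NameNode that is genuinely in Standby.
  static String describe(NamenodeState recordedState, String nnId) {
    if (recordedState == NamenodeState.STANDBY) {
      return nnId + " is in Standby state";
    }
    return nnId + " reported as Standby by the Router to force a retry"
        + " on another Router";
  }
}
{code}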

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161-HDFS-13891.001.patch, 
> HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725896#comment-16725896
 ] 

Hadoop QA commented on HDFS-14161:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-13891 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
29s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} HDFS-13891 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
14s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14161 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12952527/HDFS-14161-HDFS-13891.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ab3868b6a261 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-13891 / c9ebaf2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25835/testReport/ |
| Max. process+thread count | 952 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25835/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725812#comment-16725812
 ] 

Fei Hui commented on HDFS-14161:


Uploaded v002 patch; fixed checkstyle.

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161-HDFS-13891.001.patch, 
> HDFS-14161-HDFS-13891.002.patch, HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725792#comment-16725792
 ] 

Hadoop QA commented on HDFS-14161:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-13891 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
50s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} HDFS-13891 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m  
4s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14161 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12952516/HDFS-14161-HDFS-13891.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d403cec83784 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-13891 / c9ebaf2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25834/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25834/testReport/ |
| Max. process+thread count | 1233 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-20 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725743#comment-16725743
 ] 

Fei Hui commented on HDFS-14161:


[~elgoiri] Maybe throwing StandbyException is reasonable. I added a test case 
and rebased the code on branch HDFS-13891.
The test cases in TestRouterClientRejectOverload aim to exercise the 
ThreadPoolExecutor in RouterRpcClient. This issue is different: it occurs when 
the ConnectionManager returns a null Connection.
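
A minimal sketch of that failure mode and the proposed change, assuming the ConnectionManager can hand back null when its pool is exhausted (all names here are placeholders, not the actual RBF code):
{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.RetriableException;

public class NullConnectionSketch {
  /** Placeholder for the real ConnectionManager. */
  interface ConnectionManager {
    Object getConnection(String nnAddress) throws IOException;
  }

  static Object getConnection(ConnectionManager manager, String nnAddress)
      throws IOException {
    Object connection = manager.getConnection(nnAddress);
    if (connection == null) {
      // Previously a plain IOException, which clients such as Hive treat
      // as fatal; RetriableException lets the client retry instead.
      throw new RetriableException("Cannot get a connection to " + nnAddress);
    }
    return connection;
  }
}
{code}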
 

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725228#comment-16725228
 ] 

Íñigo Goiri commented on HDFS-14161:


Moved this as part of HDFS-13891.
The patch should also target the branch.

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725201#comment-16725201
 ] 

Íñigo Goiri commented on HDFS-14161:


Well, the same test that I added for {{TestRouterClientRejectOverload}} would 
apply here.
Create a mini cluster with two Routers, trigger a method that uses 
{{invokeSequential()}}, make it run out of connections to one Namenode, and 
make sure it fails over to the other Router.
So pretty much the same but with a method that uses {{invokeSequential()}}; 
maybe getFileInfo?
You even have the tooling to simulate slow NNs.
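
A self-contained sketch of the behavior such a test should assert (the Router interface and the failover loop below are placeholders, not the real RBF mini-cluster utilities):
{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.ipc.StandbyException;

public class FailoverSketch {
  /** Placeholder for a Router endpoint serving getFileInfo. */
  interface Router {
    String getFileInfo(String path) throws IOException;
  }

  // Mimics the client's failover policy: on StandbyException, move on to
  // the next Router instead of failing the call outright.
  static String getFileInfoWithFailover(List<Router> routers, String path)
      throws IOException {
    IOException last = new IOException("no Routers available");
    for (Router r : routers) {
      try {
        return r.getFileInfo(path);
      } catch (StandbyException e) {
        last = e; // overloaded Router: try the next one
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    Router overloaded = path -> {
      throw new StandbyException("Router out of connections");
    };
    Router healthy = path -> "fileinfo:" + path;
    // Expected: the call fails over and prints "fileinfo:/test".
    System.out.println(
        getFileInfoWithFailover(Arrays.asList(overloaded, healthy), "/test"));
  }
}
{code}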

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> 

[jira] [Commented] (HDFS-14161) RBF: Throw RetriableException instead of IOException so that client can retry when can not get connection

2018-12-18 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724776#comment-16724776
 ] 

Fei Hui commented on HDFS-14161:


[~elgoiri] Maybe DFS_ROUTER_CLIENT_REJECT_OVERLOAD only affects 
invokeConcurrent; the posted case calls invokeSequential.
I will try to provide a test case.
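
For context, a small sketch of that flag (assuming the key name dfs.federation.router.client.reject.overload behind DFS_ROUTER_CLIENT_REJECT_OVERLOAD; it guards the overload rejection exercised by invokeConcurrent, not the null-connection path hit here by invokeSequential):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class OverloadConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed key name; enables the Router's reject-on-overload behavior.
    conf.setBoolean("dfs.federation.router.client.reject.overload", true);
    System.out.println(conf.getBoolean(
        "dfs.federation.router.client.reject.overload", false));
  }
}
{code}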

> RBF: Throw RetriableException instead of IOException so that client can retry 
> when can not get connection
> -
>
> Key: HDFS-14161
> URL: https://issues.apache.org/jira/browse/HDFS-14161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.1, 2.9.2, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14161.001.patch
>
>
> Hive Client may hang when get IOException, stack follows
> {code:java}
> Exception in thread "Thread-150" java.lang.RuntimeException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a 
> connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot 
> get a connection to bigdata-nn20.g01:8020
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
>