Arun Suresh created HADOOP-10412:
------------------------------------

             Summary: First call from Client fails after Server restart
                 Key: HADOOP-10412
                 URL: https://issues.apache.org/jira/browse/HADOOP-10412
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
    Affects Versions: 2.2.0
         Environment: Linux: centos62-2 2.6.32-220.el6.x86_64, JDK: 1.7.0_15
            Reporter: Arun Suresh
This seems to happen only for ProtobufRpc-based services. I could not reproduce it using plain WritableRpc.

Steps to reproduce:
Consider the case of NameNode HA failover. nn1 and nn2 are both NameNodes; nn1 is 'active' and nn2 is 'standby'.
1) Bring down the nn1 process. nn2 is now active.
2) Bring the nn1 process back up. nn1 is now standby and nn2 is active.
3) Manually issue a failover using the command:
{quote}
$ hdfs haadmin -failover nn2 nn1
{quote}

The first call always fails with the following exception:
{quote}
Operation failed: Failed to become active. Couldn't make NameNode at centos62-2/192.168.2.202:8020 active
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "centos62-2/192.168.2.202"; destination host is: "centos62-2":8020;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1351)
	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
	at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
	at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:373)
	at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:59)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:818)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:803)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
	at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:673)
	at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:59)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:592)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:589)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:589)
	at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
	at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
	at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
{quote}

The call succeeds if I issue the same command again afterwards.
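The "Caused by" frames show Client$Connection.receiveRpcResponse failing inside DataInputStream.readInt, which throws EOFException when its underlying stream ends before all four bytes of the int have been read. On a socket, end of stream means the remote side closed the connection, so the client most likely saw the server close the connection before any (complete) response arrived. The self-contained sketch below uses only java.io, with in-memory byte arrays standing in for the socket, to show that behaviour. It illustrates the exception only and is not a claim about the root cause of this bug; ReadIntEofDemo and tryReadInt are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadIntEofDemo {

    // Reads one 4-byte big-endian int, the same DataInputStream primitive that
    // the trace shows receiveRpcResponse calling. The byte array is a
    // hypothetical stand-in for what arrived on the socket before it closed.
    static String tryReadInt(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return "read " + in.readInt();
        } catch (EOFException e) {
            // Stream ended before 4 bytes were available.
            return "EOFException";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Peer closed the connection without sending anything.
        System.out.println(tryReadInt(new byte[0]));
        // Peer sent only 2 of the 4 bytes before closing.
        System.out.println(tryReadInt(new byte[] {0, 0}));
        // All 4 bytes arrived: 0x0000002A.
        System.out.println(tryReadInt(new byte[] {0, 0, 0, 42}));
    }
}
```

Running main prints EOFException twice and then read 42: an empty stream and a truncated one both surface as the same EOFException seen in the trace.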