Arun Suresh created HADOOP-10412:
------------------------------------

             Summary: First call from Client fails after Server restart
                 Key: HADOOP-10412
                 URL: https://issues.apache.org/jira/browse/HADOOP-10412
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
    Affects Versions: 2.2.0
         Environment: Linux: centos62-2 2.6.32-220.el6.x86_64, JDK: 1.7.0_15
            Reporter: Arun Suresh


This seems to happen only for ProtobufRpc-based services; I could not reproduce it using the simple WritableRpc.

Steps to reproduce:
Consider a NameNode HA failover with two NameNodes, nn1 and nn2, where nn1 is 'active' and nn2 is 'standby'.
1) Bring down the nn1 process. nn2 is now active.
2) Bring the nn1 process back up. nn1 is now standby and nn2 remains active.
3) Manually issue a failover using the command:
{quote}
$ hdfs haadmin -failover nn2 nn1
{quote}

The first call always fails with the following exception:
{quote}
Operation failed: Failed to become active. Couldn't make NameNode at centos62-2/192.168.2.202:8020 active
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "centos62-2/192.168.2.202"; destination host is: "centos62-2":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
        at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
        at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:373)
        at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:818)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:803)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

        at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:673)
        at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:592)
        at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:589)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:589)
        at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
        at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
        at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
{quote}
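The root "Caused by" frame in the trace is DataInputStream.readInt() inside Client$Connection.receiveRpcResponse: the client was waiting to read a response and the socket hit end-of-stream instead. The minimal Java sketch below reproduces only that symptom with plain sockets. A server accepts a connection and closes it without replying, and the client's first readInt() throws EOFException. The class name StaleConnectionDemo is hypothetical. The sketch assumes, without confirming, that the failing call went over a connection the peer had already closed. It does not model Hadoop IPC or ProtobufRpcEngine.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration, not Hadoop code: a peer that closes a connection
// without replying makes the reader's first readInt() throw EOFException.
public class StaleConnectionDemo {

    /** Returns true if the client's first readInt() fails with EOFException. */
    public static boolean firstReadHitsEof() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5000); // fail with a timeout instead of hanging
            // Stand-in for the server side going away: accept the connection,
            // then close it without sending any bytes.
            server.accept().close();
            DataInputStream in = new DataInputStream(client.getInputStream());
            try {
                in.readInt(); // a length-prefixed response would start with an int
                return false;
            } catch (EOFException e) {
                return true; // same exception type as the "Caused by" in the trace
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("EOFException on first read: " + firstReadHitsEof());
    }
}
```

The sketch writes nothing from the client before reading. This keeps the outcome deterministic, because writing to an already-closed peer can surface as a connection reset rather than a clean end-of-stream.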

The call succeeds if I issue the same command again.
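Because repeating the command succeeds, one possible client-side mitigation is to retry once when the first failure is an end-of-stream error. The Java sketch below is hypothetical: RetryOnceOnEof, call and causedByEof are illustrative names, not Hadoop APIs, and the dropCachedConnection hook assumes the caller can discard a stale connection. The sketch walks the cause chain because the trace shows the EOFException wrapped in an IOException ("Failed on local exception") by NetUtils.wrapException.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical workaround sketch, not Hadoop code: retry a call exactly once
// when the first attempt fails with an IOException caused by EOFException.
public final class RetryOnceOnEof {

    private RetryOnceOnEof() {}

    public static <T> T call(Callable<T> attempt, Runnable dropCachedConnection) throws Exception {
        try {
            return attempt.call();
        } catch (IOException e) {
            if (!causedByEof(e)) {
                throw e; // unrelated I/O failure: do not retry
            }
            // Assumption: the EOF came from a connection the peer had closed,
            // so discarding it and trying again is reasonable.
            dropCachedConnection.run();
            return attempt.call(); // second and final attempt
        }
    }

    /** True if t or any of its causes is an EOFException. */
    static boolean causedByEof(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof EOFException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        String result = call(() -> {
            attempts[0]++;
            if (attempts[0] == 1) {
                // Mirrors the wrapping seen in the trace above.
                throw new IOException("Failed on local exception", new EOFException());
            }
            return "ok";
        }, () -> System.out.println("dropping cached connection"));
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

A blind retry is only appropriate for operations that are safe to repeat. Whether that holds for a given RPC is something the caller has to verify. This is a workaround for the symptom, not a fix for the underlying connection handling.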



--
This message was sent by Atlassian JIRA
(v6.2#6252)
