[ https://issues.apache.org/jira/browse/HBASE-13792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anoop Sam John resolved HBASE-13792.
------------------------------------
    Resolution: Duplicate
      Assignee:     (was: Samir Ahmic)

Closing as dup of HBASE-13337.

> Regionserver unable to report to master when master is restarted
> ----------------------------------------------------------------
>
>                 Key: HBASE-13792
>                 URL: https://issues.apache.org/jira/browse/HBASE-13792
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0
>         Environment: x86_64 GNU/Linux
>            Reporter: Samir Ahmic
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> I was testing the master branch on a distributed cluster and noticed that
> when the master is restarted on a running cluster, the regionservers are
> unable to report back once the master is up again.
> Things went back to normal after I restarted the regionservers. The logs show
> that the regionservers correctly detect the master znode.
> After some digging I noticed that we have changed the client implementation
> in RpcClientFactory to AsyncRpcClient, so I tried running the cluster with
> the previous RpcClientImpl and the issue was gone.
> So the issue is probably caused by AsyncRpcClient, which is unable to
> reconnect to the master once the original connection is gone.
> I was able to fix the issue by creating a new rpcClient object inside
> HRegionServer#createRegionServerStatusStub() and using it for channel
> creation. Here is the diff:
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> index fa56966..27e658c 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> @@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
>          break;
>        }
>        try {
> +        LOG.info("***Creating new client connection");
> +        rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
> +          rpcServices.isa.getAddress(), 0));
>          BlockingRpcChannel channel =
> -            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
> +            rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
>                shortOperationTimeout);
>          intf = RegionServerStatusService.newBlockingStub(channel);
>          break;
> {code}
> If this is an acceptable way to fix this issue, I will create and attach a
> patch.
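
The idea behind the quoted diff (build a fresh client on every stub-creation attempt rather than reusing one cached before the master restarted) can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: Client, FakeMaster, and createStub are hypothetical stand-ins invented for illustration, not HBase classes, and the "stale connection" behaviour is simulated rather than taken from AsyncRpcClient.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Illustration only: every type here is a hypothetical stand-in, not HBase API. */
public class StubReconnectSketch {

    /** Minimal stand-in for an RPC client bound to one connection. */
    interface Client {
        String report(String msg) throws IOException;
    }

    /** Simulated master; each restart invalidates clients created earlier. */
    static final class FakeMaster {
        private int generation = 0;

        void restart() {
            generation++;
        }

        Client connect() {
            final int bound = generation; // connection is tied to this generation
            return msg -> {
                if (bound != generation) {
                    throw new IOException("stale connection");
                }
                return "ack:" + msg;
            };
        }
    }

    /** Asks the factory for a new client on every attempt instead of reusing a cached one. */
    static Client createStub(Supplier<Client> factory, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Client candidate = factory.get();
            try {
                candidate.report("ping"); // probe the connection before handing it out
                return candidate;
            } catch (IOException e) {
                last = e;
            }
        }
        throw new IOException("master unreachable after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        FakeMaster master = new FakeMaster();
        Client cached = master.connect();
        master.restart();
        try {
            cached.report("hi");
        } catch (IOException e) {
            System.out.println("cached client failed: " + e.getMessage());
        }
        Client fresh = createStub(master::connect, 3);
        System.out.println(fresh.report("hi"));
    }
}
```

In this sketch the client created before restart() keeps failing, while a client obtained through createStub after the restart succeeds. Whether recreating the client is the right fix for the real AsyncRpcClient is a separate question; the issue was closed as a duplicate of HBASE-13337.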