Thank you! Following your suggestion, I set "dfs.client.retry.policy.enabled" to "true" in core-site.xml and restarted for the change to take effect. I can see some changes in the HBase master log: retry messages now appear. But it still takes a long time before HBase can write. I would like to ask: how long until HBase can write again? What is the retry policy, and which parameters can be configured?
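Following up on my own question about which parameters can be configured: if I read the log correctly, the "MultipleLinearRandomRetry[6x10000ms, 10x60000ms]" part appears to be controlled by "dfs.client.retry.policy.spec", the companion of "dfs.client.retry.policy.enabled". A sketch of the client-side settings as I understand them (the spec value is my reading of the default, expressed as pairs of "sleepMillis,numberOfRetries"):

```xml
<!-- hdfs-site.xml (or core-site.xml) on the HBase master/regionserver nodes.
     The spec value reproduces the MultipleLinearRandomRetry[6x10000ms, 10x60000ms]
     policy seen in the log; smaller values should shorten the retry window. -->
<property>
  <name>dfs.client.retry.policy.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.retry.policy.spec</name>
  <!-- pairs of sleepMillis,numberOfRetries: 6 retries at ~10 s, then 10 at ~60 s -->
  <value>10000,6,60000,10</value>
</property>
```

With these defaults the sleeps alone can add up to 6×10 s + 10×60 s = 660 s (11 minutes), not counting connection timeouts, which seems consistent with the ~14 minutes between the first retry (22:48:38) and the failover (23:03:01) in the log below.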
Attached HBase master log:

2014-12-01 22:47:30,487 INFO [master:l-hbase2:60000-SendThread(l-hbase2.dba.dev.cn0.qunar.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server l-hbase2.dba.dev.cn0.qunar.com/10.86.36.218:2181, sessionid = 0x14a0640d2100007, negotiated timeout = 40000
2014-12-01 22:48:38,729 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 0 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:49:07,748 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 1 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:49:22,534 DEBUG [l-hbase2.dba.dev.cn0.qunar.com,60000,1417445031747-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2014-12-01 22:49:34,080 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 2 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:49:54,752 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 3 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:50:19,014 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 4 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:50:44,438 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 5 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:51:05,546 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 6 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:51:58,980 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 7 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:53:33,330 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 8 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:54:22,533 DEBUG [l-hbase2.dba.dev.cn0.qunar.com,60000,1417445031747-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2014-12-01 22:54:30,953 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 9 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:55:43,189 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 10 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:56:49,457 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 11 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:58:29,088 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 12 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 22:59:22,532 DEBUG [l-hbase2.dba.dev.cn0.qunar.com,60000,1417445031747-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2014-12-01 22:59:25,346 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 13 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 23:00:55,023 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 14 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 23:01:59,966 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 15 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 23:02:46,067 INFO [master:l-hbase2:60000.oldLogCleaner] ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 16 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[6x10000ms, 10x60000ms], TryOnceThenFail]
2014-12-01 23:03:01,073 INFO [master:l-hbase2:60000.oldLogCleaner] retry.RetryInvocationHandler: Exception while invoking getListing of class ClientNamenodeProtocolTranslatorPB over l-hbase1.dba.dev.cn0/10.86.36.217:8020. Trying to fail over immediately.
java.net.ConnectException: Call From l-hbase2.dba.dev.cn0.qunar.com/10.86.36.218 to l-hbase1.dba.dev.cn0:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy17.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:546)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy18.getListing(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
        at com.sun.proxy.$Proxy20.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1906)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1889)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:104)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:716)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:712)
        at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1555)
        at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1575)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:123)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:87)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection timed out
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
        at org.apache.hadoop.ipc.Client.call(Client.java:1382)
        ... 28 more
2014-12-01 23:03:01,082 DEBUG [master:l-hbase2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417428218321.1417442628891.meta

------------------ Original Message ------------------
From: "Bharath Vissapragada" <[email protected]>
Sent: Monday, December 1, 2014, 8:27 PM
To: "hbase-user" <[email protected]>
Subject: Re: After hadoop QJM failover, hbase can not write

Did you override "dfs.client.retry.policy.enabled" to "true" in the regionserver configs?

On Mon, Dec 1, 2014 at 9:13 AM, 聪聪 <[email protected]> wrote:

> Hi there,
> I have run into a problem that has me stuck.
>
> The Hadoop version I use is hadoop-2.3.0-cdh5.1.0, with NameNode HA using
> the Quorum Journal Manager (QJM) feature. The dfs.ha.fencing.methods
> option is one of the following:
>
> <property>
>   <name>dfs.ha.fencing.methods</name>
>   <value>sshfence
>   shell(q_hadoop_fence.sh $target_host $target_port)
>   </value>
> </property>
>
> or
>
> <property>
>   <name>dfs.ha.fencing.methods</name>
>   <value>sshfence
>   shell(/bin/true)
>   </value>
> </property>
>
> I used iptables to simulate a crash of the active NameNode machine. After
> automatic failover completed, HDFS could be written to normally, for
> example ./bin/hdfs dfs -put a.txt /tmp, but HBase still could not write.
> After a very long time HBase could write again, but I could not measure
> how long it took.
> I want to ask:
> 1. Why can HBase not write after HDFS completes failover?
> 2. After HDFS completes failover, how long until HBase can write?
> 3. Is there a particular parameter that influences this time?
>
> Looking forward to your responses!
> Attached is the regionserver log, up to the point where writes become
> possible again:
> 2014-12-01 11:35:16,965 INFO [MemStoreFlusher.6] regionserver.HRegion:
> Finished memstore flush of ~7.9 K/8096, currentsize=0/0 for region
> t,,1417403859247.645d0fbe63663fabfb73025d3eb99524.
> in 46ms, sequenceid=48,
> compaction requested=false
> 2014-12-01 11:35:17,755 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2248)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1427)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:802)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:593)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:568)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> Part of the datanode log is the following:
>
> 2014-11-28 16:51:56,420 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2014-11-28 16:52:12,421 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2014-11-28 16:52:27,422 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
> java.net.ConnectException: Call From l-hbase3.dba.dev.cn0.qunar.com/10.86.36.219 to l-hbase1.dba.dev.cn0:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1413)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy9.sendHeartbeat(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy9.sendHeartbeat(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:178)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:566)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:664)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:834)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.net.ConnectException: Connection timed out
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1461)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1380)
>         ... 14 more
> 2014-11-28 16:52:43,424 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2014-11-28 16:52:59,424 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2014-11-28 16:53:15,425 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: l-hbase1.dba.dev.cn0/10.86.36.217:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

--
Bharath Vissapragada
<http://www.cloudera.com>
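P.S. Trying to answer my own "how long" question from the logs: if each pair in the retry spec means "sleep that many milliseconds, up to that many times", the worst-case sleep budget of the MultipleLinearRandomRetry[6x10000ms, 10x60000ms] policy above works out to about 11 minutes per failed call, before adding connection timeouts. A quick sketch of that arithmetic (the pair interpretation is my assumption; actual sleeps are randomized around these values):

```python
def max_retry_sleep_ms(spec: str) -> int:
    """Worst-case total sleep for a retry spec of "sleepMillis,numRetries" pairs,
    e.g. "10000,6,60000,10" -> 6 retries sleeping ~10 s, then 10 sleeping ~60 s."""
    nums = [int(x) for x in spec.split(",")]
    pairs = zip(nums[0::2], nums[1::2])  # (sleep_ms, retries)
    return sum(sleep_ms * retries for sleep_ms, retries in pairs)

total_ms = max_retry_sleep_ms("10000,6,60000,10")
print(total_ms / 60000.0)  # -> 11.0 (minutes)
```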
