I have two clusters (one master and one slave) on CDH 5.4 with HBase 1.0.
Replication works about 95% of the time,
but I do get the following WARN, which I consider an error:


Can't replicate because of an error on the remote cluster:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException):
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 11 actions: NotServingRegionException: 11 times,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1003)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1017)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1584)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20880)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
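
To gauge how often this happens, a rough sketch like the following could tally the failed actions per exception type from a log. It assumes the summary line always has the "Failed N actions: SomeException: N times," shape seen in the trace above; the function name tally_failures and the sample input are only illustrative and not part of HBase.

```python
import re
from collections import Counter

# Matches summaries such as "Failed 11 actions: NotServingRegionException: 11 times,"
# (format assumed from the single WARN shown in this message).
SUMMARY = re.compile(r"Failed (\d+) actions?: (.*)")
CAUSE = re.compile(r"(\w+): (\d+) times?")


def tally_failures(lines):
    """Sum failed actions per exception type across all summary lines."""
    totals = Counter()
    for line in lines:
        match = SUMMARY.search(line)
        if not match:
            continue  # not a failure summary line
        for name, count in CAUSE.findall(match.group(2)):
            totals[name] += int(count)
    return totals


# Hypothetical input: the summary line from the WARN above.
sample = ["Failed 11 actions: NotServingRegionException: 11 times,"]
print(tally_failures(sample))
```

In practice the lines would come from the log file on the cluster reporting the WARN, for example by passing an open file object to tally_failures.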



I consider this an error because my slave is missing data that I have in
the master. Is there a setting in HBase to keep retrying the send?
Cloudera management does try to restart and alerts me if the region for
some reason dies. As to why it dies, I am looking into that, and it is a
different problem. But when the slave returns, I expect the unconfirmed
records to be resent.

Best practices would be helpful as well.
All ZooKeepers in the slave are listed as peers.


-- 
Abraham Tom
Email:   [email protected]
Phone:  415-515-3621
