Hi Himanshu:

I'm using HBase 0.92.1 and Hadoop 1.0.1, having migrated from HBase 0.90.4 and 
Hadoop 0.20 with the append feature.

It is one-way replication (cluster A to cluster B) with cyclic replication 
enabled (i.e. each cluster has the other configured via add_peer).
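
For reference, the peer setup on each side looks roughly like this (a minimal 
sketch from the HBase shell; the ZooKeeper quorum, table name, and column 
family name below are placeholders, not our actual values):

  # on cluster A, pointing at cluster B's ZooKeeper quorum (mirror image on B)
  add_peer '1', 'zk-b-1,zk-b-2,zk-b-3:2181:/hbase'

  # replication also has to be enabled on the column family under test
  disable 'usertable'
  alter 'usertable', {NAME => 'family', REPLICATION_SCOPE => 1}
  enable 'usertable'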

Best Regards,

Jerry

Sent from my iPad

On 2012-04-20, at 10:23, Himanshu Vashishtha <[email protected]> wrote:

> Hello Jerry,
> 
> Which HBase version?
> 
> You are not "using" cyclic replication? It's simple one-way replication, 
> right?
> 
> Thanks,
> Himanshu
> 
> On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam <[email protected]> wrote:
>> Hi HBase community:
>> 
>> We have been testing cyclic replication for 1 week. The basic functionality 
>> seems to work as described in the documentation; however, when we started to 
>> increase the write workload, replication started to miss data (i.e. some 
>> data were not replicated to the other cluster). We have narrowed it down to 
>> a scenario in which we can reproduce the problem quite consistently, and 
>> here it is:
>> 
>> -----------------------------
>> Setup:
>> - We have set up 2 clusters (cluster A and cluster B) with identical size 
>> in terms of number of nodes and configuration: 3 regionservers sit on top of 
>> 3 datanodes.
>> - Cyclic replication is enabled.
>> 
>> - We use YCSB to generate load against HBase; the workload is very similar 
>> to workloada:
>> 
>> recordcount=200000
>> operationcount=200000
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> fieldcount=1
>> fieldlength=25000
>> 
>> readallfields=true
>> writeallfields=true
>> 
>> readproportion=0
>> updateproportion=1
>> scanproportion=0
>> insertproportion=0
>> 
>> requestdistribution=uniform
>> 
>> - Records are inserted into cluster A. After the benchmark is done and we 
>> have waited until all data are replicated to cluster B, we use the verifyrep 
>> mapreduce job for validation (an example invocation is sketched after this 
>> list).
>> - Data are deleted from the table on both clusters (truncate 'tablename') 
>> before a new experiment is started.
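>> 
>> For completeness, the validation step is invoked roughly as follows (a 
>> minimal sketch; the peer id '1' and the table name 'usertable' are 
>> placeholders for our actual values), and we check the job counters for 
>> mismatched rows:
>> 
>>   hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
>>       1 usertable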
>> 
>> Scenario:
>> When we increase the number of threads until we max out the throughput of 
>> the cluster, we see some data missing in cluster B (total count != 200000) 
>> although cluster A clearly has it all. This happens even though we disabled 
>> region splitting in both clusters (it happens more often when region splits 
>> occur). To gain more control over what is happening, we then decided to 
>> disable the load balancer so that the region holding the data being 
>> replicated would not be relocated to another regionserver during the 
>> benchmark. The situation improved a lot: we did not see any missing data in 
>> 5 consecutive runs. Finally, we decided to move the region from one 
>> regionserver to another during the benchmark to see if the problem would 
>> reappear, and it did (see the sketch below).
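>> 
>> For reference, the balancer and region-move steps above are done from the 
>> HBase shell roughly as follows (a minimal sketch; the encoded region name 
>> and destination server are placeholders):
>> 
>>   # keep regions where they are during the run
>>   balance_switch false
>> 
>>   # later, force the region hosting the test data onto another regionserver
>>   move 'ENCODED_REGION_NAME', 'host2,60020,1334900000000'
>> 
>> Region splitting is disabled separately through configuration (e.g. a very 
>> large hbase.hregion.max.filesize).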
>> 
>> We believe that the issue could be related to region splitting and load 
>> balancing during intensive writes; the HBase replication strategy may not 
>> yet cover those corner cases.
>> 
>> Can someone take a look at this and suggest some ways to work around it?
>> 
>> Thanks~
>> 
>> Jerry
