Hello Jerry,

Which HBase version?

You are not "using" cyclic replication? It's simple one-way replication, right?

Thanks,
Himanshu

On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam <[email protected]> wrote:
> Hi HBase community:
>
> We have been testing cyclic replication for 1 week. The basic functionality
> seems to work as described in the documentation. However, when we started to
> increase the write workload, replication started to miss data (i.e. some
> data are not replicated to the other cluster). We have narrowed it down to a
> scenario in which we can reproduce the problem quite consistently:
>
> -----------------------------
> Setup:
> - We have set up 2 clusters (cluster A and cluster B) of identical size in
> terms of number of nodes and configuration; 3 regionservers sit on top of 3
> datanodes.
> - Cyclic replication is enabled.
>
> - We use YCSB to generate load on HBase; the workload is very similar to
> workloada:
>
> recordcount=200000
> operationcount=200000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> fieldcount=1
> fieldlength=25000
>
> readallfields=true
> writeallfields=true
>
> readproportion=0
> updateproportion=1
> scanproportion=0
> insertproportion=0
>
> requestdistribution=uniform
>
> - Records are inserted into Cluster A. After the benchmark is done, we wait
> until all data are replicated to Cluster B and then use the verifyrep
> MapReduce job for validation.
> - Data are deleted from both tables (truncate 'tablename') before a new
> experiment is started.
>
> Scenario:
> When we increase the number of threads until it maxes out the throughput of
> the cluster, we see that some data are missing in Cluster B (total count !=
> 200000) although cluster A clearly has them all. This happens even though we
> disabled region splitting in both clusters (it happens more often when region
> splits occur). To gain more control over what is happening, we then decided
> to disable the load balancer so that the region (which is responsible for the
> data being replicated) would not relocate to another regionserver during the
> benchmark. The situation improved a lot: we did not see any missing data in 5
> consecutive runs. Finally, we decided to move the region from one
> regionserver to another during the benchmark to see if the problem would
> reappear, and it did.
>
> We believe that the issue could be related to region splitting and load
> balancing during intensive writes; the HBase replication strategy does not
> yet cover those corner cases.
>
> Can someone take a look at it and suggest some ways to work around this?
>
> Thanks~
>
> Jerry
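
For a sense of scale, here is a rough sketch of how much field data one run of the quoted workload pushes through replication. The numbers come straight from the quoted properties (operationcount, fieldcount, fieldlength); the helper name is made up for illustration, and the estimate assumes every update rewrites all fields (updateproportion=1, writeallfields=true). It ignores row keys, HBase KeyValue overhead and WAL overhead, so the real volume is somewhat higher.

```python
# Back-of-the-envelope payload estimate for the quoted YCSB workload.
# payload_bytes is a hypothetical helper, not part of YCSB or HBase.

def payload_bytes(operation_count: int, field_count: int, field_length: int) -> int:
    """Bytes of field data written if every operation rewrites all fields."""
    return operation_count * field_count * field_length


# Values copied from the quoted workload properties.
total = payload_bytes(operation_count=200000, field_count=1, field_length=25000)
print(f"{total} bytes (~{total / 2**30:.2f} GiB) of field data per run")
```

So each run ships on the order of 5 GB of values from cluster A to cluster B, which may help when judging how far replication lags behind at peak write throughput.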
