Hello Jerry,

Which HBase version?
You are not "using" cyclic replication? It's simple one-way replication,
right?

Thanks,
Himanshu

On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam <[email protected]> wrote:
> Hi HBase community:
>
> We have been testing cyclic replication for 1 week. The basic functionality
> seems to work as described in the documentation; however, when we started to
> increase the write workload, replication started to miss data (i.e. some
> data are not replicated to the other cluster). We have narrowed it down to a
> scenario in which we can reproduce the problem quite consistently, and here
> it is:
>
> -----------------------------
> Setup:
> - We have set up 2 clusters (cluster A and cluster B) of identical size in
> terms of number of nodes and configuration; 3 regionservers sit on top of 3
> datanodes.
> - Cyclic replication is enabled.
>
> - We use YCSB to generate load on HBase; the workload is very similar to
> workloada:
>
> recordcount=200000
> operationcount=200000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> fieldcount=1
> fieldlength=25000
>
> readallfields=true
> writeallfields=true
>
> readproportion=0
> updateproportion=1
> scanproportion=0
> insertproportion=0
>
> requestdistribution=uniform
>
> - Records are inserted into cluster A. After the benchmark is done, we wait
> until all data are replicated to cluster B and then use the verifyrep
> MapReduce job for validation.
> - Data are deleted from both tables (truncate 'tablename') before a new
> experiment is started.
>
> Scenario:
> When we increase the number of threads until they max out the throughput of
> the cluster, we see that some data are missing in cluster B (total count !=
> 200000) although cluster A clearly has them all. This happens even though we
> disabled region splitting in both clusters (it happens more often when region
> splits occur). To have more control over what is happening, we then decided
> to disable the load balancer so the region (which is responsible for
> replicating the data) would not relocate to another regionserver during the
> benchmark. The situation improved a lot: we didn't see any missing data in 5
> consecutive runs. Finally, we decided to move the region from one
> regionserver to another during the benchmark to see if the problem would
> reappear, and it did.
>
> We believe that the issue could be related to region splitting and load
> balancing during intensive writes; the HBase replication strategy doesn't yet
> cover those corner cases.
>
> Can someone take a look at it and suggest some ways to work around this?
>
> Thanks~
>
> Jerry
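
Beyond the total count, it may help to know exactly which rows fail to reach
cluster B, for example to check whether they all belong to the region that was
moved. Below is a minimal sketch of such a comparison. It assumes the row keys
of the table on each cluster have already been exported in sorted order (how
they are exported is left open), and the function name missing_in_sink is
made up for illustration; it is not part of HBase or YCSB.

```python
from typing import Iterable, Iterator


def missing_in_sink(source_keys: Iterable[str],
                    sink_keys: Iterable[str]) -> Iterator[str]:
    """Yield keys that appear in source_keys but not in sink_keys.

    Assumption: both inputs are sorted in ascending order. They are
    consumed as streams, so neither key set has to fit in memory.
    """
    sink_iter = iter(sink_keys)
    sink_key = next(sink_iter, None)
    for key in source_keys:
        # Advance the sink cursor past keys smaller than the current source key.
        while sink_key is not None and sink_key < key:
            sink_key = next(sink_iter, None)
        # No equal key on the sink side means the row was not replicated.
        if sink_key is None or sink_key != key:
            yield key


if __name__ == "__main__":
    # Hypothetical keys standing in for the exports from cluster A and B.
    cluster_a = ["row1", "row2", "row3", "row4"]
    cluster_b = ["row1", "row3"]
    print(list(missing_in_sink(cluster_a, cluster_b)))
```

With the sample lists above, the script reports row2 and row4 as missing on
cluster B. Comparing the missing keys against the start and end keys of the
moved region would show whether the loss is confined to that region.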
