I see. Does this only happen when cyclic replication is enabled in this way (i.e. master <-> master replication)? The replication back does add some overhead, since the replicator needs to filter edits to keep them from being replicated back to the originator, but I would not have thought that would cause any issues.

Could you run the same test once with replication enabled only from ClusterA -> ClusterB? Thanks.
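For that test, "one-way only" would just mean keeping the peer on ClusterA and dropping the one on ClusterB. Roughly, in the HBase shell (the peer id '1' and the ZooKeeper quorum below are placeholders for your setup):

  # On ClusterA: keep (or re-add) the peer pointing at ClusterB
  add_peer '1', 'zkB1,zkB2,zkB3:2181:/hbase'

  # On ClusterB: remove the peer pointing back at ClusterA for the duration of the test
  remove_peer '1'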
-- Lars

----- Original Message -----
From: Jerry Lam <[email protected]>
To: "[email protected]" <[email protected]>
Cc:
Sent: Friday, April 20, 2012 3:43 PM
Subject: Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

Hi Himanshu:

I'm using HBase 0.92.1 and Hadoop 1.0.1, migrating from HBase 0.90.4 and Hadoop 0.20 with the append feature. It is one-way replication (cluster A to cluster B) with cyclic replication enabled (i.e. add_peer of the other cluster configured).

Best Regards,

Jerry

Sent from my iPad

On 2012-04-20, at 10:23, Himanshu Vashishtha <[email protected]> wrote:

> Hello Jerry,
>
> Which HBase version?
>
> You are not "using" cyclic replication? It's simple one-way replication,
> right?
>
> Thanks,
> Himanshu
>
> On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam <[email protected]> wrote:
>> Hi HBase community:
>>
>> We have been testing cyclic replication for a week. The basic functionality
>> seems to work as described in the documentation; however, when we increased
>> the write workload, the replication started to miss data (i.e. some data
>> were not replicated to the other cluster). We have narrowed it down to a
>> scenario that reproduces the problem quite consistently:
>>
>> -----------------------------
>> Setup:
>> - We set up 2 clusters (cluster A and cluster B) of identical size in terms
>> of number of nodes and configuration: 3 regionservers sitting on top of 3
>> datanodes.
>> - Cyclic replication is enabled.
>>
>> - We use YCSB to generate load against HBase. The workload is very similar
>> to workloada:
>>
>> recordcount=200000
>> operationcount=200000
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> fieldcount=1
>> fieldlength=25000
>>
>> readallfields=true
>> writeallfields=true
>>
>> readproportion=0
>> updateproportion=1
>> scanproportion=0
>> insertproportion=0
>>
>> requestdistribution=uniform
>>
>> - Records are inserted into cluster A. After the benchmark is done and all
>> data have been replicated to cluster B, we use the verifyrep mapreduce job
>> for validation.
>> - Data are deleted from both tables (truncate 'tablename') before a new
>> experiment is started.
>>
>> Scenario:
>> When we increase the number of threads until we max out the throughput of
>> the cluster, we see some data missing in cluster B (total count != 200000),
>> although cluster A clearly has them all. This happens even though we
>> disabled region splitting in both clusters (it happens more often when
>> region splits occur). To gain more control over what is happening, we then
>> decided to disable the load balancer so that the region responsible for the
>> replicated data would not relocate to another regionserver during the
>> benchmark. The situation improved a lot: we saw no missing data in 5
>> consecutive runs. Finally, we decided to move the region from one
>> regionserver to another during the benchmark to see if the problem would
>> reappear, and it did.
>>
>> We believe the issue could be related to region splitting and load
>> balancing during intensive writes; the HBase replication strategy may not
>> yet cover those corner cases.
>>
>> Can someone take a look at this and suggest ways to work around it?
>>
>> Thanks~
>>
>> Jerry
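(For the post-run comparison, the verifyrep job mentioned above is started from the command line roughly as below; the peer id '1' and table name 'usertable' are placeholders and should match the actual setup:

  hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
      [--starttime=<timestamp>] [--stoptime=<timestamp>] '1' 'usertable'

The job runs on the source cluster and reports GOODROWS/BADROWS counters for rows that do or do not match on the peer.)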
