Yeah, I should have mentioned that: it's master-master, and on cdh4b1. But replication on that specific slave table is disabled (so, effectively, it's master-slave for this test).
Is this the same as yours (replication-config wise), or shall I enable replication on the destination table too?

Thanks,
Himanshu
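(For reference, whether edits for a table are shipped to a peer is controlled per column family via REPLICATION_SCOPE, so "enabling replication on the destination table" would amount to something like the HBase shell sketch below; 'usertable' and 'f1' are placeholder names, not taken from this thread.)

    # HBase shell, run on the cluster whose table should ship edits to the peer.
    # On 0.92-era HBase the table must be disabled before altering the family.
    disable 'usertable'
    # REPLICATION_SCOPE => 1 replicates this family; 0 (the default) does not.
    alter 'usertable', {NAME => 'f1', REPLICATION_SCOPE => 1}
    enable 'usertable'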
On Tue, May 1, 2012 at 8:01 PM, Jerry Lam <[email protected]> wrote:
> Hi Himanshu:
>
> Thanks for following up! I did look up the logs and there were some
> exceptions. I'm not sure whether those exceptions contributed to the
> problem I saw a week ago.
> I am aware of the latency between the time the master says "Nothing to
> replicate" and the time it actually takes to replicate on the slave. I
> remember waiting 12 hours for the replication to finish (i.e. starting the
> test before leaving the office and checking the result the next day), and
> the data was still not fully replicated.
>
> By the way, is your test running with master-slave replication or
> master-master replication?
>
> I will resume this again. I was busy with something else for the past week
> or so.
>
> Best Regards,
>
> Jerry
>
> On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote:
>
>> Hello Jerry,
>>
>> Did you try this again?
>>
>> Whenever you try next, can you please share the logs somehow?
>>
>> I tried replicating your scenario today, but no luck. I used the same
>> workload you copied here; the master cluster has 5 nodes and the slave
>> has just 2 nodes; I made tiny regions of 8MB (memstore flushing at 8MB
>> too), so that I have around 1200+ regions even for 200k rows; I ran the
>> workload with 16, 24 and 32 client threads, but the verifyrep mapreduce
>> job says it's good.
>> Yes, I ran the verifyrep command after seeing the "there is nothing to
>> replicate" message on all the regionservers; sometimes it was a bit
>> slow.
>>
>>
>> Thanks,
>> Himanshu
>>
>> On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans
>> <[email protected]> wrote:
>>>> I will try your suggestion today with master-slave replication enabled
>>>> from Cluster A -> Cluster B.
>>>
>>> Please do.
>>>
>>>> Last Friday, I tried to limit the variability/the moving parts of the
>>>> replication components. I reduced Cluster B to only 1 regionserver and
>>>> had Cluster A replicate data from one region only, without region
>>>> splitting (so I have a 1-to-1 region replication setup). During the
>>>> benchmark, I moved the region between different regionservers in
>>>> Cluster A (note there are still 3 regionservers in Cluster A). I ran
>>>> this test 5 times and no data was lost. Does it mean something? My
>>>> feeling is there are some glitches/corner cases that have not been
>>>> covered in the cyclic replication (or HBase replication in general).
>>>> Note that this happens only when the load is high.
>>>
>>> And have you looked at the logs? Any obvious exceptions coming up?
>>> Replication uses the normal HBase client to insert the data on the
>>> other cluster, and this is what handles regions moving around.
>>>
>>>>
>>>> By the way, why do we need to have a ZooKeeper not handled by HBase
>>>> for replication to work (it is described in the HBase documentation)?
>>>
>>> It says you *should* do it, not that you *need* to do it :)
>>>
>>> But basically replication is zk-heavy, and getting a better
>>> understanding of it starts with handling it yourself.
>>>
>>> J-D
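(The verifyrep job mentioned above is the VerifyReplication MapReduce job that ships with HBase; a typical invocation, run from the master cluster, might look like the sketch below. The peer id '1', the table name 'usertable', and the timestamps are placeholders, not values from this thread.)

    hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
        --starttime=1335830400000 --stoptime=1335916800000 1 usertable
    # The job's GOODROWS/BADROWS counters show how many rows on the slave
    # matched the master within the given time window.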

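(On the ZooKeeper point at the end of the thread: running a quorum that HBase does not manage boils down to the two settings sketched below; the hostnames are placeholders.)

    # conf/hbase-env.sh -- tell HBase not to start/stop ZooKeeper itself
    export HBASE_MANAGES_ZK=false

    <!-- conf/hbase-site.xml -- point HBase at the externally managed ensemble -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>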