Hi Himanshu:

My team is particularly interested in cyclic replication, so I have enabled master-master replication (each cluster has the other cluster as its replication peer), although the replication went in only one direction (from cluster A to cluster B) in the test. I didn't run stop_replication on the other cluster, if that is what you mean by disabling the replication.
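In case it helps, the setup on my side looked roughly like this from the HBase shell (the table name, column family, and ZooKeeper quorums below are just placeholders, and the exact syntax may differ slightly depending on the HBase version):

  # On cluster A, add cluster B as a replication peer (peer id '1'):
  add_peer '1', 'zkB1,zkB2,zkB3:2181:/hbase'

  # On cluster B, add cluster A back as a peer (this is what makes it cyclic):
  add_peer '1', 'zkA1,zkA2,zkA3:2181:/hbase'

  # On both clusters, mark the column family as replicated:
  disable 'test_table'
  alter 'test_table', {NAME => 'f1', REPLICATION_SCOPE => 1}
  enable 'test_table'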
Thanks!

Jerry

On 2012-05-01, at 10:08 PM, Himanshu Vashishtha wrote:

> Yeah, I should have mentioned that: it's master-master, and on cdh4b1.
> But replication on that specific slave table is disabled (so,
> effectively it's master-slave for this test).
>
> Is this the same as yours (replication config wise), or shall I enable
> replication on the destination table too?
>
> Thanks,
> Himanshu
>
> On Tue, May 1, 2012 at 8:01 PM, Jerry Lam <[email protected]> wrote:
>> Hi Himanshu:
>>
>> Thanks for following up! I did look at the logs and there were some
>> exceptions. I'm not sure if those exceptions contributed to the problem I
>> saw a week ago.
>> I am aware of the latency between the time the master says "Nothing to
>> replicate" and the time it actually takes to replicate on the slave.
>> I remember waiting 12 hours for the replication to finish (i.e. starting
>> the test before leaving the office and checking the result the next day),
>> and the data were still not fully replicated.
>>
>> By the way, is your test running with master-slave replication or
>> master-master replication?
>>
>> I will resume this again. I was busy with something else for the past week
>> or so.
>>
>> Best Regards,
>>
>> Jerry
>>
>> On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote:
>>
>>> Hello Jerry,
>>>
>>> Did you try this again?
>>>
>>> Whenever you try next, can you please share the logs somehow?
>>>
>>> I tried replicating your scenario today, but no luck. I used the same
>>> workload you have copied here; the master cluster has 5 nodes and the
>>> slave has just 2 nodes; I made tiny regions of 8MB (memstore flushing at
>>> 8MB too), so that I have around 1200+ regions even for 200k rows; I ran
>>> the workload with 16, 24 and 32 client threads, but the verifyrep
>>> mapreduce job says it's good.
>>> Yes, I ran the verifyrep command after seeing the "there is nothing to
>>> replicate" message on all the regionservers; sometimes it was a bit
>>> slow.
>>>
>>>
>>> Thanks,
>>> Himanshu
>>>
>>> On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans
>>> <[email protected]> wrote:
>>>>> I will try your suggestion today with master-slave replication enabled
>>>>> from Cluster A -> Cluster B.
>>>>
>>>> Please do.
>>>>
>>>>> Last Friday, I tried to limit the variability/the moving parts of the
>>>>> replication components. I reduced the size of Cluster B to have only 1
>>>>> regionserver and had Cluster A replicate data from one region only
>>>>> without region splitting (therefore I have a 1-to-1 region replication
>>>>> setup). During the benchmark, I moved the region between different
>>>>> regionservers in Cluster A (note there are still 3 regionservers in
>>>>> Cluster A). I ran this test 5 times and no data were lost. Does it
>>>>> mean something? My feeling is there are some glitches/corner cases that
>>>>> have not been covered in cyclic replication (or HBase replication in
>>>>> general). Note that this happens only when the load is high.
>>>>
>>>> And have you looked at the logs? Any obvious exceptions coming up?
>>>> Replication uses the normal HBase client to insert the data on the
>>>> other cluster, and this is what handles regions moving around.
>>>>
>>>>>
>>>>> By the way, why do we need to have a ZooKeeper not handled by HBase for
>>>>> replication to work (it is described in the HBase documentation)?
>>>>
>>>> It says you *should* do it, not that you *need* to do it :)
>>>>
>>>> But basically replication is zk-heavy and getting a better
>>>> understanding of it starts with handling it yourself.
>>>>
>>>> J-D
>>
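P.S. The verifyrep job Himanshu mentions above is the VerifyReplication MapReduce job that ships with HBase. In case anyone else wants to run the same check, the invocation I would expect looks roughly like this (the peer id and table name are placeholders, and it also accepts --starttime/--stoptime to bound the scan):

  # Run on the source (master) cluster; compares rows against peer '1':
  hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication '1' 'test_table'

The job reports GOODROWS/BADROWS counters in its output, so a non-zero BADROWS count is what would point at missing data on the slave.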
