Demai, I see. That is a good suggestion to add redundancy, but it doubles the network traffic and also doubles the WAL edits. Also, after HBASE-7709, HBase stores a list of cluster-ids, and this list will grow very fat in this case, possibly making WALEdits heavy.

I am now inclined to implement what I described in the first post, but am not sure if it would be useful upstream. I'll file a JIRA and see. In any case, thanks for the wonderful discussion. I'll report back here on what I did and whether it worked.
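For concreteness, here is a minimal sketch of the origin-only forwarding I described in the first post. The class and method names are illustrative, not the actual ReplicationSource internals, and it assumes the first cluster-id recorded on an edit (per HBASE-7709) identifies its origin:

    import java.util.List;
    import java.util.UUID;

    // Sketch only: decide whether a WALEdit should be shipped to peers.
    class OriginOnlyForwarding {
      private final UUID localClusterId;

      OriginOnlyForwarding(UUID localClusterId) {
        this.localClusterId = localClusterId;
      }

      // A locally written edit carries either no cluster-id yet or the
      // local id first; anything else arrived via replication and is
      // consumed locally but never forwarded again. In an NxN mesh this
      // ships each write exactly once per link.
      boolean shouldForward(List<UUID> clusterIdsOnEdit) {
        return clusterIdsOnEdit.isEmpty()
            || clusterIdsOnEdit.get(0).equals(localClusterId);
      }
    }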
On Fri, Nov 8, 2013 at 6:55 PM, Demai Ni <[email protected]> wrote:

> Ishan,
>
> "Coming to Demai's suggestion of M-M to 2 instead of 9, I still want to
> have the data available from 1 to all clusters. How would I do it with
> your setup?"
>
> If I understand the requirement correctly, your setup is almost there:
> C1 <-> C2 <-> C3 <-> C4 and *C4 <-> C1*
> Basically, a doubly-linked list forming a cycle. In this way, there is
> no single point of failure, and writes on any of the clusters will
> eventually be replicated to all the clusters. The good part is that for
> each write, although the total # of writes is the same as NxN, each
> cluster will only need to handle at most 2. With this said, I have never
> set up more than 3 clusters, and have to assume no other bugs similar to
> HBASE-7709 (loop in Master/Master replication) come out of this.
>
> Still, I don't have a good solution for "..a row should be present in
> only 4/10 clusters..". One approach would use more than one column
> family, plus either HBASE-5002 (control replication peer per column
> family) or HBASE-8751. Unfortunately, neither of those JIRAs has been
> resolved yet. My 2 cents.
>
> Demai
>
>
> On Fri, Nov 8, 2013 at 4:38 PM, Ishan Chhabra <[email protected]> wrote:
>
> > Demai, Ted:
> >
> > Thanks for the detailed answers.
> >
> > I should add some more context here. The underlying network is an NxN
> > mesh. The "cost" for each link is the same.
> >
> > Coming to Demai's suggestion of M-M to 2 instead of 9, I still want to
> > have the data available from 1 to all clusters. How would I do it with
> > your setup?
> >
> > For the difference between MST and NxN:
> > Consider the following example, with 4 clusters C1, C2, C3, C4, and a
> > write going to C1.
> >
> > In an NxN mesh, the write will be propagated as:
> > C1 -> C2
> > C1 -> C3
> > C1 -> C4
> >
> > network cost: 3, writes to WAL: 3
> >
> > With an MST of C1 <-> C2 <-> C3 <-> C4, the write will be propagated
> > as:
> > C1 -> C2
> > C2 -> C3
> > C3 -> C4
> >
> > network cost: 3, writes to WAL: 3
> >
> > Both approaches have the same network and WAL cost. The only
> > difference is that in the MST, if C2 fails, writes from C1 will not go
> > to C3 and C4, whereas in the NxN case, the writes will still happen.
> >
> > Also, (1) and (3) are not an issue for us.
> >
> > Having said that, I do realize that adding more clusters increases the
> > load quadratically, and that does worry me. Our actual use case is
> > that a row should be present in only 4/10 clusters, but it varies
> > based on the row and not on the cluster. So I cannot come up with a
> > static replication configuration that will handle that. I am looking
> > into per-row replication, but will start a separate discussion on that
> > and share my ideas there.
> >
> > I hope this makes more sense now.
> >
> >
> > On Fri, Nov 8, 2013 at 3:47 PM, Ted Yu <[email protected]> wrote:
> >
> > > bq. how about your company having a new office in an 11th location?
> > >
> > > With the minimum spanning tree approach, the increase in load
> > > wouldn't be exponential.
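(For readers wiring up the ring Demai describes above: each cluster adds its two neighbors as peers. A minimal sketch using ReplicationAdmin; the peer ids and ZooKeeper quorum addresses below are placeholders, and each replicated column family still needs REPLICATION_SCOPE => 1 on every cluster.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

    // Run against C1; repeat on each cluster with its own two ring
    // neighbors: C2 -> {C1, C3}, C3 -> {C2, C4}, C4 -> {C3, C1}.
    public class RingPeering {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        ReplicationAdmin admin = new ReplicationAdmin(conf);
        admin.addPeer("2", "c2-zk1,c2-zk2,c2-zk3:2181:/hbase"); // C1 -> C2
        admin.addPeer("4", "c4-zk1,c4-zk2,c4-zk3:2181:/hbase"); // C1 -> C4
        admin.close();
      }
    }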
> > >
> > >
> > > On Fri, Nov 8, 2013 at 2:58 PM, Demai Ni <[email protected]> wrote:
> > >
> > > > Ishan,
> > > >
> > > > I have to admit that I am a bit surprised about the need to have
> > > > data centers in 10 different locations. Well, I guess I shouldn't
> > > > be, as every company is global now (anyone from Mars yet?)
> > > >
> > > > In your case, since there is only one column family, the headache
> > > > is not as bad. Let's call your clusters C1, C2, ..., C10.
> > > >
> > > > The safest way for your most critical data is still to set up M-M
> > > > replication 1 to N-1. That is, every cluster adds the rest of the
> > > > clusters as its peers. For example, C1 will have C2, C3...C10 as
> > > > its peers; C2 will have C1, C3...C10. Well, that will be a lot of
> > > > data over the network, although it is the best/fastest way to get
> > > > all the clusters synced up. I don't like the idea at all (too
> > > > expensive, for one).
> > > >
> > > > Now, let's improve it a bit. C1 will set up M-M to 2 of the
> > > > remaining 9, with the distribution carefully planned so that all
> > > > the clusters get an equal load. Well, a system administrator has
> > > > to do it manually.
> > > >
> > > > Now, thinking about the headaches:
> > > > 1) What if your company (that is, your manager, who has no idea
> > > > how difficult it is) decides to have one more column family
> > > > replicated? How about two more? The load will grow exponentially.
> > > > 2) How about your company having a new office in an 11th location?
> > > > Again, it grows exponentially.
> > > > 3) Let's say you are the best administrator and keep a nice record
> > > > of everything (unfortunately, HBase alone doesn't have a good way
> > > > to maintain a record of who is being replicated). And then the
> > > > admin leaves the company? Or this is a global company with 10
> > > > admins at different locations. How do they communicate the
> > > > replication setup?
> > > >
> > > > :-) Well, 3) is not too bad. I just like to point it out, as it
> > > > can be quite true for a company large enough to have 10 locations.
> > > >
> > > > Demai
> > > >
> > > >
> > > > On Fri, Nov 8, 2013 at 2:42 PM, Ishan Chhabra <[email protected]> wrote:
> > > >
> > > > > Ted:
> > > > > Yes. It is the same table that is being written to from all
> > > > > locations. A single row could be updated from multiple
> > > > > locations, but our schema is designed in a manner that writes
> > > > > will be independent and not clobber each other.
> > > > >
> > > > >
> > > > > On Fri, Nov 8, 2013 at 2:33 PM, Ted Yu <[email protected]> wrote:
> > > > >
> > > > > > Ishan:
> > > > > > In your use case, the same table is written to in 10 clusters
> > > > > > at roughly the same time?
> > > > > >
> > > > > > Please clarify.
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 8, 2013 at 2:29 PM, Ishan Chhabra <[email protected]> wrote:
> > > > > >
> > > > > > > @Demai,
> > > > > > > We actually have 10 clusters in different locations.
> > > > > > > The replication scope is not an issue for me, since I have
> > > > > > > only one column family and we want it replicated to each
> > > > > > > location.
> > > > > > > Can you elaborate more on why a replication setup of more
> > > > > > > than 3-4 clusters would be a headache in your opinion?
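(To put rough numbers on the "headache": counting only the directed peer links an administrator must configure and track, and assuming each master-master pair costs two of them, a 10-cluster full mesh needs 90 links versus 20 for the ring. A back-of-the-envelope sketch:)

    // Peer-link counts for N clusters: full mesh vs. ring.
    public class PeerLinkCount {
      public static void main(String[] args) {
        int n = 10;
        int mesh = n * (n - 1); // every cluster peers with the other N-1
        int ring = 2 * n;       // every cluster peers with its 2 neighbors
        System.out.println("full mesh: " + mesh + " directed links"); // 90
        System.out.println("ring:      " + ring + " directed links"); // 20
      }
    }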
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 8, 2013 at 2:16 PM, Ishan Chhabra <[email protected]> wrote:
> > > > > > >
> > > > > > > > @Demai,
> > > > > > > > Writes from B should also go to A and C. So, if I were to
> > > > > > > > continue with your suggestion, I would set up A-B
> > > > > > > > master-master and B-C master-master, which is what I was
> > > > > > > > proposing in the 2nd approach (MST based).
> > > > > > > >
> > > > > > > > @Vladimir
> > > > > > > > That is classified. :P
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 8, 2013 at 1:20 PM, Vladimir Rodionov <[email protected]> wrote:
> > > > > > > >
> > > > > > > >> *I want to setup NxN replication i.e. N clusters each
> > > > > > > >> replicating to each other. N is expected to be around 10.*
> > > > > > > >>
> > > > > > > >> Preparing for thermonuclear war?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, Nov 8, 2013 at 1:14 PM, Ishan Chhabra <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> > I want to set up NxN replication, i.e. N clusters each
> > > > > > > >> > replicating to each other. N is expected to be around
> > > > > > > >> > 10.
> > > > > > > >> >
> > > > > > > >> > On doing some research, I realize it is possible after
> > > > > > > >> > the HBASE-7709 fix, but it would lead to much more data
> > > > > > > >> > flowing in the system, e.g.:
> > > > > > > >> >
> > > > > > > >> > Let's say we have 3 clusters: A, B and C.
> > > > > > > >> > A new write to A will go to B and then on to C, and
> > > > > > > >> > also go to C via the direct path. This leads to
> > > > > > > >> > unnecessary network usage and writes to the WAL of B
> > > > > > > >> > that should be avoided. Now imagine this with 10
> > > > > > > >> > clusters; it won't scale.
> > > > > > > >> >
> > > > > > > >> > One option is to create a minimum spanning tree joining
> > > > > > > >> > all the clusters and make nodes replicate to their
> > > > > > > >> > immediate peers in a master-master fashion. This is
> > > > > > > >> > much better than an NxN mesh, but still has extra
> > > > > > > >> > network and WAL usage. It also suffers from a failure
> > > > > > > >> > scenario where a single cluster going down will pause
> > > > > > > >> > replication to the clusters downstream.
> > > > > > > >> >
> > > > > > > >> > What I really want is that the ReplicationSource should
> > > > > > > >> > only forward WALEdits whose cluster-id is the same as
> > > > > > > >> > the local cluster-id. This seems like a straightforward
> > > > > > > >> > patch to put in.
> > > > > > > >> >
> > > > > > > >> > Any thoughts on the suggested approach or alternatives?
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > *Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
> > > > > > > >>
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
> > > > > > >
> > > > > > > --
> > > > > > > *Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
> > > > >
> > > > > --
> > > > > *Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
> >
> > --
> > *Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.

--
*Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
