Hi everyone, I have a general design question (apologies in advance if this has been asked before).
I'd like to build indexes from a raw data store, and I'm trying to work out the best way to control processing so that part of my cluster can keep serving reads and writes without being heavily affected by the index-building process. I get the sense that the typical approach looks something like this:

1. Dedicate one cluster to index building (call it the INDEX cluster) and another to serving application reads on the indexes as well as reads/writes on the raw data set (call it the MAIN cluster).
2. Replicate the raw data set from the MAIN cluster to the INDEX cluster.
3. On the INDEX cluster, continually rebuild the indexes from the replicated raw data and copy the new versions to the MAIN cluster, overwriting the old versions if necessary.

While this is conceptually simple, I can't help but wonder whether it would make more sense to simply switch application reads and writes between the clusters based on which one is NOT currently building indexes (while still replicating the raw data set master-master between them). To be clearer, I'm proposing this:

1. Have two clusters, CLUSTER_1 and CLUSTER_2, with the raw data set replicated master-master between them.
2. If CLUSTER_1 is currently rebuilding indexes, redirect all application traffic to CLUSTER_2, including reads from the indexes as well as writes to the raw data set (and vice versa).

I know I'm glossing over a lot of details here, but I'm curious whether anyone has implemented something along these lines. The main advantage of my proposal would be not having to copy potentially massive indexes across the network, at the cost of clients not always reading from the same cluster (which seems manageable). Any advice would be much appreciated! Thanks
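For illustration, here is a minimal sketch in Python of the switching rule in the second proposal, assuming a single coordinator process decides where application traffic goes. `ClusterRouter`, `begin_rebuild`, `end_rebuild`, and `serving_cluster` are hypothetical names, not part of any datastore's API, and the cluster names stand in for real connection endpoints.

```python
import threading
from typing import Optional


class ClusterRouter:
    """Hypothetical router: sends all traffic to whichever cluster
    is NOT currently rebuilding indexes."""

    def __init__(self, primary: str, secondary: str) -> None:
        self._clusters = (primary, secondary)
        self._rebuilding: Optional[str] = None  # cluster currently rebuilding, if any
        self._lock = threading.Lock()

    def begin_rebuild(self, cluster: str) -> None:
        # Allow only one rebuild at a time so the other cluster can always serve.
        with self._lock:
            if cluster not in self._clusters:
                raise ValueError(f"unknown cluster: {cluster}")
            if self._rebuilding is not None:
                raise RuntimeError(f"{self._rebuilding} is already rebuilding")
            self._rebuilding = cluster

    def end_rebuild(self, cluster: str) -> None:
        # Mark the rebuild finished; the cluster becomes eligible to serve again.
        with self._lock:
            if self._rebuilding != cluster:
                raise RuntimeError(f"{cluster} is not rebuilding")
            self._rebuilding = None

    def serving_cluster(self) -> str:
        # Index reads and raw-data reads/writes all go to the returned cluster.
        with self._lock:
            first, second = self._clusters
            if self._rebuilding is None:
                return first  # no rebuild in progress: default to the first cluster
            return second if self._rebuilding == first else first


router = ClusterRouter("CLUSTER_1", "CLUSTER_2")
router.begin_rebuild("CLUSTER_1")
print(router.serving_cluster())  # CLUSTER_2
router.end_rebuild("CLUSTER_1")
```

This only enforces that at most one cluster rebuilds at a time. It does nothing about master-master replication lag, so a client moved to the other cluster right after a write may not see that write until it has replicated.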
