Why do you need to swap the replicas from one master to another? If you have a cross-DC database that keeps both masters in sync, why not just tie SolrSlave-B1 and SolrSlave-B2 to SolrMaster-B at all times? Then you wouldn't have any fail-over to do at all.
We have multiple DCs and a similar setup (though a bit larger: 16 machines per DC, comprising 4 replicas of the collection), and we do exactly that. So we have 2 "independent" Solr Clouds, but we feed them from a single input stream, so they should be in sync (except that commit times might vary slightly from replica to replica). Users query whichever replica is nearest/least loaded, to minimize cross-DC traffic.

But then, for us availability beats consistency: we'd rather have a working cloud if one DC dies, even if it is slightly inconsistent. For us (it's an NRT system), that's better than the alternative. If we do lose a DC, we'll have to manually sync it back up before we bring it on-line for users, but that's a price we are willing to pay.

On 13 June 2014 00:52, Arcadius Ahouansou <arcad...@menelic.com> wrote:

> Hello.
>
> - We currently have solr 4 in master-slave mode across 2 DataCenters.
>
> - We are planning to run the system in active-active mode, meaning that
> search requests will go to Solr Slaves in both DC-A and DC-B.
>
> - We have a highly available and cross DC database that feeds the
> SolrMaster in both DC. So, both Solr Masters are being kept up-to-date.
>
> - In order to allow all slaves in both DC to have the very same index
> version, we have come up with the idea of having multiple masterUrl on each
> slave, i.e masterUrl=masterUrl-A,masterUrl-B (and this is the main point of
> this post)
>
> - When both DC are available, only masterUrl-A is used for fetching the
> index and the topology would look like the one shown at
> https://www.dropbox.com/s/4vqdx70af5ddn69/master-slave-failover.png
>
> - In case the worst happens and we lose DC-A, the slaves in DC-B will get
> network errors like NoRouteToHost or ConnectionTimeout.
>
> - After few attempts, the slaves will switch to using the next url in the
> masterUrl variable which would be masterUrl-B
>
> - This should work pretty well and when DC-A becomes available, we could
> issue a rest API call to reset the masterUrl or restart the master in DC-B
> and slaves in DC-B should switch back to using masterUrl-A.
>
> - I would like to gather your thought about this idea.
>
> - If this makes sense, I could raise a Jira ticket to enable multiple
> masterUrl and the fail-over principle described here.
>
> Thank you very much.
>
> Arcadius.
>
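
For reference, here is a minimal sketch of the fail-over rule proposed in the quoted message: use masterUrl-A first, move to the next URL after a few consecutive network errors, and go back to the first URL on an explicit reset. It is plain Python, not Solr code. The `MasterSelector` class, the `poll_once` helper, the `max_failures` threshold and the example URLs are all hypothetical names made up for illustration, and wrapping around to the first URL once the last one has failed is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MasterSelector:
    """Tracks which master URL a slave should poll (illustrative only)."""

    master_urls: List[str]   # ordered by preference, e.g. [masterUrl-A, masterUrl-B]
    max_failures: int = 3    # hypothetical "few attempts" threshold
    _index: int = 0          # position of the URL currently in use
    _failures: int = 0       # consecutive failures against the current URL

    @property
    def current(self) -> str:
        return self.master_urls[self._index]

    def record_success(self) -> None:
        # Any successful poll clears the consecutive-failure counter.
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            # Assumption: wrap around to the first URL after the last one.
            self._index = (self._index + 1) % len(self.master_urls)
            self._failures = 0

    def reset(self) -> None:
        # Corresponds to the "reset the masterUrl" call in the proposal.
        self._index = 0
        self._failures = 0


def poll_once(selector: MasterSelector, fetch: Callable[[str], None]) -> bool:
    """Try one index fetch from the current master; return True on success."""
    try:
        fetch(selector.current)
    except OSError:
        # ConnectionError and TimeoutError are subclasses of OSError.
        selector.record_failure()
        return False
    selector.record_success()
    return True
```

A slave-side polling loop would call `poll_once(selector, fetch)` at each poll interval, with `fetch` being whatever actually pulls the index, and an operator would call `selector.reset()` once DC-A is back. Note that nothing here checks that both masters hold the same index version, which is the part that depends on the cross-DC database keeping them in sync.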