On 6/6/2014 8:31 AM, Aman Tandon wrote:
> In my organisation we also want to implement the solrcloud, but the problem
> is that, we are using the master-slave architecture and on master we do all
> indexing, architecture of master is lower than the slaves.
>
> So if we implement the solrcloud in a fashion that master will be the
> leader, and slaves will be the replicas then in that case, in the case of
> high load leader can bear it, I guess every query firstly goes to leader
> then it distributes the request as i noticed from the logs and blogs :)
>
> As well as master is in NY and slaves are in Dallas, which also might cause
> latency issue and it will instead fail our purpose of faster query response.
>
> So i thought to use this shards parameter so that we query only from the
> replicas not to the leader so that leader just work fine. But we were not
> sure about this shards parameter, what do you think? what should we do with
> latency issue and shards parameter.
SolrCloud does not yet have any way to prefer one set of replicas over the others, so if you just send it requests, they will be spread across both Dallas and New York, and the ones that cross between the two sites will suffer higher search latency. Local replica preference is a desperately needed feature.

Old-style distributed search with the shards parameter, combined with master/slave replication, is an effective way to be absolutely sure which servers you are querying.

I would actually recommend that you get rid of replication and have your index updating software update each copy of the index independently. This is how my own Solr install works. It opens up a whole new set of possibilities: you can change the schema and/or config on one set of servers, or upgrade any component (Solr, Java, etc.), without affecting the other set of servers at all.

One note: for the indexing paradigm I've outlined to actually be effective, you must separately track which inserts/updates/deletes have been done for each server set. If you don't do that, the sets can get out of sync when you restart a server. Also, without that tracking, having a server down for an extended period of time might cause all indexing activity to stop on BOTH server sets.

Thanks,
Shawn
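A minimal Python sketch of the two ideas in the reply, under stated assumptions. `shards_query_url` builds an old-style distributed search request whose `shards` parameter lists only the servers you want to hit (for example, only the Dallas copies). The hostnames, the core name `core1` and the `/solr/<core>` path are hypothetical placeholders. `ServerSet` and `IndependentIndexer` are hypothetical names illustrating "track what has been applied to each server set separately". Each set keeps its own position in a shared change log, so a set that is down stalls only its own position while the other set keeps indexing. The `send` callable stands in for whatever actually posts a change to Solr. In this sketch the log and positions live only in memory; a real indexer would have to persist them so they survive a restart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


def shards_query_url(entry_host: str, core: str, shard_hosts: List[str], q: str) -> str:
    """Build a distributed search URL that queries only the listed shard hosts.

    Assumption: each shard entry is host:port/solr/<core>, with no http:// prefix.
    """
    shards = ",".join(f"{host}/solr/{core}" for host in shard_hosts)
    # Keep ':', '/' and ',' literal so the shards list stays readable.
    params = urlencode({"q": q, "shards": shards}, safe=":/,")
    return f"http://{entry_host}/solr/{core}/select?{params}"


@dataclass
class ServerSet:
    name: str
    # Hypothetical sender: returns True if this server set accepted the change.
    send: Callable[[dict], bool]
    # How many entries of the shared change log have been applied to this set.
    applied: int = 0


class IndependentIndexer:
    """Applies one change log to several server sets, tracking each set separately."""

    def __init__(self, server_sets: List[ServerSet]) -> None:
        self.log: List[dict] = []  # in memory only; must be persisted in practice
        self.sets = {s.name: s for s in server_sets}

    def record(self, change: dict) -> None:
        self.log.append(change)

    def sync(self) -> Dict[str, int]:
        for s in self.sets.values():
            while s.applied < len(self.log):
                if not s.send(self.log[s.applied]):
                    # This set is unavailable: leave its position where it is
                    # and move on, so the other set keeps indexing.
                    break
                s.applied += 1
        return {name: s.applied for name, s in self.sets.items()}


if __name__ == "__main__":
    print(shards_query_url("dallas1:8983", "core1", ["dallas1:8983", "dallas2:8983"], "*:*"))

    ny_up = {"value": False}  # simulate the NY set being down at first
    indexer = IndependentIndexer([
        ServerSet("ny", lambda change: ny_up["value"]),
        ServerSet("dallas", lambda change: True),
    ])
    indexer.record({"op": "add", "id": "1"})
    indexer.record({"op": "delete", "id": "2"})
    print(indexer.sync())  # Dallas catches up; NY stays at 0
    ny_up["value"] = True
    print(indexer.sync())  # NY now catches up from where it stopped
```

The per-set position is the important design choice: because each set advances independently, a long outage on one side leaves a backlog for that side only, instead of blocking indexing everywhere.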