On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com> wrote:
> So does it mean that while doing "document add" the state of the cluster is
> fetched from zookeeper and then, depending upon the hash of the docid, the
> target shard is decided?

We keep the ZooKeeper info cached locally. We only update it when ZooKeeper tells us it has changed.

>
> Assume we have 3 shards (with no replicas), of which 1 went down while
> indexing. Will all the documents be routed to the remaining 2 shards,
> or will only 2/3 of the documents be indexed? If the answer is that the
> remaining 2 shards will get all the documents, then if the 3rd shard later
> comes back online, will solr cloud do rebalancing?

All of the updates that hash to the third shard will fail. That is why we have replicas - if you have a replica, it will take over as the leader.

>
> Is there anywhere in zookeeper where we store the range of docids stored in
> each shard, or any other information about the actual docs?

The range of hashes is stored for each shard in zk.

> We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
> the same data, and we update the index only once a day. Both dc1 and dc2
> have exactly the same solrcloud config and machines.
>
> Can we populate dc2 by just copying all the index binaries from
> solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing the
> same documents on dc2)? I guess the solr replication API doesn't work in
> solrcloud, hence looking for a workaround.
>
> Thanks
> Varun
>
> On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> ZooKeeper
>> /
>> /clusterstate.json - info about the layout and state of the cluster -
>>   collections, shards, urls, etc
>> /collections - config to use for the collection, shard leader voting zk nodes
>> /configs - sets of config files
>> /live_nodes - ephemeral nodes, one per Solr node
>> /overseer - work queue for updating clusterstate.json, creating new
>>   collections, etc
>> /overseer_elect - overseer voting zk nodes
>>
>> - Mark
>>
>> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>
>>> Hi Mark,
>>> One more question
>>>
>>> While doing a solr doc update/add, what information is required from
>>> zookeeper? Can you tell what all information is stored in zookeeper other
>>> than the startup configs.
>>>
>>> Thanks
>>> Varun
>>>
>>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>>> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>> I have some questions regarding the role of zookeeper in solrcloud at
>>>>> runtime, while processing queries.
>>>>>
>>>>> 1) Is the zookeeper cluster referred to by solr shards for processing
>>>>> every request, or is it only used to copy config at startup time?
>>>>
>>>> No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup
>>>> - to get configs and set itself up. Then it only talks to ZooKeeper when a
>>>> cluster state change happens - in that case, ZooKeeper pings Solr and Solr
>>>> will get an updated view of the cluster. That view is cached and used for
>>>> requests. In a stable state, Solr is not talking to ZooKeeper other than
>>>> the heartbeat they keep to know a node is up.
>>>>
>>>>> 2) How is load balancing done between replicas? Are traffic stats shared
>>>>> through zookeeper?
>>>>
>>>> Basic round robin. Traffic stats are not currently in Zk.
>>>>
>>>>> 3) If for any reason the zookeeper cluster goes offline for some time,
>>>>> will solr cloud be unable to serve any traffic?
>>>>
>>>> It will stop allowing updates, but continue serving searches.
>>>>
>>>> - Mark
>>>>
>>>>> Thanks
>>>>> Varun
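
For anyone reading this thread later: the routing behaviour described at the top (each shard owns a range of hashes, the doc id's hash picks the shard, and updates that hash to a down shard with no replica fail instead of being rerouted) can be sketched roughly as below. This is only an illustration, not Solr code. The names (Shard, make_shards, route, add_document, index_all) are made up for the example, the 32-bit hash space is an assumption, and zlib.crc32 is a stand-in for whatever hash function Solr actually uses.

```python
import zlib
from dataclasses import dataclass

HASH_SPACE = 2 ** 32  # assumption: hashes live in a 32-bit space


@dataclass
class Shard:
    name: str
    lo: int            # inclusive lower bound of the hash range
    hi: int            # inclusive upper bound of the hash range
    live: bool = True  # False models a shard with no live leader/replica


class ShardDownError(Exception):
    """Raised when a document hashes to a shard that is not live."""


def make_shards(n):
    """Split the hash space into n contiguous ranges, one per shard."""
    step = HASH_SPACE // n
    shards = []
    for i in range(n):
        lo = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        hi = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        shards.append(Shard(f"shard{i + 1}", lo, hi))
    return shards


def doc_hash(doc_id):
    """Stand-in hash (CRC32); not the hash function Solr uses."""
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def route(doc_id, shards):
    """Return the shard whose hash range contains the doc id's hash."""
    h = doc_hash(doc_id)
    for shard in shards:
        if shard.lo <= h <= shard.hi:
            return shard
    raise AssertionError("hash ranges do not cover the hash space")


def add_document(doc_id, shards):
    """Route an update; fail (rather than reroute) if the target is down."""
    target = route(doc_id, shards)
    if not target.live:
        raise ShardDownError(f"{doc_id} hashes to {target.name}, which is down")
    return target.name


def index_all(doc_ids, shards):
    """Index a batch, returning per-shard counts and the ids that failed."""
    indexed, failed = {}, []
    for doc_id in doc_ids:
        try:
            name = add_document(doc_id, shards)
            indexed[name] = indexed.get(name, 0) + 1
        except ShardDownError:
            failed.append(doc_id)
    return indexed, failed
```

With three shards and shard3 marked down, index_all returns counts for shard1 and shard2 only, and every id in the failed list is one that routes to shard3. Nothing is redistributed to the surviving shards, which is why the answer above points at replicas.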