Re: Role of zookeeper at runtime

Mark Miller Thu, 28 Feb 2013 06:12:45 -0800

On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com> wrote:


> So does it mean that while doing a "document add" the state of the cluster is
> fetched from ZooKeeper and then the target shard is decided based on the hash
> of the doc id?

We keep the ZooKeeper info cached locally. We only update it when ZooKeeper
tells us it has changed.
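A minimal Python sketch of the caching behaviour described above: the cluster state is read once, kept locally, and re-read only when a change notification fires. FakeZooKeeper and ClusterStateCache are hypothetical in-memory stand-ins, not the real ZooKeeper client or Solr classes. The one-shot watch that the reader re-registers on every read mirrors how classic ZooKeeper watches behave.

```python
import json
from typing import Callable, Dict, List


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper with one-shot data watches."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watches: Dict[str, List[Callable[[str], None]]] = {}
        self.reads = 0  # how many times a client fetched data

    def get_data(self, path: str, watch: Callable[[str], None]) -> bytes:
        self.reads += 1
        # Like a ZooKeeper watch, this fires once; the reader must re-register.
        self.watches.setdefault(path, []).append(watch)
        return self.data[path]

    def set_data(self, path: str, value: bytes) -> None:
        self.data[path] = value
        # Detach the watches before firing them, so callbacks can re-register.
        for callback in self.watches.pop(path, []):
            callback(path)


class ClusterStateCache:
    """Keeps a local copy of the cluster state; refreshes only on notification."""

    def __init__(self, zk: FakeZooKeeper, path: str = "/clusterstate.json") -> None:
        self.zk = zk
        self.path = path
        self.state: dict = {}
        self._refresh(path)

    def _refresh(self, _changed_path: str) -> None:
        # Re-read the state and re-arm the watch in the same call.
        self.state = json.loads(self.zk.get_data(self.path, self._refresh))

    def shards_for(self, collection: str) -> List[str]:
        # Answered from the local copy: no ZooKeeper round trip per request.
        return sorted(self.state[collection]["shards"])


if __name__ == "__main__":
    zk = FakeZooKeeper()
    zk.data["/clusterstate.json"] = json.dumps(
        {"collection1": {"shards": {"shard1": {}, "shard2": {}}}}).encode()
    cache = ClusterStateCache(zk)
    for _ in range(1000):
        cache.shards_for("collection1")
    print(zk.reads)  # 1: a thousand lookups, one read
    zk.set_data("/clusterstate.json", json.dumps(
        {"collection1": {"shards": {"shard1": {}, "shard2": {}, "shard3": {}}}}).encode())
    print(zk.reads, cache.shards_for("collection1"))  # 2 ['shard1', 'shard2', 'shard3']
```

The design point is that routing lookups never touch ZooKeeper; only a change event triggers a new read.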

> 
> Assume we have 3 shards (with no replicas) and 1 goes down while
> indexing. Will all the documents be routed to the remaining 2 shards,
> or will only 2/3 of the documents be indexed? If the remaining 2 shards
> get all the documents, will SolrCloud rebalance once the 3rd shard comes
> back online?

All of the updates that hash to the third shard will fail. That is why we have 
replicas - if you have a replica, it will take over as the leader.
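To make the failure mode concrete, here is a minimal Python sketch of hash-based update routing in which a shard with no live replica simply rejects its share of the updates. It assumes zlib.crc32 as a stand-in hash (Solr's router uses its own hash function) and an even split of the 32-bit hash space; the node names and the helpers shard_for, route_update and count_failures are hypothetical.

```python
import zlib
from typing import Iterable, List, Set

HASH_SPACE = 2 ** 32


class ShardDownError(Exception):
    """Raised when no live replica exists for the shard a document hashes to."""


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stand-in hash (assumption): zlib.crc32 returns an unsigned value in [0, 2**32).
    h = zlib.crc32(doc_id.encode("utf-8"))
    # Map h to one of num_shards equal, contiguous slices of the hash space.
    return h * num_shards // HASH_SPACE


def route_update(doc_id: str, shards: List[List[str]], live_nodes: Set[str]) -> str:
    """Return the node that should take the update for doc_id."""
    replicas = shards[shard_for(doc_id, len(shards))]
    live = [node for node in replicas if node in live_nodes]
    if not live:
        # The document is never redirected to another shard: the update just fails.
        raise ShardDownError(f"no live replica for {doc_id!r}")
    # Simplification: the first live replica plays leader (real leader election differs).
    return live[0]


def count_failures(doc_ids: Iterable[str], shards: List[List[str]], live_nodes: Set[str]) -> int:
    failures = 0
    for doc_id in doc_ids:
        try:
            route_update(doc_id, shards, live_nodes)
        except ShardDownError:
            failures += 1
    return failures


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(3000)]
    live = {"node1", "node2"}  # node3 has gone down
    # Failures: exactly the documents that hash to the third shard.
    print(count_failures(docs, [["node1"], ["node2"], ["node3"]], live))
    with_replicas = [["node1", "node4"], ["node2", "node5"], ["node3", "node6"]]
    print(count_failures(docs, with_replicas, live | {"node4", "node5", "node6"}))  # 0
```

With no replicas, nothing is rerouted to the surviving shards; adding a second live node per shard brings the failure count to zero because node6 takes over for shard 3.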

> 
> Is there anywhere in ZooKeeper where we store the range of doc ids held by
> each shard, or any other information about the actual docs?

The hash range for each shard is stored in ZK.
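A per-shard hash range is a contiguous slice of the 32-bit hash space, written as a pair of hex bounds. The Python sketch below assumes an even split of the signed 32-bit range; Solr's own partitioning may round the boundaries differently, so the 3-shard strings it prints need not match a real cluster. partition, to_hex, range_strings and owning_shard are hypothetical helpers.

```python
from typing import List, Tuple

MIN_HASH = -(2 ** 31)
MAX_HASH = 2 ** 31 - 1


def partition(num_shards: int) -> List[Tuple[int, int]]:
    """Split [MIN_HASH, MAX_HASH] into num_shards contiguous, non-overlapping ranges."""
    size = (MAX_HASH - MIN_HASH + 1) // num_shards
    ranges = []
    start = MIN_HASH
    for i in range(num_shards):
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def to_hex(value: int) -> str:
    # Render a signed 32-bit value as two's-complement hex, e.g. -2**31 -> "80000000".
    return format(value & 0xFFFFFFFF, "08x")


def range_strings(num_shards: int) -> List[str]:
    return [f"{to_hex(lo)}-{to_hex(hi)}" for lo, hi in partition(num_shards)]


def owning_shard(hash_value: int, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose range contains hash_value."""
    for index, (lo, hi) in enumerate(ranges):
        if lo <= hash_value <= hi:
            return index
    raise ValueError("hash outside the signed 32-bit range")


if __name__ == "__main__":
    print(range_strings(2))  # ['80000000-ffffffff', '00000000-7fffffff']
    print(range_strings(3))
```

Only these ranges need to live in ZooKeeper: any node holding them can work out the owning shard of a document from its hash, without knowing which documents actually exist.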

> We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
> the same data, and we update the index only once a day. Both dc1 and dc2
> have exactly the same SolrCloud config and machines.
>
> Can we populate dc2 by just copying all the index binaries from
> solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing
> the same documents in dc2)? I guess the Solr replication API doesn't work
> in SolrCloud, hence we are looking for a workaround.
> 
> Thanks
> Varun
> 
> On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
> 
>> ZooKeeper
>> /
>> /clusterstate.json - info about the layout and state of the cluster:
>>     collections, shards, URLs, etc.
>> /collections - config to use for the collection, shard leader voting zk nodes
>> /configs - sets of config files
>> /live_nodes - ephemeral nodes, one per Solr node
>> /overseer - work queue for updating clusterstate.json, creating new
>>     collections, etc.
>> /overseer_elect - overseer voting zk nodes
>> 
>> - Mark
>> 
>> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com> wrote:
>> 
>>> Hi Mark,
>>> One more question
>>> 
>>> While doing a Solr doc update/add, what information is required from
>>> ZooKeeper? Can you tell me what information is stored in ZooKeeper other
>>> than the startup configs?
>>> 
>>> Thanks
>>> Varun
>>> 
>>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> 
>>>> 
>>>> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>>> 
>>>>> Hi All,
>>>>> I have some questions regarding the role of ZooKeeper in the SolrCloud
>>>>> runtime, while processing queries.
>>>>> 
>>>>> 1) Is the ZooKeeper cluster consulted by the Solr shards for every
>>>>> request, or is it only used to copy the config at startup?
>>>> 
>>>> No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup
>>>> to get configs and set itself up. After that it only talks to ZooKeeper
>>>> when a cluster state change happens: in that case, ZooKeeper notifies Solr
>>>> and Solr fetches an updated view of the cluster. That view is cached and
>>>> used for requests. In a stable state, Solr is not talking to ZooKeeper
>>>> other than the heartbeat they keep to know a node is up.
>>>> 
>>>>> 2) How is load balancing done between replicas? Are traffic stats shared
>>>>> through ZooKeeper?
>>>> 
>>>> Basic round robin. Traffic stats are not currently in Zk.
>>>> 
>>>>> 3) If for any reason the ZooKeeper cluster goes offline for some time,
>>>>> will SolrCloud be unable to serve any traffic?
>>>> 
>>>> It will stop allowing updates, but continue serving searches.
>>>> 
>>>> - Mark
>>>> 
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Varun
>>>> 
>>>> 
>> 
>>
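The ZooKeeper layout in the earlier reply can be tied together with a small Python sketch: combine a clusterstate.json-style document with the children of /live_nodes to find, per shard, a leader that is actually up. The JSON is a simplified, hypothetical excerpt whose field names are modeled on Solr 4.x cluster state and should be checked against your version; in a real deployment both inputs would be read from ZooKeeper rather than hard-coded, and live_leaders is a hypothetical helper.

```python
import json
from typing import Dict, List, Optional

# Hypothetical, simplified excerpt of /clusterstate.json.
CLUSTERSTATE = """
{"collection1": {"shards": {
  "shard1": {"range": "80000000-ffffffff", "replicas": {
    "core_node1": {"node_name": "host1:8983_solr", "base_url": "http://host1:8983/solr",
                   "state": "active", "leader": "true"},
    "core_node3": {"node_name": "host3:8983_solr", "base_url": "http://host3:8983/solr",
                   "state": "active"}}},
  "shard2": {"range": "00000000-7fffffff", "replicas": {
    "core_node2": {"node_name": "host2:8983_solr", "base_url": "http://host2:8983/solr",
                   "state": "active", "leader": "true"}}}}}}
"""


def live_leaders(state: dict, collection: str, live_nodes: List[str]) -> Dict[str, Optional[str]]:
    """Map each shard to its leader's base_url, or None if that leader is not live."""
    leaders: Dict[str, Optional[str]] = {}
    for shard, info in state[collection]["shards"].items():
        leaders[shard] = None  # no usable leader unless one is found below
        for replica in info["replicas"].values():
            is_leader = replica.get("leader") == "true"
            # A node counts as up only if its ephemeral node exists under /live_nodes.
            is_up = replica["node_name"] in live_nodes and replica.get("state") == "active"
            if is_leader and is_up:
                leaders[shard] = replica["base_url"]
    return leaders


if __name__ == "__main__":
    live = ["host1:8983_solr", "host3:8983_solr"]  # host2's ephemeral node is gone
    print(live_leaders(json.loads(CLUSTERSTATE), "collection1", live))
    # {'shard1': 'http://host1:8983/solr', 'shard2': None}
```

Because /live_nodes holds ephemeral nodes, a Solr node whose ZooKeeper session expires drops out of that list automatically, which is what the heartbeat mentioned in the thread is for.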

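The earlier answers on load balancing and on ZooKeeper outages can be sketched together in Python: queries pick a replica by basic round robin from the cached cluster view, and updates are refused while the ZooKeeper connection is down, whereas searches keep working. CachedView, check_update_allowed and the zk_connected flag are hypothetical names for illustration only, not Solr's actual classes.

```python
import itertools
from typing import Dict, Iterator, List


class UpdatesDisabledError(Exception):
    """Raised for updates while ZooKeeper is unreachable (hypothetical)."""


class CachedView:
    def __init__(self, replicas_by_shard: Dict[str, List[str]]) -> None:
        self.replicas_by_shard = replicas_by_shard
        # One endless round-robin iterator per shard over its replica URLs.
        self._cycles: Dict[str, Iterator[str]] = {
            shard: itertools.cycle(urls) for shard, urls in replicas_by_shard.items()
        }
        self.zk_connected = True

    def pick_for_query(self, shard: str) -> str:
        # Basic round robin from the cached view; needs no ZooKeeper access.
        return next(self._cycles[shard])

    def check_update_allowed(self) -> None:
        # Updates need an up-to-date cluster view, so refuse them while disconnected.
        if not self.zk_connected:
            raise UpdatesDisabledError("ZooKeeper unreachable: updates refused")


if __name__ == "__main__":
    view = CachedView({"shard1": ["http://a/solr", "http://b/solr"]})
    print([view.pick_for_query("shard1") for _ in range(3)])
    # ['http://a/solr', 'http://b/solr', 'http://a/solr']
    view.zk_connected = False
    print(view.pick_for_query("shard1"))  # http://b/solr: searches still work
    try:
        view.check_update_allowed()
    except UpdatesDisabledError as err:
        print(err)
```

Round robin keeps no traffic statistics, which matches the note that none are kept in ZooKeeper.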