[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791425#action_12791425 ]

Mark Miller commented on SOLR-1277:
-----------------------------------

So based on what we know, it sounds like we are going to have to use a very 
high session timeout for the ZooKeeper client?
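
Just to make that concrete, a minimal sketch with the ZooKeeper Java client - 
the ten-minute figure is a placeholder, not a proposal for the actual number:

    // Sketch only: connect with a deliberately high session timeout so the
    // ephemeral znodes survive short disconnects and GC pauses.
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkConnect {
      // Placeholder value - picking the real number is the open question above.
      static final int SESSION_TIMEOUT_MS = 10 * 60 * 1000;

      static ZooKeeper connect(String hosts) throws java.io.IOException {
        return new ZooKeeper(hosts, SESSION_TIMEOUT_MS, new Watcher() {
          public void process(WatchedEvent event) {
            // connection state changes (SyncConnected, Disconnected,
            // Expired) arrive here
          }
        });
      }
    }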

Then each node will run a thread that periodically updates its availability? 
When a node chooses its shards for a distributed search, it can look at how 
long it's been since each shard updated itself, and choose or drop shards based 
on that? In the event that a *very* long timeout period has passed, the client 
will time out and the znode will actually be removed?
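
Roughly, the per-node updater thread could look something like this - the 
/solr_nodes path, the payload format, and the five-second interval are all 
invented for the sketch:

    // Sketch of the per-node updater: each node creates an ephemeral znode
    // and periodically rewrites its data with a fresh timestamp.
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    public class AvailabilityUpdater extends Thread {
      static final long UPDATE_INTERVAL_MS = 5 * 1000; // placeholder

      private final ZooKeeper zk;
      private final String path; // e.g. /solr_nodes/host_port (made up)

      public AvailabilityUpdater(ZooKeeper zk, String path) throws Exception {
        this.zk = zk;
        this.path = path;
        // Ephemeral: ZooKeeper only deletes it once the (very long)
        // session timeout finally expires.
        zk.create(path, payload(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        setDaemon(true);
      }

      private byte[] payload() {
        return Long.toString(System.currentTimeMillis()).getBytes();
      }

      public void run() {
        while (!isInterrupted()) {
          try {
            zk.setData(path, payload(), -1); // -1 matches any znode version
            Thread.sleep(UPDATE_INTERVAL_MS);
          } catch (InterruptedException e) {
            return; // shutting down
          } catch (Exception e) {
            // transient ZooKeeper error: keep trying; the long session
            // timeout keeps the znode visible to other nodes meanwhile
          }
        }
      }
    }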

This seems like it will be easier than trying to reconnect after timeouts and 
manage Solr during the "disconnected" period?

Sounds like the update payload itself might carry the current load on that 
node - then nodes choosing other nodes for a distrib search can use both how 
recently nodes were updated and their reported loads to decide which nodes to 
select for a search?
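
And the selection side might filter on staleness first, then prefer lower 
reported load - the 30-second threshold and the NodeState shape are again just 
assumptions for the sketch:

    // Sketch: drop shards whose availability znode hasn't been refreshed
    // recently, then order the survivors by their self-reported load.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class ShardSelector {
      static final long STALE_AFTER_MS = 30 * 1000; // placeholder threshold

      // Hypothetical holder for what a node's znode payload might contain.
      public static class NodeState {
        final String shardUrl;
        final long lastUpdateMs;
        final double load;
        NodeState(String shardUrl, long lastUpdateMs, double load) {
          this.shardUrl = shardUrl;
          this.lastUpdateMs = lastUpdateMs;
          this.load = load;
        }
      }

      public static List<NodeState> candidates(List<NodeState> all) {
        long now = System.currentTimeMillis();
        List<NodeState> live = new ArrayList<NodeState>();
        for (NodeState n : all) {
          if (now - n.lastUpdateMs <= STALE_AFTER_MS) {
            live.add(n);   // updated recently enough -> consider it
          }                // otherwise treat the node as unavailable
        }
        Collections.sort(live, new Comparator<NodeState>() {
          public int compare(NodeState a, NodeState b) {
            return Double.compare(a.load, b.load); // least loaded first
          }
        });
        return live;
      }
    }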

Does this sound right?

> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
>                 Key: SOLR-1277
>                 URL: https://issues.apache.org/jira/browse/SOLR-1277
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes
> where if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and go from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
