[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791528#action_12791528 ]

Jason Rutherglen commented on SOLR-1277:
----------------------------------------

bq. as two types of failures, possibly

A failure is a failure: whether it's caused by the GC or
something else, it's really the same thing. It sounds like
we're defining what the client is expected to do when a
failure occurs?

I think we'll need to define groups of shards (maybe this is
already in the spec) and allow a configurable failure setting
per group. For example, a "live" group would be allowed to
return partial results because the user always wants results
back quickly, while an "archive" group would always return
complete results (if a node is down, the request can be
configured to be retried up to N times, within a given max
timeout, until it succeeds).
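
To make that concrete, here's a rough sketch of what a
per-group policy could look like. The class, enum, and field
names are all made up for illustration; nothing like this
exists in the current patch:

{code:java}
/** Hypothetical per-group failure policy; none of these names are in the patch. */
public class ShardGroupPolicy {
  public enum OnFailure { PARTIAL_RESULTS, RETRY }

  final String group;
  final OnFailure onFailure;
  final int maxRetries;     // only meaningful for RETRY
  final long maxTimeoutMs;  // give up entirely once this much time has passed

  ShardGroupPolicy(String group, OnFailure onFailure,
                   int maxRetries, long maxTimeoutMs) {
    this.group = group;
    this.onFailure = onFailure;
    this.maxRetries = maxRetries;
    this.maxTimeoutMs = maxTimeoutMs;
  }

  public static void main(String[] args) {
    // "live" favors latency: return whatever shards answered in time.
    ShardGroupPolicy live =
        new ShardGroupPolicy("live", OnFailure.PARTIAL_RESULTS, 0, 500);
    // "archive" favors completeness: retry up to 5 times within one minute.
    ShardGroupPolicy archive =
        new ShardGroupPolicy("archive", OnFailure.RETRY, 5, 60000);
  }
}
{code}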

Also, a request could be addressed to a group of shards, which
would allow one set of replicated ZooKeeper servers to serve N
Solr clusters (instead of a ZooKeeper ensemble per Solr
cluster).
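
One way to keep N clusters apart on a shared ensemble would be
a per-cluster subtree. The paths and names below are purely
illustrative, not what the patch implements:

{code:java}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

/** Hypothetical znode layout: one shared ensemble, one subtree per cluster. */
public class GroupLookup {
  public static void main(String[] args) throws Exception {
    // A single replicated ensemble shared by every cluster.
    ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 10000, null);
    // Paths such as /solr/<cluster>/<group> would keep N clusters apart.
    List<String> shards = zk.getChildren("/solr/cluster1/live", false);
    for (String shard : shards) {
      System.out.println("shard node: " + shard);
    }
    zk.close();
  }
}
{code}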

How are we handling a failed connection to a slave server?
Instead of failing the request, could we re-issue it to an
adjacent slave?
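
Something like the following on the client side, perhaps. The
SlaveClient interface is a stand-in for whatever search client
API we end up with:

{code:java}
import java.util.List;

/** Hypothetical client-side failover: try each slave in turn instead of failing. */
public class SlaveFailover {
  /** Stand-in for whatever search client API we end up with. */
  interface SlaveClient {
    String query(String q) throws Exception;
  }

  static String queryWithFailover(List<SlaveClient> slaves, String q)
      throws Exception {
    Exception last = new Exception("no slaves configured");
    for (SlaveClient slave : slaves) { // adjacent slaves replicate the same shard
      try {
        return slave.query(q);         // first healthy slave wins
      } catch (Exception e) {
        last = e;                      // connection refused, timeout, etc.; try next
      }
    }
    throw last;                        // every replica failed; surface the last error
  }
}
{code}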

> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
>                 Key: SOLR-1277
>                 URL: https://issues.apache.org/jira/browse/SOLR-1277
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes
> where if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, we
> want the ability to centrally deploy a new configuration without
> servers going offline.
> We can start with basic failover and go from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
