[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Ted Dunning (JIRA) Thu, 21 Jan 2010 09:26:16 -0800

    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803371#action_12803371 ]


Ted Dunning commented on SOLR-1724:
-----------------------------------

{quote}
... I agree, I'm not really into ephemeral
ZK nodes for Solr hosts/nodes. The reason is contact with ZK is
highly superficial and can be intermittent. 
{quote}
When I was having trouble with ZK connectivity, I found that the problems
were simply surfacing issues that I had anyway.  You do have to configure the
ZK client so that it does not have long pauses (how is that incompatible with
Solr?), and you may need to adjust the timeouts on the ZK side.  More
importantly, any issue with ZK connectivity will have its parallel in any
other heartbeat mechanism, and building a replacement heartbeat system that
tries to match ZK for reliability is going to be a significant source of very
nasty bugs.  Better not to rewrite what already works.  Keep in mind that ZK
*connection* issues are not the same as session expiration.  Katta now has a
fairly important set of bugfixes that make that distinction, and ZK will soon
handle connection loss on its own.
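
To make the connection-loss versus session-expiration distinction concrete,
here is a minimal Python sketch of how a host might react to each event.  The
event names are modeled on the Disconnected, SyncConnected and Expired states
that ZooKeeper reports to watchers, but SessionEvent, action_for and
count_registrations are hypothetical stand-ins for illustration, not
ZooKeeper or Katta API.

```python
from enum import Enum


class SessionEvent(Enum):
    # Hypothetical stand-ins modeled on ZooKeeper's session states.
    DISCONNECTED = "disconnected"      # connection lost, session may still be alive
    SYNC_CONNECTED = "sync_connected"  # reconnected before the session timed out
    EXPIRED = "expired"                # session is gone, ephemeral nodes deleted


def action_for(event):
    """Return what a host should do in response to a session event."""
    if event is SessionEvent.DISCONNECTED:
        # Connection loss only: the ephemeral node still exists on the
        # server side, so nothing should be torn down yet.
        return "wait_for_reconnect"
    if event is SessionEvent.SYNC_CONNECTED:
        # Same session as before, so the existing registration is still valid.
        return "resume"
    if event is SessionEvent.EXPIRED:
        # The ephemeral node has been removed along with the session.
        return "new_session_and_reregister"
    raise ValueError("unknown event: %r" % (event,))


def count_registrations(events):
    """Count how often the host has to (re)create its ephemeral node."""
    registrations = 1  # the initial registration at startup
    for event in events:
        if action_for(event) == "new_session_and_reregister":
            registrations += 1
    return registrations
```

In this sketch a brief disconnect followed by a reconnect costs nothing, and
only an expired session forces the host to register again.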

It isn't a bad idea to keep shards around for a while if a node goes down.  
That can seriously decrease the cost of momentary outages such as for a 
software upgrade.  The idea is that when the node comes back, it can advertise 
availability of some shards and replication of those shards should cease.
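
A rough Python sketch of that grace-period idea follows.  It assumes some
coordinating process tracks which shards a missing node held; the
ShardGraceTracker class, its method names and the explicit "now" timestamps
are all hypothetical and are not part of the SOLR-1724 patch.

```python
class ShardGraceTracker:
    """Hypothetical bookkeeping for shards whose host has disappeared."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.down_nodes = {}  # node -> (time it went down, shards it held)

    def node_down(self, node, shards, now):
        # Remember the shards instead of replicating them elsewhere at once.
        self.down_nodes[node] = (now, set(shards))

    def shards_to_replicate(self, now):
        """Shards whose host has been gone for at least the grace period."""
        due = set()
        for went_down, shards in self.down_nodes.values():
            if now - went_down >= self.grace_seconds:
                due |= shards
        return due

    def node_up(self, node, advertised_shards):
        """Node is back: return the shards whose replication should cease."""
        _, held = self.down_nodes.pop(node, (None, set()))
        return held & set(advertised_shards)
```

With a grace period longer than a typical software upgrade, a restarted node
returns before shards_to_replicate ever reports its shards, so no copying is
started at all.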



> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: SOLR-1724.patch
>
>
> Though we're implementing cloud, I need something real soon I can
> play with and deploy. So this'll be a patch that only deploys
> new cores, and that's about it. The arch is real simple:
> On Zookeeper there'll be a directory that contains files that
> represent the state of the cores of a given set of servers which
> will look like the following:
> /production/cores-1.txt
> /production/cores-2.txt
> /production/core-host-1-actual.txt (ephemeral node per host)
> Where each core-N.txt file contains:
> hostname,corename,instanceDir,coredownloadpath
> coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc.
> and
> core-host-actual.txt contains:
> hostname,corename,instanceDir,size
> Every time a new core-N.txt file is added, the listening host
> finds its entry in the list and begins the process of trying to
> match the entries. Upon completion, it updates its
> /core-host-1-actual.txt file to its completed state or logs an error.
> When all host actual files are written (without errors), then a
> new core-1-actual.txt file is written which can be picked up by
> another process that can create a new core proxy.
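
The matching step in the quoted description could look roughly like the
Python sketch below.  It assumes each line of a core-N.txt file is exactly the
four comma-separated fields listed above and that no field contains a comma;
CoreSpec, parse_core_file and cores_to_deploy are hypothetical names, not
taken from the attached patch.

```python
import csv
from collections import namedtuple

# One line of a core-N.txt file: hostname,corename,instanceDir,coredownloadpath
CoreSpec = namedtuple("CoreSpec", "hostname corename instance_dir download_path")


def parse_core_file(text):
    """Parse the contents of a core-N.txt file into CoreSpec entries."""
    lines = (line for line in text.splitlines() if line.strip())
    # A line without exactly four fields raises TypeError here.
    return [CoreSpec(*(field.strip() for field in row)) for row in csv.reader(lines)]


def cores_to_deploy(desired, hostname, actual_corenames):
    """Entries for this host that are not yet in its actual file."""
    return [
        spec
        for spec in desired
        if spec.hostname == hostname and spec.corename not in actual_corenames
    ]
```

A host would call cores_to_deploy with its own hostname and the core names
already recorded in its actual file, download each remaining core from its
download_path, and then update the actual file.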

