[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803371#action_12803371 ]
Ted Dunning commented on SOLR-1724:
-----------------------------------

{quote}
... I agree, I'm not really into ephemeral ZK nodes for Solr hosts/nodes. The reason is contact with ZK is highly superficial and can be intermittent.
{quote}

I have found that when I was having trouble with ZK connectivity, the problems were simply surfacing issues that I had anyway. You do have to configure the ZK client to not have long pauses (that is incompatible with SOLR how?) and you may need to adjust the timeouts on the ZK side.

More importantly, any issues with ZK connectivity will have their parallels with any other heartbeat mechanism, and replicating a heartbeat system that tries to match ZK for reliability is going to be a significant source of very nasty bugs. Better not to rewrite what already works.

Keep in mind that ZK *connection* issues are not the same as session expiration. Katta has a fairly important set of bugfixes now to make that distinction, and ZK will soon handle connection loss on its own.

It isn't a bad idea to keep shards around for a while if a node goes down. That can seriously decrease the cost of momentary outages such as for a software upgrade. The idea is that when the node comes back, it can advertise availability of some shards and replication of those shards should cease.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: SOLR-1724.patch
>
>
> Though we're implementing cloud, I need something real soon I can
> play with and deploy. So this'll be a patch that only deploys
> new cores, and that's about it.
> The arch is real simple:
> On Zookeeper there'll be a directory that contains files that
> represent the state of the cores of a given set of servers which
> will look like the following:
> /production/cores-1.txt
> /production/cores-2.txt
> /production/core-host-1-actual.txt (ephemeral node per host)
> Where each core-N.txt file contains:
> hostname,corename,instanceDir,coredownloadpath
> coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://,
> etc
> and
> core-host-actual.txt contains:
> hostname,corename,instanceDir,size
> Every time a new core-N.txt file is added, the listening host
> finds its entry in the list and begins the process of trying to
> match the entries. Upon completion, it updates its
> /core-host-1-actual.txt file to its completed state or logs an error.
> When all host actual files are written (without errors), then a
> new core-1-actual.txt file is written which can be picked up by
> another process that can create a new core proxy.
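
The file formats in the issue description can be sketched roughly as follows. This is only an illustration of the two comma-separated line layouts, not the code in SOLR-1724.patch. The names CoreEntry, parse_cores_file, entries_for_host and format_actual_line are hypothetical, as are the sample hostnames and paths, and treating "size" as a byte count is an assumption (the description gives no unit). No ZooKeeper client is involved; the sketch only covers what a listening host might do with the text of a core-N.txt file once it has read it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoreEntry:
    """One line of a core-N.txt file (hypothetical helper type)."""
    hostname: str
    corename: str
    instance_dir: str
    download_path: str


def parse_cores_file(text: str) -> List[CoreEntry]:
    """Parse lines of the form hostname,corename,instanceDir,coredownloadpath."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the first three commas only, so a comma inside the
        # download URL (the last field) does not break parsing.
        parts = [p.strip() for p in line.split(",", 3)]
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        entries.append(CoreEntry(*parts))
    return entries


def entries_for_host(entries: List[CoreEntry], hostname: str) -> List[CoreEntry]:
    """Return the entries a given host is expected to deploy."""
    return [e for e in entries if e.hostname == hostname]


def format_actual_line(entry: CoreEntry, size_bytes: int) -> str:
    """Build one hostname,corename,instanceDir,size line for the host's
    actual file. The size unit (bytes) is an assumption."""
    return f"{entry.hostname},{entry.corename},{entry.instance_dir},{size_bytes}"


if __name__ == "__main__":
    # Made-up sample content for a core-N.txt file.
    sample = (
        "host1,core-a,/var/solr/core-a,hdfs://namenode/indexes/core-a\n"
        "host2,core-b,/var/solr/core-b,http://example.org/core-b.zip\n"
    )
    mine = entries_for_host(parse_cores_file(sample), "host1")
    print(format_actual_line(mine[0], 1024))
```

A host would run something like entries_for_host against each new core-N.txt file, download and install the listed cores, and then write one format_actual_line result per core into its own actual file. Download, install and error logging are left out here.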