RE: best practice for restarting the entire SolrCloud cluster

Markus Jelsma Thu, 08 Nov 2012 12:57:38 -0800
Hi - I think you're seeing:
https://issues.apache.org/jira/browse/SOLR-3993
 
 
-----Original message-----
> From:Bill Au <bill.w...@gmail.com>
> Sent: Thu 08-Nov-2012 21:16
> To: solr-user@lucene.apache.org
> Subject: best practice for restarting the entire SolrCloud cluster
> 
> I have a simple SolrCloud cluster with 4 Solr instances and 1 shard.  I can
> start and stop individual Solr instances without any problem, but not when
> I have to shut down all the Solr instances at the same time.
> 
> After shutting down all the Solr instances, the first instance that starts
> up waits for all the replicas:
> 
> INFO: Waiting until we see more replicas up: total=4 found=3 timeoutin=169243
> 
> In the meantime, any additional Solr instances that start up while the
> first one is waiting can't get the leader from ZooKeeper:
> 
> SEVERE: Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
> 
> When the first Solr instance sees all the replicas, it becomes the leader:
> 
> INFO: Enough replicas found to continue.
> INFO: I may be the new leader - try and sync
> 
> However, it fails to sync with the instances that had earlier failed to get
> the leader:
> 
> WARNING: PeerSync: core=collection1 url=http://host2:8983/solr  exception talking to http://host2:8983/solr/collection1/, failed
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://host2:8983/solr/collection1
> 
> So I ended up with one or more replicas down after the restart.  I had to
> figure out which replicas were down and restart them.
> 
> I also discovered that if I start the first Solr instance and wait until
> it finishes its 3-minute leaderVoteWait, the rest of the Solr instances can
> be started without any problem, since by then they can get the leader from
> ZooKeeper.
> 
> Is there a better way to restart an entire SolrCloud cluster?
> 
> Bill
>
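The workaround Bill describes (start one instance alone, let it get through the leaderVoteWait, then start the others) can be scripted. The Python sketch below is hypothetical: `start` and `is_ready` are hooks you would supply yourself (for example an init script and whatever health check you trust), not Solr APIs, and the 200-second default timeout is an assumption picked to exceed the 3-minute leaderVoteWait mentioned above.

```python
import time
from typing import Callable, List, Sequence


def staggered_restart(
    hosts: Sequence[str],
    start: Callable[[str], None],
    is_ready: Callable[[str], bool],
    wait_timeout_s: float = 200.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Start hosts[0] alone, wait until it reports ready, then start the rest.

    `start` and `is_ready` are hypothetical caller-supplied hooks; nothing
    here talks to Solr or ZooKeeper directly. Returns hosts in start order.
    """
    if not hosts:
        return []
    first, rest = hosts[0], list(hosts[1:])
    start(first)
    started = [first]

    # Poll the first instance until it is ready or the timeout expires.
    # The default timeout is an assumption: leaderVoteWait (3 min) plus margin.
    deadline = clock() + wait_timeout_s
    while not is_ready(first):
        if clock() >= deadline:
            raise TimeoutError(f"{first} not ready after {wait_timeout_s}s")
        sleep(poll_interval_s)

    # Only now start the remaining instances, so they can find a leader.
    for host in rest:
        start(host)
        started.append(host)
    return started
```

Injecting `sleep` and `clock` lets the ordering logic be exercised without actually waiting three minutes.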