Re: leader election stuck after hosts restarts

2021-01-22 Thread Pierre Salagnac
Thanks Alessandro.

We found this Jira ticket that may be the root cause of this issue: https://issues.apache.org/jira/browse/SOLR-14356

I'm not sure whether it is the reason for the leader election initially failing, but it prevents Solr from exiting this error loop.

On Wed, Jan 13, 2021 at

Re: leader election stuck after hosts restarts

2021-01-13 Thread Alessandro Benedetti
I faced these problems a while ago, and at the time I wrote a blog post which I hope can help: https://sease.io/2018/05/solrcloud-leader-election-failing.html

--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Sorry I missed this detail. We are running Solr 8.2.

Thanks

On Tue, Jan 12, 2021 at 16:46, Phill Campbell wrote:
> Which version of Apache Solr?
>
> > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac wrote:
> >
> > Hello,
> > We had a stuck leader election for a shard.
> >
> > We have

Re: leader election stuck after hosts restarts

2021-01-12 Thread Phill Campbell
Which version of Apache Solr?

> On Jan 12, 2021, at 8:36 AM, Pierre Salagnac wrote:
>
> Hello,
> We had a stuck leader election for a shard.
>
> We have collections with 2 shards, each shard has 5 replicas. We have many
> collections but the issue happened for a single shard. Once all host

Re: leader election stuck after hosts restarts

2021-01-12 Thread matthew sporleder
When this has happened to me before, I have had pretty good luck restarting the overseer leader, which can be found in ZooKeeper under /overseer_elect/leader.

If that doesn't work, I've had to use more intrusive and manual recovery methods, which suck.

On Tue, Jan 12, 2021 at 10:36 AM Pierre
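As an alternative to reading /overseer_elect/leader from ZooKeeper directly, the Collections API action OVERSEERSTATUS also reports the current overseer in its "leader" field. Below is a minimal Python sketch of building that request URL and reading the field. The base URL, host name, and sample response body are made up for illustration, and the helper names are hypothetical; actually fetching the URL is left out.

```python
import json
from urllib.parse import urlencode


def overseer_status_url(base_url):
    """Build the Collections API URL for the OVERSEERSTATUS action."""
    query = urlencode({"action": "OVERSEERSTATUS", "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def overseer_leader(response_text):
    """Return the overseer's node name from the response, or None if absent."""
    return json.loads(response_text).get("leader")


# Hypothetical, heavily trimmed response body (assumption, not real output).
sample = '{"responseHeader": {"status": 0}, "leader": "host1:8983_solr"}'

print(overseer_status_url("http://host1:8983/solr"))
print(overseer_leader(sample))
```

The node name returned this way tells you which Solr host to restart.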

leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Hello,
We had a stuck leader election for a shard.

We have collections with 2 shards, each shard has 5 replicas. We have many collections but the issue happened for a single shard. Once all host restarts completed, this shard was stuck with one replica in "recovery" state and all others "down"
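For readers who want to detect this situation across many collections, here is a rough Python sketch that scans a Collections API CLUSTERSTATUS response (already parsed from JSON) for shards with no active leader. The collection name, core names, and sample data are invented for illustration and cut down to 2 replicas per shard; the function name is hypothetical. Note that the replica state strings Solr reports are "active", "recovering", "down", and "recovery_failed".

```python
def leaderless_shards(cluster_status):
    """List (collection, shard, replica states) for shards lacking an active leader."""
    stuck = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = list(shard.get("replicas", {}).values())
            # CLUSTERSTATUS marks the leader replica with "leader": "true".
            has_leader = any(
                r.get("leader") == "true" and r.get("state") == "active"
                for r in replicas
            )
            if not has_leader:
                states = sorted(r.get("state", "unknown") for r in replicas)
                stuck.append((coll_name, shard_name, states))
    return stuck


# Invented sample data (assumption): shard1 is stuck, shard2 is healthy.
sample_status = {
    "cluster": {
        "collections": {
            "demo": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"state": "recovering"},
                        "core_node2": {"state": "down"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"state": "active", "leader": "true"},
                        "core_node4": {"state": "active"},
                    }},
                }
            }
        }
    }
}

print(leaderless_shards(sample_status))
```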