Unfortunately, neither of the two options works well for me.
1. Restarting all nodes leads to the described situation on some shards.
Even when a shard has no alternatives (all other replicas are on down
nodes), the remaining replica does not become the leader. I suppose it
waits for some timeout. But which timeout, and can it be altered?
2. FORCELEADER simply does not work.
 Neither do RELOAD and REBALANCELEADERS.

DELETEREPLICA until no alternatives are left, then ADDREPLICA: that trick works.
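
A minimal sketch of how the DELETEREPLICA / ADDREPLICA sequence above could be
expressed as Collections API requests. It only builds the URLs and sends
nothing. The collection name, shard name, replica name (core_node3) and node
name in the usage lines are placeholders assumed for illustration, not values
from this cluster; the helper names are hypothetical as well.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def replace_replica_urls(base_url, collection, shard, replica, node):
    """Return the two requests of the trick: drop a replica, then add one.

    `replica` is the replica name as shown in the cluster state and
    `node` is the node that should host the new replica.
    """
    delete = collections_api_url(
        base_url, "DELETEREPLICA",
        collection=collection, shard=shard, replica=replica,
    )
    add = collections_api_url(
        base_url, "ADDREPLICA",
        collection=collection, shard=shard, node=node,
    )
    return [delete, add]


# Placeholder values, assumed for illustration only.
urls = replace_replica_urls(
    "http://solr00:8983/solr",
    collection="mycoll",
    shard="shard1",
    replica="core_node3",
    node="solr06:8983_solr",
)
for url in urls:
    print(url)
```

The DELETEREPLICA request would be repeated for each replica that has to go
before the ADDREPLICA request is issued.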

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, December 24, 2018 4:09 AM
> To: solr-user
> Subject: Re: No Leader after nodes restart
> 
> There are a couple of options:
> 
> 1> stop all your nodes. Start them one at a time and wait for "leader
> election" to occur. This can take several minutes, but eventually the
> replicas on that machine will become the leader. Then start the other
> nodes, again one at a time waiting for them to recover fully before
> starting the next node.
> 
> 2> you can try the FORCELEADER Collections API option.
> 
> The leader election and retry logic has been vastly improved in 7.3+
> (with some of the last improvements in 7.5).
> 
> Best,
> Erick
> 
> On Sun, Dec 23, 2018 at 1:43 AM Vadim Ivanov
> <vadim.iva...@spb.ntk-intourist.ru> wrote:
> >
> > Hi!
> > After restarting the nodes I have a situation where no leader can be
> > elected on a shard.
> > Shard rpk51_222_306 resides on 3 nodes (solr00, solr06, solr09) with
> > corresponding replica names
> > (rpk51_222_306_00, rpk51_222_306_06, rpk51_222_306_09)
> > The logs look like this:
> > PeerSync: core=rpk51_222_306_00 url=http://solr00:8983/solr Requested 26
> > updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
> > PeerSync: core=rpk51_222_306_06 url=http://solr06:8983/solr Requested 29
> > updates from http://solr00:8983/solr/rpk51_222_306_00/ but retrieved 24
> > PeerSync: core=rpk51_222_306_09 url=http://solr09:8983/solr Requested 26
> > updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
> >
> > 00 and 09 try to recover from 06 and fail
> > 06 tries to recover from 00 and fails
> >
> > This repeats every minute, indefinitely.
> >
> > How can I break this deadlock loop?
> > --
> > Vadim
> >
> >
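
The loop described in the quoted message can be made explicit with a short
sketch that parses the three PeerSync lines (unwrapped here) and reports which
replicas are trying to sync from each other. The regular expression and helper
names are assumptions written for these particular lines, not part of Solr.

```python
import re

# The three PeerSync lines from the quoted message, unwrapped.
LOG = """\
PeerSync: core=rpk51_222_306_00 url=http://solr00:8983/solr Requested 26 updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
PeerSync: core=rpk51_222_306_06 url=http://solr06:8983/solr Requested 29 updates from http://solr00:8983/solr/rpk51_222_306_00/ but retrieved 24
PeerSync: core=rpk51_222_306_09 url=http://solr09:8983/solr Requested 26 updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
"""

PATTERN = re.compile(
    r"PeerSync: core=(\S+) url=\S+ Requested (\d+) updates from "
    r"http://[^/]+/solr/([^/\s]+)/ but retrieved (\d+)"
)


def sync_attempts(text):
    """Return (core, source_core, requested, retrieved) for each log line."""
    return [
        (m[1], m[3], int(m[2]), int(m[4]))
        for m in PATTERN.finditer(text)
    ]


def mutual_pairs(attempts):
    """Find pairs of cores that each try to sync from the other."""
    source_of = {core: source for core, source, _, _ in attempts}
    pairs = set()
    for core, source in source_of.items():
        if source_of.get(source) == core:
            pairs.add(tuple(sorted((core, source))))
    return sorted(pairs)


attempts = sync_attempts(LOG)
for core, source, requested, retrieved in attempts:
    print(f"{core} <- {source}: short by {requested - retrieved} update(s)")
print("mutual:", mutual_pairs(attempts))
```

In these lines, 00 and 06 each depend on the other while 09 depends on 06, so
no replica completes its sync.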
