Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Erick Erickson Wed, 17 Apr 2019 19:07:18 -0700

Specifically, a _leader_ being put into the down or recovering state is almost 
always because ZooKeeper cannot ping it and get a response back before it times 
out. This also points to large GC pauses on the Solr node. Using something like 
GCViewer on the GC logs at the time of the problem will help a lot.
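
If a GUI tool isn't handy, a rough first pass over the GC log can be scripted. 
The sketch below is only an illustration: it assumes a Java 8 style GC log 
written with -XX:+PrintGCApplicationStoppedTime, and the function name 
long_pauses and the threshold you pass in are my own choices, not anything 
Solr ships. The idea is to flag stop-the-world pauses that are long enough to 
outlast the ZooKeeper session timeout (zkClientTimeout) you have configured.

```python
import re

# Line format HotSpot writes when -XX:+PrintGCApplicationStoppedTime is set
# (assumption: Java 8 style logging; unified logging in newer JVMs differs).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def long_pauses(lines, threshold_seconds):
    """Return (line_number, pause_seconds) for each pause >= threshold_seconds.

    `lines` is any iterable of log lines, e.g. an open file object.
    `threshold_seconds` is hypothetical here; use your own zkClientTimeout.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match is None:
            continue
        pause = float(match.group(1))
        if pause >= threshold_seconds:
            hits.append((number, pause))
    return hits
```

Calling long_pauses(open("solr_gc.log"), 15.0) (the file name and the 15 
seconds are placeholders) lists the line numbers worth looking at in detail 
with GCViewer.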


A _follower_ can go into recovery when an update takes too long, but that’s 
“leader initiated recovery” and originates _from_ the leader, which is very 
different from the leader itself going into a down state.

Best,
Erick

> On Apr 17, 2019, at 7:54 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 4/17/2019 6:25 AM, vishal patel wrote:
>> Why did shard1 take 1.8 minutes for the update? And if the update took that 
>> long, why did replica1 try to become leader? Do we need to change a 
>> timeout?
> 
> There's no information here that can tell us why the update took so long.  My 
> best guess would be long GC pauses due to the heap size being too small.  But 
> there might be other causes.
> 
> Indexing a single document should be VERY fast.  Even a large document should 
> only take a handful of milliseconds.
> 
> If the request included "commit=true" as a parameter, then it might be the 
> commit that was slow, not the indexing.  You'll need to check the logs to 
> determine that.
> 
> The reason that the leader changed was almost certainly the fact that the 
> update took so long.  SolrCloud would have decided that the node was down if 
> any operation took that long.
> 
> Thanks,
> Shawn
