Specifically, a _leader_ being put into the down or recovering state is almost always because ZooKeeper cannot ping it and get a response back before it times out. This also points to large GC pauses on the Solr node. Using something like GCViewer on the GC logs at the time of the problem will help a lot.
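
As a rough first pass before opening the logs in GCViewer, a short script can flag pauses long enough to outlast the ZooKeeper session timeout. This is only a sketch under assumptions: it expects Java 8 style GC logs written with -XX:+PrintGCApplicationStoppedTime, the function name long_pauses is made up for illustration, and the 15 second threshold is a placeholder for whatever zkClientTimeout is actually configured.

```python
import re

# Matches the line the JVM writes when -XX:+PrintGCApplicationStoppedTime is on
# (Java 8 style logging; other JVM versions and flags use different formats).
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(log_lines, threshold_seconds):
    """Return (line_number, pause_seconds) for every pause at or above the threshold."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_seconds:
                hits.append((number, pause))
    return hits


# Hypothetical sample lines, not taken from any real log.
sample = [
    "Total time for which application threads were stopped: 0.0004210 seconds, "
    "Stopping threads took: 0.0000310 seconds",
    "Total time for which application threads were stopped: 21.5000000 seconds, "
    "Stopping threads took: 0.0000520 seconds",
    "some unrelated log line",
]

# Assumed threshold: replace 15.0 with the configured ZooKeeper timeout in seconds.
for number, pause in long_pauses(sample, 15.0):
    print(f"line {number}: application stopped for {pause:.1f}s")
```

Any pause it reports that is close to or above the timeout is a good place to start looking in GCViewer.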
A _follower_ can go into recovery when an update takes too long but that’s “leader initiated recovery” and originates _from_ the leader, which is much different than the leader going into a down state.

Best,
Erick

> On Apr 17, 2019, at 7:54 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 4/17/2019 6:25 AM, vishal patel wrote:
>> Why did shard1 take a 1.8 minutes time for update? and if it took time for
>> update then why did replica1 try to become leader? Is it required to update
>> any timeout?
>
> There's no information here that can tell us why the update took so long. My
> best guess would be long GC pauses due to the heap size being too small. But
> there might be other causes.
>
> Indexing a single document should be VERY fast. Even a large document should
> only take a handful of milliseconds.
>
> If the request included "commit=true" as a parameter, then it might be the
> commit that was slow, not the indexing. You'll need to check the logs to
> determine that.
>
> The reason that the leader changed was almost certainly the fact that the
> update took so long. SolrCloud would have decided that the node was down if
> any operation took that long.
>
> Thanks,
> Shawn