Re: recovering mode loop

2015-09-25 Thread Erick Erickson
On a quick look at the replica jstack (the leader didn't come through in text form) there's nothing that jumps out. I _have_ seen lots and lots of updates coming through one at a time do some weird things with replicas going in and out of recovery, so that's a good intuition to follow up on.
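One way to follow up on the one-at-a-time intuition is to group documents on the client before sending them, so each request carries many updates instead of one. The sketch below only shows the grouping step in plain Python. The `batched` helper, the batch size of 4 and the sample documents are illustrative assumptions, not something from this thread, and the actual call that posts each batch to Solr is left out on purpose.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(docs: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Hypothetical documents; in practice these would come from the indexing job.
docs = [{"id": str(i)} for i in range(10)]

# Each inner list would be sent as a single update request instead of
# issuing one request per document.
batches = list(batched(docs, 4))
```

Whether batching helps in this particular case is untested here; it only reduces the number of update requests the leader has to forward to its replicas.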

Re: recovering mode loop

2015-09-25 Thread Lorenzo Fundaró
I think the attachment was stripped from the mail :( Here's a public link: https://drive.google.com/file/d/0B_z8xmsby0uxRDZEeWpLcnR2b3M/view?usp=sharing On 25 September 2015 at 09:59, Lorenzo Fundaró <lorenzo.fund...@dawandamail.com> wrote: > These are the last logs I've got, even with a

Re: recovering mode loop

2015-09-23 Thread Erick Erickson
Wow, this is not expected at all. There's no way you should, on the face of it, get overlapping on-deck searchers. I recommend you put your maxWarmingSearchers back to 2, that's a fail-safe that is there to make people look at why they're warming a bunch of searchers at once. With those

recovering mode loop

2015-09-23 Thread Lorenzo Fundaró
Hi! I keep getting nodes that fall into recovery mode and then issue the following WARN log every 10 seconds: "WARN Stopping recovery for core= coreNodeName=core_node7", and sometimes this appears as well: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2". At higher traffic time, this gets

Re: recovering mode loop

2015-09-23 Thread Lorenzo Fundaró
I forgot some additional details: the Solr version is 5.0.0, and when one of the nodes enters recovery mode the leader says this: The current zkClientTimeout is 15 seconds. I am going to try increasing it to 30 seconds. The process is running like this: usr/lib/jvm/java-8-oracle/bin/java -server

Re: recovering mode loop

2015-09-23 Thread Lorenzo Fundaró
On 23 September 2015 at 18:08, Erick Erickson wrote: > Wow, this is not expected at all. There's no > way you should, on the face of it, get > overlapping on-deck searchers. > > I recommend you put your maxWarmingSearchers > back to 2, that's a fail-safe that is there to

Re: recovering mode loop

2015-09-23 Thread Erick Erickson
bq: and when one of the nodes enter recovery mode the leader says this: Hmmm, nothing came through, the mail filter is pretty aggressive about stripping attachments though. bq: You mean 10 seconds apart ? Hmmm, no I mean 10 minutes. That would explain the overlapping searchers since the only

Re: recovering mode loop

2015-09-23 Thread Lorenzo Fundaró
Here are the logs that didn't make it through as an image (sorry for the misalignment in the logs): 9/23/2015, 7:14:49 PM ERROR StreamingSolrClients error org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool at

Re: recovering mode loop

2015-09-23 Thread Erick Erickson
bq: 9/23/2015, 7:14:49 PM WARN ZkController Leader is publishing core=dawanda coreNodeName =core_node10 state=down on behalf of un-reachable replica. OK, this brings up a different possibility. If you happen to be indexing at a very high rate there's some possibility that the followers get so busy