I believe this is reported in https://issues.apache.org/jira/browse/SOLR-10471
On Mon, Jan 29, 2018 at 2:55 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hello SG,
>
> The default in solr.in.sh is commented out, so it defaults to the value set in
> bin/solr, which is fifteen seconds. Just uncomment the setting in
> solr.in.sh and your timeout will be thirty seconds.
>
> For Solr itself to really default to thirty seconds, Solr's bin/solr needs
> to be patched to use the correct value.
>
> Regards,
> Markus
>
> -----Original message-----
> > From:S G <sg.online.em...@gmail.com>
> > Sent: Monday 29th January 2018 20:15
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Hi Markus,
> >
> > We are in the process of upgrading our clusters to 7.2.1 and I am not sure
> > I quite follow the conversation here.
> > Is there a simple workaround to set ZK_CLIENT_TIMEOUT to a higher value
> > in the config (and it's just a default value being wrong/overridden
> > somewhere)?
> > Or is it more severe, in the sense that any ZK_CLIENT_TIMEOUT set
> > by the user is just ignored completely by Solr in 7.2.1?
> >
> > Thanks
> > SG
> >
> >
> > On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > OK, I applied the patch and it is clear the timeout is 15000. solr.xml
> > > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> > > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > > still 15000, not 30000.
> > >
> > > But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> > > To be sure, I applied your patch to a production machine: all our
> > > collections run with 30000. So how would that explain this log line?
> > >
> > > o.a.z.ClientCnxn Client session timed out, have not heard from server in
> > > 22130ms
> > >
> > > We also see these with smaller values, such as seven seconds. And is this
> > > actually an indicator of the problems we have?
> > >
> > > Any ideas?
> > >
> > > Many thanks,
> > > Markus
> > >
> > >
> > > -----Original message-----
> > > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > > Sent: Saturday 27th January 2018 10:03
> > > > To: solr-user@lucene.apache.org
> > > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > Hello,
> > > >
> > > > I grepped for it yesterday and found nothing but 30000 in the settings,
> > > > but judging from the weird timeout value, you may be right. Let me apply
> > > > your patch early next week and check for spurious warnings.
> > > >
> > > > Another noteworthy observation for those working on cloud stability and
> > > > recovery: whenever this happens, some nodes are also absolutely sure to run
> > > > OOM. The leaders usually live longest; the replicas don't, and their heap
> > > > usage peaks every time, consistently.
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > -----Original message-----
> > > > > From:Shawn Heisey <apa...@elyograg.org>
> > > > > Sent: Saturday 27th January 2018 0:49
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > > >
> > > > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> > > > > > server in 22130ms (although zkClientTimeout is 30000).
> > > > >
> > > > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > > > that is actually getting applied? The default value in Solr's example
> > > > > configs is 30 seconds, but the internal default in the code (when no
> > > > > configuration is found) is still 15. I have confirmed this in the code.
> > > > >
> > > > > Looks like SolrCloud doesn't log the values it's using for things like
> > > > > zkClientTimeout. I think it should.
> > > > >
> > > > > https://issues.apache.org/jira/browse/SOLR-11915
> > > > >
> > > > > Thanks,
> > > > > Shawn
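Markus's explanation of where the timeout actually comes from can be modeled as a short lookup chain. This is a simplified sketch, not Solr's startup code. It assumes, as the thread describes for 7.2.1, that bin/solr substitutes 15000 whenever ZK_CLIENT_TIMEOUT is unset and hands the result on to Solr, so the 30000 fallback in the stock solr.xml is only reached when no value is passed at all. The function `effective_zk_client_timeout` and its parameters are hypothetical.

```python
from typing import Optional

# Hypothetical model of the zkClientTimeout lookup chain described in this
# thread for Solr 7.2.1; not Solr's actual startup code.
BIN_SOLR_DEFAULT_MS = 15000   # default bin/solr applies when ZK_CLIENT_TIMEOUT is unset
SOLR_XML_FALLBACK_MS = 30000  # fallback in the stock solr.xml entry


def effective_zk_client_timeout(solr_in_sh_value: Optional[str],
                                launched_via_bin_solr: bool = True) -> int:
    """Return the ZooKeeper client timeout (ms) a node ends up with.

    solr_in_sh_value: ZK_CLIENT_TIMEOUT from solr.in.sh, or None while the
        line is still commented out.
    launched_via_bin_solr: True if the node is started through bin/solr,
        which (by assumption) always hands a value on to Solr.
    """
    if launched_via_bin_solr:
        if solr_in_sh_value is not None:
            # An uncommented setting in solr.in.sh wins.
            return int(solr_in_sh_value)
        # bin/solr fills in its own default, so the solr.xml fallback
        # is never reached.
        return BIN_SOLR_DEFAULT_MS
    # solr.in.sh is only read by bin/solr; with no value passed in,
    # the solr.xml fallback applies.
    return SOLR_XML_FALLBACK_MS


print(effective_zk_client_timeout(None))     # 15000 (commented out in solr.in.sh)
print(effective_zk_client_timeout("30000"))  # 30000 (uncommented in solr.in.sh)
```

Under these assumptions, the only way to get 30000 through bin/solr is to set ZK_CLIENT_TIMEOUT explicitly, which matches the advice to uncomment the line in solr.in.sh.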
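One possible reading of the "22130ms" warning with a 30000 ms timeout, which the thread itself does not confirm: in the ZooKeeper client (ClientCnxn), the read timeout used while connected is, to our understanding, two thirds of the negotiated session timeout, and the "have not heard from server" message is logged once the idle time exceeds it. By default the server also clamps the requested session timeout to between 2 and 20 times its tickTime. The sketch below only does that arithmetic. The helper names are hypothetical, and the 2/3 ratio and default bounds are assumptions to verify against the ZooKeeper version in use.

```python
# Hypothetical arithmetic sketch of how the ZooKeeper client's read timeout
# relates to the requested session timeout. Assumptions (verify against your
# ZooKeeper version): the server clamps the requested timeout to
# [2 * tickTime, 20 * tickTime] by default, and the client's read timeout
# while connected is two thirds of the negotiated session timeout.


def negotiated_session_timeout(requested_ms: int, tick_time_ms: int = 2000) -> int:
    """Clamp the requested timeout to the server's default min/max bounds."""
    min_ms = 2 * tick_time_ms
    max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


def client_read_timeout(requested_ms: int, tick_time_ms: int = 2000) -> int:
    """Idle time (ms) after which the client would log 'Client session timed out'."""
    return negotiated_session_timeout(requested_ms, tick_time_ms) * 2 // 3


for requested in (15000, 30000):
    # Prints the requested timeout and the resulting read timeout.
    print(requested, client_read_timeout(requested))
```

If these assumptions hold, a 30000 ms setting yields a 20000 ms read timeout, so a warning after 22130 ms would not by itself prove that the configured value was ignored.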