RE: 7.2.1 cluster dies within minutes after restart

Markus Jelsma Mon, 29 Jan 2018 11:56:24 -0800

Hello SG,

The default in solr.in.sh is commented out, so it falls back to the value set 
in bin/solr, which is fifteen seconds. Just uncomment the setting in solr.in.sh 
and your timeout will be thirty seconds.
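
As a rough sketch of that fallback, assuming the behaviour described in this 
thread: the function name resolve_zk_timeout is made up for illustration, and 
this is not the literal code in bin/solr.

```shell
#!/bin/sh
# Illustrative only: mimics the fallback described in this thread,
# not the actual bin/solr source. resolve_zk_timeout is a hypothetical name.
resolve_zk_timeout() {
  # $1 is the value coming from solr.in.sh; it is empty when the
  # ZK_CLIENT_TIMEOUT line is left commented out.
  configured="$1"
  if [ -z "$configured" ]; then
    configured="15000"   # fallback used when nothing is set
  fi
  echo "$configured"
}

resolve_zk_timeout ""        # setting left commented out
resolve_zk_timeout "30000"   # ZK_CLIENT_TIMEOUT="30000" set explicitly
```

So an explicit ZK_CLIENT_TIMEOUT="30000" in solr.in.sh should win over the 
fifteen-second fallback.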


For Solr itself to really default to thirty seconds, Solr's bin/solr needs to 
be patched to use the correct value.

Regards,
Markus
 
-----Original message-----
> From:S G <sg.online.em...@gmail.com>
> Sent: Monday 29th January 2018 20:15
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Hi Markus,
> 
> We are in the process of upgrading our clusters to 7.2.1 and I am not sure
> I quite follow the conversation here.
> Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher value
> in the config (and it's just a default value being wrong/overridden
> somewhere)?
> Or is it more severe in the sense that any config set for ZK_CLIENT_TIMEOUT
> by the user is just ignored completely by Solr in 7.2.1 ?
> 
> Thanks
> SG
> 
> 
> On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> 
> > Ok, I applied the patch and it is clear the timeout is 15000. solr.xml
> > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is unset by default in
> > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > still 15000, not 30000.
> >
> > But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> > To be sure, I applied your patch to a production machine, and all our
> > collections run with 30000. So how would that explain this log line?
> >
> > o.a.z.ClientCnxn Client session timed out, have not heard from server in
> > 22130ms
> >
> > We also see these with smaller values, such as seven seconds. And, is this
> > actually an indicator of the problems we have?
> >
> > Any ideas?
> >
> > Many thanks,
> > Markus
> >
> >
> > -----Original message-----
> > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > Sent: Saturday 27th January 2018 10:03
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > >
> > > Hello,
> > >
> > > I grepped for it yesterday and found nothing but 30000 in the settings,
> > > but judging from the weird timeout value, you may be right. Let me apply
> > > your patch early next week and check for spurious warnings.
> > >
> > > Another noteworthy observation for those working on cloud stability and
> > > recovery: whenever this happens, some nodes are also absolutely sure to run
> > > OOM. The leaders usually live longest; the replicas don't, and their heap
> > > usage peaks every time, consistently.
> > >
> > > Thanks,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:Shawn Heisey <apa...@elyograg.org>
> > > > Sent: Saturday 27th January 2018 0:49
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> > > > > server in 22130ms (although zkClientTimeOut is 30000).
> > > >
> > > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > > that is actually getting applied?  The default value in Solr's example
> > > > configs is 30 seconds, but the internal default in the code (when no
> > > > configuration is found) is still 15.  I have confirmed this in the
> > > > code.
> > > >
> > > > Looks like SolrCloud doesn't log the values it's using for things like
> > > > zkClientTimeout.  I think it should.
> > > >
> > > > https://issues.apache.org/jira/browse/SOLR-11915
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
>
