On 9/17/2019 9:35 PM, Hongxu Ma wrote:
> My questions:
> * Is this error possibly caused by a "long gc pause"? My Solr
> zkClientTimeout=60000
It's possible. I can't say for sure that this is the issue, but it
might be.
> * If so, how can I prevent this error from happening? My thoughts:
> use the G1 collector (as described at
> https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey#ShawnHeisey-GCTuningforSolr)
> or increase zkClientTimeout again. What do you think?
If your ZK server tickTime setting is the typical value of 2000, the
largest value you can use for the ZK session timeout (which Solr's
zkClientTimeout value ultimately gets used to set) is 40 seconds.
By default, 20 times the tickTime is the biggest value ZK will allow.
So if your ZK server tickTime is 2000 milliseconds, you're not actually
getting 60 seconds, and I don't know what happens when you try. I
would expect ZK to either just use its max value or ignore the setting
entirely, and I do not know which it is. That's something we should ask
the ZK mailing list and/or do testing on.
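
To make the arithmetic above concrete, here is a minimal Python sketch.
It assumes that ZK clamps a requested session timeout into its allowed
range (2 to 20 times tickTime by default) instead of ignoring it, which
is exactly the behavior that still needs to be confirmed. The function
and parameter names are made up for illustration; they are not part of
Solr or ZK.

```python
def effective_session_timeout(requested_ms, tick_time_ms=2000,
                              min_timeout_ms=None, max_timeout_ms=None):
    """Return the session timeout a client would end up with.

    ASSUMPTION: the server clamps the requested value into
    [min, max] rather than rejecting or ignoring it.
    """
    # Default bounds are 2x and 20x tickTime unless overridden.
    lower = 2 * tick_time_ms if min_timeout_ms is None else min_timeout_ms
    upper = 20 * tick_time_ms if max_timeout_ms is None else max_timeout_ms
    return max(lower, min(requested_ms, upper))


# zkClientTimeout=60000 against a server with tickTime=2000
print(effective_session_timeout(60000))  # 40000
```

Under that assumption, getting a real 60 second timeout would require
either a tickTime of at least 3000 or an explicitly raised maximum on
the ZK server side.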
Dealing with the "no registered leader" problem probably will
involve restarting at least one of the Solr server JVMs in the cloud,
and if that doesn't work, restarting all of them.
What version of Solr do you have, and what is your max heap? The CMS
garbage collection tuning that Solr 5.0 and later use by default is
pretty good. My G1 settings might do slightly better, but the
improvement won't be dramatic unless your existing command line has
absolutely no GC tuning at all.
Thanks,
Shawn