Re: Ping and client session timeouts

Patrick Hunt Fri, 21 May 2010 13:54:31 -0700


On 05/21/2010 11:32 AM, Stephen Green wrote:

Right.  The system can be very memory-intensive, but at the time these
are occurring, it's not under a really heavy load, and there's plenty
of heap available. However, while looking at a thread dump from one of
the nodes, I realized that a very poor decision meant that I had more
than 1200 threads running.  I expect this is more of a problem than
the GC at this point.  I'm taking steps to correct this problem now.


Lately, I've had fewer and fewer problems with GC.  In a former life,
I sat down the hall from the folks who wrote Hotspot's GC and they're
pretty sharp folks :-)

GC as a cause is very common; however, had you mentioned 1200 threads I would have guessed that to be a potential issue. ;-)

Right.  I'd like to have as small a timeout as possible so that I
notice quickly when things disappear.  What's a reasonable minimum?  I
notice recommendations in other messages on the list that 20000 is a
good value.

The setting you should use is typically determined by your SLA requirements. How soon do you want ephemeral nodes to be cleaned up if a client fails? Say you were doing leader election: this would gate re-election in the case where the current leader failed. Set it lower and you are more responsive (faster), but also more susceptible to "false positives" (such as a temporary network glitch). Set it higher and you ride over the network glitches, but it takes longer to recover when a client really does go down.
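
As for a hard minimum: the value the client asks for is only a request, and the server bounds it before granting the session. Here is a rough sketch of that bounding, assuming the server defaults of 2 x tickTime for the minimum and 20 x tickTime for the maximum, and a tickTime of 2000 ms. The class and method names are made up for illustration and are not part of the ZooKeeper API; check your own server config before relying on these numbers.

```java
public class SessionTimeoutSketch {

    // Hypothetical helper: clamps a requested session timeout into the
    // range the server is assumed to allow (2 * tickTime .. 20 * tickTime).
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;   // assumed default lower bound
        int maxMs = 20 * tickTimeMs;  // assumed default upper bound
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // assumed tickTime for this example
        for (int requestedMs : new int[] {1000, 20000, 60000}) {
            System.out.println(requestedMs + " ms requested, "
                    + negotiate(requestedMs, tickTimeMs) + " ms granted");
        }
    }
}
```

Under those assumptions, a request for 20000 ms is granted as-is, while anything under 4000 ms is raised to 4000 ms, so that would be the practical floor unless the bounds are changed on the server.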

In some cases (HBase, Solr) we've seen that the timeout had to be set artificially high due to the limitations of the current JVM GC algos. For example, some HBase users were seeing GC pause times of > 4 minutes. So this raises the question: do you consider this a failure or not? (I could reboot the machine faster than it takes to run that GC...)


Good luck,

Patrick
