Nitay,
Thanks for sending us the info.

We have experienced similar GC problems in our HDFS (Hadoop Distributed File System) setups.
GC has been quite a problem for us with the NameNode (HDFS) process; we
have seen the NameNode stall for minutes doing garbage collection. We
currently run the NameNode with the following GC options:

-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC

and have not run into GC trouble since.
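
If it helps, here is a small sketch (not something we run in production; the
class name GcReport is made up) that uses the standard java.lang.management
API to list the collectors a JVM is actually using, with their cumulative
collection counts and times. Calling describeCollectors() from inside your
process, or running main with the same flags, is a quick way to confirm the
CMS options took effect. On HotSpot with the flags above we would expect names
along the lines of ParNew and ConcurrentMarkSweep, but the exact names depend
on the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which collectors this JVM uses and how much time they have taken. */
public class GcReport {

    /** Builds one line per collector: name, number of collections, total collection time in ms. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not track it for this collector.
            lines.add(String.format("%s: count=%d, time=%dms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Sampling the time values periodically and diffing them gives a rough picture
of how much wall-clock time went to GC between two points.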

What GC options are you using for the Java process that embeds the ZooKeeper
Java client? The options above might help.
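
To check whether that process is being stalled long enough to miss
heartbeats, one option is a tiny watchdog thread running alongside the
ZooKeeper client. This is a hypothetical sketch (PauseDetector, the 50 ms
interval and the 1000 ms report threshold are our own choices, not part of
ZooKeeper or HBase). It assumes that a JVM-wide pause, such as a
stop-the-world collection, delays every thread, so a thread that wakes up much
later than it asked to is a reasonable proxy for a pause that would also have
delayed the client's pings. Stalls approaching the session timeout would
support the GC theory.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stall detector: sleeps for a fixed interval and records how much
 * later than requested it woke up. A stop-the-world pause delays this thread
 * and the ZooKeeper client's threads alike.
 */
public class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long reportThresholdMs;
    private final AtomicLong maxStallMs = new AtomicLong();
    private volatile boolean running = true;

    public PauseDetector(long intervalMs, long reportThresholdMs) {
        this.intervalMs = intervalMs;
        this.reportThresholdMs = reportThresholdMs;
    }

    /** Largest observed wake-up delay beyond the requested sleep, in ms. */
    public long maxStallMs() { return maxStallMs.get(); }

    /** Asks the loop to exit after its current sleep. */
    public void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long stallMs = elapsedMs - intervalMs; // lateness, excluding the sleep itself
            maxStallMs.accumulateAndGet(stallMs, Math::max);
            if (stallMs > reportThresholdMs) {
                System.err.println("JVM stalled for ~" + stallMs + " ms");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PauseDetector detector = new PauseDetector(50, 1000);
        Thread t = new Thread(detector, "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(2000); // in practice, run it next to the real workload instead
        detector.stop();
        t.join();
        System.out.println("max stall: " + detector.maxStallMs() + " ms");
    }
}
```

In a real deployment you would start it as a daemon thread inside the process
that hosts the ZooKeeper client and compare the reported stalls with the
timestamps of the SessionExpired events.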

Also, we would be interested in having such JNI wrappers around the C library,
in case more people want them. Is the wrapper posted somewhere for us to take
a look at?

Thanks
mahadev




On 4/8/09 1:17 PM, "Patrick Hunt" <ph...@apache.org> wrote:

> What are you running for a session timeout on your clients?
> 
> Can you run with something like jvisualvm or jconsole and watch the GC
> activity when the session timeouts occur? That might give you some insight.
> Have you tried one of the alternative GCs available in the VM? See
> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
> in particular the section "Flags for Latency Applications".
> 
> We are also working on the following jira:
> https://issues.apache.org/jira/browse/ZOOKEEPER-321
> which will eliminate session expiration for clients without ephemeral
> nodes. (Is this the case for you?)
> 
> Try turning on debug logging in your client; the client will then log:
>     LOG.debug("Got ping response for sessionid:0x"
> If you turn on trace logging in the server, you should see session
> updates there as well (client->server traffic, which controls session
> expiration).
> 
> Re HBASE-1316: how does the JNI C wrapper fix this? Isn't the code
> still running within the same (VM) process?
> 
> 
> If it is the GC, unfortunately I can't think of anything else. Basically
> you'd have to increase the session timeout or try another GC with lower
> pause times.
> 
> Perhaps Mahadev, Ben, or Flavio might have some insight...
> 
> Patrick
> 
> Nitay wrote:
>> Hey guys,
>> 
>> We've recently replaced a few pieces of HBase's cluster management and
>> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster that
>> he throws a lot of load at. Andrew's cluster was getting a lot of
>> SessionExpired events which were causing some havoc. After some discussion
>> on the hbase list and additional testing by Andrew (tweaking things like the
>> session timeout, quorum size, and GC used), we suspect the problem is that
>> the Java GC is starving the ZooKeeper heartbeat thread and keeping it from executing.
>> 
>> There is a JIRA open on the matter where Joey suggests a solution that has
>> worked for him:
>> 
>> https://issues.apache.org/jira/browse/HBASE-1316
>> 
>> We wanted to loop you guys in to see if you have any thoughts/suggestions on
>> the matter.
>> 
>> Thanks,
>> -n
>> 
