Head over to Cloudera's site and look at a couple of blog posts from Todd Lipcon. Also look at MSLAB (MemStore-Local Allocation Buffers).
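For what it's worth, a minimal sketch of what checking/enabling MSLAB looks like in hbase-site.xml. On 0.92 it should already be on by default, so treat this as a sanity check rather than a fix; the chunk-size value shown is just the usual default, not a tuned recommendation:

  <!-- MemStore-Local Allocation Buffers: allocate memstore data in fixed-size
       chunks so old-gen heap fragmentation (and the resulting long CMS
       pauses) is reduced -->
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value> <!-- 2 MB chunks; the default -->
  </property>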
On a side note... you don't have a lot of memory to play with... (see the sketch of GC and timeout settings after the quoted mail below).

On May 18, 2012, at 6:54 AM, Simon Kelly wrote:

> Hi
>
> Firstly, let me compliment the HBase team on a great piece of software. We're
> running a few clusters that are working well, but we're really struggling with
> a new one I'm trying to set up and could use a bit of help. I have read as
> much as I can but just can't seem to get it right.
>
> The difference between this cluster and the others is that this one's load is
> 99% writes. Each write contains about 40 columns to a single table and column
> family, and the total data size varies between about 1 and 2 KB. The load per
> server varies between 20 and 90 requests per second at different times of the
> day. The row keys are UUIDs, so they are uniformly distributed across the
> (currently 60) regions.
>
> The problem seems to be that after some time a GC cycle takes longer than
> expected on one of the regionservers and the master kills the regionserver.
>
> This morning I ran the system up to the first regionserver failure and
> recorded the data with Ganglia. I have attached the following Ganglia graphs:
> hbase.regionserver.compactionQueueSize
> hbase.regionserver.memstoreSizeMB
> requests_per_minute (to the service that calls HBase)
> request_processing_time (of the service that calls HBase)
>
> Any assistance would be greatly appreciated. I did have GC logging on, so I
> have access to all that data too.
>
> Best regards
> Simon Kelly
>
> Cluster details
> ----------------------
> It's running on 5 machines with the following specs:
> CPUs: 4 x 2.39 GHz
> RAM: 8 GB
> Ubuntu 10.04.2 LTS
>
> The Hadoop cluster (version 1.0.1, r1243785) runs over all the machines and
> has 8 TB of capacity (60% unused). On top of that is HBase version 0.92.1,
> r1298924. All the servers run Hadoop datanodes and HBase regionservers. One
> server hosts the Hadoop primary namenode and the HBase master. Three servers
> form the ZooKeeper quorum.
>
> The HBase config is as follows:
> HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"
> HBASE_HEAPSIZE=4096
> hbase.rootdir : hdfs://server1:8020/hbase
> hbase.cluster.distributed : true
> hbase.zookeeper.property.clientPort : 2222
> hbase.zookeeper.quorum : server1,server2,server3
> zookeeper.session.timeout : 30000
> hbase.regionserver.maxlogs : 16
> hbase.regionserver.handler.count : 50
> hbase.regionserver.codecs : lzo
> hbase.master.startup.retainassign : false
> hbase.hregion.majorcompaction : 0
>
> (For the benefit of those without the attachments, I'll describe the graphs:
> 0900 - system starts
> 1010 - memstore reaches 1.2 GB and flushes to 500 MB; a few HBase compactions
> happen and there is a slight increase in request_processing_time
> 1040 - memstore reaches 1.0 GB and flushes to 500 MB (no HBase compactions)
> 1110 - memstore reaches 1.0 GB and flushes to 300 MB; a few more HBase
> compactions happen and a slightly larger increase in request_processing_time
> 1200 - memstore reaches 1.3 GB and flushes to 200 MB; more HBase compactions
> and an increase in request_processing_time
> 1230 - HBase logs for server1 record "We slept 13318ms instead of 3000ms" and
> regionserver1 is killed by the master; request_processing_time goes way up
> 1326 - HBase logs for server3 record "We slept 77377ms instead of 3000ms" and
> regionserver2 is killed by the master)
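Regarding your HBASE_OPTS: -XX:+CMSIncrementalMode is meant for machines with one or two cores and is usually advised against on server hardware, and without -XX:+UseCMSInitiatingOccupancyOnly the JVM treats CMSInitiatingOccupancyFraction only as a starting hint. Here's a sketch of the kind of hbase-env.sh / hbase-site.xml changes often suggested for pause-related regionserver deaths on this version; the young-gen size, timeout, and log path are illustrative values to experiment with, not tested recommendations for your cluster:

  # hbase-env.sh: plain CMS (no incremental mode), a firm CMS trigger,
  # and GC logging so long pauses can be correlated with flushes/compactions
  HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
    -Xmn256m \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/gc-hbase.log"

  <!-- hbase-site.xml: give the regionserver more headroom before the master
       declares it dead; note ZooKeeper caps this at its maxSessionTimeout -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>

A longer session timeout only buys headroom, though; the real win is keeping the pauses short in the first place (MSLAB above, plus enough free RAM that the OS never swaps any of the 4 GB heap).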
