Head over to Cloudera's site and look at a couple of blog posts from Todd Lipcon. Also look at MSLAB (MemStore-Local Allocation Buffers).
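For what it's worth, a minimal sketch of what checking/enabling MSLAB looks like in hbase-site.xml. On 0.92 it should already be on by default, so treat this as a sanity check rather than a fix; the chunk-size value shown is just the usual default, not a tuned recommendation:

  <!-- MemStore-Local Allocation Buffers: allocate memstore data in fixed-size
       chunks so old-gen heap fragmentation (and the resulting long CMS
       pauses) is reduced -->
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value> <!-- 2 MB chunks; the default -->
  </property>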
On a side note... you don't have a lot of memory to play with... (see the sketch of GC and timeout settings after the quoted mail below).

On May 18, 2012, at 6:54 AM, Simon Kelly wrote:

> Hi
>
> Firstly, let me compliment the HBase team on a great piece of software. We're
> running a few clusters that are working well, but we're really struggling with
> a new one I'm trying to set up and could use a bit of help. I have read as
> much as I can but just can't seem to get it right.
>
> The difference between this cluster and the others is that this one's load is
> 99% writes. Each write contains about 40 columns to a single table and column
> family, and the total data size varies between about 1 and 2 KB. The load per
> server varies between 20 and 90 requests per second at different times of the
> day. The row keys are UUIDs, so they are uniformly distributed across the
> (currently 60) regions.
>
> The problem seems to be that after some time a GC cycle takes longer than
> expected on one of the regionservers and the master kills the regionserver.
>
> This morning I ran the system up to the first regionserver failure and
> recorded the data with Ganglia. I have attached the following Ganglia graphs:
> hbase.regionserver.compactionQueueSize
> hbase.regionserver.memstoreSizeMB
> requests_per_minute (to the service that calls HBase)
> request_processing_time (of the service that calls HBase)
>
> Any assistance would be greatly appreciated. I did have GC logging on, so I
> have access to all that data too.
>
> Best regards
> Simon Kelly
>
> Cluster details
> ----------------------
> It's running on 5 machines with the following specs:
> CPUs: 4 x 2.39 GHz
> RAM: 8 GB
> Ubuntu 10.04.2 LTS
>
> The Hadoop cluster (version 1.0.1, r1243785) runs over all the machines and
> has 8 TB of capacity (60% unused). On top of that is HBase version 0.92.1,
> r1298924. All the servers run Hadoop datanodes and HBase regionservers. One
> server hosts the Hadoop primary namenode and the HBase master. Three servers
> form the ZooKeeper quorum.
>
> The HBase config is as follows:
> HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"
> HBASE_HEAPSIZE=4096
> hbase.rootdir : hdfs://server1:8020/hbase
> hbase.cluster.distributed : true
> hbase.zookeeper.property.clientPort : 2222
> hbase.zookeeper.quorum : server1,server2,server3
> zookeeper.session.timeout : 30000
> hbase.regionserver.maxlogs : 16
> hbase.regionserver.handler.count : 50
> hbase.regionserver.codecs : lzo
> hbase.master.startup.retainassign : false
> hbase.hregion.majorcompaction : 0
>
> (For the benefit of those without the attachments, I'll describe the graphs:
> 0900 - system starts
> 1010 - memstore reaches 1.2 GB and flushes to 500 MB; a few HBase compactions
> happen and there is a slight increase in request_processing_time
> 1040 - memstore reaches 1.0 GB and flushes to 500 MB (no HBase compactions)
> 1110 - memstore reaches 1.0 GB and flushes to 300 MB; a few more HBase
> compactions happen and a slightly larger increase in request_processing_time
> 1200 - memstore reaches 1.3 GB and flushes to 200 MB; more HBase compactions
> and an increase in request_processing_time
> 1230 - HBase logs for server1 record "We slept 13318ms instead of 3000ms" and
> regionserver1 is killed by the master; request_processing_time goes way up
> 1326 - HBase logs for server3 record "We slept 77377ms instead of 3000ms" and
> regionserver2 is killed by the master)
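Regarding your HBASE_OPTS: -XX:+CMSIncrementalMode is meant for machines with one or two cores and is usually advised against on server hardware, and without -XX:+UseCMSInitiatingOccupancyOnly the JVM treats CMSInitiatingOccupancyFraction only as a starting hint. Here's a sketch of the kind of hbase-env.sh / hbase-site.xml changes often suggested for pause-related regionserver deaths on this version; the young-gen size, timeout, and log path are illustrative values to experiment with, not tested recommendations for your cluster:

  # hbase-env.sh: plain CMS (no incremental mode), a firm CMS trigger,
  # and GC logging so long pauses can be correlated with flushes/compactions
  HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
    -Xmn256m \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/gc-hbase.log"

  <!-- hbase-site.xml: give the regionserver more headroom before the master
       declares it dead; note ZooKeeper caps this at its maxSessionTimeout -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>

A longer session timeout only buys headroom, though; the real win is keeping the pauses short in the first place (MSLAB above, plus enough free RAM that the OS never swaps any of the 4 GB heap).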
