Great, thanks very much for the help. I'm going to see if I can get more memory into the servers and will also experiment with -XX:ParallelGCThreads. We already have -XX:CMSInitiatingOccupancyFraction=70 in the config.
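
For what it's worth, the hbase-env.sh line I'm planning to experiment with would look roughly like this (the thread count is only a first guess based on your 1-or-2 suggestion; the rest is what we run today):

# current flags plus an explicit parallel GC thread count (2 is just a starting point)
HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:ParallelGCThreads=2"
HBASE_HEAPSIZE=4096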
Uday, what do you mean by "a fixed size record"? Do you mean the record that is being written to HBase?

On 19 May 2012 12:44, Uday Jarajapu <[email protected]> wrote:

> Also, try playing with
>
> #3) -XX:CMSInitiatingOccupancyFraction=70 to kick off a CMS GC sooner than a default trigger would.
>
> #4) a fixed-size record to make sure you do not run into the promotion failure due to fragmentation
>
> On Fri, May 18, 2012 at 4:35 PM, Uday Jarajapu <[email protected]> wrote:
>
>> I think you have it right for the most part, except you are underarmed with only 8G and a 4-core box. Since you have Xmx=Xms=4G, the default collector (parallel) with the right number of threads might be able to pull it off. In fact, CMS might be defaulting to that eventually.
>>
>> As you know, CMS is great for sweeping heap sizes in the 8G-16G range, but it eventually defaults to parallel GC for smaller heaps that run out of space quickly. On top of that, it is non-compacting. So, what works for a couple of cycles might quickly run out of room and leave no other choice but to stop-the-world. To avoid the hit when that happens, try limiting the number of parallel GC threads to a third of your cores. In your case, that would be 1, unfortunately. Try 1 or 2.
>>
>> I would recommend trying one of these two tests on the region server:
>>
>> #1) -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=1 (or 2)
>>
>> #2) -XX:ParallelGCThreads=2
>>
>> The second test is just for giggles, to see if the CMS aspect is helping you at all (or if you are ending up doing stop-the-world more than you want; if that is the case, try using the default GC).
>>
>> Hope that helps,
>> Uday
>>
>> On Fri, May 18, 2012 at 4:54 AM, Simon Kelly <[email protected]> wrote:
>>
>>> Hi
>>>
>>> Firstly, let me compliment the HBase team on a great piece of software. We're running a few clusters that are working well, but we're really struggling with a new one I'm trying to set up and could use a bit of help. I have read as much as I can but just can't seem to get it right.
>>>
>>> The difference between this cluster and the others is that this one's load is 99% writes. Each write contains about 40 columns to a single table and column family, and the total data size varies between about 1 and 2 KB. The load per server varies between 20 and 90 requests per second at different times of the day. The row keys are UUIDs, so they are uniformly distributed across the (currently 60) regions.
>>>
>>> The problem seems to be that after some time a GC cycle takes longer than expected on one of the regionservers and the master kills that regionserver.
>>>
>>> This morning I ran the system up until the first regionserver failure and recorded the data with Ganglia. I have attached the following Ganglia graphs:
>>>
>>> - hbase.regionserver.compactionQueueSize
>>> - hbase.regionserver.memstoreSizeMB
>>> - requests_per_minute (to the service that calls HBase)
>>> - request_processing_time (of the service that calls HBase)
>>>
>>> Any assistance would be greatly appreciated. I did have GC logging on, so I have access to all that data too.
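>>>
>>> (By GC logging I mean the standard HotSpot flags appended to HBASE_OPTS, roughly the sketch below; the log path is only a placeholder:
>>>
>>> # hypothetical example of the GC logging flags in use on the regionservers
>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-regionserver.log
>>>
>>> These give per-collection pause times and timestamps that can be lined up against the timeline below.)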
>>>
>>> Best regards
>>> Simon Kelly
>>>
>>> Cluster details
>>> ----------------------
>>> It's running on 5 machines with the following specs:
>>>
>>> - CPUs: 4 x 2.39 GHz
>>> - RAM: 8 GB
>>> - Ubuntu 10.04.2 LTS
>>>
>>> The Hadoop cluster (version 1.0.1, r1243785) runs over all the machines and has 8 TB of capacity (60% unused). On top of that is HBase version 0.92.1, r1298924. All the servers run Hadoop datanodes and HBase regionservers. One server hosts the Hadoop primary namenode and the HBase master. 3 servers form the ZooKeeper quorum.
>>>
>>> The HBase config is as follows:
>>>
>>> - HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"
>>> - HBASE_HEAPSIZE=4096
>>>
>>> - hbase.rootdir : hdfs://server1:8020/hbase
>>> - hbase.cluster.distributed : true
>>> - hbase.zookeeper.property.clientPort : 2222
>>> - hbase.zookeeper.quorum : server1,server2,server3
>>> - zookeeper.session.timeout : 30000
>>> - hbase.regionserver.maxlogs : 16
>>> - hbase.regionserver.handler.count : 50
>>> - hbase.regionserver.codecs : lzo
>>> - hbase.master.startup.retainassign : false
>>> - hbase.hregion.majorcompaction : 0
>>>
>>> (For the benefit of those without the attachments, I'll describe the graphs:
>>>
>>> - 0900 - system starts
>>> - 1010 - memstore reaches 1.2 GB and flushes to 500 MB, a few HBase compactions happen, and there is a slight increase in request_processing_time
>>> - 1040 - memstore reaches 1.0 GB and flushes to 500 MB (no HBase compactions)
>>> - 1110 - memstore reaches 1.0 GB and flushes to 300 MB, a few more HBase compactions happen, and there is a slightly larger increase in request_processing_time
>>> - 1200 - memstore reaches 1.3 GB and flushes to 200 MB, more HBase compactions and an increase in request_processing_time
>>> - 1230 - HBase logs for server1 record "We slept 13318ms instead of 3000ms", regionserver1 is killed by the master, and request_processing_time goes way up
>>> - 1326 - HBase logs for server3 record "We slept 77377ms instead of 3000ms" and regionserver2 is killed by the master
>>>
>>> )
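
One note on the timeline above: the 77377 ms pause on server3 is more than double our zookeeper.session.timeout of 30000 ms, so that regionserver's ZooKeeper session would have expired during the pause regardless of anything else. For reference, that setting is just the plain hbase-site.xml entry below (the value shown is our current 30000):

<property>
  <!-- regionserver ZooKeeper session timeout in milliseconds -->
  <name>zookeeper.session.timeout</name>
  <value>30000</value>
</property>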
