Hi Amit, 12MB write buffer might be a bit high.
How are you generating your keys? You might hot-spot a single region server if (for example) you create monotonically increasing keys. When you look at the HBase monitoring page, do you see a single region server getting all the requests? Anything weird in the GC logs? Do they all log similarly?

-- Lars

________________________________
From: Amit Jain <[email protected]>
To: [email protected]
Sent: Wednesday, November 16, 2011 3:06 PM
Subject: Help with continuous loading configuration

Hello,

We're doing a proof-of-concept study to see if HBase is a good fit for an application we're planning to build. The application will be recording a continuous stream of sensor data throughout the day, and the data needs to be online immediately.

Our test cluster consists of 16 machines, each with 16 cores, 32GB of RAM, and 8TB of local storage, running CDH3u2. We're using the HBase client Put class, and have set the table "auto flush" to false and the write buffer size to 12MB.

Here are the region server JVM options:

export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

And here are the property settings that we're using in the hbase-site.xml file:

hbase.rootdir=hdfs://master:9000/hbase
hbase.regionserver.handler.count=20
hbase.cluster.distributed=true
hbase.zookeeper.quorum=zk01,zk02,zk03
hfile.block.cache.size=0
hbase.hregion.max.filesize=1073741824
hbase.regionserver.global.memstore.upperLimit=0.79
hbase.regionserver.global.memstore.lowerLimit=0.70
hbase.hregion.majorcompaction=0
hbase.hstore.compactionThreshold=15
hbase.hstore.blockingStoreFiles=20
hbase.rpc.timeout=0
zookeeper.session.timeout=3600000

It's taking about 24 hours to load 4TB of data, which isn't quite fast enough for our application. Is there a more optimal configuration that we can use to improve loading performance?

- Amit
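If the keys do turn out to be monotonically increasing (timestamps are the usual culprit for sensor data), a common mitigation is to prefix each row key with a salt derived from a stable hash, so sequential writes fan out across several pre-split regions instead of piling onto one. The sketch below is a minimal, hedged illustration in plain Java; the bucket count, the two-digit prefix format, and the `salt` helper are assumptions for this example, not anything from the thread, and in a real table you would pre-split on the same salt prefixes.

```java
import java.nio.charset.StandardCharsets;

public class SaltedKey {
    // Assumed bucket count for illustration; in practice you would
    // pre-split the table into this many regions, one per salt prefix.
    static final int BUCKETS = 16;

    // Prefix the original key with a stable two-digit salt so that
    // monotonically increasing keys spread across BUCKETS regions.
    static byte[] salt(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS); // always 0..BUCKETS-1
        return String.format("%02d-%s", bucket, key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Consecutive timestamps land in different buckets rather than
        // all hitting the tail region of the table.
        for (long ts = 1700000000L; ts < 1700000004L; ts++) {
            byte[] row = salt(Long.toString(ts));
            System.out.println(new String(row, StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off is that scans over a time range must now fan out one scan per salt prefix and merge the results, so this suits write-heavy ingest (as described above) better than range-scan-heavy workloads.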
