Hello,

We're doing a proof-of-concept study to see if HBase is a good fit for an
application we're planning to build.  The application will be recording a
continuous stream of sensor data throughout the day and the data needs to
be online immediately.  Our test cluster consists of 16 machines running
CDH3u2, each with 16 cores, 32GB of RAM, and 8TB of local storage.  We're using
the HBase client Put class, and have set the table "auto flush" to false
and the write buffer size to 12MB.  Here are the region server JVM options:

export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
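For reference, our client write loop is essentially the following simplified
sketch (table name, column family, and the data source are placeholders, not
our real schema):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SensorLoader {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // "sensor_data" and the "d:v" column are placeholder names.
        HTable table = new HTable(conf, "sensor_data");
        table.setAutoFlush(false);                   // buffer puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024);  // 12MB, as noted above
        for (long i = 0; i < 1000000; i++) {         // stand-in for our sensor stream
            Put put = new Put(Bytes.toBytes(rowKeyFor(i)));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(i));
            table.put(put);  // held in the write buffer until 12MB, then flushed
        }
        table.flushCommits();                        // flush the remaining buffer
        table.close();
    }

    // Placeholder row key scheme; our real keys are derived from sensor id + time.
    private static String rowKeyFor(long i) {
        return String.format("%016d", i);
    }
}
```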

And here are the property settings that we're using in the hbase-site.xml
file:

hbase.rootdir=hdfs://master:9000/hbase
hbase.regionserver.handler.count=20
hbase.cluster.distributed=true
hbase.zookeeper.quorum=zk01,zk02,zk03
hfile.block.cache.size=0
hbase.hregion.max.filesize=1073741824
hbase.regionserver.global.memstore.upperLimit=0.79
hbase.regionserver.global.memstore.lowerLimit=0.70
hbase.hregion.majorcompaction=0
hbase.hstore.compactionThreshold=15
hbase.hstore.blockingStoreFiles=20
hbase.rpc.timeout=0
zookeeper.session.timeout=3600000

It's taking about 24 hours to load 4TB of data, which isn't fast enough
for our application.  Is there a better configuration we can use to
improve write performance?

- Amit
