Hi Lars,

The keys are arriving in random order. The HBase monitoring page shows evenly distributed load across all of the region servers. I didn't see anything weird in the gc logs, no mention of any failures. I'm a little unclear about what the optimal values for the following properties should be:

hbase.hstore.compactionThreshold
hbase.hstore.blockingStoreFiles

Is there some rule of thumb that I can use to determine good values for these properties?

- Amit

On Wed, Nov 16, 2011 at 3:14 PM, lars hofhansl <[email protected]> wrote:

> Hi Amit,
>
> 12MB write buffer might be a bit high.
>
> How are you generating your keys? You might hot spot a single region
> server if (for example) you create
> monotonically increasing keys. When you look at the HBase monitoring page,
> do you see a single region server
> getting all the requests?
>
> Anything weird in the GC logs? Do they all log similar?
>
> -- Lars
>
> ________________________________
> From: Amit Jain <[email protected]>
> To: [email protected]
> Sent: Wednesday, November 16, 2011 3:06 PM
> Subject: Help with continuous loading configuration
>
> Hello,
>
> We're doing a proof-of-concept study to see if HBase is a good fit for an
> application we're planning to build. The application will be recording a
> continuous stream of sensor data throughout the day and the data needs to
> be online immediately. Our test cluster consists of 16 machines, each with
> 16 cores and 32GB of RAM and 8TB local storage running CDH3u2. We're using
> the HBase client Put class, and have set the table "auto flush" to false
> and the write buffer size to 12MB.
> Here are the region server JVM options:
>
> export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
>
> And here are the property settings that we're using in the hbase-site.xml
> file:
>
> hbase.rootdir=hdfs://master:9000/hbase
> hbase.regionserver.handler.count=20
> hbase.cluster.distributed=true
> hbase.zookeeper.quorum=zk01,zk02,zk03
> hfile.block.cache.size=0
> hbase.hregion.max.filesize=1073741824
> hbase.regionserver.global.memstore.upperLimit=0.79
> hbase.regionserver.global.memstore.lowerLimit=0.70
> hbase.hregion.majorcompaction=0
> hbase.hstore.compactionThreshold=15
> hbase.hstore.blockingStoreFiles=20
> hbase.rpc.timeout=0
> zookeeper.session.timeout=3600000
>
> It's taking about 24 hours to load 4TB of data which isn't quite fast
> enough for our application. Is there a more optimal configuration that we
> can use to improve loading performance?
>
> - Amit
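
For reference, the figures quoted in the original message can be turned into a rough per-machine write rate and memstore budget. This is only a back-of-the-envelope sketch, not a tuning recommendation. It assumes 1 TB means 1024^4 bytes, that all 16 machines run region servers and share the load evenly (the thread does not say either), and that the two global memstore limits are read as fractions of the 28 GB heap. The function names are made up for illustration.

```python
# Back-of-the-envelope figures taken from the numbers quoted in the thread.
# Assumptions (not stated in the thread): 1 TB = 1024**4 bytes, and all 16
# machines act as region servers and share the write load evenly.

SECONDS_PER_DAY = 24 * 60 * 60
MIB = 1024 ** 2
TIB = 1024 ** 4


def ingest_rate_mib_per_s(total_bytes, seconds, nodes=1):
    """Average write rate in MiB/s, optionally divided across nodes."""
    return total_bytes / seconds / nodes / MIB


def memstore_limit_gb(heap_gb, fraction):
    """Heap share in GB implied by a global memstore limit fraction."""
    return heap_gb * fraction


if __name__ == "__main__":
    # 4 TB loaded in about 24 hours on 16 machines.
    cluster = ingest_rate_mib_per_s(4 * TIB, SECONDS_PER_DAY)
    per_node = ingest_rate_mib_per_s(4 * TIB, SECONDS_PER_DAY, nodes=16)
    print(f"cluster:  {cluster:.1f} MiB/s")
    print(f"per node: {per_node:.2f} MiB/s")

    # -Xmx28g with upperLimit=0.79 and lowerLimit=0.70.
    print(f"memstore upper limit: {memstore_limit_gb(28, 0.79):.2f} GB")
    print(f"memstore lower limit: {memstore_limit_gb(28, 0.70):.2f} GB")
```

Under those assumptions the current load works out to roughly 48.5 MiB/s across the cluster, or about 3 MiB/s per machine, with about 22 GB of each region server's heap available to memstores before updates are blocked.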
