Hello St.Ack,

I would like to switch to 0.94, but we are using 0.92.1 and will not change until the end of 2014. I can change the HBase client (e.g. to AsyncHBase) if it is the bottleneck. If the problem is server side (e.g. in the regionserver), is there anything I can do to improve the performance?
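For reference, here is a rough back-of-the-envelope sketch of the numbers from the original test quoted below. It assumes 1 MB = 1024 KB and that every row is exactly 20KB, which is only approximately true; the variable names are just for illustration.

```python
import math

# Figures taken from the test description; row size is approximate.
DATA_MB = 700             # data loaded into the single region
ROW_KB = 20               # approximate size of one row
SCANNER_CACHING = 10000   # rows requested per scanner RPC
CLIENT_SCAN_SEC = 82      # 1 minute 22 sec through the HBase client
HFILE_SCAN_SEC = 11       # reading the HFile directly with HFileReaderV2

# Assumption: 1 MB = 1024 KB and all rows are exactly ROW_KB.
rows = DATA_MB * 1024 // ROW_KB
batches = math.ceil(rows / SCANNER_CACHING)
batch_mb = SCANNER_CACHING * ROW_KB / 1024

client_mb_per_sec = DATA_MB / CLIENT_SCAN_SEC
hfile_mb_per_sec = DATA_MB / HFILE_SCAN_SEC
speedup = CLIENT_SCAN_SEC / HFILE_SCAN_SEC

print(f"rows in region:        {rows}")
print(f"scanner RPC batches:   {batches}")
print(f"size of a full batch:  {batch_mb:.1f} MB")
print(f"client scan rate:      {client_mb_per_sec:.1f} MB/s")
print(f"HFile scan rate:       {hfile_mb_per_sec:.1f} MB/s")
print(f"speedup:               {speedup:.2f}x")
```

Under these assumptions, a caching value of 10000 means each full scanner batch carries roughly 195 MB, so the whole region comes back in about four RPCs. Whether that batch size contributes to the slowdown is a guess, not a diagnosis, but it may be worth comparing against a smaller caching value.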
Best Regards,

Jerry

On Thu, Jan 2, 2014 at 11:23 AM, Stack <[email protected]> wrote:

> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[email protected]> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see if what I
> > experienced make sense.
> >
> > The experiment is as follows:
> > - I filled a hbase region with 700MB data (each row has roughly 45 columns
> > and the size is 20KB for the entire row)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data is loaded and make sure that there is
> > only 1 region in the table under test.
> > - No other table exists in the hbase cluster because this is a DEV
> > environment
> > - I'm using HBase 0.92.1
> >
>
> Can you use a 0.94? It has had some scanner improvements.
>
> Thanks,
> St.Ack
>
> >
> > The test is very basic. I use HBaseClient to scan the entire region to
> > retrieve all rows and all columns in the table, just iterating all KeyValue
> > pairs until it is done. It took about 1 minute 22 sec to complete. (Note
> > that I disable block cache and uses caching size about 10000).
> >
> > I ran another test using HFileReaderV2 and scan the entire region to
> > retrieve all rows and all columns, just iterating all keyValue pairs until
> > it is done. It took 11 sec.
> >
> > The performance difference is dramatic (almost 8 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big or I didn't configure HBase
> > properly. From this experiment, HDFS can deliver the data efficiently so it
> > is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
