On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[email protected]> wrote:
> Hello HBase users,
>
> I just ran a very simple performance test and would like to see if what I
> experienced makes sense.
>
> The experiment is as follows:
> - I filled an HBase region with 700MB of data (each row has roughly 45
>   columns and the entire row is about 20KB)
> - I configured the region to hold 4GB (therefore no split occurs)
> - I ran compactions after the data was loaded and made sure that there is
>   only 1 region in the table under test.
> - No other table exists in the HBase cluster because this is a DEV
>   environment
> - I'm using HBase 0.92.1
>

Can you use a 0.94? It has had some scanner improvements.

Thanks,
St.Ack

> The test is very basic. I use HBaseClient to scan the entire region to
> retrieve all rows and all columns in the table, just iterating over all
> KeyValue pairs until it is done. It took about 1 minute 22 sec to complete.
> (Note that I disabled the block cache and used a caching size of about
> 10000.)
>
> I ran another test using HFileReaderV2 to scan the entire region and
> retrieve all rows and all columns, just iterating over all KeyValue pairs
> until it is done. It took 11 sec.
>
> The performance difference is dramatic (almost 8 times faster using
> HFileReaderV2).
>
> I want to know why the difference is so big, or whether I didn't configure
> HBase properly. From this experiment, HDFS can deliver the data efficiently,
> so it is not the bottleneck.
>
> Any help is appreciated!
>
> Jerry
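
For a sense of scale, here is some back-of-the-envelope arithmetic on the figures quoted above. It is only a rough sketch, not a measurement: it assumes the 700MB and 20KB numbers are accurate, treats 1 MB as 1000 KB, and the variable names are made up for illustration.

```python
# Figures taken from the quoted message; everything derived from them is approximate.
DATA_MB = 700            # data loaded into the single region
ROW_KB = 20              # approximate size of one row
CLIENT_SCAN_SEC = 82     # 1 minute 22 sec for the client-side scan
HFILE_SCAN_SEC = 11      # direct scan with HFileReaderV2
SCANNER_CACHING = 10000  # rows requested per scanner call


def throughput_mb_per_sec(data_mb, seconds):
    """Average throughput for reading data_mb megabytes in the given time."""
    return data_mb / seconds


client_rate = throughput_mb_per_sec(DATA_MB, CLIENT_SCAN_SEC)
hfile_rate = throughput_mb_per_sec(DATA_MB, HFILE_SCAN_SEC)
speedup = CLIENT_SCAN_SEC / HFILE_SCAN_SEC

# Assumption: 1 MB = 1000 KB, so this is the rough payload of one full batch.
batch_mb = SCANNER_CACHING * ROW_KB / 1000
approx_rows = DATA_MB * 1000 / ROW_KB

print(f"client scan throughput: {client_rate:.1f} MB/s")
print(f"HFile scan throughput:  {hfile_rate:.1f} MB/s")
print(f"speedup:                {speedup:.2f}x")
print(f"rows in the region:     ~{approx_rows:.0f}")
print(f"data per full batch:    ~{batch_mb:.0f} MB")
```

If those assumptions hold, a caching size of 10000 with 20KB rows asks for a very large batch per scanner call, so a smaller caching value may be worth trying alongside the upgrade.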
