Hi Tom,

Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). Any OS caching should therefore have benefited the later HBaseClient runs as well, not just the HFileReaderV2 test.

I still don't understand what makes HBaseClient perform so poorly compared with reading directly from HDFS. I could understand a factor of 2 (even that seems too much), but a factor of 8 is quite unreasonable. Any hints?

Jerry

On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[email protected]> wrote:

> I'm also new to HBase and am not familiar with HFileReaderV2. However, in
> your description, you didn't mention anything about clearing the Linux OS
> cache between tests. That might be why you're seeing the big difference: if
> you ran the HBaseClient test first, it may have warmed the OS cache and
> then HFileReaderV2 benefited from it. Just a guess...
>
> -- Tom
>
> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[email protected]> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see if what I
> > experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700MB of data (each row has roughly 45
> >   columns and the size is 20KB for the entire row)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data was loaded and made sure that there is
> >   only 1 region in the table under test.
> > - No other table exists in the HBase cluster because this is a DEV
> >   environment
> > - I'm using HBase 0.92.1
> >
> > The test is very basic. I use HBaseClient to scan the entire region to
> > retrieve all rows and all columns in the table, just iterating over all
> > KeyValue pairs until it is done. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a caching size
> > of about 10000.)
> >
> > I ran another test using HFileReaderV2 to scan the entire region and
> > retrieve all rows and all columns, just iterating over all KeyValue pairs
> > until it is done. It took 11 sec.
> >
> > The performance difference is dramatic (almost 8 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big, or whether I didn't
> > configure HBase properly. From this experiment, HDFS can deliver the data
> > efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry

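For reference, the figures quoted in the thread can be turned into rough throughput numbers with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, 1 MB is taken as 1024 KB, and the RPC count assumes every scanner call returns a full batch of rows, ignoring any extra calls for opening, closing, or detecting the end of the scan.

```java
// Hypothetical helper, not part of HBase: plain arithmetic on the numbers in the thread.
class ScanThroughput {
    // Average throughput in MB/s for a scan that moved `megabytes` in `seconds`.
    static double throughputMBps(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    // Approximate row count from total data size (MB) and average row size (KB).
    static long approxRows(double totalMB, double rowKB) {
        return Math.round(totalMB * 1024.0 / rowKB);
    }

    // Minimum number of scanner round trips if each one returns up to `caching` rows.
    static long scannerRpcs(long rows, int caching) {
        return (rows + caching - 1) / caching; // integer ceiling division
    }

    // Approximate payload of one full batch, in MB.
    static double batchMB(int caching, double rowKB) {
        return caching * rowKB / 1024.0;
    }

    public static void main(String[] args) {
        double dataMB = 700;   // region size from the thread
        double rowKB = 20;     // average row size from the thread
        int caching = 10000;   // scanner caching from the thread

        long rows = approxRows(dataMB, rowKB);
        System.out.printf("HBaseClient:   %.1f MB/s%n", throughputMBps(dataMB, 82));
        System.out.printf("HFileReaderV2: %.1f MB/s%n", throughputMBps(dataMB, 11));
        System.out.printf("Rows: %d, scanner round trips: %d, batch size: %.1f MB%n",
                rows, scannerRpcs(rows, caching), batchMB(caching, rowKB));
    }
}
```

With the numbers from the thread this works out to roughly 8.5 MB/s through the client versus about 63.6 MB/s with HFileReaderV2 (a ratio of about 7.5), around 35,840 rows, and batches of roughly 195 MB each at a caching setting of 10000.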