Hello Lars,

Yes, I used setCaching to get more KeyValues in each RPC call. And yes, when I use HFileReaderV2 I am still reading from HDFS. Short-circuit reading is enabled, but I don't know how to confirm that it is actually being used (is there a log that can tell me?).
I did make sure the HBaseClient runs on the same regionserver that holds the data. I just tried asynchbase (I'm running out of ideas, so I have started trying everything); it takes 60 seconds to scan through the data (20 seconds less than using HBaseClient).

Best Regards,

Jerry

On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl <[email protected]> wrote:

> From the below I gather you set scanner caching (Scan.setCaching(...))?
> When you use HFileReaderV2, you're still reading from HDFS, right? Are you
> using short circuit reading (avoiding network IO)?
>
> In the HBaseClient client you pipe all the data through the network again.
> Is the HBaseClient located on a different machine?
>
> I would use a profiler (just use jVisualVM, which ships with the JDK and
> use the "sampling" profiler) to see where the time is spent.
>
> Lastly, to echo what other folks have said, 0.92 is pretty old at this
> point and I personally added a lot of performance improvements to HBase
> during the 0.94 timeframe and others have as well.
> If you could test the same with 0.94, I'd be very interested in the
> numbers.
>
> -- Lars
>
> ________________________________
> From: Jerry Lam <[email protected]>
> To: user <[email protected]>
> Sent: Thursday, January 2, 2014 1:32 PM
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
> Hello Vladimir,
>
> In my use case, I guarantee that a major compaction is executed before any
> scan happens because the system we build is a read only system. There will
> be no deleted cells. Additionally, I only need to read from a single
> column family and therefore I don't need to access multiple HFiles.
>
> Filter conditions are nice to have because if I can read HFile 8x faster
> than using HBaseClient, I can do the filter on the client side and still
> perform faster than using HBaseClient.
>
> Thank you for your input!
>
> Jerry
>
> On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
> <[email protected]> wrote:
>
> > HBase scanner MUST guarantee correct order of KeyValues (coming from
> > different HFiles),
> > filter condition + filter condition on included column families and
> > qualifiers, time range, max versions and correctly process deleted cells.
> > Direct HFileReader does nothing from the above list.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> > From: Jerry Lam [[email protected]]
> > Sent: Thursday, January 02, 2014 7:56 AM
> > To: user
> > Subject: Re: Performance between HBaseClient scan and HFileReaderV2
> >
> > Hi Tom,
> >
> > Good point. Note that I also ran the HBaseClient performance test several
> > times (as you can see from the chart). The caching should also benefit the
> > second time I ran the HBaseClient performance test, not just the
> > HFileReaderV2 test.
> >
> > I still don't understand what makes the HBaseClient perform so poorly in
> > comparison to accessing HDFS directly. I can understand maybe a factor of 2
> > (even that is too much) but a factor of 8 is quite unreasonable.
> >
> > Any hint?
> >
> > Jerry
> >
> > On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[email protected]> wrote:
> >
> > > I'm also new to HBase and am not familiar with HFileReaderV2. However, in
> > > your description, you didn't mention anything about clearing the Linux OS
> > > cache between tests. That might be why you're seeing the big difference:
> > > if you ran the HBaseClient test first, it may have warmed the OS cache and
> > > then HFileReaderV2 benefited from it. Just a guess...
> > >
> > > -- Tom
> > >
> > > On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[email protected]> wrote:
> > >
> > > > Hello HBase users,
> > > >
> > > > I just ran a very simple performance test and would like to see if what I
> > > > experienced makes sense.
> > > >
> > > > The experiment is as follows:
> > > > - I filled an HBase region with 700MB data (each row has roughly 45 columns
> > > > and the size is 20KB for the entire row)
> > > > - I configured the region to hold 4GB (therefore no split occurs)
> > > > - I ran compactions after the data was loaded and made sure that there is
> > > > only 1 region in the table under test.
> > > > - No other table exists in the HBase cluster because this is a DEV
> > > > environment
> > > > - I'm using HBase 0.92.1
> > > >
> > > > The test is very basic. I use HBaseClient to scan the entire region to
> > > > retrieve all rows and all columns in the table, just iterating all KeyValue
> > > > pairs until it is done. It took about 1 minute 22 sec to complete. (Note
> > > > that I disabled the block cache and used a caching size of about 10000).
> > > >
> > > > I ran another test using HFileReaderV2 to scan the entire region and
> > > > retrieve all rows and all columns, just iterating all KeyValue pairs until
> > > > it is done. It took 11 sec.
> > > >
> > > > The performance difference is dramatic (almost 8 times faster using
> > > > HFileReaderV2).
> > > >
> > > > I want to know why the difference is so big, or whether I didn't configure
> > > > HBase properly. From this experiment, HDFS can deliver the data efficiently
> > > > so it is not the bottleneck.
> > > >
> > > > Any help is appreciated!
> > > >
> > > > Jerry
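
For reference, the figures quoted in this thread (700 MB of data, 1 minute 22 seconds with HBaseClient, 11 seconds with HFileReaderV2, 60 seconds with asynchbase, roughly 20 KB per row, scanner caching of about 10000) can be turned into throughput numbers. The following is a minimal Python sketch of that arithmetic only, not a benchmark. It assumes 1 MB = 1024 KB and that every scanner RPC returns a full batch of cached rows; the function and variable names are made up for illustration.

```python
import math

# Figures as reported in the thread (all approximate).
DATA_MB = 700            # data loaded into the single region
ROW_KB = 20              # approximate size of one row
SCANNER_CACHING = 10000  # rows requested per scanner RPC (Scan.setCaching)
CLIENT_SCAN_S = 82       # HBaseClient scan: 1 minute 22 seconds
ASYNC_SCAN_S = 60        # asynchbase scan
HFILE_SCAN_S = 11        # HFileReaderV2 scan


def throughput_mb_s(size_mb, seconds):
    """Effective scan throughput in MB per second."""
    return size_mb / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


# Assumption: 1 MB = 1024 KB, so rows ~= total KB / KB per row.
approx_rows = DATA_MB * 1024 // ROW_KB

# Assumption: each RPC returns a full batch of SCANNER_CACHING rows.
rpc_batches = math.ceil(approx_rows / SCANNER_CACHING)

if __name__ == "__main__":
    timings = {
        "HBaseClient": CLIENT_SCAN_S,
        "asynchbase": ASYNC_SCAN_S,
        "HFileReaderV2": HFILE_SCAN_S,
    }
    for name, seconds in timings.items():
        print(f"{name:14s} {throughput_mb_s(DATA_MB, seconds):6.1f} MB/s")
    ratio = speedup(CLIENT_SCAN_S, HFILE_SCAN_S)
    print(f"HFileReaderV2 vs HBaseClient: {ratio:.2f}x")
    print(f"approx. rows: {approx_rows}, scanner RPC batches: {rpc_batches}")
```

Under these assumptions the whole scan fits in a handful of very large RPC batches, so per-call round-trip overhead alone is unlikely to account for the gap; that is consistent with the suggestion to use a sampling profiler to see where the time is actually spent.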
