Doug made a good point. Take a look at the performance gain for parallel scan (bottom chart compared to top chart): https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
See https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300 for an explanation of the two methods.

Cheers

On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <[email protected]> wrote:

> Hi there, regarding this...
>
> > We are passing random 10000 row-keys as input, while HBase is taking around
> > 17 secs to return 10000 records.
>
> ... Given that you are generating 10,000 random keys, your multi-get is
> very likely hitting all 5 nodes of your cluster.
>
> Historically, multi-Get used to first sort the requests by RS and then
> *serially* go to each RS to process the multi-Get. I'm not sure whether the
> current (0.94.x) behavior is multi-threaded or not.
>
> One thing you might want to consider is confirming that client behavior,
> and if it's not multi-threading, then perform a test that does the same RS
> sorting via...
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>
> ... and then spin up your own threads (one per target RS) and see what
> happens.
>
> On 4/15/13 9:04 AM, "Ankit Jain" <[email protected]> wrote:
>
> >Hi Liang,
> >
> >Thanks, Liang, for the reply.
> >
> >Ans1:
> >I tried an HFile block size of 32 KB with the bloom filter enabled. The
> >random read performance is 10000 records in 23 secs.
> >
> >Ans2:
> >We are retrieving all the 10000 rows in one call.
> >
> >Ans3:
> >Disk details:
> >Model Number: ST2000DM001-1CH164
> >Serial Number: Z1E276YF
> >
> >Please suggest some more optimizations.
> >
> >Thanks,
> >Ankit Jain
> >
> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[email protected]> wrote:
> >
> >> First, it's probably unhelpful to set the block size to 4KB; please refer to
> >> the beginning of HFile.java:
> >>
> >>  Smaller blocks are good
> >>  * for random access, but require more memory to hold the block index, and may
> >>  * be slower to create (because we must flush the compressor stream at the
> >>  * conclusion of each data block, which leads to an FS I/O flush). Further, due
> >>  * to the internal caching in Compression codec, the smallest possible block
> >>  * size would be around 20KB-30KB.
> >>
> >> Second, is it a single-threaded test client or a multi-threaded one? We can't
> >> expect too much if the requests are issued one by one.
> >>
> >> Third, could you provide more info about your DN disk count and IO
> >> utilization?
> >>
> >> Thanks,
> >> Liang
> >> ________________________________________
> >> From: Ankit Jain [[email protected]]
> >> Sent: April 15, 2013 18:53
> >> To: [email protected]
> >> Subject: Re: HBase random read performance
> >>
> >> Hi Anoop,
> >>
> >> Thanks for the reply.
> >>
> >> I tried setting the HFile block size to 4KB and also enabled the bloom
> >> filter (ROW). The maximum read performance that I was able to achieve is
> >> 10000 records in 14 secs (size of record is 1.6KB).
> >>
> >> Please suggest some tuning.
> >>
> >> Thanks,
> >> Ankit Jain
> >>
> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <[email protected]> wrote:
> >>
> >> > Interesting. Can you explain why this happens?
> >> >
> >> > -----Original Message-----
> >> > From: Anoop Sam John [mailto:[email protected]]
> >> > Sent: Monday, April 15, 2013 3:47 PM
> >> > To: [email protected]
> >> > Subject: RE: HBase random read performance
> >> >
> >> > Ankit
> >> > I guess you might be using the default HFile block size, which is 64KB.
> >> > For random gets a lower value will be better. Try with something like 8KB
> >> > and check the latency?
> >> >
> >> > Yes, of course blooms can help (if major compaction was not done at the time
> >> > of testing)
> >> >
> >> > -Anoop-
> >> > ________________________________________
> >> > From: Ankit Jain [[email protected]]
> >> > Sent: Saturday, April 13, 2013 11:01 AM
> >> > To: [email protected]
> >> > Subject: HBase random read performance
> >> >
> >> > Hi All,
> >> >
> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
> >> >
> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master node). Each
> >> > regionserver has 8 GB RAM.
> >> >
> >> > We have loaded 25 million records into an HBase table; the table is pre-split
> >> > into 16 regions and all the regions are equally loaded.
> >> >
> >> > We are getting very low random read performance while performing multi get
> >> > from HBase.
> >> >
> >> > We are passing random 10000 row-keys as input, while HBase is taking around
> >> > 17 secs to return 10000 records.
> >> >
> >> > Please suggest some tuning to increase HBase read performance.
> >> >
> >> > Thanks,
> >> > Ankit Jain
> >> > iLabs
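
As a rough illustration of Doug's suggestion (sort the keys by region server, then fetch each group on its own thread), here is a minimal Java sketch. It deliberately does not call the HBase client: `locator` and `fetchBatch` are hypothetical placeholders for a lookup along the lines of HTable.getRegionLocation(byte[]) and for a per-server multi-get, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelMultiGet {

    // Group row keys by the server that hosts them.
    // ASSUMPTION: `locator` is a hypothetical stand-in for a region lookup
    // such as HTable.getRegionLocation(byte[]); it returns a server name.
    static Map<String, List<String>> groupByServer(List<String> rowKeys,
                                                   Function<String, String> locator) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String key : rowKeys) {
            batches.computeIfAbsent(locator.apply(key), s -> new ArrayList<>()).add(key);
        }
        return batches;
    }

    // Fetch every server's batch concurrently, one thread per target server.
    // ASSUMPTION: `fetchBatch` is a hypothetical stand-in for issuing a
    // multi-get that contains only the keys hosted by one server.
    static <R> List<R> fetchInParallel(Map<String, List<String>> batches,
                                       Function<List<String>, List<R>> fetchBatch)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<String> batch : batches.values()) {
                futures.add(pool.submit(() -> fetchBatch.apply(batch)));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                results.addAll(future.get()); // waits for that server's batch
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing this against the single multi-get call would show whether the 0.94.x client is already fanning requests out in parallel, which is the open question in Doug's reply.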
