Hi Nicolas,

I think it might be good to create a JIRA for that anyway, since it seems that some users are expecting this behaviour.
My 2¢ ;)

JM

2013/4/16 Nicolas Liochon <nkey...@gmail.com>

> I think there is something in the middle that could be done. It was
> discussed here a while ago, but without any JIRA created. See thread:
>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E
>
> If someone can spend some time on it, I can create the JIRA...
>
> Nicolas
>
>
> On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond <raymond....@intel.com> wrote:
>
> > So what is lacking here? Should the action also be parallel inside the RS
> > for each region, instead of just parallel at the RS level?
> > It seems this would be rather difficult to implement and, for Get, might
> > not be worth it?
> >
> > > I looked at
> > > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> > > in 0.94
> > >
> > > In processBatchCallback(), starting line 1538,
> > >
> > >     // step 1: break up into regionserver-sized chunks and build the data structs
> > >     Map<HRegionLocation, MultiAction<R>> actionsByServer =
> > >       new HashMap<HRegionLocation, MultiAction<R>>();
> > >     for (int i = 0; i < workingList.size(); i++) {
> > >
> > > So we do group individual actions by server.
> > >
> > > FYI
> > >
> > > On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Doug made a good point.
> > > >
> > > > Take a look at the performance gain for parallel scan (bottom chart
> > > > compared to top chart):
> > > > https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
> > > >
> > > > See
> > > > https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> > > > for an explanation of the two methods.
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
> > > >
> > > >>
> > > >> Hi there, regarding this...
> > > >>
> > > >> > We are passing random 10000 row-keys as input, while HBase is
> > > >> > taking around 17 secs to return 10000 records.
> > > >>
> > > >> ... Given that you are generating 10,000 random keys, your multi-get
> > > >> is very likely hitting all 5 nodes of your cluster.
> > > >>
> > > >> Historically, multi-Get used to first sort the requests by RS and
> > > >> then *serially* go to each RS to process the multi-Get. I'm not sure
> > > >> whether the current (0.94.x) behavior is multi-threaded or not.
> > > >>
> > > >> One thing you might want to consider is confirming that client
> > > >> behavior, and if it's not multi-threading then perform a test that
> > > >> does the same RS sorting via...
> > > >>
> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
> > > >>
> > > >> ... and then spin up your own threads (one per target RS) and see
> > > >> what happens.
> > > >>
> > > >>
> > > >> On 4/15/13 9:04 AM, "Ankit Jain" <ankitjainc...@gmail.com> wrote:
> > > >>
> > > >> >Hi Liang,
> > > >> >
> > > >> >Thanks for the reply.
> > > >> >
> > > >> >Ans1:
> > > >> >I tried an HFile block size of 32 KB with the bloom filter enabled.
> > > >> >The random read performance is 10000 records in 23 secs.
> > > >> >
> > > >> >Ans2:
> > > >> >We are retrieving all the 10000 rows in one call.
> > > >> >
> > > >> >Ans3:
> > > >> >Disk detail:
> > > >> >Model Number: ST2000DM001-1CH164
> > > >> >Serial Number: Z1E276YF
> > > >> >
> > > >> >Please suggest some more optimizations.
> > > >> >
> > > >> >Thanks,
> > > >> >Ankit Jain
> > > >> >
> > > >> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <xieli...@xiaomi.com> wrote:
> > > >> >
> > > >> >> First, it probably doesn't help to set the block size to 4KB; please
> > > >> >> refer to the beginning of HFile.java:
> > > >> >>
> > > >> >>  Smaller blocks are good
> > > >> >>  * for random access, but require more memory to hold the block index, and may
> > > >> >>  * be slower to create (because we must flush the compressor stream at the
> > > >> >>  * conclusion of each data block, which leads to an FS I/O flush). Further, due
> > > >> >>  * to the internal caching in Compression codec, the smallest possible block
> > > >> >>  * size would be around 20KB-30KB.
> > > >> >>
> > > >> >> Second, is the test client single-threaded or multi-threaded? We
> > > >> >> can't expect too much if the requests are issued one by one.
> > > >> >>
> > > >> >> Third, could you provide more info about the number of disks per DN
> > > >> >> and their IO utilization?
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Liang
> > > >> >> ________________________________________
> > > >> >> From: Ankit Jain [ankitjainc...@gmail.com]
> > > >> >> Sent: April 15, 2013 18:53
> > > >> >> To: user@hbase.apache.org
> > > >> >> Subject: Re: HBase random read performance
> > > >> >>
> > > >> >> Hi Anoop,
> > > >> >>
> > > >> >> Thanks for the reply.
> > > >> >>
> > > >> >> I tried setting the HFile block size to 4KB and also enabled the
> > > >> >> bloom filter (ROW). The maximum read performance that I was able to
> > > >> >> achieve is 10000 records in 14 secs (size of record is 1.6KB).
> > > >> >>
> > > >> >> Please suggest some tuning.
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Ankit Jain
> > > >> >>
> > > >> >>
> > > >> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <rishabh.agra...@impetus.co.in> wrote:
> > > >> >>
> > > >> >> > Interesting. Can you explain why this happens?
> > > >> >> >
> > > >> >> > -----Original Message-----
> > > >> >> > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > > >> >> > Sent: Monday, April 15, 2013 3:47 PM
> > > >> >> > To: user@hbase.apache.org
> > > >> >> > Subject: RE: HBase random read performance
> > > >> >> >
> > > >> >> > Ankit
> > > >> >> > I guess you might be using the default HFile block size, which is
> > > >> >> > 64KB. For random gets a lower value will be better. Try with
> > > >> >> > something like 8KB and check the latency?
> > > >> >> >
> > > >> >> > Yes, of course blooms can help (if major compaction was not done
> > > >> >> > at the time of testing)
> > > >> >> >
> > > >> >> > -Anoop-
> > > >> >> > ________________________________________
> > > >> >> > From: Ankit Jain [ankitjainc...@gmail.com]
> > > >> >> > Sent: Saturday, April 13, 2013 11:01 AM
> > > >> >> > To: user@hbase.apache.org
> > > >> >> > Subject: HBase random read performance
> > > >> >> >
> > > >> >> > Hi All,
> > > >> >> >
> > > >> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
> > > >> >> >
> > > >> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master
> > > >> >> > node). Each regionserver has 8 GB RAM.
> > > >> >> >
> > > >> >> > We have loaded 25 million records into an HBase table; the table is
> > > >> >> > pre-split into 16 regions and all the regions are equally loaded.
> > > >> >> >
> > > >> >> > We are getting very low random read performance while performing
> > > >> >> > multi get from HBase.
> > > >> >> >
> > > >> >> > We are passing random 10000 row-keys as input, while HBase is
> > > >> >> > taking around 17 secs to return 10000 records.
> > > >> >> >
> > > >> >> > Please suggest some tuning to increase HBase read performance.
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> > Ankit Jain
> > > >> >> > iLabs
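
Doug's suggestion in the thread (sort the requested keys by region server, then run one thread per target RS) can be sketched roughly as follows. This is only a minimal sketch in plain Java 8+, not HBase client code: the class name ParallelMultiGet and the two function parameters `locate` and `fetchBatch` are hypothetical stand-ins introduced for illustration. In a real 0.94 client, `locate` would presumably wrap the HTable.getRegionLocation(byte[]) call linked above and `fetchBatch` would issue a multi-get; since HTable is not thread-safe, each worker would likely need its own HTable instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelMultiGet {

    /** Groups row keys by hosting server; `locate` is a hypothetical key-to-server lookup. */
    public static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> locate) {
        Map<String, List<String>> byServer = new HashMap<>();
        for (String key : rowKeys) {
            byServer.computeIfAbsent(locate.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    /**
     * Sends one batch per server concurrently (one thread per target server)
     * and merges the per-server results. `fetchBatch` is a hypothetical
     * stand-in for a multi-get against a single server.
     */
    public static Map<String, String> parallelGet(
            List<String> rowKeys,
            Function<String, String> locate,
            Function<List<String>, Map<String, String>> fetchBatch) throws Exception {
        Map<String, List<String>> byServer = groupByServer(rowKeys, locate);
        Map<String, String> merged = new HashMap<>();
        if (byServer.isEmpty()) {
            return merged;
        }
        ExecutorService pool = Executors.newFixedThreadPool(byServer.size());
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> batch : byServer.values()) {
                Callable<Map<String, String>> task = () -> fetchBatch.apply(batch);
                futures.add(pool.submit(task));
            }
            // Merge on the calling thread, so `merged` needs no synchronization.
            for (Future<Map<String, String>> future : futures) {
                merged.putAll(future.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            keys.add("row-" + i);
        }
        // Fake locator: spreads keys over 5 pretend region servers.
        Function<String, String> locate = k -> "rs-" + Math.floorMod(k.hashCode(), 5);
        // Fake fetch: returns a dummy value for every requested key.
        Function<List<String>, Map<String, String>> fetchBatch = batch -> {
            Map<String, String> out = new HashMap<>();
            for (String k : batch) {
                out.put(k, "value-of-" + k);
            }
            return out;
        };
        System.out.println("rows fetched: " + parallelGet(keys, locate, fetchBatch).size());
    }
}
```

Timing the serial loop against this per-server fan-out on the same 10000 keys would be one way to confirm whether the client-side serialization Doug describes is the bottleneck.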