Hi there, regarding this...

> We are passing random 10000 row-keys as input, while HBase is taking around
> 17 secs to return 10000 records.

... Given that you are generating 10,000 random keys, your multi-get is very
likely hitting all 5 nodes of your cluster.

Historically, multi-Get would first sort the requests by RS (RegionServer)
and then go to each RS *serially* to process them. I'm not sure whether the
current (0.94.x) client multi-threads this or not.

One thing you might want to consider is confirming that client behavior, and
if it's not multi-threading, then perform a test that does the same RS
sorting via...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

... and then spin up your own threads (one per target RS) and see what
happens.


On 4/15/13 9:04 AM, "Ankit Jain" <[email protected]> wrote:

>Hi Liang,
>
>Thanks Liang for reply..
>
>Ans1:
>I tried by using HFile block size of 32 KB and bloom filter is enabled. The
>random read performance is 10000 records in 23 secs.
>
>Ans2:
>We are retrieving all the 10000 rows in one call.
>
>Ans3:
>Disk details:
>Model Number: ST2000DM001-1CH164
>Serial Number: Z1E276YF
>
>Please suggest some more optimization
>
>Thanks,
>Ankit Jain
>
>On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[email protected]> wrote:
>
>> First, it probably won't help to set block size to 4KB, please refer to
>> the beginning of HFile.java:
>>
>> Smaller blocks are good
>> * for random access, but require more memory to hold the block index, and may
>> * be slower to create (because we must flush the compressor stream at the
>> * conclusion of each data block, which leads to an FS I/O flush). Further, due
>> * to the internal caching in Compression codec, the smallest possible block
>> * size would be around 20KB-30KB.
>>
>> Second, is it a single-thread test client or multi-threads? we couldn't
>> expect too much if the requests are one by one.
>>
>> Third, could you provide more info about your DN disk numbers and IO
>> utils ?
>>
>> Thanks,
>> Liang
>> ________________________________________
>> From: Ankit Jain [[email protected]]
>> Sent: April 15, 2013 18:53
>> To: [email protected]
>> Subject: Re: HBase random read performance
>>
>> Hi Anoop,
>>
>> Thanks for reply..
>>
>> I tried by setting HFile block size 4KB and also enabled the bloom
>> filter(ROW). The maximum read performance that I was able to achieve is
>> 10000 records in 14 secs (size of record is 1.6KB).
>>
>> Please suggest some tuning..
>>
>> Thanks,
>> Ankit Jain
>>
>> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>> [email protected]> wrote:
>>
>> > Interesting. Can you explain why this happens?
>> >
>> > -----Original Message-----
>> > From: Anoop Sam John [mailto:[email protected]]
>> > Sent: Monday, April 15, 2013 3:47 PM
>> > To: [email protected]
>> > Subject: RE: HBase random read performance
>> >
>> > Ankit
>> > I guess you might be having default HFile block size which is 64KB.
>> > For random gets a lower value will be better. Try with something like 8KB
>> > and check the latency?
>> >
>> > Ya of course blooms can help (if major compaction was not done at the time
>> > of testing)
>> >
>> > -Anoop-
>> > ________________________________________
>> > From: Ankit Jain [[email protected]]
>> > Sent: Saturday, April 13, 2013 11:01 AM
>> > To: [email protected]
>> > Subject: HBase random read performance
>> >
>> > Hi All,
>> >
>> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>> >
>> > We have HBase cluster of 5 nodes(5 regionservers and 1 master node). Each
>> > regionserver has 8 GB RAM.
>> >
>> > We have loaded 25 millions records in HBase table, regions are pre-split
>> > into 16 regions and all the regions are equally loaded.
>> >
>> > We are getting very low random read performance while performing multi get
>> > from HBase.
>> >
>> > We are passing random 10000 row-keys as input, while HBase is taking around
>> > 17 secs to return 10000 records.
>> >
>> > Please suggest some tuning to increase HBase read performance.
>> >
>> > Thanks,
>> > Ankit Jain
>> > iLabs
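
P.S. To make the "sort by RS, then one thread per target RS" test from the top of this message a bit more concrete, here is a rough sketch of the shape it could take. It is not the HBase client's own implementation. The class name GroupedMultiGet and the two function parameters (locator, batchGet) are made up for illustration. In a real 0.94 test, the locator would presumably wrap HTable.getRegionLocation(byte[]) and batchGet would wrap a multi-get on an HTable owned by that thread, since HTable instances are not safe to share across threads. Row keys are plain Strings here to keep the sketch standalone, and it uses Java 8 lambdas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: groups row keys by hosting server, then fetches each
// group on its own thread. No HBase classes are used; the lookups are passed in.
class GroupedMultiGet {

    // Buckets row keys by the server name the (assumed) locator returns.
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> locator) {
        Map<String, List<String>> byServer = new HashMap<>();
        for (String key : rowKeys) {
            byServer.computeIfAbsent(locator.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    // Runs one batch fetch per target server in parallel and merges the results.
    static <V> Map<String, V> parallelGet(
            List<String> rowKeys,
            Function<String, String> locator,
            Function<List<String>, Map<String, V>> batchGet) {
        Map<String, List<String>> byServer = groupByServer(rowKeys, locator);
        Map<String, V> merged = new HashMap<>();
        if (byServer.isEmpty()) {
            return merged;
        }
        // One thread per target server, as suggested in the message above.
        ExecutorService pool = Executors.newFixedThreadPool(byServer.size());
        try {
            List<Future<Map<String, V>>> futures = new ArrayList<>();
            for (List<String> batch : byServer.values()) {
                Callable<Map<String, V>> task = () -> batchGet.apply(batch);
                futures.add(pool.submit(task));
            }
            // Wait for every per-server batch and combine the partial results.
            for (Future<Map<String, V>> future : futures) {
                merged.putAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing parallelGet against a single multi-get call over the same 10000 keys should show whether the serial per-RS processing is where the time is going.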
