Re: Reply: HBase random read performance

Ted Yu Mon, 15 Apr 2013 06:30:59 -0700

Doug made a good point.

Take a look at the performance gain for parallel scan (bottom chart
compared to top chart):
https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png


See
https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
for an explanation of the two methods.

Cheers

On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <[email protected]> wrote:

>
> Hi there, regarding this...
>
> > We are passing random 10000 row-keys as input, while HBase is taking
> > around 17 secs to return 10000 records.
>
>
> ….  Given that you are generating 10,000 random keys, your multi-get is
> very likely hitting all 5 nodes of your cluster.
>
>
> Historically, multi-Get first sorted the requests by RS and then
> *serially* went to each RS to process the multi-Get.  I'm not sure whether
> the current (0.94.x) behavior is multi-threaded or not.
>
> One thing you might want to consider is confirming that client behavior,
> and if it's not multi-threading then perform a test that does the same RS
> sorting via...
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>
> …. and then spin up your own threads (one per target RS) and see what
> happens.
>
>
>
> On 4/15/13 9:04 AM, "Ankit Jain" <[email protected]> wrote:
>
> >Hi Liang,
> >
> >Thanks for the reply, Liang.
> >
> >Ans1:
> >I tried an HFile block size of 32 KB with the bloom filter enabled. The
> >random read performance is 10000 records in 23 secs.
> >
> >Ans2:
> >We are retrieving all the 10000 rows in one call.
> >
> >Ans3:
> >Disk details:
> >Model Number:       ST2000DM001-1CH164
> >Serial Number:      Z1E276YF
> >
> >Please suggest some more optimizations.
> >
> >Thanks,
> >Ankit Jain
> >
> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[email protected]> wrote:
> >
> >> First, setting the block size to 4KB probably won't help; please refer
> >> to the comment at the beginning of HFile.java:
> >>
> >>  Smaller blocks are good
> >>  * for random access, but require more memory to hold the block index,
> >>  * and may be slower to create (because we must flush the compressor
> >>  * stream at the conclusion of each data block, which leads to an FS I/O
> >>  * flush). Further, due to the internal caching in Compression codec, the
> >>  * smallest possible block size would be around 20KB-30KB.
> >>
> >> Second, is the test client single-threaded or multi-threaded? We can't
> >> expect too much if the requests are issued one by one.
> >>
> >> Third, could you provide more info about the number of disks on each DN
> >> and their IO utilization?
> >>
> >> Thanks,
> >> Liang
> >> ________________________________________
> >> From: Ankit Jain [[email protected]]
> >> Sent: April 15, 2013 18:53
> >> To: [email protected]
> >> Subject: Re: HBase random read performance
> >>
> >> Hi Anoop,
> >>
> >> Thanks for the reply.
> >>
> >> I tried setting the HFile block size to 4KB and also enabled the bloom
> >> filter (ROW). The best read performance I was able to achieve is
> >> 10000 records in 14 secs (each record is 1.6KB).
> >>
> >> Please suggest some tuning.
> >>
> >> Thanks,
> >> Ankit Jain
> >>
> >>
> >>
> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
> >> [email protected]> wrote:
> >>
> >> > Interesting. Can you explain why this happens?
> >> >
> >> > -----Original Message-----
> >> > From: Anoop Sam John [mailto:[email protected]]
> >> > Sent: Monday, April 15, 2013 3:47 PM
> >> > To: [email protected]
> >> > Subject: RE: HBase random read performance
> >> >
> >> > Ankit,
> >> > I guess you are using the default HFile block size, which is 64KB.
> >> > For random gets a lower value will be better. Try something like 8KB
> >> > and check the latency.
> >> >
> >> > Yes, of course blooms can help (if major compaction was not done at the
> >> > time of testing).
> >> >
> >> > -Anoop-
> >> > ________________________________________
> >> > From: Ankit Jain [[email protected]]
> >> > Sent: Saturday, April 13, 2013 11:01 AM
> >> > To: [email protected]
> >> > Subject: HBase random read performance
> >> >
> >> > Hi All,
> >> >
> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
> >> >
> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master node).
> >> > Each regionserver has 8 GB RAM.
> >> >
> >> > We have loaded 25 million records into an HBase table; the table is
> >> > pre-split into 16 regions and all the regions are equally loaded.
> >> >
> >> > We are getting very low random read performance while performing a
> >> > multi-get from HBase.
> >> >
> >> > We are passing 10000 random row-keys as input, and HBase is taking
> >> > around 17 secs to return the 10000 records.
> >> >
> >> > Please suggest some tuning to increase HBase read performance.
> >> >
> >> > Thanks,
> >> > Ankit Jain
> >> > iLabs
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks,
> >> > Ankit Jain
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Ankit Jain
> >>
> >
> >
> >
> >--
> >Thanks,
> >Ankit Jain
>
>
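
For anyone who wants to try the test Doug describes in the quoted message (group the row keys by region server, then run one multi-get per server on its own thread), here is a minimal sketch. It is only an illustration of the pattern, not the HBase client's own behavior: the class and method names (PerServerMultiGet, groupByServer, parallelGet) are made up, `locator` is a stand-in for a lookup such as the HTable.getRegionLocation(byte[]) method Doug links to, and `batchGet` is a stand-in for the actual multi-get call. Plain strings replace byte[] row keys and results so the sketch runs on a bare JDK with no HBase classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PerServerMultiGet {

    /** Groups row keys by the server that hosts them (hypothetical helper). */
    static Map<String, List<String>> groupByServer(List<String> rowKeys,
                                                   Function<String, String> locator) {
        Map<String, List<String>> byServer = new TreeMap<>();
        for (String key : rowKeys) {
            // In a real client the locator would consult the region location API.
            byServer.computeIfAbsent(locator.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    /** Runs one batch per server, each on its own thread, and merges the results. */
    static Map<String, String> parallelGet(Map<String, List<String>> byServer,
                                           Function<List<String>, Map<String, String>> batchGet)
            throws InterruptedException, ExecutionException {
        // One thread per target server, as suggested in the thread.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> batch : byServer.values()) {
                Callable<Map<String, String>> task = () -> batchGet.apply(batch);
                futures.add(pool.submit(task));
            }
            Map<String, String> merged = new HashMap<>();
            for (Future<Map<String, String>> f : futures) {
                merged.putAll(f.get()); // blocks until that server's batch is done
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            keys.add("row-" + i);
        }
        // Assumption: a fake 5-server layout derived from the key hash.
        Function<String, String> locator = k -> "rs" + Math.floorMod(k.hashCode(), 5);
        // Assumption: a fake multi-get that fabricates a value per key.
        Function<List<String>, Map<String, String>> batchGet = batch -> {
            Map<String, String> out = new HashMap<>();
            for (String k : batch) {
                out.put(k, "value-of-" + k);
            }
            return out;
        };
        Map<String, List<String>> byServer = groupByServer(keys, locator);
        Map<String, String> results = parallelGet(byServer, batchGet);
        System.out.println("servers contacted: " + byServer.size());
        System.out.println("records returned: " + results.size());
    }
}
```

In a real test, each worker would probably need its own HTable instance rather than a shared one, since HTable is documented as not safe to share across threads. Timing the whole parallelGet call against a single multi-get on the same keys should show whether the client was already parallelizing the requests.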
