Re: Reply: HBase random read performance

Ted Yu Mon, 15 Apr 2013 10:03:49 -0700

This is a related JIRA which should provide noticeable speed up:

HBASE-1935 Scan in parallel


Cheers

On Mon, Apr 15, 2013 at 7:13 AM, Ted Yu <[email protected]> wrote:

> I looked
> at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in
> 0.94
>
> In processBatchCallback(), starting line 1538,
>
>         // step 1: break up into regionserver-sized chunks and build the
>         // data structs
>         Map<HRegionLocation, MultiAction<R>> actionsByServer =
>           new HashMap<HRegionLocation, MultiAction<R>>();
>         for (int i = 0; i < workingList.size(); i++) {
>
> So the client does group individual actions by region server.
>
> FYI
>
> On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <[email protected]> wrote:
>
>> Doug made a good point.
>>
>> Take a look at the performance gain for parallel scan (bottom chart
>> compared to top chart):
>> https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
>>
>> See
>> https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
>> for an explanation of the two methods.
>>
>> Cheers
>>
>> On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil <[email protected]> wrote:
>>
>>>
>>> Hi there, regarding this...
>>>
>>> > We are passing random 10000 row-keys as input, while HBase is taking
>>> > around 17 secs to return 10000 records.
>>>
>>>
>>> ….  Given that you are generating 10,000 random keys, your multi-get is
>>> very likely hitting all 5 nodes of your cluster.
>>>
>>>
>>> Historically, multi-Get used to first sort the requests by RS and then
>>> *serially* go to each RS to process the multi-Get.  I'm not sure whether
>>> the current (0.94.x) client multi-threads this or not.
>>>
>>> One thing you might want to consider is confirming that client behavior,
>>> and if it's not multi-threading then perform a test that does the same RS
>>> sorting via...
>>>
>>>
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>>>
>>> …. and then spin up your own threads (one per target RS) and see what
>>> happens.
>>>
>>>
>>>
>>> On 4/15/13 9:04 AM, "Ankit Jain" <[email protected]> wrote:
>>>
>>> >Hi Liang,
>>> >
>>> >Thanks Liang for reply..
>>> >
>>> >Ans1:
>>> >I tried an HFile block size of 32 KB with the bloom filter enabled. The
>>> >random read performance is 10000 records in 23 secs.
>>> >
>>> >Ans2:
>>> >We are retrieving all the 10000 rows in one call.
>>> >
>>> >Ans3:
>>> >Disk details:
>>> >Model Number:       ST2000DM001-1CH164
>>> >Serial Number:      Z1E276YF
>>> >
>>> >Please suggest some more optimizations.
>>> >
>>> >Thanks,
>>> >Ankit Jain
>>> >
>>> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[email protected]> wrote:
>>> >
>>> >> First, setting the block size to 4KB probably won't help; please refer
>>> >> to the comment at the beginning of HFile.java:
>>> >>
>>> >>  Smaller blocks are good
>>> >>  * for random access, but require more memory to hold the block index, and may
>>> >>  * be slower to create (because we must flush the compressor stream at the
>>> >>  * conclusion of each data block, which leads to an FS I/O flush). Further, due
>>> >>  * to the internal caching in Compression codec, the smallest possible block
>>> >>  * size would be around 20KB-30KB.
>>> >>
>>> >> Second, is the test client single-threaded or multi-threaded? We can't
>>> >> expect much if the requests are issued one by one.
>>> >>
>>> >> Third, could you provide more info about the number of disks per DN and
>>> >> their I/O utilization?
>>> >>
>>> >> Thanks,
>>> >> Liang
>>> >> ________________________________________
>>> >> From: Ankit Jain [[email protected]]
>>> >> Sent: April 15, 2013 18:53
>>> >> To: [email protected]
>>> >> Subject: Re: HBase random read performance
>>> >>
>>> >> Hi Anoop,
>>> >>
>>> >> Thanks for reply..
>>> >>
>>> >> I tried setting the HFile block size to 4KB and also enabled the bloom
>>> >> filter (ROW). The maximum read performance that I was able to achieve is
>>> >> 10000 records in 14 secs (size of record is 1.6KB).
>>> >>
>>> >> Please suggest some tuning..
>>> >>
>>> >> Thanks,
>>> >> Ankit Jain
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>>> >> [email protected]> wrote:
>>> >>
>>> >> > Interesting. Can you explain why this happens?
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Anoop Sam John [mailto:[email protected]]
>>> >> > Sent: Monday, April 15, 2013 3:47 PM
>>> >> > To: [email protected]
>>> >> > Subject: RE: HBase random read performance
>>> >> >
>>> >> > Ankit
>>> >> > I guess you might be using the default HFile block size, which is 64KB.
>>> >> > For random gets a lower value will be better. Try with something like
>>> >> > 8KB and check the latency?
>>> >> >
>>> >> > And of course blooms can help (if major compaction was not done at the
>>> >> > time of testing).
>>> >> >
>>> >> > -Anoop-
>>> >> > ________________________________________
>>> >> > From: Ankit Jain [[email protected]]
>>> >> > Sent: Saturday, April 13, 2013 11:01 AM
>>> >> > To: [email protected]
>>> >> > Subject: HBase random read performance
>>> >> >
>>> >> > Hi All,
>>> >> >
>>> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>>> >> >
>>> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master node).
>>> >> > Each regionserver has 8 GB RAM.
>>> >> >
>>> >> > We have loaded 25 million records into an HBase table; the table is
>>> >> > pre-split into 16 regions and all the regions are equally loaded.
>>> >> >
>>> >> > We are getting very low random read performance while performing
>>> >> > multi-get from HBase.
>>> >> >
>>> >> > We are passing 10000 random row-keys as input, and HBase is taking
>>> >> > around 17 secs to return 10000 records.
>>> >> >
>>> >> > Please suggest some tuning to increase HBase read performance.
>>> >> >
>>> >> > Thanks,
>>> >> > Ankit Jain
>>> >> > iLabs
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Thanks,
>>> >> > Ankit Jain
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Thanks,
>>> >> Ankit Jain
>>> >>
>>> >
>>> >
>>> >
>>> >--
>>> >Thanks,
>>> >Ankit Jain
>>>
>>>
>>
>
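
For anyone wanting to try the experiment Doug describes in the quoted thread (sort the keys by region server, then use one thread per target RS), here is a minimal sketch of the idea. It deliberately avoids the HBase client classes so it can run on its own: the `serverOf` function is a hypothetical stand-in for a lookup via HTable.getRegionLocation(), and `fetchBatch` is a hypothetical stand-in for a per-server multi-get. The class and method names are illustrative and do not come from HBase. It uses Java 8 lambdas for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch only; none of these names are HBase APIs.
final class ParallelMultiGetSketch {

    // Groups row keys by hosting server, keeping input order within each group.
    // serverOf is an assumed stand-in for a region-location lookup.
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> serverOf) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String key : rowKeys) {
            byServer.computeIfAbsent(serverOf.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    // Submits one batch per server to its own thread, then merges the results.
    // fetchBatch is an assumed stand-in for a multi-get against one server.
    static Map<String, String> fetchInParallel(
            Map<String, List<String>> byServer,
            Function<List<String>, Map<String, String>> fetchBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> batch : byServer.values()) {
                Callable<Map<String, String>> task = () -> fetchBatch.apply(batch);
                futures.add(pool.submit(task));
            }
            Map<String, String> merged = new LinkedHashMap<>();
            for (Future<Map<String, String>> future : futures) {
                merged.putAll(future.get()); // blocks until that server's batch is done
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            keys.add("row-" + i);
        }
        // Fake placement: spread keys over 5 pretend region servers.
        Map<String, List<String>> byServer =
                groupByServer(keys, k -> "rs-" + Math.floorMod(k.hashCode(), 5));
        // Fake fetch: return a dummy value for every requested key.
        Map<String, String> rows = fetchInParallel(byServer, batch -> {
            Map<String, String> result = new HashMap<>();
            for (String k : batch) {
                result.put(k, "value-of-" + k);
            }
            return result;
        });
        System.out.println(byServer.keySet() + " -> " + rows.size() + " rows fetched");
    }
}
```

In a real test the two stand-ins would be replaced with actual client calls. Since HTable is not designed to be shared freely across threads, each worker would presumably need its own table instance. Comparing wall-clock time against the single multi-get call should show whether the 0.94 client is already parallelising per server.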
