Re: 答复: HBase random read performance

Jean-Marc Spaggiari Tue, 16 Apr 2013 04:02:32 -0700

Hi Nicolas,

I think it might be good to create a JIRA for that anyway, since it seems that
some users are expecting this behaviour.


My 2¢ ;)

JM

2013/4/16 Nicolas Liochon <nkey...@gmail.com>

> I think there is something in the middle that could be done. It was
> discussed here a while ago, but without any JIRA created.  See thread:
>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E
>
> If someone can spend some time on it, I can create the JIRA...
>
> Nicolas
>
>
> On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond <raymond....@intel.com>
> wrote:
>
> > So what is lacking here? Should the actions also be parallelized inside
> > the RS for each region, instead of just being parallel at the RS level?
> > That seems rather difficult to implement, and for Get it might not be
> > worth it?
> >
> > >
> > > I looked at
> > > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> > > in 0.94.
> > >
> > > In processBatchCallback(), starting line 1538,
> > >
> > >         // step 1: break up into regionserver-sized chunks and build the data structs
> > >         Map<HRegionLocation, MultiAction<R>> actionsByServer =
> > >           new HashMap<HRegionLocation, MultiAction<R>>();
> > >         for (int i = 0; i < workingList.size(); i++) {
> > >
> > > So we do group individual actions by server.
> > >
> > > FYI
> > >
> > > On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Doug made a good point.
> > > >
> > > > Take a look at the performance gain for parallel scan (bottom chart
> > > > compared to top chart):
> > > >
> > > > > https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
> > > > >
> > > > > See
> > > > > https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> > > > > for an explanation of the two methods.
> > > >
> > > > Cheers
> > > >
> > > > > On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
> > > > > <doug.m...@explorysmedical.com> wrote:
> > > >
> > > >>
> > > >> Hi there, regarding this...
> > > >>
> > > > >> > We are passing random 10000 row-keys as input, while HBase is
> > > > >> > taking around 17 secs to return 10000 records.
> > > >>
> > > >>
> > > >> ….  Given that you are generating 10,000 random keys, your multi-get
> > > >> is very likely hitting all 5 nodes of your cluster.
> > > >>
> > > >>
> > > > >> Historically, multi-Get used to first sort the requests by RS and
> > > > >> then *serially* go to each RS to process the multi-Get.  I'm not sure
> > > > >> whether the current (0.94.x) behavior is multi-threaded or not.
> > > >>
> > > >> One thing you might want to consider is confirming that client
> > > >> behavior, and if it's not multi-threading then perform a test that
> > > >> does the same RS sorting via...
> > > >>
> > > >>
> > > >>
> > > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
> > > >>
> > > > >> …. and then spin up your own threads (one per target RS) and see what
> > > > >> happens.
> > > >>
> > > >>
> > > >>
> > > >> On 4/15/13 9:04 AM, "Ankit Jain" <ankitjainc...@gmail.com> wrote:
> > > >>
> > > >> >Hi Liang,
> > > >> >
> > > > >> >Thanks for the reply.
> > > > >> >
> > > > >> >Ans1:
> > > > >> >I tried an HFile block size of 32 KB with the bloom filter enabled.
> > > > >> >The random read performance is 10000 records in 23 secs.
> > > >> >
> > > >> >Ans2:
> > > >> >We are retrieving all the 10000 rows in one call.
> > > >> >
> > > >> >Ans3:
> > > > >> >Disk details:
> > > >> >Model Number:       ST2000DM001-1CH164
> > > >> >Serial Number:      Z1E276YF
> > > >> >
> > > > >> >Please suggest some more optimizations.
> > > >> >
> > > >> >Thanks,
> > > >> >Ankit Jain
> > > >> >
> > > >> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <xieli...@xiaomi.com> wrote:
> > > >> >
> > > > >> >> First, setting the block size to 4KB probably won't help; please
> > > > >> >> refer to the comment at the beginning of HFile.java:
> > > >> >>
> > > >> >>  Smaller blocks are good
> > > >> >>  * for random access, but require more memory to hold the block
> > > >> >>index, and  may
> > > >> >>  * be slower to create (because we must flush the compressor
> > > >> >>stream at the
> > > >> >>  * conclusion of each data block, which leads to an FS I/O
> flush).
> > > >> >> Further, due
> > > >> >>  * to the internal caching in Compression codec, the smallest
> > > >> >>possible  block
> > > >> >>  * size would be around 20KB-30KB.
> > > >> >>
> > > > >> >> Second, is the test client single-threaded or multi-threaded? We
> > > > >> >> can't expect much if the requests are issued one by one.
> > > > >> >>
> > > > >> >> Third, could you provide more info about the number of disks per DN
> > > > >> >> and their IO utilization?
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Liang
> > > >> >> ________________________________________
> > > > >> >> From: Ankit Jain [ankitjainc...@gmail.com]
> > > > >> >> Sent: April 15, 2013 18:53
> > > > >> >> To: user@hbase.apache.org
> > > > >> >> Subject: Re: HBase random read performance
> > > >> >>
> > > >> >> Hi Anoop,
> > > >> >>
> > > > >> >> Thanks for the reply.
> > > > >> >>
> > > > >> >> I tried setting the HFile block size to 4KB and also enabled the
> > > > >> >> bloom filter (ROW). The maximum read performance that I was able to
> > > > >> >> achieve is 10000 records in 14 secs (the size of each record is 1.6KB).
> > > > >> >>
> > > > >> >> Please suggest some tuning.
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Ankit Jain
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
> > > >> >> rishabh.agra...@impetus.co.in> wrote:
> > > >> >>
> > > >> >> > Interesting. Can you explain why this happens?
> > > >> >> >
> > > >> >> > -----Original Message-----
> > > >> >> > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > > >> >> > Sent: Monday, April 15, 2013 3:47 PM
> > > >> >> > To: user@hbase.apache.org
> > > >> >> > Subject: RE: HBase random read performance
> > > >> >> >
> > > > >> >> > Ankit,
> > > > >> >> > I guess you might be using the default HFile block size, which is
> > > > >> >> > 64KB. For random gets a lower value will be better. Try something
> > > > >> >> > like 8KB and check the latency.
> > > > >> >> >
> > > > >> >> > And of course blooms can help (if major compaction was not done at
> > > > >> >> > the time of testing).
> > > >> >> >
> > > >> >> > -Anoop-
> > > >> >> > ________________________________________
> > > >> >> > From: Ankit Jain [ankitjainc...@gmail.com]
> > > >> >> > Sent: Saturday, April 13, 2013 11:01 AM
> > > >> >> > To: user@hbase.apache.org
> > > >> >> > Subject: HBase random read performance
> > > >> >> >
> > > >> >> > Hi All,
> > > >> >> >
> > > >> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
> > > >> >> >
> > > > >> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master
> > > > >> >> > node). Each regionserver has 8 GB RAM.
> > > > >> >> >
> > > > >> >> > We have loaded 25 million records into an HBase table; the table is
> > > > >> >> > pre-split into 16 regions and all the regions are equally loaded.
> > > > >> >> >
> > > > >> >> > We are getting very low random read performance while performing
> > > > >> >> > multi-get from HBase.
> > > > >> >> >
> > > > >> >> > We are passing 10000 random row-keys as input, and HBase is taking
> > > > >> >> > around 17 secs to return 10000 records.
> > > >> >> >
> > > >> >> > Please suggest some tuning to increase HBase read performance.
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> > Ankit Jain
> > > >> >> > iLabs
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > --
> > > >> >> > Thanks,
> > > >> >> > Ankit Jain
> > > >> >> >
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Thanks,
> > > >> >> Ankit Jain
> > > >> >>
> > > >> >
> > > >> >
> > > >> >
> > > >> >--
> > > >> >Thanks,
> > > >> >Ankit Jain
> > > >>
> > > >>
> > > >
> >
>
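
Doug's suggestion in the thread above (look up the region server for each row
key, group the keys per server, then run one thread per target RS) could be
sketched roughly as follows. This is a minimal, stdlib-only Java 8 sketch and
not HBase client code: `locate` and `fetchBatch` are hypothetical stand-ins for
a lookup such as HTable.getRegionLocation(row) and for a per-server batch get,
and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class GroupedMultiGet {

    // Group row keys by the server that hosts them.
    // ASSUMPTION: `locate` is a hypothetical stand-in for a real lookup
    // (for example, something built on HTable.getRegionLocation).
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> locate) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String key : rowKeys) {
            byServer.computeIfAbsent(locate.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    // Submit one batch per server, each on its own thread, then collect the
    // results in the order the servers were first seen.
    // ASSUMPTION: `fetchBatch` is a hypothetical stand-in for a batch get.
    static <R> List<R> fetchInParallel(
            Map<String, List<String>> byServer,
            Function<List<String>, List<R>> fetchBatch) throws Exception {
        List<R> results = new ArrayList<>();
        if (byServer.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(byServer.size());
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<String> batch : byServer.values()) {
                Callable<List<R>> task = () -> fetchBatch.apply(batch);
                futures.add(pool.submit(task));
            }
            for (Future<List<R>> f : futures) {
                results.addAll(f.get()); // blocks until that server's batch is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy data: the first character of the key decides the "server".
        List<String> keys = Arrays.asList("a1", "b1", "a2", "c1");
        Map<String, List<String>> grouped =
                groupByServer(keys, k -> "rs-" + k.charAt(0));
        List<String> rows = fetchInParallel(grouped, batch -> new ArrayList<>(batch));
        System.out.println(grouped.keySet() + " -> " + rows.size() + " rows");
    }
}
```

As noted earlier in the thread, processBatchCallback() in 0.94 already groups
individual actions by server, so whether client-side threading like this helps
at all is something to measure rather than assume.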
