What about your KV size and the HFile block size for the table? For a random-read type of use case, a lower HFile block size might help.
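For example, a quick way to try a smaller block size on a single column family is from the HBase shell. The table name 'my_table', the family 'cf', and the 16 KB value below are just placeholders (the default block size is 64 KB), and depending on your settings you may need to disable the table before altering it:

  alter 'my_table', {NAME => 'cf', BLOCKSIZE => '16384'}

Note the new block size only applies to newly written HFiles, so existing data keeps the old block size until it is rewritten, e.g. by a major compaction.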
-Anoop-

On Fri, Aug 15, 2014 at 1:56 AM, Esteban Gutierrez <[email protected]> wrote:

> If not set in hbase-site.xml, both tcpnodelay and tcpkeepalive are set to
> true (that's the default behavior since 0.95/0.96).
>
> Have you noticed if the call processing times or the call queue is too
> high? How does the IO look when you try these random gets? Are those
> gets going to disk 100% of the time, or do you see in the metrics a good
> utilization of the block cache (e.g. the hit ratio is high)? If you think
> the region servers are looking good, maybe double check whether any of
> the nodes in the cluster has dropped its NIC speed rate, or make sure
> your client is not the bottleneck by itself. Sometimes users change the
> blocksize in the schema for a specific CF and that also helps.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Thu, Aug 14, 2014 at 12:21 PM, Ted Yu <[email protected]> wrote:
>
> > Thomas:
> > Have you set tcpnodelay to true?
> >
> > See http://hbase.apache.org/book.html for an explanation of
> > hbase.ipc.client.tcpnodelay
> >
> > Cheers
> >
> >
> > On Thu, Aug 14, 2014 at 11:41 AM, Thomas Kwan <[email protected]>
> > wrote:
> >
> > > Hi Esteban,
> > >
> > > Thanks for sharing ideas.
> > >
> > > We are on HBase 0.96 and Java 1.6. I have enabled short-circuit reads,
> > > and the heap size is around 16G for each region server. We have about
> > > 20 of them.
> > >
> > > The list of rowkeys that I need to process is about 10M. I am using
> > > batch gets already and the batch size is ~2000 gets.
> > >
> > > thomas
> > >
> > > On Thu, Aug 14, 2014 at 11:01 AM, Esteban Gutierrez
> > > <[email protected]> wrote:
> > > > Hello Thomas,
> > > >
> > > > What version of HBase are you using? Sorting and grouping based on
> > > > the regions of the rows is going to help for sure. I don't think you
> > > > should focus too much on the locality side of the problem unless
> > > > your HDFS input set is too large (100s or 1000s of MBs per task);
> > > > otherwise it might be faster to load the input dataset in memory and
> > > > do the batched calls. As discussed in this mailing list recently,
> > > > there are too many factors that might be involved in the
> > > > performance: number of threads or tasks, size of the row, RS
> > > > resources, configurations, etc., so any additional info would be
> > > > very helpful.
> > > >
> > > > cheers,
> > > > esteban.
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > >
> > > > On Thu, Aug 14, 2014 at 10:32 AM, Thomas Kwan <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi there
> > > >>
> > > >> I have a use-case where I need to do a read to check if an HBase
> > > >> entry is present, then I do a put to create the entry when it is
> > > >> not there.
> > > >>
> > > >> I have a script to get a list of rowkeys from Hive and put them in
> > > >> an HDFS directory. Then I have an MR job that reads the rowkeys and
> > > >> does batch reads. I am getting around 1.5K requests per second.
> > > >>
> > > >> To attempt to make this faster, I am wondering if I can
> > > >>
> > > >> - sort and group the rowkeys based on regions
> > > >> - make the MR jobs run on regions that have the data locally
> > > >>
> > > >> Scan or TableInputFormat must have some code to do something
> > > >> similar, right?
> > > >>
> > > >> thanks
> > > >> thomas
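Regarding the sort-and-group-by-region idea raised earlier in the thread, below is a minimal sketch of one way to bucket the rowkeys by hosting region on the client before issuing the batched gets. It assumes the 0.96-era HTable API; the table name, the loadRowkeys() helper, and the batching comments are hypothetical placeholders, not anything from your actual setup:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionGroupedGets {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name

    // Bucket each rowkey by the start key of the region that hosts it.
    // The client caches region locations, so repeated lookups for the
    // same region do not hit the meta table every time.
    Map<String, List<Get>> byRegion = new HashMap<String, List<Get>>();
    for (byte[] row : loadRowkeys()) {             // hypothetical rowkey source
      HRegionLocation loc = table.getRegionLocation(row);
      String regionKey = Bytes.toStringBinary(loc.getRegionInfo().getStartKey());
      List<Get> group = byRegion.get(regionKey);
      if (group == null) {
        group = new ArrayList<Get>();
        byRegion.put(regionKey, group);
      }
      group.add(new Get(row));
    }

    // One batched get per region group; in practice you would still cap
    // each batch at a few thousand Gets, as you do today.
    for (List<Get> group : byRegion.values()) {
      Result[] results = table.get(group);
      for (int i = 0; i < results.length; i++) {
        if (results[i].isEmpty()) {
          // row is missing: issue the Put for it here
        }
      }
    }
    table.close();
  }

  // Placeholder for however the rowkeys are read from HDFS.
  private static List<byte[]> loadRowkeys() {
    return new ArrayList<byte[]>();
  }
}

As Esteban noted above, this only pays off if the grouping itself (rather than RS resources, block cache misses, or the client side) is the actual bottleneck.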
