What about your KV size and the HFile block size for the table? For a random-read type of use case, a lower HFile block size might help.
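For example, a quick way to try a smaller block size on a single column family is from the HBase shell. The table name 'my_table', the family 'cf', and the 16 KB value below are just placeholders (the default block size is 64 KB), and depending on your settings you may need to disable the table before altering it:

  alter 'my_table', {NAME => 'cf', BLOCKSIZE => '16384'}

Note the new block size only applies to newly written HFiles, so existing data keeps the old block size until it is rewritten, e.g. by a major compaction.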
-Anoop-

On Fri, Aug 15, 2014 at 1:56 AM, Esteban Gutierrez <[email protected]> wrote:

> If not set in hbase-site.xml, both tcpnodelay and tcpkeepalive are set to
> true (that's the default behavior since 0.95/0.96).
>
> Have you noticed if the call processing times or the call queue is too
> high? How does the IO look when you try these random gets? Are those
> gets going to disk 100% of the time, or do you see in the metrics a good
> utilization of the block cache (e.g. the hit ratio is high)? If you think
> the region servers are looking good, maybe double check whether any of
> the nodes in the cluster has dropped its NIC speed rate, or make sure
> your client is not the bottleneck by itself. Sometimes users change the
> blocksize in the schema for a specific CF and that also helps.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Thu, Aug 14, 2014 at 12:21 PM, Ted Yu <[email protected]> wrote:
>
> > Thomas:
> > Have you set tcpnodelay to true?
> >
> > See http://hbase.apache.org/book.html for an explanation of
> > hbase.ipc.client.tcpnodelay
> >
> > Cheers
> >
> >
> > On Thu, Aug 14, 2014 at 11:41 AM, Thomas Kwan <[email protected]>
> > wrote:
> >
> > > Hi Esteban,
> > >
> > > Thanks for sharing ideas.
> > >
> > > We are on HBase 0.96 and Java 1.6. I have enabled short-circuit reads,
> > > and the heap size is around 16G for each region server. We have about
> > > 20 of them.
> > >
> > > The list of rowkeys that I need to process is about 10M. I am using
> > > batch gets already and the batch size is ~2000 gets.
> > >
> > > thomas
> > >
> > > On Thu, Aug 14, 2014 at 11:01 AM, Esteban Gutierrez
> > > <[email protected]> wrote:
> > > > Hello Thomas,
> > > >
> > > > What version of HBase are you using? Sorting and grouping based on
> > > > the regions of the rows is going to help for sure. I don't think you
> > > > should focus too much on the locality side of the problem unless
> > > > your HDFS input set is too large (100s or 1000s of MBs per task);
> > > > otherwise it might be faster to load the input dataset in memory and
> > > > do the batched calls. As discussed in this mailing list recently,
> > > > there are too many factors that might be involved in the
> > > > performance: number of threads or tasks, size of the row, RS
> > > > resources, configurations, etc., so any additional info would be
> > > > very helpful.
> > > >
> > > > cheers,
> > > > esteban.
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > >
> > > > On Thu, Aug 14, 2014 at 10:32 AM, Thomas Kwan <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi there
> > > >>
> > > >> I have a use-case where I need to do a read to check if an HBase
> > > >> entry is present, then I do a put to create the entry when it is
> > > >> not there.
> > > >>
> > > >> I have a script to get a list of rowkeys from Hive and put them in
> > > >> an HDFS directory. Then I have an MR job that reads the rowkeys and
> > > >> does batch reads. I am getting around 1.5K requests per second.
> > > >>
> > > >> To attempt to make this faster, I am wondering if I can
> > > >>
> > > >> - sort and group the rowkeys based on regions
> > > >> - make the MR jobs run on regions that have the data locally
> > > >>
> > > >> Scan or TableInputFormat must have some code to do something
> > > >> similar, right?
> > > >>
> > > >> thanks
> > > >> thomas
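Regarding the sort-and-group-by-region idea raised earlier in the thread, below is a minimal sketch of one way to bucket the rowkeys by hosting region on the client before issuing the batched gets. It assumes the 0.96-era HTable API; the table name, the loadRowkeys() helper, and the batching comments are hypothetical placeholders, not anything from your actual setup:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionGroupedGets {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name

    // Bucket each rowkey by the start key of the region that hosts it.
    // The client caches region locations, so repeated lookups for the
    // same region do not hit the meta table every time.
    Map<String, List<Get>> byRegion = new HashMap<String, List<Get>>();
    for (byte[] row : loadRowkeys()) {             // hypothetical rowkey source
      HRegionLocation loc = table.getRegionLocation(row);
      String regionKey = Bytes.toStringBinary(loc.getRegionInfo().getStartKey());
      List<Get> group = byRegion.get(regionKey);
      if (group == null) {
        group = new ArrayList<Get>();
        byRegion.put(regionKey, group);
      }
      group.add(new Get(row));
    }

    // One batched get per region group; in practice you would still cap
    // each batch at a few thousand Gets, as you do today.
    for (List<Get> group : byRegion.values()) {
      Result[] results = table.get(group);
      for (int i = 0; i < results.length; i++) {
        if (results[i].isEmpty()) {
          // row is missing: issue the Put for it here
        }
      }
    }
    table.close();
  }

  // Placeholder for however the rowkeys are read from HDFS.
  private static List<byte[]> loadRowkeys() {
    return new ArrayList<byte[]>();
  }
}

As Esteban noted above, this only pays off if the grouping itself (rather than RS resources, block cache misses, or the client side) is the actual bottleneck.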
