random reads

Thomas Kwan Thu, 14 Aug 2014 10:33:03 -0700

Hi there

I have a use case where I need to do a read to check whether an HBase entry
is present, and then do a put to create the entry when it is not there.

I have a script that gets a list of rowkeys from Hive and puts them in an
HDFS directory. Then I have an MR job that reads the rowkeys and does
batch reads. I am getting around 1.5K requests per second.
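For reference, the batching step can be pictured as chunking the rowkey list into fixed-size groups. This is only a minimal sketch in plain Java: the class name RowkeyBatcher and the batch size are hypothetical. In real client code each chunk would presumably be turned into a List<Get> and passed to a multi-get such as HTable.get(List<Get>).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HBase): splits rowkeys into fixed-size
// batches, each of which would become one multi-get in the MR job.
public class RowkeyBatcher {
    static List<List<String>> batches(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            // Copy the sublist so each batch does not depend on the input list.
            out.add(new ArrayList<>(keys.subList(i, Math.min(i + batchSize, keys.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(batches(keys, 2)); // [[r1, r2], [r3, r4], [r5]]
    }
}
```

Batching alone cuts round trips, but each batch may still touch several region servers, which is what grouping by region is meant to address.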

To attempt to make this faster, I am wondering if I can

- sort and group the rowkeys by region
- make the MR jobs run on the nodes that hold those regions' data locally
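A rough sketch of the "sort and group by region" idea, in plain Java. It assumes you already have the table's region start keys as a sorted byte[][] whose first entry is empty. In the HBase client these would likely come from HTable.getStartKeys() (or RegionLocator.getStartKeys() in newer releases). The class RegionGrouper and the example boundaries are hypothetical. Each rowkey belongs to the region with the greatest start key that is less than or equal to it, which a binary search finds.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of HBase): buckets rowkeys by region.
public class RegionGrouper {
    // Returns the index of the last start key <= rowKey (unsigned byte order).
    // Assumes startKeys is non-empty, sorted, and startKeys[0] is empty.
    static int regionIndex(byte[][] startKeys, byte[] rowKey) {
        int lo = 0, hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle so the range always shrinks
            if (Arrays.compareUnsigned(startKeys[mid], rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Sorts rowkeys by their UTF-8 bytes (the ordering HBase uses for rows),
    // then groups them by region index; TreeMap keeps regions in order.
    static Map<Integer, List<String>> group(byte[][] startKeys, List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8)));
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String key : sorted) {
            int idx = regionIndex(startKeys, key.getBytes(StandardCharsets.UTF_8));
            groups.computeIfAbsent(idx, k -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical boundaries: regions ["", "g"), ["g", "p"), ["p", end).
        byte[][] startKeys = {
            new byte[0],
            "g".getBytes(StandardCharsets.UTF_8),
            "p".getBytes(StandardCharsets.UTF_8)
        };
        List<String> keys = List.of("zebra", "apple", "house", "pear", "grape");
        System.out.println(group(startKeys, keys));
        // {0=[apple], 1=[grape, house], 2=[pear, zebra]}
    }
}
```

Each group could then be written out as its own input file or split, so that one mapper handles the keys of one region.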

Scan or TableInputFormat must have some code that does something similar, right?

thanks
thomas
