Hi there

I have a use case where I need to do a read to check whether an HBase
entry is present, and then do a put to create the entry if it is not there.

I have a script that gets a list of rowkeys from Hive and puts them in an
HDFS directory. Then I have an MR job that reads the rowkeys and does
batch reads. I am getting around 1.5K requests per second.

To try to make this faster, I am wondering whether I can:

- sort and group the rowkeys based on regions
- make the MR tasks run on the region servers that hold the data locally

Scan or TableInputFormat must have some code that does something similar, right?
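
A minimal sketch of the grouping step, in plain Java with no HBase dependency. It assumes the region start keys are already known (in HBase they can be fetched with RegionLocator.getStartKeys()), that the first region's start key is the empty string, and that rowkeys are ASCII strings so that String ordering matches HBase's byte ordering. The class name RegionGrouper and the method groupByRegion are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: buckets rowkeys by the region that would hold them.
final class RegionGrouper {

    // regionStartKeys: start key of every region; the first region is assumed
    // to start at "" (empty key). Returns a map from region start key to the
    // sorted rowkeys that fall into that region.
    static Map<String, List<String>> groupByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String startKey : regionStartKeys) {
            groups.put(startKey, new ArrayList<>());
        }
        for (String rowKey : rowKeys) {
            // A rowkey belongs to the region with the greatest start key <= rowkey.
            Map.Entry<String, List<String>> region = groups.floorEntry(rowKey);
            if (region == null) {
                throw new IllegalArgumentException("No region covers rowkey: " + rowKey);
            }
            region.getValue().add(rowKey);
        }
        // Sort within each region so the batch reads hit the region in key order.
        for (List<String> keys : groups.values()) {
            Collections.sort(keys);
        }
        return groups;
    }
}
```

Each bucket could then be written to its own input file, or handed to a task scheduled near the server hosting that region. This only covers the grouping; placing tasks on the right hosts is a separate step.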

thanks
thomas
