Hi there,

I have a use case where I need to do a read to check whether an HBase entry is present, and then do a put to create the entry when it is not there.
I have a script that gets a list of rowkeys from Hive and writes them to an HDFS directory. Then I have an MR job that reads the rowkeys and does batch reads. I am getting around 1.5K requests per second.

To attempt to make this faster, I am wondering if I can:
- sort and group the rowkeys based on regions
- make the MR jobs run on the nodes that hold the data locally

Scan or TableInputFormat must have some code to do something similar, right?

Thanks,
Thomas
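
To make the first idea concrete, here is a minimal sketch of sorting rowkeys and grouping them by region. It is only an illustration, not how Scan or TableInputFormat do it internally. It assumes the region start keys are already available as a list of byte strings (for example, fetched through the HBase client, such as RegionLocator.getStartKeys() in the Java API) and that the first region's start key is the empty key. The function name group_by_region and the sample keys are hypothetical. It does not address data locality.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_region(rowkeys, region_start_keys):
    """Group rowkeys by the start key of the region that would hold them.

    Assumption: region_start_keys lists every region's start key as bytes,
    and the first region starts at the empty key b"".
    Python compares bytes as unsigned values, lexicographically, which
    matches how HBase orders rowkeys.
    """
    starts = sorted(region_start_keys)
    if not starts or starts[0] != b"":
        raise ValueError("expected the first region start key to be b''")

    groups = defaultdict(list)
    for key in sorted(rowkeys):
        # A region covers [start, next_start), so pick the largest
        # start key that is <= the rowkey.
        idx = bisect_right(starts, key) - 1
        groups[starts[idx]].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", end)
    sample_starts = [b"", b"g", b"p"]
    sample_keys = [b"zebra", b"apple", b"g", b"monkey", b"fig"]
    for start, keys in group_by_region(sample_keys, sample_starts).items():
        print(start, keys)
```

Each group could then be sent as one batch of gets, so that a single batch touches only one region.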
