About the use case: We want to run a JDBC query for each row in a Hive partition. Currently, we use Spark to partition the Hive DataFrame and then run batch queries in foreachPartition. Since each Spark partition ends up accessing multiple RegionServers, there is a lot of overhead. So we are thinking about partitioning the DataFrame by HBase region instead, so that each Spark partition only talks to one region.
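
To make the idea concrete, here is a minimal sketch of the lookup we have in mind, not something we have working. It assumes we already have the table's region start keys as a sorted byte[][] whose first entry is the empty key (I believe the HBase client's `RegionLocator.getStartKeys()` returns them in that shape), and that the row key has already been serialized to the exact bytes stored in HBase, including the salt byte for a salted table. That serialization step is the part we do not know how to do. The names `RegionIndex` and `regionIndex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RegionIndex {
    /**
     * Returns the index of the region whose key range contains rowKey.
     * Assumes startKeys is non-empty, sorted ascending in unsigned
     * lexicographic byte order, and that startKeys[0] is the empty key.
     */
    public static int regionIndex(byte[][] startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        // Binary search for the last start key that is <= rowKey.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (Arrays.compareUnsigned(startKeys[mid], rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Helper for the demo only: real keys would come from the serializer.
    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical split points; a real table would supply its own.
        byte[][] startKeys = { new byte[0], bytes("g"), bytes("n"), bytes("t") };
        for (String key : new String[] { "apple", "g", "mango", "zebra" }) {
            System.out.println(key + " -> region " + regionIndex(startKeys, bytes(key)));
        }
    }
}
```

With an index like this, we could key each DataFrame row by its region index and repartition on that key before calling foreachPartition.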

Any help is appreciated!

Best,
Simon

> On Jul 10, 2016, at 2:01 PM, Simon Wang <simon.w...@airbnb.com> wrote:
>
> Hi all,
>
> Happy weekend!
>
> I am writing to ask if there is a way to get the region number of any
> given row key.
>
> For the case where salting is applied, I discovered the
> `SaltingUtils.getSaltedKey` method, but I am not sure how to serialize
> the key as `ImmutableBytesWritable`.
>
> In general, how should the client get the region number, assuming that the
> client has no prior knowledge of the table? The client would need to read
> the metadata (salted or not, SPLIT ON or not), serialize the key, compare it
> with the splits, etc.
>
> Thanks in advance!
>
> Best,
> Simon