About the use case:

We want to run a JDBC query for each row in a Hive partition. Currently, we use 
Spark to partition the Hive DataFrame, then run batched queries in 
foreachPartition. Since each Spark partition accesses multiple regionservers, 
there is a lot of overhead. So we are thinking about partitioning the DataFrame 
according to the HBase regions.
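
To make the idea concrete, here is a minimal sketch of the lookup we have in 
mind. It assumes the region start keys are already available as a sorted list 
of byte arrays whose first entry is the empty key (I believe HBase's 
`RegionLocator.getStartKeys()` returns them in that form, but we have not 
verified this). It also assumes the row key has already been serialized the way 
it is stored, including the salt byte for a salted table. The names 
`RegionLookup`, `regionIndex`, and `compareUnsigned` are made up for 
illustration and are not part of any library.

```java
import java.util.Arrays;
import java.util.List;

public class RegionLookup {

    // Compare two byte arrays lexicographically, treating bytes as unsigned.
    // This is the ordering HBase applies to row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal on the common prefix: the shorter array sorts first.
        return a.length - b.length;
    }

    // Return the index of the last start key that is <= rowKey, i.e. the
    // region whose [startKey, nextStartKey) range contains rowKey.
    // Assumption: startKeys is sorted ascending and starts with an empty key.
    static int regionIndex(List<byte[]> startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.size() - 1;
        int result = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(startKeys.get(mid), rowKey) <= 0) {
                result = mid;   // candidate region, look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical split points, for illustration only.
        List<byte[]> startKeys = Arrays.asList(
            new byte[0],
            new byte[] {0x40},
            new byte[] {(byte) 0x80},
            new byte[] {(byte) 0xC0});
        byte[] rowKey = new byte[] {(byte) 0x85, 0x01};
        System.out.println(regionIndex(startKeys, rowKey)); // prints 2
    }
}
```

The returned index could then serve as the partition id in a custom Spark 
partitioner, so that each partition only talks to one region.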

Any help is appreciated!

Best,
Simon

> On Jul 10, 2016, at 2:01 PM, Simon Wang <simon.w...@airbnb.com> wrote:
> 
> Hi all,
> 
> Happy weekend!
> 
> I am writing to ask if there is a way that I can get the region number of any 
> given row key? 
> 
> For the case where salting is applied, I discovered the 
> `SaltingUtils.getSaltedKey` method, but I am not sure how I can serialize 
> the key as `ImmutableBytesWritable`.
> 
> In general, how should the client get the region number, assuming that the 
> client has no prior knowledge of the table? The client would need to read 
> the table metadata (salted or not, SPLIT ON or not), serialize the key, 
> compare it with the splits, etc.
> 
> Thanks in advance!
> 
> 
> Best,
> Simon
> 
