Dealing with data locality in the HBase Java API

Gokul Balakrishnan Tue, 03 Mar 2015 21:48:59 -0800

Hello,

I'm fairly new to HBase so would be grateful for any assistance.


My project uses HBase as the underlying data store for an analytics
cluster (powered by Apache Spark).

In doing this, I'm wondering how to take advantage of the locality of the
HBase data during processing (in other words, if a Spark instance is
running on a node that also houses HBase data, how to have it process the
local data first).

Is there some form of metadata offered by the Java API which I could then
use to organise the data into (virtual) groups based on the locality to be
passed forward to Spark? It could be something that *identifies on which
node a particular row resides*. I found [1] but I'm not sure if this is
what I'm looking for. Could someone please point me in the right direction?
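
To make the kind of grouping I mean more concrete, here is a rough sketch in
plain Java. It is only an illustration: the region table (region start key
mapped to host name) is hard-coded and hypothetical, and in practice it would
have to come from whatever region-location metadata the HBase client exposes.
It also compares keys as strings, whereas HBase orders row keys as bytes, so
it is only faithful for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalityGrouping {

    /**
     * Groups row keys by the host assumed to serve them.
     *
     * regionStartToHost is a hypothetical stand-in for region metadata:
     * each entry maps a region's start key to the host serving that region.
     * An empty start key marks the first region of the table.
     */
    public static Map<String, List<String>> groupByHost(
            TreeMap<String, String> regionStartToHost, List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String rowKey : rowKeys) {
            // A row belongs to the region with the greatest start key <= row key.
            Map.Entry<String, String> region = regionStartToHost.floorEntry(rowKey);
            if (region == null) {
                continue; // no region covers this key in the sample table
            }
            groups.computeIfAbsent(region.getValue(), h -> new ArrayList<>())
                  .add(rowKey);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up region layout: two regions split at "m", on two made-up hosts.
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "node1.example.com");
        regions.put("m", "node2.example.com");

        List<String> rows = Arrays.asList("apple", "mango", "zebra");
        System.out.println(groupByHost(regions, rows));
    }
}
```

Each resulting group could then be handed to the Spark worker running on the
matching host, if such a mapping is available.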

[1] https://issues.apache.org/jira/browse/HBASE-12361

Thanks so much!
Gokul Balakrishnan.
