Hello, I'm fairly new to HBase so would be grateful for any assistance.
My project uses HBase as the underlying data store for an analytics cluster powered by Apache Spark. I'm wondering how I can take advantage of HBase data locality during processing: in other words, if a Spark instance is running on a node that also hosts HBase data, how can it be made to use the local data first?

Does the Java API expose some form of metadata that I could use to organise the data into (virtual) groups based on locality, which I could then pass on to Spark? It could be something that *identifies on which node a particular row resides*.

I found [1], but I'm not sure whether it is what I'm looking for. Could someone please point me in the right direction?

[1] https://issues.apache.org/jira/browse/HBASE-12361

Thanks so much!
Gokul Balakrishnan
