HBase doesn’t have partitions; it has regions. The splits are computed against the regions, so if a table has n regions, you get n splits.
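Since splits follow region boundaries, a custom RDD can derive its partitions from the table's region start keys (in the HBase 1.0 client these come from RegionLocator.getStartEndKeys(), no mapred context needed). Below is a self-contained Java sketch of just the row-to-partition assignment step; the class name and the hard-coded boundaries are illustrative, and a real implementation would fetch the boundaries from the cluster instead:

```java
import java.util.Arrays;

public class RegionPartitioner {

    // Map a row key to the index of the region (partition) that holds it.
    // regionStartKeys must be sorted, with the first region starting at
    // the empty byte[] — the same shape getStartEndKeys() returns.
    public static int partitionFor(byte[] rowKey, byte[][] regionStartKeys) {
        int idx = 0;
        for (int i = 0; i < regionStartKeys.length; i++) {
            if (compare(rowKey, regionStartKeys[i]) >= 0) {
                idx = i;          // rowKey is at or past this region's start
            } else {
                break;            // past the region that contains rowKey
            }
        }
        return idx;
    }

    // Unsigned lexicographic comparison, matching how HBase orders row keys
    // (the same semantics as org.apache.hadoop.hbase.util.Bytes.compareTo).
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Illustrative boundaries: three regions split at "g" and "p".
        byte[][] starts = { new byte[0], "g".getBytes(), "p".getBytes() };
        System.out.println(partitionFor("apple".getBytes(), starts)); // 0
        System.out.println(partitionFor("grape".getBytes(), starts)); // 1
        System.out.println(partitionFor("zebra".getBytes(), starts)); // 2
    }
}
```

Each partition then carries its region's [start, end) key range, and the Spark task for that partition scans only that range.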
Please don’t confuse partitions with regions; they are not the same thing.

> On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <[email protected]> wrote:
>
> Hi,
>
> My requirement is to partition an HBase table and return a group of records
> (i.e. rows having a specific format) without having to iterate over all of
> its rows. These partitions (which should ideally be along regions) will
> eventually be sent to Spark, but rather than use the HBase or Hadoop RDDs
> directly, I'll be using a custom RDD which recognizes partitions as the
> aforementioned groups of records.
>
> I was looking at achieving this by creating InputSplits through
> TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I
> can't figure out a way to do this without having access to the mapred
> context etc.
>
> Would greatly appreciate it if someone could point me in the right direction.
>
> [1]
> https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
>
> Thanks,
> Gokul

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
