HBase doesn’t have partitions; it has regions. The splits are computed against the regions, so if a table has n regions, you get n splits.
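Since splits follow region boundaries, a custom RDD can derive its partitions from the table's region start keys (in the HBase 1.0 client these come from RegionLocator.getStartEndKeys(), no mapred context needed). Below is a self-contained Java sketch of just the row-to-partition assignment step; the class name and the hard-coded boundaries are illustrative, and a real implementation would fetch the boundaries from the cluster instead:

```java
import java.util.Arrays;

public class RegionPartitioner {

    // Map a row key to the index of the region (partition) that holds it.
    // regionStartKeys must be sorted, with the first region starting at
    // the empty byte[] — the same shape getStartEndKeys() returns.
    public static int partitionFor(byte[] rowKey, byte[][] regionStartKeys) {
        int idx = 0;
        for (int i = 0; i < regionStartKeys.length; i++) {
            if (compare(rowKey, regionStartKeys[i]) >= 0) {
                idx = i;          // rowKey is at or past this region's start
            } else {
                break;            // past the region that contains rowKey
            }
        }
        return idx;
    }

    // Unsigned lexicographic comparison, matching how HBase orders row keys
    // (the same semantics as org.apache.hadoop.hbase.util.Bytes.compareTo).
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Illustrative boundaries: three regions split at "g" and "p".
        byte[][] starts = { new byte[0], "g".getBytes(), "p".getBytes() };
        System.out.println(partitionFor("apple".getBytes(), starts)); // 0
        System.out.println(partitionFor("grape".getBytes(), starts)); // 1
        System.out.println(partitionFor("zebra".getBytes(), starts)); // 2
    }
}
```

Each partition then carries its region's [start, end) key range, and the Spark task for that partition scans only that range.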
Please don’t confuse partitions with regions; they are not the same thing.

> On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <[email protected]> wrote:
>
> Hi,
>
> My requirement is to partition an HBase table and return a group of records
> (i.e. rows having a specific format) without having to iterate over all of
> its rows. These partitions (which should ideally be along regions) will
> eventually be sent to Spark, but rather than use the HBase or Hadoop RDDs
> directly, I'll be using a custom RDD which recognizes partitions as the
> aforementioned groups of records.
>
> I was looking at achieving this by creating InputSplits through
> TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I
> can't figure out a way to do this without having access to the mapred
> context etc.
>
> Would greatly appreciate it if someone could point me in the right direction.
>
> [1]
> https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
>
> Thanks,
> Gokul

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
