Hi Michael,

Thanks for the reply. Yes, I do realise that HBase has regions; perhaps my
usage of the term "partitions" was misleading. What I'm looking for is
exactly what you've mentioned - a means of creating splits based on
regions, without having to iterate over all rows in the table through the
client API. Do you have any idea how I might achieve this?
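
To make the goal concrete, here is a minimal sketch of the output I'm after:
one split per region, optionally clamped to a scan range. Everything in it is
hypothetical and for illustration only (RegionSplit, splits_from_regions and
the hard-coded region boundaries are my own names and values, not HBase API).
How best to obtain the real boundaries from the cluster is the part I'm
unsure about.

```python
from typing import List, NamedTuple, Tuple


class RegionSplit(NamedTuple):
    """Hypothetical split descriptor: one per region touched by the scan."""
    start_row: bytes  # inclusive; b"" means "from the start of the table"
    stop_row: bytes   # exclusive; b"" means "to the end of the table"


def splits_from_regions(
    region_keys: List[Tuple[bytes, bytes]],
    scan_start: bytes = b"",
    scan_stop: bytes = b"",
) -> List[RegionSplit]:
    """Build one split per region, clamped to [scan_start, scan_stop).

    region_keys holds (start_key, end_key) per region in table order.
    An empty key marks an open boundary (first or last region).
    No rows are read; only the boundaries are used.
    """
    splits = []
    for region_start, region_end in region_keys:
        # Region lies entirely before the requested range.
        if scan_start and region_end and region_end <= scan_start:
            continue
        # Region lies entirely after the requested range.
        if scan_stop and region_start >= scan_stop:
            continue
        # b"" sorts before every other key, so max() handles open starts.
        start = max(region_start, scan_start)
        # Open ends need explicit handling because b"" means "unbounded".
        if not region_end:
            stop = scan_stop
        elif not scan_stop:
            stop = region_end
        else:
            stop = min(region_end, scan_stop)
        splits.append(RegionSplit(start, stop))
    return splits


# Assumed layout: three regions with made-up boundaries "g" and "p".
regions = [(b"", b"g"), (b"g", b"p"), (b"p", b"")]
full_table_splits = splits_from_regions(regions)
narrow_splits = splits_from_regions(regions, scan_start=b"h", scan_stop=b"k")
```

With no scan bounds this yields n splits for n regions, which is the
behaviour you described. Each split would then back one partition of the
custom RDD.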

Thanks,

On Tuesday, March 17, 2015, Michael Segel <[email protected]> wrote:

> HBase doesn't have partitions. It has regions.
>
> The split occurs against the regions so that if you have n regions, you
> have n splits.
>
> Please don't confuse partitions and regions because they are not the same
> or synonymous.
>
> > On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <[email protected]> wrote:
> >
> > Hi,
> >
> > My requirement is to partition an HBase Table and return a group of
> > records (i.e. rows having a specific format) without having to iterate
> > over all of its rows. These partitions (which should ideally be along
> > regions) will eventually be sent to Spark but rather than use the HBase
> > or Hadoop RDDs directly, I'll be using a custom RDD which recognizes
> > partitions as the aforementioned group of records.
> >
> > I was looking at achieving this by creating InputSplits through
> > TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I
> > can't figure out a way to do this without having access to the mapred
> > context etc.
> >
> > I would greatly appreciate it if someone could point me in the right
> > direction.
> >
> > [1] https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
> >
> > Thanks,
> > Gokul
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
