Take a look at HBASE-3996 where Stack has some comments outstanding.

Cheers

On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <[email protected]> wrote:

> Hello,
>
> I have a table whose key is structured as "eventType + time", and I need to
> periodically run a MapReduce job on the table which will process each
> event type within a specific time range.  So, the MapReduce job needs to
> process multiple segments of the table as input, and therefore can't be
> set up with a single scan.  (Using a filter on the scan would theoretically
> work, but doesn't scale well as the data size increases.)
>
> Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob" only
> supports a single scan, there doesn't appear to be a "built in" way to run a
> MapReduce job that has multiple scans as input.  I found the following
> related post, which points me to creating my own MapReduce "InputFormat"
> type by extending HBase's "TableInputFormatBase" and overriding the
> "getSplits()" method:
>
>
> http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
>
> So, that's currently the direction I'm heading.  However, before I get too
> far into the weeds I thought I'd ask:
>
> 1. Is this still the best/right way to handle this situation?
>
> 2. Does anyone have an example of a custom InputFormat that sets up
> multiple scans against an HBase input table (something like the
> "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> willing to share?
>
> Thanks,
>
>       -Shawn
>
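
For illustration only, here is a minimal sketch of the first step any multi-scan approach would probably need: turning a list of event types plus one time window into one start/stop row pair per event type. It uses plain Java with no HBase classes. The class name EventScanRanges, its methods, and the key encoding (UTF-8 event type followed by an 8-byte big-endian timestamp) are all assumptions, since the thread only describes the key as "eventType + time". The ordering argument also assumes non-negative timestamps and event types that are fixed-width or otherwise not prefixes of one another.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
final class EventScanRanges {

    /** One contiguous key range: startRow is inclusive, stopRow is exclusive. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Assumed key layout: UTF-8 bytes of the event type, then the time as an
     * 8-byte big-endian long. For non-negative times this keeps rows of one
     * event type sorted by time under byte-wise comparison.
     */
    static byte[] rowKey(String eventType, long time) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(time)
                .array();
    }

    /** Builds one [startTime, endTime) range per event type. */
    static List<Range> rangesFor(List<String> eventTypes, long startTime, long endTime) {
        if (startTime < 0 || endTime < startTime) {
            throw new IllegalArgumentException("expected 0 <= startTime <= endTime");
        }
        List<Range> ranges = new ArrayList<>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, startTime), rowKey(eventType, endTime)));
        }
        return ranges;
    }
}
```

Each Range would then presumably supply the start and stop rows for one scan, and a custom getSplits() would collect the splits produced for every range rather than for a single scan.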
