Take a look at HBASE-3996, where Stack has some comments outstanding.

Cheers

On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <[email protected]> wrote:

> Hello,
>
> I have a table whose key is structured as "eventType + time", and I need to
> periodically run a map reduce job on the table which will process each
> event type within a specific time range. So, the map reduce job needs to
> process multiple segments of the table as input, and therefore can't be
> setup with a single scan. (Using a filter on the scan would theoretically
> work, but doesn't scale well as the data size increases.)
>
> Given that the HBase provided "TableMapReduceUtil.initTableMapperJob" only
> supports a single scan there doesn't appear to be a "built in" way to run a
> mapreduce job that has multiple scans as input. I found the following
> related post which points me to creating my own map reduce "InputFormat"
> type by extending HBase's "TableInputFormatBase" and overriding the
> "getSplits()" method:
>
> http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
>
> So, that's currently the direction I'm heading. However, before I got too
> far in the weeds I thought I'd ask:
>
> 1. Is this still the best/right way to handle this situation?
>
> 2. Does anyone have an example of a custom InputFormat that sets up
> multiple scans against an HBase input table (something like the
> "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> willing to share?
>
> Thanks,
>
> -Shawn
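
For what it's worth, the "multiple segments" part of the question comes down to computing one start/stop row pair per event type. Below is a minimal sketch of just that step in plain Java, with no HBase calls. The class name SegmentRanges, its methods, and the key encoding (event type followed by a 13-digit zero-padded millisecond timestamp) are all assumptions made for illustration; the original message only says the key is "eventType + time". Each resulting pair could then back one scan inside a custom getSplits().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase: builds one row-key range per event type.
final class SegmentRanges {

    /** One contiguous key range: startRow is inclusive, stopRow is exclusive. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // ASSUMPTION: the row key is the event type followed by a 13-digit,
    // zero-padded epoch-millisecond timestamp, so that keys for one event
    // type sort by time. Adjust to match the real key layout.
    static String rowKey(String eventType, long timeMillis) {
        return eventType + String.format(Locale.ROOT, "%013d", timeMillis);
    }

    // Returns one [start, stop) range per event type for the given time window.
    // ASSUMPTION: no event type is a prefix of another; otherwise the ranges
    // of the two types could interleave.
    static List<Range> rangesFor(List<String> eventTypes, long startMillis, long endMillis) {
        List<Range> ranges = new ArrayList<>();
        for (String type : eventTypes) {
            ranges.add(new Range(
                    rowKey(type, startMillis).getBytes(StandardCharsets.UTF_8),
                    rowKey(type, endMillis).getBytes(StandardCharsets.UTF_8)));
        }
        return ranges;
    }
}
```

Calling SegmentRanges.rangesFor(Arrays.asList("click", "view"), start, end) yields two ranges, one per event type. A custom input format along the lines of the linked post would presumably loop over these ranges and collect the splits for each one.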
