Hello, I have a table whose key is structured as "eventType + time", and I need to periodically run a MapReduce job on the table that processes each event type within a specific time range. The job therefore needs to take multiple segments of the table as input and can't be set up with a single scan. (Using a filter on the scan would theoretically work, but doesn't scale well as the data size increases.)
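To make the input concrete, here is a minimal sketch of how the segments could be computed, one row-key range per event type. The key layout (UTF-8 event type followed by an 8-byte big-endian timestamp) and the names SegmentRanges, Range, rowKey and segments are placeholders for illustration, not my real schema. It also assumes non-negative timestamps and that no event type name is a prefix of another, so the ranges sort correctly and don't overlap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SegmentRanges {

    /** One contiguous slice of the table: [startRow, stopRow). */
    public static final class Range {
        public final byte[] startRow;
        public final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // ASSUMED key layout: eventType bytes, then the time as a big-endian long.
    static byte[] rowKey(String eventType, long time) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(time)
                .array();
    }

    /** Builds one range per event type covering [startTime, endTime). */
    public static List<Range> segments(List<String> eventTypes,
                                       long startTime, long endTime) {
        List<Range> ranges = new ArrayList<>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, startTime),
                                 rowKey(eventType, endTime)));
        }
        return ranges;
    }
}
```

Each Range would then become its own Scan (start row and stop row), which is exactly the part a single-scan job setup can't express.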
Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob" only supports a single scan, there doesn't appear to be a "built in" way to run a MapReduce job that has multiple scans as input. I found the following related post, which points me to creating my own MapReduce "InputFormat" type by extending HBase's "TableInputFormatBase" and overriding the "getSplits()" method:

http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects

So, that's currently the direction I'm heading. However, before I get too far into the weeds I thought I'd ask:

1. Is this still the best/right way to handle this situation?
2. Does anyone have an example of a custom InputFormat that sets up multiple scans against an HBase input table (something like the "MultiSegmentTableInputFormat" referred to in the post) that they'd be willing to share?

Thanks,
-Shawn
