Hi all,

Has anyone tried extending PutSortReducer in order to add some traditional reduce logic (e.g., aggregating counters)?
I want to process data with a Hadoop MapReduce job (aggregating counters per key, a traditional Hadoop MR job), but bulk load the reduce output into HBase. As I understand it, the "native" way to do this is to run two jobs: the first aggregates counters by key, and the second creates Puts (in its map phase) and bulk loads them into HBase via HFileOutputFormat.configureIncrementalLoad().

I was thinking of combining the two into one MapReduce job: the map phase of the first job becomes the map phase of the combined job, and the reducer of the new job extends PutSortReducer, so that the reduce logic of the first job runs first and then PutSortReducer's reduce takes over to write the output as KeyValues.

Any thoughts? Has anyone tried something similar and has anything to add or correct?

Thanks,
Amit.
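To make the idea concrete, here is a rough, untested sketch of what I mean. One wrinkle I noticed: PutSortReducer expects Put values from the mapper, so if the mapper emits plain counts (say, LongWritable), the reducer can't literally extend PutSortReducer with those input types; instead it can aggregate and emit KeyValues itself, which is what PutSortReducer ultimately does. The column family "cf" and qualifier "count" below are placeholders, and the mapper is assumed to emit (rowkey, 1):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: a single reducer that does the "traditional" aggregation
// and emits KeyValues directly for HFileOutputFormat, replacing the
// two-job chain. Assumes the mapper emits (ImmutableBytesWritable rowkey,
// LongWritable 1). "cf"/"count" are made-up names for illustration.
public class AggregatingKeyValueReducer
    extends Reducer<ImmutableBytesWritable, LongWritable,
                    ImmutableBytesWritable, KeyValue> {

  @Override
  protected void reduce(ImmutableBytesWritable row,
                        Iterable<LongWritable> counts,
                        Context context)
      throws IOException, InterruptedException {
    // Reduce logic of the "first" job: sum the counters for this key.
    long sum = 0;
    for (LongWritable c : counts) {
      sum += c.get();
    }
    // Emit one KeyValue for the aggregated counter. With a single cell
    // per row there is nothing to sort within the row, so PutSortReducer's
    // in-row sorting step isn't strictly needed here.
    KeyValue kv = new KeyValue(row.get(), Bytes.toBytes("cf"),
        Bytes.toBytes("count"), Bytes.toBytes(sum));
    context.write(row, kv);
  }
}
```

In the driver I'd still call HFileOutputFormat.configureIncrementalLoad() to get the TotalOrderPartitioner setup, then override the reducer class with the one above (I haven't verified that overriding it after configureIncrementalLoad() is safe, so corrections welcome).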
