Hi, I am trying to write a chained MapReduce job on data stored in HBase tables and need some help with the concept. I am not expecting anyone to provide code, but pseudocode based on HBase's Java API would be nice.
In a nutshell, this is what I am trying to do.

MapReduce Job 1: Read data from two tables with no common row keys and create a summary of them in the reducer. The output of the reducer is a Java object containing the summary, serialized to a byte array. I store this object in a temporary table in HBase.

MapReduce Job 2: This is where I am having problems. I now need to read this summary object so that it is available in each mapper. When I read data from a third (different) table, I want to use the summary object to perform further calculations on that data. I read about the distributed cache and tried to implement it, but I could not get it to work.

I can provide more details as edits if needed; I don't want to clutter the question right now with details that might be irrelevant.

Thanks, Arun
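To make the serialization step concrete, here is a minimal sketch of the round trip using plain Java serialization. The `Summary` class, its fields, and the `SummaryCodec` helper are hypothetical stand-ins (the real summary has different contents), and the HBase read and write calls are left out on purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: converts the summary to and from the byte[] kept in the temporary table.
final class SummaryCodec {

    // Hypothetical stand-in for the Job 1 summary; the real fields are not shown here.
    static final class Summary implements Serializable {
        private static final long serialVersionUID = 1L;
        final long rowCount;
        final double total;

        Summary(long rowCount, double total) {
            this.rowCount = rowCount;
            this.total = total;
        }

        // Example of a derived value a Job 2 mapper might need.
        double mean() {
            return rowCount == 0 ? 0.0 : total / rowCount;
        }
    }

    // Job 1 reducer side: serialize the summary into the bytes stored in a table cell.
    static byte[] toBytes(Summary summary) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(summary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Job 2 mapper side: rebuild the summary from the bytes read back from the table.
    static Summary fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Summary) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Summary class not on the classpath", e);
        }
    }
}
```

What I need in Job 2 is for each mapper to obtain those bytes and call something like `fromBytes` once, before it starts processing rows from the third table.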
