Hi Ted,

I did not at first. I don't know why I didn't realize I could do that, but then I understood that I can. Thanks for the help though.
Cheers,
Arun

On Jul 3, 2014 10:28 AM, "Ted Yu" <[email protected]> wrote:

> Did you read the summary object through the HTable API in Job #2?
>
> Cheers
>
> On Thu, Jul 3, 2014 at 9:14 AM, Arun Allamsetty <[email protected]> wrote:
>
> > Hi,
> >
> > I am trying to write a chained MapReduce job on data present in HBase
> > tables and need some help with the concept. I am not expecting people to
> > provide code, but pseudo code based on HBase's Java API would be nice.
> >
> > In a nutshell, what I am trying to do is:
> >
> > MapReduce Job 1: Read data from two tables with no common row keys and
> > create a summary out of them in the reducer. The output of the reducer is
> > a Java object containing the summary, serialized to bytes. I store this
> > object in a temporary table in HBase.
> >
> > MapReduce Job 2: This is where I am having problems. I now need to read
> > this summary object such that it is available in each mapper, so that
> > when I read data from a third (different) table, I can use the summary
> > object to perform more calculations on the data I am reading from the
> > third table.
> >
> > I read about the distributed cache and tried to implement it, but that
> > doesn't seem to work out. I can provide more details in the form of
> > edits if the need arises, because I don't want to spam this question,
> > right now, with details which might be irrelevant.
> >
> > Thanks,
> > Arun
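For anyone finding this thread later, the serialize-in-Job-1 / deserialize-in-Job-2 step Ted's suggestion relies on can be sketched roughly as below. The `Summary` class and its fields are hypothetical placeholders, and plain Java serialization is just one option (a `Writable` or Avro/Protobuf encoding would work the same way):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical summary object produced by Job 1's reducer;
// the field names here are placeholders, not from the original post.
class Summary implements Serializable {
    private static final long serialVersionUID = 1L;
    long recordCount;
    double total;
}

public class SummaryCodec {
    // Serialize the summary to a byte[] so Job 1's reducer can store it
    // as a single cell value in the temporary HBase table.
    static byte[] toBytes(Summary s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        }
        return bos.toByteArray();
    }

    // Deserialize the bytes read back from HBase in Job 2's mapper.
    static Summary fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Summary) ois.readObject();
        }
    }
}
```

In Job 2, the natural place to do the read is the mapper's `setup()` method, which runs once per task: open the temporary table, issue a single `Get` for the known row key, pass the cell's value to `fromBytes`, and keep the resulting `Summary` in an instance field for use in `map()`. That avoids the distributed-cache route entirely, at the cost of one HBase read per map task.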
