Re: Using Accumulo as input to a MapReduce job frequently hangs due to lost Zookeeper connection

Billie Rinaldi Thu, 11 Oct 2012 11:57:13 -0700

On Wed, Oct 10, 2012 at 7:22 AM, ameet kini <[email protected]> wrote:


> I have a related problem where I need to do a 1-1 join (every row in
> table A joins with a unique row in table B and vice versa). My join
> key is the row id of the table. In the past, I've used Hadoop's
> CompositeInputFormat to do a map-side join over data in HDFS
> (described here:
> http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/). My
> tables in Accumulo seem to fit the eligibility criteria of
> CompositeInputFormat: both tables are sorted by the join key, since
> the join key is the row id in my case, and the tables are partitioned
> the same way (i.e., same split points).
>
> Has anyone tried using CompositeInputFormat over Accumulo tables? Is
> it possible to configure CompositeInputFormat with
> AccumuloInputFormat?
>

I haven't tried it.  If you do, let us know how it works out.

Billie


>
> Thanks,
> Ameet
>
>
> On Tue, Aug 21, 2012 at 8:23 AM, Keith Turner <[email protected]> wrote:
> > Yeah, that would certainly work.
> >
> > You could run two map-only jobs (they could run concurrently): one
> > that reads D1 and writes to Table3, and one that reads D2 and writes
> > to Table3.  A MapReduce join may be faster, unless you want the final
> > result in Accumulo, in which case this approach may be faster.  The
> > two jobs could also produce files to bulk import into Table3.
> >
> > Keith
> >
> > On Mon, Aug 20, 2012 at 8:26 PM, David Medinets
> > <[email protected]> wrote:
> >> Can you use a new table to join and then scan the new table? Use the
> >> foreign key as the rowid. Basically create your own materialized view.
>
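
For reference, the 1-1 join on row id that Ameet describes comes down to a
merge over two inputs that are already sorted by the join key. The sketch
below shows only that merge step in plain Java. It does not use
CompositeInputFormat or AccumuloInputFormat, and whether those two can be
wired together is still the open question in this thread. The class name
SortedMergeJoin, the Joined record, and the use of TreeMap as a stand-in
for a sorted table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical illustration: merge join of two inputs sorted by row id. */
public class SortedMergeJoin {

    /** One joined record: the shared row id plus the value from each input. */
    public static final class Joined {
        public final String rowId;
        public final String left;
        public final String right;

        Joined(String rowId, String left, String right) {
            this.rowId = rowId;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Walks both inputs once, in key order, and emits a record whenever the
     * row ids match. Assumes both maps use natural String ordering, which is
     * only a stand-in for the sort order of a real table.
     */
    public static List<Joined> join(SortedMap<String, String> tableA,
                                    SortedMap<String, String> tableB) {
        List<Joined> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> itA = tableA.entrySet().iterator();
        Iterator<Map.Entry<String, String>> itB = tableB.entrySet().iterator();
        Map.Entry<String, String> a = itA.hasNext() ? itA.next() : null;
        Map.Entry<String, String> b = itB.hasNext() ? itB.next() : null;

        while (a != null && b != null) {
            int cmp = a.getKey().compareTo(b.getKey());
            if (cmp == 0) {
                // Same row id on both sides: emit the joined record.
                out.add(new Joined(a.getKey(), a.getValue(), b.getValue()));
                a = itA.hasNext() ? itA.next() : null;
                b = itB.hasNext() ? itB.next() : null;
            } else if (cmp < 0) {
                // Row exists only in A: skip it and advance A.
                a = itA.hasNext() ? itA.next() : null;
            } else {
                // Row exists only in B: skip it and advance B.
                b = itB.hasNext() ? itB.next() : null;
            }
        }
        return out;
    }
}
```

Because each input is read once and in order, nothing has to be buffered
or shuffled, which is why a map-side join needs both tables sorted by the
join key and partitioned at the same split points.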
