On Wed, Oct 10, 2012 at 7:22 AM, ameet kini <[email protected]> wrote:
> I have a related problem where I need to do a 1-1 join (every row in
> table A joins with a unique row in table B and vice versa). My join
> key is the row id of the table. In the past, I've used Hadoop's
> CompositeInputFormat to do a map-side join over data in HDFS
> (described here
> http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/) My
> tables in Accumulo seem to fit the eligibility criteria of
> CompositeInputFormat: both tables are sorted by the join key, since
> the join key is the row id in my case, and the tables are partitioned
> the same way (i.e., same split points).
>
> Has anyone tried using CompositeInputFormat over Accumulo tables? Is
> it possible to configure CompositeInputFormat with
> AccumuloInputFormat?
>

I haven't tried it. If you do, let us know how it works out.

Billie

> Thanks,
> Ameet
>
>
> On Tue, Aug 21, 2012 at 8:23 AM, Keith Turner <[email protected]> wrote:
> > Yeah, that would certainly work.
> >
> > You could run two map only jobs (could run concurrently). A job that
> > reads D1 and writes to Table3 and a job that reads D2 and writes
> > Table3. Map reduce may be faster, unless you want the final result
> > in Accumulo in which case this may be faster. The two map reduce jobs
> > could also produce files to bulk import into table3.
> >
> > Keith
> >
> > On Mon, Aug 20, 2012 at 8:26 PM, David Medinets
> > <[email protected]> wrote:
> >> Can you use a new table to join and then scan the new table? Use the foreign
> >> key as the rowid. Basically create your own materialized view.
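
For anyone following the eligibility criteria mentioned in the thread (both inputs sorted by the join key and partitioned the same way), the underlying idea is a merge join. Below is a minimal sketch in plain Python. It does not use the Hadoop or Accumulo APIs, and the function name, row shape, and sample data are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape for illustration: (row id, value).
Row = Tuple[str, str]


def merge_join(a: Iterable[Row], b: Iterable[Row]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join two inputs that are each sorted ascending by a unique row id."""
    it_a, it_b = iter(a), iter(b)
    row_a: Optional[Row] = next(it_a, None)
    row_b: Optional[Row] = next(it_b, None)
    while row_a is not None and row_b is not None:
        if row_a[0] == row_b[0]:
            # Matching row ids: emit the joined row and advance both sides.
            yield (row_a[0], row_a[1], row_b[1])
            row_a, row_b = next(it_a, None), next(it_b, None)
        elif row_a[0] < row_b[0]:
            # Side A is behind; advance it until it catches up.
            row_a = next(it_a, None)
        else:
            # Side B is behind; advance it until it catches up.
            row_b = next(it_b, None)


# Sample data (made up): two tables sharing the same sorted row ids.
table_a = [("r1", "a1"), ("r2", "a2"), ("r3", "a3")]
table_b = [("r1", "b1"), ("r2", "b2"), ("r3", "b3")]
joined = list(merge_join(table_a, table_b))
```

Because each side is read once, in order, the join needs no shuffle or lookup. That only works if a given row id can be found in the same partition on both sides, which is presumably why identical split points matter for a map-side join.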
