Use three jobs: one to scan each table, and a third that could do a map-side join of the two outputs. Make sure the first two jobs use the same sort order and the same partitioning.
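
To make the third job a bit more concrete, here is a minimal sketch of the merge step that a map-side join relies on. It is plain Java with no Hadoop or HBase classes; MergeJoinSketch, Row, and join are made-up names for illustration only. It assumes each list is one matching partition from the two scan jobs, already sorted by row key, with no duplicate keys inside a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: not part of any Hadoop or HBase API.
final class MergeJoinSketch {

    // One record from a scan job's output: row key plus an opaque value.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inner-joins two lists that are sorted ascending by key.
    // Assumes keys are unique within each list (as row keys are within a table).
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++;            // key only on the left side, skip it
            } else if (cmp > 0) {
                j++;            // key only on the right side, skip it
            } else {
                // Same row key on both sides: emit one joined record.
                out.add(left.get(i).key + "\t" + left.get(i).value + "\t" + right.get(j).value);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one partition of each scan output.
        List<Row> tableA = Arrays.asList(new Row("row1", "a1"), new Row("row3", "a3"));
        List<Row> tableB = Arrays.asList(new Row("row1", "b1"), new Row("row2", "b2"), new Row("row3", "b3"));
        for (String joined : join(tableA, tableB)) {
            System.out.println(joined);
        }
    }
}
```

Since both inputs are sorted the same way, the join is a single forward pass with no gets. If the two scan jobs partitioned their output differently, matching keys could end up in different partitions and this kind of merge would miss them.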

Sent from iPhone.

On Aug 10, 2012, at 9:41 AM, Weishung Chung <[email protected]> wrote:

> but they are in production now
>
> On Fri, Aug 10, 2012 at 6:39 AM, Weishung Chung <[email protected]> wrote:
>
>> Thank you, I am trying to avoid to fetch by gets and would like to do
>> something like hadoop MultipleInputs.
>> Yes, it would be nice if i could denormalize and remodel the schema.
>>
>> On Fri, Aug 10, 2012 at 6:29 AM, Amandeep Khurana <[email protected]> wrote:
>>
>>> You can scan over one of the tables (using TableInputFormat) and do simple
>>> gets on the other table for every row that you want to join.
>>>
>>> An interesting question to address here would be - why even need a join.
>>> Can you talk more about the data and what you are trying to do? In general
>>> you really want to denormalize and not need joins when working with HBase
>>> (or for that matter most NoSQL stores).
>>>
>>> On Fri, Aug 10, 2012 at 6:52 PM, Weishung Chung <[email protected]> wrote:
>>>
>>>> Basically a join of two data sets on the same row key.
>>>>
>>>> On Fri, Aug 10, 2012 at 6:12 AM, Amandeep Khurana <[email protected]> wrote:
>>>>
>>>>> How do you want to use two tables? Can you explain your algo a bit?
>>>>>
>>>>> On Fri, Aug 10, 2012 at 6:40 PM, Weishung Chung <[email protected]> wrote:
>>>>>
>>>>>> Hi HBase users,
>>>>>>
>>>>>> I need to pull data from 2 HBase tables in a mapreduce job. For 1 table
>>>>>> input, I use TableMapReduceUtil.initTableMapperJob. Is there another method
>>>>>> for multitable inputs ?
>>>>>>
>>>>>> Thank you,
>>>>>> Wei Shung
