CoGroup is your best bet for joining multiple tables. It is also handy if you expect a lot of values from one table for the same key and don't want to blow up your collection size. The Collections are simply all the values from each table that matched the given key.
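To make the shape of the result concrete, here is a minimal plain-Java sketch of cogroup semantics. It deliberately does not use the Crunch API: the class `CogroupSketch`, its `cogroup` method, and the sample tables are all hypothetical, and each "table" is modeled as an in-memory list of key/value pairs. For every key it produces one collection per input table, and a table with no match for that key contributes an empty collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Apache Crunch.
class CogroupSketch {

    // Groups any number of key/value tables by key in a single pass.
    // Result: key -> one collection of values per input table, in table order.
    static Map<String, List<Collection<String>>> cogroup(
            List<List<Map.Entry<String, String>>> tables) {
        Map<String, List<Collection<String>>> out = new TreeMap<>();
        for (int i = 0; i < tables.size(); i++) {
            for (Map.Entry<String, String> kv : tables.get(i)) {
                // First time a key is seen, create one empty slot per table.
                List<Collection<String>> slots = out.computeIfAbsent(kv.getKey(), k -> {
                    List<Collection<String>> s = new ArrayList<>();
                    for (int j = 0; j < tables.size(); j++) {
                        s.add(new ArrayList<>());
                    }
                    return s;
                });
                // Append the value to the slot belonging to table i.
                slots.get(i).add(kv.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> users =
                List.of(Map.entry("u1", "alice"), Map.entry("u2", "bob"));
        List<Map.Entry<String, String>> orders =
                List.of(Map.entry("u1", "o1"), Map.entry("u1", "o2"));
        List<Map.Entry<String, String>> clicks =
                List.of(Map.entry("u2", "c1"));

        // Three tables grouped by key in one pass.
        System.out.println(cogroup(List.of(users, orders, clicks)));
    }
}
```

Note that the values stay grouped per table rather than being expanded into one row per combination, which is why the output does not grow when a key has many values. Since `java.util.Collection` extends `Iterable`, each group can be traversed with a normal for-each loop.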
On Thu, Aug 30, 2018 at 2:33 AM Suyash Agarwal <[email protected]> wrote:

> Hi,
>
> Is there a way to join more than two PTables in a single MR job in Apache
> Crunch?
> I am unable to find an API which does that. And, using multiple Join
> Strategies to have two join statements results in different MR jobs.
> Cogroup API seems to take arbitrary PTables but I am not sure if that is
> the way to go since they result in collection<> of the values of the joined
> tables. I am not sure how these collections are different from iterables.
>
> Thanks.
>
