Pig does not currently have a way to do this. Development of a feature like this is tracked at https://issues.apache.org/jira/browse/PIG-2784. Feel free to add a subtask and take a stab at it.
~Aniket

On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu <[email protected]> wrote:

> You can always split your tables such that the same keys end up in the
> same splits. Then you do a replicated join on the corresponding splits
> and take the union.
>
> On Jul 19, 2013, at 12:26 PM, Arun Ahuja <[email protected]> wrote:
>
> > I have been using a replicated join to join a very large set of data
> > with another one that is about 1000x smaller. I have generally seen
> > large performance gains.
> >
> > However, they do scale together, so that now even though the RHS table
> > is still 1000x smaller, it is too large to fit into memory. This will
> > happen on only every 20th or so dataset that the join is performed on,
> > but I'd like to have something robust built to handle this.
> >
> > Is there any way to set up the replicated join to fall back to a
> > regular join only on memory issues? Or any type of conditional I could
> > set to check the dataset size first? I am willing to even dig into the
> > Pig code and implement this if anyone has ideas.
> >
> > Thanks
> >
> > Arun

--
"...:::Aniket:::... Quetzalco@tl"
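
For reference, here is a minimal sketch of the split, join, and union workaround suggested in the quoted reply. It is plain Python rather than Pig Latin, and is only meant to show why the approach gives the same result as a single join. All function names (`bucket_of`, `partition`, `replicated_join`, `split_join_union`) are hypothetical, and integer join keys are assumed so that a simple modulo can assign buckets. In Pig, the per-bucket step would be a `JOIN ... USING 'replicated'` on each pair of splits.

```python
from collections import defaultdict

def bucket_of(key, num_buckets):
    # Assumption: integer keys, so modulo gives a stable bucket number.
    return key % num_buckets

def partition(rows, num_buckets):
    # Route each (key, value) row to the bucket chosen by its key, so rows
    # with equal keys always land in the same bucket on both sides.
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets

def replicated_join(large_rows, small_rows):
    # Hold only the small side in memory and stream the large side past it.
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return [(key, large_value, small_value)
            for key, large_value in large_rows
            for small_value in table.get(key, [])]

def split_join_union(large_rows, small_rows, num_buckets):
    # Each bucket of the small side is roughly 1/num_buckets of its size,
    # so it can fit in memory even when the whole small table cannot.
    large_parts = partition(large_rows, num_buckets)
    small_parts = partition(small_rows, num_buckets)
    result = []
    for large_part, small_part in zip(large_parts, small_parts):
        result.extend(replicated_join(large_part, small_part))
    return result
```

Because matching keys never cross buckets, the union of the per-bucket joins contains exactly the rows of the full join. The number of buckets would be chosen so that the largest small-side bucket fits in memory.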
