Re: Replicated Join and OOM errors

Aniket Mokashi Sun, 21 Jul 2013 12:30:27 -0700

Pig does not currently have a way to do this. Development of a feature
like this is tracked at https://issues.apache.org/jira/browse/PIG-2784.
Feel free to add a subtask and take a stab at it.
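
In the meantime, the size check asked about in the original question could
live in a driver script outside Pig. The following is only a minimal sketch
under stated assumptions: the function names, the relation aliases (big,
small), the join key name (key), and the safety factor are all hypothetical,
and how the byte counts are obtained is left open. It picks between a
replicated join and a regular join from an estimated in-memory size.

```python
def choose_join_hint(small_side_bytes, memory_budget_bytes, safety_factor=4):
    """Return the USING clause for a replicated join, or '' for a regular join.

    safety_factor is a hypothetical multiplier: the in-memory footprint of the
    small relation may be larger than its on-disk size, so the on-disk size is
    scaled up before it is compared against the memory budget.
    """
    if small_side_bytes < 0 or memory_budget_bytes <= 0 or safety_factor <= 0:
        raise ValueError("sizes and safety factor must be positive")
    estimated_in_memory = small_side_bytes * safety_factor
    if estimated_in_memory <= memory_budget_bytes:
        return "USING 'replicated'"
    return ""


def render_join(hint):
    """Build the Pig JOIN statement; aliases and key name are placeholders."""
    return ("joined = JOIN big BY key, small BY key " + hint).rstrip() + ";"


if __name__ == "__main__":
    mb = 2 ** 20
    # Small side comfortably below the budget: replicated join.
    print(render_join(choose_join_hint(10 * mb, 512 * mb)))
    # Small side too large once scaled: regular join.
    print(render_join(choose_join_hint(200 * mb, 512 * mb)))
```

The resulting clause could then be handed to the script through Pig's
parameter substitution. This does not catch an out-of-memory error at run
time; it only avoids the replicated join when the estimate says it is risky.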


~Aniket


On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu <
[email protected]> wrote:

> You can always split your tables so that the same keys end up in the same
> splits. Then do a replicated join on the corresponding splits and take the
> union.
>
> On Jul 19, 2013, at 12:26 PM, Arun Ahuja <[email protected]> wrote:
>
> > I have been using a replicated join to join a very large set of data with
> > another one that is about 1000x smaller. I have generally seen large
> > performance gains.
> >
> > However, they do scale together, so that now, even though the RHS table is
> > still 1000x smaller, it is too large to fit into memory. This will happen
> > on only every 20th or so dataset that the join is performed on, but I'd
> > like to have something robust built to handle this.
> >
> > Is there any way to set up the replicated join to fall back to a regular
> > join only on memory issues? Or any type of conditional I could set to check
> > the dataset size first? I am willing to even dig into the Pig code and
> > implement this if anyone has ideas.
> >
> > Thanks
> >
> > Arun
>
>
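
The split-then-join workaround quoted above can be illustrated outside Pig.
This is a sketch, not Pig code: it assumes rows are (key, value) tuples with
integer keys, and all function names are made up for the illustration. Both
relations are bucketed by the same function of the key, each pair of buckets
is joined with the small bucket held in memory (as a replicated join would
do), and the per-bucket results are concatenated.

```python
from collections import defaultdict


def bucket_of(key, n_buckets):
    # Same function on both sides, so equal keys always share a bucket.
    # Assumes integer keys.
    return key % n_buckets


def partition(rows, n_buckets):
    """Split (key, value) rows into n_buckets lists by key."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], n_buckets)].append(row)
    return buckets


def replicated_join(big_rows, small_rows):
    """Inner join with the small side loaded into an in-memory hash table."""
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)
    joined = []
    for key, big_value in big_rows:
        for small_value in lookup.get(key, []):
            joined.append((key, big_value, small_value))
    return joined


def split_join_union(big_rows, small_rows, n_buckets):
    """Join bucket by bucket, then take the union of the partial results."""
    big_parts = partition(big_rows, n_buckets)
    small_parts = partition(small_rows, n_buckets)
    result = []
    for big_part, small_part in zip(big_parts, small_parts):
        # Only one small bucket needs to fit in memory at a time.
        result.extend(replicated_join(big_part, small_part))
    return result


if __name__ == "__main__":
    big = [(1, "a"), (2, "b"), (3, "c"), (2, "d"), (5, "e")]
    small = [(2, "x"), (3, "y"), (4, "z")]
    print(sorted(split_join_union(big, small, 3)))
```

With n buckets, each in-memory table holds roughly 1/n of the small relation,
provided the keys spread evenly across buckets. A skewed key distribution
would leave some buckets larger than others.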


-- 
"...:::Aniket:::... Quetzalco@tl"
