Re: replicated join vs regular ?

Alex Rovner Fri, 27 Jan 2012 15:22:00 -0800

From what I understand, replicated should not be used with a full outer join, 
since full outer means that records from both tables appear in the output 
regardless of whether they have a match in the other table. In your case you 
only care about keeping every session, which is a left join and not a full outer.


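The reason is the semantics of the join in Pig and Hadoop: the "small" table is 
loaded into each mapper, and each mapper sees only its own part of the large 
table. A mapper therefore cannot tell whether a small-table record is unmatched 
overall, so small-table records are not meant to appear in the output on their 
own. 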
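
To make that concrete, here is a rough sketch in plain Python of what each 
mapper does in a replicated join. It is only a toy model, not Pig's actual 
implementation; the function names, the two-way split and the sample data are 
all made up for illustration.

```python
# Toy model of a map-side (replicated) join. Names and data are hypothetical.

def replicated_left_join(big_split, small_table):
    """Join ONE mapper's slice of the large relation against the whole
    small relation, which is held in memory."""
    # Build an in-memory lookup: join key -> list of small-table values.
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)

    out = []
    for key, value in big_split:
        matches = lookup.get(key)
        if matches:
            for m in matches:
                out.append((key, value, m))
        else:
            # LEFT OUTER: an unmatched large-table row is padded with None.
            # The mapper can decide this from its own data alone.
            out.append((key, value, None))
    return out


def unmatched_small_keys(big_split, small_table):
    """What a single mapper would believe is unmatched on the small side."""
    seen = {key for key, _ in big_split}
    return {key for key, _ in small_table if key not in seen}


# (locid, payload) pairs; session s3 has no location, location 3 has no session.
sessions = [(1, "s1"), (2, "s2"), (9, "s3"), (1, "s4")]
locations = [(1, "Paris"), (2, "Lyon"), (3, "Nice")]

# Two mappers: each gets half of sessions but ALL of locations.
splits = [sessions[:2], sessions[2:]]

# Left outer join: simply concatenating the mappers' outputs is correct.
result = [row for s in splits for row in replicated_left_join(s, locations)]

# Full/right outer would need the unmatched locations, but the mappers disagree:
# the second mapper never saw a session with locid 2, and both would report 3.
per_mapper = [unmatched_small_keys(s, locations) for s in splits]
```

In this sketch the left join keeps session s3 with an empty location without 
any coordination between mappers, while the per-mapper view of "unmatched" 
locations is wrong for one key and duplicated for another. As far as I know, 
this is why Pig allows 'replicated' only for inner and left outer joins.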

Alex


On Jan 27, 2012, at 8:15 AM, Vincent Barat <[email protected]> wrote:

> Hi folks,
> 
> I use replicated joins, and recently I ran into an issue: my rightmost 
> relation seems to have become too big and, even though I don't get any "Java 
> heap space" error, the time the maps take to finish becomes extremely long (I 
> cannot figure out exactly why).
> 
> Removing "replicated" fixes the issue, but it raises several questions.
> 
> In Alan's book, "Figure 8.1. Choosing a Join Implementation" says that 
> replicated joins should NOT BE USED for outer joins.
> 
> Nevertheless, it seems to work in the following case, and is faster than a 
> regular join. So why?
> 
> sessions = JOIN sessions BY locid LEFT, locations BY locid USING 'replicated';
> 
> (not all sessions have a location in this case)
> 
> Thanks for your advice.
> 
> 
> 
>
