[ 
https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589560#action_12589560
 ] 

Pi Song commented on PIG-199:
-----------------------------

Amir,

Just out of curiosity, how do you plan to implement Fragment and Replace Join?
Is it something like this?

For A ⋈ B:
In A:  Map (k1, v1) --> { ((a, 1), (k1, v1)), ((a, 2), (k1, v1)), ((a, 3), (k1, v1)),
       ..., ((a, M), (k1, v1)) }
       where a = GetPartitionA( (k1, v1) ) assigns the record to one of N partitions
In B:  Map (k1, v1) --> { ((1, b), (k1, v1)), ((2, b), (k1, v1)), ((3, b), (k1, v1)),
       ..., ((N, b), (k1, v1)) }
       where b = GetPartitionB( (k1, v1) ) assigns the record to one of M partitions

And then have N * M reduce buckets, each doing a local join?
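
To make the question concrete, here is a minimal in-memory sketch of the scheme as described above. It is not Pig code and not necessarily what the Fragment and Replace implementation will do. The names partition, map_a, map_b and grid_join are hypothetical, and partition is only a stand-in for GetPartitionA / GetPartitionB.

```python
import zlib
from collections import defaultdict


def partition(key, parts):
    # Hypothetical stand-in for GetPartitionA / GetPartitionB.
    # Returns a 1-based partition number that is stable across runs.
    return zlib.crc32(repr(key).encode()) % parts + 1


def map_a(record, n, m):
    # A record lands in one of N rows and is copied to all M columns.
    a = partition(record[0], n)
    return [((a, j), record) for j in range(1, m + 1)]


def map_b(record, n, m):
    # A record lands in one of M columns and is copied to all N rows.
    b = partition(record[0], m)
    return [((i, b), record) for i in range(1, n + 1)]


def grid_join(rel_a, rel_b, n, m, predicate):
    # Simulate the shuffle: group map output by its (row, column) bucket.
    buckets = defaultdict(lambda: ([], []))
    for rec in rel_a:
        for cell, value in map_a(rec, n, m):
            buckets[cell][0].append(value)
    for rec in rel_b:
        for cell, value in map_b(rec, n, m):
            buckets[cell][1].append(value)

    # Each bucket does a local nested-loop join with an arbitrary predicate.
    joined = []
    for a_side, b_side in buckets.values():
        for ra in a_side:
            for rb in b_side:
                if predicate(ra, rb):
                    joined.append((ra, rb))
    return joined
```

Every pair (record of A, record of B) meets in exactly one bucket, namely (a, b), so the local joins never produce duplicates and the predicate does not have to be an equality.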

If that is the case, the amount of shuffled data will be multiplied: each
record of A is emitted M times and each record of B is emitted N times.
Wouldn't performance be worse? Is this solely for the inequality join feature?
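
A rough way to quantify the concern, assuming the scheme really is the one sketched in the question (the function name and the example sizes are made up for illustration):

```python
def shuffled_records(size_a, size_b, n, m):
    # Each A record is emitted M times, each B record N times.
    return size_a * m + size_b * n


# Example with made-up sizes: two relations of one million records each
# on a 4 x 4 grid of reduce buckets.
grid = shuffled_records(1_000_000, 1_000_000, n=4, m=4)
# An ordinary repartition join would shuffle each record once.
plain = 1_000_000 + 1_000_000
ratio = grid / plain
```

With these numbers the grid scheme shuffles four times as many records as a join that sends each record to a single reducer.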

> New Join types in Pig
> ---------------------
>
>                 Key: PIG-199
>                 URL: https://issues.apache.org/jira/browse/PIG-199
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Amir Youssefi
>            Assignee: Amir Youssefi
>
> We need to design and implement new Join Types in Pig which can 
> potentially improve the performance for large data-sets. I will start with 
> Map Side Joins/Fragment and Replace.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
