[ https://issues.apache.org/jira/browse/PIG-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shravan Matthur Narayanamurthy updated PIG-554:
-----------------------------------------------

    Attachment: frjofflat1.patch

> Fragment Replicate Join
> -----------------------
>
>                 Key: PIG-554
>                 URL: https://issues.apache.org/jira/browse/PIG-554
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: types_branch
>            Reporter: Shravan Matthur Narayanamurthy
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>         Attachments: frjofflat.patch, frjofflat1.patch
>
>
> Fragment Replicate Join (FRJ) is useful when we want to join a huge table 
> with a very small table (small enough to fit in memory) and the join doesn't 
> expand the data by much. The idea is to distribute the processing of the huge 
> file by fragmenting it and replicating the small file to all machines 
> receiving a fragment of the huge file. Because the entire small file is 
> available on every machine, the join becomes a trivial task that needs no 
> break in the pipeline. Exhaustive tests have been done to determine the 
> improvement we get out of FRJ. Here are the details: http://wiki.apache.org/pig/PigFRJoin
> The patch makes changes to parts of the code where new operators are 
> introduced. Currently, when a new operator is introduced, its alias is not 
> set. For schema computation I have modified this behaviour to set the alias 
> of the new operator to that of its predecessor. The logical side of the patch 
> mimics the cogroup behavior, as the join syntax closely resembles that of 
> cogroup. Currently, this patch supports only inner joins. 
> The rest of the code has been documented.
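
The fragment-and-replicate idea described above can be illustrated with a minimal sketch. This is not the code in the attached patch: it is a single-process Python illustration, and the function names, the tuple-based rows, and the round-robin fragmenting are all assumptions made for the example. Each fragment of the large input is joined against an in-memory hash index built from the small input, so no fragment needs data from any other fragment.

```python
from collections import defaultdict


def build_replicated_index(small_rows, key_index):
    """Build the in-memory hash index over the small (replicated) input."""
    index = defaultdict(list)
    for row in small_rows:
        index[row[key_index]].append(row)
    return index


def join_fragment(fragment, small_index, key_index):
    """Inner-join one fragment of the large input against the replicated index."""
    for row in fragment:
        # .get() avoids inserting empty entries for keys with no match.
        for match in small_index.get(row[key_index], ()):
            yield row + match


def fragment_replicate_join(large_rows, small_rows, large_key, small_key,
                            num_fragments):
    """Hypothetical driver: split the large input, join each piece on its own."""
    # Round-robin fragmenting is an assumption; a real system would use
    # input splits of the underlying files.
    fragments = [large_rows[i::num_fragments] for i in range(num_fragments)]
    joined = []
    for fragment in fragments:
        # Each "machine" gets its own full copy of the small input.
        small_index = build_replicated_index(small_rows, small_key)
        joined.extend(join_fragment(fragment, small_index, large_key))
    return joined


if __name__ == "__main__":
    large = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
    small = [(1, "x"), (2, "y")]
    for row in fragment_replicate_join(large, small, 0, 0, num_fragments=2):
        print(row)
```

Rows whose key is absent from the small input (key 3 in the sample data) are dropped, which matches the inner-join-only behaviour the description mentions.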

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
