[ https://issues.apache.org/jira/browse/PIG-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661805#action_12661805 ]
Alan Gates commented on PIG-554:
--------------------------------

Patch v4 checked in. Thanks Shravan for all your work on this. Initial tests show speedups in the 2-4x range. This is huge.

> Fragment Replicate Join
> -----------------------
>
>                 Key: PIG-554
>                 URL: https://issues.apache.org/jira/browse/PIG-554
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: types_branch
>            Reporter: Shravan Matthur Narayanamurthy
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>         Attachments: frjofflat.patch, frjofflat1.patch, PIG-554-v3.patch,
> PIG-554-v4.patch
>
>
> Fragment Replicate Join (FRJ) is useful when we want a join between a huge
> table and a very small table (small enough to fit in memory) and the join
> doesn't expand the data by much. The idea is to distribute the processing of
> the huge file by fragmenting it and replicating the small file to all
> machines receiving a fragment of the huge file. Because the entire small
> file is available on each machine, the join becomes a trivial task that
> needs no break in the pipeline.
>
> Exhaustive tests have been done to determine the improvement we get out of
> FRJ. Here are the details: http://wiki.apache.org/pig/PigFRJoin
>
> The patch makes changes to parts of the code where new operators are
> introduced. Currently, when a new operator is introduced, its alias is not
> set. For schema computation I have modified this behaviour to set the alias
> of the new operator to that of its predecessor. The logical side of the
> patch mimics the cogroup behavior, as join syntax closely resembles that of
> cogroup. Currently, this patch supports inner joins only.
>
> The rest of the code has been documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
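
For readers unfamiliar with the technique, the fragment-replicate idea in the issue description can be sketched roughly as follows. This is a minimal, single-process Python illustration and not the code in the patch: the function names (`build_replicated_index`, `join_fragment`, `fragment_replicate_join`) and the round-robin fragmenting are assumptions made for this example. Each "fragment" stands in for the slice of the huge file that one machine would receive, and the in-memory index stands in for the replicated small file. Like the patch, it covers inner joins only.

```python
from collections import defaultdict


def build_replicated_index(small_rows, key_index):
    """Hash the small relation by join key; this is the part that would be
    replicated to every machine (hypothetical helper for illustration)."""
    index = defaultdict(list)
    for row in small_rows:
        index[row[key_index]].append(row)
    return index


def join_fragment(fragment, index, key_index):
    """Inner-join one fragment of the large relation against the in-memory
    index. Rows without a match are dropped (inner join only)."""
    for row in fragment:
        for match in index.get(row[key_index], ()):
            yield row + match


def fragment_replicate_join(large_rows, small_rows, large_key, small_key,
                            num_fragments):
    """Split the large relation into fragments and join each one
    independently against the same replicated index."""
    index = build_replicated_index(small_rows, small_key)
    # Round-robin fragmenting is an arbitrary choice for this sketch.
    fragments = [large_rows[i::num_fragments] for i in range(num_fragments)]
    joined = []
    for fragment in fragments:
        joined.extend(join_fragment(fragment, index, large_key))
    return joined


if __name__ == "__main__":
    large = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
    small = [(1, "x"), (2, "y")]
    for row in fragment_replicate_join(large, small, 0, 0, num_fragments=2):
        print(row)
```

Because every fragment sees the whole small relation, no fragment needs data from any other, which is why the join needs no shuffle or other break in the pipeline.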