Fragment Replicate Join
-----------------------
Key: PIG-554
URL: https://issues.apache.org/jira/browse/PIG-554
Project: Pig
Issue Type: New Feature
Affects Versions: types_branch
Reporter: Shravan Matthur Narayanamurthy
Fix For: types_branch
Fragment Replicate Join (FRJ) is useful when we want a join between a huge
table and a very small table (small enough to fit in memory) and the join
doesn't expand the data by much. The idea is to distribute the processing of
the huge file by fragmenting it and replicating the small file to all machines
receiving a fragment of the huge file. Because the entire small file is
available on each machine, the join becomes a trivial task without needing any
break in the pipeline. Exhaustive tests have been done to determine the
improvement we get out of FRJ. I will post the details in a wiki and add a link
here.
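To make the idea concrete, here is a minimal single-process sketch in Python.
It is not the code from the patch. The function names, the tuple layout and the
sample data are all hypothetical, and the "machines" are simulated by looping
over a list of fragments. The small table is loaded into an in-memory hash
index once, and each fragment of the huge table is then streamed past it.

```python
from collections import defaultdict


def build_replicated_index(small_rows, key_index):
    """Load the small (replicated) table into an in-memory hash index."""
    index = defaultdict(list)
    for row in small_rows:
        index[row[key_index]].append(row)
    return index


def join_fragment(fragment_rows, key_index, index):
    """Inner-join one fragment of the huge table against the replicated index.

    Rows are streamed one at a time, so the fragment is never held in memory
    and no sort or shuffle step is needed.
    """
    for row in fragment_rows:
        # .get() avoids inserting empty entries into the defaultdict
        for match in index.get(row[key_index], ()):
            yield row + match


# Hypothetical data: the small table is (id, country); the huge table is
# (name, id) and has been split into two fragments.
small = [(1, "us"), (2, "de")]
fragments = [
    [("a", 1), ("b", 3)],
    [("c", 2), ("d", 1)],
]

# In a real deployment every machine would build its own copy of the index
# from the replicated small file; here one copy is shared for brevity.
index = build_replicated_index(small, key_index=0)
joined = [out for frag in fragments for out in join_fragment(frag, 1, index)]
```

Each fragment is processed independently of the others, which is why the small
table has to be present wherever a fragment is processed. Rows with no match,
such as ("b", 3) above, are dropped, which corresponds to an inner join.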
The patch makes changes to parts of the code where new operators are
introduced. Currently, when a new operator is introduced, its alias is not set.
For schema computation I have modified this behavior to set the alias of the
new operator to that of its predecessor. The logical side of the patch mimics
the cogroup behavior, because the join syntax closely resembles that of cogroup.
Currently, this patch supports only inner joins.
The rest of the code has been documented.