Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  == Joins ==

- Currently, Pig running on top of Hadoop executes all joins in the same way. During the map stage, the data from each relation is annotated with the index of that relation. Then, the data is sorted and partitioned by the join key and provided to the reducer. This is similar to SQL's hash join. In the next generation Pig (currently on the types branch), the data from the same relation is guaranteed to be contiguous for the same key. This allows an optimization that keeps only N-1 relations in memory. (Unfortunately, we did not see the expected speedup when this optimization was tried - investigation is still in progress.)
+ Currently, Pig running on top of Hadoop executes all joins in the same way. During the map stage, the data from each relation is annotated with the index of that relation. Then, the data is sorted and partitioned by the join key and provided to the reducer. This is similar to SQL's hash join. The data from the same relation is guaranteed to be contiguous for the same key. This allows an optimization that keeps only N-1 relations in memory. In some situations, more efficient join implementations can be constructed if more is known about the data of the relations. They are described in this section.
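
The join described above can be illustrated with a small in-memory sketch. This is not Pig's implementation: the function names (`map_stage`, `shuffle`, `reduce_stage`) and the sample relations are hypothetical, and a plain Python sort stands in for Hadoop's sort-and-partition step. It assumes an inner join and assumes that, within each key, records arrive ordered by relation index, so the reducer can buffer the first N-1 relations and stream the last one.

```python
from itertools import groupby, product
from operator import itemgetter


def map_stage(relations, key_of):
    """Annotate every record with its join key and the index of its relation."""
    for index, relation in enumerate(relations):
        for record in relation:
            yield key_of(record), index, record


def shuffle(tagged):
    """Stand-in for Hadoop's sort/partition step (an assumption of this sketch).

    Sorting by (join key, relation index) groups records by key and keeps
    records of the same relation contiguous within each key.
    """
    return sorted(tagged, key=itemgetter(0, 1))


def reduce_stage(sorted_tagged, num_relations):
    """Inner join that buffers N-1 relations per key and streams the last one."""
    for _key, group in groupby(sorted_tagged, key=itemgetter(0)):
        buffered = [[] for _ in range(num_relations - 1)]
        for _, index, record in group:
            if index < num_relations - 1:
                # Records of the first N-1 relations are held in memory.
                buffered[index].append(record)
            else:
                # Records of the last relation are combined and emitted
                # immediately, so they never need to be buffered.
                for combo in product(*buffered):
                    yield combo + (record,)


# Hypothetical sample data: (join key, payload) pairs.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

tagged = map_stage([users, orders], key_of=itemgetter(0))
joined = list(reduce_stage(shuffle(tagged), num_relations=2))
```

With this data, only key 1 appears in both relations, so `joined` holds two rows, each pairing the `alice` record with one of her orders. Because the last relation is streamed rather than stored, placing the largest relation last keeps the reducer's memory use lowest in this sketch.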
