The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework
--
== Joins ==
- Currently, Pig running on top of Hadoop executes all joins in the same way.
During the map stage, the data from each relation is annotated with the index
of that relation. Then, the data is sorted and partitioned by the join key and
provided to the reducer. This is similar to SQL's hash join. In the next
generation Pig (currently on types branch), the data from the same relation is
guaranteed to be continuous for the same key. This is to allow optimization
that only keep N-1 relations in memory. (Unfortunately, we did not see the
expected speedup when this optimization was tried - investigation is still in
progress.)
+ Currently, Pig running on top of Hadoop executes all joins in the same way.
During the map stage, the data from each relation is annotated with the index
of that relation. Then, the data is sorted and partitioned by the join key and
provided to the reducer. This is similar to SQL's hash join. For a given key,
the data from the same relation is guaranteed to be contiguous. This allows an
optimization that keeps only N-1 relations in memory.
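The join described above can be illustrated with a minimal single-process sketch in Python. This is not Pig's implementation: the function names (`map_stage`, `shuffle`, `reduce_stage`) and the sample data are hypothetical, partitioning across reducers is omitted, and an inner join is assumed. The sketch only shows how tagging each tuple with its relation index and sorting by (key, index) lets the reducer buffer N-1 relations and stream the last one.

```python
from itertools import groupby, product


def map_stage(relations):
    """Annotate every row with its join key and the index of its relation."""
    for index, (rows, key_fn) in enumerate(relations):
        for row in rows:
            yield key_fn(row), index, row


def shuffle(tagged):
    """Sort by (key, relation index).

    Within one key, all rows of the same relation end up contiguous,
    and relations appear in index order.
    """
    return sorted(tagged, key=lambda t: (t[0], t[1]))


def reduce_stage(sorted_tagged, num_relations):
    """Inner join: buffer the first N-1 relations, stream the last one."""
    for _key, group in groupby(sorted_tagged, key=lambda t: t[0]):
        buffered = [[] for _ in range(num_relations - 1)]
        for _, index, row in group:
            if index < num_relations - 1:
                # Rows of the first N-1 relations are held in memory.
                buffered[index].append(row)
            else:
                # Rows of the last relation are never buffered; each one is
                # combined with the buffered rows as soon as it is read.
                for combo in product(*buffered):
                    yield combo + (row,)


# Hypothetical sample data: two relations joined on their first field.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "ink")]
relations = [(users, lambda r: r[0]), (orders, lambda r: r[0])]

joined = list(reduce_stage(shuffle(map_stage(relations)), len(relations)))
```

Here `joined` contains one output tuple per matching pair of rows for key 1. Keys 2 and 3 produce nothing because each appears in only one relation.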
In some situations, more efficient join implementations can be constructed if
more is known about the data in the relations. These implementations are
described in this section.