[Pig Wiki] Update of "JoinFramework" by OlgaN

Apache Wiki Wed, 14 Jan 2009 17:57:29 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.


The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  
  == Joins ==
  
- Currently, Pig running on top of Hadoop executes all joins in the same way. 
During the map stage, the data from each relation is annotated with the index 
of that relation. Then, the data is sorted and partitioned by the join key and 
provided to the reducer. This is similar to SQL's hash join. In the next 
generation Pig (currently on types branch), the data from the same relation is 
guaranteed to be continuous for the same key. This is to allow optimization 
that only keep N-1 relations in memory. (Unfortunately, we did not see the 
expected speedup when this optimization was tried - investigation is still in 
progress.)
+ Currently, Pig running on top of Hadoop executes all joins in the same way. 
During the map stage, the data from each relation is annotated with the index 
of that relation. Then, the data is sorted and partitioned by the join key and 
provided to the reducer. This is similar to SQL's hash join. The data from the 
same relation is guaranteed to be contiguous for the same key. This allows an 
optimization that keeps only N-1 of the N joined relations in memory.
  
  In some situations, more efficient join implementations can be constructed if 
more is known about the data of the relations. They are described in the 
following section.
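
As a rough illustration of the default join described above, here is a minimal in-process sketch in Python. It is not Pig's or Hadoop's actual code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `join`) and the `num_reducers` parameter are hypothetical, and the shuffle is simulated with an in-memory dictionary. It assumes each relation is a list of `(key, value)` pairs and that sorting by (join key, relation index) is what makes each relation's records contiguous per key, so the reducer can buffer the first N-1 relations and stream the last one.

```python
from collections import defaultdict
from itertools import groupby, product


def map_phase(relations):
    """Annotate every record with the index of the relation it came from."""
    for index, records in enumerate(relations):
        for key, value in records:
            yield key, index, value


def shuffle(tagged, num_reducers):
    """Partition by join key, then sort each partition by (key, relation index).

    This stands in for the Hadoop shuffle; it is a simulation, not the real thing.
    """
    partitions = defaultdict(list)
    for key, index, value in tagged:
        partitions[hash(key) % num_reducers].append((key, index, value))
    for part in partitions.values():
        # Sorting on the index as well keeps each relation contiguous per key.
        part.sort(key=lambda rec: (rec[0], rec[1]))
    return partitions


def reduce_phase(partition, num_relations):
    """Inner-join one partition, buffering only the first N-1 relations per key."""
    for key, group in groupby(partition, key=lambda rec: rec[0]):
        buffered = [[] for _ in range(num_relations - 1)]
        for _, index, value in group:
            if index < num_relations - 1:
                buffered[index].append(value)
            else:
                # Records of the last relation arrive last and are streamed.
                for combo in product(*buffered):
                    yield (key, *combo, value)


def join(relations, num_reducers=2):
    """Run the simulated map, shuffle, and reduce stages and collect the output."""
    partitions = shuffle(map_phase(relations), num_reducers)
    joined = []
    for part in partitions.values():
        joined.extend(reduce_phase(part, len(relations)))
    return joined


users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]
for row in sorted(join([users, orders])):
    print(row)
```

Keys that appear in only one relation produce no output, so the sketch behaves as an inner join. The order of rows across partitions is not defined, which is why the usage example sorts the result before printing.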
