[Pig Wiki] Update of "JoinFramework" by OlgaN

2009-01-14 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

--
  
  == Joins ==
  
- Currently, Pig running on top of Hadoop executes all joins in the same way. 
During the map stage, the data from each relation is annotated with the index 
of that relation. Then, the data is sorted and partitioned by the join key and 
provided to the reducer. This is similar to SQL's hash join. In the next 
generation Pig (currently on types branch), the data from the same relation is 
guaranteed to be continuous for the same key. This is to allow optimization 
that only keep N-1 relations in memory. (Unfortunately, we did not see the 
expected speedup when this optimization was tried - investigation is still in 
progress.)
+ Currently, Pig running on top of Hadoop executes all joins in the same way. 
During the map stage, the data from each relation is annotated with the index 
of that relation. Then, the data is sorted and partitioned by the join key and 
provided to the reducer. This is similar to SQL's hash join. The data from the 
same relation is guaranteed to be contiguous for the same key. This allows an 
optimization that keeps only N-1 relations in memory.
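
The default join strategy described in this change can be illustrated with a small in-memory sketch. It is not Pig's implementation: the function names (`map_phase`, `reduce_phase`, `reduce_side_join`) are hypothetical, and a plain sort stands in for Hadoop's shuffle. It only shows why contiguous per-relation data lets the reducer buffer N-1 relations and stream the last one.

```python
from itertools import groupby, product
from operator import itemgetter


def map_phase(relations, key_fns):
    """Annotate every record with its join key and the index of its relation."""
    for index, (records, key_fn) in enumerate(zip(relations, key_fns)):
        for record in records:
            yield key_fn(record), index, record


def reduce_phase(tagged, num_relations):
    """Inner-join records sharing a key, buffering only the first N-1 relations."""
    # Sorting by (key, relation index) stands in for the shuffle: within a
    # key, all records of one relation arrive contiguously, in index order.
    ordered = sorted(tagged, key=itemgetter(0, 1))
    for _key, group in groupby(ordered, key=itemgetter(0)):
        buffered = [[] for _ in range(num_relations - 1)]
        for _, index, record in group:
            if index < num_relations - 1:
                buffered[index].append(record)
            else:
                # Records of the last relation are streamed, never stored.
                for combo in product(*buffered):
                    yield combo + (record,)


def reduce_side_join(relations, key_fns):
    """Run both phases over in-memory lists (illustration only)."""
    return list(reduce_phase(map_phase(relations, key_fns), len(relations)))
```

For example, joining a hypothetical `users` list with a `clicks` list on the first field keeps only the matching `users` rows in memory per key while the `clicks` rows are streamed past them.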
  
  In some situations, more efficient join implementations can be constructed if 
more is known about the data of the relations. They are described in the 
sections below.
  


[Pig Wiki] Update of "JoinFramework" by OlgaN

2008-12-03 Thread Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

--
  
  Note that CROSS can take advantage of this approach as well.
  
- Details of the current implementation and performance measurements can be 
found in PigFRJoin.
+ Details of the current implementation and performance measurements can be 
found in http://wiki.apache.org/pig/PigFRJoin.
  
  === Indexed Join (IJ) ===
  


[Pig Wiki] Update of "JoinFramework" by OlgaN

2008-12-03 Thread Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

--
  The data coming out of the join is not guaranteed to be sorted on the join 
key, which could cause problems for queries that follow the join with a `GROUP` 
or `ORDER BY` on a prefix of the join key. This should be taken into account 
when choosing the join type.
  
  Note that CROSS can take advantage of this approach as well.
+ 
+ Details of the current implementation and performance measurements can be 
found in PigFRJoin.
  
  === Indexed Join (IJ) ===