Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  In local mode !LOJoin should not be translated to !POMergeJoin, even when the 
user requests a sort merge join.  We do not need to implement a version of this 
join that does not require the sampling.
  
  == Outer Join ==
- This design will work for inner joins, and with slight modifications for left 
outer joins.  It will not work for right outer or full outer joins.  If we wish 
to extend it to work for those cases at some point in the future, it will have 
to be modified to also sample the left input.  The reason for this is that in 
the current implementation !POMergeJoin does not know how far past the end of 
its input to keep accepting non-matching keys on the right side.  It will need 
to know what key the next block of the left input starts on in order to 
determine when it should stop reading keys from the right input.  A sampling 
pass on the left input that reads the first key of each block could provide 
this information. (Is the intent that each map task will at the end of its 
input continue reading keys from the right side till the first key in the next 
block and perform the outer join - for the outer join for the first key in the 
next block onwards the map task corresponding to that block
will handle the processing. The extra corner case is the for the first key 
on the left input the outer join for the all the right keys less than that key 
will need to be done by the map task processing the first key (the first key 
would be the first entry in the index for the left side)).
+ This design will work for inner joins, and with slight modifications for left 
outer joins.  It will not work for right outer or full outer joins.  If we wish 
to extend it to work for those cases at some point in the future, it will have 
to be modified to also sample the left input.  The reason for this is that in 
the current implementation !POMergeJoin does not know how far past the end of 
its input to keep accepting non-matching keys on the right side.  It will need 
to know what key the next block of the left input starts on in order to 
determine when it should stop reading keys from the right input.  A sampling 
pass on the left input that reads the first key of each block could provide 
this information. (Is the intent that each map task will at the end of its 
input continue reading keys from the right side till the first key in the next 
block and perform the outer join - for the outer join for the first key in the 
next block onwards the map task corresponding to that block will handle the 
processing. The extra corner case is that for the first key on the left input 
the outer join for all the right keys less than that key 
will need to be done by the map task processing the first key (the first key 
would be the first entry in the index for the left side)
+ Perhaps a figure might help illustrate:
  
+  Left input   Right input
+ 
+ || 25 ||      || 10 ||
+ || .. ||      || .. ||
+ || 35 ||      || 24 ||
+               || 25 || 
+               || .. ||
+ || 45 ||      || 35 || 
+ || .. ||      || .. ||
+ || 65 ||      || 44 ||
+               || 45 ||
+               || .. ||
+  
  In current implementation (r806281) only inner joins are supported.
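
  The partitioning of right-side keys discussed above can be sketched as follows. This is only an illustration and is not taken from the Pig code base: the function name right_key_ranges and the plain Python lists standing in for the left-side index and the right input are assumptions made for the example. It assumes both inputs are sorted and that a given key does not straddle two left blocks.

```python
from bisect import bisect_left


def right_key_ranges(left_block_first_keys, right_keys):
    """Assign every right-side key to exactly one left block (map task).

    left_block_first_keys: first key of each left block, in sorted order
        (a stand-in for a sampled index of the left input).
    right_keys: all right-side keys, in sorted order.
    """
    ranges = []
    n = len(left_block_first_keys)
    for i in range(n):
        # The first task also takes right keys smaller than its first left key.
        lo = 0 if i == 0 else bisect_left(right_keys, left_block_first_keys[i])
        # Every other task stops at the first key of the next left block;
        # the last task reads the right input to the end.
        if i == n - 1:
            hi = len(right_keys)
        else:
            hi = bisect_left(right_keys, left_block_first_keys[i + 1])
        ranges.append(right_keys[lo:hi])
    return ranges


# Keys taken from the figure: left blocks start at 25 and 45.
left_index = [25, 45]
right_keys = [10, 24, 25, 35, 44, 45]
assignment = right_key_ranges(left_index, right_keys)
```

  With the keys from the figure, the task for the first left block handles right keys 10 through 44 and the task for the second block handles right keys from 45 onwards, so no right key is processed twice or skipped.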
  
  == Multiway Join ==
