Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  In the first release, merge join will work only under the following conditions:
     * Both inputs are sorted in *ascending* order of join keys. If an input 
consists of many files, there should be a total ordering across the files in 
ascending order of filename. For example, if one of the inputs to the 
join is a directory called input1 with files a and b under it, the data should 
be sorted in ascending order of join key when read starting at a and ending at 
b. Likewise, if an input directory has part files part-00000, part-00001, 
part-00002 and part-00003, the data should be sorted when the files are read in 
the sequence part-00000, part-00001, part-00002 and part-00003.
     * The merge join has exactly two inputs
-    * The loadfunc for the right input of the join should implement the 
SamplableLoader interface
+    * The loadfunc for the right input of the join should implement the 
SamplableLoader interface (PigStorage does implement the SamplableLoader 
interface).
     * Only inner join will be supported
     * Between the load of the sorted input and the merge join statement there 
can be only filter statements and foreach statements, where each foreach 
statement must meet the following conditions:
        * There should be no UDFs in the foreach statement
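
The total-ordering condition on multi-file inputs can be checked before 
attempting a merge join. The following is a minimal sketch in Python, not part 
of Pig: it assumes the input directory is on the local filesystem, that records 
are tab-delimited text with the join key in the first field, and that keys 
compare as strings. The function names are hypothetical. Files whose names 
start with "_" or "." are skipped, since Hadoop treats those as hidden.

```python
import os


def read_keys_in_filename_order(input_dir, delimiter="\t", key_index=0):
    """Yield join keys from each data file, visiting files in ascending filename order."""
    for name in sorted(os.listdir(input_dir)):
        # Skip files Hadoop treats as hidden (e.g. _SUCCESS, .part-00000.crc).
        if name.startswith("_") or name.startswith("."):
            continue
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:
                    yield line.split(delimiter)[key_index]


def is_totally_ordered(input_dir, delimiter="\t", key_index=0):
    """Return True if keys never decrease when files are read in filename order."""
    previous = None
    for key in read_keys_in_filename_order(input_dir, delimiter, key_index):
        # Assumption: keys are compared as strings, which may differ from
        # the ordering Pig applies to typed (e.g. numeric) join keys.
        if previous is not None and key < previous:
            return False
        previous = key
    return True
```

For an input directory containing part-00000 through part-00003, 
is_totally_ordered returns True only when the keys are non-decreasing across 
the whole sequence of files, not merely within each file.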
