Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  == Preconditions for merge join ==
  In the first release, merge join will only work under the following conditions:
     * Both inputs are sorted in *ascending* order of join keys. If an input 
consists of many files, there should be a total ordering across the files in 
the ascending order of filename. So for example if one of the inputs to the 
join is a directory called input1 with files a and b under it, the data should 
be sorted in ascending order of join key when read starting at a and ending in 
b. Likewise if an input directory has part files part-00000, part-00001, 
part-00002 and part-00003, the data should be sorted if the files are read in 
the sequence part-00000, part-00001, part-00002 and part-00003.
+    * Each part file of the sorted input should be at least one HDFS block in 
size (for example, if the HDFS block size is 128 MB, each part file should be 
at least 128 MB). If the total input size (including all part files) is less 
than one block size, then the part files should be uniform in size (without 
large skews in sizes).
     * The merge join must have exactly two inputs.
     * The loadfunc for the right input of the join should implement the 
SamplableLoader interface (PigStorage does implement the SamplableLoader 
interface).
     * Only inner joins will be supported.
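
The ordering and size conditions above can be checked before running a join. 
The following is only a rough sketch, not part of Pig: it assumes the input 
directory has been copied to a local filesystem, that records are 
tab-delimited text with the join key in the first field, and that keys compare 
as plain strings (Pig's own comparison depends on the key type). The function 
names are hypothetical, and the "uniform in size" condition for small inputs 
is not checked because the page does not define a threshold for it.

```python
import os
from typing import Callable, Iterable, Iterator, List


def first_tab_field(line: str) -> str:
    """Assumed key extractor: the join key is the first tab-separated field."""
    return line.split("\t", 1)[0]


def iter_keys_in_filename_order(
    input_dir: str, key_of: Callable[[str], str] = first_tab_field
) -> Iterator[str]:
    """Yield join keys from every regular file in input_dir, visiting the
    files in ascending filename order (a, b or part-00000, part-00001, ...)."""
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield key_of(line.rstrip("\n"))


def is_totally_ordered(keys: Iterable[str]) -> bool:
    """Return True if the keys never decrease (ascending, duplicates allowed)."""
    previous = None
    for key in keys:
        if previous is not None and key < previous:
            return False
        previous = key
    return True


def undersized_part_files(input_dir: str, block_size_bytes: int) -> List[str]:
    """Return names of files smaller than one block.

    If the total input is itself smaller than one block, the per-file
    minimum does not apply, so an empty list is returned.
    """
    sizes = {}
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    if sum(sizes.values()) < block_size_bytes:
        return []
    return [name for name, size in sizes.items() if size < block_size_bytes]
```

For an input directory input1, is_totally_ordered(iter_keys_in_filename_order(
"input1")) returning False, or undersized_part_files("input1", 128 * 1024 * 
1024) returning a non-empty list, would suggest the input does not meet the 
conditions listed above.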
