Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  In the first release merge join will only work under following conditions:

   * Both inputs are sorted in *ascending* order of join keys. If an input consists of many files, there should be a total ordering across the files in the ascending order of filename. So for example if one of the inputs to the join is a directory called input1 with files a and b under it, the data should be sorted in ascending order of join key when read starting at a and ending in b. Likewise if an input directory has part files part-00000, part-00001, part-00002 and part-00003, the data should be sorted if the files are read in the sequence part-00000, part-00001, part-00002 and part-00003.
   * The merge join only has two inputs
-  * The loadfunc for the right input of the join should implement the SamplableLoader interface
+  * The loadfunc for the right input of the join should implement the SamplableLoader interface (PigStorage does implement the SamplableLoader interface).
   * Only inner join will be supported
   * Between the load of the sorted input and the merge join statement there can only be filter statements and foreach statement where the foreach statement should meet the following conditions:
     * There should be no UDFs in the foreach statement
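
The first condition above (a total ordering across files when they are read in ascending filename order) can be checked before running the join. The following is only a minimal sketch in Python, not part of Pig: the helper names keys_in_filename_order and first_out_of_order are hypothetical, the input is assumed to be tab-delimited text with the join key in the first column, and keys are compared as plain strings, which may differ from how Pig orders typed keys.

```python
import os
from typing import Iterable, Iterator, Optional


def keys_in_filename_order(input_dir: str, key_index: int = 0,
                           delimiter: str = "\t") -> Iterator[str]:
    """Yield the join key of every record, reading files in ascending filename order.

    Assumption: each regular file in input_dir is delimited text, one record per line.
    Subdirectories (for example a _logs directory) are skipped.
    """
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n").split(delimiter)[key_index]


def first_out_of_order(keys: Iterable[str]) -> Optional[int]:
    """Return the 0-based position of the first key that is smaller than its
    predecessor, or None if the whole sequence is in ascending order.

    Assumption: keys are compared as strings; equal neighbouring keys are allowed.
    """
    previous = None
    for position, key in enumerate(keys):
        if previous is not None and key < previous:
            return position
        previous = key
    return None
```

For a directory such as input1 from the example above, first_out_of_order(keys_in_filename_order("input1")) returns None when the data meets the ordering condition, and otherwise the position of the first offending record counted across all files.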