Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  In local mode !LOJoin should not be translated to !POMergeJoin, even when the user requests a sort merge join. We do not need to implement a version of this join that does not require the sampling.

  == Outer Join ==
- This design will work for inner joins, and with slight modifications for left outer joins. It will not work for right outer or full outer joins. If we wish to extend it to work for those cases at some point in the future, it will have to be modified to also sample the left input. The reason for this is that in the current implementation !POMergeJoin does not know how far past the end of its input to keep accepting non-matching keys on the right side. It will need to know what key the next block of the left input starts on in order to determine when it should stop reading keys from the right input. A sampling pass on the left input that reads the first key of each block could provide this information. (Is the intent that each map task will at the end of its input continue reading keys from the right side till the first key in the next block and perform the outer join - for the outer join for the first key in the next block onwards the map task corresponding to that block will handle the processing. The extra corner case is the for the first key on the left input the outer join for the all the right keys less than that key will need to be done by the map task processing the first key (the first key would be the first entry in the index for the left side)).
+ This design will work for inner joins, and with slight modifications for left outer joins. It will not work for right outer or full outer joins. If we wish to extend it to work for those cases at some point in the future, it will have to be modified to also sample the left input. The reason for this is that in the current implementation !POMergeJoin does not know how far past the end of its input to keep accepting non-matching keys on the right side. It will need to know what key the next block of the left input starts on in order to determine when it should stop reading keys from the right input. A sampling pass on the left input that reads the first key of each block could provide this information. (Is the intent that each map task will, at the end of its input, continue reading keys from the right side up to the first key in the next block and perform the outer join for them? From the first key in the next block onwards, the map task corresponding to that block will handle the outer join. The extra corner case is that, for the first key on the left input, the outer join for all the right keys less than that key will need to be done by the map task processing the first key; the first key would be the first entry in the index for the left side.)
+ Perhaps a figure might help illustrate:
+ Left input        Right input
+
+ || 25 ||          || 10 ||
+ || .. ||          || .. ||
+ || 35 ||          || 24 ||
+                   || 25 ||
+                   || .. ||
+ || 45 ||          || 35 ||
+ || .. ||          || .. ||
+ || 65 ||          || 44 ||
+                   || 45 ||
+                   || .. ||
+
  In the current implementation (r806281), only inner joins are supported.

  == Multiway Join ==
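
Referring back to the Outer Join change above: the proposed left-input sampling pass can be illustrated with a small Python sketch. It is not Pig code and does not reflect how !POMergeJoin is implemented. The function names (`sample_first_keys`, `right_key_range`, `right_keys_for_block`) are hypothetical, and the sketch assumes both inputs are sorted, every left block is non-empty, and no left key straddles a block boundary. It only shows which right-side keys each map task would be responsible for, including the corner case where the first map task also takes every right key smaller than the first left key.

```python
from bisect import bisect_left


def sample_first_keys(left_blocks):
    """Sampling pass over the left input: record the first key of each block.

    Assumes every block is a non-empty, sorted list of join keys.
    """
    return [block[0] for block in left_blocks]


def right_key_range(first_keys, block_index):
    """Return the half-open range [low, high) of right keys owned by one map task.

    None stands for "unbounded". The first block has no lower bound, so it
    also covers right keys smaller than the first left key; the last block
    has no upper bound, so it reads the right input to its end.
    """
    low = None if block_index == 0 else first_keys[block_index]
    is_last = block_index + 1 >= len(first_keys)
    high = None if is_last else first_keys[block_index + 1]
    return low, high


def right_keys_for_block(right_keys, first_keys, block_index):
    """Slice the sorted right input down to the keys one map task must handle."""
    low, high = right_key_range(first_keys, block_index)
    start = 0 if low is None else bisect_left(right_keys, low)
    stop = len(right_keys) if high is None else bisect_left(right_keys, high)
    return right_keys[start:stop]


if __name__ == "__main__":
    # Only the keys written out in the figure are used; the ".." rows are omitted.
    left_blocks = [[25, 35], [45, 65]]
    right_keys = [10, 24, 25, 35, 44, 45]

    first_keys = sample_first_keys(left_blocks)
    for i in range(len(left_blocks)):
        print(i, right_key_range(first_keys, i),
              right_keys_for_block(right_keys, first_keys, i))
```

With the figure's keys, the map task for the first left block would handle every right key below 45 (including 10 and 24, which precede the first left key), and the task for the second block would start at 45. A real implementation would also have to decide what happens when the same key ends one left block and starts the next, which this sketch does not address.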
