[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744706#action_12744706 ]
Pradeep Kamath commented on PIG-926: ------------------------------------ Review comments: In MergeJoinIndexer: Is there a reason MergeJoinIndexer is a subclass of RandomSampleLoader? It does not use numSamples or skipInterval and has its own getNext() - is there any benefit to inheritance? I see that readObject() method has been removed - is this no longer needed? Also for the case where right pipeline returns POStatus.STATUS_ERR, the code currently ignores it and continues. I am wondering in the worst case, if all iterations of running the right pipeline result in POStatus.STATUS_ERR, MergeJoinIndexer would return null for the join key. If this happened from all maps, we would get an index will all null entries, in this case, I think merge join succeeds with empty output which would be wrong (I am assuming POMergeJoin treats an index full of null entries as indicating that the right input is empty). If this would be the case, we should throw an ExecException on POStatus.STATUS_ERR. > Merge-Join phase 2 > ------------------ > > Key: PIG-926 > URL: https://issues.apache.org/jira/browse/PIG-926 > Project: Pig > Issue Type: Improvement > Components: impl > Reporter: Ashutosh Chauhan > Assignee: Ashutosh Chauhan > Priority: Minor > Attachments: mj_phase2_1.patch > > > This jira is created to keep track of phase-2 work for MergeJoin. Various > limitations exist in phase-1 for Merge Join which are listed on: > http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.