Pradeep Kamath commented on PIG-926:

Review comments:
In MergeJoinIndexer:
Is there a reason MergeJoinIndexer is a subclass of RandomSampleLoader? It does 
not use numSamples or skipInterval and has its own getNext() - is there any 
benefit to inheritance?
I see that readObject() method has been removed - is this no longer needed?
Also for the case where right pipeline returns POStatus.STATUS_ERR, the code 
currently ignores it and continues. I am wondering in the worst case, if all 
iterations of running the
right pipeline result in POStatus.STATUS_ERR, MergeJoinIndexer would return 
null for the join key. If this happened from all maps, we would get an index 
will all null entries, in this
case, I think merge join succeeds with empty output which would be wrong (I am 
assuming POMergeJoin treats an index full of null entries as indicating that 
the right input is empty). If this would be the case, we should throw an 
ExecException on POStatus.STATUS_ERR.

> Merge-Join phase 2
> ------------------
>                 Key: PIG-926
>                 URL: https://issues.apache.org/jira/browse/PIG-926
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Minor
>         Attachments: mj_phase2_1.patch
> This jira is created to keep track of phase-2 work for MergeJoin. Various 
> limitations exist in phase-1 for Merge Join which are listed on: 
> http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to