Pradeep Kamath updated PIG-953:

    Attachment: PIG-953-2.patch

Attached new patch against latest trunk:

Addressing the previous two comments:

bq. [Ashutosh] What about LinkedHashMap? It provides all the properties we are 
seeking here, one data structure, O(1) lookup and guaranteed iteration order.
LinkedHashMap is also a good choice - I think this internal structure at this 
point does not need to
be optimized for lookup - hence leaving it as is

bq. will result in NPE when both obj1 and obj2 are null.
Fixed the NPE in Utils.checkNullAndClass - good catch!

bq. A minor detail: Suppose obj1 is declared of type ArrayList<Integer> and 
obj2 is declared of type ArrayList<String>, obj1.getClass() == obj2.getClass() 
will return true thanks to type erasure by java compiler at compile time. Not 
sure if thats OK or not for the check here. 
You are right - however there is no way to work around type erasure - this is 
the best we can do.

bq. Does zebra requires columns to be named? If it doesn't then SortInfo could 
be changed in such a way that it can provide column position instead of names 
to loader, if columns arent named.
Zebra needs column names and cannot work with positions

bq. Isnt this blocked on https://issues.apache.org/jira/browse/PIG-930
bz2 handling needs to be fixed but this code will be needed when it is fixed. 
This does not make things any
worse since bz2 is currently already broken

bq. because if status is Error, execution should be stopped and exception 
should be thrown as early as possible instead of continue doing work which will 
be wasted. If status is Null NPE will occur while doing join.
Fixed to throw exception

bq. there is no need of passing rightinputfilename from MRCompiler to 
We do need these calls to tell Zebra the filename - you are right that pig's 
doesn't need these - but the code has to work with Zebra also.

bq. Also, seekNear() doesn't sound right. How about seekToClosest() ? 
I know seekNear() is vague and intentionally so - the hope is that users will 
read the javadoc comments
to know how to implement it - seekToClosest would be equally vague in my 
opinion :)

bq. I think introducing order preserving flag on logical operator is a good 
I think the order preserving flag idea should be addressed in a different jira 
as it
is orthogonal to this jira

bq. Instead if we use following, we will achieve the same thing and then 
neither findbugs will complain, nor their is need for our own copy method.
Fixed - removed Utils.getCopy

bq. In POMergeJoin.java
// we should never get here!
        return new Result(POStatus.STATUS_ERR, null);

could be changed to

// we should never get here!
        throw new ExecException(errMsg,2176);
bq. because if we ever get there, it will result in NPE later on otherwise.

The method has to return a Result. I think this will just be passed down the 
pipeline as an error
and should not result in an NPE going by the code in getNext():

Result rightInp = getNextRightInp();
if(rightInp.returnStatus != POStatus.STATUS_OK){
                    prevRightInp = null;
                    return rightInp;

Per a previous review comment, also changed Utils.checkNullEquals() to the 
    public static boolean checkNullEquals(Object obj1, Object obj2, boolean 
checkEquality) {                                                                
        if(obj1 == null || obj2 == null) {                                      
            return obj1 == obj2;                                                
        if(checkEquality) {                                                     
            if(!obj1.equals(obj2)) {                                            
                return false;                                                   
        return true;                                                            

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> ---------------------------------------------------------------------------------------------------------
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953-2.patch, PIG-953.patch
> Currently merge join implementation in pig includes construction of an index 
> on sorted data and use of that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, pig uses it 
> (indirectly through the loader) and does not construct an index. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to