[ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753801#action_12753801
 ] 

Dmitriy V. Ryaboy commented on PIG-953:
---------------------------------------

Pradeep,

First, I think this is very important to have, not just for Merge but for other 
things that might benefit from knowing sort orders, as well. 

A few minor nits from a cursory glance at the code. I didn't check the actual 
logic very carefully yet -- it looks like the large diff blocks in MergeSort et 
al are mostly moves of code blocks, not significant code changes, correct?

On to the comments:

seekNear seems ambiguous, as "near" is a generic concept that does not 
necessarily imply "before or to, but not after" -- which is what this method is 
required to do. How about "seekBefore()"?

Why does getAscColumns and getSortColumns make a copy of the list?  Seems like 
we can save some memory and cpu here.

For that matter, why not use a map of (String)colName-> (Boolean)ascending 
instead of 2 lists? One structure, plus O(1) lookup.

Not sure about the use of super() in the constructor of a class that doesn't 
extend anything but Object. Is there some magic that requires it?

In Log2PhysTranslator, why hardcode the Limit operator? There are other 
operators that don't change sort order, such as filter. Perhaps add a method to 
Logical Operators that indicates if they alter sort order of their inputs?


in Utils

checkNullEquals is better written as

{code}
if (obj1 == null || obj2 == null) {
        return obj1 == obj2;
} else  {
        return checkEquality ? obj1.equals(obj2) : true;
}
{code}

Even with this rewrite, this seems like an odd function. It being as odd as it 
is leads to it not being used safely when you set checkEquality to false (just 
a few lines later)-- if obj1 is null and obj2 is not, the func returns true, 
you try to call a method on obj1, and get an NPE.

Probably better not to roll all this into one amorphous function and simply 
write

{code}
Util.bothNull(obj1, obj2) || (Util.notNull(obj1, obj2) && obj1.equals(obj2));
{code}

(the implementations of bothNull and notNull are obvious -- just conjunction 
and disjunction of obj == null)

In StoreConfig
This comment has a typo (and instead of "an"): 
"* 1) the store does not follow and order by"



> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953.patch
>
>
> Currently merge join implementation in pig includes construction of an index 
> on sorted data and use of that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, pig uses it 
> (indirectly through the loader) and does not construct an index. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to