[ 
https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548021
 ] 

Utkarsh Srivastava commented on PIG-7:
--------------------------------------

Looks good. But as Alan said, we should rework this code when there is more 
time. 

Two comments, one major and one minor, both in PigCombine.java.

Major:
Lines 93,94: Don't add the indexed tuple directly to the bag. We had a nasty 
bug a while back regarding this. Convert it into a regular tuple before adding 
it. See lines 148,149 in PigMapReduce.java.
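
To make the conversion concrete, here is a rough sketch. The class and method 
names below are made up for illustration and are not the actual Pig types; the 
only point is that the bag receives a plain copy of the fields rather than the 
indexed object itself.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleCopySketch {
    // Hypothetical stand-in for a regular tuple; not the real Pig class.
    static class PlainTuple {
        final List<Object> fields;

        PlainTuple(List<Object> fields) {
            // Copy the list so the new tuple does not share state with its source.
            this.fields = new ArrayList<>(fields);
        }
    }

    // Hypothetical stand-in for an indexed tuple: same fields plus an input index.
    static class IndexedTupleSketch extends PlainTuple {
        final int index;

        IndexedTupleSketch(List<Object> fields, int index) {
            super(fields);
            this.index = index;
        }

        // Build a regular tuple from the fields, dropping the index.
        PlainTuple toPlainTuple() {
            return new PlainTuple(fields);
        }
    }

    // Add the converted tuple to the bag, never the indexed tuple itself.
    static List<PlainTuple> addToBag(List<PlainTuple> bag, IndexedTupleSketch it) {
        bag.add(it.toPlainTuple());
        return bag;
    }
}
```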


Minor: It would be nice to clean up the comments in PigCombine. Also, there 
are some fragments that don't make sense given the restricted setting in which 
we are applying the combiner.

For example,

            for (int i = 0; i < inputCount; i++) {  // XXX: shouldn't we only do this if INNER flag is set?
                if (t.getBagField(1 + i).isEmpty())
                    return;
            }

Since we are currently running for the case when inputCount == 1, the bag will 
never be empty. (If the bag is empty, that group would never have been created).
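
The reasoning can be sketched with an ordinary grouping loop. This is not Pig 
code, and the names are hypothetical; it only shows that with a single input a 
group comes into existence when its first value arrives, so its bag always has 
at least one element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupSketch {
    // Group values by key for a single input (the inputCount == 1 case).
    static Map<String, List<Integer>> group(String[] keys, int[] values) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // The bag for a key is created only at the moment a value is added to it.
            groups.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return groups;
    }
}
```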



> Optimize execution of algebraic functions
> -----------------------------------------
>
>                 Key: PIG-7
>                 URL: https://issues.apache.org/jira/browse/PIG-7
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: combiner.patch, combiner2.patch, combiner3.patch
>
>
> Algebraic functions are those that can be computed incrementally, like count(X), 
> SUM(X), etc. They can be computed efficiently by doing the first-level 
> computation in the Hadoop combiner. This can give a significant (2-3x) speedup 
> for many aggregation queries. 
> Several users asked us for this feature so it is pretty high priority.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
