Alan Gates updated PIG-484:

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.  I ran performance tests on large data and saw no significant
change in performance.  That is fine, since this change is aimed at scalability
rather than performance.

> PERFORMANCE: streaming data to aggregate functions
> --------------------------------------------------
>                 Key: PIG-484
>                 URL: https://issues.apache.org/jira/browse/PIG-484
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-484.patch
> Currently, for queries like
> A = load 'data';
> B = group A by $0;
> C = foreach B generate group, MIN(A.$1), MAX(A.$1);
> The data is first collected into a bag and then passed to the aggregate
> functions. This is unnecessary and inefficient. In this case, the data can
> simply be streamed to the functions.
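
To illustrate the difference the issue describes, here is a minimal sketch in Python. It is not Pig's implementation and does not reflect the contents of PIG-484.patch. The function names and the sample rows are hypothetical. The first function collects each group's values into a list (standing in for the bag) before aggregating. The second keeps only a running minimum and maximum per group, so memory use no longer grows with group size.

```python
from collections import defaultdict


def aggregate_with_bags(rows):
    """Collect each group's values into a list (the 'bag'), then aggregate."""
    bags = defaultdict(list)
    for key, value in rows:
        bags[key].append(value)
    # Every value of every group is held in memory at this point.
    return {key: (min(bag), max(bag)) for key, bag in bags.items()}


def aggregate_streaming(rows):
    """Keep only a running (min, max) pair per group; no bag is built."""
    state = {}
    for key, value in rows:
        if key not in state:
            state[key] = (value, value)
        else:
            lo, hi = state[key]
            state[key] = (min(lo, value), max(hi, value))
    return state


# Hypothetical sample data: (group key, value) pairs.
rows = [("a", 3), ("b", 7), ("a", 1), ("b", 9), ("a", 5)]
```

Both functions return the same result for the same input. Only the amount of intermediate state differs, which is consistent with the change being about scalability rather than speed.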

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.