Pradeep Kamath updated PIG-580:

    Status: Patch Available  (was: Open)

> PERFORMANCE: Combiner should also be used when there are distinct aggregates 
> in a foreach following a group provided there are no non-algebraics in the 
> foreach 
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: PIG-580
>                 URL: https://issues.apache.org/jira/browse/PIG-580
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-580.patch
> Currently Pig uses the combiner only when a foreach follows a group and 
> every element in the foreach's generate is one of the following:
> 1) simple project of the "group" column
> 2) Algebraic UDF
> The above conditions exclude use of the combiner for distinct aggregates, 
> even though the distinct operation itself is combinable (irrespective of 
> whether it feeds an algebraic or non-algebraic UDF). So the following 
> foreach should also be combinable:
> {code}
> ..
> b = group a by $0;
> c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); };
> {code}
> The combiner optimizer should cause the distinct to be combined and the final 
> combine output should feed the COUNT() and SUM() in the reduce.
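
To illustrate why the distinct is combinable, here is a minimal Python sketch. It is not Pig's implementation. The function names (map_side_distinct, reduce_side) and the sample splits are hypothetical, chosen only to show that removing duplicates per map split and merging in the reduce gives the same COUNT and SUM as a single distinct over all the data.

```python
from collections import defaultdict


def map_side_distinct(records):
    """Hypothetical combiner step: drop duplicate tuples per group key
    within a single map split."""
    partial = defaultdict(set)
    for rec in records:
        partial[rec[0]].add(rec)  # rec[0] plays the role of $0, the group key
    return partial


def reduce_side(partials):
    """Hypothetical reduce step: merge the partial distinct sets, then
    compute COUNT(x) and SUM(x.$1) over the final distinct bag."""
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # set union keeps the result distinct
    return {
        key: (len(tuples), sum(t[1] for t in tuples))
        for key, tuples in merged.items()
    }


# Two made-up map splits with duplicates inside and across splits.
split_1 = [("k1", 5), ("k1", 5), ("k2", 7)]
split_2 = [("k1", 5), ("k1", 3), ("k2", 7)]

# With the combiner: distinct per split, then merge in the reduce.
combined = reduce_side([map_side_distinct(split_1), map_side_distinct(split_2)])

# Without the combiner: one distinct over all records.
direct = reduce_side([map_side_distinct(split_1 + split_2)])
```

Both paths yield the same per-group (COUNT, SUM) pairs, but the combined path ships fewer tuples from map to reduce because duplicates within a split are dropped early.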

